10780010@unknown@formal@none@1@S@
Spanish language
@@@@1@2@@danf@17-8-2009 10780020@unknown@formal@none@1@S@'''Spanish''' or '''Castilian''' (''castellano'') is an [[Indo-European]], [[Romance languages|Romance language]] that originated in northern [[Spain]], and gradually spread in the [[Kingdom of Castile]] and evolved into the principal language of government and trade.@@@@1@33@@danf@17-8-2009 10780030@unknown@formal@none@1@S@It was taken to [[Spanish Empire#Territories in Africa (1898–1975)|Africa]], the [[Spanish colonization of the Americas|Americas]], and [[Spanish East Indies|Asia Pacific]] with the expansion of the [[Spanish Empire]] between the fifteenth and nineteenth centuries.@@@@1@33@@danf@17-8-2009 10780040@unknown@formal@none@1@S@Today, between 322 and 400 million people speak Spanish as a native language, making it the world's second most-spoken language by native speakers (after [[Standard Mandarin|Mandarin Chinese]]).@@@@1@27@@danf@17-8-2009 10780050@unknown@formal@none@1@S@==Hispanosphere==@@@@1@1@@danf@17-8-2009 10780060@unknown@formal@none@1@S@It is estimated that the combined total of native and non-native Spanish speakers is approximately 500 million, likely making it the third most spoken language by total number of speakers (after [[English_language|English]] and [[Chinese_language|Chinese]]).@@@@1@34@@danf@17-8-2009 10780070@unknown@formal@none@1@S@Today, Spanish is an official language of Spain, most [[Latin American]] countries, and [[Equatorial Guinea]]; 21 nations speak it as their primary language.@@@@1@23@@danf@17-8-2009 10780080@unknown@formal@none@1@S@Spanish also is one of [[United Nations#Languages|six official languages]] of the [[United Nations]].@@@@1@13@@danf@17-8-2009 10780090@unknown@formal@none@1@S@[[Mexico]] has the world's largest Spanish-speaking population, and Spanish is the second most-widely spoken language in the [[United States]] and the most popular studied foreign language in [[United States|U.S.]] schools and universities.@@@@1@32@@danf@17-8-2009 
10780100@unknown@formal@none@1@S@[[Global internet usage]] statistics for 2007 show Spanish as the third most commonly used language on the Internet, after English and [[Chinese language|Chinese]].@@@@1@23@@danf@17-8-2009 10780110@unknown@formal@none@1@S@==Naming and origin==@@@@1@3@@danf@17-8-2009 10780120@unknown@formal@none@1@S@Spaniards tend to call this language {{lang|es|'''''español'''''}} (Spanish) when contrasting it with languages of other states, such as [[French language|French]] and [[English language|English]], but call it {{lang|es|'''''castellano'''''}} (Castilian), that is, the language of the [[Castile (historical region)|Castile]] region, when contrasting it with other [[languages of Spain|languages spoken in Spain]] such as [[Galician language|Galician]], [[Basque language|Basque]], and [[Catalan language|Catalan]].@@@@1@58@@danf@17-8-2009 10780130@unknown@formal@none@1@S@This reasoning also holds true for the language's preferred name in some [[Hispanic America]]n countries.@@@@1@15@@danf@17-8-2009 10780140@unknown@formal@none@1@S@In this manner, the [[Spanish Constitution of 1978]] uses the term {{lang|es|''castellano''}} to define the [[official language]] of the whole Spanish State, as opposed to {{lang|es|''las demás lenguas españolas''}} (lit. 
''the other Spanish languages'').@@@@1@34@@danf@17-8-2009 10780150@unknown@formal@none@1@S@Article III reads as follows:@@@@1@5@@danf@17-8-2009 10780160@unknown@formal@none@1@S@The name ''castellano'' is, however, widely used for the language as a whole in Latin America.@@@@1@16@@danf@17-8-2009 10780170@unknown@formal@none@1@S@Some Spanish speakers consider ''{{lang|es|castellano}}'' a generic term with no political or ideological links, much as "Spanish" is in English.@@@@1@20@@danf@17-8-2009 10780180@unknown@formal@none@1@S@Often Latin Americans use it to differentiate their own variety of Spanish from the variety of Spanish spoken in Spain, or from the variety of Spanish which is considered as standard in the region.@@@@1@34@@danf@17-8-2009 10780190@unknown@formal@none@1@S@==Classification and related languages==@@@@1@4@@danf@17-8-2009 10780200@unknown@formal@none@1@S@Spanish is closely related to the other [[West Iberian languages|West Iberian]] Romance languages: [[Asturian language|Asturian]] ({{lang|ast|''asturianu''}}), [[Galician language|Galician]] ({{lang|gl|''galego''}}), [[Ladino language|Ladino]] ({{lang|lad|''dzhudezmo/spanyol/kasteyano''}}), and [[Portuguese language|Portuguese]] ({{lang|pt|''português''}}).@@@@1@26@@danf@17-8-2009 10780210@unknown@formal@none@1@S@Catalan, an [[Iberian Romance languages|East Iberian language]] which exhibits many [[Gallo-Romance]] traits, is more similar to the neighbouring [[Occitan language]] ({{lang|oc|''occitan''}}) than to Spanish, or indeed than Spanish and Portuguese are to each other.@@@@1@34@@danf@17-8-2009 10780220@unknown@formal@none@1@S@Spanish and Portuguese share similar grammars and vocabulary as well as a common history of [[Influence of Arabic on other languages|Arabic influence]] while a great part of the peninsula was under [[Timeline of the Muslim presence in the Iberian peninsula|Islamic rule]] (both languages expanded over [[Islamic empire|Islamic territories]]).@@@@1@48@@danf@17-8-2009 
10780230@unknown@formal@none@1@S@Their [[lexical similarity]] has been estimated as 89%.@@@@1@8@@danf@17-8-2009 10780240@unknown@formal@none@1@S@See [[Differences between Spanish and Portuguese]] for further information.@@@@1@9@@danf@17-8-2009 10780250@unknown@formal@none@1@S@===Ladino===@@@@1@1@@danf@17-8-2009 10780260@unknown@formal@none@1@S@Ladino, which is essentially medieval Spanish and closer to modern Spanish than any other language, is spoken by many descendants of the [[Sephardi Jews]] who were [[Alhambra decree|expelled from Spain in the 15th century]].@@@@1@34@@danf@17-8-2009 10780270@unknown@formal@none@1@S@Ladino speakers are currently almost exclusively [[Sephardim|Sephardi]] Jews, with family roots in Turkey, Greece or the Balkans: current speakers mostly live in Israel and Turkey, with a few pockets in Latin America.@@@@1@32@@danf@17-8-2009 10780280@unknown@formal@none@1@S@It lacks the [[Amerindian languages|Native American vocabulary]] which was influential during the [[Spanish Empire|Spanish colonial period]], and it retains many archaic features which have since been lost in standard Spanish.@@@@1@30@@danf@17-8-2009 10780290@unknown@formal@none@1@S@It contains, however, other vocabulary which is not found in standard Castilian, including vocabulary from [[Hebrew language|Hebrew]], some French, Greek and [[Turkish language|Turkish]], and other languages spoken where the Sephardim settled.@@@@1@31@@danf@17-8-2009 10780300@unknown@formal@none@1@S@Ladino is in serious danger of extinction because many native speakers today are elderly as well as elderly ''olim'' (immigrants to [[Israel]]) who have not transmitted the language to their children or grandchildren.@@@@1@33@@danf@17-8-2009 10780310@unknown@formal@none@1@S@However, it is experiencing a minor revival among Sephardi communities, especially in music.@@@@1@13@@danf@17-8-2009 10780320@unknown@formal@none@1@S@In the case of the Latin American communities, the danger of extinction is also due to 
the risk of assimilation by modern Castilian.@@@@1@23@@danf@17-8-2009 10780330@unknown@formal@none@1@S@A related dialect is [[Haketia]], the Judaeo-Spanish of northern Morocco.@@@@1@10@@danf@17-8-2009 10780340@unknown@formal@none@1@S@This too tended to assimilate with modern Spanish, during the Spanish occupation of the region.@@@@1@15@@danf@17-8-2009 10780350@unknown@formal@none@1@S@===Vocabulary comparison===@@@@1@2@@danf@17-8-2009 10780360@unknown@formal@none@1@S@Spanish and [[Italian language|Italian]] share a very similar phonological system.@@@@1@10@@danf@17-8-2009 10780370@unknown@formal@none@1@S@At present, the [[lexical similarity]] with Italian is estimated at 82%.@@@@1@11@@danf@17-8-2009 10780380@unknown@formal@none@1@S@As a result, Spanish and Italian are mutually intelligible to various degrees.@@@@1@12@@danf@17-8-2009 10780390@unknown@formal@none@1@S@The lexical similarity with [[Portuguese language|Portuguese]] is greater, 89%, but the vagaries of Portuguese pronunciation make it less easily understood by Hispanophones than Italian.@@@@1@24@@danf@17-8-2009 10780400@unknown@formal@none@1@S@[[Mutual intelligibility]] between Spanish and [[French language|French]] or [[Romanian language|Romanian]] is even lower (lexical similarity being respectively 75% and 71%): comprehension of Spanish by French speakers who have not studied the language is as low as an estimated 45% - the same as of English.@@@@1@45@@danf@17-8-2009 10780410@unknown@formal@none@1@S@The common features of the writing systems of the Romance languages allow for a greater amount of interlingual reading comprehension than oral communication would.@@@@1@24@@danf@17-8-2009 10780420@unknown@formal@none@1@S@ 1. also {{lang|pt|''nós outros''}} in early modern Portuguese (e.g. ''[[The Lusiads]]'')@@@@1@12@@danf@17-8-2009 10780430@unknown@formal@none@1@S@2. 
{{lang|it|''noi '''altri'''''}} in Southern [[List of languages of Italy|Italian dialects and languages]]@@@@1@13@@danf@17-8-2009 10780440@unknown@formal@none@1@S@3. Alternatively {{lang|fr|''nous '''autres'''''}} @@@@1@5@@danf@17-8-2009 10780460@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10780470@unknown@formal@none@1@S@Spanish evolved from [[Vulgar Latin]], with major [[Arabic influence on the Spanish language|influences from Arabic]] in vocabulary during the [[Al-Andalus|Andalusian]] period and minor surviving influences from [[Basque language|Basque]] and [[Celtiberian language|Celtiberian]], as well as [[Germanic languages]] via the [[Visigoths]].@@@@1@39@@danf@17-8-2009 10780480@unknown@formal@none@1@S@Spanish developed along the remote cross road strips among the [[Alava]], [[Cantabria]], [[Burgos]], [[Soria]] and [[La Rioja (autonomous community)|La Rioja]] provinces of Northern Spain, as a strongly innovative and differing variant from its nearest cousin, [[Asturian|Leonese speech]], with a higher degree of Basque influence in these regions (see [[Iberian Romance languages]]).@@@@1@51@@danf@17-8-2009 10780490@unknown@formal@none@1@S@Typical features of Spanish diachronical [[phonology]] include [[lenition]] (Latin {{lang|la|''vita''}}, Spanish {{lang|es|''vida''}}), [[palatalization]] (Latin {{lang|la|''annum''}}, Spanish {{lang|es|''año''}}, and Latin {{lang|la|''anellum''}}, Spanish {{lang|es|''anillo''}}) and [[diphthong]]ation ([[stem (linguistics)|stem]]-changing) of short ''e'' and ''o'' from Vulgar Latin (Latin {{lang|la|''terra''}}, Spanish {{lang|es|''tierra''}}; Latin {{lang|la|''novus''}}, Spanish {{lang|es|''nuevo''}}).@@@@1@42@@danf@17-8-2009 10780500@unknown@formal@none@1@S@Similar phenomena can be found in other Romance languages as well.@@@@1@11@@danf@17-8-2009 10780510@unknown@formal@none@1@S@During the {{lang|es|''[[Reconquista]]''}}, this northern dialect from [[Cantabria]] was carried south, and remains a [[minority 
language]] in the northern coastal [[Morocco]].@@@@1@21@@danf@17-8-2009 10780520@unknown@formal@none@1@S@The first Latin-to-Spanish grammar ({{lang|es|''Gramática de la Lengua Castellana''}}) was written in [[Salamanca]], Spain, in 1492, by [[Antonio de Nebrija|Elio Antonio de Nebrija]].@@@@1@23@@danf@17-8-2009 10780530@unknown@formal@none@1@S@When it was presented to [[Isabel de Castilla]], she asked, "What do I want a work like this for, if I already know the language?", to which he replied, "Your highness, the language is the instrument of the Empire."@@@@1@39@@danf@17-8-2009 10780540@unknown@formal@none@1@S@From the 16th century onwards, the language was taken to the [[Americas]] and the [[Spanish East Indies]] via [[Spanish colonization of the Americas|Spanish colonization]].@@@@1@24@@danf@17-8-2009 10780550@unknown@formal@none@1@S@In the 20th century, Spanish was introduced to [[Equatorial Guinea]] and the [[Western Sahara]], the United States, such as in [[Spanish Harlem]], in [[New York City]], that had not been part of the Spanish Empire.@@@@1@35@@danf@17-8-2009 10780560@unknown@formal@none@1@S@For details on borrowed words and other external influences upon Spanish, see [[Influences on the Spanish language]].@@@@1@17@@danf@17-8-2009 10780570@unknown@formal@none@1@S@===Characterization===@@@@1@1@@danf@17-8-2009 10780580@unknown@formal@none@1@S@A defining characteristic of Spanish was the [[diphthong]]ization of the Latin short vowels ''e'' and ''o'' into ''ie'' and ''ue'', respectively, when they were stressed.@@@@1@25@@danf@17-8-2009 10780590@unknown@formal@none@1@S@Similar [[sound law|sound changes]] are found in other Romance languages, but in Spanish they were significant.@@@@1@16@@danf@17-8-2009 10780600@unknown@formal@none@1@S@Some examples:@@@@1@2@@danf@17-8-2009 10780610@unknown@formal@none@1@S@* Lat. {{lang|la|''petra''}} > Sp. {{lang|es|''piedra''}}, It. {{lang|it|''pietra''}}, Fr. {{lang|fr|''pierre''}}, Rom. {{lang|ro|''piatrǎ''}}, Port./Gal. 
{{lang|pt|''pedra''}} "stone".@@@@1@15@@danf@17-8-2009 10780620@unknown@formal@none@1@S@* Lat. {{lang|la|''moritur''}} > Sp. {{lang|es|''muere''}}, It. {{lang|it|''muore''}}, Fr. {{lang|fr|''meurt''}} / {{lang|fr|''muert''}}, Rom. {{lang|ro|''moare''}}, Port./Gal. {{lang|pt|''morre''}} "die".@@@@1@17@@danf@17-8-2009 10780630@unknown@formal@none@1@S@Peculiar to early Spanish (as in the [[Gascon]] dialect of Occitan, and possibly due to a Basque [[substratum]]) was the mutation of Latin initial ''f-'' into ''h-'' whenever it was followed by a vowel that did not diphthongize.@@@@1@38@@danf@17-8-2009 10780640@unknown@formal@none@1@S@Compare for instance:@@@@1@3@@danf@17-8-2009 10780650@unknown@formal@none@1@S@* Lat. {{lang|la|''filium''}} > It. {{lang|it|''figlio''}}, Port. {{lang|pt|''filho''}}, Gal. {{lang|gl|''fillo''}}, Fr. {{lang|fr|''fils''}}, Occitan {{lang|oc|''filh''}} (but Gascon {{lang|gsc|''hilh''}}), Sp. {{lang|es|''hijo''}} (but Ladino {{lang|lad|''fijo''}});@@@@1@22@@danf@17-8-2009 10780660@unknown@formal@none@1@S@* Lat. {{lang|la|''fabulari''}} > Lad. {{lang|lad|''favlar''}}, Port./Gal. {{lang|pt|''falar''}}, Sp. {{lang|es|''hablar''}};@@@@1@10@@danf@17-8-2009 10780670@unknown@formal@none@1@S@* but Lat. {{lang|la|''focum''}} > It. {{lang|it|''fuoco''}}, Port./Gal. {{lang|pt|''fogo''}}, Sp./Lad. {{lang|es|''fuego''}}.@@@@1@11@@danf@17-8-2009 10780680@unknown@formal@none@1@S@Some [[consonant cluster]]s of Latin also produced characteristically different results in these languages, for example:@@@@1@15@@danf@17-8-2009 10780690@unknown@formal@none@1@S@* Lat. {{lang|la|''clamare''}}, acc. {{lang|la|''flammam''}}, {{lang|la|''plenum''}} > Lad. {{lang|lad|''lyamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Sp. 
{{lang|es|''llamar''}}, {{lang|es|''llama''}}, {{lang|es|''lleno''}}.@@@@1@15@@danf@17-8-2009 10780700@unknown@formal@none@1@S@However, in Spanish there are also the forms {{lang|la|''clamar''}}, {{lang|lad|''flama''}}, {{lang|lad|''pleno''}}; Port. {{lang|pt|''chamar''}}, {{lang|pt|''chama''}}, {{lang|pt|''cheio''}}; Gal. {{lang|gl|''chamar''}}, {{lang|gl|''chama''}}, {{lang|gl|''cheo''}}.@@@@1@19@@danf@17-8-2009 10780710@unknown@formal@none@1@S@* Lat. acc. {{lang|la|''octo''}}, {{lang|la|''noctem''}}, {{lang|la|''multum''}} > Lad. {{lang|lad|''ocho''}}, {{lang|lad|''noche''}}, {{lang|lad|''muncho''}}; Sp. {{lang|es|''ocho''}}, {{lang|es|''noche''}}, {{lang|es|''mucho''}}; Port. {{lang|pt|''oito''}}, {{lang|pt|''noite''}}, {{lang|pt|''muito''}}; Gal. {{lang|gl|''oito''}}, {{lang|gl|''noite''}}, {{lang|gl|''moito''}}.@@@@1@23@@danf@17-8-2009 10780720@unknown@formal@none@1@S@==Geographic distribution==@@@@1@2@@danf@17-8-2009 10780730@unknown@formal@none@1@S@Spanish is one of the official languages of the [[European Union]], the [[Organization of American States]], the [[Organization of Ibero-American States]], the [[United Nations]], and the [[Union of South American Nations]].@@@@1@31@@danf@17-8-2009 10780740@unknown@formal@none@1@S@===Europe===@@@@1@1@@danf@17-8-2009 10780750@unknown@formal@none@1@S@Spanish is an official language of Spain, the country for which it is named and from which it originated.@@@@1@19@@danf@17-8-2009 10780760@unknown@formal@none@1@S@It is also spoken in [[Gibraltar]], though English is the official language.@@@@1@12@@danf@17-8-2009 10780770@unknown@formal@none@1@S@Likewise, it is spoken in [[Andorra]] though [[Catalan language|Catalan]] is the official language.@@@@1@13@@danf@17-8-2009 10780780@unknown@formal@none@1@S@It is also spoken by small communities in other European countries, such as the [[United Kingdom]], [[France]], and [[Germany]].@@@@1@19@@danf@17-8-2009 10780790@unknown@formal@none@1@S@Spanish is an official language of 
the [[European Union]].@@@@1@9@@danf@17-8-2009 10780800@unknown@formal@none@1@S@In Switzerland, Spanish is the [[mother tongue]] of 1.7% of the population, representing the largest minority after the 4 official languages of the country.@@@@1@24@@danf@17-8-2009 10780810@unknown@formal@none@1@S@===The Americas ===@@@@1@3@@danf@17-8-2009 10780820@unknown@formal@none@1@S@====Latin America====@@@@1@2@@danf@17-8-2009 10780830@unknown@formal@none@1@S@Most Spanish speakers are in [[Latin America]]; of the countries with the most Spanish speakers, only [[Spain]] is outside of the [[Americas]].@@@@1@22@@danf@17-8-2009 10780840@unknown@formal@none@1@S@[[Mexico]] has the largest number of native speakers.@@@@1@8@@danf@17-8-2009 10780850@unknown@formal@none@1@S@Nationally, Spanish is the official language of [[Argentina]], [[Bolivia]] (co-official [[Quechua]] and [[Aymara language|Aymara]]), [[Chile]], [[Colombia]], [[Costa Rica]], [[Cuba]], [[Dominican Republic]], [[Ecuador]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Mexico]] , [[Nicaragua]], [[Panama]], [[Paraguay]] (co-official [[Guarani language|Guaraní]]), [[Peru]] (co-official [[Quechua]] and, in some regions, [[Aymara language|Aymara]]), [[Uruguay]], and [[Venezuela]].@@@@1@46@@danf@17-8-2009 10780860@unknown@formal@none@1@S@Spanish is also the official language (co-official with [[English language|English]]) in the U.S. 
commonwealth of [[Puerto Rico]].@@@@1@17@@danf@17-8-2009 10780870@unknown@formal@none@1@S@Spanish has no official recognition in the former [[British overseas territories|British colony]] of [[Belize]]; however, per the 2000 census, it is spoken by 43% of the population.@@@@1@27@@danf@17-8-2009 10780880@unknown@formal@none@1@S@Mainly, it is spoken by Hispanic descendants who remained in the region since the 17th century; however, English is the official language.@@@@1@22@@danf@17-8-2009 10780890@unknown@formal@none@1@S@Spain colonized [[Trinidad and Tobago]] first in [[1498]], leaving the [[Carib]] people the Spanish language.@@@@1@15@@danf@17-8-2009 10780900@unknown@formal@none@1@S@Also the [[Cocoa Panyol]]s, laborers from Venezuela, took their culture and language with them; they are credited with the music of "[[Parang]]" ("[[Parranda]]") on the island.@@@@1@26@@danf@17-8-2009 10780910@unknown@formal@none@1@S@Because of Trinidad's location on the South American coast, the country is much influenced by its Spanish-speaking neighbors.@@@@1@18@@danf@17-8-2009 10780920@unknown@formal@none@1@S@A recent census shows that more than 1,500 inhabitants speak Spanish.@@@@1@11@@danf@17-8-2009 10780930@unknown@formal@none@1@S@In 2004, the government launched the ''Spanish as a First Foreign Language'' (SAFFL) initiative in March 2005.@@@@1@17@@danf@17-8-2009 10780940@unknown@formal@none@1@S@Government regulations require Spanish to be taught, beginning in primary school, while thirty percent of public employees are to be linguistically competent within five years.@@@@1@25@@danf@17-8-2009 10780950@unknown@formal@none@1@S@The government also announced that Spanish will be the country's second official language by [[2020]], alongside English.@@@@1@17@@danf@17-8-2009 10780960@unknown@formal@none@1@S@Spanish is important in [[Brazil]] because of its proximity to and increased trade with its Spanish-speaking neighbors; for example, as a member of the [[Mercosur]] trading 
bloc.@@@@1@27@@danf@17-8-2009 10780970@unknown@formal@none@1@S@In 2005, the [[National Congress of Brazil]] approved a bill, signed into law by the [[President of Brazil|President]], making Spanish available as a foreign language in secondary schools.@@@@1@28@@danf@17-8-2009 10780980@unknown@formal@none@1@S@In many border towns and villages (especially on the Uruguayan-Brazilian border), a [[mixed language]] known as [[Riverense Portuñol|Portuñol]] is spoken.@@@@1@20@@danf@17-8-2009 10780990@unknown@formal@none@1@S@====United States====@@@@1@2@@danf@17-8-2009 10781000@unknown@formal@none@1@S@In the 2006 census, 44.3 million people of the U.S. population were [[Hispanic]] or [[Latino]] by origin; 34 million people, 12.2 percent, of the population older than 5 years speak Spanish at home.@@@@1@33@@danf@17-8-2009 10781005@unknown@formal@none@1@S@Spanish has a [[Spanish in the United States|long history in the United States]] (many south-western states were part of Mexico and Spain), and it recently has been revitalized by much immigration from Latin America.@@@@1@34@@danf@17-8-2009 10781010@unknown@formal@none@1@S@Spanish is the most widely taught foreign language in the country.@@@@1@11@@danf@17-8-2009 10781020@unknown@formal@none@1@S@Although the United States has no formally designated "official languages," Spanish is formally recognized at the state level beside English; in the U.S. state of [[New Mexico]], 30 per cent of the population speak it.@@@@1@35@@danf@17-8-2009 10781030@unknown@formal@none@1@S@It also has strong influence in metropolitan areas such as Los Angeles, Miami and New York City.@@@@1@17@@danf@17-8-2009 10781040@unknown@formal@none@1@S@Spanish is the dominant spoken language in [[Puerto Rico]], a U.S. territory.@@@@1@12@@danf@17-8-2009 10781050@unknown@formal@none@1@S@In total, the U.S. 
has the world's fifth-largest Spanish-speaking population.@@@@1@10@@danf@17-8-2009 10781060@unknown@formal@none@1@S@===Asia===@@@@1@1@@danf@17-8-2009 10781070@unknown@formal@none@1@S@Spanish was an official language of the [[Philippines]] but was never spoken by a majority of the population.@@@@1@18@@danf@17-8-2009 10781080@unknown@formal@none@1@S@Movements for most of the masses to learn the language were started but were stopped by the friars.@@@@1@18@@danf@17-8-2009 10781090@unknown@formal@none@1@S@Its importance fell in the first half of the 20th century following the U.S. occupation and administration of the islands.@@@@1@20@@danf@17-8-2009 10781100@unknown@formal@none@1@S@The introduction of the English language in the Philippine government system put an end to the use of Spanish as the official language.@@@@1@23@@danf@17-8-2009 10781110@unknown@formal@none@1@S@The language lost its official status in 1973 during the [[Ferdinand Marcos]] administration.@@@@1@13@@danf@17-8-2009 10781120@unknown@formal@none@1@S@Spanish is spoken mainly by small communities of Filipino-born Spaniards, Latin Americans, and Filipino [[mestizo]]s (mixed race), descendants of the early colonial Spanish settlers.@@@@1@24@@danf@17-8-2009 10781130@unknown@formal@none@1@S@Throughout the 20th century, the Spanish language has declined in importance compared to English and [[Tagalog language|Tagalog]].@@@@1@17@@danf@17-8-2009 10781140@unknown@formal@none@1@S@According to the 1990 Philippine census, there were 2,658 native speakers of Spanish.@@@@1@13@@danf@17-8-2009 10781150@unknown@formal@none@1@S@No figures were provided during the 1995 and 2000 censuses; however, figures for 2000 did specify there were over 600,000 native speakers of [[Chavacano language|Chavacano]], a Spanish based [[Creole language|creole]] language spoken in [[Cavite]] and [[Zamboanga]].@@@@1@36@@danf@17-8-2009 10781160@unknown@formal@none@1@S@Some other sources put the number of Spanish speakers in the Philippines 
around two to three million; however, these sources are disputed.@@@@1@22@@danf@17-8-2009 10781170@unknown@formal@none@1@S@In Tagalog, there are 4,000 Spanish adopted words and around 6,000 Spanish adopted words in Visayan and other Philippine languages as well.@@@@1@22@@danf@17-8-2009 10781180@unknown@formal@none@1@S@Today Spanish is offered as a foreign language in Philippine schools and universities.@@@@1@13@@danf@17-8-2009 10781190@unknown@formal@none@1@S@===Africa===@@@@1@1@@danf@17-8-2009 10781200@unknown@formal@none@1@S@In Africa, Spanish is official in the UN-recognised but Moroccan-occupied [[Western Sahara]] (co-official [[Arabic language|Arabic]]) and [[Equatorial Guinea]] (co-official [[French language|French]] and [[Portuguese language|Portuguese]]).@@@@1@24@@danf@17-8-2009 10781210@unknown@formal@none@1@S@Today, nearly 200,000 refugee Sahrawis are able to read and write in Spanish, and several thousand have received [[university]] education in foreign countries as part of aid packages (mainly [[Cuba]] and [[Spain]]).@@@@1@32@@danf@17-8-2009 10781220@unknown@formal@none@1@S@In Equatorial Guinea, Spanish is the predominant language when counting native and non-native speakers (around 500,000 people), while [[Fang language|Fang]] is the most spoken language by the number of native speakers.@@@@1@31@@danf@17-8-2009 10781230@unknown@formal@none@1@S@It is also spoken in the Spanish cities in [[Plazas de soberanía|continental North Africa]] ([[Ceuta]] and [[Melilla]]) and in the autonomous community of [[Canary Islands]] (143,000 and 1,995,833 people, respectively).@@@@1@30@@danf@17-8-2009 10781240@unknown@formal@none@1@S@Within Northern Morocco, a former [[History of Morocco#European influence|Franco-Spanish protectorate]] that is also geographically close to Spain, approximately 20,000 people speak Spanish.@@@@1@22@@danf@17-8-2009 10781250@unknown@formal@none@1@S@It is spoken by some communities of [[Angola]], because of the Cuban influence from the [[Cold 
War]], and in [[Nigeria]] by the descendants of [[Afro-Cuban]] ex-slaves.@@@@1@26@@danf@17-8-2009 10781260@unknown@formal@none@1@S@In [[Côte d'Ivoire]] and [[Senegal]], Spanish can be learned as a second foreign language in the public education system.@@@@1@19@@danf@17-8-2009 10781270@unknown@formal@none@1@S@In 2008, [[Cervantes Institute]] centers will be opened in [[Lagos]] and [[Johannesburg]], the first such centers in [[Sub-Saharan Africa]].@@@@1@19@@danf@17-8-2009 10781280@unknown@formal@none@1@S@===Oceania===@@@@1@1@@danf@17-8-2009 10781290@unknown@formal@none@1@S@Among the countries and territories in [[Oceania]], Spanish is also spoken in [[Easter Island]], a territorial possession of Chile.@@@@1@19@@danf@17-8-2009 10781300@unknown@formal@none@1@S@According to the 2001 census, there are approximately 95,000 speakers of Spanish in Australia, 44,000 of whom live in Greater Sydney , where the older [[:Category: Australians of Mexican descent|Mexican]], [[:Category:Australians of Colombian descent|Colombian]], and [[:Category: Australians of Spanish descent|Spanish]] populations and newer [[:Category:Australians of Argentine descent|Argentine]], Salvadoran and [[:Category:Australians of Uruguayan descent|Uruguayan]] communities live.@@@@1@55@@danf@17-8-2009 10781310@unknown@formal@none@1@S@The island nations of [[Guam]], [[Palau]], [[Northern Marianas]], [[Marshall Islands]] and [[Federated States of Micronesia]] all once had Spanish speakers, since [[Marianas Islands|Marianas]] and [[Caroline Islands]] were Spanish colonial possessions until late 19th century (see [[Spanish-American War]]), but Spanish has since been forgotten.@@@@1@43@@danf@17-8-2009 10781320@unknown@formal@none@1@S@It now only exists as an influence on the local native languages and is spoken by [[Hispanics in the United States|Hispanic American]] resident populations.@@@@1@24@@danf@17-8-2009 10781330@unknown@formal@none@1@S@==Dialectal variation==@@@@1@2@@danf@17-8-2009 
10781340@unknown@formal@none@1@S@There are important variations among the regions of Spain and throughout Spanish-speaking America.@@@@1@13@@danf@17-8-2009 10781350@unknown@formal@none@1@S@In countries in Hispanophone America, it is preferable to use the word ''castellano'' to distinguish their version of the language from that of Spain, thus asserting their autonomy and national identity.@@@@1@31@@danf@17-8-2009 10781360@unknown@formal@none@1@S@In Spain the Castilian dialect's pronunciation is commonly regarded as the national standard, although a use of slightly different pronouns called [[Loísmo|{{lang|es|''laísmo''}}]] of this dialect is deprecated.@@@@1@27@@danf@17-8-2009 10781370@unknown@formal@none@1@S@More accurately, for nearly everyone in Spain, "standard Spanish" means "pronouncing everything exactly as it is written," an ideal which does not correspond to any real dialect, though the northern dialects are the closest to it.@@@@1@36@@danf@17-8-2009 10781380@unknown@formal@none@1@S@In practice, the standard way of speaking Spanish in the media is "written Spanish" for formal speech, "Madrid dialect" (one of the transitional variants between Castilian and Andalusian) for informal speech.@@@@1@31@@danf@17-8-2009 10781390@unknown@formal@none@1@S@===Voseo===@@@@1@1@@danf@17-8-2009 10781400@unknown@formal@none@1@S@Spanish has three [[grammatical person|second-person]] [[grammatical number|singular]] [[pronoun]]s: {{lang|es|''tú''}}, {{lang|es|''usted''}}, and in some parts of Latin America, {{lang|es|''vos''}} (the use of this pronoun and/or its verb forms is called ''voseo'').@@@@1@30@@danf@17-8-2009 10781410@unknown@formal@none@1@S@In those regions where it is used, generally speaking, {{lang|es|''tú''}} and {{lang|es|''vos''}} are informal and used with friends; in other countries, {{lang|es|''vos''}} is considered an archaic form.@@@@1@27@@danf@17-8-2009 10781415@unknown@formal@none@1@S@{{lang|es|''Usted''}} is universally regarded as the formal address 
(derived from {{lang|es|''vuestra merced''}}, "your grace"), and is used as a mark of respect, as when addressing one's elders or strangers.@@@@1@29@@danf@17-8-2009 10781420@unknown@formal@none@1@S@{{lang|es|''Vos''}} is used extensively as the primary spoken form of the second-person singular pronoun, although with wide differences in social consideration, in many countries of [[Latin America]], including [[Argentina]], [[Chile]], [[Costa Rica]], the central mountain region of [[Ecuador]], the State of [[Chiapas]] in [[Mexico]], [[El Salvador]], [[Guatemala]], [[Honduras]], [[Nicaragua]], [[Paraguay]], [[Uruguay]], the [[Paisa region]] and Caleños of [[Colombia]] and the [[States]] of [[Zulia]] and Trujillo in [[Venezuela]].@@@@1@67@@danf@17-8-2009 10781430@unknown@formal@none@1@S@There are some differences in the verbal endings for ''vos'' in each country.@@@@1@13@@danf@17-8-2009 10781440@unknown@formal@none@1@S@In Argentina, Uruguay, and increasingly in Paraguay and some Central American countries, it is also the standard form used in the [[mass media|media]], but the media in other countries with {{lang|es|''voseo''}} generally continue to use {{lang|es|''usted''}} or {{lang|es|''tú''}} except in advertisements, for instance.@@@@1@43@@danf@17-8-2009 10781445@unknown@formal@none@1@S@{{lang|es|''Vos''}} may also be used regionally in other countries.@@@@1@9@@danf@17-8-2009 10781450@unknown@formal@none@1@S@Depending on country or region, usage may be considered standard or (by better educated speakers) to be unrefined.@@@@1@18@@danf@17-8-2009 10781460@unknown@formal@none@1@S@Interpersonal situations in which the use of ''vos'' is acceptable may also differ considerably between regions.@@@@1@16@@danf@17-8-2009 10781470@unknown@formal@none@1@S@===Ustedes===@@@@1@1@@danf@17-8-2009 10781480@unknown@formal@none@1@S@Spanish forms also differ regarding second-person plural pronouns.@@@@1@8@@danf@17-8-2009 10781490@unknown@formal@none@1@S@The Spanish dialects of Latin 
America have only one form of the second-person plural for daily use, {{lang|es|''ustedes''}} (formal or familiar, as the case may be, though {{lang|es|''vosotros''}} non-formal usage can sometimes appear in poetry and rhetorical or literary style).@@@@1@40@@danf@17-8-2009 10781500@unknown@formal@none@1@S@In Spain there are two forms — {{lang|es|''ustedes''}} (formal) and {{lang|es|''vosotros''}} (familiar).@@@@1@12@@danf@17-8-2009 10781510@unknown@formal@none@1@S@The pronoun {{lang|es|''vosotros''}} is the plural form of {{lang|es|''tú''}} in most of Spain, but in the Americas (and certain southern Spanish cities such as [[Cádiz]] or [[Seville]], and in the [[Canary Islands]]) it is replaced with {{lang|es|''ustedes''}}.@@@@1@37@@danf@17-8-2009 10781520@unknown@formal@none@1@S@It is notable that the use of {{lang|es|''ustedes''}} for the informal plural "you" in southern Spain does not follow the usual rule for pronoun-verb [[agreement (linguistics)|agreement]]; e.g., while the formal form for "you go", {{lang|es|''ustedes van''}}, uses the third-person plural form of the verb, in Cádiz or Seville the informal form is constructed as {{lang|es|''ustedes vais''}}, using the second-person plural of the verb.@@@@1@63@@danf@17-8-2009 10781530@unknown@formal@none@1@S@In the Canary Islands, though, the usual pronoun-verb agreement is preserved in most cases.@@@@1@14@@danf@17-8-2009 10781540@unknown@formal@none@1@S@Some words can be different, even embarrassingly so, in different Hispanophone countries.@@@@1@12@@danf@17-8-2009 10781550@unknown@formal@none@1@S@Most Spanish speakers can recognize other Spanish forms, even in places where they are not commonly used, but Spaniards generally do not recognise specifically American usages.@@@@1@26@@danf@17-8-2009 10781560@unknown@formal@none@1@S@For example, Spanish ''mantequilla'', ''aguacate'' and ''albaricoque'' (respectively, "butter", "avocado", "apricot") correspond to ''manteca'', ''palta'', and ''damasco'', respectively, 
in Argentina, Chile and Uruguay.@@@@1@23@@danf@17-8-2009 10781570@unknown@formal@none@1@S@The everyday Spanish words ''coger'' (to catch, get, or pick up), ''pisar'' (to step on) and ''concha'' (seashell) are considered extremely rude in parts of Latin America, where the meaning of ''coger'' and ''pisar'' is also "to have sex" and ''concha'' means "vulva".@@@@1@43@@danf@17-8-2009 10781580@unknown@formal@none@1@S@The Puerto Rican word for "bobby pin" (''pinche'') is an obscenity in Mexico, and in [[Nicaragua]] simply means "stingy".@@@@1@19@@danf@17-8-2009 10781590@unknown@formal@none@1@S@Other examples include ''[[taco]]'', which means "swearword" in Spain but is known to the rest of the world as a Mexican dish.@@@@1@22@@danf@17-8-2009 10781600@unknown@formal@none@1@S@''Pija'' in many countries of Latin America is an obscene slang word for "penis", while in [[Spain]] the word also signifies "posh girl" or "snobby".@@@@1@25@@danf@17-8-2009 10781610@unknown@formal@none@1@S@''Coche'', which means "car" in Spain, for the vast majority of Spanish-speakers actually means "baby-stroller", in Guatemala it means "pig", while ''carro'' means "car" in some Latin American countries and "cart" in others, as well as in Spain.@@@@1@38@@danf@17-8-2009 10781620@unknown@formal@none@1@S@The {{lang|es|[[Real Academia Española]]}} (Royal Spanish Academy), together with the 21 other national ones (see [[Association of Spanish Language Academies]]), exercises a standardizing influence through its publication of dictionaries and widely respected grammar and style guides.@@@@1@36@@danf@17-8-2009 10781630@unknown@formal@none@1@S@Due to this influence and for other sociohistorical reasons, a standardized form of the language ([[Standard Spanish]]) is widely acknowledged for use in literature, academic contexts and the media.@@@@1@29@@danf@17-8-2009 10781640@unknown@formal@none@1@S@==Writing system==@@@@1@2@@danf@17-8-2009 10781650@unknown@formal@none@1@S@Spanish is written using the [[Latin 
alphabet]], with the addition of the character ''[[ñ]]'' (''eñe'', representing the phoneme {{IPA|/ɲ/}}, a letter distinct from ''n'', although typographically composed of an ''n'' with a [[tilde]]) and the [[digraph (orthography)|digraph]]s ''ch'' ({{lang|es|''che''}}, representing the phoneme {{IPA|/tʃ/}}) and ''ll'' ({{lang|es|''elle''}}, representing the phoneme {{IPA|/ʎ/}}).@@@@1@50@@danf@17-8-2009 10781660@unknown@formal@none@1@S@However, the digraph ''rr'' ({{lang|es|''erre fuerte''}}, "strong ''r''", {{lang|es|''erre doble''}}, "double ''r''", or simply {{lang|es|''erre''}}), which also represents a distinct phoneme {{IPA|/r/}}, is not similarly regarded as a single letter.@@@@1@30@@danf@17-8-2009 10781670@unknown@formal@none@1@S@Since 1994, the digraphs ''ch'' and ''ll'' are to be treated as letter pairs for [[collation]] purposes, though they remain a part of the alphabet.@@@@1@25@@danf@17-8-2009 10781680@unknown@formal@none@1@S@Words with ''ch'' are now alphabetically sorted between those with ''ce'' and ''ci'', instead of following ''cz'' as they used to, and similarly for ''ll''.@@@@1@25@@danf@17-8-2009 10781690@unknown@formal@none@1@S@Thus, the Spanish alphabet has the following 29 letters:@@@@1@9@@danf@17-8-2009 10781700@unknown@formal@none@1@S@:a, b, c, ch, d, e, f, g, h, i, j, k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z.@@@@1@29@@danf@17-8-2009 10781710@unknown@formal@none@1@S@With the exclusion of a very small number of regional terms such as ''México'' (see [[Toponymy of Mexico]]) and some neologisms like ''software'', pronunciation can be entirely determined from spelling.@@@@1@30@@danf@17-8-2009 10781720@unknown@formal@none@1@S@A typical Spanish word is stressed on the [[syllable]] before the last if it ends with a vowel (not including ''y'') or with a vowel followed by ''n'' or ''s''; it is stressed on the last syllable otherwise.@@@@1@38@@danf@17-8-2009 10781730@unknown@formal@none@1@S@Exceptions to this rule are indicated by 
placing an [[acute accent]] on the [[stress (linguistics)|stressed vowel]].@@@@1@16@@danf@17-8-2009 10781740@unknown@formal@none@1@S@The acute accent is used, in addition, to distinguish between certain [[homophone]]s, especially when one of them is a stressed word and the other one is a [[clitic]]: compare {{lang|es|''el''}} ("the", masculine singular definite article) with {{lang|es|''él''}} ("he" or "it"), or {{lang|es|''te''}} ("you", object pronoun), {{lang|es|''de''}} (preposition "of" or "from"), and {{lang|es|''se''}} (reflexive pronoun) with {{lang|es|''té''}} ("tea"), {{lang|es|''dé''}} ("give") and {{lang|es|''sé''}} ("I know", or imperative "be").@@@@1@66@@danf@17-8-2009 10781750@unknown@formal@none@1@S@The interrogative pronouns ({{lang|es|''qué''}}, {{lang|es|''cuál''}}, {{lang|es|''dónde''}}, {{lang|es|''quién''}}, etc.) also receive accents in direct or indirect questions, and some demonstratives ({{lang|es|''ése''}}, {{lang|es|''éste''}}, {{lang|es|''aquél''}}, etc.) must be accented when used as pronouns.@@@@1@30@@danf@17-8-2009 10781760@unknown@formal@none@1@S@The conjunction {{lang|es|''o''}} ("or") is written with an accent between numerals so as not to be confused with a zero: e.g., {{lang|es|''10 ó 20''}} should be read as {{lang|es|''diez o veinte''}} rather than {{lang|es|''diez mil veinte''}} ("10,020").@@@@1@37@@danf@17-8-2009 10781770@unknown@formal@none@1@S@Accent marks are frequently omitted in capital letters (a widespread practice in the early days of computers where only lowercase vowels were available with accents), although the [[Real Academia Española|RAE]] advises against this.@@@@1@33@@danf@17-8-2009 10781780@unknown@formal@none@1@S@When ''u'' is written between ''g'' and a front vowel (''e'' or ''i''), if it should be pronounced, it is written with a [[diaeresis (diacritic)|diaeresis]] (''ü'') to indicate that it is not silent as it normally would be (e.g., ''cigüeña'', "stork", is pronounced {{IPA|/θiˈɣweɲa/}}; if it were 
written ''cigueña'', it would be pronounced {{IPA|/θiˈɣeɲa/}}).@@@@1@54@@danf@17-8-2009 10781790@unknown@formal@none@1@S@Interrogative and exclamatory clauses are introduced with [[Inverted question and exclamation marks|inverted question ( ¿ ) and exclamation ( ¡ ) marks]].@@@@1@22@@danf@17-8-2009 10781800@unknown@formal@none@1@S@==Sounds==@@@@1@1@@danf@17-8-2009 10781810@unknown@formal@none@1@S@The phonemic inventory listed in the following table includes [[phoneme]]s that are preserved only in some dialects, other dialects having merged them (such as ''[[yeísmo]]''); these are marked with an asterisk (*).@@@@1@32@@danf@17-8-2009 10781820@unknown@formal@none@1@S@Sounds in parentheses are [[allophone]]s.@@@@1@5@@danf@17-8-2009 10781830@unknown@formal@none@1@S@By the 16th century, the consonant system of Spanish underwent the following important changes that differentiated it from [[Iberian Romance languages|neighboring Romance languages]] such as [[Portuguese language|Portuguese]] and [[Catalan language|Catalan]]:@@@@1@30@@danf@17-8-2009 10781840@unknown@formal@none@1@S@*Initial {{IPA|/f/}}, when it had evolved into a vacillating {{IPA|/h/}}, was lost in most words (although this etymological ''h-'' is preserved in spelling and in some Andalusian dialects is still aspirated).@@@@1@31@@danf@17-8-2009 10781850@unknown@formal@none@1@S@*The [[bilabial approximant]] {{IPA|/β̞/}} (which was written ''u'' or ''v'') merged with the bilabial occlusive {{IPA|/b/}} (written ''b'').@@@@1@18@@danf@17-8-2009 10781860@unknown@formal@none@1@S@There is no difference between the pronunciation of orthographic ''b'' and ''v'' in contemporary Spanish, excepting emphatic pronunciations that cannot be considered standard or natural.@@@@1@25@@danf@17-8-2009 10781870@unknown@formal@none@1@S@*The [[voiced alveolar fricative]] {{IPA|/z/}} which existed as a separate phoneme in medieval Spanish merged with its voiceless counterpart {{IPA|/s/}}.@@@@1@20@@danf@17-8-2009 
10781880@unknown@formal@none@1@S@The phoneme which resulted from this merger is currently spelled ''s''.@@@@1@11@@danf@17-8-2009 10781890@unknown@formal@none@1@S@*The [[voiced postalveolar fricative]] {{IPA|/ʒ/}} merged with its voiceless counterpart {{IPA|/ʃ/}}, which evolved into the modern velar sound {{IPA|/x/}} by the 17th century, now written with ''j'', or ''g'' before ''e, i''.@@@@1@32@@danf@17-8-2009 10781900@unknown@formal@none@1@S@Nevertheless, in most parts of Argentina and in Uruguay, ''y'' and ''ll'' have both evolved to {{IPA|/ʒ/}} or {{IPA|/ʃ/}}.@@@@1@19@@danf@17-8-2009 10781910@unknown@formal@none@1@S@*The [[voiced alveolar affricate]] {{IPA|/dz/}} merged with its voiceless counterpart {{IPA|/ts/}}, which then developed into the interdental {{IPA|/θ/}}, now written ''z'', or ''c'' before ''e, i''.@@@@1@26@@danf@17-8-2009 10781920@unknown@formal@none@1@S@But in [[Andalusia]], the [[Canary Islands]] and the Americas this sound merged with {{IPA|/s/}} as well.@@@@1@16@@danf@17-8-2009 10781930@unknown@formal@none@1@S@See ''[[Ceceo]]'', for further information.@@@@1@5@@danf@17-8-2009 10781940@unknown@formal@none@1@S@The consonant system of Medieval Spanish has been better preserved in [[Ladino language|Ladino]] and in Portuguese, neither of which underwent these shifts.@@@@1@22@@danf@17-8-2009 10781950@unknown@formal@none@1@S@===Lexical stress===@@@@1@2@@danf@17-8-2009 10781960@unknown@formal@none@1@S@Spanish is a [[syllable-timed language]], so each syllable has the same duration regardless of stress.@@@@1@15@@danf@17-8-2009 10781970@unknown@formal@none@1@S@Stress most often occurs on any of the last three syllables of a word, with some rare exceptions at the fourth last.@@@@1@22@@danf@17-8-2009 10781980@unknown@formal@none@1@S@The ''tendencies'' of stress assignment are as follows:@@@@1@8@@danf@17-8-2009 10781990@unknown@formal@none@1@S@* In words ending in vowels and {{IPA|/s/}}, stress most often falls on the penultimate 
syllable.@@@@1@16@@danf@17-8-2009 10782000@unknown@formal@none@1@S@* In words ending in all other consonants, the stress more often falls on the ultimate syllable.@@@@1@17@@danf@17-8-2009 10782010@unknown@formal@none@1@S@* Preantepenultimate stress occurs rarely and only in words like ''guardándoselos'' ('saving them for him/her') where a clitic follows certain verbal forms.@@@@1@22@@danf@17-8-2009 10782020@unknown@formal@none@1@S@In addition to the many exceptions to these tendencies, there are numerous [[minimal pair]]s which contrast solely on stress.@@@@1@19@@danf@17-8-2009 10782030@unknown@formal@none@1@S@For example, ''sabana'', with penultimate stress, means 'savannah' while ''{{lang|es|sábana}}'', with antepenultimate stress, means 'sheet'; ''{{lang|es|límite}}'' ('boundary'), ''{{lang|es|limite}}'' ('[that] he/she limits') and ''{{lang|es|limité}}'' ('I limited') also contrast solely on stress.@@@@1@30@@danf@17-8-2009 10782040@unknown@formal@none@1@S@Phonological stress may be marked orthographically with an [[acute accent]] (''ácido'', ''distinción'', etc).@@@@1@13@@danf@17-8-2009 10782050@unknown@formal@none@1@S@This is done according to the mandatory stress rules of [[Spanish orthography]] which are similar to the tendencies above (differing with words like ''distinción'') and are defined so as to unequivocally indicate where the stress lies in a given written word.@@@@1@41@@danf@17-8-2009 10782060@unknown@formal@none@1@S@An acute accent may also be used to differentiate homophones (such as ''[[wikt:té#Spanish|té]]'' for 'tea' and ''[[wikt:te#Spanish|te]]''@@@@1@17@@danf@17-8-2009 10782070@unknown@formal@none@1@S@An amusing example of the significance of intonation in Spanish is the phrase ''{{lang|es|¿Cómo "cómo como"?@@@@1@16@@danf@17-8-2009 10782080@unknown@formal@none@1@S@¡Como como como!}}''@@@@1@3@@danf@17-8-2009 10782090@unknown@formal@none@1@S@("What do you mean / 'how / do I eat'? 
/ I eat / the way / I eat!").@@@@1@19@@danf@17-8-2009 10782100@unknown@formal@none@1@S@==Grammar==@@@@1@1@@danf@17-8-2009 10782110@unknown@formal@none@1@S@Spanish is a relatively [[inflected]] language, with a two-[[Grammatical gender|gender]] system and about fifty [[Grammatical conjugation|conjugated]] forms per [[verb]], but limited inflection of [[noun]]s, [[adjective]]s, and [[determiner]]s.@@@@1@27@@danf@17-8-2009 10782120@unknown@formal@none@1@S@(For a detailed overview of verbs, see [[Spanish verbs]] and [[Spanish irregular verbs]].)@@@@1@13@@danf@17-8-2009 10782130@unknown@formal@none@1@S@It is [[Branching (linguistics)|right-branching]], uses [[preposition]]s, and usually, though not always, places [[adjective]]s after [[noun]]s.@@@@1@15@@danf@17-8-2009 10782140@unknown@formal@none@1@S@Its [[syntax]] is generally [[Subject Verb Object]], though variations are common.@@@@1@11@@danf@17-8-2009 10782150@unknown@formal@none@1@S@It is a [[pro-drop language]] (allows the deletion of pronouns when pragmatically unnecessary) and [[verb framing|verb-framed]].@@@@1@16@@danf@17-8-2009 10782160@unknown@formal@none@1@S@== Samples ==@@@@1@3@@danf@17-8-2009 10790010@unknown@formal@none@1@S@
Speech recognition
@@@@1@2@@danf@17-8-2009 10790020@unknown@formal@none@1@S@'''Speech recognition''' (also known as '''automatic speech recognition''' or '''computer speech recognition''') converts spoken words to machine-readable input (for example, to keypresses, using the binary code for a string of [[Character (computing)|character]] codes).@@@@1@33@@danf@17-8-2009 10790030@unknown@formal@none@1@S@The term [[speaker recognition|voice recognition]] may also be used to refer to speech recognition, but more precisely refers to '''speaker recognition''', which attempts to identify the person speaking, as opposed to what is being said.@@@@1@35@@danf@17-8-2009 10790040@unknown@formal@none@1@S@Speech recognition applications include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), [[domotic]] appliance control and content-based spoken audio search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., [[word processor]]s or [[email]]s), and in aircraft [[cockpit]]s (usually termed [[Direct Voice Input]]).@@@@1@70@@danf@17-8-2009 10790050@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10790060@unknown@formal@none@1@S@One of the most notable domains for the commercial application of speech recognition in the United States has been health care and in particular the work of the [[medical transcription]]ist (MT).@@@@1@31@@danf@17-8-2009 10790070@unknown@formal@none@1@S@According to industry experts, at its inception, speech recognition (SR) was sold as a way to completely eliminate transcription rather than make the transcription process more efficient, hence it was not accepted.@@@@1@32@@danf@17-8-2009 10790080@unknown@formal@none@1@S@It was also the case that SR at that time was often technically deficient.@@@@1@14@@danf@17-8-2009 10790090@unknown@formal@none@1@S@Additionally, 
to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to do.@@@@1@26@@danf@17-8-2009 10790100@unknown@formal@none@1@S@The biggest limitation to speech recognition automating transcription, however, is seen as the software.@@@@1@14@@danf@17-8-2009 10790110@unknown@formal@none@1@S@The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system.@@@@1@27@@danf@17-8-2009 10790120@unknown@formal@none@1@S@Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.@@@@1@20@@danf@17-8-2009 10790130@unknown@formal@none@1@S@A distinction in ASR is often made between "artificial syntax systems" which are usually domain-specific and "natural language processing" which is usually language-specific.@@@@1@23@@danf@17-8-2009 10790140@unknown@formal@none@1@S@Each of these types of application presents its own particular goals and challenges.@@@@1@13@@danf@17-8-2009 10790150@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10790160@unknown@formal@none@1@S@===Health care===@@@@1@2@@danf@17-8-2009 10790170@unknown@formal@none@1@S@In the [[health care]] domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete.@@@@1@22@@danf@17-8-2009 10790180@unknown@formal@none@1@S@Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced.@@@@1@23@@danf@17-8-2009 10790190@unknown@formal@none@1@S@Speech recognition can be implemented in front-end or back-end of the medical documentation process.@@@@1@14@@danf@17-8-2009 10790200@unknown@formal@none@1@S@Front-End SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right 
after they are spoken, and the dictator is responsible for editing and signing off on the document.@@@@1@34@@danf@17-8-2009 10790210@unknown@formal@none@1@S@It never goes through an MT/editor.@@@@1@6@@danf@17-8-2009 10790220@unknown@formal@none@1@S@Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report.@@@@1@48@@danf@17-8-2009 10790230@unknown@formal@none@1@S@Deferred SR is being widely used in the industry currently.@@@@1@10@@danf@17-8-2009 10790240@unknown@formal@none@1@S@Many [[Electronic Medical Records]] (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech-recognition engine.@@@@1@24@@danf@17-8-2009 10790250@unknown@formal@none@1@S@Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.@@@@1@18@@danf@17-8-2009 10790290@unknown@formal@none@1@S@===Military===@@@@1@1@@danf@17-8-2009 10790300@unknown@formal@none@1@S@====High-performance fighter aircraft====@@@@1@3@@danf@17-8-2009 10790310@unknown@formal@none@1@S@Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft.@@@@1@20@@danf@17-8-2009 10790320@unknown@formal@none@1@S@Of particular note are the U.S. 
program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/[[F-16]] aircraft ([[F-16 VISTA]]), the program in France on installing speech recognition systems on [[Mirage (aircraft)|Mirage]] aircraft, and programs in the UK dealing with a variety of aircraft platforms.@@@@1@45@@danf@17-8-2009 10790330@unknown@formal@none@1@S@In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays.@@@@1@33@@danf@17-8-2009 10790340@unknown@formal@none@1@S@Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system.@@@@1@27@@danf@17-8-2009 10790350@unknown@formal@none@1@S@Some important conclusions from the work were as follows:@@@@1@9@@danf@17-8-2009 10790360@unknown@formal@none@1@S@#Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.@@@@1@16@@danf@17-8-2009 10790370@unknown@formal@none@1@S@#Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful — with lower recognition rates, pilots would not use the system.@@@@1@31@@danf@17-8-2009 10790380@unknown@formal@none@1@S@#More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.@@@@1@22@@danf@17-8-2009 10790390@unknown@formal@none@1@S@Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.@@@@1@29@@danf@17-8-2009 10790400@unknown@formal@none@1@S@Working with Swedish pilots flying in 
the [[JAS-39]] Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads.@@@@1@18@@danf@17-8-2009 10790410@unknown@formal@none@1@S@It was also concluded that adaptation greatly improved the results in all cases and introducing models for breathing was shown to improve recognition scores significantly.@@@@1@25@@danf@17-8-2009 10790420@unknown@formal@none@1@S@Contrary to what might be expected, no effects of the broken English of the speakers were found.@@@@1@17@@danf@17-8-2009 10790430@unknown@formal@none@1@S@It was evident that spontaneous speech caused problems for the recognizer, as could be expected.@@@@1@15@@danf@17-8-2009 10790440@unknown@formal@none@1@S@A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.@@@@1@18@@danf@17-8-2009 10790450@unknown@formal@none@1@S@The [[Eurofighter Typhoon]] currently in service with the UK [[RAF]] employs a speaker-dependent system, i.e. it requires each pilot to create a template.@@@@1@23@@danf@17-8-2009 10790460@unknown@formal@none@1@S@The system is not used for any safety critical or weapon critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other [[cockpit]] functions.@@@@1@33@@danf@17-8-2009 10790470@unknown@formal@none@1@S@Voice commands are confirmed by visual and/or aural feedback.@@@@1@9@@danf@17-8-2009 10790480@unknown@formal@none@1@S@The system is seen as a major design feature in the reduction of pilot [[workload]], and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands.@@@@1@40@@danf@17-8-2009 10790490@unknown@formal@none@1@S@====Helicopters====@@@@1@1@@danf@17-8-2009 10790500@unknown@formal@none@1@S@The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment.@@@@1@24@@danf@17-8-2009 
10790510@unknown@formal@none@1@S@The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone.@@@@1@40@@danf@17-8-2009 10790520@unknown@formal@none@1@S@Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK.@@@@1@41@@danf@17-8-2009 10790530@unknown@formal@none@1@S@Work in France has included speech recognition in the Puma helicopter.@@@@1@11@@danf@17-8-2009 10790540@unknown@formal@none@1@S@There has also been much useful work in Canada.@@@@1@9@@danf@17-8-2009 10790550@unknown@formal@none@1@S@Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system.@@@@1@25@@danf@17-8-2009 10790560@unknown@formal@none@1@S@As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness.@@@@1@17@@danf@17-8-2009 10790570@unknown@formal@none@1@S@Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment.@@@@1@19@@danf@17-8-2009 10790580@unknown@formal@none@1@S@Much remains to be done both in speech recognition and in overall speech recognition technology, in order to consistently achieve performance improvements in operational settings.@@@@1@25@@danf@17-8-2009 10790590@unknown@formal@none@1@S@====Battle management====@@@@1@2@@danf@17-8-2009 10790600@unknown@formal@none@1@S@Battle management command centres generally require rapid access to and control of large, rapidly changing information databases.@@@@1@17@@danf@17-8-2009 
10790610@unknown@formal@none@1@S@Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format.@@@@1@28@@danf@17-8-2009 10790620@unknown@formal@none@1@S@Human machine interaction by voice has the potential to be very useful in these environments.@@@@1@15@@danf@17-8-2009 10790630@unknown@formal@none@1@S@A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments.@@@@1@17@@danf@17-8-2009 10790640@unknown@formal@none@1@S@In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications.@@@@1@21@@danf@17-8-2009 10790650@unknown@formal@none@1@S@Users were very optimistic about the potential of the system, although capabilities were limited.@@@@1@14@@danf@17-8-2009 10790660@unknown@formal@none@1@S@Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. 
have focused on this problem of natural speech interface.@@@@1@24@@danf@17-8-2009 10790670@unknown@formal@none@1@S@Speech recognition efforts have focused on a database of continuous speech recognition (CSR), large-vocabulary speech which is designed to be representative of the naval resource management task.@@@@1@27@@danf@17-8-2009 10790680@unknown@formal@none@1@S@Significant advances in the state-of-the-art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.@@@@1@34@@danf@17-8-2009 10790690@unknown@formal@none@1@S@====Training air traffic controllers====@@@@1@4@@danf@17-8-2009 10790700@unknown@formal@none@1@S@Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems.@@@@1@17@@danf@17-8-2009 10790710@unknown@formal@none@1@S@Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation.@@@@1@40@@danf@17-8-2009 10790720@unknown@formal@none@1@S@Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel.@@@@1@25@@danf@17-8-2009 10790730@unknown@formal@none@1@S@Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task.@@@@1@26@@danf@17-8-2009 10790740@unknown@formal@none@1@S@The U.S. 
Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition.@@@@1@19@@danf@17-8-2009 10790750@unknown@formal@none@1@S@Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system.@@@@1@16@@danf@17-8-2009 10790760@unknown@formal@none@1@S@However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications.@@@@1@21@@danf@17-8-2009 10790770@unknown@formal@none@1@S@The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation.@@@@1@30@@danf@17-8-2009 10790780@unknown@formal@none@1@S@Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using specific vocabulary specifically designed for the ATC task.@@@@1@35@@danf@17-8-2009 10790790@unknown@formal@none@1@S@Research in France has focussed on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints.@@@@1@29@@danf@17-8-2009 10790800@unknown@formal@none@1@S@The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition provided by Adacel Systems Inc (ASI).@@@@1@21@@danf@17-8-2009 10790810@unknown@formal@none@1@S@Adacel's MaxSim software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo pilots.@@@@1@28@@danf@17-8-2009 10790820@unknown@formal@none@1@S@Adacel's ATC In A Box Software provides a synthetic ATC environment for flight simulators.@@@@1@14@@danf@17-8-2009 10790830@unknown@formal@none@1@S@The "real" pilot talks to a virtual controller 
using speech recognition and the virtual controller responds with synthetic speech.@@@@1@19@@danf@17-8-2009 10790840@unknown@formal@none@1@S@It will be an application format@@@@1@6@@danf@17-8-2009 10790850@unknown@formal@none@1@S@===Telephony and other domains===@@@@1@4@@danf@17-8-2009 10790860@unknown@formal@none@1@S@ASR in the field of telephony is now commonplace and in the field of computer gaming and simulation is becoming more widespread.@@@@1@22@@danf@17-8-2009 10790870@unknown@formal@none@1@S@Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use.@@@@1@29@@danf@17-8-2009 10790880@unknown@formal@none@1@S@Improvements in mobile processor speeds have made speech-enabled Symbian and Windows Mobile smartphones possible.@@@@1@14@@danf@17-8-2009 10790890@unknown@formal@none@1@S@Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC.@@@@1@19@@danf@17-8-2009 10790900@unknown@formal@none@1@S@Speech is used mostly as part of the user interface, for creating pre-defined or custom speech commands.@@@@1@17@@danf@17-8-2009 10790910@unknown@formal@none@1@S@Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command); Nuance Communications (Nuance Voice Control); Vito Technology (VITO Voice2Go); Speereo Software (Speereo Voice Translator).@@@@1@26@@danf@17-8-2009 10790920@unknown@formal@none@1@S@===People with Disabilities===@@@@1@3@@danf@17-8-2009 10790930@unknown@formal@none@1@S@People with disabilities are another part of the population that benefits from using speech recognition programs.@@@@1@16@@danf@17-8-2009 10790940@unknown@formal@none@1@S@It is especially useful for people who have difficulty with or are unable to use their hands, from mild repetitive stress injuries to involved disabilities that require alternative input for support with accessing the 
computer.@@@@1@35@@danf@17-8-2009 10790950@unknown@formal@none@1@S@In fact, people who used the keyboard a lot and developed [[Repetitive Strain Injury|RSI]] became an urgent early market for speech recognition.@@@@1@22@@danf@17-8-2009 10790960@unknown@formal@none@1@S@Speech recognition is used in [[deaf]] [[telephony]], such as [[spinvox]] voice-to-text voicemail, [[relay services]], and [[Telecommunications Relay Service#Captioned_telephone|captioned telephone]].@@@@1@19@@danf@17-8-2009 10790970@unknown@formal@none@1@S@===Further applications===@@@@1@2@@danf@17-8-2009 10790980@unknown@formal@none@1@S@*Automatic translation@@@@1@2@@danf@17-8-2009 10790990@unknown@formal@none@1@S@*Automotive speech recognition (e.g., [[Ford Sync]])@@@@1@6@@danf@17-8-2009 10791000@unknown@formal@none@1@S@*Telematics (e.g. vehicle Navigation Systems)@@@@1@5@@danf@17-8-2009 10791010@unknown@formal@none@1@S@*Court reporting (Realtime Voice Writing)@@@@1@5@@danf@17-8-2009 10791020@unknown@formal@none@1@S@*[[Hands-free computing]]: voice command recognition computer [[user interface]]@@@@1@8@@danf@17-8-2009 10791030@unknown@formal@none@1@S@*[[Home automation]]@@@@1@2@@danf@17-8-2009 10791040@unknown@formal@none@1@S@*[[Interactive voice response]]@@@@1@3@@danf@17-8-2009 10791050@unknown@formal@none@1@S@*[[Mobile telephony]], including mobile email@@@@1@5@@danf@17-8-2009 10791060@unknown@formal@none@1@S@*[[Multimodal interaction]]@@@@1@2@@danf@17-8-2009 10791070@unknown@formal@none@1@S@*[[Pronunciation]] evaluation in computer-aided language learning applications@@@@1@7@@danf@17-8-2009 10791080@unknown@formal@none@1@S@*[[Robotics]]@@@@1@1@@danf@17-8-2009 10791090@unknown@formal@none@1@S@*[[Transcription (linguistics)|Transcription]] (digital speech-to-text).@@@@1@4@@danf@17-8-2009 10791100@unknown@formal@none@1@S@*Speech-to-Text (Transcription of speech into mobile text messages)@@@@1@8@@danf@17-8-2009 10791110@unknown@formal@none@1@S@==Performance of speech recognition 
systems==@@@@1@5@@danf@17-8-2009 10791120@unknown@formal@none@1@S@The performance of speech recognition systems is usually specified in terms of accuracy and speed.@@@@1@15@@danf@17-8-2009 10791130@unknown@formal@none@1@S@Accuracy may be measured in terms of performance accuracy, which is usually rated with [[word error rate]] (WER), whereas speed is measured with the [[real time factor]].@@@@1@27@@danf@17-8-2009 10791140@unknown@formal@none@1@S@Other measures of accuracy include [[Single Word Error Rate]] (SWER) and [[Command Success Rate]] (CSR).@@@@1@15@@danf@17-8-2009 10791150@unknown@formal@none@1@S@Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions.@@@@1@19@@danf@17-8-2009 10791160@unknown@formal@none@1@S@There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation".@@@@1@15@@danf@17-8-2009 10791170@unknown@formal@none@1@S@Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called "enrollment") and may successfully capture continuous speech with a large vocabulary at normal pace with a very high accuracy.@@@@1@35@@danf@17-8-2009 10791180@unknown@formal@none@1@S@Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions.@@@@1@19@@danf@17-8-2009 10791190@unknown@formal@none@1@S@"Optimal conditions" usually assume that users:@@@@1@6@@danf@17-8-2009 10791200@unknown@formal@none@1@S@* have speech characteristics which match the training data,@@@@1@9@@danf@17-8-2009 10791210@unknown@formal@none@1@S@* can achieve proper speaker adaptation, and@@@@1@7@@danf@17-8-2009 10791220@unknown@formal@none@1@S@* work in a clean noise environment (e.g. 
quiet office or laboratory space).@@@@1@13@@danf@17-8-2009 10791230@unknown@formal@none@1@S@This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected.@@@@1@20@@danf@17-8-2009 10791240@unknown@formal@none@1@S@Speech recognition in video has become a popular search technology used by several video search companies.@@@@1@16@@danf@17-8-2009 10791250@unknown@formal@none@1@S@Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers.@@@@1@23@@danf@17-8-2009 10791260@unknown@formal@none@1@S@Such systems are popular for routing incoming phone calls to their destinations in large organizations.@@@@1@15@@danf@17-8-2009 10791270@unknown@formal@none@1@S@Both [[Acoustic Model|acoustic modeling]] and [[language model]]ing are important parts of modern statistically-based speech recognition algorithms.@@@@1@16@@danf@17-8-2009 10791280@unknown@formal@none@1@S@Hidden Markov models (HMMs) are widely used in many systems.@@@@1@10@@danf@17-8-2009 10791290@unknown@formal@none@1@S@Language modeling has many other applications such as [[smart keyboard]] and [[document classification]].@@@@1@13@@danf@17-8-2009 10791300@unknown@formal@none@1@S@===Hidden Markov model (HMM)-based speech recognition===@@@@1@6@@danf@17-8-2009 10791310@unknown@formal@none@1@S@Modern general-purpose speech recognition systems are generally based on [[Hidden Markov Model|HMMs]].@@@@1@12@@danf@17-8-2009 10791320@unknown@formal@none@1@S@These are statistical models which output a sequence of symbols or quantities.@@@@1@12@@danf@17-8-2009 10791330@unknown@formal@none@1@S@One possible reason why HMMs are used in speech recognition is that a speech signal could be viewed as a piecewise stationary signal or a short-time stationary signal.@@@@1@28@@danf@17-8-2009 10791340@unknown@formal@none@1@S@That is, one could assume that over a short interval of about 10 
milliseconds, speech could be approximated as a [[stationary process]].@@@@1@22@@danf@17-8-2009 10791350@unknown@formal@none@1@S@Speech could thus be thought of as a [[Markov model]] for many stochastic processes.@@@@1@14@@danf@17-8-2009 10791360@unknown@formal@none@1@S@Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use.@@@@1@21@@danf@17-8-2009 10791370@unknown@formal@none@1@S@In speech recognition, the hidden Markov model would output a sequence of ''n''-dimensional real-valued vectors (with ''n'' being a small integer, such as 10), outputting one of these every 10 milliseconds.@@@@1@31@@danf@17-8-2009 10791380@unknown@formal@none@1@S@The vectors would consist of [[cepstrum|cepstral]] coefficients, which are obtained by taking a [[Fourier transform]] of a short time window of speech and decorrelating the spectrum using a [[cosine transform]], then taking the first (most significant) coefficients.@@@@1@37@@danf@17-8-2009 10791390@unknown@formal@none@1@S@The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians which will give a likelihood for each observed vector.@@@@1@31@@danf@17-8-2009 10791400@unknown@formal@none@1@S@Each word, or (for more general speech recognition systems) each [[phoneme]], will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.@@@@1@44@@danf@17-8-2009 10791410@unknown@formal@none@1@S@Described above are the core elements of the most common, HMM-based approach to speech recognition.@@@@1@15@@danf@17-8-2009 10791420@unknown@formal@none@1@S@Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above.@@@@1@24@@danf@17-8-2009 
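The cepstral feature extraction described above (Fourier transform of a short window, then a cosine transform, then keeping the first coefficients) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the front end of any particular recognizer: the function name `cepstral_coefficients`, the Hamming window, the logarithm applied to the magnitude spectrum, and the 8 kHz sampling rate in the usage lines are all assumptions made for the example, and a naive DFT is used so that the code needs only the Python standard library.

```python
import cmath
import math


def cepstral_coefficients(frame, num_coeffs=10):
    """Return the first num_coeffs cepstral coefficients of one short frame.

    Hypothetical helper for illustration only.
    """
    n = len(frame)
    # Assumption: taper the frame with a Hamming window before the transform.
    windowed = [
        sample * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, sample in enumerate(frame)
    ]
    # Naive discrete Fourier transform; keep the log magnitude of bins 0..n/2.
    log_spectrum = []
    for k in range(n // 2 + 1):
        bin_value = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        # Small floor avoids log(0) on empty bins (an assumption of this sketch).
        log_spectrum.append(math.log(abs(bin_value) + 1e-10))
    # Cosine transform (DCT-II) of the log spectrum; keep the first coefficients.
    m = len(log_spectrum)
    return [
        sum(log_spectrum[j] * math.cos(math.pi * c * (j + 0.5) / m) for j in range(m))
        for c in range(num_coeffs)
    ]


# One 10-millisecond frame of a 440 Hz tone, assuming 8 kHz sampling (80 samples).
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(80)]
features = cepstral_coefficients(frame)
```

Calling the function once per 10-millisecond frame would yield one ''n''-dimensional vector per frame, matching the observation sequence described above. Practical systems typically add further steps, which this sketch omits.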
10791430@unknown@formal@none@1@S@A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation.@@@@1@64@@danf@17-8-2009 10791440@unknown@formal@none@1@S@The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semitied covariance transform (also known as maximum likelihood linear transform, or MLLT).@@@@1@60@@danf@17-8-2009 10791450@unknown@formal@none@1@S@Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data.@@@@1@28@@danf@17-8-2009 10791460@unknown@formal@none@1@S@Examples are maximum [[mutual information]] (MMI), minimum classification error (MCE) and minimum phone error (MPE).@@@@1@15@@danf@17-8-2009 10791470@unknown@formal@none@1@S@Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the [[Viterbi algorithm]] to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the [[finite state transducer]], or FST, 
approach).@@@@1@72@@danf@17-8-2009 10791480@unknown@formal@none@1@S@===Dynamic time warping (DTW)-based speech recognition===@@@@1@6@@danf@17-8-2009 10791490@unknown@formal@none@1@S@Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach.@@@@1@25@@danf@17-8-2009 10791500@unknown@formal@none@1@S@Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed.@@@@1@19@@danf@17-8-2009 10791510@unknown@formal@none@1@S@For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation.@@@@1@42@@danf@17-8-2009 10791520@unknown@formal@none@1@S@DTW has been applied to video, audio, and graphics – indeed, any data which can be turned into a linear representation can be analyzed with DTW.@@@@1@25@@danf@17-8-2009 10791530@unknown@formal@none@1@S@A well known application has been automatic speech recognition, to cope with different speaking speeds.@@@@1@15@@danf@17-8-2009 10791540@unknown@formal@none@1@S@In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. 
the sequences are "warped" non-linearly to match each other.@@@@1@35@@danf@17-8-2009 10791550@unknown@formal@none@1@S@This sequence alignment method is often used in the context of hidden Markov models.@@@@1@14@@danf@17-8-2009 10791560@unknown@formal@none@1@S@==Further information==@@@@1@2@@danf@17-8-2009 10791570@unknown@formal@none@1@S@Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU.@@@@1@19@@danf@17-8-2009 10791580@unknown@formal@none@1@S@Conferences in the field of [[Natural Language Processing]], such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing.@@@@1@23@@danf@17-8-2009 10791590@unknown@formal@none@1@S@Important journals include the [[IEEE]] Transactions on Speech and Audio Processing (now named [[IEEE]] Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication.@@@@1@28@@danf@17-8-2009 10791600@unknown@formal@none@1@S@Books like "Fundamentals of Speech Recognition" by [[Lawrence Rabiner]] can be useful to acquire basic knowledge but may not be fully up to date (1993).@@@@1@25@@danf@17-8-2009 10791610@unknown@formal@none@1@S@Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek which is a more up to date book (1998).@@@@1@22@@danf@17-8-2009 10791620@unknown@formal@none@1@S@Even more up to date is "Computer Speech", by Manfred R. 
Schroeder, second edition published in 2004.@@@@1@17@@danf@17-8-2009 10791630@unknown@formal@none@1@S@A good insight into the techniques used in the best modern systems can be gained by paying attention to government sponsored evaluations such as those organised by [[DARPA]] (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).@@@@1@49@@danf@17-8-2009 10791640@unknown@formal@none@1@S@In terms of freely available resources, the [[HTK (software)|HTK]] book (and the accompanying HTK toolkit) is one place to start to both learn about speech recognition and to start experimenting.@@@@1@30@@danf@17-8-2009 10791650@unknown@formal@none@1@S@Another such resource is [[Carnegie Mellon University]]'s SPHINX toolkit.@@@@1@9@@danf@17-8-2009 10791660@unknown@formal@none@1@S@The AT&T libraries [http://www.research.att.com/projects/mohri/fsm FSM Library], [http://www.research.att.com/projects/mohri/grm GRM library], and [http://www.cs.nyu.edu/~mohri DCD library] are also general software libraries for large-vocabulary speech recognition.@@@@1@22@@danf@17-8-2009 10791670@unknown@formal@none@1@S@A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995).@@@@1@17@@danf@17-8-2009 10800010@unknown@formal@none@1@S@
Speech synthesis
@@@@1@2@@danf@17-8-2009 10800020@unknown@formal@none@1@S@'''Speech synthesis''' is the artificial production of human [[Speech communication|speech]].@@@@1@10@@danf@17-8-2009 10800030@unknown@formal@none@1@S@A computer system used for this purpose is called a '''speech synthesizer''', and can be implemented in [[software]] or [[Computer hardware|hardware]].@@@@1@21@@danf@17-8-2009 10800040@unknown@formal@none@1@S@A '''text-to-speech (TTS)''' system converts normal language text into speech; other systems render [[symbolic linguistic representation]]s like [[phonetic transcription]]s into speech.@@@@1@21@@danf@17-8-2009 10800050@unknown@formal@none@1@S@Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a [[database]].@@@@1@17@@danf@17-8-2009 10800060@unknown@formal@none@1@S@Systems differ in the size of the stored speech units; a system that stores [[phone]]s or [[diphone]]s provides the largest output range, but may lack clarity.@@@@1@26@@danf@17-8-2009 10800070@unknown@formal@none@1@S@For specific usage domains, the storage of entire words or sentences allows for high-quality output.@@@@1@15@@danf@17-8-2009 10800080@unknown@formal@none@1@S@Alternatively, a synthesizer can incorporate a model of the [[vocal tract]] and other human voice characteristics to create a completely "synthetic" voice output.@@@@1@23@@danf@17-8-2009 10800090@unknown@formal@none@1@S@The quality of a speech synthesizer is judged by its similarity to the human voice, and by its ability to be understood.@@@@1@22@@danf@17-8-2009 10800100@unknown@formal@none@1@S@An intelligible text-to-speech program allows people with [[visual impairment]]s or [[reading disability|reading disabilities]] to listen to written works on a home computer.@@@@1@22@@danf@17-8-2009 10800110@unknown@formal@none@1@S@Many computer operating systems have included speech synthesizers since the early 1980s.@@@@1@12@@danf@17-8-2009 10800120@unknown@formal@none@1@S@== Overview of text 
processing ==@@@@1@6@@danf@17-8-2009 10800130@unknown@formal@none@1@S@A text-to-speech system (or "engine") is composed of two parts: a [[front-end]] and a back-end.@@@@1@15@@danf@17-8-2009 10800140@unknown@formal@none@1@S@The front-end has two major tasks.@@@@1@6@@danf@17-8-2009 10800150@unknown@formal@none@1@S@First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words.@@@@1@17@@danf@17-8-2009 10800160@unknown@formal@none@1@S@This process is often called ''text normalization'', ''pre-processing'', or ''[[tokenization]]''.@@@@1@10@@danf@17-8-2009 10800170@unknown@formal@none@1@S@The front-end then assigns [[phonetic transcription]]s to each word, and divides and marks the text into [[prosody (linguistics)|prosodic units]], like [[phrase]]s, [[clause]]s, and [[sentence (linguistics)|sentence]]s.@@@@1@25@@danf@17-8-2009 10800180@unknown@formal@none@1@S@The process of assigning phonetic transcriptions to words is called ''text-to-phoneme'' or ''[[grapheme]]-to-phoneme'' conversion.@@@@1@14@@danf@17-8-2009 10800190@unknown@formal@none@1@S@Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end.@@@@1@18@@danf@17-8-2009 10800200@unknown@formal@none@1@S@The back-end—often referred to as the ''synthesizer''—then converts the symbolic linguistic representation into sound.@@@@1@14@@danf@17-8-2009 10800210@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10800220@unknown@formal@none@1@S@Long before [[electronics|electronic]] [[signal processing]] was invented, there were those who tried to build machines to create human speech.@@@@1@19@@danf@17-8-2009 10800230@unknown@formal@none@1@S@Some early legends of the existence of [[Brazen Head|"speaking heads"]] involved [[Pope Silvester II|Gerbert of Aurillac]] (d. 
1003 AD), [[Albertus Magnus]] (1198–1280), and [[Roger Bacon]] (1214–1294).@@@@1@26@@danf@17-8-2009 10800240@unknown@formal@none@1@S@In 1779, the [[Denmark|Danish]] scientist Christian Kratzenstein, working at the [[Russian Academy of Sciences]], built models of the human [[vocal tract]] that could produce the five long [[vowel]] sounds (in [[help:IPA|International Phonetic Alphabet]] notation, they are {{IPA|[aː]}}, {{IPA|[eː]}}, {{IPA|[iː]}}, {{IPA|[oː]}} and {{IPA|[uː]}}).@@@@1@42@@danf@17-8-2009 10800250@unknown@formal@none@1@S@This was followed by the [[bellows]]-operated "acoustic-mechanical speech machine" by [[Wolfgang von Kempelen]] of [[Vienna]], [[Austria]], described in a 1791 paper.@@@@1@21@@danf@17-8-2009 10800260@unknown@formal@none@1@S@This machine added models of the tongue and lips, enabling it to produce [[consonant]]s as well as vowels.@@@@1@18@@danf@17-8-2009 10800270@unknown@formal@none@1@S@In 1837, [[Charles Wheatstone]] produced a "speaking machine" based on von Kempelen's design, and in 1857, M. Faber built the "Euphonia".@@@@1@21@@danf@17-8-2009 10800280@unknown@formal@none@1@S@Wheatstone's design was resurrected in 1923 by Paget.@@@@1@8@@danf@17-8-2009 10800290@unknown@formal@none@1@S@In the 1930s, [[Bell Labs]] developed the [[Vocoder|VOCODER]], a keyboard-operated electronic speech analyzer and synthesizer that was said to be clearly intelligible.@@@@1@22@@danf@17-8-2009 10800300@unknown@formal@none@1@S@[[Homer Dudley]] refined this device into the VODER, which he exhibited at the [[1939 New York World's Fair]].@@@@1@18@@danf@17-8-2009 10800310@unknown@formal@none@1@S@The [[Pattern playback]] was built by [[Franklin S. Cooper|Dr. Franklin S. 
Cooper]] and his colleagues at [[Haskins Laboratories]] in the late 1940s and completed in 1950.@@@@1@26@@danf@17-8-2009 10800320@unknown@formal@none@1@S@There were several different versions of this hardware device but only one currently survives.@@@@1@14@@danf@17-8-2009 10800330@unknown@formal@none@1@S@The machine converts pictures of the acoustic patterns of speech in the form of a spectrogram back into sound.@@@@1@19@@danf@17-8-2009 10800340@unknown@formal@none@1@S@Using this device, [[Alvin Liberman]] and colleagues were able to discover acoustic cues for the perception of [[phonetic]] segments (consonants and vowels).@@@@1@22@@danf@17-8-2009 10800350@unknown@formal@none@1@S@Early electronic speech synthesizers sounded robotic and were often barely intelligible.@@@@1@11@@danf@17-8-2009 10800360@unknown@formal@none@1@S@However, the quality of synthesized speech has steadily improved, and output from contemporary speech synthesis systems is sometimes indistinguishable from actual human speech.@@@@1@23@@danf@17-8-2009 10800370@unknown@formal@none@1@S@=== Electronic devices ===@@@@1@4@@danf@17-8-2009 10800380@unknown@formal@none@1@S@The first computer-based speech synthesis systems were created in the late 1950s, and the first complete text-to-speech system was completed in 1968.@@@@1@22@@danf@17-8-2009 10800390@unknown@formal@none@1@S@In 1961, physicist [[John Larry Kelly, Jr]] and colleague Louis Gerstman used an [[IBM 704]] computer to synthesize speech, an event among the most prominent in the history of [[Bell Labs]].@@@@1@31@@danf@17-8-2009 10800400@unknown@formal@none@1@S@Kelly's voice recorder synthesizer (vocoder) recreated the song "[[Daisy Bell]]", with musical accompaniment from [[Max Mathews]].@@@@1@16@@danf@17-8-2009 10800410@unknown@formal@none@1@S@Coincidentally, [[Arthur C. 
Clarke]] was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility.@@@@1@19@@danf@17-8-2009 10800420@unknown@formal@none@1@S@Clarke was so impressed by the demonstration that he used it in the climactic scene of his screenplay for his novel ''[[2001: A Space Odyssey (novel)|2001: A Space Odyssey]]'', where the [[HAL 9000]] computer sings the same song as it is being put to sleep by astronaut [[Dave Bowman]].@@@@1@49@@danf@17-8-2009 10800430@unknown@formal@none@1@S@Despite the success of purely electronic speech synthesis, research is still being conducted into mechanical speech synthesizers.@@@@1@17@@danf@17-8-2009 10800440@unknown@formal@none@1@S@== Synthesizer technologies ==@@@@1@4@@danf@17-8-2009 10800450@unknown@formal@none@1@S@The most important qualities of a speech synthesis system are ''naturalness'' and ''[[Intelligibility]]''.@@@@1@13@@danf@17-8-2009 10800460@unknown@formal@none@1@S@Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood.@@@@1@21@@danf@17-8-2009 10800470@unknown@formal@none@1@S@The ideal speech synthesizer is both natural and intelligible.@@@@1@9@@danf@17-8-2009 10800480@unknown@formal@none@1@S@Speech synthesis systems usually try to maximize both characteristics.@@@@1@9@@danf@17-8-2009 10800490@unknown@formal@none@1@S@The two primary technologies for generating synthetic speech waveforms are ''concatenative synthesis'' and ''[[formant]] synthesis''.@@@@1@15@@danf@17-8-2009 10800500@unknown@formal@none@1@S@Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.@@@@1@21@@danf@17-8-2009 10800510@unknown@formal@none@1@S@=== Concatenative synthesis ===@@@@1@4@@danf@17-8-2009 10800520@unknown@formal@none@1@S@Concatenative synthesis is based on the [[concatenation]] (or stringing together) of segments of recorded speech.@@@@1@15@@danf@17-8-2009 
10800530@unknown@formal@none@1@S@Generally, concatenative synthesis produces the most natural-sounding synthesized speech.@@@@1@9@@danf@17-8-2009 10800540@unknown@formal@none@1@S@However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output.@@@@1@26@@danf@17-8-2009 10800550@unknown@formal@none@1@S@There are three main sub-types of concatenative synthesis.@@@@1@8@@danf@17-8-2009 10800560@unknown@formal@none@1@S@
==== Unit selection synthesis ====@@@@1@6@@danf@17-8-2009 10800570@unknown@formal@none@1@S@Unit selection synthesis uses large [[database]]s of recorded speech.@@@@1@9@@danf@17-8-2009 10800580@unknown@formal@none@1@S@During database creation, each recorded utterance is segmented into some or all of the following: individual [[phone]]s, [[diphone]]s, half-phones, [[syllable]]s, [[morpheme]]s, [[word]]s, [[phrase]]s, and [[Sentence (linguistics)|sentence]]s.@@@@1@26@@danf@17-8-2009 10800590@unknown@formal@none@1@S@Typically, the division into segments is done using a specially modified [[speech recognition|speech recognizer]] set to a "forced alignment" mode with some manual correction afterward, using visual representations such as the [[waveform]] and [[spectrogram]].@@@@1@34@@danf@17-8-2009 10800600@unknown@formal@none@1@S@An [[index (database)|index]] of the units in the speech database is then created based on the segmentation and acoustic parameters like the [[fundamental frequency]] ([[pitch (music)|pitch]]), duration, position in the syllable, and neighboring phones.@@@@1@34@@danf@17-8-2009 10800610@unknown@formal@none@1@S@At [[runtime]], the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).@@@@1@21@@danf@17-8-2009 10800620@unknown@formal@none@1@S@This process is typically achieved using a specially weighted [[decision tree]].@@@@1@11@@danf@17-8-2009 10800630@unknown@formal@none@1@S@Unit selection provides the greatest naturalness, because it applies only a small amount of [[digital signal processing]] (DSP) to the recorded speech.@@@@1@22@@danf@17-8-2009 10800640@unknown@formal@none@1@S@DSP often makes recorded speech sound less natural, although some systems use a small amount of signal processing at the point of concatenation to smooth the waveform.@@@@1@27@@danf@17-8-2009 10800650@unknown@formal@none@1@S@The output from the best unit-selection systems is often indistinguishable from 
real human voices, especially in contexts for which the TTS system has been tuned.@@@@1@25@@danf@17-8-2009 10800660@unknown@formal@none@1@S@However, maximum naturalness typically requires unit-selection speech databases to be very large, in some systems ranging into the [[gigabyte]]s of recorded data, representing dozens of hours of speech.@@@@1@28@@danf@17-8-2009 10800670@unknown@formal@none@1@S@Also, unit selection algorithms have been known to select segments from a place that results in less than ideal synthesis (e.g. minor words become unclear) even when a better choice exists in the database.
@@@@1@34@@danf@17-8-2009 10800680@unknown@formal@none@1@S@
==== Diphone synthesis ====@@@@1@5@@danf@17-8-2009 10800690@unknown@formal@none@1@S@Diphone synthesis uses a minimal speech database containing all the [[diphone]]s (sound-to-sound transitions) occurring in a language.@@@@1@17@@danf@17-8-2009 10800700@unknown@formal@none@1@S@The number of diphones depends on the [[phonotactics]] of the language: for example, Spanish has about 800 diphones, and German about 2500.@@@@1@22@@danf@17-8-2009 10800710@unknown@formal@none@1@S@In diphone synthesis, only one example of each diphone is contained in the speech database.@@@@1@15@@danf@17-8-2009 10800720@unknown@formal@none@1@S@At runtime, the target [[prosody]] of a sentence is superimposed on these minimal units by means of [[digital signal processing]] techniques such as [[linear predictive coding]], [[PSOLA]] or [[MBROLA]].@@@@1@29@@danf@17-8-2009 10800730@unknown@formal@none@1@S@The quality of the resulting speech is generally worse than that of unit-selection systems, but more natural-sounding than the output of formant synthesizers.@@@@1@23@@danf@17-8-2009 10800740@unknown@formal@none@1@S@Diphone synthesis suffers from the sonic glitches of concatenative synthesis and the robotic-sounding nature of formant synthesis, and has few of the advantages of either approach other than small size.@@@@1@30@@danf@17-8-2009 10800750@unknown@formal@none@1@S@As such, its use in commercial applications is declining, although it continues to be used in research because there are a number of freely available software implementations.
@@@@1@27@@danf@17-8-2009 10800760@unknown@formal@none@1@S@
==== Domain-specific synthesis ====@@@@1@5@@danf@17-8-2009 10800770@unknown@formal@none@1@S@Domain-specific synthesis concatenates prerecorded words and phrases to create complete utterances.@@@@1@11@@danf@17-8-2009 10800780@unknown@formal@none@1@S@It is used in applications where the variety of texts the system will output is limited to a particular domain, like transit schedule announcements or weather reports.@@@@1@27@@danf@17-8-2009 10800790@unknown@formal@none@1@S@The technology is very simple to implement, and has been in commercial use for a long time, in devices like talking clocks and calculators.@@@@1@24@@danf@17-8-2009 10800800@unknown@formal@none@1@S@The level of naturalness of these systems can be very high because the variety of sentence types is limited, and they closely match the prosody and intonation of the original recordings.@@@@1@31@@danf@17-8-2009 10800810@unknown@formal@none@1@S@Because these systems are limited by the words and phrases in their databases, they are not general-purpose and can only synthesize the combinations of words and phrases with which they have been preprogrammed.@@@@1@33@@danf@17-8-2009 10800820@unknown@formal@none@1@S@The blending of words within naturally spoken language, however, can still cause problems unless the many variations are taken into account.@@@@1@21@@danf@17-8-2009 10800830@unknown@formal@none@1@S@For example, in [[Rhotic and non-rhotic accents|non-rhotic]] dialects of English the "r" in words like "clear" {{IPA|/ˈkliːə/}} is usually only pronounced when the following word has a vowel as its first letter (e.g. 
"clear out" is realized as {{IPA|/ˌkliːəɹˈɑʊt/}}).@@@@1@39@@danf@17-8-2009 10800840@unknown@formal@none@1@S@Likewise in [[French language|French]], many final consonants become no longer silent if followed by a word that begins with a vowel, an effect called [[Liaison (French)|liaison]].@@@@1@26@@danf@17-8-2009 10800845@unknown@formal@none@1@S@This [[alternation (linguistics)|alternation]] cannot be reproduced by a simple word-concatenation system, which would require additional complexity to be [[context-sensitive]].
@@@@1@19@@danf@17-8-2009 10800850@unknown@formal@none@1@S@=== Formant synthesis ===@@@@1@4@@danf@17-8-2009 10800860@unknown@formal@none@1@S@[[Formant]] synthesis does not use human speech samples at runtime.@@@@1@10@@danf@17-8-2009 10800870@unknown@formal@none@1@S@Instead, the synthesized speech output is created using an acoustic model.@@@@1@11@@danf@17-8-2009 10800880@unknown@formal@none@1@S@Parameters such as [[fundamental frequency]], [[phonation|voicing]], and [[noise]] levels are varied over time to create a [[waveform]] of artificial speech.@@@@1@20@@danf@17-8-2009 10800890@unknown@formal@none@1@S@This method is sometimes called ''rules-based synthesis''; however, many concatenative systems also have rules-based components.@@@@1@15@@danf@17-8-2009 10800900@unknown@formal@none@1@S@Many systems based on formant synthesis technology generate artificial, robotic-sounding speech that would never be mistaken for human speech.@@@@1@19@@danf@17-8-2009 10800910@unknown@formal@none@1@S@However, maximum naturalness is not always the goal of a speech synthesis system, and formant synthesis systems have advantages over concatenative systems.@@@@1@22@@danf@17-8-2009 10800920@unknown@formal@none@1@S@Formant-synthesized speech can be reliably intelligible, even at very high speeds, avoiding the acoustic glitches that commonly plague concatenative systems.@@@@1@20@@danf@17-8-2009 10800930@unknown@formal@none@1@S@High-speed synthesized speech is used by the visually impaired to quickly navigate computers using a [[screen reader]].@@@@1@17@@danf@17-8-2009 10800940@unknown@formal@none@1@S@Formant synthesizers are usually smaller programs than concatenative systems because they do not have a database of speech samples.@@@@1@19@@danf@17-8-2009 10800950@unknown@formal@none@1@S@They can therefore be used in [[embedded system]]s, where [[data storage device|memory]] and [[microprocessor]] power are especially limited.@@@@1@18@@danf@17-8-2009 10800960@unknown@formal@none@1@S@Because 
formant-based systems have complete control of all aspects of the output speech, a wide variety of prosodies and [[Intonation (linguistics)|intonation]]s can be output, conveying not just questions and statements, but a variety of emotions and tones of voice.@@@@1@39@@danf@17-8-2009 10800970@unknown@formal@none@1@S@Examples of non-real-time but highly accurate intonation control in formant synthesis include the work done in the late 1970s for the [[Texas Instruments]] toy [[Speak & Spell (game)|Speak & Spell]], and in the early 1980s [[Sega]] [[Video arcade|arcade]] machines.@@@@1@39@@danf@17-8-2009 10800980@unknown@formal@none@1@S@Creating proper intonation for these projects was painstaking, and the results have yet to be matched by real-time text-to-speech interfaces.@@@@1@20@@danf@17-8-2009 10800990@unknown@formal@none@1@S@=== Articulatory synthesis ===@@@@1@4@@danf@17-8-2009 10801000@unknown@formal@none@1@S@[[Articulatory synthesis]] refers to computational techniques for synthesizing speech based on models of the human [[vocal tract]] and the articulation processes occurring there.@@@@1@23@@danf@17-8-2009 10801010@unknown@formal@none@1@S@The first articulatory synthesizer regularly used for laboratory experiments was developed at [[Haskins Laboratories]] in the mid-1970s by [[Philip Rubin]], Tom Baer, and Paul Mermelstein.@@@@1@25@@danf@17-8-2009 10801020@unknown@formal@none@1@S@This synthesizer, known as ASY, was based on vocal tract models developed at [[Bell Laboratories]] in the 1960s and 1970s by Paul Mermelstein, Cecil Coker, and colleagues.@@@@1@27@@danf@17-8-2009 10801030@unknown@formal@none@1@S@Until recently, articulatory synthesis models have not been incorporated into commercial speech synthesis systems.@@@@1@14@@danf@17-8-2009 10801040@unknown@formal@none@1@S@A notable exception is the [[NeXT]]-based system originally developed and marketed by Trillium Sound Research, a spin-off company of the [[University of Calgary]], where much of the original 
research was conducted.@@@@1@31@@danf@17-8-2009 10801050@unknown@formal@none@1@S@Following the demise of the various incarnations of NeXT (started by [[Steve Jobs]] in the late 1980s and merged with Apple Computer in 1997), the Trillium software was published under the [[GNU General Public License]], with work continuing as ''gnuspeech''.@@@@1@40@@danf@17-8-2009 10801060@unknown@formal@none@1@S@The system, first marketed in 1994, provides full articulatory-based text-to-speech conversion using a waveguide or transmission-line analog of the human oral and nasal tracts controlled by Carré's "distinctive region model".@@@@1@30@@danf@17-8-2009 10801070@unknown@formal@none@1@S@=== HMM-based synthesis ===@@@@1@4@@danf@17-8-2009 10801080@unknown@formal@none@1@S@HMM-based synthesis is a synthesis method based on [[hidden Markov model]]s.@@@@1@11@@danf@17-8-2009 10801090@unknown@formal@none@1@S@In this system, the [[frequency spectrum]] ([[vocal tract]]), [[fundamental frequency]] (vocal source), and duration ([[prosody]]) of speech are modeled simultaneously by HMMs.@@@@1@22@@danf@17-8-2009 10801100@unknown@formal@none@1@S@Speech [[waveforms]] are generated from HMMs themselves based on the [[maximum likelihood]] criterion.@@@@1@13@@danf@17-8-2009 10801110@unknown@formal@none@1@S@=== Sinewave synthesis ===@@@@1@4@@danf@17-8-2009 10801120@unknown@formal@none@1@S@[[Sinewave synthesis]] is a technique for synthesizing speech by replacing the [[formants]] (main bands of energy) with pure tone whistles.@@@@1@20@@danf@17-8-2009 10801130@unknown@formal@none@1@S@== Challenges ==@@@@1@3@@danf@17-8-2009 10801140@unknown@formal@none@1@S@=== Text normalization challenges ===@@@@1@5@@danf@17-8-2009 10801150@unknown@formal@none@1@S@The process of normalizing text is rarely straightforward.@@@@1@8@@danf@17-8-2009 10801160@unknown@formal@none@1@S@Texts are full of [[Heteronym (linguistics)|heteronym]]s, [[number]]s, and [[abbreviation]]s that all require expansion into a phonetic 
representation.@@@@1@17@@danf@17-8-2009 10801170@unknown@formal@none@1@S@There are many spellings in English which are pronounced differently based on context.@@@@1@13@@danf@17-8-2009 10801180@unknown@formal@none@1@S@For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".@@@@1@19@@danf@17-8-2009 10801190@unknown@formal@none@1@S@Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective.@@@@1@26@@danf@17-8-2009 10801200@unknown@formal@none@1@S@As a result, various [[heuristic]] techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.@@@@1@27@@danf@17-8-2009 10801210@unknown@formal@none@1@S@Deciding how to convert numbers is another problem that TTS systems have to address.@@@@1@14@@danf@17-8-2009 10801220@unknown@formal@none@1@S@It is a simple programming challenge to convert a number into words, like "1325" becoming "one thousand three hundred twenty-five."@@@@1@20@@danf@17-8-2009 10801230@unknown@formal@none@1@S@However, numbers occur in many different contexts; when a year or part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a [[social security number]], as "one three two five".@@@@1@36@@danf@17-8-2009 10801240@unknown@formal@none@1@S@A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.@@@@1@33@@danf@17-8-2009 10801250@unknown@formal@none@1@S@Similarly, abbreviations can be ambiguous.@@@@1@5@@danf@17-8-2009 10801260@unknown@formal@none@1@S@For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." 
uses the same abbreviation for both "Saint" and "Street".@@@@1@30@@danf@17-8-2009 10801270@unknown@formal@none@1@S@TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs.@@@@1@29@@danf@17-8-2009 10801280@unknown@formal@none@1@S@=== Text-to-phoneme challenges ===@@@@1@4@@danf@17-8-2009 10801290@unknown@formal@none@1@S@Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme or grapheme-to-phoneme conversion ([[phoneme]] is the term used by linguists to describe distinctive sounds in a language).@@@@1@42@@danf@17-8-2009 10801300@unknown@formal@none@1@S@The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciations is stored by the program.@@@@1@30@@danf@17-8-2009 10801310@unknown@formal@none@1@S@Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary.@@@@1@29@@danf@17-8-2009 10801320@unknown@formal@none@1@S@The other approach is rule-based, in which pronunciation rules are applied to words to determine their pronunciations based on their spellings.@@@@1@21@@danf@17-8-2009 10801330@unknown@formal@none@1@S@This is similar to the "sounding out", or [[synthetic phonics]], approach to learning reading.@@@@1@14@@danf@17-8-2009 10801340@unknown@formal@none@1@S@Each approach has advantages and drawbacks.@@@@1@6@@danf@17-8-2009 10801350@unknown@formal@none@1@S@The dictionary-based approach is quick and accurate, but completely fails if it is given a word which is not in its dictionary.@@@@1@22@@danf@17-8-2009 10801360@unknown@formal@none@1@S@As dictionary size grows, so too does the 
memory space requirement of the synthesis system.@@@@1@15@@danf@17-8-2009 10801370@unknown@formal@none@1@S@On the other hand, the rule-based approach works on any input, but the complexity of the rules grows substantially as the system takes into account irregular spellings or pronunciations.@@@@1@29@@danf@17-8-2009 10801380@unknown@formal@none@1@S@(Consider that the word "of" is very common in English, yet is the only word in which the letter "f" is pronounced [v].)@@@@1@23@@danf@17-8-2009 10801390@unknown@formal@none@1@S@As a result, nearly all speech synthesis systems use a combination of these approaches.@@@@1@14@@danf@17-8-2009 10801400@unknown@formal@none@1@S@Some languages, like [[Spanish language|Spanish]], have a very regular writing system, and the prediction of the pronunciation of words based on their spellings is quite successful.@@@@1@26@@danf@17-8-2009 10801410@unknown@formal@none@1@S@Speech synthesis systems for such languages often use the rule-based method extensively, resorting to dictionaries only for those few words, like foreign names and borrowings, whose pronunciations are not obvious from their spellings.@@@@1@33@@danf@17-8-2009 10801420@unknown@formal@none@1@S@On the other hand, speech synthesis systems for languages like [[English language|English]], which have extremely irregular spelling systems, are more likely to rely on dictionaries, and to use rule-based methods only for unusual words, or words that aren't in their dictionaries.@@@@1@41@@danf@17-8-2009 10801430@unknown@formal@none@1@S@=== Evaluation challenges ===@@@@1@4@@danf@17-8-2009 10801440@unknown@formal@none@1@S@It is very difficult to evaluate speech synthesis systems consistently because there is no objective criterion and different organizations usually use different speech data.@@@@1@24@@danf@17-8-2009 10801450@unknown@formal@none@1@S@The quality of a speech synthesis system depends heavily on the quality of recording.@@@@1@14@@danf@17-8-2009 
10801460@unknown@formal@none@1@S@Therefore, evaluating speech synthesis systems is almost the same as evaluating the recording skills.@@@@1@14@@danf@17-8-2009 10801470@unknown@formal@none@1@S@Recently, researchers started evaluating speech synthesis systems using a common speech dataset.@@@@1@12@@danf@17-8-2009 10801480@unknown@formal@none@1@S@This may help people to compare differences between technologies rather than between recordings.@@@@1@13@@danf@17-8-2009 10801490@unknown@formal@none@1@S@=== Prosodics and emotional content ===@@@@1@6@@danf@17-8-2009 10801500@unknown@formal@none@1@S@A recent study published in the journal "'''Speech Communication'''" by Amy Drahota and colleagues at the [[University of Portsmouth]], [[UK]], reported that listeners to voice recordings could determine, at better than chance levels, whether or not the speaker was smiling.@@@@1@40@@danf@17-8-2009 10801510@unknown@formal@none@1@S@It was suggested that identification of the vocal features which signal emotional content may be used to help make synthesized speech sound more natural.@@@@1@24@@danf@17-8-2009 10801520@unknown@formal@none@1@S@== Dedicated hardware ==@@@@1@4@@danf@17-8-2009 10801530@unknown@formal@none@1@S@*Votrax@@@@1@1@@danf@17-8-2009 10801540@unknown@formal@none@1@S@**SC-01A (analog formant)@@@@1@3@@danf@17-8-2009 10801550@unknown@formal@none@1@S@**SC-02 / SSI-263 / "Arctic 263"@@@@1@6@@danf@17-8-2009 10801560@unknown@formal@none@1@S@*General Instruments SP0256-AL2 (CTS256A-AL2, MEA8000)@@@@1@5@@danf@17-8-2009 10801570@unknown@formal@none@1@S@*Magnevation SpeakJet (www.speechchips.com TTS256)@@@@1@4@@danf@17-8-2009 10801580@unknown@formal@none@1@S@*Savage Innovations SoundGin@@@@1@3@@danf@17-8-2009 10801590@unknown@formal@none@1@S@*National Semiconductor DT1050 Digitalker (Mozer)@@@@1@5@@danf@17-8-2009 10801600@unknown@formal@none@1@S@*Silicon Systems SSI 263 (analog formant)@@@@1@6@@danf@17-8-2009 10801610@unknown@formal@none@1@S@*Texas Instruments@@@@1@2@@danf@17-8-2009 
10801620@unknown@formal@none@1@S@**TMS5110A (LPC)@@@@1@2@@danf@17-8-2009 10801630@unknown@formal@none@1@S@**TMS5200@@@@1@1@@danf@17-8-2009 10801640@unknown@formal@none@1@S@*Oki Semiconductor@@@@1@2@@danf@17-8-2009 10801650@unknown@formal@none@1@S@**MSM5205@@@@1@1@@danf@17-8-2009 10801660@unknown@formal@none@1@S@**MSM5218RS (ADPCM)@@@@1@2@@danf@17-8-2009 10801670@unknown@formal@none@1@S@*Toshiba T6721A@@@@1@2@@danf@17-8-2009 10801680@unknown@formal@none@1@S@*Philips PCF8200@@@@1@2@@danf@17-8-2009 10801690@unknown@formal@none@1@S@== Computer operating systems or outlets with speech synthesis ==@@@@1@10@@danf@17-8-2009 10801700@unknown@formal@none@1@S@=== Apple ===@@@@1@3@@danf@17-8-2009 10801710@unknown@formal@none@1@S@The first speech system integrated into an [[operating system]] was [[Apple Computer]]'s [[PlainTalk#The original MacInTalk|MacInTalk]] in 1984.@@@@1@17@@danf@17-8-2009 10801720@unknown@formal@none@1@S@Since the 1980s, Macintosh computers have offered text-to-speech capabilities through the MacinTalk software.@@@@1@13@@danf@17-8-2009 10801730@unknown@formal@none@1@S@In the early 1990s, Apple expanded its capabilities, offering system-wide text-to-speech support.@@@@1@12@@danf@17-8-2009 10801740@unknown@formal@none@1@S@With the introduction of faster PowerPC-based computers, they included higher-quality voice sampling.@@@@1@12@@danf@17-8-2009 10801750@unknown@formal@none@1@S@Apple also introduced [[speech recognition]] into its systems, which provided a fluid command set.@@@@1@14@@danf@17-8-2009 10801760@unknown@formal@none@1@S@More recently, Apple has added sample-based voices.@@@@1@7@@danf@17-8-2009 10801770@unknown@formal@none@1@S@Starting as a curiosity, the speech system of Apple [[Macintosh (computer)|Macintosh]] has evolved into a cutting-edge, fully supported program, [[PlainTalk]], for people with vision problems.@@@@1@25@@danf@17-8-2009 10801780@unknown@formal@none@1@S@[[VoiceOver]] was included in Mac OS Tiger and more recently Mac OS 
Leopard.@@@@1@13@@danf@17-8-2009 10801790@unknown@formal@none@1@S@The voice shipping with Mac OS X 10.5 ("Leopard") is called "Alex" and features the taking of realistic-sounding breaths between sentences, as well as improved clarity at high read rates.@@@@1@30@@danf@17-8-2009 10801800@unknown@formal@none@1@S@=== AmigaOS ===@@@@1@3@@danf@17-8-2009 10801810@unknown@formal@none@1@S@The second operating system with advanced speech synthesis capabilities was [[AmigaOS]], introduced in 1985.@@@@1@14@@danf@17-8-2009 10801820@unknown@formal@none@1@S@The voice synthesis was licensed by [[Commodore International]] from a third-party software house (Don't Ask Software, now Softvoice, Inc.) and it featured a complete system of voice emulation, with both male and female voices and "stress" indicator markers, made possible by advanced features of the [[Amiga]] hardware audio [[chipset]].@@@@1@49@@danf@17-8-2009 10801830@unknown@formal@none@1@S@It was divided into a narrator device and a translator library.@@@@1@11@@danf@17-8-2009 10801840@unknown@formal@none@1@S@Amiga [[AmigaOS#Speech synthesis|Speak Handler]] featured a text-to-speech translator.@@@@1@8@@danf@17-8-2009 10801850@unknown@formal@none@1@S@AmigaOS considered speech synthesis a virtual hardware device, so the user could even redirect console output to it.@@@@1@18@@danf@17-8-2009 10801860@unknown@formal@none@1@S@Some Amiga programs, such as word processors, made extensive use of the speech system.@@@@1@14@@danf@17-8-2009 10801870@unknown@formal@none@1@S@=== Microsoft Windows ===@@@@1@4@@danf@17-8-2009 10801880@unknown@formal@none@1@S@Modern [[Microsoft Windows|Windows]] systems use [[Speech Application Programming Interface#SAPI 1-4 API family|SAPI4]]- and [[Speech Application Programming Interface#SAPI 5 API family|SAPI5]]-based speech systems that include a [[speech recognition]] engine (SRE).@@@@1@29@@danf@17-8-2009 10801890@unknown@formal@none@1@S@SAPI 4.0 was available on Microsoft-based operating systems as a 
third-party add-on for systems like [[Windows 95]] and [[Windows 98]].@@@@1@20@@danf@17-8-2009 10801900@unknown@formal@none@1@S@[[Windows 2000]] added a speech synthesis program called [[Microsoft Narrator|Narrator]], directly available to users.@@@@1@14@@danf@17-8-2009 10801910@unknown@formal@none@1@S@All Windows-compatible programs could make use of speech synthesis features, available through menus once installed on the system.@@@@1@18@@danf@17-8-2009 10801920@unknown@formal@none@1@S@[[Microsoft Speech Server]] is a complete package for voice synthesis and recognition, for commercial applications such as [[call centers]].@@@@1@19@@danf@17-8-2009 10801930@unknown@formal@none@1@S@=== Internet ===@@@@1@3@@danf@17-8-2009 10801940@unknown@formal@none@1@S@Currently, there are a number of [[Application software|applications]], [[plugin]]s and [[gadget]]s that can read messages directly from an [[e-mail client]] and web pages from a [[web browser]].@@@@1@27@@danf@17-8-2009 10801950@unknown@formal@none@1@S@Some specialized [[Computer software|software]] can narrate [[RSS|RSS-feeds]].@@@@1@7@@danf@17-8-2009 10801960@unknown@formal@none@1@S@On one hand, online RSS-narrators simplify information delivery by allowing users to listen to their favourite news sources and to convert them to [[podcast]]s.@@@@1@24@@danf@17-8-2009 10801970@unknown@formal@none@1@S@On the other hand, online RSS-readers are available on almost any [[Personal computer|PC]] connected to the Internet.@@@@1@17@@danf@17-8-2009 10801980@unknown@formal@none@1@S@Users can download generated audio files to portable devices, e.g. with the help of a [[podcast]] receiver, and listen to them while walking, jogging or commuting to work.@@@@1@28@@danf@17-8-2009 10801990@unknown@formal@none@1@S@A growing field in internet based TTS technology is web-based assistive technology, e.g. 
Talklets.@@@@1@14@@danf@17-8-2009 10802000@unknown@formal@none@1@S@This web-based approach to a traditionally locally installed form of software application can give many of those requiring software for accessibility reasons the ability to access web content from public machines, or those belonging to others.@@@@1@36@@danf@17-8-2009 10802010@unknown@formal@none@1@S@While responsiveness is not as immediate as that of applications installed locally, the 'access anywhere' nature is the key benefit of this approach.@@@@1@23@@danf@17-8-2009 10802020@unknown@formal@none@1@S@=== Others ===@@@@1@3@@danf@17-8-2009 10802030@unknown@formal@none@1@S@* Some models of Texas Instruments home computers produced in 1979 and 1981 ([[TI-99/4A|Texas Instruments TI-99/4 and TI-99/4A]]) were capable of text-to-phoneme synthesis or reciting complete words and phrases (text-to-dictionary), using a very popular Speech Synthesizer peripheral.@@@@1@37@@danf@17-8-2009 10802040@unknown@formal@none@1@S@TI used a proprietary [[codec]] to embed complete spoken phrases into applications, primarily video games.@@@@1@15@@danf@17-8-2009 10802050@unknown@formal@none@1@S@* Systems that run on free and open source operating systems, including [[Linux|GNU/Linux]], are varied and include [[open-source]] programs such as the [[Festival Speech Synthesis System]], which uses diphone-based synthesis (and can use a limited number of [[MBROLA]] voices), and gnuspeech from the [[Free Software Foundation]], which uses articulatory synthesis.@@@@1@50@@danf@17-8-2009 10802060@unknown@formal@none@1@S@Other commercial vendor software also runs on GNU/Linux.@@@@1@8@@danf@17-8-2009 10802070@unknown@formal@none@1@S@* Several commercial companies are also developing speech synthesis systems (this list is given here purely for the sake of information, without endorsing any specific product): [http://www.acapela-group.com Acapela Group], [[AT&T]], [[Cepstral]], [[DECtalk]], [[IBM ViaVoice]], [[IVONA|IVONA TTS]], 
[http://www.loquendo.com Loquendo TTS], [http://www.neospeech.com NeoSpeech TTS], [[Nuance Communications]], Rhetorical Systems, [http://www.svox.com SVOX] and [http://www.yakitome.com YAKiToMe!].@@@@1@51@@danf@17-8-2009 10802080@unknown@formal@none@1@S@* Companies which developed speech synthesis systems but which are no longer in this business include BeST Speech (bought by L&H), [[Lernout & Hauspie]] (bankrupt), [[SpeechWorks]] (bought by Nuance)@@@@1@29@@danf@17-8-2009 10802090@unknown@formal@none@1@S@== Speech synthesis markup languages ==@@@@1@6@@danf@17-8-2009 10802100@unknown@formal@none@1@S@A number of [[markup language]]s have been established for the rendition of text as speech in an [[XML]]-compliant format.@@@@1@19@@danf@17-8-2009 10802110@unknown@formal@none@1@S@The most recent is [[Speech Synthesis Markup Language]] (SSML), which became a [[W3C recommendation]] in 2004.@@@@1@16@@danf@17-8-2009 10802120@unknown@formal@none@1@S@Older speech synthesis markup languages include Java Speech Markup Language ([[JSML]]) and [[SABLE]].@@@@1@13@@danf@17-8-2009 10802130@unknown@formal@none@1@S@Although each of these was proposed as a standard, none of them has been widely adopted.@@@@1@16@@danf@17-8-2009 10802140@unknown@formal@none@1@S@Speech synthesis markup languages are distinguished from dialogue markup languages.@@@@1@10@@danf@17-8-2009 10802150@unknown@formal@none@1@S@[[VoiceXML]], for example, includes tags related to speech recognition, dialogue management and touchtone dialing, in addition to text-to-speech markup.@@@@1@19@@danf@17-8-2009 10802160@unknown@formal@none@1@S@==Applications==@@@@1@1@@danf@17-8-2009 10802170@unknown@formal@none@1@S@===Accessibility===@@@@1@1@@danf@17-8-2009 10802180@unknown@formal@none@1@S@Speech synthesis has long been a vital [[assistive technology]] tool and its application in this area is significant and widespread.@@@@1@20@@danf@17-8-2009 10802190@unknown@formal@none@1@S@It allows environmental barriers to be removed 
for people with a wide range of disabilities.@@@@1@15@@danf@17-8-2009 10802200@unknown@formal@none@1@S@The longest-standing application has been in the use of [[screenreaders]] for people with [[visual impairment]], but text-to-speech systems are now commonly used by people with [[dyslexia]] and other reading difficulties as well as by pre-literate youngsters.@@@@1@36@@danf@17-8-2009 10802210@unknown@formal@none@1@S@They are also frequently employed to aid those with severe [[speech impairment]], usually through a dedicated [[voice output communication aid]].@@@@1@20@@danf@17-8-2009 10802220@unknown@formal@none@1@S@===News service===@@@@1@2@@danf@17-8-2009 10802230@unknown@formal@none@1@S@Sites such as [[Ananova]] have used speech synthesis to convert written news to audio content, which can be used for mobile applications.@@@@1@22@@danf@17-8-2009 10802240@unknown@formal@none@1@S@===Entertainment===@@@@1@1@@danf@17-8-2009 10802250@unknown@formal@none@1@S@Speech synthesis techniques are also used in entertainment productions such as games, anime and similar media.@@@@1@16@@danf@17-8-2009 10802260@unknown@formal@none@1@S@In 2007, Animo Limited announced the development of a software application package based on its speech synthesis software FineSpeech, explicitly geared towards customers in the entertainment industries, able to generate narration and lines of dialogue according to user specifications.@@@@1@39@@danf@17-8-2009 10802270@unknown@formal@none@1@S@Software such as [[Vocaloid]] can generate singing voices via lyrics and melody.@@@@1@12@@danf@17-8-2009 10802280@unknown@formal@none@1@S@This is also the aim of the Singing Computer project (which uses the [[GNU General Public License|GPL]] software [[GNU LilyPond|Lilypond]] and [[Festival Speech Synthesis System|Festival]]) to help blind people check their lyric input.@@@@1@33@@danf@17-8-2009 10810010@unknown@formal@none@1@S@
Statistical classification
@@@@1@2@@danf@17-8-2009 10810020@unknown@formal@none@1@S@'''Statistical classification''' is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc) and based on a [[training set]] of previously labeled items.@@@@1@43@@danf@17-8-2009 10810030@unknown@formal@none@1@S@Formally, the problem can be stated as follows: given training data \\{(\\mathbf{x_1},y_1),\\dots,(\\mathbf{x_n}, y_n)\\} produce a classifier h:\\mathcal{X}\\rightarrow\\mathcal{Y} which maps an object \\mathbf{x} \\in \\mathcal{X} to its classification label y \\in \\mathcal{Y}.@@@@1@31@@danf@17-8-2009 10810040@unknown@formal@none@1@S@For example, if the problem is filtering spam, then \\mathbf{x_i} is some representation of an email and y is either "Spam" or "Non-Spam".@@@@1@23@@danf@17-8-2009 10810050@unknown@formal@none@1@S@Statistical classification algorithms are typically used in [[pattern recognition]] systems.@@@@1@10@@danf@17-8-2009 10810060@unknown@formal@none@1@S@'''Note:''' in [[community ecology]], the term "classification" is synonymous with what is commonly known (in [[machine learning]]) as [[data clustering|clustering]].@@@@1@20@@danf@17-8-2009 10810070@unknown@formal@none@1@S@See that article for more information about purely [[unsupervised learning|unsupervised]] techniques.@@@@1@11@@danf@17-8-2009 10810080@unknown@formal@none@1@S@* The second problem is to consider classification as an [[estimation]] problem, where the goal is to estimate a function of the form@@@@1@23@@danf@17-8-2009 10810090@unknown@formal@none@1@S@:P({\\rm class}|{\\vec x}) = f\\left(\\vec x;\\vec \\theta\\right) where the feature vector input is \\vec x, and the function f is typically parameterized by some parameters \\vec \\theta.@@@@1@27@@danf@17-8-2009 10810100@unknown@formal@none@1@S@In the [[Bayesian statistics|Bayesian]] approach to this problem, instead of 
choosing a single parameter vector \\vec \\theta, the result is integrated over all possible thetas, with the thetas weighted by how likely they are given the training data D:@@@@1@39@@danf@17-8-2009 10810110@unknown@formal@none@1@S@:P({\\rm class}|{\\vec x}) = \\int f\\left(\\vec x;\\vec \\theta\\right)P(\\vec \\theta|D) d\\vec \\theta@@@@1@11@@danf@17-8-2009 10810120@unknown@formal@none@1@S@* The third problem is related to the second, but the problem is to estimate the [[conditional probability|class-conditional probabilities]] P(\\vec x|{\\rm class}) and then use [[Bayes' rule]] to produce the class probability as in the second problem.@@@@1@37@@danf@17-8-2009 10810130@unknown@formal@none@1@S@Examples of classification algorithms include:@@@@1@5@@danf@17-8-2009 10810140@unknown@formal@none@1@S@* [[Linear classifier]]s@@@@1@3@@danf@17-8-2009 10810150@unknown@formal@none@1@S@** [[Fisher's linear discriminant]]@@@@1@4@@danf@17-8-2009 10810160@unknown@formal@none@1@S@** [[Logistic regression]]@@@@1@3@@danf@17-8-2009 10810170@unknown@formal@none@1@S@** [[Naive Bayes classifier]]@@@@1@4@@danf@17-8-2009 10810180@unknown@formal@none@1@S@** [[Perceptron]]@@@@1@2@@danf@17-8-2009 10810190@unknown@formal@none@1@S@** [[Support vector machine]]s@@@@1@4@@danf@17-8-2009 10810200@unknown@formal@none@1@S@* [[Quadratic classifier]]s@@@@1@3@@danf@17-8-2009 10810210@unknown@formal@none@1@S@* [[Nearest_neighbor_(pattern_recognition)|k-nearest neighbor]]@@@@1@3@@danf@17-8-2009 10810220@unknown@formal@none@1@S@* [[Boosting]]@@@@1@2@@danf@17-8-2009 10810230@unknown@formal@none@1@S@* [[Decision tree]]s@@@@1@3@@danf@17-8-2009 10810240@unknown@formal@none@1@S@** [[Random forest]]s@@@@1@3@@danf@17-8-2009 10810250@unknown@formal@none@1@S@* [[Artificial neural networks|Neural network]]s@@@@1@5@@danf@17-8-2009 10810260@unknown@formal@none@1@S@* [[Bayesian network]]s@@@@1@3@@danf@17-8-2009 10810270@unknown@formal@none@1@S@* [[Hidden Markov model]]s@@@@1@4@@danf@17-8-2009 
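The third formulation above (estimate the class-conditional probabilities P(x|class), then apply Bayes' rule to obtain P(class|x)) can be illustrated with a minimal Python sketch built on the spam-filtering example. It is only an illustration under stated assumptions: the four training messages are invented, words are assumed independent given the class (the naive Bayes assumption), add-one smoothing is an arbitrary choice, and the names train, posterior and classify are hypothetical helpers rather than part of any library.

```python
import math
from collections import Counter, defaultdict

# Invented toy training set: (words of the message, class label).
TRAINING_DATA = [
    ("win money now".split(), "Spam"),
    ("cheap money offer".split(), "Spam"),
    ("meeting agenda attached".split(), "Non-Spam"),
    ("lunch meeting tomorrow".split(), "Non-Spam"),
]


def train(examples):
    """Estimate class priors P(class) and per-class word counts."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        word_counts[label].update(words)
        vocab.update(words)
    total = sum(label_counts.values())
    priors = {label: n / total for label, n in label_counts.items()}
    return priors, word_counts, vocab


def posterior(words, priors, word_counts, vocab):
    """Return P(class | words) using Bayes' rule.

    P(words | class) is approximated as a product of per-word
    probabilities with add-one smoothing (an assumption of this sketch).
    """
    log_joint = {}
    for label, prior in priors.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(prior)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_joint[label] = score
    # Normalise the joint probabilities so that they sum to one.
    top = max(log_joint.values())
    unnorm = {label: math.exp(s - top) for label, s in log_joint.items()}
    z = sum(unnorm.values())
    return {label: v / z for label, v in unnorm.items()}


def classify(words, model):
    """Pick the class with the highest posterior probability."""
    post = posterior(words, *model)
    return max(post, key=post.get)


model = train(TRAINING_DATA)
print(classify("cheap money".split(), model))
print(posterior("meeting tomorrow".split(), *model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; the normalisation step at the end plays the role of the denominator in Bayes' rule.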
10810280@unknown@formal@none@1@S@An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (data to be classified) and the performance of various pattern recognition algorithms (classifiers).@@@@1@32@@danf@17-8-2009 10810290@unknown@formal@none@1@S@Van der Walt and Barnard (see reference section) investigated very specific artificial data sets to determine conditions under which certain classifiers perform better and worse than others.@@@@1@27@@danf@17-8-2009 10810300@unknown@formal@none@1@S@Classifier performance depends greatly on the characteristics of the data to be classified.@@@@1@13@@danf@17-8-2009 10810310@unknown@formal@none@1@S@There is no single classifier that works best on all given problems (a phenomenon that may be explained by the [[No free lunch in search and optimization|No-free-lunch theorem]]).@@@@1@28@@danf@17-8-2009 10810320@unknown@formal@none@1@S@Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance.@@@@1@21@@danf@17-8-2009 10810330@unknown@formal@none@1@S@Determining a suitable classifier for a given problem is however still more an art than a science.@@@@1@17@@danf@17-8-2009 10810340@unknown@formal@none@1@S@The most widely used classifiers are the [[Neural Network]] (Multi-layer Perceptron), [[Support Vector Machines]], [[KNN|k-Nearest Neighbours]], Gaussian Mixture Model, Gaussian, [[Naive Bayes]], [[Decision Tree]] and [[Radial Basis Function|RBF]] classifiers.@@@@1@29@@danf@17-8-2009 10810350@unknown@formal@none@1@S@== Evaluation ==@@@@1@3@@danf@17-8-2009 10810360@unknown@formal@none@1@S@The measures [[Precision and Recall]] are popular metrics used to evaluate the quality of a classification system.@@@@1@17@@danf@17-8-2009 10810370@unknown@formal@none@1@S@More recently, [[Receiver Operating Characteristic]] (ROC) curves have been used to evaluate the tradeoff between true- and 
false-positive rates of classification algorithms.@@@@1@22@@danf@17-8-2009 10810380@unknown@formal@none@1@S@==Application domains==@@@@1@2@@danf@17-8-2009 10810390@unknown@formal@none@1@S@* [[Computer vision]]@@@@1@3@@danf@17-8-2009 10810400@unknown@formal@none@1@S@** [[Medical Imaging]] and Medical Image Analysis@@@@1@7@@danf@17-8-2009 10810410@unknown@formal@none@1@S@** [[Optical character recognition]]@@@@1@4@@danf@17-8-2009 10810420@unknown@formal@none@1@S@* [[Geostatistics]]@@@@1@2@@danf@17-8-2009 10810430@unknown@formal@none@1@S@* [[Speech recognition]]@@@@1@3@@danf@17-8-2009 10810440@unknown@formal@none@1@S@* [[Handwriting recognition]]@@@@1@3@@danf@17-8-2009 10810450@unknown@formal@none@1@S@* [[Biometric]] identification@@@@1@3@@danf@17-8-2009 10810460@unknown@formal@none@1@S@* [[Natural language processing]]@@@@1@4@@danf@17-8-2009 10810470@unknown@formal@none@1@S@* [[Document classification]]@@@@1@3@@danf@17-8-2009 10810480@unknown@formal@none@1@S@* Internet [[search engines]]@@@@1@4@@danf@17-8-2009 10810490@unknown@formal@none@1@S@* [[Credit scoring]]@@@@1@3@@danf@17-8-2009 10820010@unknown@formal@none@1@S@
Statistical machine translation
@@@@1@3@@danf@17-8-2009 10820020@unknown@formal@none@1@S@'''Statistical machine translation''' ('''SMT''') is a [[machine translation]] paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual [[text corpora]].@@@@1@30@@danf@17-8-2009 10820030@unknown@formal@none@1@S@The statistical approach contrasts with the rule-based approaches to [[machine translation]] as well as with [[example-based machine translation]].@@@@1@18@@danf@17-8-2009 10820040@unknown@formal@none@1@S@The first ideas of statistical machine translation were introduced by [[Warren Weaver]] in 1949, including the ideas of applying [[Claude Shannon]]'s [[information theory]].@@@@1@23@@danf@17-8-2009 10820050@unknown@formal@none@1@S@Statistical machine translation was re-introduced in 1991 by researchers at [[IBM]]'s [[Thomas J. Watson Research Center]] and has contributed to the significant resurgence in interest in machine translation in recent years.@@@@1@31@@danf@17-8-2009 10820060@unknown@formal@none@1@S@As of 2006, it is by far the most widely-studied machine translation paradigm.@@@@1@13@@danf@17-8-2009 10820070@unknown@formal@none@1@S@==Benefits==@@@@1@1@@danf@17-8-2009 10820080@unknown@formal@none@1@S@The benefits of statistical machine translation over traditional paradigms that are most often cited are the following:@@@@1@17@@danf@17-8-2009 10820090@unknown@formal@none@1@S@* '''Better use of resources'''@@@@1@5@@danf@17-8-2009 10820100@unknown@formal@none@1@S@**There is a great deal of natural language in machine-readable format.@@@@1@11@@danf@17-8-2009 10820110@unknown@formal@none@1@S@**Generally, SMT systems are not tailored to any specific pair of languages.@@@@1@12@@danf@17-8-2009 10820120@unknown@formal@none@1@S@**Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.@@@@1@23@@danf@17-8-2009 
10820130@unknown@formal@none@1@S@* '''More natural translations'''@@@@1@4@@danf@17-8-2009 10820140@unknown@formal@none@1@S@The ideas behind statistical machine translation come out of [[information theory]].@@@@1@11@@danf@17-8-2009 10820150@unknown@formal@none@1@S@Essentially, the document is translated on the [[probability]] p(e|f) that a string e in native language (for example, English) is the translation of a string f in foreign language (for example, French).@@@@1@32@@danf@17-8-2009 10820160@unknown@formal@none@1@S@Generally, these probabilities are estimated using techniques of [[parameter estimation]].@@@@1@10@@danf@17-8-2009 10820170@unknown@formal@none@1@S@The [[Bayes Theorem]] is applied to p(e|f), the probability that the foreign string produces the native string, to get p(e|f) \\propto p(f|e) p(e), where the [[translation model]] p(f|e) is the probability that the foreign string is the translation of the native string, and the [[language model]] p(e) is the probability of seeing that native string.@@@@1@55@@danf@17-8-2009 10820180@unknown@formal@none@1@S@Mathematically speaking, finding the best translation \\tilde{e} is done by picking up the one that gives the highest probability:@@@@1@19@@danf@17-8-2009 10820190@unknown@formal@none@1@S@: \\tilde{e} = arg \\max_{e \\in e^*} p(e|f) = arg \\max_{e\\in e^*} p(f|e) p(e) .@@@@1@15@@danf@17-8-2009 10820200@unknown@formal@none@1@S@For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings e^* in the native language.@@@@1@24@@danf@17-8-2009 10820210@unknown@formal@none@1@S@Performing the search efficiently is the work of a [[machine translation decoder]] that uses the foreign string, heuristics and other methods to limit the search space and at the same time keep acceptable quality.@@@@1@34@@danf@17-8-2009 10820220@unknown@formal@none@1@S@This trade-off between quality and time usage can also be found in [[speech recognition]].@@@@1@14@@danf@17-8-2009 
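As a minimal sketch of the decision rule above, the following Python snippet scores a handful of candidate native strings e by p(f|e) p(e) and returns the highest-scoring one. The probability tables and the names TRANSLATION_MODEL, LANGUAGE_MODEL, score and decode are invented for illustration only; a real system would estimate both models from corpora, and a real decoder would prune the search space rather than enumerate every candidate.

```python
import math

# Hypothetical toy translation model p(f|e): probability of the foreign
# string f given a native candidate e. All values are made up.
TRANSLATION_MODEL = {
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
    ("la maison", "the home"): 0.3,
}

# Hypothetical toy language model p(e): how plausible each candidate is
# as a sentence of the native language. All values are made up.
LANGUAGE_MODEL = {
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def score(foreign, native):
    """Return log p(f|e) + log p(e), or -inf if either probability is zero or unknown."""
    p_f_given_e = TRANSLATION_MODEL.get((foreign, native), 0.0)
    p_e = LANGUAGE_MODEL.get(native, 0.0)
    if p_f_given_e == 0.0 or p_e == 0.0:
        return float("-inf")
    return math.log(p_f_given_e) + math.log(p_e)


def decode(foreign, candidates):
    """Pick the candidate e that maximises p(f|e) * p(e) by exhaustive enumeration."""
    return max(candidates, key=lambda native: score(foreign, native))


best = decode("la maison", list(LANGUAGE_MODEL))
print(best)
```

In this toy setting the translation model alone cannot separate "the house" from "house the"; the language model term p(e) is what penalises the ill-formed word order.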
10820230@unknown@formal@none@1@S@As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough.@@@@1@29@@danf@17-8-2009 10820240@unknown@formal@none@1@S@Language models are typically approximated by smoothed ''n''-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.@@@@1@34@@danf@17-8-2009 10820250@unknown@formal@none@1@S@The statistical translation models were initially [[word]] based (Models 1-5 from [[IBM]]), but significant advances were made with the introduction of [[phrase]] based models.@@@@1@24@@danf@17-8-2009 10820260@unknown@formal@none@1@S@Recent work has incorporated [[syntax]] or quasi-syntactic structures.@@@@1@8@@danf@17-8-2009 10820270@unknown@formal@none@1@S@==Word-based translation==@@@@1@2@@danf@17-8-2009 10820280@unknown@formal@none@1@S@In word-based translation, translated elements are words.@@@@1@7@@danf@17-8-2009 10820290@unknown@formal@none@1@S@Typically, the number of words in translated sentences is different due to compound words, morphology and idioms.@@@@1@17@@danf@17-8-2009 10820300@unknown@formal@none@1@S@The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces.@@@@1@23@@danf@17-8-2009 10820310@unknown@formal@none@1@S@Simple word-based translation is not able to translate language pairs with fertility rates different from one.@@@@1@16@@danf@17-8-2009 10820320@unknown@formal@none@1@S@To make word-based translation systems manage, for instance, high fertility rates, the system could be able to map a single word to multiple words, but not vice versa.@@@@1@28@@danf@17-8-2009 10820330@unknown@formal@none@1@S@For instance, if we are translating from French to English, each word in English could produce zero or more 
French words.@@@@1@21@@danf@17-8-2009 10820340@unknown@formal@none@1@S@But there's no way to group two English words producing a single French word.@@@@1@14@@danf@17-8-2009 10820350@unknown@formal@none@1@S@An example of a word-based translation system is the freely available [[GIZA++]] package ([[GPL]]ed), which includes [[IBM]] models.@@@@1@18@@danf@17-8-2009 10820360@unknown@formal@none@1@S@==Phrase-based translation==@@@@1@2@@danf@17-8-2009 10820370@unknown@formal@none@1@S@In phrase-based translation, an attempt is made to reduce the restrictions of word-based translation by translating sequences of words to sequences of words, where the lengths can differ.@@@@1@28@@danf@17-8-2009 10820380@unknown@formal@none@1@S@The sequences of words are called, for instance, blocks or phrases, but typically are not linguistic [[phrase]]s but phrases found using statistical methods from the corpus.@@@@1@26@@danf@17-8-2009 10820390@unknown@formal@none@1@S@Restricting the phrases to linguistic phrases has been shown to decrease translation quality.@@@@1@13@@danf@17-8-2009 10820400@unknown@formal@none@1@S@==Syntax-based translation==@@@@1@2@@danf@17-8-2009 10820410@unknown@formal@none@1@S@==Challenges with statistical machine translation==@@@@1@5@@danf@17-8-2009 10820420@unknown@formal@none@1@S@Problems that statistical machine translation has to deal with include@@@@1@10@@danf@17-8-2009 10820430@unknown@formal@none@1@S@=== Compound words ===@@@@1@4@@danf@17-8-2009 10820440@unknown@formal@none@1@S@=== Idioms ===@@@@1@3@@danf@17-8-2009 10820450@unknown@formal@none@1@S@=== Morphology ===@@@@1@3@@danf@17-8-2009 10820460@unknown@formal@none@1@S@=== Different word orders ===@@@@1@5@@danf@17-8-2009 10820470@unknown@formal@none@1@S@Word order in languages differs.@@@@1@5@@danf@17-8-2009 10820480@unknown@formal@none@1@S@Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence and one can talk, for instance, of SVO or VSO 
languages.@@@@1@32@@danf@17-8-2009 10820490@unknown@formal@none@1@S@There are also additional differences in word orders, for instance, where modifiers for nouns are located.@@@@1@16@@danf@17-8-2009 10820500@unknown@formal@none@1@S@In [[Speech Recognition]], the speech signal and the corresponding textual representation can be mapped to each other in blocks in order.@@@@1@21@@danf@17-8-2009 10820510@unknown@formal@none@1@S@This is not always the case with the same text in two languages.@@@@1@13@@danf@17-8-2009 10820520@unknown@formal@none@1@S@For SMT, the translation model is only able to translate small sequences of words, and word order has to be taken into account somehow.@@@@1@24@@danf@17-8-2009 10820530@unknown@formal@none@1@S@Typical solutions have been re-ordering models, where a distribution of location changes for each item of translation is approximated from aligned bi-text.@@@@1@22@@danf@17-8-2009 10820540@unknown@formal@none@1@S@Different location changes can be ranked with the help of the language model and the best can be selected.@@@@1@19@@danf@17-8-2009 10820550@unknown@formal@none@1@S@=== Syntax ===@@@@1@3@@danf@17-8-2009 10820560@unknown@formal@none@1@S@=== Out of vocabulary (OOV) words ===@@@@1@7@@danf@17-8-2009 10820570@unknown@formal@none@1@S@SMT systems store different word forms as separate symbols without any relation to each other, and word forms or phrases that were not in the training data cannot be translated.@@@@1@30@@danf@17-8-2009 10820580@unknown@formal@none@1@S@Main reasons for out of vocabulary words are the limitation of training data, domain changes and morphology.@@@@1@17@@danf@17-8-2009 10830010@unknown@formal@none@1@S@
Statistics
@@@@1@1@@danf@17-8-2009 10830020@unknown@formal@none@1@S@'''Statistics''' is a [[Mathematics|mathematical science]] pertaining to the collection, analysis, interpretation or explanation, and presentation of [[data]].@@@@1@17@@danf@17-8-2009 10830030@unknown@formal@none@1@S@It is applicable to a wide variety of [[academic discipline]]s, from the [[Natural science|natural]] and [[social science]]s to the [[humanities]], government and business.@@@@1@23@@danf@17-8-2009 10830040@unknown@formal@none@1@S@Statistical methods can be used to summarize or describe a collection of data; this is called '''[[descriptive statistics]]'''.@@@@1@18@@danf@17-8-2009 10830050@unknown@formal@none@1@S@In addition, patterns in the data may be [[mathematical model|modeled]] in a way that accounts for [[random]]ness and uncertainty in the observations, and then used to draw inferences about the process or population being studied; this is called '''[[inferential statistics]]'''.@@@@1@40@@danf@17-8-2009 10830060@unknown@formal@none@1@S@Both descriptive and inferential statistics comprise '''applied statistics'''.@@@@1@8@@danf@17-8-2009 10830070@unknown@formal@none@1@S@There is also a discipline called '''[[mathematical statistics]]''', which is concerned with the theoretical basis of the subject.@@@@1@18@@danf@17-8-2009 10830080@unknown@formal@none@1@S@The word '''''statistics''''' is also the plural of '''''[[statistic]]''''' (singular), which refers to the result of applying a statistical algorithm to a set of data, as in [[economic statistics]], [[crime statistics]], etc.@@@@1@32@@danf@17-8-2009 10830090@unknown@formal@none@1@S@==History==@@@@1@1@@danf@17-8-2009 10830100@unknown@formal@none@1@S@:@@@@1@1@@danf@17-8-2009 10830110@unknown@formal@none@1@S@''"Five men, [[Hermann Conring|Conring]],[[Gottfried Achenwall| Achenwall]], [[Johann Peter Süssmilch|Süssmilch]], [[John Graunt|Graunt]] and [[William Petty|Petty]] have been honored by different writers as the founder of statistics."'' 
claims one source (Willcox, Walter (1938) ''The Founder of Statistics''.@@@@1@35@@danf@17-8-2009 10830120@unknown@formal@none@1@S@Review of the [[International Statistical Institute]] 5(4):321-328.)@@@@1@7@@danf@17-8-2009 10830130@unknown@formal@none@1@S@Some scholars pinpoint the origin of statistics to 1662, with the publication of "[[Observations on the Bills of Mortality]]" by John Graunt.@@@@1@22@@danf@17-8-2009 10830140@unknown@formal@none@1@S@Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data.@@@@1@19@@danf@17-8-2009 10830150@unknown@formal@none@1@S@The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general.@@@@1@23@@danf@17-8-2009 10830160@unknown@formal@none@1@S@Today, statistics is widely employed in government, business, and the natural and social sciences.@@@@1@14@@danf@17-8-2009 10830170@unknown@formal@none@1@S@Because of its empirical roots and its applications, statistics is generally considered not to be a subfield of pure mathematics, but rather a distinct branch of applied mathematics.@@@@1@28@@danf@17-8-2009 10830180@unknown@formal@none@1@S@Its mathematical foundations were laid in the 17th century with the development of [[probability theory]] by [[Pascal]] and [[Fermat]].@@@@1@19@@danf@17-8-2009 10830190@unknown@formal@none@1@S@Probability theory arose from the study of games of chance.@@@@1@10@@danf@17-8-2009 10830200@unknown@formal@none@1@S@The [[method of least squares]] was first described by [[Carl Friedrich Gauss]] around 1794.@@@@1@14@@danf@17-8-2009 10830210@unknown@formal@none@1@S@The use of modern [[computer]]s has expedited large-scale statistical computation, and has also made possible new methods that are impractical to perform manually.@@@@1@23@@danf@17-8-2009 10830220@unknown@formal@none@1@S@==Overview==@@@@1@1@@danf@17-8-2009 10830230@unknown@formal@none@1@S@In applying 
statistics to a scientific, industrial, or societal problem, one begins with a process or [[statistical population|population]] to be studied.@@@@1@21@@danf@17-8-2009 10830240@unknown@formal@none@1@S@This might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period.@@@@1@28@@danf@17-8-2009 10830250@unknown@formal@none@1@S@It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a [[time series]].@@@@1@24@@danf@17-8-2009 10830260@unknown@formal@none@1@S@For practical reasons, rather than compiling data about an entire population, one usually studies a chosen subset of the population, called a [[sampling (statistics)|sample]].@@@@1@24@@danf@17-8-2009 10830270@unknown@formal@none@1@S@Data are collected about the sample in an observational or [[experiment]]al setting.@@@@1@12@@danf@17-8-2009 10830280@unknown@formal@none@1@S@The data are then subjected to statistical analysis, which serves two related purposes: description and inference.@@@@1@16@@danf@17-8-2009 10830290@unknown@formal@none@1@S@*[[Descriptive statistics]] can be used to summarize the data, either numerically or graphically, to describe the sample.@@@@1@17@@danf@17-8-2009 10830300@unknown@formal@none@1@S@Basic examples of numerical descriptors include the [[mean]] and [[standard deviation]].@@@@1@11@@danf@17-8-2009 10830310@unknown@formal@none@1@S@Graphical summarizations include various kinds of charts and graphs.@@@@1@9@@danf@17-8-2009 10830320@unknown@formal@none@1@S@*[[Inferential statistics]] is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population.@@@@1@20@@danf@17-8-2009 10830330@unknown@formal@none@1@S@These inferences may take the form of answers to yes/no questions ([[hypothesis testing]]), estimates of numerical characteristics ([[estimation]]), descriptions of association 
([[correlation]]), or modeling of relationships ([[regression analysis|regression]]).@@@@1@28@@danf@17-8-2009 10830340@unknown@formal@none@1@S@Other [[mathematical model|modeling]] techniques include [[ANOVA]], [[time series]], and [[data mining]].@@@@1@11@@danf@17-8-2009 10830350@unknown@formal@none@1@S@The concept of correlation is particularly noteworthy.@@@@1@7@@danf@17-8-2009 10830360@unknown@formal@none@1@S@Statistical analysis of a [[data set]] may reveal that two variables (that is, two properties of the population under consideration) tend to vary together, as if they are connected.@@@@1@29@@danf@17-8-2009 10830370@unknown@formal@none@1@S@For example, a study of annual income and age of death among people might find that poor people tend to have shorter lives than affluent people.@@@@1@26@@danf@17-8-2009 10830380@unknown@formal@none@1@S@The two variables are said to be correlated (which is a positive correlation in this case).@@@@1@16@@danf@17-8-2009 10830390@unknown@formal@none@1@S@However, one cannot immediately infer the existence of a causal relationship between the two variables.@@@@1@15@@danf@17-8-2009 10830400@unknown@formal@none@1@S@(See [[Correlation does not imply causation]].)@@@@1@6@@danf@17-8-2009 10830410@unknown@formal@none@1@S@The correlated phenomena could be caused by a third, previously unconsidered phenomenon, called a [[lurking variable]] or [[confounding variable]].@@@@1@19@@danf@17-8-2009 10830420@unknown@formal@none@1@S@If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole.@@@@1@25@@danf@17-8-2009 10830430@unknown@formal@none@1@S@A major problem lies in determining the extent to which the chosen sample is representative.@@@@1@15@@danf@17-8-2009 10830440@unknown@formal@none@1@S@Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust 
experiments in the first place.@@@@1@30@@danf@17-8-2009 10830450@unknown@formal@none@1@S@(See [[experimental design]].)@@@@1@3@@danf@17-8-2009 10830460@unknown@formal@none@1@S@The fundamental mathematical concept employed in understanding such randomness is [[probability]].@@@@1@11@@danf@17-8-2009 10830470@unknown@formal@none@1@S@[[Mathematical statistics]] (also called [[statistical theory]]) is the branch of [[applied mathematics]] that uses probability theory and [[mathematical analysis|analysis]] to examine the theoretical basis of statistics.@@@@1@26@@danf@17-8-2009 10830480@unknown@formal@none@1@S@The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method.@@@@1@24@@danf@17-8-2009 10830490@unknown@formal@none@1@S@[[Misuse of statistics]] can produce subtle but serious errors in description and interpretation — subtle in the sense that even experienced professionals sometimes make such errors, serious in the sense that they may affect, for instance, social policy, medical practice and the reliability of structures such as bridges.@@@@1@48@@danf@17-8-2009 10830500@unknown@formal@none@1@S@Even when statistics is correctly applied, the results can be difficult for the non-expert to interpret.@@@@1@16@@danf@17-8-2009 10830510@unknown@formal@none@1@S@For example, the [[statistical significance]] of a trend in the data, which measures the extent to which the trend could be caused by random variation in the sample, may not agree with one's intuitive sense of its significance.@@@@1@38@@danf@17-8-2009 10830520@unknown@formal@none@1@S@The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as [[statistical literacy]].@@@@1@25@@danf@17-8-2009 10830530@unknown@formal@none@1@S@==Statistical methods==@@@@1@2@@danf@17-8-2009 10830540@unknown@formal@none@1@S@===Experimental and observational 
studies===@@@@1@4@@danf@17-8-2009 10830550@unknown@formal@none@1@S@A common goal for a statistical research project is to investigate [[causality]], and in particular to draw a conclusion on the effect of changes in the values of predictors or [[independent variable]]s on response or [[dependent variable]]s.@@@@1@37@@danf@17-8-2009 10830560@unknown@formal@none@1@S@There are two major types of causal statistical studies: experimental studies and observational studies.@@@@1@14@@danf@17-8-2009 10830570@unknown@formal@none@1@S@In both types of studies, the effect of differences in an independent variable (or variables) on the behavior of the dependent variable is observed.@@@@1@24@@danf@17-8-2009 10830580@unknown@formal@none@1@S@The difference between the two types lies in how the study is actually conducted.@@@@1@14@@danf@17-8-2009 10830590@unknown@formal@none@1@S@Each can be very effective.@@@@1@5@@danf@17-8-2009 10830600@unknown@formal@none@1@S@An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements.@@@@1@35@@danf@17-8-2009 10830610@unknown@formal@none@1@S@In contrast, an observational study does not involve experimental manipulation.@@@@1@10@@danf@17-8-2009 10830620@unknown@formal@none@1@S@Instead, data are gathered and correlations between predictors and response are investigated.@@@@1@12@@danf@17-8-2009 10830630@unknown@formal@none@1@S@An example of an experimental study is the famous [[Hawthorne studies]], which attempted to test the changes to the working environment at the Hawthorne plant of the Western Electric Company.@@@@1@30@@danf@17-8-2009 10830640@unknown@formal@none@1@S@The researchers were interested in determining whether increased illumination would increase the productivity of the [[assembly line]] workers.@@@@1@18@@danf@17-8-2009 
10830650@unknown@formal@none@1@S@The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected the productivity.@@@@1@29@@danf@17-8-2009 10830660@unknown@formal@none@1@S@It turned out that the productivity indeed improved (under the experimental conditions).@@@@1@12@@danf@17-8-2009 10830663@unknown@formal@none@1@S@(See [[Hawthorne effect]].)@@@@1@3@@danf@17-8-2009 10830665@unknown@formal@none@1@S@However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a [[control group]] and [[double-blind|blindedness]].@@@@1@22@@danf@17-8-2009 10830670@unknown@formal@none@1@S@An example of an observational study is a study which explores the correlation between smoking and lung cancer.@@@@1@18@@danf@17-8-2009 10830680@unknown@formal@none@1@S@This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis.@@@@1@21@@danf@17-8-2009 10830690@unknown@formal@none@1@S@In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a [[case-control study]], and then look for the number of cases of lung cancer in each group.@@@@1@32@@danf@17-8-2009 10830700@unknown@formal@none@1@S@The basic steps of an experiment are:@@@@1@7@@danf@17-8-2009 10830710@unknown@formal@none@1@S@# Planning the research, including determining information sources, research subject selection, and [[ethics|ethical]] considerations for the proposed research and method.@@@@1@20@@danf@17-8-2009 10830720@unknown@formal@none@1@S@# [[Design of experiments]], concentrating on the system model and the interaction of independent and dependent variables.@@@@1@17@@danf@17-8-2009 10830730@unknown@formal@none@1@S@# [[summary statistics|Summarizing a collection of observations]] to feature their commonality by suppressing details.@@@@1@14@@danf@17-8-2009 
10830740@unknown@formal@none@1@S@([[Descriptive statistics]])@@@@1@2@@danf@17-8-2009 10830750@unknown@formal@none@1@S@# Reaching consensus about what [[statistical inference|the observations tell]] about the world being observed.@@@@1@14@@danf@17-8-2009 10830760@unknown@formal@none@1@S@([[Statistical inference]])@@@@1@2@@danf@17-8-2009 10830770@unknown@formal@none@1@S@# Documenting / presenting the results of the study.@@@@1@9@@danf@17-8-2009 10830780@unknown@formal@none@1@S@===Levels of measurement===@@@@1@3@@danf@17-8-2009 10830790@unknown@formal@none@1@S@:''See: [[Levels of measurement|Stanley Stevens' "Scales of measurement" (1946): nominal, ordinal, interval, ratio]]''@@@@1@13@@danf@17-8-2009 10830800@unknown@formal@none@1@S@There are four types of measurements or [[level of measurement|levels of measurement]] or measurement scales used in statistics: nominal, ordinal, interval, and ratio.@@@@1@23@@danf@17-8-2009 10830810@unknown@formal@none@1@S@They have different degrees of usefulness in statistical [[research]].@@@@1@9@@danf@17-8-2009 10830820@unknown@formal@none@1@S@Ratio measurements have both a zero value defined and the distances between different measurements defined; they provide the greatest flexibility in statistical methods that can be used for analyzing the data.@@@@1@31@@danf@17-8-2009 10830830@unknown@formal@none@1@S@Interval measurements have meaningful distances between measurements defined, but have no meaningful zero value defined (as in the case with IQ measurements or with temperature measurements in [[Fahrenheit]]).@@@@1@28@@danf@17-8-2009 10830840@unknown@formal@none@1@S@Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values.@@@@1@16@@danf@17-8-2009 10830850@unknown@formal@none@1@S@Nominal measurements have no meaningful rank order among values.@@@@1@9@@danf@17-8-2009 10830860@unknown@formal@none@1@S@Since variables conforming only to nominal or ordinal measurements cannot 
be reasonably measured numerically, sometimes they are called together as categorical variables, whereas ratio and interval measurements are grouped together as quantitative or [[continuous variables]] due to their numerical nature.@@@@1@40@@danf@17-8-2009 10830870@unknown@formal@none@1@S@===Statistical techniques===@@@@1@2@@danf@17-8-2009 10830880@unknown@formal@none@1@S@Some well known statistical [[Statistical hypothesis testing|test]]s and [[procedure]]s for [[research]] [[observation]]s are:@@@@1@13@@danf@17-8-2009 10830890@unknown@formal@none@1@S@* [[Student's t-test]]@@@@1@3@@danf@17-8-2009 10830900@unknown@formal@none@1@S@* [[chi-square test]]@@@@1@3@@danf@17-8-2009 10830910@unknown@formal@none@1@S@* [[Analysis of variance]] (ANOVA)@@@@1@5@@danf@17-8-2009 10830920@unknown@formal@none@1@S@* [[Mann-Whitney U]]@@@@1@3@@danf@17-8-2009 10830930@unknown@formal@none@1@S@* [[Regression analysis]]@@@@1@3@@danf@17-8-2009 10830940@unknown@formal@none@1@S@* [[Factor Analysis]]@@@@1@3@@danf@17-8-2009 10830950@unknown@formal@none@1@S@* [[Correlation]]@@@@1@2@@danf@17-8-2009 10830960@unknown@formal@none@1@S@* [[Pearson product-moment correlation coefficient]]@@@@1@5@@danf@17-8-2009 10830970@unknown@formal@none@1@S@* [[Spearman's rank correlation coefficient]]@@@@1@5@@danf@17-8-2009 10830980@unknown@formal@none@1@S@* [[Time Series Analysis]]@@@@1@4@@danf@17-8-2009 10830990@unknown@formal@none@1@S@==Specialized disciplines==@@@@1@2@@danf@17-8-2009 10831000@unknown@formal@none@1@S@Some fields of inquiry use applied statistics so extensively that they have [[specialized terminology]].@@@@1@14@@danf@17-8-2009 10831010@unknown@formal@none@1@S@These disciplines include:@@@@1@3@@danf@17-8-2009 10831020@unknown@formal@none@1@S@* [[Actuarial science]]@@@@1@3@@danf@17-8-2009 10831030@unknown@formal@none@1@S@* [[Applied information economics]]@@@@1@4@@danf@17-8-2009 10831040@unknown@formal@none@1@S@* [[Biostatistics]]@@@@1@2@@danf@17-8-2009 10831050@unknown@formal@none@1@S@* 
[[Bootstrapping (statistics)|Bootstrap]] & [[Resampling (statistics)|Jackknife Resampling]]@@@@1@7@@danf@17-8-2009 10831060@unknown@formal@none@1@S@* [[Business statistics]]@@@@1@3@@danf@17-8-2009 10831070@unknown@formal@none@1@S@* [[Data analysis]]@@@@1@3@@danf@17-8-2009 10831080@unknown@formal@none@1@S@* [[Data mining]] (applying statistics and [[pattern recognition]] to discover knowledge from data)@@@@1@13@@danf@17-8-2009 10831090@unknown@formal@none@1@S@* [[Demography]]@@@@1@2@@danf@17-8-2009 10831100@unknown@formal@none@1@S@* [[Economic statistics]] (Econometrics)@@@@1@4@@danf@17-8-2009 10831110@unknown@formal@none@1@S@* [[Energy statistics]]@@@@1@3@@danf@17-8-2009 10831120@unknown@formal@none@1@S@* [[Engineering statistics]]@@@@1@3@@danf@17-8-2009 10831130@unknown@formal@none@1@S@* [[Environmental Statistics]]@@@@1@3@@danf@17-8-2009 10831140@unknown@formal@none@1@S@* [[Epidemiology]]@@@@1@2@@danf@17-8-2009 10831150@unknown@formal@none@1@S@* [[Geography]] and [[Geographic Information Systems]], more specifically in [[Spatial analysis]]@@@@1@11@@danf@17-8-2009 10831160@unknown@formal@none@1@S@* [[Image processing]]@@@@1@3@@danf@17-8-2009 10831170@unknown@formal@none@1@S@* [[Multivariate statistics|Multivariate Analysis]]@@@@1@4@@danf@17-8-2009 10831180@unknown@formal@none@1@S@* [[Psychological statistics]]@@@@1@3@@danf@17-8-2009 10831190@unknown@formal@none@1@S@* [[Quality]]@@@@1@2@@danf@17-8-2009 10831200@unknown@formal@none@1@S@* [[Social statistics]]@@@@1@3@@danf@17-8-2009 10831210@unknown@formal@none@1@S@* [[Statistical literacy]]@@@@1@3@@danf@17-8-2009 10831220@unknown@formal@none@1@S@* [[Statistical modeling]]@@@@1@3@@danf@17-8-2009 10831230@unknown@formal@none@1@S@* [[Statistical survey]]s@@@@1@3@@danf@17-8-2009 10831240@unknown@formal@none@1@S@* Process analysis and [[chemometrics]] (for analysis of data from [[analytical chemistry]] and [[chemical engineering]])@@@@1@15@@danf@17-8-2009 10831250@unknown@formal@none@1@S@* [[Structured data analysis 
(statistics)]]@@@@1@5@@danf@17-8-2009 10831260@unknown@formal@none@1@S@* [[Survival analysis]]@@@@1@3@@danf@17-8-2009 10831270@unknown@formal@none@1@S@* [[Reliability engineering]]@@@@1@3@@danf@17-8-2009 10831280@unknown@formal@none@1@S@* Statistics in various sports, particularly [[Baseball statistics|baseball]] and [[Cricket statistics|cricket]]@@@@1@11@@danf@17-8-2009 10831290@unknown@formal@none@1@S@Statistics is a key basic tool in business and manufacturing as well.@@@@1@12@@danf@17-8-2009 10831300@unknown@formal@none@1@S@It is used to understand measurement systems variability, control processes (as in [[statistical process control]] or SPC), to summarize data, and to make data-driven decisions.@@@@1@25@@danf@17-8-2009 10831310@unknown@formal@none@1@S@In these roles, it is a key tool, and perhaps the only reliable tool.@@@@1@14@@danf@17-8-2009 10831320@unknown@formal@none@1@S@==Statistical computing==@@@@1@2@@danf@17-8-2009 10831330@unknown@formal@none@1@S@The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science.@@@@1@28@@danf@17-8-2009 10831340@unknown@formal@none@1@S@Early statistical models were almost always from the class of [[linear model]]s, but powerful computers, coupled with suitable numerical [[algorithms]], caused an increased interest in [[nonlinear regression|nonlinear models]] (especially [[neural networks]] and [[decision tree]]s) as well as the creation of new types, such as [[generalized linear model|generalised linear model]]s and [[multilevel model]]s.@@@@1@52@@danf@17-8-2009 10831350@unknown@formal@none@1@S@Increased computing power has also led to the growing popularity of computationally-intensive methods based on [[resampling (statistics)|resampling]], such as permutation tests and the [[bootstrapping (statistics)|bootstrap]], while techniques such as [[Gibbs sampling]] have made Bayesian methods more 
feasible.@@@@1@37@@danf@17-8-2009 10831360@unknown@formal@none@1@S@The computer revolution has implications for the future of statistics with new emphasis on "experimental" and "empirical" statistics.@@@@1@18@@danf@17-8-2009 10831370@unknown@formal@none@1@S@A large number of both general and special purpose [[List of statistical packages|statistical software]] are now available.@@@@1@17@@danf@17-8-2009 10831380@unknown@formal@none@1@S@== Misuse ==@@@@1@3@@danf@17-8-2009 10831390@unknown@formal@none@1@S@:@@@@1@1@@danf@17-8-2009 10831400@unknown@formal@none@1@S@There is a general perception that statistical knowledge is all-too-frequently intentionally [[Misuse of statistics|misused]] by finding ways to interpret only the data that are favorable to the presenter.@@@@1@28@@danf@17-8-2009 10831410@unknown@formal@none@1@S@A famous saying attributed to [[Benjamin Disraeli]] is, "[[Lies, damned lies, and statistics|There are three kinds of lies: lies, damned lies, and statistics]]"; and Harvard President [[Lawrence Lowell]] wrote in 1909 that statistics, ''"like veal pies, are good if you know the person that made them, and are sure of the ingredients"''.@@@@1@52@@danf@17-8-2009 10831420@unknown@formal@none@1@S@If various studies appear to contradict one another, then the public may come to distrust such studies.@@@@1@17@@danf@17-8-2009 10831430@unknown@formal@none@1@S@For example, one study may suggest that a given diet or activity raises [[blood pressure]], while another may suggest that it lowers blood pressure.@@@@1@24@@danf@17-8-2009 10831440@unknown@formal@none@1@S@The discrepancy can arise from subtle variations in experimental design, such as differences in the patient groups or research protocols, that are not easily understood by the non-expert.@@@@1@28@@danf@17-8-2009 10831450@unknown@formal@none@1@S@(Media reports sometimes omit this vital contextual information entirely.)@@@@1@9@@danf@17-8-2009 10831460@unknown@formal@none@1@S@By choosing (or rejecting, or 
modifying) a certain sample, results can be manipulated.@@@@1@13@@danf@17-8-2009 10831470@unknown@formal@none@1@S@Such manipulations need not be malicious or devious; they can arise from unintentional biases of the researcher.@@@@1@17@@danf@17-8-2009 10831480@unknown@formal@none@1@S@The graphs used to summarize data can also be misleading.@@@@1@10@@danf@17-8-2009 10831490@unknown@formal@none@1@S@Deeper criticisms come from the fact that the hypothesis testing approach, widely used and in many cases required by law or regulation, forces one hypothesis (the [[null hypothesis]]) to be "favored", and can also seem to exaggerate the importance of minor differences in large studies.@@@@1@45@@danf@17-8-2009 10831500@unknown@formal@none@1@S@A difference that is highly statistically significant can still be of no practical significance.@@@@1@14@@danf@17-8-2009 10831510@unknown@formal@none@1@S@(See [[Hypothesis test#Criticism|criticism of hypothesis testing]] and [[Null hypothesis#Controversy|controversy over the null hypothesis]].)@@@@1@13@@danf@17-8-2009 10831520@unknown@formal@none@1@S@One response is to place greater emphasis on the [[p-value|''p''-value]] rather than simply reporting whether a hypothesis is rejected at the given level of significance.@@@@1@25@@danf@17-8-2009 10831530@unknown@formal@none@1@S@The ''p''-value, however, does not indicate the size of the effect.@@@@1@11@@danf@17-8-2009 10831540@unknown@formal@none@1@S@Another increasingly common approach is to report [[confidence interval]]s.@@@@1@9@@danf@17-8-2009 10831550@unknown@formal@none@1@S@Although these are produced from the same calculations as those of hypothesis tests or ''p''-values, they describe both the size of the effect and the uncertainty surrounding it.@@@@1@28@@danf@17-8-2009 10840010@unknown@formal@none@1@S@
Syntax
@@@@1@1@@danf@17-8-2009 10840020@unknown@formal@none@1@S@In [[linguistics]], '''syntax''' (from [[Ancient Greek]] {{lang|grc|συν-}} ''syn-'', "together", and {{lang|grc|τάξις}} ''táxis'', "arrangement") is the study of the principles and rules for constructing [[sentence]]s in [[natural language]]s.@@@@1@27@@danf@17-8-2009 10840030@unknown@formal@none@1@S@In addition to referring to the discipline, the term ''syntax'' is also used to refer directly to the rules and principles that govern the sentence structure of any individual language, as in "the [[Irish syntax|syntax of Modern Irish]]".@@@@1@38@@danf@17-8-2009 10840040@unknown@formal@none@1@S@Modern research in syntax attempts to [[descriptive linguistics|describe languages]] in terms of such rules.@@@@1@14@@danf@17-8-2009 10840050@unknown@formal@none@1@S@Many professionals in this discipline attempt to find [[Universal Grammar|general rules]] that apply to all natural languages.@@@@1@17@@danf@17-8-2009 10840060@unknown@formal@none@1@S@The term ''syntax'' is also sometimes used to refer to the rules governing the behavior of mathematical systems, such as [[logic]], artificial formal languages, and computer programming languages.@@@@1@28@@danf@17-8-2009 10840070@unknown@formal@none@1@S@== Early history ==@@@@1@4@@danf@17-8-2009 10840080@unknown@formal@none@1@S@Works on grammar were being written long before modern syntax came about; the ''Aṣṭādhyāyī'' of [[Pāṇini]] is often cited as an example of a pre-modern work that approaches the sophistication of a modern syntactic theory.@@@@1@35@@danf@17-8-2009 10840090@unknown@formal@none@1@S@In the West, the school of thought that came to be known as "traditional grammar" began with the work of [[Dionysius Thrax]].@@@@1@22@@danf@17-8-2009 10840100@unknown@formal@none@1@S@For centuries, work in syntax was dominated by a framework known as {{lang|fr|''grammaire générale''}}, first expounded in 1660 by [[Antoine Arnauld]] in a book of the same title.@@@@1@28@@danf@17-8-2009 
10840110@unknown@formal@none@1@S@This system took as its basic premise the assumption that language is a direct reflection of thought processes and therefore there is a single, most natural way to express a thought.@@@@1@31@@danf@17-8-2009 10840120@unknown@formal@none@1@S@That way, coincidentally, was exactly the way it was expressed in French.@@@@1@12@@danf@17-8-2009 10840130@unknown@formal@none@1@S@However, in the 19th century, with the development of [[historical-comparative linguistics]], linguists began to realize the sheer diversity of human language, and to question fundamental assumptions about the relationship between language and logic.@@@@1@33@@danf@17-8-2009 10840140@unknown@formal@none@1@S@It became apparent that there was no such thing as a most natural way to express a thought, and therefore logic could no longer be relied upon as a basis for studying the structure of language.@@@@1@36@@danf@17-8-2009 10840150@unknown@formal@none@1@S@The Port-Royal grammar modeled the study of syntax upon that of logic (indeed, large parts of the [[Port-Royal Logic]] were copied or adapted from the ''Grammaire générale'').@@@@1@27@@danf@17-8-2009 10840160@unknown@formal@none@1@S@Syntactic categories were identified with logical ones, and all sentences were analyzed in terms of "Subject – Copula – Predicate".@@@@1@20@@danf@17-8-2009 10840170@unknown@formal@none@1@S@Initially, this view was adopted even by the early comparative linguists such as [[Franz Bopp]].@@@@1@15@@danf@17-8-2009 10840180@unknown@formal@none@1@S@The central role of syntax within theoretical linguistics became clear only in the 20th century, which could reasonably be called the "century of syntactic theory" as far as linguistics is concerned.@@@@1@31@@danf@17-8-2009 10840190@unknown@formal@none@1@S@For a detailed and critical survey of the history of syntax in the last two centuries, see the monumental work by Graffi (2001).@@@@1@23@@danf@17-8-2009 10840200@unknown@formal@none@1@S@==Modern 
theories==@@@@1@2@@danf@17-8-2009 10840210@unknown@formal@none@1@S@There are a number of theoretical approaches to the discipline of syntax.@@@@1@12@@danf@17-8-2009 10840220@unknown@formal@none@1@S@Many linguists (e.g. [[Noam Chomsky]]) see syntax as a branch of biology, since they conceive of syntax as the study of linguistic knowledge as embodied in the human [[mind]].@@@@1@29@@danf@17-8-2009 10840240@unknown@formal@none@1@S@Others (e.g. [[Gerald Gazdar]]) take a more [[Philosophy of mathematics#Platonism|Platonistic]] view, since they regard syntax to be the study of an abstract [[formal system]].@@@@1@24@@danf@17-8-2009 10840260@unknown@formal@none@1@S@Yet others (e.g. [[Joseph Greenberg]]) consider grammar a taxonomical device to reach broad generalizations across languages.@@@@1@16@@danf@17-8-2009 10840280@unknown@formal@none@1@S@Some of the major approaches to the discipline are listed below.@@@@1@11@@danf@17-8-2009 10840290@unknown@formal@none@1@S@===Generative grammar===@@@@1@2@@danf@17-8-2009 10840300@unknown@formal@none@1@S@The hypothesis of [[generative grammar]] is that language is a structure of the human mind.@@@@1@15@@danf@17-8-2009 10840310@unknown@formal@none@1@S@The goal of generative grammar is to make a complete model of this inner language (known as ''[[i-language]]'').@@@@1@18@@danf@17-8-2009 10840320@unknown@formal@none@1@S@This model could be used to describe all human language and to predict the [[grammaticality]] of any given utterance (that is, to predict whether the utterance would sound correct to native speakers of the language).@@@@1@35@@danf@17-8-2009 10840330@unknown@formal@none@1@S@This approach to language was pioneered by [[Noam Chomsky]].@@@@1@9@@danf@17-8-2009 10840340@unknown@formal@none@1@S@Most generative theories (although not all of them) assume that syntax is based upon the constituent structure of sentences.@@@@1@19@@danf@17-8-2009 10840350@unknown@formal@none@1@S@Generative grammars are among the theories that focus 
primarily on the form of a sentence, rather than its communicative function.@@@@1@20@@danf@17-8-2009 10840360@unknown@formal@none@1@S@Among the many generative theories of linguistics are:@@@@1@8@@danf@17-8-2009 10840370@unknown@formal@none@1@S@*[[Transformational Grammar]] (TG) (now largely out of date)@@@@1@8@@danf@17-8-2009 10840380@unknown@formal@none@1@S@*[[Government and binding theory]] (GB) (common in the late 1970s and 1980s)@@@@1@12@@danf@17-8-2009 10840390@unknown@formal@none@1@S@*[[Linguistic minimalism|Minimalism]] (MP) (the most recent Chomskyan version of generative grammar)@@@@1@11@@danf@17-8-2009 10840400@unknown@formal@none@1@S@Other theories that find their origin in the generative paradigm are:@@@@1@11@@danf@17-8-2009 10840410@unknown@formal@none@1@S@*[[Generative semantics]] (now largely out of date)@@@@1@7@@danf@17-8-2009 10840420@unknown@formal@none@1@S@*[[Relational grammar]] (RG) (now largely out of date)@@@@1@8@@danf@17-8-2009 10840430@unknown@formal@none@1@S@*[[Arc Pair grammar]]@@@@1@3@@danf@17-8-2009 10840440@unknown@formal@none@1@S@*[[Generalised phrase structure grammar|Generalized phrase structure grammar]] (GPSG; now largely out of date)@@@@1@13@@danf@17-8-2009 10840450@unknown@formal@none@1@S@*[[Head-driven phrase structure grammar]] (HPSG)@@@@1@5@@danf@17-8-2009 10840460@unknown@formal@none@1@S@*[[Lexical-functional grammar]] (LFG)@@@@1@3@@danf@17-8-2009 10840470@unknown@formal@none@1@S@===Categorial grammar ===@@@@1@3@@danf@17-8-2009 10840480@unknown@formal@none@1@S@[[Categorial grammar]] is an approach that attributes the syntactic structure not to rules of grammar, but to the properties of the [[syntactic categories]] themselves.@@@@1@24@@danf@17-8-2009 10840490@unknown@formal@none@1@S@For example, rather than asserting that sentences are constructed by a rule that combines a noun phrase (NP) and a verb phrase (VP) (e.g. 
the [[phrase structure rule]] S → NP VP), in categorial grammar, such principles are embedded in the category of the [[head (linguistics)|head]] word itself.@@@@1@48@@danf@17-8-2009 10840500@unknown@formal@none@1@S@So the syntactic category for an [[intransitive]] verb is a complex formula representing the fact that the verb acts as a [[functor]] which requires an NP as an input and produces a sentence level structure as an output.@@@@1@38@@danf@17-8-2009 10840510@unknown@formal@none@1@S@This complex category is notated as (NP\\S) instead of V.@@@@1@10@@danf@17-8-2009 10840515@unknown@formal@none@1@S@NP\\S is read as " a category that searches to the left (indicated by \\) for an NP (the element on the left) and outputs a sentence (the element on the right)".@@@@1@32@@danf@17-8-2009 10840520@unknown@formal@none@1@S@The category of [[transitive verb]] is defined as an element that requires two NPs (its subject and its direct object) to form a sentence.@@@@1@24@@danf@17-8-2009 10840530@unknown@formal@none@1@S@This is notated as (NP/(NP\\S)) which means "a category that searches to the right (indicated by /) for an NP (the object), and generates a function (equivalent to the VP) which is (NP\\S), which in turn represents a function that searches to the left for an NP and produces a sentence".@@@@1@51@@danf@17-8-2009 10840540@unknown@formal@none@1@S@[[Tree-adjoining grammar]] is a categorial grammar that adds in partial [[tree structure]]s to the categories.@@@@1@15@@danf@17-8-2009 10840550@unknown@formal@none@1@S@===Dependency grammar===@@@@1@2@@danf@17-8-2009 10840560@unknown@formal@none@1@S@[[Dependency grammar]] is a different type of approach in which structure is determined by the [[relation]]s (such as [[grammatical relation]]s) between a word (a ''[[head (linguistics)|head]]'') and its dependents, rather than being based in constituent structure.@@@@1@36@@danf@17-8-2009 10840570@unknown@formal@none@1@S@For example, syntactic structure is described in terms of 
whether a particular [[noun]] is the [[subject]] or [[agent]] of the [[verb]], rather than describing the relations in terms of trees (one version of which is the [[parse tree]]) or other structural system.@@@@1@42@@danf@17-8-2009 10840580@unknown@formal@none@1@S@Some dependency-based theories of syntax:@@@@1@5@@danf@17-8-2009 10840590@unknown@formal@none@1@S@*[[Algebraic syntax]]@@@@1@2@@danf@17-8-2009 10840600@unknown@formal@none@1@S@*[[Word grammar]]@@@@1@2@@danf@17-8-2009 10840610@unknown@formal@none@1@S@*[[Operator Grammar]]@@@@1@2@@danf@17-8-2009 10840620@unknown@formal@none@1@S@===Stochastic/probabilistic grammars/network theories ===@@@@1@4@@danf@17-8-2009 10840630@unknown@formal@none@1@S@Theoretical approaches to syntax that are based upon [[probability theory]] are known as [[stochastic grammar]]s.@@@@1@15@@danf@17-8-2009 10840640@unknown@formal@none@1@S@One common implementation of such an approach makes use of a [[neural network]] or [[connectionism]].@@@@1@15@@danf@17-8-2009 10840650@unknown@formal@none@1@S@Some theories based within this approach are:@@@@1@7@@danf@17-8-2009 10840660@unknown@formal@none@1@S@*[[Optimality theory]]@@@@1@2@@danf@17-8-2009 10840670@unknown@formal@none@1@S@*[[Stochastic context-free grammar]]@@@@1@3@@danf@17-8-2009 10840680@unknown@formal@none@1@S@===Functionalist grammars===@@@@1@2@@danf@17-8-2009 10840690@unknown@formal@none@1@S@Functionalist theories, although focused upon form, are driven by explanation based upon the function of a sentence (i.e. 
its communicative function).@@@@1@21@@danf@17-8-2009 10840700@unknown@formal@none@1@S@Some typical functionalist theories include:@@@@1@5@@danf@17-8-2009 10840710@unknown@formal@none@1@S@*[[Functional grammar]] (Dik)@@@@1@3@@danf@17-8-2009 10840720@unknown@formal@none@1@S@*[[Prague Linguistic Circle]]@@@@1@3@@danf@17-8-2009 10840730@unknown@formal@none@1@S@*[[Systemic functional grammar]]@@@@1@3@@danf@17-8-2009 10840740@unknown@formal@none@1@S@*[[Cognitive grammar]]@@@@1@2@@danf@17-8-2009 10840750@unknown@formal@none@1@S@*[[Construction grammar]] (CxG)@@@@1@3@@danf@17-8-2009 10840760@unknown@formal@none@1@S@*[[Role and reference grammar]] (RRG)@@@@1@5@@danf@17-8-2009 10850010@unknown@formal@none@1@S@
SYSTRAN
@@@@1@1@@danf@17-8-2009 10850020@unknown@formal@none@1@S@'''SYSTRAN''', founded by Dr. [[Peter Toma]] in [[1968]], is one of the oldest [[machine translation]] companies.@@@@1@16@@danf@17-8-2009 10850030@unknown@formal@none@1@S@SYSTRAN has done extensive work for the [[United States Department of Defense]] and the [[European Commission]].@@@@1@16@@danf@17-8-2009 10850040@unknown@formal@none@1@S@SYSTRAN provides the technology for [[Yahoo!]] and [[AltaVista]]'s ([[Babel Fish (website)|Babel Fish]]) among others, but use of it was ended (circa 2007) for all of the language combinations offered by [[Google]]'s [[List of Google products#anchor_language_tools|language tools]].@@@@1@36@@danf@17-8-2009 10850050@unknown@formal@none@1@S@Commercial versions of SYSTRAN operate with operating systems [[Microsoft Windows]] (including [[Windows Mobile]]), [[Linux]] and [[Solaris (operating system)|Solaris]].@@@@1@18@@danf@17-8-2009 10850060@unknown@formal@none@1@S@== History ==@@@@1@3@@danf@17-8-2009 10850070@unknown@formal@none@1@S@With its origin in the [[Georgetown-IBM experiment|Georgetown]] machine translation effort, SYSTRAN was one of the few machine translation systems to survive the major decrease of funding after the [[ALPAC|ALPAC Report]] of the mid-1960s.@@@@1@33@@danf@17-8-2009 10850080@unknown@formal@none@1@S@The company was established in [[La Jolla, San Diego, California|La Jolla]], [[California]] to work on translation of Russian to English text for the [[United States Air Force]] during the "[[Cold War]]".@@@@1@31@@danf@17-8-2009 10850090@unknown@formal@none@1@S@Large numbers of Russian scientific and technical documents were translated using SYSTRAN under the auspices of the USAF Foreign Technology Division (later the National Air and Space Intelligence Center) at [[Wright-Patterson Air Force Base]], Ohio.@@@@1@35@@danf@17-8-2009 10850100@unknown@formal@none@1@S@The quality of the translations, although only approximate, was usually adequate for understanding 
content.@@@@1@14@@danf@17-8-2009 10850110@unknown@formal@none@1@S@The company was sold in 1986 to the Gachot family, based in [[Paris]], [[France]], and is now publicly traded on the French stock exchange.@@@@1@24@@danf@17-8-2009 10850120@unknown@formal@none@1@S@It has a main office at the [[Grande Arche]] in [[La Defense]] and maintains a secondary office in [[La Jolla, San Diego, California]].@@@@1@23@@danf@17-8-2009 10850130@unknown@formal@none@1@S@== Languages ==@@@@1@3@@danf@17-8-2009 10850140@unknown@formal@none@1@S@Here is a list of the source and target languages SYSTRAN works with.@@@@1@13@@danf@17-8-2009 10850150@unknown@formal@none@1@S@Many of the pairs are to or from English or French.@@@@1@11@@danf@17-8-2009 10850160@unknown@formal@none@1@S@* Russian into English (1968)@@@@1@5@@danf@17-8-2009 10850170@unknown@formal@none@1@S@* English into Russian (1973) for the [[Apollo-Soyuz]] project@@@@1@9@@danf@17-8-2009 10850180@unknown@formal@none@1@S@* English source (1975) for the [[European Commission]]@@@@1@8@@danf@17-8-2009 10850190@unknown@formal@none@1@S@* Arabic@@@@1@2@@danf@17-8-2009 10850200@unknown@formal@none@1@S@* Chinese@@@@1@2@@danf@17-8-2009 10850210@unknown@formal@none@1@S@* Danish@@@@1@2@@danf@17-8-2009 10850220@unknown@formal@none@1@S@* Dutch@@@@1@2@@danf@17-8-2009 10850230@unknown@formal@none@1@S@* French@@@@1@2@@danf@17-8-2009 10850240@unknown@formal@none@1@S@* German@@@@1@2@@danf@17-8-2009 10850250@unknown@formal@none@1@S@* Greek@@@@1@2@@danf@17-8-2009 10850260@unknown@formal@none@1@S@* Hindi@@@@1@2@@danf@17-8-2009 10850270@unknown@formal@none@1@S@* Italian@@@@1@2@@danf@17-8-2009 10850280@unknown@formal@none@1@S@* Japanese@@@@1@2@@danf@17-8-2009 10850290@unknown@formal@none@1@S@* Korean@@@@1@2@@danf@17-8-2009 10850300@unknown@formal@none@1@S@* Norwegian@@@@1@2@@danf@17-8-2009 10850310@unknown@formal@none@1@S@* Serbo-Croatian@@@@1@2@@danf@17-8-2009 10850320@unknown@formal@none@1@S@* Spanish@@@@1@2@@danf@17-8-2009 
10850330@unknown@formal@none@1@S@* Swedish@@@@1@2@@danf@17-8-2009 10850340@unknown@formal@none@1@S@* Persian@@@@1@2@@danf@17-8-2009 10850350@unknown@formal@none@1@S@* Polish@@@@1@2@@danf@17-8-2009 10850360@unknown@formal@none@1@S@* Portuguese@@@@1@2@@danf@17-8-2009 10850370@unknown@formal@none@1@S@* Ukrainian@@@@1@2@@danf@17-8-2009 10850380@unknown@formal@none@1@S@* Urdu@@@@1@2@@danf@17-8-2009 10860010@unknown@formal@none@1@S@
Text analytics
@@@@1@2@@danf@17-8-2009 10860020@unknown@formal@none@1@S@The term '''text analytics''' describes a set of linguistic, lexical, pattern recognition, extraction, tagging/structuring, visualization, and predictive techniques.@@@@1@18@@danf@17-8-2009 10860030@unknown@formal@none@1@S@The term also describes processes that apply these techniques, whether independently or in conjunction with query and analysis of fielded, numerical data, to solve business problems.@@@@1@26@@danf@17-8-2009 10860040@unknown@formal@none@1@S@These techniques and processes discover and present knowledge – facts, business rules, and relationships – that is otherwise locked in textual form, impenetrable to automated processing.@@@@1@26@@danf@17-8-2009 10860050@unknown@formal@none@1@S@A typical application is to scan a set of documents written in a [[natural language]] and either model the document set for predictive classification purposes or populate a database or search index with the information extracted.@@@@1@36@@danf@17-8-2009 10860060@unknown@formal@none@1@S@Current approaches to text analytics use [[natural language processing]] techniques that focus on specialized domains.@@@@1@15@@danf@17-8-2009 10860070@unknown@formal@none@1@S@Typical subtasks are:@@@@1@3@@danf@17-8-2009 10860080@unknown@formal@none@1@S@* [[Named Entity Recognition]]: recognition of entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions.@@@@1@22@@danf@17-8-2009 10860090@unknown@formal@none@1@S@* [[Coreference]]: identifying chains of [[noun phrase]]s that refer to the same object.@@@@1@13@@danf@17-8-2009 10860100@unknown@formal@none@1@S@For example, [[Anaphora (linguistics)|anaphora]] is a type of coreference.@@@@1@9@@danf@17-8-2009 10860110@unknown@formal@none@1@S@* [[Relationship Extraction]]: extraction of named relationships between entities in text@@@@1@11@@danf@17-8-2009