<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1"/><title>Pronouncing Text by Analogy</title><author surname="Damper" givenname="Robert I."><org  name="PST Research Group"/></author><author surname="Eastmond" givenname="John EG."><org  name="University of Southampton" country="United Kingdom" city="Southampton"/></author></firstpageheader><frontmatter><p><b>Pronouncing Text by Analogy</b></p><p><b>Robert I. Damper and John F.G. Eastmond</b></p><p>Image, Speech and Intelligent Systems (ISIS) Research Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK.</p><p>{rid| j e}@ecs.soton.ac.uk</p></frontmatter><abstract>Pronunciation-by-analogy (PbA) is an emerging technique for text-phoneme conversion based on a psychological model of reading aloud. This paper explores the impact of certain basic implementational choices on the performance of various PbA models. These have been tested on their ability to pronounce sets of short pseudowords previously used in similar studies, as well as lexical words temporarily removed from the dictionary. Best results of 85.7% and 67.9% words correct are obtained for the pseudowords and lexical words respectively, casting doubt on certain previously reported performance figures in the literature. </abstract></header><body><section number="1" title="Introduction"><p>Pronunciation-by-analogy (PbA) is an influential psychological model of the process of reading aloud. In PbA, most words are pronounced by retrieving their phonemic form from the reader's lexicon, or dictionary. The pronunciation for a 'novel' word not in the lexicon, however, is derived not by the application of abstract letter-to-sound rules but is 'assembled' from the (known) pronunciations of words that it resembles. 
PbA has obvious application to text-to-speech conversion by machine.</p><p>Although PbA programs have been presented in the literature, they are few in number. Dedina and Nusbaum (1991) describe PRONOUNCE, a rather simple system for English. Sullivan and Damper (1990; 1992; 1993) describe a considerably more complex and developed system, but one which apparently yields much poorer performance.</p><p>As a psychological theory, PbA is under-specified, offering little meaningful guidance on the implementation choices which confront the programmer. Indeed, Sullivan and Damper (1993) show that such choices can have a profound impact on performance. In this paper, we seek to understand how Dedina and Nusbaum's largely unjustified implementational choices affected their results and, thereby, to resolve the conflict between their performance claims and Sullivan and Damper's.</p></section><section number="2" title="Psychological Background"><p>In the standard <i>dual-route </i>model of reading aloud (Coltheart, 1978), there is a lexical route for the pronunciation of known words and a parallel route utilising abstract letter-to-sound rules for the pronunciation of unknown ('novel') words. Arguments for dual-route theory cite the ability to pronounce pseudowords (non-words conforming to the spelling patterns of English), latency difference effects between regular and exception words, and apparent double dissociation between the two routes in dyslexia (see Humphreys and Evett, 1985). However, all these observations can arguably be explained by a single route. One pervasive idea is that pseudowords are pronounced by analogy with lexical words that they resemble (Baron, 1977; Brooks, 1977; Glushko, 1979; 1981; Brown and Besner, 1987). Glushko, for instance, showed that "exception pseudowords" like <i>tave </i>take longer to read than "regular pseudowords" such as <i>taze. 
</i>Here, <i>taze </i>is considered as a "regular pseudoword" since all its orthographic 'neighbours' <i>(raze, gaze, maze </i>etc.) have the regular vowel pronunciation /eɪ/. By contrast, <i>tave </i>is considered to be an "exception pseudoword" since it has the exception word <i>have </i>(/hæv/) as an orthographic neighbour. Thus, according to Glushko (1979), the "assignment of phonology to non-words is open to lexical influence". This is at variance with the notion of two independent routes to pronunciation. Instead:</p><p>"it appears that words and pseudowords are pronounced using similar kinds of orthographic and phonological knowledge: the pronunciation of words that share orthographic features with them, and specific spelling-to-sound rules for multiletter spelling patterns."<page local="2"/></p><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">268</doubt><p>There are two forms of PbA: <i>explicit </i>analogy (Baron, 1977) is a conscious strategy of recalling a similar word and modifying its pronunciation, whereas in <i>implicit </i>analogy (Brooks, 1977) a pronunciation is derived from generalised phonographic knowledge about existing words. The latter has obvious commonalities with most single-route, connectionist models (e.g. Sejnowski and Rosenberg, 1987) in which the generalised knowledge is learned (e.g. by back-propagation) as a set of weights, and the network has no holistic notion of the concept 'word'.</p><p>Until the recent advent of computational PbA models, analogy 'theory' could only be considered seriously underspecified. Clearly, its operation must depend critically on some measure of similarity, and "without a metric for similarity and without a specification of how similar is similar enough, the concept of analogy by similarity offers little insight" (Glushko, 1981, p. 72). 
Further, as detailed by Brown and Besner (1987), the operation of lexical analogy must be constrained by factors such as:</p><p>• the size of the segment shared between novel and lexical word;</p><p>• its position in the two strings;</p><p>• its frequency of occurrence in the language;</p><p>• and the frequency of occurrence of the words containing it;</p><p>none of which had then received serious consideration. Accordingly, they write: "Extant analogy models are not capable of predicting the outcome of assembly operations for all possible strings."</p><p>In particular, the 'theory' gives no principled way of deciding the orthographic neighbours of a novel word which are deemed to influence its pronunciation, whereas a computational model must (specifically or otherwise) do so.</p></section><section number="3" title="Existing PbA Programs"><subsection number="3.1" title="Dedina and Nusbaum's System"><p>The overall structure of PRONOUNCE is as shown in Fig. 1. The lexical database consists of "approximately 20,000 words based on <i>Webster's Pocket Dictionary" </i>in which text and phonemes have been automatically aligned. Dedina and Nusbaum acknowledge the crude nature of their alignment procedure, saying it "was carried out by a simple Lisp program that only uses knowledge about which phonemes are consonants and which are vowels."</p><p>INPUT (spelling pattern) → Substring matching alignment → Build pronunciation lattice → Decision function → OUTPUT (pronunciation)</p><figure caption="Figure 1: Dedina and Nusbaum's PRONOUNCE."></figure><p>An input string is matched in turn against all orthographic entries in the lexicon. The process starts with the input string and the current dictionary entry left-aligned. Information about matching letter substrings, and their corresponding phoneme substrings in the dictionary entry under consideration, is entered into a <i>pronunciation lattice </i>as detailed below. 
The shorter of the two strings is then shifted right by one letter and the process repeated. This continues until the two are right-aligned, i.e. the number of right shifts is equal to the difference in length between the two strings. The process is repeated for all words in the dictionary.</p><p>A node of the lattice represents a matched letter, <i>L_i</i>, at some position, <i>i</i>, in the input, as illustrated in Fig. 2. The node is labelled with its position index <i>i </i>and with the phoneme which corresponds to <i>L_i </i>in the matched substring, <i>P_{i,m} </i>say, for the <i>m</i>th matched substring. An arc is placed from node <i>i </i>to node <i>j </i>if there is a matched substring starting with <i>L_i </i>and ending with <i>L_j</i>. The arc is labelled with the phonemes intermediate between <i>P_{i,m} </i>and <i>P_{j,m} </i>in the phoneme part of the matched substring. Note that the empty string labels arcs corresponding to bigrams: the two symbols of the bigram label the nodes at either end. Additionally, arcs are labelled with a "frequency" count which is incremented by one each time that substring (with that pronunciation) is matched during the pass through the lexicon. Finally, there is a <i>Start </i>node at position 0 and an <i>End </i>node at position one greater than the length of the input string.</p><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">269</doubt><page local="3"/><figure caption="Figure 2: Partial pronunciation lattice for the pseudoword shead."></figure><p>A possible pronunciation for the input corresponds to a complete path through its lattice from <i>Start </i>to <i>End, </i>with the output string assembled by concatenating in order the phoneme labels on the nodes and arcs. The set of candidate pronunciations is then passed to the decision function. Two (prioritised) heuristics are used to rank the pronunciations, and the top-ranking candidate is selected as the output. The first is based on path length. 
If one candidate corresponds to a unique shortest path (in terms of number of arcs) through the lattice, this is selected as the output. Otherwise, candidates that tie are ranked on the sum of their arc "frequencies".</p><p>Dedina and Nusbaum tested PRONOUNCE on 70 of Glushko's (1979) pseudowords, which "were four or five characters long and were derived from monosyllabic words by changing one letter". Seven subjects with phonetics training were asked to read these and give a transcription for the first pronunciation which came to mind. A 'correct' pronunciation for a given pseudoword was considered to be one produced by any of the subjects. A word error rate of 9% is reported.</p></subsection><subsection number="3.2" title="Sullivan and Damper's System"><p>Sullivan and Damper employ a more principled alignment procedure based on the Lawrence and Kaye (1986) algorithm. By pre-computing mappings and their statistics, they implemented a considerably more 'implicit' form of PbA: there is no explicit matching of the input string with lexical entries. Their pronunciation lattice differs, with nodes representing <i>junctures </i>between symbols and arcs representing letter-phoneme mappings. They also examine different ways of numerically ranking candidates, taking into account probability estimates for the letter-phoneme mappings used in the assembled pronunciation.</p><p>Given the improved alignment and candidate-ranking methods, better performance than Dedina and Nusbaum might be expected. On the contrary, Sullivan and Damper's best result on the full set of 131 pseudowords from Glushko (1979) (plus another 5 words; see section 5.1) is only 70.6% words correct (1993, p. 449). This is an error rate of almost 30%, as compared to Dedina and Nusbaum's 9% on the smaller test set of size 70. 
Differences in test-set size, between British and American English, in the transcription standards of the phoneticians, and in the lexicons employed seem insufficient to explain this.</p></subsection></section><section number="4" title="Re-Implementing PRONOUNCE"><p>Our purpose was to re-implement PRONOUNCE, assess its performance, and study the impact of various implementational choices on this performance. However, the described alignment algorithm is problematic (see pp. 71-73 of Sullivan, 1992) and needs to be replaced. Rather than re-implement a flawed algorithm, we have used manually-aligned data. Since manual alignment generally produces a better result than automatic alignment, we ought to produce an <i>even </i>lower error rate than Dedina and Nusbaum's claimed 9%.</p><p>The performance on lexical words (temporarily removed from the lexicon) has not previously been assessed but seems worthwhile. Arguably, 'real' words form a much more sensible test set for a PbA system than pseudowords, not least because they are multisyllabic. Temporary removal from the lexicon means that the pronunciation must be assembled by the analogy process rather than merely retrieved in its entirety. Hence, we believe it is sensible and important to test any PbA system in this way.</p><subsection number="4.1" title="Lexical Databases"><p>To examine any impact that the specific lexical database might have on performance, we have used two in this work: the 20,009 words of <i>Webster's Pocket Dictionary </i>and the 16,280 words of the <i>Teacher's Word Book </i>(TWB) (Thorndike and Lorge, 1944). In both cases, letters and phonemes have previously been hand-aligned for the purposes of training back-propagation networks. The Webster's database is that used by Sejnowski and Rosenberg (1987) to train and test NETtalk. 
The TWB database is that used by McCulloch, Bedworth and Bridle (1987) for NETspeak.</p><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">270</doubt><page local="4"/><p>The phoneme inventory is of size 52 in both cases, including the null phoneme but excluding stress symbols. We leave the very important problem of stress assignment for later study.</p></subsection><subsection number="4.2" title="Re-Implementation Details"><p>The re-implementation was programmed in C on a Hewlett-Packard 712/80 workstation running HP-UX. A 'direct' version scores candidates using Dedina and Nusbaum's method with its two prioritised heuristics: we call this model D&amp;N. Two other methods for scoring have also been implemented. In one, we replace the second (maximum sum) heuristic with the maximum product of the arc frequencies: we call this model PROD. (It still selects primarily on the basis of shortest path length.) We have also implemented a version which uses a single heuristic. This takes the product, along each possible path from <i>Start </i>to <i>End, </i>of the <i>mapping probabilities </i>for the arcs on that path. These are computed using Method 1 (<i>a priori </i>version) of Sullivan and Damper (1993, pp. 446-447). For all paths corresponding to the <i>same </i>pronunciation, these values are summed to give an overall score for that pronunciation. We call this the MP model. The final product score is <i>not </i>a proper probability for the assembled pronunciation, since the scores do not sum to one over all the candidates.</p><p>The 'best' pronunciation is found by depth-first search of the lattice, implemented as a preorder tree traversal. For the D&amp;N and PROD models, paths were pruned when their length exceeded the shortest found so far for that input, leading to a useful reduction in run times. A similarly motivated pruning was carried out for the MP model. If any product fell below a threshold during traversal, its corresponding path was discarded. 
The threshold used was <i>ε </i>times the maximum product score found so far, with <i>ε </i>set at 10^{-3}. While this may have led to the pruning of a path contributing to the 'best' pronunciation, its contribution would be very small. Again, this gave a very significant improvement in run times for the testing of lexical words (section 5.2 below) but was unnecessary for the testing of pseudowords.</p></subsection></section><section number="5" title="Results"><subsection number="5.1" title="Pseudowords"><p>Pronunciations have been obtained for:</p><p>• the 70 pseudowords from Glushko (1979) used by Dedina and Nusbaum to test PRONOUNCE. The 'correct' pronunciation for these strings is taken to be that given by Dedina and Nusbaum (1991, pp. 61-62). We refer to this test set as D&amp;N 70.</p><p>• the full set of 131 pseudowords from Glushko plus two others <i>(goot, pome) </i>plus two lexical words <i>(cat </i>and <i>play) </i>plus the pseudohomophone <i>kwik, </i>as used by Sullivan (1992). The 'correct' pronunciations are those read aloud by Sullivan's 20 non-phonetician subjects, and transcribed by him as British Received Pronunciation. We refer to this test set as Sull 136. Our expectation is that the error rate will be relatively high for this test set, partly because of its larger size but more importantly because the subjects' dialect of English is British RP rather than General American, i.e. there is a very significant inconsistency with the lexical databases.</p><p>The output has been scored on words correct and also on symbol score (i.e. phonemes correct) using the Levenshtein (1966) string-edit distance as shown in Table 1.</p><p>Our best comparison with Dedina and Nusbaum (D&amp;N 70 test set, D&amp;N model, Webster's database) gives a figure of 77.1% words correct. 
This is far poorer than their approximately 91% words correct, yet the implementation, reference pronunciations and test set are (as far as we can tell) identical. The only relevant difference is that the Webster's database is automatically-aligned in their work and hand-aligned in ours. The clear expectation, given the crude nature of their alignment, is that they should have experienced a <i>higher </i>error rate, not a dramatically lower one. Overall, this result accords far more closely with Sullivan and Damper (1993), whose best word score for automatic alignment (and using smaller databases but a larger test set) was just over 70%.</p><p>The re-implementation made 16 errors under the above conditions. Dedina and Nusbaum's reported error rate of 9% amounts to just 6 errors, 3 of which are the same as ours. The commonest problem is vowel substitution. It is possible to discount a very few errors as essentially trivial, reducing the error rate marginally to some 20%. We conclude, therefore, that Dedina and Nusbaum's reported error rate of 9% is unattainable.</p><p>In our opinion, a major deficiency of the simple shortest-path length heuristic is that the output can become unreasonably sensitive to rare or unique pronunciations. For instance, <i>mone </i>receives the strange pronunciation /mənɪ/ by analogy with <i>anemone. </i>Also, the pseudoword <i>shead </i>receives the bizarre, vowel-less pronunciation /ʃ___d/ (where '_' denotes the null phoneme) when using the D&amp;N model and the TWB database. As illustrated in Fig. 2 earlier, this turns out to be a result of matching the unique but long mapping <i>head </i>→ /___d/, as in <i>forehead </i>→ /for____d/ (arc frequency 1), in conjunction with the very common mapping <i>sh </i>→ /ʃ_/ as in <i>she </i>and <i>shed </i>(arc frequency 174), which dominates the overall score of 175. The same bizarre pronunciation does not occur with the PROD model. 
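</p><p>The effect of the two tie-breaking heuristics on <i>shead </i>can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not the C of our re-implementation) that uses only the arc frequencies quoted in this section for the TWB database. Both candidate paths have two arcs, so the path-length heuristic cannot separate them. The path labels, the dictionary layout and the function names are assumptions introduced purely for illustration.</p><p>
```python
from math import prod

# Arc frequencies of the two tied two-arc paths for "shead" (TWB database),
# as quoted in the text. The labels are hypothetical names for this sketch.
candidates = {
    "vowel_less": [174, 1],   # common sh mapping plus the unique head mapping
    "with_vowel": [12, 30],   # path through the vowel node at position 3
}

def rank_by_sum(paths):
    """Second heuristic of the D&N model: largest sum of arc frequencies."""
    return max(paths, key=lambda name: sum(paths[name]))

def rank_by_product(paths):
    """Second heuristic of the PROD model: largest product of arc frequencies."""
    return max(paths, key=lambda name: prod(paths[name]))

sum_winner = rank_by_sum(candidates)
prod_winner = rank_by_product(candidates)
```
</p><p>Under these assumptions the sum heuristic selects the vowel-less path (175 against 42), because a single very frequent arc dominates the total, whereas the product heuristic penalises the path containing the arc of frequency 1.</p><p>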
In this case, the path through the (/ɛ/, 3) node has a product score of 12 × 30 = 360 for the pronunciation /ʃɛd/, which considerably exceeds the score of 174 for /ʃd/.<page local="5"/></p><doubt alpha="60.9" length="46" tooSmall="False" monospace="0.0">mapping head → /___d/ as in forehead → /for____d/</doubt><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">271</doubt><p>Table 1: Results for PbA of pseudowords with both dictionaries. See text for further specification.</p><p>Replacing the arc-sum heuristic of the D&amp;N model by arc-product as in the PROD model leads to a considerable increase in performance, e.g. from 77.1% words correct to 82.9% for the D&amp;N 70 test set with Webster's database. In turn, the MP model performs better than PROD in all cases.</p><p>For the Sull 136 test set, our expectation of poorer performance (because of the larger test set and the inconsistency of dialect between the target pronunciations and the lexical databases) is borne out for Webster's dictionary. For TWB, however, the performance difference between test sets is less consistent.</p></subsection><subsection number="5.2" title="Lexical Words"><p>The primary ability of a text-to-speech system must be to produce correct pronunciations for lexical words (rather than pseudowords) which just happen to be absent from the system's dictionary. Accordingly, we have tested the PbA implementations by removing each word in turn from its relevant database, and obtaining a pronunciation by analogy with the remainder. In these tests, the transcription standard employed by the compilers of the dictionary becomes its own reference and problems of transcription inconsistencies between input strings and lexical entries are avoided.</p><p>Results for the testing of lexical words are shown in Table 2. Again there are consistent performance differences, with the 'standard' D&amp;N model worst and the mapping probability (MP) model best. 
All models perform better with the TWB database than with Webster's, probably simply because of its smaller size.</p><p>For some lexical words, no pronunciation at all was produced because there was no complete path from <i>Start </i>to <i>End </i>in the lattice. This occurred for 92 of the TWB words and 117 of the Webster's words irrespective of the scoring model. This is a serious shortcoming: a PbA system should always produce a best-attempt pronunciation, even if it cannot produce the correct one. Sometimes, this failure is a consequence of the form of pronunciation lattice in which nodes are used to represent the 'end-points' of mappings. One of the inputs for which no pronunciation was found is <i>anecdote, </i>whose (partial) lattice is shown in Fig. 3. There is in fact no arc in the complete lattice between nodes (/k/, 4) and (/d/, 5) because there is no <i>cd </i>→ /kd/ mapping anywhere in either dictionary. Nor is there an <i>ecd </i>or <i>cdo </i>trigram, with or without the right end-point phonemes, which could possibly bridge the gap. This problem is entirely avoided with the Sullivan and Damper style of lattice, because the shortest-length arc corresponds to a single-symbol mapping rather than to a bigram (which may be unique). Thus, there will always be a 'default' single-symbol mapping corresponding to the commonest pronunciation of the letter. This is not to say that Sullivan and Damper's system will necessarily produce the correct output here: it almost certainly will not because of the rarity of the <i>c </i>→ /k/ mapping in the <i>-d </i>context.</p><p>Another input which fails to produce a pronunciation is <i>aardvark. </i>The problem here is not that there is no <i>aa </i>bigram in the dictionary (it is found in words such as <i>bazaar), </i>but that it only appears towards the end of other words. 
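</p><p>The difficulty with <i>aardvark </i>can be illustrated with a short sketch of the shift-and-compare matching described in section 3.1. This is a simplified Python illustration written under stated assumptions, not the code of PRONOUNCE or of our re-implementation: the function name, the 0-based positions and the decision to record only maximal matched runs of at least two letters are all choices made for this example, and phonemes are ignored.</p><p>
```python
def matched_substrings(word, entry, min_len=2):
    """Slide the shorter string along the longer one, from left-aligned to
    right-aligned, and return the maximal matched letter runs as a set of
    (start position in word, substring) pairs."""
    found = set()
    word_is_longer = len(word) >= len(entry)
    for shift in range(abs(len(word) - len(entry)) + 1):
        # Position of entry[0] relative to word[0] for this alignment.
        offset = shift if word_is_longer else -shift
        run, start = "", 0
        # One extra step past the end of word flushes a run that reaches it.
        for i in range(len(word) + 1):
            j = i - offset
            same = (i in range(len(word)) and j in range(len(entry))
                    and word[i] == entry[j])
            if same:
                if not run:
                    start = i
                run += word[i]
            else:
                if len(run) >= min_len:
                    found.add((start, run))
                run = ""
    return found

aardvark_matches = matched_substrings("aardvark", "bazaar")
shead_matches = matched_substrings("shead", "forehead")
```
</p><p>With these assumptions, <i>bazaar </i>contributes only the run <i>ar </i>at the end of <i>aardvark</i>: its <i>aa </i>bigram would have to be moved to the left of the left-aligned position to meet the initial <i>aa </i>of the input, and no such alignment is ever tried. By contrast, <i>forehead </i>does supply <i>head </i>to <i>shead </i>because the two strings share their endings.</p><p>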
Dedina and Nusbaum's strategy of performing substring matching only over a restricted range (the number of matching comparisons is equal to the difference in length between the input string and lexical entry) is at the root of this problem.</p></subsection></section><section number="6" title="Conclusions and Discussion"><p>We find that Dedina and Nusbaum's reported error rate of 9% cannot be reproduced: our figure is about two or three times that. Because of the shortcomings which emerge in this work, we believe the problem lies with PRONOUNCE rather than our re-implementation. Overall, our results are in much closer agreement with Sullivan and Damper's word error rates of almost 30% on a similar test set.</p><p>This work suggests several useful ways in which the performance of PbA systems might be improved. Our best results are obtained with a scoring method based on <i>a priori </i>mapping probabilities. According to Sullivan and Damper (1993), <i>a posteriori </i>mapping probabilities may do even better. Also, the type of pronunciation lattice used by Sullivan and Damper, in which nodes correspond to the junctures between symbols, is likely to be superior. The impact of different alignment strategies should repay study. Finally, we intend to assess the impact of incorporating information about word frequency in the analogy process.</p><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">272</doubt><page local="6"/><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>Test set</p></td><td class="cell"><p>Implementation</p></td><td class="cell"><p>Webster's words (%)</p></td><td class="cell"><p>Webster's phonemes (%)</p></td><td class="cell"><p>TWB words (%)</p></td><td class="cell"><p>TWB phonemes (%)</p></td></tr><tr class="row"><td class="cell"><p>D&amp;N 70</p></td><td class="cell"><p>D&amp;N</p></td><td class="cell"><p>77.1</p></td><td class="cell"><p>94.3</p></td><td class="cell"><p>70.0</p></td><td class="cell"><p>92.6</p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>PROD</p></td><td class="cell"><p>82.9</p></td><td class="cell"><p>95.9</p></td><td class="cell"><p>78.6</p></td><td class="cell"><p>94.9</p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>MP</p></td><td class="cell"><p>85.7</p></td><td class="cell"><p>96.6</p></td><td class="cell"><p>80.0</p></td><td class="cell"><p>95.3</p></td></tr><tr class="row"><td class="cell"><p>Sull 136</p></td><td class="cell"><p>D&amp;N</p></td><td class="cell"><p>75.0</p></td><td class="cell"><p>93.6</p></td><td class="cell"><p>72.1</p></td><td class="cell"><p>93.1</p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>PROD</p></td><td class="cell"><p>80.1</p></td><td class="cell"><p>95.0</p></td><td class="cell"><p>76.5</p></td><td class="cell"><p>94.5</p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>MP</p></td><td class="cell"><p>83.8</p></td><td class="cell"><p>95.9</p></td><td class="cell"><p>81.6</p></td><td class="cell"><p>95.7</p></td></tr></table><p>Figure 3: Simplified pronunciation lattice for the lexical word <i>anecdote </i>which fails to produce any pronunciation.</p><p><b>Acknowledgement</b></p><p>This work was funded by the UK Economic and Social Research Council via research grant R000235487: "Speech Synthesis by Analogy".</p><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>Implementation</p></td><td class="cell"><p>Webster's words (%)</p></td><td class="cell"><p>Webster's phonemes (%)</p></td><td class="cell"><p>TWB words (%)</p></td><td class="cell"><p>TWB phonemes (%)</p></td></tr><tr class="row"><td class="cell"><p>D&amp;N</p></td><td class="cell"><p>57.8</p></td><td class="cell"><p>90.4</p></td><td class="cell"><p>65.6</p></td><td class="cell"><p>93.1</p></td></tr><tr class="row"><td class="cell"><p>PROD</p></td><td class="cell"><p>58.5</p></td><td class="cell"><p>90.7</p></td><td class="cell"><p>66.1</p></td><td class="cell"><p>93.2</p></td></tr><tr class="row"><td class="cell"><p>MP</p></td><td class="cell"><p>60.7</p></td><td class="cell"><p>91.2</p></td><td class="cell"><p>67.9</p></td><td class="cell"><p>93.5</p></td></tr></table><p>Table 2: Results for PbA of dictionary words.</p></section><references><p>Baron, J. (1977). Mechanisms for pronouncing printed words: use and acquisition. In <i>Basic Processes in Reading: Perception and Comprehension </i>(D. LaBerge and S. 
Samuels, eds.), pp. 175-216. Lawrence Erlbaum, Hillsdale, NJ.</p><p>Brooks, L. (1977). Non-analytic correspondences and pattern in word pronunciation. In <i>Attention and Performance VII </i>(J. Requin, ed.), pp. 163-177. Lawrence Erlbaum, Hillsdale, NJ.</p><p>Brown, P. and Besner, D. (1987). The assembly of phonology in oral reading: a new model. In <i>Attention and Performance XII: the Psychology of Reading </i>(M. Coltheart, ed.), pp. 471-489. Lawrence Erlbaum, London.</p><p>Coltheart, M. (1978). Lexical access in simple reading tasks. In <i>Strategies of Information Processing </i>(G. Underwood, ed.), pp. 151-216. Academic, London.</p><p>Dedina, M.J. and Nusbaum, H.C. (1991). PRONOUNCE: a program for pronunciation by analogy. <i>Computer Speech and Language, </i><b>5, </b>55-64.</p><p>Glushko, R.J. (1979). The organization and activation of orthographic knowledge in reading aloud. <i>Journal of Experimental Psychology: Human Perception and Performance, </i><b>5, </b>674-691.</p><p>Glushko, R.J. (1981). Principles for pronouncing print: the psychology of phonography. In <i>Interactive Processes in Reading </i>(A.M. Lesgold and C.A. Perfetti, eds.), pp. 61-84. Lawrence Erlbaum, Hillsdale, NJ.</p><p>Humphreys, G.W. and Evett, L.J. (1985). Are there independent lexical and nonlexical routes in word processing? An evaluation of the dual-route theory of reading. <i>Behavioral and Brain Sciences, </i><b>8, </b>689-740.</p><p>Lawrence, S.G.C. and Kaye, G. (1986). Alignment of phonemes with their corresponding orthography. <i>Computer Speech and Language, </i><b>1, </b>153-165.</p><p>Levenshtein, V.I. (1966). Binary codes capable of correcting deletions, insertions and reversals. <i>Cybernetics and Control Theory, </i><b>10, </b>707-710.</p><p>McCulloch, N., Bedworth, M. and Bridle, J.S. (1987). NETspeak: a re-implementation of NETtalk. <i>Computer Speech and Language, </i><b>2, </b>289-301.</p><p>Sejnowski, T.J. and Rosenberg, C.R. (1987). 
Parallel networks that learn to pronounce English text. <i>Complex Systems, </i><b>1, </b>145-152.</p><p>Sullivan, K.P.H. (1992). <i>Synthesis-by-Analogy: a Psychologically Motivated Approach to Text-to-Speech Conversion, </i>PhD Thesis, Department of Electronics and Computer Science, University of Southampton, UK.</p><p>Sullivan, K.P.H. and Damper, R.I. (1990). A psychologically governed approach to novel-word pronunciation within a text-to-speech system. <i>Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '90), Vol. 1, </i>Albuquerque, NM, pp. 341-344.</p><p>Sullivan, K.P.H. and Damper, R.I. (1992). Novel-word pronunciation within a text-to-speech system. In <i>Talking Machines: Theories, Models and Applications </i>(G. Bailly and C. Benoît, eds.), pp. 183-195. Elsevier (North-Holland), Amsterdam.</p><p>Sullivan, K.P.H. and Damper, R.I. (1993). Novel-word pronunciation: a cross-language study. <i>Speech Communication, </i><b>13, </b>441-452.</p><p>Thorndike, E.L. and Lorge, I. (1944). <i>The Teachers' Word Book of 30,000 Words. </i>Teachers' College, Columbia University, NY.</p><doubt alpha="0.0" length="3" tooSmall="False" monospace="0.0">273</doubt></references></body></article>