<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1" global="162"/><title>Reading Distinction in MT</title><author surname="Hacken" givenname="Pius ten"><org  name="University of Utah" country="USA" city="Salt Lake City"/></author></firstpageheader><frontmatter><p>Reading Distinction in MT*</p><p>Pius ten Hacken Eurotra</p><p>Onderzoeksinstituut voor Taal en Spraak University of Utrecht Trans 10, 3512 JK Utrecht email: tenhacken@hutruu59.bitnet</p></frontmatter><abstract>In any system for Natural Language Processing having a dictionary, the question arises as to which entries are included in it. In this paper, I address the subquestion as to whether a lexical unit having two senses should be considered ambiguous or vague with respect to them. The inadequacy of some common strategies to answer this question in Machine Translation (MT) systems is shown. From a semantic conjecture, tests are developed that are argued to give more consistent and theoretically well-founded results. </abstract></header><body><section number="1" title="Introduction"><p>In any system for Natural Language Processing having a dictionary, the question arises as to which entries are included in it. In this paper, I will assume the environment of a multilingual MT system, based on a linguistic analysis and transfer architecture, from which I will derive some argumentation.</p><p>The question of which entries are included in the dictionary should be answered in two parts. First there is a mapping from graphic words to lexical units (lu's), then a mapping from lu's to readings, each of which is represented in an entry. The former mapping represents a certain level of analysis of the graphic word. 
It abstracts away from inflection and spelling variation, and, depending on the system's analysis component, may do so as well for productive derivation and compounding, and multi-word units. In this paper I will concentrate on the latter mapping, reading distinction, in a way that does not appeal to a particular choice regarding the relation between lu and graphic word.</p><p>*I would like to thank my colleagues at the university and in Eurotra, especially Louis des Tombe and Henk Verkuyl, for their helpful comments.</p><p>A consistent approach to reading distinction is necessary, because inconsistencies in reading distinction in an MT system will complicate transfer components between a pair of languages, and jeopardize extensibility of the system. A correct solution will save time in development and improve performance. The central question in this area can be formulated as in (1).</p><p>(1) Given an lu <i>X</i> and two of its senses <i>S1</i> and <i>S2</i>, is <i>X</i> ambiguous or vague with respect to <i>S1</i> and <i>S2</i>?</p><p>In (1) a sense of an lu is the meaning the lu has in a certain set of contexts. If the lu is vague, both senses are covered by the same reading. If it is ambiguous, <i>S1</i> and <i>S2</i> are examples of different readings of the lu, each reading being represented in a single entry.</p></section><section number="2" title="Some common methods"><p>Since every reading distinction creates lexical ambiguity that has to be resolved, it seems attractive to use the features expressing the relevant information as a criterion for answering (1): An lu is ambiguous between <i>S1</i> and <i>S2</i> iff there is a feature describing the difference.</p><p>If we only take morphological and syntactic features, many intuitively clear cases of ambiguity (e.g. <i>bank</i> as financial institution vs. 
as river verge) cannot be expressed.<page local="2" global="163"/> This will lead to problems in transfer or in generation. On the other hand, these features will cause unwanted distinctions as well, e.g. French <i>fonctionnaire</i> (civil servant) with gender masculine or feminine, and <i>kneel</i> with past tense <i>kneeled</i> or <i>knelt</i>. It makes no sense to try to disambiguate these.</p><p>The use of semantic features to define ambiguity should be rejected for similar reasons. First, we have to determine a fixed set of features a priori, since otherwise no answers to questions of reading distinction evolve. This imposes an artificial upper bound on reading distinction. Moreover, the availability of a certain feature does not mean that it has to be assigned in all cases. We will certainly have a feature expressing the male/female contrast, but it is not desirable to create two readings of <i>parent</i> accordingly, leading to a translation of <i>parents</i> into something meaning <i>mother(s) and/or father(s)</i>.</p><p>Alternatively, we could argue that, since translation is the goal, it should be the criterion for reading distinction as well, answering (1): An lu having two senses is ambiguous iff there are different translations for the two senses. Leaving aside the non-trivial problem of determining whether there are different translations, we have to admit that there are cases of exceptional distinctions in one language, e.g. <i>fleuve</i> vs. <i>rivière</i> in French, meaning river ending in a sea or in another river respectively. These distinctions will influence all other dictionaries in the system, in the sense that e.g. English <i>river</i> and Dutch <i>rivier</i> become ambiguous, and there are two translation rules between them. If we restrict our attention to a limited group of languages, e.g. the languages in the system, the system becomes difficult to extend, since adding a 
new language from outside this group will affect all existing dictionaries. Otherwise there is a conceptual problem, since it will never be possible to decide that an lu is vague, unless we know all languages of the world. Instead, cases of exceptional distinctions, <i>bilingual ambiguities</i>, are best handled in transfer between the two languages, because they really are translation problems.</p><p>Summarizing, taking the means (features) or the goal (translation) as a criterion for reading distinction results in decisions that cause various practical problems and are intuitively incorrect. Furthermore, these strategies detach the notion of <i>reading</i> from <i>meaning</i>, which is theoretically undesirable.</p><p>Taking only intuitions as our guide will link reading to meaning, but if even trained intuitions of lexicographers do not prevent inconsistencies, as can be seen in many published dictionaries, there is not much hope of reaching consistency, unless we manage to find some support for the intuitions.</p></section><section number="3" title="A semantic method"><p>The tests I will propose here to decide on reading distinction are based on monolingual meaning, and yield a substantially greater degree of consistency than direct, unaided intuitions. They rest on the following conjecture.</p><p>(2) There is a set of processes P that, given a single occurrence of an lu, can stretch the actual meaning of the lu in the context to the boundaries of the reading the lu has, but not beyond.</p><p>In order to be able to check the results, we will first consider some tests where specific processes in P are used, as applied to some intuitively clear cases of ambiguity (e.g. <i>bank</i> as financial institution vs. river verge) and vagueness (e.g. <i>elephant</i> as Indian elephant vs. African elephant). Then the scope of these tests will be expanded to other cases.</p><p>A well-known test evolving from (2) is based on conjunction. 
Lakoff (1970) proposed a test where anaphoric <i>so</i> in the second clause of a conjunction refers back to an antecedent containing the lu for which the question of reading distinction arises, as in (3).</p><p>(3) a. John went to a bank this morning, and so did Mary.</p><p>b. John saw an elephant, and so did Mary.</p><p>The question to be asked in this case is whether the sentence is semantically normal when the anaphor is interpreted in a different sense from its antecedent. Clearly, (3a) is strange in this interpretation, whereas (3b) is normal, confirming that <i>bank</i> but not <i>elephant</i> is ambiguous in the relevant way (cf. Cruse (1986) on the use of semantic normality judgements).<page local="3" global="164"/> Other anaphors, e.g. <i>one</i> and <i>there</i>, can be used as well. The answers are more reliable in case of an antecedent containing less lexical material outside the lu in question. In (3a), the antecedent of <i>so</i> is <i>go to a bank</i>, and ambiguity might be claimed to arise from the verb. Using <i>one</i> instead of <i>so</i> takes away this possibility, which is especially relevant in less clear cases.</p><p>Other processes use quantifiers. One, based on Wiggins (1971), uses universal quantification. It is exemplified in (4).</p><p>(4) a. All banks in this town are safe.</p><p>b. All elephants in this zoo are old.</p><p>The question to be asked here is whether <i>all X</i> can be interpreted as <i>all S1</i> or <i>all S2</i>, or only as <i>all S1 and all S2</i>. Whereas (4a) can mean either that there is no danger of flooding or that bank-robbers are effectively discouraged, and it is odd when used to mean both, (4b) can only be used to predicate over both African and Indian elephants in the zoo that they are old. 
A variant using negation in the same way is discussed by Kempson &amp; Cormack (1981).</p><p>A slightly different test can be performed with a universal quantifier somewhat remote from the relevant lu, as in (5).</p><p>(5) a. Every town has a bank.</p><p>b. Every zoo has an elephant.</p><p>The question to be asked here is whether the <i>X</i> (bank/elephant) has to be interpreted in the same sense for every <i>Y</i> (town/zoo). In a similar way numerals can be used as in (6), and coordination as in (7), requiring the same question.</p><p>(6) a. This town has two banks.</p><p>b. This zoo has two elephants.</p><p>(7) a. John and Mary went to a bank this morning.</p><p>b. John and Mary saw an elephant this morning.</p><p>Summarizing, there are three main classes of processes in P behaving as in (2). The first one refers to two elements from the extension of the lu, one of them by an anaphor, as in (3). The second one refers to the full extension of a reading at once, as in (4). The third one refers to a group of elements in the extension, exploiting distributivity, as in (5)-(7). Each class is associated with a different question, the answer to which determines whether an analysis as ambiguity or as vagueness is correct. There are various realizations of test sentences for each class, some of which are subject to independently motivated constraints. In a natural way an intuitively appealing definition of <i>reading</i> evolves as in (8).</p><p>(8) A reading of an lu is a coherent group of senses, the boundaries of which cannot be crossed by a single occurrence of the lu without losing semantic normality.</p></section><section number="4" title="The tests in actual use"><p>In the previous section, semantic tests were shown to give correct answers in cases where we can check them. This shows that we should not immediately reject the tests. 
The reason we need them, however, is that there are many cases where unaided intuition is not sufficiently determinate, so that conflicts on the correct analysis might arise.</p><p>A well-known problem area is the analysis of privative oppositions, where one of the senses is more general and includes the other one. Both <i>dog</i> and <i>lion</i> have senses <i>animal belonging to a particular species of mammals</i> and <i>male specimen of that species</i>. According to Kempson (1980) they are both vague with respect to these senses, but Zwicky &amp; Sadock (1975) claim that <i>dog</i> but not <i>lion</i> is ambiguous. Applying various tests to them we get the following sentences.</p><p>(9) a. John has a dog, and Mary has one too.</p><p>b. The zoo has a lion, and the circus has one too.</p><p>(10) a. All dogs of this breed are short-sighted.</p><p>b. All lions in this wild reserve have been killed by poachers.</p><page local="4" global="165"/><p>(11) a. This family has two dogs.</p><p>b. This zoo has two lions.</p><p>The sentences (9) and (11) cannot lead to a conclusion for independent reasons. Since an individual or a group in the more specific sense of the lu is also an individual or a group in the general sense, the general sense is always available to cover up the opposition. This is not the case when the full extensions are compared, however. Therefore from (10) we can indeed conclude that <i>dog</i> is ambiguous and <i>lion</i> is not. Both (10a) and (10b) have the general interpretation, but only (10a) also has the more specific one (cf. <i>..., but not the bitches</i> vs. <i>*..., but not the lionesses</i>).</p><p>Another problem that comes up is the construction of test sentences for syntactic categories other than nouns. 
Although the various processes are most easily demonstrated with nouns, nothing in the theory refers to nouns directly. VP-anaphors, e.g. <i>so</i>, can also be used for verbs.</p><p>(12) a. John has been running all day, and so has his washing machine.</p><p>b. John has been running all day, and so has his dog.</p><p>(13) John followed Mary, and Bill did so too.</p><p>The sentences in (12) show the ambiguity of <i>run</i> between the senses with a human and a machine subject, and the vagueness between senses with a biped and a quadruped subject. For transitive verbs, such as <i>follow</i>, having the senses <i>understand</i> and <i>go after</i>, the result of the test is more disputable, since (13) shows the ambiguity of <i>follow Mary</i>, and one could argue that it is due to ambiguity of <i>Mary</i>, e.g. between the senses <i>thinking person</i> and <i>spatial object</i>. Therefore, the use of a non-lexical anaphor, indicated by # in the examples, is to be preferred.</p><p>(14) John followed Mary, and Bill # Kate.</p><p>It is rather difficult to construct a sentence with a quantifier over the verb comparable to (4) for nouns. Rather, a sentence such as (15) below displays the same distributivity effect as (5), which can also be achieved by coordination as in (16).</p><p>(15) All boys followed Mary.</p><p>(16) John and Bill followed Mary.</p><p>The test sentences for verbs can also be used for adjectives, if they are used predicatively. An example is (17), where <i>black</i> is shown to have different readings when used with a concrete object and with <i>humour</i>.</p><p>(17) Her dress is black, and so is her humour.</p><p>For gradable adjectives, a comparison is a basis for constructing a test sentence. 
Although (17) can be used humorously, (18) below, illustrating the ambiguity of <i>fair</i>, can hardly be interpreted.</p><p>(18) Her hair is as fair as the salary she pays her employees.</p><p>In general it seems that for gradable adjectives comparison provokes stronger judgements than anaphoric reference by <i>so</i>. In some cases, however, one of the senses cannot be used predicatively, and neither of the two processes can be used. An empty anaphor sometimes provides a solution, as in (19), where <i>economic</i> is shown to be ambiguous between the senses <i>relating to the economy</i> and <i>not wasteful</i>.</p><p>(19) For many years, he produced economic theories and # cars.</p><p>In some languages, there is a lexical anaphor that requires an adjective as its antecedent, e.g. <i>dito</i> in Dutch, as illustrated in (20).</p><p>(20) Bij hun gouden bruiloft kregen ze een dito horloge.</p><p>(Lit.) 'At their golden wedding got they a # watch'</p><p>Among the remaining problems is the comparison of two senses with big syntactic differences. All test sentences have to be syntactically correct, and syntax does not allow e.g. coordination of a noun and a verb in corresponding positions. In such cases, the semantic part of testing the senses is never arrived at.</p><page local="5" global="166"/></section><section number="5" title="Conclusion"><p>In this paper, I developed, from the semantic conjecture (2), tests to answer the question whether an lu with two senses is to be analyzed as ambiguous or vague with respect to them. The tests allow for theoretically well-founded and consistent decisions in many cases. In MT, they determine a proper balance on the cline between what can easily be disambiguated monolingually, and what is useful as a distinction in translation. 
As such they define the target for monolingual disambiguation, and the class of bilingual ambiguities that should be treated in transfer. Since the MT environment has only been used in the argumentation, not in the solution proposed, theoretical well-foundedness and consistency evolving from the tests presented here are equally valid in other environments where a monolingual dictionary is used.</p></section><references><p>Cruse, D.A. (1986). <i>Lexical Semantics</i>, Cambridge University Press.</p><p>Kempson, Ruth (1980). 'Ambiguity and Word Meaning', in: Greenbaum, Sidney, Geoffrey Leech &amp; Jan Svartvik (eds.), <i>Studies in English Linguistics</i>, Longman, London / New York, p. 7-16.</p><p>Kempson, Ruth &amp; Annabel Cormack (1981). 'Ambiguity and Quantification', <i>Linguistics and Philosophy</i> 4, p. 259-309.</p><p>Lakoff, George (1970). 'A Note on Vagueness and Ambiguity', <i>Linguistic Inquiry</i> 1, p. 357-359.</p><p>Wiggins, David (1971). 'On sentence-sense, word-sense and difference of word-sense. Towards a philosophical theory of dictionaries', in: Steinberg, Danny &amp; Leon Jakobovits (eds.), <i>Semantics</i>, Cambridge University Press, p. 14-34.</p><p>Zwicky, Arnold &amp; Jerrold Sadock (1975). 'Ambiguity tests and how to fail them', in: Kimball, John (ed.), <i>Syntax and Semantics</i> 4, Academic Press, p. 1-36.</p></references></body></article>