<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:ns2="http://www.tei-c.org/ns/Examples">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Squibs and Discussions Unsupervised Named Entity Recognition Using Syntactic and Semantic Contextual Evidence</title>
            </titleStmt>
        </fileDesc>
    </teiHeader>
    <text>
        <front/>
        <body>
            <div>
                <p>Alessandro Cucchiarelli* (Università di Ancona) and Paola Velardi† (Università di Roma 'La Sapienza')</p>
                <p>Proper nouns form an open class, making the incompleteness of manually or automatically learned classification rules an obvious problem. The purpose of this paper is twofold: first, to suggest the use of a complementary &quot;backup&quot; method to increase the robustness of any hand-crafted or machine-learning-based NE tagger; and second, to explore the effectiveness of using more fine-grained evidence, namely syntactic and semantic contextual knowledge, in classifying NEs.</p>
                <head xml:id="sec1.">Proper Noun Classification</head>
                <p>In this paper we present a corpus-driven statistical technique that uses a learning corpus to acquire contextual classification cues, and then uses the results of this phase to classify unrecognized proper nouns (PNs) in an unlabeled corpus. Training examples of proper nouns are obtained using any available named entity (NE) recognizer (in our experiments we used a rule-based recognizer and a machine-learning-based recognizer). The contextual model of PN categories is learned without supervision.</p>
                <p>The approach described in this paper is complementary to current methods for NE recognition: our objective is to improve, without additional manual effort, the robustness of any available NE system through the use of more &quot;fine-grained&quot; contextual knowledge, best exploited at a relatively late stage of analysis. The method is particularly useful when an available NE system must be rapidly adapted to another language or to another domain, provided the shift is not dramatic.</p>
                <p>Furthermore, our study provides experimental evidence relating to two issues still under debate: i) the effectiveness, in practical NLP applications, of using syntactic relations (most systems use plain collocations and morphological features), and ii) context expansion based on thesauri. We do not provide a definitive argument in favor of syntactic contexts and semantic expansion for word sense disambiguation tasks in general. We do show, however, that they can be successfully used for unknown proper noun classification. Proper nouns have particular characteristics, such as low or zero ambiguity, which make it easier to characterize their contexts.</p>
                <head xml:id="sec2.">Description of the U_PN Classification Method</head>
                <p>In this section we briefly summarize the corpus-based tagging technique for the classification of unknown proper nouns (for more details, see Cucchiarelli, Luzi, and Velardi [1998]).</p>
                <note n="*" place="below">Istituto di Informatica, Via Brecce Bianche, I-60131 Ancona, Italy. E-mail: alex@inform.unian.it</note>
                <note n="†" place="below">Dipartimento di Scienze dell'Informazione, Via Salaria 113, I-00198 Roma, Italy. E-mail: velardi@dsi.uniroma1.it</note>
                <p>Computational Linguistics Volume 27, Number 1</p>
                <head xml:id="sec2.1">Learning Contextual Sense Indicators</head>
                <p>Our method proceeds as follows. First, at least some examples of PNs in each category are detected by means of any available NE recognition technique (which we will call an early NE classifier). Second, typical PN syntactic and semantic contexts are learned through an unsupervised corpus-based technique. These syntactic and semantic cues can then be used to extend the coverage of the early NE classifier. This makes the classifier more robust to the limitations of the gazetteers (PN dictionaries) and to domain shifts.</p>
                <p>In phase one, a learning corpus in the application domain is morphologically processed. The gazetteer lookup and the early NE classifier are then used to detect PNs. At the end of this phase, &quot;some&quot; PNs are recognized and classified, depending upon the size of the gazetteer and the actual performance (in the domain) of the NE classifier.</p>
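                <p>Phase one can be pictured as a split of the detected PN candidates into two groups. Classified PNs become the seed examples for phase two; the remaining PNs are unknown (U_PNs). The following is a minimal, hypothetical sketch. It assumes the gazetteer is a plain dictionary and the early NE classifier is any callable returning a category or None. Neither interface is taken from VIE or from the decision-list learner, and the function name phase_one is illustrative.</p>
                <p>
```python
from typing import Callable, Optional


def phase_one(
    candidates: list,
    gazetteer: dict,
    early_classifier: Callable[[str], Optional[str]],
) -> tuple:
    """Split detected PN candidates into classified PNs and unknown PNs (U_PNs).

    Hypothetical interfaces: gazetteer maps a PN string to its category, and
    early_classifier returns a category or None when it cannot decide.
    """
    known = {}    # PN -> category: the seed examples used to build PN_esl
    unknown = []  # recognized as PNs but not semantically classified
    for pn in candidates:
        category = gazetteer.get(pn)         # gazetteer lookup first
        if category is None:
            category = early_classifier(pn)  # then the early NE classifier
        if category is None:
            unknown.append(pn)
        else:
            known[pn] = category
    return known, unknown


# Toy usage with an invented rule standing in for the early classifier.
known, unknown = phase_one(
    ["IBM", "Mr._Smith", "Owens_Illinois"],
    {"IBM": "Organization"},
    lambda pn: "Person" if pn.startswith("Mr.") else None,
)
print(known)    # {'IBM': 'Organization', 'Mr._Smith': 'Person'}
print(unknown)  # ['Owens_Illinois']
```
                </p>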
                <p>In phase two, the objective is to learn a contextual model of each PN category, augmented with syntactic and semantic features. Since the algorithm is unsupervised, statistical techniques are applied to smooth the weight of acquired examples as a function of semantic and syntactic ambiguity. 1</p>
                <p>Syntactic processing is applied over the corpus. A shallow parser (see details in Basili, Pazienza, and Velardi [1994]) extracts elementary syntactic relations from the learning corpus, such as Subject-Object, Noun-Preposition-Noun, etc. 2 An elementary syntactic link (esl) is represented as esl(w_j, mod(type_i, w_k)). Here w_j is the headword, w_k is the modifier, and type_i is the type of syntactic relation (e.g., Prepositional Phrase, Subject-Verb, Verb-Direct-Object, etc.). For example, esl(close, mod(G_N_V_Act, Xerox)) reads: Xerox is the modifier of the head close in a Subject-Verb (G_N_V_Act) syntactic relation.</p>
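                <p>The esl notation maps naturally onto a small record type. The sketch below is an illustration only. The class ESL and the helper context_of are hypothetical names, and the relation label "N-of-N" follows the relation types mentioned in the text rather than any documented output format of the SSA parser. It shows the Xerox example and how a word's context is collected as the esls in which it appears as an argument.</p>
                <p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESL:
    """Elementary syntactic link esl(w_j, mod(type_i, w_k))."""
    head: str      # w_j, the headword
    rel_type: str  # type_i, e.g. "G_N_V_Act" for Subject-Verb
    modifier: str  # w_k, the modifier

    def __str__(self) -> str:
        return f"esl({self.head}, mod({self.rel_type}, {self.modifier}))"

    def involves(self, word: str) -> bool:
        # True if word is one of the two arguments (head or modifier).
        return word in (self.head, self.modifier)


def context_of(word: str, esls: list) -> list:
    """The context of word: every esl that has it as an argument."""
    return [e for e in esls if e.involves(word)]


xerox = ESL("close", "G_N_V_Act", "Xerox")
division = ESL("division", "N-of-N", "American_Brands_Inc")
print(xerox)                                   # esl(close, mod(G_N_V_Act, Xerox))
print(context_of("Xerox", [xerox, division]))  # only the first esl
```
                </p>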
                <p>In our study, the context of a word w in a sentence S is represented by the esls that include w as one of their arguments (w_j or w_k). The esls that include semantically classified PNs as one of their arguments are grouped in a database, called PN_esl. This database provides contextual evidence for assigning a category to unknown PNs.</p>
                <head xml:id="sec2.2">Tagging Unknown PNs</head>
                <p>A corpus-driven algorithm is used to classify unknown proper nouns that the early NE recognizer recognized as such but did not semantically classify. 3</p>
                <p>• Let U_PN be an unknown proper noun, i.e., a single word or a complex nominal. Let C_pn = (C_pn1, C_pn2, ..., C_pnN) be the set of semantic categories for proper nouns (e.g., Person, Organization, Product, etc.). Finally, let ESL be the set of esls (often more than one in a text) that include U_PN as one of their arguments.</p>
                <p>• For each esl_i in ESL, let esl_i(w_j, mod(type_i, w_k)) = esl_i(x, U_PN). Here x = w_j or x = w_k, and U_PN = w_k or w_j (the unknown PN can be either the head or the modifier). type_i is the syntactic type of the esl (e.g., N-of-N, NAN, V-for-N, etc.). Furthermore, let pl(esl_i(x, U_PN)) be the plausibility of a detected esl. Plausibility is a measure of the statistical evidence of a detected syntactic relation (Basili, Marziali, and Pazienza 1994; Grishman and Sterling 1994). It depends upon local (i.e., sentence-level) syntactic ambiguity and global corpus evidence, and so accounts for the uncertainty arising from syntactic ambiguity.</p>
                <p>• Finally, let:</p>
                <p>- ESLA be a set of esls in PN_esl (the previously learned contextual model) defined as follows: for each esl_i(x, U_PN) in ESL, put in ESLA the set of esl_j(x, PN_j) such that type_j = type_i, x is in the same position as in esl_i, and PN_j is a known proper noun in the same position as U_PN in esl_i.</p>
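                <p>Collecting ESL and ESLA amounts to two filters over esl lists. The following sketch assumes a plain namedtuple for esls and a dictionary of known PNs, and all names in it (esls_of, build_esla, Acme_Corp, acquire) are hypothetical. It keeps the constraints stated above: same relation type, same word x in the same position, and a known PN_j where U_PN was. A learned esl is appended once for each esl_i it matches, since the definition is given per esl_i.</p>
                <p>
```python
from collections import namedtuple

# Hypothetical record for esl(head, mod(rel_type, modifier)).
ESL = namedtuple("ESL", ["head", "rel_type", "modifier"])


def esls_of(u_pn, corpus_esls):
    """ESL: all esls that include the unknown PN as head or modifier."""
    return [e for e in corpus_esls if u_pn in (e.head, e.modifier)]


def build_esla(u_pn, esl_set, pn_esl, known_pns):
    """ESLA: for each esl_i(x, U_PN), the learned esl_j(x, PN_j) with the same
    type, the same word x in the same position, and a known PN_j where U_PN was.
    A learned esl is added once per matching esl_i (duplicates are kept)."""
    esla = []
    for e in esl_set:
        pn_is_modifier = e.modifier == u_pn
        x = e.head if pn_is_modifier else e.modifier
        for f in pn_esl:
            if f.rel_type != e.rel_type:
                continue
            fx, pn = (f.head, f.modifier) if pn_is_modifier else (f.modifier, f.head)
            if fx == x and pn in known_pns:
                esla.append(f)
    return esla


pn_esl = [
    ESL("close", "G_N_V_Act", "Xerox"),
    ESL("close", "G_N_V_Act", "IBM"),
    ESL("acquire", "G_N_V_Act", "IBM"),
]
known_pns = {"Xerox": "Organization", "IBM": "Organization"}
corpus = [ESL("close", "G_N_V_Act", "Acme_Corp"), ESL("division", "N-of-N", "Acme_Corp")]

esl_set = esls_of("Acme_Corp", corpus)
print(build_esla("Acme_Corp", esl_set, pn_esl, known_pns))  # the two "close" esls
```
                </p>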
                <p>- ESLB be a set of esls in PN_esl defined as follows: for each esl_i(x, U_PN) in ESL, put in ESLB the set of esl_j(w, PN_j) such that type_j = type_i, w is in the same position as x in esl_i, Sim(w, x) &gt; δ, and PN_j is a known proper noun in the same position as U_PN in esl_i. Sim(w, x) is a similarity measure between x and w. In our experiments, Sim(w, x) &gt; δ iff w and x have a common hyperonym H in WordNet. The generality of H (i.e., the number of levels from x to H) is made parametric, to analyze the effect of generalization.</p>
                <p>• For each semantic category C_pnj compute evidence(C_pnj) as:</p>
                <p>$$\mathrm{evidence}(C_{pnj}) = \alpha \, \frac{\sum_{esl_i \in ESLA,\; C(PN_j) = C_{pnj}} weight_{ij}(x)\, D(x, C(PN_j))}{\sum_{esl_i \in ESLA} weight_{ij}(x)\, D(x, C(PN_j))} + \beta \, \frac{\sum_{esl_i \in ESLB,\; C(PN_j) = C_{pnj}} weight_{ij}(w)\, D(w, C(PN_j))}{\sum_{esl_i \in ESLB} weight_{ij}(w)\, D(w, C(PN_j))}$$</p>
                <p>where:</p>
                <p>$$weight_{ij}(x) = weight_{ij}(esl_i(x, PN_j)) = pl(esl_i(x, PN_j)) \cdot \left(1 - \frac{amb(x) - 1}{k}\right)$$</p>
                <p>$$weight_{ij}(w) = weight_{ij}(esl_i(w, PN_j)) = pl(esl_i(w, PN_j)) \cdot \left(1 - \frac{amb(w) - 1}{k}\right)^{2}$$</p>
                <p>pl(esl_i(x, PN_j)) is the plausibility and amb(x) is the ambiguity of x in esl_i. k is a constant factor used to incrementally reduce the influence of ambiguous words. The smoothing is tuned to be stronger in ESLB. α and β are parameters, and can be used to study the evidence provided by ESLA and ESLB. D(x, C(PN_j)) is a discrimination factor used to determine the saliency (Yarowsky 1992) of a context esl_i(x, _) for a category C(PN_j). In other words, it measures how good a context is at discriminating between C(PN_j) and the other categories. 4</p>
                <p>• The selected category for U_PN is C = argmax_k evidence(C_pnk).</p>
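                <p>The evidence formula can be read as two normalized, weighted votes, one over ESLA and one over ESLB, mixed by α and β. The sketch below is a minimal illustration under several assumptions:</p>
                <p>- Contexts are pre-digested into tuples of (category of the known PN_j, plausibility, ambiguity of the context word, D value).</p>
                <p>- The ESLB smoothing is squared, as in the weight formulas above.</p>
                <p>- The default k = 10 is invented.</p>
                <p>- Clamping the smoothing factor at zero is our own safeguard.</p>
                <p>Only α = 0.7 and β = 0.3 come from the text, as the best values reported in Section 4.2. The function names are hypothetical.</p>
                <p>
```python
def weight(pl, amb, k, power):
    """pl * (1 - (amb - 1) / k) ** power: power=1 for ESLA contexts (word x),
    power=2 for ESLB contexts (word w). Clamping the factor at 0 is our own
    safeguard for contexts whose ambiguity minus 1 exceeds k."""
    return pl * max(0.0, 1 - (amb - 1) / k) ** power


def category_shares(contexts, categories, k, power):
    """Normalized weighted votes of one context set (ESLA or ESLB).

    Each context is (category of the known PN_j, plausibility,
    ambiguity of the context word, D value for that category)."""
    votes = dict.fromkeys(categories, 0.0)
    for cat, pl, amb, d in contexts:
        votes[cat] += weight(pl, amb, k, power) * d
    total = sum(votes.values())
    if total == 0:
        return votes  # no usable evidence from this set
    return {c: v / total for c, v in votes.items()}


def classify(esla, eslb, categories, alpha=0.7, beta=0.3, k=10.0):
    """Return argmax_k evidence(C_pnk) together with all evidence scores."""
    a = category_shares(esla, categories, k, power=1)
    b = category_shares(eslb, categories, k, power=2)
    evidence = {c: alpha * a[c] + beta * b[c] for c in categories}
    return max(evidence, key=evidence.get), evidence


cats = ["Person", "Organization", "Location"]
esla = [("Organization", 0.9, 1, 0.8), ("Person", 0.5, 3, 0.5)]
eslb = [("Organization", 0.6, 2, 0.7)]
best, scores = classify(esla, eslb, cats)
print(best)  # Organization
```
                </p>
                <p>Because each addend is normalized, the evidence values sum to α + β whenever both ESLA and ESLB contribute. The factors k, D, and the plausibility only reshape the relative weight of contexts within each set.</p>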
                <p>When grouping all the evidence of a U_PN in a text, the underlying hypothesis is that, in a given linguistic domain (finance, medicine, etc.), a PN has a unique sense. This is a reasonable restriction for proper nouns, supported by empirical evidence. We would be more skeptical, however, about applying the one-sense-per-discourse paradigm (Gale, Church, and Yarowsky 1992) to generic words. We believe that it is precisely this restriction that makes the use of syntactic and semantic contexts effective for PNs.</p>
                <p>Notice that the formula of the evidence has several smoothing factors that work together to reduce the influence of unreliable or uninformative contexts. The formula also has parameters (k, α, β), estimated by running systematic experiments. Standard statistical techniques have been used to balance experimental conditions and the sources of variance.</p>
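                <p>One of these smoothing factors, the discrimination factor D, is estimated with a simple conditional probability model (see note 4). The paper does not spell out the counting scheme. The sketch below therefore assumes that a context esl_i(x, _) is identified by the pair (x, relation type) and that D(context, C) is the unsmoothed relative frequency P(C | context) over the learned PN_esl. The function name is hypothetical. The example mirrors the note's observation that a Subject-Verb context with make is equally likely for Person and Organization names and so discriminates poorly.</p>
                <p>
```python
from collections import Counter, defaultdict


def discrimination_table(observations):
    """Estimate D(context, C) as P(C | context) from PN_esl.

    observations is an iterable of (context, category) pairs, one per learned
    esl; a context identifies esl_i(x, _), here assumed to be (x, rel_type)."""
    counts = defaultdict(Counter)
    for context, category in observations:
        counts[context][category] += 1
    table = {}
    for context, by_category in counts.items():
        n = sum(by_category.values())
        table[context] = {cat: c / n for cat, c in by_category.items()}
    return table


obs = (
    [(("make", "G_N_V_Act"), "Person"), (("make", "G_N_V_Act"), "Organization")]
    + [(("division", "N-of-N"), "Organization")] * 3
)
D = discrimination_table(obs)
print(D[("make", "G_N_V_Act")])  # {'Person': 0.5, 'Organization': 0.5}: not salient
print(D[("division", "N-of-N")])  # {'Organization': 1.0}: highly salient
```
                </p>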
            </div>
            <note n="1" place="below">We say the algorithm is unsupervised because neither the NE items detected by the early recognizer nor the extracted syntactic contexts are inspected for correctness.</note>
            <note n="2" place="below">Shallow, or partial, parsers are a well-established technique for corpus parsing. Several partial parsers are readily available, for example, the freely downloadable LINK parser.</note>
            <note n="3" place="below">A standard POS tagger augmented with simple heuristics is used to detect possible instances of PNs. Errors arise only from ambiguous sentence beginners, such as &quot;Owens Illinois&quot; or &quot;Boots Plc&quot;, which cause partial recognition.</note>
            <div1>
                <head xml:id="sec3.">Using WordNet for Context Generalization</head>
                <p>One of the stated objectives of this paper is to investigate the effect of context generalization (the addend ESLB in the formula of the evidence) on our sense tagging task.</p>
                <p>The use of on-line thesauri for context generalization has already been investigated, with limited success (Hearst and Schuetze 1993; Brill and Resnik 1994; Resnik 1997; Agirre and Rigau 1996). Though the idea of using thesauri for context expansion is quite common, there are no clear indications that it actually improves performance. Studying the effect of context expansion for a PN tagging task in particular is nevertheless relevant, for three reasons:</p>
                <p>• PNs may be hypothesized to have a unique sense in a text, and even in a domain corpus. Therefore, we can reliably consider all the contexts in which a PN appears as potential sense indicators. The only source of ambiguity is then the word w_i co-occurring with a PN in a syntactic context, esl_i(w_i, U_PN). Since ESLB groups several contexts, spurious hyperonyms of w_i should gain lower evidence. For example, consider the context &quot;division of American_Brands_Inc&quot;. Division is a highly ambiguous word. But when it is generalized, most of the words that appear in the same type of syntactic relation with a proper noun reflect pertinent senses (e.g., branch of Drexel_Burnham_Lambert_Group_Inc, part of Nationale_Nederlanden_Group).</p>
                <p>• PN categories (e.g., Person, Location, Product) exhibit a more stable and less ambiguous contextual behavior than other, vaguer categories, such as psychological_feature. 5</p>
                <p>• We can study the degree of generalization at which optimum performance is achieved.</p>
                <note n="4" place="below">For example, a Subject_Verb phrase with the verb make (e.g., Ace made a contract) is found with almost equal probability with Person and Organization names. We used a simple conditional probability model for D(x, C(PN_j)), but we believe that more refined measures could improve performance.</note>
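                <p>The similarity test behind ESLB can be sketched as a bounded walk up a hyperonym hierarchy. The sketch treats level 0 as strict synonymy (a shared synset) and level L as allowing H up to L steps above a sense of x, following the definition in Section 2. It uses an invented single-parent toy hierarchy, not WordNet data; real WordNet synsets can have several hypernyms, which this sketch ignores. The function names and sense labels are hypothetical.</p>
                <p>
```python
def ancestors(synset, hypernym, max_levels=None):
    """Yield (node, level) from synset (level 0) up the hypernym chain,
    stopping after max_levels steps when a limit is given."""
    node, level = synset, 0
    while node is not None:
        yield node, level
        if max_levels is not None and level == max_levels:
            return
        node = hypernym.get(node)
        level += 1


def similar(w, x, senses, hypernym, max_level):
    """Sim(w, x) holds iff some hyperonym H reachable within max_level steps
    from a sense of x is also a sense, or an ancestor of a sense, of w.
    max_level = 0 reduces to strict synonymy (a shared synset)."""
    x_side = {h for s in senses.get(x, []) for h, _ in ancestors(s, hypernym, max_level)}
    w_side = {h for s in senses.get(w, []) for h, _ in ancestors(s, hypernym)}
    return not x_side.isdisjoint(w_side)


# Toy single-parent hierarchy (invented, not WordNet data).
senses = {
    "division": ["division.unit", "division.math"],
    "section": ["division.unit"],
    "branch": ["branch.unit", "branch.tree"],
    "part": ["part.portion"],
}
hypernym = {
    "division.unit": "administrative_unit",
    "branch.unit": "administrative_unit",
    "administrative_unit": "organization",
    "division.math": "operation",
    "branch.tree": "stalk",
    "part.portion": "relation",
}
print(similar("section", "division", senses, hypernym, 0))  # True (synonyms)
print(similar("branch", "division", senses, hypernym, 0))   # False
print(similar("branch", "division", senses, hypernym, 1))   # True
```
                </p>
                <p>Raising max_level admits more context words w into ESLB. This is the knob varied as the level of generalization L in Section 4.2.</p>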
            </div1>
            <div1>
                <head xml:id="sec4.">Experimental Discussion</head>
                <p>The purpose of the experimental evaluation is twofold:</p>
                <p>• To test the improvement in robustness of a state-of-the-art NE recognizer.</p>
                <p>• To study the effectiveness of syntactic contexts and of a &quot;cautious&quot; context generalization on the performance of the U_PN tagger, analyzed in isolation.</p>
                <p>The effect of generalization is studied in two ways. First, we gradually relax the notion of similarity in the formula of the evidence. Second, we tune the contribution of generalized contexts to that formula through the factors α and β.</p>
                <p>In our experiments, we used three standard on-line resources: the Italian Sole24Ore half-million-word corpus of financial news, the one-million-word Wall Street Journal corpus, and WordNet. We also used a series of computational tools made available for our research:</p>
                <p>• the VIE system (Humphreys et al. 1996) for initial detection of proper nouns in the learning corpus; for the same purpose we also used a machine learning method based on decision lists, described in Paliouras, Karkaletsis, and Spyropolous (1998);</p>
                <p>• the SSA shallow syntactic analyzer (Basili, Pazienza, and Velardi 1994) for surface corpus parsing; 6</p>
                <p>• the tool described in Cucchiarelli and Velardi (1998) for corpus-driven WordNet pruning. 7</p>
                <head xml:id="sec4.1">Experiment 1: Improving Robustness of NE Recognizers</head>
                <p>The objective of Experiment 1 is to verify that our tagger improves the robustness of existing NE recognizers. Figure 1 shows the results of three tests. The table reports the local performance on the NE tagging task of the early NE recognizer and of our untrained tagger, and finally the joint performance of the two methods.</p>
                <p>In the first test, we used the Italian Sole24Ore corpus. Because WordNet was not available in Italian, we used a dictionary of strict synonyms for context expansion. In this test, we &quot;loosely&quot; adapted the English VIE system (as used in MUC-6) to Italian. We used the English gazetteer as it was and applied simple &quot;language porting&quot; to the NE grammar (e.g., replacing English words and prepositions with the corresponding Italian words, and little more). 8 This explains the low performance of the rule-based classifier. Note that our context-based tagger produces a considerable improvement in performance (around 18%). As a result, the global performance (columns K and L) is comparable with that of state-of-the-art systems, without a significant readaptation effort.</p>
                <table>
                    <head>Figure 1. Outline of results on the Sole24Ore corpus.</head>
                    <row role="label"><cell/><cell>A</cell><cell>B</cell><cell>C</cell><cell>D</cell><cell>E</cell><cell>F</cell><cell>G</cell><cell>H</cell><cell>I</cell><cell>J</cell><cell>K</cell><cell>L</cell></row>
                    <row><cell>Test 1</cell><cell>239</cell><cell>355</cell><cell>67.32%</cell><cell>339</cell><cell>70.50%</cell><cell>60</cell><cell>83</cell><cell>72.29%</cell><cell>75</cell><cell>80.00%</cell><cell>84.23%</cell><cell>88.20%</cell></row>
                    <row><cell>Test 2</cell><cell>650</cell><cell>793</cell><cell>81.90%</cell><cell>759</cell><cell>85.63%</cell><cell>67</cell><cell>83</cell><cell>80.72%</cell><cell>80</cell><cell>83.75%</cell><cell>90.42%</cell><cell>94.47%</cell></row>
                    <row><cell>Test 3</cell><cell>3,040</cell><cell>4,168</cell><cell>72.94%</cell><cell>3,233</cell><cell>94.03%</cell><cell>585</cell><cell>935</cell><cell>62.57%</cell><cell>810</cell><cell>72.22%</cell><cell>86.97%</cell><cell>89.66%</cell></row>
                </table>
                <p>Legend. A: PNs correctly tagged by the early NE recognizer; B: total PNs in the test corpus; C: local recall of the early NE recognizer (A/B); D: total PNs detected by the early NE recognizer (D = A + A1 (errors) + G (unknown)); E: local precision of the early NE recognizer (A/D); F: U_PNs correctly tagged by the U_PN tagger in the test corpus; G: total U_PNs not detected by the early NE recognizer; H: local recall of the U_PN tagger (phase 2) (F/G); I: total U_PNs for which a decision was possible by the U_PN tagger; J: local precision of the U_PN tagger (F/I); K: joint recall of the two methods ((A + F)/B); L: joint precision of the two methods ((A + F)/D).</p>
                <note n="5" place="below">In Velardi and Cucchiarelli (2000) we formally studied the relation between category type and learnability of contextual cues for WSD.</note>
                <note n="6" place="below">We also used the GATE partial parser. We were less successful with this parser because it is not designed for high-performance VP-PP and NP-PP detection, and prepositional contexts are often the most informative indicators.</note>
                <note n="7" place="below">This method produces a 20-30% reduction of the initial WordNet ambiguity, depending on the specific corpus.</note>
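                <p>The percentage columns of Figure 1 are simple ratios of the raw counts, as listed in the legend. The following sketch recomputes them; the function name is hypothetical, and J is taken to be F/I, which matches the printed values. Applied to the Test 1 row, it reproduces the printed percentages.</p>
                <p>
```python
def figure1_columns(A, B, D, F, G, I):
    """Derive the percentage columns of Figure 1 from its raw counts,
    using the ratios given in the legend (J is computed as F/I)."""
    def pct(num, den):
        return round(100 * num / den, 2)

    return {
        "C": pct(A, B),      # local recall of the early NE recognizer
        "E": pct(A, D),      # local precision of the early NE recognizer
        "H": pct(F, G),      # local recall of the U_PN tagger
        "J": pct(F, I),      # local precision of the U_PN tagger
        "K": pct(A + F, B),  # joint recall
        "L": pct(A + F, D),  # joint precision
    }


print(figure1_columns(A=239, B=355, D=339, F=60, G=83, I=75))
# {'C': 67.32, 'E': 70.5, 'H': 72.29, 'J': 80.0, 'K': 84.23, 'L': 88.2}
```
                </p>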
                <p>In the second test, we again used VIE, this time on the English Wall Street Journal corpus. We used a version of VIE that was designed to detect NEs in a management succession domain, so this test measures the effect of a domain shift. Local performance was somewhat lower than in MUC-6. Again, our tagger gave a 9% improvement, and global performance was very high.</p>
                <p>The third test was the most demanding. Here, we used only half of the named entity gazetteer used in the previous experiments. The purpose of this test was also to verify the effect of a poorly populated gazetteer on performance. In this test, rather than using LASIE, we used a machine learning method described in Paliouras, Karkaletsis, and Spyropolous (1998). This method uses the available half of the gazetteer as a training set to learn a context-based decision list for NE classification.</p>
                <p>As shown in Test 3, column B, the initial number of PNs in the test corpus is now considerably higher. The decision-list classifier is tuned to classify with high precision and lower recall. Therefore, only the &quot;hardest&quot; cases are submitted to our untrained classifier. Accordingly, the local performance of our classifier is around 10% lower than in the previous tests. Global performance (in terms of joint precision and recall) nevertheless shows an improvement.</p>
                <p>Finally, we observe that the performance figures reported in Figure 1 say nothing about the various sources of errors. Errors and misses occur in two places. The first is the off-line learning phase: as we said, NE instances and syntactic contexts are not inspected for correctness, so the contextual knowledge base is error prone. The second is prior to the U_PN tagging phase, where a compound PN may be incompletely recognized during POS tagging. This produces an uninformative syntactic context. For example, &quot;Owens Illinois&quot; at the beginning of a sentence is recognized as &quot;owens Illinois&quot;, generating a spurious NdN(owen, Illinois) context.</p>
                <note n="8" place="below">Most location and company names known worldwide (e.g., New York, IBM) are in fact mentioned in economic journals regardless of the language.</note>
                <p>None of these &quot;external&quot; sources of noise was filtered out, yet the tagger still improved performance. We may therefore reliably conclude that our tagger is effective at improving the robustness of proper noun classification. Clearly, though, the amount of improvement depends upon the baseline performance of the early method used for PN classification.</p>
                <p>Although the classification evidence provided by syntactic contexts is somewhat noise prone, it proves useful as a &quot;backup&quot; when other, &quot;simpler&quot; contextual evidence does not allow a reliable decision.</p>
                <head xml:id="sec4.2">Effectiveness of Syntactic and Semantic Cues for Semantic Classification</head>
                <p>In a second experiment, we used the experimental setup of Test 2 (WSJ+VIE, described above) to evaluate the effect of context expansion on system performance. We applied a pruning method to WordNet (Cucchiarelli and Velardi 1998) to reduce the initial ambiguity of contexts. This pruning reduced the initial ambiguity of the 13,428 common nouns in the Wall Street Journal corpus by 27% on average. The objective of this experiment was to allow a more detailed evaluation of our method with respect to several parameters.</p>
                <p>We built four test sets with the same distribution of PN categories and frequencies as the application corpus. We selected four frequency ranges (1, 2, 3-9, ≥10). In each range we selected 100 PNs, reflecting the corpus distribution of the three main PN semantic categories: Person, Organization, and Location. We then built another test set, called TSAll, with 400 PNs, again reflecting the frequency and category distribution of the corpus. The 400 PNs were then removed from the set of 37,018 esls extracted by our parser and from the gazetteer (whenever included).</p>
                <p>In this experiment, we wanted to measure the performance of the U_PN tagger over the 400 words in the test set, in terms of F-measure, according to several varying factors:</p>
                <p>• the category type;</p>
                <p>• the amount of initial contextual evidence (i.e., the frequency range, reflected by the different test sets);</p>
                <p>• the factors α and β, i.e., the influence of local and generalized contexts;</p>
                <p>• the level of generalization L.</p>
                <p>Figure 2 summarizes the results of the experiment. Figure 2(a) shows the increase in performance as a function of the values of α and β and of the generalization level. The levels are defined as follows:</p>
                <p>• N means no generalization: only the evidence provided by ESLA is computed.</p>
                <p>• 0 means that ESLB collects the evidence provided by contexts in which w is a strict synonym of x according to WordNet.</p>
                <p>• 1, 2, and 3 refer to incremental levels of generalization in the (pruned) WordNet hierarchy.</p>
                <p>The figure shows that context generalization improves performance by up to 7%. The best results are obtained with L = 2, α = 0.7, and β = 0.3. Further generalization may cause a drop in performance. This is due to high ambiguity, which persists despite WordNet pruning. Without pruning, we observed a performance inversion already at level 1; we do not report that experiment for reasons of space.</p>
                <p>Figure 2(b) illustrates the influence of initial contextual evidence. Recognition of singleton PNs remains almost constant as the contribution of generalized and nongeneralized contexts varies. Looking in more detail, we observe that recall increases with β (= 1 - α), but precision decreases. Generalization on the basis of a single context does not allow any filtering of spurious senses. When several contexts are grouped, spurious senses gain lower evidence (as anticipated in Section 3).</p>
                <p>[Figure 2, panels (a) and (b); x-axis of panel (a): Level of Generalization]</p>
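                <p>The paper reports performance as an F-measure but does not state the β of the F-measure (unrelated to the β weighting ESLB) or how undecided PNs are counted. The sketch below assumes the standard balanced F1. Precision is computed over the PNs for which the tagger reached a decision, and recall over all PNs in the test set. The function name is hypothetical. For illustration it uses the Test 1 U_PN counts from Figure 1 (F = 60, I = 75, G = 83), since raw counts for the Section 4.2 test sets are not given.</p>
                <p>
```python
def f_measure(correct, decided, total, beta=1.0):
    """F-measure of the U_PN tagger on a test set.

    correct: PNs tagged with the right category
    decided: PNs for which the tagger produced a category
    total:   PNs in the test set (e.g. 400)"""
    precision = correct / decided if decided else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(round(f_measure(60, 75, 83), 4))  # 0.7595
```
                </p>
                <p>With β = 1 this reduces to 2 * correct / (decided + total), here 120/158.</p>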
                <p>Finally, we designed an experiment to evaluate the influence of the test set composition on the performance of the U_PN tagger. We performed an analysis of variance (ANOVA test; Hoel 1971) on the results obtained by processing nine different test sets of 400 PNs each, selected randomly. In all our experiments (the details of which we omit for lack of space), the performance of the U_PN tagging method was independent of the variations of the test set.</p>
                <p>Figure 2. Evaluation of the effectiveness of context expansion.</p>
                <p>References</p>
                <p>Agirre, Eneko and German Rigau. 1996. Word sense disambiguation using conceptual density. In Proceedings of the 16th International Conference on Computational Linguistics (COLING '96), Copenhagen, Denmark.</p>
                <p>Basili, Roberto, Alessandro Marziali, and Maria Teresa Pazienza. 1994. Modelling syntax uncertainty in lexical acquisition from texts. Journal of Quantitative Linguistics, 1(1).</p>
                <p>Basili, Roberto, Maria Teresa Pazienza, and Paola Velardi. 1994. A (not-so) shallow parser for collocational analysis. In Proceedings of the 15th International Conference on Computational Linguistics (COLING '94), Kyoto, Japan.</p>
                <p>Brill, Erik and Philip Resnik. 1994. A transformation-based approach to prepositional phrase attachment disambiguation. In Proceedings of the 15th International Conference on Computational Linguistics (COLING '94), Kyoto, Japan.</p>
                <p>Cucchiarelli, Alessandro, Danilo Luzi, and Paola Velardi. 1998. Automatic semantic tagging of unknown proper names. In COLING-ACL '98: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada.</p>
                <p>Cucchiarelli, Alessandro and Paola Velardi. 1998. Finding a domain-appropriate sense inventory for semantically tagging a corpus. International Journal on Natural Language Engineering, December.</p>
                <p>Gale, William, Kenneth Church, and David Yarowsky. 1992. One sense per discourse. In Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, NY.</p>
                <p>Grishman, Ralph and John Sterling. 1994. Generalizing automatically generated selectional patterns. In Proceedings of the 15th International Conference on Computational Linguistics (COLING '94), Kyoto, Japan.</p>
                <p>Hearst, Marti and Hinrich Schuetze. 1993. Customizing a lexicon to better suit a computational task. In Proceedings of the ACL-SIGLEX Workshop on Lexical Acquisition from Text, Columbus, OH.</p>
                <p>Hoel, Paul Gerhard. 1971. Introduction to Mathematical Statistics. John Wiley &amp; Sons Inc., New York.</p>
                <p>Humphreys, Kevin, Robert Gaizauskas, Hamish Cunningham, and Sheila Azzan. 1996. Technical Specifications, 1996/10/1815. ILASH, University of Sheffield, UK.</p>
                <p>Paliouras, George, Vangelis Karkaletsis, and Constantine Spyropolous. 1998. Results from the named entity recognition task. In Deliverable 3.2.1 of the European project ECRAN LE 2110. Available at: http://www2.echo.lu/langeng/en/lel/ecran/ecran.html.</p>
                <p>Resnik, Philip. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL Workshop Tagging Text with Lexical Semantics: Why, What, and How? Washington, DC.</p>
                <p>Velardi, Paola and Alessandro Cucchiarelli. 2000. A theoretical analysis of contextual-based learning algorithms for word sense disambiguation. In Proceedings of ECAI 2000, Berlin, Germany. (To appear.)</p>
                <p>Yarowsky, David. 1992. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING '92), Nantes, France.</p>
            </div1>
        </body>
        <back/>
    </text>
</TEI>
