<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1"/><title>TOPIC IDENTIFICATION TECHNIQUES FOR PREDICTIVE LANGUAGE ANALYSERS</title><author surname="Tait" givenname="J.I."></author></firstpageheader><frontmatter><p><b>TOPIC IDENTIFICATION TECHNIQUES FOR PREDICTIVE LANGUAGE ANALYSERS</b></p><p><b>J.I. Tait</b></p></frontmatter><abstract><b>University of Cambridge Computer Laboratory, Corn Exchange St., Cambridge CB2 3QG, England.</b> </abstract></header><body><section number="1." title="Introduction"><p><b>The use of <u>prediction</u> as the basis for inferential analysis mechanisms for natural language has become increasingly popular in recent years. Examples of systems which use prediction are FRUMP (DeJong 79) and SAM (Schank 75a). The property of interest here is that their basic mode of working is to determine whether an input text follows one of the system's pre-specified patterns; in other words they predict, to some extent, the form their input texts will take. A crucial problem for such systems is the selection of suitable sets of predictions, or patterns, to be applied to any particular text, and it is this problem I want to address in this paper.</b></p><p><b>I will assume that the predictions are organised into bundles according to the topic of the texts to which they apply. This is a generalisation of the script idea employed by (DeJong 79) and (Schank 75a). I will call such bundles <u>stereotypes</u>.</b></p><p><b>The basis of the technique described here is a distinction between the process of <u>suggesting</u> possible topics of a section of text and the process of <u>eliminating</u> candidate topics (and associated predictions) which are not, in fact, appropriate for the text section. 
Those candidates which are not eliminated are then <u>identified</u> as the topics of the text</b> <b>section.</b><page local="2"/><b> (There may be only one such candidate.) This approach allows the use of algorithms for suggesting possible topics which try to ensure that, if the system possesses a suitable stereotype for a text section, it is activated, even at the expense of activating large numbers of irrelevant stereotypes.</b></p><p><b>This technique has been tested in a computer system called Scrabble.</b></p></section><section number="2." title="Suggesting Candidate Topics"><p><b>The discovery of candidate topics for a text segment is driven by the association of a set of patterns of semantic primitives with each stereotype. (For the purposes of this paper it is assumed that the system has access to a lexicon containing entries whose semantic component is something like that used by (Wilks 77).) As a word is input to the system, the senses of the word are examined to determine if any of them have a semantic description which contains a pattern associated with any of the system's stereotypes. If any do contain such a pattern, the corresponding stereotypes are loaded into the active workspace of the system, unless they are already active.</b></p></section><section number="3." title="Eliminating Irrelevant Candidates"><p><b>In parallel with the suggestion process, the predictions of each stereotype in the active workspace are compared with the text. In Scrabble, the sentences of the text are first parsed into a variant of Conceptual Dependency (CD) representation (Schank 75b) by a program described in (Cater 80). The semantic representation scheme has been extended to include nominal descriptions similar in power to those used by (Wilks 77). 
The predictions are compared with the CD representation structures at the end of each sentence; however, nothing in the scheme described in this paper prevents its application to a</b> <b>system which integrates the process of parsing with that of determining whether or not a fragment of the text satisfies some prediction, as is done in (DeJong </b><b>79).</b><page local="3"/><b></b></p><p><b>It is likely that stereotypes which are not relevant to the topic of the current text segment will have been loaded as a result of the suggestion process. Since the cost of comparing a prediction with the CD-representation of a sentence of the text is not trivial, it is important that irrelevant stereotypes are removed from the active workspace as rapidly as possible. The primary algorithm used by Scrabble removes any stereotype which has failed to predict more of the propositions in the incoming text than it has successfully predicted. This simple algorithm has proved adequate in tests, and its simplicity also ensures that the cost of removing irrelevant stereotypes is minimised.</b></p><p><b>Further processing is subsequently done to separate stereotypes which were never appropriate for the text from stereotypes which were useful for the analysis of some part of the text, but are no longer useful.</b></p><p>4. <b>An Example.</b></p><p><b>Consider the following short text, adapted from (Charniak 78):</b> <b>Jack picked a can of tuna off the shelf.</b><b> He put it in his basket. He paid for it and went home.</b></p><p><b>Assume that associated with the primitive pattern for food the system has stereotypes for eating in a restaurant, shopping at a supermarket, and preparing a meal in the kitchen. The lexicon entry for tuna (a large sea fish which is caught for food) will contain this pattern, and this will cause the loading of the above three stereotypes into the active workspace. 
The restaurant stereotype will not predict the first sentence, and so will immediately be unloaded. Both the supermarket and kitchen stereotypes expect sentences like</b> <b>the first in the text.</b><page local="4"/><b> When the second sentence is read, the supermarket stereotype will be expecting it (since it expects purchases to be put into baskets), but the kitchen stereotype will not. However, the kitchen stereotype will not be unloaded since, so far, it has predicted as many propositions as it has failed to predict. When the third sentence is read, again the supermarket stereotype has predicted propositions of this form, but the kitchen stereotype has not. Therefore the kitchen stereotype is removed from the active workspace, and the topic of the text is firmly identified as a visit to the supermarket.</b></p><p><b>It should be noted that a completely realistic system would have to perform much more complex processing to analyse the above example. In such a system additional stereotypes would probably be activated by the occurrence of the primitive pattern for food, and it is likely that yet more stereotypes would be activated by different primitive patterns in the lexicon entries for the words in the input text.</b></p></section><section number="5." title="Conclusions"><p><b>The technique described in this paper for the identification of the topic of a text section has a number of advantages over previous schemes. First, its use of information which will probably already be stored in the natural language processing system's lexicon has obvious advantages over schemes which require large, separate data-structures purely for topic identification, as well as for making the predictions associated with a topic. 
In practice, Scrabble uses a slightly doctored lexicon to improve efficiency, but the necessary work could be done by an automatic preprocessing of the lexicon.</b></p><p><b>Second, the scheme described here can make use of nominals which suggest a candidate topic, and associated stereotypes, without complex manipulation of semantic information which is not useful for this purpose. The scheme of</b> <b>(DeJong 79), for example, would perform complex operations on semantic representations associated with "pick" before it processed the more useful word "tuna" if it processed the above example text.</b><page local="5"/></p><p><b>Third, the use of semantic primitive patterns has greater generality than techniques which set up direct links between words and bundles of predictions, as appeared to be done in early versions of the SAM program (Schank 75a).</b></p><p><b>One final point. The technique for topic identification in this paper would not be practical either if it was very expensive to load stereotypes which turn out to be irrelevant, or if the cost of comparing the predictions of such stereotypes with the text representation was high. The Scrabble system, running under Cambridge LISP on an IBM 370/165, took 8770 milliseconds to analyse the example text above, of which 756 milliseconds was used by loading and activating the two irrelevant stereotypes and 103 milliseconds was spent comparing their predictions with the CD-representation of the text. The system design is such that these figures would not increase dramatically if more stereotypes were considered whilst processing the example.</b></p></section><section number="6." title="References"><p><b>(Cater 80) Cater, A.W.S. Analysing English Text: A Non-deterministic Approach with Limited Memory. AISB-80 Conference Proceedings. Society for the Study of Artificial Intelligence and the Simulation of Behaviour. 
July 1980. (Charniak 78)</b> <b>Charniak, E.</b><b> With Spoon in Hand this must be the Eating Frame. TINLAP-2. 1978. (DeJong 79)</b> <b>DeJong, G.</b><b>F. Skimming Stories in Real Time: an Experiment in Integrated Understanding. Research Report No. 158. Yale University Department of Computer Science,</b><page local="6"/></p><doubt alpha="63.0" length="46" tooSmall="False" monospace="0.0">New Haven, Connecticut. May 1979. (Schank 75a)</doubt><p><b>Schank, R.C. and the Yale A.I. Project. SAM -- A Story Understander. Research Report No. 43. Yale University Department of Computer Science, New Haven, Connecticut.</b></p><doubt alpha="41.2" length="17" tooSmall="False" monospace="0.0">1975 (Schank 75b)</doubt><p><b>Schank, R.C. Conceptual Information Processing. North-Holland, Amsterdam. 1975. (Wilks 77)</b> <b>Wilks, Y.</b><b>A. Good and Bad Arguments about Semantic Primitives. Communication and Cognition, 10. 1977.</b></p></section></body></article>