American Journal of Computational Linguistics Microfiche 55

THE FINITE STRING

NEWSLETTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS

ACL
New officers for 1977
Call for papers
Minutes, 1976 business meeting
Secretary-Treasurer's report
Financial report

Humanities - 3rd International Conference
Linguistic and Literary Analysis - 5th International
Graphics and Interactive Techniques - 4th Annual
Undergraduate Curricula and Computing Conference

Representation and Understanding, edited by Daniel G. Bobrow and Allan Collins. Reviewed by John Mylopoulos
The Role of Speech in Language, edited by James F. Kavanagh and James E. Cutting. Reviewed by Sieb Nooteboom

Algebraic Parsing of Context-Free Languages - Stephen F. Weiss and Donald F. Stanat
A Comparison of Term Value Measurements for Automatic Indexing - Gerard Salton
SNOPAR: A Grammar Testing System - T. P. Kehler

AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS is published by the Association for Computational Linguistics. EDITOR: David G. Hays, 5848 Lake Shore Road, Hamburg, New York 14075. EDITORIAL ASSISTANT: William Benson. SECRETARY-TREASURER: Donald E. Walker, Stanford Research Institute, Menlo Park, California 94025.

NEW OFFICERS FOR 1977

President
PAUL CHAPIN
National Science Foundation

Vice President
JONATHAN ALLEN
Massachusetts Institute of Technology

Secretary-Treasurer
DONALD E. WALKER
Stanford Research Institute

Executive Council
JERRY HOBBS
City University of New York

Continuing members of the Executive Committee are Bonnie Nash-Webber (through 1977) and Timothy C. Diller (through 1978). Continuing members of the Nominating Committee are William A. Woods, Jr. (1977) and Aravind K. Joshi (1978). The Editor is a member of the Executive Committee ex officio.

CALL FOR PAPERS

1-page abstract, with title but no name
Letter with author's name and paper title

DEADLINE
January 1, 1977
Members of ACL should have received prior notice of this deadline by letter.

ADDRESS
Jonathan Allen
Room 36-575
Massachusetts Institute of Technology
Cambridge 02139

The Georgetown University Round Table (on Linguistics and Anthropology) will be held immediately following the ACL meeting.

ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
MINUTES: 14th Annual ACL Business Meeting

Petrick announced that Hood Roberts, Secretary-Treasurer of the ACL for the past five years, has resigned from that post, coincident with his departure from the Center for Applied Linguistics to establish his own company. Don Walker has been appointed to the position on an interim basis for the remainder of the year. Petrick expressed the gratitude and appreciation of the Association for the dedication and service Roberts has provided during his tenure, a sentiment strongly supported by the members present.

Walker read the Secretary-Treasurer's Report and the Financial Report, both of which had been prepared by Roberts. Copies are attached to these Minutes. Membership renewals, billing practices, and the financial status of the Association were discussed. Petrick announced that John Moyne, as Chairman of the Membership Committee, was preparing a campaign to recruit new members.

Petrick reviewed the status of the AJCL in the absence of Dave Hays, its Editor. In discussing Hays' recent survey of the membership about the Journal, Petrick remarked on its quality and thoroughness, both in preparation and in the analysis of the results. Over 200 members responded, an unusually high percentage; they strongly supported continuing publication in microfiche form. There was also considerable interest expressed in having The Finite String available in hard copy and in making it possible to acquire full-size copies of certain articles. Petrick announced an Executive Committee decision, contingent on adequate financial support, that at least part of the contents of The Finite String would be issued in hard copy form, particularly those items of key importance and timely interest to the membership. The cost of making hard copies of articles available would be determined, and members would be notified accordingly.

Petrick announced that the Executive Committee had voted to increase the Editorial Board of the AJCL from 14 to 15 members and to establish three-year terms of office, with five new members to be appointed each year on a regular basis.

Petrick announced that the increases in expenses associated with the AJCL and with the preparation and distribution of a hardcopy newsletter required raising the dues $5 to a new total of $15 for individual memberships. A suggestion was made from the floor that a class of family membership be established that would allow reduced dues for one of two spouses, so that both could be members but only one copy of publications would be received.

Petrick announced that the Executive Committee had decided to publish all the information that could be gathered about recent events associated with Igor Mel'cuk. Mel'cuk was fired from a long-term position as Senior Research Fellow of the Institute of Linguistics in the Academy of Sciences of the USSR, ostensibly on the basis of a letter he had submitted to the New York Times. The letter, which was published in January, expressed disagreement with the criticisms of Andrei Sakharov made by the Soviet press. In March, Mel'cuk was fired; subsequently he prepared a letter describing the circumstances and asked that it be brought to the attention of American scientists. Questions had been raised about the appropriateness of publishing such correspondence on the grounds that it might hurt either Mel'cuk or the ACL or both. An extensive discussion from the floor indicated that a variety of positions were taken on the issue. Petrick assured the members that the Soviet position would be represented to the extent that information about it was available.

The 15th Annual Meeting of the ACL is being planned for Washington, D.C., in conjunction with the Georgetown University Round Table on Languages and Linguistics. Tentative dates are 15-16 March 1977.

Martin Kay reported briefly on COLING 76, the 6th International Conference on Computational Linguistics, which was held at the University of Ottawa, Ottawa, Canada, from 28 June to 3 July 1976. The next conference is scheduled for Varna, Bulgaria, in 1978.

Walker announced that the next International Joint Conference on Artificial Intelligence will be held 22-25 August 1977 at the Massachusetts Institute of Technology in Cambridge, Massachusetts.

Jane Robinson reported on Local Arrangements, with particular emphasis on the banquet scheduled for the evening, shortly after the conclusion of the Business Meeting.

Paul Chapin reported for the Program Committee. Of the 21 abstracts submitted, 14 were accepted; he expressed his appreciation to the Committee members for their assistance. His experience with publicity about the Call for Papers suggested that a check list be established to provide more effective notification.

Bob Barnes reported for the Nomination Committee that the following slate of officers had been proposed:

President: Paul Chapin, NSF
Vice President: Jonathan Allen, MIT
Secretary-Treasurer: Don Walker, SRI
Executive Committee: Jerry Hobbs, CUNY
Nominating Committee: Stan Petrick, IBM

A motion that the slate be accepted unanimously was carried.

Bonnie Nash-Webber expressed the appropriate sentiments in the form of a Resolutions Committee Report.

The microfiche question was raised again, and Petrick reviewed the results of the questionnaire, the decision to provide newsletter information in hard copy form, and the provision of hard copies of selected articles at cost. The meeting adjourned.

Donald E. Walker
Secretary-Treasurer, Pro-tem

SECRETARY-TREASURER'S REPORT

I am always sorry on those rare occasions when I cannot attend the annual meeting of ACL; my Angst is even greater now that this meeting is my last one as your secretary-treasurer. My annual report to you typically consists of statements about membership and finances. This will be a typical report. When the new journal was first issued in 1974 there was a dramatic increase in the number of ACL members--from under 100 in 1973, to over 800 by early 1975. Since then these impressive gains have been so seriously eroded that our current membership stands at 580 (445 individuals, 135 institutions). A total of 212 individuals and 46 institutions who had paid for 1975 did not renew for 1976, although each week several renewals continue to dribble in. Several reasons might be thought of for the decline:

The heavy promotional activities at the beginning brought in some members who really weren't as interested in computational linguistics as they may have thought they were.

Some members do not like microfiches.

The recently established method of billing members for their annual dues (including the dues notice on one of the opaque cards in the journal), which was conceived as an economy measure, clearly failed to produce results, and I would urge--pragmatically--that this method never be tried again.

It is hoped that the newly reactivated Membership Committee, under the chairmanship of John Moyne, will be able to devise creative answers to this chronic problem.

In an organization such as ours, where the association is almost entirely dependent on the payment of annual dues, even a slight drop in membership causes serious problems. This year's financial situation was further exacerbated by three additional things:

1. The continued, unrealistically low dues rate of $10, for which members are receiving nearly 2,000 microfiche pages yearly. [This problem was not unexpected and was discussed at the last Annual Meeting in Boston, and there were good reasons for leaving dues at $10 until such time as the ACL decided what to do about the journal.]

[...] the $1,130.00 for which were generously [...]

3. Unexpected charges -- primarily refreshments (coffee and pastries) provided by the Sheraton Hotel at the last annual meeting. Indeed, I now believe that the Sheraton chain is owned by [...]

The customary categorized financial statement is given below. Although the statement reflects ACL's income and expenses, some adjustments within these figures will be made later, pending a detailed allocation of the costs incurred in and income derived from the TINLAP volumes.

Respectfully submitted,
A. Hood Roberts

ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
Financial Report for 1976

Receipts:
Membership dues, 1975-1976
ACL 76 meeting receipts to date
14,353.37
Total $1[?],818.61

Disbursements:
Administrative costs, office supplies, mailing, and AJCL costs not covered by CAL account 317
Membership: ACL AFIPS dues, 1976
Annual meeting costs, 1975-1976
Paid out of ACL membership receipts into CAL Account 317 for AJCL, as required by NSF
Balance as of October 10, 1976
$15,497.82

THIRD INTERNATIONAL CONFERENCE ON COMPUTING IN THE HUMANITIES

SPONSORED BY THE UNIVERSITIES OF MONTREAL AND WATERLOO

THEMES
Frontiers between language and literature; Fine arts; Graphics; Historical studies; Information retrieval;

Input techniques; Lexicography; Literary stylistics; Medieval studies; Music; Photocomposition; Public service systems; Semantics

INTERNATIONAL COMMITTEE
F. V. Spechtler, Austria; J. R. Allen, Canada; A. Jones, England; I. T. Piirainen, Finland; L. Fossier, France; W. Lenders, Germany; M. L. Alinei, Holland; S. C. Loh, Hong Kong; F. Papp, Hungary; B. Jónsson, Iceland; S. K. Havanur, India; U. Ornan, Israel; L. F. Lara, Mexico; K. Hyldgaard-Jensen, Sweden; J. Joyce, USA; J. Raben, USA.

REGISTRATION
Professor J. S. North

Chairman, ICCH3

Department of English

University of Waterloo

Waterloo, Ontario, Canada

N2L 3G1

FIFTH INTERNATIONAL SYMPOSIUM ON THE USE OF COMPUTERS IN LINGUISTIC AND LITERARY ANALYSIS

UNIVERSITY OF ASTON IN BIRMINGHAM

Authorship studies; Syntactic analysis; Concordances; Text editing; Classical studies; Input-output; Oriental studies; Software; Stylistic analysis; Language-oriented groups; Education; Lexicography; Literary statistics

ADDRESS FOR CORRESPONDENCE
Professor D. E. Ager
CLLR
Department of Modern Languages
University of Aston in Birmingham
Gosta Green
Birmingham
England B4 7ET

FOURTH ANNUAL CONFERENCE ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES

CALL FOR PAPERS

TOPICS
Graphical theory and techniques such as languages, hardware, software, tools, portability, standards, device independence, line graphics, raster graphics, data structures, satellite systems, human factors; applications in the areas of environmental, urban, transportation, cartography, biomedicine, animation, computer aided design, art, music, business, statistics, recreational graphics, decision making, and computer graphics education. Papers may report original work, unusual or unique applications or techniques of computer graphics, or they may evaluate graphical specifics.

DEADLINE
A short abstract is requested by December 1, 1976, and the final paper must be submitted by May 2, 1977.

PROGRAM CHAIRMAN
James E. George
415-447-1100 Ext. 3360
Los Alamos Scientific Laboratory
P. O. Box 1663, MS 272
Los Alamos, New Mexico 87545

CALL FOR PAPERS: 1977 CONFERENCE ON COMPUTERS IN THE UNDERGRADUATE CURRICULA

MICHIGAN STATE UNIVERSITY, EAST LANSING

SUBSTANCE
Reports of actual experience with computer use in a specific course or sequence of courses, in any field except computer science. No proposals; no repetition of previous reports without substantial new results. Survey papers only with synthesis or thorough evaluation.

FORMAT
Original manuscript suitable for reproduction in the proceedings. Typed, double spaced, up to 15 pages. 8"x10" pictorial matter, glossy B&W photographs or photographable drawings. Title page: authors' names, complete mailing address, telephone numbers; if multiple, indicate which handles correspondence and will deliver the talk. Each page should have the principal author's name on it.

DEADLINE [...]

ADDRESS
Gerald L. Engel, Virginia Institute of Marine Science, Gloucester Point, Virginia 23062.

TRAVEL GRANTS
A limited number of partial travel and subsistence grants may be available to speakers and others from minority institutions and small colleges. Information and applications from CCUC/8 Travel Grant Committee, Eppley Center, East Lansing 48824.

REPRESENTATION AND UNDERSTANDING
STUDIES IN COGNITIVE SCIENCE

EDITED BY DANIEL G. BOBROW AND ALLAN COLLINS
Xerox Palo Alto Research Center and Bolt Beranek and Newman

Academic Press, Inc., New York
LC 75-21630, $15.00, ISBN 0-12-108550-3

REVIEWED BY JOHN MYLOPOULOS
Department of Computer Science
University of Toronto
Toronto, Ontario, Canada M5S 1A7

A major goal of Artificial Intelligence research today is to design systems that "understand" a body of knowledge, i.e. use it whenever appropriate. The representation of the knowledge available to such an "understander" system is an important issue for the system's design and is intimately related to the proposed uses of that knowledge. This book includes a collection of thirteen papers written by some of the best known researchers who are currently working on understander systems. The papers were selected among those presented at a conference held in memory of Jaime Carbonell.

The contents of the book are as follows:

I. Theory of Representation

1. Dimensions of Representation
Daniel G. Bobrow
2. What's in a Link: Foundations for Semantic Networks
William A. Woods
3. Reflections on the Formal Description of Behavior
Joseph D. Becker
4. Systematic Understanding: Synthesis, Analysis, and Contingent Knowledge in Specialized Understanding Systems
Robert J. Bobrow & John Seely Brown

II. New Memory Models

5. Some Principles of Memory Schemata
Daniel G. Bobrow & Donald A. Norman
6. A Frame for Frames: Representing Knowledge for Recognition
Benjamin J. Kuipers
7. Frame Representations and the Declarative-Procedural Controversy
Terry Winograd

III. Higher Level Structures

8. Notes on a Schema for Stories
David E. Rumelhart
9. The Structure of Episodes in Memory
Roger C. Schank
10. Concepts for Representing Mundane Reality in Plans
Robert P. Abelson

IV. Semantic Knowledge in Understander Systems

11. Multiple Representations of Knowledge for Tutorial Reasoning
John Seely Brown & Richard R. Burton
12. The Role of Semantics in Automatic Speech Understanding
Bonnie Nash-Webber
13. Reasoning From Incomplete Knowledge
Allan Collins, Eleanor H. Warnock, Nelleke Aiello, Mark L. Miller

As stated in the book's introduction, the section on "Theory of Representation" deals with general issues regarding representation of knowledge, while that on "New Memory Models" discusses the implications of the assumption that input information is always interpreted in terms of large structural units derived from experience. The section titled "Higher Level Structures" focuses on the representation of plans, episodes and stories within memory. Finally, the section on "Semantic Knowledge in Understander Systems" describes on-going work of the SOPHIE, SPEECHLIS and SCHOLAR projects at BBN.

In attempting to review the papers that appear in this book collectively rather than individually, we arrived at a slightly [...]

[...] for declarative, vs. flexible interaction among different facts, for procedural. Schank's paper [9] includes a discussion on whether the organization of human memory is episodic or semantic. An episodic memory organization implies that knowledge is stored as temporally dated episodes and events, with temporal-spatial relations linking these events. A semantic memory organization, on the other hand, involves time-invariant knowledge a person possesses, e.g., "all elephants are animals". A corollary of these definitions is that an episodic memory organization favours temporal and causal connectives (e.g., THEN, REASON, ENABLE etc.), whereas a semantic memory organization uses extensively the "ISA hierarchy" (e.g., "an elephant is-a animal"). The discussion presented in the paper on this issue is somewhat confusing since at one point (pp. 255-256) the two types of organization are contrasted as if they were mutually exclusive, while later on (p. 263) the paper argues for a combination of the notions of semantic and episodic memory. In either case, Schank's work certainly makes a convincing argument in favor of an episodic memory organization by showing how it can be used to represent the meaning of a paragraph.

II. Critique and Extensions of Representation of Knowledge Paradigms

Several papers, including some that were mentioned in the previous section, criticize, refine, or extend one of the existing paradigms for the representation of knowledge. The most notable example among those in this category is Woods' paper [2], which criticizes many (mis)uses of semantic networks by pointing out situations where their semantics are poorly defined or inconsistent. Particular attention is paid to the representation of quantification and that of relative clauses. As many of the readers undoubtedly know, Minsky's influential paper introducing "frames" [15] provides more of an ideology than a theory for representing knowledge. Kuipers in [6] argues in favor of a number of properties frames should have, such as the ability to describe an object or situation to varying degrees of detail, the ability to be instantiated and the ability to handle small perturbations of expected input data without major failures. He illustrates the desirability of these features with a simple example of object recognition.

The second half of Winograd's paper makes an attempt to synthesize declarative and procedural aspects of a representation. His proposal is based on frames and uses a generalization (ISA) hierarchy having a number of features, including the ability to associate procedures to objects on the hierarchy which specify how to perform different operations on those objects. Many of the ideas in [5] and [7] have been incorporated in KRL [16], as developed by D. Bobrow and Winograd.

III. Representing Different Kinds of Knowledge

Information entering an understander system may have many different "forms", i.e. it may be coded as photographs or line drawings, simple sentences or paragraphs or even complete stories. Moreover, it may have different "content", i.e. involve a fairy tale world of kings and dragons, a blocks world of cubes and pyramids, a social, mental or physical world. One important aspect of the representation problem is the definition of a collection of knowledge, defined by a restriction on its form and/or content, and the investigation of the adequacy of a particular representation. As mentioned earlier, Woods' paper does discuss the representation of quantification in terms of semantic networks, where the form of the knowledge involved is presumably (first order) Predicate Calculus and the content is unconstrained. It also discusses the representation of relative clauses and complex sentences where the form is natural language and the content is, again, unconstrained.

Rumelhart's paper [8] is primarily concerned with the discovery of structure underlying simple stories. The structure is defined in terms of a phrase structure grammar with semantic rules associated to each production. The paper certainly follows the general trend towards studying linguistic units larger than sentences, such as paragraphs, dialogues or stories. Whether the methodology used (in particular, phrase structure grammars) will be found adequate for the description of structure in stories remains to be seen.
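To make concrete what a phrase structure grammar with a semantic rule attached to each production might look like, a minimal Python sketch follows. The categories (Story, Setting, Episode, Event, Reaction), the relation names (ALLOW, INITIATE) and the sample story are assumptions chosen for illustration only; they are not taken from this review and should not be read as the grammar of [8].

```python
# Each production pairs a syntactic expansion with a semantic rule that
# builds a relational structure from the meanings of the constituents.
# All names below are hypothetical and purely illustrative.
PRODUCTIONS = {
    "Story": (("Setting", "Episode"), lambda s, e: ("ALLOW", s, e)),
    "Episode": (("Event", "Reaction"), lambda ev, r: ("INITIATE", ev, r)),
}


def label(tree):
    """Return the category of a subtree, or None for a leaf string."""
    return tree[0] if isinstance(tree, tuple) else None


def interpret(tree):
    """Compute the semantic structure of a parse tree bottom-up.

    A tree is either a string (a leaf proposition) or a tuple
    (category, child, child, ...).
    """
    if isinstance(tree, str):
        return tree
    category, *children = tree
    if category not in PRODUCTIONS:
        # Preterminal category: it must dominate exactly one constituent.
        if len(children) != 1:
            raise ValueError(f"no production for {category}")
        return interpret(children[0])
    expected, semantic_rule = PRODUCTIONS[category]
    found = tuple(label(child) for child in children)
    if found != expected:
        raise ValueError(f"{category} expects {expected}, found {found}")
    return semantic_rule(*(interpret(child) for child in children))


story = (
    "Story",
    ("Setting", "a farmer owned a stubborn donkey"),
    (
        "Episode",
        ("Event", "the donkey refused to enter the shed"),
        ("Reaction", "the farmer pulled the donkey"),
    ),
)
print(interpret(story))
```

The syntactic side checks that a constituent expands as its production requires; the semantic side then assembles a nested relational structure, here a setting that ALLOWs an episode in which an event INITIATEs a reaction.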

Schank [9] deals mainly with the problem of constructing a structure of causally-linked actions and changes of states (episodes) from a paragraph. When episodes are used to make sense of new inputs in often-experienced situations, they are called "scripts". The paper ends with a brief introduction of scripts. More details about them can be found in more recent publications by Schank and his students, e.g. [17,18]. Rumelhart's and Schank's work are related in that they both attempt to define the structure of a collection of knowledge limited with respect to form (stories for Rumelhart, paragraphs for Schank) and unconstrained with respect to content. Moreover, both papers agree that the underlying representation used must involve causally-linked events, and the causal connectives they employ are similar.

Abelson's paper is concerned with the representation of "mundane reality" involving social actions. The approach he follows is to postulate a number of primitive states and actions for achieving these states, in terms of which hopefully all simple social behaviour can be described. The discussion of the primitives is quite thorough, but the examples given do not provide sufficient evidence that the primitives proposed are in fact descriptively adequate. Abelson's work is complementary to Schank's in several respects and there is more recent joint work on the subject [19].

IV. On-going Projects Involving Understander Systems

The last three papers of the book discuss particular projects involving the design and implementation of understander systems. [11] describes the scope, basic methodology, and achievements of SOPHIE, a knowledge-based computer aided instruction (CAI) system [...] by asking questions, answering questions and letting him try out his ideas. Of particular interest to computational linguists should be the section describing the "semantic grammar" developed by Burton to handle the types of sentences expected during a dialogue on electronic circuits. Nash-Webber [12] provides an overview of the BBN SPEECHLIS project in the context of a discussion on the use of semantic knowledge for speech understanding. Finally, [13] discusses some of the inference rules implemented or being considered for implementation by the SCHOLAR project, whose aim is to develop a knowledge-based CAI system that teaches geography. The reader may find many of the rules stated in the paper completely reasonable and yet quite shaky from a logical point of view. For example, one rule (the uniqueness assumption) states that if only one thing is found, it can be assumed that it constitutes a complete set. Thus if someone knows of only one city called "Springfield" and located in Massachusetts, he can use the uniqueness assumption to reply "no" to "Is Springfield in Kentucky?" even though there may well be such a city.
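The uniqueness assumption, and the reason it is logically shaky, can be sketched in a few lines of Python. The fact table, the function name and the three-valued answer are hypothetical illustrations; nothing here describes how SCHOLAR is actually implemented.

```python
# Toy knowledge base of (city, state) facts the system happens to know.
# The entries are illustrative assumptions, not data from SCHOLAR.
KNOWN_CITIES = {
    ("Springfield", "Massachusetts"),
    ("Boston", "Massachusetts"),
}


def is_city_in(name, state, facts=KNOWN_CITIES, assume_unique=True):
    """Answer "yes", "no" or "unknown" to the question "Is <name> in <state>?"."""
    if (name, state) in facts:
        return "yes"
    known_states = {s for (n, s) in facts if n == name}
    if assume_unique and len(known_states) == 1:
        # Uniqueness assumption: exactly one instance is known, so the
        # known set is treated as complete and the question is denied.
        return "no"
    # Without the assumption, absence of a fact proves nothing.
    return "unknown"


print(is_city_in("Springfield", "Kentucky"))
print(is_city_in("Springfield", "Kentucky", assume_unique=False))
```

With the assumption enabled the question about Kentucky is answered "no"; with it disabled the answer is "unknown", which is the logically safer reply. The sketch makes the trade-off visible: the rule buys a definite answer at the price of being wrong whenever the knowledge base is incomplete.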

The papers in this section constitute an important complement to the rest of the book, which often involves discussions that are too far removed from the reality of implemented (or implementable) [...]

Overall, this book provides an excellent review of the state of the art, circa 1975, on the problem of representing knowledge. It should be apparent from the previous discussion that the book assumes a familiarity with basic issues of representation and understander system design. For more introductory discussions, the reader is referred to [14] or Schank and Colby [20].

References

14. Winograd, T. "Five Lectures on Artificial Intelligence", Stanford AI-Memo 246, September 1974.
15. Minsky, M. "A Framework for Representing Knowledge" in Winston, P. (Ed.) The Psychology of Computer Vision, McGraw Hill, 1975.
16. Bobrow, D. and Winograd, T. "A KRL User's Manual" (unpublished).
17. Schank, R. "Using Knowledge to Understand", TINLAP Proceedings, pp. 117-121, June 1975.
18. Schank, R. and the Yale AI Group "SAM -- a Story Understander", Yale University, Dept. of Computer Science, August 1975.
19. Schank, R. and Abelson, R. "Scripts, Plans and Knowledge", Proceedings IJCAI, pp. 151-157, September 1975.
20. Schank, R. and Colby, K. (Eds.) Computer Models of Thought and Language, Freeman, 1973.

THE ROLE OF SPEECH IN LANGUAGE

EDITED BY JAMES F. KAVANAGH (GROWTH AND DEVELOPMENT BRANCH, NATIONAL INSTITUTE OF CHILD HEALTH AND HUMAN DEVELOPMENT) AND JAMES E. CUTTING (DEPARTMENT OF PSYCHOLOGY, WESLEYAN UNIVERSITY)

The MIT Press, Cambridge, Massachusetts 02139
1975, xiv + 335 pages, $15.00, ISBN 0-262-11059-8

REVIEWED BY SIEB NOOTEBOOM

Instituut voor Perceptie Onderzoek
Postbus 513, den Dolech 2
Eindhoven 4502

The book under review contains the proceedings of a small conference (22 participants) with the same title, held in October 1973 at the Urban Life Center, Columbia, Maryland. The conference was one in a series called "Communicating by Language", sponsored by the National Institute of Child Health and Human Development (NICHD). There are 19 papers, divided into 3 major sections, viz.

I The development of speech in man and child

II Language without speech (dealing with sign language)

III Phonology and language

Some papers are followed by comments of one of the participants; each paper or coherent group of papers is followed by a summary of the open discussion. A separate IVth section of the book contains reflections on the conference by Ira J. Hirsh. References are presented at the end of each paper. The editors have provided a name index and a subject index at the end of the book.

Many linguists and psycholinguists take it for granted that language can be studied without studying speech. Likewise many speech researchers seem to work from the view that the production and perception of speech can be studied without studying language. This situation leads Alvin Liberman to state in his "Introduction to the conference" that "our topic --the role of speech in language--is not an established one; no one has made it the direct and primary object of his research." Although this statement is perhaps too categorical, it certainly is valid for most of the field. (An obvious exception, to my mind, is, among others, Professor Lindblom of the University of Stockholm, who systematically explores the explanatory value of quantitative models of speech production and perception in phonology, e.g. Lindblom 1972, 1975.) The organizers of the conference, Kavanagh and Liberman, have taken care to select well-known researchers with different backgrounds and different interests to discuss the various problems which may be derived from the central question: "do we increase our understanding of language when we take into account that it is spoken?"

The resulting texts make interesting reading, although one will look in vain for a convincing answer to the initial question. Different investigators have different opinions and the present state of knowledge does not seem to make it possible to settle the matter. In most papers specialist knowledge is freely intermixed with speculation, and it is not always easy to tell the one from the other. The discussions generally serve more to continue speculation than to criticize in detail each other's thinking. These remarks are not meant as a criticism of the conference and its proceedings. They are intended to give an indication, however, of the style of this book, and a warning that one will not find here a thorough discussion of empirical data or explicit, testable theories that could be of use in more practically oriented work. Instead one finds a number of inspiring expositions of such diverse topics as similarities and dissimilarities between human and animal communication systems, the evolutionary connections between language, speech, and tool-making, the primacy of production or perception in the phylogenesis and the ontogenesis of speech, the primacy of signs or speech in the evolution of language, the articulate structure of signs in those who have sign language as their first language, the origins of phonological change, and the parallels in phonological and other linguistic organization of language.

Below I will make a few remarks on a few selected topics:

a) The evolution of speech and language

b) Spoken language and sign language

c) Innate feature detectors

d) The absence of prosody

I will not attempt to cover in this review all papers in the book.

A. THE EVOLUTION OF SPEECH AND LANGUAGE

In a number of places in this volume attempts are made to relate results of recent empirical studies of several kinds to theoretical ideas on the evolution of speech and language in early man. Thus Peter Marler gives an interesting description of communication systems in nonhuman primates and birds. His data on monkeys show a difference between discrete signal systems, consisting of a limited number of acoustically well-distinguished sound signals, used by monkeys living in dense forests and having little visual contact, and graded signal systems displaying continuous variation of sound signals, used by terrestrial monkeys. The bird data on the white-crowned sparrow lead him to the concept of an innate auditory template for bird song, modifiable by a suitable external model and serving for the development of vocal behavior. In his speculations on the origin of speech Marler emphasizes the importance of the evolution of innate but modifiable auditory templates for speech sounds, serving to distinguish between acceptable and nonacceptable models for vocal development, for classifying acceptable sounds into subcategories and for developing speech. He also assumes that, while categorical processing was developed as an aid in identifying sounds from memory, continuous sensory processing of sounds was retained, thus leading to an intermingling of categorical and noncategorical (discrete and graded) processing. He finally suggests

that "The substitution of categorical for continuous processing of speech sounds may have directly facilitated the introduction of syntax as a radical innovation in primate communication". There appear to be two basic assumptions underlying Marler's reasoning. One is that comparative studies of sensory and vocal behavior in animals and man may lead to interesting theories about specific properties of the human brain underlying man's capacity for speech and language. The other is that such studies may clarify the order in which postulated changes in vocal perception and development might have occurred in the evolution of early man. There is an important difference between these two assumptions. Whereas the former may lead to theories or hypotheses which in principle might become testable, the latter does not, at least not within the limits of this reviewer's imagination. Obviously this lack of testability is common to many speculations about the evolution of human behavior. This has in the past not kept scientists from making reasonable guesses, particularly about the evolution of language and speech, and probably will not do so in the future.

In this volume both Hewes, in his comments on Mattingly's paper, and Liberman, in his own contribution, relate the genesis of language to toolmaking. Hewes observes similarities between syntactic structures and the prescribed order of the various steps necessary for the manufacture of flakes from a prepared Levallois core. Liberman, taking the same line of thought, states that the Levallois toolmaking technique cannot reasonably be described by means of a phrase-structure grammar. A transformational grammar which formally incorporates a memory is necessary. As far as I understand his reasoning this is so because in making a particular chip one has to keep two things in mind, both the last chip that has been made and the final form of the tool. It seems to me, however, that in order to give his argument its force it still has to be shown that there is a fundamental difference in the necessary complexity of underlying mental structures between Levallois toolmaking and many forms of goal-oriented behavior we find in higher animals.

Liberman also suggests that the final crucial stage in the evolution of human language would appear to be the development of the bent two-tube supralaryngeal vocal tract of modern man, which allows its possessors to generate acoustic signals that (1) have very distinct acoustic properties and (2) are easy to produce, being acoustically stable. Reconstructions from fossils tell him that the Neanderthal hominids had to do without this asset, and therefore probably retained a communication system with a mixed phonetic level that relied on both gestural and vocal components. At this point the reader particularly feels the need for an expert criticism of the validity of such reconstructions.

B. SPOKEN LANGUAGE AND SIGN LANGUAGE

The question whether speech or gestural communication has been more important in the evolution of human language came up several times during the conference. In reaction to Mattingly's idea that "speech exemplifies a thoroughly and peculiarly human kind of knowing" Hewes commented that the depigmentation of the volar skin would indicate the antiquity of nonvocal communication. Indirect support for this supposed antiquity of gestural communication comes from some fascinating studies of American Sign Language (ASL), according to Bellugi and Klima a full-fledged language of its own, and not a derivative or degenerate form of written or spoken English. Stokoe argues for the antiquity of sign language from a possible parallel between ontogeny and phylogeny. It appears to be the case that the infant with deaf parents, learning ASL as its first language, begins putting wordlike signs into sentencelike structures at an earlier age than the child making two-word or three-word sentences in speech.

Bellugi and Klima have studied sign language from historical changes in the form of signs, in short term memory experiments, by analyzing a collection of "slips of the hand", and by comparing American Sign Language with Chinese Signs, in all cases with profoundly deaf people who use sign language as their primary form of communication. They show that signs in ASL are not simply signals which differ uniquely and holistically from one another but are, rather, highly coded units. They also provide evidence that grammatical processes bear the marks of the particular transmission system in which the language developed. This seems to be confirmed in Huttenlocher's contribution, comparing the encoding of spatial relations in ASL and natural language (= spoken American English).

It is too early to draw any definite conclusions from these studies of sign language on the interdependence of natural language and speech, as the structure of sign language is only beginning to be understood. But it is certainly of much interest to students of language behavior that the human perceptual and cognitive systems appear to be so flexible that profoundly deaf people may develop visual communication systems among themselves which, if not equal in expressive power and speed of communication to natural spoken languages, at least come close to them. Further comparisons between the syntax of natural spoken languages and sign languages may lead to more caution in interpreting current ideas about what is and what is not innate in our linguistic abilities. Similarly, comparisons between the efficiency of speech perception and the efficiency of visual sign perception might well make us wonder whether speech perception is as special as some theorists like to make us believe.

C. INNATE FEATURE DETECTORS

The idea that speech perception is mediated by, possibly innate, speech specific feature detectors was given considerable attention in the conference. This idea supported Marler's extrapolation from innate auditory templates in birds to innate auditory templates in humans. Studdert-Kennedy provides a careful survey of the current empirical evidence concerning the perceptual processing of consonants and vowels, from which he concludes that the "human cortex is supplied with sets of acoustic detectors tuned to speech, each inhibited from output to the phonetic system in the absence of collateral response in other detectors". Cutting and Eimas present evidence that such feature detectors are innate.
Eimas has shown that very young infants, one month and four months of age, can discriminate much better between different speech sounds that belong to different phonemic categories than between different speech sounds belonging to the same phonemic category in adult speech. One may concur, however, with the doubt expressed by Hirsh in his reflections on the conference whether Eimas's data are about speech or about general auditory perception. One may feel similar doubts about the interpretation Eimas and Cutting give to the data stemming from the selective adaptation paradigm, introduced in speech perception studies by Eimas and Corbit in 1973 and since then used by an increasing number of investigators. In selective adaptation studies it is shown that repeated stimulation with a particular acoustic configuration, for instance a syllable ba, may change the response distribution in a phoneme identification task, for instance the binary forced choice between ba and a contrasting syllable, measured with stimuli taken from the acoustic continuum between the two. In this case the number of responses for the contrasting syllable would increase at the cost of the ba-responses. The interpretation is that there are feature detectors which can be fatigued by repeated stimulation. By carefully studying which acoustic configurations lead to shifts in particular response distributions, it would be possible to find out what information is extracted by particular feature detectors. Cutting and Eimas argue for the existence of phonetic, speech specific, feature detectors. More recent studies show that categorical perception and selective adaptation are not unique to speech perception (Cutting, Rosner and Foard 1976). Furthermore, to my knowledge, nobody has yet seriously discussed the difficulties for a theory of "wired-in" feature detectors stemming from perceptual normalization experiments in which it is shown that response distributions in phoneme identification tasks may shift systematically due to the immediate environment of the test segment (e.g. Fourcin 1972).

D. THE ABSENCE OF PROSODY

The volume under review is not only remarkable for the many interesting and stimulating papers it contains but also for what it does not contain. In a collection of papers with the

title "The role of speech in language" one would have expected to find at least one contribution seriously discussing the relation between speech prosody and linguistic structure. It is ironical that the only paper in which intonational contrast is given more attention than obligatory lip service is Stokoe's contribution "The shape of soundless language", dealing with sign language. Stokoe's treatment of intonation and its kinesic correlate in sign language seems to make explicit why so many speech researchers do not pay attention to speech prosody. He suggests that intonational contrasts "are not necessarily linguistic and have more affinity with other systems that signal affect than with phonemic contrasts. There remain then only phonemic contrasts between consonant and consonant, vowel and vowel, and tone and tone (when so used) as the indisputably linguistic, basic features of language". One may fear that this undue emphasis on phonemic contrast in speech perception research will persist until speech scientists turn away from the study of isolated CV-syllables and start wondering about the perception of normal spontaneous connected speech.

REFERENCES

Cutting, J. E., Rosner, B. S., Foard, C. F. (1976) Perceptual

categories for musiclike sounds: implications for theories of speech perception. Quarterly Journal of Experimental Psychology, 28: 361-378.

Fourcin, A. J. (1972) Perceptual mechanisms at the first level of speech processing. In: A. Rigault and R. Charbonneau, eds. Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal 1971. Mouton, The Hague.

Lindblom, B. E. F. (1972) Phonetics and the description of language. In: A. Rigault and R. Charbonneau, eds. Proceedings of the VIIth International Congress of Phonetic Sciences, Montreal 1971. Mouton, The Hague.

Lindblom, B. E. F. (1975) Experiments in sound structure. Plenary paper, presented at the VIIIth International Congress of Phonetic Sciences, Leeds 1975.

American Journal of Computational Linguistics

STEPHEN F. WEISS AND DONALD F. STANAT
Department of Computer Science
University of North Carolina
New West Hall 035A
Chapel Hill 27514

A class of algebraic parsing techniques for context-free languages is presented. A grammar is used to characterize a parsing homomorphism which maps terminal strings to a polynomial semiring. The image of a string under an appropriate homomorphism contains terms which specify all derivations of the string. The work describes a spectrum of parsing techniques for each context-free grammar, ranging from a form of bottom-up to top-down procedures.

ALGEBRAIC PARSING OF CONTEXT-FREE LANGUAGES

I. Introduction

For many years syntactic analysis and the theory of formal languages have developed in a parallel, but not closely related, fashion. The work described here is an effort to relate these areas by applying the tools of formal power series to the problem of parsing.

This paper presents an algebraic technique for parsing a broad class of context-free grammars. By parsing we mean the process of determining whether a string of terminal symbols, $x$, is a member of the language generated by grammar $G$ (i.e., is $x \in L(G)$?) and, if it is, finding all derivations of $x$ from the starting symbol of $G$. We hope that posing the parsing problem in purely algebraic terms will provide a basis for examination and comparison of parsing algorithms and grammar classes.

Section II presents an overview of the algebraic parsing process. It provides a general notion of how the method works without going into detail. Section III contains the algebraic preliminaries and notational conventions needed in order to describe the parsing method precisely. The formal presentation of the parsing method and the proof of correctness form Section IV. Section V contains some interesting special cases of the theorem and presents some examples of parses.

II. Overview of the algebraic parsing process

The algebraic parsing formalism described here is applicable to all context-free grammars $G = \langle V_N, V_T, P, S \rangle$ except those that contain productions of the form $A \to B$ where $A$ and $B$ are both nonterminals, or erasing rules such as $A \to \varepsilon$. The parsing process consists first of constructing (on the basis of the grammar $G$) a polynomial and a function defined on polynomials. A parse of $x$ is obtained by repeated applications of the function to a polynomial $P(x)$. The process has two features worthy of note. First, it produces all parses of $x$ in parallel. Second, the process of converting a grammar into the required algebraic form is straightforward and does not alter the structure of the grammar. This property, the preservation of grammatical structure, is particularly important in areas such as natural language analysis where the structure that a grammar provides is as important as the language it generates.

The polynomials we will use have terms of the form $(Z, \Delta)$, where $Z$ is a string over an extended alphabet and $\Delta$ represents a sequence of productions of $G$. The process begins with a polynomial of ordered pairs representing $x$, the string to be parsed. A function is repeatedly applied to the polynomial; the number of applications necessary is bounded by the input length. If the resulting polynomial contains a term $(S, \Delta)$ where $S$ is the starting symbol in $G$, then $\Delta$ represents the production sequence used in generating $x$ from $S$. If no such pair occurs, then $x$ is not in $L(G)$, and if multiple pairs $(S, \Delta_1), (S, \Delta_2), \ldots$ occur then $x$ is ambiguous and the $\Delta_i$'s specify the several parses. A precise formulation of the polynomial and the operations on it is given below.

III. Algebraic preliminaries and notation

A semigroup is formally defined as an ordered pair $\langle S, \cdot \rangle$ where $S$ is a set (the carrier) and $\cdot$ is an associative binary operation. Similarly, a monoid is a triple consisting of a set, an operation and a two-sided identity (e.g., $\langle S, \cdot, 1 \rangle$). We will feel free to denote a monoid or semigroup by its carrier.

For any set $V$, $V^*$ denotes the free monoid generated by $V$; $V^* = \langle V^*, \text{concatenation}, \Lambda \rangle$. Similarly, $V^+$ denotes the free semigroup generated by $V$; $V^+ = \langle V^+, \text{concatenation} \rangle$. We denote the length of a string $x$ in $V^*$ or $V^+$ by $|x|$.

For an arbitrary alphabet $V$, we define $\bar{V} = \{\bar{a} \mid a \in V\}$. The free half-group generated by $V$, $H(V)$, is defined to be the monoid generated by $V \cup \bar{V}$ together with the relation $\bar{a}a = 1$, where 1 is the monoid identity and $a$ is any element of $V$. Note that in $H(V)$ the elements of $\bar{V}$ are left inverses but not right inverses of the corresponding elements of $V$. We denote the extended alphabet $V \cup \bar{V}$ by $\Sigma$.

If $T = \langle T, \cdot, 1 \rangle$ and $Q = \langle Q, +, 0 \rangle$ are monoids, we denote by $T \times Q$ the product monoid $\langle T \times Q, \otimes, (1, 0) \rangle$. The carrier of $T \times Q$ is the cartesian product $T \times Q$ and the operation $\otimes$ is defined to be the component-wise operation of $T$ and $Q$: $(t_1, q_1) \otimes (t_2, q_2) = (t_1 \cdot t_2,\ q_1 + q_2)$.

A semiring is an algebraic system $\langle S, +, \cdot, 0 \rangle$ such that $\langle S, +, 0 \rangle$ is a commutative monoid, $\langle S, \cdot \rangle$ is a semigroup, and the operation $\cdot$ distributes over $+$: $a \cdot (b + c) = a \cdot b + a \cdot c$, $(a + b) \cdot c = a \cdot c + b \cdot c$. A semiring is commutative if the operation $\cdot$ is commutative. A semiring with identity is a system $\langle S, +, \cdot, 0, 1 \rangle$ where $\langle S, \cdot, 1 \rangle$ is a monoid. The semirings used in this paper are commutative and have identities. Furthermore, in each case the additive identity is a multiplicative zero:

$0 \cdot x = x \cdot 0 = 0$.

The boolean semiring $B$ consists of the carrier $\{0, 1\}$ under the commutative operations $+$ and $\cdot$, where $1 \cdot 1 = 1 + x = 1$ and $0 + 0 = 0 \cdot x = 0$ for all $x \in \{0, 1\}$.
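As a small illustration of the definition above, the boolean semiring can be written down directly. This is only a sketch; the function names `b_add` and `b_mul` are illustrative and do not come from the paper.

```python
# Boolean semiring B = {0, 1}: addition behaves as logical OR,
# multiplication as logical AND. Names are illustrative assumptions.
def b_add(x, y):
    return x | y

def b_mul(x, y):
    return x & y

ELEMENTS = (0, 1)

# 1 + x = 1 and 0 * x = 0 for every x in B
assert all(b_add(1, x) == 1 and b_mul(0, x) == 0 for x in ELEMENTS)

# multiplication distributes over addition
assert all(
    b_mul(a, b_add(b, c)) == b_add(b_mul(a, b), b_mul(a, c))
    for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS
)
```

Because $1 + 1 = 1$, a polynomial with coefficients in $B$ only records which terms are present, not how many times they arise.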

For an arbitrary monoid $M$ we denote by $R(M)$ the semiring of polynomials described as follows:

1) Each term is of the form $c\alpha$ where $c \in B$ (the boolean semiring of coefficients) and $\alpha \in M$.

2) Each polynomial is a formal sum (under $+$) of a finite number of terms.

3) Addition and multiplication of terms is defined as follows:

a) $b\alpha + c\alpha = (b + c)\alpha$

b) $(b\alpha)(c\beta) = (bc)(\alpha\beta)$.

Addition and multiplication of polynomials is performed in the usual manner consistent with 3).

Note that all coefficients of $R(M)$ are either 1 or 0. We will adopt the usual convention of not explicitly writing 1 for the terms with that coefficient and omitting terms with a coefficient of 0.

A context-free grammar is a system $G = \langle V_N, V_T, P, S \rangle$ where $V_N$ and $V_T$ are finite, disjoint, nonempty sets denoted nonterminal and terminal symbols respectively. We denote by $V$ the set $V_N \cup V_T$. The symbol $S$ is the distinguished nonterminal from which all derivations begin, and $P$ is the set of productions of $G$. A context-free grammar is proper if it does not contain productions of the form $A \to \varepsilon$ (erasures) or $A \to B$ where $A$ and $B$ are both nonterminals.
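The polynomial semiring $R(M)$ defined above can be sketched in a few lines. Since every coefficient is 0 or 1, a polynomial is modelled here as the set of monoid elements whose coefficient is 1. The choice of Python strings under concatenation as the monoid $M$, and the names `poly_add` and `poly_mul`, are assumptions made only for this illustration.

```python
from itertools import product

# A polynomial over the boolean semiring, modelled as a frozenset of monoid
# elements: an element is in the set exactly when its coefficient is 1.
# Assumption for this sketch: M is the monoid of strings under concatenation.
def poly_add(p, q):
    # b*alpha + c*alpha = (b + c)*alpha with boolean +, so addition is union
    return frozenset(p) | frozenset(q)

def poly_mul(p, q):
    # multiply term by term; equal products merge because 1 + 1 = 1
    return frozenset(a + b for a, b in product(p, q))

ZERO = frozenset()       # the polynomial with no terms
ONE = frozenset({""})    # the single term whose monoid element is the identity

p = frozenset({"a", "ab"})
q = frozenset({"b", ""})
print(sorted(poly_mul(p, q)))
```

In this sketch the product of `p` and `q` has three terms rather than four, because the term `ab` arises twice and the boolean coefficient absorbs the repetition.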

It can easily be shown that the set of languages generated by proper context-free grammars is exactly the set of context-free languages. In addition, an arbitrary context-free grammar can be made proper by a straightforward method which alters the structure of the grammar very little. In this study we will deal with only proper context-free grammars. This guarantees that all terminal strings have a finite number of derivations in $G$, and thus makes possible our goal of finding all derivations of an input.
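The properness condition is easy to check mechanically. The following is a minimal sketch, assuming productions are given as (left side, right side) pairs with the right side a tuple of symbols; the representation and the name `is_proper` are hypothetical.

```python
# Productions are (lhs, rhs) pairs; rhs is a tuple of symbols.
# This representation is an assumption made for the sketch.
def is_proper(productions, nonterminals):
    for lhs, rhs in productions:
        if len(rhs) == 0:
            return False          # erasing rule: empty right side
        if len(rhs) == 1 and rhs[0] in nonterminals:
            return False          # right side is a single nonterminal
    return True

NONTERMINALS = {"S", "A", "B"}
proper = [("S", ("a", "A")), ("A", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
improper = proper + [("A", ("B",))]

print(is_proper(proper, NONTERMINALS))
print(is_proper(improper, NONTERMINALS))
```

The first grammar passes the check; the second fails because it contains a production whose right side is a single nonterminal.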

Productions of $G$ will be indexed by integers. Thus $A \xrightarrow{i} M$ denotes that $A \to M$ is the $i$th production in $P$. We will deal only with leftmost derivations. A leftmost derivation is completely specified by the initial sentential form and the sequence of production indices. If $\Delta \in I^*$ is the sequence of production indices in the leftmost derivation of $N \in V^*$ from $M \in V^+$, we write $M \stackrel{\Delta}{\Rightarrow} N$. The length of a derivation $\Delta$ is denoted by $|\Delta|$, and is equal to the number of production indices in $\Delta$.
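That a leftmost derivation is fixed by its starting form and its sequence of production indices can be made concrete with a short sketch. The four-production grammar and the name `apply_leftmost` below are assumptions chosen for illustration; symbols are single characters.

```python
# Indexed productions of a small illustrative grammar (an assumption of this
# sketch): 1. S to aA   2. A to AB   3. A to a   4. B to b
PRODUCTIONS = {1: ("S", "aA"), 2: ("A", "AB"), 3: ("A", "a"), 4: ("B", "b")}
NONTERMINALS = {"S", "A", "B"}

def apply_leftmost(start, indices):
    """Replay a leftmost derivation: one rewrite per production index."""
    form = list(start)
    for i in indices:
        lhs, rhs = PRODUCTIONS[i]
        # position of the leftmost nonterminal in the current sentential form
        pos = next(k for k, s in enumerate(form) if s in NONTERMINALS)
        if form[pos] != lhs:
            raise ValueError(f"production {i} does not apply to {form[pos]}")
        form[pos:pos + 1] = rhs   # replace the nonterminal by the right side
    return "".join(form)

print(apply_leftmost("S", [1, 2, 2, 3, 4, 4]))  # prints "aabb"
```

The index sequence alone suffices because at every step there is exactly one leftmost nonterminal to rewrite.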

We will use, but not formally define, the notion of height of a derivation, meaning the height of the corresponding derivation tree or the length of the longest path from the root to the frontier of the tree. The height of a derivation $\Delta$ will be denoted by $h(\Delta)$.

Since 'derivation' will always mean 'leftmost derivation' in the sequel, the following assertions hold:

Assertion 1: A derivation is of height 0 if and only if it is of length 0. A derivation is of height 1 if and only if it is of length 1.

Assertion 2: Let $G$ be a proper context-free grammar, and $A \stackrel{\Delta}{\Rightarrow} M$ where $|\Delta| \ge 0$. Then $\Delta$ is of height less than or equal to $|M|$.

Assertion 3: Let $G = \langle V_N, V_T, P, S \rangle$ be a context-free grammar, $I$ an index set for $P$, and let the $j$th production of $G$ be $A \to a_1 a_2 \cdots a_m$. Let $A \stackrel{j\Gamma}{\Rightarrow} M$ be a derivation of height $n + 1$. Then $\Gamma = \Gamma_1 \Gamma_2 \cdots \Gamma_m$ and $M = M_1 M_2 \cdots M_m$, and for all $i$, $1 \le i \le m$, $a_i \stackrel{\Gamma_i}{\Rightarrow} M_i$ is a derivation of height $n$ or less.

The algebraic structure used in this work is the semiring of polynomials $R(H \times I^*)$ where $H = H(V)$ is the free half-group generated by $V$, and $I$ is the index set of the set of productions $P$. We will use an initial segment of the natural numbers, $1, 2, 3, \ldots$, as the index set $I$. Each term of a polynomial from $R(H \times I^*)$ consists of an element from $H \times I^*$ together with a coefficient from the boolean semiring $B$. The elements of $H \times I^*$ will be the basis for calculating the parses of a string. The elements of $H$ will interact to determine if a product of terms characterizes a derivation. If so, the associated element of $I^*$ is the sequence of production indices of the derivation.
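The interaction of elements of $H$ can be sketched as follows, assuming the half-group relation is that a barred symbol immediately followed by its unbarred counterpart cancels (the barred symbols act as left inverses only). Symbols are modelled as `(name, barred)` pairs; this representation and the helper names `reduce_h`, `sym`, `bar` and `combine` are illustrative assumptions, not the paper's notation.

```python
def sym(s):
    return (s, False)      # an ordinary symbol of V

def bar(s):
    return (s, True)       # its barred counterpart

def reduce_h(word):
    """Normal form in H(V): delete each adjacent pair (barred a, a)."""
    stack = []
    for name, barred in word:
        if not barred and stack and stack[-1] == (name, True):
            stack.pop()    # barred a followed by a cancels to the identity
        else:
            stack.append((name, barred))
    return tuple(stack)

def combine(t1, t2):
    """Product of two elements of H x I*: multiply in H, concatenate indices."""
    (w1, d1), (w2, d2) = t1, t2
    return reduce_h(w1 + w2), d1 + d2

left = ((sym("S"), bar("A")), (1,))
right = ((sym("A"),), (3,))
print(combine(left, right))
```

In this sketch the product collapses to the single symbol S paired with the index sequence (1, 3), while a symbol followed by its barred counterpart is left untouched, reflecting the one-sided inverse.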

The following notational conventions will be observed: $i, j, k, m, n \in N$ (the set of natural numbers); $\delta$, $g$, $\Psi$, $v$ will denote functions. For the function $g$, $g^0$ is the identity and $g^{k+1} = g\,g^k$.

IV. An algebraic parsing theorem

Theorem (version 1): Let $G = \langle V_N, V_T, P, S \rangle$ be a proper context-free grammar. Then there exist homomorphisms $v$, $g$, and $\delta$ and a polynomial $p \in R((\Sigma \times I^*)^*)$ such that for every $x \in V_T^+$, $x = x_1 \cdots x_n$, $x_i \in V_T$, the polynomial $\delta g^n \prod_{i=1}^{n} p^n v(x_i)$ contains a term $(S, \Delta)$ if and only if $\Delta$ is a leftmost derivation of $x$ from $S$.

Construction for the proof:

Let $V = V_1 \cup V_2$ be an arbitrary exhaustive division of $V$. The construction is most economical when $V_1$ and $V_2$ are disjoint, but this is not required.

1. The function $v$ is the homomorphism induced by the following: $v(a) = (a, \Lambda)$ for $a \in V$, where $\Lambda$ is the identity in $I^*$. Since $v$ is a homomorphism, $v(\Lambda) = \Lambda$.

2. The function $g$ is the homomorphism induced by defining $g$ on the generators of the domain as follows:

2i. $g(a, \Delta)$ contains the term $(a, \Delta)$.

2ii. If $A \to a b_1 \cdots b_n$ is the $i$th production of $P$ and $a \in V_1$ then $g(a, \Delta)$ contains $(A, i\Delta)(\bar{b}_n, \Lambda) \cdots (\bar{b}_1, \Lambda)$.

2iii. There are no other terms in $g(a, \Delta)$.

Note that because $g$ is a homomorphism, $g(\Lambda) = \Lambda$, where $\Lambda$ is the identity of the monoid $(\Sigma \times I^*)^*$.

3. The function $\delta$ is the canonical homomorphism which coalesces a product in $(\Sigma \times I^*)^*$ into a single ordered pair by component-wise multiplication of the first entries (thus allowing cancellation in $H$) and catenation of the second entries.

4. The polynomial $p$ is an element of $R((\Sigma \times I^*)^*)$ defined as follows:

4i. $p$ contains the summand $\Lambda$;

4ii. If $a \in V_2$ and $A \to a b_1 \cdots b_n$ is the $j$th production of $P$ then $p$ contains the summand $(A, j)(\bar{b}_n, \Lambda) \cdots (\bar{b}_1, \Lambda)(\bar{a}, \Lambda)$;

4iii. $p$ contains no other summands.

We adopt the convention that $p^k = \Lambda$ for $k \le 0$. Note that since $p$ contains $\Lambda$, $p^k$ contains $\Lambda$ as well as all summands of $p^j$ for $j \le k$.

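For notational convenience we adopt the following conventions. First, where no ambiguity can result, products in $R((\Sigma \times I^*)^*)$ of the form $(a, \Delta)(\bar{c}_m, \Lambda) \cdots (\bar{c}_1, \Lambda)$ will be abbreviated as $(a\,\bar{c}_m \cdots \bar{c}_1, \Delta)$. No cancellation is implied by this notation since cancellation cannot occur in $R((\Sigma \times I^*)^*)$. Second, we define the function $\Psi_k$ as follows:

$\Psi_k(a_1 a_2 \cdots a_n) = p^k v(a_1)\, p^k v(a_2) \cdots p^k v(a_n)$

where $a_i \in V$ and $p$ is the polynomial defined above. Note that if $k \le 0$, then $\Psi_k(a_1 a_2 \cdots a_n) = v(a_1 a_2 \cdots a_n)$ and $\Psi_k(\Lambda) = \Lambda$. Using this notation, we can re-state the theorem as follows:

Theorem (version 2): Let $G = \langle V_N, V_T, P, S \rangle$ be a proper context-free grammar. Then there exist maps $\Psi$, $g$ and $\delta$ such that for every $x \in V_T^+$, $x = x_1 x_2 \cdots x_n$, $x_i \in V_T$, $\delta g^n \Psi_n(x)$ contains a term $(S, \Delta)$ if and only if $S \stackrel{\Delta}{\Rightarrow} x$.

The proof of the theorem rests on three lemmas. Lemma I implies the "if" part of the theorem; Lemma III implies the "only if" part. Lemma II is used in the proof of Lemma III.

Lemma I: Let $M \in V^+$, $A \in V$, and $A \stackrel{\Delta}{\Rightarrow} M$. Then for all $k \ge h(\Delta)$, $\delta g^k \Psi_k(M)$ contains $(A, \Delta)$.

Proof (by induction on $h(\Delta)$, the height of the derivation $\Delta$):

Basis: If $h(\Delta) = 0$, then $\Delta = \Lambda$ and $M = A$. Then $\Psi_k(A) = p^k (A, \Lambda)$. Since $\Lambda$ is a summand of $p^k$, it follows that $(A, \Lambda)$ is a summand of $p^k (A, \Lambda)$, and therefore $(A, \Lambda)$ is a summand of $\delta g^k p^k (A, \Lambda)$. Thus the derivation $A \stackrel{\Lambda}{\Rightarrow} A$ is represented in $\delta g^k \Psi_k(A)$ by $(A, \Lambda)$, which establishes the basis.

Induction: Let $A \stackrel{\Delta}{\Rightarrow} M$ be a derivation of height $n + 1$. By Assertion 3, $\Delta = j\Gamma_1 \Gamma_2 \cdots \Gamma_r$ where $A \xrightarrow{j} a_1 a_2 \cdots a_r$, $M = M_1 M_2 \cdots M_r$, and $a_i \stackrel{\Gamma_i}{\Rightarrow} M_i$ where $h(\Gamma_i) \le n$. Then by the induction hypothesis, $\delta g^k \Psi_k(M_j)$ contains the summand $(a_j, \Gamma_j)$. Consider the term of $g^k \Psi_k(M_1)$ which cancels to $(a_1, \Gamma_1)$ in $R(H \times I^*)$. This term must be of the form $(a_1, \Gamma_1')T$, where $\Gamma_1'$ is a prefix of $\Gamma_1$. Either $a_1 \in V_1$ or $a_1 \in V_2$. The sum $\delta g^{k+1} \Psi_{k+1}(M_1)$ contains $\delta g g^k \Psi_k(M_1)$, which contains $\delta g(a_1, \Gamma_1')T$. If $a_1 \in V_1$, then $g(a_1, \Gamma_1')$ contains $(A\,\bar{a}_r \cdots \bar{a}_3 \bar{a}_2, j\Gamma_1')$, and $\delta g(a_1, \Gamma_1')T$ contains $(A\,\bar{a}_r \cdots \bar{a}_3 \bar{a}_2, j\Gamma_1)$. On the other hand, the sum $\delta g^{k+1} \Psi_{k+1}(M_1)$ also contains $\delta p g^k \Psi_k(M_1)$. If $a_1 \in V_2$, then $(A\,\bar{a}_r \cdots \bar{a}_2 \bar{a}_1, j)$ is a summand of $p$, and therefore $\delta p (a_1, \Gamma_1')T$ contains $(A\,\bar{a}_r \cdots \bar{a}_3 \bar{a}_2, j\Gamma_1)$. Thus in either case, $\delta g^{k+1} \Psi_{k+1}(M_1)$ contains the summand $(A\,\bar{a}_r \cdots \bar{a}_3 \bar{a}_2, j\Gamma_1)$, and since every summand of $\delta g^k \Psi_k(M_j)$ is a summand of $\delta g^{k+1} \Psi_{k+1}(M_j)$, it follows that $\delta g^{k+1} \Psi_{k+1}(M)$ contains $(A, j\Gamma_1 \Gamma_2 \cdots \Gamma_r) = (A, \Delta)$. This completes the proof.

Lemma II: Let $a \in V$, $\Gamma \in I^*$, $k \ge 0$. For all $k$, terms of $g^k(a, \Gamma)$ are of the form $(b, \Delta\Gamma)(\bar{c}_m, \Lambda) \cdots (\bar{c}_1, \Lambda)$ where $b \in V$, $c_i \in V$, $m \ge 0$, and $b \stackrel{\Delta}{\Rightarrow} a c_1 \cdots c_m$. For notational convenience we abbreviate $c_1 \cdots c_m$ by $N$; hence we denote $(b, \Delta\Gamma)(\bar{c}_m, \Lambda) \cdots (\bar{c}_1, \Lambda)$ by $(b\bar{N}, \Delta\Gamma)$.

Proof by induction on $k$, the number of applications of $g$. By definition, $g^0(a, \Gamma) = (a, \Gamma)$, which establishes the assertion for the value $k = 0$. Assume the assertion holds for $k \le n$ and consider $g^{n+1}(a, \Gamma) = g g^n(a, \Gamma)$. By the induction hypothesis, all terms of $g^n(a, \Gamma)$ are of the form $(b\bar{N}, \Delta\Gamma)$ where $b \stackrel{\Delta}{\Rightarrow} aN$. Hence terms of $g^{n+1}(a, \Gamma)$ are of the form $g(b\bar{N}, \Delta\Gamma)$. Since $g$ limited to $\bar{V}$ is the identity, $g(b\bar{N}, \Delta\Gamma) = [g(b, \Delta\Gamma)](\bar{N}, \Lambda)$. By definition of $g$, $g(b, \Delta\Gamma)$ contains only terms of the form $(c\bar{M}, j\Delta\Gamma)$ where $c \xrightarrow{j} bM$ is a production. Therefore terms of $g^{n+1}(a, \Gamma)$ are of the form $(c\bar{M}\bar{N}, j\Delta\Gamma)$, and since $c \xrightarrow{j} bM$ and $b \stackrel{\Delta}{\Rightarrow} aN$ it follows that $c \stackrel{j\Delta}{\Rightarrow} aNM$.

Corollary: All terms of $g^k(a\bar{M}, \Gamma)$ are of the form $(b\bar{N}\bar{M}, \Delta\Gamma)$.

Lemma III: If $\delta g^k \Psi_k(M)$ contains $(A\bar{N}, \Delta)$ then $A \stackrel{\Delta}{\Rightarrow} MN$.

Proof by induction on the length of $M$:

Basis: Let $a \in V$ and assume $\delta g^k \Psi_k(a)$ contains $(A\bar{N}, \Delta)$. If $p_i$ represents an arbitrary summand of $p$ other than $\Lambda$, then every term of $g^k \Psi_k(a)$ can be represented in the form

$g^k(p_1)\, g^k(p_2) \cdots g^k(p_n)\, g^k(a, \Lambda)$

where $0 \le n \le k$ and $n$ denotes the number of nontrivial summands of $p$ which are factors of the term. By construction, every summand of $p$ is either $\Lambda$ or of the form $(B_i \bar{P}_i, j_i)$ where $B_i \in V_N$, $P_i \in V^+$, $j_i \in I$, and $B_i \to P_i$ is a production in $G$. By Lemma II, every term of $g^k(B_i \bar{P}_i, j_i)$ is of the form $(C_i \bar{M}_i \bar{P}_i, \Gamma_i j_i)$ where $C_i \in V$, $M_i, P_i \in V^*$, $\Gamma_i \in I^*$. By the same lemma, it follows that every term of $g^k(a, \Lambda)$ is of the form $(C_{n+1} \bar{M}_{n+1}, \Gamma_{n+1})$ where $C_{n+1} \in V$, $M_{n+1} \in V^*$, $\Gamma_{n+1} \in I^*$. Hence every term of $g^k \Psi_k(a)$ is of the form

$(C_1 \bar{M}_1 \bar{P}_1, \Gamma_1 j_1) \cdots (C_n \bar{M}_n \bar{P}_n, \Gamma_n j_n)(C_{n+1} \bar{M}_{n+1}, \Gamma_{n+1})$ (1)

where $C_i \stackrel{\Gamma_i j_i}{\Rightarrow} P_i M_i$ for $1 \le i \le n$ and $C_{n+1} \stackrel{\Gamma_{n+1}}{\Rightarrow} a M_{n+1}$. By assumption there is a term $t$ of $g^k \Psi_k(a)$ such that $\delta[t] = (A\bar{N}, \Delta)$; $t$ must be in the form indicated above. In order for $t$ to cancel under $\delta$, the following must be true: $C_1 = A$, since $C_1$ cannot cancel from $t$; and $\bar{P}_i = \bar{Q}_i \bar{C}_{i+1}$ for $1 \le i \le n$, since $C_2, \ldots, C_{n+1}$ must all cancel from $t$. Therefore

$t = (A \bar{M}_1 \bar{Q}_1 \bar{C}_2, \Gamma_1 j_1)(C_2 \bar{M}_2 \bar{Q}_2 \bar{C}_3, \Gamma_2 j_2) \cdots (C_{n+1} \bar{M}_{n+1}, \Gamma_{n+1})$.

This cancels to $(A\bar{N}, \Delta)$ as required with $N = M_{n+1} Q_n M_n Q_{n-1} M_{n-1} \cdots Q_1 M_1$. Then by (1), $C_i \Rightarrow C_{i+1} Q_i M_i$, $1 \le i \le n$, and $C_{n+1} \Rightarrow a M_{n+1}$. Hence, since $C_1 = A$, $A \Rightarrow a M_{n+1} Q_n M_n \cdots Q_1 M_1$, and thus $A \stackrel{\Delta}{\Rightarrow} aN$. This establishes the basis.

Induction: Assume that for all $M \in V^*$ such that $|M| \le n$, if $\delta g^k \Psi_k(M)$ contains $(A\bar{N}, \Delta)$ then $A \stackrel{\Delta}{\Rightarrow} MN$. Let $M' = Ma$ be a string such that $|Ma| = n + 1$ and $\delta g^k \Psi_k(Ma)$ contains $(A\bar{N}, \Delta)$. Because $\delta$, $g$ and $\Psi$ are homomorphisms, $\delta g^k \Psi_k(Ma) = \delta[g^k \Psi_k(M)\, g^k \Psi_k(a)]$. Then $\delta g^k \Psi_k(M)$ must contain a term $(T_1, \Delta_1)$ and $\delta g^k \Psi_k(a)$ must contain a term $(T_2, \Delta_2)$ such that $T_1 T_2 = A\bar{N}$ and $\Delta = \Delta_1 \Delta_2$. In order for this to occur, $T_2$ must be of the form $B\bar{N}_2$ where $B \in V$ and $N_2 \in V^*$, and $T_1$ must be of the form $A\bar{N}_1\bar{B}$ where $A \in V$, $N_1 \in V^*$, and $\bar{N} = \bar{N}_1 \bar{N}_2$. (If $T_1$ and $T_2$ were not of this form, cancellation to $A\bar{N}$ would be impossible.) Thus $\delta g^k \Psi_k(M)$ contains $(A\bar{N}_1\bar{B}, \Delta_1)$, and by the induction hypothesis $A \stackrel{\Delta_1}{\Rightarrow} M B N_1$. Also $\delta g^k \Psi_k(a)$ contains $(B\bar{N}_2, \Delta_2)$ and by the basis $B \stackrel{\Delta_2}{\Rightarrow} a N_2$. It follows that $A \stackrel{\Delta_1 \Delta_2}{\Rightarrow} M a N_2 N_1$, and since $M' = Ma$ and $N = N_2 N_1$, $A \stackrel{\Delta}{\Rightarrow} M'N$, which completes the proof.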

The theorem now follows from Lemmas I and III and Assertion 2. The 'if' part follows from Lemma I and Assertion 2, and the 'only if' part follows immediately from Lemma III for the special case of $N = \Lambda$.
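The statement of the theorem can be exercised on a small grammar. The sketch below is an illustration under stated assumptions, not the authors' implementation: it takes the division in which every symbol is handled by $g$ (so that $p$ is trivial), models a term as a tuple of `(symbol, barred, index sequence)` generators, and uses an assumed four-production grammar. All function names are hypothetical.

```python
from itertools import product

# Assumed grammar: 1. S to aA   2. A to AB   3. A to a   4. B to b
PRODUCTIONS = {1: ("S", "aA"), 2: ("A", "AB"), 3: ("A", "a"), 4: ("B", "b")}

def g_gen(name, barred, delta):
    """g on one generator; returns a set of terms (tuples of generators)."""
    terms = {((name, barred, delta),)}            # g always keeps the generator
    if not barred:
        for i, (lhs, rhs) in PRODUCTIONS.items():
            if rhs[0] == name:                    # production i starts with name
                tail = tuple((b, True, ()) for b in reversed(rhs[1:]))
                terms.add(((lhs, False, (i,) + delta),) + tail)
    return terms

def g(poly):
    """Extend g homomorphically to terms and to polynomials (sets of terms)."""
    out = set()
    for term in poly:
        for choice in product(*[g_gen(*gen) for gen in term]):
            out.add(tuple(x for part in choice for x in part))
    return out

def delta_map(term):
    """Coalesce a term: cancel (barred a, a) pairs, concatenate index sequences."""
    stack, indices = [], ()
    for name, barred, d in term:
        indices += d
        if not barred and stack and stack[-1] == (name, True):
            stack.pop()
        else:
            stack.append((name, barred))
    return tuple(stack), indices

def parses(x, start="S"):
    poly = {tuple((c, False, ()) for c in x)}     # image of x under v
    for _ in range(len(x)):                       # apply g once per input symbol
        poly = g(poly)
    return sorted(idx for word, idx in map(delta_map, poly)
                  if word == ((start, False),))

print(parses("aabb"))   # prints [(1, 2, 2, 3, 4, 4)]
```

Every term that does not cancel down to the start symbol alone is simply discarded at the end, which is why the number of intermediate terms is much larger than the number of parses.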

As we have stated the theorem, the length of $x$ is used to determine a sufficient number of applications of $g$ and $\Psi$. Alternatively, the theorem could be formulated in terms of the heights of derivations of $x$; if $\Delta$ is a derivation of $x$ of height $k$, then for every $n \ge k$, the term $(S, \Delta)$ will be in the polynomial $\delta g^n \Psi_n(x)$. Furthermore, it follows from Lemma III that no harm is done by choosing the value of $n$ too large, i.e., no 'false' derivation terms will occur.

In the first statement of the theorem, the derivation terms are obtained from the polynomial $\delta g^n \prod_{i=1}^{n} p^n v(x_i)$, which can be rewritten in the form $\delta[g^n p^n v(x_1)\; g^n p^n v(x_2) \cdots g^n p^n v(x_n)]$. Although we have used a constant value of $n$ (equal to the length of $x$) for both the powers of the map $g$ and the polynomial $p$, some economy can be gained in this respect. In fact, the powers of $g$ and $p$ can decrease from left to right so long as they remain large enough to perform the appropriate computations on the suffix strings of $x$. Thus, the theorem is true (but considerably more difficult to prove) if one instead uses a parsing polynomial in which the powers decrease in this way.

V. Special cases of the theorem

A number of interesting special cases occur based on the choice of $V_1$ and $V_2$.

Case 1. $V_1 = V_T$.

The function $g$ handles all productions of the form $A \to a b_1 \cdots b_n$ with $a \in V_T$, while $p$ handles productions of the form $A \to B b_1 \cdots b_n$ with $B \in V_N$. Notice that since $g$ is nontrivial on only $V_T$, $g$ need be used only once; i.e., $g^k = g$ for all $k \ge 1$. The parsing polynomial is then $\delta g \Psi_n(x)$.

The special case of $V_1 = V_T$ and $V_2 = V_N$ results in a particularly simple form if the grammar is in Greibach normal form. The polynomial $p = (\Lambda, \Lambda)$ and therefore has no effect. Since $g$ need only be applied once, all derivations are found in one step.

Example 1: $G = \langle \{S, A, B\}, \{a, b\}, P, S \rangle$ with $P$ given by

1. $S \to aA$
2. $A \to AB$
3. $A \to a$
4. $B \to b$

For the string $x = aabb$, the parsing polynomial then contains $g[\Psi_k(x)]$ (among other things) for all $k \ge 2$. This contains:

$[(S,1)(\bar{A},\Lambda)]\ [(A,2)(\bar{B},\Lambda)(\bar{A},\Lambda)(A,2)(\bar{B},\Lambda)(\bar{A},\Lambda)(A,3)]\ [(B,4)]\ [(B,4)]$

Applying $\delta$ we get $(S, 122344)$.

Case 2. $V_1 = V$.

The entire job of parsing is now done by $g$, since the polynomial $p$ is equal to $(\Lambda, \Lambda)$. Hence the parsing polynomial is $\delta g^n v(x)$.

Example 2: For the grammar of Example 1,

$V_1 = \{S, A, B, a, b\}$
$V_2 = \emptyset$
$g(S,\Lambda) = (S,\Lambda)$
$g(A,\Lambda) = (A,\Lambda) + (A,2)(\bar{B},\Lambda)$
$g(B,\Lambda) = (B,\Lambda)$
$g(a,\Lambda) = (a,\Lambda) + (S,1)(\bar{A},\Lambda) + (A,3)$
$g(b,\Lambda) = (b,\Lambda) + (B,4)$

The parsing polynomial for $aabb$ is $\delta g^k v(aabb)$. For $k \ge 3$, this contains $[(S,1)(\bar{A},\Lambda)][(A,3)][(B,4)][(B,4)]$ after one application of $g$, which in turn contains $[(S,1)(\bar{A},\Lambda)][(A,223)(\bar{B},\Lambda)(\bar{B},\Lambda)][(B,4)][(B,4)]$ after three. Applying $\delta$ results in $(S, 122344)$ as before.

Case 3. $V_1 = \emptyset$.

Now the entire parse is handled by $p$. The parsing polynomial becomes $\delta \Psi_n(x)$.

VI. Observations

The major theorem presented here shows how context-free parsing may be carried out by purely algebraic means. All parses of an input string are developed in parallel and the process is guaranteed to terminate. As we have described the process, the number of terms of the parsing polynomial for a string x ∈ VT+ is unreasonably large. However, most of the terms in such a polynomial are not associated with a derivation in the grammar, and methods exist for reducing the computation by disregarding dead-end terms before they are completely evaluated. By applying such techniques in a straightforward fashion, and choosing V1 and V2 in various ways, the algebraic method can be associated in natural ways with classical parsing techniques. For example, the algebraic process in case 1 above is a goal-directed top-down approach similar to the predictive analyzer. Case 2 is the algebraic version of generalized bottom-up parsing. Parsing algorithms are typically so different from one another that they are incomparable. But using the techniques described above, many parsing algorithms may be posed in a single algebraic framework. This may facilitate the comparison and evaluation of parsers and of various classes of grammars.

REFERENCES

Chomsky, N. and M. Schutzenberger (1963), The Algebraic Theory of Context-Free Languages, in "Computer Programming and Formal Systems" (P. Braffort and D. Hirschberg, Eds.), North Holland, Amsterdam.

Ginsburg, S. and H. G. Rice (1963), Two Families of Languages Related to ALGOL, JACM 9, pp. 350-371.

Shamir, Eliahu (1967), A Representation Theorem for Algebraic and Context-Free Power Series in Non-Commuting Variables, Information and Control 11, pp. 239-254.

Stanat, D. F. (1972), Approximation of Weighted Type 0 Languages by Formal Power Series, Information and Control 21, pp. 344-381.

Stanat, D. F. (1972), A Homomorphism Theorem for Weighted Context-Free Grammars, J. Comput. System Sci., pp. 217-232.

Weiss, S. F., D. F. Stanat and G. A. Mago (1973), Algebraic Parsing Techniques for Context-Free Grammars, in "Automata, Languages and Programming" (M. Nivat, Ed.), pp. 493-498,

North Holland/American Elsevier.

American Journal of Computational Linguistics Microfiche 55 : 61

A COMPARISON OF TERM VALUE MEASUREMENTS FOR AUTOMATIC INDEXING

Gerard Salton
Department of Computer Science
Cornell University
Ithaca, New York 14853

This work was supported in part by the National Science Foundation under grant GJ 43505.

A number of statistical theories have been proposed capable of identifying individual text words that are most useful for the content representation of written texts and documents. Among these are parameters based on the variance of the word-frequency distribution (NOCC/EK), and on information theoretical (signal-noise S/N) premises. These formal parameters are related to practical automatic indexing techniques, most notably to the discrimination value (DV) method, capable of generating content identifiers (individual words, phrases, and word classes) that distinguish the various texts and documents from each other. It is shown that terms with favorable formal parameters also exhibit desirable semantic characteristics in that such terms are concentrated in documents judged relevant by the respective user populations, and vice-versa for terms with unfavorable formal properties.

1. Theories of Term Importance

Automatic indexing may be considered to be a two-step process: first the automatic identification of linguistic entities useful for the representation of document content, and then the assignment to the prospective content identifiers of weights reflecting their importance for content description. Since these tasks must ultimately depend on a study of the texts or documents under consideration, a great deal can be learned by examining the occurrence patterns of words and other linguistic entities in the documents of a collection. Indeed, among the theories of term importance which have been studied in recent years, the best known ones are based on the respective frequency distributions across a variety of written texts.

A) Variance-Based Measures

The most widely used of the statistical theories distinguishes so-called "specialty" words from "nonspecialty" words by assuming that a deviation from randomness in the occurrence pattern of certain text words is indicative of specialization and hence of good content identifiers. Thus the best content descriptors are terms whose occurrence patterns deviate most strongly from randomness. Since a random sprinkling of the occurrences of a given text word across the documents of a collection leads to word frequency distributions which follow the Poisson model, a comparison of the actual frequency characteristics of a given term with the Poisson distribution leads to the appropriate distinction between good content words and poor ones.

More specifically, since the variance $v^k$ of the frequency distribution of term k is proportional to the total frequency of occurrence $F^k$ for terms whose distribution obeys the Poisson model, a measure of term importance is obtainable by using a formula based on the ratio of $v^k$ to $F^k$. Some typical formulas used for this purpose are $v^k/\bar{f}^k$ and $n^2 v^k/F^k$, where n is the collection size. [1,2,3] The basic mathematical formulations are collected in Table 1.

    Symbol                               Definition
    $f_i^k$                              frequency of term k in document i
    $b_i^k$                              binary frequency of term k in document i
    $F^k = \sum_{i=1}^{n} f_i^k$         total frequency of term k in collection
    $B^k = \sum_{i=1}^{n} b_i^k$         document frequency of term k in collection
                                         (number of documents in which the term occurs)
    $\bar{f}^k = F^k/n$                  average frequency of term k in collection

    Basic Frequency Formulas
    Table 1

One such variance-based measure used by Dennis under the name of NOCC/EK [3] may be computed as

It is obvious from this formulation that the most effective terms are those whose occurrence frequencies $f_i^k$ in the individual documents deviate strongly from the average frequency $F^k/n$.
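The variance argument above can be illustrated with a small Python sketch. It is not the NOCC/EK formula itself; it only computes the variance-to-mean ratio of a term's per-document frequencies, which stays near 1 for Poisson-like occurrence patterns and grows as the occurrences concentrate in a few documents. The function name `variance_to_mean` and the toy frequency lists are hypothetical.

```python
def variance_to_mean(freqs):
    """Ratio of the variance v^k to the average frequency F^k / n.

    `freqs` holds f_i^k for every document i of the collection,
    including zeros for documents that do not contain the term.
    """
    n = len(freqs)
    total = sum(freqs)                      # F^k
    mean = total / n                        # average frequency F^k / n
    variance = sum((f - mean) ** 2 for f in freqs) / n
    return variance / mean


even = [2, 2, 2, 2]      # term spread evenly: no deviation from the average
skewed = [8, 0, 0, 0]    # same total frequency, concentrated in one document

print(variance_to_mean(even))    # 0.0
print(variance_to_mean(skewed))  # 6.0
```

Both toy terms have the same total frequency, so the difference in the ratio comes only from how the occurrences are distributed over the documents.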

B) Signal-Noise Measure

Another measure based on the characteristics of the frequency distribution of individual text units across the documents of a collection is the signal-noise ratio, which varies with the skewness of the frequency distribution. This measure has the form of entropy and assigns the highest value to those terms whose occurrence characteristics exhibit the greatest variation from one document to another; contrariwise, low values are assigned to terms with relatively similar frequency patterns in each of the documents of a collection. [3,4] The idea is that terms with even frequency distributions, which may occur an identical number of times in each document of the collection, cannot be used to distinguish the documents from each other; hence, their assignment for purposes of content representation is counterproductive. The reverse obtains for terms with skewed frequency distributions.
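A minimal Python sketch of such an entropy-style measure follows. It assumes the noise of a term is the entropy of its relative frequencies over the documents and the signal is the logarithm of the total frequency minus that noise; the base-2 logarithm and the function names `noise` and `signal` are assumptions made for the illustration.

```python
import math


def noise(freqs):
    """Entropy-like noise: sum over documents of (f/F) * log2(F/f)."""
    total = sum(freqs)
    return sum((f / total) * math.log2(total / f) for f in freqs if f > 0)


def signal(freqs):
    """Signal value: log2 of the total frequency minus the noise."""
    return math.log2(sum(freqs)) - noise(freqs)


even = [2, 2, 2, 2]      # identical frequency in every document
skewed = [8, 0, 0, 0]    # all occurrences in a single document

print(noise(even), signal(even))      # 2.0 1.0
print(noise(skewed), signal(skewed))  # 0.0 3.0
```

The evenly distributed term has the maximum noise for four documents (log2 4 = 2), while the concentrated term has zero noise and therefore the higher signal value.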

The signal-noise value $(S/N)^k$ for term k is defined as

$$(S/N)^k = \log F^k - \sum_{i=1}^{n} \frac{f_i^k}{F^k} \log \frac{F^k}{f_i^k} \qquad (2)$$

The negative term in expression (2) is known as the noise $N^k$; it is maximized for even distributions, where $f_i^k = F^k/n$ for all i. The properties of the signal-noise measure are thus very similar to those described earlier for the variance-based formulas.

C) Information Theoretic Considerations

The foregoing development leads to a distinction among the terms in accordance with the relative sizes of the individual term frequencies $f_i^k$ in the documents and the total collection frequency $F^k$. A question arises about the preferred size of the collection frequency $F^k$ (or of the document frequency $B^k$) for terms that are useful as content identifiers. This problem may be tackled by having recourse to certain information-theoretic concepts. Consider the task of supplementing a set of existing index terms identifying a collection of documents by addition of a certain number of new terms. Each new term is then most effective when

a) it provides maximum additional reduction in uncertainty among the documents of the collection (that is, its assignment breaks up existing subsets of documents that cannot be distinguished by the existing term assignments into substantially smaller subsets);

b) it exhibits little redundancy with the previously available terms so that its assignment does indeed optimally divide the various document sets.

The first property is obviously not fulfilled for terms with low document frequency $B^k$, that is, those assigned to very few documents in the collection, because their assignment provides little additional discrimination among the documents; the second property, on the other hand, does not obtain for terms of high document frequency that may be assigned to a very large number of documents, because such terms will obviously exhibit a good deal of redundancy with the already existing terms.

The conclusion is that the best terms are those whose document frequency $B^k$, or total frequency $F^k$, is neither too large nor too small, and whose frequency distribution is skewed in that for some documents $f_i^k$ is much larger than $F^k/n$, and for some others $f_i^k$ is much smaller than $F^k/n$.

D) The Discrimination Value Model

The discrimination value model uses as a point of departure the retrieval capability of the various index terms; specifically, a good content-indicative term is designed to help in the retrieval of material that is wanted (thus enhancing the recall), and in the rejection of material that is extraneous (thus enhancing the precision).* To produce high recall, that is, to retrieve most everything that is relevant, the terms used to identify documents and user queries must be fairly general in nature; high precision, on the other hand, that is, the rejection of the nonrelevant material, depends on the use of reasonably specific content identifiers. The indexing problem then reduces to the choice of terms that are specific enough to produce high precision while also being general enough to produce high recall.

In the discrimination value model, the assumption is made that the best terms in this respect are those which cause the maximum possible separation among the documents in the "document space". Consider, in particular, a collection of documents each identified by a set of content identifiers, or index terms. The index term sets for two given documents can be compared to produce a similarity coefficient measuring the closeness between the respective documents.

* Recall is the proportion of relevant material retrieved, while precision is the proportion of retrieved material that is relevant. An effective retrieval system is one which produces the highest possible precision for a given level of recall.

The existence of the term sets representing the various documents, and the possibility of computing similarity measures between documents, can be used to define a document space for the collection. In such a space two documents appear in close proximity when their similarity coefficient is large; contrariwise, documents exhibiting little similarity are widely separated in the document space. One may then conjecture that a document space which is "bunched up", in the sense that all documents exhibit somewhat similar term sets, is not useful for retrieval, since one document cannot then be distinguished from another. On the contrary, a space which is spread out in such a way that the documents are widely separated from each other may provide an ideal retrieval situation, since some documents may then be retrieved - hopefully the relevant ones - while others can be rejected.
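The notion of a bunched-up versus a spread-out document space can be made concrete with a short Python sketch. It assumes documents are term-weight vectors, uses the cosine coefficient as the similarity measure, and takes the density of the space to be the sum of the similarities over distinct document pairs; the cosine choice, the function names, and the toy vectors are all assumptions of the sketch.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-weight vectors (0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def density(docs):
    """Sum of pairwise similarities, each unordered document pair counted once."""
    return sum(cosine(u, v) for u, v in combinations(docs, 2))


def without_term(docs, k):
    """Copy of the document vectors with term position k removed."""
    return [d[:k] + d[k + 1:] for d in docs]


def spreading(docs, k):
    """Density without term k minus density with it; positive means term k spreads the space."""
    return density(without_term(docs, k)) - density(docs)


# Three toy documents over three terms; term 0 occurs in every document.
docs = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
print(spreading(docs, 0) < 0)  # True: the ubiquitous term bunches the space up
print(spreading(docs, 1) > 0)  # True: the more selective term spreads it out
```

Counting each unordered pair once, rather than both orderings, only scales the density by a constant factor and does not change the sign of the comparison.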

This suggests that the value of an index term can be ascertained by measuring the amount of spreading in the document space which occurs when that term is assigned to the documents of the collection. Specifically, if Q is the density of the document space without term k present among the content indicators, and $Q_k$ is the density after term k is assigned, then for a good term $Q - Q_k > 0$, since the space will have spread after term k is assigned. Conversely, for poor terms $Q - Q_k < 0$.* [5,6] An appropriate measure of term importance is then the term discrimination value, $DV_k$, defined as

$$DV_k = Q - Q_k \qquad (3)$$

* The density of the space might be computed, for example, as the sum of all pairwise similarities between distinct document pairs, that is

$$Q = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} S(D_i, D_j)$$

where $S(D_i, D_j)$, $0 \le S \le 1$, is the similarity between documents $D_i$ and $D_j$.

It may be of interest to inquire into the relationship between the discrimination value of a term and the statistical (frequency) parameters introduced earlier. The following conclusions are reached from a study of the indexing vocabularies in several different subject areas, relating the document frequency of a term to its discrimination value: [5]

a) terms with very low document frequency that may be assigned to very few documents in a collection are generally poor discriminators; when the terms are arranged in decreasing order of their discrimination values (where rank 1 is assigned to the best discriminator, rank 2 to the next best, and so on) such terms exhibit ranks in excess of t/2 for a total of t existing terms;

b) terms with high document frequencies, comprising those that are assigned to more than 10 percent of the documents of a collection, are the worst discriminators, with average discrimination ranks (ranks in decreasing discrimination value order) near t;

c) the best discriminators are those whose document frequency is neither too high nor too low, with document frequencies between n/100 and n/10 for n documents; their average discrimination ranks are generally below t/5 for t terms.

The vector space analysis then appears to confirm the conclusions derived earlier from the statistical models, that terms which appear in a collection with great rarity or excessive frequency are not optimal for content description purposes.

2. Comparison and Evaluation

The discrimination value analysis can be used to derive an effective indexing policy: since the best terms appear to be those with medium document frequencies, such terms can be directly assigned as content identifiers without further refining transformations. On the other hand, terms with excessively high document frequencies must be made more specific, thereby decreasing the frequency of their assignment to the queries and documents of the collection; contrariwise, terms with low document frequencies must be made more general by increasing their assignment frequencies. [5] This can be achieved by joining two or more high frequency terms into term phrases, while assembling a number of low frequency terms into term classes. Obviously, a term phrase exhibits a lower assignment frequency than any phrase component, and vice-versa for a term class which replaces a number of individual class elements.
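The indexing policy just described can be sketched in a few lines of Python. The thresholds n/100 and n/10 are the document-frequency bounds quoted above; the function names, the returned labels, and the modelling of a phrase as the intersection and of a class as the union of the component terms' document sets are assumptions made for the illustration.

```python
def indexing_action(doc_freq, n_docs):
    """Suggest a treatment for a term from its document frequency."""
    if doc_freq > n_docs / 10:
        return "combine into phrases"      # too frequent: make more specific
    if doc_freq < n_docs / 100:
        return "group into a class"        # too rare: make more general
    return "assign directly"               # medium frequency


def phrase_documents(*postings):
    """Documents containing every component of a phrase (assumed phrase model)."""
    return set.intersection(*map(set, postings))


def class_documents(*postings):
    """Documents containing at least one member of a term class (assumed class model)."""
    return set.union(*map(set, postings))


print(indexing_action(100, 424))  # combine into phrases
print(indexing_action(20, 424))   # assign directly
print(indexing_action(2, 424))    # group into a class

# A phrase is never assigned more often than its components; a class never less.
print(phrase_documents({1, 2, 3}, {2, 3, 4}))  # {2, 3}
print(class_documents({1}, {5}))               # {1, 5}
```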

It was shown earlier that the use of phrases and term classes (thesaurus) constructed in accordance with the frequency requirements imposed by the discrimination value theory produces substantial improvements in retrieval effectiveness (recall and precision). In the present work, additional relationships are examined between the statistical and the vector space models. However, instead of actually using the various term sets in a retrieval environment, an attempt is made to relate the formal frequency and vector space properties of the terms to the semantic characteristics of these terms.

Specifically, consider a collection of documents in a given subject area and an appropriate set of user queries pertaining to that area. For each user query, the set of documents can be partitioned into two subsets consisting of the relevant set R and the nonrelevant set I, respectively. Relevance is assumed to be user-specified in such a way that a relevant item is assumed to be one which is related in some sense to the information need expressed by the various user queries. The linguistic, or semantic, character of a given term can now be introduced by assuming that the most valuable content identifiers assigned to a collection of texts are those which are concentrated in the documents specified as relevant to the respective queries, as opposed to the nonrelevant ones; contrariwise, the less valuable terms will be concentrated in the nonrelevant items.

The discussion may be formalized by using the concept of term relevance TR. [7] Consider a term k contained in query Q; the term relevance TR(k) may be defined as

$$TR(k) = \frac{r_k / (|R| - r_k)}{h_k / (|I| - h_k)} \qquad (4)$$

where $r_k$ and $h_k$ are the number of documents containing term k that are relevant and nonrelevant, respectively, to query Q, and $|R|$ and $|I|$ are the total number of relevant and nonrelevant documents for that query.* When a term k occurs in more than one query, its term relevance may be taken as the average of the relevance values obtained for the various queries.

* The mathematically undesirable situation when $|R| = r_k$ or when $h_k = 0$ is not likely to occur in a practical environment.

It is clear from the function (4) that high values are assigned to those query terms which are prevalent in the relevant items and rare in the nonrelevant, and vice-versa for those prevalent mainly in the nonrelevant. Furthermore, the terms falling into the former class are likely to be more useful for content representation than those in the latter.

To verify the relationships between the statistical models of word importance and the vector space model, document collections are used in three different subject areas, including aerodynamics (CRAN), medicine (MED) and world affairs (TIME). The vocabularies and user populations are disjoint for these three areas. Results which carry through for all three cases should be extendable to other subject fields as well. The basic collection statistics are contained in Table 2.
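A small Python sketch of a term relevance computation follows. It assumes TR(k) is the odds of the term among the relevant documents divided by its odds among the nonrelevant ones, a form chosen to agree with the remark that the cases |R| = r_k and h_k = 0 are mathematically undesirable; returning `None` for those cases, and the names `term_relevance` and `average_relevance`, are choices of the sketch rather than part of the original method.

```python
def term_relevance(r_k, h_k, n_rel, n_nonrel):
    """Odds of term k in the relevant set divided by its odds in the nonrelevant set.

    r_k, h_k        : relevant / nonrelevant documents containing term k
    n_rel, n_nonrel : |R| and |I| for the query
    Returns None when the ratio is undefined.
    """
    if r_k == n_rel or h_k == 0 or h_k == n_nonrel:
        return None
    return (r_k / (n_rel - r_k)) / (h_k / (n_nonrel - h_k))


def average_relevance(values):
    """Average over the queries for which the term relevance is defined."""
    defined = [v for v in values if v is not None]
    return sum(defined) / len(defined) if defined else None


# Term found in 2 of 4 relevant and 1 of 11 nonrelevant documents.
print(term_relevance(2, 1, 4, 11))            # 10.0
print(average_relevance([10.0, 2.0, None]))   # 6.0
```

Values above 1 indicate a term concentrated in the relevant documents, values below 1 a term concentrated in the nonrelevant ones.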

It may be seen from the Table that the term relevance is defined for only a relatively small number of terms for each collection, namely 458, 172 and 375 for CRAN, MED, and TIME, respectively. The reason is that a term relevance value is computable only for terms which occur jointly in certain query-document pairs. For small experimental collections operating with a restricted number of queries, the size of the corresponding term sets is obviously limited.

    Characteristics                       CRAN 424       MED 450      TIME 425

    Subject area                          aerodynamics   medicine     world affairs
    Number of documents                   424            450          425
    Number of user queries                155            24           83
    Number of terms assigned              2651           4726         7569
      to collection
    Number of terms occurring jointly     458            172          375
      in queries and document sets

    Basic Collection Statistics
    Table 2

Consider now the comparison of the standard statistical term value measures with the term discrimination values obtained by the vector space transformations. Table 3 shows the values of the NOCC/EK and S/N measures (expressions (1) and (2)) obtained for the 50 terms with highest discrimination values and the 50 terms with lowest discrimination values for each of the three test collections. The range of the respective values is given in each case, as well as the average values for each set of 50 terms in percent (that is, on a scale of 0 to 100). T-test values are also shown, representing the probability that the two sets of 50 values (for the high DV and low DV terms) could have been derived from a common probability distribution by chance. In statistical significance testing, a t-test value smaller than 0.05 is normally taken to imply a significant difference; that is, the hypothesis that the two sets of values do in fact originate from a common distribution is rejected in such a case. [8]
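The comparison of two sets of term values can be sketched with the Python standard library. The variant of the t-test used for the tables is not stated here, so the sketch assumes Welch's unequal-variance form and stops at the test statistic and its degrees of freedom; turning these into a probability requires the cumulative t distribution, which the standard library does not provide. The helper `high_low_factor` mirrors the "average high/average low" entries of the tables; all names and the toy samples are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)     # squared standard error of mean(a)
    vb = variance(b) / len(b)     # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def high_low_factor(high, low):
    """Factor by which the low average must be multiplied to obtain the high average."""
    return mean(high) / mean(low)


high = [2, 3, 4, 5, 6]   # toy values for the "high" term set
low = [1, 2, 3, 4, 5]    # toy values for the "low" term set

t, df = welch_t(high, low)
print(round(t, 6), round(df, 6))               # 1.0 8.0
print(round(high_low_factor(high, low), 4))    # 1.3333
```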

It may be seen that the ranges of values for the statistical parameters NOCC/EK and S/N exhibit substantial differences for all three collections. The same is true for the corresponding average values. Moreover, the differences are in all cases statistically significant. It is then clear that a high discrimination value, reflected in the ability of a term to expand the document space upon assignment to the collection, also implies favorable statistical parameters in terms of variance and skewed frequency distributions; the converse is true for the low discrimination values.

At the bottom of Table 3, range and average values are given for those terms among the sets of 50 terms for which the term relevance is defined (that is, those which co-occur jointly in some query-document pair). Again the term relevance values are substantially different for the two classes of DV terms, and these differences are statistically significant.

Also included in Table 3 are the multiplicative factors which relate the average values for the 50 high discriminators and the 50 low discriminators for each of the three measures (that is, the factor by which the low average value must be multiplied to obtain the high). It may be seen that this factor is much higher for the term relevance than for either NOCC/EK or S/N. The actual factors for the term relevance are 6.66, 80.0 and 36.33 for the CRAN, MED, and TIME collections, respectively. This indicates that the high discriminators have very much higher average term relevance than the low discriminators; alternatively expressed, there is substantial agreement between the semantic term relevance concept and the automatically derived term discrimination values.

The data already included in Table 3 are shown in term relevance order in Table 4. The output of Table 4 contains range and average values for NOCC/EK, S/N, and DV for the 50 terms with highest term precision and the 50 terms with lowest precision for the CRAN and TIME collections, respectively. Averages are produced for only 30 high and 30 low precision terms for the MED collection because in the medical environment the small number of available queries (24) made it possible to compute term precision values for only 172 terms in all.

It is clear from the output of Table 4 that the differences in the respective values are substantial in all cases, and the t-test values indicate that they are fully significant. For the three collections under study, the evidence indicates that terms with favorable formal parameters tend to be concentrated in documents identified as relevant by the user population, and vice-versa for terms with unfavorable formal parameters. Also shown in Table 4 are average document frequency ($B^k$) and average total frequency ($F^k$) values for the high and low relevance terms, respectively. It may be seen that the high relevance terms exhibit a much lower frequency spectrum (as expected for good discriminators) than the low relevance terms. Once again, it appears that the term relevance, reflecting the semantic properties of the terms in their particular collection environment, effects a division among the terms very similar to that obtained by the discrimination value computations.

In earlier work it was shown that the discrimination value theory, which leads to the assignment to queries and documents of medium frequency terms (including also phrases constructed from high frequency terms, and term classes made up of low frequency terms), exhibits effective retrieval characteristics. [4,5,6] Typical average retrieval precision values for three different recall levels (recall of 0.1, 0.5, and 0.9) are shown for the three collections in Table 5. The output shows that the use of medium-frequency phrases and term classes improves performance by about 20 percent compared with the assignment of single terms alone. The comparison of Tables 3 and 4 between discrimination values on the one hand, and statistical and semantic parameters on the other, indicates that the same theory which produces such effective retrieval characteristics also conforms to the known statistical and linguistic theories of term behavior.
In Tables 3 to 5 a dash marks an entry that is not legible in this copy.

    a) CRAN 424 Collection         50 Terms with High          50 Terms with Low
                                   Discrimination Values       Discrimination Values

    NOCC/EK                        -                           -
    S/N                            -                           -
    Term Relevance TR   average    - (21 terms only)           - (24 terms only)
                        t-test                   0.02208

    b) MED 450 Collection          50 Terms with High          50 Terms with Low
                                   Discrimination Values       Discrimination Values

    NOCC/EK                        -                           -
    S/N                 range      2.792 to 0.693              1.738 to 0.126
                        average    48.46%                      23.93%
                        t-test                   0.00002
                        average high/average low 2.03
    Term Relevance TR   range      874.00 to 0.00              -
                        average    16.0% (12 terms only)       -

    c) TIME 425 Collection         50 Terms with High          50 Terms with Low
                                   Discrimination Values       Discrimination Values

    NOCC/EK                        -                           -
    S/N                 range      2.966 to 1.424              -
                        average    68.85%                      -
    Term Relevance TR   average    - (23 terms only)           -

    Comparison of Statistical Models with Term Discrimination Values
    Table 3

    a) CRAN 424 Collection         50 High Relevance Terms     50 Low Relevance Terms
                                   (average B = 10.3,          (average B = 58.9,
                                    average F = 24.6)           average F = 84.0)

    NOCC/EK             range      3657 to 420                 1584 to 432
                        average    38.95%                      20.66%
                        t-test                   0.00002
                        average high/average low 1.89
    S/N                 range      1.953 to 0.000              0.998 to 0.045
                        average    42.81%                      20.63%
                        t-test                   0.00002
                        average high/average low 2.08
    DV                  range      1.223 to 0.002              0.075 to -1.283
                        average    65.52%                      25.06%
                        t-test                   0.00140
                        average high/average low 2.61

    b) MED 450 Collection          30 High Relevance Terms     30 Low Relevance Terms
                                   (average B = 9.5,           (average B = 22.5,
                                    average F = 24.0)           average F = 41.9)

    NOCC/EK             range      -                           -
                        average    48.01%                      36.33%
                        t-test                   0.02378
                        average high/average low 1.32
    S/N                 range      1.664 to 0.126              1.259 to 0.000
                        average    61.0%                       46.33%
                        t-test                   0.00272
                        average high/average low 1.32
    DV                  range      0.135 to 0.006              0.688 to -1.030
                        average    62.11%                      56.11%
                        t-test                   0.00621
                        average high/average low 1.11

    c) TIME 425 Collection         50 High Relevance Terms     50 Low Relevance Terms

    NOCC/EK             range      13010 to 1117               -
                        average    16.1%                       3.4%
                        t-test                   0.00002
                        average high/average low 4.74
    S/N                 range      2.966 to 0.000              1.376 to 0.126
                        average    42.31%                      19.25%
                        t-test                   0.00002
                        average high/average low 2.20
    DV                  range      -                           -
                        average    94.05%                      83.0%
                        t-test                   0.00148
                        average high/average low 1.13

    Comparison of Term Relevance with Term Discrimination Values
    Table 4

    Average Retrieval Precision                    CRAN 424    MED 450    TIME 425
    for Various Recall Levels

    A) Low Recall (0.1)
       i)  single terms                            -           -          -
       ii) single terms, phrases and term classes  -           -          -
    B) Medium Recall (0.5)
       i)  single terms                            -           -          -
       ii) single terms, phrases and term classes  -           -          -
    C) High Recall (0.9)
       i)  single terms                            -           -          -
       ii) single terms, phrases and term classes  -           -          -

    Recall-Precision Performance for Medium Frequency Terms
    (Discrimination Value Theory)

    Table 5

References

[1] A. Bookstein and D.R. Swanson, Probabilistic Models for Automatic Indexing, Journal of the ASIS, Vol. 25, No. 5, September-October 1974, p. 312-318.

[2] D.C. Stone and M. Rubinoff, Statistical Generation of a Technical Vocabulary, American Documentation, Vol. 19, No. 4, October 1968, p. 411-412.

[3] S.F. Dennis, The Design and Testing of a Fully Automatic Indexing-Searching System for Documents Consisting of Expository Text, in Information Retrieval: A Critical Review, G. Schecter, editor, Thompson Book Co., Washington, 1967, p. 67-94.

[4] G. Salton, A Theory of Indexing, Regional Conference Series in Applied Mathematics No. 18, Society for Industrial and Applied Mathematics, Philadelphia, 1975.

[5] G. Salton, C.S. Yang and C.T. Yu, A Theory of Term Importance in Automatic Indexing, Journal of the ASIS, Vol. 26, No. 1, January-February 1975, p. 33-44.

[6] G. Salton, A. Wong, and C.S. Yang, A Vector Space Model for Automatic Indexing, Communications of the ACM, Vol. 18, No. 11, November 1975, p. 613-620.

[7] C.T. Yu and G. Salton, Precision Weighting - An Effective Automatic Indexing Method, to be published in Journal of the ACM, 1976.

[8] D. Williamson, R. Williamson, and M. Lesk, The Cornell Implementation of the SMART System, in The SMART Retrieval System, G. Salton, editor, Prentice-Hall, Englewood Cliffs, NJ, 1971, Chapter 2.

American Journal of Computational Linguistics Microfiche 55 : 84

SNOPAR: A GRAMMAR TESTING SYSTEM

T. P. Kehler
Department of Mathematics
Texas Woman's University
Denton, Texas 76204

A grammar testing program has been developed which permits modeling augmented transition network grammars as a series of SNOBOL4 functions. SNOPAR is designed for linguistics teaching and research. Emphasis is placed on the development of small to medium grammars in a variety of languages. The system has been used so far to develop a grammar of English for use in a transformational grammar course and to develop small grammars of a Nigerian and an American Indian language. Intended applications of SNOPAR are field linguistics and grammar model testing.

The main part of the program is the routine PARSER. When PARSER is called with a lexicon and grammar, input strings are parsed according to the model grammar. The PARSER functions available for grammar development are CAT, PARSE, SETR, GETR, RESET, TESTR, GETF, GETCL, TO, BACK, FINDWRD, and BUILDS. The function operations and descriptions of their arguments are given in Table 1. After a parsing, PARSER returns control to the user, permitting examination of stacks and registers at all levels. In the examination stage, traces may be turned on, lexical entries may be examined, or minor changes to the grammar may be made. Functions available for the examination of stacks, registers and lexicon are POP, OUT, GETR, LOOKLEX, and TRACE. A function GETENG is also available for dictionary lookup in other languages. PARSER requires approximately 150 lines of SNOBOL code and is currently operating on a DEC 10. A batch version has been tested on an IBM 360.

TABLE 1 PARSER

CAT      looks up the word class of the current first word in the input string. If the word is not in the lexicon, an add routine is called which permits additions. If CAT succeeds by matching the current word class with its argument, the word is removed from the input string and pushed onto a stack (SAVEW). If it fails, an alternate class is tested, provided that the alternate flag is on. Fail return leaves the surface string unaltered.

PARSE    calls the function given by its argument and, if successful, pushes the structure returned by the function onto a stack (SAVEQ) and assigns the structure to the Q register.

SETR     sets the values of registers. It has three arguments: level, register name, and value. Each call of SETR causes the register name specified to be placed on a list for the specified level. SETR entries are treated as stacks, providing automatic saves for recursive calls.

GETR     returns the contents of the register name specified by its argument, and pops it off the stack, saving the last value.

TESTR    looks at the value of the register name specified by its argument without popping it off the stack.

RESET    changes the value of a register without changing stack levels.

GETF     looks up the feature value for a feature specified by its argument of the current value of the word register. Any word can be specified by giving a second argument. If GETF fails for the word, it looks at the root form of the word for certain features.

GETCL    looks up the word class of the word specified by its argument.

TO       has as its argument the new state label. It pushes the label onto a stack (PATH), outputs the state, outputs the contents of the Q register, and transfers control to the new state.

BACK     backs to the state specified by its argument.

FINDWRD  tests for the word specified by its argument.

BUILDS   builds a structure from the register name list.
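The register functions of Table 1 behave like named stacks, which a short Python model can illustrate. This is only a sketch of the described behavior, not the SNOBOL4 implementation: the class name `Registers`, the method signatures, the reading of GETR as a plain pop that returns the popped value, and the shape of the structure returned by `builds` are all assumptions.

```python
from collections import defaultdict


class Registers:
    """Hypothetical model of the SETR / GETR / TESTR / RESET register stacks."""

    def __init__(self):
        self.stacks = defaultdict(list)     # register name -> stack of values
        self.names = defaultdict(list)      # level -> register names set at that level

    def setr(self, level, name, value):
        """Push a value and record the register name on the list for the level."""
        if name not in self.names[level]:
            self.names[level].append(name)
        self.stacks[name].append(value)

    def getr(self, name):
        """Return the current value and pop it, exposing any value saved below it."""
        return self.stacks[name].pop()

    def testr(self, name):
        """Look at the current value without popping it."""
        return self.stacks[name][-1]

    def reset(self, name, value):
        """Change the current value without changing the stack depth."""
        self.stacks[name][-1] = value

    def builds(self, level):
        """Build a (level, [(name, value), ...]) structure from the register name list."""
        return (level, [(name, self.testr(name)) for name in self.names[level]])


regs = Registers()
regs.setr("S", "TYPE", "DCL")
regs.setr("S", "SUBJ", "outer-np")
regs.setr("S", "SUBJ", "inner-np")     # a recursive call saves the outer value
regs.reset("SUBJ", "inner-np-2")       # same stack depth, new value
print(regs.getr("SUBJ"))               # inner-np-2
print(regs.builds("S"))                # ('S', [('TYPE', 'DCL'), ('SUBJ', 'outer-np')])
```

Keeping one stack per register name is what gives the automatic save and restore across recursive calls mentioned in the SETR entry.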

In order to use PARSER, a grammar and lexicon must be developed as disc files. Since the grammar is developed as a separate file, different components of the grammar can be tested and put together in a variety of configurations. If a lexicon is not developed as a disc file prior to a parse, it may be entered from the terminal. A simple grammar which produces surface structure trees is shown in Example 1 along with a sample parsing. A portion of the lexicon is shown after the grammar. Example 2 shows the use of the GETF function to handle agreement between plural adjectives and marker in Angas, a Nigerian language. Example 3 shows a grammar which handles sentence embedding in English. Some parsings are shown. The model used in Example 3 is basically the one developed in English Transformational [the remainder of this passage, which mentions further grammars in development, including work on American Indian languages, is illegible in the source].

The complete SNOPAR system has, in addition to PARSER, a routine for generating grammars from a state transition graph and a register action table. This routine, called NEW, guides the user through a state transition graph and register actions to produce a grammar compatible with PARSER. The SNOPAR NEW routine is still in development. The current routine allows development of small grammars. The new developments will provide diagnostics of grammar errors. SNOPAR also has a line editor (FIXUP) and disc I/O commands. The complete system allows repetitive testing of model grammars, permits editing, and has trace capabilities for grammar debugging.

Example 1

S       PARSE(NP())                   :S(TO(.SNP))
        CAT('AUX')                    :S(TO(.QUES))F(FRETURN)
SNP     SETR(.S,'TYPE','DCL')
        SETR(.S,'SUBJ',Q)
TRYVP   PARSE(VP())                   :S(TO(.POPS))F(FRETURN)
QUES    SETR(.S,'TYPE','QUESTION')
        SETR(.S,'AUX',Q)
        SETR(.S,'TENSE',GETF('TNS'))
        PARSE(NP())                   :S(TO(.QNP))F(FRETURN)
QNP     SETR(.S,'SUBJ',Q)             :(TO(.TRYVP))
POPS    SETR(.S,'PRED',Q)
        S = BUILDS(S)                 :(RETURN)
NP      CAT('DET')                    :S(TO(.DET))
        CAT('PRO')                    :S(TO(.PRO))
        CAT('NPR')                    :S(TO(.NPR))F(FRETURN)
NPR     SETR(.NP,'PROP',Q)            :(TO(.POPNP))
PRO     SETR(.NP,'PRO',Q)             :(TO(.POPNP))
DET     SETR(.NP,'DET',Q)
ADJ     CAT('ADJ')                    :F(TO(.TRYN))
        SETR(.NP,'ADJ',Q)             :(TO(.ADJ))
TRYN    CAT('N')                      :F(FRETURN)
        SETR(.NP,'N',Q)
TRYPP   PARSE(PP())                   :F(TO(.POPNP))
        SETR(.NP,'PP',Q)              :(TO(.TRYPP))
POPNP   NP = BUILDS(NP)               :(RETURN)
PP      CAT('PREP')                   :F(FRETURN)
        SETR(.PP,'PREP',Q)
        PARSE(NP())                   :F(FRETURN)
        SETR(.PP,'PREPNP',Q)
        PP = BUILDS(PP)               :(RETURN)
VP      CAT('V')                      :F(FRETURN)
        SETR(.VP,'V',Q)
        PARSE(NP())                   :S(TO(.VNP))
TRYVPP  PARSE(PP())                   :F(TO(.POPVP))
        SETR(.VP,'PP',Q)              :(TO(.TRYVPP))
VNP     SETR(.VP,'NP',Q)              :(TO(.POPVP))
POPVP   VP = BUILDS(VP)               :(RETURN)

TY LEXENG.
DID= (AUX)(TNS PAST).
CAN= (AUX)(TNS PRES).
COULD= [entry illegible in source]
WILL= (AUX)(TNS FUT).
THE= (DET).
A= (DET).
AN= (DET).
THAT= (CLIND).
BOY= (N)(NBR SING).
BOYS= (N)(NBR PL).
GIRL= (N)(NBR SING).
GIRLS= [entry illegible in source]
MAN= (N)(NBR SING).
MEN= (N)(NBR PL).
WOMAN= (N)(NBR SING).
WOMEN= (N)(NBR PL).
TABLE= (N)(NBR SING).
[one further entry, ending in "PL).", is cut off in the source]

DID YOU WALK TO THE VILLAGE
DID YOU WALK TO THE VILLAGE
STATE QUES
COMPLEMENT STRING: YOU WALK TO THE VILLAGE
BUILD STRUCTURE DID
STATE PRO
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE YOU
STATE POPNP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE YOU
STATE QNP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE (NP(PRO YOU))
STATE TRYVP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE (NP(PRO YOU))
STATE DET
COMPLEMENT STRING: VILLAGE
BUILD STRUCTURE THE
STATE TRYN
COMPLEMENT STRING: VILLAGE
BUILD STRUCTURE THE
STATE POPNP
COMPLEMENT STRING:
BUILD STRUCTURE VILLAGE
STATE TRYVPP
COMPLEMENT STRING:
BUILD STRUCTURE (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))
STATE POPVP
COMPLEMENT STRING:
BUILD STRUCTURE (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))
STATE POPS
COMPLEMENT STRING:
BUILD STRUCTURE (VP (V WALK)(PP (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))))
STATE S
COMPLEMENT STRING:
BUILD STRUCTURE:
(S(TYPE QUESTION)(AUX DID)(TENSE PAST)(SUBJ (NP(PRO YOU)))(PRED (VP (V WALK)(PP (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))))))
DO YOU WANT TO EXAMINE THE REGISTERS ? YES
OT OUTPUT = POP(PATH) :S(OT)F(EXAMIN)
EOF
POPS
POPVP
TRYVPP
POPNP
TRYN
DET
TRYVP
QNP
POPNP
PRO
QUES
DO YOU WANT TO EXAMINE THE REGISTERS ?

Example 2

ANGAS NOUN PHRASE
[Only the state labels of this grammar survive in the source: NP, POS, ADJ, KOM, DET, PL, PLT, NUM, THWA, POPNP, EOC, END.]

ANGAS LEXICON
L<'AS'> = '(NOUN)(ENG DOG)'
L<'MAT'> = '(NOUN)(ENG WOMAN)'
L<'FANA'> = '(POSPRO)(ENG MY)'
L<'RIT'> = '(ADJ)(PL -PL)(ENG GOOD)'
L<[key illegible]> = '(ADJ)(PL PL)(ENG GOOD)'
L<'BIJIM'> = '(ADJ)(PL -PL)(ENG BIG)'
L<[key illegible]> = '(ADJ)(PL PL)(ENG BIG)'
L<'GAK'> = '(NUM)(ENG ONE)'
L<'UAP'> = '(NUM)(ENG TWO)'
L<'NYII'> = '(DET)(ENG THIS)'
L<[key illegible]> = '(DET)(ENG THE)'
L<'CE'> = '(DET)(ENG A)'
L<'MWA'> = '(PL)(ENG PLUR)'
L<[key illegible]> = [entry illegible in source]
L<'KI'> = '(KI)(ENG POSSESSIVE)'

STATE POPNP
COMPLEMENT STRING:
BUILD STRUCTURE MWA
STATE NP
COMPLEMENT STRING:
BUILD STRUCTURE (NP(NOUN AS)(POSPRO FANA)(ADJ [illegible])(DET CE)(PL MWA))
ENGLISH: DOG MY BIG A PLUR
DO YOU WANT TO EXAMINE THE REGISTERS ? NO
INPUT STRUCTURE TO BE PARSED
AS FANA BIJIM CE MWA
AS FANA BIJIM CE MWA
STATE POS
COMPLEMENT STRING: FANA BIJIM CE MWA
BUILD STFUCTURE AS STATE ADJ COYPLEMZNT STRISG: BIJIM CE MWA, BUILD ST8UCTURE FANA STATE DET COMPLEMENT STR ING : CE BUILD STRUCTURE BJJ IH STATE PLT COMPLEMENT STRING: Bu rm ST~HJCTYRZ ~ W A STATE NP COMPLEMENT STR ING : DID NOT PARSE BUILD STRUCTURE M'dA DO YOU WANT TO EXAM INE THE REGISTERS ? NO 1NPUT.STRUCTURE TO AS WdA AS MWA STATE POS COMPLE.ME~TI STRING: BUILD STRC~CTURE ' AS STATE KT COMPLEMENT STReING: BUILD STRUCTUaE STATE ADJ CONPLEMENT STRING: BUXLC STRUCTURE STATE KOM COMPLEMEfiT STR ING : BUILD STRUCTURE STATE DET GOXPLEMGNT STRING: BUILD, STRUCTURE STATE PL COMPLEYENT STSING : BUILD STRUCTURE STATE FLT COMPLEYENT STRING: BU ILD STRUCTURE MWA STATE POPUP COMPLEXEYT STRING : BUILD STFUCTUfiE YWA ST A'-?' E Y COXPLE3,FNT STRING: BU ILD ."sTRilCWTUFE P C AS) (PL MXA ) ) ENGLISH: , DOG PLUR DO YOU WANT TO EXANYE THE REGISTERS ? un BE PARSED MWA MWA MrdA pl !A' MWA MWA Example 3 FL~vCTION DEFINITIONS PARSER G,RAP * s SNP IMP €4 AX QfiP PO P s NP DET ADJ h' FOSPPO NPR POPrJF NPP P E-0 BLNP CEFINE("st]Nv) CEFIKF('ESOp) CEFJhW('NP[)PINw) CEFIREtPPP()') GFIFIW[p 'VP()N*) CEFXhE(.XO() S PARSER PARSE(S(1) OUT('S'rSTRpQ) (NX'T,COM) FARSE(NFO) ISCTOQ,SNPI~ CAT(,AUX) :S(TO(,Q)] FARSE(VF0) rS[TOC,XMP)lFCFRETURH) SETF(pS~'SUBJ'tQ) SETF( t S 'TYPEn# ~ 'CCL") FFFSE(VP()) 3S[TOT,POPS)) CAT( ,AUX) tS(TO[,AX))F(FRElTURN) EETR(,SI'TYPE'I?I~P') SETf:(,Sr'SU!3J*p '(PRO YOU)*@) a[TO(,PQPS)) SEW[ @SI'AUX'~Q) SETR(~SpPThS'IGETF(CTN5P)) EL7F~,Ev'TYPEC,"C'3 FEPSE(bP0) ~s(TQ(,QNPI)F(FRET~RN) SETRI,Sr "AUXoyQ1 SETF[,SI'ThS'rGETFfoTNS']) FXNJIkFD('HAVEC) SE'TP(IS,'HA",eHAVEe) FFRSE(VP[)) 8S~TO(,POPs)lF(FRETURM) SETF(,Sr *SUBJ',Q] FIt~DhPD( 'HAVE ') SE?R(,Sf "HApr@HRVE@) Pf iPSE(VF(3) rS(TO(,POPS))F(FRETWRM) EETF(,SI~PRED~,O) a a EUILDS("S /TYPE /SUBJIPRED~~] I [RETURN) CB%T{"DEI@> xS[TO(,DET>) CAT(*PRQP) rS(TO(,PROl) C.AT(*NPR@) t5[70( ,NPRIE FAFSECES[)I

a[TO[,PbNP)) SETP( ,NEiI CDET'pQ) C~T ("ADdp l !r(To(,WI EETR(~NPp@ADJP MgQ ) BUkP(@M') ttTo(,ADJS) CAI( *f ip j tF(FRETURN1 SETP(,NF, 'MC IQ ) I(TQ(,NPF)) SETG [ ,Nfi, *PROCpO) r(%Q(,ADJ)l SETP ( ltJFI "NPR5rQP XS(SE%I; [CCA~Eb)t @FOSYl CHG~AM(~NB\ p ~ ~ @PQS?IPR.@) ~ * c IFI:TO(,MPP)) FAPSEIFSf,)) tF(TO(,ADJ)I SF'TPC,?4F,'POSSflrG) t;E = BUILDS(*/rJP/!~PP/POSS./@4 I [RETURN) FAPSECFSO) rS(TO( ,NPESl) hF a HUILDSCNP) &!(FErLUPN) FPRSE (PP ( ) ) IF(TOE,POPNPI) SkTH ( ,f iEr "NPPC NrQ) EVPP(~N~I esTo(,Npn>) GFTF('CASEP) bPOS* rS(TO(,POSPAO)) EFTu( , f~P r 'PPOCpQ1 1 (TQ( rPQPfJP1) CAT('ADJC) tFdTO( p?dBL)l SETPI ,!IF? 'ADJ' p Q ) z (TO( ,FLhPlI tSCTO(,NPES)l NPL NPES HTPP VADJ VDJPF VADJES NTNP VNP V IONF IOL AUXBE PAS PFPP FRNP TRPAS CA?('N') I$(GETF('N0R')rvPL') 2F(FRETURH) ~ETF(,KFI *KOpQ ) t (T0CQPQPNl3)) SETS( ,biFr 'COPP'r C ) hF = EUlGDS(%P) :IFEIUFN) PP PARSER C''A'I'(@PREPe) ' ;S(TO( ,PREP)IFCFRETURN) FCPPREP'> = Q PP -= 'IFFEP Fq'PPEP', NpO ') ' ;S(RETURNlF(FRETUPN] VP PARSER CAT( .V ' I ~F(TO(,AUXBEI) SFTR( ,VFt 'TtJS'rGETF('TWS')) HkStdAM(FS*r*AUXr) GETR('TNS*) IS(GETF('VTYP')r'TPANS') tS [TO[ ,TPAPiS) )F(TO( ,ITRAN]c) SE'IF( ,VFp 'VT'pQI I(~Q(,VNP)~ SETRC ,VPt 'VgPQ) tETo(,NTPP)3 CPT("AD3') $S('IB[,VADJ>) FARSE(NP0) ~S(TOC,NTRPI) FPFSE(PP()) tF[TO(,POPVP)) SETR(,VPICVPPo WIG) EUI.'P(?N') : (TO[ ,NTPP)'l SEfR( ,VP, *ADJgrQ) FFP SE(ES()) tS(TO(,VADJES)) VF = EUILDS(VP1 PAPSECFFO) tF(RETUFN1 VF = VP Q rtTO(,VDJPPI) SE?P(,VPI 'ADJESWpG) t(TO( ,POPVP)) S~'IF(~VFr 'NTNPCJQ) ~(T~(QPQPVP>) FPFSECIC()) tS(TO(,XOI)) FAR-SEINE ()) IFCFPWTVRN) SETF( ,VP, *OBJ',Q) FAPSE(IO0) lS(ToC,XOL)lF(TOC,POPVP))

SETPCQVPt 'XOC,C) FAFSE(NF()) tS(TQ(,VXONP))F(FRETURH-j SE'IF[,VFc *OBJ'rQ) t(TOC,PQPVP)) SETR(,VF, 'IO',Q] t (TO( ,POPVP)) CA'I('V'r'~LTP) fS(TOC,BE)) IS('fESTF ( *TYPEo) , 'C*) ISCTESTR(~AUX') @BEe) $F(TO[,'SRYESI) SE?~ ( ,VF I~V '~GEZF~ 'AZJ )S '~~ tCTO(,PAS)) IS(TESIF(*IF')l PA~SECESO) :F (FRETUFW ,j NP = Q t(FETURN1 GETF(,VFp 'V' ,Q) SE'IF( ,tF , "TNSr,GETF[fTUS*)] CAT [ 'V') tFCTO(,TADJ]) kORO 'INGC iStTo(,ING)I IS[GETF('VTYP@))~'TRANSC) tFCFRETUPN) GETF('TNS*) 'PPRTf tS(TPPAS)P[FRETURN) VF\z SkTF( @VPr 'AUX'r 'RE') Sp-TF( ,VEr 'TMS'r 'PPRG') SETP( ,VP ,*VC IQ ) FAPSE(NP()) ISCTO~~PRNP)) VF = ~~ILUS (*VP ) PAFSECPFC)) ~FGRETURN) VP = VP 0 tCTO[,PRPp)) SETF(,VP,*PPNPrpG] t(rO(,POPVP)) VP = 'AUXpr 'BE') 'TYFE '), '€2') SETR( ,VPp 'TNS', *PPRTU) VPES TFPP FIO PNPTS'I PNP FIOL I0 IOTO ADD, TO XOFOF TES TWV POPES END CETF( 'V-') SETP(;VP,-p~',~) FAFSECIQ())*. tSCTO(pP1O)I F x ?~D~P u ( '~EY~ ) ;S I IO ( ,PNPTST)) F I~C~hF I 'FPOMF) ) tS(TO(,PQPTSTI) FAFSE(ES()) !S(al:Q,C ;VPESl.jF(TO[ ,CHGSBJ)) SETR{weVPc 'ORJV,R<'SUP3')) XS(?ESTFI*TYPE'IV'DCL~ FFSET(*TYPEPt*TRPhSC) I~(TESTF(*TYPFC),'Q') PESE'I('TYPE','QPAS') FESET ('SUBJ'p 'SOFEONE') t (TO(,POBVP)) SLTPC ,VPv 'OBJESVr Q) VQ = EUILDSCVP) FAPSECPPO) ~FCRETURN) VP = VF Q I (TO( ,IRPP)~ SETPC e~P~*XO'~Q) FJNChRD('BY@) ;SCTO(,PNS?ST)) F~~~HFD('FPOM~) tS(ToC,PNPTST>>F(FRETURN) FARSE(NF0) :S(Tol,PNP))FCFRETURN) SETP(,VF,'OBJ8rGfTRCCSUBJ*)] RESET( 'SUT3dS8 Cl] IS~TEST~ T TYPE^, ~DCL.) PSSET(@TYPE~, .TF ?PA ~ ~ ~ ~S(TESIP['TYPE~')~~Q@) FESETr"TYPEr,VQpA$@] IS(P<'PO'>) tF(TQ( ,POPVP)) PAFSE(IO()) 8S~~Q(8~IoLl>F(TO[,POPVfa)) EETP( 5VFp'10vrQ] :SfTO( ,PQPVP)) VF = BUILDS(VP) r (RETURN) INOIPEC l OBJECT FIhChPD(CTO*) FXVChPD( *FORp) SETR(,ICI'PREPgpr*TO') FARSE (NF () ) ' xS(TOC~XONR)I STP = '10 STR ICFRE'XUPN) SETF( ,IC,"PREPfpQ) FAPSEChFO) rStTO(,IONP)) STR = 'FOR p $TI? 
1 (FRETURN) SETRC.I~~*IONP.,G) Xa = aUILDs(X0) 8CPETUPN.l CAT ( 'CLINDp ) 8S(TO( ,TESI) FIVCbPD(@TOc) gSCTO(,THV)) FXI;l?b.RD C 'HAVING') IS{TO(,ESVP)) IE(GEIF[rTNS')rrPPPCO) tF€FRETURN) FAFSF(VP()) aF [FRETURN) ES = C (RETURN SETP[ ,ESpCCLnI!JD ($0) ,Q) PARSF I ?SrTQCaP0PEs3)F(FRE'3:URM] SLTP ,ESP 'ItJF" TOC ) EINDbFD[ ' ' l lAVFe) OSITO(~ESVBII PARS t l 1) OF (ADD,T0) SETFI,ESpCESVPB~C) ES = BUILOS(@/S/TYPE/SUU3/ESVP/@I SETF l #EE, *A t lX"r 'HAVE@ '] FARSE(VF0) rFCFPETURN) SETh( ,ESP *ESVP'~Q) ES = RUXLDSIV/S/TYPE/SU~3J/AtJX#ESVP/*) t CPETURN') ES a C tCPETlJPN) rStTO(,IOTO)) rSETOC,IQFCR))FCFRZTURNl n (RETURN) .:'Thi~ I. COPIFLEbIEIil ITF'I tit: GO ElJIL-11 ITPI-IIZTI-~FE: 1 1 T'r'PE L.1 I - , UE 1 I.I~F-1- PF0 1 1 1 I I FFED 1 $ ' 1 T id PF'E' 4 1 I*;T l~lRt{T-&

I :I { z IT'~'PE'~ICLI I - -l-IE,_I I~~P I~F I I '~J 1 I IEI'~:'F 1$'Ft1t,! 13n) Ff?E.Z *I .#*.I 3 .I .I .I .-I b DO '1'01-J ItlRtiT TO E: :fiM'ItiE THE F'EGISTEET C I < t1u I tiP!!T Z TFUC TUPE TO I THIril THIlib lHld t4bT LE: :ICUli 'J'E" .Id I2 P-11 ? .-. AH!?? FE.a $LIFE 1% ! T II~Q~;~I$ 3 I I:€ FRF-ZED ZTf3fE Z COlrlPLEMEtfT I :TF'UCTI,IFE: 11if'r'PE DI:LI (:SlJE:J II~P~PFO I I F T It i FF 'E 'SI~~~T lHIli)E'* (OBJ fi-riP11;QMP ~I~T 'I 'F 'E DGLJ 4:KE:J IIIF'~F'F'O 1.1 tF'PED fVP1TIiI PHfTl rvT IHIIII tnBJ ~.~{P;PPD 41'UtJl 1 1PF.EP IIIITHI~{F'I,FFIO HEPl 1 1 i 1 1 1 1 1 1 ) 1 1 TI-FIT I I .IRIll THHT Z i i l t l I11 LE%ICOt4 ADil. TO fiE:OF'T- F'HFIE T'I'PE I TDP. ELI€ TS'Pi 'I'EZ '.1"171 IlIlTH HER '-t'OCI 1111 TH HER

,TFIL t4G I TPHtiZ, tTC4 1 F'H 3T > ?I WI {if:: HEF rtwr STRUCTURE TO BE PARSED ,J0F l i lm S 1 :~L-IEVING THAT tIfIR'I' IS GOIN'G TO TI-IE VILI-AGE: IS EIYSTCRIOUS .iOl-It4 " S E:IZLIEt?II.IG THAT MARY IS GCIII4G 'TO THE VILL-AGE JS MYS TEriIOUS SIf'lTL: S CL7i rl"I, Eflr3i.4T .I; ST133:PICi SL i ! C : : MY i STFri 1 OUS 1IU:l LEI TI\LJCTUI'<E ( f YPC rrcl, I (E.lrK ~ i ~ S ~ ( F LICII-li\,! ( I='OSs ( VF'( TfiS F.'r;CS ( VT r:ELIcvE ) ( l3E.i' (NI'" ( COril ' ( 3 i ! f C.1, ( SLJD J, ( (141 '( NI''~.' Mr;TiY ) j ( F'FiED ( VI-' (kUX I{E ) 1 I V G09ilFtFiEf' (PRED EE i I I VILL , ,^ .G11)))))1))$))) ( VI'' (V ( TNS T'FiE.2 i ChLI,J M'I'S T'ERIOUG) ) 1 ) DO YCSU WAIdT 70 IIYkMTNE TI-IE REGISTERS ? NO IPIF'UT STRUCTUI7T: TO BIZ F'ARSED TliAJ tIE, EFiOliE tlER 11 751-1 IS SERIOUS T l ' l~ l I--IE I I I 1'5 SEIXIUUS STATE S Ct3MF'L.t:Ml11.I T S fRTNG: SERIOUS L~UILTr STRUCTURE t (S(TYT'E DCI,) (SbKtJ* (NF'(CO45F' (S(TYF'E KlCL) (SUEJ (NF1(I-"TiO (FRED T14S PhST ) u f T:I?EÎ II.\ ) C 0E.J ( ( PRO I !El7 1 i N, LIISI I) HE))) ( VT-' t f ( T413 1 ) 1 ) ) z FRED ( VF'( v i~ r~is FIRES i ~,LI.J SER~OUS 1 I 1 DO YOU WANT TO EXAhINE Ti4E REGISTERS ? ?\I0 II!F'UT STRUCTURE TO BF F'ARSED TI-IE EO'i' ?!L'E(I~ I t.IZ 'T I-If.: f;L(.52S IS MLJLLIGAN Ti-IE F:OY I:HECII,I~.IG TI-IE GLASS 1S MULLIGAN STATE S CO~-~F'LEH~ZP~T STR JNG : MULLIGAN EU-I LK1 $7 r.:UCTURE t ( 5 ( 7 YIZ'C DCI, 1 ( SUBJ C i t T I t EjDY 1 ( EMP ((IF' (TNS F'F'RG ( VT EREAIi) (,OBJ uw;(ZsCr i I ~ I -ASS ) ) ;~~~ ) (F 'KED (LIF'(V BEHTNS FRES) ( flTNT' PJ!:# ( NrlR MllL 1-1 GAf4 j ) 1 1 ) 1 DO YOU wntu TQ Eitnrm<c I REGISTERS ? 
NO TP!F'UT STRUCTURE TO hE PARSED JCltlrdD S IiEIt!G TtiIN 'IS NICE JUIIPJQ4; K:EING TI-IIIJ IS NICE STATE '3 COP~F'LEI.II:I~T STRING : : NICE HUlLLI 43-1 RUCT'UliE (E:<TYFIF, DCI,itSUI!J !f4F1(tJF'R JOIiNmSi*(POSS iLIF'(U-EE)(TNS F'F'KG) (ABJ 7Hlll)j)):~a(F'I;'CLI i l:E)ITNS I-'li'CS)(kLIJ PIICE)))) 110 YOU WANT TU EXAMINE THE I?'I,GISTEI?S '3 NQ J NFUT STF:LJCTLIr*:C: 7 CI .; CL":'*AKSTD TO rd I n I wnr; t11 rll-iEnM r:r: fi nnbt wn; 111s 11nEnM 5 -1.17 T r L; I I I STRTNG rw I. I :+~ITI.JC tmr: : : DIXECiM ( I; 1 T ( T?I"IT DCI, 1 ( ,I;IJL{ J (NI ' ( I;OMTB ( S ( 'T Y[.'E J I J r I ( I t I t h j ( E4 MAFI 1 i 1 C Z1IK.i 1 j 1 ) 1 i J 1 ( ET;VF.' ( VI" ( V PE ) j ) ( l'*l<L I1 ( VI-' ( V BE )

(7.N:; lbfi:.;T) 1 } ) 110 YOIJ umg'r rn TX~I~+T .NK 71-11: ,KG 1:;rlir;S 'F N CI C L f; TF: I E4r; f LCLl I I.JI :, 1 lilll: 1 IJIit-1 tS( TYIvr: DCL) (!;LI.T,;J I , DE I"')iRSCTi lil. C;I\L.CSS iil- Clil-ESE; (UI*(TNS iL'PR(;i (VT 14liI:t\l\) (OL{-l (i4F1(N IfTSl!ESk)) THE F{QY RuNNIJ~J\]~G TO THE HOUSE IS JOHN 'TtiE BOY RkJNl43 NG TO TliE HOUSE IS JOHN S'1A7E S COrlPLEMENT STRING: JOHN BUILrl S TI?UCTUI7E *' (S(TYF'E UCI-1 (SUGJ OEP(DET TI-IE) (id DoY) (EMB T DE r THE ) (N I-IUUSE )'> j ) i (Ul='iU RUN) (TNS F'F'RGI (UFp (I"RI,r-' CI (Ef1'- C > > > > PIED ( VF' ( U BE) <TNS F'RES)(N?I'IP (I!I:'(NE'R JOI-IN)))))) TI^ ~ J u lrJriNr TO EXRMlNE THE REGISTERS ? NO INPUT STF;UCTURE K:REklxING IIISWCS DREklil 1.IG IllSliES ST'f2T.F S CCIFil='LEMi:I!T S TRIf4G t BUILD STRUCTURE t (S(TYF'E DCLi(6UKiJ tNP(COM13 CVF(TNS T-'F'RG)(VT EREAh)(OBJ (NF '(N DISI-IESl>) ) ) I ) E L P J E:E > (TNS PRES) f AT1 J RECIiLESS ) ) 1 NO INPUT STRUCTURE TO BE PARSED I WAS TkJIt4liING THAT YOU E\F\WEKE CONSERVATIVE f UkS THI?.II<ING THAT YOU WERE CQNSERVATIVE STATE S COMPLEMTi4T STRING! CONSERVATIVE PUIL'It STRUCTURE: {SCTYFIE DCI,) (SUFJ (NF'(PR0 1)) 1 (FRET1 CUF'(RUX E:E) (TNS FF'RG) (V TI-/INI<j-CFYRPJF' (NF'CCOMI=' (S (TYF'E ICILL~ (SU~J (NF'(FRO YOU) ) (F'F:ED ( I 3 EEHTNS I='AST)(ATiJ CDNSERVCiT1VE)))I)))))) DO YOU WRrX TU EXAhIIIE THE REGISTEl3S ? NO TNFUT STKUCT-URE TO BE PARSED JOIN we PELI~VED TD RE DELAYEII 401 iV WiiS t:lfLIEVt~t TO BE DELAYED 57kTt S COMPLEI.SEN T S'Tf I NG t DELAYEX1 BUILP S TRUC'TURE t (8 ( TYF'IE TriPhS ) t S~IIEI J SOHEONE? ( PRECI (:GUX, Blfb> ( TNS P~ES 1 t U *EEL JEVE ) (CIFJES C S Z TYI"E TT<FI.IS*) (,SUE{ J SC)r?EOl.ll: > I ILZV1.' ( V~?J( AUX Pic) ( T PIIS F'I"11T 3 (U DELAY) (C1-E.J (NF (~tI=~l7 JQl{]gj >> j ) 1 > ) 3 1 DO vDU WRi$'r TO EX#+MI IJE 'THC t7FGTSSTETiSj ? 
ND rbrrur STIxuc-rulx cu EE pAflSED T.Hi?T tie E E-IER IiS 31 1 WAS SI,RI.OWS TI-lhT HE t:fT~liE l711SR IC1I S1.f YhS' SERIOUSSTATE S COMr.'LEHISl4T STRT'NG $ C:ERIOUS BUILD STI2WCTURE t (S $TYPE lDCld) (WC J (WP( COI.IR IS (TYPE DCL, 1 (SlJp,l ( 1 HE) j I. (F'ftED (V17'fifNS ~A$ 'J ) (V-f GREhli) (OIIJ (I),F'(FBFiD H~K) (N DfS1-l))) ))I)!) )) WREIJ V E f , F.h:;T) ChLlJ I;Efi,ID,US:) 1 ) DO YOU WANT i (3 .EXtlMINE TtlE REGIGTERs ? ' YES TO IS IS BE PARSED F:ECKLESS RECKLESS RECliLESS

NP POPVF 11lStt4T TO E: :%PIItiE THE PEG1 STEF'? F DCJ NO ItjPLIT Z-TFUSTUPE TO BE PRFIED I IlltitiT TO 150 1. IIIH~~T D GO I - 5 I I I TC1 . . I I I 1 I IS I I I 4 I I I 1 1 2 E;TttI,E YOU WANT TO EXAMINE THE REGISTERS ?
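As a cross-check on the Example 1 grammar and its sample parsing of DID YOU WALK TO THE VILLAGE, the control flow of that grammar can be approximated in a few lines of Python. This is a sketch under stated assumptions, not SNOPAR itself: the ToyParser class and the parse helper are hypothetical, the lexicon entries for YOU, WALK, TO and VILLAGE are assumed (the printed lexicon excerpt does not show them), there are no registers, stacks or state traces, and a proper noun is simply labelled NPR.

```python
# Toy lexicon: word -> (word class, features).
LEXICON = {
    "DID": ("AUX", {"TNS": "PAST"}),
    "THE": ("DET", {}),
    # Assumed entries: the printed lexicon excerpt does not show these words.
    "YOU": ("PRO", {}),
    "WALK": ("V", {}),
    "TO": ("PREP", {}),
    "VILLAGE": ("N", {}),
}


class ToyParser:
    def __init__(self, sentence):
        self.words = sentence.split()
        self.pos = 0  # index of the current first word

    def cat(self, word_class):
        """Consume and return the current word if its class matches, else None."""
        if self.pos < len(self.words):
            word = self.words[self.pos]
            if LEXICON.get(word, (None, {}))[0] == word_class:
                self.pos += 1
                return word
        return None

    def np(self):
        start = self.pos
        det = self.cat("DET")
        if det is None:
            for word_class in ("PRO", "NPR"):
                word = self.cat(word_class)
                if word is not None:
                    return f"(NP({word_class} {word}))"
            return None
        parts = f"(DET {det})"
        adj = self.cat("ADJ")
        while adj is not None:          # any number of adjectives
            parts += f"(ADJ {adj})"
            adj = self.cat("ADJ")
        noun = self.cat("N")
        if noun is None:
            self.pos = start            # failure leaves the input unaltered
            return None
        parts += f"(N {noun})"
        pp = self.pp()
        while pp is not None:           # any number of trailing PPs
            parts += f"(PP {pp})"
            pp = self.pp()
        return f"(NP{parts})"

    def pp(self):
        start = self.pos
        prep = self.cat("PREP")
        if prep is None:
            return None
        np = self.np()
        if np is None:
            self.pos = start
            return None
        return f"(PP(PREP {prep})(PREPNP {np}))"

    def vp(self):
        verb = self.cat("V")
        if verb is None:
            return None
        parts = f"(V {verb})"
        np = self.np()
        if np is not None:              # object NP found: pop the VP
            return f"(VP {parts}(NP {np}))"
        pp = self.pp()
        while pp is not None:
            parts += f"(PP {pp})"
            pp = self.pp()
        return f"(VP {parts})"

    def s(self):
        np = self.np()
        if np is not None:              # declarative: subject first
            head = f"(TYPE DCL)(SUBJ {np})"
        else:                           # question: auxiliary first
            aux = self.cat("AUX")
            if aux is None:
                return None
            tense = LEXICON[aux][1].get("TNS")
            np = self.np()
            if np is None:
                return None
            head = f"(TYPE QUESTION)(AUX {aux})(TENSE {tense})(SUBJ {np})"
        vp = self.vp()
        if vp is None:
            return None
        return f"(S{head}(PRED {vp}))"


def parse(sentence):
    """Return the bracketed structure for a sentence, or None if it fails."""
    return ToyParser(sentence).s()


print(parse("DID YOU WALK TO THE VILLAGE"))
```

For the sample sentence this yields the same bracketed structure as the final BUILD STRUCTURE line of the Example 1 trace, which suggests the state ordering read off the listing is at least self-consistent.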