<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1" global="59"/><title>The Generalized LR Parser/Compiler V8-4: A Software Package for Practical NL Projects</title><author surname="Tomita" givenname="Masaru"><org  name="Carnegie Mellon University" country="USA" city="Pittsburgh"/></author></firstpageheader><frontmatter><p><b>The Generalized LR Parser/Compiler V8-4: A Software Package for Practical NL Projects</b></p><p>Masaru Tomita School of Computer Science and Center for Machine Translation Carnegie Mellon University Pittsburgh, PA 15213, USA mt@cs.cmu.edu</p><p><b>1. Introduction</b></p></frontmatter><abstract></abstract></header><body><section title=""><p>This paper<footnote anchor="1"/> describes a software package designed for practical projects that involve natural language parsing. The Generalized LR Parser/Compiler V8-4 is based on Tomita's Generalized LR Parsing Algorithm [7, 6], augmented by pseudo/full unification modules. While the parser/compiler is not a commercial product, it has been thoroughly tested and heavily used by many projects inside and outside CMU over the last three years. It is publicly available, with some restrictions for profit-making industries<footnote anchor="2"/>. It is written entirely in CommonLisp and, for the sake of portability, uses no system-dependent functions such as window graphics. Thus, in principle, it should run on any system that runs CommonLisp<footnote anchor="3"/>, including IBM RT/PC, Mac II, Symbolics and HP Bobcats.</p><p>Each rule consists of a context-free phrase structure description and a cluster of <i>pseudo equations </i>as in Figure 1-1. The non-terminals in the phrase structure part of the rule are referenced in the equations as <b>x0 ... 
xn</b>, where <b>x0 </b>is the non-terminal in the left hand side (here, &lt;<b>dec</b>&gt;) and <b>xn </b>is the n-th non-terminal in the right hand side (here, <b>x1 </b>represents &lt;<b>np</b>&gt; and <b>x2 </b>represents &lt;<b>vp</b>&gt;). The pseudo equations are used to check certain attribute values, such as verb form and person agreement, and to construct an f-structure. In the example, the first equation states that the case of &lt;NP&gt; must be nominative, and the second equation states that the form of &lt;VP&gt; must be finite. Then one of the following two conditions must be true: (1) the time of &lt;VP&gt; is present and the agreement features of &lt;NP&gt; and &lt;VP&gt; agree, or (2) the time of &lt;VP&gt; is past. If all of the conditions hold, let the f-structure of &lt;DEC&gt; be that of &lt;VP&gt;, create a slot called "subj" and put the f-structure of &lt;NP&gt; there, and create a slot called "passive" and put "-" there. Pseudo equations are described in detail in section 3.</p><footnote label="1">Many members of CMU Center for Machine Translation have made contributions to the development of the system. People who implemented parts of the system, besides the author, are: Hideto Kagamida, Kevin Knight, Hiroyuki Musha and Kazuhiro Toyoshima. People who made contributions in maintaining the system include: Steve Morrisson, Eric Nyberg, Hiroaki Saito and Hideto Tomabechi. People who provided valuable comments/bug reports in writing and debugging grammars include: Donna Gates, Lori Levin, Toru Matsuda and Teruko Mitamura. Other members who made indirect contributions in many ways include: Ralph Brown, Jaime Carbonell, Marion Kee, Sergei Nirenburg and Koichi Takeda.</footnote><footnote label="2">For those interested in obtaining the software, contact Radha Rao, Business Manager, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, PA 15213 (rdr@nl.cs.cmu.edu).</footnote><footnote label="3">In practice, however, we usually face one or two problems when we transport it to another CommonLisp system, due to bugs in CommonLisp and/or file I/O complications.</footnote><p>(&lt;DEC&gt; &lt;=&gt; (&lt;NP&gt; &lt;VP&gt;)
  (((x1 case) = nom)
   ((x2 form) =c finite)
   (*OR*
     (((x2 :time) = present)
      ((x1 agr) = (x2 agr)))
     (((x2 :time) = past)))
   (x0 = x2)
   ((x0 subj) = x1)
   ((x0 passive) = -)))</p><p>Figure 1-1: A Grammar Rule for Parsing</p><p>Grammar compilation is the key to this efficient parsing system. A grammar written in the correct format must be compiled before it is used to parse sentences. The context-free phrase structure rules are compiled into an <i>Augmented LR Parsing Table, </i>and the equations are compiled into CommonLisp functions. The runtime parser then does the shift-reduce parsing guided by the parsing table, and each time a grammar rule is applied, the CommonLisp function compiled from its equations is evaluated.</p><page local="2" global="60"/><p>In the subsequent sections, features of the Generalized LR Parser/Compiler V8-4 are briefly described.</p></section><section number="2." title="Top-Level Functions"><p>There are three top-level functions:</p><p>; to compile a grammar <b>(compgra </b><i>grammar-file-name)</i></p><p>; to load a compiled grammar <b>(loadgra </b><i>grammar-file-name)</i></p><p>; to parse a sentence string <b>(p </b><i>sentence)</i></p></section><section number="3." 
title="Pseudo Equations"><p>This section describes pseudo equations for the Generalized LR Parser/Compiler V8-4.</p><subsection number="3.1." title="Pseudo Unification, ="><p><i>path </i>= <i>val</i></p><p>Get a value from <i>path, </i>unify it with <i>val, </i>and assign the unified value back to <i>path. </i>If the unification fails, this equation fails. If the value of <i>path </i>is undefined, this equation behaves like a simple assignment; if <i>path </i>has a value, this equation behaves like a test statement.</p><p><i>path1 </i>= <i>path2 </i>Get values from <i>path1 </i>and <i>path2, </i>unify them, and assign the unified value back to <i>path1 </i>and <i>path2. </i>If the unification fails, this equation fails. If both <i>path1 </i>and <i>path2 </i>have a value, this equation behaves like a test statement. If the value of <i>path1 </i>is not defined, this equation behaves like a simple assignment.</p></subsection><subsection number="3.2." title="Overwrite Assignment, &lt;="><doubt alpha="63.6" length="11" tooSmall="False" monospace="0.0">path &lt;= val</doubt><p>Assign <i>val </i>to the slot <i>path. </i>If <i>path </i>is already defined, the old value is simply overwritten.</p><p><i>path1 &lt;= path2 </i>Get a value from <i>path2, </i>and assign the value to <i>path1. </i>If <i>path1 </i>is already defined, the old value is simply overwritten.</p><p><i>path &lt;= lisp-function-call </i>Evaluate <i>lisp-function-call, </i>and assign the returned value to <i>path. </i>If <i>path </i>is already defined, the old value is simply overwritten. <i>lisp-function-call </i>can be arbitrary Lisp code, as long as all functions called in <i>lisp-function-call </i>are defined. A path can be used as a special function that returns the value of the slot.</p></subsection><subsection number="3.3." 
title="Removal Assignment, =="><p><i>path1 </i>== <i>path2 </i>Get a value from <i>path2, </i>assign the value to <i>path1, </i>and remove the value of <i>path2 </i>(assign nil to <i>path2). </i>If a value already exists in <i>path1, </i>then the new value is unified with the old value. If the unification fails, then this equation fails.</p></subsection><subsection number="3.4." title="Append Multiple Value, &gt;"><p><i>path1 &gt; path2 </i>Get a value from <i>path2, </i>and assign the value to <i>path1. </i>If a value already exists in <i>path1, </i>the new value is appended to the old value. The resulting value of <i>path1 </i>is a multiple value.</p></subsection><subsection number="3.5." title="Pop Multiple Value, &lt;"><p><i>path1 &lt; path2 </i>The value of <i>path2 </i>should be a multiple value. The first element of the multiple value is popped off and assigned to <i>path1. </i>If <i>path1 </i>already has a value, the new value is unified with the old value. If <i>path2 </i>is undefined, this equation fails.</p></subsection><subsection number="3.6." title="*DEFINED* and *UNDEFINED*"><p><i>path </i>= <b>*defined*</b></p><p>Check if the value of <i>path </i>is defined. If undefined, then this equation fails. If defined, do nothing.</p></subsection><subsection number="3.7." title="Constraint Equations, =c"><p><i>path </i><b>=c </b><i>val </i>This equation is the same as the equation <i>path = val</i>, except that it fails if <i>path </i>is not already defined.<page local="3" global="61"/></p></subsection><subsection number="3.8." title="Removing Values, *REMOVE*"><p><i>path = </i><b>*remove* </b>This equation removes the value in <i>path, </i>and the path becomes undefined.</p></subsection><subsection number="3.9." title="Disjunctive Equations, *OR*"><p><b>(*or* </b><i>list-of-equations</i></p><doubt alpha="60.0" length="25" tooSmall="False" monospace="0.0">list-of-equations . . 
..)</doubt><p>All lists of equations are evaluated disjunctively. This is an inclusive <b>or</b>, as opposed to an exclusive <b>or</b>: even if one of the lists of equations is evaluated successfully, the remaining lists are still evaluated.</p></subsection><subsection number="3.10." title="Exclusive OR, *EOR*"><p><b>(*eor* </b><i>list-of-equations</i></p><p>This is the same as disjunctive equations <b>*or*</b>, except that an exclusive <b>or </b>is used. That is, as soon as one of the lists is evaluated successfully, the remaining lists are ignored.</p></subsection><subsection number="3.11." title="Case Statement, *CASE*"><p><b>(*case* </b><i>path</i></p><doubt alpha="51.4" length="37" tooSmall="False" monospace="0.0">(Key1 equation1-1 equation1-2 ...)</doubt><doubt alpha="52.4" length="21" tooSmall="False" monospace="0.0">(Key2 equation2-1 ...)</doubt><doubt alpha="50.0" length="24" tooSmall="False" monospace="0.0">(Key3 equation3-1 ...) ...)</doubt><p>The *CASE* statement first gets the value in <i>path. </i>The value is then compared with Key1, Key2, ..., and as soon as the value is <b>eq </b>to some key, the equations following that key are evaluated.</p></subsection><subsection number="3.12." title="Test with a User-defined LISP Function, *TEST*"><p><b>(*test* </b><i>lisp-function-call) </i>The <i>lisp-function-call </i>is evaluated, and if the function returns nil, this equation fails. If the function returns a non-nil value, do nothing. A path can be used as a special function that returns the value of the slot.</p></subsection><subsection number="3.13." title="Recursive Evaluation of Equations, *INTERPRET*"><p><b>(*interpret* </b><i>path) </i>The *INTERPRET* statement first gets a value from <i>path. </i>The value of <i>path </i>must be a valid list of equations. Those equations are then recursively evaluated. The *INTERPRET* statement resembles the "eval" function in Lisp.</p></subsection><subsection number="3.14." 
title="Disjunctive Value, *OR*"><p><b>(*or* </b><i>val val ...) </i>Unification of two disjunctive values is set intersection. For example, <b>(unify '(*or* a b c d) '(*or* b d e f))</b> is <b>(*or* b d)</b>.</p></subsection><subsection number="3.15." title="Negative Value, *NOT*"><p><b>(*not* </b><i>val val ...) </i>Unification of two negative values is set union. For example, <b>(unify '(*not* a b c d) '(*not* b d e f))</b> is <b>(*not* a b c d e f)</b>.</p></subsection><subsection number="3.16." title="Multiple Values, *MULTIPLE*"><p><b>(*multiple* </b><i>val val ...) </i>Unification of two multiple values is append. When a multiple value is unified with a single value, each element is unified with that value. For example, <b>(unify '(*multiple* a b c d b d e f) 'd)</b> is <b>(*multiple* d d)</b>.</p></subsection><subsection number="3.17." title="User Defined Special Values, *user-defined*"><p>The user can define his own special values. A unification function with the name <b>unify*</b><i>user-defined* </i>must be defined. The function should take two arguments and return a new value, or *FAIL* if the unification fails.</p></subsection></section><section number="4." title="Standard Unification Mode"><p>The pseudo equations described in the previous section are different from what functional grammarians call "unification". The user can, however, select "full (standard) unification mode" by setting the global variable <b>*unification-mode* </b>from <b>pseudo </b>to <b>full</b>.<page local="4" global="62"/> In the full unification mode, equations are interpreted as standard equations in a standard functional unification grammar [5], although some of the features such as user-defined function calls cannot be used. 
However, most users of the parser/compiler find it more convenient to use PSEUDO unification than FULL unification, not only because it is more efficient, but also because it has more practical features, including user-defined function calls and user-defined special values. Those practical features are crucial for handling low-level non-linguistic phenomena such as time and date expressions [8] and/or for incorporating semantic and pragmatic processing of the user's choice. Further discussion of PSEUDO and FULL unification can be found in [10].</p></section><section number="5." title="Other Important Features"><subsection number="5.1." title="Character Basis Parsing"><p>The user can choose to make his grammar "character basis" or standard "word basis". When "character basis mode" is chosen, terminal symbols in the grammar are characters, not words. There are at least two possible reasons to make a grammar character basis:</p><p>1. Some languages, such as Japanese, do not put spaces between words. If a grammar is written in character basis, the user does not have to worry about word segmentation of unsegmented sentences.</p><p>2. Some languages have much more complex morphology than English. With the character basis mode, the user can write morphological rules in the very same formalism as syntactic rules.</p></subsection><subsection number="5.2." title="Wild Card Character"><p>In pseudo unification mode, the user can use a wild card character "%" in his grammar to match any character (if character basis) or any word (if word basis). This feature is especially useful for handling proper nouns and/or unknown words.</p></subsection><subsection number="5.3." title="Grammar Debugging Tools"><p>The Generalized LR Parser/Compiler V8-4 includes some debugging functions. 
They include:</p><p><b>• dmode</b>: debugging mode; shows a trace of rule applications by the parser.</p><p><b>• trace</b>: traces a particular rule.</p><p><b>• disp-trees, disp-nodes, </b>etc.: display partial trees or values of nodes in a tree.</p><p>For the sake of system portability, none of the debugging tools uses a graphical interface.</p></subsection><subsection number="5.4." title="Interpretive Parser"><p>The Generalized LR Parser/Compiler V8-4 includes another parser, based on chart parsing, which can parse a sentence without ever compiling a grammar:</p><p>; to load a grammar <b>(i-loadgra </b><i>grammar-file-name)</i></p><p>; to run the interpretive parser <b>(i-p </b><i>sentence)</i></p><p>While its run time speed is significantly slower than that of the GLR parser, many users find it useful for debugging because the grammar does not need to be recompiled each time a small change is made.</p></subsection><subsection number="5.5." title="Grammar Macros"><p>The user can define and use macros in a grammar. This is especially useful when there are many similar rules in the grammar. A macro is defined in the same way as a CommonLisp macro. Macros are expanded before the grammar is compiled.</p></subsection></section><section number="6." title="Concluding Remarks"><p>Some of the important features of the Generalized LR Parser/Compiler have been highlighted. More detailed descriptions can be found in its user's manual [9]. Unlike most other available software [1, 2, 4], the Generalized LR Parser/Compiler V8-4 is designed specifically to be used in practical natural language systems, sacrificing perhaps some linguistic and theoretical elegance. 
The system has been thoroughly tested and heavily used by many users in many projects inside and outside CMU over the last three years.<page local="5" global="63"/> The Center for Machine Translation at CMU has developed rather extensive grammars for English and Japanese for its translation projects, and some experimental grammars for French, Spanish, Turkish and Chinese. We also find the system well suited to writing and parsing task-dependent semantic grammars. Finally, a project is under way at CMU to integrate the parser/compiler with a speech recognition system (SPHINX [3]).</p></section><section number="7." title="References"><p>[1] Karttunen, L. D-PATR: A Development Environment for Unification-Based Grammars. In <i>11th International Conference on Computational Linguistics. </i>Bonn, 1986.</p><p>[2] Kiparsky, C. <i>LFG Manual. </i>Technical Report, Xerox Palo Alto Research Center, 1985.</p><p>[3] Lee, K. F. and Hon, H. W. Large-Vocabulary Speaker-Independent Continuous Speech Recognition. <i>Proceedings of IEEE Int'l Conf. on Acoustics, Speech and Signal Processing</i>, 1988.</p><p>[4] Shieber, S. M. The Design of a Computer Language for Linguistic Information. In <i>10th International Conference on Computational Linguistics, </i>pages 362-366. Stanford, July, 1984.</p><p>[5] Shieber, S. M. <i>CSLI Lecture Notes: An Introduction to Unification Approaches to Grammar. </i>Center for the Study of Language and Information, 1986.</p><p>[6] Tomita, M. <i>Efficient Parsing for Natural Language. 
</i>Kluwer Academic Publishers, Boston, MA, 1985.</p><p>[7] Tomita, M. An Efficient Augmented-Context-Free Parsing Algorithm. <i>Computational Linguistics </i>13(1-2):31-46, January-June, 1987.</p><p>[8] Tomita, M. Linguistic Sentences and Real Sentences. In <i>12th International Conference on Computational Linguistics, </i>1988.</p><p>[9] Tomita, M., Mitamura, T. and Kee, M. <i>The Generalized LR Parser/Compiler: User's Guide. </i>Technical Report, Center for Machine Translation, Carnegie-Mellon University, 1988.</p><p>[10] Tomita, M. and Knight, K. <i>Pseudo Unification and Full Unification. </i>Technical Report (unpublished), Center for Machine Translation, Carnegie-Mellon University, 1988.</p></section></body></article>