<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1" global="20"/><title>A COMMON PARSING SCHEME FOR LEFT- AND RIGHT-BRANCHING LANGUAGES</title><pubinfo>Copyright 1988 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/88 /010020-30$03.00 Computational Linguistics, Volume 14, Number 1, Winter 1988</pubinfo><author surname="Sato" givenname="Paul T."><org  name="North Central College" country="USA" city="Naperville"/></author></firstpageheader><frontmatter><p><b>A Common Parsing Scheme For Left- and Right-Branching Languages</b></p><p><b>Paul </b><b>T. </b><b>Sato</b></p><p><b>Department of Computer Science North Central College Naperville, Illinois 60566</b></p><p><b>This paper presents some results of an attempt to develop a common parsing scheme that works systematically and realistically for typologically varied natural languages. The scheme is bottom-up, and the parser scans the input text from left to right. However, unlike the standard LR(k) parser or Tomita's extended LR(1) parser, the one presented in this paper is not a pushdown automaton based on shift-reduce transition that uses a parsing table. Instead, it uses integrated data bases containing information about phrase patterns and parse tree nodes, retrieval of which is triggered by features contained in individual entries of the lexicon. 
Using this information, the parser assembles a parse tree by attaching input words (and sometimes also partially assembled parse trees and tree fragments popped from the stack) to empty nodes of the specified tree frame, until the entire parse tree is completed. This scheme, which works effectively and realistically for both left-branching languages and right-branching languages, is deterministic in that it does not use backtracking or parallel processing. In this system, unlike in ATN or in LR(k), the grammatical sentences of a language are not determined by a set of rewriting rules, but by a set of patterns in conjunction with procedures and the meta rules that govern the system's operation. </b></p><p>This paper presents some results of an attempt to develop a common parsing scheme that works systematically and realistically for typologically varied natural languages. When this project was started in 1982, the algorithm based on augmented transition networks (ATNs) codified by Woods (1970, 1973) was not only the most commonly used approach to parsing natural languages in computer systems, but it was also the achievement of computational linguistics that was most influential in other branches of linguistics. For example, researchers in psycholinguistics like Kaplan (1972) and Wanner and Maratsos (1978) used ATN-based parsers as simulation models of human language processing. Bresnan (1978) used an ATN model, among others, to test whether her version of transformational grammar was "realistic". Fodor's (1979) theory of "super-strategy" was also strongly influenced by the standard ATN algorithm. Indeed, as Berwick and Weinberg (1982) contend, parsing efficiency or computational complexity by itself may not provide reliable criteria for the evaluation of grammatical theories. 
It is evident, however, that computers can be used as an effective means of simulation in linguistics, as they have proved to be in other branches of science.</p></frontmatter><abstract></abstract></header><body><section title=""><p>Nevertheless, as a simulation model of the human faculty of language processing, the standard ATN mechanism has an intrinsic drawback: unless some <i>ad hoc, </i>unrealistic, and efficiency-robbing operations are added, or unless one comes up with a radically different grammatical framework, it cannot be used to parse left-branching languages like Japanese in which the beginning of embedded clauses is not regularly marked.</p><p>One may try to cope with this problem by developing a separate parsing algorithm for left-branching languages, leaving the ATN formalism to specialize in right-branching languages like English. However, this solution contradicts our intuition that the core of the human faculty of language processing is universal. Another possible alternative, an ATN-type parser which processes the sentences of left-branching languages backward from right to left, is also unrealistic. If computational linguistics is to provide a simulation model for theoretical linguistics and psycholinguistics, it must develop an alternative parsing scheme which can effectively and realistically process both left-branching and right-branching languages.<page local="2" global="21"/> Even for purely practical purposes, such a scheme is desirable because it will facilitate the development of machine translation systems which can handle languages with different typological characteristics.</p><p>Some limitations of ATN-based parsers for handling left-branching languages are illustrated in section 1. The rest of this paper describes and illustrates my alternative parsing scheme called Pattern Oriented Parser (POP), which can be used for both left-branching and right-branching languages. 
(POP is a descendant of its early prototype called Pattern-Stack Parser, which was introduced in Sato (1983a).) A general outline of POP is given in section 2, and its operation is illustrated in section 3, using both English and Japanese examples. Some characteristics of POP are highlighted in section 4, after which brief concluding remarks are made in section 5.</p><p>The present version of POP is a syntactic analyzer, and it does not take semantics into consideration. However, the system could be readily augmented with procedures that build up semantic interpretations along with syntactic analysis. One such model was presented in Sato (1983b).</p></section><section number="1" title="Limitations of ATN-Based Parsers 1.1 case assignment"><p>One of the greatest obstacles faced when attempting to develop an ATN-based parser for a language like Japanese is the unpredictability caused by the relatively free word order and by the left-branching subordinate clauses which have no beginning-of-clause marker.</p><p>To be sure, Japanese word order is not completely free. For example, modifiers always precede the modified, and the verb complex (a verbal root plus one or more ordered suffixes marking tense, aspect, modality, voice, negativity, politeness level, question, etc.) is always placed at the end of the sentence. Moreover, almost all nouns and noun phrases occurring in Japanese sentences have one or more suffixes marking case relationships.<footnote anchor="1"/></p><p>However, Japanese postnominal suffixes, by themselves, do not always provide all the necessary information for case assignment. For example, the direct object of a nonstative verb complex is marked by <i>-o, </i>while the direct object of a stative verb complex is usually marked by <i>-ga, </i>which also marks the subject.</p><p>Compare the two sentences in (1).</p><p>(1) a. <i>Mary-wa John-ga nagusame-ta. </i>'As for Mary, John consoled her.' 
<i>(-wa </i>= TOPIC, <i>nagusame- </i>'console' &lt;-STATIVE&gt;, <i>-ta = </i>PAST)</p><p>b. <i>Mary-wa John-ga wakar-ta. </i>'As for Mary, she understood John.' <i>(wakar- </i>'understand' &lt;+STATIVE&gt;)</p><p>An ATN-based parser cannot positively identify the functions of the two noun phrases of these sentences until it processes the verb complex at the end of the sentence.</p><p>Examples like (2) also illustrate how little can be deduced from postnominal suffixes before the sentence-final verb complex is processed.</p><doubt alpha="64.6" length="48" tooSmall="False" monospace="0.0">(2) a.Mary-ga hon-o kaw-ta.'Mary bought a book.'</doubt><doubt alpha="54.5" length="11" tooSmall="False" monospace="0.0">(kaw-'buy')</doubt><p>b. <i>John-ga Mary-ni hon-o kaw-sase-ta. </i>'John made Mary buy a book.' <i>(-sase- = </i>CAUSE)</p><p>c. <i>Mary-ga John-ni hon-o kaw-sase-rare-ta. </i>'Mary was made by John to buy a book.' <i>(-rare- = </i>PASSIVE)</p><p>The agent of the embedded sentence is marked by <i>-ni </i>in (2b), but by <i>-ga </i>in (2c).</p><p>The relatively free word order of Japanese further complicates the situation, as in the six sentences listed in (3), which are all grammatical and all mean 'Mary was made by John to buy a book', but each of which gives prominence to different noun phrases.</p><doubt alpha="60.8" length="51" tooSmall="False" monospace="0.0">(3) a.Mary-ga John-ni hon-o kaw-sase-rare-ta.= (2c)</doubt><p>b. <i>Mary-ga hon-o John-ni kaw-sase-rare-ta.</i></p><p>c. <i>John-ni Mary-ga hon-o kaw-sase-rare-ta.</i></p><p>d. <i>John-ni hon-o Mary-ga kaw-sase-rare-ta.</i></p><p>e. <i>Hon-o Mary-ga John-ni kaw-sase-rare-ta.</i></p><p>f. <i>Hon-o John-ni Mary-ga kaw-sase-rare-ta.</i></p><subsection number="1.2" title="EMBEDDED SENTENCES"><p>Embedded sentences in languages like Japanese pose more serious problems because they do not normally carry any sign to mark their beginning. 
As a result, the beginning of a deeply embedded sentence can look exactly like the beginning of a simple top-level sentence, as illustrated in (4).</p><p>(4) a. <i>Mary-ga sotugyoo-si-ta. </i>'Mary was graduated (from school).' <i>(sotugyoo-si- </i>'be graduated')</p><p>b. <i>Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta. </i>'The high school from which Mary was graduated was burnt down.' <i>(kookoo </i>'high school', <i>zensyoo-si- </i>'be burnt down')</p><p>c. <i>Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru. </i>'It is reported that the high school from which Mary was graduated was burnt down.' <i>(to </i>= END-OF-QUOTE, <i>iw- </i>'say', <i>-ru = </i>NON-PAST)</p><p>d. <i>Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru sirase-o uke-ta. </i>'(I/we/you/he/she/they) received news (which says) that the high school from which Mary was graduated was burnt down.' <i>(sirase </i>'news', <i>uke- </i>'receive')</p><p>e. <i>Mary-ga sotugyoo-si-ta kookoo-ga zensyoo-si-ta to iw-ru sirase-o uke-ta Cindy-ga nak-te i-ru. </i>'Cindy, who received news that the high school from which Mary was graduated was burnt down, is crying.' <i>(nak-te i- </i>'be crying, be weeping')<page local="3" global="22"/></p><p>In order to process sentences listed in (4), the NP network of an ATN-based parser must be expanded by prefixing to it another state with two arcs leaving from it: a PUSH SENTENCE arc that processes a relative clause, and a JUMP arc that processes noun phrases that do not include a relative clause.</p><p>However, as (4) illustrates, there is no systematic way to determine which of the two arcs leaving the first state of this expanded NP network should be taken when the parser encounters the first word of the input. The parser cannot predict the correct path until it has completed processing the entire sentence or the entire relative clause and has seen what followed it. 
Because there is theoretically no limit to the number of levels of relative clause embedding, the number of combinations of possible arcs to be traversed is theoretically infinite.</p></subsection></section><section number="2" title="Overview of Pattern Oriented Parser (POP)"><p>This section presents a quick overview of Pattern Oriented Parser (POP), which I have developed in order to cope with the kind of difficulties mentioned in the previous section.</p><p>POP is a left-to-right, bottom-up parser consisting of three data bases, a push-down STACK, a buffer, a register, and a set of LISP programs collectively called here the PROCESSOR that builds the parse tree of the input sentence. The relationship of these components is shown schematically in (5).</p><p>(5) Components of POP</p><p>Data bases: LEXICON, SNP, PHP. Buffer, stack, and register: INPUT BUFFER, STACK, LNP REGISTER. The PROCESSOR stands at the center of the diagram, linked to these components.</p><p>The SNP (Sentence Pattern data base) contains a set of parse tree frames, each of which is associated with one class of verbs or verbal derivational suffixes and includes information about the syntactic subcategorization of the members of that class and information about the thematic roles of their arguments. For example, the SNP entry for a class of English verbs which includes <i>buy </i>and <i>sell </i>looks like (6).</p><doubt alpha="16.7" length="12" tooSmall="False" monospace="0.0">(6) (S (* V)</doubt><doubt alpha="50.0" length="44" tooSmall="False" monospace="0.0">(AGNT (* NP &lt;+HUMAN&gt;) (PTNT (* NP &lt;-HUMAN&gt;))</doubt><p>The PHP (Phrase Pattern data base) contains information about the internal structure of noun phrases and adverbial phrases and the procedures for building the parse trees of such phrases. 
For example, (7) is an English translation of the PHP entry for a Japanese noun phrase which contains a relative clause.<footnote anchor="2"/></p><p>(7) If the CWS is an NP and the TOS is an S, then construct the following noun phrase and push it to the STACK:</p><doubt alpha="64.3" length="14" tooSmall="False" monospace="0.0">(NP (HEAD CWS)</doubt><doubt alpha="62.5" length="24" tooSmall="False" monospace="0.0">(MOD (rep_emn TOS CWS)))</doubt><p>- CWS is the word or phrase on which the PROCESSOR is currently working.</p><p>- TOS is the word or phrase at the top of the STACK.</p><p>- (rep_emn TOS <i>X) </i>means "pop the TOS and attach <i>X </i>to its first matching empty node".</p><p>- Each non-empty NP node is given a new index number when it is constructed.</p><p>Details of how (7) works will be illustrated in section 3.</p><p>The push-down STACK of POP stores partially assembled parse trees and tree fragments, while LNP or the "Last NP" REGISTER temporarily stores a copy of the noun phrase most recently attached to a node in the sentence tree. LNP is necessary to process a noun phrase with a modifier that follows the head noun (e.g., English noun phrases which contain relative clauses). The present version of POP for Japanese does not use an LNP; however, it will prove useful when we try to process parenthetical phrases. 
The INPUT BUFFER stores the input sentence.</p><p>The three data bases of POP are stored on disk and can be updated independently of each other and of the PROCESSOR, while the buffer, the stack and the register are created by the PROCESSOR each time it is invoked.</p><p>The major program modules (functions) that constitute the PROCESSOR and their hierarchical calling paths are presented in (8), where the parameters are enclosed in parentheses.</p><p>(8) Major Functions of the PROCESSOR</p><p>PARSE-SENTENCE (SENTENCE) calls PARSE-WORD (WORD), which calls ASSEMBLE-NP (CWS) and ASSEMBLE-SENTENCE (CWS SNA). CHECK-PHP (CWS) is called in different modules.</p><p>where CWS = the word or the phrase which the PROCESSOR is currently working on, and SNA = the address of a sentence pattern stored in the SNP</p><p>The PROCESSOR is activated when its top-level function, PARSE-SENTENCE, is called with the input sentence as its parameter. PARSE-SENTENCE then creates the STACK, the INPUT BUFFER and the LNP REGISTER in memory, puts the input sentence into the INPUT BUFFER, and calls PARSE-WORD. PARSE-WORD searches the LEXICON for an entry which matches the first word in the INPUT BUFFER and, when it is found, calls either ASSEMBLE-NP or ASSEMBLE-SENTENCE, depending on the word type of the entry; the called function assembles a sub-tree and pushes the result to the STACK. After that, PARSE-WORD removes the first word from the INPUT BUFFER and repeats the same<page local="4" global="23"/> process with the next word. In the course of assembling sub-trees, ASSEMBLE-NP uses the PHP, and ASSEMBLE-SENTENCE uses the SNP and the PHP as their data bases. 
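</p><p>The control flow of PARSE-SENTENCE and PARSE-WORD described in this section might be sketched in Python roughly as follows. This is only an illustration under strong assumptions: the toy lexicon, the pre-segmented input stems, and both assembly functions are hypothetical stand-ins that do not consult the SNP or the PHP, and the LNP register is created but never used.</p><p>
```python
EOS = "EOS"  # end-of-sentence mark

# Hypothetical toy lexicon: stem -> (word type, lexical entry).
LEXICON = {
    "Mary": ("N", "Mary"),
    "hon": ("N", "book"),
    "kaw": ("V", "buy"),
}


def assemble_np(entry, stack, counter):
    """Stand-in for ASSEMBLE-NP: build an indexed NP and push it."""
    counter[0] += 1
    stack.append(["NP%d" % counter[0], entry])


def assemble_sentence(entry, stack):
    """Stand-in for ASSEMBLE-SENTENCE: collect the stacked phrases under an S.

    The parser described in the text instead fills the empty nodes of a
    tree frame retrieved from the SNP; that step is not modeled here.
    """
    tree = ["S", ["V", entry]]
    while stack:
        tree.append(stack.pop())
    stack.append(tree)


def parse_word(buffer, stack, counter):
    """Look up and dispatch each word until only EOS remains in the buffer."""
    while buffer[0] != EOS:
        word_type, entry = LEXICON[buffer[0]]
        if word_type == "N":
            assemble_np(entry, stack, counter)
        else:
            assemble_sentence(entry, stack)
        buffer.pop(0)  # remove the processed word from the INPUT BUFFER


def parse_sentence(words):
    """Create the stack, buffer and register, run PARSE-WORD, return the tree."""
    stack, buffer, lnp_register = [], list(words) + [EOS], None
    parse_word(buffer, stack, [0])
    return stack.pop()


print(parse_sentence(["Mary", "hon", "kaw"]))
```
</p><p>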
This process continues until the INPUT BUFFER contains only the end-of-sentence mark (EOS), at which point PARSE-WORD returns control to PARSE-SENTENCE, which pops the assembled sentence from the STACK and sends it to the output device, removes the stack, the buffer and the register from memory, and exits successfully.</p><p>As shown in section 3, POP assembles a parse tree primarily by attaching terminal elements (copies of lexical entries) or tree fragments popped from the STACK to the first matching empty node of the matrix tree. All empty nodes of tree frames have an asterisk as their first element, followed by various specifications for matching requirements; for example, (* <i>ga </i>(NP &lt;+HUMAN&gt;)) is an empty node for an NP which has a feature specification &lt;+HUMAN&gt; and is flagged with <i>ga. </i>To find the first matching empty node, the PROCESSOR conducts a depth-first search for "*" followed by other conditions, and when the first matching empty node is found, it attaches the specified element to that node using the LISP function UNION, thus preventing overlapping elements from being duplicated in the resultant branch. After the attachment is completed, the asterisk is removed from the node.</p><p>The use of the LNP REGISTER will be illustrated in subsection 3.3.</p></section><section number="3" title="Operation of pop"><p>This section illustrates the operation of POP in more detail. Subsection 3.1 is a quick walk-through of the overall operation using a simple <i>yes/no-question </i>in English as an example, while subsection 3.2 illustrates how POP handles the inherent problems of left-branching languages discussed in section 1, using the Japanese examples presented in that section. 
Then we turn our attention to English again in subsection 3.3 and illustrate POP's handling of English <i>wh</i>-questions and relative clauses.</p><subsection number="3.1" title="simple english example"><p>Our first example is (9).</p><p>(9) <i><u>Did John buy a good book in Boston</u>?</i></p><p>When PARSE-SENTENCE calls PARSE-WORD and the latter finds <i>did </i>in the LEXICON, it makes a copy of the matching lexical entry, (V &lt;+PAST&gt;), and pushes it to the STACK. The next word that PARSE-WORD finds in the INPUT BUFFER is <i>John. </i>Therefore, PARSE-WORD searches the LEXICON and gets a copy of the entry that matches this word, ("John"), which is a noun.<footnote anchor="3"/></p><p>Whenever PARSE-WORD encounters a noun, it calls ASSEMBLE-NP with a copy of the lexical entry as its argument. ASSEMBLE-NP assembles a new noun phrase (NP1 "John"), and then it calls CHECK-PHP with the newly assembled NP1 as its argument. CHECK-PHP then examines the PHP data base, and returns NIL to ASSEMBLE-NP because it finds no pattern that matches the string {&lt;V, +PAST&gt; NP} (i.e., the TOS followed by the CWS). Because CHECK-PHP failed to find any matching entry of the PHP, ASSEMBLE-NP pushes NP1 to the STACK without conducting any further assembling operation, and returns control to PARSE-WORD. The contents of the STACK at this time are shown in (10).</p><doubt alpha="34.4" length="32" tooSmall="False" monospace="0.0">(10) ((NP1 "John") (&lt;V, +PAST&gt;))</doubt><p>PARSE-WORD then removes <i>John </i>from the INPUT BUFFER, picks up <i>buy </i>there, searches the LEXICON, and gets a copy of a matching entry. This is a verb. The lexical entry of every verb or verbal derivational suffix contains an SNA (the SNP address of the sentence pattern associated with it). 
Therefore, ASSEMBLE-SENTENCE retrieves a copy of the sentence pattern from the address matching the verb's SNA and attaches the verb's remaining lexical entry to its first empty V node (i.e., the first node whose CAR is "*" and the second member is "V"). It then removes the "*" from that node. As mentioned in section 2, the SNP entry for the class of verbs like <i>buy </i>and <i>sell </i>is (6). Therefore, by attaching (V &lt;"buy"&gt;) to the V node of its copy, ASSEMBLE-SENTENCE constructs (11).</p><doubt alpha="26.3" length="19" tooSmall="False" monospace="0.0">(11) (S (V &lt;"buy"&gt;)</doubt><doubt alpha="48.9" length="45" tooSmall="False" monospace="0.0">(AGNT (* NP &lt;+HUMAN&gt;) (PTNT (* NP &lt;-HUMAN&gt;)))</doubt><p>After (11) is assembled, ASSEMBLE-SENTENCE pops the TOS, attaches it to the first empty node matching its specifications and removes the asterisk at the beginning of that node. The result is (12).</p><doubt alpha="26.3" length="19" tooSmall="False" monospace="0.0">(12) (S (V &lt;"buy"&gt;)</doubt><doubt alpha="50.0" length="42" tooSmall="False" monospace="0.0">(AGNT (NP1 "John") (PTNT (* NP &lt;-HUMAN&gt;)))</doubt><p>ASSEMBLE-SENTENCE pops TOS again. This time, it is (&lt;V, +PAST&gt;). ASSEMBLE-SENTENCE then examines the PHP and finds two entries (13) and (14) whose conditions match the current state.</p><p>(13) If the element popped is a V and if it contains no feature other than tense, number, and/or person, attach it to the V node of the S tree which ASSEMBLE-SENTENCE is currently building.</p><p>(14) If there is a tense feature in the element that is popped immediately after the AGNT node (or the OBJ node if the tree has no AGNT node) is filled, attach feature &lt;Q&gt; (i.e., "question") to the main verb of the matrix S.</p><p>ASSEMBLE-SENTENCE executes (13) and (14). 
The result is (15).</p><page local="5" global="24"/><doubt alpha="34.5" length="29" tooSmall="False" monospace="0.0">(15) (S (V &lt;"buy", +PAST, Q&gt;)</doubt><doubt alpha="50.0" length="42" tooSmall="False" monospace="0.0">(AGNT (NP1 "John") (PTNT (* NP &lt;-HUMAN&gt;)))</doubt><p>The STACK is now empty. Therefore, ASSEMBLE-SENTENCE pushes (15) to the STACK and returns control to PARSE-WORD.</p><p>PARSE-WORD removes <i>buy </i>from the INPUT BUFFER, encounters the indefinite article <i>a, </i>gets a copy of the matching lexical entry (DET &lt;-DEF&gt;) from the LEXICON, and pushes it to the STACK. The next word that PARSE-WORD sees is <i>good. </i>So a copy of its matching lexical entry (ADJ "good") is pushed to the STACK and <i>good </i>is removed from the INPUT BUFFER.</p><p>PARSE-WORD then finds <i>book </i>in the INPUT BUFFER. Because it is a noun, PARSE-WORD calls ASSEMBLE-NP, which assembles a single-word NP and routinely calls CHECK-PHP. This time, CHECK-PHP finds (16) in the PHP.</p><doubt alpha="66.7" length="60" tooSmall="False" monospace="0.0">(16) If the CWS is an NP and if the TOS is an ADJ, assemble:</doubt><doubt alpha="64.3" length="14" tooSmall="False" monospace="0.0">(NP (HEAD CWS)</doubt><doubt alpha="56.2" length="16" tooSmall="False" monospace="0.0">(MOD (pop TOS)))</doubt><p>At this time, the TOS is (ADJ "good"). Therefore, ASSEMBLE-NP pops it and assembles a new noun phrase in accordance with (16) and calls CHECK-PHP again. The new TOS is (DET &lt;-DEF&gt;). CHECK-PHP finds (17) in the PHP which matches this situation.</p><doubt alpha="66.1" length="59" tooSmall="False" monospace="0.0">(17) If the CWS is an NP and if the TOS is a DET, assemble:</doubt><doubt alpha="60.0" length="25" tooSmall="False" monospace="0.0">(NP (HEAD CWS) (pop TOS))</doubt><p>ASSEMBLE-NP executes (17). 
The result is (18).</p><doubt alpha="45.0" length="40" tooSmall="False" monospace="0.0">(18) (NP4 (HEAD (NP3 (HEAD (NP2 "book"))</doubt><doubt alpha="50.0" length="32" tooSmall="False" monospace="0.0">(MOD(ADJ "good"))) (DET &lt;-DEF&gt;))</doubt><p>Because (18) is an NP, ASSEMBLE-NP calls CHECK-PHP again. This time, the TOS is (15), which is an S tree. CHECK-PHP finds a matching entry in the PHP again, which is (19).</p><p>(19) If CWS = NP and TOS = S, pop the TOS and attach the CWS to its first matching empty node.</p><p>What is involved here is the assembly of an S, which is outside the domain of ASSEMBLE-NP's responsibility. Therefore, before popping the S from the STACK, ASSEMBLE-NP returns the symbol "AS" to PARSE-WORD. PARSE-WORD then calls ASSEMBLE-SENTENCE substituting (18) for the parameter CWS and "TOS" for the parameter SNA. ASSEMBLE-SENTENCE then builds (20) in the manner explained earlier. The STACK is now empty, and there is no matching PHP entry. Therefore, ASSEMBLE-SENTENCE pushes the newly assembled tree (20) to the STACK.</p><doubt alpha="41.7" length="48" tooSmall="False" monospace="0.0">(20) (S (V &lt;"buy", +PAST, Q&gt;) (AGNT (NP1 "John")</doubt><doubt alpha="55.0" length="40" tooSmall="False" monospace="0.0">(PTNT (NP4 (HEAD (NP3 (HEAD (NP2 "book")</doubt><doubt alpha="44.4" length="36" tooSmall="False" monospace="0.0">(MOD (ADJ "good")))) (DET &lt;-DEF&gt;))))</doubt><p>The next thing PARSE-WORD sees in the INPUT BUFFER is EOS (end-of-sentence symbol). Therefore, it returns control to PARSE-SENTENCE, which pops (20) from the STACK, and sends it to the output device. Nothing is left in the STACK now. 
Therefore, PARSE-SENTENCE removes the stack, the buffer and the register from memory and exits successfully.</p></subsection><subsection number="3.2" title="JAPANESE EXAMPLES"><p>This section illustrates how POP handles the problems of Japanese sentences discussed in section 1.</p><subsubsection number="3.2.1" title="CASE MARKING IN SIMPLE SENTENCES"><p>The first example in section 1 was (1a), which is repeated here in (21).</p><p>(21) <i>Mary-wa John-ga nagusame-ta. </i>'As for Mary, John consoled her.'</p><doubt alpha="66.0" length="53" tooSmall="False" monospace="0.0">(-wa= TOPIC,nagusame-'console' &lt;-STATIVE&gt;),-ta= PAST)</doubt><p>POP processes Japanese sentences in basically the same way as it processes English sentences. Therefore, when PARSE-SENTENCE calls PARSE-WORD and PARSE-WORD sees the first word, <i>Mary-wa, </i>PARSE-WORD retrieves from the LEXICON a copy of the entry which matches the stem of this word, and calls ASSEMBLE-NP because <i>Mary </i>is a noun. ASSEMBLE-NP assembles (NP1 "Mary"), and places its suffix <i>-wa </i>in front of the newly assembled NP as its flag. Then CHECK-PHP is called, but it returns NIL because the STACK is still empty. Therefore, ASSEMBLE-NP pushes <i>(wa </i>(NP1 "Mary")) to the STACK. The second word, <i>John-ga, </i>is processed in the same way, and <i>(ga </i>(NP2 "John")) is also pushed to the STACK.</p><p>PARSE-WORD then encounters <i>nagusame-ta </i>and identifies it as the verb "console" with a past tense suffix. Therefore, PARSE-WORD retrieves a copy of its SNP pattern using the SNA included in the lexical entry, and attaches the lexical entry of <i>nagusame-ta </i>to its empty V node. 
The result is (22).</p><doubt alpha="43.3" length="30" tooSmall="False" monospace="0.0">(22) (S (V &lt;"console", +PAST&gt;)</doubt><doubt alpha="49.0" length="51" tooSmall="False" monospace="0.0">(PTNT (*o(NP &lt;+HUMAN&gt;))) (AGNT (*ga(NP &lt;+HUMAN&gt;))))</doubt><p>ASSEMBLE-SENTENCE then pops the TOS <i>(ga </i>(NP2 "John")) and attaches it to the first matching empty node, namely, the AGNT node. The case flag <i>ga, </i>which is no longer necessary, is removed.</p><p>The next TOS is <i>(wa </i>(NP1 "Mary")). As mentioned in section 1, <i>wa </i>is a suffix that marks the sentence topic. However, there is no sentence pattern stored in the<page local="6" global="25"/> SNP which includes a topic (TPIC) node. Instead, the TPIC node is created by the following instructions (23) retrieved from the PHP.</p><doubt alpha="64.5" length="31" tooSmall="False" monospace="0.0">(23) If the TOS has the flagwa:</doubt><p>a. Create a TPIC node which is directly dominated by the topmost S node and attach a "copy" (i.e., the category symbol and its index) of the TOS to this node.</p><p>b. Attach the TOS to the first matching empty node.</p><p>As is evident from (1a, 1b), the topic marker <i>wa </i>absorbs both <i>ga </i>and <i>o: </i>i.e., the topicalized NP without any other case flag can match both an NP node which is flagged with <i>o </i>and an NP node which is flagged with <i>ga. </i>Therefore, following (23b), (NP1 "Mary") is attached to the first (and the only) empty node (PTNT) after (23a) is executed. 
The result is (24), which is the correct parse tree of (21).</p><doubt alpha="43.3" length="30" tooSmall="False" monospace="0.0">(24) (S (V &lt;"console", +PAST&gt;)</doubt><doubt alpha="49.1" length="53" tooSmall="False" monospace="0.0">(PTNT (NP1 "Mary")) (AGNT (NP2 "John")) (TPIC (NP1)))</doubt><p>'As for Mary_i, John consoled Mary_i.'</p><p>Example (1b) is processed in the same way, producing the correct parse tree (25b), although both the PTNT node and the AGNT node of the SNP pattern associated with the stative verb <i>wakar- </i>'understand' are flagged by <i>ga, </i>as shown in (25a).</p><doubt alpha="60.0" length="50" tooSmall="False" monospace="0.0">(25) a. SNP    pattern    associated    withwakar-</doubt><doubt alpha="54.5" length="22" tooSmall="False" monospace="0.0">" understand" (S (* V)</doubt><doubt alpha="53.3" length="15" tooSmall="False" monospace="0.0">(PTNT (*ga(NP))</doubt><doubt alpha="63.8" length="138" tooSmall="False" monospace="0.0">(AGNT (*ga(NP &lt;+HUMAN&gt;))) b. Parse tree of (2-lb)Mary-wa John-ga wakar-ta.'As for Mary, she understood John.' (S (V &lt;"understand", +PAST&gt;)</doubt><doubt alpha="52.6" length="19" tooSmall="False" monospace="0.0">(PTNT (NP2 "John"))</doubt><doubt alpha="52.6" length="19" tooSmall="False" monospace="0.0">(AGNT (NP1 "Mary"))</doubt><doubt alpha="46.2" length="13" tooSmall="False" monospace="0.0">(TPIC (NP1)))</doubt><p><b>3.2.2 VERBAL DERIVATIONAL SUFFIX AND CASE MARKING </b>The next set of examples is (2), repeated here as (26).</p><doubt alpha="63.3" length="49" tooSmall="False" monospace="0.0">(26) a.Mary-ga hon-o kaw-ta.'Mary bought a book.'</doubt><doubt alpha="54.5" length="11" tooSmall="False" monospace="0.0">(kaw-'buy')</doubt><p>b. <i>John-ga Mary-ni hon-o kaw-sase-ta. </i>'John made Mary buy a book.' <i>(-sase- = </i>CAUSE)</p><p>c. <i>Mary-ga John-ni hon-o kaw-sase-rare-ta. </i>'Mary was made by John to buy a book.' 
<i>(-rare- = </i>PASSIVE)</p><p>The SNP pattern associated with <i>kaw- </i>'buy' is (27).</p><doubt alpha="15.4" length="13" tooSmall="False" monospace="0.0">(27) (S (* V)</doubt><doubt alpha="46.7" length="15" tooSmall="False" monospace="0.0">(PTNT (*o(NP)))</doubt><doubt alpha="50.0" length="26" tooSmall="False" monospace="0.0">(AGNT (*ga(NP &lt;+HUMAN&gt;))))</doubt><p>Therefore, the parsing of (26a) to get (28) is straightforward.</p><doubt alpha="34.6" length="26" tooSmall="False" monospace="0.0">(28) (S (V &lt;"buy", +PAST&gt;)</doubt><doubt alpha="50.0" length="40" tooSmall="False" monospace="0.0">(PTNT (NP2 "book")) (AGNT (NP1 "Mary")))</doubt><p>The parsing of (26b) is a little more complex because it involves the causative suffix <i>-sase-, </i>with which another SNP pattern, (29), is associated (simplified here for the sake of legibility).</p><doubt alpha="36.8" length="19" tooSmall="False" monospace="0.0">(29) (S (V &lt;CAUSE&gt;)</doubt><doubt alpha="56.2" length="32" tooSmall="False" monospace="0.0">(PTNT (* or(ni(NP = AGNT of Sk))</doubt><doubt alpha="50.0" length="70" tooSmall="False" monospace="0.0">(o(NP = OBJ or PTNT of Sk)))) (AGNT (*ga(NP &lt;+HUMAN&gt;))) (ACTN (* Sk)))</doubt><p>where ACTN = action, Sk = embedded S.</p><p>When the PROCESSOR processing (26b) encounters the verb <i>kaw-sase-ta </i>'made to buy', it first retrieves (27) and attaches "buy" to its empty V node to construct the tree frame (30).</p><doubt alpha="26.3" length="19" tooSmall="False" monospace="0.0">(30) (S (V &lt;"buy"&gt;)</doubt><p>This tree is then incorporated into (29) to obtain the complex tree frame (31). 
(There is a meta-rule that removes the case flag of a node in the embedded sentence if the node is co-indexed with a node in the matrix sentence.)</p><doubt alpha="36.8" length="19" tooSmall="False" monospace="0.0">(31) (S (V &lt;CAUSE&gt;)</doubt><doubt alpha="47.9" length="73" tooSmall="False" monospace="0.0">(PTNT (*ni(NPi &lt;+HUMAN&gt;))) (AGNT (*ga(NP &lt;+HUMAN&gt;))) (ACTN (S (V &lt;"buy"&gt;)</doubt><doubt alpha="46.2" length="26" tooSmall="False" monospace="0.0">(AGNT (* NPi &lt;+HUMAN&gt;)))))</doubt><p>By the time the PROCESSOR encounters the verb complex <i>kaw-sase-ta </i>'caused to buy' and constructs the complex tree frame (31), all three noun phrases of the sentence have already been processed and stored in the STACK, as shown in (32).</p><doubt alpha="43.4" length="53" tooSmall="False" monospace="0.0">(32) ((o (NP3 "book")) (ni (NP2 "Mary")) (ga (NP1 "John")))</doubt><p>Therefore, when the tree frame (31) is completed, ASSEMBLE-SENTENCE begins to pop elements from the STACK and to attach them to empty nodes of the tree. First, <i>(o </i>(NP3 "book")) is popped. The PTNT node of the embedded sentence is the only empty node that matches it, so the popped NP is attached there. Next, <i>(ni </i>(NP2 "Mary")) is popped and attached to the PTNT node of the matrix sentence, and a copy of it is attached to the co-indexed AGNT node of the embedded sentence. Finally, <i>(ga </i>(NP1 "John")) is popped and<page local="7" global="26"/></p><doubt alpha="64.3" length="14" tooSmall="False" monospace="0.0">(NP (HEAD CWS)</doubt><doubt alpha="62.5" length="24" tooSmall="False" monospace="0.0">(MOD (rep_emn TOS CWS)))</doubt><p>attached to the AGNT node of the matrix sentence. 
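</p><p>The pop-and-attach steps traced above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not POP's actual code: the dictionary layout, the names make_frame_31 and assemble, and the copy_to link that stands in for co-indexing are all hypothetical, and feature checks such as &lt;+HUMAN&gt; are omitted.</p><p><![CDATA[
```python
def make_frame_31():
    """Empty nodes of tree frame (31), in the order they are searched (assumed)."""
    # The embedded AGNT has no case flag: the meta-rule removed it because
    # the node is co-indexed with the matrix PTNT.
    embedded_agnt = {"role": "embedded AGNT", "flag": None, "filler": None}
    return [
        {"role": "matrix PTNT", "flag": "ni", "filler": None,
         "copy_to": embedded_agnt},
        {"role": "matrix AGNT", "flag": "ga", "filler": None},
        {"role": "embedded PTNT", "flag": "o", "filler": None},
        embedded_agnt,
    ]


def assemble(stack, nodes):
    """Pop each (flag, NP) pair and attach it to the first matching empty node."""
    while stack:
        flag, np = stack.pop()                  # pop the TOS
        for node in nodes:
            if node["filler"] is None and node["flag"] == flag:
                node["filler"] = np
                if "copy_to" in node:           # co-indexed node gets a copy
                    node["copy_to"]["filler"] = np
                break
        else:
            raise ValueError("no empty node matches flag " + flag)
    return {node["role"]: node["filler"] for node in nodes}


# STACK (32); the last list element is the top of the stack.
stack_32 = [("ga", "John"), ("ni", "Mary"), ("o", "book")]
tree = assemble(stack_32, make_frame_31())
print(tree)
```
]]></p><p>In this sketch the matching is driven entirely by the case flags, so the order in which the three noun phrases were pushed does not affect which node each one fills.</p><p>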
The result is (33), which is the correct parse tree of (26b).</p><doubt alpha="42.3" length="26" tooSmall="False" monospace="0.0">(33) (S (V &lt;CAUSE, +PAST&gt;)</doubt><doubt alpha="48.3" length="60" tooSmall="False" monospace="0.0">(PTNT (NP2 "Mary")) (AGNT (NP1 "John")) (ACTN (S (V &lt;"buy"&gt;)</doubt><doubt alpha="52.6" length="19" tooSmall="False" monospace="0.0">(PTNT (NP3 "book"))</doubt><doubt alpha="40.0" length="15" tooSmall="False" monospace="0.0">(AGNT (NP2)))))</doubt><p>'John made Mary buy a book.'</p><p>Example (26c) is the passive of (26b), formed with the passive suffix <i>-rare-, </i>with which the SNP pattern (34) is associated (simplified here for the sake of legibility).</p><doubt alpha="42.9" length="21" tooSmall="False" monospace="0.0">(34) (S (V &lt;PASSIVE&gt;)</doubt><doubt alpha="57.3" length="75" tooSmall="False" monospace="0.0">(PTNT (ga(NP = OBJ or PTNT of Sk))) (AGNT (ni(NP = AGNT of Sk))) (ACTN (Sk)))</doubt><p>Therefore, before beginning to pop elements from the STACK, ASSEMBLE-SENTENCE constructs the complex tree frame (35) by incorporating (31) into (34).</p><doubt alpha="46.4" length="28" tooSmall="False" monospace="0.0">(35) (S (V &lt;PASSIVE, +PAST&gt;)</doubt><doubt alpha="54.1" length="61" tooSmall="False" monospace="0.0">(PTNT (ga(NPi &lt;+HUMAN&gt;))) (AGNT (ni(NPj))) (ACTN (S (V &lt;CAUSE&gt;)</doubt><doubt alpha="47.6" length="105" tooSmall="False" monospace="0.0">(PTNT (NPi &lt;+HUMAN&gt;)) (AGNT (NPj &lt;+HUMAN&gt;)) (ACTN (S (V &lt;"buy"&gt;) (PTNT (o(NP))) (AGNT (NPi &lt;+HUMAN&gt;)))))))</doubt><p>At this stage, the contents of the STACK are the same as (32). 
So when they are popped and attached to the matching nodes according to the principle explained above, we obtain the correct parse tree (36).</p><doubt alpha="46.4" length="28" tooSmall="False" monospace="0.0">(36) (S (V &lt;PASSIVE, +PAST&gt;)</doubt><doubt alpha="48.6" length="107" tooSmall="False" monospace="0.0">(PTNT (NP1 "Mary")) (AGNT (NP2 "John")) (ACTN (S (V &lt;CAUSE&gt;) (PTNT (NP1)) (AGNT (NP2)) (ACTN (S (V &lt;"buy"&gt;)</doubt><doubt alpha="43.2" length="37" tooSmall="False" monospace="0.0">(PTNT (NP3 "book")) (AGNT (NP1)))))))</doubt><p>'Mary was made by John to buy a book.'</p></subsubsection><subsubsection number="3.2.3" title="RELATIVE CLAUSES"><p>As mentioned in section 2, Japanese noun phrases containing a relative clause are processed by the PHP entry presented in (7), repeated here in (37).</p><p>(37) If the CWS is an NP and the TOS is an S, then construct the following noun phrase and push it to the STACK:</p><p>To illustrate how (37) works, we will trace the noun phrase (38), which is included in all sentences cited in (4b) through (4e).</p><p>(38) <i>Mary-ga sotugyoo-si-ta kookoo-ga </i>'The high school from which Mary was graduated' <i>(sotugyoo-si- </i>'be graduated', <i>-ta </i>= PAST, <i>kookoo </i>'high school', <i>-ga </i>= case suffix)</p><p>The SNP pattern associated with <i>sotugyoo-si- </i>is (39). 
</p><doubt alpha="25.0" length="8" tooSmall="False" monospace="0.0">(39) (S (* V)</doubt><doubt alpha="52.0" length="25" tooSmall="False" monospace="0.0">(AGNT (*ga(NP &lt;+HUMAN&gt;)))</doubt><doubt alpha="48.8" length="41" tooSmall="False" monospace="0.0">(ABL (*o(NP &lt;PLACE, DEF = "school"&gt;))))</doubt><p>where ABL = ablative and DEF = default.</p><p>Therefore, when the first two words of (38) are processed, (40) is assembled and pushed to the STACK.</p><doubt alpha="0.0" length="4" tooSmall="False" monospace="0.0">(40)</doubt><doubt alpha="54.0" length="50" tooSmall="False" monospace="0.0">(S (V &lt;"be graduated", +PAST&gt;) (AGNT (NP1 "Mary"))</doubt><doubt alpha="51.3" length="39" tooSmall="False" monospace="0.0">(ABL (*o(NP &lt;PLACE, DEF = "school"&gt;))))</doubt><p>If the next item in the INPUT BUFFER were EOS (as in (4a)), the system would pop (40) and, finding that the STACK is now empty, attach the default value "school" to the empty ABL node and send the result to the output device. However, what follows the verb in (38) is a noun. 
Therefore, ASSEMBLE-NP assembles <i>(ga </i>(NP2 "high school")) and calls CHECK-PHP, which finds (37) because the CWS is the noun phrase just assembled and the TOS is (40).</p><p>In accordance with (37), (40) is popped from the STACK, and a new noun phrase (41) is assembled and pushed to the STACK.</p><doubt alpha="52.6" length="38" tooSmall="False" monospace="0.0">(41) (ga (NP3 (HEAD (NP2 "high school"))</doubt><doubt alpha="49.3" length="71" tooSmall="False" monospace="0.0">(MOD (S (V &lt;"be graduated", +PAST&gt;) (AGNT (NP1 "Mary")) (ABL (NP2))))))</doubt><p>There is no backtracking involved here and, by repeating the same process, POP can process nested relative clauses like those cited in (4) from left to right, without facing any combinatorial explosion.</p></subsubsection></subsection><subsection number="3.3" title="WH-QUESTION AND RELATIVE CLAUSE IN ENGLISH"><p>The ATN strategy for parsing <i>wh</i>-questions and relative clauses in English attracted the special attention of many linguists, including Bresnan (1978) and Fodor (1979), because it seemed to support the trace theory and the theory of <i>wh</i>-movement transformation. Therefore, we will conclude the illustration of POP by explaining how it handles them.</p><subsubsection number="3.3.1" title="WH-QUESTIONS"><p>No special mechanism is necessary for POP to process English <i>wh</i>-questions like (42).</p><doubt alpha="62.5" length="24" tooSmall="False" monospace="0.0">(42) a. Who praised John?</doubt><page local="8" global="27"/><p>b. <i>Who did John praise? </i></p><p>The SNP pattern associated with the verb <i>praise </i>is (43).</p><doubt alpha="15.4" length="13" tooSmall="False" monospace="0.0">(43) (S (* V)</doubt><doubt alpha="47.8" length="46" tooSmall="False" monospace="0.0">(AGNT (* NP &lt;+HUMAN&gt;)) (PTNT (* NP &lt;+HUMAN&gt;)))</doubt><p>First, we will trace the parse of (42a). 
The first word, <i>who, </i>is processed and the result, (NP1 &lt;+HUMAN, WH, Q&gt;), is pushed to the STACK before the PROCESSOR encounters <i>praised </i>and retrieves a copy of (43) from the SNP. Then "praised" is attached to the empty V node of the tree frame, and the TOS is popped and attached to the first matching empty node. Since that NP has the features &lt;WH, Q&gt;, and because the STACK is now empty, the feature &lt;Q&gt; is moved from the NP1 node to the V node. The result is (44).</p><doubt alpha="40.6" length="32" tooSmall="False" monospace="0.0">(44) (S (V &lt;"praise", +PAST, Q&gt;)</doubt><doubt alpha="51.1" length="47" tooSmall="False" monospace="0.0">(AGNT (NP1 &lt;+HUMAN, WH&gt;)) (PTNT (* NP &lt;+HUMAN&gt;)))</doubt><p>Then, <i>John </i>is processed in the normal way, and it is attached to the first (and only) matching node (PTNT), following the ordinary procedure illustrated in section 3.1. The result is the correct parse tree (45).</p><doubt alpha="40.6" length="32" tooSmall="False" monospace="0.0">(45) (S (V &lt;"praise", +PAST, Q&gt;)</doubt><doubt alpha="52.3" length="44" tooSmall="False" monospace="0.0">(AGNT (NP1 &lt;+HUMAN, WH&gt;)) (PTNT (NP2 "John")))</doubt><p>At first sight, parsing (42b) by POP may seem difficult because the object is placed before the subject in this sentence. However, POP processes the sentence using the auxiliary <i>did </i>as a clue, just as humans do. In the same way as POP handled the first word of (42a), it processes <i>who </i>in (42b) by assembling (NP1 &lt;+HUMAN, WH, Q&gt;) and pushing it to the STACK. And in the same way as it handled <i>did </i>in (9), POP assembles (V &lt;+PAST&gt;) and pushes it on top of NP1, after which it processes <i>John </i>and pushes (NP2 "John") to the STACK.</p><p>The system then encounters <i>praise </i>and retrieves (43) from the SNP, pops (NP2 "John") from the STACK, and attaches it to the first matching empty node, which is the AGNT node. 
Next, (V &lt;+PAST&gt;) is popped, and it is attached to the V node in accordance with (13). Because (V &lt;+PAST&gt;) is an element that is popped immediately after the AGNT node is filled and because it contains a tense feature, the feature &lt;Q&gt; is added to this node in accordance with (14). The result is (46).</p><doubt alpha="40.6" length="32" tooSmall="False" monospace="0.0">(46) (S (V &lt;"praise", +PAST, Q&gt;)</doubt><doubt alpha="48.8" length="43" tooSmall="False" monospace="0.0">(AGNT (NP2 "John")) (PTNT (* NP &lt;+HUMAN&gt;)))</doubt><p>The TOS is now (NP1 &lt;+HUMAN, WH, Q&gt;), which is popped and attached to the remaining matching node, and its feature &lt;Q&gt; is moved to the V node.<footnote anchor="4"/> The result is the correct parse tree (47).</p><doubt alpha="40.6" length="32" tooSmall="False" monospace="0.0">(47) (S (V &lt;"praise", +PAST, Q&gt;)</doubt><doubt alpha="52.6" length="19" tooSmall="False" monospace="0.0">(AGNT (NP2 "John"))</doubt><doubt alpha="52.0" length="25" tooSmall="False" monospace="0.0">(PTNT (NP1 &lt;+HUMAN, WH&gt;)))</doubt></subsubsection><subsubsection number="3.3.2" title="RELATIVE CLAUSE"><p>As an example of an English sentence that includes a relative clause, we will examine (48).</p><p>(48) <i>Joan loves the brilliant linguist who the students respect.</i></p><p>The first two words are processed, and the partial tree (49) is constructed in the usual way and pushed to the STACK.</p><doubt alpha="37.0" length="27" tooSmall="False" monospace="0.0">(49) (S (V &lt;"love", -PAST&gt;)</doubt><doubt alpha="48.8" length="43" tooSmall="False" monospace="0.0">(AGNT (NP1 "Joan")) (PTNT (* NP &lt;+HUMAN&gt;)))</doubt><p>The next three words <i>(the, brilliant, linguist) </i>are processed in the ordinary way, and following the PHP instructions cited in (16) and (17), they are assembled into noun phrase (50) and attached to the empty PTNT node of (49). 
The result is (51), and NP4 is the content of the LNP REGISTER.<footnote anchor="5"/></p><doubt alpha="50.0" length="44" tooSmall="False" monospace="0.0">(50) (NP4 (HEAD (NP3 (HEAD (NP2 "linguist"))</doubt><doubt alpha="55.3" length="38" tooSmall="False" monospace="0.0">(MOD (ADJ "brilliant")) (DET &lt;DEF&gt;))))</doubt><doubt alpha="37.0" length="27" tooSmall="False" monospace="0.0">(51) (S (V &lt;"love", -PAST&gt;)</doubt><doubt alpha="52.6" length="19" tooSmall="False" monospace="0.0">(AGNT (NP1 "Joan"))</doubt><doubt alpha="57.8" length="45" tooSmall="False" monospace="0.0">(PTNT (NP4 (HEAD (NP3 (HEAD (NP2 "linguist"))</doubt><doubt alpha="53.8" length="39" tooSmall="False" monospace="0.0">(MOD (ADJ "brilliant")) (DET &lt;DEF&gt;))))))</doubt><p>The next word <i>(who) </i>is read in. Its lexical entry includes the feature &lt;WH&gt;, and the TOS is (51). Therefore, CHECK-PHP finds (52), which matches these conditions.</p><doubt alpha="65.1" length="63" tooSmall="False" monospace="0.0">(52) If the CWS has a feature &lt;WH&gt; and if the TOS is an S, then</doubt><doubt alpha="64.8" length="54" tooSmall="False" monospace="0.0">(mark TOS) and (setq CWS (list (copyi MARKED) '&lt;REL&gt;))</doubt><p>where - (mark TOS) marks the constituent of the TOS that is equal to the content of the LNP REGISTER</p><p>- MARKED represents the constituent of the TOS thus marked</p><p>- (copyi X) returns the category index of X.</p><p>When (52) is applied, the CWS becomes (53), which is pushed to the STACK.</p><doubt alpha="31.2" length="16" tooSmall="False" monospace="0.0">(53) (NP4 &lt;REL&gt;)</doubt><p>The next two words, <i>the </i>and <i>students, </i>are processed, and the result (54) is pushed to the STACK in accordance with (17).</p><page local="9" global="28"/><doubt alpha="48.5" length="33" tooSmall="False" monospace="0.0">(54) (NP6 (HEAD (NP5 "students"))</doubt><doubt alpha="46.2" length="13" 
tooSmall="False" monospace="0.0">(DET &lt;+DEF&gt;))</doubt><p>The verb <i>respect </i>is encountered, the matching sentence pattern is retrieved, and the verb is attached to its V node. The result is (55).</p><doubt alpha="43.3" length="30" tooSmall="False" monospace="0.0">(55) (S (V &lt;"respect", -PAST&gt;)</doubt><doubt alpha="45.9" length="37" tooSmall="False" monospace="0.0">(AGNT (* NP &lt;+HUMAN&gt;)) (PTNT (* NP)))</doubt><p>The TOS is popped and attached to the first matching empty node. The result is (56).</p><doubt alpha="43.3" length="30" tooSmall="False" monospace="0.0">(56) (S (V &lt;"respect", -PAST&gt;)</doubt><doubt alpha="58.8" length="34" tooSmall="False" monospace="0.0">(AGNT (NP6 (HEAD (NP5 "students"))</doubt><doubt alpha="41.4" length="29" tooSmall="False" monospace="0.0">(DET &lt;+DEF&gt;))) (PTNT (* NP)))</doubt><p>The next TOS, (53), is popped and attached to the empty node of (56), yielding (57).</p><doubt alpha="43.3" length="30" tooSmall="False" monospace="0.0">(57) (S (V &lt;"respect", -PAST&gt;) (AGNT (NP6 (HEAD (NP5 "students"))</doubt><doubt alpha="44.1" length="34" tooSmall="False" monospace="0.0">(DET &lt;+DEF&gt;))) (PTNT (NP4 &lt;REL&gt;)))</doubt><p>CHECK-PHP is called again and finds the matching entry (58).</p><p>(58) If the CWS contains &lt;REL&gt; and the TOS contains a marked NP, pop the TOS and replace its marked NP with:</p><doubt alpha="64.3" length="28" tooSmall="False" monospace="0.0">(NP (HEAD MARKED) (MOD CWS))</doubt><p>Then remove the mark from MARKED and remove the feature &lt;REL&gt; from the CWS.</p><p>Before (58) is applied, the CWS is (57) and the TOS is (51), of which NP4 is marked in accordance with (52). 
Following (58), therefore, the daughter of the PTNT node of (51) is replaced by (59).</p><doubt alpha="0.0" length="4" tooSmall="False" monospace="0.0">(59)</doubt><doubt alpha="56.0" length="50" tooSmall="False" monospace="0.0">(NP7 (HEAD (NP4 (HEAD (NP3 (HEAD (NP2 "linguist"))</doubt><doubt alpha="54.4" length="68" tooSmall="False" monospace="0.0">(MOD (ADJ "brilliant")) (DET &lt;DEF&gt;))))) (MOD (S (V &lt;"respect", -PAST&gt;) (AGNT (NP6 (HEAD (NP5 "students"))</doubt><doubt alpha="40.0" length="30" tooSmall="False" monospace="0.0">(DET &lt;+DEF&gt;))) (PTNT (NP4)))))</doubt><p>The result of this replacement is (60), and it is pushed to the STACK.</p><doubt alpha="0.0" length="4" tooSmall="False" monospace="0.0">(60)</doubt><doubt alpha="47.6" length="42" tooSmall="False" monospace="0.0">(S (V &lt;"love", -PAST&gt;) (AGNT (NP1 "Joan"))</doubt><doubt alpha="57.1" length="56" tooSmall="False" monospace="0.0">(PTNT (NP7 (HEAD (NP4 (HEAD (NP3 (HEAD (NP2 "linguist")) (MOD (ADJ "brilliant")) (DET &lt;DEF&gt;))))) (MOD (S (V &lt;"respect", -PAST&gt;) (AGNT (NP6 (HEAD (NP5 "students"))</doubt><doubt alpha="37.5" length="32" tooSmall="False" monospace="0.0">(DET &lt;+DEF&gt;))) (PTNT (NP4)))))))</doubt><p>The next element found in the INPUT BUFFER is EOS (end-of-sentence). So the PROCESSOR pops (60) and sends it to the output device.</p></subsubsection></subsection></section><section number="4" title="Highlights of Some Characteristics of POP"><subsection number="4.1" title="VERBAL DERIVATIONAL SUFFIXES AND CASE ASSIGNMENT"><p>As illustrated in (2), the same postnominal suffixes mark different relations in Japanese, depending on the verbal derivational suffixes used in the verb complex. Traditional generative grammarians (like Kuno (1973)) tried to explain this by means of a series of transformational rules such as agentive <i>ni </i>attachment, equi-NP deletion, Aux deletion, verb raising, subject marking, object marking, and <i>ga/ni </i>conversion, which were applied cyclically. This transformational approach is still widely practiced by researchers of Japanese linguistics. 
However, as demonstrated by Sato (1983b), this is unsuitable for application to parsing because many of the transformational rules involved here are non-reversible.</p><p>A relatively recent approach to this problem is to use a set of rules like (61), which Kuroda (1976) calls Canonical Surface Structure Filters and Miyagawa (1980) calls Case Redundancy Rules.</p><doubt alpha="24.1" length="29" tooSmall="False" monospace="0.0">(61) a. [NP —] ==&gt; [NP-ga —]</doubt><doubt alpha="38.7" length="31" tooSmall="False" monospace="0.0">b. [NP NP —] ==&gt; [NP-ga NP-o —]</doubt><doubt alpha="41.5" length="41" tooSmall="False" monospace="0.0">c. [NP NP NP —] ==&gt; [NP-ga NP-ni NP-o —]</doubt><p>These rules are invoked after applying all transformational rules (Kuroda 1976) or all word formation rules (Miyagawa 1980), and they attach suffixes to noun phrases as specified in their output, without regard to the functions of the phrases to which they are attached. The selection of case suffixes and the order of their appearance in the surface structure are determined solely by the number of unmarked noun phrases in the sentence. This approach would work well if Japanese speakers always followed the "canonical word order". However, the so-called canonical word order is not always followed.</p><p>Unlike the theories of Kuroda and Miyagawa, which treat Japanese case suffixes as if they were useless appendages with no syntactic role, POP uses them as integral parts of the input data and, as a result, need not require the input sentences to conform to the "canonical word order". As illustrated in subsection 3.2.2, POP first constructs an expanded sentence tree frame using the SNP patterns that match the SNA's of the derivational suffixes. After this expanded frame is completed, arguments are popped from the STACK and attached to appropriate nodes in the usual manner. 
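</p><p>The frame expansion summarized above, including the automatic adjustment of case flags, might be modelled roughly as follows. This is a simplified sketch, not POP's implementation: the dictionary encoding of the SNP patterns (27), (29), and (34), the function names expand and open_flags, and the choice of only the <i>ni </i>alternative of (29) are assumptions made for illustration.</p><p><![CDATA[
```python
import copy

# Simplified SNP patterns: each role maps to the case flag it expects.
BUY = {"verb": "buy", "roles": {"PTNT": "o", "AGNT": "ga"}}            # cf. (27)
CAUSE = {"verb": "CAUSE", "roles": {"PTNT": "ni", "AGNT": "ga"}}       # cf. (29)
PASSIVE = {"verb": "PASSIVE", "roles": {"PTNT": "ga", "AGNT": "ni"}}   # cf. (34)


def expand(matrix, embedded, coindexed_roles):
    """Embed one frame under another as its ACTN node.

    coindexed_roles lists the embedded roles that are co-indexed with a
    matrix node. Their case flags are removed (set to None), mirroring the
    meta-rule mentioned in subsection 3.2.2.
    """
    frame = copy.deepcopy(matrix)
    inner = copy.deepcopy(embedded)
    for role in coindexed_roles:
        inner["roles"][role] = None
    frame["ACTN"] = inner
    return frame


def open_flags(frame):
    """Collect the case flags still awaiting an argument, outermost first."""
    flags = [flag for flag in frame["roles"].values() if flag is not None]
    if "ACTN" in frame:
        flags.extend(open_flags(frame["ACTN"]))
    return flags


frame_31 = expand(CAUSE, BUY, ["AGNT"])                  # kaw-sase-
frame_35 = expand(PASSIVE, frame_31, ["PTNT", "AGNT"])   # kaw-sase-rare-
print(open_flags(frame_31))  # ['ni', 'ga', 'o']
print(open_flags(frame_35))  # ['ga', 'ni', 'o']
```
]]></p><p>After each expansion exactly one node remains flagged for each case suffix, so in this sketch the arguments popped from the STACK can be matched by flag alone, whatever their order.</p><p>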
Note that the flag specifications on the tree frame are automatically adjusted in the course of its expansion, so no further adjustment resorting to the "canonical word order" or scrambling is necessary.</p><page local="10" global="29"/></subsection><subsection number="4.2" title="EMBEDDED SENTENCES"><p>As illustrated in subsection 3.2.3, POP handles Japanese complex sentences with relative clauses without facing a combinatorial explosion. Especially noteworthy is the similarity in the PHP instructions to assemble noun phrases with a relative clause in Japanese (37) and in English (58), which are paraphrased in (62).</p><p>(62) PHP entries for assembling NP with a relative clause</p><doubt alpha="52.2" length="23" tooSmall="False" monospace="0.0">a. For Japanese = (37):</doubt><p>1. Pop the TOS (which is a sentence with an empty NP node).</p><p>2. Attach a copy of the CWS (which is an NP) to the first matching empty node of the popped sentence tree.</p><p>3. Assemble a new NP tree with the CWS as its HEAD and the sentence tree assembled in step 2 as its MOD(ifier).</p><doubt alpha="50.0" length="22" tooSmall="False" monospace="0.0">b. For English = (58):</doubt><p>1. Pop the TOS (which is a sentence with a marked NP).</p><p>2. Assemble a new NP tree with the marked NP of the sentence popped in step 1 as its HEAD and the CWS (which is a sentence tree containing an NP node co-indexed with the marked NP according to (52)) as its MOD.</p><p>The only major difference between the two is that in Japanese (62a) the relative clause is in the STACK when the head NP is encountered, while in English (62b) the head NP is a branch of an S tree in the STACK when the relative pronoun is encountered. This difference is a natural consequence of the difference in word order between the two languages (i.e., left-branching vs. 
right-branching).</p><p>An important fact is that POP for Japanese does not have to know in advance whether the sentence fragment that it is processing is a matrix sentence like (4a) or an embedded sentence like those in (4b) through (4e).</p></subsection><subsection number="4.3" title="COMPARISON WITH MARCUS'S PARSIFAL"><p>The reader may have wondered if there is any direct relationship between POP and Marcus's PARSIFAL (Marcus 1980): both are bottom-up parsers, where attachment can be made freely to any matching node in the ACTIVE NODE STACK (Marcus) or the CWS (POP). Therefore, a brief comparison of these two systems may be in order.</p><p>When I heard about Marcus's work for the first time, the development of POP was already well under way: its basic algorithm was already completed and coding had already started. Therefore, the similarity between PARSIFAL and POP, if any, is only accidental. Moreover, the basic philosophies of these two systems are different. Marcus's goal was to build a "strictly deterministic" parser for natural language; mine was to build a parser that can handle not only right-branching sentences but also left-branching sentences naturally and without facing a combinatorial explosion. POP does not have any backtracking or parallel parsing mechanism, but the lack of such mechanisms was a consequence of the parser's algorithm and not an intended goal.</p><p>In fact, the only significant similarity between PARSIFAL and POP is between the former's pattern/action rules and the latter's PHP entries. The latter can be rewritten using the format of the former. However, the similarity ends here. PARSIFAL's rules are partially ordered by a priority scheme; POP's PHP entries are not ordered, nor does any entry have priority over any other entry in the PHP. 
In PARSIFAL, a grammar rule activates a packet by attaching it to the constituent at the bottom of the ACTIVE NODE STACK, and the packet of rules remains attached to the node even after the node is pushed up.<footnote anchor="6"/> Such rules remain dormant until the node to which they are attached comes to the bottom of the ACTIVE NODE STACK again. On the other hand, POP's PHP pattern does not remain with any node after a phrase tree (or an S tree) is assembled and pushed to the STACK. A copy of the PHP pattern is retrieved from the data base each time it becomes necessary. This strategy saves memory space in the STACK, although it requires more processing time.</p><p>One of PARSIFAL's most significant characteristics is the distinction between the ACTIVE NODE STACK and the BUFFER. POP also distinguishes the place where trees are actually constructed (which I informally call here the "work space") from the place where the results are stored (i.e., the STACK). However, the similarity again ends here. POP's "work space" is neither a stack nor a buffer, but a machine-dependent temporary memory space where the program (ASSEMBLE-NP, ASSEMBLE-SENTENCE, etc.) retrieves and manipulates partial trees popped from the STACK or lexical entries copied from the LEXICON. Unlike PARSIFAL's ACTIVE NODE STACK, POP's "work space" cannot store any partially completed tree which is not "active". Such inactive partial trees are stored in the STACK.</p><p>PARSIFAL's BUFFER is primarily a facility for "look-ahead". Therefore, it contains unprocessed input words as well as phrase trees with no empty node. It contains no phrase tree which has empty nodes, because such trees are stored in the ACTIVE NODE STACK. In contrast, the primary purpose of POP's STACK is to store tree fragments and tree frames. It is not a "look-ahead" facility and therefore does not contain any unprocessed input word. 
When POP's PROCESSOR looks at an input word, it must process it immediately.</p><p>POP can process sentences like (4) without backtracking or any look-ahead mechanism, whereas such sentences would remain "garden path sentences" for Marcus's parser even with its limited look-ahead mechanism.</p><page local="11" global="30"/></subsection></section><section number="5" title="Conclusion"><p>POP as presented in this paper is still evolving, and it needs further refinement. For example, we could include in the common POP core such meta rules as "attach feature &lt;ANIMATE&gt; to AGNT node". As suggested in section 1, we could also augment POP with procedures to build semantic interpretations along with syntactic analysis. Such refinements and improvements will continue.</p><p>However, the basic linguistic theory underlying my scheme may not have to undergo a radical change in the process. According to the theory underlying this work, it is not a set of patterns or rewriting rules that singly determines the grammatical sentences of a language. Rather, it is the patterns (SNP) in conjunction with procedures (PHP) and POP's meta rules that do so. In other words, this system points the way to a slightly different view of grammatical competence than a basically Chomskian one, in which one provides a competence grammar that incorporates processing while leaving aside details of performance.</p></section><references><p>Berwick, Robert C. and Weinberg, Amy S. 1982 Parsing Efficiency, Computational Complexity, and the Evaluation of Grammatical Theories. <i>Linguistic Inquiry </i>13(2): 165-191.</p><p>Bresnan, Joan W. 1978 A Realistic Transformational Grammar. In Halle, Morris; Bresnan, Joan W.; and Miller, G. A., Eds., <i>Linguistic Theory and Psychological Reality. </i>MIT Press, Cambridge, Massachusetts: 1-59.</p><p>Fillmore, Charles J. 1968 The Case for Case. In Bach, E. 
and Harms, R.T., Eds., <i>Universals in Linguistic Theory. </i>Holt, Rinehart and Winston, New York.</p><p>Fodor, Janet D. 1979 Superstrategy. In Cooper, William E. and Walker, Edward C.T., Eds., <i>Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett. </i>Lawrence Erlbaum, Hillsdale, New Jersey: 249-279.</p><p>Kaplan, Ronald M. 1972 Augmented Transition Networks as Psychological Models of Sentence Comprehension. <i>Artificial Intelligence </i>3:77-100.</p><p>Kuno, Susumu. 1973 <i>The Structure of the Japanese Language. </i>MIT Press, Cambridge, Massachusetts.</p><p>Kuroda, S-Y. 1976 A lecture given to graduate students and faculty members of the Linguistics Department of the University of Massachusetts at Amherst.</p><p>Marcus, Mitchell P. 1980 <i>A Theory of Syntactic Recognition for Natural Language. </i>MIT Press, Cambridge, Massachusetts.</p><p>Miyagawa, Shigeru. 1980 <i>Complex Verbs and the Lexicon. </i>Coyote Papers, Vol. 1. University of Arizona, Tucson, Arizona. (Originally a Ph.D. dissertation, University of Arizona.)</p><p>Sato, Paul T. 1982 The Status of "Particles" and Its Typological Implications. <i>Papers in Japanese Linguistics </i>8:191-205.</p><p>Sato, Paul T. 1983a On-line Parsing Strategies for English and Japanese. A panel presentation at AAS Symposium on Japanese Language on the Computer, in San Francisco, California.</p><p>Sato, Paul T. 1983b Lexicalist vs. Transformationalist Hypothesis on Parsing Japanese Phrases with Complex Verbs. Presented at the Linguistic Conference on East Asian Languages: Verb Phrases, in Los Angeles, California. (Reprinted in Kim, Nam-Kil and Tiee, Henry H., Eds. 1985 <i>Studies in East Asian Linguistics. </i>Department of East Asian Languages and Cultures, University of Southern California, Los Angeles, California: 155-165.)</p><p>Tomita, Masaru. 1986 <i>Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems. 
</i>Kluwer Academic Publishers, Boston, Massachusetts.</p><p>Wanner, E. and Maratsos, M. 1978 An ATN Approach to Comprehension. In Halle, Morris; Bresnan, Joan W.; and Miller, G.A., Eds., <i>Linguistic Theory and Psychological Reality. </i>MIT Press, Cambridge, Massachusetts: 119-161.</p><p>Woods, William A. 1970 Transition Network Grammar for Natural Language Analysis. <i>Communications of the ACM </i>13:591-606.</p><p>Woods, William A. 1973 An Experimental Parsing System for Transition Networks. In Rustin, R., Ed., <i>Natural Language Processing. </i>Algorithmics Press, New York: 111-154.</p><doubt alpha="100.0" length="5" tooSmall="False" monospace="0.0">Notes</doubt><p>1. These postnominal suffixes are usually called "particles", but see Sato (1982).</p><p>2. For the sake of readability, I present all PHP entries cited in this paper in their English translation.</p><p>3. "John" is an abbreviation of a bundle of features, &lt;N, +PROPER, +HUMAN, +MALE, -PLURAL, . . .&gt;. For convenience' sake, such feature bundles are often rendered in this paper by an English word enclosed in quotation marks.</p><p>4. In fact, this &lt;Q&gt; attachment does not add another &lt;Q&gt; to the V node because there is already a &lt;Q&gt; there. Note that POP's attachment function uses UNION.</p><p>5. As mentioned in section 2, POP always keeps a copy of the most recently assembled NP in LNP REGISTER, or the "last (assembled) NP register", although I have not indicated this each time it occurred.</p><p>6. Marcus (1980) uses the phrase "associate with" instead of "attach to" here. PARSIFAL's ACTIVE NODE STACK grows downward.</p></references></body></article>