<?xml version="1.0"?><!DOCTYPE article SYSTEM "/project/take/software/searchbench_offline_processing/paperxml_generator/aclextractor/src/python/../resource/dtd/paperxml.dtd"><article><header><firstpageheader><page local="1"/><title>Backward Beam Search Algorithm for Dependency Analysis of Japanese</title><author surname="Sekine" givenname="Satoshi"><org  name="New York University" country="USA" city="New York"/></author><author surname="Uchimoto" givenname="Kiyotaka"><org  name="Communications Research Laboratory" country="Japan" city="Kobe"/></author><author surname="Isahara" givenname="Hitoshi"><org  name="Communications Research Laboratory" country="Japan" city="Kobe"/></author></firstpageheader><frontmatter><p><b>Backward Beam Search Algorithm for Dependency Analysis of Japanese</b></p><p><b>Satoshi Sekine</b></p><p>Computer Science Department New York University 715 Broadway, 7th floor New York, NY 10003, USA sekine@cs.nyu.edu</p><p><b>Kiyotaka Uchimoto Hitoshi Isahara</b></p><p>Communications Research Laboratory 588-2 Iwaoka, Iwaoka-cho, Nishi-ku, Kobe, Hyogo, 651-2492, Japan <b>[uchimoto,isahara]@crl.go.jp</b></p></frontmatter><abstract>Backward beam search for dependency analysis of Japanese is proposed. As dependencies normally go from left to right in Japanese, it is effective to analyze sentences backwards (from right to left). The analysis is based on a statistical method and employs a beam search strategy. In experiments varying the beam search width, we found that the accuracy is not sensitive to the beam width: even the analysis with a beam width of 1 achieves almost the same dependency accuracy as the best accuracy obtained with a wider beam. This suggests a deterministic algorithm for backward Japanese dependency analysis, although the beam search is still effective, as the N-best sentence accuracy is quite high. The analysis time is observed to be quadratic in the sentence length. 
</abstract></header><body><section number="1" title="Introduction"><p>Dependency analysis is regarded as one of the standard methods of Japanese syntactic analysis. The Japanese dependency structure is usually represented by the relationship between phrasal units called 'bunsetsu'. A bunsetsu usually contains one or more content words, like a noun, verb or adjective, and zero or more function words, like a postposition (case marker) or verb/noun suffix. The relation between two bunsetsus has a direction from a dependent to its head. Figure 1 shows examples of bunsetsus and dependencies. Each bunsetsu is separated by <b>"|"</b>. The first segment <b>"KARE-HA" </b>consists of two words, <b>KARE </b>(He) and <b>HA </b>(subject case marker). The numbers in the "head" line show the head ID of the corresponding bunsetsus. Note that the last segment does not have a head, and it is the head bunsetsu of the sentence. The task of Japanese dependency analysis is to find the head ID for each bunsetsu.</p><p>The analysis proposed in this paper has two conceptual steps. In the first step, dependency likelihoods are calculated for all possible pairs of bunsetsus. In the second step, an optimal dependency set for the entire sentence is retrieved. In this paper, we mainly discuss the second step, a method for finding an optimal dependency set. In practice, the method proposed in this paper should be combinable with any system that calculates dependency likelihoods.</p><p>It is said that Japanese dependencies have the following characteristics<footnote anchor="1"/>:</p><p>(1) Dependencies are directed from left to right</p><p>(2) Dependencies don't cross</p><p>(3) Each segment except the rightmost one has only one head</p><p>(4) In many cases, the left context is not necessary to determine a dependency</p><p>The analysis method proposed in this paper assumes these characteristics and is designed to utilize them. 
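</p><p>As a minimal sketch of characteristics (1) to (3), the following Python function checks whether a head assignment forms a valid dependency structure. The function name is_valid_analysis and the list encoding of the heads are assumptions made for this illustration only; they are not part of the system described in this paper.</p>
<p>
```python
def is_valid_analysis(heads):
    """Check characteristics (1)-(3) for a list of 1-based head IDs.

    heads[i - 1] is the head of segment i; the last entry must be None.
    This encoding is a hypothetical choice made for illustration.
    """
    n = len(heads)
    if n == 0 or heads[-1] is not None:
        return False  # the rightmost segment has no head
    # (1) and (3): every other segment has exactly one head, to its right
    for i in range(1, n):
        if heads[i - 1] not in range(i + 1, n + 1):
            return False
    # (2): no segment strictly inside the arc from i to h may depend beyond h
    for i in range(1, n):
        h = heads[i - 1]
        if any(heads[j - 1] > h for j in range(i + 1, h)):
            return False
    return True


figure1_heads = [6, 4, 4, 6, 6, None]  # the "Head" line of Figure 1
print(is_valid_analysis(figure1_heads))
```
</p><p>For the sentence in Figure 1, the head line 6 4 4 6 6 - corresponds to the list [6, 4, 4, 6, 6, None], which satisfies all three checks. Changing the head of the third segment to 5 would cross the dependency from the second segment to the fourth and would be rejected.</p><p>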
Based on these assumptions, we can analyze a sentence backwards (from right to left) in an efficient manner. There are two merits to this approach. Assume that we are analyzing the <i>M</i>-th segment of a sentence of length <i>N</i> and that the analysis has already been done for the (<i>M</i> + 1)-th to <i>N</i>-th segments (<i>M</i> &lt; <i>N</i>).</p><footnote label="1">Of course, there are several exceptions (S.Shirai, 1998), but the frequencies of such exceptions are negligible compared to the current precision of the system. We believe those exceptions have to be treated when the problems we are facing at the moment are solved. Assumption (4) has not been discussed very much, but our investigation with humans showed that it is true in more than 90% of the cases.</footnote><page local="2"/><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>ID</p></td><td class="cell"><p>1</p></td><td class="cell"><p>2</p></td><td class="cell"><p>3</p></td><td class="cell"><p>4</p></td><td class="cell"><p>5</p></td><td class="cell"><p>6</p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p><b>KARE-HA</b></p></td><td class="cell"><p><b>FUTATABI</b></p></td><td class="cell"><p><b>PAI-WO</b></p></td><td class="cell"><p><b>TSUKURI,</b></p></td><td class="cell"><p><b>KANOJO-NI</b></p></td><td class="cell"><p><b>OKUTTA.</b></p></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>(He-subj)</p></td><td class="cell"><p>(again)</p></td><td class="cell"><p>(pie-obj)</p></td><td class="cell"><p>(made ,)</p></td><td class="cell"><p>(to her)</p></td><td class="cell"><p>(present)</p></td></tr><tr class="row"><td class="cell"><p>Head</p></td><td class="cell"><p>6</p></td><td class="cell"><p>4</p></td><td class="cell"><p>4</p></td><td class="cell"><p>6</p></td><td class="cell"><p>6</p></td><td class="cell"><p>-</p></td></tr></table><p><b>Translation: He made a pie again and presented it to her.</b></p><figure caption="Figure 1: Example of a Japanese sentence, bunsetsus and dependencies"></figure><p>The first merit is that the head of the dependency of the <i>M</i>-th segment is one of the segments between <i>M </i>+ 1 and <i>N </i>(because of assumption 1), which are already analyzed. Because of this, we don't have to keep a huge number of possible analyses, i.e. we can avoid something like active edges in a chart parser, or parallel stacks in GLR parsing, as we can make a decision at this time. 
Also, we can use the beam search mechanism, keeping only a certain number of analysis candidates at each segment. The width of the beam search can be easily tuned, and the memory size of the process is proportional to the product of the input sentence length and the beam search width.</p><p>The other merit is that the possible heads of the dependency can be narrowed down because of the assumption of non-crossing dependencies (assumption 2). For example, if the <i>K</i>-th segment depends on the <i>L</i>-th segment (<i>M</i> &lt; <i>K</i> &lt; <i>L</i>), then the <i>M</i>-th segment can't depend on any segment between <i>K </i>and <i>L</i>. According to our experiment, this reduced the number of heads to consider to less than 50%.</p><p>The technique of backward analysis of Japanese sentences has been used in rule-based methods, for example (Fujita, 1988). However, there are several difficulties with rule-based methods. First, the rules are created by humans, so it is difficult to achieve wide coverage and keep the rules consistent. Also, it is difficult to incorporate a scoring scheme in rule-based methods. Many such methods used heuristics to make deterministic decisions (backtracking if the search fails) rather than using a scoring scheme. However, the combination of the backward analysis and the statistical method has very strong advantages, one of which is the beam search.</p></section><section number="2" title="Statistical framework"><p>We combined the backward beam search strategy with a statistical dependency analysis. The details of our statistical framework are described in (Uchimoto et al., 1999). There have been many proposals for statistical analysis in many languages, in particular in English and Japanese (Magerman, 1995) (Sekine and Grishman, 1995) (Collins, 1997) (Ratnaparkhi, 1997) (K.Shirai et al., 1998) (Fujio and Matsumoto, 1998) (Haruno et al., 1998) (Ehara, 1998). 
One of the most advanced systems in English is proposed by Ratnaparkhi. It uses the Maximum Entropy (ME) model, and both the accuracy and the speed of the system are among the best reported to date. Our system uses the ME model, too. In the ME model, we define a set of features which are thought to be useful in dependency analysis, and the model learns the weights of the features from training data. Our features include part-of-speech, inflections, lexical items, the existence of a comma or bracket between the segments, and the distance between the segments. Also, combinations of those features are used as additional features. The system calculates the probabilities of dependencies based on the model, which is trained using a training corpus. The probability of an entire sentence is derived from the product of the probabilities of all the dependencies in the sentence. We choose the analysis with the highest probability to be the analysis of the sentence. Although the accuracy of the analyzer is not the main issue of the paper, as any type of model which uses dependency probabilities can be implemented by our method, the performance reported in (Uchimoto et al., 1999) is one of the best results reported by statistically based systems.</p><page local="3"/></section><section number="3" title="Algorithm"><p>In this section, the analysis algorithm is described. First the algorithm is illustrated using an example, then it is formally described. The main characteristics of the algorithm are the backward analysis and the beam search.</p><p>The sentence <b>"KARE-HA FUTATABI PAI-WO TSUKURI, KANOJO-NI OKUTTA." </b>(He made a pie again and presented it to her) is used as an input. We assume the POS tagging and segmentation analysis have been done correctly before starting the process. The border of each segment is shown by " <b>| </b>". 
In the figures, the head of the dependency for each segment is represented by the segment number shown at the top of each segment.</p><p>[Figure: &lt;Initial&gt; the input sentence with segment IDs 1 to 6: KARE-HA | FUTATABI | PAI-WO | TSUKURI, | KANOJO-NI | OKUTTA. (He-subj) (again) (pie-obj) (made ,) (to her) (present)]</p><p>Algorithm</p></section><section number="1." title="Analyze up to the second segment from the end"><p>The last segment has no dependency, so we don't have to analyze it. The second segment from the end always depends on the last segment. So the result up to the second segment from the end looks like the following.</p><p>[Figure: &lt;Up to the second segment from the end&gt; KANOJO-NI (segment 5) has head 6; OKUTTA. (segment 6) has no head]</p></section><section number="2." title="The third segment from the end"><p>This segment <b>("TSUKURI,") </b>has two dependency candidates. One is the 5th segment <b>("KANOJO-NI") </b>and the other is the 6th segment <b>("OKUTTA")</b>. Now, we use the probabilities calculated using the ME model in order to assign probabilities to the two candidates <b>(Cand1 </b>and <b>Cand2 </b>in the following figure). Let's assume the probabilities 0.1 and 0.9 respectively as an example. At the tail of each analysis, the total probability (the product of the probabilities of all dependencies) is shown. The candidates are sorted by the total probability.</p><p>[Figure: &lt;Up to the third segment from the end&gt; Cand1 (0.9), Cand2 (0.1)]</p></section><section number="3." title="The fourth segment from the end"><p>For each of the two candidates created at the previous stage, the dependencies of the fourth segment from the end <b>("PAI-WO") </b>will be analyzed. For <b>Cand1</b>, the segment can't have a dependency to the fifth segment <b>("KANOJO-NI"), </b>because of the non-crossing assumption. So the probabilities of the dependencies only to the fourth <b>(Cand1-1) </b>and the sixth <b>(Cand1-2) </b>segments are calculated. In the example, these probabilities are assumed to be 0.6 and 0.4. A similar analysis is conducted for <b>Cand2 </b>(here probabilities are assumed to be 0.5, 0.1 and 0.4) and three candidates are created <b>(Cand2-1, Cand2-2 </b>and <b>Cand2-3).</b></p><p>[Figure: &lt;Up to the fourth segment from the end&gt; Cand1-1 (0.54), Cand1-2 (0.36), Cand2-1 (0.05), Cand2-2 (0.04), Cand2-3 (0.01)]</p><p>As the analysis proceeds, a large number (almost L!) of candidates will be created. However, by limiting the number of candidates at each stage, the total number of candidates can be reduced. This is the beam search, one of the characteristics of the algorithm. By observing the analyses in the example, we can easily imagine that this beam search may not cause a serious problem in performance, because the candidates with low probabilities may be incorrect anyway. For instance, when we set the beam search width = 3, then <b>Cand2-2 </b>and <b>Cand2-3 </b>in the figure will be discarded at this stage, and hence won't be used in the following analyses. The relationship between the beam search width and the accuracy observed in our experiments will be reported in the next section.</p><page local="4"/></section><section number="4." title="Up to the first segment"><p>The analyses are conducted in the same way up to the first segment. For example, the result of the analysis for the entire sentence is shown below. 
(Appropriate probabilities are used.)</p><p>&lt;Up to the first segment&gt;</p><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>ID</p></td><td class="cell"><p>1</p></td><td class="cell"><p>2</p></td><td class="cell"><p>3</p></td><td class="cell"><p>4</p></td><td class="cell"><p>5</p></td><td class="cell"><p>6</p></td><td class="cell"></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>KARE-HA</p></td><td class="cell"><p>FUTATABI</p></td><td class="cell"><p>PAI-WO</p></td><td class="cell"><p>TSUKURI,</p></td><td class="cell"><p>KANOJO-NI</p></td><td class="cell"><p>OKUTTA.</p></td><td class="cell"></td></tr><tr class="row"><td class="cell"></td><td class="cell"><p>(He-subj)</p></td><td class="cell"><p>(again)</p></td><td class="cell"><p>(pie-obj)</p></td><td class="cell"><p>(made ,)</p></td><td class="cell"><p>(to her)</p></td><td class="cell"><p>(present)</p></td><td class="cell"></td></tr><tr class="row"><td class="cell"><p>Cand1</p></td><td class="cell"><p>6</p></td><td class="cell"><p>4</p></td><td class="cell"><p>4</p></td><td class="cell"><p>6</p></td><td class="cell"><p>6</p></td><td class="cell"><p>-</p></td><td class="cell"><p>(0.11)</p></td></tr><tr class="row"><td class="cell"><p>Cand2</p></td><td class="cell"><p>4</p></td><td class="cell"><p>4</p></td><td class="cell"><p>6</p></td><td class="cell"><p>6</p></td><td class="cell"><p>6</p></td><td class="cell"><p>-</p></td><td class="cell"><p>(0.09)</p></td></tr><tr class="row"><td class="cell"><p>Cand3</p></td><td class="cell"><p>6</p></td><td class="cell"><p>4</p></td><td class="cell"><p>6</p></td><td class="cell"><p>5</p></td><td class="cell"><p>6</p></td><td class="cell"><p>-</p></td><td class="cell"><p>(0.05)</p></td></tr></table><p>Now, the formal algorithm is described inductively in Figure 2. The order of the analysis time is quadratic in the length of the sentence.</p></section><section number="4" title="Experiments"><p>In this section, experiments and evaluations are reported. We use the Kyoto University Corpus (version 2) (Kurohashi and Nagao, 1997), a hand-created Japanese corpus with POS tags, bunsetsu segments and dependency information. The sentences in the articles from January 1, 1994 to January 8, 1994 (7,960 sentences) are used for the training of the ME model, and the sentences in the articles of January 9, 1994 (1,246 sentences) are used for the evaluation. The sentences in the articles of January 10, 1994 are kept for future evaluations.</p><subsection number="4.1" title="Basic Result"><p>The evaluation result of our system is shown in Table 1. The experiment uses the correctly segmented and part-of-speech tagged sentences of the Kyoto University corpus. The beam search width is set to 1; in other words, the system runs deterministically. Here, 'dependency accuracy' is the percentage of correctly analyzed dependencies out of all dependencies. 'Sentence accuracy' is the percentage of the sentences in which all the dependencies are analyzed correctly.</p><table caption="Table 1: Evaluation"></table><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>Dependency accuracy</p></td><td class="cell"><p>87.14% (9814/11263)</p></td></tr><tr class="row"><td class="cell"><p>Sentence accuracy</p></td><td class="cell"><p>40.60% (503/1239)</p></td></tr><tr class="row"><td class="cell"><p>Average analysis time</p></td><td class="cell"><p>0.03 sec</p></td></tr></table></subsection><subsection number="4.2" title="Beam search width and accuracy"><p>In this subsection, the relationship between the beam width and the accuracy is discussed. In principle, the wider the beam search width, the more analyses can be retained and the better the accuracy that can be expected. However, the result is somewhat different from the expectation. Table 2 shows the dependency accuracy and sentence accuracy for beam widths 1 through 20. The difference is very small, but the best accuracy is obtained when the beam width is 11 (for the dependency accuracy), and 2 and 3 (for the sentence accuracy). This shows that there are cases where the analysis with the highest product of probabilities is not correct, but the analysis decided at each stage is correct. This is a very interesting result of our experiment, and it is related to assumption 4 regarding Japanese dependency, mentioned earlier.</p><table caption="Table 2: Relationship between beam width and accuracy"></table><p>This suggests that when we analyze a Japanese sentence backwards, we can do it deterministically without great loss of accuracy. Table 3 shows where the analysis with beam width 1 appears among the analyses with beam width 200. It shows that most deterministic analyses appear as the best analysis in the non-deterministic analyses. Also, among the deterministic analyses which are correct (503 sentences), 498 sentences (99.0%) have the same analysis at the best rank in the 200-beam-width analyses. 
(These are followed by 3 sentences at the second rank and 1 sentence each at the third and fifth ranks.)</p><page local="5"/><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>Beam width</p></td><td class="cell"><p>Dependency Accuracy (%)</p></td><td class="cell"><p>Sentence Accuracy (%)</p></td></tr><tr class="row"><td class="cell"><p>1</p></td><td class="cell"><p>87.14</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>2</p></td><td class="cell"><p>87.16</p></td><td class="cell"><p>40.76</p></td></tr><tr class="row"><td class="cell"><p>3</p></td><td class="cell"><p>87.20</p></td><td class="cell"><p>40.76</p></td></tr><tr class="row"><td class="cell"><p>4</p></td><td class="cell"><p>87.15</p></td><td class="cell"><p>40.68</p></td></tr><tr class="row"><td class="cell"><p>5</p></td><td class="cell"><p>87.14</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>6</p></td><td class="cell"><p>87.16</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>7</p></td><td class="cell"><p>87.20</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>10</p></td><td class="cell"><p>87.20</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>15</p></td><td class="cell"><p>86.21</p></td><td class="cell"><p>40.60</p></td></tr><tr class="row"><td class="cell"><p>20</p></td><td class="cell"><p>86.21</p></td><td class="cell"><p>40.60</p></td></tr></table><p><b>&lt;Variable&gt;</b></p><p><b>Length: Length of the input sentence in segments</b></p><p><b>W: The beam search width</b></p><p><b>C[len]: Candidate list; C for each segment keeps the top W partial analyses from that segment to the last segment.</b></p><p><b>&lt;Initial Operation&gt;</b></p><p><b>The second segment from the end depends on the last segment. This analysis is stored in C[Length-1].</b></p><p><b>&lt;Inductive Operation&gt;</b></p><p><b>Assume the analysis up to the (M+1)-th segment has been finished. For each candidate 'c' in C[M+1], do the following operation.</b></p><p><b>Compute the possible dependencies of the M-th segment compatible with 'c'. For each dependency, create a new candidate 'd' by adding the dependency to 'c'. Calculate the probability of 'd'.</b></p><p><b>If C[M] has fewer than W entries, add 'd' to C[M]; else if the probability of 'd' &gt; the probability of the least probable entry of C[M], replace this entry by 'd'; else ignore 'd'.</b></p><p><b>When the operation finishes for all candidates in C[M+1], proceed to the analysis of the (M-1)-th segment.</b></p><p><b>Repeat the operation until the first segment is analyzed. The best analysis for the sentence is the best candidate in C[1].</b></p><figure caption="Figure 2: Formal Algorithm"></figure><p>This means that in most cases the analysis with the highest probability at each stage also has the highest probability as a whole. This is related to assumption 4. 
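</p><p>The inductive procedure of Figure 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the names candidate_heads and backward_beam_search, the dictionary encoding of partial analyses, and the prob callback standing in for the ME model are all hypothetical. The sketch also sorts each stage and truncates it to W entries instead of replacing the least probable entry of C[M], which keeps the same W candidates, and it assigns probability 1 to the initial dependency.</p>
<p>
```python
def candidate_heads(heads, m, n):
    """Possible heads of segment m, given heads of segments m+1..n-1.

    Under the left-to-right and non-crossing assumptions, these are the
    segments on the chain m+1, head(m+1), head(head(m+1)), ..., n.
    `heads` maps a 1-based segment ID to its head ID (assumed encoding).
    """
    chain = [m + 1]
    while chain[-1] != n:
        chain.append(heads[chain[-1]])
    return chain


def backward_beam_search(n, prob, width):
    """Return up to `width` (probability, heads) pairs, best first.

    n: number of segments; prob(dep, head): dependency probability,
    assumed to be supplied by some statistical model; width: beam width W.
    """
    if n == 1:
        return [(1.0, {})]
    # Initial operation: the second segment from the end depends on the last.
    beam = [(1.0, {n - 1: n})]
    # Inductive operation: analyze segments n-2, n-3, ..., 1.
    for m in range(n - 2, 0, -1):
        expanded = []
        for p, heads in beam:
            for h in candidate_heads(heads, m, n):
                expanded.append((p * prob(m, h), {**heads, m: h}))
        expanded.sort(key=lambda cand: cand[0], reverse=True)
        beam = expanded[:width]  # keep only the top W partial analyses
    return beam


# Partial analyses from the worked example: TSUKURI, (4) attached to 6 or to 5.
print(candidate_heads({5: 6, 4: 6}, 3, 6))  # [4, 6]
print(candidate_heads({5: 6, 4: 5}, 3, 6))  # [4, 5, 6]
```
</p><p>The two calls at the end reproduce the head candidates of the fourth segment from the end in the worked example: two candidates when the fourth segment depends on the sixth, and three when it depends on the fifth. With width set to 1 the function behaves as the deterministic analyzer discussed in this subsection.</p><p>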
The best analysis with the left context and the best analysis without the left context are the same 95% of the time in general, and 99% of the time if the analysis is correct. These numbers are much higher than in our human experiment mentioned in the earlier footnote (note that the number here is the percentage in terms of sentences, and the number in the footnote is the percentage in terms of segments). This means that we may get good accuracy even without left contexts in analyzing Japanese dependencies.</p></subsection><subsection number="4.3" title="N-Best accuracy"><p>As we can generate N-best results, we measured N-best sentence accuracy. Figure 3 shows the N-best accuracy. N-best accuracy is the percentage of the sentences which have the correct analysis among their top N analyses. By setting a large beam width, we can observe N-best accuracy. The figure shows the N-best accuracy when the beam width is set to 20. When we set <i>N </i>= 20, 78.5% of the sentences have the correct analysis in the top 20 analyses. If we have an ideal system for finding the correct analysis among them, which may use semantic or context information, we can have a very accurate analyzer.</p><page local="6"/><table caption="Table 3: The rank of the deterministic analysis"></table><p>[Figure 3 plot: sentence accuracy (%) against N for N = 0 to 20, reaching 78.53% at N = 20]</p><figure caption="Figure 3: N-best sentence Accuracy"></figure><p>We can make two interesting observations from the result. 
The accuracy of the 1-best analysis is about 40%, which is more than half of the accuracy of the 20-best analysis. This shows that although the system is not perfect, the computation of the probabilities is probably good at finding the correct analysis at the top rank.</p><p>The other point is that the accuracy saturates at around 80%. Improvement over 80% seems very difficult even if we use a very large beam width <i>W</i>. (If we set <i>W </i>to the number of all possible combinations, which means almost <i>L! </i>for sentence length <i>L</i>, we can get 100% N-best accuracy, but this is not worth considering.) This suggests that we have missed something important. In particular, from our investigation of the result, we believe that coordinate structure is one of the most important factors to improve the accuracy. This remains one area of future work.</p></subsection><subsection number="4.4" title="Speed of the analysis"><p>Based on the formal algorithm, the analysis time can be estimated as proportional to the square of the input sentence length. Figure 4 shows the relationship between the analysis time and the sentence length when we set the beam width to 1. We use a Sun Ultra10 machine and the process size is about 8M bytes. We can see that the actual analysis time almost follows the quadratic curve. The average analysis time is 0.03 seconds and the average sentence length is 10 segments. The analysis time for the longest sentence (41 segments) is 0.29 seconds. 
We have not optimized the program in terms of speed and there is room to shrink the process size.</p><figure caption="Figure 4: Relationship between sentence length and analyzing time"></figure><table class="main" frame="box" rules="all" border="1" regular="False"><tr class="row"><td class="cell"><p>Rank</p></td><td class="cell"><p>1</p></td><td class="cell"><p>2</p></td><td class="cell"><p>3</p></td><td class="cell"><p>4</p></td><td class="cell"><p>5</p></td><td class="cell"><p>6</p></td><td class="cell"><p>7</p></td><td class="cell"><p>8</p></td><td class="cell"><p>9</p></td><td class="cell"><p>10</p></td></tr><tr class="row"><td class="cell"><p>Frequency (%)</p></td><td class="cell"><p>1175 (95.3)</p></td><td class="cell"><p>20 (1.6)</p></td><td class="cell"><p>11 (0.9)</p></td><td class="cell"><p>8 (0.6)</p></td><td class="cell"><p>4 (0.3)</p></td><td class="cell"><p>2 (0.2)</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>2 (0.2)</p></td><td class="cell"><p>0</p></td><td class="cell"><p>3 (0.2)</p></td></tr><tr class="row"><td class="cell"><p>Rank</p></td><td class="cell"><p>11</p></td><td class="cell"><p>12</p></td><td class="cell"><p>13</p></td><td class="cell"><p>14</p></td><td class="cell"><p>15</p></td><td class="cell"><p>16</p></td><td class="cell"><p>17</p></td><td class="cell"><p>18</p></td><td class="cell"><p>19</p></td><td class="cell"><p>20 and more</p></td></tr><tr class="row"><td class="cell"><p>Frequency (%)</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>0</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>0</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>0</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>1 (0.1)</p></td><td class="cell"><p>0</p></td><td class="cell"><p>8 (0.6)</p></td></tr></table><page local="7"/></subsection></section><section number="5" title="Conclusion"><p>In this paper, we proposed a statistical Japanese dependency analysis method which processes a sentence backwards. As dependencies normally go from left to right in Japanese, it is effective to analyze sentences backwards (from right to left). The method combines a backward analysis and a statistical method, and it can naturally incorporate a beam search strategy, an effective way of limiting the search space in the backward analysis. We observed that the best performances were achieved when the width is very small. Actually, 95% of the analyses obtained with beam width=1 were the same as the best analyses with beam width=20. The analysis time was proportional to the square of the sentence length (number of segments), as was predicted from the algorithm. The average analysis time was 0.03 seconds (average sentence length was 10.0 bunsetsus) and it took 0.29 seconds to analyze the longest sentence, which has 41 segments. This method can be applied to various languages which have the same or similar characteristics of dependencies, for example Korean, Turkish etc.</p></section><references><p>Adam Berger and Harry Printz.  
1998: "A Comparison of Criteria for Maximum Entropy / Minimum Divergence Feature Selection". <i>Proceedings of the EMNLP-98 </i>97-106</p><p>Michael Collins. 1997: "Three Generative, Lexicalized Models for Statistical Parsing". <i>Proceedings of the ACL-97 </i>16-23</p><p>Terumasa Ehara. 1998: "Calculation of Japanese dependency likelihood based on Maximum Entropy model". <i>Proceedings of the ANLP, Japan </i>382-385</p><p>Masakazu Fujio and Yuuji Matsumoto. 1998: "Japanese Dependency Structure Analysis based on Lexicalized Statistics". <i>Proceedings of the EMNLP-98 </i>87-96</p><p>Katsuhiko Fujita. 1988: "A Trial of deterministic dependency analysis". <i>Proceedings of the Japanese Artificial Intelligence Annual meeting </i>399-402</p><p>Masahiko Haruno and Satoshi Shirai and Yoshifumi Ooyama. 1998: "Using Decision Trees to Construct a Practical Parser". <i>Proceedings of the COLING/ACL-98 </i>505-511</p><p>Sadao Kurohashi and Makoto Nagao. 1994: "KN Parser: Japanese Dependency/Case Structure Analyzer". <i>Proceedings of The International Workshop on Sharable Natural Language Resources </i>48-55</p><p>Sadao Kurohashi and Makoto Nagao. 1997: "Kyoto University text corpus project". <i>Proceedings of the ANLP, Japan </i>115-118</p><p>David Magerman. 1995: "Statistical Decision-Tree Models for Parsing". <i>Proceedings of the ACL-95 </i>276-283</p><p>Adwait Ratnaparkhi. 1997: "A Linear Observed Time Statistical Parser Based on Maximum Entropy Models". <i>Proceedings of EMNLP-97</i></p><p>Satoshi Sekine and Ralph Grishman. 1995: "A Corpus-based Probabilistic Grammar with Only Two Non-terminals". <i>Proceedings of the IWPT-95 </i>216-223</p><p>Satoshi Shirai. 1998: "Heuristics and its limitation". <i>Journal of the ANLP, Japan </i>Vol.5 No.1, 1-2</p><p>Kiyoaki Shirai, Kentaro Inui, Takenobu Tokunaga and Hozumi Tanaka. 1998: "An Empirical Evaluation on Statistical Parsing of Japanese Sentences Using Lexical Association Statistics". <i>Proceedings of EMNLP-98 </i>80-86</p><p>Kiyotaka Uchimoto, Satoshi Sekine, Hitoshi Isahara. 1999: "Japanese Dependency Structure Analysis Based on Maximum Entropy Models". <i>Proceedings of the EACL-99 </i>196-203</p></references></body></article>