Using Hidden Markov Modeling to Decompose Human-Written Summaries

Hongyan Jing ∗
Lucent Technologies, Bell Laboratories

Professional summarizers often reuse original documents to generate summaries. The task of summary sentence decomposition is to deduce whether a summary sentence is constructed by reusing the original text and to identify reused phrases. Specifically, the decomposition program needs to answer three questions for a given summary sentence: (1) Is this summary sentence constructed by reusing the text in the original document? (2) If so, what phrases in the sentence come from the original document? and (3) From where in the document do the phrases come? Solving the decomposition problem can lead to better text generation techniques for summarization. Decomposition can also provide large training and testing corpora for extraction-based summarizers. We propose a hidden Markov model solution to the decomposition problem. Evaluations show that the proposed algorithm performs well.

Introduction

We define a problem referred to as summary sentence decomposition. The goal of a decomposition program is to determine the relations between phrases in a summary written by professional summarizers and phrases in the corresponding original document. Our analysis of a set of human-written summaries has indicated that professional summarizers often rely on cutting and pasting text from the original document to produce summaries. Unlike most current automatic summarizers, however, which extract sentences or paragraphs without any modification, professional summarizers edit the extracted text using a number of revision operations.

Decomposition of human-written summaries involves analyzing a summary sentence to determine how it is constructed by humans. Specifically, we define the summary sentence decomposition problem as follows: Given a human-written summary sentence, a decomposition program needs to answer three questions: (1) Is this summary sentence constructed by reusing the text in the original document? (2) If so, what phrases in the sentence come from the original document? and (3) From where in the document do the phrases come? Here, the term phrase refers to any sentence component that is cut from the original document and reused in the summary. A phrase can be at any granularity, from a single word to a complicated verb phrase to a complete sentence.

There are two primary benefits of solving the summary sentence decomposition problem. First, decomposition can lead to better text generation techniques in summarization. Most domain-independent summarizers rely on simple extraction to produce summaries, even though extracted sentences can be incoherent, redundant, or misleading. By decomposing human-written sentences, we can deduce how summary sentences are constructed by humans. By learning how humans use revision operations to edit extracted sentences, we can develop automatic programs to simulate these revision operations and build a better text generation system for summarization.

Second, the decomposition result also provides large corpora for extraction-based summarizers. By aligning summary sentences with original-document sentences, we can automatically annotate the most important sentences in an input document. By doing this automatically, we can afford to mark content importance for a large set of documents, thereby providing valuable training and testing data sets for extraction-based summarizers.

We propose a hidden Markov model solution to the summary sentence decomposition problem. In the next section, we show by example the revision operations used by professional summarizers. In Section 3, we present our solution to the decomposition problem by first mathematically formulating the decomposition problem and then presenting the hidden Markov model. In Section 4, we present three evaluation experiments and their results. Section 5 describes applications, and Section 6 discusses related work.

∗ 600 Mountain Avenue, Murray Hill, NJ 07974. E-mail: hjing@research.bell-labs.com. The work reported here was completed while the author attended Columbia University.

© 2002 Association for Computational Linguistics. Computational Linguistics, Volume 28, Number 4.

Revision Operations

We analyzed a set of articles to observe how they were summarized by human abstractors. This set included 15 news articles on telecommunications, 5 articles on medical issues, and 10 articles in the legal domain. Although the individual articles related to specific domains, they covered a broad range of topics and differed in writing style and structure even within the same domain. The telecommunications articles were collected using the free daily news service Communications-Related Headlines, provided by the Benton Foundation 〈http://www.benton.org〉. The abstracts of these articles from various newspapers were written by staff writers at Benton. The medical news articles were collected from HIV/STD/TB Prevention News Update, provided by the Center for Disease Control (CDC) 〈http://www.cdcnpin.org/news/prevnews.htm〉. As a public service, CDC provides daily staff-written synopses of key scientific articles and lay media reports on HIV/AIDS. The legal articles from the New York Law Journal describe court decisions on lawsuits that have been summarized by the journal's editors.

From the corpus we studied, we found that human abstractors almost universally reuse the text in the original document for producing a summary of that document. This finding is consistent with Endres-Niggemeyer et al. (1998), which stated that professional abstractors often rely on cutting and pasting the original text to produce summaries.

Based on careful analysis of human-written summaries, we have defined six revision operations that can be used to transform a sentence in an article into a summary sentence in a human-written abstract: sentence reduction, sentence combination, syntactic transformation, lexical paraphrasing, generalization or specification, and reordering. The following sections examine each of these operations in turn.

1. Sentence reduction. In sentence reduction, nonessential phrases are removed from a sentence, as in the following example (italics in the source sentence mark material that is removed):¹

Document sentence: When it arrives sometime next year in new TV sets, the V-chip will give parents a new and potentially revolutionary device to block out programs they don't want their children to see.
Summary sentence: The V-chip will give parents a device to [...]

The deleted material can be at any granularity: a word, a phrase, or a clause. Multiple components can be removed from a single sentence.

2. Sentence combination. In sentence combination, material from a few sentences is merged into a single sentence. This operation is typically used together with sentence reduction, as illustrated in the following example, which also employs paraphrasing (italics in the source sentences mark material that is removed; italics in the summary sentence mark material that is added):

[Example omitted.]

3. Syntactic transformation. Syntactic transformation involves changing the syntactic structure of a sentence. In both sentence reduction and sentence combination, syntactic transformations may also be involved. In the following example, the sentence structure was changed from the causative clause structure in the original to the conjunctive structure in the summary. The subject of the causative clause and the subject of the main clause were combined during this operation.

[Example omitted.]

4. Lexical paraphrasing. In lexical paraphrasing, phrases are replaced with their paraphrases. For instance, in the example in item (2), the summary sentence substituted "fit squarely into" with a more picturesque description, "hits the nail on the head".

5. Generalization or specification. In generalization (specification), phrases or clauses are replaced with more general (specific) descriptions, as in the following examples:

[Examples omitted.]

6. Reordering. In reordering, the order of extracted sentences is changed with respect to the original. For instance, the ending sentence of an article may be placed at the beginning of an abstract.

Not all revision operations are listed here, because some operations are used infrequently. Note that multiple revision operations are often involved in order to produce a single summary sentence.

In human-written abstracts, some sentences are not based on cut and paste but are written from scratch. The main criterion we used to distinguish a sentence that was cut and pasted from a sentence written from scratch was whether more than half of the words in a summary sentence were composed of phrases borrowed from the original document, in which case the sentence was considered to have been constructed by cut and paste; otherwise, it was considered to have been written from scratch.²

¹ All the examples in this section were taken from the 30 articles we analyzed; the summary sentences are actual examples found in human-written abstracts.

Using a Hidden Markov Model for Decomposition

To answer the three questions of the decomposition problem is difficult. Because the phrases that are borrowed from the original document can be at any granularity, determining phrase boundaries is not easy. Determining the origin of a phrase is also difficult, since the phrase may occur multiple times in the document in slightly different forms. Moreover, multiple revision operations may have been performed on the reused text. The resulting summary sentence can therefore differ significantly from the document source sentences from which it has been developed. All these factors complicate the decomposition problem.

We propose a hidden Markov model (HMM) (Baum 1972) solution to the decomposition problem. The model has three steps. First, we formulate the decomposition problem as an equivalent problem; that is, for each word in a summary sentence, we identify a document position as its likely source. This is an important step, since only after this transformation can we apply the HMM to solve the problem. Second, we build the HMM on a set of general heuristic rules observed from the text-reusing practice of humans. Although this is unconventional in applications that use HMMs, we believe it is appropriate in our particular application. Evaluations show that this unconventional HMM is effective for decomposition. In the last step, a dynamic programming technique, the Viterbi algorithm (Viterbi 1967), is used to find the most likely document position for each word in a summary sentence and the best decomposition for the sentence.

3.1 Formulating the Problem

We first mathematically formulate the summary sentence decomposition problem. An input summary sentence can be represented as a word sequence $(I_1, \ldots, I_N)$, where $I_1$ is the first word of the sentence and $I_N$ is the last word. The position of a word in a document can be uniquely represented by the sentence position and the word position within the sentence: (SNUM, WNUM). For example, (4, 8) uniquely refers to the eighth word in the fourth sentence. Multiple occurrences of a word in the document can be represented by a set of word positions: $\{(SNUM_1, WNUM_1), \ldots, (SNUM_m, WNUM_m)\}$.

Using the above notation, we formulate the decomposition problem as follows: Given a word sequence $(I_1, \ldots, I_N)$ and the positions $\{(SNUM_1, WNUM_1), \ldots, (SNUM_M, WNUM_M)\}$ for each word in the sequence, determine the most likely document position for each word.

Through this formulation, we transform the difficult tasks of identifying phrase boundaries and determining phrase origins into the problem of finding a most likely document position for each word. As shown in Figure 1, when a position has been chosen for each word in the sequence, we obtain a sequence of positions. For example, ((0,21), (2,40), (2,41), (0,31)) is our position sequence when the first occurrence of the same word in the document has been chosen for every summary word; ((0,26), (2,40), (2,41), (0,31)) is another position sequence. Every time a different position is chosen for a summary word, we obtain a different position sequence. The word "the" in the sequence occurs 44 times in the document, "communications" occurs once, "subcommittee" occurs twice, and "of" occurs 22 times. This four-word sequence therefore has a total of 1,936 (44 × 1 × 2 × 22) possible position sequences.³

Morphological analysis or stemming can be performed to associate morphologically related words, but it is optional. In our experiments, applying stemming improved system performance when the human-written summaries included many words that were morphological variants of original-document words. Many of the human-written summaries in our experiments, however, contained few cases of morphological transformation of words and phrases borrowed from original documents, so stemming did not improve performance for these summaries.

Finding the most likely document position for each word is equivalent to finding the most likely position sequence among all possible position sequences. For the example in Figure 1, the most likely position sequence should be ((2,39), (2,40), (2,41), (2,42)); that is, the fragment comes from document sentence 2 and its position within the sentence is word number 39 to word number 42. How can we automatically find this sequence, however, among 1,936 possible sequences?

² It is, of course, possible that a summary sentence has not been constructed by cut and paste even if more than half of the words in the sentence are from the original document.

3.2 The Hidden Markov Model

The exact document position from which a word in a summary comes depends on the positions of the words surrounding it. Using the bigram model, we assume that the probability of a word's coming from a certain position in the document depends only on the word directly before it in the sequence. Suppose $I_i$ and $I_{i+1}$ are two adjacent words in a summary sentence and $I_i$ is before $I_{i+1}$. We use $\mathrm{PROB}(I_{i+1} = (S_2, W_2) \mid I_i = (S_1, W_1))$ to represent the probability that $I_{i+1}$ comes from sentence number $S_2$ and word number $W_2$ of the document when $I_i$ comes from sentence number $S_1$ and word number $W_1$.

To decompose a summary sentence, we must consider how humans are likely to generate it; we draw here on the revision operations discussed in Section 2. Two general heuristic rules can be safely assumed: First, humans are more likely to cut phrases than single, isolated words; second, humans are more likely to combine nearby sentences into a single sentence than those far apart. These two rules guide us in the decomposition process.

Figure 1. The sequences of positions in summary sentence decomposition.

We translate the heuristic rules into the bigram probability $\mathrm{PROB}(I_{i+1} = (S_2, W_2) \mid I_i = (S_1, W_1))$, where $I_i$, $I_{i+1}$ represent two adjacent words in the input summary sentence (abbreviated henceforth as $\mathrm{PROB}(I_{i+1} \mid I_i)$). The values of $\mathrm{PROB}(I_{i+1} \mid I_i)$ are assigned as follows:

• If $S_1 = S_2$ and $W_1 = W_2 - 1$ (i.e., the words are in two adjacent positions in the document), then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the maximal value P1. For example, PROB(subcommittee = (2,41) | communications = (2,40)) in Figure 1 will be assigned the maximal value. (Rule: Two adjacent words in a summary are most likely to come from two adjacent words in the document.)
• If $S_1 = S_2$ and $W_1 < W_2 - 1$, then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the second-highest value P2. For example, PROB(of = (4,16) | subcommittee = (4,1)) will be assigned a high probability. (Rule: Adjacent words in a summary are highly likely to come from the same sentence in the document, retaining their relative order, as in the case of sentence reduction. This rule can be further refined by adding restrictions on distance between words.)
• If $S_1 = S_2$ and $W_1 > W_2$, then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the third-highest value P3. For example, PROB(of = (2,30) | subcommittee = (2,41)). (Rule: Adjacent words in a summary can come from the same sentence in the document but change their relative order. For example, a subject can be moved from the end of the sentence to the front, as in syntactic transformation.)
• If $S_2 - CONST < S_1 < S_2$, then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the fourth-highest value P4. For example, PROB(of = (3,5) | subcommittee = (2,41)). (Rule: Adjacent words in a summary can come from nearby sentences in the document and retain their relative order, such as in sentence combination. CONST is a small constant such as 3 or 5.)
• If $S_2 < S_1 < S_2 + CONST$, then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the fifth-highest value P5. For example, PROB(of = (1,10) | subcommittee = (2,41)). (Rule: Adjacent words in a summary can come from nearby sentences in the document but reverse their relative orders.)
• If $|S_2 - S_1| \geq CONST$, then $\mathrm{PROB}(I_{i+1} \mid I_i)$ is assigned the smallest value P6. For example, PROB(of = (23,43) | subcommittee = (2,41)). (Rule: Adjacent words in a summary are not very likely to come from sentences far apart.)

Figure 2 shows a graphical representation of the above rules for assigning bigram probabilities. The nodes in the figure represent possible positions in the document, and the edges output the probability of moving from one node to another. These bigram probabilities are used to find the most likely position sequence in the next step. Assigning values to P1–P6 is experimental. In our experiments, the maximal value is assigned 1 and the others are usually assigned evenly decreasing values: 0.9, 0.8, and so on. These values, however, can be experimentally adjusted for different corpora. We decide the approximate optimal values of P1–P6 by testing different values for P1–P6 and choosing the values that give the best performance in the tests.

Figure 2. Assigning transition probabilities in the HMM.

Figure 2 is considered a very abstract representation of our HMM for decomposition. Each word position in the figure represents a state in the HMM. For example, $(S, W)$ is a state, and $(S, W+1)$ is another state. Note that $(S, W)$ and $(S, W+1)$ are relative values; the $S$ and $W$ in the state $(S, W)$ have different values based on the particular word position under consideration. This relative model can be easily transformed, however, into an absolute model. $(S, W)$ can be replaced by every possible word position in the document; transition probabilities between every possible pair of positions can be assigned in the same way as in Figure 2. In Section 3.6, we describe how the abstract model can be transformed into the absolute model and give a formal description of our HMM.

3.3 The Viterbi Algorithm

To find the most likely sequence, we must find a sequence of positions that maximizes the probability $\mathrm{PROB}(I_1, \ldots, I_N)$. Using the bigram model, this probability can be approximated as

$$\mathrm{PROB}(I_1, \ldots, I_N) = \prod_{i=0}^{N-1} \mathrm{PROB}(I_{i+1} \mid I_i).$$

Because $\mathrm{PROB}(I_{i+1} \mid I_i)$ has been assigned as indicated earlier, we therefore have all the information needed to solve the problem. We use the Viterbi algorithm (Viterbi 1967) to find the most likely sequence. For an N-word sequence, supposing each word occurs M times in the document, the Viterbi algorithm is guaranteed to find the most likely sequence using $k \times N \times M^2$ steps for some constant $k$, compared to $M^N$ for the brute force search algorithm.

We have slightly revised the Viterbi algorithm for our application. In the initialization step, equal chance is assumed for each possible document position of the first word in the sequence. In the iteration step, we take special measures to handle the case when a summary word does not appear in the original document (i.e., has an empty position list). We mark the word as nonexistent in the original document and continue the computation as if it did not appear in the sequence.
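The six assignment rules and the revised Viterbi search can be sketched together in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the values 0.7, 0.6, and 0.5 extend the "evenly decreasing" pattern the text gives for P1–P6, CONST is set to 3, repeated positions ($W_1 = W_2$, which the rules do not cover) are scored as P3, and the candidate lists contain only the positions quoted in the text rather than the full lists of Figure 1.

```python
CONST = 3  # "a small constant such as 3 or 5"
# P1..P3 follow the text (1, 0.9, 0.8); P4..P6 continue the pattern (assumption).
P1, P2, P3, P4, P5, P6 = 1.0, 0.9, 0.8, 0.7, 0.6, 0.5


def transition(prev, cur):
    """PROB(I_{i+1} = cur | I_i = prev), assigned by the six heuristic rules."""
    (s1, w1), (s2, w2) = prev, cur
    if s1 == s2:
        if w1 == w2 - 1:
            return P1          # adjacent words in the document
        if w1 < w2 - 1:
            return P2          # same sentence, order retained
        return P3              # same sentence, order changed (w1 == w2 too: assumption)
    if s2 - CONST < s1 < s2:
        return P4              # nearby sentences, order retained
    if s2 < s1 < s2 + CONST:
        return P5              # nearby sentences, order reversed
    return P6                  # sentences far apart


def viterbi(candidates):
    """Most likely position sequence for per-word candidate position lists.

    Words with an empty list are treated as absent from the sequence and
    receive None in the result.
    """
    active = [i for i, positions in enumerate(candidates) if positions]
    result = [None] * len(candidates)
    if not active:
        return result
    # Initialization: equal chance for each position of the first matched word.
    scores = {pos: 1.0 for pos in candidates[active[0]]}
    backpointers = []
    for i in active[1:]:
        new_scores, pointers = {}, {}
        for cur in candidates[i]:
            best_prev = max(scores, key=lambda p: scores[p] * transition(p, cur))
            new_scores[cur] = scores[best_prev] * transition(best_prev, cur)
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final position.
    path = [max(scores, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    for i, pos in zip(active, path):
        result[i] = pos
    return result


# Partial candidate lists for "the communications subcommittee of".
candidates = [
    [(0, 21), (0, 26), (0, 32), (2, 39), (23, 44)],
    [(2, 40)],
    [(2, 41), (4, 1)],
    [(0, 31), (1, 10), (2, 30), (2, 42), (4, 16), (23, 43)],
]
print(viterbi(candidates))  # [(2, 39), (2, 40), (2, 41), (2, 42)]
```

Each word is compared only against the candidates of the previous matched word, which is where the $k \times N \times M^2$ step count comes from. Because the scores are products of unnormalized rule values, they rank sequences but are not probabilities.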

³ Given an N-word sequence $(I_1, \ldots, I_N)$, supposing $I_i$ occurs $F_i$ times in the document, for $i = 1 \ldots N$, then the total number of possible position sequences is $F_1 \times F_2 \times \cdots \times F_N$.

3.4 Postediting

After the phrases are identified, the program postedits to cancel the mismatchings that arise because the Viterbi algorithm assigns each word in the input sequence to a position in the document, as long as the word appears in the document at least once. For instance, in the example of sentence combination given in Section 2, the summary sentence combined two reduced document sentences by adding the conjunction "and". The word "and" was inserted by the human writer, but the Viterbi algorithm assigned it to a document position, since it occurred in the original document. The goal of the postedit step is to annul such mismatchings.

The postedit step deals with two types of mismatchings: wrong assignment of document positions for inserted stop words in a summary sentence and wrong assignment of document positions for isolated content words in a summary sentence. To correct the first type of mismatching, if any document sentence contributes only stop words for the summary, the matching is canceled, since the stop words are more likely to have been inserted by humans rather than coming from the original document. This is the case for the example just discussed. To correct the second type of mismatching, if a document sentence provides only a single non-stop word, we also cancel such matching, since humans rarely cut single words from the original text to generate a summary sentence.
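The two postediting rules can be sketched as a filter over the Viterbi output. This is one possible reading rather than the paper's code: the stop-word list is a small illustrative stand-in, and "provides only a single non-stop word" is interpreted here as "the document sentence's entire contribution is one word". The function name and the example data are hypothetical.

```python
from collections import defaultdict

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def postedit(words, positions):
    """Cancel matchings for document sentences that contribute only stop words,
    or whose whole contribution is a single word.

    positions[i] is the (SNUM, WNUM) assigned to words[i], or None.
    """
    by_sentence = defaultdict(list)
    for i, pos in enumerate(positions):
        if pos is not None:
            by_sentence[pos[0]].append(i)

    edited = list(positions)
    for indices in by_sentence.values():
        content = [i for i in indices if words[i].lower() not in STOP_WORDS]
        only_stop_words = not content
        single_isolated_word = len(indices) == 1
        if only_stop_words or single_isolated_word:
            for i in indices:
                edited[i] = None  # treat the word as added by the human writer
    return edited


# Hypothetical assignment: sentence 0 supplies a phrase, sentence 5 only "and",
# sentence 8 only the isolated content word "security".
words = ["raises", "the", "issue", "and", "security"]
positions = [(0, 3), (0, 4), (0, 5), (5, 1), (8, 2)]
print(postedit(words, positions))
# [(0, 3), (0, 4), (0, 5), None, None]
```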

3.5 An Example

To demonstrate the program, we now present an example from beginning to end. The following sample input summary sentence is also shown in Figure 3:

Arthur B. Sackler, vice president for law and public policy of Time Warner Inc. and a member of the Direct Marketing Association, told the communications subcommittee of the Senate Commerce Committee that legislation to protect children's privacy online could destroy the spontaneous nature that makes the Internet unique.

We first indexed the document, listing for each word its possible positions in the document.⁴ Stemming was not used in this example. Upon augmenting each summary word with its possible document positions, we obtained the following input for the Viterbi program:

arthur : 1,0
b : 1,1
sackler : 1,2 2,34 ... 15,6
...
the : 0,21 0,26 ... 23,44
internet : 0,27 1,39 ... 18,16
unique : 0,28

This 48-word sentence has a total of $5.08 \times 10^{27}$ possible position sequences. Using the bigram probabilities as assigned in Section 3.2, we ran the Viterbi algorithm to find the most likely position sequence. After every word was assigned a most likely document position, we marked the phrases in the sentence by conjoining words from adjacent document positions.

Figure 3 shows the final result for the sample input summary sentence. The phrases in the summary are tagged (FNUM:SNUM actual-text), where FNUM is the sequential number of the phrase and SNUM is the number of the document sentence from which the phrase originates. SNUM = −1 means that the phrase is not derived from the original document. The borrowed phrases are tagged (FNUM actual-text) in the document sentences.

Figure 3. A sample output of the summary sentence decomposition program (boldface text indicates material that was cut from the original document and reused in the summary, and italic text indicates material in the summary sentence that was added by the human writer).

In this example, the program correctly concluded that the summary sentence was constructed by reusing the original text. It identified the four document sentences that were combined into the summary sentence; it also correctly divided the summary sentence into phrases, pinpointing the exact document origin of each. In this example, the phrases that were borrowed from the document ranged from single words to long clauses. Certain borrowed phrases were also syntactically transformed; despite these, the program successfully decomposed the sentence.

The decomposition outputs such as shown in Figure 3 were then used for building the training corpora for sentence reduction and sentence combination. The output shown in Figure 3 was included in the training corpus for sentence combination, since the summary sentence was constructed by merging document sentences. If a summary sentence was constructed by removing phrases from a single document sentence, then it was included in the training corpus for sentence reduction.

3.6 Formal Description of Our Hidden Markov Model

We first illustrate how an absolute model can be created from the relative model represented in Figure 2. For simplicity, suppose there are only two sentences in the original document and each sentence has two words. From the relative model in Figure 2, we can build an absolute model, as shown in Figure 4. In the absolute model, there are four states, (1,1), (1,2), (2,1), and (2,2), each corresponding to a word position. Each state has only one observation symbol (i.e., output): the word in that position. Each state is interconnected with the other states in the model. The transition probabilities, which represent the probabilities of transitioning from one state to another state, can be assigned following the rules shown in Figure 2. In this case, however, we need to normalize the values of $\{P_1, P_2, \ldots, P_6\}$ so that for each state the sum of the transition probabilities is one, which is a basic requirement for an HMM. This normalization is needed in theory in order for our relative model to conform to a formal model, but in practice it is not needed in the decomposition process, since it does not affect the final result. The initial state distribution is uniform; that is, the initial state, labeled as Φ in Figure 4, has an equal chance to reach any state in the model.

Figure 4. Example of the absolute hidden Markov model.

We give a formal description of our HMM for decomposition as follows. For each word position in the original document, we can build an absolute model based on the relative model in Figure 2. In the absolute model, each state corresponds to a word position, and each word position corresponds to a state. The observation symbol set includes all the words in the document, and the observation symbol probabilities are defined as $P(W_i \mid P_i) = 1$ if word $W_i$ is in position $P_i$, and $P(W_i \mid P_i) = 0$ if word $W_i$ is not in position $P_i$. The transition probabilities $P(P_j \mid P_i)$ are defined as we described in Figure 2, with every word position linked to every other word position, and initial state probabilities are uniform as we mentioned.

This Markov model is hidden because one symbol sequence can correspond to many state sequences, meaning that many position sequences can correspond to a word sequence, as shown in Figure 1. Generally, in a hidden Markov model, one state sequence can also correspond to many symbol sequences. Our HMM does not have this attribute.
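The two-sentence, two-word absolute model of Section 3.6 can be built explicitly, which makes the normalization step concrete. The sketch below is illustrative only: the P1–P6 values extend the "evenly decreasing" pattern from Section 3.2, CONST is 3, self-transitions are excluded because the text links each state to the other states, and the words placed at each position are invented.

```python
from itertools import product

CONST = 3
# Rule values; 0.7, 0.6, 0.5 continue the pattern given in the text (assumption).
P = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7, 5: 0.6, 6: 0.5}


def rule_weight(prev, cur):
    """Unnormalized weight for moving from position prev to position cur."""
    (s1, w1), (s2, w2) = prev, cur
    if s1 == s2:
        if w1 == w2 - 1:
            return P[1]
        return P[2] if w1 < w2 else P[3]
    if s2 - CONST < s1 < s2:
        return P[4]
    if s2 < s1 < s2 + CONST:
        return P[5]
    return P[6]


# Four states: (1,1), (1,2), (2,1), (2,2).
states = list(product((1, 2), (1, 2)))

# Normalize so that the outgoing probabilities of each state sum to one.
transitions = {}
for prev in states:
    raw = {cur: rule_weight(prev, cur) for cur in states if cur != prev}
    total = sum(raw.values())
    transitions[prev] = {cur: weight / total for cur, weight in raw.items()}

# Uniform initial distribution from the start state.
initial = {state: 1 / len(states) for state in states}

# Hypothetical document words; each state emits exactly one symbol.
document_words = {(1, 1): "privacy", (1, 2): "online",
                  (2, 1): "privacy", (2, 2): "law"}


def emission(word, state):
    """P(W | P): 1 if the word sits at that position, else 0."""
    return 1.0 if document_words[state] == word else 0.0


for state in states:
    print(state, {k: round(v, 3) for k, v in transitions[state].items()})
```

In this toy document the word "privacy" is emitted by both (1,1) and (2,1), which is the sense in which one symbol sequence can correspond to many state sequences.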

⁴ The original document contained 25 sentences and 727 words in total.

Evaluations

Three experiments were performed to evaluate the decomposition module. In the first experiment, we evaluated decomposition in a task called summary alignment. This measured how successfully the decomposition program can align sentences in the summary with document sentences that are semantically equivalent. In the second experiment, we asked humans to judge whether the decomposition results were correct. Compared to the first experiment, this was a more direct evaluation, using a larger collection of documents. The third experiment evaluated the portability of the program.

The corpus used in the first experiment consisted of 10 documents from the Ziff-Davis corpus, which contains articles related to computer products and is available on TIPSTER discs from the Linguistic Data Consortium (LDC) (Harman and Liberman 1993). The corpus used in the second experiment consisted of 50 documents related to telecommunications issues. The corpus used in the third experiment consisted of legal documents on court cases, provided by the Westlaw Group.

4.1 Summary Alignment

The goal of the summary alignment task was to find sentences in the document that were semantically equivalent to the summary sentences. We used a small collection of 10 documents, gathered by Marcu (1999). Marcu presented these 10 documents from the Ziff-Davis corpus together with their human-written summaries to 14 human judges. These human judges were instructed to extract sentences from the original document that were semantically equivalent to the summary sentences. Sentences selected by the majority of human judges were collected to build an extract (i.e., extraction-based summary) of the document. This resulting extract was used as the gold standard in our evaluation. Note that this evaluation will be biased against the decomposition model, as Marcu's semantic equivalence is a broader concept than our cut-and-paste equivalence.

Decomposition provides a list of source document sentences for each summary sentence, as shown in Figure 3. We can build an automatic extract for the document by selecting all the source document sentences identified by the decomposition program. We compared this automatic extract with the gold-standard extract. The program achieved an average 81.5% precision, 78.5% recall, and 79.1% F-measure for 10 documents. By comparison, the average performance of 14 human judges was 88.8% precision, 84.4% recall, and 85.7% F-measure. Detailed results for each document are shown in Table 1.

Table 1. Evaluation of decomposition program using the Ziff-Davis corpus.
Columns: Doc. No., Precision, Recall, F-Measure.
Rows: ZF109-601-903, ZF109-631-813, ZF109-645-951, ZF109-662-269, ZF109-666-869, ZF109-685-555, ZF109-712-593, ZF109-714-915, ZF109-715-629, ZF109-754-223, Average.
[Per-document values omitted: row alignment could not be recovered.]

Precision, recall, and F-measure are computed as follows:

$$\text{Precision} = \frac{\text{\# of sentences in the automatic extract and in the gold-standard extract}}{\text{total \# of sentences in the automatic extract}}$$

$$\text{Recall} = \frac{\text{\# of sentences in the automatic extract and in the gold-standard extract}}{\text{total \# of sentences in the gold-standard extract}}$$

\[ \text{F-measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \]

Further analysis indicates two types of errors made by the program. The first is that the program failed to find semantically equivalent sentences with very different wordings. For example, it did not find the correspondence between the summary sentence "Running Higgins is much easier than installing it" and the document sentence "The program is very easy to use, although the installation procedure is somewhat complex." This is not really an "error," since the program is not designed to find such paraphrases. For decomposition purposes, the program needs only to indicate that the summary sentence is not produced by cutting and pasting text from the original document. The program correctly indicated this by returning no matching sentence.

The second problem is that the program may identify a nonrelevant document sentence as relevant if it contains some common words with the summary sentence. This typically occurs when a summary sentence is not constructed by cutting and pasting text from the document but shares certain words with document sentences. For example, the decomposition program mistakenly linked the summary sentence "The program is very easy to use, although the installation procedure is somewhat complex" with the document sentence "All you need to decide during the easy installation is where you want to put the Higgins files and associated directories; this must be a directory available to all e-mail users," because they had a number of words in common, including the, is, easy, to, and installation. Our postediting steps are designed to cancel such false matchings, although we cannot remove them completely.

It is worth noting that the extract based on human judgments, considered the gold standard in this evaluation, is not perfect. For example, two document sentences may express the same information (i.e., they are semantic paraphrases), and all human subjects consider this information important enough to be included in the summary, but half of the subjects selected one sentence and half selected the other; thus, both sentences will be included in the extract although they are semantic paraphrases. Precisely this happened in the extract of document ZF109-601-903. The document sentence "This group productivity package includes e-mail, group scheduling and alerting, keyword cross-reference filing, to-do lists, and expense reporting" and the document sentence "At $695 for 8 users, this integrated software package combines LAN-based e-mail with a variety of personal information management functions, including group scheduling, personal calendars, to-do lists, expense reports, and a cross-referenced key-word database" were both included in the extract, although they contain very similar information. The program picked up only the second document sentence, yet this correct decision was penalized in the evaluation because of the mistake in the gold standard.

The program won perfect scores for 3 out of 10 documents. We checked the three summaries and found that their texts were largely produced by cut and paste, compared with other summaries with sentences written completely from scratch by humans. This indicates that when only the decomposition task is considered, the algorithm performs very well.

4.2 Human Judgments of Decomposition Results

Since the first experiment did not directly assess the program's performance for the decomposition task, we conducted another experiment to evaluate the correctness of the decomposition results. First, we selected 50 summaries from a telecommunications corpus and ran the decomposition program. A human subject was asked to judge whether the decomposition results were correct. A result was considered correct when all three questions posed in the decomposition problem were correctly answered. As stated in section 1, the decomposition program needs to answer the following three questions: (1) Is a summary sentence constructed by reusing the text from the original document? (2) If so, what phrases in the sentence come from the original document? and (3) From where in the document do the phrases come?

The 50 summaries contained a total of 305 sentences. Eighteen (6.2%) sentences were wrongly decomposed, for an accuracy rate of 93.8%. Most errors occurred when a summary sentence was not constructed by cutting and pasting but contained many overlapping words with certain sentences in the document. The accuracy rate here is much higher than the precision and recall results in the first experiment. An important factor here is that we did not require the program to find semantically equivalent document sentence(s) if a summary sentence used very different wordings.
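For reference, the extract-level precision, recall, and F-measure used in the first experiment can be computed along the following lines. This is only a minimal sketch: the function name `extract_scores` and the sentence numbers in the example are hypothetical and do not come from the evaluation data.

```python
def extract_scores(automatic, gold):
    """Return (precision, recall, f_measure) for two collections of sentence ids."""
    automatic, gold = set(automatic), set(gold)
    # Sentences that appear in both the automatic and the gold-standard extract.
    overlap = len(automatic & gold)
    precision = overlap / len(automatic) if automatic else 0.0
    recall = overlap / len(gold) if gold else 0.0
    denominator = recall + precision
    f_measure = 2 * recall * precision / denominator if denominator else 0.0
    return precision, recall, f_measure


# Hypothetical sentence numbers, for illustration only.
automatic_extract = {1, 4, 7}
gold_extract = {1, 4, 9}
precision, recall, f_measure = extract_scores(automatic_extract, gold_extract)
print(precision, recall, f_measure)
```

In this toy case two of the three selected sentences are shared, so all three scores come out at two thirds. The zero guards are a design choice of the sketch, so that an empty extract yields scores of zero rather than a division error.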

4.3 Portability

In the third and final evaluation of decomposition, we tested the program on legal documents in a joint experiment with Westlaw Group, which provides lawyers with court case documents. Such documents start with a "synopsis" of the case, written by attorneys, followed by "headnotes," which are points of law also written by attorneys and summarized from the discussions. The last part is the discussion, called "opinion."

The task here was to match each headnote entry with the corresponding text in the opinion. When lawyers study a legal case document, they can see not only the important points of law, but also where these points are discussed in the opinion. We applied our decomposition program to this task. We did not adjust the HMM parameters. A sample decomposition result is shown in Figure 5. Similar to the notation used in Figure 3, the phrases in the headnote are tagged (FNUM:SNUM actual-text), where FNUM is the sequential number of the phrase and SNUM is the number of the document sentence where the phrase comes from. SNUM = −1 means that the phrase did not come from the original document. The borrowed phrases are tagged (FNUM actual-text) in the opinion. Note that in this example, we ignored the difference of the determiners ("a," "the," etc.) in the phrases, so the summary phrase "a motion for issuance of a peremptory writ" was considered to originate from the document phrase "the motion for the issuance of the peremptory writ," although the two phrases were not identical.

Figure 5. A sample output of legal document decomposition (boldface text indicates material that was cut from the original document and reused in the summary, and italic text in the summary sentence indicates the material that was added by the human writer).

We received 11 headnotes from Westlaw and examined the decomposition results for all of them. The program found the correct source sentences and identified the correct origins of the phrases for every headnote.

In summary, we performed three experiments in three different domains (computer news, telecommunications, and legal) and in each case achieved good results, with no change or minimal parameter adjustment to the HMM. This demonstrates that our proposed decomposition approach is indeed portable. The reason for this portability may be that the heuristic rules that we used to build the HMM are general and remain true for different humans and for articles from different domains.
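The (FNUM:SNUM actual-text) notation and the determiner-insensitive comparison described above can be illustrated with a small sketch. The concrete surface form assumed here, `(F0:S12 some text)` with an ASCII minus sign for SNUM = -1, is an assumption, as are the sample string, the function names, and the determiner list; the program's real output format may differ.

```python
import re

# Assumed surface form of one tagged phrase: "(F<FNUM>:S<SNUM> actual text)".
TAG = re.compile(r"\(F(\d+):S(-?\d+)\s+([^()]*)\)")

# Assumed determiner list; the text only mentions "a," "the," etc.
DETERMINERS = {"a", "an", "the"}


def parse_tagged_sentence(tagged):
    """Split a tagged summary sentence into (fnum, snum, text) records."""
    return [
        {"fnum": int(fnum), "snum": int(snum), "text": text.strip()}
        for fnum, snum, text in TAG.findall(tagged)
    ]


def source_sentences(phrases):
    """Document sentence numbers that contributed at least one phrase."""
    return sorted({p["snum"] for p in phrases if p["snum"] != -1})


def strip_determiners(phrase):
    """Drop determiners so that near-identical phrases compare equal."""
    return [w for w in phrase.lower().split() if w not in DETERMINERS]


# Hypothetical tagged headnote, for illustration only.
sample = "(F0:S12 a motion for issuance of a peremptory writ) (F1:S-1 was) (F2:S15 denied)"
phrases = parse_tagged_sentence(sample)
print(source_sentences(phrases))
print(strip_determiners("a motion for issuance of a peremptory writ")
      == strip_determiners("the motion for the issuance of the peremptory writ"))
```

With the determiners removed, the two phrases quoted in the text reduce to the same word sequence, which is the sense in which the summary phrase was considered to originate from the document phrase.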

Applications of Decomposition Results

5.1 Providing Training and Testing Corpora for Summarization

We have used the decomposition results in our development of a text generation system for domain-independent summarization. The generation system mimics two revision operations presented in section 2: sentence reduction and sentence combination. The decomposition program is used to build corpora for training and evaluating the sentence reduction and combination modules. The corpora contained examples as shown in Figure 3. Details of the summarization system can be found in Jing (2001).

5.2 Corpus Analysis

We performed a corpus analysis using the decomposition program. We automatically analyzed 300 human-written summaries of news articles on telecommunications, provided by the Benton Foundation. The number of sentences in each summary ranged from 2 to 21; the corpus contained a total of 1,642 summary sentences. The results indicated that 315 sentences (19%) did not have matching sentences in the document: They were written from scratch by humans rather than by cutting and pasting phrases from the original text. Of the summary sentences, 686 (42%) matched a single sentence in the document. These sentences were constructed by sentence reduction, sometimes together with other operations such as lexical paraphrasing and syntactic transformation. In addition, 592 (36%) sentences matched two or three sentences in the document, and 49 (3%) sentences matched more than three sentences in the document. These sentences were constructed by sentence combination, often together with other operations, especially sentence reduction, since the sentences were usually reduced before they were combined.

These results suggested that a significant portion (81%) of summary sentences produced by humans were based on cutting and pasting the original text. Sentence reduction was applied in at least 42% of the cases. Sentence combination was applied in 39% of the cases.
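The percentages in the corpus analysis follow directly from the four sentence counts. The short sketch below redoes that arithmetic; the dictionary keys are descriptive labels chosen for this example, not terms from the corpus.

```python
# Sentence counts reported in the corpus analysis.
counts = {
    "written from scratch (no matching sentence)": 315,
    "matched a single document sentence": 686,
    "matched two or three document sentences": 592,
    "matched more than three document sentences": 49,
}

total = sum(counts.values())


def percent(n):
    """Share of all summary sentences, rounded to a whole percent."""
    return round(100 * n / total)


for label, n in counts.items():
    print(f"{label}: {n} ({percent(n)}%)")

# Sentences based on cut and paste: everything except those written from scratch.
cut_and_paste = total - counts["written from scratch (no matching sentence)"]
# Sentence combination: sentences matching two or more document sentences.
combination = (counts["matched two or three document sentences"]
               + counts["matched more than three document sentences"])

print(total, percent(cut_and_paste), percent(combination))
```

The four counts sum to the 1,642 sentences reported, and the derived shares reproduce the 81% cut-and-paste figure and the 39% figure for sentence combination.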

5.3 Improving User Interfaces

The decomposition result can be used in applications other than summarization. For example, in the experiment we performed jointly with Westlaw (see section 4.3), we found that linking summaries and original documents can potentially improve user interfaces, helping users to browse and easily find relations between portions of the text.

Related Work

Researchers have previously tried to align summary sentences with sentences in a document, mostly by manual effort (Edmundson 1969; Kupiec, Pedersen, and Chen 1995; Teufel and Moens 1997). Given the cost of this manual annotation process, only small collections of text have been annotated. Decomposition provides a means of performing this alignment automatically, building large corpora for summarization research.

Marcu (1999) presented an approach for aligning summary sentences with semantically equivalent sentences in a document. It adopted an information retrieval-based approach, coupled with discourse processing. Although our decomposition also aims to link summaries with the original documents, major differences exist between the two approaches. While Marcu's algorithm operates at the sentence or clause level, our decomposition program deals with phrases at various granularities (anything from a word to a complicated phrase to a complete sentence). Furthermore, the approaches used by the two systems are distinct. Marcu's approach first breaks sentences into clauses, then uses rhetorical structure to decide which clauses should be considered, and finally employs an IR-based similarity measure to decide which clauses in the document are similar to those in human-written abstracts. Our HMM solution first builds the HMM, then uses a dynamic programming technique to find the optimal answer. Marcu reported a performance of 77.45%, 80.06%, and 78.15% for precision, recall, and F-measure, respectively, when the system was evaluated at the sentence level in the summary alignment task described in section 3.1. When tested on the same set of test documents and for the same task, our system averaged 81.5% precision, 78.5% recall, and 79.1% F-measure, as shown in Table 1.

We transformed the decomposition problem into a problem of finding the most likely document position for each word in the summary, which is, in some sense, similar to the problem of aligning parallel bilingual corpora (Brown, Lai, and Mercer 1991; Gale and Church 1991). Whereas Brown, Lai, and Mercer (1991) and Gale and Church (1991) aligned sentences in a parallel bilingual corpus, we aligned phrases in a summary with phrases in a document. Brown, Lai, and Mercer also used an HMM in their solution for bilingual corpora alignment. Their model and our model, however, differ greatly: Their model used sentence length as a feature, whereas ours used word position as a feature; they used an aligned training corpus to compute transition probabilities, whereas we did not use any annotated training data.
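To make the idea of "finding the most likely document position for each word in the summary" concrete, the following is a toy sketch of Viterbi-style dynamic programming over candidate word positions. It is not the model described in this article: the transition scores in `transition` are hypothetical placeholders, the state `(-1, -1)` for words absent from the document is an assumption of the sketch, and the example document is invented.

```python
def candidate_positions(word, document):
    """All (sentence, offset) pairs where `word` occurs; [(-1, -1)] if none."""
    hits = [(s, w) for s, sent in enumerate(document)
            for w, tok in enumerate(sent) if tok == word]
    return hits or [(-1, -1)]


def transition(prev, cur):
    """Hypothetical heuristic scores favoring adjacent positions."""
    if prev[0] == -1 or cur[0] == -1:
        return 0.5          # one of the words is not in the document
    if prev[0] == cur[0] and cur[1] == prev[1] + 1:
        return 1.0          # adjacent words in the same sentence
    if prev[0] == cur[0]:
        return 0.7          # same sentence, not adjacent
    return 0.4              # different sentences


def viterbi(summary, document):
    """Most likely sequence of document positions for the summary words."""
    if not summary:
        return []
    states = [candidate_positions(w, document) for w in summary]
    # best[i][k] = (score of best path ending in state k of word i, backpointer)
    best = [[(1.0, None) for _ in states[0]]]
    for i in range(1, len(summary)):
        row = []
        for cur in states[i]:
            row.append(max(
                (best[i - 1][k][0] * transition(prev, cur), k)
                for k, prev in enumerate(states[i - 1])))
        best.append(row)
    # Trace back from the highest-scoring final state.
    k = max(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(summary) - 1, -1, -1):
        path.append(states[i][k])
        k = best[i][k][1]
    return path[::-1]


# Invented two-sentence document and a summary that reuses the second sentence.
document = [["the", "program", "is", "easy", "to", "use"],
            ["the", "installation", "is", "complex"]]
summary = ["the", "installation", "is", "complex"]
print(viterbi(summary, document))
```

Although "the" and "is" each occur in both document sentences, the path through the second sentence wins because every step is an adjacent-word transition, which is the intuition behind preferring positions that keep reused phrases contiguous.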

Conclusions

We defined the problem of decomposing human-written summaries and proposed a hidden Markov model solution to the problem. The decomposition program can automatically determine whether a summary sentence is constructed by reusing text from the original document; it can accurately recognize the reused phrases in a summary sentence despite their different granularities; it can also pinpoint the exact origin in the document for a phrase. The algorithm is fast and straightforward. It does not need other tools such as a tagger or parser as preprocessors. It does not have complex processing steps. The evaluations show that the program performs very well for the decomposition task.

Acknowledgments

The material in this article is based upon work supported by the National Science Foundation under Grant No. IRI 96-19124 and IRI 96-18797. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References

Baum, Leonard E. 1972. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities, 3:1–8.

Brown, Peter F., Jennifer C. Lai, and Robert L. Mercer. 1991. Aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pages 169–176, Berkeley, June.

Edmundson, H. P. 1969. New methods in automatic abstracting. Journal of the ACM, 16(2):264–285.

Endres-Niggemeyer, Brigitte, Kai Haseloh, Jens Müller, Simone Peist, Irene Santini de Sigel, Alexander Sigel, Elisabeth Wansorra, Jan Wheeler, and Brünja Wollny. 1998. Summarizing Information. Springer, Berlin.

Gale, William A. and Kenneth W. Church. 1991. A program for aligning sentences in parallel corpora. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pages 177–184, Berkeley, June.

Harman, Donna and Mark Liberman. 1993. TIPSTER Complete. Linguistic Data Consortium, University of Pennsylvania.

Jing, Hongyan. 2001. Cut-and-Paste Text Summarization. Ph.D. thesis, Department of Computer Science, Columbia University, New York.

Kupiec, Julian, Jan Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th International Conference on Research and Development in Information Retrieval, pages 68–73, Seattle.

Marcu, Daniel. 1999. The automatic construction of large-scale corpora for summarization research. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval, pages 137–144, University of California, Berkeley, August.

Teufel, Simone and Mark Moens. 1997. Sentence extraction as a classification task. In Proceedings of the ACL/EACL'97 Workshop on Intelligent Scalable Text Summarization, pages 58–65, Madrid.

Viterbi, Andrew J. 1967. Error bounds for convolution codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory, 13:260–269.