Cross-Framework and Cross-Domain Parser Evaluation
Shared Task

Data release 3

Updates:

April 20, 2008:
- Added gold-standard Stanford dependencies to Set 1.
- Added automatically generated Stanford dependencies to Set 2.

April 8, 2008:
- Added PARC dependencies to Set 1.
- Revised Set 1 GRs.

http://www-tsujii.is.s.u-tokyo.ac.jp/pe08-st/

WSJ data sets (release 3)

This distribution contains the two data sets based
on Wall Street Journal sentences.  The first is the
required set (10 sentences).  The second set is 
optional (15 sentences).
-----------------------------

Set 1 (required)
10 WSJ sentences

This set contains 10 sentences from the Wall Street Journal portion of the Penn Treebank. The following representation formats are provided (thanks to the owners/providers of the data, shown in parentheses):
- Penn Treebank (PTB): phrase structure trees. (LDC).
- CoNLL-2008 shared task (CoNLL08): labeled syntactic dependencies extracted from the PTB annotations, and predicate-argument dependencies extracted from PropBank and NomBank. (LDC).
- RASP Grammatical Relations (GR): the Grammatical Relation scheme proposed by Briscoe, Carroll and colleagues for parser evaluation. (Ted Briscoe and Yusuke Miyao).
- UTokyo HPSG Treebank Predicate-Argument structures (HPSG-PA): predicate-argument dependencies extracted from the University of Tokyo HPSG Treebank. (Yusuke Miyao and TsujiiLab at the University of Tokyo).
- CCGBank Predicate-Argument structures (CCG-PA): predicate-argument dependencies extracted from the CCGBank. (LDC).
- PARC Dependency structures (PARC): dependencies in the scheme used by King et al. in the PARC 700 Dependency Bank. (Tracy Holloway King and PARC).
- Stanford Dependencies (Stanford): dependencies in the scheme designed by de Marneffe et al. for representation of typed dependencies from PTB structures. (Marie-Catherine de Marneffe).

-----------------------------

Set 2 (optional)
15 WSJ sentences

This set contains an additional 15 sentences from the Wall Street Journal portion of the Penn Treebank. 

Annotation is provided in the same formats as above, except for PARC, which is not included for this set. The Stanford dependencies for this set were generated automatically from the PTB and may contain errors.

-----------------------------

Note regarding the PARC annotation:

For more information on the PARC dependency representation,
including the meaning of the features and labels used in the
annotation, please see the documentation for the PARC700
corpus at:
http://www2.parc.com/isl/groups/nltt/fsbank/default.html

The files in this distribution contain sentences that are
not in the PARC700 corpus.  They are more likely to contain
annotation errors than the PARC700 corpus, since they were
not doubly annotated.