University of Washington Department of Computer Science & Engineering
 CSE454 Event Extraction Project

Feel free to adapt this project in any of a myriad of ways. I'll describe it one way, aiming for concreteness, but don't consider it written in stone.

The primary deliverable would be a Web service that takes natural language (English) text as input, e.g. a news article, and outputs the set of events described in the story. For example, given a sentence like

The assault on the Paris offices of Charlie Hebdo, a French newspaper that has repeatedly satirized religion, was one of the deadliest in a history of violent responses and threats against the news media over the mockery of Islam.
the system might output:
*unknown* ATTACK "Paris offices of Charlie Hebdo"
A baseline event extraction system can be built on top of the ReVerb open information extraction system (http://reverb.cs.washington.edu/), which is an easy-to-use, fast, and (reasonably) robust system developed here at UW. ReVerb extracts triples from text, but it doesn't normalize the "relation phrases", which are what denote events in our example. Also, it extracts some triples that don't correspond to events at all; e.g., from a nutrition page it might extract ORANGES CONTAIN VITAMIN-C.
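
To make the gap between a raw triple and a normalized event concrete, here is a minimal Python sketch. The `Triple` and `Event` types and the `format_event` helper are hypothetical names invented for illustration; they are not part of ReVerb, and the sketch does not call ReVerb itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """An (argument 1, relation phrase, argument 2) extraction, as an open IE system might produce."""
    arg1: str
    relation: str  # raw, un-normalized relation phrase
    arg2: str


@dataclass(frozen=True)
class Event:
    """A normalized event: the relation phrase has been mapped to an event type."""
    agent: Optional[str]  # None when the text does not say who did it
    event_type: str       # e.g. "ATTACK"
    target: str


def format_event(event: Event) -> str:
    """Render an event in the style of the example output above."""
    agent = event.agent if event.agent is not None else "*unknown*"
    return f'{agent} {event.event_type} "{event.target}"'


# A triple that states a general fact rather than an event, and a hand-built event.
fact = Triple("oranges", "contain", "vitamin C")
attack = Event(agent=None, event_type="ATTACK", target="Paris offices of Charlie Hebdo")
print(format_event(attack))
```

Keeping the raw triple and the normalized event as separate types makes it explicit that the classifier's job is the mapping from one to the other, including the decision to drop triples such as `fact`.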

So the project's first task would be to build a classifier that determines whether a relation phrase corresponds to an event and, if so, which event. We'll provide a set of 40 events from the DARPA "Event Nugget" competition. I'd suggest that you start with a subset of these and first write a classifier that uses hand-coded rules to do this classification. In parallel (or afterwards), you could use machine learning (which we'll explain in this course) to train a classifier from labeled training data. The ML code can be an easy download, e.g. from Weka, or if you want you can build your own.
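
A hand-coded rule classifier of the kind suggested above could start out as small as the following Python sketch. The regular expressions, the four event types chosen, and the `classify_relation` name are all assumptions made for illustration; a real rule set would need far more patterns and some care about rule ordering.

```python
import re
from typing import Optional

# Hand-written rules: each event type is paired with a pattern over relation phrases.
# The patterns and the subset of event types are illustrative, not a provided resource.
RULES = [
    ("ATTACK", re.compile(r"\b(attack(s|ed)?|assault(s|ed)?|bomb(s|ed)?)\b")),
    ("DIE", re.compile(r"\b(die(s|d)?|was killed|were killed|passed away)\b")),
    ("MARRY", re.compile(r"\b(marr(y|ies|ied)|wed(s|ded)?)\b")),
    ("ARREST-JAIL", re.compile(r"\b(arrest(s|ed)?|jail(s|ed)?)\b")),
]


def classify_relation(relation_phrase: str) -> Optional[str]:
    """Return the first event type whose rule matches, or None for a non-event."""
    phrase = relation_phrase.lower()
    for event_type, pattern in RULES:
        if pattern.search(phrase):
            return event_type
    return None


for phrase in ["was attacked by", "contain", "died in"]:
    print(phrase, "->", classify_relation(phrase))
```

Returning `None` covers the "not an event at all" case, so a single function answers both questions: is this an event, and if so, which one. Because the first matching rule wins, a phrase that mentions two events gets only the label listed earlier in `RULES`.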

We will provide some training data; you can also (if you want) use crowdsourcing to create training data. In the next week, we'll show how to do this in class.
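
If you go the "build your own" route for the learned classifier, one possible starting point is a multinomial naive Bayes model over the words of the relation phrase, sketched below in plain Python. This is only one of many learning methods and not necessarily the one covered in class. The `NaiveBayes` class, the `NONE` label for non-events, and the eight training examples are made up for illustration; they are not the training data we will provide.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over the words of a relation phrase, with add-one smoothing."""

    def fit(self, phrases, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for phrase, label in zip(phrases, labels):
            self.word_counts[label].update(phrase.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, phrase):
        words = phrase.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum of log P(word | label), smoothed over the vocabulary
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled data, invented for this sketch.
train = [
    ("was attacked by", "ATTACK"),
    ("launched an assault on", "ATTACK"),
    ("bombed", "ATTACK"),
    ("was born in", "BE-BORN"),
    ("was born on", "BE-BORN"),
    ("contain", "NONE"),
    ("is located in", "NONE"),
    ("is a kind of", "NONE"),
]
model = NaiveBayes().fit([p for p, _ in train], [y for _, y in train])
print(model.predict("was born near"))
```

With so little data the model is only a demonstration of the mechanics; the same interface works unchanged once real labeled examples are substituted for `train`.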

Taxonomy of Events

  1. LIFE
    1. BE-BORN
    2. MARRY
    3. DIVORCE
    4. INJURE
    5. DIE
  2. MOVEMENT
    1. TRANSPORT
  3. TRANSACTION
    1. TRANSFER-OWNERSHIP
    2. TRANSFER-MONEY
  4. BUSINESS
    1. START-ORG
    2. MERGE-ORG
    3. DECLARE-BANKRUPTCY
    4. END-ORG
  5. CONFLICT
    1. ATTACK
    2. DEMONSTRATE
  6. CONTACT
    1. MEET
    2. PHONE-WRITE
  7. PERSONNEL
    1. START-POSITION
    2. END-POSITION
    3. NOMINATE
    4. ELECT
  8. JUSTICE
    1. ARREST-JAIL
    2. RELEASE-PAROLE
    3. TRIAL-HEARING
    4. CHARGE-INDICT
    5. SUE
    6. CONVICT
    7. SENTENCE
    8. FINE
    9. EXECUTE
    10. EXTRADITE
    11. ACQUIT
    12. APPEAL
    13. PARDON
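
For use as classifier labels, the taxonomy above can be written down directly as a data structure. The Python sketch below does that; the names `TAXONOMY`, `CATEGORY_OF`, and `category_of` are hypothetical and chosen only for this example.

```python
# The event taxonomy listed above: top-level category -> event types.
TAXONOMY = {
    "LIFE": ["BE-BORN", "MARRY", "DIVORCE", "INJURE", "DIE"],
    "MOVEMENT": ["TRANSPORT"],
    "TRANSACTION": ["TRANSFER-OWNERSHIP", "TRANSFER-MONEY"],
    "BUSINESS": ["START-ORG", "MERGE-ORG", "DECLARE-BANKRUPTCY", "END-ORG"],
    "CONFLICT": ["ATTACK", "DEMONSTRATE"],
    "CONTACT": ["MEET", "PHONE-WRITE"],
    "PERSONNEL": ["START-POSITION", "END-POSITION", "NOMINATE", "ELECT"],
    "JUSTICE": [
        "ARREST-JAIL", "RELEASE-PAROLE", "TRIAL-HEARING", "CHARGE-INDICT",
        "SUE", "CONVICT", "SENTENCE", "FINE", "EXECUTE", "EXTRADITE",
        "ACQUIT", "APPEAL", "PARDON",
    ],
}

# Reverse index: event type -> top-level category.
CATEGORY_OF = {
    event_type: category
    for category, event_types in TAXONOMY.items()
    for event_type in event_types
}


def category_of(event_type):
    """Return the top-level category of an event type, or None if it is not listed."""
    return CATEGORY_OF.get(event_type)


print(category_of("ATTACK"))
```

The reverse index lets a classifier that predicts a fine-grained type such as ATTACK also report its coarse category, which can be handy when evaluating on a subset of the events.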

Data

Enhancements

Time should permit you to extend the baseline system in one or more ways as your interest directs. Some ideas include:

