CSE 454 Project Specification: WikiTruthiness

Group members

Our problem and our goal

We would like to measure how contentious individual parts of Wikipedia articles are. Wikipedia makes the full revision history of every article available, and we are interested in finding which paragraphs (or possibly sentences) have seen the most reversions, edit wars, or other indicators that a piece of information is contentious.
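One simple signal of contention can be sketched as follows. This is a minimal illustration, not the project's chosen method: it assumes that a "reversion" is a revision whose text exactly matches an earlier, non-adjacent revision, and it works on whole-article text rather than paragraphs. The function name `find_identity_reverts` and the sample history are hypothetical.

```python
import hashlib


def find_identity_reverts(revision_texts):
    """Return (reverting_index, restored_index) pairs.

    Assumption: a revision counts as a revert when its text is identical to
    an earlier revision that is not its immediate predecessor, meaning the
    edits in between were undone.
    """
    last_seen = {}  # content hash -> index of the latest revision with that text
    reverts = []
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        previous = last_seen.get(digest)
        # Ignore null edits, which match the immediately preceding revision.
        if previous is not None and previous < index - 1:
            reverts.append((index, previous))
        last_seen[digest] = index
    return reverts


# Hypothetical revision history, oldest first.
history = [
    "The sky is blue.",
    "The sky is green.",  # contested change
    "The sky is blue.",   # same text as revision 0
    "The sky is green.",  # same text as revision 1
    "The sky is blue.",   # same text as revision 2
]
print(find_identity_reverts(history))
```

Counting such pairs per paragraph, rather than per article, would be one way to rank paragraphs by contention. Partial reverts and rewordings would need a diff-based comparison, which this sketch does not attempt.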

Components of our project

Here is a list of the components that we will need to implement. They are in approximate dependency order: unless otherwise noted, a task depends on the tasks listed before it in order to function fully, although some of its development may be able to begin before those earlier tasks are finished.

Expected schedule

Here is what we expect to have at each of the main milestones:

Milestone 1: 29 October

Milestone 2: 17 November

Code Complete: 3 December

Presentation: 14 December

Use of machine learning

We will need many examples of pages that are common targets of edit wars to serve as training data. Most of these pages will likely be easy to find: Wikipedia maintains lists of articles over which arbitration is taking place, and it also has a list of "lamest edit wars" whose entries we can enter by hand.

Measurement of success

As one measurement, we will set aside a portion of our machine learning data as a validation set, which we can use to check our results against examples that were not used for training.
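A held-out split of this kind might look like the sketch below. The 20% fraction, the fixed seed, the helper name `split_holdout`, and the placeholder examples are all assumptions made for illustration; the specification does not fix any of them.

```python
import random


def split_holdout(examples, holdout_fraction=0.2, seed=454):
    """Shuffle a copy of the examples and return (training, validation)."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    holdout_size = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[holdout_size:], shuffled[:holdout_size]


# Placeholder labeled examples: (article identifier, is_edit_war_target).
examples = [("article_%d" % i, i % 2 == 0) for i in range(10)]
training, validation = split_holdout(examples)
print(len(training), len(validation))
```

Fixing the seed means the same articles stay in the validation set across experiments, so results from different runs remain comparable.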

In addition, we would like to ensure that the application is sufficiently easy to use, based on interviews with users. In particular, the users surveyed should include a few who are not from technical majors (engineering, math, hard sciences, and the like).

We would also like to ensure that the performance of our application is adequate. Requests should be handled within the normal timeframe of a web request (ten seconds in the absolute worst case). The application should make use of optimistic prefetching where appropriate.
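The combination of a worst-case time limit and optimistic prefetching could be sketched as below. This is only an illustration under stated assumptions: `PrefetchingCache` and `slow_score` are hypothetical names, the work is assumed to run in a thread pool, and the ten-second default simply mirrors the limit stated above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PrefetchingCache:
    """Starts computations early and serves later requests from the cache."""

    def __init__(self, compute, max_workers=4):
        self._compute = compute
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def prefetch(self, key):
        """Schedule the computation for key unless it is already scheduled."""
        with self._lock:
            if key not in self._futures:
                self._futures[key] = self._executor.submit(self._compute, key)
            return self._futures[key]

    def get(self, key, timeout=10.0):
        """Return the result for key, raising a timeout error if it is late."""
        return self.prefetch(key).result(timeout=timeout)


calls = []


def slow_score(title):
    # Stand-in for an expensive analysis of one article's history.
    calls.append(title)
    return len(title)


cache = PrefetchingCache(slow_score)
cache.prefetch("Linked article")        # started before the user asks for it
print(cache.get("Linked article"))      # served from the prefetched result
```

A request handler would call `prefetch` for the articles linked from the current page and `get` for the page actually requested, so that a follow-up click is likely to hit a result that is already computed.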