CSE 517: Natural Language Processing

University of Washington

Spring 2018

The syllabus is subject to change; always get the latest version from the class website.
Lectures: EE 045, Wednesdays and Fridays, 1:00–2:20 pm
Instructor: Noah A. Smith (nasmith@cs.washington.edu)
Instructor office hours: CSE 532, Fridays, 12:00–1:00 pm, or by appointment
Teaching assistants: Dianqi Li (dianqili@uw.edu)
                     Kelvin Luu (kellu@cs.washington.edu)
TA office hours: CSE 220, Mondays, 10:00–11:00 am (Dianqi)
                 CSE 220, Wednesdays, 9:00–10:00 am (Kelvin)

Natural language processing (NLP) seeks to endow computers with the ability to intelligently process human language. NLP components are used in conversational agents and other systems that engage in dialogue with humans, automatic translation between human languages, automatic answering of questions using large text collections, the extraction of structured information from text, tools that help human authors, and many, many more. This course will teach you the fundamental ideas used in key NLP components. It is organized into four parts:

Probabilistic language models, which define probability distributions over text passages.
Text classifiers, which infer attributes of a piece of text by “reading” it.
Analyzers, which map texts into linguistic representations that in turn enable various kinds of understanding.
Generators, which produce natural language as output.
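To make the first of these parts concrete, here is a minimal sketch of a probabilistic language model: a bigram model with add-one (Laplace) smoothing. This is illustrative code only, not course material, and it is far simpler than the models covered in lecture; all names are made up for the example.

```python
from collections import Counter

def train_bigram_lm(corpus):
    """Count unigrams and bigrams over tokenized sentences,
    padding each sentence with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def prob(w, prev, unigrams, bigrams, vocab_size):
    # Add-one smoothing: every bigram gets a pseudocount of 1,
    # so unseen bigrams still receive nonzero probability.
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus of pre-tokenized sentences.
corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
unigrams, bigrams = train_bigram_lm(corpus)
V = len(set(w for s in corpus for w in s)) + 2  # vocabulary plus <s>, </s>
p = prob("dog", "the", unigrams, bigrams, V)    # P(dog | the)
```

Because the smoothed counts are normalized over the full vocabulary, the conditional probabilities for any fixed history sum to one, which is what makes this a probability distribution over text passages.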

1 Course Plan

Table 1 shows the planned lectures, along with readings. Slide links will start working once the slides are posted for a given lecture (usually shortly after the lecture). The textbook will be Eisenstein [1].

Date     Part                            Topic                            Readings
3/28                                     introduction                     [2]; [1] section A
3/30     probabilistic                   generative                       [1] sections 5.1–5.2; [3], [4], [5]
4/4      language                        featurized                       [3]; [6] sections 2, 7.4
4/6      models                          neural (continued)               [1] sections 5.3–5.6; [7] sections 0–4, 10–13
4/13                                     cotext: topic models             [1] section 13; [8] sections 1–4
4/18                                     cotext and bitext                [9]
4/20–25  text classifiers (V* → L)       methods & applications           [1] sections 1–3; [10], [11]
4/25     linguistic                      methods for sequences            [1] section 6; [12]
4/27     analyzers                       parts of speech                  [1] section 7.1; [13]
5/2                                      supersenses, entities, chunking  [1] section 7.3; [14]
5/2–4                                    graphical models                 [15]
5/9                                      phrase-structure trees           [1] sections 8–9; [16]
5/11                                     syntactic dependencies           [1] section 10
5/16                                     semantic roles and relations     [1] section 12; [17]
5/18                                     logical forms                    [1] section 11; [18]
         text generators (V* → V′*)      translation, summarization       [1] sections 17–18; [19]

Table 1: Course structure and lecture topics. Blue links are to lecture slides, and green links are to references. In this notation, V is a vocabulary of discrete symbols—most commonly words—in a (natural) language, and V* is the set of sequences of symbols from V, of arbitrary length. We use L to denote a smaller vocabulary of labels, Y to denote a constrained set of discrete structures (e.g., trees or directed graphs), and V′ to denote the vocabulary of a possibly different natural language.
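The notation in the caption can be read as function types: each kind of NLP component maps one of these sets to another. The following sketch restates that reading in code; the type aliases and the trivial classifier are hypothetical, chosen only to illustrate the notation.

```python
from typing import Callable, Sequence

# Illustrative aliases for the sets in the table caption.
Word = str                 # an element of the vocabulary V
Label = str                # an element of the label set L
Sentence = Sequence[Word]  # an element of V*, a symbol sequence of any length

# A probabilistic language model assigns each sequence in V* a probability.
LanguageModel = Callable[[Sentence], float]

# A text classifier maps V* to L.
Classifier = Callable[[Sentence], Label]

# A translator maps V* to V'*, sequences over another language's vocabulary.
Translator = Callable[[Sentence], Sentence]

# A (deliberately useless) classifier of type V* -> L:
always_positive: Classifier = lambda words: "positive"
```

Reading the components this way makes the table's column of mappings (e.g., V* → L for classifiers, V* → V′* for translation) a statement about each component's input and output spaces.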

2 Evaluation

Students will be evaluated as follows:

3 Academic Integrity

Please read, print, sign, and return the academic integrity form.


[1]    Jacob Eisenstein. Natural Language Processing. 2018. URL https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf.

[2]    Julia Hirschberg and Christopher D. Manning. Advances in natural language processing. Science, 349(6245):261–266, 2015. URL https://www.sciencemag.org/content/349/6245/261.full.

[3]    Noah A. Smith. Probabilistic language models 1.0, 2017. URL http://homes.cs.washington.edu/~nasmith/papers/plm.17.pdf.

[4]    Michael Collins. Course notes for COMS W4705: Language modeling, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/lm.pdf.

[5]    Daniel Jurafsky and James H. Martin. N-grams (draft chapter), 2015. URL https://web.stanford.edu/~jurafsky/slp3/4.pdf.

[6]    Michael Collins. Log-linear models, MEMMs, and CRFs, 2011. URL http://www.cs.columbia.edu/~mcollins/crf.pdf.

[7]    Yoav Goldberg. A primer on neural network models for natural language processing, 2015. URL http://u.cs.biu.ac.il/~yogo/nnlp.pdf.

[8]    Peter D. Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188, 2010. URL https://www.jair.org/media/2934/live-2934-4846-jair.pdf.

[9]    Michael Collins. Statistical machine translation: IBM models 1 and 2, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf.

[10]    Daniel Jurafsky and James H. Martin. Classification: Naive Bayes, logistic regression, sentiment (draft chapter), 2015. URL https://web.stanford.edu/~jurafsky/slp3/7.pdf.

[11]    Michael Collins. The naive Bayes model, maximum-likelihood estimation, and the EM algorithm, 2011. URL http://www.cs.columbia.edu/~mcollins/em.pdf.

[12]    Michael Collins. Tagging with hidden Markov models, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/hmms.pdf.

[13]    Daniel Jurafsky and James H. Martin. Part-of-speech tagging (draft chapter), 2015. URL https://web.stanford.edu/~jurafsky/slp3/9.pdf.

[14]    Daniel Jurafsky and James H. Martin. Information extraction (draft chapter), 2015. URL https://web.stanford.edu/~jurafsky/slp3/21.pdf.

[15]    Daphne Koller, Nir Friedman, Lise Getoor, and Ben Taskar. Graphical models in a nutshell, 2007. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

[16]    Michael Collins. Probabilistic context-free grammars, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf.

[17]    Daniel Jurafsky and James H. Martin. Semantic role labeling (draft chapter), 2015. URL https://web.stanford.edu/~jurafsky/slp3/22.pdf.

[18]    Mark Steedman. A very short introduction to CCG, 1996. URL http://www.inf.ed.ac.uk/teaching/courses/nlg/readings/ccgintro.pdf.

[19]    Michael Collins. Phrase-based translation models, 2013. URL http://www.cs.columbia.edu/~mcollins/pb.pdf.