CSE 599 D1: Advanced Natural Language Processing

University of Washington

Spring 2016



The syllabus is subject to change; always get the latest version from the class website.
Website: http://courses.cs.washington.edu/courses/cse599d1/16sp
Meetings: Mary Gates 271, Mondays and Wednesdays 1:30–2:50 pm
Instructor: Noah A. Smith (nasmith@cs.washington.edu)
Instructor office hours: CSE 532, by appointment
Teaching assistant: Jesse Dodge (dodgejesse@gmail.com)
TA office hours: TBD or by appointment



3/28   introduction / reading practice: [Taddy 2015]
3/30   presentation practice
       Gagan/Mandar/Ryan: [Neelakantan et al. 2015]
       Elizabeth/George/Julian: [Ng and Jordan 2002]
       Huan/Lucy/Maarten: [Zhang et al. 2015]
       Hannah/Minjoon: [Pham et al. 2015]
       Antoine/Hao/Kenton: [Li et al. 2015]
       Li/Nick/Ning/Mark: [Le and Mikolov 2014]
       Akshay/Colin/Conrad: [Taddy 2012]
4/4    information extraction
       Mark/Nick: [Hoffmann et al. 2011; Angeli et al. 2015]
4/6    Julian/Kelvin/Lucy/Max/Victoria: [Heilman and Smith 2010]
       Minjoon/George/Ning/Maarten/Elizabeth: [Riedel et al. 2013]
       Gagan/Kenton/Mark/Conrad/Colin/Nick: [Berant et al. 2013]
4/11   social media
       Conrad/Julian: [Benson et al. 2011; Eisenstein 2013]
4/13   Julian/Conrad/Ning/Minjoon/Elizabeth: [Bamman et al. 2014]
       Gagan/Maarten/Kenton/Mark: [Ritter et al. 2010]
       Max/Kelvin/Lucy/Srini: [Tan et al. 2014]
       Nick/Xi/Colin/George: [Tsur et al. 2010]
       See also: [Baldwin et al. 2013; Ling et al. 2013]
4/18   domain adaptation
       Victoria/Minjoon: [Blitzer et al. 2006; Daumé 2007]
4/20   Elizabeth/Mark/Srini/Minjoon: [Jiang and Zhai 2007]
       Colin/Ning/Gagan/Xi: [Finkel and Manning 2009]
       Maarten/Kelvin/Kenton/Nick/Julian: [Glorot et al. 2011]
       Max/George/Lucy/Conrad: [Daumé et al. 2010]
4/25   cross-lingual projection
       George/Max: [Hwa et al. 2005; Das and Petrov 2011]
4/27   Kenton/Gagan/Nick: [Wei and Pal 2010]
       Colin/Maarten/Elizabeth: [Smith and Eisner 2009]
       Mark/Julian/Srini: [McDonald et al. 2011]
       Ning/George/Kelvin: [Padó and Lapata 2009]
       Max/Lucy/Victoria: [Schneider et al. 2013]
       Conrad/Minjoon/Ryan: [Faruqui and Dyer 2014]
5/2    machine translation
       Kelvin/Ning: [Lopez 2008; Bahdanau et al. 2014]
5/4    Srini/Conrad/Maarten/Julian: [Galley et al. 2004]
       Mark/Kelvin/Max/Victoria: [Albrecht and Hwa 2007]
       Lucy/Kenton/Gagan/George: [Gimpel and Smith 2012]
       Elizabeth/Minjoon/Ning/Nick/Colin: [Green et al. 2013]
5/9    nonparametric Bayesian NLP
       Elizabeth/Lucy: [Teh 2006; Cohn et al. 2009]
5/11   George/Kelvin/Max/Srini: [Teh et al. 2006]
       Kenton/Elizabeth: [Cohen and Smith 2009]
       Julian/Nick: [Blunsom and Cohn 2010]
       Conrad/Mark: [Petrov et al. 2006]
       Maarten: [Johnson et al. 2007]
       Gagan/Ning: [Goodman 1996]
       Xi/Colin: [Goldwater et al. 2006]
       Lucy/Minjoon: [Johnson et al. 2006]
5/16   spectral NLP
       Maarten/Gagan: [Luque et al. 2012; Parikh et al. 2014]
5/18   your choice: [Arora et al. 2012; Dhillon et al. 2012; Lari and Young 1990; Lei et al. 2014; Stratos et al. 2013]
5/23   structured prediction
       Colin/Kenton/Srini: [Collins 2002; Daumé et al. 2009; Smith 2011, chapter 3]
5/25   your choice: [Sha and Pereira 2003; Taskar et al. 2004; Ross et al. 2011; Dyer et al. 2015]
5/30   (holiday)
6/1    writing exercise

Table 1: Course structure and topics.

Natural language processing (NLP) seeks to endow computers with the ability to intelligently process human language. NLP components are used in conversational agents and other systems that engage in dialogue with humans, automatic translation between human languages, automatic answering of questions using large text collections, the extraction of structured information from text, tools that help human authors, and many, many more.

This advanced course deeply explores a series of important topics in NLP. It is assumed that participants have taken CSE 517 and are therefore familiar with the fundamental ideas of the field.

1 Course Plan

Table 1 shows the plan, along with readings (which will be filled in as they are decided).

The first week will include:

Starting on Wednesday, March 30, each weekly cycle will be:

The final week of the quarter will include a writing assignment draft exchange and general discussion.

2 Evaluation

Students will be evaluated as follows:

3 Writing Assignment

You are to write a 4-page white paper describing a line of research in NLP. Your white paper should be framed as a small grant proposal for a new project that would extend a clearly identified past research contribution (possibly one we read about in class, but that's not a requirement). Your proposed project should:

We know that you probably haven’t written anything like this before. We don’t expect you to get it right in one attempt, so there will be feedback on drafts at two points (once from the instructor and once from your peers).

Your goal in this white paper is not to summarize the background work. We’ve been doing that all quarter. We want you to build creatively on what you’ve read and propose something new. You should cite relevant work, but you don’t need to explain it in detail.

The first draft is due on May 2. The draft is worth 2/7 of the grade of the assignment, so take it seriously. We will give you some feedback on your first draft to help guide you toward an improved second draft. The second draft is due in class June 1, where you’ll trade with peers and give each other feedback. This draft is worth 2/7 of the grade of the assignment. The final version of your paper is due June 8; it is worth 3/7 of the assignment grade.
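The draft weights above (2/7, 2/7, and 3/7) combine as a simple weighted average. The sketch below just illustrates that arithmetic; the function name and the sample scores are ours for illustration and are not part of the course materials:

```python
from fractions import Fraction

def assignment_grade(first_draft, second_draft, final_version):
    """Weighted average of the three graded pieces:
    first draft 2/7, second draft 2/7, final version 3/7."""
    weights = [Fraction(2, 7), Fraction(2, 7), Fraction(3, 7)]
    scores = [first_draft, second_draft, final_version]
    assert sum(weights) == 1  # the three weights cover the whole assignment grade
    return float(sum(w * s for w, s in zip(weights, scores)))

# Hypothetical scores out of 100:
print(round(assignment_grade(80, 90, 95), 2))  # → 89.29
```

Note that the final version carries the most weight, but the two drafts together count for more than half of the assignment grade, so skipping a draft is costly.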

There are many resources available online for those writing proposals. (Examples found by the TA but not necessarily endorsed by us: http://blog.regehr.org/archives/149 and http://pages.cs.wisc.edu/~markhill/grant-tips.html.) Apart from these general guidelines, feel free to schedule a meeting with the instructor or TA to discuss your paper at any stage.

Your white paper must be written by you alone. You are not permitted to collaborate with anyone else on this white paper. You may ask other students or faculty members to read and comment on your white paper, but you must acknowledge their comments, and all ideas in the paper must be your own. So, if someone suggests a good idea to extend your white paper, you should thank them and follow up later, but don’t put it in the paper.

We are serious about the 4-page limit! We will not read anything longer than four pages. Consider us very busy NSF or DARPA program directors; if you want our money, you have to make your case in just a few pages. References don’t count toward the page limit. Please use the ACL 2016 style files without modification.

References

   Matt Taddy. Document classification by inversion of distributed language representations. In Proc. of ACL, 2015. URL http://www.aclweb.org/anthology/P15-2008.

   Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. Efficient non-parametric estimation of multiple embeddings per word in vector space, 2015. URL http://arxiv.org/pdf/1504.06654.pdf. arXiv:1504.06654.

   Andrew Y. Ng and Michael I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS, 2002. URL http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf.

   Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS, 2015. URL http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf.

   Hieu Pham, Thang Luong, and Christopher Manning. Learning distributed representations for multilingual text sequences. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015. URL http://www.aclweb.org/anthology/W15-1512.

   Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. A hierarchical neural autoencoder for paragraphs and documents, 2015. URL http://arxiv.org/pdf/1506.01057v2.pdf.

   Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents, 2014. URL http://arxiv.org/pdf/1405.4053v2.pdf. arXiv:1405.4053.

   Matt Taddy. Measuring political sentiment on Twitter: factor-optimal design for multinomial inverse regression, 2012. URL http://arxiv.org/pdf/1206.3776v5.pdf.

   Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In Proc. of ACL, 2011. URL http://www.anthology.aclweb.org/P/P11/P11-1055.pdf.

   Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. Leveraging linguistic structure for open domain information extraction. In Proc. of ACL, 2015. URL http://www.aclweb.org/anthology/P15-1034.

   Michael Heilman and Noah A. Smith. Good question! Statistical ranking for question generation. In Proc. of NAACL, 2010. URL http://www.aclweb.org/anthology/N/N10/N10-1086.pdf.

   Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. Relation extraction with matrix factorization and universal schemas. In Proc. of NAACL, 2013. URL http://www.aclweb.org/anthology/N13-1008.

   Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In Proc. of EMNLP, 2013. URL http://www.aclweb.org/anthology/D/D13/D13-1160.pdf.

   Edward Benson, Aria Haghighi, and Regina Barzilay. Event discovery in social media feeds. In Proc. of ACL, 2011. URL http://www.aclweb.org/anthology/P11-1040.

   Jacob Eisenstein. What to do about bad language on the internet. In Proc. of NAACL, 2013. URL http://www.aclweb.org/anthology/N13-1037.

   David Bamman, Jacob Eisenstein, and Tyler Schnoebelen. Gender identity and lexical variation in social media. Journal of Sociolinguistics, 18(2):135–160, 2014. URL http://arxiv.org/pdf/1210.4567.pdf.

   Alan Ritter, Colin Cherry, and Bill Dolan. Unsupervised modeling of Twitter conversations. In Proc. of NAACL, 2010. URL http://www.aclweb.org/anthology/N10-1020.

   Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proc. of ACL, 2014. URL http://www.aclweb.org/anthology/P14-1017.

   Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM—a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Proc. of ICWSM, 2010. URL http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/download/1495/1851/.

   Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. How noisy social media text, how diffrnt social media sources? In Proc. of IJCNLP, 2013. URL http://www.aclweb.org/anthology/I13-1041.

   Wang Ling, Chris Dyer, Alan W. Black, and Isabel Trancoso. Paraphrasing 4 microblog normalization. In Proc. of EMNLP, 2013. URL http://www.aclweb.org/anthology/D13-1008.

   John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Proc. of EMNLP, 2006. URL http://www.aclweb.org/anthology/W06-1615.

   Hal Daumé. Frustratingly easy domain adaptation. In Proc. of ACL, 2007. URL http://www.aclweb.org/anthology/P07-1033.

   Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In Proc. of ACL, 2007. URL http://www.aclweb.org/anthology/P07-1034.

   Jenny Rose Finkel and Christopher D. Manning. Hierarchical Bayesian domain adaptation. In Proc. of NAACL, 2009. URL http://www.aclweb.org/anthology/N/N09/N09-1068.

   Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proc. of ICML, 2011. URL http://www.icml-2011.org/papers/342_icmlpaper.pdf.

   Hal Daumé, Abhishek Kumar, and Avishek Saha. Frustratingly easy semi-supervised domain adaptation. In Proc. of the Workshop on Domain Adaptation for Natural Language Processing, 2010. URL http://www.aclweb.org/anthology/W10-2608.

   Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(3):311–325, 2005. URL http://www.cs.pitt.edu/~hwa/nle04draft.pdf.

   Dipanjan Das and Slav Petrov. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proc. of ACL, pages 600–609, June 2011. URL http://www.aclweb.org/anthology/P11-1061.

   Bin Wei and Christopher Pal. Cross lingual adaptation: An experiment on sentiment classifications. In Proc. of ACL, 2010. URL http://www.aclweb.org/anthology/P10-2048.

   David A. Smith and Jason Eisner. Parser adaptation and projection with quasi-synchronous grammar features. In Proc. of EMNLP, 2009. URL http://www.aclweb.org/anthology/D/D09/D09-1086.pdf.

   Ryan McDonald, Slav Petrov, and Keith Hall. Multi-source transfer of delexicalized dependency parsers. In Proc. of EMNLP, 2011. URL http://www.aclweb.org/anthology/D/D11/D11-1006.pdf.

   Sebastian Padó and Mirella Lapata. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36(1):307–340, 2009. URL https://www.jair.org/media/2863/live-2863-4721-jair.pdf.

   Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer, and Noah A. Smith. Supersense tagging for Arabic: the MT-in-the-middle attack. In Proc. of NAACL, 2013. URL http://www.aclweb.org/anthology/N/N13/N13-1076.pdf.

   Manaal Faruqui and Chris Dyer. Improving vector space word representations using multilingual correlation. In Proc. of EACL, 2014. URL http://www.aclweb.org/anthology/E/E14/E14-1049.pdf.

   Adam Lopez. Statistical machine translation. ACM Computing Surveys, 40(3):8, 2008. URL http://dl.acm.org/citation.cfm?id=1380586.

   Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2014. URL http://arxiv.org/pdf/1409.0473.pdf.

   Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What’s in a translation rule? In Proc. of NAACL, 2004. URL http://www.aclweb.org/anthology/N/N04/N04-1035.pdf.

   Joshua Albrecht and Rebecca Hwa. Regression for sentence-level MT evaluation with pseudo references. In Proc. of ACL, 2007. URL http://www.aclweb.org/anthology/P07-1038.

   Kevin Gimpel and Noah A. Smith. Structured ramp loss minimization for machine translation. In Proc. of NAACL, 2012. URL http://www.aclweb.org/anthology/N12-1023.

   Spence Green, Jeffrey Heer, and Christopher D. Manning. The efficacy of human post-editing for language translation. In Proc. of CHI, 2013. URL http://idl.cs.washington.edu/files/2013-PostEditing-CHI.pdf.

   Yee Whye Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proc. of ACL, 2006. URL http://www.aclweb.org/anthology/P06-1124.

   Trevor Cohn, Sharon Goldwater, and Phil Blunsom. Inducing compact but accurate tree-substitution grammars. In Proc. of NAACL, 2009. URL http://www.aclweb.org/anthology/N/N09/N09-1062.pdf.

   Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006. URL http://amstat.tandfonline.com/doi/pdf/10.1198/016214506000000302.

   Shay Cohen and Noah A. Smith. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proc. of NAACL, 2009. URL http://www.aclweb.org/anthology/N/N09/N09-1009.

   Phil Blunsom and Trevor Cohn. Unsupervised induction of tree substitution grammars for dependency parsing. In Proc. of EMNLP, 2010. URL http://www.aclweb.org/anthology/D10-1117.

   Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proc. of COLING-ACL, 2006. URL http://www.aclweb.org/anthology/P06-1055.

   Mark Johnson, Thomas Griffiths, and Sharon Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proc. of NAACL, 2007. URL http://www.aclweb.org/anthology/N/N07/N07-1018.

   Joshua Goodman. Parsing algorithms and metrics. In Proc. of ACL, 1996. URL http://www.aclweb.org/anthology/P96-1024.

   Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Contextual dependencies in unsupervised word segmentation. In Proc. of COLING-ACL, 2006. URL http://www.aclweb.org/anthology/P06-1085.

   Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In NIPS, 2006. URL http://oldbooks.nips.cc/papers/files/nips19/NIPS2006_0064.pdf.

   Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras. Spectral learning for non-deterministic dependency parsing. In Proc. of EACL, 2012. URL http://www.aclweb.org/anthology/E12-1042.

   Ankur P. Parikh, Avneesh Saluja, Chris Dyer, and Eric Xing. Language modeling with power low rank ensembles. In Proc. of EMNLP, 2014. URL http://www.aclweb.org/anthology/D14-1158.

   Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees, 2012. URL http://arxiv.org/abs/1212.4777.

   Paramveer Dhillon, Jordan Rodu, Michael Collins, Dean Foster, and Lyle Ungar. Spectral dependency parsing with latent variables. In Proc. of EMNLP, 2012. URL http://www.aclweb.org/anthology/D12-1019.

   Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech & Language, 4(1):35–56, 1990.

   Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. Low-rank tensors for scoring dependency structures. In Proc. of ACL, 2014. URL http://www.aclweb.org/anthology/P14-1130.

   Karl Stratos, Alexander Rush, Shay B. Cohen, and Michael Collins. Spectral learning of refinement HMMs. In Proc. of CoNLL, 2013. URL http://www.aclweb.org/anthology/W13-3507.

   Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. of EMNLP, 2002. URL http://www.aclweb.org/anthology/W02-1001.

   Hal Daumé, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 75(3):297–325, 2009.

   Noah A. Smith. Linguistic Structure Prediction. Synthesis Lectures on Human Language Technologies. Morgan and Claypool, 2011. URL http://www.morganclaypool.com/doi/pdf/10.2200/S00361ED1V01Y201105HLT013.pdf.

   Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proc. of NAACL, 2003. URL http://www.aclweb.org/anthology/N/N03/N03-1028.pdf.

   Ben Taskar, Dan Klein, Mike Collins, Daphne Koller, and Christopher Manning. Max-margin parsing. In Proc. of EMNLP, 2004. URL http://www.aclweb.org/anthology/W/W04/W04-3201.pdf.

   Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning, 2011. URL https://arxiv.org/pdf/1011.0686v3.pdf.

   Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. Transition-based dependency parsing with stack long short-term memory. In Proc. of ACL, 2015. URL http://www.aclweb.org/anthology/P15-1033.