The syllabus is subject to change; always get the latest version from the class website.
Website: | |
Meetings: | Mary Gates 271, Mondays and Wednesdays 1:30–2:50 pm |
Instructor: | Noah A. Smith |
Instructor office hours: | CSE 532, by appointment |
Teaching assistant: | Jesse Dodge |
TA office hours: | TBD or by appointment |
3/28 | introduction/reading practice | [Taddy, 2015]
3/30 | presentation practice | Gagan/Mandar/Ryan: [Neelakantan et al., 2015]
 | | Elizabeth/George/Julian: [Ng and Jordan, 2002]
 | | Huan/Lucy/Maarten: [Zhang et al., 2015]
 | | Hannah/Minjoon: [Pham et al., 2015]
 | | Antoine/Hao/Kenton: [Li et al., 2015]
 | | Li/Nick/Ning/Mark: [Le and Mikolov, 2014]
 | | Akshay/Colin/Conrad: [Taddy, 2012]
4/4 | information extraction | Mark/Nick: [Hoffmann et al., 2011, Angeli et al., 2015]
4/6 | | Julian/Kelvin/Lucy/Max/Victoria: [Heilman and Smith, 2010]
 | | Minjoon/George/Ning/Maarten/Elizabeth: [Riedel et al., 2013]
 | | Gagan/Kenton/Mark/Conrad/Colin/Nick: [Berant et al., 2013]
4/11 | social media | Conrad/Julian: [Benson et al., 2011, Eisenstein, 2013]
4/13 | | Julian/Conrad/Ning/Minjoon/Elizabeth: [Bamman et al., 2014]
 | | Gagan/Maarten/Kenton/Mark: [Ritter et al., 2010]
 | | Max/Kelvin/Lucy/Srini: [Tan et al., 2014]
 | | Nick/Xi/Colin/George: [Tsur et al., 2010]
 | | See also: [Baldwin et al., 2013, Ling et al., 2013]
4/18 | domain adaptation | Victoria/Minjoon: [Blitzer et al., 2006, Daumé, 2007]
4/20 | | Elizabeth/Mark/Srini/Minjoon: [Jiang and Zhai, 2007]
 | | Colin/Ning/Gagan/Xi: [Finkel and Manning, 2009]
 | | Maarten/Kelvin/Kenton/Nick/Julian: [Glorot et al., 2011]
 | | Max/George/Lucy/Conrad: [Daumé et al., 2010]
4/25 | cross-lingual projection | George/Max: [Hwa et al., 2005, Das and Petrov, 2011]
4/27 | | Kenton/Gagan/Nick: [Wei and Pal, 2010]
 | | Colin/Maarten/Elizabeth: [Smith and Eisner, 2009]
 | | Mark/Julian/Srini: [McDonald et al., 2011]
 | | Ning/George/Kelvin: [Padó and Lapata, 2009]
 | | Max/Lucy/Victoria: [Schneider et al., 2013]
 | | Conrad/Minjoon/Ryan: [Faruqui and Dyer, 2014]
5/2 | machine translation | Kelvin/Ning: [Lopez, 2008, Bahdanau et al., 2014]
5/4 | | Srini/Conrad/Maarten/Julian: [Galley et al., 2004]
 | | Mark/Kelvin/Max/Victoria: [Albrecht and Hwa, 2007]
 | | Lucy/Kenton/Gagan/George: [Gimpel and Smith, 2012]
 | | Elizabeth/Minjoon/Ning/Nick/Colin: [Green et al., 2013]
5/9 | nonparametric Bayesian NLP | Elizabeth/Lucy: [Teh, 2006, Cohn et al., 2009]
5/11 | | George/Kelvin/Max/Srini: [Teh et al., 2006]
 | | Kenton/Elizabeth: [Cohen and Smith, 2009]
 | | Julian/Nick: [Blunsom and Cohn, 2010]
 | | Conrad/Mark: [Petrov et al., 2006]
 | | Maarten: [Johnson et al., 2007]
 | | Gagan/Ning: [Goodman, 1996]
 | | Xi/Colin: [Goldwater et al., 2006]
 | | Lucy/Minjoon: [Johnson et al., 2006]
5/16 | spectral NLP | Maarten/Gagan: [Luque et al., 2012, Parikh et al., 2014]
5/18 | | your choice: [Arora et al., 2012, Dhillon et al., 2012, Lari and Young, 1990, Lei et al., 2014, Stratos et al., 2013]
5/23 | structured prediction | Colin/Kenton/Srini: [Collins, 2002, Daumé et al., 2009, Smith, 2011, chapter 3]
5/25 | | your choice: [Sha and Pereira, 2003, Taskar et al., 2004, Ross et al., 2011, Dyer et al., 2015]
5/30 | (holiday)
6/1 | writing exercise
Natural language processing (NLP) seeks to endow computers with the ability to intelligently process human language. NLP components are used in conversational agents and other systems that engage in dialogue with humans, automatic translation between human languages, automatic answering of questions using large text collections, the extraction of structured information from text, tools that help human authors, and many, many more.
This advanced course deeply explores a series of important topics in NLP. It is assumed that participants have taken CSE 517 and are therefore familiar with the fundamental ideas of the field.
Table 1 shows the plan, along with readings (which will be filled in as they are decided).
The first week will include:
Starting on Wednesday, March 30, each weekly cycle will be:
The final week of the quarter will include a writing assignment draft exchange and general discussion.
Students will be evaluated as follows:
You are to write a 4-page white paper describing a line of research in NLP. Your white paper should be framed as a small grant proposal about a new project that would extend a clearly identified past research contribution (possibly one we read about in class, though that is not a requirement). Your proposed project should:
We know that you probably haven’t written anything like this before. We don’t expect you to get it right in one attempt, so there will be feedback on drafts at two points (once from the instructor and once from your peers).
Your goal in this white paper is not to summarize the background work. We’ve been doing that all quarter. We want you to build creatively on what you’ve read and propose something new. You should cite relevant work, but you don’t need to explain it in detail.
The first draft is due on May 2. It is worth 2/7 of the assignment grade, so take it seriously. We will give you feedback on your first draft to help guide you toward an improved second draft. The second draft is due in class on June 1, when you’ll trade drafts with peers and give each other feedback. This draft is also worth 2/7 of the assignment grade. The final version of your paper is due June 8; it is worth the remaining 3/7 of the assignment grade.
There are many resources available online for those writing proposals; the TA has found some examples, though we do not necessarily endorse them. Apart from these general guidelines, feel free to schedule a meeting with the instructor or TA to discuss your paper at any stage.
Your white paper must be written by you alone. You are not permitted to collaborate with anyone else on this white paper. You may ask other students or faculty members to read and comment on your white paper, but you must acknowledge their comments, and all ideas in the paper must be your own. So, if someone suggests a good idea to extend your white paper, you should thank them and follow up later, but don’t put it in the paper.
We are serious about the 4-page limit! We will not read anything longer than four pages. Consider us very busy NSF or DARPA program directors; if you want our money, you have to make your case in just a few pages. References don’t count toward the page limit. Please use the ACL 2016 style files without modification.
Matt Taddy. Document classification by inversion of distributed language representations. In Proc. of ACL, 2015.
Arvind Neelakantan, Jeevan Shankar, Alexandre Passos, and Andrew McCallum. Efficient non-parametric estimation of multiple embeddings per word in vector space, 2015. URL arXiv:1504.06654.
Andrew Y. Ng and Michael I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS, 2002.
Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS, 2015.
Hieu Pham, Thang Luong, and Christopher Manning. Learning distributed representations for multilingual text sequences. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015.
Jiwei Li, Minh-Thang Luong, and Dan Jurafsky. A hierarchical neural autoencoder for paragraphs and documents, 2015.
Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents, 2014. URL arXiv:1405.4053.
Matt Taddy. Measuring political sentiment on Twitter: factor-optimal design for multinomial inverse regression, 2012.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. Knowledge-based weak supervision for information extraction of overlapping relations. In Proc. of ACL, 2011.
Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. Leveraging linguistic structure for open domain information extraction. In Proc. of ACL, 2015.
Michael Heilman and Noah A. Smith. Good question! Statistical ranking for question generation. In Proc. of NAACL, 2010.
Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. Relation extraction with matrix factorization and universal schemas. In Proc. of NAACL, 2013.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on Freebase from question-answer pairs. In Proc. of EMNLP, 2013.
Edward Benson, Aria Haghighi, and Regina Barzilay. Event discovery in social media feeds. In Proc. of ACL, 2011.
Jacob Eisenstein. What to do about bad language on the internet. In Proc. of NAACL, 2013.
David Bamman, Jacob Eisenstein, and Tyler Schnoebelen. Gender identity and lexical variation in social media. Journal of Sociolinguistics, 18(2):135–160, 2014.
Alan Ritter, Colin Cherry, and Bill Dolan. Unsupervised modeling of Twitter conversations. In Proc. of NAACL, 2010.
Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proc. of ACL, 2014.
Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM—a great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews. In Proc. of ICWSM, 2010.
Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, and Li Wang. How noisy social media text, how diffrnt social media sources? In Proc. of IJCNLP, 2013.
Wang Ling, Chris Dyer, Alan W. Black, and Isabel Trancoso. Paraphrasing 4 microblog normalization. In Proc. of EMNLP, 2013.
John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Proc. of EMNLP, 2006.
Hal Daumé. Frustratingly easy domain adaptation. In Proc. of ACL, 2007.
Jing Jiang and ChengXiang Zhai. Instance weighting for domain adaptation in NLP. In Proc. of ACL, 2007.
Jenny Rose Finkel and Christopher D. Manning. Hierarchical Bayesian domain adaptation. In Proc. of NAACL, 2009.
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proc. of ICML, 2011.
Hal Daumé, Abhishek Kumar, and Avishek Saha. Frustratingly easy semi-supervised domain adaptation. In Proc. of the Workshop on Domain Adaptation for Natural Language Processing, 2010.
Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(03):311–325, 2005.
Dipanjan Das and Slav Petrov. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proc. of ACL, pages 600–609, June 2011.
Bin Wei and Christopher Pal. Cross lingual adaptation: An experiment on sentiment classifications. In Proc. of ACL, 2010.
David A. Smith and Jason Eisner. Parser adaptation and projection with quasi-synchronous grammar features. In Proc. of EMNLP, 2009.
Ryan McDonald, Slav Petrov, and Keith Hall. Multi-source transfer of delexicalized dependency parsers. In Proc. of EMNLP, 2011.
Sebastian Padó and Mirella Lapata. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36(1):307–340, 2009.
Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer, and Noah A. Smith. Supersense tagging for Arabic: the MT-in-the-middle attack. In Proc. of NAACL, 2013.
Manaal Faruqui and Chris Dyer. Improving vector space word representations using multilingual correlation. In Proc. of EACL, 2014.
Adam Lopez. Statistical machine translation. ACM Computing Surveys, 40(3):8, 2008.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate, 2014.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What’s in a translation rule? In Proc. of NAACL, 2004.
Joshua Albrecht and Rebecca Hwa. Regression for sentence-level MT evaluation with pseudo references. In Proc. of ACL, 2007.
Kevin Gimpel and Noah A. Smith. Structured ramp loss minimization for machine translation. In Proc. of NAACL, 2012.
Spence Green, Jeffrey Heer, and Christopher D. Manning. The efficacy of human post-editing for language translation. In Proc. of CHI, 2013.
Yee Whye Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proc. of ACL, 2006.
Trevor Cohn, Sharon Goldwater, and Phil Blunsom. Inducing compact but accurate tree-substitution grammars. In Proc. of NAACL, 2009.
Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006.
Shay Cohen and Noah A. Smith. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proc. of NAACL, 2009.
Phil Blunsom and Trevor Cohn. Unsupervised induction of tree substitution grammars for dependency parsing. In Proc. of EMNLP, 2010.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proc. of COLING-ACL, 2006.
Mark Johnson, Thomas Griffiths, and Sharon Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proc. of NAACL, 2007.
Joshua Goodman. Parsing algorithms and metrics. In Proc. of ACL, 1996.
Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Contextual dependencies in unsupervised word segmentation. In Proc. of COLING-ACL, 2006.
Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models. In NIPS, 2006.
Franco M. Luque, Ariadna Quattoni, Borja Balle, and Xavier Carreras. Spectral learning for non-deterministic dependency parsing. In Proc. of EACL, 2012.
Ankur P. Parikh, Avneesh Saluja, Chris Dyer, and Eric Xing. Language modeling with power low rank ensembles. In Proc. of EMNLP, 2014.
Sanjeev Arora, Rong Ge, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, and Michael Zhu. A practical algorithm for topic modeling with provable guarantees, 2012.
Paramveer Dhillon, Jordan Rodu, Michael Collins, Dean Foster, and Lyle Ungar. Spectral dependency parsing with latent variables. In Proc. of EMNLP, 2012.
Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech & Language, 4(1):35–56, 1990. URL lari-young-90.pdf.
Tao Lei, Yu Xin, Yuan Zhang, Regina Barzilay, and Tommi Jaakkola. Low-rank tensors for scoring dependency structures. In Proc. of ACL, 2014.
Karl Stratos, Alexander Rush, Shay B. Cohen, and Michael Collins. Spectral learning of refinement HMMs. In Proc. of CoNLL, 2013.
Michael Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. of EMNLP, 2002.
Hal Daumé, John Langford, and Daniel Marcu. Search-based structured prediction. Machine Learning, 75(3):297–325, 2009. URL daume-09.pdf.
Noah A. Smith. Linguistic Structure Prediction. Synthesis Lectures on Human Language Technologies. Morgan and Claypool, 2011.
Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proc. of NAACL, 2003.
Ben Taskar, Dan Klein, Mike Collins, Daphne Koller, and Christopher Manning. Max-margin parsing. In Proc. of EMNLP, 2004.
Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning, 2011.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. Transition-based dependency parsing with stack long short-term memory. In Proc. of ACL, 2015.