
CSE 447: Natural Language Processing, Winter 2023

MWF 1:30-2:20pm, CSE2 G01

Instructor: Sofia Serrano

sofias6@cs.washington.edu

Office hours: Wednesdays 3-4pm in person in Allen Center 210

Teaching Assistant: Daksh Sinha

daksh97@uw.edu

Office hours: Mondays 4:30-5:30pm in person in Gates 152

Teaching Assistant: Khirod Sahoo

ksahoo@uw.edu

Office hours: Tuesdays 3-4pm over Zoom

Teaching Assistant: Leo Zeyu Liu

zeyuliu2@cs.washington.edu

Office hours: Fridays 4:30-5:30pm in person in Allen Center 220

Teaching Assistant: Leroy Wang

lryw@uw.edu

Office hours: Thursdays 11am-12pm over Zoom

Teaching Assistant: Urmika Kasi

ukasi@uw.edu

Office hours: Mondays 12-1pm over Zoom

Teaching Assistant: Xinyan (Velocity) Yu

xyu530@cs.washington.edu

Office hours: Wednesdays 10-11am over Zoom

Announcements

  • A3 is due Friday 3/10!
    • Don’t forget to also submit your *.preds files and your writeup pdf (and tag your A3)! If you’re not sure whether those have been uploaded correctly, check https://gitlab.cs.washington.edu/cse447-wi23/a3/cse447-wi23-a3-yournetid/ in a browser.
    • The number of late days you have available to use on A3 is 1 + min(3, 5 - total_late_days_used) (edited on Saturday 3/11 to reflect the previous evening’s announcement; the earlier formula was min(3, 5 - total_late_days_used)). If you’re unsure what this number is for you, please ask us!
    • We’ll be taking A3 regrade requests (as private posts on Ed) from whenever we get A3 grades out through the end of Friday 3/17 (so, probably for 2ish days)
  • Have anything you’d like to anonymously let the course staff know? You are of course welcome to email us, but feel free to use this anonymous feedback form instead if you’d prefer.
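
As a quick check of the A3 late-day formula in the announcement above, 1 + min(3, 5 - total_late_days_used), here is a minimal Python sketch. The function name is hypothetical, and it assumes total_late_days_used counts only the late days you used on earlier assignments; if you are unsure how that applies to you, ask the course staff.

```python
def a3_late_days_available(total_late_days_used: int) -> int:
    """Late days usable on A3, per the edited announcement formula.

    Assumption: total_late_days_used is the number of late days already
    spent on earlier assignments (between 0 and 5).
    """
    if not 0 <= total_late_days_used <= 5:
        raise ValueError("expected between 0 and 5 late days used so far")
    # At most 3 late days per project, at most 5 per quarter, plus the
    # 1 extra day granted for A3 in the announcement.
    return 1 + min(3, 5 - total_late_days_used)


if __name__ == "__main__":
    for used in range(6):
        print(used, "used so far ->", a3_late_days_available(used), "available on A3")
```

For example, a student who has used no late days so far gets 4 on A3, and a student who has already used all 5 still gets the 1 extra day.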

Summary

This course will explore foundational statistical techniques for the automatic analysis of natural (human) language text. To this end, the course will introduce pragmatic formalisms for representing structure in natural language, and algorithms for annotating raw text with those structures. The dominant modeling paradigm is corpus-driven statistical learning, covering both supervised and unsupervised methods. This is a lab-based course: instead of homeworks and exams, you will mainly be graded on three hands-on coding projects.

This course assumes a good background in basic probability and a strong ability to program in Python. Experience using numerical libraries such as NumPy and neural network libraries such as PyTorch is a plus. Prior experience with machine learning is important. Prior experience in linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class.

Calendar

Calendar is tentative and subject to change. More details will be added as the quarter continues.

| Week | Date | Topics | Readings | Homeworks |
|------|------|--------|----------|-----------|
| 1 | 1/4 | Logistics [slides] [recording] | Course website, syllabus | Academic Integrity Form out |
| | 1/6 | Introduction [slides] [recording] | Eis 1 | HW1 out |
| 2 | 1/9 | Introduction [slides] [recording] | Eis 1 | |
| | 1/11 | Text classification [slides] [recording] | Eis 2; J&M III 4 | |
| | 1/13 | Text classification [slides] [recording] | Eis 2; J&M III 4; Ng & Jordan, 2001 | Academic Integrity Form due |
| 3 | 1/16 | No class (Martin Luther King Jr. Day) | | |
| | 1/18 | Text classification [slides] [recording] | Eis 2; J&M III 5; Pang et al. 2002 | Quiz 1 |
| | 1/20 | Text classification [slides] [recording] | J&M III 5 | |
| 4 | 1/23 | Text classification [slides] [recording] | J&M III 5 | |
| | 1/25 | Language modeling [slides] [recording] | J&M III 3; Eis 6.1-6.2, 6.4 | Quiz 2 |
| | 1/27 | Language modeling [slides] [recording] | J&M III 3; Eis 6.1-6.2, 6.4, also page 449; Sennrich et al. 2016 (sections 1 and 3.2 only) | HW1 due |
| 5 | 1/30 | Lexical semantics [slides] [recording] | Eis 14; J&M III 6 | HW2 out |
| | 2/1 | Lexical semantics [slides] [recording] | Eis 14; J&M III 6 | Quiz 3 |
| | 2/3 | Neural networks I [slides] [recording] | Eis 6.3, 6.5; J&M III 7.5; J&M III 9; Goldberg 10; Collobert et al. 2011 | |
| 6 | 2/6 | Sequence labeling [slides] [recording] | Eis 7.1-7.4, 8.1; J&M III 8 | |
| | 2/8 | Sequence labeling [slides] [recording] | Eis 7.1-7.4, 8.1; Collins notes | Quiz 4 |
| | 2/10 | Sequence labeling [slides] [recording] | Eis 7.5, 7.7, 8.3; Sutton & McCallum 2.1 - 2.5 | |
| 7 | 2/13 | Neural sequence labeling [slides] [recording] | Eis 7.6 | |
| | 2/15 | Neural networks II [slides] [recording] | Annotated Transformer; Illustrated Transformer | Quiz 5 |
| | 2/17 | Machine translation [slides] [recording] | Eis 18 (skim) | HW2 due |
| 8 | 2/20 | No class (Presidents' Day) | | HW3 out |
| | 2/22 | Parsing [slides] [recording] | Eis 10.1-10.2; J&M III 17 | Quiz 6 |
| | 2/24 | Parsing [slides] [recording] | Eis 11.1, 11.3; J&M III 18 | |
| 9 | 2/27 | Parsing [slides] [recording] | Eis 11.1, 11.3; Chen and Manning 2014 | |
| | 3/1 | Guest lecture by Saadia Gabriel: "Socially Responsible and Factual Reasoning for Equitable AI Systems" [slides] [recording] | Gabriel et al. 2022; (Optional) Any part of the ACL 2020 tutorial on commonsense reasoning that piques your interest (can browse slide decks or view recorded talks) | Quiz 7 |
| | 3/3 | Guest lecture by Alane Suhr on multimodal NLP and grounding for NLP [slides] [recording] | Bisk et al. 2020 | |
| 10 | 3/6 | Guest lecture by Sewon Min on prompting and in-context learning with large language models [slides] [recording] | GPT-3 paper (pages 3-7 only); Blog post on in-context learning; Blog post on how in-context learning works | |
| | 3/8 | Guest lecture by Akari Asai on multilingual NLP [slides] [recording] | "The State of Multilingual AI" blog post; (Optional) Conneau et al. 2020; (Optional) Hu et al. 2020; (Optional) Lauscher et al. 2020 | Quiz 8 |
| | 3/10 | Conclusion and Q&A [slides] [recording] | | HW3 due |

Google Calendar

Note that this doesn’t include class readings and lecture topics; see the calendar table in the previous section for those.

Resources

Assignments/Grading

  • Project 1 (sequence classification): 30%
    • We will build a system for automatically classifying song lyrics comments by era.
    • Specifically, we build machine learning text classifiers, including both generative and discriminative models, and explore techniques to improve the models.
  • Project 2 (sequence labeling): 30%
    • We focus on sequence labeling with Hidden Markov Models and some simple deep learning based models.
    • Our task is part-of-speech tagging on English and Norwegian from the Universal Dependencies dataset.
    • We will cover the Viterbi algorithm.
  • Project 3 (dependency parsing): 30%
    • We will implement a transition-based dependency parser.
    • The algorithm will be new and specific to the dependency parsing problem, but the underlying building blocks of the method are still neural network modules covered in P1 and P2.
  • Quizzes: 10%
    • Starting from the 3rd week, we will have quizzes on Wednesdays.
    • There will be 8 quizzes in total.
    • Quizzes will be released at the end of class on Canvas and be available for 24 hours. They should take approximately ten minutes to complete.
    • Only your 5 best quiz scores will count toward your final score. Each of those quizzes is worth 2% of the final score.
  • Participation: 10% bonus
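
The weights above can be combined into a rough final-score estimate. The following is a minimal Python sketch, not the official grading script: the function name and its arguments are hypothetical, every score is assumed to be a fraction between 0 and 1, and the participation bonus is assumed to be added on top of the 100 regular points (the list above does not say how participation is measured).

```python
def final_score(project_scores, quiz_scores, participation=0.0):
    """Estimate a final score out of 100 (up to 110 with the bonus).

    Assumptions (not confirmed by the course staff):
    - project_scores: three fractions in [0, 1], one per project (30% each)
    - quiz_scores: fractions in [0, 1]; only the 5 best count (2% each)
    - participation: fraction in [0, 1] of the 10% bonus that was earned
    """
    if len(project_scores) != 3:
        raise ValueError("expected exactly three project scores")

    project_part = sum(30.0 * score for score in project_scores)

    # Keep the five highest quiz scores; the rest are dropped.
    best_five = sorted(quiz_scores, reverse=True)[:5]
    quiz_part = sum(2.0 * score for score in best_five)

    bonus_part = 10.0 * participation
    return project_part + quiz_part + bonus_part


if __name__ == "__main__":
    projects = [0.5, 1.0, 0.75]
    quizzes = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.5, 0.0]
    print(final_score(projects, quizzes))
```

In this example the three lowest quiz scores are dropped, so the quizzes contribute the full 10 points and the projects contribute 15 + 30 + 22.5 points.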

Policies

  • Late policy. Each student will be granted 5 late days to use over the duration of the quarter. You can use a maximum of 3 late days on any one project. Weekends and holidays are also counted as late days. Late submissions are automatically considered as using late days. Using late days will not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!

  • Academic honesty. Homework assignments are to be completed individually. Verbal collaboration on homework assignments is acceptable, as well as re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. Suspected violations of academic integrity rules will be handled in accordance with UW guidelines on academic misconduct. See also the academic integrity form posted to Canvas.

  • Accommodations. If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the quarter as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the office of Disability Resources for Students, I encourage you to apply here.

Note to Students

Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. UW services are available, and treatment does work. You can learn more about confidential mental health services available on campus here. Crisis services are available from the counseling center 24/7 by phone at +1 (866) 743-7732 (more details here).

COVID-19 Safety

In accordance with UW guidelines, we are implementing the following policies to ensure the safety of our students and instructors to the maximum extent possible:

  • Course instruction. The course will be taught in-person only, following the UW guidelines. However, links to recordings of each lecture will be posted on this site by the day following class.

  • Remote access. If you are sick or have potentially been exposed to COVID-19, stay home! While we encourage everyone to attend class in-person when they are well, there will always be a recording of class posted shortly after each lecture and there is no penalty for missing lecture in person. Office hours are also available both in-person and over Zoom (by appointment); each staff member’s office hours are posted under their name at the top of this webpage.

  • Masking. In accordance with UW’s masking policy, masks are strongly recommended during the first two weeks of the quarter and recommended after that, as long as we stay in the CDC’s “low” community level. Given the flexibility in choosing whether to wear a mask or not, please be respectful of others’ choices. Read more about UW’s policy here.

    If you would like a mask, please feel free to stop by the reception desk in the Allen Center, where they can provide you your choice of either a KN95/N95 mask or a cloth mask. Additionally, UW mask distribution will continue at various library locations, the Health Sciences Center, the HUB, and testing sites.

  • Social distancing. Currently, UW does not require social distancing in the classroom or office hours for students who are vaccinated and wearing a mask; distancing can also make it difficult to navigate and interact in such spaces. We do not mandate social distancing, but if another student asks you to maintain distance from them, please respect their request.

  • What if you get sick? Stay home if you are sick! The COVID-19 Public Health Flowchart indicates what you should do if you test positive, have been exposed to COVID-19, or have symptoms. Also see this FAQ for what to do.

  • What if we get sick? We will reschedule class, hold it remotely, or bring in a substitute lecturer/facilitator if necessary to prevent exposing students. We will try to give notice as far in advance as possible if an in-person event is moving to be held remotely, but please check your email beforehand to be sure you don’t miss anything.