CSE 447: Natural Language Processing, Winter 2023
MWF 1:30-2:20pm, CSE2 G01
Instructor: Sofia Serrano
OH: Wednesdays 3-4pm in person in Allen Center 210
Teaching Assistant: Leo Zeyu Liu
OH: Fridays 4:30-5:30pm in person in Allen Center 220
Announcements
-
A1 is due this Friday (1/27) at 11:59pm! Leading up to that deadline, we’ll have extended office hours this week. Here’s the link to the full schedule of office hours this week, organized chronologically.
-
Quiz 2 will be released on Wednesday 1/25 at 2:20pm on Canvas, and will be available for you to complete for 24 hours (until 2:20pm on Thursday 1/26). Once you start the five-question quiz, there will be a 10-minute time limit for you to complete it.
-
Have anything you’d like to anonymously let the course staff know? You are of course welcome to email us, but feel free to use this anonymous feedback form instead if you’d prefer.
Summary
This course will explore foundational statistical techniques for the automatic analysis of natural (human) language text. Towards this end, the course will introduce pragmatic formalisms for representing structure in natural language, and algorithms for annotating raw text with those structures. The dominant modeling paradigm is corpus-driven statistical learning, covering both supervised and unsupervised methods. Algorithms for NLP is a lab-based course. This means that instead of homeworks and exams, you will mainly be graded on three hands-on coding projects.
This course assumes a good background in basic probability and a strong ability to program in Python. Experience using numerical libraries such as NumPy and neural network libraries such as PyTorch is a plus. Prior experience with machine learning is important. Prior experience in linguistics or natural languages is helpful, but not required. There will be a lot of statistics, algorithms, and coding in this class.
Calendar
The calendar is tentative and subject to change. More details will be added as the quarter continues.
| Week | Date | Topics | Readings | Homeworks |
|---|---|---|---|---|
| 1 | 1/4 | Logistics [slides] [recording] | Course website, syllabus | Academic Integrity Form out |
| | 1/6 | Introduction [slides] [recording] | Eis 1 | HW1 out |
| 2 | 1/9 | Introduction [slides] [recording] | Eis 1 | |
| | 1/11 | Text classification [slides] [recording] | Eis 2; J&M III 4 | |
| | 1/13 | Text classification [slides] [recording] | Eis 2; J&M III 4; Ng & Jordan, 2001 | Academic Integrity Form due |
| 3 | 1/16 | No class (Martin Luther King Jr. Day) | | |
| | 1/18 | Text classification [slides] [recording] | Eis 2; J&M III 5; Pang et al. 2002 | Quiz 1 |
| | 1/20 | Text classification [slides] [recording] | J&M III 5 | |
| 4 | 1/23 | Text classification [slides] [recording] | J&M III 5 | |
| | 1/25 | Language modeling [slides] [recording] | J&M III 3; Eis 6.1-6.2, 6.4 | Quiz 2 |
| | 1/27 | Language modeling | J&M III 3; Eis 6.1-6.2, 6.4 | HW1 due |
| 5 | 1/30 | Lexical semantics | J&M III 6; Eis 14 | HW2 out |
| | 2/1 | Lexical semantics | Eis 14; J&M III 6 | Quiz 3 |
| | 2/3 | Lexical semantics | Eis 14; J&M III 6 | |
| 6 | 2/6 | Neural networks | Eis 6.3, 6.5; J&M III 7.5; J&M III 9; Goldberg 10; Collobert et al. 2011 | |
| | 2/8 | Neural networks | Annotated Transformer; Illustrated Transformer | Quiz 4 |
| | 2/10 | Sequence labeling | Eis 7.1-7.4, 8.1; J&M III 8 | |
| 7 | 2/13 | Sequence labeling | Eis 7.1-7.4, 8.1; Collins notes | |
| | 2/15 | Sequence labeling | Eis 7.5, 7.7, 8.3; Sutton & McCallum 2.1 - 2.5 | Quiz 5 |
| | 2/17 | Sequence labeling | Eis 7.6 | HW2 due |
| 8 | 2/20 | No class (Presidents' Day) | | HW3 out |
| | 2/22 | Neural sequence labeling | Eis 7.6 | Quiz 6 |
| | 2/24 | Parsing | Eis 10.1-10.2; J&M III 13 | |
| 9 | 2/27 | Parsing | Eis 11.1, 11.3; J&M III 14 | |
| | 3/1 | Parsing | Eis 11.1, 11.3; Chen and Manning 2014 | Quiz 7 |
| | 3/3 | Advanced topics: Recommender systems and online training | Recommender Systems Lectures | |
| 10 | 3/6 | Research topics: Summarization | Kassas et al. 2021 | |
| | 3/8 | Advanced topics: Computational ethics | The Trouble With Bias | Quiz 8 |
| | 3/10 | Advanced topics: Natural Language Understanding | | HW3 due |
Resources
- Readings
- Ed discussion board
- Canvas
- GitLab
Assignments/Grading
- Project 1 (sequence classification): 30%
  - We will build a system for automatically classifying song lyrics comments by era.
  - Specifically, we build machine learning text classifiers, including both generative and discriminative models, and explore techniques to improve the models.
- Project 2 (sequence labeling): 30%
  - We focus on sequence labeling with Hidden Markov Models and some simple deep learning based models.
  - Our task is part-of-speech tagging on English and Norwegian from the Universal Dependencies dataset.
  - We will cover the Viterbi algorithm.
- Project 3 (dependency parsing): 30%
  - We will implement a transition-based dependency parser.
  - The algorithm will be new and specific to the dependency parsing problem, but the underlying building blocks of the method are neural network modules covered in P1 and P2.
- Quizzes: 10%
  - Starting from the 3rd week, we will have quizzes on Wednesdays.
  - There will be 8 quizzes in total.
  - Quizzes will be released on Canvas at the end of class and will be available for twelve hours. They should take approximately ten minutes to complete.
  - The 5 best quizzes will count toward the final score. Each of those quizzes is worth 2% of the final score.
- Participation: 10% bonus
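As a rough illustration of how the weights above combine, here is a minimal Python sketch. It is not the official grading script. The function name `final_score`, the normalization of every score to the range 0 to 1, and the linear scaling of the participation bonus are all assumptions made for this example; whether the total is capped at 100 is not specified above, so the sketch does not cap it.

```python
def final_score(projects, quizzes, participation=0.0):
    """Combine scores using the weights listed above.

    projects: three project scores, each in [0, 1] (assumed normalization).
    quizzes: up to eight quiz scores, each in [0, 1] (assumed normalization).
    participation: fraction of the 10% bonus earned, in [0, 1] (assumption).
    Returns a percentage; may exceed 100 because of the bonus.
    """
    if len(projects) != 3:
        raise ValueError("expected exactly three project scores")

    # Each project is worth 30% of the final score.
    project_part = sum(30.0 * p for p in projects)

    # Only the 5 best quizzes count, each worth 2%.
    best_five = sorted(quizzes, reverse=True)[:5]
    quiz_part = sum(2.0 * q for q in best_five)

    # Participation is a bonus of up to 10%.
    bonus_part = 10.0 * participation

    return project_part + quiz_part + bonus_part


if __name__ == "__main__":
    # Hypothetical student: three project scores and eight quiz scores.
    print(final_score([0.9, 0.8, 1.0], [1, 0, 0.5, 1, 1, 0, 1, 1]))
```

Because only the 5 best quizzes count, missing up to three quizzes does not lower the quiz portion in this sketch.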
Policies
-
Late policy. Each student will be granted 5 late days to use over the duration of the quarter. You can use a maximum of 3 late days on any one project. Weekends and holidays are also counted as late days. Late submissions are automatically considered as using late days. Using late days will not affect your grade. However, projects submitted late after all late days have been used will receive no credit. Be careful!
-
Academic honesty. Homework assignments are to be completed individually. Verbal collaboration on homework assignments is acceptable, as well as re-implementation of relevant algorithms from research papers, but everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. Suspected violations of academic integrity rules will be handled in accordance with UW guidelines on academic misconduct. See also the academic integrity form posted to Canvas.
-
Accommodations. If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the quarter as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the office of Disability Resources for Students, I encourage you to apply here.
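To make the late policy above concrete, here is a minimal Python sketch of the bookkeeping. It is an illustration only, not the course's actual tooling. The function names are hypothetical, and three details are assumptions that the policy text does not spell out: any started 24-hour period counts as a full late day, a submission that needs more than 3 late days receives no credit, and a no-credit submission does not consume remaining late days.

```python
import math
from datetime import datetime

TOTAL_LATE_DAYS = 5   # granted per student for the quarter
MAX_PER_PROJECT = 3   # maximum late days usable on any one project


def late_days_needed(deadline, submitted):
    """Whole late days needed; any started 24-hour period counts (assumption)."""
    if submitted <= deadline:
        return 0
    seconds_late = (submitted - deadline).total_seconds()
    return math.ceil(seconds_late / 86400)  # weekends and holidays count too


def apply_late_policy(remaining, deadline, submitted):
    """Return (receives_credit, late_days_remaining_afterwards)."""
    needed = late_days_needed(deadline, submitted)
    if needed > MAX_PER_PROJECT or needed > remaining:
        # Assumption: no credit, and the remaining balance is left unchanged.
        return False, remaining
    return True, remaining - needed


if __name__ == "__main__":
    # Hypothetical submission about a day and a half after a Friday 11:59pm deadline.
    deadline = datetime(2023, 1, 27, 23, 59)
    submitted = datetime(2023, 1, 29, 10, 0)
    print(apply_late_policy(TOTAL_LATE_DAYS, deadline, submitted))
```

If in doubt about how a specific late submission would be counted, ask the course staff rather than relying on this sketch.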
Note to Students
Take care of yourself! As a student, you may experience a range of challenges that can interfere with learning, such as strained relationships, increased anxiety, substance use, feeling down, difficulty concentrating and/or lack of motivation. All of us benefit from support during times of struggle. There are many helpful resources available on campus and an important part of having a healthy life is learning how to ask for help. Asking for support sooner rather than later is almost always helpful. UW services are available, and treatment does work. You can learn more about confidential mental health services available on campus here. Crisis services are available from the counseling center 24/7 by phone at +1 (866) 743-7732 (more details here).
COVID-19 Safety
In accordance with UW guidelines, we are implementing the following policies to ensure the safety of our students and instructors to the maximum extent possible:
-
Course instruction. The course will be taught in-person only, following the UW guidelines. However, links to recordings of each lecture will be posted on this site by the day following class.
-
Remote access. If you are sick or have potentially been exposed to COVID-19, stay home! While we encourage everyone to attend class in-person when they are well, there will always be a recording of class posted shortly after each lecture and there is no penalty for missing lecture in person. Office hours are also available both in-person and over Zoom (by appointment); each staff member’s office hours are posted under their name at the top of this webpage.
-
Masking. In accordance with UW’s masking policy, masks are strongly recommended for the first two weeks of the quarter and will be recommended after that, so long as we stay in the CDC’s “low” community level. Given the flexibility in choosing whether to wear a mask or not, please be respectful of others’ choices. Read more about UW’s policy here.
If you would like a mask, please feel free to stop by the reception desk in the Allen Center, where they can provide you with your choice of either a KN95/N95 mask or a cloth mask. Additionally, UW mask distribution will continue at various library locations, the Health Sciences Center, the HUB, and testing sites.
-
Social distancing. Currently, UW does not require social distancing in the classroom or office hours for students who are vaccinated and wearing a mask; social distancing can also make it difficult to navigate and interact in such spaces. We do not mandate social distancing, but if another student asks you to maintain distance from them, we ask that you respect their request.
-
What if you get sick? Stay home if you are sick! The COVID-19 Public Health Flowchart indicates what you should do if you test positive, have been exposed to COVID-19, or have symptoms. Also see this FAQ for what to do.
-
What if we get sick? We will reschedule class, hold it remotely, or bring in a substitute lecturer/facilitator if necessary to prevent exposing students. We will try to give notice as far in advance as possible if an in-person event is moving to be held remotely, but please check your email beforehand to be sure you don’t miss anything.