CSE 599J1: Social Reinforcement Learning

Overview

How can we accelerate AI when learning in an environment with other intelligent agents? This course focuses on Social Reinforcement Learning in multi-agent and human-AI interactions. From studying the natural world, we know that social learning is an incredibly powerful mechanism that helps both humans and animals rapidly adapt to new circumstances and coordinate with others, and that drives the emergence of complex learned behaviors. From recent advances in AI, we know that reinforcement learning from human feedback (RLHF) is an incredibly powerful mechanism for improving the capabilities and alignment of large models. This course will link these two perspectives, examining the complexities of modeling, learning from, and coordinating with other agents, whether those agents are humans or other RL agents in a simulation. We will study how social learning can address fundamental issues in AI like learning and generalization, as well as improve the ability of AI to coordinate and interact with people.

Although we will cover a brief introduction to reinforcement learning (RL), familiarity with RL and deep learning is encouraged. The course is a project course; in addition to reading and discussing relevant research papers, students will submit a team-based final project in the form of a research paper.

Schedule and Details

Instructor: Natasha Jaques (nj at cs)

TAs:
Yancheng Liang (yancheng at cs)

Lecture: WF: 10:00 AM - 11:20 AM (80 min lectures) Location: Johnson Hall (JHN) 175

Sept 24 - Dec 5 (Nov 28 is a holiday)

Remote attendance: https://meet.google.com/tge-vofz-zdw

Office Hours:

Natasha: 2:30-3:30pm on Fridays in Gates 234. No office hours Nov 28 and Dec 5.
Yancheng Liang (TA): 1:30-2:30pm on Wednesdays in Gates 151. No office hours Oct 1 and Dec 3.

Important Links

  • Main spreadsheet (detailed schedule, slides, recording, and paper presentation sign-up) (link)
  • Schedule and lecture slides (recording) (link)
  • EdSTEM discussion board (link)
  • Project team matching (link)

Schedule overview (by week):

  • 1. Introduction to RL and RL post-training of LLMs
  • 2. RLHF, personalized RLHF
  • 3. RLVR, Embodied instruction following
  • 4. Learning from humans beyond LLMs (inverse RL)
  • 5. Multi-agent RL
  • 6. Human-AI coordination
  • 7. Emergent complexity
  • 8. Social learning
  • 9. Multi-agent RL for LLMs
  • 10. Bonus week / Thanksgiving. TBD topic or guest lecture.
  • 11. Project presentations.

Class Format

Classes will be split so that on approximately half the days Natasha will present a ~50 minute lecture followed by ~30 minutes of questions and discussion, and on the remaining days we will have student-led paper presentations and discussions.

Each student in the class will sign up to present in one of the slots in the syllabus. They will prepare a 10-minute presentation on one of the papers for that discussion day. Each presentation will be followed by 5 minutes of clarifying questions. After all the presentations finish, we will break into groups to discuss each of the papers, the themes that connect them, and interesting and impactful research directions that relate to them.

Grading

  • Class participation (15%)

    Because this is primarily a discussion course, to make it work we need students to attend class in person, ask questions, and participate in paper discussions. Therefore, a portion of your grade depends on doing this. To get the full 15%, you need to read the relevant papers, show up on time, and make comments or ask questions in at least 16/19 classes (you are allowed to miss 3; see the Course Policies section). However, we hope you will go beyond this minimum requirement and make the most of the class by actively participating and sharing your thoughts and questions on the research we are learning about.

  • Paper reflections (10%)

    We will have 9 Discussion classes for which you will prepare a 300-word paper reflection on one of the suggested papers, to be uploaded to EdSTEM. Reflections will be graded on a pass/fail basis. As per the Course Policies, you can miss submitting 1 reflection with no penalty. Presenters still need to submit reflections for the papers they are presenting. We are aware that it would be extremely easy to generate these summaries with an LLM. My only comment on this is if you want to learn something from the course, you’ll need to actually read the papers. If you’re not planning to read the papers, it would be better to drop and give up your spot to one of the other 55 students petitioning to get in. Also, if you and another student review the same paper and submit very similar LLM-generated summaries in public Ed posts, that might be a little embarrassing.

  • Lead discussion (10%)

    During the quarter, every student is expected to present a paper at least once. Use the class schedule spreadsheet to sign up for a particular presentation time, and choose the paper you would like to present. Add a link to your presentation slides to the spreadsheet no later than 11:59pm two days before class.

  • Course project (50%)

    Proposal (5%): due 11:59pm Oct 17, 2025
    Writeup (35%): due 11:59pm Dec 1, 2025
    Project presentation (10%): in class on Dec 3 and Dec 5

    See the Class Project section for more information.

  • Peer review (15%)

    Peer review is a big part of research, and in this class we will learn how to write high-quality peer reviews. We will upload our course projects to a peer review system, such as OpenReview. You will be responsible for submitting 3 reviews of other students’ papers in the system, which will be due 1 week after the project is due, by 11:59pm on Dec 8. Note that this review procedure will be single-blind, since students will have seen the class project presentations. Each review will be worth 5% of your grade and will be graded based on whether it is thorough, complete, and fair, and whether it gives the authors useful feedback for improving their paper. Note that the peer review feedback will not be used to determine the final grade for students’ papers; the papers will be graded by the instructors.

Course Policies

Late submissions and absences

  • To reduce the burden on instructors, create flexibility for students, and maintain consistent treatment across students, we will allow you to drop 1/9 paper reflections with no penalty. Your participation grade will also be based on participating in 16/19 class discussions, so you can miss 3 classes with no penalty. We understand that sometimes you may need to miss class due to travel obligations, or various other circumstances. Our goal is that for most of these issues you do not need to contact us about it. It will not affect your grade unless you go beyond 3 classes.
  • Late assignments will not be graded by default. The due dates have been set to be as late as possible while allowing for feedback on the presentation slides and prompt grading of the course project.

Inclusion, feedback, accommodations, and other policies

  • DEI: This course welcomes students of all backgrounds. The computer science and computer engineering industries have a significant lack of diversity. This is due to a lack of sufficient past efforts by the field toward facilitating diversity, equity, and inclusion. The Allen School seeks to create a more diverse, inclusive, and equitable environment for our community and our field. You should expect and demand to be treated with respect by your classmates and me.
    • If any incident occurs that challenges this commitment to a supportive, diverse, inclusive, and equitable environment, please let me know so the issue can be addressed. I have created an anonymous feedback form to make this easier. If you have any feedback, suggestions, or experience any issues related to diversity, equity, and inclusion, and would like to report them anonymously, please use the form. Supporting DEI is a process that requires continual learning and growth, and I value your feedback on how we can improve the course on this dimension.
    • You can also submit feedback through the Allen School’s anonymous feedback form: https://feedback.cs.washington.edu/
  • Generative AI: Do not use generative AI tools to write your assignments. This will inhibit your ability to gain useful skills from taking this course, which is the whole point.
  • DRS accommodations: Embedded in the core values of the University of Washington is a commitment to ensuring access to a quality higher education experience for a diverse student population. Disability Resources for Students (DRS) recognizes disability as an aspect of diversity that is integral to society and to our campus community. DRS serves as a partner in fostering an inclusive and equitable environment for all University of Washington students. The DRS office is in 011 Mary Gates Hall. Please see the UW resources at http://depts.washington.edu/uwdrs/current-students/accommodations/. If you have DRS accommodations that the course staff should know about, please contact us at the beginning of the course.
  • Religious accommodations: Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW’s policy, including more information about how to request an accommodation, is available at Religious Accommodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form.
  • Sexual harassment: University policy prohibits all forms of sexual harassment. If you feel you have been a victim of sexual harassment or if you feel you have been discriminated against, you may speak with your instructor, teaching assistant, or the chair of the department, or you can file a complaint with the UW Ombudsman’s Office for Sexual Harassment. Their office is located at 339 HUB, (206)543-6028. There is a second office, the University Complaint Investigation and Resolution Office (UCIRO), which also investigates complaints. The UCIRO is located at 22 Gerberding Hall. Please see additional resources at the UW office of Ombud.
  • Land acknowledgement: The University of Washington acknowledges the Coast Salish peoples of this land, the land which touches the shared waters of all tribes and bands within the Suquamish, Tulalip and Muckleshoot nations.
  • Resources: For additional resources, see CSE Students and Student Resources.
  • EdSTEM: In addition to submitting paper reflections via EdSTEM, we encourage you to submit questions there to get help from other students.

Acknowledgements

We thank Brian Hou, Abhishek Gupta, and Zoey Chen for providing us with the template for the website.