CSE 390HA - 122 Honors Seminar
This is the website for the Spring 2024 iteration of CSE 390HA.
Note: this is still an in-progress website; items may be subject to change.
- Instructor: Matt Wang (email: mxw@cs.washington.edu)
- Meeting Time: Tuesdays, 3:30 - 4:50 PM (starting Tuesday, April 2nd)
- Location: LOW 112
Overview
Welcome to CSE 390HA, the Honors section for CSE 122! Each week, we will discuss various topics related to computer science. Our sessions will mostly focus on the societal and cultural impacts of computer science (and more broadly, technology), with some exploration of technical concepts to support these discussions. This is intended to be an opportunity to think about computer science and other related topics in a broader context.
Notably: this course is not an opportunity to learn more programming, computer science, or add more "rigor" to 122. No background or familiarity with computer science is required outside of what is necessary for CSE 122.
All information in this class will be available on this course website. Canvas will be used for peer discussion, and Google Forms will be used for submitting work. Further course policies, including how to get credit, are listed under course policies.
Course Content
Overview & Schedule
Click on the topic entry to go to a more detailed overview!
Date | Topic | Homework (for next week) |
---|---|---|
April 2, 2024 | Introduction | Post introduction; complete required reading & one optional reading; answer reflection. |
April 9, 2024 | Computing Education | Complete required reading & one optional reading; answer reflection. |
April 16, 2024 | Accessibility & Disability | Complete required reading & one optional reading; answer reflection. |
April 23, 2024 | Machine Learning & AI | Complete required reading & one optional reading; answer reflection. |
April 30, 2024 | Privacy & Security | Complete required reading & one optional reading; vote on student choice topics. |
May 7, 2024 | Student's Choice #1 | Propose culminating reflection idea; more TBD |
May 14, 2024 | Student's Choice #2 | TBD |
May 21, 2024 | Student's Choice #3 | Complete your culminating reflection! |
May 28, 2024 | Careers Panel! | Before the start of finals week: complete peer review. |
Week 1: Introduction
This class will be a combination of:
- a quick (but important) set of introductions!
- a typical "syllabus day" overview of course policies and expectations
- a meta discussion on what you want to get out of this class, and setting up community norms
- some priming questions for our quarter: what computer science is, why we study it, what computers do, and whether or not it's "good" or "bad"
- finally, setting the stage for next week - who gets to study computer science?
For this session, you don't have to prepare anything - just show up and bring your best self! After the session, I'll also post a more detailed summary with what we talked about.
Week 1 Summary
We generally followed the structure outlined above! First we did introductions: armed with a table tent and some Mott's fruit snacks, we partnered up and then introduced our partners to the class.
Then, we did some typical "syllabus day" activities with a co-design twist. In particular, we covered:
- Core course policies: how to get credit, disability and religious accommodations, and academic honesty.
- What this class is (an exploration of sociotechnical content; learning by discussing and creating knowledge together; a class which can be tailored towards students' interests, but needs buy-in) and is not (more Java, Matt telling you the "right" answer).
- What we all wanted to get out of this class (with a think-pair-share).
- Our community norms (which we devised and voted on).
We then answered some preliminary questions with "Yes, And" - Matt seeded an (incomplete) worldview, and students chimed in to expand it!
What is computer science (CS)? Can you come up with a formal definition?
Matt's incomplete seed: a degree and field of study at universities like UW. CS is what you study in the major: some programming, and some math.
Many great student answers here! Some choice "Yes, And"s included: CS is a broader skill about understanding how computers work and how to use them; CS is not just a formal program at universities, but informal and across ages; CS is also a field to study the impacts of computers on the world.
Why do "we" study computer science?
Matt's incomplete seed: it's a very lucrative field, and makes a lot of money! And, it's required for my major.
Wonderful student answers included: it's hard to "escape" from computing, so it's better to face it head-on; computing is everywhere in any field; computers can make you more productive; computers impact people, so we should learn about them.
How are computers used in your life today?
Matt's incomplete seed: when I think of my "computer", I think of my laptop and phone.
A diverse pool of answers, including: any data about you is stored somewhere (the cloud is just someone else's computer), computers are embedded in many devices (from calculators to cars to boats), many physical items have been manufactured with (and made more productive by) computers, and almost any form of communication!
Are computers good or bad? How do you make a decision like this?
Matt's incomplete seed: computers are mostly good, since they've rapidly improved people's lives through technological progress.
Interestingly, there was almost consensus on: computers are tools and are not inherently good or bad (like a hammer). Instead, it's how humans use and embody their values in computers that affect people in positive or negative ways.
Finally, we set the stage for next week. In particular, Matt explained the context for the required reading + the different types of optional readings. See the Homework for Week 2 tab for more information.
After class, Matt wrote this summary up, added the community norms to the website, and notified the students!
Homework for Week 2
- post a short introduction on Canvas; please include:
- your name, pronouns (optional), major, and year
- why you took this class
- what you're looking to get out of this class
- one goal for this quarter that's related to this class
- one goal for this quarter that's unrelated to this class
-
read the required reading(s):
- pages 1-25 (Introduction + Chapter 1) of Stuck In The Shallow End, Jane Margolis (et al.), MIT Press, 2008.
- one-sentence pitch: an unlikely metaphor on barriers for minoritized students in computing and a reminder that diversity problems facing computing educators are not new.
-
read (at least) one of the optional readings:
- short & sweet (one short-ish article)
- article: Computer science was always supposed to be taught to everyone, and it wasn't about getting a job, Mark Guzdial, 2021.
- one-sentence pitch: Michigan has a unique system where there are three departments that teach computing (engineering CS, information school, and "LSA" - similar to letters and sciences) - learn about some of the history and ethos.
- back & forth (two opposing viewpoints)
- article 1: Computational Thinking, Jeannette M. Wing, Communications of the ACM, 2006.
- article 2: Do We Really Need Computational Thinking, Enrico Nardelli, Communications of the ACM, 2019.
- one-sentence pitch: "computational thinking" is a pair of words that has dominated the computing education discourse (including a shoutout from Obama), but what really is it?
- rabbit hole (a set of short articles meant to spark a deep dive)
- article 1: CS education policymaking: how a (state) bill becomes a (state) law, Amy J. Ko, 2019.
- article 2: What counts as computer science in K-12 education?, Amy J. Ko, 2019.
- article 3: Computer science education bill would add new graduation requirements in Washington state, Nathalie Graham, GeekWire, 2024.
- one-sentence pitch: learn a bit about the current state of computing education policy in Washington, and how surprisingly many of Matt's coworkers are involved!
-
answer the reflection on Google Forms
Canvas; this is around seven short answer questions.
Week 2: Computing Education
Before class: complete homework for week 2.
Broadly speaking, this class will have two focuses. First, we'll dive deep into the required reading (Stuck In The Shallow End) and reflect on how it connects to computer science then and now. Then, we'll talk about the topic of the optional readings: different mechanisms for broadening participation in computing "in the large", with some focus on Washington state.
Week 2 Summary
Here is a stripped-down, (mostly) anonymized summary of the questions we discussed today! Each bullet is a different point (and most are in response to the previous bullet).
Why is the book called "Stuck in the Shallow End"?
- the history of segregated schools and pools is relevant to computer science education (CS Ed) the 2000s
- metaphorically, being "stuck in the shallow end" happens when you don't have access to the right skills and resources. This is just like CS Ed: it's not a lack of interest, but a lack of resources.
- similar to swimming, insecurity is a huge barrier to entry for CS. These insecurities stem from historical elements of society.
- expectations are built from existing societal structures - similar to the floating line that separates the shallow and deep ends of a pool, students may feel like an invisible expectation prevents them from doing CS.
- in both cases, we often don't solve the true, structural problem - often ignoring it or applying a bandaid fix.
When discussing barriers to participation in CS, how does identity come up?
- the book explicitly discusses the stereotype that computer scientists are white or Asian (or Indian); this pressure feels real!
- similar identity issues around being a "math-oriented", "STEM-oriented", or "writing-oriented" person.
- computer scientists are perceived as technical, concise speakers, and good at math - even though CS is much more than that!
- a relationship between "these groups are good at math" to "these groups are good at computing"
- core underlying issue: representation. Not seeing someone who looks like you in computer science impacts if you'd engage yourself.
- as a class, we then tried to name famous swimmers (Michael Phelps, Katie Ledecky, and Ryan Lochte) - who were all white.
- as a class, we then tried to name famous computer scientists (Bill Gates, Steve Jobs, Tim Cook, Paul Allen, Ada Lovelace, Alan Turing, Richard Feynman, "the people from Hidden Figures"). Matt commented that typically, people are unable to name women, and that none of these people are asian - even though that's a heavy stereotype of software engineers.
- "Asian" isn't a monolithic categorization, and depends on what you count as Asia (e.g. is Russia in Asia? the Middle East?). Participation and inclusion is very different for East Asians, Southeast Asians, and further breakdowns.
Did the CS education experience in the LAUSD reflect yours? Did you take CS in high school - why or why not?
Note from Matt: going to mostly omit the specific points here since they're personally identifying. Instead, these were just the broad-strokes themes - not a play-by-play of the conversation.
- large difference in expected graduation rates (~50% versus always 100%)
- in a different state, had single-digit Latino/a representation - "almost too few people to discriminate against".
- connection to generational expectations in the book. Some students grew up in engineering centers where everybody had at least one parent as an engineer: so, for them and their peers, tech was highly valued, and going to college was highly valued - the expectation was that you'd do it!
- stark differences for many students who moved during their childhood, with different emphasis placed on technological education, "traditional" education, and sports. [note from Matt: this was a big chunk of the conversation, but it's hard to do this without leaking info about each student!]
- in IB schools, taking computer science is awkward with the rest of the curriculum.
- in affluent areas in Washington, some schools had many, many CS classes (including cybersecurity and robotics)! Perhaps influenced by the demographics of the parents, more of which work in tech. Yet, some of our students still didn't take them!
What details from the book are the same now (~ 20 years later) and what are different? How does this tie into the problems facing computing education?
- book mentions "technologically rich, educationally poor". The government has funded programs, but have relied on (new) organizations like Code.org to build out courses and pedagogy.
- Scratch is much more popular and is frequently used to teach kids programming!
- the field of computer science has skyrocketed (especially with social media), with a bigger emphasis on it being a great career, having prestige, and being put on a pedestal
- everything is now digital: menus, QR Codes, even Disneyland! But, access to technology has not improved proportionally, which widens the technological gap.
- not having computer skills puts a harsher limit on your socioeconomic status.
- separate from access, teachers are now more overworked, burned out, and (sometimes) hate their jobs! Giving them more technology will not solve the problem by itself - broader reform is needed.
- accessibility has two different meanings in computer science, but both are related. Connection to CSE 121 reflection TED talk on "CS Unplugged": how to teach CS on pen and paper.
How would you define a "CS class"? Or, in other words, why is a class on how to use Word or Excel not a CS class?
- argument: Word is interacting with software, but CS is more about writing something out to accomplish a function/goal
- counterpoint: Excel does this too! You can write formulas that do most things Java methods can do (including 121 P3). Does this mean that Excel is the most used programming language in the world?
- in "Stuck in the Shallow End", there is a specific difference between computer literacy (using computers) and computer science (thinking about computers).
- everybody should learn computer literacy, it should be a mandatory part of K-12.
- but, computational thinking - and computer science - is not so different from other sciences (e.g. learning Java is like designing a lab, analyzing results - just with different tools). So, can think of this as adding another "science" (similar to biology or physics).
- computer science is not just programming - there's also an element of how computers work. For example, in AP CS Principles (AP CSP), you learn how wifi works!
- depending on how deep you go into Excel, it could be computer science? But, literacy is more about font sizes and solving "simple" problems. CSE 121's problems take hours to dive in to.
- CSE classes are about demystifying computers. Learning to use Excel is like learning to label pictures, while CS classes are like describing them in your own words and analyzing their meaning. CS is learning how these computing tools work and how you apply them in your lives.
Should we make computer science a requirement for graduating high school?
- if we do, we need to improve access to technology (via education reform) - you need a system that guarantees access to computers for all students.
- personally regret not taking CS, even though the high school had it. Avoided it because of the association that "only smart kids take it", and if it was required, could have dispelled that notion.
- completely agree: same experience in high school, and only realized after doing 121. While this doesn't "fix" the perception/insecurity issue fully, it helps quite a bit!
- had Scratch in middle school - this felt like a good balance, sicne it wasn't a heavy requirement but still left time for exploration.
- if we add another requirement, could delay graduation more. Some students already struggle to graduate because they can't pass their required classes - this would make it worse!
- earlier exposure in middle school (with simpler content and lower stakes) may be helpful.
- also, many bad experiences with high school CS teachers - perhaps mandatory requirements induce more demand for good teachers?
- or, we'd run out of good CS teachers - and we'd have a big problem!!
- requiring CS feels idealistic - would be great if we had infinite resources, but we don't.
- if the requirement is "just have a CS class", that's too broad! Especially since school curricula are often very, very specific items with learning objectives. What would these be?
- one model: don't require it, but let it fulfill a "bundle" like math or a science (instead of physics or biology), and make sure that it hits the same learning objectives.
- going back to "Stuck in the Shallow End": students didn't take CS then because they didn't know what it was. We have the same problem now, and requiring it could help fix this!
- if CS is optional, could result in less emphasis on making sure that all students can access a computer.
- feels related to "test-optional" applications, which slightly backfired (by exacerbating existing equity gaps).
Homework for Week 3
-
read/watch the required materials (each on a different accessible technology):
- article: Semantics to Screen Readers, Melanie Richards, A List Apart, 2019.
- article: Captions vs. Subtitles: Breaking Down the Differences, Jena Wallace, 3Play Media, 2023.
- video: Detecting and Defending Against Seizure-Inducing GIFs in Social Media, South, Saffo, and Borkin, ACM CHI '21, 2021.
- video: Introducing the Xbox Adaptive Controller, Xbox, 2018.
- article: Assistive Technologies - The Switch, Hampus Sethfors, Axess Lab, 2018.
- article: What makes writing more readable?, Monteleone, Brew, and McGhee, The Pudding, 2022.
-
read (at least) one of the optional readings:
- short & sweet (one short-ish article)
- article: Disabled people don't need so many fancy new gadgets. We just need more ramps., s.e. smith, Vox, 2019.
- one-sentence pitch: engineers often jump to "there's a problem? technology will fix it!" - resulting in technologies that don't actually help disabled people.
- back & forth (two opposing viewpoints)
- article 1: How generative AI tools like ChatGPT can revolutionize web accessibility, Ran Ronen, VentureBeat, 2023.
- article 2: No, 'AI' Will Not Fix Accessibility, Adrian Roselli, 2023.
- one-sentence pitch: while we haven't yet talked about AI in-depth, the intersection with accessibility is such a hot button topic - I couldn't not include it!
- rabbit hole (a set of short articles meant to spark a deep dive)
- article 1: What are legal issues associated with the design of accessible software?, UW DO-IT, 2022.
- article 2: The ADA was a victory for the disabled community, but we need more. My life shows why, Shruti Rajkumar, NPR, 2022.
- article 3: Driving Forward the ADA for Digital Inclusion, Sarah Malaier, American Foundation for the Blind, 2023.
- one-sentence pitch: learn about the ADA, one of the central laws in the US on disability rights, and reflect on how it works (and doesn't work) with technology.
- bonus (this is a long, long article): #accessiBe Will Get You Sued, Adrian Roselli, 2020.
- answer the reflection on Google Forms; this is around five short answer questions.
Week 3: Accessibility & Disability
Before class: complete homework for week 3.
Broadly speaking, this class will focus on the interplay between accessibility and technology. This will cover both technology that includes and technology that excludes, whether it be intentional or not. We'll also discuss how students can begin to build accessible software (and communities).
Week 3 Summary
Here is an updated summary of what we discussed in class!
First, quick logistics:
- after all the votes are in, students have voted on having a careers panel (instead of a research panel) and a structured culminating reflection
- the course schedule to reflect this; one consequence of this is that it changes the schedule (since we no longer have a "museum walk" last day). Instead, there will be an extra student's choice session!
We then discussed the accessible technologies in the required readings.
Did anything surprise you from the accessible technologies mentioned in the reading?
- the Xbox Adaptive Controller: did not grow up with video game consoles, and didn't think about the implications this had on disabled gamers.
- had not heard of the difference between subtitles and captions before the reading, but definitely makes sense in retrospect.
- shocked to hear that people deliberately give others seizures via flashing images, and sad that research is needed to defend against this.
- reading ability was a surprising but important one. It was jarring to see how out-of-date school standards can be.
- the adaptive controller had many different use-cases depending on the relevant disability (e.g. pressing the buttons with many different body parts), and that seems related to broader ideas when designing for accessibility.
- was familiar with the idea of switches, but had not realized that breath-controlled devices were switches (and is curious to learn more)!
Let's zero in on readability. This one is divisive, especially since some argue that reading comprehension is a core part of learning English. More broadly speaking (i.e. outside the lens of disability), good readability is just a good goal to aim for - more people understand what you're writing!. What do you think?
- short, concise text can be very helpful to those who have ADHD or other attention deficit disabilities.
- outside of disability, it's very helpful to those who don't speak English as a first language - e.g. in helping grandparents understand documents.
- important for things like taxes and bank statements to be readable by others in simple language. But, this is different from academia, where more nomenclature might be needed.
- similar to other ADA standards (e.g. all buildings need to be built to specific codes), it would be great if all government documents, bank statements, etc. were readable.
- also similar to conversations around absurdly long terms & conditions for apps (you shouldn't need a law degree to understand the terms & conditions)!
- the argument of "this is just learning English" isn't effective - there's a difference between literature analysis and finishing day-to-day tasks.
Matt then did a live screenreader demo (using VoiceOver on macOS). We didn't record this, but for a first-order approximation, this video from Google's "Chrome for Developers" channel titled "Screen Reader Basics: NVDA" is a first-order approximation that uses NVDA, an open-source screenreader for Windows.
Switching gears, what is a disability dongle?
- a product that designers make that try to solve a problem that may not actually exist, not correctly solve the problem, or overlook elements of that disability.
- when it's created, it often doesn't take into account the experiences of those who face those issues.
- learning about user research in INFO 200 right now! However, one common pitfall is when designers come in with preconceptions on how users would use their work, and only really ask questions based on the designer's own experience (rather than the user) - similar to this!
- feels like those creating disability dongles are trying to compensate for guilt/pity towards disabled people - instead of properly putting themselves in other people's shoes, and not getting the input from actual users.
- favorite example was the stair-climbing wheelchair: why not install a ramp (instead of charging people thousands of dollars)!
- instead of making something accessible in the first place, disability dongles are almost an "add-on" (that you have to pay for), which is not useful!
- real dongles are converters which exist when hardware doesn't have the necessary plug. Instead, we should try to standardize things so that a dongle isn't necessary.
- as an aside, nobody enjoys using dongles!!
What advice would you give to engineers (or engineering students) to avoid making disability dongles?
- when solving a problem, figure out if there is a systematic solution or not
- one example: "smart" glasses for deaf people that display captions on the lenses sound interesting. But, this puts the burden on deaf people. Instead, systems like closed captions or sign translators fix these systemic issues, rather than forcing deaf people to help themselves.
- currently taking HCDE 315 (inclusive design) which focuses on disability as a mismatch with how the world is designed. It's the world that we should change!
- too easy to slip into "let's build technology" and build something cool, rather than actually helping people.
After reading, what are our thoughts on AI and accessibility? Will it revolutionize accessibility, do nothing, or something in-between?
- AI feels too much like a buzzword - e.g. AI rice cookers?
- but, it's certainly helpful - e.g. with automatic captioning. However, it's still unreliable/inaccurate and may not solve the problem.
- in a way, feels like a dongle or a bandaid fix.
- going back to the plain text readability project: you could have a crowdsourced database that converts complicated text into plaintext. AI could do some of this, but it can miss a lot of nuance. Useful as a tool, but not the solution.
- what are we letting AI learn off of? Machine learning could be learning data from exclusionary sources or replicate biases.
- and, generative AI tools are still inaccurate!
In some cases, AI tools may be the "only option". In these cases, should we use it? Is it fair for us to make this judgement?
- reasonable as an only option - better than nothing. But, not a permanent solution.
- feels like a constant "imperfect solution with issues" versus "is this perpetuating the problem" debate.
- AI is not at a place where we can trust it to perform important jobs, but AI is in a "beta" phase - and we need to collect feedback!
- throwing AI at things without monitoring its use "feels weird", e.g. image-generating AIs could be trained on other images generated by AI, creating nightmare fuel
- can't discount how big of an impact this could be: imagine if someone could select and copy-paste a paragraph into ChatGPT, and ask "summarize this concisely" or "translate this into another language" - that's a huge deal!
- need to be careful of utilitarian thinking - can be easy to fall down the rabbit hole of "if it doesn't work for everybody, we can't use it". Okay to use things even if there are some downsides.
- we should help develop what people are using - so if disabled people are using ChatGPT, we should work to make that better!
What are potential solutions to the problems we discussed?
In-person, we discussed:
- implementing some legislation forcing websites to be accessible (e.g. requiring all images to have captions)
- but, people could just write "image", or add some well-intentioned (but bad) alt text.
- what would you do about legacy websites?
- creating software or languages that make it easier to be accessible (e.g. describing images by default, making it easily navigable)
- related solution: can we feed pre-existing structure from things like markdown into a website?
- teaching accessibility somewhere. But, where? Last week, we saw there are many tough questions (what age, what program, should it be required)?
- should be similar to how we teach civil engineers or architects ADA guidelines - through accreditation. Why not add it as a checkmark for a CS degree?
- for K-12, we should teach it somewhere, but maybe not the technical details (are students building websites)?
- K-12 doesn't immediately solve the problem, since you need to convince many CEOs and VPs. How long would it take for kids now to get to these positions of power?
- need some sort of workforce education; could be training programs or through representation.
- when onboarding as a research assistant, had to do mandatory modules about Title IX, OSHA, etc. - could do something similar.
- but, many people don't pay attention to mandatory trainings, and those people are the most important people to teach! You need to teach people to care!
- as a resident assistant, have to go through many accessibility trainings (over a week). But, they're quite effective - in part because they weren't just tacked on.
- in addition, there were tangible outcomes - e.g. if a poster wasn't accessible, it would be denied.
- could require government agencies to have accessible websites through law, and incentivize private companies to make accessibility a priority through tax breaks.
I also asked us all to answer this question on paper as an "exit ticket". Here are a few of the answers that touch on ideas we didn't talk about in-person.
- on a personal level, taking more initiative to add alt-text and other screen-reader friendly items for individual creations.
- teach about accessibility from a very early age - so that thinking about how to make the world accessible is something we all think about, in everything that we do. Almost making it the default!
- avoid reinventing the wheel - instead, first ask if the design is needed!
- Wordplay!
Homework for Week 4
-
read/watch the required materials:
- article: Machine Bias, Angwin, Larson, Mattu, and Kirchner, ProPublica, 2016.
- one-sentence pitch: one of the "classic" articles on algorithmic bias, but not on a machine learning system!
- video: But what is a GPT? Visual intro to transformers, 3Blue1Brown, YouTube, 2024.
- one-sentence pitch: a math-y partial explanation of how GPTs work, from the best in the business of math visual explanations.
- note: I'm not expecting you to understand any of the math, but rather to just build some intuition. In addition, some of Grant's examples are provocative, and I'll ask you to comment on them in the reflection.
-
read (at least) one of the optional readings:
- short & sweet (one short-ish article)
- article: Is generative AI bad for the environment?, Kate Saenko, The Conversation, 2023.
- one-sentence pitch: a quick primer into a surprisingly hard-to-measure problem.
- back & forth (two opposing viewpoints)
- article 1: A.I. Poses 'Risk of Extinction,' Industry Leaders Warn, Kevin Roose, The New York Times, 2023.
- article 2: AI Causes Real Harm. Let's Focus on That over the End-of-Humanity Hype, Alex Hanna, Emily Bender, Scientific American, 2023.
- one-sentence pitch: "existential risk" is an attention-grabbing headline - but does it deserve this level of discussion?
- rabbit hole (a set of short articles meant to spark a deep dive)
- article 1: We read the paper that forced Timnit Gebru out of Google. Here's what it says., Karen Hao, MIT Technology Review, 2020.
- article 1.5 (key context): About Google's approach to research publication, Jeff Dean, 2020.
- article 2: 'There was all sorts of toxic behaviour': Timnit Gebru on her sacking by Google, AI's dangers and big tech's biases , John Harris, The Guardian 2023.
- one-sentence pitch: Timnit Gebru's controversial departure (resignation? firing?) from Google touches on many key themes in this class, from core AI questions on ethics, interpretability, bias, and environmental impacts, to reflections on tech culture. There's also an unlikely cameo from folks at UW.
- answer the reflection on Google Forms.
Week 4: Machine Learning & AI
Before class: complete homework for week 4.
Broadly speaking, this class will be a broad-strokes overview of machine learning with an emphasis on its societal impacts. While one big focus will be algorithmic bias and fairness, we'll also touch on many other issues (such as interpretability, provenance, labor inputs, and environmental impacts).
Week 4 Summary
We spent the first 20 minutes talking about the questions that you had from the reflections; check out the "Answering your questions" section for more!
We then mostly talked about three topics: embeddings, debiasing humans versus algorithms, and explainability. We closed with a short conversation on the environment.
What did you think about the "word embeddings" from the video? What questions do you still have?
- The video uses a very simple example (3 dimensions instead of 120000), so it's not clear if computers are actually subtracting gender to assume that things are feminine or masculine.
- Using gender could lead to the wrong context: for example, "king - man = queen" ignores that queen doesn't always refer to royalty.
- This adding/subtracting directions feels a bit binary.
- You could have issues where historical biases creep in: e.g. maybe "president" is assigned a male connotation since the U.S. has not had a female president, so "subtracting" female could give you vice-president or first lady?
- In some languages (e.g. French or Spanish), nouns themselves have genders (and this is a core part of the grammar). How does this work?
What do you think about the argument "algorithms are easier to debias than humans"? Should we use AI in areas like policing (where bias is very, very prevalent and obvious - in both humans and algorithms)?
- Algorithms & humans are actually quite similar: these algorithms are human-made. So, if the person who makes it has a subconscious bias, than the algorithm will too.
- Throughout U.S. history, policing has always been racist - so we can't just "fix the data" when all of the data is the problem. Even if we remove explicit categories (e.g. "race"), there are many proxies (e.g. see redlining and low-income housing).
- Algorithms are controlled, tailored, and have lots of nuance; they can be more explainable than humans (since you can't pry open a human's brain). So, a more realistic solution could be fixing our current algorithms.
- Change in human behaviour is very hard, especially since biases can be integrated into your environment (e.g. parents, school, where you're from). Also noticed that the locations discussed in the ProPublica article are more likely to be historically racist.
- The questions that COMPAS asks are inherently biased - is it even possible to tweak questions like "have your parents gone to jail?" to not have racial undertones?
- You could try to weigh algorithsm differently, but it's a delicate balance. For example, Google's Gemini tool went way too far in the opposite direction. And swings in either direction have big impacts on people's lives.
- Fundamental problem is that humans are prejudiced and we need to fix that within humans. But, since governments and companies are going to use these tools, we need to fix them anyways.
- In order to debias algorithms, you need completely unbiased data and people. But that's not possible - this is all systemic!
- Slight disagreement with previous point - do you need unbiased people to recognize and fix bias in data? We're all biased, and yet can see issues with current systems.
- Would like to see AI with a human in the loop - AI cannot have historical context or understand why data has different outcomes, but a human could!
- Related to our education discussion: colleges have essays because we don't think just an SAT/ACT score fully tells someone's story. Algorithms can't truly understand language, systemic oppression, or context!
COMPAS is an "explainable" algorithm. But, for most of machine learning and large language models, we can't explain why the model comes to a certain decision. But, their accuracy seems to be really good. Should we use these systems?
- In everday life, we use lots of things we don't understand (like our brains or our phones). This should be no different! (but of course, like the brain, we should still try to understand it.)
- What's most important is if they help us - and in this case, these do, so we should use them.
- Tying back to computing education - those who understand computer science (and LLMs) have much more power and can make these decisions on behalf of many other people. So, we need to think about who has this knowledge (and what biases that reflects).
- If you understand that ChatGPT is a black box and can make mistakes (and is not an infallible god), it would be good to use. However, not sure if this is how people really view it.
- There is some accountability - e.g. the recent "Lying AI Chatbot" case with Air Canada.
What are the environmental impacts of LLMs? Who pays for them?
- missing from the article: the physical cost of making the computers, like raw minerals and labour
- many of the raw minerals come from conflict zones (e.g. cobalt in the DRC)
- land use! (to host the data centers and servers)
- people who do not use these technologies still have to pay for them - e.g. people in the DRC may not even be using ChatGPT, and are being disprivileged of their own land and resources.
- climate change is regressive (i.e. marginalized communites get affected more). For example, unhoused people are disproportionately affected by climate change.
- also, climate change affects everyone! literally, everyone!
Answering your questions (from the reflections)!
How big are the datasets they use for these, and how do they get them?
These datasets are gigantic, within a few orders of magnitude of "the entire internet". For proprietary models (like GPT-4), it's hard to get an exact number (since OpenAI has chosen not to disclose this). For GPT-3 (its precursor), OpenAI's 2020 paper says roughly 500 billion tokens from the Common Crawl dataset, various public domain books, and Wikipedia. OLMo, an open-source LLM developed in part by people at UW, uses the 3 trillion token dataset called Dolma.
Many of these datasets combine archives of written books and content produced on the internet (often gathered through "web scraping"). The internet is famously toxic, and many AI models can replicate this toxicity - Microsoft's Tay was an infamous example that was shut down within 24 hours of being released. One of the most famous papers studying this phenomena (RealToxicityPrompts) is from CSE professors at UW!
How does the data source affect bias, and how can you avoid this?
Data sources certainly create biases in machine learning models! "Bias" can mean all sorts of things - from racism, sexism, and ableism to working better for certain human languages or opinions the model may repeat. It's hard to summarize the history of bias in ML in one paragraph (or even one article), but Joy Buolamwini's TED Talk is a common entrypoint.
Avoiding bias is very challenging, and broadly speaking is an open problem in machine learning (and in computer science). To summarize the lay of the land: "just get unbiased data" is typically not feasible, and may not itself resolve the problem. Many researchers work in a subfield of AI called fair machine learning (and related ideas, such as "responsible AI"). Some approaches are purely technical (can we quantify "fairness"? can we then optimize for this metric), while others focus on transparency (e.g. "Model Cards"), representation, or regulation. Timnit Gebru (from the rabbit hole readings) is one of the leaders in this field!
How do machine learning models deal with data they haven't seen before?
Long story, short, "their best"! Like the video discusses, many of these models are based on probabilistic reasoning. Typically, models will still pick the most likely option (e.g. most likely token) rather than explicitly fail, and then continue chuggling along. Among other things, this explains how language models can output gibberish (if you give it inputs of things that don't exist)!
What physical structures are necessary to make machine learning work? Where (physically) does this happen?
Great question! Generally speaking, there are two broad steps in machine learning (that involve different physical structures).
The first is "training", where the model tries to figure out the best weights (the linear algebra mentioned in the 3Blue1Brown video). Practically speaking, training is a bunch of math (in particular: matrix multiplication and some calculus + probability calculations). This is done on specialized computer chips that are really, really good at doing math; the most common example is a "Graphics Processing Unit" (GPU), which is particularly good at doing matrix multiplication (the fundamental operation for much of computer graphics and games). The exact data is not available for proprietary models (though Sam Altman has claimed that training GPT-4 was at least $100 million). Using OLMo as a proxy again, they trained their model twice: once on 1024 AMD MI250Xs (10k per chip), and once on 216 Nvidia A100s (very hard to actually buy, but also ~ 10k). Competition for buying graphics cards is fierce - this is one of the biggest "moats" established ML players have.
OLMo also publishes power consumption and carbon footprint estimates - they say that training used 239 MWh of energy, which is about the amount of energy generated by all hydroelectric dams in the Northwest United States (thanks Nathan Brunelle for finding this link!). They estimate training used about 70 tonnes of CO2, which is about 150000 miles of driving.
The second step is deploying the model - or in other words, letting people use it. Generally, this is done with servers across the country (and the world!) - every time a website is visited, a complicated set of algorithms figure out how to direct that query to a specific computer, that then tokenizes the input, does the matrix math, and then sends it back to the user. Cloud providers (such as Amazon Web Services, or AWS) own hundreds of datacenters that serve this exact purpose (and make boatloads of money). One of the most famous AWS regional datacenters is us-west-2, which is right beside us in Oregon! There is no explicitly public data on how many of these servers AWS owns, but almost every estimate puts it into the millions.
Do people who make ML models know how they work?
In short, not really (but it depends)! These models are so complicated (with billions of parameters), so it's not currently possible to truly "explain" what each of the individual pieces of the model are doing. This subfield of AI is broadly called "explainable" and "interpretable" AI, and is one of the most active fields of research.
However, researchers (and members of the public) can make broad explanations of how some parts of the model work - such as the explainer video that we just watched!
How does tokenization actually work?
In short, this is really challenging! Very briefly (and reductively): people used to use hard rules to split tokens up into nouns, verbs, adjectives, and other "rules" of grammar. However, these tended to be very manual, error-prone, and not scalable (especially to other languages). Modern approaches often blend some of this domain-specific knowledge with statistical tools that try to "guess" what tokens are based on a set of data (in other words ... more machine learning). A famous (very technical) tutorial that gets quite close to the bleeding edge is Andrej Karpathy's Let's build the GPT Tokenizer.
In their reflection, a student recommended InfiniteCraft, which is a fun game that shows some of this in action. Neal also has a ton of other fun visualizations on his website!
What is softmax and why is it used here?
I will mostly skip this question since it's not super relevant to our discussion, but long story short, it takes in a set of numbers, and "normalizes" them to all be between 0 and 1 (but keeping their relative "size" to each other). Why? In math, we typically define probabilities as being between 0 and 1 - so softmax lets us "shrink" large sets of numbers to be valid probabilities.
How can models deal with these absurd amounts of data? Would they not lose context, overfit, or not fit on a computer?
Long story short, models do lose context, overfit, and often cannot fit on cheap computers. So, lots of smart people work on this (and invent cool tricks to make it work). One of the most relevant ones (that you'll explore yourself in CSE 123) is "compression" - basically, treating the input from the user, weights, and other "big" things in the language model as things that we can put into a zip file and make smaller. If you stick around in 123, you'll learn about one of the classic ways of doing this, Huffman coding; modern linear algebra techniques like sparse matrix computation are extremely helpful.
How is AI development protected and/or regulated? Who is it regulated by?
Generally speaking, it isn't. Some existing laws that apply to technology (like intellectual property and copyright) may apply, but courts are in the process of deciding that just right now! The closest thing we have to American legislation is an executive order from the Biden administration from October 2023, but it's not yet clear how this will be enforced.
What is the cost of one query to ChatGPT (for OpenAI or the environment)?
This was a great question! Unfortunately, after some digging, I couldn't find a reliable first-party source (many articles claim to know the answer, but looking through the citations left me unsatisfied).
Some things I could verify were:
- as of April 2024, OpenAI charges $30 for 1M input tokens and $60 for 1M output tokens via its GPT-4 API. The token:word ratio seems to be 4:3, so if you estimate that a query is about 50 words and a response is 250, you get a napkin math pricing of 1.4 cents per query.
- BLOOM, a comparable LLM, is estimated to generate 1.5g of CO2 per query (per a paper written by members of the BLOOM team
- estimations are quite a bit more complicated because it's hard to factor in upfront costs from purchasing hardware and needing to repair it. There are many more estimates with "amortized" costs; the controversial Nature paper The carbon emissions of writing and illustrating are lower for AI than for humans may be of interest!
Your suggestions on what's "missing" from the legal, policy, and education discussons on AI!
Homework for Week 5
-
read/watch the required materials:
- video: Computer Security and the Internet of Things, Tadayoshi Kohno, USENIX Enigma, YouTube, 2016.
- one-sentence pitch: hear from a UW professor (who teaches security!) on some of the pragmatic research that folks do here on security :)
- video: Protecting Privacy with MATH, Minute Physics, YouTube, 2020.
- one-sentence pitch: a slightly different angle on privacy than the typical popular science topics on surveillance and passwords!
- article: The State of Consumer Data Privacy Laws in the US (And Why It Matters), Thorin Klosowski, The New York Times, 2021.
- one-sentence pitch: a quick overview of where we are with data privacy in the US!
-
read (at least) one of the optional readings:
- short & sweet (one short-ish article)
- article: Most Americans support right to have some personal info removed from online searches, Brooke Auxier, Pew Research, 2020.
- one-sentence pitch: what kinds of personal information should be removed from online searches?
- back & forth (two opposing viewpoints)
- article 1: Ban social media for kids? Fed-up parents in Senate say yes, Mary Clare Jalonick, The New York Times, 2023.
- article 2: The Protecting Kids on Social Media Act is A Terrible Alternative to KOSA, Jason Kelley, Sophia Cope, Electronic Frontier Foundation, 2023.
- one-sentence pitch: a controversial topic that spans privacy, security, and policy - pitting an uncommon bipartisan coalition against the EFF and ACLU!
- rabbit hole (a set of short articles meant to spark a deep dive)
- article 1: G.D.P.R., a New Privacy Law, Makes Europe World's Leading Tech Watchdog, Adam Satariano, The New York Times, 2018.
- article 2: Europe's Privacy Law Hasn't Shown Its Teeth, Frustrating Advocates, Adam Satariano, The New York Times, 2020.
- article 3: Meta Fined $1.3 Billion for Violating E.U. Data Privacy Rules, Adam Satariano, The New York Times, 2023.
- one-sentence pitch: the GDPR is one of the most impactful pieces of data privacy legislation, well, ever; here's a rare chance of seeing its evolution from the perspective of the same author (over the past half-decade)!
- answer the reflection on Google Forms (and vote on the student choice topics).
Week 5: Privacy & Security
Before class: complete homework for week 5.
Broadly speaking, this class will focus on data privacy and security in computing systems. We'll briefly touch on some interesting technical ideas, but mostly focus on many, many different case studies (with different stakeholders and harms). More coming soon!
Week 5 Sketch
- we'll try a sticky note activity about the kinds of information we're willing to have available to schools, tech companies, and the public!
- we'll talk about public trust, math, and computer science - and what that means for us as students!
- we'll talk about privacy and safety regulations, and talk about KOSA + related laws.
Week 6: Student's Choice #1
We'll vote on the topic to cover this week. Potential topics include: physical computing and hardware, user interfaces, a partial history of computing, global computing, open-source software, or "theoretical" computer science. Stay tuned!
Week 7: Student's Choice #2
See above.
Week 8: Student's Choice #3
See above.
Week 9: Panel & What's Next
We'll have a one-hour panel from folks who work in some tech-adjacent job (i.e., they would have taken CSE 121 and 122), but with a breadth of discipline and job type. Come ready with questions :)
If time permits, we'll have a closing conversation on the entire class as a whole, reflect on the goals we set in our first week of class, and talk about "what's next" in our CSE journey!
Community Norms
We initially drafted these community norms in our first lecture session, though they may evolve over time (and we might revisit them in the future)! Last updated: 04/02.
- respect others and respect others' opinions, even if you disagree with them
- listen to other people, don't just hear them
- try to engage in good faith conversations, and have constructive discussions and criticism
- use inviting, encouraging, and non-confrontational language
- minimize background noise when people are speaking
- respect the speaker
- ensure that all can hear
- active listening & eye contact
- complete the readings and speak purposefully in discussions
- try not to force others into discussion when they aren't ready
- avoid uneducated judgement
- allow for anonymous posts to protect individual's privacy
- "be nice :)"
In addition, Matt promises to:
- always finish the class on time
- do his best to upload summaries and content to the course website quickly
- respond to any communication (e.g. email, Canvas) quickly
Course Policies
Credit
This is a 1-credit, discussion-based course. To earn credit for this course, you need to complete 7 weeks of discussion activities and the culminating activity.
To complete a weekly discussion activity, you need to:
- do the assigned reading
- do any assigned activities (requires some effort for completion)
- attend the discussion for that week.
If you finish all of the above tasks for any given week, it's considered completed.
Our class will meet for 9 weeks in the quarter. This means that students can still miss up to 2 discussion activities and receive credit for the class. Details about the culminating activity will be posted towards the end of the quarter.
Readings and activities for this class are not intended to take up a significant portion of time. The focus of this class is to start conversations and reflections on computer science and its impacts on the world around us - not understanding of the material. If you have concerns about the workload for this class, we strongly encourage you reach out to the instructors to discuss.
Disability and Accessibility
All students deserve an equitable opportunity to education, regardless of if they have a temporary health condition or permanent disability. This applies to both CSE 390HA and your broader academic experience at UW. If there are ways that we can better accomodate you, please let us know.
We are happy to work with you directly or through Disability Resources for Students (DRS) to make sure that this class meets your needs. If you have not yet established services through DRS, we encourage you to contact DRS directly at uwdrs@uw.edu. DRS offers a wide range of services that support students with individualized plans while simultaneously removing the need to reveal sensitive medical information to course staff. However, these processes can take time - so we encourage students to start this process as soon as possible to avoid delays.
Religious Accomodations
Washington state law requires that UW develop a policy for accommodation of student absences or significant hardship due to reasons of faith or conscience, or for organized religious activities. The UW's policy, including more information about how to request an accommodation, is available at the registrar's page on the Religious Accomodations Policy. Accommodations must be requested within the first two weeks of this course using the Religious Accommodations Request form.
Academic Honesty and Collaboration
Broadly speaking, the philosophy and policy for academic honesty and collaboration in this class mirrors the CSE 122 Academic Honesty and Collaboration policies. In particular, all work that you submit for grading in this course must be predominantly and substantially your own. In particular, quoting from the CSE 122 syllabus:
Predominantly means that the vast majority of the work you submit on any given assignment must be your own. Submitting work that includes many components that are not your own work is a violation of this clause, no matter how small or unimportant the pieces that are not your work may be.
Substantially means that the most important parts of the work you submit on any given assignment must be your own. Submitting work that includes major components that are not your own work is a violation of this clause, no matter how little work that is not your own you include.
In this class, this primarily applies to the culminating activity and weekly discussion activities that involve submitting an artifact (e.g. a short answer response to a question). Allowed behaviours under this policy include discussing the question and answers with others or using search engines and generative AI to explore more information on the topic. Prohibited behaviours under this policy are primarily related to copying work written by others, where "others" can be other students in the class, other people in general, or generative AI tools.
You are welcome (and in fact, encouraged) to draw on outside sources when creating your artifacts. In situations like these, we simply ask that you cite these sources. The exact format (e.g. MLA or APA) is not important, as long as it is clear which works are cited and how they have influenced your own work.
Acknowledgements
Many folks have helped shape the overall direction of this course through direct and indirect advice, conversations, and support. Thanks to Miya Natsuhara, Brett Wortzman, Elba Garza, Lauren Bricker, Nathan Brunelle, and Kevin Lin.
Much of the accessibility module is inspired by the 2023 autumn offering of CSE 493E: Accessibility, taught by Jennifer Mankoff. This includes some of the readings (Richards, Sethfors, South et al., Monteleone et al., s.e.smith) and the overall framing of the conversation. Thank you Jen!