CSE 163, Winter 2020: Intermediate Data Programming

🎉 Welcome to CSE163! 🎉

What is this class? What will I learn?

The world has become data-driven. Domain scientists and industry increasingly rely on data analysis to drive innovation and discovery. This reliance on data is not restricted to science or business; it is also crucial to people in government and public policy, and to anyone who wants to be an informed citizen. As the size of data continues to grow, everyone will need to use powerful tools to work with that data.

This course teaches intermediate data programming. It is a follow-on to CSE142 (Computer Programming I) or CSE160 (Data Programming).

The course complements CSE143, which focuses more deeply on fundamental programming concepts and the internals of data structures. In contrast, CSE163 emphasizes the efficient use of those concepts for data programming.

In this course, students will learn:

  1. More advanced programming concepts than in CSE142 or CSE160, including how to write bigger programs with multiple classes and modules.
  2. How to work with different types of data: tabular, text, images, and geospatial.
  3. The ecosystem of data science tools, including Jupyter Notebook and data science libraries such as scikit-image, scikit-learn, and pandas.
  4. Basic concepts related to code complexity, efficiency of different types of data structures, and memory management.

Prerequisites and Expectations

This class is designed as a second introductory programming course that focuses on writing programs that work with data. The prerequisite is CSE 142 or CSE 160, and the class has been designed to be accessible to students from either background. Students who have taken 143 are welcome to take this class, as it will serve as a complement to the material learned in 143 with only minor overlap.

Because this course will have students coming from many different class backgrounds, the first couple of weeks will be pretty different for students depending on which classes they have taken. Here is what we expect students to see in the first weeks based on their background:

  • 142: The first two weeks might go pretty fast, but will be doable since you already know all the concepts (loops, conditionals, methods) and you are just learning the new "words" in Python to use those concepts. This might require a little bit of extra practice early in the quarter so you are comfortable translating the ideas you learned in 142 to this new language. The first week has been designed to be a recap of all things 142 so you don't also have to learn a ton of new material while learning a new language.
  • 160: The first week will just be a bit of review for you, but the class will start covering material you haven't seen before starting in the second week.
  • 143: You are in a similar boat to the 142 students: you know a lot of the concepts but don't know the Python language. You'll probably see a few things in this class that you saw in 143, but I think the new context of processing data in a new language will still keep it new, exciting, and challenging.

If you want to learn more about the policies and structure for this class, please check the course syllabus.

Calendar

Note: This is a rough sketch of the quarter that is likely to change. We can accurately predict the past, but predicting the future is hard!

Day
Topic
Materials
References
Assignments
Week 1: Intro/Review Python
Lecture 1
(Mon, Jan 6)
Class Introduction
Intro/Review to Python
Lecture 2
(Wed, Jan 8)
Loops, Conditionals, Functions, Strings
Section 1
(Thu, Jan 9)
Python Review
Lecture 3
(Fri, Jan 10)
Lists, Files, HW1
Week 2: Data Structures, CSVs, Pandas
Lecture 4
(Mon, Jan 13)
More Lists, Sets, Dictionaries, Tuples
  • Practice : Ed
Lecture 5
(Wed, Jan 15)
Advanced Data Structures, CSVs, Intro to Pandas
Section 2
(Thu, Jan 16)
Pandas Practice
Lecture 6
(Fri, Jan 17)
More Pandas
Week 3: Data Science Libraries
Lecture 7
(Mon, Jan 20)
Holiday 🏖
  • No Reading Due
Lecture 8
(Wed, Jan 22)
Data Visualization
Section 3
(Thu, Jan 23)
More Pandas + Data Visualization
Lecture 9
(Fri, Jan 24)
Machine Learning
Week 4: Classes, Modules, Text Data
Lecture 10
(Mon, Jan 27)
Introduction to Classes / Objects
Lecture 11
(Wed, Jan 29)
More classes, Modules, Packages
Section 4
(Thu, Jan 30)
Classes/Modules
Lecture 12
(Fri, Jan 31)
HW4 Introduction
Week 5: Efficiency: Time and Space
Lecture 13
(Mon, Feb 3)
Algorithmic Efficiency
Lecture 14
(Wed, Feb 5)
Profiling Code + Performance
Section 5
(Thu, Feb 6)
TA's Choice / Open Office Hours
Lecture 15
(Fri, Feb 7)
Memory Management
  • No Reading Due
Week 6
Lecture 16
(Mon, Feb 10)
Hashing
Lecture 17
(Wed, Feb 12)
Exam Review
Section 6
(Thu, Feb 13)
Exam Review
Lecture 18
(Fri, Feb 14)
Exam 1
  • No Reading Due
Week 7: Geospatial Data
Lecture 19
(Mon, Feb 17)
Holiday 🏖
  • No Reading Due
Lecture 20
(Wed, Feb 19)
Geospatial Data
Section 7
(Thu, Feb 20)
Geospatial Data
Lecture 21
(Fri, Feb 21)
Joins and Spatial Indices
Week 8: Images
Lecture 22
(Mon, Feb 24)
Numpy and Images
Lecture 23
(Wed, Feb 26)
Image Processing
Section 8
(Thu, Feb 27)
Images and Numpy
Lecture 24
(Fri, Feb 28)
Machine Learning: Images
Week 9: Miscellaneous Topics
Lecture 25
(Mon, March 2)
Ethics
Lecture 26
(Wed, March 4)
Exam Review
Section 9
(Thu, March 5)
Exam Review
  • Handout : pdf
  • Solutions : pdf
Lecture 27
(Fri, March 6)
Distributed Computing
Week 10: Course Wrap Up
Lecture 28
(Mon, March 9)
Exam 2
Lecture 29
(Wed, March 11)
Victory Lap + Next Steps
Section 10
(Thu, March 12)
Office Hours
Lecture 30
(Fri, March 13)
TBD
Finals Week
Final Slot
(Thu, March 19)
Final Project Presentations
(2:30 pm - 4:20 pm)
  • Project Part 3 Due