CSE 163, Spring 2019: Intermediate Data Programming

🎉 Welcome to the inaugural offering of CSE163! 🎉

What is this class? What will I learn?

The world has become data-driven. Domain scientists and industry increasingly rely on data analysis to drive innovation and discovery. This reliance on data is not restricted to science or business; it is also crucial to those in government and public policy, and to anyone who wants to be an informed citizen. As the size of data continues to grow, everyone will need to use powerful tools to work with that data.

This course teaches intermediate data programming. It is a follow-on to CSE142 (Computer Programming I) or CSE160 (Data Programming).

The course complements CSE143, which focuses more deeply on fundamental programming concepts and the internals of data structures. In contrast, CSE163 emphasizes the efficient use of those concepts for data programming.

In this course, students will learn:

  1. More advanced programming concepts than in CSE142 or CSE160, including how to write bigger programs with multiple classes and modules.
  2. How to work with different types of data: tabular, text, images, geo-spatial.
  3. The ecosystem of data science tools, including Jupyter Notebook and data science libraries such as scikit-image, scikit-learn, and Pandas data frames.
  4. Basic concepts related to code complexity, efficiency of different types of data structures, and memory management.

Prerequisites and Expectations

This class is designed as a second introductory programming course that focuses on writing programs that work with data. The prerequisite is CSE 142 or CSE 160, and the class has been designed to be accessible to students from either of those backgrounds. Students who have taken 143 are welcome to take this class; it will serve as a complement to the material learned in 143, with only minor overlap.

Because this course will have students coming from many different class backgrounds, the first couple of weeks will look pretty different depending on which classes you have taken. Here is what we expect students to see in the first weeks based on their background:

  • 142: The first two weeks might go pretty fast, but will be doable since you already know all the concepts (loops, conditionals, methods) and you are just learning the new "words" in Python to use those concepts. This might require a little bit of extra practice early in the quarter so you get comfortable translating the ideas you learned in 142 to this new language. The first week has been designed as a recap of all things 142 so you don't have to learn a ton of new material while also learning a new language.
  • 160: The first week will just be a bit of review for you, but the class will start covering material you haven't seen before starting in the second week.
  • 143: You are in the same boat as the 142 students: you know a lot of the concepts but don't know the Python language. You'll probably see a few things in this class that you saw in 143, but I think the new context of processing data in a new language will still keep it new, exciting, and challenging.
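To give a feel for what "learning the new words" means, here is a minimal sketch of a loop, a conditional, and a function in Python. It is only an illustration, not actual course material: the function name `count_long_words` and the sample word list are made up for this example.

```python
# Hypothetical example (not from the course materials): the same ideas
# you know from 142 (a method, a loop, a conditional) written in Python.

def count_long_words(words, min_length):
    """Return how many strings in words have at least min_length characters."""
    count = 0
    for word in words:                 # loop over the elements directly
        if len(word) >= min_length:    # conditional
            count += 1
    return count


# "programming" and "python" have 6 or more characters; "data" does not.
print(count_long_words(["data", "programming", "python"], 6))  # prints 2
```

Notice that there are no type declarations, braces, or semicolons; indentation marks where each block begins and ends.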

If you want to learn more about the policies and structure for this class, please check the course syllabus.

Calendar

Note: This is a rough sketch of the quarter that is likely to change. We can accurately predict the past, but predicting the future is hard!

Each entry gives the session, its date, and its topics, followed by any posted materials, references, or assignments.

Week 1: Intro/Review Python
  • Lecture 1 (Mon, April 1): Class Introduction; Intro/Review to Python
  • Lecture 2 (Wed, April 3): Loops, Conditionals, Functions, Strings
  • Section 1 (Thur, April 4): Python Review
  • Lecture 3 (Fri, April 5): Lists and Files

Week 2: Data Structures, CSVs, Pandas
  • Lecture 4 (Mon, April 8): More Lists, Sets, Dictionaries, Tuples
  • Lecture 5 (Wed, April 10): Advanced Data Structures, CSVs, Intro to Pandas
  • Section 2 (Thur, April 11): Pandas Practice
  • Lecture 6 (Fri, April 12): More Pandas

Week 3: Data Science Libraries
  • Lecture 7 (Mon, April 15): Missing Data & Time Series
  • Lecture 8 (Wed, April 17): Data Visualization
  • Section 3 (Thur, April 18): More Pandas + Data Visualization
  • Lecture 9 (Fri, April 19): Machine Learning

Week 4: Classes, Modules, Text Data
  • Lecture 10 (Mon, April 22): Introduction to Classes / Objects
  • Lecture 11 (Wed, April 24): Modules, Packages, and Processing Text
  • Section 4 (Thur, April 25): Classes/Modules and Processing Text
  • Lecture 12 (Fri, April 26): Classes + HW6

Week 5: Efficiency: Time and Space
  • Lecture 13 (Mon, April 29): Algorithmic Efficiency
  • Lecture 14 (Wed, May 1): Profiling Code + Performance
  • Section 5 (Thur, May 2): TA's Choice
  • Lecture 15 (Fri, May 3): Memory Management

Week 6
  • Lecture 16 (Mon, May 6): Hashing
  • Lecture 17 (Wed, May 8): Exam Review
  • Section 6 (Thur, May 9): Exam Review
  • Lecture 18 (Fri, May 10): Exam 1

Week 7: Geospatial Data
  • Lecture 19 (Mon, May 13): GeoSpatial Data / Geopandas
  • Lecture 20 (Wed, May 15): Joins / Spatial Indices
  • Section 7 (Thur, May 16): Spatial Joins and Spatial Indices
  • Lecture 21 (Fri, May 17): Ethics

Week 8: Images
  • Lecture 22 (Mon, May 20): Numpy
  • Lecture 23 (Wed, May 22): Images
  • Section 8 (Thur, May 23): Images and Numpy
  • Lecture 24 (Fri, May 24): Machine Learning: Images

Week 9: Integrating Data
  • Lecture 25 (Mon, May 27): No School
  • Lecture 26 (Wed, May 29): Distributed Computing
  • Section 9 (Thur, May 30): Exam Review. Materials: Handout (pdf), Solutions (pdf)
  • Lecture 27 (Fri, May 31): Exam Review

Week 10: Next Steps
  • Lecture 28 (Mon, June 3): Exam 2
  • Lecture 29 (Wed, June 5): Web Scraping
  • Section 10 (Thur, June 6): Project Help
  • Lecture 30 (Fri, June 7): Victory Lap + Next Steps

Finals Week
  • Final Exam Slot (Tues, June 11): Final Project Presentations (2:30 pm - 4:20 pm)