Logistics

Instructor



Content

What is this course about?

Data analysis is a central activity in scientific research and is increasingly a critical part of decision making in government and business. However, producing reliable data analysis outcomes is challenging, since the decisions made throughout the analysis process can dramatically affect the eventual outcome. This Data Science Capstone focuses on the complete end-to-end process of data analysis performed with code: the iterative, and often exploratory, steps that analysts go through to turn data into results. Our focus is not limited to statistical modeling or machine learning; it covers the complete process, including transformation, exploration, modeling, and evaluation choices.

Students will work in groups of four on a single project that ties together and applies previous experience from CSE 312, 332, 446, 442, 344, and other classes. Students are expected to already know the appropriate machine learning, visualization, and database methods, and will focus on independently applying those methods in the context of their project. There will therefore be limited lecture material in this course. Course staff will instead work closely with students to critique and advise on their group project. Students will experience the end-to-end data analysis process, from transformation and exploration of data to modeling and evaluation. Each group will brainstorm project ideas during the first week, then collaboratively explore the data and implement a complete data analysis workflow. This capstone course gives hands-on experience with selecting a data science question, and with crafting and evaluating a data science process to answer that question.

Prerequisites

Students should have completed CSE 332 and CSE 312, and at least one of CSE 446, CSE 442, or CSE 344. There are no other requirements for participating in this capstone class.


Schedule

Note: Lectures will be conducted via Zoom. Links are posted on Canvas.

Lecture slides will be posted here shortly before each lecture.

This schedule is subject to change.

Date | Today's Class | Due at Midnight before Class (see Deliverables for details)
--- | --- | ---
Tue Oct 6 | Introduction, Project Pitches, and Group Assignment [slides] | Mandatory Project Pitches
Tue Oct 13 | Data Science Process and Objectives [slides] | Project Plan & Project Selection Reflection
Tue Oct 20 | Data Science by Example [slides] | Reflection on example data science paper
Tue Oct 27 | Data Science at Scale [slides] [demo live] [demo pdf] | Validity Reflection Presentation
Tue Nov 3 | Communicating Data Science through Visualization [slides] [demo live] | Spark Word Count Assignment
Tue Nov 10 | Midpoint Project Presentations and Feedback | Midpoint Presentation Video
Tue Nov 17 | Data Science through Causal Inference I [slides] | Midpoint Feedback Reflection and Action Plan
Tue Nov 24 | Data Science through Causal Inference II [slides] [live demo] | -
Tue Dec 1 | Technical Writing for Data Science [slides] | -
Tue Dec 8 | Final Project Presentations and Feedback | Final Presentation Video
Sun Dec 13 | - | Final Project Report & Summary of Individual Contribution to Project & Final Reflection