What is this course about?
Data analysis is a central activity for scientific research and is increasingly a critical part of decision making in government and business. However, producing reliable data analysis outcomes is challenging, since the decisions made throughout the analysis process can dramatically affect the eventual outcome. This Data Science Capstone focuses on the complete end-to-end process of data analysis performed with code: the iterative, and often exploratory, steps that analysts go through to turn data into results. Our focus is not limited to statistical modeling or machine learning; it covers the complete process, including transformation, exploration, modeling, and evaluation choices.
Students will work in groups of four on a single project that ties together and applies previous experience from CSE 312, 332, 446, 442, 344, and other classes. Students are expected to already know the appropriate machine learning, visualization, and database methods, and will focus on independently applying those methods in the context of their project. There will therefore be limited lecture material in this course; course staff will instead work closely with students to critique and advise on their group project. Students will experience the end-to-end data analysis process, from transformation and exploration of data to modeling and evaluation. Each group will brainstorm project ideas during the first week, then collaboratively explore the data and implement a complete data analysis workflow. This capstone course gives hands-on experience with selecting a data science question and with crafting and evaluating a data science process to answer that question.
Students should have completed CSE 332 and CSE 312, and at least one of CSE 446, CSE 442, or CSE 344. There are no other requirements for participating in this capstone class.