This assignment and its reflection are due by Thursday, January 30 at 11:59 pm.
You should submit your finished hw3.py and hw3-written.txt on Ed and the reflection on Google Forms.
You may submit your assignment as many times as you want before the late cutoff (remember that submitting after the due date will cost late days). Recall that on Ed, you submit by pressing the "Mark" button. You are welcome to develop the assignment on Ed or to develop locally and then upload to Ed before marking.
In this assignment, you will apply what you've learned so far to a more extensive "real-world" dataset using more powerful features of the Pandas library. As in HW2, this dataset is provided in CSV format. We have partially cleaned the data, but you will need to handle more edge cases common to real-world datasets, including null cells that represent unknown information.
Note that there is no graded testing portion of this assignment. We still recommend writing tests to verify the correctness of the methods that you write in Part 0, but it will be difficult to write tests for Part 1 and 2. We've provided tips in those sections to help you gain confidence about the correctness of your solutions without writing formal test functions!
This assignment introduces you to several parts of the data science process: answering questions about your data, visualizing your data, and using your data to make predictions for new data. To help prepare for your final project, this assignment has been designed to be wide in scope so you can get practice with many different aspects of data analysis. While this assignment might look large because there are many parts, each individual part is relatively small.
After this homework, students will be able to:
Here are some baseline expectations you should meet:
Follow the course collaboration policies
If you are developing on Ed, all the files are there. If you are developing locally, you should download the starter code hw3.zip and open it as the project in Visual Studio Code. The files included are:
hw3-nces-ed-attainment.csv: A CSV file that contains data from the National Center for Education Statistics. This is described in more detail below.
hw3.py: The file for you to put solutions to Part 0, Part 1, and Part 2. You are required to add a main method that parses the provided dataset and calls all of the functions you are to write for this homework.
hw3-written.txt: The file for you to put your answers to the questions in Part 3.
cse163_utils.py: Provides utility functions for this assignment. You probably don't need to use anything inside this file except importing it if you have a Mac (see comment in hw3.py).
The dataset you will be processing comes from the National Center for Education Statistics. You can find the original dataset here. We have cleaned it a bit to make it easier to process in the context of this assignment. You must use our provided CSV file in this assignment.
The original dataset is titled: Percentage of persons 25 to 29 years old with selected levels of educational attainment, by race/ethnicity and sex: Selected years, 1920 through 2018. The cleaned version you will be working with has columns for Year, Sex, Educational Attainment, and race/ethnicity categories considered in the dataset. Note that not all columns will have data starting at 1920.
Our provided hw3-nces-ed-attainment.csv looks like the following (⋮ represents omitted rows):
Year | Sex | Min degree | Total | White | Black | Hispanic | Asian | Pacific Islander | American Indian/Alaska Native | Two or more races |
---|---|---|---|---|---|---|---|---|---|---|
1920 | A | high school | --- | 22.0 | 6.3 | --- | --- | --- | --- | --- |
1940 | A | high school | 38.1 | 41.2 | 12.3 | --- | --- | --- | --- | --- |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
2018 | F | master's | 10.7 | 12.6 | 6.2 | 3.8 | 29.9 | --- | --- | --- |
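To get a feel for the null cells before starting, it can help to load a few rows and check how Pandas reads them. The sketch below is only an illustration under some assumptions: that the file is comma-separated, that `---` is the marker for unknown values (as in the preview above), and that passing it through `na_values` is an acceptable way to turn those cells into `NaN`. The names `SAMPLE_CSV` and `load_sample` are made up for this example, and the sample text stands in for the real file.

```python
import io

import pandas as pd

# Three rows copied from the preview above, written as comma-separated text.
# Assumption: the real file uses commas and '---' for unknown values.
SAMPLE_CSV = """Year,Sex,Min degree,Total,White,Black,Hispanic,Asian,Pacific Islander,American Indian/Alaska Native,Two or more races
1920,A,high school,---,22.0,6.3,---,---,---,---,---
1940,A,high school,38.1,41.2,12.3,---,---,---,---,---
2018,F,master's,10.7,12.6,6.2,3.8,29.9,---,---,---
"""


def load_sample(csv_text):
    """Hypothetical helper: parse CSV text, treating '---' cells as NaN."""
    return pd.read_csv(io.StringIO(csv_text), na_values=['---'])


data = load_sample(SAMPLE_CSV)

# Number of rows where the Total column is unknown.
missing_total = data['Total'].isna().sum()

# Pandas skips NaN by default when aggregating, so this averages 38.1 and 10.7.
mean_total = data['Total'].mean()

# Earliest year in the sample that has a known Total value.
first_year_with_total = data.dropna(subset=['Total'])['Year'].min()

print(missing_total, mean_total, first_year_with_total)
```

With the real dataset you would pass the file name to `pd.read_csv` instead of an in-memory string. Checking `isna()` counts per column is a quick way to confirm that no column was read as text because of the `---` cells.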
Part 4a: Submit Assignment and Part 4b: Complete Reflection. On Ed, you should submit:
hw3.py
hw3-written.txt
Your submission will be evaluated on the following dimensions:
hw3.py uses the main method structure we've shown on previous assignments.
hw3.py must call every one of the methods you implemented in this assignment. There are no requirements on the format of the output, besides that it should save the files for Part 1 with the proper names specified in Part 1.
hw3.py runs without crashing or causing any errors.
flake8
snake_case), your code should not be overly redundant and should avoid unnecessary computations.
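As a reminder of what a main method structure can look like, here is a minimal sketch. It is not the official template: the exact structure from previous assignments is not reproduced on this page, `count_rows_missing_total` is a hypothetical placeholder for one of your own functions, the file-existence check is an extra added so the sketch runs anywhere, and `na_values=['---']` assumes that `---` marks unknown cells.

```python
import os

import pandas as pd

# Name of the provided dataset from the starter code.
DATA_FILE = 'hw3-nces-ed-attainment.csv'


def count_rows_missing_total(data):
    """Hypothetical placeholder for one of your functions (not in the spec)."""
    return int(data['Total'].isna().sum())


def main():
    # Assumption: stop politely if the script is run outside the starter folder.
    if not os.path.exists(DATA_FILE):
        print(DATA_FILE, 'not found; run this from the starter-code folder')
        return
    # Assumption: '---' marks unknown values in the provided CSV.
    data = pd.read_csv(DATA_FILE, na_values=['---'])
    # Call every function you wrote from here so that running the file
    # exercises all of them.
    print(count_rows_missing_total(data))


if __name__ == '__main__':
    main()
```

Keeping all calls inside `main` and guarding it with `if __name__ == '__main__':` means the file can be imported (for example, by a test file) without running the whole analysis.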