In this part of the homework, you will write code to perform various analytical operations on data parsed from a file.
Your code for this part goes in hw3.py. You may import the math and pandas modules, but you may not use any other imports to solve these problems. You should not write loops over pandas objects. The one exception is when a function asks you to return a Python list/dictionary and you need to convert from a pandas object to a Python list/dictionary; in that case you may have one loop at the end of the function to build the list/dictionary, but that loop should contain no real logic beyond moving values from the pandas structure to the list/dictionary. The goal of this part of the assignment is to use pandas as a tool to help answer questions about your dataset.
In your main method, parse the data from the CSV file using pandas. Note that the file uses '---' as the entry that represents missing data. The pandas function that reads a CSV file takes a parameter called na_values, a list of strings that mark missing values in the file; every entry matching one of those strings is replaced with NaN. You should specify this parameter to make sure the data parses correctly.
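The effect of na_values can be seen in a small sketch. It reads an in-memory stand-in for the file rather than the real CSV; the two rows and the column subset are copied from the example table in this spec, and the name CSV_TEXT is only an illustration.

```python
import io
import pandas as pd

# In-memory stand-in for the CSV file. The values are copied from the 2007
# example rows in this spec, with the missing entry written as '---'.
CSV_TEXT = """Year,Sex,Min degree,Total,Pacific Islander
2007,F,bachelor's,33.0,32.1
2007,F,master's,7.6,---
"""

# With na_values, '---' becomes NaN and the column parses as numbers.
data = pd.read_csv(io.StringIO(CSV_TEXT), na_values=['---'])

# Without it, '---' stays a string and the whole column is read as text.
without_flag = pd.read_csv(io.StringIO(CSV_TEXT))

print(data['Pacific Islander'].isna().sum())        # count of missing entries
print(without_flag['Pacific Islander'].iloc[1])     # the raw '---' string
```

In the assignment itself, the first argument to read_csv would be the path to the CSV file instead of a StringIO object.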
completions_between_years
What percentage of each degree was completed for a given year range and sex? Write a function completions_between_years that takes as arguments a pandas DataFrame object, two year arguments, and a value for sex ('A', 'F', or 'M'). The function should return all rows of the data that match the given sex and fall between the given years (inclusive for the start, exclusive for the end). If no data is found for the parameters, return None.
For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data, the call completions_between_years(data, 2007, 2008, 'F') will return:
 | Year | Sex | Min degree | Total | White | Black | Hispanic | Asian | Pacific Islander | American Indian/Alaska Native | Two or more races |
---|---|---|---|---|---|---|---|---|---|---|---|
152 | 2007 | F | high school | 89.1 | 94.2 | 87.9 | 70.7 | 98.5 | 86.0 | 90.2 | 87.9 |
168 | 2007 | F | associate's | 43.2 | 50.8 | 28.0 | 23.5 | 69.6 | 42.5 | 14.5 | 40.2 |
186 | 2007 | F | bachelor's | 33.0 | 39.2 | 20.0 | 15.4 | 62.5 | 32.1 | NaN | 29.6 |
202 | 2007 | F | master's | 7.6 | 9.4 | 3.7 | 2.6 | 17.7 | NaN | NaN | NaN |
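Row selection of this kind is usually expressed in pandas with a boolean mask rather than a loop. The sketch below is one possible shape, not a reference solution: rows_in_range is a hypothetical helper name, and the toy frame mixes two 2007 rows from the table above with placeholder rows (Total of 0.0) that exist only so the filter has something to exclude.

```python
import pandas as pd

# Toy frame. The first two rows come from the example table; the last two
# are made-up placeholders used only to exercise the filter.
toy = pd.DataFrame({
    'Year': [2007, 2007, 2008, 2007],
    'Sex': ['F', 'F', 'F', 'M'],
    'Min degree': ['high school', "bachelor's", 'high school', 'high school'],
    'Total': [89.1, 33.0, 0.0, 0.0],
})


def rows_in_range(df, start, end, sex):
    """Hypothetical helper: rows with start <= Year < end and matching Sex."""
    # Each comparison yields a boolean Series; & combines them element-wise.
    mask = (df['Year'] >= start) & (df['Year'] < end) & (df['Sex'] == sex)
    result = df[mask]
    # An empty selection is reported as None, as the spec requires.
    return None if result.empty else result


print(rows_in_range(toy, 2007, 2008, 'F'))
print(rows_in_range(toy, 1900, 1901, 'F'))
```

The parentheses around each comparison matter, because & binds more tightly than >= and == in Python.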
compare_bachelors_1980
What were the percentages of women vs. men who had earned a bachelor's degree in 1980? Call this function compare_bachelors_1980 and return the percentages as a tuple: (% for men, % for women).
For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data, compare_bachelors_1980(data) will return (24.0, 21.0).
top_2_2000s
What were the two most common levels of educational attainment between 2000 and 2010 (inclusive)? Use the mean percent over the years to compare the education levels. Call this function top_2_2000s and return a list of tuples as follows: [(#1 level, mean % of #1 level), (#2 level, mean % of #2 level)].
For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data, then top_2_2000s(data) will return [('high school', 87.55714285714285), ("associate's", 38.75714285714286)]. Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.
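Averaging per education level and keeping the largest values maps onto groupby, mean, and nlargest. The sketch below uses entirely made-up numbers to show the mechanics; it leaves out the year filter and any choice of which sex rows to include, since those depend on the real data.

```python
import pandas as pd

# Made-up percentages, two "years" per level, purely for illustration.
toy = pd.DataFrame({
    'Min degree': ['high school', 'high school',
                   "associate's", "associate's",
                   "bachelor's", "bachelor's"],
    'Total': [80.0, 90.0, 30.0, 40.0, 20.0, 30.0],
})

# Mean of 'Total' for each level; the result is a Series indexed by level.
means = toy.groupby('Min degree')['Total'].mean()

# The two largest means, in descending order.
top = means.nlargest(2)

# Series.items() yields (index, value) pairs, so no explicit loop is needed.
result = list(top.items())
print(result)
```

Converting with list(top.items()) keeps the final step free of logic, in the spirit of the rule about moving values out of pandas structures.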
percent_change_bachelors_2000s
What is the difference between the total percent of bachelor's degrees received in 2000 and in 2010? Take a sex parameter so the client can specify 'M', 'F', or 'A'. If a call does not specify the sex to evaluate, you should evaluate the percent change for all students (sex = 'A'). Call this function percent_change_bachelors_2000s and return the difference as a float.
For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data, then the call percent_change_bachelors_2000s(data) will return 2.599999999999998. Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.
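An optional argument with a default value covers the case where the caller omits the sex. The sketch below shows that mechanism on made-up numbers, along with a tolerance check in the style described above. The helper name change_2000_to_2010 is hypothetical, and subtracting the 2000 value from the 2010 value is an assumption about the intended direction of the difference.

```python
import math
import pandas as pd

# Made-up 'Total' values, for illustration only.
toy = pd.DataFrame({
    'Year': [2000, 2010, 2000, 2010],
    'Sex': ['A', 'A', 'F', 'F'],
    'Min degree': ["bachelor's"] * 4,
    'Total': [10.0, 12.5, 11.0, 15.0],
})


def change_2000_to_2010(df, sex='A'):
    """Hypothetical helper: 2010 'Total' minus 2000 'Total' for one sex."""
    rows = df[(df['Sex'] == sex) & (df['Min degree'] == "bachelor's")]
    # Index by year so each value can be looked up by its label.
    by_year = rows.set_index('Year')['Total']
    return float(by_year.loc[2010] - by_year.loc[2000])


print(change_2000_to_2010(toy))        # sex defaults to 'A'
print(change_2000_to_2010(toy, 'F'))   # sex given explicitly

# A value such as 2.599999999999998 passes a 0.001 tolerance check against 2.6.
print(math.isclose(2.599999999999998, 2.6, abs_tol=0.001))
```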