CSE 163, Spring 2019: Homework 3: Part 1

Plotting with Seaborn

Next, you will write functions to generate data visualizations using the Seaborn library. For each function, save the generated graph under the specified name. Each function should take only the pandas DataFrame as a parameter, and the generated graph should omit any years that are missing the data pertinent to the problem.
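As a minimal sketch of omitting rows that are missing the data for a problem, the snippet below drops rows whose plotted column has no value. The column names 'Year' and 'Total' are assumptions, the values are toy numbers rather than real data, and it assumes missing entries have already been read in as NaN.

```python
import pandas as pd

# Toy values for illustration only; 'Year' and 'Total' are assumed column names.
df = pd.DataFrame({
    'Year': [2000, 2001, 2002],
    'Total': [1.0, None, 3.0],  # None becomes NaN, standing in for missing data
})

# Keep only the rows where the column being plotted has a value.
complete = df.dropna(subset=['Total'])
print(complete['Year'].tolist())
```

Passing subset limits the check to the column you plan to plot, so a year is not discarded just because some unrelated column is empty.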

Expectations

  • All functions for this part of the assignment should be written in hw3.py
  • For this part of the assignment, you may import the math, pandas, seaborn, and matplotlib modules, but you may not use any other imports to solve these problems.
  • For all of the problems below, you should not use ANY loops or list/dictionary comprehensions.

Development Strategy

As stated in the Overview, it is difficult to write tests for functions that create graphs. Instead, you can check the graphs manually. Some ways to gain confidence in your generated graph:

  • Print your filtered DataFrame before creating the graph to ensure you’re selecting the correct data.
  • Call the DataFrame describe() method to see some statistical information about the data you've selected. This can sometimes help you determine what to expect in your generated graph.
  • Re-read the problem statement to make sure your generated graph is answering the correct question.
  • Compare the data on your graph to the values in hw3-nces-ed-attainment.csv. For example, for Problem 0 you could check that the generated line goes through the point (2005, 28.8) because of this row in the dataset: 2005,A,bachelor's,28.8,34.5,17.6,11.2,62.1,17.0,16.4,28.0
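The checks above might look something like the following sketch. It builds a tiny DataFrame by hand instead of reading the CSV: the first row uses the first four values of the 2005 row quoted above, the second row is a made-up placeholder so the filter has something to exclude, and the column names 'Year', 'Sex', 'Min degree', and 'Total' are assumptions about the file's header.

```python
import pandas as pd

# Column names are assumptions. Only the first row's values come from the
# dataset row quoted in the text; the second row is a placeholder.
data = pd.DataFrame({
    'Year': [2005, 2005],
    'Sex': ['A', 'A'],
    'Min degree': ["bachelor's", 'placeholder'],
    'Total': [28.8, 0.0],
})

# Select rows with a boolean mask (no loops needed).
mask = (data['Sex'] == 'A') & (data['Min degree'] == "bachelor's")
filtered = data[mask]

# Inspect the selection before plotting it.
print(filtered)
print(filtered.describe())

# Spot-check one point that the plotted line should pass through.
point = filtered.loc[filtered['Year'] == 2005, 'Total'].iloc[0]
print(point)
```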

Seaborn Reference

Of all the libraries we will learn this quarter, Seaborn is by far the best documented. We want to give you experience reading real-world documentation to learn how to use a library, so we will not be providing a specialized cheat-sheet for this assignment. To save you from looking through pages and pages of documentation, we link to some key pages you might find helpful for this assignment. You do not have to use every page we link, so part of the challenge here is figuring out which of these pages you need. As a data scientist, a huge part of solving a problem is learning how to skim lots of documentation for a tool that you might be able to leverage to solve your problem.

We recommend reading the documentation in the following order:

  • Start by skimming the examples to see the possible things the function can do. Don't spend too much time trying to figure out what the code is doing yet, but you can quickly look at it to see how much work is involved.
  • Then read the top paragraph(s) that give a general overview of what the function does.
  • Now that you have a better idea of what the function is doing, go look back at the examples and look at the code much more carefully. When you see an example like the one you want to generate, look carefully at the parameters it passes and go check the parameter list near the top for documentation on those parameters.
  • It sometimes (but not always) helps to skim the other parameters in the list, just so you have an idea of what the function is capable of doing.

As a reminder, you will want to refer to the lecture/section material to see the additional matplotlib calls you might need in order to display/save the plots. You'll also need to call the set function on seaborn to get everything set up initially.

Here are the seaborn functions you might need for this assignment:

Problems

Problem 0) Line Chart

Plot, as a line chart over the years, the total percentage of all people whose minimum degree completed is a bachelor's degree. Name your method line_plot_bachelors and save your generated graph as line_plot_bachelors.png.

Problem 1) Bar Chart

Plot, as a bar chart, the total percentages of women, men, and all people whose minimum education is a high school degree in the year 2009. Name your method bar_chart_high_school and save your generated graph as bar_chart_high_school.png.

Do you think this bar chart is an effective data visualization? Include your reasoning in hw3-written.txt as described in Part 3.

Problem 2) Custom Plot

Plot how the percentage of Hispanic individuals with degrees changed between 1990 and 2010 (inclusive) for high school and bachelor's degrees, using a chart of your choice. Name your method plot_hispanic_min_degree and save your visualization as plot_hispanic_min_degree.png.

Include a justification of your choice of data visualization in hw3-written.txt, as described in Part 3.
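For the inclusive year range in Problem 2, one possible way to select rows without loops is sketched below. The frame holds toy rows, and the column names 'Year' and 'Min degree' and the label 'high school' are assumptions about how the dataset is encoded.

```python
import pandas as pd

# Toy rows for illustration only; column names and labels are assumptions.
df = pd.DataFrame({
    'Year': [1980, 1990, 2000, 2010, 2017],
    'Min degree': ['high school', "bachelor's", "master's",
                   'high school', "bachelor's"],
})

# Series.between includes both endpoints by default, so 1990 and 2010 are kept.
in_years = df['Year'].between(1990, 2010)

# isin matches any of the listed degree labels.
wanted_degree = df['Min degree'].isin(['high school', "bachelor's"])

selected = df[in_years & wanted_degree]
print(selected['Year'].tolist())
```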