CSE 163, Winter 2020: Homework 5: Part 1

Overview

In this part, we will perform several data analyses on the combined dataset you created in Part 0.

Expectations

  • For this part of the assignment, you may import and use the math, matplotlib.pyplot, geopandas, and pandas packages, but you may not use any other imports.
  • The first line of your file should be a comment with your uwnetid.
  • Every plot should have a descriptive title you come up with. For problem 4, we tell you what the titles of the plots should be. You do not need to make labels for the x-axis or y-axis.
  • For full credit, your solutions to these problems must not use any loops or list comprehensions that iterate over the entire dataset.

Data Analysis

Each of the functions below should be written in hw5_main.py and should take the merged data from Part 0 as a parameter.

Problem 0: percentage_food_data

Write a function called percentage_food_data that returns the percentage of census tracts in Washington that we have food access data for. The returned number should be a float between 0 and 100 (e.g. 73.212456). Note that this example is not necessarily the expected output; it is only meant to show what the returned number should look like.
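As a rough sketch of the idea on a toy table: assuming the merge in Part 0 leaves missing values (NaN) in the food access columns for tracts without data, the percentage is the share of rows with a non-missing value. The helper name percentage_non_missing, the tract_id column, and the toy values are hypothetical, and which column to test depends on your own merged data.

```python
import pandas as pd


def percentage_non_missing(data, column):
    """Return the percentage (0 to 100) of rows whose `column` is not missing."""
    # notna() gives a boolean Series; its mean is the fraction of True values.
    return float(data[column].notna().mean() * 100)


# Hypothetical toy data: 4 tracts, one of which has no food access data.
toy = pd.DataFrame({
    'tract_id': [1, 2, 3, 4],
    'POP2010': [120.0, None, 300.0, 80.0],
})
result = percentage_non_missing(toy, 'POP2010')
```

Note that the computation is vectorized, so it does not loop over the rows of the dataset.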

Problem 1: plot_map

Write a function called plot_map that plots a map of Washington. There is no need to customize this plot or add any data on top of it; it should just plot the shape of all the census tracts. The output should look like Washington state. You should save the plot in a file called washington_map.png.

Problem 2: plot_population_map

Write a function called plot_population_map that plots a map of Washington with each census tract colored by its population. It is expected that there will be some missing census tracts. You should also include a legend to indicate what the colors mean. You should save the plot in a file called washington_population_map.png.

Problem 3: plot_population_county_map

Write a function called plot_population_county_map that plots a map of Washington with each county colored by its population. You'll need to aggregate the census tract data to be for each county instead. It is expected that there will be some missing counties. You should also include a legend to indicate what the colors mean. You should save the plot in a file called washington_county_population_map.png.
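The aggregation and coloring described in Problems 2 and 3 might look roughly like the sketch below. It uses made-up square "tracts" in two made-up counties rather than the real data, and it assumes geopandas' dissolve method is an acceptable way to aggregate tracts into counties; the toy names and values are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: three unit squares standing in for census tracts.
tracts = gpd.GeoDataFrame(
    {
        'County': ['A', 'A', 'B'],
        'POP2010': [10, 20, 5],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

# Merge the tract shapes within each county and sum the numeric columns.
counties = tracts.dissolve(by='County', aggfunc='sum')

# Color each county by its population and draw a legend for the colors.
ax = counties.plot(column='POP2010', legend=True)
ax.set_title('Toy county populations')
plt.close(ax.get_figure())
```

Plotting the un-dissolved tracts with the same column and legend arguments gives the per-tract version of the map.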

Problem 4: plot_food_access_by_county

For this problem, you will be writing a function called plot_food_access_by_county that takes the merged data as a parameter and makes various plots on the same figure showing information about food access and low income. This problem is more complicated than the others so we will provide a breakdown of the steps needed to solve it (some with provided code). Here is the final result that you should produce.

[Figure: Food Access]

  1. To reduce some of the computation on unnecessary data, make a copy of the GeoDataFrame that only has the columns 'County', 'geometry', 'POP2010', 'lapophalf', 'lapop10', 'lalowihalf', 'lalowi10'.
  2. Aggregate this dataset by county, summing up all of the numeric columns.
  3. Compute columns named 'lapophalf_ratio', 'lapop10_ratio', 'lalowihalf_ratio', 'lalowi10_ratio' that store the ratio of people in that county that fall under each group respectively. These columns should be added to the local copy of the dataset.

    Clarifying Example

    For example if we had a row for a county with the following data (shown as a dictionary for simplicity):

    {
      'County': 'Hunter County',
      'geometry': ...,
      'POP2010': 50,
      'lapophalf': 15,
      'lapop10': 3,
      'lalowihalf': 7,
      'lalowi10': 1
    }
    Then after this step, the row would have the data:
    {
      'County': 'Hunter County',
      'geometry': ...,
      'POP2010': 50,
      'lapophalf': 15,
      'lapop10': 3,
      'lalowihalf': 7,
      'lalowi10': 1,
      'lapophalf_ratio': 0.30,
      'lapop10_ratio': 0.06,
      'lalowihalf_ratio': 0.14,
      'lalowi10_ratio': 0.02
    }

  4. Create a figure with subplots. To do this, you will use the following line of code which will create a figure with 4 separate axes to draw subplots.
    fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, figsize=(20, 10), ncols=2)
    This line of code looks complicated, but all you need to know is that the variable fig stores a reference to the whole figure (i.e. the picture) and each of the variables that start with ax stores a reference to one of the subplots' axes.
  5. For each of the ratio columns you computed, you should plot it by calling the plot function on the dataset and changing the color by specifying the column you want. As before, each plot should have a legend. You'll need to specify the ax parameter and pass in the axis from the previous step to have it draw in the proper place. To keep things consistent, you should also specify vmin and vmax to be 0 and 1 respectively so they all use the same scale.
  6. Set the titles for each axis to match the picture shown above. For example, if you want to change the title of the first subplot you would write the line of code: ax1.set_title('Foo')
  7. Save the figure to a file by calling fig.savefig('washington_county_food_access.png')

If these steps are done correctly, you should end up with a figure like the picture shown above.
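The data preparation in steps 1 to 3 can be tried out on a small table first. The sketch below uses plain pandas with the geometry column left out, so groupby stands in for the aggregation; on the real GeoDataFrame, geopandas' dissolve(by='County', aggfunc='sum') is the analogous call that also merges the shapes. The counties, the 'unused' column, and all values are made up, chosen so that 'Hunter County' matches the clarifying example.

```python
import pandas as pd

# Hypothetical toy tracts in two made-up counties (geometry omitted).
tracts = pd.DataFrame({
    'County': ['Hunter County', 'Hunter County', 'Other County'],
    'POP2010': [30, 20, 100],
    'lapophalf': [10, 5, 50],
    'lapop10': [2, 1, 0],
    'lalowihalf': [4, 3, 25],
    'lalowi10': [1, 0, 0],
    'unused': ['a', 'b', 'c'],
})

groups = ['lapophalf', 'lapop10', 'lalowihalf', 'lalowi10']

# Step 1: keep only the columns of interest, as a copy of the original.
subset = tracts[['County', 'POP2010'] + groups].copy()

# Step 2: one row per county, summing the numeric columns.
by_county = subset.groupby('County').sum()

# Step 3: divide every group column by the county population at once,
# then attach the results as new '<name>_ratio' columns.
ratios = by_county[groups].div(by_county['POP2010'], axis=0).add_suffix('_ratio')
by_county = by_county.join(ratios)
```

Dividing whole columns at once keeps the computation free of loops over the dataset.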

Development Strategy: It might help to start by making these on separate plots and then figuring out how to plot them on the same figure.
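To get comfortable with the figure and axes objects before adding any map data, a minimal matplotlib-only sketch of steps 4, 6, and 7 might look like this. The titles and the output path are placeholders, not the ones expected for the homework, and the simple line drawn on each axis only stands in for the map that would be drawn there by passing ax to the dataset's plot function.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Same call as in step 4: a 2x2 grid of axes on one figure.
fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, figsize=(20, 10), ncols=2)

# Stand-in drawings; each axis is drawn on independently.
ax1.plot([0, 1], [0, 1])
ax2.plot([0, 1], [1, 0])
ax3.plot([0, 1], [0, 0])
ax4.plot([0, 1], [1, 1])

# Placeholder titles, one per subplot.
ax1.set_title('Placeholder A')
ax2.set_title('Placeholder B')
ax3.set_title('Placeholder C')
ax4.set_title('Placeholder D')

# Saving through fig writes all four subplots to a single image.
out_path = os.path.join(tempfile.gettempdir(), 'subplot_sketch.png')
fig.savefig(out_path)
plt.close(fig)
```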

Problem 5: plot_low_access_tracts

In this problem, we will plot all of the census tracts that are considered low access. You should write a function called plot_low_access_tracts that saves the information described below in a file named washington_low_access.png. The definition for low access depends on whether or not the census tract is "urban". The data is set up so that each census tract is either "urban" or "rural".

  • Urban: If the census tract is "urban", the distance of interest is half a mile from a food source. The threshold for low access in an urban census tract is at least 500 people or at least 33% of the people in the census tract being more than half a mile from a food source. An urban census tract that satisfies either of these conditions is considered low access.
  • Rural (i.e. non-urban): If the census tract is "rural", the distance of interest is 10 miles from a food source. The threshold for low access in a rural census tract is at least 500 people or at least 33% of the people in the census tract being more than 10 miles from a food source. A rural census tract that satisfies either of these conditions is considered low access.
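The two definitions above can be expressed with boolean masks instead of loops. The sketch below is one possible reading on made-up data: it assumes 'lapophalf' and 'lapop10' hold the number of people beyond half a mile and 10 miles respectively, that the percentage is taken relative to 'POP2010', and that a column named 'Urban' (1 for urban, 0 for rural) marks the tract type. The 'Urban' name is an assumption, so check your merged data for the actual flag.

```python
import pandas as pd

# Hypothetical toy tracts; all values are made up.
tracts = pd.DataFrame({
    'Urban':     [1,    1,   0,    0,   1],
    'POP2010':   [2000, 900, 1500, 100, 1000],
    'lapophalf': [600,  100, 1400, 90,  400],
    'lapop10':   [0,    0,   100,  40,  0],
})

is_urban = tracts['Urban'] == 1

# Urban rule: at least 500 people, or at least 33% of the tract,
# more than half a mile from a food source.
urban_low = (tracts['lapophalf'] >= 500) | (
    tracts['lapophalf'] / tracts['POP2010'] >= 0.33)

# Rural rule: the same thresholds, measured at 10 miles.
rural_low = (tracts['lapop10'] >= 500) | (
    tracts['lapop10'] / tracts['POP2010'] >= 0.33)

# Apply each rule only to the tracts of the matching type.
low_access = (is_urban & urban_low) | (~is_urban & rural_low)
```

In this toy table the third tract is rural, so its large half-mile count is ignored and only its 10-mile count is tested.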

In this problem, you should compute all of the census tracts that match the definition above (depending on whether the tract is urban or rural). We will then make a plot in layers (all on the same axis) to highlight the census tracts that have low food access. Because we are plotting on the same set of axes, each new plot will "draw over" the old one, which allows us to highlight exactly what we want. You should plot the data in the following order.

  • First, plot all of the census tracts. You should pass in color='#EEEEEE' when plotting to make the census tracts a light gray.
  • Second, plot all of the census tracts that we have food access data for. You should pass in color='#AAAAAA' when plotting to make these census tracts a dark gray.
  • Third, plot all of the census tracts that your computation has considered "low access". You should not pass in a color for this plot so that the low access census tracts are highlighted blue.
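The layering order above can be sketched on toy shapes as follows. The three squares, the 'has_data' and 'low_access' columns, and the output path are hypothetical stand-ins; in the homework the second and third layers would come from your own filtering of the merged data.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: three unit squares standing in for census tracts.
tracts = gpd.GeoDataFrame(
    {
        'has_data': [False, True, True],
        'low_access': [False, False, True],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(1)

# Layer 1: every tract in light gray.
tracts.plot(ax=ax, color='#EEEEEE')
# Layer 2: tracts with data, drawn over layer 1 in dark gray.
tracts[tracts['has_data']].plot(ax=ax, color='#AAAAAA')
# Layer 3: low access tracts, drawn last in the default color.
tracts[tracts['low_access']].plot(ax=ax)

out_path = os.path.join(tempfile.gettempdir(), 'layer_sketch.png')
fig.savefig(out_path)
plt.close(fig)
```

Because all three calls receive the same ax, the later layers cover the earlier ones wherever the shapes overlap.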

For this problem, you are NOT allowed to use the 'LATracts_half' or 'LATracts10' columns since we are trying to compute these. As a sanity check, you can verify that the census tracts you computed match the ones indicated by these columns, but you should remove this sanity check code before you submit.