Learning objective: Analyze and plot geospatial data to investigate food deserts in Washington.
- hw5.py is where you put your implementations. The Run button executes this program and cse163_imgd.py.
- hw5-writeup.md is the file for your writeup. Instead of testing, this assessment emphasizes reflection on our data analysis.
- cse163_imgd.py is a helper file that checks your plot outputs against the expected output and creates an image showing any pixel differences. The Run button executes this program and your hw5.py.
- expected is a folder containing the expected output for the 5 plotting functions you’ll be writing. Don’t modify the contents of this folder.
- Playground.ipynb is a Jupyter Notebook playground. Feel free to edit it to prototype ideas and explore the data.
Info
The Run button works differently in this assessment than in previous ones. Once you’ve implemented plotting functions in hw5.py with calls to plt.savefig(), Run generates images that highlight in red the pixels that differ between your plot and the expected plot. If an image is blank, all the pixels match; if there are no differences at all, Run displays a message saying that no differences were found. A summary of the percentage of matching pixels appears in the console.
A working plot can still be reported as different if there is even the slightest difference in the outlines of counties. You can still receive an E on behavior and earn full credit as long as your plots look the same as the expected images to the human eye.
Context
“Food deserts” are neighborhoods where residents do not have nearby access to grocery stores offering affordable and nutritious food. In a June 2009 report to the US Congress, the US Department of Agriculture reports:
According to data from the latest census (2000), about 23.5 million people, or 8.4 percent of the U.S. population, live in low-income neighborhoods that are more than a mile from a supermarket. Low-income neighborhoods are areas where more than 40 percent of the population has income less than or equal to 200 percent of the Federal poverty threshold ($44,000 per year for a family of four in 2008).
In this assessment, we’ll join 2010 US census data with food access data to investigate food deserts in Washington state. A census tract is defined as a food desert if enough people in the tract do not have nearby access to food sources.
- In urban areas, “low access” means living more than 0.5 miles from a food source; the number of such people is stored in the column lapophalf.
- In rural areas, “low access” means living more than 10 miles from a food source; the number of such people is stored in the column lapop10.
The 2010 US census dataset is geospatial data in shapefile format. The only columns you need to understand are CTIDFP00, the census tract identifier, and geometry, the geometric shape of the tract.
The food access dataset is tabular data in CSV format. Each row corresponds to one census tract, and the dataset covers every state in the country. The data has many columns, but you only need to understand the following:
- CensusTract is the census tract identifier.
- State is the state name for the census tract.
- County is the county name for the census tract.
- Urban is a flag (0 or 1) that indicates whether this census tract is an urban environment.
- Rural is a flag that indicates whether this census tract is a rural environment.
- LATracts_half is a flag that indicates whether this census tract is “low access” in a half-mile radius.
- LATracts10 is a flag that indicates whether this census tract is “low access” in a 10-mile radius.
- POP2010 is the number of people in this census tract according to the 2010 census.
- lapophalf is the number of people in this census tract considered to have “low access” in a half-mile radius.
- lapop10 is the number of people in this census tract considered to have “low access” in a 10-mile radius.
- lalowihalf is similar to lapophalf but only counts people considered both low access and low income.
- lalowi10 is similar to lapop10 but only counts people considered both low access and low income.
Info
Due to the large datasets, the Mark automated checks can take a couple of minutes to run. Some tests are hidden: they display pass/fail information but not the exact details. If a hidden test is failing, write more of your own test cases.
load_in_data
Task: Write a function load_in_data that takes two parameters: the filename for the census dataset and the filename for the food access dataset. load_in_data should merge the two datasets on CTIDFP00 / CensusTract and return the result as a GeoDataFrame. Assume the census identifier column names exist, but don’t assume any other columns in the datasets. The resulting merged dataset might have missing data. For the provided datasets, the shape of the resulting GeoDataFrame is (1318, 30), that is, 1318 rows and 30 columns.
percentage_food_data
Task: Write a function percentage_food_data that takes the merged data and returns the percentage of census tracts in Washington for which we have food access data. The percentage should be a float between 0 and 100. Do not round the result.
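One way to compute this, assuming (a) the merged data contains only Washington tracts, as the provided census shapefile does, and (b) a tract “has food access data” exactly when its CensusTract value is present after a left merge:

```python
import pandas as pd


def percentage_food_data(data: pd.DataFrame) -> float:
    """
    Returns the percentage (0 to 100) of census tracts in the merged data
    that have food access data. The result is not rounded.
    """
    # Assumption: CensusTract is NaN for tracts with no food access row.
    has_food_data = data['CensusTract'].notna()
    # The mean of a boolean Series is the fraction of True values.
    return float(has_food_data.mean() * 100)
```

Because a GeoDataFrame is a pandas DataFrame subclass, the same function works on the merged GeoDataFrame.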
plot_map
Warning
For all plotting functions, don’t pass bbox_inches="tight" to plt.savefig().
Task: Write a function plot_map that takes the merged data and plots the shapes of all the census tracts in Washington to a file named map.png. Give the plot the title “Washington State” with plt.title(). Do not customize this plot or overlay any data. The output should look like Washington state; make sure it doesn’t have any holes!
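A sketch, assuming data is the merged GeoDataFrame with one row per census tract, so plotting every row leaves no holes. The plt.close() call is an extra that frees the figure; it runs after saving, so it does not change the image.

```python
import geopandas as gpd
import matplotlib.pyplot as plt


def plot_map(data: gpd.GeoDataFrame) -> None:
    """Plots the shape of every census tract in Washington to map.png."""
    # Plot every row, including tracts with no food access data,
    # so the state outline has no holes.
    data.plot()
    plt.title('Washington State')
    plt.savefig('map.png')
    plt.close()
```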
plot_population_map
Info
Layered plots can vary depending on the layering procedure, so it’s OK if your plots look as expected when you check the diff but match only 98 or 99% of pixels.
Task: Write a function plot_population_map that takes the merged data and plots the shapes of all the census tracts in Washington to a file named population_map.png, with each census tract colored according to its population. Some census tracts have no population data and will not be colored. Under the census tracts, plot a background of all of Washington’s census tracts in the color #EEEEEE. Include a legend to indicate the meaning of each color, and give the plot the title “Washington Census Tract Populations” with plt.title().
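The layering can be sketched by drawing twice on one axes, background first. This relies on geopandas skipping rows whose plotted column is missing (its default when missing_kwds is not given), so tracts without POP2010 show the gray background.

```python
import geopandas as gpd
import matplotlib.pyplot as plt


def plot_population_map(data: gpd.GeoDataFrame) -> None:
    """
    Plots census tracts colored by POP2010 over a light gray background
    of all tracts, and saves the result to population_map.png.
    """
    fig, ax = plt.subplots(1)
    # Background layer: every tract, including those without population data.
    data.plot(ax=ax, color='#EEEEEE')
    # Foreground layer: tracts with a missing POP2010 are not drawn,
    # so the background shows through for them.
    data.plot(ax=ax, column='POP2010', legend=True)
    plt.title('Washington Census Tract Populations')
    plt.savefig('population_map.png')
    plt.close(fig)
```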
plot_population_county_map
Task: Write a function plot_population_county_map that takes the merged data and plots Washington’s census tracts to a file named county_population_map.png, with each county colored according to its population. This involves aggregating the census tract data in each county, and some counties will be missing. Under the counties, plot a background of all of Washington’s census tracts in the color #EEEEEE. Include a legend to indicate the meaning of each color, and give the plot the title “Washington County Populations” with plt.title().
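A sketch that aggregates tracts into counties with GeoDataFrame.dissolve. It assumes County is missing (NaN) for tracts without food access data; dissolve drops those rows when grouping, so those tracts appear only in the gray background.

```python
import geopandas as gpd
import matplotlib.pyplot as plt


def plot_population_county_map(data: gpd.GeoDataFrame) -> None:
    """
    Plots total population per county over a light gray background of
    all census tracts, and saves the result to county_population_map.png.
    """
    fig, ax = plt.subplots(1)
    data.plot(ax=ax, color='#EEEEEE')
    # dissolve unions each county's tract shapes and sums POP2010;
    # rows with a missing County are dropped by the grouping.
    counties = data[['County', 'POP2010', 'geometry']].dissolve(
        by='County', aggfunc='sum')
    counties.plot(ax=ax, column='POP2010', legend=True)
    plt.title('Washington County Populations')
    plt.savefig('county_population_map.png')
    plt.close(fig)
```

Slicing to the needed columns before dissolving keeps the sum from touching non-numeric columns.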
plot_food_access_by_county
Task: Write a function plot_food_access_by_county that takes the merged data and produces 4 plots on the same figure showing information about food access across income levels.
First, compute the ratio of people in each category.
- Slice the dataframe to keep only the columns County, geometry, POP2010, lapophalf, lapop10, lalowihalf, and lalowi10.
- Aggregate this dataset by County, summing up all the numeric columns.
- For each County, compute the ratio of people in each category (the category’s count divided by POP2010) as lapophalf_ratio, lapop10_ratio, lalowihalf_ratio, and lalowi10_ratio, and store them as new columns on this aggregated dataset.
Example
For example, if we had a row for a county with the following data (shown as a dictionary for simplicity):
{ 'County': 'Hunter County', 'geometry': ..., 'POP2010': 50, 'lapophalf': 15, 'lapop10': 3, 'lalowihalf': 7, 'lalowi10': 1 }
Then after this step, the row would have the data:
{ 'County': 'Hunter County', 'geometry': ..., 'POP2010': 50, 'lapophalf': 15, 'lapop10': 3, 'lalowihalf': 7, 'lalowi10': 1, 'lapophalf_ratio': 0.30, 'lapop10_ratio': 0.06, 'lalowihalf_ratio': 0.14, 'lalowi10_ratio': 0.02 }
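The ratio step can be sketched on its own. add_ratio_columns is a hypothetical helper name; it assumes its input has already been aggregated by County, and the usage lines reproduce the Hunter County numbers from the example.

```python
import pandas as pd

# Category columns whose ratio to POP2010 we compute.
CATEGORY_COLUMNS = ['lapophalf', 'lapop10', 'lalowihalf', 'lalowi10']


def add_ratio_columns(counties: pd.DataFrame) -> pd.DataFrame:
    """
    Hypothetical helper: given county-level sums, adds a <column>_ratio
    column for each category (the category count divided by POP2010).
    Modifies and returns the given DataFrame.
    """
    for column in CATEGORY_COLUMNS:
        counties[column + '_ratio'] = counties[column] / counties['POP2010']
    return counties


hunter = pd.DataFrame({
    'County': ['Hunter County'], 'POP2010': [50], 'lapophalf': [15],
    'lapop10': [3], 'lalowihalf': [7], 'lalowi10': [1],
})
hunter = add_ratio_columns(hunter)
ratio_columns = [column + '_ratio' for column in CATEGORY_COLUMNS]
print(hunter[ratio_columns].iloc[0].tolist())  # [0.3, 0.06, 0.14, 0.02]
```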
Then, set up the figure using subplots:
fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(20, 10))
Finally, plot the data on each of the ax1, ax2, ax3, and ax4 subplot axes. For each subplot:
- Plot a background of Washington’s census tracts in the color #EEEEEE.
- Call the plot function on the appropriate ratio column with the arguments ax (to specify the subplot axis), vmin=0, and vmax=1 (so all subplots share the same scale). Include a legend.
- Set the title of each axis to match the expected output picture, such as ax1.set_title('Low Access: Half').
Save the figure to a file named county_food_access.png.
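A sketch of the whole function following the steps above. It assumes County is NaN for tracts without food access data (dissolve drops those groups). Only the 'Low Access: Half' title on ax1 comes from the text; the other column-to-axis assignments and titles are placeholders to check against the expected output picture.

```python
import geopandas as gpd
import matplotlib.pyplot as plt

CATEGORY_COLUMNS = ['lapophalf', 'lapop10', 'lalowihalf', 'lalowi10']


def plot_food_access_by_county(data: gpd.GeoDataFrame) -> None:
    """
    Plots the four low access ratios per county on a 2x2 grid of subplots
    and saves the figure to county_food_access.png.
    """
    # Slice, then aggregate by County, summing the numeric columns.
    counties = data[['County', 'geometry', 'POP2010'] + CATEGORY_COLUMNS]
    counties = counties.dissolve(by='County', aggfunc='sum')
    # Ratio of each category count to the county population.
    for column in CATEGORY_COLUMNS:
        counties[column + '_ratio'] = counties[column] / counties['POP2010']

    fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(20, 10))
    # Placeholder layout: verify everything except ax1's title against
    # the expected output picture.
    panels = [
        (ax1, 'lapophalf_ratio', 'Low Access: Half'),
        (ax2, 'lalowihalf_ratio', 'Low Access + Low Income: Half'),
        (ax3, 'lapop10_ratio', 'Low Access: 10'),
        (ax4, 'lalowi10_ratio', 'Low Access + Low Income: 10'),
    ]
    for ax, column, title in panels:
        data.plot(ax=ax, color='#EEEEEE')
        # vmin/vmax give every subplot the same 0-to-1 color scale.
        counties.plot(ax=ax, column=column, legend=True, vmin=0, vmax=1)
        ax.set_title(title)
    plt.savefig('county_food_access.png')
    plt.close(fig)
```

Driving the four subplots from one list keeps the background, legend, and shared scale identical across panels, so only the column and title vary.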
plot_low_access_tracts
Task: Write a function plot_low_access_tracts that takes the merged data and plots all census tracts considered “low access” to a file named low_access.png.
Info
For this problem, you should not use the LATracts_half or LATracts10 columns in the GeoDataFrame. The intent of this problem is to have you practice the computation necessary to recreate those values. Note that the procedure described below won’t compute exactly the same values, which is intended.
First, determine whether each census tract is low access, depending on its classification as Urban or Rural. Compute this from the population columns rather than from LATracts_half or LATracts10, whose computation is slightly more subtle. The thresholds for urban tracts differ from those for rural tracts:
- Urban: If the census tract is “urban”, the distance of interest is half a mile from a food source. The threshold for low access in an urban census tract is at least 500 people or at least 33% of the people in the census tract being more than half a mile from a food source. An urban census tract that satisfies either of these conditions is considered low access.
- Rural (i.e. non-urban): If the census tract is “rural”, the distance of interest is 10 miles from a food source. The threshold for low access in a rural census tract is at least 500 people or at least 33% of the people in the census tract being more than 10 miles from a food source. A rural census tract that satisfies either of these conditions is considered low access.
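The two rules can be sketched as a boolean mask. is_low_access is a hypothetical helper name; it treats every tract not flagged Urban == 1 as rural, following “Rural (i.e. non-urban)” above, and reads 33% as the fraction 0.33.

```python
import pandas as pd

LOW_ACCESS_PEOPLE = 500     # at least 500 people ...
LOW_ACCESS_FRACTION = 0.33  # ... or at least 33% of the tract


def is_low_access(data: pd.DataFrame) -> pd.Series:
    """
    Hypothetical helper: returns a boolean Series that is True for each
    census tract meeting the urban or rural low access threshold.
    """
    urban = data['Urban'] == 1
    # "Rural (i.e. non-urban)": every tract that is not flagged urban.
    rural = ~urban
    urban_low = urban & (
        (data['lapophalf'] >= LOW_ACCESS_PEOPLE)
        | (data['lapophalf'] / data['POP2010'] >= LOW_ACCESS_FRACTION)
    )
    rural_low = rural & (
        (data['lapop10'] >= LOW_ACCESS_PEOPLE)
        | (data['lapop10'] / data['POP2010'] >= LOW_ACCESS_FRACTION)
    )
    # Comparisons involving NaN are False, so tracts without food access
    # data are never marked low access.
    return urban_low | rural_low
```

Note that each rule checks only its own distance column: an urban tract is judged by lapophalf and a rural tract by lapop10.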
Then, produce a layered plot (all on the same axes) to highlight low access census tracts. Each plot draws on top of the previous one.
- Plot a background of Washington’s census tracts in the color #EEEEEE.
- Plot all the census tracts for which we have food access data in the color #AAAAAA.
- Plot all the census tracts considered low access in the default color, blue.
Give the plot the title “Low Access Census Tracts” with plt.title().
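A sketch of the three layers, assuming CensusTract is NaN for tracts without food access data. It repeats the urban/rural thresholds inline so the block runs on its own; the last layer passes no color, so geopandas uses its default, which the task describes as blue.

```python
import geopandas as gpd
import matplotlib.pyplot as plt


def plot_low_access_tracts(data: gpd.GeoDataFrame) -> None:
    """
    Draws all tracts, tracts with food access data, and low access tracts
    as layers on one axes, and saves the result to low_access.png.
    """
    urban = data['Urban'] == 1
    rural = ~urban
    urban_low = urban & ((data['lapophalf'] >= 500)
                         | (data['lapophalf'] / data['POP2010'] >= 0.33))
    rural_low = rural & ((data['lapop10'] >= 500)
                         | (data['lapop10'] / data['POP2010'] >= 0.33))
    low_access = data[urban_low | rural_low]
    # Assumption: CensusTract is NaN for tracts with no food access data.
    has_food_data = data[data['CensusTract'].notna()]

    fig, ax = plt.subplots(1)
    data.plot(ax=ax, color='#EEEEEE')
    has_food_data.plot(ax=ax, color='#AAAAAA')
    # No color argument: geopandas falls back to its default color.
    low_access.plot(ax=ax)
    plt.title('Low Access Census Tracts')
    plt.savefig('low_access.png')
    plt.close(fig)
```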
Writeup
Task: In hw5-writeup.md, apply critical thinking to address the following questions about data collection and analysis. You could spend an entire course on any of these topics, but we’re just looking for 2 to 4 sentences for each question.
- What is one way that government officials could use plot_food_access_by_county or plot_low_access_tracts to shape public policy on how to improve food access?
- What is a limitation or concern with using the plots in this way?
Quality
Assessment submissions should pass these checks: flake8 and the code quality guidelines. The code quality guidelines are very thorough. For this assessment, the most relevant rules can be found in these sections (new sections bolded):
Submission
Submit your work by pressing the Mark button. Submit as often as you want until the deadline for the initial submission. Note that we will only grade your most recent submission. You can view your past submissions using the “Submissions” button.
Please make sure you are familiar with the resources and policies outlined in the syllabus and the take-home assessments page.
A5 - Mapping
Initial Submission by Thursday 05/19 at 11:59 pm.