CSE 163, Spring 2019: Homework 6: Processing Geospatial Data

Submission

This assignment and its reflection are due by Thursday, May 23 at 11:59 pm.

You should submit your finished hw6_main.py, washington_map.png, washington_population_map.png, washington_county_population_map.png, washington_county_food_access.png, and washington_low_access.png on Gradescope and the reflection on Google Forms.

Overview

In this assignment, you will analyze geospatial data to investigate food deserts in Washington state.

Learning Objectives

After this homework, students will be able to:

  • Use a join to merge data from different datasets.
  • Graph geospatial information on a map (including multiple layers).
  • Read library documentation to figure out how to call a library function.

Expectations

Here are the baseline expectations for this assignment:

  • Follow the course collaboration policies.

  • For this assignment, you do not need to write any tests. However, hw6_main.py should follow the main method pattern: include a main method that calls every function you write, using the provided dataset.
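As a reminder of the shape of the main method pattern, here is a minimal sketch. The function names first_task and second_task and the list used as data are hypothetical placeholders, not part of the assignment; your own functions and the provided dataset take their place.

```python
def first_task(data):
    """Hypothetical placeholder for one function you write."""
    return len(data)


def second_task(data):
    """Hypothetical placeholder for another function you write."""
    return sum(data)


def main():
    # Stand-in for loading the provided dataset.
    data = [1, 2, 3]
    # Call every function you wrote, using that data.
    print(first_task(data))
    print(second_task(data))


# Runs main() only when the file is executed directly, not when imported.
if __name__ == '__main__':
    main()
```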

Files

You should download the starter code hw6.zip and open it as the project in Visual Studio Code. The files included are:

  • cse163_utils.py: A file where we will store utility functions to help you write any tests you might want to write.
  • tl_2010_53_tract00: A directory containing all of the shapefile information. You will most likely only be working with file tl_2010_53_tract00/tl_2010_53_tract00.shp inside this directory. The data is described below.
  • food_access.csv: CSV file containing information about food access. The data is described below.

Data

In this assignment, you will be working with two datasets.

The first dataset you will be using comes from the 2010 census. The information is stored in the tl_2010_53_tract00 directory, but you will most likely only be using the tl_2010_53_tract00/tl_2010_53_tract00.shp file as the access point to this data. The shapefile is similar to a CSV in the sense that it has columns and rows, but it has special functionality for geospatial data. Each row of the dataset corresponds to one census tract. The data has many columns, but you only need to understand the following:

  • CTIDFP00: This is the identifier that specifies each census tract. This number will be how we link the two datasets.
  • geometry: The column that stores the actual geometry of the census tract.

The second dataset stores information about food access in each of these census tracts. The file is stored in the CSV format that we have been using all quarter. Each row in the dataset corresponds to a census tract. The data has many columns, but you only need to understand the following:

  • CensusTract: This is the identifier that specifies each census tract. This number will be how we link the two datasets.
  • State: Which state the census tract is in.
  • County: The name of the county this census tract is in.
  • Urban: Flag (0 or 1) that indicates if this census tract is an urban environment (i.e. city).
  • Rural: Flag (0 or 1) that indicates if this census tract is a rural environment (i.e. not a city).
  • LATracts_half: Flag (0 or 1) if this census tract is "low access" at the half mile level. This means there are a sufficient number of people in the census tract that are at least a half mile away from a food source (i.e. grocery store).
  • LATracts10: Flag (0 or 1) if this census tract is "low access" at the 10 mile level. This means there are a sufficient number of people in the census tract that are at least 10 miles away from a food source (i.e. grocery store).
  • POP2010: The number of people in this census tract according to the 2010 census.
  • lapophalf: The number of people in this census tract who are considered "low access" at the half mile level. You will use this number to determine whether the census tract is low access, as LATracts_half does.
  • lapop10: The number of people in this census tract who are considered "low access" at the 10 mile level. You will use this number to determine whether the census tract is low access, as LATracts10 does.
  • lalowihalf: Similar to lapophalf, but only counts the people who are considered both low access and low income.
  • lalowi10: Similar to lapop10, but only counts the people who are considered both low access and low income.
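Since CTIDFP00 and CensusTract identify the same census tracts, the two datasets can be linked with a join on those columns. The sketch below uses pandas on two tiny made-up tables to show the idea; the row values are invented for illustration, and the dtype cast is only an assumption about how the keys might differ (check the real data before relying on it). Whether a left join is the right choice depends on the problem specification.

```python
import pandas as pd

# Toy stand-ins for the two datasets. All values are made up.
tracts = pd.DataFrame({
    'CTIDFP00': ['53001950100', '53001950200', '53001950300'],
    'geometry': ['shape A', 'shape B', 'shape C'],
})
food = pd.DataFrame({
    'CensusTract': [53001950100, 53001950300],
    'County': ['Adams', 'Adams'],
    'POP2010': [2500, 4100],
})

# Assumption: the key columns have different dtypes, so align them first.
food['CensusTract'] = food['CensusTract'].astype(str)

# A left join keeps every tract; tracts with no food access row get NaN.
merged = tracts.merge(food, left_on='CTIDFP00', right_on='CensusTract',
                      how='left')
print(merged)
```

In this toy example the middle tract has no matching row in the food table, so its County and POP2010 values are missing after the join.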

This can be a lot to take in at first. Remember that the goal here is to count how many people in a given census tract do not have easy access to food. For this dataset, we define "low access" as being more than X miles from a food source, where X depends on the type of area. For urban areas, we want to look at the number of people more than half a mile away from their closest food source (lapophalf), while for rural areas, we want to look at the number of people more than 10 miles away from their closest food source (lapop10). We will use these counts to determine whether a census tract as a whole is low access, in order to identify likely "food deserts".
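The urban/rural distinction above can be sketched as a small function that picks which count applies to a tract. The function name relevant_low_access_count and the sample values are hypothetical, and the sketch deliberately stops at choosing the count: the rule for deciding when that count makes a tract low access is left to the problem specification.

```python
def relevant_low_access_count(tract):
    """
    Returns the low-access count that applies to the given tract:
    lapophalf for urban tracts, lapop10 for rural tracts.
    Returns None if neither flag is set.
    """
    if tract['Urban'] == 1:
        return tract['lapophalf']
    if tract['Rural'] == 1:
        return tract['lapop10']
    return None


# Made-up example tracts, represented as dictionaries of column values.
urban_tract = {'Urban': 1, 'Rural': 0, 'lapophalf': 1200.0, 'lapop10': 0.0}
rural_tract = {'Urban': 0, 'Rural': 1, 'lapophalf': 900.0, 'lapop10': 35.0}

print(relevant_low_access_count(urban_tract))
print(relevant_low_access_count(rural_tract))
```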

Playground

You can access a playground notebook here

Evaluation

Your submission will be evaluated on the following dimensions:

  • Your solution correctly implements the described behaviors. You will not have access to tests when you turn in your assignment. All behavior we test is completely described by the problem specification or shown in an example.
  • The first line of hw6_main.py is a comment with your name and uwnetid.
  • Your code meets our style requirements:
    • All code files submitted pass flake8.
    • Every function written is commented, in your own words, using a doc-string format that describes its behavior, parameters, returns, and highlights any special cases.
    • There is a comment at the top of each code file you write with your name, section, and a brief description of what that program does.
    • All expectations on this page and the sub-pages for the assignment are met, as are all requirements for each of the problems.