CSE 163, Winter 2020: Homework 5: Processing Geospatial Data

Submission

This assignment and its reflection are due by Thursday, Feb 27 at 11:59 pm.

You should submit your finished hw5_main.py on Ed and the reflection on Google Forms.

Overview

In this assignment, you will do a bit of data analysis involving geospatial data in order to investigate food deserts in Washington state.

Learning Objectives

After this homework, students will be able to:

  • Use a join to merge data from different datasets.
  • Graph geospatial information on a map (including multiple layers).
  • Read library documentation to figure out how to call a library function.

Expectations

Here are some baseline expectations you should meet:

  • Follow the course collaboration policies

  • For this assignment, you do not need to write any tests.
  • However, hw5_main.py should follow the main method pattern, and its main method should call every function you write using the provided dataset.
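
As a reminder of what the main method pattern looks like, here is a minimal sketch. The function name describe_input and the local path are hypothetical placeholders for illustration only; they are not part of the assignment, and your main method should call the functions you actually write.

```python
# Assumed local path for illustration; see "How To Run on Ed" for the Ed path.
FOOD_ACCESS_PATH = 'food-access.csv'


def describe_input(path):
    """
    Hypothetical placeholder for a function you would write.
    Returns a short message naming the file that would be analyzed.
    """
    return 'would analyze ' + path


def main():
    # main calls every function defined in the file
    print(describe_input(FOOD_ACCESS_PATH))


if __name__ == '__main__':
    main()
```

The guard at the bottom means main only runs when the file is executed directly, not when it is imported by another file.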

Files

You should download the starter code hw5.zip and open it as the project in Visual Studio Code. The files included are:

  • cse163_utils.py: A file containing utility functions to help you with any tests you might want to write.
    • ⚠️ If you use a Mac, you'll need to import cse163_utils.py in your hw5_main.py to make sure the plotting works. However, this causes problems with flake8 because the import is technically unused. In this case, you are allowed to bypass flake8 by importing with this syntax: import cse163_utils # noqa: F401
  • tl_2010_53_tract00: A directory containing all of the shapefile information. You will most likely only be working with file tl_2010_53_tract00/tl_2010_53_tract00.shp inside this directory. The data is described below.
  • food-access.csv: CSV file containing information about food access. The data is described below.

Data

In this assignment, you will be working with two datasets.

This can be a lot to take in at first. Remember that the goal here is to count how many people in a given census tract do not have easy access to food. For this dataset, we define "low access" as being more than X miles from a food source. For urban areas, we want to look at the number of people more than half a mile away from their closest food source (lapophalf), while for rural areas, we want to look at the number of people more than 10 miles away from their closest food source (lapop10). We will use these counts to determine whether a census tract as a whole is low access, in order to identify likely "food deserts".
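
To make the urban/rural distinction concrete, here is a small sketch in plain Python. It assumes one row of the food access data is represented as a dictionary keyed by the Urban, Rural, lapophalf, and lapop10 columns described in the Food Access Data section. The function name low_access_count and the sample values are made up for illustration; this is not the required approach for any problem.

```python
def low_access_count(tract):
    """
    Returns the low-access population count that applies to the given
    tract (a dict of column name to value): lapophalf for urban tracts,
    lapop10 for rural tracts. Returns None if neither flag is set.
    """
    if tract['Urban'] == 1:
        return tract['lapophalf']
    if tract['Rural'] == 1:
        return tract['lapop10']
    return None


# Made-up example rows, not real data
urban_tract = {'Urban': 1, 'Rural': 0, 'lapophalf': 120.0, 'lapop10': 0.0}
rural_tract = {'Urban': 0, 'Rural': 1, 'lapophalf': 900.0, 'lapop10': 35.0}
print(low_access_count(urban_tract))
print(low_access_count(rural_tract))
```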

Shape File

The first dataset you will be using comes from the 2010 census. The information is stored in the tl_2010_53_tract00 directory, but you will most likely only be using the tl_2010_53_tract00/tl_2010_53_tract00.shp file as the access point to this data. The shapefile is similar to a CSV in the sense that it has columns and rows, but it has special functionality for geospatial data. Each row of the dataset corresponds to one census tract. The data has many columns, but you only need to understand the following:

  • CTIDFP00: This is the identifier that specifies each census tract. This number will be how we link the two datasets.
  • geometry: The column that stores the actual geometry of the census tract.

This dataset only has entries for census tracts in Washington state.

Food Access Data

The second dataset stores information about food access in each of these census tracts. The file is stored in the CSV format that we have been using all quarter. Each row of the dataset corresponds to one census tract. The data has many columns, but you only need to understand the following:

  • CensusTract: This is the identifier that specifies each census tract. This number will be how we link the two datasets.
  • State: Which state the census tract is in.
  • County: The name of the county this census tract is in.
  • Urban: Flag (0 or 1) that indicates if this census tract is an urban environment (i.e. city).
  • Rural: Flag (0 or 1) that indicates if this census tract is a rural environment (i.e. not a city).
  • LATracts_half: Flag (0 or 1) that indicates if this census tract is "low access" at the half mile level. This means a sufficient number of people in the census tract are at least a half mile away from a food source (e.g. a grocery store).
  • LATracts10: Flag (0 or 1) that indicates if this census tract is "low access" at the 10 mile level. This means a sufficient number of people in the census tract are at least 10 miles away from a food source (e.g. a grocery store).
  • POP2010: The number of people in this census tract according to the 2010 census.
  • lapophalf: The number of people in this census tract who are considered to have "low access" at the half mile level. You will use this number to determine if the census tract is low access, like LATracts_half does.
  • lapop10: The number of people in this census tract who are considered to have "low access" at the 10 mile level. You will use this number to determine if the census tract is low access, like LATracts10 does.
  • lalowihalf: Similar to lapophalf but only counts the people that are considered low access and low income.
  • lalowi10: Similar to lapop10 but only counts the people that are considered low access and low income.

This dataset has entries for the entire country.
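
Since CTIDFP00 and CensusTract identify the same census tracts, the two datasets can be linked with a join. The sketch below shows the general idea with pandas on two tiny, made-up tables. The tract IDs, the State values, the populations, and the placeholder geometry strings are all invented for illustration, and the choice of how='left' is an assumption rather than a requirement. The toy keys are both strings; the real columns may have different types and could need converting before they match.

```python
import pandas as pd

# Made-up stand-in for the shapefile rows (Washington tracts only)
tracts = pd.DataFrame({
    'CTIDFP00': ['53001950100', '53001950200'],
    'geometry': ['polygon A', 'polygon B'],  # placeholders, not real shapes
})

# Made-up stand-in for food-access.csv (tracts from the entire country)
food = pd.DataFrame({
    'CensusTract': ['53001950100', '01001020100'],
    'State': ['WA', 'AL'],
    'POP2010': [2000, 1900],
})

# Keep every row of tracts; attach food access columns where the IDs match
merged = tracts.merge(food, left_on='CTIDFP00', right_on='CensusTract',
                      how='left')
print(merged)
```

With a left join, every row from the first table is kept. The second toy tract has no matching row in the food table, so its food access columns are filled with missing values, and the out-of-state row from the food table is dropped.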

How To Run on Ed

Just like for HW4, to avoid having to duplicate the datasets, we will all use a shared location for the data. You can find the data files on Ed at the locations below. Your submitted code must use these path names in order to run on Ed.

  • /course/food-access/tl_2010_53_tract00/tl_2010_53_tract00.shp
  • /course/food-access/food-access.csv
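
One possible way to keep these paths manageable is to store them as constants at the top of your file so they only need to change in one place. The sketch below is just a suggestion: the constant names and the helper data_paths are hypothetical, and the local paths assume you run from the folder that contains the unzipped starter code.

```python
# Shared locations on Ed, copied from the list above
ED_SHAPE_FILE = '/course/food-access/tl_2010_53_tract00/tl_2010_53_tract00.shp'
ED_FOOD_ACCESS_FILE = '/course/food-access/food-access.csv'

# Assumed local locations when running from the starter code folder
LOCAL_SHAPE_FILE = 'tl_2010_53_tract00/tl_2010_53_tract00.shp'
LOCAL_FOOD_ACCESS_FILE = 'food-access.csv'


def data_paths(on_ed):
    """
    Hypothetical helper: returns a tuple (shape file path, CSV path)
    for Ed if on_ed is True, otherwise for a local run.
    """
    if on_ed:
        return ED_SHAPE_FILE, ED_FOOD_ACCESS_FILE
    return LOCAL_SHAPE_FILE, LOCAL_FOOD_ACCESS_FILE
```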

Playground

You can access a playground notebook here. We recommend using it to see what the dataset looks like and to prototype your solutions!

Evaluation

Your submission will be evaluated on the following dimensions:

  • Your solution correctly implements the described behaviors. You will have access to some tests when you turn in your assignment, but many are hidden. All behavior we test is completely described by the problem specification or shown in an example.
  • The first line of hw5_main.py is a comment with your name and uwnetid. Your file should also have a comment at the top explaining what this file should be used for.
  • Your code meets our style requirements:
    • All code files submitted pass flake8.
    • Your program uses the main method pattern and the main method calls all the functions you implemented in this assignment.
    • Every function written is commented, in your own words, using a doc-string format that describes its behavior, parameters, returns, and highlights any special cases.
    • There is a comment at the top of each code file you write with your name, section, and a brief description of what that program does.
    • All expectations on this page or the sub-pages for the assignment are met, as are all requirements for each of the problems.