Background & Pre-coding Work

Info

Most of this page is informational, but there is required work at the bottom! Reading the rest of the page will help you do that work.

Fishing Data

People need to consume energy to survive, and fish is one common source of that energy. Fish is a major source of protein and nutrition around the world, and as our population grows, so does our consumption of fish and, with it, the industry of farming and catching fish. However, it is important to compare the growth in our consumption of fish with the growth of the industries that farm and catch fish. Many organizations, both local and global, record and track this data to ensure both environmental and agricultural safety. One organization that does this is the United Nations Food and Agriculture Organization. We will be using global fishing data collected by the United Nations Food and Agriculture Organization for this assignment!

In this homework, you will use the data collected by the United Nations Food and Agriculture Organization, which they provide in a CSV file, to generate statistics and visual representations of the data that you can analyze to draw informed conclusions. You will write the code that generates the statistics and visualizations in a Python script, Fishing.py, and write your conclusions in the provided answers.txt file.


Context: Reading in the data

You do not need to worry about sourcing the data from the United Nations Food and Agriculture Organization; we provide it to you in two CSV files (small.csv and large.csv). The smaller data file, small.csv, contains production (both wild and farmed), consumption, and population data for three countries for a small number of years.

The input files have the following format:

country full name, country code, measure (wild caught, farmed, consumption, or population), 1950, 1951, 1952, …

Every row has at least three fields: the full country name, a country code, and the measure that the row describes. Each row has the proper number of commas to represent every year, but not every year is guaranteed to have a value. For example, the following file contains the header row and four data rows:

country, country code, measure, 1960, 1961
Ankh-Morpork, AMP, consumption, ,10.42
Ankh-Morpork, AMP, wild caught, 7777, 8888
Ankh-Morpork, AMP, farmed, 321, 333
Ankh-Morpork, AMP, population, 995623, 996235

Tip

Since you are working with CSV files, you should make use of csv.DictReader, which makes it easy to read lines from a CSV file. csv.DictReader takes one required parameter, an opened file object, and returns a “reader” object. Looping through this object, as in for row in reader: ..., gives you a dictionary for each line of the file (except the first line, which holds the column headers) where the keys are the column names (e.g., “country code”) and the values are the values for that column and row (e.g., “AMP”). Even if a value is missing for a given column in the source file, csv.DictReader will still create a key for it, setting the value to an empty string (‘’).
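As a rough illustration of the tip, here is a minimal sketch of csv.DictReader reading the example file shown earlier. It makes a few assumptions that are not part of the assignment: the helper name read_rows is hypothetical, io.StringIO stands in for an opened file so the sketch runs on its own, and skipinitialspace=True is passed only because the example is written with a space after each comma (the real small.csv and large.csv files may not need it).

```python
import csv
import io

# Sample text copied from the example file above. io.StringIO stands in
# for an opened file object so this sketch runs without a file on disk.
SAMPLE = """country, country code, measure, 1960, 1961
Ankh-Morpork, AMP, consumption, ,10.42
Ankh-Morpork, AMP, wild caught, 7777, 8888
Ankh-Morpork, AMP, farmed, 321, 333
Ankh-Morpork, AMP, population, 995623, 996235
"""


def read_rows(file_obj):
    """Return one dictionary per data row (hypothetical helper name)."""
    # Assumption: skipinitialspace=True drops the space that follows each
    # comma in the sample, so keys and values carry no leading space.
    reader = csv.DictReader(file_obj, skipinitialspace=True)
    return [dict(row) for row in reader]


rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    # Every value is a string; repr() makes an empty string visible.
    print(row["country code"], row["measure"], repr(row["1960"]), repr(row["1961"]))
```

With a real file, the same helper could be called inside a with open("small.csv", newline="") as f: block. Note that every value arrives as a string, so any numeric work needs a conversion step, and that step has to decide what to do with empty strings.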

Pre-coding questions:

You may have noticed in class that we stress the importance of thinking about how you will solve a problem before writing the solution code. We will be implementing this approach here!

Before writing code for this assignment, answer the following questions in the provided answers.txt file. You may find it helpful to read the entire assignment specification before answering them. These questions are meant to help you better understand the structure of the data and how to parse it from the CSV files.

  1. Why are there multiple rows for Ankh-Morpork? Why isn’t there just a single row for this country?
  2. What do you notice about the data for 1960? How do you think that affects how you’ll handle reading the data in this problem’s functions? (Hint: read the tip above.)
  3. The consumption measurement type is clearly labeled (line 2 of the file), but none of the other lines are labeled “production”. If you were to ask “how much seafood did Ankh-Morpork produce in 1960?” how would you come up with the answer? (Hint: the answer would be 8098.)