Background & Pre-coding Work

Info

Most of this page is informational; however, there is required work at the bottom! Reading this page will be useful in doing that work.

Fishing Data

We as people need to consume energy to survive, and one common source of that energy is fish. Fish is a major source of protein and nutrition around the world, and as our population grows, so does our consumption of fish and, with it, the industries that farm and catch fish. It is therefore important to check and evaluate how the growth in our consumption of fish compares with the growth of the fish farming and fishing industries. Many organizations, both local and global, record and track this data to ensure environmental and agricultural safety. One such organization is the United Nations Food and Agriculture Organization, and we will be using global fishing data collected by it for this assignment!

In this homework, you will use data collected by the United Nations Food and Agriculture Organization, provided in CSV format, to generate statistics and visual representations of the data, which you will analyze to draw informed conclusions. You will write the code that generates these statistics and visualizations in the files classes.py and fishing.py, and write your conclusions in the provided answers.txt file.


Context: Reading in the data

You do not need to worry about sourcing the data from the United Nations Food and Agriculture Organization; we will be providing it to you in two CSV files (small.csv and large.csv). The smaller data file, small.csv, contains production (both wild and farmed), consumption, and population data for three countries for a small number of years.

The first line of each .csv file contains the column headers that apply to all other lines in the file. Each line after the first provides data about one measure for one country, for some subset of the years listed in the first line of the file. So a line of data looks like:

country full name, country code, measure (wild caught, farmed, consumption, or population), value of measure for the first year, value of measure for the second year, …

There will always be at least three fields provided for each row of data: the full country name, the country code, and the measure the row contains data for. Every row will have the proper number of commas to represent every year listed on the header row, but not every year is guaranteed to have a value. For example, the following is a sample .csv file containing the header row and four data rows:

country, country code, measure, 1960, 1961
Ankh-Morpork, AMP, consumption, ,10.42
Ankh-Morpork, AMP, wild caught, 7777, 8888
Ankh-Morpork, AMP, farmed, 321, 333
Ankh-Morpork, AMP, population, 995623, 996235

In the .csv file above, we see data for the country Ankh-Morpork, which has a country code of AMP, for just two years: 1960 and 1961. Row 2 provides data on the measure consumption for the year 1961 only. Row 3 provides data on the measure wild caught for the years 1960 and 1961. Row 4 provides data on the measure farmed for the years 1960 and 1961. Row 5 provides data on the measure population for the years 1960 and 1961.

Tip

Since you are working with CSV files, you can make use of csv.DictReader, which makes it easy to read lines from a CSV file. csv.DictReader takes an opened file object as its first (required) argument and returns a “reader” object. Looping through this object, as in for row in reader: ..., gives you a dictionary for each line of data in the file, where the keys in each dictionary are the column names from the first line of the file (e.g., “country code”) and the values are the values for that column and row (e.g., “AMP”). Even if a value is missing for a given column in the source file, csv.DictReader will still create a key for it and set the value to an empty string (“”) (see the CSV DictReader lecture for more information).
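A minimal sketch of the tip above, run against the Ankh-Morpork sample. It assumes the sample is held in a string and wrapped in io.StringIO so it runs without small.csv, and it passes skipinitialspace=True because the sample shows a space after each comma (the real files may not need this). read_rows and to_number are hypothetical helper names, not functions the assignment asks for.

```python
import csv
import io

# The sample file from above, kept in a string so this sketch runs on its own.
SAMPLE = """country, country code, measure, 1960, 1961
Ankh-Morpork, AMP, consumption, ,10.42
Ankh-Morpork, AMP, wild caught, 7777, 8888
Ankh-Morpork, AMP, farmed, 321, 333
Ankh-Morpork, AMP, population, 995623, 996235
"""


def read_rows(text):
    """Hypothetical helper: return one dict per data line, keyed by the header."""
    # skipinitialspace=True ignores the space after each comma in the sample,
    # so the keys are "country code" rather than " country code".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return list(reader)


def to_number(cell):
    """Hypothetical helper: convert a cell to float, or None if it is empty."""
    cell = cell.strip()
    return float(cell) if cell else None


rows = read_rows(SAMPLE)
for row in rows:
    # A missing value shows up as "" rather than being left out of the dict.
    print(row["measure"], repr(row["1960"]), to_number(row["1960"]))
```

With the real data you would pass an opened file to csv.DictReader instead of an io.StringIO wrapper. Note that the consumption row's 1960 cell comes back as the empty string, which float() would reject with a ValueError; that is why to_number checks for it first.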

Pre-coding questions:

You may have noticed in class that we stress the importance of thinking about how you will solve a problem before writing the solution code. We will be implementing this approach here!

Before writing code for this assignment, answer the following questions in the provided answers.txt file. You may find it helpful to read the entire assignment specification before answering the questions. These questions are meant to help you better understand the structure of the data and how to parse it from the CSV files.

  1. Why are there multiple rows for Ankh-Morpork? Why isn’t there just a single row for this country?
  2. What do you notice about the data for 1960? How do you think that affects how you’ll handle reading the data in this problem’s functions? (Hint: read the tip above.)
  3. The consumption measurement type is clearly labeled (line 2 of the file), but none of the other lines are labeled “production”. If you were asked “How much seafood did Ankh-Morpork produce in 1960?”, how would you come up with the answer? (Hint: the answer would be 8098.)
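One way to reach the hint's answer, sketched under assumptions: production is taken to be wild caught plus farmed (consistent with 7777 + 321 = 8098), the rows are dictionaries shaped like the ones csv.DictReader yields for the sample, and ROWS, PRODUCTION_MEASURES, and production_for_year are hypothetical names rather than part of the required code.

```python
# Rows shaped like the dictionaries csv.DictReader yields for the sample file.
ROWS = [
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "consumption", "1960": "", "1961": "10.42"},
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "wild caught", "1960": "7777", "1961": "8888"},
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "farmed", "1960": "321", "1961": "333"},
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "population", "1960": "995623", "1961": "996235"},
]

# Assumption: production = wild caught + farmed.
PRODUCTION_MEASURES = ("wild caught", "farmed")


def production_for_year(rows, country_code, year):
    """Sum the wild-caught and farmed values for one country and year.

    year is a string such as "1960", because DictReader keys are strings.
    Returns None if neither measure has a value for that year.
    """
    total = 0.0
    found = False
    for row in rows:
        if row["country code"] != country_code:
            continue
        if row["measure"] not in PRODUCTION_MEASURES:
            continue
        cell = row.get(year, "").strip()
        if cell:  # skip empty cells instead of treating them as zero
            total += float(cell)
            found = True
    return total if found else None


print(production_for_year(ROWS, "AMP", "1960"))  # 8098.0
```

Design note: empty cells are skipped rather than counted as zero, so if only one of the two measures has a value, the sketch returns that value alone. Whether that is the right behavior for the assignment is worth deciding before you write your own functions.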