In this homework, we will define our own classes to parse and analyze worldwide fish consumption, farming, and catching data as they relate to national population metrics over time.
By the end of this assignment, students will feel more comfortable:
Writing classes that enable statistical data analysis.
Processing large datasets in comma-separated values (CSV) file format.
Creating data visualizations with matplotlib.
Background¶
Fish is a major source of protein and nutrition around the world. Increasing worldwide population has led to increasing fish consumption, fish farming, and fish catching. But is demand outstripping supply? Is the world experiencing overfishing? And, if current trends continue, will we run out of fish in the future? Organizations such as the United Nations Food and Agriculture Organization track this data to assess the impacts of supply and demand on food systems and sustainability.[1]
We’ve prepared two datasets, small.csv and large.csv, that both share the same columns:
country, such as United States
country code, such as USA
measure, which will be either wild caught, farmed, consumption, or population
and columns for each year, such as 1995, 1996, 1997.
Each year column represents the value of the measure in that year. There will be the proper number of commas to represent every year mentioned on the header row, but not all years are guaranteed to have values. Can you spot the missing value in the small.csv file?
country,country code,measure,1995,1996,1997,1998,1999,2000
Canada,CAN,consumption,22.62,23.41,23.23,24.1,24.97,23.49
Canada,CAN,farmed,65207,72376,81676,91046,112916,127665
Canada,CAN,population,29137000,29442000,29746000,30020000,30270000,30530000
Canada,CAN,wild caught,948410,1195329,1274391,1291796,1277027,1121610
Mexico,MEX,consumption,9.67,10.42,11.26,9.23,9.17,9.61
Mexico,MEX,farmed,25580,31339,39500,41068,48443,53918
Mexico,MEX,population,89155000,90784000,92389000,93977000,95557000,97112000
Mexico,MEX,wild caught,1380516.375,1500101.5,1533101.375,1193486.25,1238933.5,1351027.375
United States,USA,consumption,21.96,21,20.96,,21.6,22.04
United States,USA,farmed,413454,393346,438356,445119,479212,456830
United States,USA,population,264020000,267301000,270667000,274123000,277547000,280817000
United States,USA,wild caught,5414226,5268586,5385731,5059144,5205206,5041558
The first row for the country United States is missing a data point for the measure consumption in the year 1998!
United States,USA,consumption,21.96,21,20.96,,21.6,22.04
Problem 0: Dataset comprehension¶
Answer the following questions in answers.txt. You may find it helpful to read the entire assignment instructions.
Why are there multiple rows for each country? Why isn’t there just a single row for each country?
What do you notice about the data for 1998? How will this affect how you’ll handle reading the data in this problem’s functions?
The measure consumption is clearly labeled in the dataset, but no lines directly measure the production. How do you determine how much fish was produced by Canada in 1995? (Hint: the answer to this specific question is 1013617.)
Problem 1: Country class¶
Problem 1a: Initialization¶
Document and implement the __init__ method that takes self, the name of a country, the start_year, and the end_year and initializes the following fields:
self.name: The name of the country given as an argument.
Empty dictionaries for each measure that will be given by the update_data method:
self.farmed
self.wild_caught
self.consumption
self.population
Empty dictionaries for two more measures we will calculate:
self.production
self.need
Notice that self.years has already been initialized for you to be a list of integers between start_year and end_year (inclusive).
We have provided several doctests to check your work with the console command:
%run fishing.py
Problem 1b: Update data¶
Document and implement the update_data method that takes self and a row and updates self to match the given row. Years are converted to int values, measurements are converted to float values, and missing measurements are represented with None.
- row
- A dictionary representation of one row of data as provided by csv.DictReader.
The row
United States,USA,consumption,21.96,21,20.96,,21.6,22.04
would be represented in Python as:
row = {
    "country": "United States",
    "country code": "USA",
    "measure": "consumption",
    "1995": "21.96",
    "1996": "21",
    "1997": "20.96",
    "1998": "",
    "1999": "21.6",
    "2000": "22.04"
}
Running update_data on this row would result in the following dictionary updates:
self.consumption[1995] = 21.96
self.consumption[1996] = 21.0
self.consumption[1997] = 20.96
self.consumption[1998] = None
self.consumption[1999] = 21.6
self.consumption[2000] = 22.04
Remember to use self.years to iterate through the years!
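The conversion rules above can be sketched outside the class as a standalone helper. This is only an illustration under stated assumptions, not the required method: the name parse_measure_row and its signature are hypothetical, and it returns a new dictionary instead of updating self.

```python
def parse_measure_row(row, years):
    """Return {year: float or None} for one csv.DictReader-style row.

    Hypothetical helper for illustration only; the assignment's
    update_data method stores these values on self instead.
    """
    values = {}
    for year in years:
        text = row[str(year)]  # DictReader keys and values are strings
        # An empty string marks a missing measurement.
        values[year] = float(text) if text != "" else None
    return values


row = {
    "country": "United States", "country code": "USA",
    "measure": "consumption",
    "1995": "21.96", "1996": "21", "1997": "20.96",
    "1998": "", "1999": "21.6", "2000": "22.04",
}
parsed = parse_measure_row(row, range(1995, 2001))
print(parsed[1996], parsed[1998])  # prints: 21.0 None
```

Note that the year keys become int values while the row keys stay strings, which is why the lookup uses str(year).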
Problem 1c: Actual production¶
Document and implement the calc_actual_production method that takes self and a year and updates the self.production dictionary with an entry for the country’s actual production in the given year.
- actual production
- The sum of the farmed and wild caught values for the country in the given year.
For example, in small.csv, the production of Canada in 1995 was 1013617: the sum of the farmed amount 65207 and the wild caught amount 948410. self.production should have an updated entry for the year 1995:
{1995: 1013617.0}
While you may assume self.farmed and self.wild_caught will not be empty, note that they may still contain missing measurements for a given year. To handle these missing values:
If both farmed and wild caught data are missing for the given year for this country, then the actual production for the given year should be None.
If only one of the farmed or wild caught data points is missing for the given year for this country, then the actual production should just be the existing value. In other words, treat None as 0 in this case.
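The two missing-value rules above can be sketched as a small standalone function. The name actual_production and its two-argument signature are hypothetical; the assigned method instead reads self.farmed and self.wild_caught for a year and writes the result into self.production.

```python
def actual_production(farmed, wild_caught):
    """Add two measurements, either of which may be None.

    Hypothetical helper for illustration only.
    """
    if farmed is None and wild_caught is None:
        return None  # nothing is known for this year
    # Exactly one value may still be missing here: treat it as 0.
    farmed = 0.0 if farmed is None else farmed
    wild_caught = 0.0 if wild_caught is None else wild_caught
    return farmed + wild_caught


print(actual_production(65207.0, 948410.0))  # prints: 1013617.0
print(actual_production(None, 948410.0))     # prints: 948410.0
print(actual_production(None, None))         # prints: None
```

Checking `is None` explicitly, rather than relying on truthiness, keeps a genuine measurement of 0 distinct from a missing one.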
Problem 1d: Production need¶
Document and implement the calc_production_need method that takes self and a year and updates the self.need dictionary with an entry for the country’s production need to feed the country’s population for the given year. If either population or consumption is missing for the given year and country, then the need value should be None.
- production need
- The product (multiplication) of the population and consumption divided by 1000 for the country in the given year.
We divide by 1000 to correct for different units: consumption is given in kilograms per person, but production is given in metric tonnes (1000 kg).
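As a worked instance of this definition, here is a minimal sketch of the arithmetic using Canada's 1995 values from small.csv. The function name production_need and its parameters are hypothetical; the assigned method takes self and a year and stores the result in self.need.

```python
def production_need(population, consumption_kg_per_person):
    """Return the need in metric tonnes, or None if an input is missing.

    Hypothetical helper for illustration only.
    """
    if population is None or consumption_kg_per_person is None:
        return None
    # people * (kg / person) gives kg; dividing by 1000 gives tonnes.
    return population * consumption_kg_per_person / 1000


# Canada, 1995: 29137000 people eating 22.62 kg each.
canada_1995 = production_need(29137000, 22.62)
print(round(canada_1995, 2))

# United States, 1998: consumption is missing, so the need is None.
print(production_need(274123000, None))  # prints: None
```

The Canadian result is roughly 659079 tonnes.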
Problem 1e: Calculate¶
Document and implement the calculate method that takes self and updates the actual production and the production need for each year in self.years. Call calc_actual_production and calc_production_need for each year. This method does not return anything.
Problem 2: Fishing class¶
Problem 2a: Initialization¶
Document and implement the __init__ method that takes self and a filename and opens the file filename, uses csv.DictReader to read the file, and initializes the following fields:
self.min_year: The minimum year in the entire dataset.
self.max_year: The maximum year in the entire dataset.
self.countries: A dictionary mapping each country code in the dataset to a Country object with all the country’s data loaded.
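One possible way to find the year range and group rows by country code with csv.DictReader is sketched below. It makes several assumptions: it reads from an in-memory excerpt of small.csv rather than opening a file, it treats any header field made up only of digits as a year column, and it collects measure names in a plain dictionary (rows_by_code, a hypothetical name) where the assignment asks for Country objects.

```python
import csv
import io

# A three-row excerpt of small.csv, kept in memory so the sketch is
# self-contained.
SAMPLE = """country,country code,measure,1995,1996,1997
Canada,CAN,consumption,22.62,23.41,23.23
Canada,CAN,population,29137000,29442000,29746000
Mexico,MEX,consumption,9.67,10.42,11.26
"""

reader = csv.DictReader(io.StringIO(SAMPLE))

# Assumption: year columns are the header fields made up only of digits.
years = [int(name) for name in reader.fieldnames if name.isdigit()]
min_year, max_year = min(years), max(years)

# Group the measures seen for each country code. A full solution would
# create one Country per code and pass each row to update_data instead.
rows_by_code = {}
for row in reader:
    rows_by_code.setdefault(row["country code"], []).append(row["measure"])

print(min_year, max_year)    # prints: 1995 1997
print(sorted(rows_by_code))  # prints: ['CAN', 'MEX']
```

When reading from a real file, the csv module documentation recommends opening it with newline="" before handing it to the reader.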
Problem 2b: Total production need¶
So, now that we’ve done all that work, how much fish will the entire world need to produce 50 years from now?
Document and implement the total_production_need method that takes self and an int number of years_to_predict and returns a single number: how many metric tonnes of fish will the world need to produce years_to_predict years from now? For each country code in self.countries:
Calculate the needed production using the country’s calculate method. (This doesn’t return anything, just updates the Country object.)
Predict the production need for years_to_predict years from now using the country’s predict_need method (already implemented for you).
Extract the last predicted production need, such as prediction[1][-1].
Then, return the sum of all the predicted production needs.
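The loop described above can be sketched with stand-in objects. Everything here is hypothetical: StubCountry is not the real Country class, its numbers are made up, and the (years, values) return shape of predict_need is only inferred from the prediction[1][-1] hint. Check the provided predict_need for its actual behavior.

```python
class StubCountry:
    """Hypothetical stand-in for Country; it only supports the loop below."""

    def __init__(self, last_need):
        self.last_need = last_need
        self.calculated = False

    def calculate(self):
        # The real method fills in production and need for every year.
        self.calculated = True

    def predict_need(self, years_to_predict):
        # Assumed shape: (list of years, list of predicted needs).
        return [2000, 2000 + years_to_predict], [0.0, self.last_need]


def total_need(countries, years_to_predict):
    """Sum the last predicted need over every country code."""
    total = 0.0
    for code in countries:
        country = countries[code]
        country.calculate()  # must run before predicting
        prediction = country.predict_need(years_to_predict)
        total += prediction[1][-1]  # last predicted production need
    return total


# Made-up country codes and values, for illustration only.
countries = {"AAA": StubCountry(100.0), "BBB": StubCountry(250.5)}
print(total_need(countries, 50))  # prints: 350.5
```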
Finally, let’s run the program! At the bottom of fishing.py, edit the __main__ conditional block to create an instance of the Fishing class with small.csv as the file. Save the instance in a variable called data before calling total_production_need for 50 years. %run fishing.py should report no doctest failures and a global production need of approximately 13243690.76...
Problem 3: Visualizations¶
Problem 3a: Production vs. need¶
Document and implement the plot_production_vs_need method in the Country class that takes self and plots the country’s actual production versus its production need (not consumption) over time. To access this data for each country, calls to calc_actual_production and calc_production_need are required to populate those fields before accessing the self.production and self.need dictionaries.
Produce two lists, one for the actual production and one for the production need.
Plot the two lists of data on a line plot, making two calls to plt.plot with the same years as the x-axis values. In each plot call:
Add a label for each line by specifying the label parameter.
Due to missing y-axis values, specify marker='s'.
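The first step, producing two lists that line up with the years, can be sketched as follows. The dictionaries below are made-up stand-ins for self.production and self.need, not values from the datasets. The sketch stops before any plt.plot call.

```python
years = [1995, 1996, 1997, 1998]

# Made-up values standing in for self.production and self.need.
production = {1995: 10.0, 1996: 12.0, 1997: None, 1998: 15.0}
need = {1995: 9.0, 1996: None, 1997: 11.0, 1998: 16.0}

# Build each list in the order of years so that position i in both
# lists belongs to years[i]. Missing values stay as None.
production_values = [production[year] for year in years]
need_values = [need[year] for year in years]

print(production_values)  # prints: [10.0, 12.0, None, 15.0]
print(need_values)        # prints: [9.0, None, 11.0, 16.0]
```

Keeping None in place, rather than dropping it, preserves the alignment with years. A point whose neighbors are missing has no line segment attached to it, so the marker is what makes it visible.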
Additionally, the plot should have the following attributes:
“Year” as the x-axis label
“Metric Tonnes” as the y-axis label
“Production vs. Need for self.name” as the title
A legend
Finally, call plt.savefig(str(self.name) + "-prod-vs-need.png") to save the plot as a PNG image and uncomment the lines of code in the __main__ block to call your method. The result should match the following plot.
Problem 3b: Pause and think¶
Pause for a few minutes to think about the plot above. Answer the following questions in answers.txt. When did the United States’ need surpass its production? What’s missing from the data we gave you that would help explain why the United States was still able to consume more fish than it produced?
Problem 3c: Linear prediction¶
Let’s use the given predict_need method to plot a linear prediction for US production need 50 years from now. Simply uncomment the following line of code in the __main__ block.
plot_linear_prediction(usa)
Running your program should then result in no doctest failures and a new PNG image, United-States-need-prediction.png.
Running on large.csv¶
So far, we’ve been testing our program with small.csv. Let’s try running the program on large.csv. Edit your Fishing data to read the file large.csv. Then, at the very end of your program, add a print statement to display the total_production_need. %run fishing.py should print:
Metric tonnes of fish needed to be produced in 50 years: 245637243.2239474
Code quality¶
Run our linter (automated code style checker) in the Python console with the expression !flake8. Edit the file and save your changes after addressing all reported issues. A successful !flake8 run will print nothing when there are no linting issues to report.
!flake8
Then, review our style guide.
Collaboration¶
If you discuss an assignment with one or more classmates, you must specify with whom you collaborated in a comment at the bottom of your submission. You may discuss with as many classmates as you like, but you must cite all of them in your work. Note that you may not collaborate in a way that is prohibited, even if you cite the collaboration.
At the bottom of your fishing.py and answers.txt files, state which students or other people (besides the course staff) helped you with the assignment, or that no one did.
Submission¶
Submit fishing.py and answers.txt on Gradescope under the assignment Homework: Fishing Analysis.
[1] Hannah Ritchie and Max Roser. 2021. “Fish and Overfishing.” In OurWorldinData.org. https://ourworldindata.org/fish-and-overfishing