There will be NO resubmission opportunities for HW5.
Learning Objectives¶
In this assignment, you will construct classes with Python to effectively parse and utilize data regarding fishing, population, and farming around the world! For this assignment, you will be doing the following:
- Write a full Python program from scratch and with only a minimal starting template
- Write classes for application in creating data statistics
- Read and process data from a given comma-separated values (CSV) file
- Build and handle a highly nested structure
- Create plots and graphs to visualize different aspects of the data
- Practice good coding style, as defined by the course style guide
There are many problems and sub-problems to this assignment, so it may appear daunting. Fear not! It’s a simpler assignment than it may appear at first sight.
Problem 0: Setting up & Program Organization:¶
- You should see seven files in your homework5 folder:
  - fishing.py - a file that will run your code; this is a nearly-empty file!
  - classes.py - a file that contains two classes that contain most of the code you will write for this assignment.
  - utils.py - a file with a handful of functions that you may use in fishing.py or classes.py
  - small.csv - a small subset of the data that you should use for testing. It contains information on the four measures for 3 countries for the years 1995-2000.
  - large.csv - the full, larger set of data that you will use for answering the final questions
  - answers.txt - where you’ll write your answers to the questions in this homework
  - test_fishing.py - this is a file provided to help test and debug your program. You can (and should!) run this program periodically while working on the assignment to see if your program is meeting the spec’s expectations. It is set up similarly to HW4’s test files, and can be run in the terminal with python test_fishing.py.
In this assignment you will be asked to write the following methods in the classes Country and Fishing in the file classes.py. All of these methods will be expected and run by the autograder. You are expected to implement all of them as defined in this specification:
- Country
  - __init__(self, name, start_year, end_year)
  - update_data(self, row)
  - get_actual_production(self, year)
  - get_production_need(self, year)
  - calculate(self)
  - plot_production_vs_need(self)
  - predict_need(self, predict_years) - This method is written for you in the starter files and will be used later in the assignment!
- Fishing
  - __init__(self, filename)
  - parse_data(self, raw_data)
  - total_production_need(self, years_to_predict)
The development pattern for this assignment is similar to past assignments. You will start by implementing methods in the class Country that will generate statistics for data. Then you will implement methods in the class Fishing that will read data from a file, create appropriate Country objects, and calculate the total production need. Along the way, we will ask you guiding questions and ask you to plot some of the data.
Problem 1: Country¶
Tip
We have created tests for this section, so be sure to test your class by running them and checking in with how your program is functioning for this problem!
Problem 1a: __init__(self, name, start_year, end_year):¶
For this problem, you will write the initialization method __init__(self, name, start_year, end_year) in the Country class within the classes.py file that creates an instance of the class, Country, and initializes the following:
- Name of the country as a string stored in self.name
- Empty dictionaries for each measure read from a data file:
  - Farmed, as self.farmed
  - Wild caught, as self.wild_caught
  - Consumption, as self.consumption
  - Population, as self.population
- Empty dictionaries for two more items we will calculate:
  - Production, as self.production
  - Need, as self.need
Notice that self.years has already been initialized for you to be a list of integers between start_year and end_year (inclusive). (Later, in Problem 2, we will describe how to extract the start_year and end_year from a data file so you can pass them to this constructor).
Problem 1b: update_data(self, row):¶
For this problem, you will write the method update_data(self, row) in the Country class. This method should update the instance (“self”) to contain the data for one of the following measures for the range of self.years:
- Farmed
- Wild caught
- Consumption
- Population
The input parameter for this method is:
- row: a dictionary representation of one row of data from the data file as provided by csv.DictReader. (Remember that csv.DictReader reads in numbers (e.g., 1960, 333) as strings. You must convert them to numbers in the update_data method!) Using the example .csv file discussed in the Background section, the line of data "Ankh-Morpork,AMP,farmed,321,333" would be provided as the following dictionary:
{
"country": "Ankh-Morpork",
"country code": "AMP",
"measure": "farmed",
"1960": "321",
"1961": "333"
}
Warning
If data is missing for a specific year, it would be automatically stored as an empty string. In order to avoid errors later on, missing data should be stored as None for the given year.
update_data does not return anything. Instead, it updates one of self.farmed, self.wild_caught, self.consumption, or self.population, as indicated by the value associated with the key "measure" in the provided row.
If the dictionary above was passed in as the argument for row, the Country’s self.farmed field would be updated to be the following dictionary:
{
1960: 321.0,
1961: 333.0
}
This is because the row’s "measure" was "farmed" and it contains data from the years 1960 and 1961.
Note again how the numbers are converted from strings! Years are integers, and values associated with those years are floats. Remember to use self.years to iterate through the years!
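The conversion described above can be sketched as a standalone helper. This is only an illustration, not the required implementation: the function name row_to_year_values is hypothetical, and treating an absent year column the same as an empty string is an assumption.

```python
def row_to_year_values(row, years):
    """Return {year (int): value (float or None)} for one DictReader row."""
    values = {}
    for year in years:
        # DictReader keys are strings, so look the year up as "1960".
        raw = row.get(str(year), "")
        if raw is None or raw == "":
            values[year] = None  # missing data is stored as None
        else:
            values[year] = float(raw)
    return values


row = {
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "farmed",
    "1960": "321",
    "1961": "",  # a missing value, as csv.DictReader would report it
}
farmed = row_to_year_values(row, [1960, 1961])
print(farmed)  # {1960: 321.0, 1961: None}
```

In the assignment itself, the same loop would run over self.years and store its result in the dictionary selected by row["measure"].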
Problem 1c: get_actual_production(self, year):¶
For this problem, write a method in the Country class called get_actual_production(self, year) that calculates the actual total production for the country in the given year. The total production should be the sum of the farmed and wild caught values for that country for the given year. While you may assume we have already updated the data into the dictionaries self.farmed and self.wild_caught such that they will not be empty dictionaries, note that we might still be missing data for a given year.
Calling the get_actual_production method should update the self.production dictionary to contain a pairing between the given year and the total production for that year. For example, given the sample data shown in Problem 1b, calling get_actual_production(1960) should change self.production to be {1960: 321.0}. If an additional row such as the following was then passed to the update_data method:
{
"country": "Ankh-Morpork",
"country code": "AMP",
"measure": "wild caught",
"1960": "400"
}
Then calling get_actual_production(1960) should update self.production to be {1960: 721.0}, as it adds the farmed and wild caught values for that country for that year.
Recall that we might have missing data for certain years and measures that we have stored as None. To handle these missing values:
- If both farmed and wild caught data are missing for the given year for this country, then the total production for the given year should be None. The entry should still exist in the self.production dictionary, for example, like so: {1960: None}.
- If only one of the farmed or wild caught data points is missing for the given year for this country, then the total production should just be the existing value. That is, if we are missing farmed data for 1960, but we have wild caught data, total production should be the same as wild caught.
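The two missing-data rules above can be sketched in isolation as follows. The helper name combine_production and the sample numbers for 1961 and 1962 are made up for illustration; the real method works on self.farmed and self.wild_caught and writes into self.production.

```python
def combine_production(farmed, wild_caught):
    """Add two yearly values, treating None as missing data."""
    if farmed is None and wild_caught is None:
        return None  # nothing known for this year
    if farmed is None:
        return wild_caught  # only wild caught data exists
    if wild_caught is None:
        return farmed  # only farmed data exists
    return farmed + wild_caught


production = {}
production[1960] = combine_production(321.0, 400.0)
production[1961] = combine_production(None, 250.0)
production[1962] = combine_production(None, None)
print(production)  # {1960: 721.0, 1961: 250.0, 1962: None}
```

Note that the year with no data still gets an entry, with None as its value.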
Problem 1d: get_production_need(self, year):¶
For this problem, write a method in the Country class called get_production_need(self, year) that calculates the amount of seafood production that is needed to feed the country’s population for the given year. The values for the consumption measure in the data are specified in kilograms / capita for a particular year. We will calculate the need for a given year by multiplying the population for that year by the consumption for that year. However, since the values for the two production measures in the data (farmed and wild caught) are specified in metric tons (1000kg) for a particular year, to be able to better compare a country’s need to its actual production, we will need to convert the need to metric tons as well.
We can therefore calculate the production need, in metric tons, with:
$$\text{need} = \frac{\text{population} \times \text{consumption}}{1000}$$
Calling this method should update the self.need dictionary to contain a pairing between the year and the seafood need for that year.
Tip
If either population or consumption is missing for the given year and country, then the need value should be None.
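As a rough sketch of the unit conversion and the missing-data rule for need, assuming a hypothetical standalone function and made-up input numbers (the real method reads from self.population and self.consumption and stores the result in self.need):

```python
KG_PER_METRIC_TON = 1000


def production_need(population, consumption_kg_per_capita):
    """Return the need in metric tons, or None if either input is missing."""
    if population is None or consumption_kg_per_capita is None:
        return None
    total_kg = population * consumption_kg_per_capita
    return total_kg / KG_PER_METRIC_TON


print(production_need(2_000_000, 15.5))  # 31000.0
print(production_need(None, 15.5))       # None
```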
Problem 1e: calculate(self):¶
For this problem, you will write a method called calculate(self) that updates the actual production and the production needed for each year in the range of self.years. You will want to use the get_actual_production(self, year) and the get_production_need(self, year) methods for each of the years within the range.
This method should not return anything. After it is called the dictionaries self.production and self.need should be populated with the correct values for each year in self.years.
Problem 2: Fishing:¶
Problem 2a: __init__(self, filename):¶
For this problem, write the initialization method __init__(self, filename) in the Fishing class within the classes.py file that opens the file filename, uses csv.DictReader to read the file, and creates an instance of the Fishing class.
The Fishing class should contain the following attributes:
- A dictionary, self.countries, that contains information for all the countries in the file, where the keys are the country codes and the values are instances of the Country class
- The minimum year seen in the entire dataset as self.min_year
- The maximum year seen in the entire dataset as self.max_year
General solution approach:
- Make an empty dictionary that stores all countries
- Open the file and use CSV DictReader to read the data
- Determine self.min_year and self.max_year, using the helper functions min_year and max_year from utils.py. Assuming you read the csv.DictReader object into a variable called reader, you can call min_year and max_year using reader.fieldnames as a parameter as follows:
self.min_year = min_year(reader.fieldnames)
self.max_year = max_year(reader.fieldnames)
- Call the parse_data method, giving it a list of dictionaries (i.e., what’s returned from csv.DictReader). (You will write the parse_data method in the next problem, but the call to it has been given to you in the starter code.) Hint: You will want to convert the data from csv.DictReader to a list of dictionaries before passing it into parse_data.
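The reading steps above can be tried out without any files, as in the sketch below. It uses an in-memory string in place of a real file, and year_columns is a hypothetical stand-in written for this example; it is not the min_year or max_year helper from utils.py, whose implementation is not shown in this spec.

```python
import csv
import io

# Stand-in for a small data file; the assignment opens `filename` instead.
SAMPLE_CSV = (
    "country,country code,measure,1960,1961\n"
    "Ankh-Morpork,AMP,farmed,321,333\n"
    "Ankh-Morpork,AMP,wild caught,400,\n"
)


def year_columns(fieldnames):
    """Hypothetical helper: return the column names that are years, as ints."""
    return [int(name) for name in fieldnames if name.isdigit()]


with io.StringIO(SAMPLE_CSV) as f:
    reader = csv.DictReader(f)
    years = year_columns(reader.fieldnames)
    # Convert to a list while the file is still open; the reader is lazy.
    rows = list(reader)

print(min(years), max(years))  # 1960 1961
print(len(rows))               # 2
print(rows[1]["1961"] == "")   # True
```

Converting the reader to a list inside the with block matters because csv.DictReader only reads rows on demand, and it can no longer do so once the file is closed.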
Problem 2b: parse_data(self, raw_data):¶
Given raw_data as a list of dictionaries read from an input file by csv.DictReader, parse_data should populate the Fishing class’s self.countries dictionary.
For each dictionary in the raw_data list, parse_data should use the "country code" field as the key into the self.countries dictionary. If a value for that country code doesn’t already exist in the self.countries dictionary, a new instance of the Country class should be created. Then, the Country object’s update_data method should be called, given the current row of the data.
This method doesn’t return anything, but does have the effect of populating the self.countries dictionary, which associates country codes (keys) with instances of Country classes.
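A minimal sketch of this create-if-missing pattern is shown below. CountryStub is a hypothetical stand-in that only records which measures it received, the function name group_rows_by_code is invented for the example, and the second country in the sample rows is made up.

```python
class CountryStub:
    """Hypothetical stand-in for Country; it only records measures seen."""

    def __init__(self, name, start_year, end_year):
        self.name = name
        self.years = list(range(start_year, end_year + 1))
        self.measures_seen = []

    def update_data(self, row):
        self.measures_seen.append(row["measure"])


def group_rows_by_code(raw_data, start_year, end_year):
    """Build {country code: CountryStub}, creating each stub only once."""
    countries = {}
    for row in raw_data:
        code = row["country code"]
        if code not in countries:
            countries[code] = CountryStub(row["country"], start_year, end_year)
        countries[code].update_data(row)
    return countries


raw_data = [
    {"country": "Ankh-Morpork", "country code": "AMP",
     "measure": "farmed", "1960": "321"},
    {"country": "Ankh-Morpork", "country code": "AMP",
     "measure": "wild caught", "1960": "400"},
    {"country": "Lancre", "country code": "LAN",
     "measure": "farmed", "1960": ""},
]
countries = group_rows_by_code(raw_data, 1960, 1960)
print(sorted(countries))                # ['AMP', 'LAN']
print(countries["AMP"].measures_seen)   # ['farmed', 'wild caught']
```

In the assignment, the same loop would live inside parse_data, store into self.countries, and pass self.min_year and self.max_year to the Country constructor.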
Problem 2c: total_production_need(self, years_to_predict):¶
So, now that we’ve done all that work, how much seafood will the entire world need to produce 50 years from now?
For this problem, write a method in the Fishing class called total_production_need(self, years_to_predict) that returns a single number: how many metric tonnes will the world need to produce years_to_predict years from now?
This method should do the following:
- Take as input the number of years in the future to predict.
- For each country code in self.countries:
  - Calculate the needed production using the country’s calculate method from Problem 1. (This doesn’t return anything, just creates/modifies attributes within the Country object.)
  - Predict the production need for years_to_predict years from now using Country’s predict_need method (already implemented for you).
- Get the last value in the predicted values.
  - For example, if you assign what is returned from the predict_need method to prediction, you should be able to get the last value with prediction["values"][-1].
- Sum up all of the predicted values of all countries.
- Return the total.
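The steps above amount to a running sum over countries, which can be sketched with stand-in objects. PredictStub and total_need are hypothetical names, and the numbers are made up; the only detail taken from this spec is that the prediction is indexed as prediction["values"][-1].

```python
class PredictStub:
    """Hypothetical stand-in for a Country with a canned prediction."""

    def __init__(self, values):
        self._values = values

    def calculate(self):
        pass  # the real method fills self.production and self.need

    def predict_need(self, predict_years):
        # Only the "values" key is known from the spec; the real
        # return value may contain other keys as well.
        return {"values": self._values}


def total_need(countries, years_to_predict):
    """Sum the final predicted value of every country."""
    total = 0
    for code in countries:
        country = countries[code]
        country.calculate()
        prediction = country.predict_need(years_to_predict)
        total += prediction["values"][-1]
    return total


countries = {
    "AMP": PredictStub([10.0, 12.5]),
    "LAN": PredictStub([1.0, 2.5]),
}
print(total_need(countries, 50))  # 15.0
```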
Now, let’s run the program! Navigate to the file fishing.py. This file does not contain any classes, but instead contains a main function. In the main function, create an instance of the Fishing class with small.csv as the file. Save the instance in a variable called data, then call total_production_need for 50 years. total_production_need should return a total production need of 13243690.762868665. (Your degree of precision may vary.)
Problem 3: Creating Visual Data Representations¶
Problem 3a: plot_production_vs_need(self):¶
In classes.py, write a method for the Country class called plot_production_vs_need(self). You will use the attributes from __init__(self, name, start_year, end_year) to create a plot of the country’s production vs. its need (not consumption!) over time.
Note that to access this data for each country, calls to get_actual_production and get_production_need are required to populate those fields before accessing the self.production and self.need dictionaries.
A general approach to this problem would follow these steps:
Produce two lists, one for the production and one for the need.
Plot the two lists of data on a line graph. You will need two calls to plt.plot() to do this; each call will use the years list for the x parameter. Make sure to add a label for each line by adding label= to each call of plt.plot(). Because there may be missing data in the “y” values, you should also add the marker='s' parameter to the plt.plot() calls, so that a data point with no neighboring values is still visible.
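The first step, producing the two lists, can be sketched without any plotting library. The helper name series_for_plot and the sample values are made up for illustration; in the assignment the dictionaries would be self.production and self.need, and the year list would be self.years.

```python
def series_for_plot(years, by_year):
    """Return the values in year order, with None for missing years."""
    return [by_year.get(year) for year in years]


years = [1960, 1961, 1962]
production = {1960: 721.0, 1961: None, 1962: 800.0}
need = {1960: 650.0, 1961: 700.0, 1962: None}

production_values = series_for_plot(years, production)
need_values = series_for_plot(years, need)
print(production_values)  # [721.0, None, 800.0]
print(need_values)        # [650.0, 700.0, None]
```

Building the lists from the year list, rather than from the dictionary's own ordering, keeps each y value aligned with its x value when both are handed to plt.plot().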
Additionally, the plot should have the following attributes:
- xlabel is set to “Year”
- ylabel is set to “Metric Tonnes”
- title is set to “Production vs. Need for {country_code}”
- The legend is added, using plt.legend()
- Nothing else should be added to this plot; following these instructions should result in a plot nearly identical to the one shown below.
Finally, you should use plt.savefig(str(self.name) + "-prod-vs-need.png") to save the plot as a PNG image, and plt.clf() to clear the plot after you are done.

To produce this plot, uncomment the lines of code in the main() function of fishing.py. It should look like the following (assuming that data is an instance of the Fishing class):
usa = data.countries["USA"]
usa.plot_production_vs_need()
Problem 3b: Pause and Interpret the Graph¶
Pause for a few minutes to think about the plot created in Problem 3a and think about the following questions:
- When did the USA’s need surpass its production?
- What’s missing from the data we gave you that would help explain why the USA was still able to consume more seafood than it produced?
Write your answers to these questions in answers.txt.
Problem 3c: Plotting the Prediction Line¶
Now, we’ll use the given predict_need method to plot the best-fit line for US consumption and a prediction out to 50 years from now.
For this problem, uncomment the following line of code in the main() function in fishing.py:
plot_linear_prediction(usa)
Running your program should then result in no errors and a new file named USA_need_prediction.png being created in the same folder as your program.
The plot_linear_prediction(country) function is given to you in the utils.py starter file. This function takes in a Country object as its parameter. It then calls the predict_need method with the production need data for the given Country object, and then plots the best-fit line and prediction. You do not need to write this function; it’s already been written for you. But it does rely on correct implementations for previous problems.

Problem 4: Running the program with the large file:¶
At this point, it’s time to run the program with the larger data file. In fishing.py, change data to use the file large.csv as its input. Then, at the very end of your program, add a print statement to print out the total you return from total_production_need. Your program should print the following:
Metric tonnes of seafood needed to be produced in 50 years: 245,637,243.224
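One way to get the thousands separators and three decimal places shown above is Python's format specification ",.3f". The sketch below uses a made-up stand-in value rather than a real result from total_production_need.

```python
total = 245637243.2241  # made-up stand-in for the returned total
label = "Metric tonnes of seafood needed to be produced in 50 years:"
message = f"{label} {total:,.3f}"
print(message)
```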
Warning
The final number is a gross approximation and should not be used as a realistic estimate of the world’s total seafood production needs.
Code Quality¶
Info
Make sure to provide descriptive comments for each function in docstring format.
Your assignment should pass two checks: flake8 and our code quality guidelines. The code quality guidelines are very thorough.
Collaboration¶
Warning
If you discuss an assignment with one or more classmates, you must specify with whom you collaborated in the header comment in your submission. You may discuss with as many classmates as you like, but you must cite all of them in your work. Note that you may not collaborate in a way that is prohibited, even if you cite the collaboration. Please consult the course syllabus for more details.
Submission¶
Submit fishing.py, classes.py, and answers.txt to Gradescope.
HW5 - Homework 5 (NO RESUBMISSIONS)
Final Submission by Friday 03/13 at 11:59 pm. Submit on Gradescope