There will be NO resubmission opportunities for HW5.

Learning Objectives

In this assignment, you will construct classes with Python to effectively parse and utilize data regarding fishing, population, and farming around the world! For this assignment, you will be doing the following:

  • Write a full Python program from scratch and with only a minimal starting template
  • Write classes for application in creating data statistics
  • Read and process data from a given comma-separated values (CSV) file
  • Build and handle a highly nested structure
  • Create plots and graphs to visualize different aspects of the data
  • Practice good coding style, as defined by the course style guide

There are many problems and sub-problems to this assignment, so it may appear daunting. Fear not! It’s a simpler assignment than it may appear at first sight.

Problem 0: Setting up & Program Organization:

  1. You should see seven files in your homework5 folder:
    1. fishing.py - a file that will run your code; this is a nearly-empty file!
    2. classes.py - a file that contains two classes that contain most of the code you will write for this assignment.
    3. utils.py - a file with a handful of functions that you may use in fishing.py or classes.py
    4. small.csv - a small subset of the data that you should use for testing. It contains information on the four measures for 3 countries for the years 1995-2000
    5. large.csv - the full, larger set of data that you will use for answering the final questions
    6. answers.txt - where you’ll write your answers to the questions in this homework
    7. test_fishing.py - this is a file provided to help test and debug your program. You can (and should!) run this program periodically while working on the assignment to see if your program is meeting the spec’s expectations. It is set up similarly to HW4’s test files, and can be run in the terminal with python test_fishing.py.

In this assignment you will be asked to write the following methods in the classes Country and Fishing in the file classes.py. All of these methods will be expected and run by the autograder. You are expected to implement all of them as defined in this specification:

  • Country
    • __init__(self, name, start_year, end_year)
    • update_data(self, row)
    • get_actual_production(self, year)
    • get_production_need(self, year)
    • calculate(self)
    • plot_production_vs_need(self)
    • predict_need(self, predict_years)
      • This method is written for you in the starter files and will be used later in the assignment!
  • Fishing
    • __init__(self, filename)
    • parse_data(self, raw_data)
    • total_production_need(self, years_to_predict)

The development pattern for this assignment is similar to past assignments. You will start by implementing methods in the class Country that will generate statistics for data. Then you will implement methods in the class Fishing that will read data from a file, create appropriate Country objects, and calculate the total production need. Along the way, we will ask you guiding questions and ask you to plot some of the data.

Problem 1: Country

Tip

We have created tests for this section, so be sure to run them regularly to check how your class is behaving for this problem!

Problem 1a: __init__(self, name, start_year, end_year):

For this problem, you will write the initialization method __init__(self, name, start_year, end_year) in the Country class within the classes.py file that creates an instance of the class, Country, and initializes the following:

  • Name of the country as a string stored in self.name
  • Empty dictionaries for each measure read from a data file:
    • Farmed, as self.farmed
    • Wild caught as self.wild_caught
    • Consumption as self.consumption
    • Population as self.population
  • Empty dictionaries for two more items we will calculate:
    • Production as self.production
    • Need as self.need

Notice that self.years has already been initialized for you to be a list of integers between start_year and end_year (inclusive). (Later, in Problem 2, we will describe how to extract the start_year and end_year from a data file so you can pass them to this constructor).
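As a small illustration of what "inclusive" means here, the sketch below builds a year list the same way the description suggests. The variable names and the 1995-2000 range (the range covered by small.csv) are chosen for the example only; the starter file may construct self.years differently.

```python
# Assumed example bounds; small.csv covers 1995-2000.
start_year = 1995
end_year = 2000

# range() excludes its stop value, so add 1 to include end_year.
years = list(range(start_year, end_year + 1))
print(years)  # [1995, 1996, 1997, 1998, 1999, 2000]
```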

Problem 1b: update_data(self, row):

For this problem, you will write the method update_data(self, row) in the Country class. This method should update the instance (“self”) to contain the data for one of the following measures for the range of self.years:

  • Farmed
  • Wild caught
  • Consumption
  • Population

The input parameter for this method is:

  • row: a dictionary representation of one row of data from the data file as provided by csv.DictReader. (Remember that csv.DictReader reads in numbers (e.g., 1960, 333) as strings. You must convert them to numbers in the update_data method!) Using the example .csv file discussed in the Background section, the line of data: "Ankh-Morpork,AMP,farmed,321,333" would be provided as the following dictionary:
{
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "farmed",
    "1960": "321",
    "1961": "333"
}

Warning

If data is missing for a specific year, csv.DictReader provides it as an empty string. To avoid errors later on, missing data should be stored as None for the given year.

update_data does not return anything. Instead, it updates one of self.farmed, self.wild_caught, self.consumption, or self.population, as indicated by the value associated with the key "measure" in the provided row.

If the dictionary above was passed in as the argument for row, the Country’s self.farmed field would be updated to be the following dictionary:

{
    1960: 321.0,
    1961: 333.0
}

This is because the row’s "measure" was "farmed" and it contains data from the years 1960 and 1961.

Note again how the numbers are converted from strings! Years are integers, and values associated with those years are floats. Remember to use self.years to iterate through the years!
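The conversion described above can be sketched as a standalone helper. This is only an illustration of the string-to-number handling, not the required method: the function name row_to_year_values is hypothetical, and you would still need to decide which of the instance dictionaries to update based on the "measure" key.

```python
def row_to_year_values(row, years):
    """Return {year (int): value (float or None)} for one DictReader row.

    Hypothetical helper for illustration; not part of the starter code.
    """
    values = {}
    for year in years:
        raw = row.get(str(year))  # DictReader keys are strings
        # Empty strings (or absent cells) mean the data point is missing.
        values[year] = float(raw) if raw else None
    return values


row = {
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "farmed",
    "1960": "321",
    "1961": "",  # missing value in the file
}
farmed = row_to_year_values(row, [1960, 1961])
print(farmed)  # {1960: 321.0, 1961: None}
```

Note that the string "0" is a real data point and is converted to 0.0, while only an empty or absent cell becomes None.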

Problem 1c: get_actual_production(self, year):

For this problem, write a method in the Country class called get_actual_production(self, year) that calculates the actual total production for the country in the given year. The total production should be the sum of the farmed and wild caught values for that country for the given year. While you may assume we have already updated the data into the dictionaries self.farmed and self.wild_caught such that they will not be empty dictionaries, note that we might still be missing data for a given year.

Calling the get_actual_production method should update the self.production dictionary to contain a pairing between the given year and the total production for that year. For example, given the sample data shown in Problem 1b, calling get_actual_production(1960) should change self.production to be {1960: 321.0}. Suppose an additional row such as the following were then passed to the update_data method:

{
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "wild caught",
    "1960": "400"
}

Then calling get_actual_production(1960) should update self.production to be {1960: 721.0}, as it adds the farmed and wild caught values for that country for that year.

Recall that we might have missing data for certain years and measures that we have stored as None. To handle these missing values:

  • If both farmed and wild caught data are missing for the given year for this country, then the total production for the given year should be None. The entry should still exist in the self.production dictionary, for example, like so {1960: None}.
  • If only one of the farmed or wild caught data points is missing for the given year for this country, then the total production should just be the existing value. That is, if we are missing farmed data for 1960, but we have wild caught data, total production should be the same as wild caught.
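The two missing-data rules above can be sketched as a small standalone function. The name combine_production is hypothetical and the function works on plain values rather than on a Country instance, so treat it as one possible way to express the rules, not as the required method.

```python
def combine_production(farmed, wild_caught):
    """Sum two values where None means 'missing' (illustrative helper)."""
    if farmed is None and wild_caught is None:
        return None  # nothing known for this year
    # Treat a single missing value as contributing nothing to the total.
    return (farmed or 0.0) + (wild_caught or 0.0)


print(combine_production(321.0, 400.0))  # 721.0
print(combine_production(None, 400.0))   # 400.0
print(combine_production(None, None))    # None
```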

Problem 1d: get_production_need(self, year):

For this problem, write a method in the Country class called get_production_need(self, year) that calculates the amount of seafood production that is needed to feed the country’s population for the given year. The values for the consumption measure in the data are specified in kilograms / capita for a particular year. We will calculate the need for a given year by multiplying the population for that year by the consumption for that year. However, since the values for the two production measures in the data (farmed and wild caught) are specified in metric tons (1000kg) for a particular year, to be able to better compare a country’s need to its actual production, we will need to convert the need to metric tons as well.

We can therefore calculate the production need with:

$$\text{production\_need} = \frac{\text{population} \times \text{consumption}}{1000}$$

Calling this method should update the self.need dictionary to contain a pairing between the year and the seafood need for that year.

Tip

If either population or consumption is missing for the given year and country, then the need value should be None.
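As a worked example of the formula and the missing-data rule, the sketch below uses a hypothetical standalone function and made-up numbers (a population of 1,000,000 consuming 20 kg per capita). It is not the required method, which must also store the result in self.need.

```python
def production_need(population, consumption):
    """Return need in metric tons, or None if either input is missing.

    population: number of people
    consumption: kilograms per capita for the year
    """
    if population is None or consumption is None:
        return None
    # kg per capita * people = kg; divide by 1000 to get metric tons.
    return population * consumption / 1000


print(production_need(1_000_000, 20.0))  # 20000.0
print(production_need(None, 20.0))       # None
```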

Problem 1e: calculate(self):

For this problem, you will write a method called calculate(self) that updates the actual production and the production needed for each year in the range of self.years. You will want to use the get_actual_production(self, year) and the get_production_need(self, year) methods for each of the years within the range.

This method should not return anything. After it is called the dictionaries self.production and self.need should be populated with the correct values for each year in self.years.

Problem 2: Fishing:

Problem 2a: __init__(self, filename):

For this problem, write the initialization method __init__(self, filename) in the Fishing class within the classes.py file that opens the file filename, uses csv.DictReader to read the file, and creates an instance of the Fishing class.

The Fishing class should contain the following attributes:

  • A dictionary, self.countries, that contains information for all the countries in the file, where the keys are the country codes and the values are instances of the Country class
  • The minimum year seen in the entire dataset as self.min_year
  • The maximum year seen in the entire dataset as self.max_year

General solution approach:

  1. Make an empty dictionary that stores all countries
  2. Open the file and use CSV DictReader to read the data
  3. Determine self.min_year and self.max_year, using the helper functions min_year and max_year from utils.py. Assuming you read the csv.DictReader object into a variable called reader, you can call min_year and max_year with reader.fieldnames as the argument as follows:
self.min_year = min_year(reader.fieldnames)
self.max_year = max_year(reader.fieldnames)
  4. Call the parse_data method, giving it a list of dictionaries (i.e., what’s returned from csv.DictReader). (You will write the parse_data method in the next problem, but the call to it has been given to you in the starter code). Hint: You will want to convert the data from csv.DictReader to a list of dictionaries before passing it into parse_data.
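The reading steps above can be tried out in isolation with the sketch below. It uses io.StringIO as a stand-in for an opened file, and the year_columns line is only a guess at the kind of work min_year and max_year do; the actual implementations in utils.py are not shown in this spec and may differ.

```python
import csv
import io

# Stand-in for an open file; the columns follow the example row format.
sample = io.StringIO(
    "country,country code,measure,1960,1961\n"
    "Ankh-Morpork,AMP,farmed,321,333\n"
    "Ankh-Morpork,AMP,wild caught,400,\n"
)

reader = csv.DictReader(sample)
fieldnames = reader.fieldnames  # header names, read on first access
rows = list(reader)             # a list of dicts, one per data row

# Assumption: year columns are the headers made up only of digits.
year_columns = [int(name) for name in fieldnames if name.isdigit()]
print(min(year_columns), max(year_columns))  # 1960 1961
print(len(rows))                             # 2
```

Converting the reader to a list while the file is still open matters because the reader pulls rows from the file lazily.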

Problem 2b: parse_data(self, raw_data):

Given raw_data as a list of dictionaries read from an input file by csv.DictReader, parse_data should populate the Fishing class’s self.countries dictionary.

For each dictionary in the raw_data list, parse_data should use the "country code" field as the key into the self.countries dictionary. If a value for that country code doesn’t already exist in the self.countries dictionary, a new instance of the Country class should be created. Then, the Country object’s update_data method should be called, given the current row of the data.

This method doesn’t return anything, but does have the effect of populating the self.countries dictionary, which associates country codes (keys) with instances of Country classes.
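The "create if absent, then update" pattern described above can be sketched with plain lists standing in for Country objects. The second country ("Lancre", code "LAN") is made up for the example, and the list operations are placeholders for creating a Country and calling its update_data method.

```python
raw_data = [
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "farmed"},
    {"country": "Ankh-Morpork", "country code": "AMP",
     "measure": "wild caught"},
    {"country": "Lancre", "country code": "LAN", "measure": "farmed"},
]

countries = {}
for row in raw_data:
    code = row["country code"]
    if code not in countries:
        # Stand-in for creating a new Country object for this code.
        countries[code] = []
    # Stand-in for calling the Country's update_data(row).
    countries[code].append(row["measure"])

print(countries)
# {'AMP': ['farmed', 'wild caught'], 'LAN': ['farmed']}
```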

Problem 2c: total_production_need(self, years_to_predict):

So, now that we’ve done all that work, how much seafood will the entire world need to produce 50 years from now?

For this problem, write a method in the Fishing class called total_production_need(self, years_to_predict) that returns a single number: how many metric tonnes will the world need to produce years_to_predict years from now?

This method should do the following:

  1. Take as input the number of years in the future to predict.
  2. For each country code in self.countries:
    • Calculate the needed production using the country’s calculate method from Problem 1. (This doesn’t return anything, just creates/modifies attributes within the Country object.)
    • Predict the production need for years_to_predict years from now using Country’s predict_need method (already implemented for you).
  3. Get the last value in the predicted values.
    • For example, if you assign what is returned from the predict_need method to prediction, you should be able to get the last value with prediction["values"][-1].
  4. Sum up all of the predicted values of all countries.
  5. Return the total.
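The accumulation in steps 3 to 5 can be sketched on its own with made-up prediction results. Only the "values" key comes from the description above; the numbers and the "LAN" country code are invented, and in your method each prediction would come from calling predict_need on a Country object.

```python
# Hypothetical predict_need results, keyed by country code.
predictions = {
    "AMP": {"values": [10.0, 12.0, 14.0]},
    "LAN": {"values": [1.0, 1.5, 2.0]},
}

total = 0.0
for code in predictions:
    # The last entry is the prediction for the furthest year.
    total += predictions[code]["values"][-1]

print(total)  # 16.0
```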

Now, let’s run the program! Navigate to the file fishing.py. This file does not contain any classes, but instead contains a main function. In the main function, create an instance of the Fishing class with small.csv as the file. Save the instance in a variable called data, then call total_production_need for 50 years. total_production_need should return a total production need of 13243690.762868665. (Your degree of precision may vary.)

Problem 3: Creating Visual Data Representations

Problem 3a: plot_production_vs_need(self):

In classes.py, write a method for the Country class called plot_production_vs_need(self). You will use the attributes initialized in __init__(self, name, start_year, end_year) to create a plot of the country’s production vs. its need (not consumption!) over time.

Note that to access this data for each country, calls to get_actual_production and get_production_need are required to populate those fields before accessing the self.production and self.need dictionaries.

A general approach to this problem would follow these steps:

  1. Produce two lists, one for the production and one for the need.
  2. Plot the two lists of data on a line graph. You will need two calls to plt.plot() to do this; each call will use the years list for the x parameter. Make sure to add a label for each line by adding label= to each call of plt.plot(). Because there may be missing data in the “y” values, you should also add the marker='s' parameter to the plt.plot() calls.
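The first step, producing the two lists, can be sketched as follows. The dictionaries hold made-up values shaped like self.production and self.need; building each list by walking the years list keeps the y values aligned with the x values even when some entries are None.

```python
# Made-up data in the same shape as self.years, self.production, self.need.
years = [1960, 1961, 1962]
production = {1960: 721.0, 1961: None, 1962: 800.0}
need = {1960: 650.0, 1961: 700.0, 1962: None}

# Index by year so both lists line up with the years list.
production_values = [production[year] for year in years]
need_values = [need[year] for year in years]

print(production_values)  # [721.0, None, 800.0]
print(need_values)        # [650.0, 700.0, None]
```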

Additionally, the plot should have the following attributes:

  • xlabel is set to “Year”
  • ylabel is set to “Metric Tonnes”
  • title is set to “Production vs. Need for {country_code}”
  • The legend is added, using plt.legend()
  • Nothing else should be added to this plot; following these instructions should result in a nearly-identical plot as shown below.

Finally, you should use plt.savefig(str(self.name) + "-prod-vs-need.png") to save the plot as a PNG image, and plt.clf() to clear the plot after you are done.

[Figure: USA Production vs. Need]

To produce this plot, uncomment the relevant lines of code in the main() function of fishing.py. They should look like the following (assuming that data is an instance of the Fishing class):

usa = data.countries["USA"]
usa.plot_production_vs_need()

Problem 3b: Pause and Interpret the Graph

Pause for a few minutes to think about the plot created in Problem 3a and think about the following questions:

  1. When did the USA’s need surpass its production?
  2. What’s missing from the data we gave you that would help explain why the USA was still able to consume more seafood than it produced?

Write your answers to these questions in answers.txt.

Problem 3c: Plotting the Prediction Line

Now, we’ll use the given predict_need method to plot the best-fit line for US consumption and a prediction out to 50 years from now.

For this problem, uncomment the following line of code in the main() function in fishing.py:

plot_linear_prediction(usa)

Running your program should then result in no errors and a new file named USA_need_prediction.png being created in the same folder as your program.

The plot_linear_prediction(country) function is given to you in the utils.py starter file. This function takes in a Country object as its parameter. It then calls the predict_need method with the production need data for the given Country object, and then plots the best-fit line and prediction. You do not need to write this function; it’s already been written for you. But it does rely on correct implementations for previous problems.

[Figure: Predicted Need for USA]

Problem 4: Running the program with the large file:

At this point, it’s time to run the program with the larger data file. In fishing.py, change data to use the file large.csv as its input. Then, at the very end of your program, add a print statement to print out the total you return from total_production_need. Your program should print the following:

Metric tonnes of seafood needed to be produced in 50 years: 245,637,243.224
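One way to get the comma separators and three decimal places shown above is Python's format specification ",.3f". The sketch below uses a hard-coded stand-in for the returned total; in your program the value would come from total_production_need, and your exact digits may differ slightly.

```python
# Stand-in for the value returned by total_production_need(50).
total = 245637243.2239

# ",.3f" adds thousands separators and rounds to three decimals.
formatted = f"{total:,.3f}"
message = (
    "Metric tonnes of seafood needed to be produced in 50 years: "
    + formatted
)
print(message)
```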

Warning

The final number is a gross approximation and should not be used as a realistic estimate of the world’s total seafood production needs.


Code Quality

Info

Make sure to provide descriptive comments for each function in docstring format.

Your assignment should pass two checks: flake8 and our code quality guidelines. The code quality guidelines are very thorough.

Collaboration

Warning

If you discuss an assignment with one or more classmates, you must specify with whom you collaborated in the header comment in your submission. You may discuss with as many classmates as you like, but you must cite all of them in your work. Note that you may not collaborate in a way that is prohibited, even if you cite the collaboration. Please consult the course syllabus for more details.

Submission

Submit fishing.py, classes.py, and answers.txt to Gradescope.

HW5 - Homework 5 (NO RESUBMISSIONS)

Final Submission by Friday 03/13 at 11:59 pm.
