Learning Objectives

In this assignment, you will construct classes with Python to effectively parse and utilize data regarding fishing, population, and farming around the world! For this assignment, you will be doing the following:

  • Write a full Python program from scratch and with only a minimal starting template
  • Write classes for application in creating data statistics
  • Read and process data from a given comma-separated values (CSV) file
  • Build and handle a highly nested structure
  • Create plots and graphs to visualize different aspects of the data
  • Practice good coding styles, as defined by the course style guide

Warning

There are many problems and sub-problems to this assignment, so it may appear daunting. Fear not! It’s a simpler assignment than it may appear at first sight.

Problem 0: Setting up & Program Organization:

  1. Download and extract homework5.zip
  2. You should have a folder with seven files:
    1. fishing.py - a file that will run all the rest of your code; this is a nearly-empty file!
    2. classes.py - a file that contains the classes that contain most of the logic for this assignment.
    3. utils.py - a file with a handful of helper functions that you will use in either fishing.py or classes.py
    4. small.csv - a small subset of the data that you should use for testing
    5. large.csv - the full, larger set of data that you will use for answering the final questions
    6. answers.txt - where you’ll write your answers to the questions in this homework
    7. test_fishing.py - this is a file given to you to help test and debug your program. You can (and should!) run this program periodically while working on the assignment to see if your program is meeting the spec’s expectations. It is set up similarly to HW4’s test files, and can be run in the terminal with python test_fishing.py.
  3. Make sure you have all of these files in the same folder. The subsequent problems and questions will explain in further detail the format and expected usage of the data.

In this assignment, you will write the classes and methods listed below. The autograder expects every one of them and will call them, so implement each exactly as defined in this specification:

  • Country
    • __init__(self, name, start_year, end_year)
    • update_data(self, row)
    • get_actual_production(self, year)
    • get_production_need(self, year)
    • calculate(self)
    • plot_production_vs_consumption(self)
    • predict_need(self, predict_years)
      • This function is given to you in the starter files and will be used later in the assignment!
  • Fishing
    • __init__(self, filename)
    • parse_data(self, raw_data)
    • total_production_need(self, years_to_predict)

The development pattern for this assignment is similar to past assignments. You will start by writing a class (Country) that will generate statistics for data. Then you will write a class (Fishing) that will parse the data in its initialization to read data from a file and calculate the total production need. Along the way, we will ask you guiding questions and ask you to plot some of the data.

Problem 1: Country

Tip

We have created tests for this section, so be sure to test your class by running them and checking how your program is functioning for this problem!

Problem 1a: __init__(self, name, start_year, end_year):

For this problem, you will write the initialization function (__init__(self, name, start_year, end_year)) in the Country class within the classes.py file. It creates an instance of the Country class and initializes the following:

  • Name of the country as a string
  • Empty dictionaries for each data category collected:
    • Farmed, as self.farmed
    • Wild caught as self.wild_caught
    • Consumption as self.consumption
    • Population as self.population
    • Production as self.production
    • Need as self.need

In addition, self.years has already been initialized for you. It contains the range of years for which your file has data, as a list of integers.

Problem 1b: update_data(self, row):

For this problem, you will write a function (update_data(self, row)) in the Country class. This function should update the instance (“self”) with the data for one of the following categories over the range of self.years:

  • Farmed
  • Wild caught
  • Consumption
  • Population

The input parameters for this function are defined as follows:

  • row: a dictionary representation of one row out of the data file as read by csv.DictReader. (Remember that this reads in numbers (e.g., 1960, 333) as strings. You must convert them to numbers in this function!) E.g., "Ankh-Morpork,AMP,farmed,321,333" would be given as
{
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "farmed",
    "1960": "321",
    "1961": "333"
}

update_data does not return anything. Instead, it updates one of the class fields for farmed, wild caught, consumption, or population, as indicated by the "measure" key in the given row.

In the example shown earlier, the Country’s self.farmed field would be updated to be the following dictionary:

{
    1960: 321.0,
    1961: 333.0
}

This is because the row’s "measure" was "farmed" and it contains data from the years 1960 and 1961.

Note again how the numbers are converted from strings! Years are integers, and values are floats. Remember to use self.years to iterate through the years!
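The conversion described above can be sketched as a small standalone function. This is only an illustration, not the required implementation: the name row_to_year_values is hypothetical, and treating an empty string as a missing data point is an assumption about the CSV file that you should check against your data.

```python
def row_to_year_values(row, years):
    """Convert the year columns of one csv.DictReader row to {int: float}."""
    values = {}
    for year in years:
        # DictReader keys are strings, so look the year up as "1960", not 1960.
        raw = row.get(str(year))
        if not raw:
            # Assumption: a missing data point shows up as "" (or is absent).
            continue
        values[year] = float(raw)
    return values


row = {
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "farmed",
    "1960": "321",
    "1961": "333",
}
print(row_to_year_values(row, [1960, 1961]))
```

In your own update_data, the years would come from self.years, and the resulting dictionary would be stored in the field named by row["measure"].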

Problem 1c: get_actual_production(self, year):

For this problem, write a function in the Country class called get_actual_production(self, year) that calculates the actual production for the country in the given year. The total production should be the sum of the farmed and wild caught values for that country for that year. You may assume that data has already been updated into the dictionaries self.farmed and self.wild_caught.

Calling this method should update the self.production dictionary to contain a pairing between the year and the total production for that year. For example, given the sample data shown in Problem 1b, calling get_actual_production(1960) should change self.production to be {1960: 321.0}. Suppose an additional dictionary such as the following was then added to the class via the update_data method:

{
    "country": "Ankh-Morpork",
    "country code": "AMP",
    "measure": "wild caught",
    "1960": "400",
}

Then calling get_actual_production(1960) should update self.production to be {1960: 721.0}, as it adds the number of farmed and wild_caught values for that country for that year.

The following should be considered for handling missing data points:

  • If both farmed and wild caught are missing for the given year and country, then the total production for that year should be None. This entry should still exist in the self.production dictionary.
  • If only one of the farmed or wild caught is missing for the given year and country, then the total production should just be the existing value.
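The two missing-data rules above can be sketched as a standalone helper. The function name combine_production is hypothetical, and the sketch assumes that a missing value is represented as None (for example, the result of calling .get(year) on a dictionary that has no entry for that year).

```python
def combine_production(farmed, wild_caught):
    """Return farmed + wild caught, following the missing-data rules."""
    # Keep only the values that are actually present.
    present = [value for value in (farmed, wild_caught) if value is not None]
    if not present:
        # Both values are missing: the total is None.
        return None
    # One or both values exist: add up whatever is there.
    return sum(present)


print(combine_production(321.0, 400.0))
print(combine_production(321.0, None))
print(combine_production(None, None))
```

Inside get_actual_production, the result would be stored under self.production[year], including when it is None.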

Problem 1d: get_production_need(self, year):

For this problem, write a function in the Country class called get_production_need(self, year) that calculates the amount of seafood production that is needed to feed the country’s population for the given year. The consumption values in the data are specified as kilograms / capita / year. Since the production values are given in metric tons (1000kg), we will need to convert the consumption values to metric tons as well.

We can therefore calculate the necessary production with:

\text{production need} = \frac{\text{population} \times \text{consumption}}{1000}

Calling this method should update the self.need dictionary to contain a pairing between the year and the seafood need for that year.

Tip

If either population or consumption is missing for the given year and country, then the need value should be None.
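As a quick check of the unit conversion, here is a minimal sketch of the formula as a standalone function. The name production_need and its parameter names are hypothetical, the population and consumption figures are made-up illustration values, and missing data is assumed to be represented as None.

```python
def production_need(population, consumption_kg_per_capita):
    """Return the seafood need in metric tons, or None if data is missing."""
    if population is None or consumption_kg_per_capita is None:
        return None
    # kilograms per person * people = kilograms; divide by 1000 for metric tons.
    return population * consumption_kg_per_capita / 1000


# 1,000,000 people eating 20 kg each per year need 20,000 metric tons.
print(production_need(1_000_000, 20.0))
print(production_need(None, 20.0))
```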

Problem 1e: calculate(self):

For this problem, you will write a function called calculate(self) that updates the actual production and the production needed for each year in the range of self.years. You will want to use the get_actual_production(self, year) and the get_production_need(self, year) methods for each of the years within the range.

This method should not return anything. After it is called the dictionaries self.production and self.need should be populated with the correct values for each year in self.years.

Problem 2: Fishing:

Problem 2a: __init__(self, filename):

For this problem, write the initialization method (__init__(self, filename)) in the Fishing class within the classes.py file. It should open filename, read its contents with csv.DictReader, and set up the attributes of the new Fishing instance.

The Fishing class should contain the following attributes:

  • A dictionary, self.countries, that contains information for all the countries in the file where the keys are the country codes and the values are instances of the Country class
  • The minimum year seen in the entire dataset as self.min_year
  • The maximum year seen in the entire dataset as self.max_year

General solution approach:

  1. Make an empty dictionary that stores all countries
  2. Open the file and use CSV DictReader to read the data
  3. Determine self.min_year and self.max_year, using the helper functions min_year and max_year from utils.py. Assuming you read the csv.DictReader object into a variable called reader, you can call min_year and max_year with reader.fieldnames as the parameter, as follows:
self.min_year = min_year(reader.fieldnames)
self.max_year = max_year(reader.fieldnames)
  4. Call the parse_data function, giving it a list of dictionaries (i.e., what’s returned from csv.DictReader). (You will write the parse_data function in the next problem, but the call to it has been given to you in the starter code.) Hint: it might help to convert the csv.DictReader to a list before passing it into parse_data.
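The reading steps above can be tried out on a tiny in-memory file. This is a sketch under stated assumptions: the header names mirror the example row in Problem 1b, the data values are illustrative, and year_bounds is a hypothetical stand-in written for this example; it is not the min_year or max_year helper from utils.py, whose implementation is not shown here.

```python
import csv
import io


def year_bounds(fieldnames):
    """Hypothetical stand-in: find the smallest and largest year column."""
    years = [int(name) for name in fieldnames if name.isdigit()]
    return min(years), max(years)


# A small in-memory CSV so the example runs without any files on disk.
sample = io.StringIO(
    "country,country code,measure,1960,1961\n"
    "Ankh-Morpork,AMP,farmed,321,333\n"
    "Ankh-Morpork,AMP,wild caught,400,410\n"
)

reader = csv.DictReader(sample)
first_year, last_year = year_bounds(reader.fieldnames)

# Converting to a list lets the rows be reused after the file is closed.
rows = list(reader)

print(first_year, last_year)
print(rows[0]["measure"], rows[0]["1960"])
```

Note that the values in each row are still strings at this point; converting them is the job of update_data.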

Problem 2b: parse_data(self, raw_data):

Given raw_data as a list of dictionaries read from the input files by csv.DictReader, parse_data should populate the Fishing class’s self.countries dictionary.

For each dictionary in the raw_data list, parse_data should use the "country code" field as the key into the self.countries dictionary. If no value exists yet for that country code, a new instance of the Country class should be created and stored; otherwise, the existing instance should be retrieved. Then, the country’s update_data method should be called with the current row of the data.

This function doesn’t return anything, but does have the effect of populating the dictionary of country codes (keys) pointing to instances of Country classes.
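The "create if missing, otherwise reuse" pattern described above can be sketched with a simplified stand-in class. Record and group_rows are hypothetical names used only for this illustration, and the rows are made-up sample data. Your parse_data should create real Country instances, which need the additional constructor arguments from Problem 1a.

```python
class Record:
    """Hypothetical, simplified stand-in for the Country class."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def update_data(self, row):
        self.rows.append(row)


def group_rows(raw_data):
    """Group rows by country code, creating one Record per code."""
    records = {}
    for row in raw_data:
        code = row["country code"]
        if code not in records:
            # First time this code is seen: create and store a new object.
            records[code] = Record(row["country"])
        # Either way, hand the row to the object for that code.
        records[code].update_data(row)
    return records


raw_data = [
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "farmed"},
    {"country": "Ankh-Morpork", "country code": "AMP", "measure": "wild caught"},
    {"country": "Lancre", "country code": "LAN", "measure": "farmed"},
]
records = group_rows(raw_data)
print(sorted(records))
print(len(records["AMP"].rows))
```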

Problem 2c: total_production_need(self, years_to_predict):

So, now that we’ve done all that work, how much seafood will the entire world need to produce 50 years from now?

For this problem, write a method in the Fishing class called total_production_need(self, years_to_predict) that returns a single number: how many metric tonnes will the world need to produce years_to_predict years from now?

This function should do the following:

  1. Take as input the number of years in the future to predict.
  2. For each country code in self.countries:
    • Calculate the needed production using the country’s calculate method from Problem 1. (This doesn’t return anything, just creates/modifies attributes within the country object.)
    • Predict the production need for years_to_predict years from now using country’s predict_need method
  3. Get the last value in the predicted values.
    • For example, if you assign the return value from predict_need to prediction, you should be able to get the last value with prediction["values"][-1].
  4. Sum up all of the predicted values.
  5. Return the total.
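The last three steps above (take the final predicted value for each country, then add them up) can be sketched on their own. The only thing assumed about a prediction is that it has a "values" entry holding a sequence, as in the prediction["values"][-1] example above. The function name total_of_last_values and the numbers are hypothetical.

```python
def total_of_last_values(predictions):
    """Sum the final entry of each prediction's "values" sequence."""
    total = 0.0
    for prediction in predictions:
        # [-1] picks the last element, i.e. the furthest-out predicted year.
        total += prediction["values"][-1]
    return total


# Made-up predictions for two countries.
fake_predictions = [
    {"values": [10.0, 12.0, 14.0]},
    {"values": [5.0, 5.5]},
]
print(total_of_last_values(fake_predictions))
```

In total_production_need, each prediction would instead come from calling predict_need on a country after its calculate method has run.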

Now, let’s run the program! Navigate to fishing.py and, in the main function, create an instance of the Fishing class with small.csv as the file. Save the instance in a variable called data, then call total_production_need for 50 years. total_production_need should return a total production need of 13243690.762868665. (Your degree of precision may vary.)

Problem 3: Creating Visual Data Representations

Problem 3a: plot_production_vs_consumption(self):

Info

This function is called plot_production_vs_consumption when it should more accurately be called plot_production_vs_need. We will keep the old name for this quarter to avoid issues with the autograder.

Write a function called plot_production_vs_consumption(self). You will use the attributes set up in __init__(self, name, start_year, end_year) to create a plot of the country’s production vs. its need (not consumption!) over time.

A general approach to this problem would follow these steps:

Produce two lists, one for the production and one for the need.

Plot the two lists of data on a line graph. You will need two calls to plt.plot() to do this, and each call will use the years list for the x parameter. Make sure to add a label for each line by adding label= to each call of plt.plot(). Because there may be missing data in the “y” values, you should also add the marker='s' parameter to the plt.plot() calls.
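One way to produce the two lists is to walk through the years in order and look up each year in the corresponding dictionary, so that every list lines up with the years list. The sketch below uses plain variables and made-up numbers in place of self.years, self.production, and self.need.

```python
# Made-up stand-ins for self.years, self.production, and self.need.
years = [1960, 1961, 1962]
production = {1960: 721.0, 1961: None, 1962: 750.0}
need = {1960: 650.0, 1961: 700.0, 1962: None}

# Build each list in the same order as years so x and y values stay aligned.
production_values = [production.get(year) for year in years]
need_values = [need.get(year) for year in years]

print(production_values)
print(need_values)
```

Each list has exactly one entry per year, with None where data is missing, and can then be passed as the y values to its own plt.plot() call.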

Additionally, the plot should have the following attributes:

  • xlabel is set to “Year”
  • ylabel is set to “Metric Tonnes”
  • title is set to “Production vs. Need for {country_code}”
  • The legend is added, using plt.legend()
  • Nothing else should be added to this plot; following these instructions should result in a plot nearly identical to the one shown below. (Minor differences between operating systems may occur.)

Finally, you should use plt.savefig("us-prod-vs-need.png") to save the plot as a PNG image, and plt.clf() to clear the plot after you are done.

USA Production vs. Need

To produce this plot, uncomment the following lines of code in the main() function of fishing.py. They should look like this (assuming that data is an instance of the Fishing class):

usa = data.countries["USA"]
usa.plot_production_vs_consumption()

Problem 3b: Pause and Interpret the Graph

Pause for a few minutes to look at the plot created in Problem 3a and think about the following questions:

  1. When did the US’s need surpass its production?
  2. What’s missing from the data we gave you that would help explain why the US was still able to consume more seafood than it produced?

Write your answers to these questions in answers.txt.

Problem 3c: Plotting the Prediction Line

Now, we’ll use the given predict_need function to plot the best-fit line for the US’s production need and a prediction out to 50 years from now.

For this problem, uncomment the following line of code in the main() function of fishing.py.

plot_linear_prediction(usa)

Running your program should then result in no errors and a new file named USA_need_prediction.png being created in the same folder as your program.

The plot_linear_prediction function is given to you in the utils.py starter file. It takes a Country object as its parameter, as in the call plot_linear_prediction(usa) above. It then calls the predict_need function with the production need data for that country and plots the best-fit line and prediction. You do not need to write this function; it’s already been written for you. But it does rely on correct implementations of the previous problems.

Predicted Need for USA

Problem 4: Running the program with the large file:

At this point, it’s time to run with the larger data file. Change data to use "large.csv" as its input. Then, at the very end of your program, add a print statement to print out the total you return from total_production_need. Your program should print the following:

Metric tonnes of seafood needed to be produced in 50 years: 245,637,243.224

Info

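You can format very large numbers using Python’s f-string syntax. For example, if total_need is 245637243.223947, doing print(f'{total_need:,.3f}') would print 245,637,243.224. The comma in the format specifier adds the thousands separators, and .3f rounds to three decimal places.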
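The snippet below is a small illustration of f-string number formatting, using the example value from this section. It compares the output with and without the comma in the format specifier; the variable names are only for this example.

```python
total_need = 245637243.223947

# ".3f" rounds to three decimal places but adds no separators.
plain = f"{total_need:.3f}"

# ",.3f" also inserts a comma between each group of three digits.
with_commas = f"{total_need:,.3f}"

print(plain)
print(with_commas)
```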

Warning

The final number is a gross approximation and should not be used as a realistic estimate of the world’s total seafood production needs.

Submit your work

Submit fishing.py, classes.py, and answers.txt to Gradescope.

HW5 - Homework 4 - Part 2

Initial Submission by Wednesday 05/28 at 11:59 pm.

Submit on Gradescope