CSE 163, Winter 2020: Homework 2: Processing CSV Data

Submission

This assignment and its reflection are due by Thursday, January 23 at 11:59 pm.

You should submit your finished hw2_manual.py, hw2_pandas.py, and hw2_test.py on Ed and the reflection on Google Forms.

You may submit your assignment as many times as you want before the late cutoff (remember that submitting after the due date will cost late days). Recall that on Ed, you submit by pressing the "Mark" button. You are welcome to develop the assignment on Ed or develop locally and then upload to Ed before marking.

Overview

In this assignment, you will use the foundational Python skills you've been developing and apply them to analyze a small dataset. Many datasets you’ll be working with are structured as CSV files or other tabular representations; this assignment is an introduction to reading, processing, and grouping rows and columns to calculate some interesting statistics. A strong foundation in these skills will be very useful when we work with much larger (and less complete) real-world datasets!

This assignment is broken into two main parts, where each part mostly does the same computations in different ways. This is to give you the opportunity to compare and contrast different approaches to solving problems.

Learning Objectives

After this homework, students will be able to:

  • Follow a Python development workflow for this course, including:
    • Writing a Python script from scratch and turning in the assignment.
    • Using the course infrastructure (flake8, test suites, course resources).
  • Use Python to review CS1 programming concepts and implement programs that follow a specification, including:
    • Use/manipulation of various data types including numbers and strings.
    • Control structures (conditional statements, loops, parameters, returns) to implement basic functions provided by a specification.
    • Basic text file processing.
    • Documenting code.
  • Write unit tests to test each function written including their edge cases.
  • Work with data structures (lists, sets, dictionaries) in Python
  • Process structured data in Python with CSV files as input with and without a library (Pandas)
    • Handle edge cases appropriately, including addressing missing values/data
    • Practice user-friendly error-handling
  • Apply programming to identify and investigate a question on a dataset using basic statistical concepts (e.g. mean, max)

Expectations

Here are some baseline expectations for you to meet:

Files

If you are developing on Ed, all the files are there. If you are developing locally, you should download the starter code hw2.zip and open it as the project in Visual Studio Code. The files included are:

  • hw2_manual.py: The file for you to put solutions to Part 0.
  • hw2_pandas.py: The file for you to put solutions to Part 1.
  • hw2_test.py: The file for you to put your tests for Part 0 and Part 1.
  • cse163_utils.py: A file where we will store utility functions for helping you write tests.
  • run_hw2.py: A client program provided to call your functions. This is just for your convenience.
  • pokemon_box.csv: A CSV file that stores information about Pokemon. The columns of this file are explained below.
  • pokemon_test.csv: A very small CSV file that stores information about Pokemon. The columns of this file are explained below.

Data

For this assignment, you will be working with a dataset of Pokemon that you have caught on your Pokemon journey so far. The file pokemon_box.csv stores all the data about the captured Pokemon and has a format that looks like:

id   name       level  personality  type   weakness  atk  def  hp   stage
1    Bulbasaur  12     Jolly        Grass  Fire      45   50   112  1
...  ...        ...    ...          ...    ...       ...  ...  ...  ...

Note that because this is a CSV file, the file contents have these cells separated by commas.
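
To make the comma-separated layout concrete, here is a minimal sketch that parses the header and the one example row from the table above with Python's built-in csv module. It assumes the file's columns appear in the same order as the table. The in-memory string stands in for the real file, and the variable names are illustrative, not part of the starter code. The parts of this assignment may ask you to read the data a different way.

```python
import csv
import io

# The header and the single example row from the table above, written the
# way they would appear inside a CSV file (cells separated by commas).
raw_text = (
    "id,name,level,personality,type,weakness,atk,def,hp,stage\n"
    "1,Bulbasaur,12,Jolly,Grass,Fire,45,50,112,1\n"
)

# csv.DictReader uses the first line as column names and yields one
# dictionary per remaining line. Every value is read as a string.
rows = list(csv.DictReader(io.StringIO(raw_text)))

first = rows[0]
print(first["name"])        # Bulbasaur
print(int(first["level"]))  # numeric columns need an explicit conversion
```

Note that values such as level arrive as strings, so any arithmetic on them needs a conversion first.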

Column Descriptions

  • id: Unique identification number corresponding to the species of a Pokemon. Note that if there are multiple Pokemon of the same species in the dataset, they all share the same id.
  • name: Name of the species of Pokemon. For example Pikachu.
  • level: The level of this Pokemon (an integer)
  • personality: A one-word string describing the personality of this Pokemon
  • type: A one-word string describing the type of the Pokemon (e.g. "Grass" for Bulbasaur)
  • weakness: What type this Pokemon is weak to. For example, Bulbasaur is considered weak to the fire type.
  • atk, def, hp: Pokemon stats that indicate how many hits a Pokemon can take (hp), how strong its attacks are (atk), and how much hits affect it (def)
  • stage: Indicates whether this Pokemon has evolved into a new species. For example, Charmander (stage 1) evolves into Charmeleon (stage 2), which evolves into Charizard (stage 3).
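
As a rough illustration of the kind of statistic these columns support, the sketch below finds the highest level for each type using plain dictionaries. Only the Bulbasaur row comes from the example table. The other rows are made-up sample values, and the function name max_level_per_type is hypothetical rather than one of the functions this assignment asks for.

```python
def max_level_per_type(pokemon):
    """
    Takes a list of dictionaries (one per Pokemon, each with 'type' and
    'level' keys) and returns a dictionary mapping each type to the
    highest level seen for that type. Returns an empty dictionary if the
    given list is empty.
    """
    result = {}
    for row in pokemon:
        poke_type = row['type']
        level = row['level']
        if poke_type not in result or level > result[poke_type]:
            result[poke_type] = level
    return result


# Only the first row comes from the example table; the rest are invented.
sample = [
    {'name': 'Bulbasaur', 'type': 'Grass', 'level': 12},
    {'name': 'Oddish', 'type': 'Grass', 'level': 20},
    {'name': 'Charmander', 'type': 'Fire', 'level': 7},
]
print(max_level_per_type(sample))  # {'Grass': 20, 'Fire': 7}
```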

Evaluation

Your submission will be evaluated on the following dimensions:

  • Your solution correctly implements the described behaviors. You will have access to some tests when you turn in your assignment, but we will withhold other tests to test your solution when grading. All behavior we test is completely described by the problem specification or shown in an example.
  • Your code meets our style requirements:
    • All files submitted pass flake8
    • Your program should be written with good programming style. This means you should use the proper naming convention for methods (snake_case), and your code should not be overly redundant or perform unnecessary computations.
    • Every function written is commented using a doc-string format that describes its behavior, parameters, returns, and highlights any special cases.
    • There is a comment at the top of each file you write with your name, section, and a brief description of what that program does.
    • Any expectations on this page or the sub-pages for the assignment are met, as are all requirements for each of the problems.
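
The style points above might look like this in practice. This is only a sketch: the name, section, and the function average_hp are placeholders made up for illustration, and the exact doc-string wording the course staff prefer may differ.

```python
"""
Your Name
Section XX
A short example file showing a header comment, snake_case naming,
and a doc-string that describes behavior, parameters, and returns.
"""


def average_hp(hp_values):
    """
    Returns the average of the numbers in the list hp_values.
    Special case: returns None if hp_values is empty, since the
    average of no values is undefined.
    """
    if len(hp_values) == 0:
        return None
    return sum(hp_values) / len(hp_values)


print(average_hp([112, 90, 98]))  # 100.0
print(average_hp([]))             # None
```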

A note on advanced material

A lot of students have been asking questions like "Can I use this method or can I use this language feature in this class?". The general answer is that it depends on what you want to use, what the problem is asking you to do, and whether the problem places any restrictions on your solution.

There is no automatic deduction for using some advanced feature or using material that we have not covered in class yet, but if it violates the restrictions of the assignment, it is possible you will lose points. It's not possible for us to list out every possible thing you can't use on the assignment, but we can say for sure that you are safe to use anything we have covered in class so far as long as it meets what the specification asks and you are appropriately using it as we showed in class.

For example, some things that are probably okay to use even though we didn't cover them:

  • Using the update method on the set class even though I didn't show it in lecture. We talked about sets and made it clear that you are allowed to use them on future assignments, so if you find a method on them that does what you need, it's probably fine as long as it doesn't violate an explicit restriction on that assignment.
  • Using something like a ternary operator in Python. This doesn't make a problem any easier, it's just syntax.
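
For reference, the two features mentioned above look like this. Both set.update and the conditional expression are standard Python. The variable names and values are invented for the example.

```python
# set.update adds every element of another iterable to an existing set.
seen_types = {'Grass', 'Fire'}
seen_types.update(['Water', 'Fire'])  # 'Fire' is already present
print(sorted(seen_types))  # ['Fire', 'Grass', 'Water']

# A conditional ("ternary") expression picks one of two values in one line.
level = 12
label = 'high' if level >= 10 else 'low'
print(label)  # high

# The same logic written as an ordinary if/else statement.
if level >= 10:
    label_long = 'high'
else:
    label_long = 'low'
```

The ternary form and the if/else statement produce the same value, which is why it counts as syntax rather than a shortcut around a problem.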

For example, some things that are probably not okay to use:

  • Importing some random library that can solve the problem we ask you to solve in one line.
  • If the problem says "don't use a loop" to solve it, it would not be appropriate to use some advanced programming concept like recursion to "get around" that restriction.

These are not allowed because they might make the problem trivially easy or violate the learning objective of the problem.

You should think about what the spec is asking you to do, and as long as you are meeting those requirements, we will award credit. If you are concerned that an advanced feature you want to use falls in that second category above and might cost you points, then you should just not use it! These problems are designed to be solvable with the material we have learned so far, so it is not necessary to go look up a bunch of advanced material to solve them.

tl;dr: We will not be answering every question of "Can I use X" or "Will I lose points if I use Y" because the general answer is "You are not forbidden from using anything as long as it meets the spec requirements. If you're unsure if it violates a spec restriction, don't use it and just stick to what we learned before the assignment was released."