CSE 163, Winter 2020: Exams

Exam 1

After Exam

Exam 1 Solution

Regrade Policy

Remember that this exam is worth only 20% of your total grade. If Exam 1 did not go as well as you wanted, you still have time to improve for Exam 2. It also means that much more of your grade is determined by work outside of exams, so there are many other things you can focus on.

Statistics:

  • Mean: 71/90 (78.9%)
  • Median: 72.5/90 (80.6%)
  • Std Dev: 10.9

Overview

Exam 1 will take place on Friday, February 14. It will take place in PCAR 192 from 3:30 pm - 4:20 pm.

You are allowed to bring one 8.5x11 INCHES 🙂 piece of paper for notes. You may use both front and back sides and it may be handwritten or typed.

There will also be a reference sheet provided with every exam with a list of methods for the classes and libraries we have learned so far. This list is shown at the end of this page.

Topics covered

You are responsible for understanding the following topics:

  • Python Programming:
    • Write a Python function that uses basic constructs we discussed in class like variables, arithmetic, loops, conditionals.
    • Manipulate data such as numbers, strings, or data structures like lists, dictionaries, or sets.
    • Open and read files in order to manipulate the data in the file.
    • Indexing and slicing into structures to get desired data.
    • Write a basic list comprehension (like we discussed in lecture).
  • Pandas:
    • Selecting data from a Series or DataFrame using the various methods we've discussed in class or used on the assignments.
    • Functions to compute values from Series or DataFrame (e.g. max, count, mean) as well as how to aggregate data using groupby.
    • Filtering and modifying data in pandas objects.
  • Data Visualization:
    • Describe the effectiveness or limitations of a visualization's ability to communicate its ideas.
    • Be able to read and write code that makes basic plots using seaborn.
  • Machine Learning:
    • Understand basic terminology, including but not limited to:
      • Features and Labels
      • Models: Regression vs classification
      • Training vs Testing
      • Overfitting
      • Decision Tree
    • Use sklearn to train a machine learning model using best practices.
    • Use a decision tree to predict labels for data and to comment on how the tree indicates the importance of features.
  • Classes and Objects:
    • Write a class with fields and functions from scratch.
    • Write code that uses your object as a client.
  • Efficiency:
    • Look at a function and describe its efficiency using Big-O notation.
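
As a rough illustration of the "Classes and Objects", list comprehension, and "Efficiency" topics above, here is a minimal sketch. The Dog class, its field names, and the names_older_than function are hypothetical and are not taken from course materials.

```python
class Dog:
    """A hypothetical class with two fields, used only for illustration."""

    def __init__(self, name, age):
        # Fields are set in the initializer.
        self._name = name
        self._age = age

    def get_name(self):
        return self._name

    def get_age(self):
        return self._age

    def __repr__(self):
        return 'Dog(' + repr(self._name) + ', ' + str(self._age) + ')'

    def __eq__(self, other):
        # Two dogs are equal when both fields match.
        return (isinstance(other, Dog)
                and self._name == other._name
                and self._age == other._age)


def names_older_than(dogs, min_age):
    """Client code that uses Dog objects.

    The list comprehension visits each dog once, so this is O(n)
    for a list of n dogs.
    """
    return [d.get_name() for d in dogs if d.get_age() > min_age]


dogs = [Dog('Rex', 3), Dog('Fido', 7), Dog('Spot', 5)]
old_names = names_older_than(dogs, 4)
```

With these three dogs, old_names holds the names of the dogs older than 4, in their original order.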

The following topics will NOT be covered on Exam 1:

  • Advanced time series operations like resample.
  • The exact list of visualization encodings and their relative effectiveness. We only care that you know that there are different encodings with different effectiveness and can explain how that might affect a visualization's readability.
  • Using matplotlib to customize visualizations.
  • Specifics of the algorithm to learn a decision tree.

Practice material

Ordered from most-recent to least-recent.

  • CSE 163 - 19sp (problems | solutions)
  • Practice Exam (problems | solutions)
  • The homeworks and section problems are great things to reference for the types of questions we will ask. You may also want to look at lecture examples, but they sometimes involve material more advanced than what you need for the exam.

Reference Sheet

As described above, a reference sheet will be provided on the exam that has a list of functions and their parameters that we have learned so far. This reference sheet will only contain method names and parameters and will not have descriptions of the methods or examples; if you would like descriptions for the methods, you may use space on your cheat sheet to write them down.

  • Built-in Python functions
    • print(*strings)
    • range(end)
    • range(start, end[, step])
    • abs(v)
    • min(v1, v2)
    • max(v1, v2)
    • sum(vals)
    • open(fname)
    • Types: int(v), float(v), str(v), bool(v)
  • String methods
    • upper()
    • lower()
    • find(s)
    • strip()
    • split()
  • List methods
    • Construct: list() or []
    • append(val)
    • extend(lst)
    • insert(idx, val)
    • remove(val)
    • pop(idx)
    • index(val)
    • reverse()
    • sort(key=None)
  • Set methods
    • Construct: set()
    • add(val)
    • remove(val)
  • Dictionary methods
    • Construct: dict() or {}
    • keys()
    • values()
    • items()
  • File methods
    • readlines()
    • read()
  • Special methods to implement in a class
    • __init__
    • __repr__
    • __eq__
  • Pandas methods
    • Parse: pd.read_csv
    • mean()
    • min() / max()
    • idxmin() / idxmax()
    • count()
    • unique()
    • groupby(col)
    • apply(fun)
    • isnull()
    • notnull()
    • dropna()
    • fillna(val)
    • sort_values(col)
    • sort_index()
    • nlargest(n, col)
  • Pandas fields
    • index
    • loc[row, col]
  • Seaborn methods
    • sns.catplot(x, y, data, kind[, hue])
      • kind: ["count", "bar", "violin"]
    • sns.relplot(x, y, data, kind[, hue[, size]])
      • kind: ["scatter", "line"]
    • sns.regplot(x, y, data)
  • sklearn methods
    • sklearn.metrics.accuracy_score(y_true, y_pred)
    • sklearn.metrics.mean_squared_error(y_true, y_pred)
    • sklearn.model_selection.train_test_split(X, y, test_size)
  • sklearn model classes
    • sklearn.tree.DecisionTreeClassifier()
    • sklearn.tree.DecisionTreeRegressor()
  • sklearn model methods
    • fit(X, y)
    • predict(X)
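
As a sketch of how several entries on this sheet combine, the example below counts words using strip(), lower(), split(), a dictionary, items(), and sort(key=...). The function names and the sample lines are hypothetical; the list of strings stands in for what readlines() would return on an open file.

```python
def word_counts(lines):
    """Return a dictionary mapping each lowercased word to its count."""
    counts = {}
    for line in lines:
        # strip() drops the trailing newline; split() breaks on whitespace.
        for word in line.strip().lower().split():
            if word not in counts:
                counts[word] = 0
            counts[word] += 1
    return counts


def top_words(counts, n):
    """Return the n most frequent words, most frequent first."""
    pairs = list(counts.items())
    # Sort by count, largest first, by negating the count in the key.
    pairs.sort(key=lambda pair: -pair[1])
    return [word for word, count in pairs[:n]]


# Stand-in for the result of readlines() on a small text file.
lines = ['The cat sat\n', 'the dog sat\n', 'The end\n']
counts = word_counts(lines)
most_common = top_words(counts, 2)
```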

Exam 2

After Exam

Exam 2 Solution

Regrade Policy

Statistics:

  • Mean: 69.7/79 (88.2%)
  • Median: 73.0/79 (92.4%)
  • Std Dev: 9.45 (11.97%)

Online Exam

Remember, we are moving the exam online. Please make sure you are up to date with the announcements on Ed and have read the exam instructions.

Overview

Exam 2 will take place on Monday, March 9. It will take place in PCAR 192 from 3:30 pm - 4:20 pm.

You are allowed to bring ONE 8.5x11 INCHES 🙂 piece of paper for notes. You may use both front and back sides and it may be handwritten or typed.

There will also be a reference sheet provided with every exam with a list of methods for the classes and libraries we have learned so far. This list is shown at the end of this page.

Topics covered

You are responsible for understanding the following topics for Exam 2. You will still be expected to know the essential things from Exam 1 (e.g. how to write a method, how to write a class, how to tell the efficiency of a method), but the emphasis will be on material we covered that was not on Exam 1.

  • Hashing:
    • Hash functions and how they are used in a set to make access \(\mathcal{O}(1)\)
    • Understand what properties are desirable in a hash function
  • Geo-spatial data (geopandas, matplotlib):
    • Selecting data from a GeoDataFrame using the various methods we have discussed in class or used on the assignments
    • Functions to compute values from GeoSeries or GeoDataFrame (e.g. max, count, mean) as well as how to aggregate data using groupby and dissolve
    • Mechanics of how a join works to merge datasets and how to write code to make a join happen
    • Filtering and modifying data in geopandas objects
    • How to make geo-spatial plots using geopandas, including making multiple plots on the same figure
  • Ethics:
    • Understand the case studies we discussed in lecture and what the ethical concerns are for each one
    • Principles of how data scientists can avoid doing something unethical with their analyses for these case studies
  • Image data (numpy):
    • Read and write code that uses numpy arrays
    • Simulate a convolution using a given kernel
    • Broadcasting values with numpy arithmetic
    • Basic idea of a neural network and the techniques used to apply machine learning to image data
    • Ideas in machine learning like supervised vs unsupervised learning or hyper-parameter tuning
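
As an illustration of the hashing topic in the list above, here is a minimal sketch of a class that can be stored in a set. The Point class is hypothetical and not taken from course materials. The property it demonstrates is that objects that compare equal must produce the same hash value, which is what lets a set find an element in O(1) time on average.

```python
class Point:
    """A hypothetical 2D point that can be stored in a set."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        return (isinstance(other, Point)
                and self._x == other._x
                and self._y == other._y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal points
        # always land in the same place in the set.
        return hash((self._x, self._y))


points = set()
points.add(Point(1, 2))
points.add(Point(1, 2))  # equal to the first point, so not added again
points.add(Point(3, 4))
```

Membership checks such as Point(1, 2) in points use __hash__ to decide where to look and __eq__ to confirm the match.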

The following topics will NOT be covered on Exam 2:

  • The Count-Min Sketch data structure
  • Understanding the matrix multiply (np.dot)
  • Reading/Writing images/geo-spatial data from files (geopandas.read_file, imageio)
  • Details of convolutional neural networks
  • What's a gibbon
  • Code involving the scikit-image library

Practice material

  • CSE 163 - 19sp (problems | solutions)
  • Practice Exam (problems | solutions)
  • The homeworks and section problems are great things to reference for the types of questions we will ask. You may also want to look at lecture examples, but they sometimes involve material more advanced than what you need for the exam.

Reference Sheet

As described above, a reference sheet will be provided on the exam that has a list of functions and their parameters that we have learned so far. This reference sheet will only contain method names and parameters and will not have descriptions of the methods or examples; if you would like descriptions for the methods, you may use space on your cheat sheet to write them down.

The reference sheet for Exam 2 builds on the one for Exam 1, with new functions added. We removed things that you are guaranteed not to need to write for Exam 2.

We will adopt a new convention that marks optional parameters with a question mark after their name.
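
To make the convention concrete, here is a small sketch of how entries such as range(start, end, step?) and sort(key?) can be read: the marked parameter may be passed or left out. The variable names are hypothetical. Note that key is passed by keyword in Python.

```python
# range(start, end, step?): step may be omitted or supplied.
without_step = list(range(2, 10))
with_step = list(range(2, 10, 3))

# sort(key?): key may be omitted or supplied as a keyword argument.
words = ['pear', 'fig', 'banana']
words.sort()              # no key: alphabetical order
by_length = list(words)
by_length.sort(key=len)   # with key: shortest word first
```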

  • Built-in Python functions
    • print(*strings)
    • range(end)
    • range(start, end, step?)
    • abs(v)
    • min(v1, v2)
    • max(v1, v2)
    • sum(vals)
    • open(fname)
    • zip(l1, l2)
    • Types: int(v), float(v), str(v), bool(v)
  • String methods
    • upper()
    • lower()
    • find(s)
    • strip()
    • split()
  • List methods
    • Construct: list() or []
    • append(val)
    • extend(lst)
    • insert(idx, val)
    • remove(val)
    • pop(idx)
    • index(val)
    • reverse()
    • sort(key?)
  • Set methods
    • Construct: set()
    • add(val)
    • remove(val)
  • Dictionary methods
    • Construct: dict() or {}
    • keys()
    • values()
    • items()
  • Special methods to implement in a class
    • __init__
    • __repr__
    • __eq__
    • __hash__
  • Pandas object methods:
    • mean()
    • min() / max()
    • idxmin() / idxmax()
    • count()
    • unique()
    • groupby(col)
    • apply(fun)
    • isnull()
    • notnull()
    • dropna()
    • fillna(val)
    • sort_values(col)
    • sort_index()
    • nlargest(n, col)
    • merge(other, left_on, right_on, how)
  • Pandas object fields
    • index
    • loc[row, col]
  • Geopandas object methods
    • plot(column?, legend?, ax?, color?, vmin?, vmax?)
    • dissolve(by, aggfunc)
    • Any of the pandas functions above
  • Geopandas module methods
    • geopandas.sjoin(left, right, op, how)
  • matplotlib
    • plt.subplots(nrows, ncols)
    • plt.show()
    • plt.savefig(f_name)
  • numpy module methods
    • np.array(vals?)
    • np.arange(end)
    • np.arange(start, end, step?)
    • np.ones(shape), np.zeros(shape)
    • np.dot(a1, a2)
    • np.sum(a), np.min(a), np.max(a), np.mean(a)
  • numpy array methods
    • reshape(shape)
    • sum(), min(), max(), mean()
    • copy()
  • numpy array fields
    • shape
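
As a sketch of how the numpy entries above fit together, the example below simulates a convolution with a given kernel and shows two cases of broadcasting. The convolve function and the sample arrays are hypothetical. The sketch assumes no padding, a stride of 1, and no flipping of the kernel, which may differ from the exact definition used in lecture.

```python
import numpy as np


def convolve(image, kernel):
    """Slide kernel over image and sum the elementwise products.

    Assumes 2D arrays, no padding, and a stride of 1.
    """
    img_h, img_w = image.shape
    k_h, k_w = kernel.shape
    out_h = img_h - k_h + 1
    out_w = img_w - k_w + 1
    result = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the image currently under the kernel.
            window = image[i:i + k_h, j:j + k_w]
            result[i, j] = np.sum(window * kernel)
    return result


image = np.arange(16).reshape((4, 4))
kernel = np.ones((2, 2))

# Broadcasting a scalar: every element of the result is divided by 4,
# which turns the 2x2 sums into 2x2 averages.
blurred = convolve(image, kernel) / 4

# Broadcasting a row: the 1D array is added to each row of the image.
shifted = image + np.array([10, 20, 30, 40])
```

A 4x4 image with a 2x2 kernel gives a 3x3 result, since the kernel fits in three positions along each axis.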