CSE 163, Spring 2019: Exams

Exam 2

After Exam

Exam 2 Solution

Statistics:

• Mean: 73/90 (81%)
• Median: 75/90 (83%)
• Std Dev: 11.5

Overview

Exam 2 will take place on Monday, June 3. It will take place in CSE2 G10 from 2:30 pm - 3:20 pm.

You are allowed to bring ONE 8.5x11 inch 🙂 piece of paper for notes. You may use both front and back sides and it may be handwritten or typed.

There will also be a reference sheet provided with every exam with a list of methods for the classes and libraries we have learned so far. This list is shown at the end of this page.

Topics covered

You are responsible for understanding the following topics for Exam 2. You will still be expected to know the essential things from Exam 1 (e.g. how to write a method, how to write a class, how to tell the efficiency of a method), but the emphasis will be on material we covered that was not on Exam 1.

• Hashing:
• Hash functions and how they are used in a set to make access $$\mathcal{O}(1)$$
• Understand what properties are desirable in a hash function
• Geo-spatial data (geopandas, matplotlib):
• Selecting data from a GeoDataFrame using the various methods we have discussed in class or used on the assignments
• Functions to compute values from GeoSeries or GeoDataFrame (e.g. max, count, mean) as well as how to aggregate data using groupby and dissolve
• Mechanics of how a join works to merge datasets and how to write code to make a join happen
• Filtering and modifying data in geopandas objects
• How to make geo-spatial plots using geopandas, including making multiple plots on the same figure
• Ethics:
• Understand the case studies we discussed in lecture and what the ethical concerns are for each one
• Principles of how data scientists can avoid doing something unethical with their analyses for these case studies
• Image data (numpy):
• Read and write code that uses numpy arrays
• Simulate a convolution using a given kernel
• Broadcasting values with numpy arithmetic
• Basic idea of a neural network and techniques used to apply machine learning to image data
• Ideas in machine learning like supervised vs unsupervised learning or hyper-parameter tuning
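To see how `__hash__` and `__eq__` make $$\mathcal{O}(1)$$ set access possible, here is a minimal sketch of a hashable class; the class name and fields are illustrative, not taken from the course:

```python
class Point:
    """A 2D point that can be stored in a set or used as a dict key."""

    def __init__(self, x, y):
        self._x = x
        self._y = y

    def __eq__(self, other):
        # Equal objects must hash equal, so __eq__ and __hash__
        # should be based on the same fields.
        return isinstance(other, Point) and \
            self._x == other._x and self._y == other._y

    def __hash__(self):
        # Tuples hash their contents, so this combines both fields.
        return hash((self._x, self._y))


points = {Point(0, 0), Point(1, 2), Point(0, 0)}  # duplicate collapses
print(len(points))            # 2
print(Point(1, 2) in points)  # True: O(1) average membership test
```

A desirable hash function spreads values evenly across buckets so few elements collide; a bad one (say, returning a constant) still works but degrades set access to a linear scan.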
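For the numpy topics, broadcasting applies arithmetic between a scalar (or smaller array) and a larger array, and a convolution can be simulated with loops over the image. A small sketch with made-up values (a 2x2 averaging kernel, no padding):

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)

# Broadcasting: the scalar is "stretched" to match the array's shape.
brighter = image + 10  # adds 10 to every pixel

# Simulate a convolution: with no padding, the output shrinks by
# kernel_size - 1 in each dimension.
kernel = np.ones((2, 2)) / 4  # 2x2 averaging kernel
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
result = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 2, j:j + 2]       # slice out the window
        result[i, j] = np.sum(patch * kernel)  # elementwise multiply, sum

print(brighter[0, 0])  # 11.0
print(result)          # [[3. 4.] [6. 7.]]
```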

The following topics will NOT be covered on Exam 2:

• The Count-Min Sketch data structure
• Understanding the matrix multiply (np.dot)
• Reading/Writing images/geo-spatial data from files (geopandas.read_file, imageio)
• Details of convolutional neural networks
• What's a gibbon
• Code involving the scikit-image library

Practice material

• Practice Exam (problems | solutions)
• Section Exam (problems | solutions)
• The homeworks and section problems are great things to reference for the types of questions we will ask. You may also want to look at lecture examples, but they sometimes involve material more advanced than what you need for the exam.

Reference Sheet

As described above, a reference sheet will be provided on the exam with a list of the functions and their parameters that we have learned so far. The reference sheet will contain only method names and parameters, without descriptions or examples; if you would like descriptions of the methods, you may use space on your cheat sheet to write them down.

The reference sheet for Exam 2 will build off of the one for Exam 1, with new functions added. We removed things that you are guaranteed not to need to write for Exam 2.

We will adopt a new convention that marks optional parameters with a question mark after their name.

• Built-in Python functions
• print(*strings)
• range(end)
• range(start, end, step?)
• abs(v)
• min(v1, v2)
• max(v1, v2)
• sum(v1, v2)
• open(fname)
• zip(l1, l2)
• Types: int(v), float(v), str(v), bool(v)
• String methods
• upper()
• lower()
• find(s)
• strip()
• split()
• List methods
• Construct: list() or []
• append(val)
• extend(lst)
• insert(idx, val)
• remove(val)
• pop(idx)
• index(val)
• reverse()
• sort(key?)
• Set methods
• Construct: set()
• add(val)
• remove(val)
• Dictionary methods
• Construct: dict() or {}
• keys()
• values()
• items()
• Special methods to implement in a class
• __init__
• __repr__
• __eq__
• __hash__
• Pandas object methods:
• mean()
• min() / max()
• idxmin() / idxmax()
• count()
• unique()
• groupby(col)
• apply(fun)
• isnull()
• notnull()
• dropna()
• fillna(val)
• sort_values(col)
• sort_index()
• nlargest(n, col)
• merge(other, left_on, right_on, how)
• Pandas object fields
• index
• loc[row, col]
• Geopandas object methods
• plot(column?, legend?, ax?, color?, vmin?, vmax?)
• dissolve(by, aggfunc)
• Any of the pandas functions above
• Geopandas module methods
• geopandas.sjoin(left, right, op, how)
• matplotlib
• plt.subplots(nrows, ncols)
• plt.show()
• plt.savefig(f_name)
• numpy module methods
• np.array(vals?)
• np.arange(end)
• np.arange(start, end, step?)
• np.ones(shape), np.zeros(shape)
• np.dot(a1, a2)
• np.sum(a), np.min(a), np.max(a), np.mean(a)
• numpy array methods
• reshape(shape)
• sum(), min(), max(), mean()
• copy()
• numpy array fields
• shape

Exam 1

After Exam

Exam 1 Solution

Remember that exams are only worth 25% of your total grade, with Exam 1 being about half of that. If Exam 1 did not go as well as you wanted it to, you still have time to improve for Exam 2. This also means much more of your grade is determined by things outside of exams, so there are a lot of other things you can focus on as well.

Statistics:

• Mean: 73/90 (81%)
• Median: 75/90 (83%)
• Std Dev: 10.7

Overview

Exam 1 will take place on Friday, May 10. It will take place in CSE2 G10 from 2:30 pm - 3:20 pm.

You are allowed to bring one 8.5x11 inch 🙂 piece of paper for notes. You may use both front and back sides and it may be handwritten or typed.

There will also be a reference sheet provided with every exam with a list of methods for the classes and libraries we have learned so far. This list is shown at the end of this page.

Topics covered

You are responsible for understanding the following topics:

• Python Programming:
• Write a Python function that uses basic constructs we discussed in class like variables, arithmetic, loops, conditionals.
• Manipulate data such as numbers, strings, or data structures like lists, dictionaries, or sets.
• Open and read files in order to manipulate the data in the file.
• Indexing and slicing into structures to get desired data.
• Write a basic list comprehension (like we discussed in lecture).
• Pandas:
• Selecting data from a Series or DataFrame using the various methods we've discussed in class or used on the assignments.
• Functions to compute values from Series or DataFrame (e.g. max, count, mean) as well as how to aggregate data using groupby.
• Filtering and modifying data in pandas objects.
• Basics of time series (i.e. using a datetime index and selecting data).
• Data Visualization:
• Describe the effectiveness or limitations of a visualization's ability to communicate its ideas.
• Be able to read and write code that makes basic plots using seaborn.
• Machine Learning:
• Understand basic terminology, including but not limited to:
• Features and Labels
• Models: Regression vs classification
• Training vs Testing
• Overfitting
• Decision Tree
• Use sklearn to train a machine learning model using best practices.
• Use a decision tree to predict labels for data and to comment on how the tree indicates the importance of features.
• Classes and Objects:
• Write a class with fields and functions from scratch.
• Write code that uses your object as a client.
• Efficiency:
• Look at a function and describe its efficiency using Big-O notation.
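For the pandas topics above, a short sketch of selecting, filtering, and aggregating with `groupby`; the dataset and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical dataset (populations in thousands).
df = pd.DataFrame({
    'city': ['Seattle', 'Seattle', 'Portland', 'Portland'],
    'year': [2018, 2019, 2018, 2019],
    'population': [730, 745, 650, 655],
})

# Filtering: a boolean mask keeps only rows where the condition holds.
recent = df[df['year'] == 2019]

# Aggregation: groupby collects rows by key, then mean reduces each group.
avg_pop = df.groupby('city')['population'].mean()

print(recent['city'].tolist())  # ['Seattle', 'Portland']
print(avg_pop['Seattle'])       # 737.5
```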

The following topics will NOT be covered on Exam 1:

• Using negative indices to specify the end of the list in slice syntax. It's fair game to use a negative step size like 6:0:-2, but we won't test your knowledge of negative indices like -4:-1.
• Advanced time series operations like resample.
• The exact list of visualization encodings and their relative effectiveness. We only care that you know that there are different encodings with different effectiveness and can explain how that might affect a visualization's readability.
• Using matplotlib to customize visualizations.
• Specifics of the algorithm to learn a decision tree.
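To make the slicing distinction above concrete, a negative step walks backwards between explicit indices, stopping before the end index:

```python
lst = [0, 10, 20, 30, 40, 50, 60]

# 6:0:-2 starts at index 6 and steps backwards by 2, stopping
# before index 0 (so index 0 itself is excluded).
print(lst[6:0:-2])  # [60, 40, 20]

# A plain positive-step slice for comparison.
print(lst[1:5])     # [10, 20, 30, 40]
```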

Practice material

• Practice Exam (problems | solutions)
• Another practice exam will be handed out in section on May 9.
• The homeworks and section problems are great things to reference for the types of questions we will ask. You may also want to look at lecture examples, but they sometimes involve material more advanced than what you need for the exam.

Reference Sheet

As described above, a reference sheet will be provided on the exam with a list of the functions and their parameters that we have learned so far. The reference sheet will contain only method names and parameters, without descriptions or examples; if you would like descriptions of the methods, you may use space on your cheat sheet to write them down.

• Built-in Python functions
• print(*strings)
• range(end)
• range(start, end[, step])
• abs(v)
• min(v1, v2)
• max(v1, v2)
• sum(v1, v2)
• open(fname)
• Types: int(v), float(v), str(v), bool(v)
• String methods
• upper()
• lower()
• find(s)
• strip()
• split()
• List methods
• Construct: list() or []
• append(val)
• extend(lst)
• insert(idx, val)
• remove(val)
• pop(idx)
• index(val)
• reverse()
• sort(key=None)
• Set methods
• Construct: set()
• add(val)
• remove(val)
• Dictionary methods
• Construct: dict() or {}
• keys()
• values()
• items()
• File methods
• readlines()
• read()
• Special methods to implement in a class
• __init__
• __repr__
• __eq__
• Pandas methods
• Parse: pd.read_csv
• mean()
• min() / max()
• idxmin() / idxmax()
• count()
• unique()
• groupby(col)
• apply(fun)
• isnull()
• notnull()
• dropna()
• fillna(val)
• sort_values(col)
• sort_index()
• nlargest(n, col)
• Pandas fields
• index
• loc[row, col]
• Seaborn methods
• sns.catplot(x, y, data, kind[, hue])
• kind: ["count", "bar", "violin"]
• sns.relplot(x, y, data, kind[, hue[, size]])
• kind: ["scatter", "line"]
• sns.regplot(x, y, data)
• sklearn methods
• sklearn.metrics.accuracy_score(y_true, y_pred)
• sklearn.metrics.mean_squared_error(y_true, y_pred)
• sklearn.model_selection.train_test_split(X, y, test_size)
• sklearn model classes
• sklearn.tree.DecisionTreeClassifier()
• sklearn.tree.DecisionTreeRegressor()
• sklearn model methods
• fit(X, y)
• predict(X)