Lecture 1

Reading:
Required: Mitchell Chapter 1
Required: Hulten Chapters 1-3, 16
Required: Hulten, part of Chapter 19, pp. 225-229 (stop at 'distribution of mistakes')

0) Setup

You will need to set up your environment to prepare for later assignments. This involves installing the relevant tools and running the sample code on sample data.

First, get familiar with Python 3.x: https://www.learnpython.org/

No way, Python!?!? Python is widely used in machine learning and data science to process, prepare, and explore data, and to stitch together ML tools into experiments and workflows. Keep in mind that we are only going to be programming very basic versions of the various algorithms, and there will be very little need for optimization. I provide Python framework code to get you started on the assignments, deal with data loading, etc. [ If you really think Python will be a problem for you, let me know. ]

Next, install the tools:
Install Visual Studio Community: https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community&rel=15&rid=34347&utm_source=docs&utm_medium=clickbutton&utm_campaign=python_gettingstarted#
Install it with the Python option (as described here: https://docs.microsoft.com/en-us/visualstudio/python/installing-python-support-in-visual-studio?view=vs-2017).
You are welcome to use a different Python environment, but we haven't tested others and can't offer support for them.

Finally, execute the assignment 1 framework:
Download the framework code from the web page.
Download the SMSSpamCollection dataset from the web page.
Update the kDataPath variable and execute StartingPoint1.py in your Python environment. Make sure you get accuracy reports for the strawman models.
Spend some time reading through the code. Over the next several weeks we'll rewrite and expand most of it.

Hand in: This assignment is ungraded. There is nothing to hand in. But if you don't complete it you're going to have a hard time doing the next assignment.

1) Logistic Regression

Take the heuristic spam model as a starting point (that is, match the general API) and implement logistic regression.

Recall, the loss is the average of:
(-y[i] * math.log(yPredicted[i])) - ((1 - y[i]) * (math.log(1.0 - yPredicted[i])))
(e.g. the validation set loss is the sum of that across the validation data, divided by the number of validation samples)

Recall, yPredicted is:
1.0 / (1.0 + math.exp(-z))

Where z is:
self.weight0 + sum([exampleFeatures[i] * self.weights[i] for i in range(len(exampleFeatures))])

Use a threshold of 0.5 for classification (if the score for a sample after the sigmoid is > 0.5 it is classified as spam).

Use gradient descent for optimization. Recall the gradient for weight j is the sum over samples of:
((yPredicted[i] - y[i]) * x[i][j])
[ Divide this by the number of samples, then multiply by the step size for the update. ]
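If it helps to see the pieces above put together, here is a minimal sketch of batch gradient descent for logistic regression. The class and variable names are illustrative only, not the framework's; match the actual API of the heuristic spam model in the starting point.

import math

class LogisticRegressionModel:
    # A minimal sketch; names are illustrative, not the framework's.
    def __init__(self, featureCount):
        self.weight0 = 0.0
        self.weights = [0.0] * featureCount

    def predictProbability(self, features):
        z = self.weight0 + sum(w * f for w, f in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def loss(self, x, y):
        # Average cross-entropy loss over a data set
        total = 0.0
        for features, label in zip(x, y):
            p = self.predictProbability(features)
            total += (-label * math.log(p)) - ((1 - label) * math.log(1.0 - p))
        return total / len(y)

    def fit(self, x, y, iterations=50000, stepSize=0.01):
        n = len(y)
        for _ in range(iterations):
            predictions = [self.predictProbability(features) for features in x]
            errors = [p - label for p, label in zip(predictions, y)]
            # Gradient for the bias term uses a constant input of 1
            self.weight0 -= stepSize * (sum(errors) / n)
            for j in range(len(self.weights)):
                gradient = sum(errors[i] * x[i][j] for i in range(n)) / n
                self.weights[j] -= stepSize * gradient

    def predict(self, x, threshold=0.5):
        return [1 if self.predictProbability(features) > threshold else 0 for features in x]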
HAND IN: Run your implementation on the training/validation data in the framework and hand in a clear document containing the following. Include your logistic regression code as an appendix (no points for the code, but you can lose points for not including it).

2 Points -- Run for 50,000 iterations with step size 0.01 and produce a graph of the training set loss vs. iteration, sampled every 1000 iterations. (This may take a few minutes to run. Machine learning takes some patience.)
2 Points -- Plot the validation set loss, validation set accuracy, and value of weight[1] (this is the weight associated with X_1) after every 10,000 iterations.
1 Point -- Calculate all the statistics from the evaluation framework on the 50,000 iteration run, including the confusion matrix, precision, recall, etc.

Answer in no more than 150 words (plus the plots and tables mentioned above):
1 Point -- What do these measurements tell you about logistic regression compared to the straw-man algorithms?
1 Point -- How did the gradient descent converge?
1 Point -- What makes you think you implemented logistic regression correctly?

2) Basic Model Evaluation

Finish implementing the EvaluationsStub methods that you will find in the EvaluationsStub.py code provided in the sample code. These are methods of the form:
Precision(y, yPredicted) -> float

The full list of evaluations to implement includes:
Precision
Recall
False positive rate
False negative rate

Also implement code to visualize the complete confusion matrix (simple ASCII art is fine). You can find the definitions of all of these in the reading from Hulten chapter 19.

Hand in a document containing:
0.5 Points -- Your code. Keep it clear! If the TA can't easily follow it they will have to deduct credit.
0.5 Points -- A table showing the output of all of these evaluation methods for the spam domain with the most common model and the spam heuristic model (the two models in the starting point I provided).

Lecture 2

Reading:
Required: Hulten Chapters 6, 11, 12, 17, 19 (finish 19)

3) Feature Engineering

Add bag of words features to your spam domain solution:
Support frequency based feature selection (top N).
Support mutual information based feature selection (top N).
Tokenize in the simplest way possible (by splitting on whitespace).

Recall:
MutualInformation(X, Y) = sum over every value X has and Y has of:
P(has X, has Y) * log_e( P(has X, has Y) / (P(has X) * P(has Y)) )

And use smoothing when calculating the probabilities:
P(*) = (# observed + 1) / (total samples + 2)

Do all the runs listed below on the train/validation split provided by the framework.

HAND IN: A document that contains the following tables (clearly labeled!)

1 point -- Perform a leave-one-out wrapper search on each of the 5 features provided by the starting framework (50,000 iterations; there are 5 features; you can do this evaluation manually or programmatically). (> 40, contains #, contains 'call', contains 'to', contains 'your'). Hand in a table showing the accuracy on the validation set with each feature left out, compared to a model built with all of the features.
0.5 point -- A list of the top 10 bag of word features selected by filtering by frequency.
0.5 point -- A list of the top 10 bag of word features selected by filtering by mutual information.
2 points -- Run gradient descent to 50,000 iterations with each of the following feature sets (0.5 point each):
* the top 10 words by frequency
* the top 10 words by mutual information
* the better of these PLUS the hand crafted features from the framework
* the previous setting with 100 words plus hand-crafted instead of 10
Hand in a clearly labeled table comparing the accuracies of these methods.

4) ROC Curves and Operating Points

Update model.predict for your logistic regression model so that it takes a threshold as a parameter (as opposed to the default of 0.5 we've been using so far).
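Here is a minimal sketch of how a thresholded predict can be swept to collect precision/recall operating points. The names model, xValidate, and yValidate are placeholders for whatever your framework provides, and EvaluationsStub is assumed to expose Precision and Recall the way you implemented them in the previous assignment (adjust if yours are class methods).

import EvaluationsStub

def sweepThresholds(model, xValidate, yValidate, steps=100):
    # Collect (threshold, precision, recall) operating points on the validation set.
    # You may need to handle thresholds where no samples are predicted positive
    # (precision is undefined there) inside your Precision implementation.
    operatingPoints = []
    for i in range(1, steps + 1):
        threshold = i / steps
        yPredicted = model.predict(xValidate, threshold=threshold)
        operatingPoints.append((threshold,
                                EvaluationsStub.Precision(yValidate, yPredicted),
                                EvaluationsStub.Recall(yValidate, yPredicted)))
    return operatingPoints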
1 point -- Produce a chart that compares:
* a logistic regression model with 10 mutual information features
* a logistic regression model with 10 mutual information features plus my heuristic features
by plotting their precision vs. their recall at 100 different thresholds (0.01, 0.02, ..., 1.00). You can use a python plotting library (like matplotlib) or import the data into some other tool to produce the plot (like Excel). Get your predictions by training on the training set the framework provides and evaluating on the validation set.
1 point -- Hand in a table that contains the threshold that achieves a 10% False Positive Rate on the validation set for these two models (10 mutual information features with and without the heuristics). Include the False Negative Rate that is achieved on the validation set at that threshold.

5) Categorizing Mistakes

Looking at and interpreting the mistakes your models make is an important part of successful machine learning, particularly when doing feature engineering.

Implement a way to get the raw context for the samples where your model is most wrong. Recall this includes examples where the true answer was 1 but the model gives very low probabilities, and examples where the true answer was 0 but the model gives very high probabilities. To do this you will need a version of model.predict that returns raw probabilities (without using a threshold).

HAND IN:
0.5 point -- Produce a list of the 20 worst false positives (where the model was very sure but wrong), or as many as your model makes, made by running logistic regression on the initial train/validation split with 10 mutual information features.
0.5 point -- Produce a list of the 20 worst false negatives (where the model was very sure but wrong) made by running logistic regression on the initial train/validation split with 10 mutual information features.

Look at these mistakes and categorize them according to potential causes -- the properties of the messages that you think led the model to the wrong answer. Come up with your own categories (reasons for the mistake):
0.5 point -- Categorize the false positives into at least 4 categories.
0.5 point -- Categorize the false negatives into at least 4 categories.
1 point -- In no more than 150 words describe the insight you got from this process, including one new heuristic feature you think would reduce the bad false positives, and one that would reduce the bad false negatives.

Lecture 3

Reading:
Required:
* Mitchell Online Addendum 3 (not chapter 3 in the book)
* Mitchell Chapter 5 (in the book)
Optional:
* Kohavi 95 (http://web.cs.iastate.edu/~jtian/cs573/Papers/Kohavi-IJCAI-95.pdf)

6) Comparing Models

Implement code to estimate the 95% range for your accuracy estimates. In the future, always include 95% confidence ranges whenever you turn in accuracy estimates. Recall:
Upper = Accuracy + 1.96 * sqrt((Accuracy * (1 - Accuracy)) / n)
Lower = Accuracy - 1.96 * sqrt((Accuracy * (1 - Accuracy)) / n)

Evaluate two settings from your feature selection assignment: 10 words selected by mutual information and 10 words selected by frequency.

Implement cross validation that supports any number of folds from 2 to n. Verify that it is selecting the correct data into each fold. Be prepared to run all current evaluations on the result (precision, recall, false positive rate, false negative rate).
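For reference, here is a minimal sketch of the confidence range and one simple way to build the folds. It assumes x and y are parallel Python lists; the function names are illustrative, and you may want to shuffle the data before splitting if it is ordered.

import math

def accuracyBound(accuracy, n, zScore=1.96):
    # 95% confidence range for an accuracy estimate made on n samples
    delta = zScore * math.sqrt((accuracy * (1 - accuracy)) / n)
    return (accuracy - delta, accuracy + delta)

def crossValidationFolds(x, y, numFolds):
    # Split the data into numFolds contiguous folds; each fold is used once as the
    # evaluation set, with the remaining samples used for training.
    folds = []
    n = len(y)
    for k in range(numFolds):
        start, end = (k * n) // numFolds, ((k + 1) * n) // numFolds
        xEvaluate, yEvaluate = x[start:end], y[start:end]
        xTrain, yTrain = x[:start] + x[end:], y[:start] + y[end:]
        folds.append((xTrain, yTrain, xEvaluate, yEvaluate))
    return folds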
Hand in a clearly labeled table with:
* 1 point -- the accuracy estimates from the train/validation split run, with error bounds, for the two models (10 words by frequency and 10 words by mutual information)
* 1 point -- the accuracy estimates from a 5-fold cross validation run, with error bounds, for the two model variants (10 words by frequency and 10 words by mutual information; cross validation on the data in the train+validation set)

NOTE: If your gradient descent is slow (like mine is), these runs are going to start to take a long time. One possibility is to just not care -- let it run overnight or whatever. Another easy approach is to do several runs in parallel. You can do this manually or programmatically. You could also choose to optimize your code, but don't go overboard. That's not the point of this assignment. In practice you should use an existing, highly-optimized implementation of a machine learning algorithm (not one you write yourself).

7) Build your Best SMS Spam Model

This is a competition assignment. See README_Test. Download the kaggle support code that will load the test data and format your submission.

Build the best model you can using any of the learning algorithms you developed so far. Do any feature engineering you like. Submit your answers to our competition, check the leaderboard, and iterate according to the submission rules.

5 Points -- For submitting an answer that performs better than the TA's baseline answer.
3 Points -- For submitting an answer that performs better than the TA's tuned answer.
5 Points -- Hand In: A report of no more than 1000 words, with no more than 5 figures/tables, which demonstrates that you have produced an effective model for the SMS spam problem and properly evaluated your model.
1 Point -- Demonstrate several parameter sweeps in figures.
1 Point -- Make sure to use the appropriate techniques from the previous lectures (error bounds, various measures of model quality, categorized mistakes); describe how you used one of them and how it helped.
1 Point -- Examine your mistakes and improve your feature engineering in at least 3 ways. Describe the features your best model uses (you can use insights from your previous investigation, but indicate which you used).
1 Point -- Include an ROC curve comparing the first model you tried with your final best model. Clearly label what they are.
1 Point -- Clearly describe your best model and the parameter settings you used, as well as the complete final feature set.

NOTE: Trying many parameters can take a lot of time. You can run different parameter settings in parallel using joblib:

from joblib import Parallel, delayed

# Running the parameter evaluations serially
paramsEvals = [ ExecuteOnParams(params) for params in paramsList ]

# Running the parameter evaluations in parallel
paramsEvals = Parallel(n_jobs=8)(delayed(ExecuteOnParams)(params) for params in paramsList)

# In my implementation params is a dictionary of all the (hyper) parameters to use for feature selection
# and model training, and ExecuteOnParams runs selection and fitting using those parameters,
# returning an updated params dictionary that includes the validation set loss from the run.

Lecture 4

Reading:
* Required: Hulten Chapters 4, 5, 7, 8
* Mitchell Chapter 3 (in the book)

8) Decision Trees

Download the support framework for the Adult Census dataset.
Download the Adult Census dataset.
Implement a decision tree learner:

import DecisionTreeModel

model = DecisionTreeModel.DecisionTree()
model.fit(x, y)
yValidatePredicted = model.predict(xValidate)

Implement (very simple) numeric attributes by measuring the range of the feature values at each node, then consider splitting at the midpoint. E.g. for our binary 0/1 features, where every value is 0 if the feature is absent and 1 if the feature is present, consider a split at the midpoint of 0.5. Recall: if all samples at a node have the same value for a feature, there is no need to consider splitting on it.

Use greedy search (one step look ahead: just consider a single split at a time). Use InformationGain as the split criterion. Recall Information Gain is:

H(node_data) - sum_i p(feature has value i) * H(node_data where feature has value i)

And H is:

sum_y - p(node_data has label y) * log(p(node_data has label y))

(A minimal sketch of these computations appears at the end of this assignment.)

Implement a single search parameter: minToSplit, which stops the tree growth process when a new leaf has fewer samples than the limit.

Implement a function to visualize the tree:

model.visualize()

feature i:
  >= 0.5:
    Leaf:
  < 0.5:
    feature j:
      >= 0.5:
        Etc...
      < 0.5:
        Etc...

HAND IN: Run your DecisionTree algorithm on the Adult Census dataset (with the train/validate/test split provided in the sample framework) using the features that were included with the support framework and with minToSplit = 10,000.

1 Point -- Report the accuracy on the validation set. Include an error bound.
1 Point -- Include a visualization of the tree.

Now tune the minToSplit parameter to try to find the best setting.

3 Points -- In no more than 200 words describe the process you used to optimize minToSplit, including a chart showing the minToSplit values you tried on the x-axis and validation set accuracy on the y-axis. How sensitive is the accuracy to the value of minToSplit? Which value turned out to be best? Is it significantly better than the initial setting (10,000) at a 95% confidence threshold? How do you know?
2 Points -- Produce an ROC curve comparing the tree learned with minToSplit = 10,000 to the tree you learned with your best setting. Clearly label which curve corresponds to which model. Describe what you are able to learn from looking at the curves.

NOTE: To produce an ROC curve you need to use a threshold for predicting 1/0 based on the fraction of samples at the leaf with the predicted label.

Upload your code in .py files. Upload all your answers in a single PDF file. No archives/zips.
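Here is a minimal sketch of the entropy and information-gain computations described above, for the binary split-at-the-midpoint case. The helper names are illustrative and not part of the support framework.

import math

def entropy(y):
    # H = sum over labels of -p(label) * log(p(label))
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    total = len(y)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

def informationGain(x, y, featureIndex, threshold=0.5):
    # Gain from splitting this node's data on one feature at the given threshold
    yBelow = [label for features, label in zip(x, y) if features[featureIndex] < threshold]
    yAbove = [label for features, label in zip(x, y) if features[featureIndex] >= threshold]
    remainder = ((len(yBelow) / len(y)) * entropy(yBelow)
                 + (len(yAbove) / len(y)) * entropy(yAbove))
    return entropy(y) - remainder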
Lecture 5

Reading:
* Required: Sections 1, 3, and 4 of the original random forest paper: https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf
* Required: Hulten Chapters 9, 20

9) Random Forest

Implement random forests, using your existing decision tree model as the base model.

NumTrees & MinSplit -- Support parameters for the number of trees and the minSplit for the base trees.
Bagging -- Support a flag to use bagging to select the data for each tree; when absent you should learn each tree on the full original sample.
FeatureRestriction -- If specified, randomly select N features for each tree and restrict the tree to using those (select a different random set for each tree). If set to 0, use all available features.
Seed -- Support a random seed so your runs (sampling features and training samples) are deterministic. This is purely to help you debug, and it will almost certainly help.

Recall, bagging means creating a new training set by randomly selecting samples from the original training set with replacement. If the original is N samples, select N random indexes into the sample array (with replacement) and build a new set using the samples at those indexes.

HAND IN: Use the Adult Census data for these assignments.

Build a model with numTrees = 10, minSplit = 500, Bagging, and FeatureRestriction = 75.

2 Points -- Create a table that has the accuracies of the 10 individual trees, along with the accuracy of the full random forest (after the individual trees vote), on xTest.

Run parameter sweeps for numTrees in [1, 20, 40, 60, 80], for each of these settings:
* minSplit = 2, Bagging, and FeatureRestriction = 75
* minSplit = 50, Bagging, and FeatureRestriction = 75
* minSplit = 2, NO Bagging, and FeatureRestriction = 75
* minSplit = 2, Bagging, and NO FeatureRestriction (= 0)

2 Points -- Produce a plot with numTrees on the x-axis and the hold-out set accuracy for each of these variations on the y-axis.

NOTE: These runs can be very slow, but the trees can be built in parallel. joblib is a python library that makes this easy to do. Keep in mind, if you're doing parallel hyperparameter search, you probably won't want to also parallelize tree growing inside the random forest algorithm:

# Sequential version:
self.trees = [ GrowTree(i, x, y, minToSplit, random.randint(0, 1000000), logProgress, useBagging, restrictFeatures, numFeaturesPerTree) for i in range(numTrees) ]

# Parallel version:
self.trees = Parallel(n_jobs=6)(delayed(GrowTree)(i, x, y, minToSplit, random.randint(0, 1000000), logProgress, useBagging, restrictFeatures, numFeaturesPerTree) for i in range(numTrees))

Lecture 6

Reading:
* Mitchell Chapter 8
* Hulten Chapter 25

10) Computer Vision Features

Download the new data set and support code. You'll also need to install the Pillow imaging library to process image files.

For this assignment add 4 new feature sets to BlinkSupport.Featurize. Build a model with random forests and tune the parameters (try at least 3 settings each of: min to split, num trees, and feature restriction).

0.5 Point -- Divide the image into a 3 x 3 grid and for each grid location include a feature for the min, max, and average y-gradient among the locations in the grid. What test-set accuracy did you achieve? What parameter values were best?
0.5 Point -- Divide the image into a 3 x 3 grid and for each grid location include a feature for the min, max, and average x-gradient among the locations in the grid. What test-set accuracy did you achieve? What parameter values were best?
0.5 Point -- Implement a histogram of gradients across the whole image (not on the grid) with 5 uniformly spaced bins for the absolute value of the y-gradients (0 - 0.2, 0.2 - 0.4, etc.). For each bin create a feature whose value is the percent of y-gradients that fall in the bin. What test-set accuracy did you achieve? What parameter values were best?
0.5 Point -- Implement a histogram of gradients across the whole image (not on the grid) with 5 uniformly spaced bins for the absolute value of the x-gradients (0 - 0.2, 0.2 - 0.4, etc.). For each bin create a feature whose value is the percent of x-gradients that fall in the bin. What test-set accuracy did you achieve? What parameter values were best?
1 Point -- Produce an ROC curve with one curve for each of: the y-gradients on the 3x3 grid; the x-gradients on the 3x3 grid; the y-gradient histogram; the x-gradient histogram. Use the tuning values you found in the previous parts of this assignment.
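As a reference point, here is one illustrative way the 3 x 3 grid y-gradient features described above could be computed with Pillow. It uses a simple finite-difference gradient on normalized grayscale intensities; BlinkSupport may represent pixels or compute gradients differently, so adapt the indexing to your Featurize code (the function and variable names are placeholders).

from PIL import Image

def gridYGradientFeatures(imagePath, gridSize=3):
    image = Image.open(imagePath).convert('L')  # grayscale
    width, height = image.size
    pixels = image.load()
    features = []
    for gridY in range(gridSize):
        for gridX in range(gridSize):
            xRange = range((gridX * width) // gridSize, ((gridX + 1) * width) // gridSize)
            yRange = range((gridY * height) // gridSize, ((gridY + 1) * height) // gridSize)
            # Finite-difference y-gradient, with intensities normalized to [0, 1]
            gradients = [(pixels[x, y + 1] - pixels[x, y]) / 255.0
                         for x in xRange for y in yRange if y + 1 < height]
            features += [min(gradients), max(gradients), sum(gradients) / len(gradients)]
    return features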
11) k-Means Clustering

Implement kMeans clustering.

Select random samples to use as the initial cluster centroids. For each iteration:
* Assign each training sample to the nearest cluster centroid.
* Update each centroid location to the average location of its assigned samples.

Use Euclidean distance (the L2 norm). Recall this is:
sqrt( sum over each dimension d of: (sample_d - centroid_d)^2 )

Recall the new centroid location for dimension d is:
new_d = ( sum over each assigned sample of: sample_d ) / ( # assigned samples )

Support a parameter to specify how many clusters to learn. Support a parameter to specify how many iterations to run for.

HAND IN: Run a clustering on the training set of the eye blink dataset (xTrain) with the two features provided in the support code (avg y-gradient, and avg y-gradient mid image). Use 4 clusters for 10 iterations.

1 Point -- Produce a plot showing the training data and overlaying the paths the cluster centroids take over each of the 10 iterations.
1 Point -- Find the closest training sample to the final location of each cluster center and the associated image.
1 Point -- In no more than 150 words describe the clustering process. Did the clustering converge? What do you learn from examining the images nearest the cluster centers?

12) K Nearest Neighbors

Implement k nearest neighbors, where each test sample is classified based on its k nearest neighbors among the training set; the test sample's score is the proportion of the k neighbors whose label is 1.

Use the blink dataset with the features you implemented in the previous assignment (the 3x3 grids of x-gradients & y-gradients and the histograms across the whole image).

2 Points -- Evaluate k in [1, 3, 5, 10, 20, 50, 100]. Produce an ROC curve comparing each of these approaches (fit the model on xTrain, evaluate on xValidate).
1 Point -- In no more than 100 words describe the results. Which k is best?

Lecture 7

Reading:
* Mitchell Chapter 4
* Hulten Chapters 10, 21

13) Neural Networks

Implement learning of fully connected neural networks with an input layer, N hidden layers (with M nodes per hidden layer), and an output layer (with one output variable), using the Backpropagation algorithm as described in Mitchell with stochastic gradient descent. The implementation should be roughly:

For each iteration:
  For each training sample:
    Pass the sample through the network to get activations
    Propagate the error from the output layer back through the network
    Update all the weights

[ Hint: have one structure for all the weights in the network, a parallel one for the activations, and a parallel one for the errors. ]
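To make the outline above concrete, here is a minimal sketch of a per-sample update for sigmoid units following Mitchell's backpropagation rules. The class name, weight layout, and method names are illustrative only; the outer iteration loop, loss tracking, and prediction threshold are left to your implementation.

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class SimpleNeuralNet:
    # layerSizes is [numInputs, hidden1, ..., hiddenN, 1]
    def __init__(self, layerSizes, seed=0):
        random.seed(seed)
        # weights[l][j][i+1] connects node i in layer l to node j in layer l+1;
        # index 0 in each row is the bias weight.
        self.weights = [[[random.uniform(-0.05, 0.05) for _ in range(layerSizes[l] + 1)]
                         for _ in range(layerSizes[l + 1])]
                        for l in range(len(layerSizes) - 1)]

    def _forward(self, x):
        activations = [list(x)]
        for layer in self.weights:
            prev = activations[-1]
            activations.append([sigmoid(w[0] + sum(wi * ai for wi, ai in zip(w[1:], prev)))
                                for w in layer])
        return activations

    def predictProbability(self, x):
        return self._forward(x)[-1][0]

    def fitOneSample(self, x, y, stepSize=0.05):
        activations = self._forward(x)
        # Output-layer error term: delta = o * (1 - o) * (t - o)
        deltas = [[o * (1.0 - o) * (y - o) for o in activations[-1]]]
        # Hidden-layer error terms, propagated backward through the network
        for l in range(len(self.weights) - 1, 0, -1):
            layerDeltas = []
            for i, o in enumerate(activations[l]):
                downstream = sum(self.weights[l][j][i + 1] * deltas[0][j]
                                 for j in range(len(self.weights[l])))
                layerDeltas.append(o * (1.0 - o) * downstream)
            deltas.insert(0, layerDeltas)
        # Weight updates: w_ji += stepSize * delta_j * x_ji (bias input is 1)
        for l, layer in enumerate(self.weights):
            for j, w in enumerate(layer):
                w[0] += stepSize * deltas[l][j]
                for i, a in enumerate(activations[l]):
                    w[i + 1] += stepSize * deltas[l][j] * a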
Use BlinkSupport's Featurize with the following arguments:
includeGradients=False
includeRawPixels=False
includeIntensities=True

This results in features where each image is scaled down by a factor of 2 and each pixel intensity is converted to the range 0.0 - 1.0.

Train models with every combination of hidden layers in [1, 2] and hidden nodes per layer in [2, 5, 10, 15, 20]. For each, use 200 iterations with step size = 0.05. NOTE: This will take a while to complete; you might want to verify all your parameters are working on smaller experiments before kicking off the full run.

Produce a plot with one line for each of these runs, with the iteration number on the x-axis and the training set loss on the y-axis. Use Equation 4.2 in Mitchell for the loss (Mean Squared Error):

E = 1/2 * sum over outputs of (y - yPredicted)^2

where E is the error on a single sample, and MSE is the average of this across the whole training set.

Produce a separate plot with one line for each of these runs but with the validation set losses on the y-axis.

Next take the model with 1 hidden layer and 2 nodes in that layer. Visualize the weights for each of the hidden nodes. There should be 12*12 weights (plus one for the bias, which you can ignore). Convert them to a 12 x 12 image where each pixel intensity is ~ 255 * abs(weight). Note: the VisualizeWeights function in BlinkSupport can do this for you.

Hand in a writeup including:
* 1 Point -- The two charts (all your training runs showing the train and validation loss).
* 1 Point -- The two visualizations (the weights for the two hidden nodes).
* 1 Point -- The best parameters and test-set accuracy you found in your parameter sweep.
* And in no more than 150 words (and 1 more chart/figure if it will help) answer these questions:
  o 1 Point -- Did you observe overfitting and underfitting? Where?
  o 1 Point -- What do the visualizations mean?

14) Warm up Model Tuning

(Don't go too far, just warm up; you'll do more tuning with more powerful approaches in the Kaggle assignment.)

Now tune neural networks to produce the best model you can. Use just the training data (xTrain) and validation data (xValidate) to tune your parameters (using xValidate as holdout data, or by combining xTrain and xValidate and doing cross-validation); reserve the test set (xTest) for the final accuracy measurement.

1 Point -- Change the features in at least one way [ increase or decrease the image scaling provided by the sample, change the normalization, or add momentum to your back propagation ]. Include a table showing the results and a few sentences describing if and how it helped.
2 Points -- Use 2-3 tables and not more than 200 words to describe the parameter tuning you did. Describe one place where you examined the output of the modeling process and used the insight to improve your modeling process. What was this output? What change did you make because of it?
1 Point -- Include an ROC curve comparing the best random forest model you got on hand-crafted features (last assignment), your initial neural network (before any tuning), and your final resulting network (after changing a feature and tuning).

Lecture 8

Reading:
* Hulten Chapters 13, 14, 15
* The ResNet Paper (You may not follow it all; that's fine.)

15) Build your best Eye Blink Model

This is a Kaggle competition assignment, see here: https://www.kaggle.com/c/csep546-aut19-kc2/overview

Redownload the BlinkSupport.zip code. You'll find:
* BlinkKaggleSupport.py -- basic help for loading/submitting
* Pytorch Set up on VS.txt -- a quick walkthrough on getting set up with Visual Studio; it should help with other environments too
* BestEyeBlinkModel.ipynb -- a sample of how to set up on Google's Colab service, where you can get free GPU access. Open the file in Google Colab. If you use this, send Andrew a thank you note.

You can also make your own way, starting here: https://pytorch.org/get-started/locally/

Build the best model you can using any of the learning algorithms you developed so far (but probably neural networks). Do any feature and network engineering you like. But consider updating the starter model in SimpleBlinkNeuralNet.py based roughly on LeNet-5 (see the sketch below) and then tuning from there.
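A hedged sketch of what a small LeNet-5-style starting point might look like in PyTorch. It assumes 24 x 24 single-channel inputs and a single sigmoid output; check SimpleBlinkNeuralNet.py and BlinkKaggleSupport.py for the input size and output convention they actually use, and treat the layer sizes as tuning starting points rather than recommendations.

import torch

class BlinkConvNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(1, 6, kernel_size=5),   # assumed 1x24x24 input -> 6x20x20
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                  # -> 6x10x10
            torch.nn.Conv2d(6, 16, kernel_size=5),  # -> 16x6x6
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2))                  # -> 16x3x3
        self.classifier = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(16 * 3 * 3, 64),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.25),
            torch.nn.Linear(64, 1),
            torch.nn.Sigmoid())

    def forward(self, x):
        return self.classifier(self.features(x))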
5 Points -- For submitting an answer that performs better than the first baseline answer (which will be submitted by the TA).
5 Points -- For submitting an answer that performs better than the second baseline (which will be enabled by the TA after a week).

Hand In: A report of no more than 1000 words, with no more than 5 figures/tables, which demonstrates that you have produced an effective model for the Blink problem and properly evaluated your model.

1 Point -- Demonstrate several parameter sweeps in your figures and highlight at least one area that indicates underfitting and one that indicates overfitting.
1 Point -- Make sure to use the appropriate techniques from the previous lectures (error bounds, various measures of model quality, categorized mistakes). Demonstrate you used them in your modeling.
1 Point -- Examine your mistakes and improve your feature/network engineering in at least 3 ways. Describe the features you tried and how they affected accuracy.

Use whatever you need to produce an accurate model, but consider:
* torch.nn.Conv2d
* torch.nn.ReLU
* torch.nn.MaxPool2d
* torch.nn.BatchNorm2d
* torch.nn.Dropout
* Tuning the number of filters/nodes, the optimizer, and the training iterations
* Data augmentation (e.g. mirroring all the training data: make a left eye look like a right eye and vice versa)

1 Point -- Include an ROC curve comparing the first model you tried with your final best model. Clearly label what they are.
1 Point -- Clearly describe your best model and the parameter settings you used, as well as the complete final feature set. Include the updated neural network code so we can see the model being built and the forward pass.

Lecture 9

Reading:
* Mitchell Chapter 13
* Watch the AlphaZero talk: https://www.youtube.com/watch?v=Wujy7OzvdJk&t=2057s
* Hulten Chapters 22, 23, 24

16) Reinforcement Learning

Setup: Install the gym toolkit for evaluating reinforcement learning algorithms: https://gym.openai.com/docs/
Download the support code.

Implement Q-learning as described in Mitchell chapter 13. Represent Q^ with tables. Use formulas 13.10 and 13.11 to update Q^.

Support an experimentation strategy that:
* Uses the formula in section 13.3.5 to decide which action to take, P(a_i | s). Support the parameter k to modify this expression (k = e is a good start; also consider values in the range 1.01 - 1.5).
* Adds an additional parameter called randomActionRate, which overrides this and takes a totally random action (e.g. if this is 0.02 then take a random exploration action 2% of the time).
* Also supports a learningMode=False option that unconditionally takes the best action, so you can run your best policy against the simulator and see how well you've learned.

In summary, you should support these parameters:
* discountRate = 1.0             # Controls the discount rate for future rewards -- this is gamma from 13.10
* randomActionRate = 0.01        # Percent of the time the next action selected by GetAction is totally random
* actionProbabilityBase = 2.7    # This is k from the P(a_i | s) expression in section 13.3.5 and influences how random the exploration is
* learningRateScale = 0.01       # Should be multiplied by visits_n from 13.11
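Here is a minimal sketch of a tabular Q-learner using the parameters above. It assumes the provided support code handles state discretization (binning) and the gym interaction loop; the class and method names (GetAction, ObserveAction) are illustrative assumptions, so match whatever the starting point files actually call.

import random

class QLearner:
    def __init__(self, numActions, discountRate=1.0, actionProbabilityBase=2.7,
                 randomActionRate=0.01, learningRateScale=0.01):
        self.numActions = numActions
        self.discountRate = discountRate
        self.actionProbabilityBase = actionProbabilityBase
        self.randomActionRate = randomActionRate
        self.learningRateScale = learningRateScale
        self.qTable = {}   # (state, action) -> Q^ estimate
        self.visits = {}   # (state, action) -> update count

    def _q(self, state, action):
        return self.qTable.get((state, action), 0.0)

    def GetAction(self, state, learningMode=True):
        if not learningMode:
            return max(range(self.numActions), key=lambda a: self._q(state, a))
        if random.random() < self.randomActionRate:
            return random.randrange(self.numActions)
        # P(a_i | s) proportional to k^Q^(s, a_i)  (section 13.3.5)
        weights = [self.actionProbabilityBase ** self._q(state, a) for a in range(self.numActions)]
        return random.choices(range(self.numActions), weights=weights)[0]

    def ObserveAction(self, oldState, action, newState, reward):
        self.visits[(oldState, action)] = self.visits.get((oldState, action), 0) + 1
        # alpha from 13.11, with the visit count scaled by learningRateScale
        alpha = 1.0 / (1.0 + self.learningRateScale * self.visits[(oldState, action)])
        best = max(self._q(newState, a) for a in range(self.numActions))
        self.qTable[(oldState, action)] = ((1.0 - alpha) * self._q(oldState, action)
                                           + alpha * (reward + self.discountRate * best))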
Use StartingPoint-QLearningCartPole.py.

3 Points -- Run the starting point code 10 times with your implementation (learn the policy, then evaluate it as in the starting point). Hand in a table of the scores you achieved on each of the 10 iterations as well as the average. [ Many runs should score 200.0. ]

Now move on to the StartingPoint-QLearningMountainCar.py file. If your implementation is correct, the default parameters should come close to getting the car up the hill (but not really succeed).

4 Points (1 point per parameter) -- Tune the following 4 parameters:
* discountRate
* actionProbabilityBase
* trainingIterations
* GymSupport.mountainCarBinsPerDimension

For each, produce a chart with at least 5 settings of the parameter value on the x-axis and the average score across 10 policy learning runs on the y-axis (10 times: learn a policy, then evaluate it as the sample code does; report the average score). Consider the properties of this problem and use your understanding to guide which regions you explore. For each parameter include 2-3 sentences about what the results of the tuning tell you about the problem.

2 Points -- Produce an improved parameter setting. You may change the 4 parameters you tuned, and you may change any other parameter that you think matters (and do additional tuning). Run your new parameter settings 10 times (learn the policy, then evaluate it as in the starting point). Hand in a table of the scores you achieved on each of the 10 policy learning runs as well as the average.

Lecture 10

Reading:
* Mitchell Chapter 6
* Mitchell online addendum (not in book) Chapter 14
* Hulten Chapters 18, 26