Now you will be making a simple machine learning model for the provided education data using scikit-learn. Complete this in a function called fit_and_predict_degrees that takes the data as a parameter and returns the test mean squared error as a float. This may sound like a lot, so we've broken it down into four steps for you:
Filter the DataFrame to only include the columns for year, degree type, sex, and total. Drop any rows that have missing data for these columns and then convert string values to their dummy encoding. Split the columns as needed into input values and target values.
You do not need to do anything fancy like finding the optimal settings for parameters to maximize performance. We just want you to start simple and train a model from scratch! The reference below has all the methods you will need for this section!
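To make the flow concrete, here is a minimal sketch of how fit_and_predict_degrees might be put together. Several details are assumptions rather than anything stated above: the column names ('Year', 'Min degree', 'Sex', 'Total'), the 80/20 train/test split, and the choice of DecisionTreeRegressor as the model. Adjust each of these to match the actual dataset and the rest of the spec.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def fit_and_predict_degrees(data):
    """Return the test mean squared error of a simple model as a float."""
    # ASSUMPTION: these column names are illustrative; use the real ones.
    columns = ['Year', 'Min degree', 'Sex', 'Total']

    # Keep only the relevant columns, then drop rows missing any of them.
    filtered = data[columns].dropna()

    # Inputs are every kept column except the target. get_dummies converts
    # the string columns to dummy columns and leaves numeric ones alone.
    features = pd.get_dummies(filtered.drop(columns=['Total']))
    labels = filtered['Total']

    # ASSUMPTION: an 80/20 random split between training and testing.
    features_train, features_test, labels_train, labels_test = \
        train_test_split(features, labels, test_size=0.2)

    # ASSUMPTION: a decision tree regressor with default settings.
    model = DecisionTreeRegressor()
    model.fit(features_train, labels_train)

    # Score the model only on rows it did not see during training.
    predictions = model.predict(features_test)
    return float(mean_squared_error(labels_test, predictions))
```

Because the split is random and no random_state is set, the returned error will differ slightly from run to run. Note that the dummy encoding is applied before the split so that the training and test inputs share the same columns.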
Like in Part 1, it can be difficult to write tests for this section. Machine learning is all about uncertainty, and it's often hard to write tests that tell you whether a result is right. This requires diligence and being very careful with the method calls you make. To help you with this, we've provided some alternative ways to gain confidence in your result: