One of the greatest things about the tidyverse is the pipe operator %>%, along with the fact that everything is designed to work well with it. The same applies to modeling with the modelr package, which this exercise set covers.
Answers to these exercises are available here.
Please do all exercises using the tidyverse. They mostly involve the modelr and purrr packages. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Familiarize yourself with the heights data set provided with the modelr package.
Create a list of formulas for modeling income with:
* height * weight
* linear combination of all variables
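As a warm-up that does not give away the answer, here is a minimal sketch of building a named list of formulas with modelr's `formulas()` helper. It uses the built-in `mtcars` data as a stand-in for the exercise data, and the name `mpg_formulas` and the chosen predictors are illustrative assumptions.

```r
library(modelr)

# mtcars stands in for the exercise data; the predictors are illustrative only.
mpg_formulas <- formulas(
  ~mpg,                     # response shared by every formula
  interaction = ~wt * hp,   # main effects plus their interaction
  everything  = ~.          # all remaining columns as predictors
)

mpg_formulas
```

A plain named `list()` of two-sided formulas would serve equally well; `formulas()` only saves repeating the response.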
From the data, remove rows containing NAs. Fit a linear model for each of the formulas from exercise 2.
For each fit, calculate RMSE.
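The fit-then-score pattern can be sketched as follows, again on `mtcars` rather than the exercise data. `fit_with()` fits one model per formula and `rmse()` scores a model against a data frame; the formula list and the names `fits` and `rmses` are assumptions made for illustration.

```r
library(modelr)
library(purrr)

# Illustrative formulas on stand-in data (not the exercise solution).
model_formulas <- list(
  additive    = mpg ~ wt + hp,
  interaction = mpg ~ wt * hp
)

# One lm() fit per formula, returned as a named list.
fits <- fit_with(mtcars, lm, model_formulas)

# In-sample RMSE for each fit.
rmses <- map_dbl(fits, rmse, data = mtcars)
rmses
```

Because the RMSE here is computed on the training data, the richer model can never score worse; the cross-validation exercises address that.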
For each model, add residuals to the data and plot their distribution. (Hint: use
Create an equally spaced grid of height and weight.
Predict and plot values of the height/weight model for the grid from the previous exercise.
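One possible shape for the grid-and-predict steps, shown on `mtcars` with an assumed 10 x 10 grid: `data_grid()` together with `seq_range()` builds the evenly spaced grid, and `add_predictions()` appends a `pred` column. The tile plot is just one reasonable way to display a surface over two predictors.

```r
library(modelr)
library(ggplot2)

# Stand-in model on stand-in data.
fit <- lm(mpg ~ wt * hp, data = mtcars)

# 10 evenly spaced values per variable, crossed into 100 grid points.
grid <- data_grid(
  mtcars,
  wt = seq_range(wt, 10),
  hp = seq_range(hp, 10)
)

# Add model predictions as a new column named `pred`.
grid <- add_predictions(grid, fit)

p <- ggplot(grid, aes(wt, hp, fill = pred)) +
  geom_tile()
print(p)
```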
Fit the height/weight model with 10-fold cross-validation. Calculate the RMSE for each fold.
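A hedged sketch of k-fold cross-validation with `crossv_kfold()`, using `mtcars` and only 4 folds because that data set is small. Each row of the result holds a `train` and a `test` resample; the model is fitted on the former and scored on the latter. The names `folds`, `models`, and `fold_rmse` are illustrative.

```r
library(modelr)
library(purrr)

set.seed(1)  # fold assignment is random

# 4 folds on stand-in data; the exercise asks for 10 on its own data.
folds <- crossv_kfold(mtcars, k = 4)

# Fit on each training resample.
models <- map(folds$train, ~ lm(mpg ~ wt * hp, data = as.data.frame(.x)))

# Score each model on its held-out fold.
fold_rmse <- map2_dbl(models, folds$test, rmse)
fold_rmse
```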
Fit the height/weight model with 100 steps of Monte Carlo cross-validation. Plot a histogram of the RMSE.
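Monte Carlo cross-validation follows the same pattern, except that `crossv_mc()` draws a fresh random train/test split at every step. The sketch below assumes a 20% test share and uses `mtcars` as stand-in data; the bin count is arbitrary.

```r
library(modelr)
library(purrr)
library(ggplot2)

set.seed(1)  # splits are random

# 100 random train/test partitions, holding out 20% each time.
splits <- crossv_mc(mtcars, n = 100, test = 0.2)

models  <- map(splits$train, ~ lm(mpg ~ wt * hp, data = as.data.frame(.x)))
mc_rmse <- map2_dbl(models, splits$test, rmse)

# Distribution of the held-out RMSE across the 100 splits.
p <- ggplot(data.frame(rmse = mc_rmse), aes(rmse)) +
  geom_histogram(bins = 20)
print(p)
```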
Plot a histogram of the RMSE for the height/weight model from exercise 3, fitted to 100 bootstrap resamples of the data.
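For the bootstrap, `bootstrap()` returns a `strap` column of resamples drawn with replacement. The sketch below, on `mtcars`, refits the model on each resample and scores it against that same resample; scoring against the original data instead would be an equally defensible reading, so treat this choice as an assumption.

```r
library(modelr)
library(purrr)
library(ggplot2)

set.seed(1)  # resampling is random

# 100 bootstrap resamples of the stand-in data.
boot <- bootstrap(mtcars, 100)

# Refit on each resample, then compute the RMSE on the resample itself.
models    <- map(boot$strap, ~ lm(mpg ~ wt * hp, data = as.data.frame(.x)))
boot_rmse <- map2_dbl(models, boot$strap, rmse)

p <- ggplot(data.frame(rmse = boot_rmse), aes(rmse)) +
  geom_histogram(bins = 20)
print(p)
```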