One of the greatest things about the `tidyverse` is the piping operator `%>%`, along with the fact that everything is designed to work well with it. The same applies to modeling with the `modelr` package, which this set aims to exercise.
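As a rough illustration of that pipe-friendly design, here is a minimal sketch. It uses the built-in `mtcars` data rather than the exercise data, and the formula `mpg ~ wt` is picked purely as an example. It assumes the `modelr` and `magrittr` packages are installed.

```r
library(modelr)
library(magrittr)  # provides the %>% operator

# Fit a simple linear model on a built-in data set (not the exercise data).
fit <- lm(mpg ~ wt, data = mtcars)

# Pipe the data through modelr helpers; each one returns the data frame
# with an extra column ("pred" and "resid" by default).
augmented <- mtcars %>%
  add_predictions(fit) %>%
  add_residuals(fit)

head(augmented[, c("mpg", "wt", "pred", "resid")])

# Root-mean-squared error of the fit on the same data.
fit_rmse <- rmse(fit, mtcars)
```

Because every helper takes the data frame as its first argument and returns a data frame, the steps chain without intermediate variables.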

Answers to these exercises are available here.

Please do all exercises using the `tidyverse` packages. They will mostly involve the `modelr` and `purrr` packages. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

**Exercise 1**

Familiarize yourself with the `heights` data set provided with the `modelr` package.

**Exercise 2**

Create a list of formulas for modeling income with:

* `height`
* `height * weight`
* a linear combination of all variables

**Exercise 3**

From the data, remove rows containing NAs. Fit a linear model for each of the formulas from Exercise 2.

**Exercise 4**

For each fit, calculate the RMSE.

**Exercise 5**

For each model, add residuals to the data and plot their distribution. (Hint: use `lift_dl()`.)

**Exercise 6**

Create an equally spaced grid of height and weight.

**Exercise 7**

Predict and plot values of the height/weight model for the grid from the previous exercise.

**Exercise 8**

Fit the height/weight model with 10-fold cross-validation. Calculate the RMSE for each fold.

**Exercise 9**

Fit the height/weight model with 100 steps of Monte Carlo (MC) cross-validation. Plot a histogram of the RMSE.

**Exercise 10**

Plot a histogram of the RMSE for the height/weight model from Exercise 3 over 100 rounds of bootstrapped data.
