Below are the solutions to these exercises on ridge regression.

```
###############
#             #
# Exercise 1  #
#             #
###############
library(lars)
library(glmnet)
```

## Warning: package 'glmnet' was built under R version 3.3.3

## Warning: package 'foreach' was built under R version 3.3.3

```
data(diabetes)
attach(diabetes)
set.seed(1234)
par(mfrow = c(2, 5))
for (i in 1:10) {
  plot(x[, i], y)
  abline(lm(y ~ x[, i]))
}
```

```
layout(1)
model_ols <- lm(y ~ x)
summary(model_ols)
```

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -155.829  -38.534   -0.227   37.806  151.355 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  152.133      2.576  59.061  < 2e-16 ***
## xage         -10.012     59.749  -0.168 0.867000    
## xsex        -239.819     61.222  -3.917 0.000104 ***
## xbmi         519.840     66.534   7.813 4.30e-14 ***
## xmap         324.390     65.422   4.958 1.02e-06 ***
## xtc         -792.184    416.684  -1.901 0.057947 .  
## xldl         476.746    339.035   1.406 0.160389    
## xhdl         101.045    212.533   0.475 0.634721    
## xtch         177.064    161.476   1.097 0.273456    
## xltg         751.279    171.902   4.370 1.56e-05 ***
## xglu          67.625     65.984   1.025 0.305998    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.15 on 431 degrees of freedom
## Multiple R-squared:  0.5177, Adjusted R-squared:  0.5066 
## F-statistic: 46.27 on 10 and 431 DF,  p-value: < 2.2e-16
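Before moving to `glmnet`, it can help to see what ridge regression actually does to the coefficients. Below is a minimal base-R sketch using the closed-form ridge estimator on a small synthetic dataset (not the diabetes data, and not `glmnet`'s coordinate-descent algorithm): as lambda grows, the coefficient vector shrinks toward zero.

```
# Synthetic data with two deliberately collinear predictors
set.seed(42)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.1)   # induce collinearity
beta_true <- c(2, 0, -1, 0.5, 0)
y_syn <- X %*% beta_true + rnorm(n)

# Center X and y so the intercept can be dropped
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y_syn - mean(y_syn)

# Closed-form ridge estimator: (X'X + lambda * I)^{-1} X'y
ridge_coef <- function(X, y, lambda)
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)

# Columns correspond to lambda = 0 (OLS), 1, and 100
out <- sapply(c(0, 1, 100), function(l) ridge_coef(Xc, yc, l))
colnames(out) <- c("lambda_0", "lambda_1", "lambda_100")
round(out, 3)
```

Note that lambda = 0 reproduces the OLS fit, and the L2 norm of the coefficient vector decreases monotonically with lambda.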

```
###############
#             #
# Exercise 2  #
#             #
###############
lambdas <- 10^seq(7, -3)
model_ridge <- glmnet(x, y, alpha = 0, lambda = lambdas)
plot(model_ridge, xvar = "lambda", label = TRUE)
```

```
###############
#             #
# Exercise 3  #
#             #
###############
cv_fit <- cv.glmnet(x = x, y = y, alpha = 0, nlambda = 1000)
plot(cv_fit)
```

```
cv_fit$lambda.min
```

## [1] 4.685654
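`cv.glmnet` picks `lambda.min` by k-fold cross-validation over a grid of lambda values. A hand-rolled sketch of the same idea in base R (synthetic data, the closed-form ridge estimator, and 5 folds; this is an illustration of the principle, not `glmnet`'s actual procedure, which also standardizes the predictors):

```
set.seed(1)
n <- 100; p <- 4
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- drop(X %*% c(3, -2, 0, 1)) + rnorm(n)
y <- y - mean(y)

# Closed-form ridge estimator on centered data
ridge_coef <- function(X, y, lambda)
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)

lambdas <- 10^seq(2, -3, length.out = 25)
folds <- sample(rep(1:5, length.out = n))   # random fold assignment

# Mean held-out MSE across the 5 folds, for each lambda
cv_mse <- sapply(lambdas, function(l)
  mean(sapply(1:5, function(k) {
    b <- ridge_coef(X[folds != k, ], y[folds != k], l)
    mean((y[folds == k] - X[folds == k, ] %*% b)^2)
  })))

lambdas[which.min(cv_mse)]   # analogue of cv_fit$lambda.min
```

`lambda.1se`, used below, is the largest lambda whose CV error is within one standard error of this minimum; it trades a little fit for extra shrinkage.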

```
###############
#             #
# Exercise 4  #
#             #
###############
fit <- glmnet(x = x, y = y, alpha = 0, lambda = cv_fit$lambda.min)
fit$beta
```

## 10 x 1 sparse Matrix of class "dgCMatrix"
##              s0
## age   -1.776857
## sex -218.078518
## bmi  503.649515
## map  309.268175
## tc  -116.815832
## ldl  -51.664808
## hdl -181.472588
## tch  113.468602
## ltg  470.871230
## glu   80.969337

```
###############
#             #
# Exercise 5  #
#             #
###############
fit <- glmnet(x = x, y = y, alpha = 0, lambda = cv_fit$lambda.1se)
fit$beta
```

## 10 x 1 sparse Matrix of class "dgCMatrix"
##              s0
## age   22.463959
## sex -120.242431
## bmi  366.769888
## map  235.675894
## tc   -9.896795
## ldl  -52.093095
## hdl -170.482275
## tch  121.536669
## ltg  313.810759
## glu  112.152681

```
###############
#             #
# Exercise 6  #
#             #
###############
library(caret)
```

## Warning: package 'caret' was built under R version 3.3.3

```
intrain <- createDataPartition(y = diabetes$y, p = 0.8, list = FALSE)
training <- diabetes[intrain, ]
testing <- diabetes[-intrain, ]

###############
#             #
# Exercise 7  #
#             #
###############
cv_ridge <- cv.glmnet(x = training$x, y = training$y, alpha = 0, nlambda = 1000)
ridge_reg <- glmnet(x = training$x, y = training$y, alpha = 0,
                    lambda = cv_ridge$lambda.min)
ridge_reg$beta
```

## 10 x 1 sparse Matrix of class "dgCMatrix"
##             s0
## age   38.25965
## sex -209.67238
## bmi  529.69156
## map  341.55293
## tc  -102.08181
## ldl  -70.38056
## hdl -141.87799
## tch  102.70460
## ltg  489.04852
## glu   52.72637

```
ridge_reg <- glmnet(x = training$x, y = training$y, alpha = 0,
                    lambda = cv_ridge$lambda.1se)
ridge_reg$beta
```

## 10 x 1 sparse Matrix of class "dgCMatrix"
##              s0
## age   49.941787
## sex -127.389221
## bmi  399.264021
## map  272.565206
## tc   -5.586767
## ldl  -66.919061
## hdl -151.119495
## tch  104.071028
## ltg  339.155470
## glu   96.613412

```
###############
#             #
# Exercise 8  #
#             #
###############
ridge_reg <- glmnet(x = training$x, y = training$y, alpha = 0,
                    lambda = cv_ridge$lambda.min)
ridge_pred <- predict(ridge_reg, s = cv_ridge$lambda.min, newx = testing$x)
sd((ridge_pred - testing$y)^2) / sqrt(length(testing$y))
```

## [1] 415.6961

```
ridge_reg <- glmnet(x = training$x, y = training$y, alpha = 0,
                    lambda = cv_ridge$lambda.1se)
ridge_pred <- predict(ridge_reg, s = cv_ridge$lambda.1se, newx = testing$x)
sd((ridge_pred - testing$y)^2) / sqrt(length(testing$y))
```

## [1] 394.3651

```
# lower prediction error with the higher lambda (lambda.1se)

###############
#             #
# Exercise 9  #
#             #
###############
ols_reg <- lm(y ~ x, data = training)
summary(ols_reg)
```

## 
## Call:
## lm(formula = y ~ x, data = training)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -154.669  -41.299    1.594   38.940  151.834 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  153.601      2.922  52.575  < 2e-16 ***
## xage          33.104     68.732   0.482  0.63036    
## xsex        -228.818     69.589  -3.288  0.00111 ** 
## xbmi         546.497     77.994   7.007 1.29e-11 ***
## xmap         359.381     74.686   4.812 2.24e-06 ***
## xtc         -721.463    447.808  -1.611  0.10808    
## xldl         403.276    362.512   1.112  0.26672    
## xhdl         128.830    232.186   0.555  0.57935    
## xtch         177.948    180.552   0.986  0.32503    
## xltg         748.765    185.718   4.032 6.82e-05 ***
## xglu          34.245     77.094   0.444  0.65718    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.92 on 344 degrees of freedom
## Multiple R-squared:  0.5142, Adjusted R-squared:  0.5001 
## F-statistic: 36.42 on 10 and 344 DF,  p-value: < 2.2e-16

```
################
#              #
# Exercise 10  #
#              #
################
ols_pred <- predict(ols_reg, newdata = testing, type = "response")
sd((ols_pred - testing$y)^2) / sqrt(length(testing$y))
```

## [1] 419.758

```
# least squares prediction error is higher.
```

Eric says

Thank you for the excellent set of exercises!

Just a quick question. I'm curious about how you defined the prediction standard error. It was computed as

sd((ols_pred - testing$y)^2)/sqrt(length(testing$y))

Is this a standard way to compute the prediction error? It occurs to me that the standard deviation is defined as

sqrt(sum((ols_pred - testing$y)^2) / (length(testing$y) - 1))

Thus, if standard error is defined here as sd/sqrt(n), then shouldn’t the error function simply be

sd(ols_pred - testing$y)/sqrt(length(testing$y)),

without explicitly squaring the values inside the sd() function?

Bassalat Sajjad says

Hi Eric

Thank you for the feedback. The way I see it, we take the standard deviation of the squared errors, which is similar in spirit to the RMSE. This is divided by the square root of the number of observations to get the standard deviation of the sampling distribution of the mean squared error.

We can also consider the MAE (mean absolute error), which does not square the errors at all; that would be another reasonable metric for predictive modeling.
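For concreteness, here is a small base-R comparison of the three quantities being discussed, computed on a made-up error vector (hypothetical numbers, not the diabetes predictions): the RMSE, the MAE, and the sd-of-squared-errors statistic used in the solutions.

```
pred   <- c(10.2, 9.8, 11.5, 10.0, 9.1)   # hypothetical predictions
actual <- rep(10, 5)                      # hypothetical observed values
e <- pred - actual                        # prediction errors

rmse     <- sqrt(mean(e^2))               # root mean squared error
mae      <- mean(abs(e))                  # mean absolute error
se_of_sq <- sd(e^2) / sqrt(length(e))     # the statistic used in the solutions

round(c(rmse = rmse, mae = mae, se_of_sq = se_of_sq), 3)
```

The three numbers generally differ; the RMSE is never smaller than the MAE, and the sd-based statistic measures the variability of the squared errors rather than their typical size.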

Let me know if this helps.