In this exercise set, we will continue our journey with H2O's machine learning algorithms. We will also explore Gradient Boosted Machines (GBM) and classifiers such as Naive Bayes. In the next set, we will conclude the machine learning journey with H2O.
Answers to the exercises are available here. Please check the documentation before starting this exercise set.
We are working with the same data as before. Please load the energy efficiency data set and initialize an H2O cluster. Create the explanatory variables and the response variable as in the previous exercise set, then create a default gradient boosted machine (GBM) model with H2O.
Check the variable importance and plot the importance metric using H2O.
Evaluate the model's performance on the test data.
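The steps above can be sketched with the h2o Python API. The file name `energy_efficiency.csv`, the `X1`–`X8` predictor names, the `Y1` response, and the split seed are assumptions about the local setup; adjust them to match your frames from the previous exercise set.

```python
# Assumed column names from the energy efficiency data: X1..X8 predict Y1.
PREDICTORS = ["X%d" % i for i in range(1, 9)]
RESPONSE = "Y1"

def run_default_gbm(path="energy_efficiency.csv"):
    """Train a default GBM, inspect variable importance, and score the test frame.

    Requires a running H2O cluster (Java installed); the path is a placeholder.
    """
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    h2o.init()                                   # start or connect to a local cluster
    energy = h2o.import_file(path)
    train, test = energy.split_frame(ratios=[0.8], seed=1234)

    model = H2OGradientBoostingEstimator(seed=1234)  # all other parameters at defaults
    model.train(x=PREDICTORS, y=RESPONSE, training_frame=train)

    print(model.varimp(use_pandas=True))  # variable-importance table
    model.varimp_plot()                   # bar plot of scaled importances
    return model.model_performance(test_data=test)  # metrics on held-out data
```

Calling `run_default_gbm()` prints the importance table, draws the plot, and returns the test-set performance object.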
Create a grid of models, tuning the max_depth hyperparameter of the GBM. A max_depth of 10 is enough for most data sets, so create a grid over the values 1 to 10.
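A minimal grid-search sketch using H2OGridSearch; the grid id and seed are arbitrary choices, and the `x`, `y`, and `train` arguments are assumed to come from the earlier setup.

```python
# Depth values 1..10, as suggested above.
HYPER_PARAMS = {"max_depth": list(range(1, 11))}

def run_depth_grid(x, y, train):
    """Grid-search a GBM over max_depth (requires a running H2O cluster)."""
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    grid = H2OGridSearch(
        model=H2OGradientBoostingEstimator(seed=1234),
        hyper_params=HYPER_PARAMS,
        grid_id="depth_grid",           # arbitrary identifier
    )
    grid.train(x=x, y=y, training_frame=train)
    return grid                         # one model per max_depth value
```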
Play with the GBM parameters to build intuition about their effects.
Sort the grid by the error metric and compare the values with the baseline model.
Find the best model from the grid and check its performance on the test data.
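Sorting the grid and scoring its best model can be sketched as below; `"rmse"` is one reasonable choice of error metric for this regression problem, not the only one.

```python
def best_from_grid(grid, test, metric="rmse"):
    """Sort a trained H2O grid by an error metric and score the best model.

    Ascending sort puts the lowest-error model first; compare the printed
    table against your baseline (default) GBM.
    """
    sorted_grid = grid.get_grid(sort_by=metric, decreasing=False)
    print(sorted_grid)                 # models ordered best-first by the metric
    best = sorted_grid.models[0]       # lowest-error model
    return best.model_performance(test_data=test)
```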
Now we will look at how classification works in GBM with the help of UCI's bank marketing data set. Download the data set from
As always, create the train, validation, and test frames using 60% of the data for training, 20% for validation, and 20% for testing. Create a 5-fold Naive Bayes classifier.
Find the AUC score on both the validation data and the test data from the Naive Bayes model.
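The split and the cross-validated Naive Bayes model can be sketched as follows. The file path and the response column name `"y"` are assumptions about the bank marketing data as downloaded; check your local copy.

```python
# 60% train, 20% validation; H2O assigns the remaining 20% to the test split.
SPLIT_RATIOS = [0.6, 0.2]
NFOLDS = 5

def run_naive_bayes(path="bank_marketing.csv"):
    """5-fold Naive Bayes on the bank marketing data; prints validation/test AUC.

    Requires a running H2O cluster; path and response name are placeholders.
    """
    import h2o
    from h2o.estimators.naive_bayes import H2ONaiveBayesEstimator

    h2o.init()
    bank = h2o.import_file(path)
    train, valid, test = bank.split_frame(ratios=SPLIT_RATIOS, seed=1234)

    x = [c for c in bank.columns if c != "y"]   # "y" assumed as the response
    nb = H2ONaiveBayesEstimator(nfolds=NFOLDS, seed=1234)
    nb.train(x=x, y="y", training_frame=train, validation_frame=valid)

    print("validation AUC:", nb.model_performance(valid=True).auc())
    print("test AUC:", nb.model_performance(test_data=test).auc())
    return nb
```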
Create a default GBM model with the training data set. Find the variable importance via var_imp. Try adding more variables to the default GBM model and check that you have not left out any important ones.
Check the performance of the default GBM model using the AUC score on both the validation data and the test data.
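The classification counterpart of the earlier default GBM can be sketched as below; the frames and predictor list are assumed to come from the split above, and `"y"` is again the assumed response name.

```python
def run_default_gbm_classifier(train, valid, test, x, y="y"):
    """Default GBM classifier: variable importance plus validation/test AUC.

    Requires a running H2O cluster; compare the AUC scores against the
    Naive Bayes model from the previous exercise.
    """
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    gbm = H2OGradientBoostingEstimator(seed=1234)
    gbm.train(x=x, y=y, training_frame=train, validation_frame=valid)

    # Importance table: check that no important variable was left out of x.
    print(gbm.varimp(use_pandas=True))
    print("validation AUC:", gbm.model_performance(valid=True).auc())
    print("test AUC:", gbm.model_performance(test_data=test).auc())
    return gbm
```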