This is a continuation of the intermediate decision tree exercise.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Use the predict() command to make predictions on the Train data. Set the method to “class”, which returns classifications instead of probability scores. Store the predictions in pred_dec.
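As a rough sketch of what a class prediction looks like, the snippet below assumes the tree was fit with the rpart package, which the exercise does not state. The model `fit` and the kyphosis data (bundled with rpart) are hypothetical stand-ins for the exercise's own model and Train data.

```r
library(rpart)  # assumption: the exercise's tree comes from rpart

# Hypothetical stand-in model, fit on rpart's bundled kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "class" returns a factor with one predicted label per row
pred_class <- predict(fit, newdata = kyphosis, type = "class")
head(pred_class)
```

Note that in rpart, method = "class" belongs to the fitting call, while predict() selects its output through the type argument.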
Print out the confusion matrix.
What is the accuracy of the model? Use the confusion matrix to calculate it.
What is the misclassification error rate? Refer to the Basic_decision_tree exercise to get the formula.
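The three steps above can be sketched on a small made-up example. The vectors `actual` and `predicted` are invented for illustration and are not the exercise data; the error rate is taken as one minus the accuracy.

```r
# Made-up labels, for illustration only
actual    <- factor(c("yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no", "no", "yes", "yes", "no", "yes", "no", "no"),
                    levels = c("no", "yes"))

# Confusion matrix: rows are predictions, columns are true classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of cases on the diagonal (correct predictions)
accuracy <- sum(diag(cm)) / sum(cm)

# Misclassification error rate is the remaining share
error_rate <- 1 - accuracy

accuracy    # 0.8
error_rate  # 0.2
```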
Let's say we want a baseline model against which to measure the improvement from our predictions. We create a base model using this code.
Use the table() command to create a confusion matrix between base and Test$class.
What is the numerical difference in accuracy between dec and base, as computed from their confusion matrices?
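One common kind of baseline, shown here only as a sketch on made-up labels, predicts the majority class for every row. This is an assumption about what a base model might look like, not necessarily the one this exercise uses; `actual` and `acc_dec` are hypothetical values.

```r
# Made-up true labels, for illustration only
actual <- factor(c("no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"),
                 levels = c("no", "yes"))

# Majority-class baseline: predict the most frequent label everywhere
majority <- names(which.max(table(actual)))
base <- factor(rep(majority, length(actual)), levels = levels(actual))

# Confusion matrix and accuracy of the baseline
cm_base <- table(Predicted = base, Actual = actual)
acc_base <- sum(diag(cm_base)) / sum(cm_base)

# Hypothetical accuracy of the tree model, to show the comparison
acc_dec <- 0.8
acc_dec - acc_base
```

Keeping both factor levels in `base` makes table() return a full 2 x 2 matrix even though one class is never predicted, so diag() still picks out the correct cells.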
Remember the predict() command in question 1. We will use the same model and the same command, except that we will set the method to “regression”. This gives us probability estimates. Store them in pred_dec_reg.
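For comparison, here is a minimal sketch of probability output under the assumption that the tree is an rpart classification tree. In that package the probability option is type = "prob"; the exercise's own model may use a different setting. `fit` and the kyphosis data are hypothetical stand-ins.

```r
library(rpart)  # assumption: the exercise's tree comes from rpart

# Hypothetical stand-in model, fit on rpart's bundled kyphosis data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "prob" returns a matrix with one column of probabilities per class
pred_prob <- predict(fit, newdata = kyphosis, type = "prob")
head(pred_prob)

# Scores for a single class, the form an ROC analysis usually needs
pos_score <- pred_prob[, "present"]
```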
Load the ROCR package.
Use the prediction(), performance() and plot() commands to plot the ROC curve. Use the pred_dec_reg variable from Q7. You can also refer to the previous exercise to see the code.
Plot the same ROC curve again with plot(), but set colorize=TRUE.
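The ROCR calls named above fit together roughly as follows. The `scores` and `labels` vectors are made up for illustration; in the exercise they would be the predicted probabilities and the true classes. The call is guarded so the sketch still runs where ROCR is not installed.

```r
# Made-up probability scores and true 0/1 labels, for illustration only
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)

if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)

  # prediction() pairs each score with its true label
  pred <- prediction(scores, labels)

  # performance() computes true positive rate against false positive rate
  perf <- performance(pred, measure = "tpr", x.measure = "fpr")

  # colorize = TRUE colors the curve by the score cutoff at each point
  plot(perf, colorize = TRUE)

  # Area under the curve as a single summary number
  auc <- performance(pred, measure = "auc")@y.values[[1]]
  print(auc)
}
```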
Comment on your findings using the ROC curve and the accuracy. Is it a good model? Did you notice that the ROCR prediction() command only takes probability predictions as one of its arguments? Why is that so?
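One way to explore the last question is to build a few ROC points by hand. The sketch below uses made-up `scores` and `labels` and a hypothetical helper `roc_point()`. It suggests why continuous scores matter: each threshold yields a different pair of rates, whereas hard class labels would fix a single point.

```r
# Made-up probability scores and true 0/1 labels, for illustration only
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)

# Hypothetical helper: classify at one threshold, then compute both rates
roc_point <- function(threshold) {
  predicted <- as.integer(scores >= threshold)
  tpr <- sum(predicted == 1 & labels == 1) / sum(labels == 1)
  fpr <- sum(predicted == 1 & labels == 0) / sum(labels == 0)
  c(threshold = threshold, fpr = fpr, tpr = tpr)
}

# Sweep a handful of thresholds; each row is one point on the ROC curve
thresholds <- c(1, 0.75, 0.5, 0.25, 0)
roc <- t(sapply(thresholds, roc_point))
print(roc)
```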