In this exercise, we will go quickly through some basic sampling methods. Follow along with this series to use these methods later in our decision tree modelling exercise. We will sample using the packages caTools and caret. This is a beginner-level exercise. Please refer to the help section for createFolds() functions. You may also find it helpful to go over
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Load the iris data and also load the package “caTools”. If the package is not installed, use the install.packages command to install it.
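As a rough sketch of this setup step (one possible approach, not the official solution), the snippet below installs caTools only when it is missing and then loads it together with the iris data. The check with requireNamespace is an assumption about how you might want to guard the install; calling install.packages directly works just as well.

```r
# Install caTools only if it is not already available, then load it.
if (!requireNamespace("caTools", quietly = TRUE)) {
  install.packages("caTools")
}
library(caTools)

# iris ships with base R (package datasets); data() makes the load explicit.
data(iris)

# Quick look at the structure: 150 rows, 4 numeric columns and the Species factor.
str(iris)
```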
Set the seed to 100
Use the function sample.split with SplitRatio=0.7 to split the dataset into two folds using the species class. Store the result in a variable.
Use the subset function to subset the dataframe where the split is TRUE. Store this result in the variable called Train.
Store the other 30 percent of the sample in the variable Test. Use the same subset method.
Print out the number of rows in the Test and Train variables. You should see 70 percent of data in the Train and 30 percent in the Test.
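The split, subset, and row-count steps above could be sketched as follows. This is a minimal example rather than the reference answer; the name split_flag is hypothetical, chosen here only for illustration.

```r
library(caTools)
data(iris)

set.seed(100)

# sample.split returns a logical vector with one entry per row.
# Passing the Species column keeps the class proportions similar in both parts.
# `split_flag` is a hypothetical variable name.
split_flag <- sample.split(iris$Species, SplitRatio = 0.7)

# Rows flagged TRUE go to Train, the remaining rows go to Test.
Train <- subset(iris, split_flag == TRUE)
Test  <- subset(iris, split_flag == FALSE)

# Row counts of the two parts; together they should cover all 150 rows.
nrow(Train)
nrow(Test)
```

Because the split is made on the Species column, each species should contribute about 70 percent of its rows to Train, which you can check with table(Train$Species).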
Install and load the library “caret”
Set the seed to 500 and use createDataPartition to do the same 2-fold split as Q3, but with an 80:20 ratio.
Use the createDataPartition function to create 5 different samples of the training data.
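A possible sketch of both createDataPartition tasks is shown below. It assumes the Train and Test names from the earlier exercise are reused, and that the five samples are each drawn with p = 0.8; the exercise does not state a proportion for the samples, so that value and the names train_idx and resamples are illustrative only.

```r
library(caret)
data(iris)

set.seed(500)

# p = 0.8 places about 80 percent of each species in the training part.
# list = FALSE returns the row indices as a one-column matrix.
train_idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
Train <- iris[train_idx, ]
Test  <- iris[-train_idx, ]

# times = 5 draws five separate stratified samples of the training rows.
# p = 0.8 here is an assumption, not a value given in the exercise.
resamples <- createDataPartition(Train$Species, p = 0.8, times = 5)

# One element of row indices per sample.
length(resamples)
```

Each element of resamples holds row positions within Train, so Train[resamples[[1]], ] gives the first sample.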
We now know how to make 2 splits and how to make 5 different samples. But what about 5 equal splits? Use the createFolds() command to make 5 equal partitions of the iris dataset. Make sure that each partition has, as far as possible, an equal representation of the species class.
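One way to approach the five-fold partition is sketched below. The seed value is an assumption (the exercise does not give one for this step), and folds is a hypothetical name. Passing the Species factor lets createFolds stratify the folds by class.

```r
library(caret)
data(iris)

# The seed is an assumption; the exercise does not specify one for this step.
set.seed(500)

# k = 5 asks for five folds; each list element holds the row indices of one fold.
# `folds` is a hypothetical variable name.
folds <- createFolds(iris$Species, k = 5, list = TRUE)

# Tabulate the species inside each fold to check how balanced the folds are.
lapply(folds, function(idx) table(iris$Species[idx]))
```

Every row of iris should appear in exactly one fold, so the fold sizes add up to 150.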