In this exercise set, we will move quickly through some basic sampling methods. Follow along with this series to use these methods later in our decision tree modelling exercises. We will sample using the packages caTools and caret. This is a beginner-level exercise. Please refer to the help pages for the `set.seed()`, `sample.split()`, `createDataPartition()`, and `createFolds()` functions. You may also find it helpful to review the `subset()` function.

Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.

**Exercise 1**

Load the iris data and also load the package “caTools”. If the package is not installed, use the `install.packages()` command to install it.

**Exercise 2**

Set the seed to 100.
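
Why seed first: sampling functions draw from R's random number generator, so fixing the seed makes a split reproducible. A minimal base-R sketch of that idea follows; the seed value 42 and the vector being sampled are arbitrary choices for illustration, not part of the exercise.

```r
# Re-seeding before each draw reproduces the same "random" sample.
set.seed(42)                      # 42 is an arbitrary illustrative seed
first_draw <- sample(1:10, 3)

set.seed(42)                      # same seed again
second_draw <- sample(1:10, 3)

# Both draws contain the same values in the same order.
identical(first_draw, second_draw)
```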

**Exercise 3**

Use the function `sample.split()` with `SplitRatio = 0.7` to split the dataset into two folds based on the Species class. Store the result in the variable `split`.
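
To see what kind of object `sample.split()` hands back, here is a small sketch on made-up labels rather than on iris. The `labels` vector, the seed, and the 0.5 ratio are all assumptions chosen for illustration.

```r
library(caTools)

# Hypothetical toy labels: 10 of class "a" and 20 of class "b".
labels <- factor(rep(c("a", "b"), times = c(10, 20)))

set.seed(1)  # arbitrary seed, for reproducibility only
mask <- sample.split(labels, SplitRatio = 0.5)

# mask is a logical vector with one entry per label;
# cross-tabulating shows how each class was divided.
table(labels, mask)
```

Because the labels are passed in as the first argument, the function can split within each class rather than across the whole vector, which is why the class proportions carry over to both parts.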

**Exercise 4**

Use the `subset()` function to subset the data frame where `split` is TRUE. Store the result in a variable called `Train`.

**Exercise 5**

Store the other 30 percent of the sample in the variable `Test`. Use the same subset method.
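
The pattern behind both subsets is filtering a data frame with a logical vector and then with its complement. A base-R sketch on a hypothetical toy data frame (`df` and `keep` are invented here and are not the exercise data):

```r
# Hypothetical toy data frame and a hand-written logical mask.
df <- data.frame(id = 1:6, group = rep(c("x", "y"), each = 3))
keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

kept    <- subset(df, keep == TRUE)   # rows where the mask is TRUE
dropped <- subset(df, keep == FALSE)  # the complementary rows

nrow(kept)     # 4
nrow(dropped)  # 2
```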

**Exercise 6**

Print out the number of rows in the `Test` and `Train` variables. You should see 70 percent of the data in `Train` and 30 percent in `Test`.

**Exercise 7**

Install and load the package “caret”.

**Exercise 8**

Set the seed to 500 and use the `createDataPartition()` function to do the same two-way split as in Exercise 3, but with an 80:20 ratio and `list = FALSE`. Note that the argument name is lowercase, since R is case-sensitive.
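
Unlike `sample.split()`, which returns a logical vector, `createDataPartition()` returns row numbers. A hedged sketch on a made-up outcome vector (the vector `y`, the seed, and `p = 0.5` are illustrative assumptions, not the exercise values):

```r
library(caret)

# Hypothetical toy outcome: 20 observations in each of two classes.
y <- factor(rep(c("a", "b"), each = 20))

set.seed(1)  # arbitrary seed
idx <- createDataPartition(y, p = 0.5, list = FALSE)

# With list = FALSE, idx is a one-column matrix of row numbers.
dim(idx)
table(y[idx])  # class counts among the selected rows
```

The returned indices can be used directly for row selection, for example `some_data[idx, ]` for the selected part and `some_data[-idx, ]` for the rest, where `some_data` stands for whatever data frame `y` came from.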

**Exercise 9**

Use the `createDataPartition()` function to create 5 different samples of the training data.
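
Repeated sampling is controlled by the `times` argument of `createDataPartition()`. The sketch below uses a made-up outcome vector and `times = 3` purely for illustration; the exercise asks for a different number of samples on different data.

```r
library(caret)

# Hypothetical toy outcome: 20 observations in each of two classes.
y <- factor(rep(c("a", "b"), each = 20))

set.seed(1)  # arbitrary seed
parts <- createDataPartition(y, times = 3, p = 0.5)  # list = TRUE is the default

length(parts)          # one list element per resample
sapply(parts, length)  # number of rows drawn in each resample
```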

**Exercise 10**

We now know how to make 2 splits and how to make 5 different samples. But what about 5 equal splits? Use the `createFolds()` command to make 5 equal partitions of the iris dataset. Make sure that each partition has, as far as possible, an equal representation of the Species class.
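
The difference from the earlier resamples is that folds do not overlap: every row ends up in exactly one fold. A small sketch on made-up labels (the vector `y`, the seed, and `k = 3` are assumptions for illustration only):

```r
library(caret)

# Hypothetical toy outcome: 9 observations in each of three classes.
y <- factor(rep(c("a", "b", "c"), each = 9))

set.seed(1)  # arbitrary seed
folds <- createFolds(y, k = 3)  # list of held-out row indices, one per fold

# Every row number appears in exactly one fold.
all_rows <- sort(unlist(folds, use.names = FALSE))

# Class counts per fold, to check how evenly the classes are spread.
sapply(folds, function(i) table(y[i]))
```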

Reg says

Excellent exercises. I have submitted a vote but I hope this comment below will also help.

PLEASE explain more. Explain what has happened, e.g. this formed a model that will show “XYZ” features. Also tell us why you are doing this, e.g. it gives a dataframe that will hold ABC and that will be used to do ……..

Explain how you are attempting to solve the problems. More words. Tell us briefly if there are other possible solutions without showing all the steps, e.g. we are trying to find the mean of “DEF” – there are other functions but the one I choose is best because…….. Well done

Imtiaz says

Thanks for your comment. There are more exercises to come in this series and I will add those suggestions.