In the exercises below we cover the basics of factors. Before proceeding, first read chapter 4 of An Introduction to R, and the help pages for the `cut` and `table` functions.

Answers to the exercises are available here.

**Exercise 1**

If `x <- c(1, 2, 3, 3, 5, 3, 2, 4, NA)`, what are the levels of `factor(x)`?

a. 1, 2, 3, 4, 5

b. NA

c. 1, 2, 3, 4, 5, NA

**Exercise 2**

Let `x <- c(11, 22, 47, 47, 11, 47, 11)`. If the R expression `factor(x, levels=c(11, 22, 47), ordered=TRUE)` is executed, what will be the 4th element in the output?

a. 11

b. 22

c. 47

**Exercise 3**

If `z <- c("p", "a", "g", "t", "b")`, which of the following R expressions will replace the third element in `z` with "b"?

a. `factor(z[3]) <- "b"`

b. `levels(z[3]) <- "b"`

c. `z[3] <- "b"`

**Exercise 4**

If `z <- factor(c("p", "q", "p", "r", "q"))` and the levels of `z` are "p", "q", "r", write an R expression that will change the level "p" to "w" so that `z` is equal to: "w", "q", "w", "r", "q".

**Exercise 5**

If `s1 <- factor(sample(letters, size=5, replace=TRUE))` and `s2 <- factor(sample(letters, size=5, replace=TRUE))`, write an R expression that will concatenate `s1` and `s2` in a single factor with 10 elements.

**Exercise 6**

Consider the `iris` data set in R. Write an R expression that will ‘cut’ the `Sepal.Length` variable and create the following factor with five levels.

`(4.3, 5.02] (5.02, 5.74] (5.74, 6.46] (6.46, 7.18] (7.18, 7.9]`

` 32 41 42 24 11`

**Exercise 7**

Consider again the `iris` data set. Write an R expression that will generate a two-way frequency table with two rows and three columns. The rows should relate to `Sepal.Length` (less than 5: TRUE or FALSE) and the columns to `Species`, with the following output:

`      setosa versicolor virginica`

`FALSE     30         49        49`

`TRUE      20          1         1`

**Exercise 8**

Consider the factor `responses <- factor(c("Agree", "Agree", "Strongly Agree", "Disagree", "Agree"))`, with the following output:

[1] Agree Agree Strongly Agree Disagree Agree

Levels: Agree Disagree Strongly Agree

Later it was found that a new level, "Strongly Disagree", exists. Write an R expression that will include "Strongly Disagree" as a new level attribute of the factor and return the following output:

[1] Agree Agree Strongly Agree Disagree Agree

Levels: Strongly Agree Agree Disagree Strongly Disagree

**Exercise 9**

Let `x <- data.frame(q=c(2, 4, 6), p=c("a", "b", "c"), stringsAsFactors=TRUE)`. Write an R statement that will replace the levels a, b, c with the labels "fertiliser1", "fertiliser2", "fertiliser3".

**Exercise 10**

If `x <- factor(c("high", "low", "medium", "high", "high", "low", "medium"))`, write an R expression that will provide unique numeric values for the various levels of `x`, with the following output:

`  levels value`

`1   high     1`

`2    low     2`

`3 medium     3`

Kapil Sharma says

Thanks for the practice exercises.

Exercise 7: Could you please explain it in more detail? It’s hard for me to understand.

ILAN LIVNE says

x<-iris$Sepal.Length<5

table(x,iris$Species)

x setosa versicolor virginica

FALSE 30 49 49

TRUE 20 1 1
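Since a fuller explanation of Exercise 7 was requested, here is a minimal sketch that splits the answer above into two steps. The names `short_sepal` and `tab` are illustrative choices, not part of the exercise.

```r
# Step 1: compare every sepal length with 5. The result is a logical
# vector holding one TRUE or FALSE per row of iris.
short_sepal <- iris$Sepal.Length < 5

# Step 2: cross-tabulate the logical vector against Species.
# table() uses FALSE/TRUE as the two row categories and the three
# species as the column categories.
tab <- table(short_sepal, iris$Species)
print(tab)
```

The comparison is what produces the two rows: without `< 5`, `table()` would create one row per distinct sepal length instead.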

Ruse says

Regarding exercise 6:

Why do these two expressions return different answers?

> table(cut(iris$Sepal.Length, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9)))

> table(cut(iris$Sepal.Length, breaks = 5))

The first one returns 31 objects for (4.3,5.02], while the latter returns 32 objects for (4.3,5.02]. Upon inspecting the factor created by executing only the expression

> cut(iris$Sepal.Length, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9))

I see that the 14th element is `<NA>`. The same is not true when I use `breaks = 5`. Why is that so? Is this some kind of bug? I thought that an interval like (4.3,5.02] was supposed to mean “starting from but not including 4.3, up to and including 5.02”. So to me, it’s like the latter of the above expressions is really counting the number of elements in the interval [4.3, 5.02] (i.e. 32), while the first one is counting the number of elements in the interval (4.3,5.02] (i.e. 31, since the 4.3 value is left out). I’m puzzled. Can anyone help me out with this? Thanks 🙂
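A likely explanation, offered as a sketch rather than a definitive answer: according to the `cut` help page, when `breaks` is a single number the range of the data is divided into that many equal pieces and the outer limits are then moved away by 0.1% of the range, so the minimum and maximum both fall inside an interval. The labels are formatted to three significant digits by default (`dig.lab = 3`), which is why the first interval still prints as `(4.3,5.02]` even though its true lower limit is slightly below 4.3. With an explicit vector of breaks nothing is extended, the first interval is open on the left, and the single observation equal to 4.3 becomes `NA` unless `include.lowest = TRUE` is set. The variable names below are illustrative.

```r
x <- iris$Sepal.Length
explicit_breaks <- c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9)

# Single number: cut() widens the outer limits slightly, so min(x) is kept.
by_count <- cut(x, breaks = 5)

# Explicit vector: the first interval is (4.3, 5.02], so 4.3 itself is excluded.
by_vector <- cut(x, breaks = explicit_breaks)

# Explicit vector with the lowest break included: [4.3, 5.02].
closed <- cut(x, breaks = explicit_breaks, include.lowest = TRUE)

sum(is.na(by_count))     # number of values that fell outside all intervals
which(is.na(by_vector))  # position of the value dropped by the open interval
x[is.na(by_vector)]      # the dropped value itself

table(by_vector)
table(closed)
```

Under this reading the behaviour is not a bug: the two calls use slightly different lower limits, and only the printed label makes them look identical.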