In the exercises below we cover the basics of factors. Before proceeding, first read chapter 4 of An Introduction to R, and the help pages for the cut and table functions.
Answers to the exercises are available here.
Exercise 1
If x = c(1, 2, 3, 3, 5, 3, 2, 4, NA), what are the levels of factor(x)?
a. 1, 2, 3, 4, 5
b. NA
c. 1, 2, 3, 4, 5, NA
Exercise 2
Let x <- c(11, 22, 47, 47, 11, 47, 11). If the R expression factor(x, levels=c(11, 22, 47), ordered=TRUE) is executed, what will be the 4th element in the output?
a. 11
b. 22
c. 47
Exercise 3
If z <- c("p", "a", "g", "t", "b"), which of the following R expressions will replace the third element in z with "b"?
a. factor(z[3]) <- "b"
b. levels(z[3]) <- "b"
c. z[3] <- "b"
Exercise 4
If z <- factor(c("p", "q", "p", "r", "q")) and the levels of z are "p", "q", "r", write an R expression that will change the level "p" to "w" so that z is equal to: "w", "q", "w", "r", "q".
Exercise 5
If
s1 <- factor(sample(letters, size=5, replace=TRUE))
and
s2 <- factor(sample(letters, size=5, replace=TRUE))
write an R expression that will concatenate s1 and s2 into a single factor with 10 elements.
Exercise 6
Consider the iris data set in R. Write an R expression that will ‘cut’ the Sepal.Length variable and create the following factor with five levels.
 (4.3,5.02] (5.02,5.74] (5.74,6.46] (6.46,7.18]  (7.18,7.9]
         32          41          42          24          11
Exercise 7
Consider again the iris data set. Write an R expression that will generate a two-way frequency table with two rows and three columns. The rows should relate to Sepal.Length (less than 5: TRUE or FALSE) and the columns to Species, with the following output:
        setosa versicolor virginica
  FALSE     30         49        49
  TRUE      20          1         1
Exercise 8
Consider the factor responses <- factor(c("Agree", "Agree", "Strongly Agree", "Disagree", "Agree")), with the following output:
[1] Agree Agree Strongly Agree Disagree Agree
Levels: Agree Disagree Strongly Agree
Later it was found that a new level, "Strongly Disagree", exists. Write an R expression that adds "Strongly Disagree" as a new level of the factor, orders the levels as shown, and returns the following output:
[1] Agree Agree Strongly Agree Disagree Agree
Levels: Strongly Agree Agree Disagree Strongly Disagree
Exercise 9
Let x <- data.frame(q=c(2, 4, 6), p=c("a", "b", "c")). Write an R statement that will replace levels a, b, c with labels "fertiliser1", "fertiliser2", "fertiliser3".
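A note on setup rather than a solution: since R 4.0.0, data.frame() no longer converts character columns to factors by default, so on a recent R the column p may have no levels at all. The sketch below shows one way to check this and to build the data frame so that p is a factor. The names x_default and x_factor are illustrative only and are not part of the exercise.

```r
# Build the data frame as written in the exercise.
x_default <- data.frame(q = c(2, 4, 6), p = c("a", "b", "c"))

# On R >= 4.0.0 this is typically FALSE (p is a character vector);
# on older versions it is TRUE.
is.factor(x_default$p)

# Asking for factors explicitly gives the same result on any version.
x_factor <- data.frame(q = c(2, 4, 6), p = c("a", "b", "c"),
                       stringsAsFactors = TRUE)

is.factor(x_factor$p)   # p is now a factor
levels(x_factor$p)      # its levels are the distinct strings in p
```

With p stored as a factor, the exercise can be attempted as stated.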
Exercise 10
If x <- factor(c("high", "low", "medium", "high", "high", "low", "medium")), write an R expression that will provide unique numeric values for various levels of x with the following output:
  levels value
1   high     1
2    low     2
3 medium     3
Thanks for the practice exercises.
Exercise 7: could you please explain it further? I find it hard to understand.
Try this:
x <- iris$Sepal.Length < 5
table(x, iris$Species)
x       setosa versicolor virginica
  FALSE     30         49        49
  TRUE      20          1         1
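To spell out what the reply above does, here is a minimal step-by-step sketch. The variable names short_sepal and tab are illustrative, and the dnn argument is used only to label the two dimensions of the table.

```r
# Step 1: compare every sepal length with 5. The result is a logical
# vector with one TRUE/FALSE value per row of iris.
short_sepal <- iris$Sepal.Length < 5

# Step 2: cross-tabulate the logical vector against Species.
# table() counts how many rows fall into each (TRUE/FALSE, species) pair.
tab <- table(short_sepal, iris$Species,
             dnn = c("Sepal.Length < 5", "Species"))

print(tab)
```

The rows of the table come from the two possible values of the comparison, and the columns come from the three levels of the Species factor.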
Regarding exercise 6:
Why do these two expressions return different answers?
> table(cut(iris$Sepal.Length, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9)))
> table(cut(iris$Sepal.Length, breaks = 5))
The first one returns 31 objects for (4.3,5.02], while the latter returns 32 objects for (4.3,5.02]. Upon inspecting the factor created by executing only the expression
> cut(iris$Sepal.Length, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9))
I see that the 14th element is <NA>. The same is not true when I use breaks = 5. Why is that so? Is this some kind of bug?
I thought that an interval like (4.3,5.02] was supposed to mean "starting from but not including 4.3, up to and including 5.02". So to me, it looks like the latter of the above expressions is really counting the number of elements in the interval [4.3, 5.02] (i.e. 32), while the first one is counting the number of elements in the interval (4.3,5.02] (i.e. 31, since the 4.3 value is left out). I'm puzzled. Can anyone help me out with this? Thanks 🙂
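A possible explanation, going by the documented behaviour of cut: this is not a bug. When breaks is a single number, cut divides the range of the data into that many equal pieces and then moves the two outer limits outwards by 0.1% of the range, so the lowest break ends up slightly below 4.3 (about 4.2964) and the minimum value is included. The label still reads (4.3,5.02] only because labels are formatted to 3 significant digits by default. With explicit breaks starting at exactly 4.3, the interval is open on the left, so the value 4.3 becomes NA unless include.lowest = TRUE is set. The sketch below rebuilds those breaks by hand; the names auto_breaks, auto_cut, explicit and fixed are illustrative only.

```r
x <- iris$Sepal.Length

# Rebuild the breaks that cut(x, breaks = 5) is documented to use:
# equal-width breaks over range(x), with the outer limits moved
# outwards by 0.1% of the range.
rx <- range(x)
dx <- diff(rx)
auto_breaks <- seq(rx[1], rx[2], length.out = 6)
auto_breaks[c(1, 6)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
print(auto_breaks, digits = 6)   # the first break lies just below 4.3

# Single-number breaks: the minimum (row 14) falls inside the first bin.
auto_cut <- cut(x, breaks = 5)

# Explicit breaks: the first interval excludes 4.3, so row 14 is NA.
explicit <- cut(x, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9))

# Explicit breaks with include.lowest = TRUE: 4.3 is kept, and the
# first label becomes closed on the left.
fixed <- cut(x, breaks = c(4.3, 5.02, 5.74, 6.46, 7.18, 7.9),
             include.lowest = TRUE)

table(auto_cut)
table(explicit, useNA = "ifany")
table(fixed)
```

If the labels should show the true lower limit, raising dig.lab in the call to cut is one option.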