The factor data type in R can be painful to use, especially for beginners. Fortunately, as with most things in R, there are packages for working with factors. One of them is forcats, by RStudio. In this set, you will have an opportunity to practice with it.
Answers to the exercises are available here.
Please do all exercises using the forcats package. If you obtain a different (correct) answer from those listed on the solutions page, feel free to post it as a comment on that page.
Exercise 1
Load the forcats package. Besides its functions, the package contains the gss_cat data set, which we will use throughout this exercise set.
Calculate the number of occurrences of each level in the rincome column.
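A minimal sketch of one possible approach, which may differ from the answer on the solutions page: fct_count() returns a tibble with one row per level (column f) and its count (column n). The object name counts is only illustrative.

```r
library(forcats)

# gss_cat is bundled with forcats; rincome is a factor column
counts <- fct_count(gss_cat$rincome)
print(counts)

# sort = TRUE lists the most frequent levels first
print(fct_count(gss_cat$rincome, sort = TRUE))
```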
Exercise 2
Reorder the levels of rincome by the order in which each level first appears in the data set.
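fct_inorder() looks like the natural fit here: it reorders the levels by the position at which each one first occurs in the data, while leaving the values themselves unchanged. A short sketch (the object name is arbitrary):

```r
library(forcats)

# Levels are reordered by their first appearance in the column
rincome_inorder <- fct_inorder(gss_cat$rincome)
levels(rincome_inorder)
```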
Exercise 3
Reorder the levels of rincome by their frequency in the data set.
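For ordering by frequency, fct_infreq() sorts the levels from most to least common. A possible sketch, again with an illustrative object name; printing the table is just a quick way to check the result:

```r
library(forcats)

# Most frequent level first, least frequent last
rincome_infreq <- fct_infreq(gss_cat$rincome)
levels(rincome_infreq)

# table() reports counts in level order, so they should be non-increasing
table(rincome_infreq)
```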
Exercise 4
Lump all values of the rincome column, except the 10 most frequent ones, into a single category called “other answer”.
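One way to sketch this is with fct_lump_n(), which keeps the n most frequent levels and lumps the rest into other_level. This assumes forcats 0.5.0 or later; in older versions, fct_lump(f, n = 10, other_level = ...) plays the same role. If several levels are tied at the cut-off, more than 10 may be kept.

```r
library(forcats)

# Keep the 10 most frequent levels; everything else becomes "other answer"
rincome_top10 <- fct_lump_n(gss_cat$rincome, n = 10, other_level = "other answer")
fct_count(rincome_top10, sort = TRUE)
```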
Exercise 5
Change the levels “Not applicable”, “Refused”, “Don’t know”, and “No answer” to “Other”.
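fct_collapse() can merge several old levels into one new level. A possible sketch follows; note that the level names in gss_cat use a straight apostrophe ("Don't know"). fct_other(drop = ...) would be an alternative with a similar effect, though it places the new level last.

```r
library(forcats)

missing_like <- c("Not applicable", "Refused", "Don't know", "No answer")

# All four levels are merged into a single "Other" level
rincome_other <- fct_collapse(gss_cat$rincome, Other = missing_like)
levels(rincome_other)
```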
Exercise 6
Change the levels so that all intervals have the form “$x to $y”.
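In gss_cat, three of the income intervals are written as "$x - y" (for example "$20000 - 24999") while the others use "$x to y". fct_relabel() applies a function to every level name, so the renaming can be done with sub(). The sketch below reads "$x to $y" literally and also adds the second dollar sign; the helper to_dollar_range() is a hypothetical name, and levels such as "Lt $1000" and "$25000 or more" are left alone because they are not two-sided intervals.

```r
library(forcats)

# Hypothetical helper: rewrite "$x - y" and "$x to y" as "$x to $y"
to_dollar_range <- function(lvls) {
  lvls <- sub(" - ", " to ", lvls, fixed = TRUE)  # "$20000 - 24999" -> "$20000 to 24999"
  sub(" to ([0-9])", " to $\\1", lvls)            # "$20000 to 24999" -> "$20000 to $24999"
}

rincome_to <- fct_relabel(gss_cat$rincome, to_dollar_range)
levels(rincome_to)
```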
Exercise 7
Collapse the levels into the following groups:
* Missing: “No answer”, “Don’t know”, “Not applicable”, “Refused”
* Below 10k
* 10k – 25k
* Above 25k
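A sketch of the grouping with fct_collapse(), listing the original gss_cat level names explicitly. The group labels use plain ASCII hyphens, and "Above 25k" is mapped to "$25000 or more", so it includes exactly $25000; both are choices made for this example.

```r
library(forcats)

rincome_groups <- fct_collapse(
  gss_cat$rincome,
  Missing     = c("No answer", "Don't know", "Not applicable", "Refused"),
  `Below 10k` = c("Lt $1000", "$1000 to 2999", "$3000 to 3999", "$4000 to 4999",
                  "$5000 to 5999", "$6000 to 6999", "$7000 to 7999", "$8000 to 9999"),
  `10k - 25k` = c("$10000 - 14999", "$15000 - 19999", "$20000 - 24999"),
  `Above 25k` = "$25000 or more"
)

fct_count(rincome_groups)
```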
Exercise 8
Remove empty levels from the race column.
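fct_drop() removes levels that have no observations (base R's droplevels() does the same). A sketch that compares the levels before and after:

```r
library(forcats)

levels(gss_cat$race)                    # includes any unused levels
race_dropped <- fct_drop(gss_cat$race)
levels(race_dropped)                    # only levels that actually occur
```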
Exercise 9
Combine abc = as_factor(c('a', 'b', 'c')) and def = as_factor(c('d', 'e', 'f')) into a single factor variable with the correct levels.
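fct_c() concatenates factors and takes the union of their levels, so the combined factor should end up with levels a to f. A sketch using the two factors from the exercise; the result name abcdef is arbitrary:

```r
library(forcats)

abc <- as_factor(c("a", "b", "c"))
def <- as_factor(c("d", "e", "f"))

# Values are concatenated and the levels are the union of both sets
abcdef <- fct_c(abc, def)
levels(abcdef)
```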
Exercise 10
Reorder the rincome factor levels so that the "$25000 or more" category is at the top, i.e. the first level.
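fct_relevel() moves the named levels to the front by default and keeps the remaining levels in their original order. A sketch, assuming "at the top" means the first level; passing after = Inf would move the level to the end instead.

```r
library(forcats)

# "$25000 or more" becomes the first level; other levels keep their order
rincome_top <- fct_relevel(gss_cat$rincome, "$25000 or more")
levels(rincome_top)
```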