Look at almost any R code these days and dplyr seems to be there; indeed, data indicate that its popularity is growing relative to many other common R packages. Influential data scientists have recommended that beginners start “from scratch with the dplyr package for manipulating a data frame”, leaving standard R subsetting and loops for later. In any case, whether you like or dislike the package, dplyr seems to have become too big for any R programmer to ignore.
After studying the basics of dplyr and completing this exercise set, which covers some of its most basic verbs, you should be able to accomplish surprisingly complicated data manipulation tasks with ease.
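As a warm-up, here is a minimal sketch of a few basic dplyr verbs. It deliberately uses the built-in mtcars data rather than the exercise dataset, and the names cars, hp_per_wt and by_cyl are made up for illustration only.

```r
library(dplyr)

# mtcars ships with base R; it stands in here for any data frame.
cars <- as_tibble(mtcars)

cars %>%
  filter(cyl == 4) %>%             # keep rows that match a condition
  select(mpg, hp, wt) %>%          # keep only some columns
  mutate(hp_per_wt = hp / wt) %>%  # add a derived column
  slice(1:3)                       # pick rows by position

# Grouped summary: one row per number of cylinders.
by_cyl <- cars %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg))

by_cyl
```

The same pattern of chaining verbs with %>% carries over to the exercises below.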
For this exercise set we will, again, use data drawn from the 1980 US Census on married women aged 21–35 with two or more children. The data include the gender of the first and second child, as well as whether the woman had more than two children, her race, her age and the number of weeks she worked in 1979. For more information, please refer to the reference manual for the package AER.
Solutions are available here.
Exercise 1
Load the dplyr package. Install and load the AER package and run the command data("Fertility"), which loads the dataset Fertility into your workspace. Take a glimpse() at the data.
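The setup described in this exercise might look like the sketch below. The install.packages() line is commented out on the assumption that you only need it the first time.

```r
# install.packages(c("dplyr", "AER"))  # run once if the packages are missing
library(dplyr)
library(AER)

data("Fertility")   # loads the Fertility data frame into the workspace
glimpse(Fertility)  # compact overview: one line per column
```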
Exercise 2
Select rows 35 to 50 and print their age and work entries to the console.
Exercise 3
Select the last row in the dataset and print it to the console.
Exercise 4
Count how many women proceeded to have a third child.
Exercise 5
There are four possible gender combinations for the first two children. Which is the most common?
Exercise 6
For each racial group, what proportion of women worked four weeks or less in 1979?
Exercise 7
Use filter() to select the subset of women between the ages of 22 and 24, and calculate the proportion who had a boy as their firstborn.
Exercise 8
Add a new column, age squared, to the dataset.
Exercise 9
Of all the racial groups in the dataset, which had the lowest proportion of boys as firstborn? With the same command, also display the number of observations in each category.
Exercise 10
Calculate the proportion of women who had a third child, by gender combination of the first two children.
(Photo by Judy Schmidt)