dplyr is a great package for interactive data wrangling and exploration. One of the key aspects that makes it so great is that it uses non-standard evaluation, so a user does not have to repeat the data frame name and quote column names all the time. On the other hand, this feature makes programming with dplyr a non-trivial task. For example, passing the name of a grouping column, say grp_var = 'Species', to a function that uses dplyr's group_by() will not work as intended, because grp_var itself will be taken for a column name. Fortunately, the authors of dplyr provide the rlang package, which comes with solutions to such problems.
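To make the problem concrete, here is a minimal sketch. The function names count_naive() and count_by_string() are hypothetical and used only for illustration, and the fix shown (converting the string to a symbol with rlang's sym() and unquoting it with !!) is just one of several possible approaches. Depending on the dplyr version, the naive call either raises an error or groups by a constant, so it is wrapped in tryCatch().

```r
library(dplyr)
library(rlang)

# Naive version (hypothetical name): group_by() does not treat grp_var as a
# string holding a column name, so the data is not grouped by Species.
count_naive <- function(df, grp_var) {
  df %>% group_by(grp_var) %>% summarise(n = n())
}

# One possible fix (hypothetical name): turn the string into a symbol with
# sym() and unquote it with !! so that group_by() sees the column Species.
count_by_string <- function(df, grp_var) {
  df %>% group_by(!!sym(grp_var)) %>% summarise(n = n())
}

# Older dplyr versions may throw an error here, newer ones may silently
# group by a constant; either way the result is not one row per species.
naive <- tryCatch(count_naive(iris, "Species"), error = function(e) NULL)

# The unquoted symbol gives one row per species.
fixed <- count_by_string(iris, "Species")
print(fixed)
```

The exercises below explore this idea and related tools from rlang in more detail.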
The reader is assumed to know the basics of dplyr. If you want to practice dplyr first, there is a great series of exercises available: Data wrangling: Transforming. For a quick introduction to programming with dplyr, I recommend this vignette.
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
All answers should be written the dplyr way.
Exercise 1
Load the packages dplyr and rlang. Write a function my_fun_1(df) that takes a data frame and adds a column sum that is the sum of columns x and y. Make sure it will not mistake a variable for a column!
Test it with code:
df_1 <- data.frame(x = 1:5, y = 6:10)
my_fun_1(df_1)
df_2 <- data.frame(x = 1:5)
y <- 1
my_fun_1(df_2)
Exercise 2
Write a function my_fun_2(df) that takes a data frame and adds a column sum that is the sum of the column x and a value y passed to the function. Make sure it will not mistake a variable for a column!
Test it with code:
df_1 <- data.frame(x = 1:5, y = 6:10)
my_fun_2(df_1)
Exercise 3
Write a function my_fun_3(x) that quotes its input. Check the type and environment of the object returned.
Exercise 4
Get the value of the input from the object created in the previous exercise.
Exercise 5
Given obj <- my_fun_3(100), modify x to make the value under obj equal to 200.
Exercise 6
Write a function that groups a data frame by a column passed as an argument and counts the number of rows in each group. Test it with code: my_group_by_1(iris, Species).
Exercise 7
Modify the function from the previous exercise so that it takes a string as the grouping variable. Test it with code: my_group_by_2(iris, "Species").
Exercise 8
Write a function that groups a data frame by multiple columns passed as arguments and counts the number of rows in each group. Test it with code: my_group_by_3(iris, Species, sl_g_5 = Sepal.Length > 5).
Exercise 9
Modify the function from exercise 6 so that it additionally takes the names of columns to be aggregated by the mean() function. Test it with code: my_group_by_4(iris, Species, Sepal.Length, Sepal.Width).
Exercise 10
Write a function that takes a data frame and a column passed as an argument, and creates a column named **column_name**_sqrt.