dplyr is an R package for transforming and summarizing tabular data with rows and columns.
It includes a set of functions that filter rows, select specific columns, re-order rows, add new columns, and summarize data.
Moreover, dplyr contains a useful function for another common task, the “split-apply-combine” concept.
Compared to base functions in R, the functions in dplyr are easier to use, more consistent in their syntax, and designed to work on data frames rather than just vectors.
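To make the verbs above concrete, here is a minimal sketch on the built-in mtcars dataset (chosen here so the iris exercises stay unspoiled). The object names cars and by_cyl and the new columns mpg_per_wt, avg_mpg and n_cars are only illustrative, and the sketch assumes dplyr is already installed.

```r
library(dplyr)

# mtcars ships with R (datasets package), so nothing needs to be downloaded.
cars <- mtcars %>%
  filter(cyl >= 6) %>%              # keep rows with six or more cylinders
  select(mpg, cyl, wt) %>%          # keep three columns by name
  mutate(mpg_per_wt = mpg / wt) %>% # add a new column computed from two others
  arrange(desc(mpg))                # re-order rows, highest mpg first

# Split-apply-combine: split by cyl, apply summaries, combine into one table.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), n_cars = n())

print(head(cars))
print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the steps chain together with the pipe operator.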
Before proceeding, please follow our short tutorial.
Look at the examples given and try to understand the logic behind them. Then try to solve the exercises below using R, without looking at the answers. Finally, check your answers against the solutions.
Exercise 1
Select the first three columns of the iris dataset using their column names. HINT: Use select().
Exercise 2
Select all the columns of the iris dataset except “Petal.Width”. HINT: Use “-”.
Exercise 3
Select all columns of the iris dataset that start with the character string “P”.
Exercise 4
Filter the rows of the iris dataset for Sepal.Length >= 4.6 and Petal.Width >= 0.5.
Exercise 5
Pipe the iris data frame to the function that will select two columns (Sepal.Width and Sepal.Length). HINT: Use the pipe operator.
Exercise 6
Arrange rows by a particular column, such as Sepal.Width. HINT: Use arrange().
Exercise 7
Select three columns from iris, arrange the rows by Sepal.Length, then arrange the rows by Sepal.Width.
Exercise 8
Create a new column called proportion, which is the ratio of Sepal.Length to Sepal.Width. HINT: Use mutate().
Exercise 9
Compute the average Sepal.Length: apply the mean() function to the column Sepal.Length and call the summary value “avg_slength”. HINT: Use summarize().
Exercise 10
Split the iris data frame by Sepal.Length, then ask for the same summary statistics as above. HINT: Use group_by().
Hi,
Where can I get the datasets (here, the iris dataset) for these exercises?
Iris is loaded in R by default.