The dplyr package by Hadley Wickham is a very useful package that provides “A Grammar of Data Manipulation”. It aims to simplify common data manipulation tasks and provides “verbs”, i.e. functions that each correspond to one of those tasks. Have fun playing with dplyr in the exercises below!
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Install and load the dplyr package. Given the metadata:
Wt: weight of the subject (kg).
Dose: dose of theophylline administered orally to the subject (mg/kg).
Time: time since drug administration when the sample was drawn (hr).
conc: theophylline concentration in the sample (mg/L).
Copy and paste this code to get df.
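The setup code itself is not reproduced in this post. The metadata above matches the columns of the Theoph dataset that ships with R's datasets package, so the sketch below assumes df is a plain data-frame copy of Theoph. Swap in the original code if you have it.

```r
# Run once if dplyr is not installed yet:
# install.packages("dplyr")
library(dplyr)

# Assumption: df is the built-in Theoph data (datasets package).
# as.data.frame() drops Theoph's extra groupedData classes so that
# df is an ordinary data frame.
df <- as.data.frame(Theoph)

str(df)  # 132 rows: Subject, Wt, Dose, Time, conc
```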
Use the names() function to get the column names of df.
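A minimal sketch, again assuming df is a plain copy of the built-in Theoph data. names() is base R, so dplyr is not needed for this step.

```r
# Assumption: df is a plain copy of R's built-in Theoph data.
df <- as.data.frame(Theoph)

# names() returns the column names as a character vector.
col_names <- names(df)
col_names  # "Subject" "Wt" "Dose" "Time" "conc"
```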
Let’s practice using the select() function. This allows you to work with just column names instead of indices.
a) Select only the columns starting from Subject to Dose
b) Only select the Dose columns now.
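The two selection styles can be sketched as follows, assuming df is the Theoph data. The range syntax `Subject:Dose` takes every column between the two names in their current order; listing bare names picks columns individually.

```r
library(dplyr)

# Assumption: df is a plain copy of R's built-in Theoph data.
df <- as.data.frame(Theoph)

# a) Subject:Dose selects every column from Subject through Dose,
#    in the order they appear in df.
subject_to_dose <- select(df, Subject:Dose)

# Columns can also be picked one by one by bare name, separated by commas.
dose_only <- select(df, Dose)

names(subject_to_dose)  # "Subject" "Wt" "Dose"
```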
Let’s look at the samples with Dose greater than 5 mg/kg. Use the filter() command to return the rows of df with Dose > 5.
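A minimal sketch of the row filter, assuming df is the Theoph data. Note that filter() works on rows, while select() works on columns.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data

# filter() keeps only the rows for which the condition is TRUE.
high_dose <- filter(df, Dose > 5)
```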
Great. Now use the filter() command to return the rows of df with Dose > 5 and Time greater than the mean Time.
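One way to combine both conditions is sketched below, assuming df is the Theoph data. Inside filter(), comma-separated conditions are joined with AND, and because df is not grouped, mean(Time) is computed over all rows. The `%>%` pipe is re-exported by dplyr.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data

# Conditions separated by commas are combined with AND.
# mean(Time) is evaluated over all rows of the (ungrouped) df.
high_dose_late <- df %>%
  filter(Dose > 5, Time > mean(Time))
```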
Now let’s try sorting the data. Use the arrange() function to:
1) arrange df by weight (descending)
2) arrange df by weight (ascending)
3) arrange df by weight (ascending) and Time (descending)
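The three orderings could look like this, assuming df is the Theoph data and "weight" refers to the Wt column. arrange() sorts ascending by default, desc() flips a column, and each additional column only breaks ties left by the previous ones.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data

# 1) desc() reverses the sort order of a column
wt_desc <- arrange(df, desc(Wt))

# 2) arrange() sorts in ascending order by default
wt_asc <- arrange(df, Wt)

# 3) later columns break ties in earlier ones:
#    ascending Wt first, then descending Time within equal weights
wt_asc_time_desc <- arrange(df, Wt, desc(Time))
```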
The mutate() command allows you to create a new column using conditions and data derived from other columns. Use the mutate() command to create a new column called trend that equals Time - mean(Time). This tells you how far each time value is from the mean time. Set
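A minimal sketch of the new column, assuming df is the Theoph data. Because df is not grouped, mean(Time) is the mean over all samples, so the trend values average to (numerically) zero.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data

# trend = deviation of each sampling time from the overall mean Time.
# Positive values are later than average, negative values earlier.
df <- df %>%
  mutate(trend = Time - mean(Time))
```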
Given the metadata:
76.2 kg Super-middleweight
72.57 kg Middleweight
69.85 kg Light-middleweight
66.68 kg Welterweight
Use the mutate() function to classify the weights using the information above. For the purpose of this exercise, consider anything above 76.2 kg to be Super-middleweight, anything below 76.2 kg to be Middleweight, anything below 72.57 kg to be Light-middleweight, and anything below 66.68 kg to be Welterweight. Store the classifications in a column called weight_cat. Hint: use the ifelse() function inside mutate() to achieve this. Store the result back into df.
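One possible way to write the nested ifelse() from the hint is sketched below, still assuming df is the Theoph data. It checks the cut-offs from the heaviest class down, so each weight lands in the first class whose lower cut-off it exceeds. It uses the 66.68 kg welterweight figure from the table above; putting a weight that sits exactly on a cut-off into the lighter class is an assumption, since the exercise does not say.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data

# Nested ifelse(): the first TRUE condition decides the category.
df <- df %>%
  mutate(weight_cat = ifelse(Wt > 76.2, "Super-middleweight",
                      ifelse(Wt > 72.57, "Middleweight",
                      ifelse(Wt > 66.68, "Light-middleweight",
                             "Welterweight"))))

table(df$weight_cat)
```

dplyr's case_when() can express the same rules without the nesting, if you prefer a flatter layout.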
Use the group_by() command to group df by weight_cat. This allows us to use aggregate functions, similar to GROUP BY in SQL. Store the result in a data frame called weight_group.
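A sketch of the grouping step, assuming df is the Theoph data with weight_cat built by the nested ifelse() described in the previous exercise (repeated here so the block runs on its own). group_by() leaves the rows unchanged; it only records the grouping for later verbs.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data
df <- df %>%
  mutate(weight_cat = ifelse(Wt > 76.2, "Super-middleweight",
                      ifelse(Wt > 72.57, "Middleweight",
                      ifelse(Wt > 66.68, "Light-middleweight",
                             "Welterweight"))))

# group_by() does not change the rows; it records the grouping so that
# later verbs such as summarize() work per group.
weight_group <- group_by(df, weight_cat)

group_vars(weight_group)  # "weight_cat"
n_groups(weight_group)    # number of distinct weight categories
```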
Use the summarize() command on the weight_group created in Question 9 to find the mean Time and the sum of Dose received by each weight category.
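The summary could be sketched as below, under the same assumptions as before (df is the Theoph data, weight_cat from the nested ifelse()). The output column names mean_time and total_dose are illustrative choices, not prescribed by the exercise.

```r
library(dplyr)
df <- as.data.frame(Theoph)  # assumption: df is the Theoph data
weight_group <- df %>%
  mutate(weight_cat = ifelse(Wt > 76.2, "Super-middleweight",
                      ifelse(Wt > 72.57, "Middleweight",
                      ifelse(Wt > 66.68, "Light-middleweight",
                             "Welterweight")))) %>%
  group_by(weight_cat)

# One output row per weight category.
weight_summary <- weight_group %>%
  summarize(mean_time = mean(Time),
            total_dose = sum(Dose))

weight_summary
```

In Theoph each subject's Dose is repeated on every one of its sample rows, so sum(Dose) counts each subject's dose once per sample. If you want one dose per subject, reduce to one row per subject first (for example with distinct(Subject, .keep_all = TRUE)) before summarizing.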