Although efficiency (execution speed) is not always the primary aim when you are writing R code, and thinking about it can be an unnecessary distraction, it can be crucial at times. Waiting 10 minutes instead of 10 hours for your simulation to finish, or shaving a few milliseconds off a function that is called thousands of times a day in a Shiny application, matters. But, most importantly, it provides an excuse to read about and play with R, resulting in better intuition as a programmer.
In this exercise set, we will practice benchmarking and think a bit about some low-hanging fruit that we can take advantage of. Before starting, make sure you are using the newest version of R.
Solutions are available here.
Install (if necessary) and load the microbenchmark package and use it to compare your own function to the one below. The task is to draw n standard normal random numbers. Don’t stop until you have vastly improved the execution speed.
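The reference function is not reproduced here, so the following is only a sketch of how such a comparison could be set up. It assumes a hypothetical slow baseline, slow_rnorm(), that grows a vector one draw at a time, and compares it with a single vectorised call to rnorm(). Both function names are illustrative and not from the original post.

```r
# Hypothetical slow baseline (an assumption, not the post's function):
# grows the result by one element per iteration.
slow_rnorm <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) {
    x <- c(x, rnorm(1))
  }
  x
}

# Vectorised candidate: one call draws all n numbers.
fast_rnorm <- function(n) rnorm(n)

n <- 1000
stopifnot(length(slow_rnorm(n)) == n, length(fast_rnorm(n)) == n)

# Run the benchmark only if microbenchmark is installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    slow = slow_rnorm(n),
    fast = fast_rnorm(n),
    times = 100L
  ))
}
```

The printed summary reports the distribution of timings per expression, which is usually more informative than a single run.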
Write a function that simulates throwing three dice that is faster than the following:
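Since the reference function is missing from this copy, here is one possible shape for the comparison. The baseline throw_slow() is a hypothetical stand-in that draws each die separately; throw_fast() draws all three in one call. Neither name comes from the original post.

```r
# Hypothetical baseline (an assumption): one sample() call per die.
throw_slow <- function() {
  c(sample(1:6, 1), sample(1:6, 1), sample(1:6, 1))
}

# Candidate: draw all three dice at once. sample() is an R-level
# wrapper around sample.int(), so calling sample.int() directly
# avoids a little overhead.
throw_fast <- function() sample.int(6L, size = 3L, replace = TRUE)

x <- throw_fast()
stopifnot(length(x) == 3L, all(x %in% 1:6))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    slow = throw_slow(),
    fast = throw_fast(),
    times = 1000L
  ))
}
```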
Write a function that uses a well-known mathematical result to add the first n integers in a faster way than sum(1:n). Compare the performance at n=2e4.
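One candidate result is the closed form n(n + 1)/2 for the sum of the first n integers. A minimal sketch, with sum_gauss() as an illustrative name:

```r
# Closed form for 1 + 2 + ... + n; constant time regardless of n.
sum_gauss <- function(n) n * (n + 1) / 2

n <- 2e4

# Both approaches must agree before timing them.
stopifnot(sum_gauss(n) == sum(1:n))  # both give 200010000

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    loop_sum = sum(1:n),
    closed   = sum_gauss(n),
    times = 1000L
  ))
}
```

Note that sum_gauss() returns a double while sum(1:n) returns an integer, which matters if the result is later compared with identical().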
Examine the help page for the sort() function and compare a few parameter settings. Find a benchmark for the task of finding the five lowest values of Sepal.Length in the inbuilt iris data set.
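As a starting point, the sketch below compares a full sort with a partial sort, using the partial argument of sort(). It assumes the data in question is the built-in iris data set, which the later exercises use.

```r
x <- iris$Sepal.Length

# Full sort, then keep the first five values.
five_full <- sort(x)[1:5]

# Partial sort: only positions 1 to 5 are guaranteed to hold the
# values they would hold in a fully sorted vector.
five_partial <- sort(x, partial = 1:5)[1:5]

stopifnot(identical(five_full, five_partial))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    full    = sort(x)[1:5],
    partial = sort(x, partial = 1:5)[1:5],
    times = 1000L
  ))
}
```

On a vector of only 150 values the difference may be small; the method argument of sort() is another setting worth timing.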
So, which row contains the lowest measurement in Sepal.Length? It is logical to check with which(iris$Sepal.Length == min(iris$Sepal.Length)). But what would be a faster and more concise way?
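One candidate is which.min(), which finds the position of the minimum in a single call. A short sketch for comparing the two:

```r
x <- iris$Sepal.Length

# Explicit version: computes the minimum, compares, then finds TRUE positions.
row_explicit <- which(x == min(x))

# Concise version: index of the (first) minimum in one call.
row_concise <- which.min(x)

# which.min() returns only the first position in case of ties,
# so compare against the first element of the explicit result.
stopifnot(row_concise == row_explicit[1])

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    explicit = which(x == min(x)),
    concise  = which.min(x),
    times = 1000L
  ))
}
```

If ties matter for your data, the explicit version is still the one that returns every matching row.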
Sepal.Length: it is stored in centimeters with one decimal place. Create a new column “sepal length in decimeters” and store it as a different atomic class. Calculate the mean of both columns and benchmark the time it takes. Check the space the vector occupies with
A very common question when one is faced with a new table of data is whether any observations are missing in any of the columns, and one often queries R with something like sapply(iris, function(x) sum(is.na(x)) > 0). How can we shorten this and make it more efficient?
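One option worth trying is anyNA(), which can stop at the first missing value instead of building a full logical vector and summing it. A minimal sketch:

```r
# Original query: builds is.na() for every column, then sums it.
na_sum <- sapply(iris, function(x) sum(is.na(x)) > 0)

# Shorter candidate: anyNA() per column; vapply() fixes the return type.
na_any <- vapply(iris, anyNA, logical(1))

stopifnot(identical(na_sum, na_any))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    sum_is_na = sapply(iris, function(x) sum(is.na(x)) > 0),
    any_na    = vapply(iris, anyNA, logical(1)),
    times = 1000L
  ))
}
```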
A common way to create an integer sequence from 1 to n is seq(to = n). Find two faster ways to accomplish the same task.
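Two candidates to time against seq(to = n) are seq_len(n) and the colon operator. The sketch below checks that all three produce the same values before benchmarking; the value of n is arbitrary.

```r
n <- 1e5

a <- seq(to = n)
b <- seq_len(n)
d <- 1:n

stopifnot(all(a == b), all(b == d))

# Caveat: for n = 0, 1:n counts down and yields c(1, 0),
# whereas seq_len(0) yields an empty vector.
stopifnot(length(seq_len(0)) == 0L, length(1:0) == 2L)

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    seq     = seq(to = n),
    seq_len = seq_len(n),
    colon   = 1:n,
    times = 1000L
  ))
}
```

The zero-length behaviour is the usual reason seq_len() is preferred inside functions and loops.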
Save the iris data-set to your hard drive as a
RDS file. Now, load them in again to your current environment and compare the loading times. Also, compare the files on your hard drive with
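The first file format is missing from this copy of the exercise. The sketch below assumes, purely for illustration, that the comparison is between a CSV file and an RDS file, and it writes to temporary files rather than a fixed location. It uses file.size() to compare the files on disk, which is one option among several.

```r
# Assumption: CSV is the second format being compared (not confirmed by the text).
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(iris, csv_path, row.names = FALSE)
saveRDS(iris, rds_path)

# The RDS round trip restores the object exactly, including factor levels.
stopifnot(identical(readRDS(rds_path), iris))

# File sizes in bytes.
print(file.size(c(csv = csv_path, rds = rds_path)))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    csv = read.csv(csv_path),
    rds = readRDS(rds_path),
    times = 100L
  ))
}
```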
Imagine that, for some reason, you need to check whether two numbers are both positive (this kind of condition checking is very common when writing functions). Let’s imagine they are just random numbers:
rnorm(1) > 0 & rnorm(1) > 0
How can you improve the efficiency of this code?
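One thing to try is the short-circuiting operator &&, which does not evaluate its right-hand side when the left-hand side is already FALSE. A small sketch, with illustrative function names:

```r
# Vectorised operator: always evaluates both sides.
both_positive_vec <- function() rnorm(1) > 0 & rnorm(1) > 0

# Short-circuit operator: skips the second draw when the first test fails.
both_positive_sc <- function() rnorm(1) > 0 && rnorm(1) > 0

# Demonstration of short-circuiting: the stop() call is never reached.
stopifnot(identical(FALSE && stop("never evaluated"), FALSE))

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    vectorised    = both_positive_vec(),
    short_circuit = both_positive_sc(),
    times = 10000L
  ))
}
```

Keep in mind that && is meant for single logical values, so it suits condition checks like this one but not element-wise comparisons of vectors.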
(Photo by John ‘K’)