Although efficiency (execution speed) is not always the primary aim when you are writing R code, and thinking about it can be an unnecessary distraction, it can be crucial at times. Waiting 10 minutes instead of 10 hours for your simulation to finish, or shaving a few milliseconds off a function that is called thousands of times a day in a Shiny application, matters. Most importantly, though, it provides an excuse to read about and play with `R`, resulting in a better intuition as a programmer.

In this exercise set, we will practice benchmarking and think a bit about some low-hanging fruit that we can take advantage of. Before starting, make sure you are using the newest version of R.
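As a warm-up, here is a minimal sketch of what a benchmark with `microbenchmark` can look like. The two expressions being compared (`sqrt(x)` and `x^0.5`) are placeholders chosen purely for illustration and are not part of the exercises; the sketch also assumes the package is already installed.

```r
# Illustrative only: compare two equivalent ways of taking square roots.
# Assumes the microbenchmark package is installed.
library(microbenchmark)

x <- runif(1e4)  # arbitrary test input

res <- microbenchmark(
  sqrt  = sqrt(x),   # built-in square root
  power = x^0.5,     # same result via exponentiation
  times = 100L       # number of repetitions per expression
)

print(res)  # timing summary for each expression
```

The same pattern works for any pair of expressions or functions: name each candidate, keep the input fixed, and let the package repeat the timing often enough to smooth out noise.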

Solutions are available here.

**Exercise 1**

Install (if necessary) and load the `microbenchmark` package and use it to compare your own function to the one below. The task is to draw `n` standard normal random numbers. Don’t stop until you have vastly improved the execution speed.

**Exercise 2**

Write a function that simulates throwing three dice that is faster than the following:

**Exercise 3**

Write a function that uses a well-known mathematical result to add the first `n` integers in a faster way than `sum(1:n)`. Compare the performance at `n = 2e4`.

**Exercise 4**

Examine the help page for the `sort()` function and compare a few parameter settings. Then benchmark the task of finding the five lowest values of `Sepal.Length` in the built-in `iris` dataset.

**Exercise 5**

So, which row contains the lowest measurement in `Sepal.Length`? It is logical to check with `which(iris$Sepal.Length == min(iris$Sepal.Length))`. But what would be a faster and more concise way?

**Exercise 6**

Back to `Sepal.Length`: it is stored in centimeters with one decimal place. Create a new column “sepal length in decimeters” and store it as a different *atomic* class. Calculate the mean of both columns and benchmark the time it takes. Check the space each vector occupies with `object.size()`.
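To illustrate what “a different atomic class” can mean for memory use, here is a small sketch on a made-up vector (deliberately not the `iris` column). The names `x_int` and `x_dbl` are hypothetical and only serve the example.

```r
# Illustrative vector, not the iris column from the exercise.
x_int <- sample.int(100L, size = 1000L, replace = TRUE)  # integer vector
x_dbl <- as.numeric(x_int)                               # same values stored as double

typeof(x_int)  # atomic type of the first vector
typeof(x_dbl)  # atomic type of the second vector

# object.size() estimates the memory each object occupies
size_int <- object.size(x_int)
size_dbl <- object.size(x_dbl)
print(size_int)
print(size_dbl)
```

Both vectors hold the same values, so any difference in size comes from the storage type alone.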

**Exercise 7**

A very common question when one is faced with a new table of data is whether any observations are missing in any of the columns, and one typically queries `R` with something like `sapply(iris, function(x) sum(is.na(x)) > 0)`. How can we shorten this and make it more efficient?

**Exercise 8**

A common way to create an integer sequence from 1 to n is `seq(to = n)`. Find two faster ways to accomplish the same task.

**Exercise 9**

Save the `iris` dataset to your hard drive as a `csv` and an `RDS` file. Now, load them back into your current environment and compare the loading times. Also, compare the sizes of the files on your hard drive with `file.size()`.
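If the file-handling functions are unfamiliar, the mechanics can be sketched as follows. This is only an outline under stated assumptions: it uses the built-in `mtcars` data rather than `iris`, writes to temporary files, and leaves the actual timing comparison to you. The names `csv_path` and `rds_path` are hypothetical.

```r
# Illustrative round trip with the built-in mtcars data (not iris).
csv_path <- tempfile(fileext = ".csv")  # temporary file locations
rds_path <- tempfile(fileext = ".rds")

write.csv(mtcars, csv_path, row.names = TRUE)  # plain-text format
saveRDS(mtcars, rds_path)                      # R's serialized format

from_csv <- read.csv(csv_path, row.names = 1)  # first column holds the row names
from_rds <- readRDS(rds_path)

file.size(csv_path)  # size on disk in bytes
file.size(rds_path)
```

Wrapping the two read calls in a benchmark is then a small step from here.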

**Exercise 10**

Imagine that, for some reason, you need to check whether two numbers are both positive (some kind of condition checking is very common in function writing). Let’s imagine they are just random numbers: `rnorm(1) > 0 & rnorm(1) > 0`. How can you improve the efficiency of this code?

(Photo by John ‘K’)
