`rivers`, `women` and `nhtemp`), or created them by stringing together several numbers with the `c` function (e.g. `c(1, 2, 3, 4)`). R offers an extremely useful shortcut to create vectors of the latter kind, which is the colon `:` operator. Instead of having to type:
`x <- c(1, 2, 3, 4)`

we can simply type

`x <- 1:4`

to create exactly the same vector. Obviously this is especially useful for longer sequences.
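To get a feel for how much typing this saves, here is a small sketch. The names `short_run` and `long_run` are just made up for this illustration.

```r
# The colon operator builds a run of consecutive whole numbers.
short_run <- 3:7      # 3 4 5 6 7
long_run  <- 1:100    # one hundred elements, no typing marathon needed

length(long_run)                      # 100
all(short_run == c(3, 4, 5, 6, 7))    # TRUE: same values as the c() version
```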

In fact, you will use sequences like this a lot in real-world applications of R, e.g. to select subsets of data points, records, or variables. The exercises in this set might come across as a little abstract, but trust me, these sequences are really the basic building blocks for your future R scripts. So let’s go ahead!

Before starting the exercises, please note this is the fourth set in a series of five: In the first three sets, we practised creating vectors, vector arithmetics, and various functions. You can find all sets in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

One more thing: I would really appreciate your feedback on these exercises: Which ones did you like? Which ones were too easy or too difficult? Please let me know what you think here!

Try to shorten the notation of the following vectors as much as possible, using `:` notation:

`x <- c(157, 158, 159, 160, 161, 162, 163, 164)`

`x <- c(15, 16, 17, 18, 20, 21, 22, 23, 24)`

`x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)`

`x <- c(-1071, -1072, -1073, -1074, -1075, -1074, -1073, -1072, -1071)`

`x <- c(1.5, 2.5, 3.5, 4.5, 5.5)`

(Solution)

The `:` operator can be used in more complex operations along with arithmetic operators and variable names. Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

`(10:20) * 2`

`105:(30 * 3)`

`10:20*2`

`1 + 1:10/10`

`2^(0:5)`

(Solution)

Use the `:` operator and arithmetic operators/functions from the previous chapter to create the following vectors:

`x <- c(5, 10, 15, 20, 25, 30)`

`x <- c(0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3)`

`x <- c(1/5, 2/6, 3/7, 4/8, 5/9, 6/10, 7/11, 8/12)`

`x <- c(1, 4, 3, 8, 5, 12, 7, 16, 9, 20)`

(Hint: you have to use the recycle rule)

(Solution)

Another way to generate a sequence is the `seq` function. Its first two arguments are `from` and `to`, followed by a third, which is `by`. `seq(from=5, to=30, by=5)` replicates part (a) of the previous exercise.
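As a quick sketch of how `from`, `to`, and `by` work together, consider the two calls below. The numbers and the names `steps_of_four` and `quarter_steps` are arbitrary choices for this illustration.

```r
# Start at 2, stop at 14, move in steps of 4.
steps_of_four <- seq(from = 2, to = 14, by = 4)    # 2 6 10 14

# The step does not have to be a whole number.
quarter_steps <- seq(from = 1, to = 2, by = 0.25)  # 1.00 1.25 1.50 1.75 2.00

length(quarter_steps)                              # 5
```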

Note that you can omit the argument names `from`, `to`, and `by`, if you stick to their positions, i.e., `seq(5, 30, 5)`. Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

`seq(from=20, to=80, by=20)`

`seq(from=-10, to=5, by=0.5)`

`seq(from=10, to=-3, by=-2)`

`seq(from=0.01, to=0.09, by=0.02)`

(Solution)

Compare the regular sequences of exercises 2(a) and 3(a) (both using the `:` operator) with the same sequences using the `seq` function with an appropriate `by` argument. Can you think of a more general rule for converting any `seq(from, to, by)` statement to a sequence generated with the `:` operator?

In other words, rewrite `seq(from=x, to=y, by=z)` to a statement using the `:` operator. Hint: if this appears difficult, try to do this first by choosing some values for `x`, `y`, and `z`, and see which pattern emerges.

(Solution)

The previous exercises in this set were aimed at generating sets of *increasing* or *decreasing* numbers. However, sometimes you just want a set of *equal* numbers. You can accomplish this with the `rep` function (from “replicate”). Its first argument is the number or vector that will be replicated, and its second argument `times`, … well I guess you can guess that one already. Now, let’s shorten the following statements, using `rep`:

`x <- c(5, 5, 5, 5, 5, 5, 5)`

`x <- c(5, 6, 7)`

`y <- c(x, x, x, x, x)`

`x <- c(10, 16, 71, 10, 16, 71, 10, 16, 71)`

(Solution)

`rep` has a third very useful argument: `each`. As we saw in the previous exercise (part b), vectors are replicated in their entirety by `rep`.
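As a reminder of what “replicated in their entirety” looks like, here is a minimal sketch. The vector `pair` and the other names are invented for this example.

```r
pair <- c(4, 9)                       # made-up input vector

whole_copies <- rep(pair, times = 3)  # 4 9 4 9 4 9
five_zeros   <- rep(0, times = 5)     # 0 0 0 0 0

length(whole_copies)                  # 6: two elements, copied three times
```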

However, you can also replicate “each” individual element. Consider for example: `rep(c(1, 2, 3), times=2, each=3)`.

This says: “replicate each element of the input vector `c(1, 2, 3)` 3 times, and then replicate the resulting vector 2 times.” Now, let’s shorten the following statements, using `rep`:

`x <- c(5, 5, 5, 5, 8, 8, 8, 8, -3, -3, -3, -3, 0.34, 0.34, 0.34, 0.34)`

`x <- c(-0.1, -0.1, -0.9, -0.9, -0.6, -0.6)`

`x <- c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)`

(Solution)

We can actually write part c of the previous exercise even more compactly by using `rep` in combination with the `:` operator. Do you see how?

In this exercise we’re using combinations of `rep`, `:` and `seq` to create the following sequences:

`x <- c(97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102)`

`x <- c(-5, -5, -5, -5, -6, -6, -6, -6, -7, -7, -7, -7, -8, -8, -8, -8)`

`x <- c(13, 13, 17, 17, 21, 21, 25, 25, 29, 29, 13, 13, 17, 17, 21, 21, 25, 25, 29, 29)`

`x <- c(1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0)`

(Solution)

Suppose there were no `each` argument for `rep`. Rewrite the following statement without using the `each` argument: `x <- rep(c(27, 31, 19, 14), each=v, times=w)`

(Solution)

Let’s finish this set off with an application. Let’s create a series of vectors for later use in a timeseries dataset. The idea is that each observation in this dataset can be identified by a *timestamp*, which is defined by four vectors:

- s (for seconds)
- m (minutes)
- h (hours)
- d (days)

For this exercise, we’ll limit the series to a full week of 7 days.

This is a somewhat more complicated problem than the previous ones in this exercise. Don’t worry however! Whenever you’re faced with a somewhat more complicated problem than you are used to, the best strategy is to break it down into smaller problems. So, we’ll simply start with the `s` vector.

- Since `s` counts the number of seconds, we know it has to start at 1, run to 60, restart at 1, etc. As it should cover a full week, we also know we have to replicate this series many times. Can you calculate exactly how many times this series has to be replicated? Use the outcome of your calculation to create the full `s` vector.
- Now, let’s create the vector `m`. Think about how this vector differs from `s`. What does this mean for the `times` and `each` arguments?
- Now, let’s create vectors `h` and `d` using the same logic. Check that `s`, `m`, `h`, and `d` have equal length.

(Solution)


So far, the functions we have practised (`log`, `sqrt`, `exp`, `sin`, `cos`, and `acos`) always return a vector with the same length as the input vector. In other words, the function is applied element by element to the elements of the input vector. Not all functions behave this way though. For example, the function `min(x)` returns a single value (the minimum of all values in `x`), regardless of whether `x` has length 1, 100 or 100,000.
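A tiny sketch of the difference, using a made-up vector `x`:

```r
x <- c(1, 4, 9, 16)       # arbitrary example values

roots    <- sqrt(x)       # 1 2 3 4  (one result per input element)
smallest <- min(x)        # 1        (a single value for the whole vector)

length(roots)             # 4, same as length(x)
length(smallest)          # 1
```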

Before starting the exercises, please note this is the third set in a series of five: In the first two sets, we practised creating vectors and vector arithmetics. In the fourth set (posted next week) we will practise regular sequences and replications.

You can find all sets right now in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

One more thing: I would really appreciate your feedback on these exercises: Which ones did you like? Which ones were too easy or too difficult? Please let me know what you think here!

Did you know R actually has lots of built-in datasets that we can use to practise? For example, the `rivers` data “gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey” (you can find this description, and additional information, if you enter `help(rivers)` in R). Also, for an overview of all built-in datasets, enter `data()`.

Have a look at the `rivers` data by simply entering `rivers` at the R prompt. Create a vector `v` with 8 elements, containing the number of elements (`length`) in `rivers`, their sum (`sum`), mean (`mean`), median (`median`), variance (`var`), standard deviation (`sd`), minimum (`min`) and maximum (`max`).

(Solution)

For many functions, we can tweak their result through additional *arguments*. For example, the `mean` function accepts a `trim` argument, which trims a fraction of observations from both the low and high end of the vector the function is applied to.
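Here is a minimal sketch of `trim` at work, on a made-up vector with one obvious outlier. With 5 elements, `trim = 0.2` drops one observation from each end.

```r
x <- c(1, 2, 3, 4, 100)          # made-up data with an outlier

plain_mean   <- mean(x)              # 22: the outlier pulls the mean up
trimmed_mean <- mean(x, trim = 0.2)  # 3: mean of 2, 3, 4 after dropping 1 and 100
```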

- What is the result of `mean(c(-100, 0, 1, 2, 3, 6, 50, 73), trim=0.25)`? Don’t use R, but try to infer the result from the explanation of the `trim` argument I just gave. Then check your answer with R.
- Calculate the mean of `rivers` after trimming the 10 highest and the 10 lowest observations. Hint: first calculate the trim fraction, using the `length` function.

(Solution)

Some functions accept multiple vectors as inputs. For example, the `cor` function accepts two vectors and returns their correlation coefficient. The `women` data “gives the average heights and weights for American women aged 30-39”. It contains two vectors `height` and `weight`, which we access after entering `attach(women)` (we’ll discuss the details of `attach` in a later chapter).
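To see what a two-vector function looks like in practice, here is a small sketch with two invented vectors (not the `women` data), where `b` is simply twice `a`:

```r
a <- c(1, 2, 3, 4)     # made-up values
b <- c(2, 4, 6, 8)     # exactly 2 * a

r_up   <- cor(a, b)    # 1: perfect positive linear relationship
r_down <- cor(a, -b)   # -1: perfect negative linear relationship
```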

- Using the `cor` function, show that the average height and weight of these women are almost perfectly correlated.
- Calculate their covariance, using the `cov` function.
- The `cor` function accepts a third argument `method` which allows for three distinct methods (“pearson”, “kendall”, “spearman”) to calculate the correlation. Repeat part (a) of this exercise for each of these methods. Which method is chosen by default (i.e. without specifying the method explicitly)?

(Solution)

In the previous three exercises, we practised functions that accept one or more vectors of any length as input, but return a single value as output. We’re now returning to functions that return a vector of the same length as their input vector. Specifically, we’ll practise rounding functions. R has several functions for rounding. Let’s start with `floor`, `ceiling`, and `trunc`:

- `floor(x)` rounds to the largest integer not greater than `x`
- `ceiling(x)` rounds to the smallest integer not less than `x`
- `trunc(x)` returns the integer part of `x`
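As a starting point for your own experiments, here is a small sketch that applies all three functions to one positive and one negative number. The values are arbitrary.

```r
x <- c(2.7, -2.7)          # arbitrary test values

down   <- floor(x)         #  2 -3
up     <- ceiling(x)       #  3 -2
zeroed <- trunc(x)         #  2 -2
```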

To appreciate the difference between the three, I suggest you first play around a bit in R with them. Just pick any number (with or without a decimal point, positive and negative values), and see the result each of these functions gives you. Then make it somewhat closer to the next integer (either above or below), or flip the sign, and see what happens. Then continue with the following exercise:

Below you will find a series of arguments (`x`), and results (`y`), that can be obtained by choosing one *or more* of the 3 functions above (e.g. `y <- floor(x)`). Which of the above 3 functions could have been used in each case? First, choose your answer without using R, then check with R.

`x <- c(300.99, 1.6, 583, 42.10)`

`y <- c(300, 1, 583, 42)`

`x <- c(152.34, 1940.63, 1.0001, -2.4, sqrt(26))`

`y <- c(152, 1940, 1, -2, 5)`

`x <- -c(3.2, 444.35, 1/9, 100)`

`y <- c(-3, -444, 0, -100)`

`x <- c(35.6, 670, -5.4, 3^3)`

`y <- c(36, 670, -5, 27)`

(Solution)

In addition to `trunc`, `floor`, and `ceiling`, R also has `round` and `signif` rounding functions. The latter two accept a second argument `digits`. In case of `round`, this is the number of decimal places, and in case of `signif`, the number of significant digits. As with the previous exercise, first play around a little, and see how these functions behave. Then continue with the exercise below:
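If you are not sure where to start playing around, this sketch may help. The value `123.456` is just an arbitrary pick, and the names are invented for the example.

```r
x <- 123.456                            # arbitrary value for illustration

one_decimal <- round(x, digits = 1)     # 123.5 (one decimal place)
no_decimals <- round(x, digits = 0)     # 123
two_signif  <- signif(x, digits = 2)    # 120   (two significant digits)
four_signif <- signif(x, digits = 4)    # 123.5 (four significant digits)
```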

Below you will find a series of arguments (`x`), and results (`y`), that can be obtained by choosing one, or both, of the 2 functions above (e.g. `y <- round(x, digits=d)`). Which of these functions could have been used in each case, and what should the value of `d` be? First, choose your answer without using R, then check with R.

`x <- c(35.63, 300.20, 0.39, -57.8)`

`y <- c(36, 300, 0, -58)`

`x <- c(153, 8642, 10, 39.842)`

`y <- c(153.0, 8640.0, 10.0, 39.8)`

`x <- c(3.8, 0.983, -23, 7.1)`

`y <- c(3.80, 0.98, -23.00, 7.10)`

(Solution)

Ok, let’s continue with a really interesting function: `cumsum`. This function returns a vector of the same length as its input vector. But contrary to the previous functions, the value of an element in the output vector depends not only on its corresponding element in the input vector, but on *all previous* elements in the input vector. So, its results are *cumulative*, hence the `cum` prefix. Take for example: `cumsum(c(0, 1, 2, 3, 4, 5))`, which returns: 0, 1, 3, 6, 10, 15. Do you notice the pattern?
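One more sketch of the same idea, this time with a made-up, unordered vector, so you can see that each output element is the running total up to that position:

```r
x <- c(2, 5, 1, 4)         # made-up values

running <- cumsum(x)       # 2 7 8 12
total   <- sum(x)          # 12, equal to the last element of the running total
```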

Functions that are similar in their behavior to `cumsum` are: `cumprod`, `cummax` and `cummin`. From just their naming, you might already have an idea how they work, and I suggest you play around a bit with them in R before continuing with the exercise.

- The `nhtemp` data contain “the mean annual temperature in degrees Fahrenheit in New Haven, Connecticut, from 1912 to 1971”. (Although `nhtemp` is not a vector, but a timeseries object (which we’ll learn the details of later), for the purpose of this exercise this doesn’t really matter.) Use one of the four functions above to calculate the maximum mean annual temperature in New Haven observed since 1912, for each of the years 1912-1971.
- Suppose you put $1,000 in an investment fund that will exhibit the following annual returns in the next 10 years: 9% 18% 10% 7% 2% 17% -8% 5% 9% 33%. Using one of the four functions above, show how much money your investment will be worth at the end of each year for the next 10 years, assuming returns are re-invested every year. Hint: If an investment returns e.g. 4% per year, it will be worth 1.04 times as much after one year, 1.04 * 1.04 times as much after two years, 1.04 * 1.04 * 1.04 times as much after three years, etc.

(Solution)

R has several functions for sorting data: `sort` takes a vector as input, and returns the same vector with its elements sorted in increasing order. To reverse the order, you can add a second argument: `decreasing=TRUE`.
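A minimal sketch of both directions, on a made-up vector:

```r
x <- c(3.5, -1, 10, 0)                 # made-up values

ascending  <- sort(x)                      # -1.0  0.0  3.5 10.0
descending <- sort(x, decreasing = TRUE)   # 10.0  3.5  0.0 -1.0
```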

- Use the `women` data (exercise 3) and create a vector `x` with the elements of the `height` vector sorted in decreasing order.
- Let’s look at the `rivers` data (exercise 1) from another perspective. Looking at the 141 data points in `rivers`, at first glance it seems quite a lot have zero as their last digit. Let’s examine this a bit closer. Use the modulo operator you practised in exercise 9 of the previous exercise set to isolate the last digit of each element of the `rivers` vector, sort the digits in increasing order, and look at the sorted vector on your screen. How many are zero?
- What is the total length of the 4 longest rivers combined? Hint: Sort the `rivers` vector from longest to shortest, and use one of the `cum...` functions to show their combined length. Read off the appropriate answer from your screen.

(Solution)

Another sorting function is `rank`, which returns the ranks of the values of a vector. Have a look at the following output:

```
x <- c(100465, -300, 67.1, 1, 1, 0)
rank(x)
```

`## [1] 6.0 1.0 5.0 3.5 3.5 2.0`

- Can you describe in your own words what `rank` does?
- In exercise 3(c) you estimated the correlation between `height` and `weight`, using Spearman’s rho statistic. Try to replicate this using the `cor` function without the `method` argument (i.e., using its default Pearson method), and using `rank` to first obtain the ranks of `height` and `weight`.

(Solution)

A third sorting function is `order`. Have a look again at the vector `x` introduced in the previous exercise, and the output of `order` applied to this vector:

```
x <- c(100465, -300, 67.1, 1, 1, 0)
order(x)
```

`## [1] 2 6 4 5 3 1`

- Can you describe in your own words what `order` does? Hint: look at the output of `sort(x)` if you run into trouble.
- Remember the time series of mean annual temperature in New Haven, Connecticut, in exercise 6? Have a look at the output of `order(nhtemp)`:

`order(nhtemp)`

```
## [1] 6 15 29 9 13 3 5 12 7 23 1 24 47 25 51 14 18 32 16 11 56 8 17
## [24] 28 45 52 31 37 4 22 36 39 54 19 34 26 49 30 33 53 55 21 27 58 10 50
## [47] 57 59 43 44 35 2 46 48 40 20 60 41 38 42
```

Given that the starting year for this series is 1912, in which years did the lowest and highest mean annual temperature occur?

- What is the result of `order(sort(x))`, if `x` is a vector of length 100, and all of its elements are numbers? Explain your answer.

(Solution)

In exercise 1 of this set, we practised the `max` function, followed by the `cummax` function in exercise 6. In the final exercise of this set, we’re returning to this topic, and will practise yet another function to find a maximum. While the former two functions applied to a *single* vector, it’s also possible to find a maximum across *multiple* vectors.

- First let’s see how `max` deals with multiple vectors. Create two vectors `x` and `y`, where `x` contains the first 5 even numbers greater than zero, and `y` contains the first 5 odd numbers greater than zero. Then see what `max` does, as in `max(x, y)`. Is there a difference with `max(y, x)`?
- Now, try `pmax(x, y)`, where p stands for “parallel”. Without using R, what do you think it will return? Then check, and perhaps refine, your answer with R.
- Now try to find the parallel minimum of `x` and `y`. Again, first try to write down the output you expect. Then check with R (I assume you can guess the appropriate name of the function).
- Let’s move from two to three vectors. In addition to `x` and `y`, add `-x` as a third vector. Write down the expected output for the parallel minima and maxima, then check your answer with R.
- Finally, let’s find out how `pmax` handles vectors of different lengths. Write down the expected output for the following statements, then check your answer with R.

`pmax(x, 6)`

`pmax(c(x, x), y)`

`pmin(x, c(y, y), 3)`

(Solution)

This is the second set in a series of five: In the first set (posted last week) we practised the basics of vectors. In set three and four (upcoming) we will practise more vector arithmetics to e.g. calculate all kinds of statistics, carry out simulations, sort data, or calculate the distance between two cities.

If you can’t wait till all sets are posted: you can find them right now in my ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

Let’s create the following vectors:

`u <- 4`

`v <- 8`

Use the elementary arithmetic operators `+`, `-`, `*`, `/`, and `^` to:

- add `u` and `v`
- subtract `v` from `u`
- multiply `u` by `v`
- divide `u` by `v`
- raise `u` to the power of `v`

(Solution)

Now, suppose `u` and `v` are not scalars, but vectors with multiple elements:

`u <- c(4, 5, 6)`

`v <- c(1, 2, 3)`

Without using R, write down what you expect as the result of the same operations as in the previous exercise:

- add `u` and `v`
- subtract `v` from `u`
- multiply `u` by `v`
- divide `u` by `v`
- raise `u` to the power of `v`

(Solution)

We just saw how arithmetic operators work on vectors with the same length. But how about vectors that differ in length? Let’s find out… Consider the following vectors:

`u <- c(5, 6, 7, 8)`

`v <- c(2, 3, 4)`

Without using R, write down what you expect as the result of the same operations as in the previous exercise:

- add `u` and `v`
- subtract `v` from `u`
- multiply `u` by `v`
- divide `u` by `v`
- raise `u` to the power of `v`

Then check your answer with R. Which rule does R use, when it has to deal with vectors of different lengths?

(Solution)

When we want to carry out a series of arithmetic operations, we can either use a single expression, or a series of expressions. Consider two vectors `u` and `v`:

`u <- c(8, 9, 10)`

`v <- c(1, 2, 3)`

We can create a new vector w in a single line of code:

`w <- (2 * u + v) / 10`

or carry out each operation on a separate line:

`w <- 2 * u`

`w <- w + v`

`w <- w / 10`
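If you want to convince yourself that both approaches really give the same result, a check along the following lines is one option. The names `w_single` and `w_steps` are made up here, so the two results can live side by side.

```r
u <- c(8, 9, 10)
v <- c(1, 2, 3)

# Everything in a single expression
w_single <- (2 * u + v) / 10

# The same calculation, one operation per line
w_steps <- 2 * u
w_steps <- w_steps + v
w_steps <- w_steps / 10

# identical() returns TRUE only if both vectors match exactly
same_result <- identical(w_single, w_steps)   # TRUE
```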

Convert the following expressions to separate operations, and check that both approaches give the same result:

`w <- (u + 0.5 * v) ^ 2`

`w <- (u + 2) * (u - 5) + v`

`w <- (u + 2) / ((u - 5) * v)`

(Solution)

We can do the reverse as well. Convert the following multi-line operations to a single expression. Check that both approaches give the same result.

Part a:

`w <- u + v`

`w <- w / 2`

`w <- w + u`

Part b:

`w1 <- u^3`

`w2 <- u - v`

`w <- w1 / w2`

(Solution)

Exercises 6, 7, and 8 focus on mathematics. Sooner or later you might have to translate mathematical formulas into R, to perform simple or more elaborate mathematical calculations. The goal of these exercises is to practise just that: how to translate a mathematical expression to R code. So, we won’t delve into the mathematics behind these formulas, and their derivation.

Also, in some cases, these formulas have already been translated to R by others, and are available in so-called *contributed packages*. We will deal with using these packages at a later stage, and for now the goal is just to become familiar with implementing mathematical formulas in R yourself.

So, here’s the deal: If you really hate math, or know for sure you will never use math in R, then it’s ok to skip exercises 6, 7, and 8. Otherwise: Let’s go for it and enjoy!

Besides the arithmetic operators we have used so far, there are some mathematical functions that we often use: `log`, `exp`, and `sqrt`. We can also use the well-known constant *pi*, by simply typing `pi`, instead of its value 3.1415927.
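As a quick warm-up, here is a small sketch of these functions and of `pi` in action. The `radius` value is invented for the example.

```r
root_16 <- sqrt(16)        # 4
exp_0   <- exp(0)          # 1
back    <- log(exp(2))     # 2: log() is the natural logarithm, the inverse of exp()

radius        <- 3                   # hypothetical radius
circumference <- 2 * pi * radius     # roughly 18.85
```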

Let’s try to apply what we have learned so far to some well-known, somewhat more advanced formulas. Don’t let the math scare you. Just translate the formulas to R code, one operator at a time. Don’t hesitate to use multiple lines if that makes things easier, or add parentheses to make sure operations are carried out in the right order.

- Suppose the surface area of a circle equals 25, what is the radius?
- What is the probability density at `x=0` of a normally distributed random variable `x` with mean (`mu`) equal to zero, and standard deviation (`sigma`) equal to one (look up the formula online, e.g. https://en.wikipedia.org/wiki/Normal_distribution)?

(Solution)

Consider the following formula to calculate the number of mortgage payment terms: \[n=\frac{\ln \Bigg(\dfrac{i}{\dfrac{M}{P}-i}+1\Bigg)}{\ln(1+i)}\] In this equation, `M` represents the monthly payment amount, `P` the principal, and `i` the (monthly) interest rate.

- Calculate the number of payment terms `n` for a mortgage with a principal balance of 200,000, monthly interest rate of 0.5%, and monthly payment amount of 2000.
- Now construct a vector `n` of length 6 with the results of this calculation for a series of monthly payment amounts: 2000, 1800, 1600, 1400, 1200, 1000.
- Does the last value of `n` surprise you? Can you explain it?

(Solution)

Suppose you have geographical data and want to calculate the distance between two places on earth, given by their latitude and longitude coordinates. Consider the coordinates for:

- Paris: 48.8566° N (latitude), 2.3522° E (longitude), and
- New York 40.7128° N (latitude), 74.0060° W (longitude)

If you’re up for a real challenge, look up “Great-circle distance” on Wikipedia, and use the *spherical law of cosines* to find the distance (and stop reading right now!).

If this sounds like a pretty daunting task, don’t worry! I will walk you through this step-by-step in the remainder of this exercise.

Ok, here we go. We will use the following common abbreviations:

- latitude (\(\phi\)): `phi`
- longitude (\(\lambda\)): `lambda`

- Create 4 scalars `phi.paris`, `phi.ny`, `lambda.paris`, `lambda.ny`, representing these coordinates. Because New York is located in the West, you have to enter its longitude as a negative value (-74.0060).
- Convert the 4 coordinates from degrees to radians, using the formula: \[\text{radians} = \text{degrees} \times \frac{\pi}{180}\]
- Calculate the central angle between both cities, using the spherical law of cosines: \[\Delta\sigma=\arccos(\sin \phi_1 \sin \phi_2 + \cos \phi_1 \cos \phi_2 \cos(\Delta\lambda))\] where: \(\Delta\sigma\) is just a scalar (name it anything you want in R), \(\phi_1\) is the latitude of Paris, \(\phi_2\) the latitude of New York, and \(\Delta\lambda\) the *absolute* difference between both longitudes. Hint: For this calculation you need the following mathematical functions in R: `sin`, `cos`, `acos`, and `abs`.
- Finally, to find the distance, multiply \(\Delta\sigma\) (i.e., the outcome you just calculated) by the radius of the earth (6371 km).

(Solution)

Use the modulo operator (`%%`) to find out for which of the following pairs the second number is a multiple of the first. Your R code should contain the modulo operator just once!

530, 1429410

77, 13960

231, 2425

8, 391600

(Solution)

May I kindly ask you to share your thoughts on these exercises? This will allow me to further improve the quality of the exercises.

You can share your thoughts simply by adding a comment below. I am particularly interested in:

- Which exercises you liked least and most
- Which (if any) exercises were too hard, and should be simplified
- Which (if any) exercises were too easy and should be more challenging
- Overall comments on the quality of this set
- Which topics you’d like to see addressed in future sets

Because vectors are such a key concept, we’re going to practise their use and application slowly, step-by-step. For now we’ll just practise *numeric* vectors (and save other types, such as *character* vectors, for later).

In this set we’re practising the basics of vectors, i.e. how to create vectors and assign them to a name. It is the first set in a series of five: In the second set (posted next week) we will practise working with vectors. In set three and four we will practise vector arithmetics to e.g. calculate all kinds of statistics, carry out simulations, sort data, or calculate the distance between two cities.

If you can’t wait till all sets are posted: you can find them right now in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

Let’s start really easy (don’t worry, we’ll quickly move to more challenging problems) with a vector containing just a single number, which we also call a scalar. Enter a vector in R, by just typing a random number, e.g. `100`, at the prompt and hit the Enter key.

(Solution)

Great! You just created your first vector! Now, let’s first enter a vector with more than one number. E.g. a vector containing the numbers 1, 2, 3, 4, 5, in that order. If you enter these numbers just like this, R will respond with an error message. It throws an error, because it needs a little bit more information from our side that we actually want to store those numbers in a vector structure. We have to use the following notation for this: `c(1, 2, 3, 4, 5)`.

Now, enter a vector with the first 5 even numbers in R, and hit Enter.

(Solution)

Let’s now enter a much longer vector, containing the numbers 1 to 10, 10 times (use copy & paste). What do the numbers between square brackets in the R output mean?

(Solution)

You should be pretty familiar with entering vectors now. You might actually feel a little *bored* by typing all these numbers. Life would be pretty miserable if we would have to enter data this way over and over again in R. But fortunately, there is a neat solution! We can *assign* a vector to a variable name such that we can retrieve the data we have entered, conveniently by just typing the name of the variable.

Try to assign a vector containing the numbers 1, 2, 3, 4, 5 to a variable named `a`, using the assignment operator (`<-`), and see which of the statements below work.

Enter each of the 9 statements one at a time at the prompt, hit Enter, and try to retrieve the contents of `a`, by typing `a` at the prompt after you entered each statement:

`a<-c(1, 2, 3, 4, 5)`

`a <- c(50, 60, 70, 80, 90)`

`a -> c(20, 31, 42, 53, 64)`

`c(5, 6, 7, 9, 10) <- a`

`c(101, 102, 103, 104, 105) -> a`

`a < - c(11, 12, 13, 14, 15)`

`a < -c(100, 99, 88, 77, 66)`

`assign(a, c(1000, 2000, 3000, 4000, 5000))`

`assign('a', c(83, 16, 35, 58, 3))`

(Solution)

In an R script, you might have created dozens or even hundreds of vectors. In that case, naming them `a`, `b`, `c` etc. is not ideal, because it will be difficult to keep track of what all those letters actually mean. This problem is easily mitigated by using longer, and meaningful, variable names.

Assign the following vectors to a meaningful variable name:

`c(2, 4, 6, 8, 10, 12, 14, 16, 20)`

`0`

`3.141593`

`c(1, 10, 100, 1000, 10000, 100000)`

(Solution)

Create vectors that correspond to the following variable names:

- bmi
- age
- daysPerMonth
- firstFivePrimeNumbers

(Solution)

So far, we have created vectors from a bunch of numbers. Instead of numbers, however, you can also enter other vectors, e.g. `c(vector1, vector2, vector3)`, and string them together.
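A small sketch of what this stringing together looks like. The vectors `first` and `second`, and their values, are invented for this illustration.

```r
first  <- c(1, 2)              # made-up vector of length 2
second <- c(10, 20, 30)        # made-up vector of length 3

# c() accepts vectors and single numbers alike, and glues them end to end
combined <- c(first, second, 99)   # 1 2 10 20 30 99

length(combined)                   # 6
```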

To practise this, let’s first create three vectors that each contain just 1 element with variable names `p`, `q`, and `r`, and values 1, 2, and 3. Then, create a new vector that contains multiple elements, using the scalars we just created. I.e., create a vector `u` of length 3, with the subsequent elements of `p`, `q` and `r`.

(Solution)

To play with this a little more, let’s create a longer vector, using only the assignment operator (`<-`), the `c()` function, and the vector `u` we just created. I.e., create a new vector `u` with length 96 that contains the elements of `u` as follows: 1, 2, 3, 1, 2, 3, …, 1, 2, 3

(Solution)



`10 150 30 45 20.3`

And here’s another one:

`-5 -4 -3 -2 -1 0 1 2 3`

still another one:

`"Darth Vader" "Luke Skywalker" "Han Solo"`

and our final example:

`389.3491`

These examples show that a vector is, simply speaking, just a collection of one (fourth example) or more (first and second example) numbers or character strings (third example). In R, a vector is considered a data structure (it’s a very simple data structure, and we’ll cover more complex structures in another tutorial).

We can construct a vector from a series of individual elements, using the `c()` function, as follows:

```
c(10, 150, 30, 45, 20.3)
```

## [1] 10.0 150.0 30.0 45.0 20.3

(In examples like these, lines starting with `##` show the output from R on the screen).

As you’ll see, once you have entered the vector, R will respond by displaying its elements. In many cases it will be convenient to refer to this vector using a name, instead of having to enter it over and over again. We can accomplish this using the `assign()` function, which is equivalent to the `<-` and `=` operators:

```
assign('a', c(10, 150, 30, 45, 20.3))
a <- c(10, 150, 30, 45, 20.3)
a = c(10, 150, 30, 45, 20.3)
```

The second statement (using the `<-` operator) is the most common way of assigning in R, and we’ll therefore use this form rather than the `=` operator or the `assign()` function.
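To check for yourself that the three forms produce the same result, here is a small sketch. The names `a1`, `a2` and `a3` are illustrative only, chosen so that each form gets its own object:

```r
# Three ways to bind the same vector to a name
assign('a1', c(10, 150, 30, 45, 20.3))
a2 <- c(10, 150, 30, 45, 20.3)
a3 = c(10, 150, 30, 45, 20.3)

# identical() returns TRUE when two objects are exactly the same
identical(a1, a2)
identical(a2, a3)
```

Both comparisons return `TRUE`, so which form to use is a matter of convention rather than of outcome.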

Once we have assigned a vector to a name, we can refer to the vector using this name. For example, if we type a, R will now show the elements of vector a.

```
a
```

## [1] 10.0 150.0 30.0 45.0 20.3

Instead of a, we could have chosen any other name, e.g.:

```
aVeryLongNameWhichIsCaseSensitive_AndDoesNotContainSpaces <- c(10, 150, 30, 45, 20.3)
```

Strictly speaking, we call this “name” an object.

To familiarize yourself with the vector data structure, now try to construct a couple of vectors in R and assign them to a named object, as in the example above.

**To summarize: A vector is a data structure, which can be constructed using the c() function, and assigned to a named object using the <- operator.**

Now, let’s move on to the first set of real exercises on vectors!

I thought perhaps this would be a good time to share some thoughts on the ideas behind the site, and how to proceed from this point onward. The main idea is pretty simple: it helps to practice if you want to learn R programming.

Although the idea itself is simple, for many people, and perhaps you as well, following up on this idea is a challenge. For example, practicing R programming requires a certain task that has to be completed, a solution to an analytical problem that has to be found, or a broader goal definition. Without this, we would just be typing random R syntax, or copy-pasting code we found somewhere on the web, which would contribute little to improving our R skills. The main problem R-exercises is trying to solve is how to specify these tasks, problems and goals in a useful, creative and structured way. The exercise sets are our (current) solution to this problem.

But there’s a second challenge for those who want to practice: staying focused. Life throws many distractions at us, and while you perhaps found some interesting problems to practice your R skills, sooner or later practicing fades away when more urgent matters pop up. So, the second problem R-exercises is trying to solve is how to practice in a focused, persistent way. Offering new exercises on a daily basis, rather than one-time communication (e.g. a book or course), is our solution to this second problem.

Is there a need for a site filled with exercises? There is an enormous amount of educational material on R available already. Our Course Finder directory includes 140 R courses, offered on 17 different platforms. Many universities teach R as part of their methods/statistics course programs. There are plenty of books on R. A search for “tutorial” on the blog aggregator R-bloggers reveals 1783 articles. And then there’s YouTube. It seems, with so much material, gaps are unlikely. But are they?

Going back to the two challenges we just described, we think what we’re offering is complementary to courses, books, classes and tutorials, because most of these focus on explaining and demonstrating things rather than on practicing (the first challenge). Their focus is also temporary, not necessarily persistent (the second challenge): it’s gone once you have completed the course, read the book or watched the video tutorial.

In their excellent book “Make it stick”, Roediger and McDaniel explain that many of our intuitive approaches to learning (e.g. rereading a text) are unproductive. Instead they advise: “One of the best habits a learner can instill in herself is regular self-quizzing to recalibrate her understanding of what she does and does not know.” From this perspective, R-exercises can help you to recalibrate your understanding of what you know and don’t know about R.

We’re committed to expanding R-exercises and adding more exercise sets. A while ago we started to differentiate sets in terms of difficulty (beginner, intermediate and advanced), an idea that many readers seemed to like when we proposed it. Recently we started to include information about online courses directly related to the exercises in a set, so for those who want to learn more, it’s easy to find a relevant course quickly.

Another idea we have is to offer premium (paid) memberships, with access to more extensive learning materials related to each exercise set. We’d actually love to hear your suggestions on how we can improve and expand R-exercises. What would you like to see on the site in 2017?


Answers to the exercises are available here. For the other (upcoming) exercise sets on data.table, check back next week here. If there are any particular topics/problems related to data.table you’d like to see included in subsequent exercise sets, please post them as a comment below.

**Exercise 1**

Setup: Read the wine quality dataset from the UCI repository as a data.table (available for download from: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv) into an object named `df`. To demonstrate the speed of data.table, we’re going to make this dataset much bigger, with:

`df <- df[rep(1:nrow(df), 1000), ]`

Check that the resulting data.table has about 4.9 mln. rows (the 4,898 rows of the original file, repeated 1,000 times) and 12 variables.

**Exercise 2**

Check if `df` contains any keys. If no keys are present, create a key for the `quality` variable. Confirm that the key has been set.

**Exercise 3**

Create a new data.table `df2`, containing the subset of `df` with quality equal to 9. Use `system.time()` to measure run-time.

**Exercise 4**

Remove the key from `df`, and repeat exercise 3. How much slower is this? Now, repeat exercise 3 once more and check timing. Explain the difference in speed [hint: use the `key2()` function, which more recent data.table releases call `indices()`.]

**Exercise 5**

Create a new data.table `df2`, containing the subset of `df` with quality equal to 7, 8 or 9. First without setting keys, then with setting keys and compare run-time.

**Exercise 6**

Create a new data.table `df3` containing the subset of observations from `df` with: fixed acidity < 8 and residual sugar < 5 and pH < 3. First without setting keys, then with setting keys and compare run-time. Explain why differences are small.

**Exercise 7**

Take a bootstrap sample (i.e., with replacement) of the full `df` data.table without keys, and record run-time. Then, convert to a regular data frame, and repeat. What is the difference in speed? Is there any (speed) benefit in creating a new variable `id` equal to the row number, creating a key for this variable, and using this key to select the bootstrap?

`sort`, `order`, and `xtfrm` functions.
Answers to the exercises are available here.

If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
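Before starting, it may help to see how `sort` and `order` differ. The following is a minimal sketch on a made-up vector `v` (not one of the exercise vectors): `sort` returns the values themselves, whereas `order` returns the positions that would put the values in order.

```r
# Illustrative vector (not one of the exercise vectors)
v <- c(30, 10, 20)

sort(v)                     # the values in ascending order: 10 20 30
sort(v, decreasing = TRUE)  # the values in descending order: 30 20 10

# order() returns positions rather than values:
# the smallest value sits at position 2, then position 3, then position 1
idx <- order(v)
idx                         # 2 3 1

# Indexing with those positions reproduces the sorted vector
v[idx]
```

Because `order` works with positions, its result can also be used to rearrange other objects, such as the rows of a matrix or data frame, which is what several of the exercises below rely on.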

**Exercise 1**

Sort the vector `x <- c(1, 3, 2, 5, 4)` in:

a. ascending order

b. descending order

**Exercise 2**

Sort the matrix `x <- matrix(1:100, ncol=10)`:

a. in descending order by its second column (call the sorted matrix x1)

b. in descending order by its second row (call the sorted matrix x2)

**Exercise 3**

Sort only the first column of `x` in descending order.

**Exercise 4**

Consider the `women` data.

a. Confirm that the data are sorted in increasing order for both the `height` and `weight` variable, without looking at the data.

b. Create a new variable `bmi`, based on the following equation: BMI = (weight in pounds / (height in inches × height in inches)) × 703. Check, again without looking at the data, whether BMI increases monotonically with weight and height.

c. Sort the dataframe on `bmi`, and its variable names alphabetically.

**Exercise 5**

Consider the `CO2` data.

a. Sort the data based on the `Plant` variable, alphabetically. (Note that `Plant` is a factor!). Check that the data are sorted correctly by printing the data on the screen.

b. Sort the data based on the `uptake` (increasing) and `Plant` (alphabetically) variables (in that order).

c. Sort again, based on `uptake` (increasing) and `Plant` (reversed alphabetically), in that order.

**Exercise 6**

Create a dataframe `df` with 40 columns, as follows:

`df <- as.data.frame(matrix(sample(1:5, 2000, T), ncol=40))`

a. Sort the dataframe on all 40 columns, from left to right, in increasing order.

b. Sort the dataframe on all 40 columns, from left to right, in decreasing order.

c. Sort the dataframe on all 40 columns, from right to left, in increasing order.

Image: en:User:RolandH [GFDL, CC-BY-SA-3.0 or CC BY-SA 2.5-2.0-1.0], via Wikimedia Commons


`rbind` and `cbind` is a common R task. However, when dimensions or classes differ between the objects passed to these functions, errors or unexpected results are common as well. Sounds familiar? Time to practice!
Answers to the exercises are available here.
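To give a flavour of the kind of surprises meant here, consider this small sketch. The inputs are made up for illustration and differ from the ones used in the exercises; `m1` and `m2` are hypothetical names.

```r
# Silent recycling: the shorter vector is repeated to fill the longer one
m1 <- cbind(1:4, 1:2)
dim(m1)       # 4 rows, 2 columns
m1[, 2]       # 1 2 1 2

# Mixing classes: logical values are coerced to integers (TRUE -> 1, FALSE -> 0)
m2 <- cbind(1:3, c(TRUE, FALSE, TRUE))
typeof(m2)    # "integer"
m2[, 2]       # 1 0 1
```

Neither call raises an error or a warning, which is exactly why it pays to predict the result before running the code.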

**Exercise 1**

Try to create matrices from the vectors below, by binding them column-wise. First, without using R, write down whether binding the vectors to a matrix is actually possible; then the resulting matrix and its mode (e.g., character, numeric etc.). Finally check your answer using R.

a. `a <- 1:5 ; b <- 1:5`

b. `a <- 1:5 ; b <- c('1', '2', '3', '4', '5')`

c. `a <- 1:5 ; b <- 1:4; c <- 1:3`

**Exercise 2**

Repeat exercise 1, binding vectors row-wise instead of column-wise while avoiding any row names.

**Exercise 3**

Bind the following matrices column-wise. First, without using R, write down whether binding the matrices is actually possible; then the resulting matrix and its mode (e.g., character, numeric etc.). Finally check your answer using R.

a. `a <- matrix(1:12, ncol=4); b <- matrix(21:35, ncol=5)`

b. `a <- matrix(1:12, ncol=4); b <- matrix(21:35, ncol=3)`

c. `a <- matrix(1:39, ncol=3); b <- matrix(LETTERS, ncol=2)`

**Exercise 4**

Bind the matrix `a <- matrix(1:1089, ncol=33)` to itself, column-wise, 20 times (i.e., resulting in a new matrix with 21*33 columns). Hint: Avoid using `cbind()` to obtain an efficient solution. Various solutions are possible. If yours is different from those shown on the solutions page, please post yours on that page as a comment, so we can all benefit.

**Exercise 5**

Try to create new data frames from the data frames below, by binding them column-wise. First, without using R, write down whether binding the data frames is actually possible; then the resulting data frame and the class of each column (e.g., integer, character, factor etc.). Finally check your answer using R.

a. `a <- data.frame(v1=1:5, v2=LETTERS[1:5]) ; b <- data.frame(var1=6:10, var2=LETTERS[6:10])`

b. `a <- data.frame(v1=1:6, v2=LETTERS[1:6]) ; b <- data.frame(var1=6:10, var2=LETTERS[6:10])`

**Exercise 6**

Try to create new data frames from the data frames below, by binding them row-wise. First, without using R, write down whether binding the data frames is actually possible; then the resulting data frame and the class of each column (e.g., integer, character, factor etc.). Finally check your answer using R, and explain any unexpected output.

a. `a <- data.frame(v1=1:5, v2=LETTERS[1:5]) ; b <- data.frame(v1=6:10, v2=LETTERS[6:10])`

b. `a <- data.frame(v1=1:6, v2=LETTERS[1:6]) ; b <- data.frame(v2=6:10, v1=LETTERS[6:10])`

**Exercise 7**

a. Use `cbind()` to add vector `v3 <- 1:5` as a new variable to the data frame created in exercise 6b.

b. Reorder the columns of this data frame, as follows: v1, v3, v2.

**Exercise 8**

Consider again the matrices of exercise 3b. Use both `cbind()` and `rbind()` to bind both matrices column-wise, adding `NA` for empty cells.

**Exercise 9**

Consider again the data frames of exercise 5b. Use both `cbind()` and `rbind()` to bind both data frames column-wise, adding `NA` for empty cells.

Image: By Hella, Handdrawing 1995.

If you’re new to R, you might be particularly interested in getting up to speed quickly, and focus on those commands that produce a graph, table, or prediction, or jump straight to some of the popular packages that offer the latest data science toys.

If this approach works for you, great! Many R beginners will suffer unnecessarily however, by taking this route. Why? Read on…

Consider this analogy between learning R and learning a real language (as in, Italian or Chinese). Real languages have words that convey strong, clear messages (“Water”, “Money”, “Sleep”), as well as words that play a more supportive role (“you need”, “A little”, “Enough”), and words that may convey even less information (“Actually”, “Apparently”). (For more analogies like these, read this post on the ideas behind our exercise sets).

Likewise, R has commands that carry out big, important tasks with a lot of immediate practical value (`plot`, `glm`, `table`), as well as commands that play a more supportive role (`data.frame`, `cut`, `as.Date`).

My point is this: You need both the strong words/big commands, and the more supportive words/commands to get something done in practice, whether it’s in Italian (like, communicating: “Apparently, you need a little sleep”) or in R (like: reading some data, cleaning it, running a model and plotting the results).

Many basic R functions, like `mode`, play this supportive role. In and of themselves, they don’t have much practical value, but they are important little nuts and bolts that are indispensable to get your ultimate task done. They are therefore important to master, as part of your R vocabulary!
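In case you haven’t used `mode` before, here is a minimal sketch of what it returns for a few common kinds of objects (the input values are arbitrary):

```r
mode(42)            # "numeric"
mode(42L)           # "numeric" (integers and doubles share this mode)
mode('forty-two')   # "character"
mode(TRUE)          # "logical"
mode(sum)           # "function"
```

So `mode` tells you, in broad terms, what kind of thing an object holds, and as the examples further on show, it can also be used to change that.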

Ok, so let’s get back to the original question and look at some practical uses of `mode`. Looking through some of my recent code, I noticed I mostly use `mode` to convert characters to numbers and vice versa, and to check input values in functions.

For example, your raw data might have typos or wrong characters, as in the following example, where the last element of the input vector contains the letter ‘o’, instead of the zero digit. Such data can only be read as character, so after reading it into R, you’d have something like the vector x below. Using this vector in a calculation would throw an error.

```
x <- c('20', '30', '4o')   # raw data
x <- gsub('o', '0', x)     # cleaning
mode(x) <- 'numeric'       # convert from character to numeric
x + 1                      # we can now use x in calculations
```

## [1] 21 31 41
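To see why the cleaning step matters, here is a small sketch of what happens without it. The names `raw`, `result` and `converted` are illustrative, and `tryCatch()` is only used so that the error does not stop the script:

```r
raw <- c('20', '30', '4o')   # same raw data as above, under an illustrative name

# Arithmetic on a character vector stops with an error;
# tryCatch() captures it so the script can continue
result <- tryCatch(raw + 1, error = function(e) 'error')
result

# Converting without cleaning first does not fail, but the typo becomes NA
converted <- suppressWarnings(as.numeric(raw))
converted
```

In other words, skipping the `gsub` step either halts the calculation or quietly turns the mistyped value into a missing one.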

Another example of practical use of mode is to check input values in functions. Consider the `sum` function, which requires arguments of type numeric or logical. It will stop your script if it receives a character, e.g. `sum(10, 'a')`. Suppose this is not what you want. Instead you want a sum function that would simply return NA when it receives a character, without stopping the script. In that case you can use `is.numeric()` to check input values:

```
mysum <- function(a, b) {
  # return the sum for numeric input, NA otherwise
  if (is.numeric(a) && is.numeric(b)) a + b else NA
}
mysum(10, 20)
```

## [1] 30

```
mysum(10, 'a')
```

## [1] NA

Finally, an example that involves subsetting of a table. Here we convert a numeric range (2005:2010) to character, to select a few columns (representing years) of a table:

```
df <- data.frame(years=1991:2010, v=sample(1:10, 20000, TRUE))
mytable <- with(df, table(v, years))   # a cross-table with counts
mytable[, as.character(2005:2010)]
```

```
##     years
## v    2005 2006 2007 2008 2009 2010
##   1    92  108  115  110   90  119
##   2   101   82   91   98  103   90
##   3    96   99   93   89   84  100
##   4    99  107   92  103  108   97
##   5    99   99  118  118  104   81
##   6    94   89   93   89   95  108
##   7   104  125  108  111  115   95
##   8   116   85  100   97  108  101
##   9   109   95   94   91   89  102
##   10   90  111   96   94  104  107
```
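Why is the conversion to character needed here? A rough sketch on a much smaller, deterministic table (the objects `small`, `smallTable`, `selected` and `byPosition` are made up for this illustration) shows the difference between selecting columns by name and by position:

```r
# Small, deterministic stand-in for the table above
small <- data.frame(years = c(2005, 2006, 2006, 2007),
                    v     = c(1, 1, 2, 2))
smallTable <- with(small, table(v, years))

# Character indices are matched against the column names
selected <- smallTable[, as.character(2006:2007)]
selected

# Numeric indices are treated as positions; there is no column number 2006,
# so R raises an error, which tryCatch() turns into its message text
byPosition <- tryCatch(smallTable[, 2006:2007],
                       error = function(e) conditionMessage(e))
byPosition
```

The column names of a table are character strings, so the years have to be passed as characters to be matched by name.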

Do you have other examples of practical uses of `mode`? Feel free to share below as a comment!