# Regular Sequences

So far in this series, we used vectors from built-in datasets (rivers, women and nhtemp), or created them by stringing together several numbers with the c function (e.g. c(1, 2, 3, 4)). R offers an extremely useful shortcut to create vectors of the latter kind, which is the colon : operator. Instead of having to type:

x <- c(1, 2, 3, 4)

we can simply type

x <- 1:4

to create the same sequence of values. Obviously this is especially useful for longer sequences.
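
A quick check, as a minimal sketch (the names x1 and x2 are only for this illustration): both notations give the same values, although the colon operator stores them as integers while c(1, 2, 3, 4) stores them as doubles. For the exercises in this set, that difference does not matter.

```r
# Two ways to build the same sequence (x1 and x2 are illustrative names)
x1 <- c(1, 2, 3, 4)
x2 <- 1:4

all(x1 == x2)    # TRUE: the values match element by element
typeof(x1)       # "double"
typeof(x2)       # "integer": the colon operator returns integers here
length(1:1000)   # 1000, without typing a thousand numbers
```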

In fact, you will use sequences like this a lot in real-world applications of R, e.g. to select subsets of data points, records, or variables. The exercises in this set might come across as a little abstract, but trust me, these sequences are really the basic building blocks for your future R scripts. So let’s go ahead!
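
To give an idea of what such a selection looks like, here is a small sketch using two of the built-in datasets mentioned above. The variable names are hypothetical, chosen for this example only.

```r
# rivers is a built-in numeric vector with 141 river lengths
first_five <- rivers[1:5]       # elements 1 through 5
last_three <- rivers[139:141]   # elements 139 through 141

# women is a built-in data frame; 1:3 selects its first three records
first_rows <- women[1:3, ]

length(first_five)   # 5
nrow(first_rows)     # 3
```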

Before starting the exercises, please note this is the fourth set in a series of five: In the first three sets, we practised creating vectors, vector arithmetics, and various functions. You can find all sets in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

One more thing: I would really appreciate your feedback on these exercises: Which ones did you like? Which ones were too easy or too difficult? Please let me know what you think here!

### Exercise 1

Try to shorten the notation of the following vectors as much as possible, using : notation:

1. x <- c(157, 158, 159, 160, 161, 162, 163, 164)
2. x <- c(15, 16, 17, 18, 20, 21, 22, 23, 24)
3. x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
4. x <- c(-1071, -1072, -1073, -1074, -1075, -1074, -1073, -1072, -1071)
5. x <- c(1.5, 2.5, 3.5, 4.5, 5.5)

(Solution)

### Exercise 2

The : operator can be used in more complex operations along with arithmetic operators, and variable names. Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

1. (10:20) * 2
2. 105:(30 * 3)
3. 10:20*2
4. 1 + 1:10/10
5. 2^(0:5)

(Solution)
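
If some results surprised you, operator precedence is the likely reason: in R, the colon operator is evaluated before *, /, + and -, but after ^ and the unary minus. The lines below are a small sketch with other numbers than the exercise uses.

```r
# The colon operator binds more tightly than * / + -, but less tightly than ^
1:3 * 2     # same as (1:3) * 2  ->  2 4 6
1:(3 * 2)   # parentheses first  ->  1 2 3 4 5 6
1:3 + 1     # same as (1:3) + 1  ->  2 3 4
2^1:3       # same as (2^1):3    ->  2 3
-1:2        # same as (-1):2     ->  -1 0 1 2
```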

### Exercise 3

Use the : operator and arithmetic operators/functions from the previous chapter to create the following vectors:

1. x <- c(5, 10, 15, 20, 25, 30)
2. x <- c(0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3)
3. x <- c(1/5, 2/6, 3/7, 4/8, 5/9, 6/10, 7/11, 8/12)
4. x <- c(1, 4, 3, 8, 5, 12, 7, 16, 9, 20) (Hint: you have to use the recycle rule)

(Solution)
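
As a reminder of the recycle rule mentioned in the hint, here is a minimal sketch with made-up numbers: when two vectors of different lengths are combined, R reuses the shorter one until it matches the length of the longer one.

```r
# Recycling: c(0, 100) is reused three times to match the six elements of 1:6
1:6 + c(0, 100)                # 1 102 3 104 5 106

# The same rule applies to other arithmetic operators
c(10, 20, 30, 40) - c(1, 2)    # 9 18 29 38
```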

### Exercise 4

Another way to generate a sequence is the seq function. Its first two arguments are from and to, followed by a third, which is by. seq(from=5, to=30, by=5) replicates part 1 of the previous exercise.
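
A few more calls, as a sketch with values chosen only for illustration, show how from, to and by interact:

```r
seq(from = 2, to = 10, by = 2)     # 2 4 6 8 10
seq(from = 1, to = 2, by = 0.25)   # 1.00 1.25 1.50 1.75 2.00
seq(from = 9, to = 1, by = -4)     # 9 5 1: a negative by counts down
seq(from = 1, to = 10, by = 4)     # 1 5 9: the sequence never passes to
```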

Note that you can omit the argument names from, to, and by, if you stick to their positions, i.e., seq(5, 30, 5). Have a look at the following expressions, and write down what sequence you think they will generate. Then check with R.

1. seq(from=20, to=80, by=20)
2. seq(from=-10, to=5, by=0.5)
3. seq(from=10, to=-3, by=-2)
4. seq(from=0.01, to=0.09, by=0.02)

(Solution)

### Exercise 5

Compare the regular sequences of part 1 of exercises 2 and 3 (both using the : operator) with the same sequences using the seq function with an appropriate by argument. Can you think of a more general rule for converting any seq(from, to, by) statement to a sequence generated with the : operator?

In other words, rewrite seq(from=x, to=y, by=z) to a statement using the : operator. Hint: if this appears difficult, try to do this first by choosing some values for x, y, and z, and see which pattern emerges.

(Solution)

### Exercise 6

The previous exercises in this set were aimed at generating sets of increasing or decreasing numbers. However, sometimes you just want a set of equal numbers. You can accomplish this with the rep function (from “replicate”). Its first argument is the number or vector that will be replicated, and its second argument times, … well I guess you can guess that one already. Now, let’s shorten the following statements, using rep:

1. x <- c(5, 5, 5, 5, 5, 5, 5)
2. x <- c(5, 6, 7)
y <- c(x, x, x, x, x)
3. x <- c(10, 16, 71, 10, 16, 71, 10, 16, 71)

(Solution)
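
For reference, a minimal sketch of rep with the times argument, using values that differ from the exercise:

```r
rep(0, times = 4)             # 0 0 0 0
rep(c("a", "b"), times = 3)   # "a" "b" "a" "b" "a" "b"
rep(1:2, 3)                   # times given by position: 1 2 1 2 1 2
```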

### Exercise 7

rep has a third very useful argument: each. As we saw in the previous exercise (part 2), vectors are replicated in their entirety by rep.

However, you can also replicate “each” individual element. Consider for example:

rep(c(1, 2, 3), times=2, each=3).

This says: “replicate each element of the input vector c(1, 2, 3) 3 times, and then replicate the resulting vector 2 times.” Now, let’s shorten the following statements, using rep:

1. x <- c(5, 5, 5, 5, 8, 8, 8, 8, -3, -3, -3, -3, 0.34, 0.34, 0.34, 0.34)
2. x <- c(-0.1, -0.1, -0.9, -0.9, -0.6, -0.6)
3. x <- c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)

(Solution)
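
The difference between each and times is easiest to see side by side. The following sketch uses a small character vector chosen for illustration:

```r
rep(c("a", "b"), each = 2)              # "a" "a" "b" "b"
rep(c("a", "b"), times = 2)             # "a" "b" "a" "b"

# each is applied first, then the result is repeated times times
rep(c("a", "b"), each = 2, times = 3)   # "a" "a" "b" "b", three times over
length(rep(c("a", "b"), each = 2, times = 3))   # 2 * 2 * 3 = 12
```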

### Exercise 8

We can actually write part 3 of the previous exercise even more compactly by using rep in combination with the : operator. Do you see how?

In this exercise we’re using combinations of rep, : and seq to create the following sequences:

1. x <- c(97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102, 97, 98, 99, 100, 101, 102)
2. x <- c(-5, -5, -5, -5, -6, -6, -6, -6, -7, -7, -7, -7, -8, -8, -8, -8)
3. x <- c(13, 13, 17, 17, 21, 21, 25, 25, 29, 29, 13, 13, 17, 17, 21, 21, 25, 25, 29, 29)
4. x <- c(1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0)

(Solution)

### Exercise 9

Suppose rep had no each argument. Rewrite the following statement without using the each argument: x <- rep(c(27, 31, 19, 14), each=v, times=w)

(Solution)

### Exercise 10

Let’s finish this set off with an application. Let’s create a series of vectors for later use in a timeseries dataset. The idea is that each observation in this dataset can be identified by a timestamp, which is defined by four vectors:

• s (for seconds)
• m (minutes)
• h (hours)
• d (days)

For this exercise, we’ll limit the series to a full week of 7 days.

This is a somewhat more complicated problem than the previous ones in this exercise. Don’t worry however! Whenever you’re faced with a somewhat more complicated problem than you are used to, the best strategy is to break it down into smaller problems. So, we’ll simply start with the s vector.

1. Since s counts the number of seconds, we know it has to start at 1, run to 60, restart at 1, etc. As it should cover a full week, we also know we have to replicate this series many times. Can you calculate exactly how many times it has to replicate this series? Use the outcome of your calculation to create the full s vector.
2. Now, let’s create the vector m. Think about how this vector differs from s. What does this mean for the times and each arguments?
3. Now, let’s create vector h and d using the same logic. Check that s, m, h, and d have equal length.

(Solution)

# Vectors and Functions

In the previous set we started with arithmetic operations on vectors. We’ll take this a step further now, by practising functions to summarize, sort and round the elements of a vector.

So far, the functions we have practised (log, sqrt, exp, sin, cos, and acos) always return a vector with the same length as the input vector. In other words, the function is applied element by element to the elements of the input vector. Not all functions behave this way though. For example, the function min(x) returns a single value (the minimum of all values in x), regardless of whether x has length 1, 100 or 100,000.
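
The contrast can be seen in a short sketch (the vector x is made up for this illustration):

```r
x <- c(4, 9, 16, 25)   # illustrative input vector

sqrt(x)           # 2 3 4 5: one result per element
min(x)            # 4: a single value for the whole vector

length(sqrt(x))   # 4, the same length as x
length(min(x))    # 1, regardless of the length of x
```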

Before starting the exercises, please note this is the third set in a series of five: In the first two sets, we practised creating vectors and vector arithmetics. In the fourth set (posted next week) we will practise regular sequences and replications.

You can find all sets right now in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.


### Exercise 1

Did you know R actually has lots of built-in datasets that we can use to practise? For example, the rivers data “gives the lengths (in miles) of 141 ‘major’ rivers in North America, as compiled by the US Geological Survey” (you can find this description, and additional information, by entering help(rivers) in R). Also, for an overview of all built-in datasets, enter data().

Have a look at the rivers data by simply entering rivers at the R prompt. Create a vector v with 8 elements, containing the number of elements (length) in rivers, their sum (sum), mean (mean), median (median), variance (var), standard deviation (sd), minimum (min) and maximum (max).

(Solution)

### Exercise 2

For many functions, we can tweak their result through additional arguments. For example, the mean function accepts a trim argument, which trims a fraction of observations from both the low and high end of the vector the function is applied to.
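
As a small sketch of what trim does, take a made-up vector of four values with one obvious outlier. With trim=0.25, one quarter of the observations (here: one value) is dropped from each end before the mean is taken.

```r
x <- c(1, 5, 7, 100)    # illustrative values; 100 is an outlier

mean(x)                 # 28.25
mean(x, trim = 0.25)    # 6: drops 1 and 100, then averages 5 and 7
```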

1. What is the result of mean(c(-100, 0, 1, 2, 3, 6, 50, 73), trim=0.25)? Don’t use R, but try to infer the result from the explanation of the trim argument I just gave. Then check your answer with R.
2. Calculate the mean of rivers after trimming the 10 highest and lowest observations. Hint: first calculate the trim fraction, using the length function.

(Solution)

### Exercise 3

Some functions accept multiple vectors as inputs. For example, the cor function accepts two vectors and returns their correlation coefficient. The women data “gives the average heights and weights for American women aged 30-39”. It contains two vectors height and weight, which we access after entering attach(women) (we’ll discuss the details of attach in a later chapter).
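
To see how cor behaves before applying it to real data, here is a minimal sketch with made-up vectors: y rises in a straight line with x, and z falls in a straight line with x.

```r
x <- c(1, 2, 3, 4, 5)   # illustrative data
y <- 2 * x + 1          # increases linearly with x
z <- -x                 # decreases linearly with x

cor(x, y)   # 1 (up to floating-point rounding): perfect positive correlation
cor(x, z)   # -1 (up to floating-point rounding): perfect negative correlation
```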

1. Using the cor function, show that the average height and weight of these women are almost perfectly correlated.
2. Calculate their covariance, using the cov function.
3. The cor function accepts a third argument method which allows for three distinct methods (“pearson”, “kendall”, “spearman”) to calculate the correlation. Repeat part 1 of this exercise for each of these methods. Which method is used by default (i.e. without specifying the method explicitly)?

(Solution)

### Exercise 4

In the previous three exercises, we practised functions that accept one or more vectors of any length as input, but return a single value as output. We’re now returning to functions that return a vector of the same length as their input vector. Specifically, we’ll practise rounding functions. R has several functions for rounding. Let’s start with floor, ceiling, and trunc:

• floor(x) rounds to the largest integer not greater than x
• ceiling(x) rounds to the smallest integer not less than x
• trunc(x) returns the integer part of x
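
The three definitions only differ for numbers with a fractional part, and the sign matters. A minimal sketch with one positive and one negative value:

```r
x <- c(2.7, -2.7)   # illustrative values

floor(x)     #  2 -3: always rounds down
ceiling(x)   #  3 -2: always rounds up
trunc(x)     #  2 -2: drops the fractional part, i.e. rounds toward zero
```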

To appreciate the difference between the three, I suggest you first play around a bit in R with them. Just pick any number (with or without a decimal point, positive and negative values), and see the result each of these functions gives you. Then make it somewhat closer to the next integer (either above or below), or flip the sign, and see what happens. Then continue with the following exercise:

Below you will find a series of arguments (x), and results (y), that can be obtained by choosing one or more of the 3 functions above (e.g. y <- floor(x)). Which of the above 3 functions could have been used in each case? First, choose your answer without using R, then check with R.

1. x <- c(300.99, 1.6, 583, 42.10)
y <- c(300, 1, 583, 42)
2. x <- c(152.34, 1940.63, 1.0001, -2.4, sqrt(26))
y <- c(152, 1940, 1, -2, 5)
3. x <- -c(3.2, 444.35, 1/9, 100)
y <- c(-3, -444, 0, -100)
4. x <- c(35.6, 670, -5.4, 3^3)
y <- c(36, 670, -5, 27)

(Solution)

### Exercise 5

In addition to trunc, floor, and ceiling, R also has round and signif rounding functions. The latter two accept a second argument digits. In case of round, this is the number of decimal places, and in case of signif, the number of significant digits. As with the previous exercise, first play around a little, and see how these functions behave. Then continue with the exercise below:
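
A first experiment might look like the sketch below; the number 123.456 is an arbitrary choice. Note that digits can also be negative for round, in which case rounding happens to the left of the decimal point.

```r
x <- 123.456   # arbitrary example value

round(x, digits = 1)     # 123.5: one decimal place
round(x, digits = -1)    # 120: rounded to the nearest ten
signif(x, digits = 2)    # 120: two significant digits
signif(x, digits = 4)    # 123.5: four significant digits
```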

Below you will find a series of arguments (x), and results (y), that can be obtained by choosing one, or both, of the 2 functions above (e.g. y <- round(x, digits=d)). Which of these functions could have been used in each case, and what should the value of d be? First, choose your answer without using R, then check with R.

1. x <- c(35.63, 300.20, 0.39, -57.8)
y <- c(36, 300, 0, -58)
2. x <- c(153, 8642, 10, 39.842)
y <- c(153.0, 8640.0, 10.0, 39.8)
3. x <- c(3.8, 0.983, -23, 7.1)
y <- c(3.80, 0.98, -23.00, 7.10)

(Solution)

### Exercise 6

Ok, let’s continue with a really interesting function: cumsum. This function returns a vector of the same length as its input vector. But contrary to the previous functions, the value of an element in the output vector depends not only on its corresponding element in the input vector, but on all previous elements in the input vector. So, its results are cumulative, hence the cum prefix. Take for example: cumsum(c(0, 1, 2, 3, 4, 5)), which returns: 0, 1, 3, 6, 10, 15. Do you notice the pattern?

Functions that are similar in their behavior to cumsum, are: cumprod, cummax and cummin. From just their naming, you might already have an idea how they work, and I suggest you play around a bit with them in R before continuing with the exercise.
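
If you would like a starting point for playing around, here is a short sketch applying all four functions to the same made-up vector:

```r
x <- c(3, 1, 4, 1, 5)   # illustrative input

cumsum(x)    # 3 4 8 9 14:   running total
cumprod(x)   # 3 3 12 12 60: running product
cummax(x)    # 3 3 4 4 5:    largest value seen so far
cummin(x)    # 3 1 1 1 1:    smallest value seen so far
```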

1. The nhtemp data contain “the mean annual temperature in degrees Fahrenheit in New Haven, Connecticut, from 1912 to 1971”. (Although nhtemp is not a vector, but a timeseries object (which we’ll learn the details of later), for the purpose of this exercise this doesn’t really matter.) Use one of the four functions above to calculate the maximum mean annual temperature in New Haven observed since 1912, for each of the years 1912-1971.

# Fighting Factors with Cats: Exercises

In this exercise set, we will practice using the forcats factor manipulation package by Hadley Wickham. In the last exercise set, we saw that it is entirely possible to deal with factors in base R, but also that things can get a bit involved and unintuitive. Forcats simplifies many common factor manipulation tasks and is worth mastering if you cannot avoid using factors in your work. Also, studying the package and its source code can give you ideas for writing your own custom functions to simplify everyday tasks that you think can be dealt with in a better way.

Solutions are available here.

Exercise 1

Load the gapminder data-set from the gapminder package, as well as forcats. Check what the levels of the continent factor variable are and their frequency in the data.

Exercise 2

Notice that one continent, Antarctica, is missing – add it as the last level of six.

Exercise 3

Actually, you change your mind. There is no permanent human population on Antarctica. Drop this (unused) level from your factor.

Exercise 4

Again, modify the continent factor, making it more precise. Add two new levels: instead of Americas, add North America and South America. The countries in the following vector should be classified as South America and the rest as North America.
 c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Paraguay", "Peru", "Uruguay", "Venezuela") 

Exercise 5

Arrange the levels of the continent factor in alphabetical order.

Exercise 6

Re-order the continent levels again so that they appear in order of total population in 2007.

Exercise 7

Reverse the order of the factor levels.

Exercise 8

Make continent, again, an unordered factor. Set North America as the first level, therefore interpreted as a reference group in modeling functions such as lm().

Exercise 9

Turn the following messy vector into a factor with two levels: “Female” and “Male” using the factor function. Use the labels argument in the factor() function.
gender <- c("f", "m ", "male ","male", "female", "FEMALE", "Male", "f", "m")

Exercise 10

Gender can be considered sensitive data. Convert the gender variable into a factor that takes the integer values “1” and “2”, where one integer represents female and the other male, but make the choice randomly.

# Facing the Facts about Factors: Exercises

Factor variables in R can be mind-boggling. Often, you can just avoid them and use character vectors instead – just don’t forget to set stringsAsFactors=FALSE (the default since R 4.0.0). They are, however, very useful in some circumstances, such as statistical modelling and presenting data in graphs and tables. Relying on factors but misunderstanding them has been known to “eat up hours of valuable time in any given analysis”, as one member of the community put it. It is therefore a good investment to get them straight as soon as possible on your R journey.

The intent behind these exercises is to help you find and fill in the cracks and holes in your relationship with factor variables.
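
As a warm-up, the following base R sketch (with a made-up vector, not the gapminder data) shows the two parts every factor consists of: a set of levels, and integer codes pointing at those levels.

```r
# A small factor built from a character vector (illustrative data)
f <- factor(c("low", "high", "low", "mid"))

levels(f)       # "high" "low" "mid": sorted alphabetically by default
as.integer(f)   # 2 1 2 3: the position of each value in levels(f)
nlevels(f)      # 3
table(f)        # counts per level

# The levels argument sets the order explicitly
f2 <- factor(c("low", "high", "low", "mid"), levels = c("low", "mid", "high"))
as.integer(f2)  # 1 3 1 2
```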

Solutions are available here.

Exercise 1

Load the gapminder data-set from the gapminder package. Save it to an object called gp. Check programmatically how many factors it contains and how many levels each factor has.

Exercise 2

Notice that one continent, Antarctica, is missing from the corresponding factor – add it as the last level of six.

Exercise 3

Actually, you change your mind. There is no permanent human population on Antarctica. Drop this (unused) level from your factor. If you can find three ways to do this, you are an expert.

Exercise 4

Again, modify the continent factor, making it more precise. Add two new levels instead of Americas, North-America and South-America. The countries in the following vector should be classified as South-America and the rest as North-America.
 c("Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Paraguay", "Peru", "Uruguay", "Venezuela") 

Exercise 5

Get the levels of the factor in alphabetical order.

Exercise 6

Re-order the continent levels again so that they appear in order of total population in 2007.

Exercise 7

Reverse the order of the factor and define continents as an ordered factor.

Exercise 8

Make the continent an unordered factor again and set North-America as the first level, thus interpreted as a reference group in modelling functions such as lm().

Exercise 9

Turn the following messy vector into a factor with two levels: Female and Male, using the factor function. Use the labels argument in the factor() function (ps: you can save some time by applying tolower() and trimws() before you apply factor()).
gender <- c("f", "m ", "male ","male", "female", "FEMALE", "Male", "f", "m")

Exercise 10

Use the fact that factors are built on top of integers and create a dummy (binary) variable male that takes the value 1 if the gender has the value “Male.”

# How To Create a Flexdashboard: Exercises

INTRODUCTION

With flexdashboard, you can easily create interactive dashboards for R. What is amazing about it is that with R Markdown, you can publish a group of related data visualizations as a dashboard.

Additionally, it supports a wide variety of components, including htmlwidgets; base, lattice, and grid graphics; tabular data; gauges and value boxes and text annotations.

It is flexible and makes it easy to specify row- and column-based layouts. Components are intelligently re-sized to fill the browser and adapted for display on mobile devices.

In combination with Shiny, you can create a high quality dashboard with interactive visualizations.

Before proceeding, please follow our short tutorial.

Look at the examples given and try to understand the logic behind them. Then, try to solve the exercises below by using R without looking at the answers. Finally, compare your answers with the solutions.

Exercise 1

Create a new flexdashboard R Markdown file from the R console.

Exercise 2

Create the very initial dashboard interface in a single column.

Exercise 3

Add the space that you will put your first chart in.

Exercise 4

Add the space that you will put your second chart in. The two charts should be stacked vertically.

Exercise 5

Add a third chart with the same logic.

Exercise 6

Transform your layout to scrolling.

Exercise 7

Display the three charts split across two columns.

Exercise 8

Change the width of these two columns.

Exercise 9

Define two rows, instead of columns. The first has a single chart and the second has two charts.

Exercise 10

Change the height of these two rows.

# How To Plot With Patchwork: Exercises

INTRODUCTION

The goal of patchwork is to make it simple to combine separate ggplots into the same graphic. It tries to solve the same problem as gridExtra::grid.arrange() and cowplot::plot_grid, but using an API that incites exploration and iteration.

Before proceeding, please follow our short tutorial.

Look at the examples given and try to understand the logic behind them. Then, try to solve the exercises below by using R without looking at the answers. Finally, compare your answers with the solutions.

Exercise 1

Create a scatter-plot object of mtcars between mpg and disp.

Exercise 2

Create a box-plot object of mtcars between gear and disp grouped by gear.

Exercise 3

Compose those two objects into one graph.

Exercise 4

Repeat the previous process but in one plotting operation.

Exercise 5

Display the composed graph in one column with the two graphs, one below the other.

Exercise 6

Set the graph on top to have two times the size of the graph at the bottom.

Exercise 7

Add space between your plots.

Exercise 8

Create two objects of your choice (p3, p4) and display all of your four objects in nested mode by putting p4 on top p1 and p2 and p3 at the bottom (nested one below the other.)

Exercise 9

Now, put the two nested plots next to each other.

Exercise 10

Finally, display p4 in one column and the rest of your objects in another.

# Programmatically Creating Text Outputs in R: Exercises

In the age of Rmarkdown and Shiny, or when making any custom output from your data, you want your output to look consistent and neat. Also, when writing your output, you often want it to follow a specific (decorative) format defined by the html or LaTeX engine. These exercises are an opportunity to refresh our memory on functions such as paste, sprintf, formatC and others that are convenient tools to achieve these ends. All of the solutions rely partly on the ultra-flexible sprintf(), but there are no doubt many ways to solve the exercises with other functions. Feel free to share your solutions in the comment section.
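
As a quick refresher, here is a sketch of a few typical calls to these functions. The input values are made up and unrelated to the exercises.

```r
# sprintf: format specifiers control decimals, padding and alignment
sprintf("%.2f", 3.14159)                  # "3.14"
sprintf("%05d", 42L)                      # "00042"
sprintf("%8s", "abc")                     # "     abc" (right aligned, width 8)
sprintf("%s has %d rows", "women", 15L)   # "women has 15 rows"

# formatC: similar control through named arguments
formatC(3.14159, digits = 2, format = "f")         # "3.14"
formatC(42, width = 5, format = "d", flag = "0")   # "00042"

# paste: glue pieces together, vectorised over its arguments
paste("file", 1:3, sep = "_")             # "file_1" "file_2" "file_3"
```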

Example solutions are available here.

Exercise 1

Print out the following vector as prices in dollars (to the nearest cent):
c(14.3409087337707, 13.0648270623048, 3.58504267621646, 18.5077076398145, 16.8279241011882). Example: \$14.34

Exercise 2

Using these numbers, c(25, 7, 90, 16), make a vector of filenames in the following format: file_025.txt. Left pad the numbers so they are all three digits.

Exercise 3

Actually, if we are only dealing with numbers less than one hundred, file_25.txt would have been enough. Change the code from the last exercise so that the padding is programmatically decided by the biggest number in the vector.

Exercise 4

Print out the following haiku on three lines, right aligned, with the help of cat: c("Stay the patient course.", "Of little worth is your ire.", "The network is down.").

Exercise 5

Write a function that converts a number to its hexadecimal representation. This is a useful skill when converting bmp colors from one representation to another. Example output:

      tohex(12)
[1] "12 is c in hexadecimal"


Exercise 6

Take a string and programmatically surround it with the html header tag h1.

Exercise 7

Back to the poem from exercise 4, let R convert to html unordered list so that it would appear like the following in a browser:

• Stay the patient course
• Of little worth is your ire
• The network is down

Exercise 8

Here is a list of the current top 6 movies on imdb.com in terms of rating: c("The Shawshank Redemption", "The Godfather", "The Godfather: Part II", "The Dark Knight", "12 Angry Men", "Schindler's List"). Convert them into a list compatible with the written text.

Example output:

[1] "The top ranked films on imdb.com are The Shawshank Redemption, The Godfather, The Godfather: Part II, The Dark Knight, 12 Angry Men and Schindler's List"

Exercise 9

Now, you should be able to solve this quickly: write a function that converts a proportion to a percentage that takes as input number of decimal places. An input of 0.921313 and 2 decimal places should return "92.13%".

Exercise 10

Improve the function from the last exercise so that the percentage consistently takes 10 characters by doing some left padding. Raise an error if the percentage already happens to be longer than 10.

(Image by Daniel Friedman).