Vectors and Functions

In the previous set we started with arithmetic operations on vectors. We’ll take this a step further now, by practising functions to summarize, sort and round the elements of a vector.

So far, the functions we have practised (log, sqrt, exp, sin, cos, and acos) always return a vector with the same length as the input vector. In other words, the function is applied to the elements of the input vector one by one. Not all functions behave this way, though. For example, the function min(x) returns a single value (the minimum of all values in x), regardless of whether x has length 1, 100 or 100,000.
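To see the difference, compare an element-wise function with a summary function on the same small vector (the values below are made up for illustration):

```r
x <- c(1, 4, 9, 16)

# sqrt() is applied element by element: 4 values in, 4 values out
sqrt(x)
## [1] 1 2 3 4

# min() summarizes the whole vector: 4 values in, 1 value out
min(x)
## [1] 1

length(sqrt(x))  # 4
length(min(x))   # 1
```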

Before starting the exercises, please note that this is the third set in a series of five: in the first two sets, we practised creating vectors and vector arithmetic. In the fourth set (posted next week), we will practise regular sequences and replications.

You can find all sets right now in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes all solutions (carefully explained), and the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

One more thing: I would really appreciate your feedback on these exercises: Which ones did you like? Which ones were too easy or too difficult? Please let me know what you think here!

Exercise 1

Did you know R actually has lots of built-in datasets that we can use to practise? For example, the rivers data “gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey” (you can find this description, and additional information, if you enter help(rivers) in R). Also, for an overview of all built-in datasets, enter data().

Have a look at the rivers data by simply entering rivers at the R prompt. Create a vector v with 8 elements, containing the number of elements (length) in rivers, their sum (sum), mean (mean), median (median), variance (var), standard deviation (sd), minimum (min), and maximum (max).

(Solution)

Exercise 2

For many functions, we can tweak their result through additional arguments. For example, the mean function accepts a trim argument, which trims a fraction of observations from both the low and high end of the vector the function is applied to.
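As a small illustration with a made-up vector that contains one extreme value, trimming can change the mean considerably:

```r
x <- c(1, 2, 3, 4, 100)

mean(x)              # average of all five values
## [1] 22

# trim = 0.2 drops 20% of the 5 values (i.e. 1 value) from each end,
# so the 1 and the 100 are removed before averaging 2, 3 and 4
mean(x, trim = 0.2)
## [1] 3
```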

  1. What is the result of mean(c(-100, 0, 1, 2, 3, 6, 50, 73), trim=0.25)? Don’t use R, but try to infer the result from the explanation of the trim argument I just gave. Then check your answer with R.
  2. Calculate the mean of rivers after trimming the 10 highest and lowest observations. Hint: first calculate the trim fraction, using the length function.

(Solution)

Exercise 3

Some functions accept multiple vectors as inputs. For example, the cor function accepts two vectors and returns their correlation coefficient. The women data “gives the average heights and weights for American women aged 30-39”. It contains two vectors height and weight, which we access after entering attach(women) (we’ll discuss the details of attach in a later chapter).
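Before turning to the women data, here is a minimal sketch with two made-up vectors, where one is exactly twice the other, so that they are perfectly correlated:

```r
x <- c(1, 2, 3, 4, 5)
y <- 2 * x              # y is an exact linear function of x

cor(x, y)               # correlation coefficient
## [1] 1
cov(x, y)               # covariance
## [1] 5
cor(x, -y)              # a perfect negative linear relation gives -1
## [1] -1
```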

  1. Using the cor function, show that the average height and weight of these women are almost perfectly correlated.
  2. Calculate their covariance, using the cov function.
  3. The cor function accepts a third argument, method, which allows for three distinct methods (“pearson”, “kendall”, “spearman”) to calculate the correlation. Repeat part 1 of this exercise for each of these methods. Which method is chosen by default (i.e., without specifying the method explicitly)?

(Solution)

Exercise 4

In the previous three exercises, we practised functions that accept one or more vectors of any length as input, but return a single value as output. We’re now returning to functions that return a vector of the same length as their input vector. Specifically, we’ll practise rounding functions. R has several functions for rounding. Let’s start with floor, ceiling, and trunc:

  • floor(x) rounds to the largest integer not greater than x
  • ceiling(x) rounds to the smallest integer not less than x
  • trunc(x) returns the integer part of x
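For instance, applying all three functions to the arbitrary values 2.7 and -2.7:

```r
x <- c(2.7, -2.7)

floor(x)    # largest integer not greater than x
## [1]  2 -3
ceiling(x)  # smallest integer not less than x
## [1]  3 -2
trunc(x)    # integer part: drops everything after the decimal point
## [1]  2 -2
```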

To appreciate the difference between the three, I suggest you first play around a bit in R with them. Just pick any number (with or without a decimal point, positive or negative), and see the result each of these functions gives you. Then make it somewhat closer to the next integer (either above or below), or flip the sign, and see what happens. Then continue with the following exercise:

Below you will find a series of arguments (x), and results (y), that can be obtained by choosing one or more of the 3 functions above (e.g. y <- floor(x)). Which of the above 3 functions could have been used in each case? First, choose your answer without using R, then check with R.

  1. x <- c(300.99, 1.6, 583, 42.10)
    y <- c(300, 1, 583, 42)
  2. x <- c(152.34, 1940.63, 1.0001, -2.4, sqrt(26))
    y <- c(152, 1940, 1, -2, 5)
  3. x <- -c(3.2, 444.35, 1/9, 100)
    y <- c(-3, -444, 0, -100)
  4. x <- c(35.6, 670, -5.4, 3^3)
    y <- c(36, 670, -5, 27)

(Solution)

Exercise 5

In addition to trunc, floor, and ceiling, R also has round and signif rounding functions. The latter two accept a second argument digits. In case of round, this is the number of decimal places, and in case of signif, the number of significant digits. As with the previous exercise, first play around a little, and see how these functions behave. Then continue with the exercise below:
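As a starting point for that exploration, here is one arbitrary number rounded in a few different ways:

```r
x <- 1234.5678

round(x, digits = 2)    # 2 decimal places
## [1] 1234.57
signif(x, digits = 2)   # 2 significant digits
## [1] 1200
round(x, digits = -2)   # negative digits round to the left of the decimal point
## [1] 1200
signif(x, digits = 6)   # 6 significant digits
## [1] 1234.57
```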

Below you will find a series of arguments (x), and results (y), that can be obtained by choosing one, or both, of the 2 functions above (e.g. y <- round(x, digits=d)). Which of these functions could have been used in each case, and what should the value of d be? First, choose your answer without using R, then check with R.

  1. x <- c(35.63, 300.20, 0.39, -57.8)
    y <- c(36, 300, 0, -58)
  2. x <- c(153, 8642, 10, 39.842)
    y <- c(153.0, 8640.0, 10.0, 39.8)
  3. x <- c(3.8, 0.983, -23, 7.1)
    y <- c(3.80, 0.98, -23.00, 7.10)

(Solution)

Exercise 6

Ok, let’s continue with a really interesting function: cumsum. This function returns a vector of the same length as its input vector. But contrary to the previous functions, the value of an element in the output vector depends not only on its corresponding element in the input vector, but on all previous elements in the input vector. So, its results are cumulative, hence the cum prefix. Take for example: cumsum(c(0, 1, 2, 3, 4, 5)), which returns: 0, 1, 3, 6, 10, 15. Do you notice the pattern?

Functions similar in behavior to cumsum are cumprod, cummax, and cummin. From their names alone, you might already have an idea how they work, and I suggest you play around a bit with them in R before continuing with the exercise.
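For example, applying all four functions to the same short, made-up vector:

```r
x <- c(3, 1, 4, 1, 5)

cumsum(x)   # running total
## [1]  3  4  8  9 14
cumprod(x)  # running product
## [1]  3  3 12 12 60
cummax(x)   # largest value seen so far
## [1] 3 3 4 4 5
cummin(x)   # smallest value seen so far
## [1] 3 1 1 1 1
```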

  1. The nhtemp data contain “the mean annual temperature in degrees Fahrenheit in New Haven, Connecticut, from 1912 to 1971”. Although nhtemp is not a vector but a time series object (we’ll learn the details later), for the purpose of this exercise this doesn’t really matter. Use one of the four functions above to calculate, for each of the years 1912-1971, the maximum mean annual temperature in New Haven observed since 1912.
  2. Suppose you put $1,000 in an investment fund that will exhibit the following annual returns in the next 10 years: 9% 18% 10% 7% 2% 17% -8% 5% 9% 33%. Using one of the four functions above, show how much money your investment will be worth at the end of each year for the next 10 years, assuming returns are re-invested every year. Hint: If an investment returns e.g. 4% per year, it will be worth 1.04 times as much after one year, 1.04 * 1.04 times as much after two years, 1.04 * 1.04 * 1.04 times as much after three years, etc.

(Solution)

Exercise 7

R has several functions for sorting data: sort takes a vector as input, and returns the same vector with its elements sorted in increasing order. To reverse the order, you can add a second argument: decreasing=TRUE.
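A quick illustration with an arbitrary vector:

```r
x <- c(42, 7, 19, 3)

sort(x)                     # increasing order (the default)
## [1]  3  7 19 42
sort(x, decreasing = TRUE)  # decreasing order
## [1] 42 19  7  3
```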

  1. Use the women data (exercise 3) and create a vector x with the elements of the height vector sorted in decreasing order.
  2. Let’s look at the rivers data (exercise 1) from another perspective. Looking at the 141 data points in rivers, at first glance it seems that quite a lot of them have zero as their last digit. Let’s examine this a bit closer. Use the modulo operator you practised in exercise 9 of the previous exercise set to isolate the last digit of each element of rivers, sort these digits in increasing order, and look at the sorted vector on your screen. How many are zero?
  3. What is the total length of the 4 largest rivers combined? Hint: Sort the rivers vector from longest to shortest, and use one of the cum... functions to show their combined length. Read off the appropriate answer from your screen.

(Solution)

Exercise 8

Another sorting function is rank, which returns the ranks of the values of a vector. Have a look at the following output:

x <- c(100465, -300, 67.1, 1, 1, 0)
rank(x)
## [1] 6.0 1.0 5.0 3.5 3.5 2.0
  1. Can you describe in your own words what rank does?
  2. In part 3 of exercise 3 you estimated the correlation between height and weight, using Spearman’s rho statistic. Try to replicate this using the cor function without the method argument (i.e., using its default Pearson method), and using rank to first obtain the ranks of height and weight.

(Solution)

Exercise 9

A third sorting function is order. Have a look again at the vector x introduced in the previous exercise, and the output of order applied to this vector:

x <- c(100465, -300, 67.1, 1, 1, 0)
order(x)
## [1] 2 6 4 5 3 1
  1. Can you describe in your own words what order does? Hint: look at the output of sort(x) if you run into trouble.
  2. Remember the time series of mean annual temperature in New Haven, Connecticut, in exercise 6? Have a look at the output of order(nhtemp):
order(nhtemp)
##  [1]  6 15 29  9 13  3  5 12  7 23  1 24 47 25 51 14 18 32 16 11 56  8 17
## [24] 28 45 52 31 37  4 22 36 39 54 19 34 26 49 30 33 53 55 21 27 58 10 50
## [47] 57 59 43 44 35  2 46 48 40 20 60 41 38 42

Given that the starting year for this series is 1912, in which years did the lowest and highest mean annual temperature occur?

  3. What is the result of order(sort(x)), if x is a vector of length 100, and all of its elements are numbers? Explain your answer.

(Solution)

Exercise 10

In exercise 1 of this set, we practised the max function, followed by the cummax function in exercise 6. In the final exercise of this set, we’re returning to this topic, and will practise yet another function to find a maximum. While the former two functions applied to a single vector, it’s also possible to find a maximum across multiple vectors.

  1. First let’s see how max deals with multiple vectors. Create two vectors x and y, where x contains the first 5 even numbers greater than zero, and y contains the first 5 odd numbers greater than zero. Then see what max does, as in max(x, y). Is there a difference with max(y, x)?
  2. Now, try pmax(x, y), where p stands for “parallel”. Without using R, what do you intuitively expect it to return? Then check, and perhaps refine, your answer with R.
  3. Now try to find the parallel minimum of x and y. Again, first try to write down the output you expect. Then check with R (I assume you can guess the appropriate name of the function).
  4. Let’s move from two to three vectors. In addition to x and y, add -x as a third vector. Write down the expected output for the parallel minima and maxima, then check your answer with R.
  5. Finally, let’s find out how pmax handles vectors of different lengths. Write down the expected output for the following statements, then check your answer with R.
  • pmax(x, 6)
  • pmax(c(x, x), y)
  • pmin(x, c(y, y), 3)

(Solution)




Working With Vectors

In the previous exercise set we practised vectors as a data structure. As I noted at the beginning of that set, perhaps you were already familiar with data in a vector-like structure in other applications such as Microsoft Excel or SPSS. If so, perhaps you also used those data to carry out calculations. In this set, we’re going to practise all sorts of calculations with vectors, from basic operations like addition and multiplication to somewhat more advanced arithmetic.

This is the second set in a series of five: in the first set (posted last week) we practised the basics of vectors. In sets three and four (upcoming) we will practise more vector arithmetic to, e.g., calculate all kinds of statistics, carry out simulations, sort data, or calculate the distance between two cities.

If you can’t wait till all sets are posted: you can find them right now in my ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

Exercise 1

Let’s create the following vectors:

u <- 4

v <- 8

Use the elementary arithmetic operators +, -, *, /, and ^ to:

  1. add u and v
  2. subtract v from u
  3. multiply u by v
  4. divide u by v
  5. raise u to the power of v

(Solution)

Exercise 2

Now, suppose u and v are not scalars, but vectors with multiple elements:

u <- c(4, 5, 6)

v <- c(1, 2, 3)

Without using R, write down what you expect as the result of the same operations as in the previous exercise:

  1. add u and v
  2. subtract v from u
  3. multiply u by v
  4. divide u by v
  5. raise u to the power of v

(Solution)

Exercise 3

We just saw how arithmetic operators work on vectors with the same length. But how about vectors that differ in length? Let’s find out… Consider the following vectors:

u <- c(5, 6, 7, 8)

v <- c(2, 3, 4)

Without using R, write down what you expect as the result of the same operations as in the previous exercise:

  1. add u and v
  2. subtract v from u
  3. multiply u by v
  4. divide u by v
  5. raise u to the power of v

Then check your answer with R. Which rule does R use when it has to deal with vectors of different lengths?

(Solution)

Exercise 4

When we want to carry out a series of arithmetic operations, we can either use a single expression, or a series of expressions. Consider two vectors u and v:

u <- c(8, 9, 10)

v <- c(1, 2, 3)

We can create a new vector w in a single line of code:

w <- (2 * u + v) / 10

or carry out each operation on a separate line:

w <- 2 * u

w <- w + v

w <- w / 10
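One convenient way to confirm that both approaches agree is to store the two results under different names and compare them with identical() (the names w_single and w_steps are just illustrative):

```r
u <- c(8, 9, 10)
v <- c(1, 2, 3)

# single expression
w_single <- (2 * u + v) / 10

# the same calculation, one operation per line
w_steps <- 2 * u
w_steps <- w_steps + v
w_steps <- w_steps / 10

w_single
## [1] 1.7 2.0 2.3
identical(w_single, w_steps)
## [1] TRUE
```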

Convert the following expressions to separate operations, and check that both approaches give the same result:

  1. w <- (u + 0.5 * v) ^ 2
  2. w <- (u + 2) * (u - 5) + v
  3. w <- (u + 2) / ((u - 5) * v)

(Solution)

Exercise 5

We can do the reverse as well. Convert the following multi-line operations to a single expression. Check that both approaches give the same result.

Part a:

w <- u + v

w <- w / 2

w <- w + u

Part b:

w1 <- u^3

w2 <- u - v

w <- w1 / w2

(Solution)

Intro to Exercises 6, 7, and 8

Exercises 6, 7, and 8 focus on mathematics. Sooner or later you might have to translate mathematical formulas into R, to perform simple or more elaborate mathematical calculations. The goal of these exercises is to practise just that: how to translate a mathematical expression to R code. So, we won’t delve into the mathematics behind these formulas or their derivation.

Also, in some cases, these formulas have already been translated to R by others, and are available in so-called contributed packages. We will deal with using these packages at a later stage, and for now the goal is just to become familiar with implementing mathematical formulas in R yourself.

So, here’s the deal: if you really hate math, or know for sure you will never use math in R, then it’s ok to skip exercises 6, 7, and 8. Otherwise: let’s go for it and enjoy!

Exercise 6

Besides the arithmetic operators we have used so far, there are some functions that we often use: log, exp, and sqrt. We can also use the well-known constant pi, by simply typing pi, instead of its value 3.1415927.
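A few quick calls show how these are used. Note that log() computes the natural logarithm unless you supply a base:

```r
log(100)             # natural logarithm (base e)
## [1] 4.60517
log(100, base = 10)  # logarithm with another base
## [1] 2
exp(1)               # e raised to the power 1
## [1] 2.718282
sqrt(16)             # square root
## [1] 4
pi                   # built-in constant
## [1] 3.141593
```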

Let’s try to apply what we have learned so far to some well-known, somewhat more advanced formulas. Don’t let the math scare you. Just translate the formulas to R code, one operator at a time. Don’t hesitate to use multiple lines if that makes things easier, or add parentheses to make sure operations are carried out in the right order.

  1. Suppose the area of a circle equals 25. What is the radius?
  2. What is the probability density at x=0 of a normally distributed random variable x with mean (mu) equal to zero and standard deviation (sigma) equal to one? (Look up the formula online, e.g. https://en.wikipedia.org/wiki/Normal_distribution.)

(Solution)

Exercise 7

Consider the following formula to calculate the number of mortgage payment terms: \[n=\frac{\ln \Bigg(\dfrac{i}{\dfrac{M}{P}-i}+1\Bigg)}{\ln(1+i)}\] In this equation, M represents the monthly payment amount, P the principal, and i the (monthly) interest rate.

  1. Calculate the number of payment terms n for a mortgage with a principal balance of 200,000, a monthly interest rate of 0.5%, and a monthly payment amount of 2000.
  2. Now construct a vector n of length 6 with the results of this calculation for a series of monthly payment amounts: 2000, 1800, 1600, 1400, 1200, 1000.
  3. Does the last value of n surprise you? Can you explain it?

(Solution)

Exercise 8

Suppose you have geographical data and want to calculate the distance between two places on earth, given by their latitude and longitude coordinates. Consider the coordinates for:

  • Paris: 48.8566° N (latitude), 2.3522° E (longitude), and
  • New York: 40.7128° N (latitude), 74.0060° W (longitude)

If you’re up for a real challenge, look up “Great-circle distance” on Wikipedia, and use the spherical law of cosines to find the distance (and stop reading right now!).

If this sounds like a pretty daunting task, don’t worry! I will walk you through this step-by-step in the remainder of this exercise.

Ok, here we go. We will use the following common abbreviations:

  • latitude (\(\phi\)) phi
  • longitude (\(\lambda\)) lambda
  1. Create 4 scalars phi.paris, phi.ny, lambda.paris, lambda.ny, representing these coordinates. Because New York lies west of the prime meridian, you have to enter its longitude as a negative value (-74.0060).
  2. Convert the 4 coordinates from degrees to radians, using the formula: \[radians = degrees \frac{\pi}{180}\]
  3. Calculate the central angle between both cities, using the spherical law of cosines: \[\Delta\sigma=\arccos(\sin \phi_1 \sin \phi_2 + \cos \phi_1 \cos \phi_2 \cos(\Delta\lambda))\] where \(\Delta\sigma\) is just a scalar (name it anything you want in R), \(\phi_1\) is the latitude of Paris, \(\phi_2\) the latitude of New York, and \(\Delta\lambda\) the absolute difference between both longitudes. Hint: for this calculation you need the following mathematical functions in R: sin, cos, acos, and abs.
  4. Finally, to find the distance, multiply \(\Delta\sigma\) (i.e., the outcome you just calculated) by the radius of the earth (6371 km).
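Before plugging in Paris and New York, it can help to test the formula on a case where the answer is known. Two points on the equator that are 90° of longitude apart lie a quarter of a great circle apart, so the central angle should be \(\pi/2\) and the distance a quarter of the earth’s circumference. The coordinates below are made up for this check only:

```r
phi1 <- 0;  lambda1 <- 0               # a point on the equator (radians)
phi2 <- 0;  lambda2 <- 90 * pi / 180   # 90 degrees further east, converted to radians

delta_lambda <- abs(lambda1 - lambda2)
delta_sigma  <- acos(sin(phi1) * sin(phi2) +
                     cos(phi1) * cos(phi2) * cos(delta_lambda))

delta_sigma          # should be (approximately) pi / 2
## [1] 1.570796
delta_sigma * 6371   # distance in km
## [1] 10007.54
```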

(Solution)

Exercise 9

Use the modulo operator (%%) to find out for which of the following pairs the second number is a multiple of the first. Your R code should contain the modulo operator just once!

530, 1429410

77, 13960

231, 2425

8, 391600
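As a reminder of how %% works: x %% y returns the remainder after dividing x by y, so a remainder of zero means that x is a multiple of y. The numbers below are arbitrary:

```r
17 %% 5     # 17 = 3 * 5 + 2
## [1] 2
20 %% 5     # 20 = 4 * 5 + 0, so 20 is a multiple of 5
## [1] 0
1234 %% 10  # remainder after dividing by 10: the last digit
## [1] 4
```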

(Solution)

 

Congratulations, you’re done with this set!

May I kindly ask you to share your thoughts on these exercises? This will allow me to further improve the quality of the exercises.

You can share your thoughts simply by adding a comment below. I am particularly interested in:

  • Which exercises you liked least and most
  • Which (if any) exercises were too hard, and should be simplified
  • Which (if any) exercises were too easy and should be more challenging
  • Overall comments on the quality of this set
  • Which topics you’d like to see addressed in future sets



Creating vectors

A vector is the most elementary way to store and structure data in R. For now, think of it as a list of numbers, which can be as short as a single number, or as long as billions(!) of numbers. Perhaps you are already used to working with lists of numbers in a spreadsheet application (e.g., a row or column filled with numbers in Microsoft Excel), or in a statistics package (e.g., a numeric variable in SPSS or SAS). However, in R, vectors have many applications that go well beyond the examples I just mentioned. R vectors are the basic building blocks underlying the fanciest dashboards, interactive apps, machine learning models, tables, and figures.

Because vectors are such a key concept, we’re going to practise their use and application slowly, step-by-step. For now we’ll just practise numeric vectors (and save other types, such as character vectors, for later).

In this set we’re practising the basics of vectors, i.e. how to create vectors and assign them to a name. It is the first set in a series of five: in the second set (posted next week) we will practise working with vectors. In sets three and four we will practise vector arithmetic to, e.g., calculate all kinds of statistics, carry out simulations, sort data, or calculate the distance between two cities.

If you can’t wait till all sets are posted: you can find them right now in our ebook Start Here To Learn R – vol. 1: Vectors, arithmetic, and regular sequences. The book also includes the fifth and final set of the series. This final set focuses on the application of the concepts you learned in the first four sets, to real-world data.

Exercise 1

Let’s start really easy (don’t worry, we’ll quickly move to more challenging problems) with a vector containing just a single number, which we also call a scalar. Enter a vector in R, by just typing a random number, e.g. 100, at the prompt and hit the Enter key.

(Solution)

Exercise 2

Great! You just created your first vector! Now, let’s enter a vector with more than one number, e.g. a vector containing the numbers 1, 2, 3, 4, 5, in that order. If you enter these numbers just like this, R will respond with an error message, because we first have to tell R explicitly that we want to store those numbers in a vector. We use the following notation for this:

c(1, 2, 3, 4, 5)

Now, enter a vector with the first 5 even numbers in R, and hit Enter.

(Solution)

Exercise 3

Let’s now enter a much longer vector, containing the numbers 1 to 10, 10 times (use copy & paste). What do the numbers between square brackets in the R output mean?

(Solution)

Exercise 4

You should be pretty familiar with entering vectors now. You might actually feel a little bored by typing all these numbers. Life would be pretty miserable if we had to enter data this way over and over again in R. Fortunately, there is a neat solution! We can assign a vector to a variable name, so that we can conveniently retrieve the data we entered by just typing the name of the variable.

Try to assign a vector containing the numbers 1, 2, 3, 4, 5 to a variable named a, using the assignment operator (<-), and see which of the statements below work.

Enter each of the 9 statements one at a time at the prompt, hit Enter, and try to retrieve the contents of a, by typing a at the prompt after you entered each statement:

  1. a<-c(1, 2, 3, 4, 5)
  2. a <- c(50, 60, 70, 80, 90)
  3. a -> c(20, 31, 42, 53, 64)
  4. c(5, 6, 7, 9, 10) <- a
  5. c(101, 102, 103, 104, 105) -> a
  6. a < - c(11, 12, 13, 14, 15)
  7. a < -c(100, 99, 88, 77, 66)
  8. assign(a, c(1000, 2000, 3000, 4000, 5000))
  9. assign('a', c(83, 16, 35, 58, 3))

(Solution)

Exercise 5

In an R script, you might have created dozens or even hundreds of vectors. In that case, naming them a, b, c etc. is not ideal, because it will be difficult to keep track of what all those letters actually mean. This problem is easily mitigated by using longer, and meaningful, variable names.

Assign the following vectors to a meaningful variable name:

  1. c(2, 4, 6, 8, 10, 12, 14, 16, 20)
  2. 0
  3. 3.141593
  4. c(1, 10, 100, 1000, 10000, 100000)

(Solution)

Exercise 6

Create vectors that correspond to the following variable names:

  1. bmi
  2. age
  3. daysPerMonth
  4. firstFivePrimeNumbers

(Solution)

Exercise 7

So far, we have created vectors from a bunch of numbers. Instead of numbers, however, you can also enter other vectors, e.g. c(vector1, vector2, vector3), and string them together.
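For instance, with two made-up vectors of different lengths:

```r
first  <- c(1, 2)
second <- c(10, 20, 30)

combined <- c(first, second)   # elements of first, followed by those of second
combined
## [1]  1  2 10 20 30
length(combined)
## [1] 5
```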

To practise this, let’s first create three vectors that each contain just 1 element with variable names p, q, and r, and values 1, 2, and 3. Then, create a new vector that contains multiple elements, using the scalars we just created. I.e., create a vector u of length 3, with the subsequent elements of p, q and r.

(Solution)

Exercise 8

To play with this a little more, let’s create a longer vector, using only the assignment operator (<-), the c() function, and the vector u we just created. I.e., create a new vector u with length 96 that contains the elements of u as follows: 1, 2, 3, 1, 2, 3, …., 1, 2, 3

(Solution)

Congratulations, you’re done with this set!




Comparing Vectors Exercises

When first learning R, it is handy to be able to tell whether two columns in a dataset are different. This is particularly useful if you’re testing out transformations by tweaking parameters and seeing what changes. In this exercise set, we will explore methods for comparing vectors.

Exercises in this section will be solved using base R and the dplyr package. It is recommended to take a look at the documentation for functions as you encounter them before continuing.

Answers to the exercises are available here.

Exercise 1
Load the mtcars dataset.

Exercise 2
Check that hp does not equal mpg and that am equals itself.

Hint: an easy way to do this for small data sets is to return and inspect a logical vector.
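For example, with two short made-up vectors (not columns of mtcars), == compares element by element and returns a logical vector, which all() and which() can then summarize:

```r
a <- c(1, 2, 3, 4)
b <- c(1, 5, 3, 0)

a == b          # element-wise comparison: a logical vector
## [1]  TRUE FALSE  TRUE FALSE
a != b          # the opposite question
## [1] FALSE  TRUE FALSE  TRUE
all(a == b)     # TRUE only if every element matches
## [1] FALSE
which(a != b)   # positions where the vectors differ
## [1] 2 4
```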

Exercise 3
Add noise to mpg by generating a random Poisson value with mean one and adding it to mpg. Call this new column mpg_noise. Compare mpg and mpg_noise.

Hint: you can use the rpois function to generate the random term.
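rpois(n, lambda) draws n random counts (0, 1, 2, ...) from a Poisson distribution with mean lambda. A minimal sketch on a made-up vector; the seed value 1 is arbitrary and only makes the draws reproducible:

```r
set.seed(1)
x <- c(10, 20, 30, 40, 50)              # made-up values
noise <- rpois(length(x), lambda = 1)   # one random count per element
x_noise <- x + noise

noise
x_noise - x    # equals noise; a 0 marks an element that did not change
```

Because a Poisson draw can be zero, some elements may be left unchanged, which is worth keeping in mind when comparing the two vectors.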

Exercise 4
Return the indices of the rows which have changed.

Exercise 5
Return a data frame containing only those observations that did not change.

Learn more about data structures in the online course R: Complete Data analysis solutions. In this course you will learn how to:

  • work with different basic data structures,
  • compare and develop different data preprocessing techniques using the tidyverse and base R,
  • and much more.

Exercise 6
In the dataset from Exercise 3, compare mpg and mpg_noise using all.equal.
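all.equal() tests whether two objects are equal up to a small numerical tolerance, which makes it more forgiving than == for floating-point numbers. When the objects differ, it returns a character description of the difference rather than FALSE, so wrap it in isTRUE() when you need a plain TRUE or FALSE. A small illustration with made-up numbers:

```r
(0.1 + 0.2) == 0.3            # exact comparison fails due to floating-point rounding
## [1] FALSE
all.equal(0.1 + 0.2, 0.3)     # equal within tolerance
## [1] TRUE

res <- all.equal(c(1, 2, 3), c(1, 2, 4))
is.character(res)             # a description of the difference, not FALSE
## [1] TRUE
isTRUE(res)
## [1] FALSE
```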

Exercise 7
Now, let’s consider one method for comparing two rows from similar datasets. From mtcars, create a dataset with a new variable called nonsense equal to the product of mpg and hp.

Exercise 8
From the dplyr package, use the bind_rows function to combine the first observations from each of the datasets from Exercise 3 and Exercise 7.

What does bind_rows do that rbind cannot?

Exercise 9
Transpose the two-row data frame from Exercise 8.

Exercise 10
Call all.equal on each column of the dataframe from Exercise 9. Is the result what you expected?




How to create your first vector in R

Are you an expert R programmer? If so, this is *not* for you. This is a short tutorial for R novices, explaining vectors, a basic R data structure. Here’s an example:

10 150 30 45 20.3

And here’s another one:

-5 -4 -3 -2 -1 0 1 2 3

still another one:

"Darth Vader" "Luke Skywalker" "Han Solo"

and our final example:

389.3491

These examples show that a vector is, simply speaking, just a collection of one (fourth example) or more (first and second examples) numbers, or character strings (third example). In R, a vector is considered a data structure (a very simple one; we’ll cover more complex structures in another tutorial).

So, how can we set up and use a vector in R?

We can construct a vector from a series of individual elements, using the c() function, as follows:

c(10, 150, 30, 45, 20.3)
## [1]  10.0 150.0  30.0  45.0  20.3

(In examples like these, lines starting with ## show the output from R on the screen).

Assigning a vector

As you’ll see, once you have entered the vector, R will respond by displaying its elements. In many cases it will be convenient to refer to this vector using a name, instead of having to enter it over and over again. We can accomplish this using the assign() function, which is equivalent to the <- and = operators:

assign('a', c(10, 150, 30, 45, 20.3))
a <- c(10, 150, 30, 45, 20.3)
a = c(10, 150, 30, 45, 20.3)

The second statement (using the <- operator) is the most common way of assigning in R, and we’ll therefore use this form rather than the = operator or the assign() function.

Once we have assigned a vector to a name, we can refer to the vector using this name. For example, if we type a, R will now show the elements of vector a.

a
## [1]  10.0 150.0  30.0  45.0  20.3

Instead of a, we could have chosen any other name, e.g.:

aVeryLongNameWhichIsCaseSensitive_AndDoesNotContainSpaces <- c(10, 150, 30, 45, 20.3)
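Because names are case-sensitive, two names that differ only in capitalization refer to two different objects. A short illustration (the names are arbitrary):

```r
myVector <- c(1, 2, 3)
myvector <- c(4, 5, 6)    # a different name, because R is case-sensitive

myVector
## [1] 1 2 3
myvector
## [1] 4 5 6
identical(myVector, myvector)
## [1] FALSE
```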

Strictly speaking, the vector is an object, and the name is what we use to refer to it; hence we speak of a named object.

To familiarize yourself with the vector data structure, now try to construct a couple of vectors in R and assign them to a named object, as in the example above.

Learn more about vectors in the online course R Programming A-Z™: R For Data Science With Real Exercises! This course had more than 68,000 students enrolled already and does not require prior knowledge of R.

To summarize: A vector is a data structure, which can be constructed using the c() function, and assigned to a named object using the <- operator.

Now, let’s move on to the first set of real exercises on vectors!




Vectors Vol. 2 Exercises


[For this exercise, first write down your answer, without using R. Then, check your answer using R.]

Answers to the exercises are available here.

Exercise 1

Consider two vectors, x, y
x=c(4,6,5,7,10,9,4,15)
y=c(0,10,1,8,2,3,4,1)

What is the value of: x*y

Exercise 2

Consider two vectors, a, b

a=c(1,2,4,5,6)
b=c(3,2,4,1,9)
What is the value of: cbind(a,b)

Exercise 3

Consider two vectors, a, b

a=c(1,5,4,3,6)
b=c(3,5,2,1,9)
What is the value of: a<=b

Exercise 4

Consider two vectors, a, b

a=c(10,2,4,15)
b=c(3,12,4,11)
What is the value of: rbind(a,b)

Exercise 5

If x=c(1:12)
What is the value of: dim(x)
What is the value of: length(x)

Exercise 6

If a=c(12:5)
What is the value of: is.numeric(a)

Exercise 7

Consider two vectors, x, y

x=c(12:4)
y=c(0,1,2,0,1,2,0,1,2)
What is the value of: which(!is.finite(x/y))

Exercise 8

Consider two vectors, x, y

x=letters[1:10]
y=letters[15:24]
What is the value of: x<y

Exercise 9

If x=c('blue','red','green','yellow')
What is the value of: is.character(x)

Exercise 10

If x=c('blue',10,'green',20)
What is the value of: is.character(x)

Want to practice vectors a bit more? We have more exercise sets on this topic here.




Regular Sequences Vol. 2 Exercises


[For this exercise, first write down your answer, without using R. Then, check your answer using R.]

Answers to the exercises are available here.

Exercise 1

If x <- c(a = 1, b = 2, c = 3, d = 4)
What is the output for the code:
seq(5,11,along.with =x)

Exercise 2

If x = seq(4,12,4),
what is the output for the code:
rep(x,each=2)

Exercise 3

What is the output for the code:
seq(5,11,by=2,length.out=3)

Exercise 4

What is the output for the code:
rep(letters[1:10],3)

Exercise 5

Create a sequence with values:
100 95 90 85 80 75 70 65 60 55 50

Exercise 6

What is the output for the code:
seq(10,0,by=5)

Exercise 7

What is the output for the code:
seq(2,10,by=4)==c(2,6,10)

Exercise 8

What is the output for the code:
rep(c('seq','rep'),each=4)

Exercise 9

Consider two variables, A and B,
A= as.Date("2016-11-01")

B = as.Date("2016-11-15")

What is the output for the code:
seq.Date(A,B, by = "1 day")

Exercise 10
Consider two variables, C and D,

C= as.Date("2016-02-01")

D = as.Date("2016-06-15")

What is the output for the code:
seq.Date(D,C, by = "-1 month")

 

Want to practice regular sequences a bit more? We have more exercise sets on this topic here.




Index vectors

In the exercises below we cover the basics of index vectors. Before proceeding, first read section 2.7 of An Introduction to R, and the help pages for the sum and which functions.
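Section 2.7 of An Introduction to R describes four kinds of index vector: positive integers, negative integers, logical vectors and character vectors of names. The sketch below tries each kind on a small named vector, then combines a logical index with which and sum. The vector v and its names are invented for illustration only.

```r
# A small named numeric vector to index into
v <- c(alpha = 10, beta = 20, gamma = 30, delta = 40)

v[c(1, 3)]                      # positive integers select positions 1 and 3 (alpha, gamma)
v[-1]                           # a negative integer drops position 1 (keeps beta, gamma, delta)
v[c(TRUE, FALSE, TRUE, FALSE)]  # a logical vector keeps the elements marked TRUE
v[c("beta", "delta")]           # a character vector selects elements by name
which(v > 15)                   # positions where the condition is TRUE: 2 3 4
sum(v[v > 15])                  # sum of the selected elements: 90
```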

Answers to the exercises are available here.

Exercise 1
If x <- c("ww", "ee", "ff", "uu", "kk"), what will be the output for x[c(2,3)]?
a. "ee", "ff"
b. "ee"
c. "ff"

Exercise 2
If x <- c("ss", "aa", "ff", "kk", "bb"), what will be the third value in the index vector operation x[c(2, 4, 4)]?
a. "uu"
b. NA
c. "kk"

Exercise 3
If x <- c("pp", "aa", "gg", "kk", "bb"), what will be the fourth value in the index vector operation x[-2]?
a. "aa"
b. "gg"
c. "bb"

Exercise 4
Let a <- c(2, 4, 6, 8) and b <- c(TRUE, FALSE, TRUE, FALSE), what will be the output for the R expression max(a[b])?

Exercise 5
Let a <- c(3, 4, 7, 8) and b <- c(TRUE, TRUE, FALSE, FALSE), what will be the output for the R expression sum(a[b])?

Exercise 6
Write an R expression that will return the sum 10 for the vector x <- c(2, 1, 4, 2, 1, NA).

Exercise 7
If x <- c(1, 3, 5, 7, NA), write an R expression that will return the output 1, 3, 5, 7.

Exercise 8
Consider the data frame s <- data.frame(first= as.factor(c("x", "y", "a", "b", "x", "z")), second=c(2, 4, 6, 8, 10, 12)). Write an R statement that will return the output 2, 4, 10, by using the variable first as an index vector.

Exercise 9
What will be the output for the R expression (c(FALSE, TRUE)) || (c(TRUE, TRUE))?

Exercise 10
Write an R expression that will return the positions of 3 and 7 in the vector x <- c(1, 3, 6, 7, 3, 7, 8, 9, 3, 7, 2).




Character vector exercises

In the exercises below we cover the basics of character vectors. Before proceeding, first read section 2.6 of An Introduction to R, and the help pages for the nchar, substr and sub functions.
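The three functions named above work element by element on a character vector. The sketch below applies each one to a made-up two-element vector s; gsub is added only to contrast with sub, which replaces just the first match in each element.

```r
s <- c("apple pie", "banana")

nchar(s)           # characters per element, spaces included: 9 6
substr(s, 1, 3)    # characters 1 to 3 of each element: "app" "ban"
sub("a", "A", s)   # first "a" in each element only: "Apple pie" "bAnana"
gsub("a", "A", s)  # every "a" in each element: "Apple pie" "bAnAnA"
```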

Answers to the exercises are available here.

Exercise 1
If x <- "Good Morning! ", find the number of characters in x.
a. 1
b. 14
c. 13

Exercise 2
Consider the character vector x <- c("Nature’s", "Best "). How many characters are there in x?
a. 12
b. 13
c. 8,5

Exercise 3
If x <- c("Nature’s", " At its best "), how many characters are there in x?
a. 19
b. 8, 13
c. 8, 9

Exercise 4
If fname <- "James" and lname <- "Bond", write some R code that will produce the output "James Bond".

Exercise 5
If m <- "Capital of America is Washington", extract the string "Capital of America" from the character vector m.

Exercise 6
Write some R code to replace the first occurrence of the word "failed" with "failure" in the string "Success is not final, failed is not fatal".

Exercise 7
Consider two character vectors:
Names <- c("John", "Andrew", "Thomas") and
Designation <- c("Manager", "Project Head", "Marketing Head").
Write some R code to obtain the following output.
   Names    Designation
1   John        Manager
2 Andrew   Project Head
3 Thomas Marketing Head

Exercise 8
Write some R code that will initialise a character vector with a fixed length of 10.

Exercise 9
Write some R code that will generate a vector with the following elements, without using loops.
"aa" "ba" "ca" "da" "ea" "ab" "bb" "cb" "db" "eb" "ac" "bc" "cc" "dc" "ec"
"ad" "bd" "cd" "dd" "ed" "ae" "be" "ce" "de" "ee"

Exercise 10
Let df <- data.frame(Date = c("12/12/2000 12:11:10")). Write some R code that will convert the given date to character values and give the following output:
"2000-12-12 12:11:10 GMT"




Missing values

Today we’re practising how to handle missing values in a data set. Before starting the exercises, please first read section 2.5 of An Introduction to R.
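Two behaviours of NA come up again and again in the exercises below: a comparison with NA is itself NA (so is.na is the way to test for missing values), and NA propagates through arithmetic unless it is explicitly dropped. The sketch below shows both on an invented vector z.

```r
z <- c(4, NA, 7, NA, 1)

is.na(z)              # tests each element: FALSE TRUE FALSE TRUE FALSE
z == NA               # every comparison with NA is NA: NA NA NA NA NA
sum(z)                # NA, because the missing values propagate
sum(z, na.rm = TRUE)  # drop the NAs before summing: 12
```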

Solutions are available here.

Exercise 1
If X <- c(22,3,7,NA,NA,67), what will be the output of the R statement length(X)?

Exercise 2
If X = c(NA,3,14,NA,33,17,NA,41), which R statement will remove all occurrences of NA in X?
a. X[!is.na(X)]
b. X[is.na(X)]
c. X[X==NA]= 0

Exercise 3
If Y = c(1,3,12,NA,33,7,NA,21) what R statement will replace all occurrences of NA with 11?
a. Y[Y==NA]= 11
b. Y[is.na(Y)]= 11
c. Y[Y==11] = NA

Exercise 4
If X = c(34,33,65,37,89,NA,43,NA,11,NA,23,NA), which R statement will count the number of occurrences of NA in X?
a. sum(X==NA)
b. sum(X == NA, is.na(X))
c. sum(is.na(X))

Exercise 5
Consider the vector W <- c(11, 3, 5, NA, 6).
Write some R code that will return TRUE for each value of W that is missing.

Exercise 6
Load the Orange dataset using the command data(Orange). Replace all values of age equal to 118 with NA.

Exercise 7
Consider the vector A <- c(33, 21, 12, NA, 7, 8).
Write some R code that will calculate the mean of A, ignoring the missing value.

Exercise 8
Let:
c1 <- c(1,2,3,NA) ;
c2 <- c(2,4,6,89) ;
c3 <- c(45,NA,66,101) .
If X <- rbind(c1, c2, c3, deparse.level = 1), write some R code that will display all rows with missing values.

Exercise 9
Consider the data frame df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, 56, 60), Price = c(34, 52, 21, 44, 20), stringsAsFactors = FALSE).
Write some R code that will return a data frame without the rows that have NA in the Name column.

Exercise 10
Consider the data frame df <- data.frame(Name = c(NA, "Joseph", "Martin", NA, "Andrea"), Sales = c(15, 18, 21, NA, 60), Price = c(34, 52, 33, 44, NA), stringsAsFactors = FALSE).
Write some R code that will remove all rows containing NA values and give the following output:

    Name Sales Price
2 Joseph    18    52
3 Martin    21    33