In this set of exercises, we are going to explore some of the probability functions in R through practical applications. Basic knowledge of probability is required.
Note: We are going to use R's random number functions, such as runif. A problem with these functions is that every time you run them you obtain a different value. To make your results reproducible, you can fix the seed by calling set.seed(n), where n is any number, before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random numbers.) For this set of exercises we will use set.seed(1); don't forget to call it before every random exercise.
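As a small illustration of the note above, the sketch below shows that re-setting the same seed reproduces the same draws. The seed value 123 and the object names are arbitrary choices for this example, not part of the exercises.

```r
# Without a fixed seed, two calls to runif() give different draws.
a <- runif(3)
b <- runif(3)
identical(a, b)   # FALSE, except by extraordinary coincidence

# Re-setting the same seed before each call reproduces the same draws.
# The seed value 123 is arbitrary and used only for this illustration.
set.seed(123)
first_draw <- runif(3)
set.seed(123)
second_draw <- runif(3)
identical(first_draw, second_draw)   # TRUE
```

This is why the exercises ask you to call set.seed(1) again before each random function, not just once at the top of your script.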
Answers to the exercises are available here
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Generating random numbers. Set your seed to 1, generate 10 random numbers using runif, and save them in an object called random_numbers.
Exercise 2
Using the function ifelse and the object random_numbers, simulate coin tosses. Hint: If a value in random_numbers is greater than 0.5, the result is heads; otherwise it is tails.
Another way of generating random coin tosses is to use the rbinom function. Set the seed again to 1 and use this function to simulate 10 coin tosses. Note: The value you will obtain is the total number of heads in those 10 coin tosses.
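The note about rbinom returning a total can be easier to see by comparing two ways of calling it. The sketch below is only an illustration: the seed 2024, the 5 tosses, and the object names are arbitrary and deliberately differ from the exercise values.

```r
# Seed 2024 and n_tosses = 5 are arbitrary; they are not the exercise values.
set.seed(2024)
n_tosses <- 5

# One draw from Binomial(size = 5, prob = 0.5): a single count of successes.
total_heads <- rbinom(n = 1, size = n_tosses, prob = 0.5)

# Five draws from Binomial(size = 1, prob = 0.5): one 0/1 outcome per toss.
single_tosses <- rbinom(n = n_tosses, size = 1, prob = 0.5)

length(total_heads)     # 1
length(single_tosses)   # 5
sum(single_tosses)      # number of successes among the individual tosses
```

The two objects come from separate random draws, so total_heads and sum(single_tosses) do not have to agree. Which outcome counts as a "success" (heads or tails) is a labelling choice made by whoever writes the simulation.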
Exercise 3
Use the function rbinom to generate 10 unfair coin tosses with a success probability of 0.3. Set the seed to 1.
Exercise 4
We can simulate rolling a die in R with runif. Save in an object called die_roll 1 random number with min = 0 and max = 6. This means that we will generate a random number between 0 and 6.
Apply the function ceiling to die_roll to round it up to a whole number from 1 to 6. Don't forget to set the seed to 1 before calling runif.
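To check that this approach really behaves like a fair die, one option is to repeat it many times and tabulate the results. The sketch below assumes 6000 rolls and a seed of 99, both arbitrary; many_rolls is a name made up for this illustration.

```r
# Hypothetical check with many rolls; the seed 99 and the object name are arbitrary.
set.seed(99)
many_rolls <- ceiling(runif(6000, min = 0, max = 6))

# Every value should be a whole number from 1 to 6,
# and each face should appear roughly 1000 times.
table(many_rolls)
range(many_rolls)
```

Because runif draws from the interval between min and max, ceiling maps (0, 1] to 1, (1, 2] to 2, and so on up to 6, so each face covers an interval of the same length.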
Exercise 5
Simulate normal distribution values. Imagine a population in which the average height is 1.70 m with a standard deviation of 0.1 m. Using rnorm, simulate the heights of 100 people and save them in an object called heights.
To get an idea of the values in heights, apply the function summary to it.
Exercise 6
a) What's the probability that a person will be shorter than or equal to 1.90 m? Use pnorm.
b) What's the probability that a person will be taller than or equal to 1.60 m? Use pnorm.
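As a hint on how pnorm handles the two directions, here is a minimal sketch on the standard normal distribution (mean 0, standard deviation 1), not on the heights from the exercise. The cutoff of 1 and the object names are arbitrary.

```r
# Standard normal used only for illustration (mean = 0, sd = 1 are pnorm's defaults).
cutoff <- 1

p_below <- pnorm(cutoff)                       # P(X <= 1), about 0.8413
p_above <- pnorm(cutoff, lower.tail = FALSE)   # P(X > 1)

p_below + p_above                  # the two tails sum to 1
all.equal(p_above, 1 - p_below)    # TRUE
```

Since the normal distribution is continuous, a single point has probability 0, so "less than" and "less than or equal to" give the same answer, and likewise for "greater than".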
Exercise 7
The waiting time (in minutes) at a doctor's clinic follows an exponential distribution with a rate parameter of 1/50. Use the function rexp to simulate the waiting time of 30 people at the doctor's office.
Exercise 8
What's the probability that a person will wait less than 10 minutes? Use pexp.
Exercise 9
What's the average waiting time?
Exercise 10
Let's assume that patients with a waiting time longer than 60 minutes leave. Out of 100 patients who arrive at the clinic, how many are expected to leave? Use pexp.
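If the exponential functions are new to you, the sketch below shows how pexp relates to the closed-form formula and how a probability turns into an expected count. All numbers here (a rate of 1/20, a cutoff of 30 minutes, 200 arrivals) are hypothetical and deliberately different from the exercises.

```r
# Hypothetical values, deliberately different from the exercises.
rate <- 1 / 20   # events per minute, so the mean waiting time is 1 / rate = 20
cutoff <- 30

p_less <- pexp(cutoff, rate = rate)                       # P(T <= 30)
p_more <- pexp(cutoff, rate = rate, lower.tail = FALSE)   # P(T > 30)

all.equal(p_less, 1 - exp(-rate * cutoff))   # TRUE: closed form of the CDF
all.equal(p_more, exp(-rate * cutoff))       # TRUE

# The expected number out of n arrivals exceeding the cutoff is n * P(T > cutoff).
# It is an average, so it need not be a whole number.
n_arrivals <- 200
expected_over <- n_arrivals * p_more
expected_over
```

An expected value is a long-run average over many repetitions, which is why it can be fractional even though people arrive in whole units.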
Nice exercise set!
For exercise 10, you probably meant to use pexp().
Also the solution is for 100 patients.
Hi Simon,
Thanks for taking a look at the set and for letting us know about the error. I have already fixed it.
Thank you for this exercise set.
Could you take a look at the directions for exercise 4.
It says:
“Save in an object called die_roll 1 random number with min = 1 and max = 6.
This mean that we will generate a random number between 1 and 6.”
The R help for runif indicates that, except for rare cases, runif() will never
return the extreme values. That is, it returns an _open_ interval between
min & max. Looks to me like a die roll of 1 will never happen (and that
the interval from min to min +1 will always be one point short, asymptotic
to uniform, not exactly).
I checked my understanding by running the exercise with 100 draws and
never got a 1. Am I missing something?
I report this because if there is a bug, it will probably confuse people.
Lee
To be clear, when ceiling() is applied, a 1 spot will not appear, since 1.n
will be forced to 2 spots.
(I believe there is also a pedantic “short by one point” issue at the
max end.)
Thanks again. I am still working my way through, but have found
these exercises very useful.
Lee
Yeah, the mistake was min = 1; it should say min = 0. The ceiling function always rounds a number up to the nearest integer.
Hi Lee,
Thank you so much for taking the time to check the set of exercises. You are right,
the exercise should say min = 0 instead of 1. I will fix it ASAP
Francisco,
Could you check the description of Exercise 2 against the supplied answers?
The description says:
Note: The value you will obtain is the total number of heads of those 10 coin tosses.
The (kindly) supplied answer appears to give the number of _tails_ seen (by counting
the line above in the answer).
Am I missing something? Is reporting errata useful to you?
Thank you.
Lee
This one is tricky: what the function rbinom reports is the number of successful events, which in this case I decided (arbitrarily) to be heads.
The tricky part is that the probability of success and the probability of failure are both 0.5, so in this particular exercise you can treat the result as either heads or tails.
In exercise 3 this distinction is clear because the probability of success differs from the probability of failure.
Francisco,
Thank you for your timely & helpful reply. I concur that the
results of exercise 3 are clearer,
Sorry to be dense, but I still do not understand the provided
solution for exercise 2. The variable coin_tosses_1 in the
answer set clearly has 6 heads (as I would expect from the
printout of variable random_numbers in the solution to
exercise 1). The variable coin_tosses in part 2 of exercise 2
is the inverse of coin_tosses_1 and clearly differs.
If the intent of the exercise is to show two ways of calculating
the same thing, the difference is puzzling & confusing.
Thank you for any help.
Lee
The C code in rbinom.c is pretty dense but it appears
to have a floating point comparison to 0.5. Example 2, part 2
appears to be triggering the inversion of the sense of success by
falling on the unexpected side of the fencepost. Someone, somewhere
must love floating point math, perhaps its mother!
Francisco,
Could you check the match between the description of Exercise 6
and the (kindly) supplied answer?
I believe that the supplied answer is the intended exercise but that
the description contains two subtle fencepost errors. The description
uses “smaller than” and “taller than”. By my understanding of English
these translate to “strictly less than” and “strictly taller than”.
That is, _exclusive_ of the density at the endpoint.
If I understand pnorm() correctly, it returns a density _inclusive_ (=)
of the endpoint.
One can fuss around with the math to exclude the endpoint, but that makes
the problem both harder and different than the supplied answer.
My apology if I am missing something obvious.
Lee
That's an interesting comment. You're right, pnorm() returns a cumulative probability that includes the endpoint, but since the normal distribution is
continuous, the probability of the endpoint alone is 0 in a strict sense. In other words, the probability that someone's height is exactly 1.6800000 (imagine more zeros after)
is 0; there will be people very close to that value but never exactly at it.
Therefore the probability that a person will be shorter than 1.90 is the same as the probability that a person will be shorter than or equal to 1.90.
Francisco,
Could you take a look at the (mis) match of the description & solution
for Exercise 10? The relevant fragment of the description is:
Out of 100 patients that arrive to the clinic how many are expected to leave?
Given that description, I believe there is a missing truncation in the (kindly)
provided solution.
Punting a long philosophical discussion about the essential natures of people &
beers(1), people & beers come in whole units. This gives an integer result for
“how many”. Admittedly a fine point.
I believe a fix would be to either add “whole number” to the description,
change the solution to truncate (better teaching option), or to just change
the solution.
Lee
1) Agreed that the concept of beers having discrete units rather than a composite
amount is relatively recent, since bottling,