For a majority of users, the primary use of R is statistical testing and analysis. At the heart of this, within the frequentist world, lie hypothesis testing and distribution sampling.
The skill in this sort of work is identifying an appropriate distribution on which to model the question, and then testing accordingly. Conveniently, within R, the syntax for conducting tests and drawing distributional samples is very uniform. The aim of this tutorial is to expose users less familiar with the statistical theory to some of the more common distributions and their applications.
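The uniform syntax mentioned above comes from a naming convention in base R: each distribution has a short root name (norm, binom, pois, exp, t, unif) and four prefixes, r for random draws, d for the density or probability mass, p for the cumulative probability and q for quantiles. The following is a minimal sketch of that convention; the parameter values are illustrative assumptions and deliberately differ from those used in the exercises.

```r
# Illustrative values only; none of these are the exercise parameters.
set.seed(1)                              # arbitrary seed so the draws are repeatable

draws  <- rnorm(5, mean = 0, sd = 1)     # r*: five random draws from N(0, 1)
dens   <- dnorm(0, mean = 0, sd = 1)     # d*: density at x = 0, i.e. 1 / sqrt(2 * pi)
lower  <- pnorm(1.96)                    # p*: P(X <= 1.96), roughly 0.975
cutoff <- qnorm(0.975)                   # q*: value with 97.5% of the mass below it

# The same prefixes apply to other families; only the parameters change.
coin  <- dbinom(2, size = 4, prob = 0.5) # P(exactly 2 heads in 4 fair flips) = 0.375
calls <- ppois(3, lambda = 2)            # P(at most 3 events when the mean count is 2)
wait  <- pexp(1, rate = 2)               # P(waiting time <= 1 when the rate is 2)
```

Once the root name of a distribution is known, the four prefixes give sampling, density, cumulative probability and quantile functions without any further syntax to learn.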
Answers to the exercises are available here.
Exercise 1
Set the seed to 123 and plot a histogram of 1000 draws from a normal distribution with mean 10 and standard deviation 2.
Exercise 2
Using a QQ plot, assess the normality of your previously simulated draws.
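For readers who have not met QQ plots before, here is a hedged sketch of how they are typically read, using base R's qqnorm() and qqline(). The two samples (and their sizes and seed) are made up for illustration and are not the draws from Exercise 1: points that hug the reference line suggest normality, while a curved pattern suggests skew.

```r
# Hypothetical samples, chosen only to contrast a normal and a skewed shape.
set.seed(7)
normal_like <- rnorm(200)            # should track the reference line closely
skewed      <- rexp(200, rate = 1)   # right-skewed, so the points bend away

op <- par(mfrow = c(1, 2))           # two panels side by side
qqnorm(normal_like, main = "Normal draws")
qqline(normal_like)                  # line through the first and third quartiles
qqnorm(skewed, main = "Exponential draws")
qqline(skewed)
par(op)                              # restore the previous plotting layout
```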
Exercise 3
Using a t-test, test for a difference in means between your normal samples and 1000 samples from a Student’s t-distribution with 10 degrees of freedom and a non-centrality parameter (delta) of 9. Report your p-value and its significance at the 5% level.
Exercise 4
Rewrite your t-test, now testing whether your normal samples have a greater mean than your samples from the Student’s t-distribution. Report your new p-value.
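As an illustration of how the direction of a test is set (not a solution to the exercise), the sketch below runs t.test() on two hypothetical samples, first two-sided and then with alternative = "greater". The sample sizes, means and seed are assumptions made up for this example.

```r
# Hypothetical data; the means and sizes are not taken from the exercises.
set.seed(42)
a <- rnorm(50, mean = 5.0, sd = 1)
b <- rnorm(50, mean = 4.5, sd = 1)

two_sided <- t.test(a, b)                           # H1: the means differ
one_sided <- t.test(a, b, alternative = "greater")  # H1: mean of a > mean of b

two_sided$p.value
one_sided$p.value
```

Both calls use the same t statistic and degrees of freedom, so when the statistic is positive the one-sided p-value is half the two-sided one.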
Exercise 5
Putting these skills together, carry out a two-sided t-test of equal means on samples from two normal distributions: the first with mean 1 and standard deviation 0.5, the second with mean 0.9 and standard deviation 1. Hint: a function may be useful here.
Exercise 6
Replicate this t-test 1000 times and, using a QQ plot, check whether the resulting p-values follow a standard uniform distribution.
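The idea behind this exercise is that p-values are approximately uniform on [0, 1] when the null hypothesis is true. The sketch below shows the general pattern of wrapping one simulated test in a function and repeating it with replicate(). It assumes a case where the null really holds (both samples are standard normal), which is not the setup of Exercise 5; the helper name one_p_value, the sample size and the number of repetitions are hypothetical.

```r
# Hypothetical helper: simulate two samples with equal means and return the p-value.
one_p_value <- function(n = 30) {
  x <- rnorm(n)
  y <- rnorm(n)
  t.test(x, y)$p.value
}

set.seed(2024)                                  # arbitrary seed
p_vals <- replicate(500, one_p_value())         # 500 independent repetitions

# QQ plot against the theoretical quantiles of a standard uniform distribution.
qqplot(qunif(ppoints(length(p_vals))), p_vals,
       xlab = "Uniform(0, 1) quantiles", ylab = "Simulated p-values")
abline(0, 1)                                    # points near this line suggest uniformity
```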
Exercise 7
Moving on to other distributions: the probability of rolling a score greater than 3 on a loaded die is 0.75. What is the probability of rolling exactly 5 scores greater than 3 in 10 rolls of the die?
Exercise 8
Building on this, what is the probability of rolling a score greater than 3 fewer than 5 times in 10 rolls?
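Counting successes in a fixed number of independent trials is commonly modelled with the binomial distribution. As a hedged illustration with different numbers (a fair coin flipped 10 times, an assumption made up for this example), dbinom() gives the probability of an exact count and pbinom() gives a cumulative probability. Note that "fewer than 3" means "at most 2" for a discrete count.

```r
# Hypothetical setting: 10 flips of a fair coin, X = number of heads.
exactly_3    <- dbinom(3, size = 10, prob = 0.5)   # P(X = 3)  = 120 / 1024
fewer_than_3 <- pbinom(2, size = 10, prob = 0.5)   # P(X <= 2) =  56 / 1024

# The cumulative value is just the sum of the individual probabilities.
by_hand <- sum(dbinom(0:2, size = 10, prob = 0.5))
```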
Exercise 9
On average, a cashier serves 50 people per hour in their shop. What is the probability that they serve 60 people or more during one hour?
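Counts of events in a fixed time window are often modelled with a Poisson distribution; that modelling choice is an assumption here, since the exercise does not name a distribution. The sketch below uses a made-up mean of 4 events to show two equivalent ways of getting an "at least k" probability from ppoisson-style functions. Because ppois() returns P(X <= k), "6 or more" is the complement of "5 or fewer".

```r
# Hypothetical mean count, not the value from the exercise.
lambda <- 4

at_least_6 <- 1 - ppois(5, lambda)                  # 1 - P(X <= 5)
same_thing <- ppois(5, lambda, lower.tail = FALSE)  # P(X > 5), computed directly
```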
Exercise 10
For the same cashier, each transaction takes 50 seconds on average. What is the probability that a transaction is completed in less than 30 seconds?
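Waiting or service times are often modelled with an exponential distribution; again, this is an assumption, as the exercise does not name one. The point that tends to trip people up is that pexp() is parameterised by a rate, which is one over the mean. The sketch below uses a hypothetical mean of 2 minutes rather than the exercise's values.

```r
# Hypothetical mean duration (in minutes), not the value from the exercise.
mean_duration <- 2
rate <- 1 / mean_duration              # exponential rate = 1 / mean

under_1 <- pexp(1, rate = rate)        # P(T < 1), equal to 1 - exp(-1 / 2)
```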
Hi, thanks for this. I can understand Q1-5 easily, but from 6 onwards I’m not familiar with the syntax so it’s harder to grasp. Especially Q7-Q10 where I’m just totally lost as to why those commands were used.