Before starting any experiment, careful planning is needed. For instance, how many samples does your experiment require? This question is important for two reasons. First, an experiment with too few samples may not be able to detect real differences between, say, a control and an experimental group. Second, too many samples may incur unnecessary costs. These considerations are the domain of power analysis.
Statistical power is the probability of detecting a trend as statistically significant, given that the trend actually exists. For example, a statistical power of 0.8 means there is a 0.8 probability of detecting a real trend. Equivalently, there is a 0.2 probability of failing to detect it, in other words, of a false negative. Statistical power is closely linked to three other quantities: significance level, effect size, and sample size. The significance level, or α, is the probability of a false positive. An α of 0.05 therefore means there is a 0.05 probability that an experiment shows a significant trend even though no real trend exists. Effect size is the strength of the trend. Lastly, sample size is the number of samples in the experiment.
Thankfully for us, the pwr package in R has a number of tools to conduct a power analysis for common experimental designs.
# install.packages('pwr', dependencies = TRUE)  # run once if pwr is not installed
library(pwr)
Suppose we are planning an experiment with both a control and an experimental group, and we have determined that a t-test is appropriate for comparing the two groups. We want to know how many samples are necessary to be confident in our results. We use the pwr.t.test() function in R. This function has four key arguments: sample size (n), effect size (d), significance level (sig.level), and statistical power (power). We specify three of the four to obtain the fourth. For example, let us imagine that we want a significance level of 0.05 and a statistical power of 0.8, and that we anticipate an effect size (d) of 0.3. These values will depend on your particular experiment.
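The effect size d used here is Cohen's d: the difference between the two group means divided by the pooled standard deviation. As a rough sketch, d could be estimated from pilot data as shown below. The pilot measurements and the helper name cohens_d are made up for illustration, and an estimate from so few observations would be very noisy in practice.

```r
# Hypothetical pilot measurements (invented for illustration only)
control      <- c(10.1, 9.8, 11.2, 10.5, 9.9, 10.7)
experimental <- c(10.9, 10.4, 11.8, 10.6, 11.1, 10.8)

# Cohen's d: difference in means divided by the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(y) - mean(x)) / pooled_sd
}

d_est <- cohens_d(control, experimental)
d_est
```

When no pilot data are available, effect sizes reported in related studies are a common starting point. With an effect size in hand, the sample size calculation is: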
pwr.t.test(n = NULL, d = 0.3, sig.level = 0.05, power = 0.8, type = "two.sample")
##
##      Two-sample t test power calculation
##
##               n = 175.3847
##               d = 0.3
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
##
## NOTE: n is number in *each* group
We see that for our specified experimental setup, we need a sample size (n) of about 175.4 in each group, which we round up to 176 per group (352 samples in total). This is really useful to know before we start a large, expensive experiment.
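Because samples come in whole numbers, the fractional n should be rounded up rather than to the nearest integer. A minimal sketch of pulling the value out of the returned object follows; the variable names are our own.

```r
library(pwr)

# Solve for n by leaving it unspecified (it defaults to NULL)
result <- pwr.t.test(d = 0.3, sig.level = 0.05, power = 0.8, type = "two.sample")

# Round up so that the achieved power does not fall below the target
n_per_group <- ceiling(result$n)
total_n     <- 2 * n_per_group   # two groups: control and experimental

n_per_group
total_n
```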
We could also determine the statistical power given a fixed sample size of 90 per group.
pwr.t.test(n = 90, d = 0.3, sig.level = 0.05, power = NULL, type = "two.sample")
##
##      Two-sample t test power calculation
##
##               n = 90
##               d = 0.3
##       sig.level = 0.05
##           power = 0.5166416
##     alternative = two.sided
##
## NOTE: n is number in *each* group
We see that with a sample size of 90 per group and an effect size of 0.3, we only have a statistical power of about 0.52. Therefore, given that a real difference exists between our two groups, we have about a coin flip’s probability of detecting it. This is not a well-designed experiment.
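One way to build intuition for this number is to simulate the experiment many times and count how often the t-test comes out significant. The sketch below uses only base R. It assumes normally distributed data with a standard deviation of 1 in both groups, so that a mean shift of 0.3 corresponds to d = 0.3. The function name, the seed, and the number of simulations are arbitrary choices of ours.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Estimate power by simulation: the fraction of simulated experiments
# in which a two-sample t-test rejects the null at level alpha
simulate_power <- function(n, d, alpha = 0.05, n_sims = 2000) {
  significant <- replicate(n_sims, {
    control      <- rnorm(n, mean = 0, sd = 1)
    experimental <- rnorm(n, mean = d, sd = 1)  # true shift of d standard deviations
    t.test(control, experimental, var.equal = TRUE)$p.value < alpha
  })
  mean(significant)
}

sim_power <- simulate_power(n = 90, d = 0.3)
sim_power
```

The simulated proportion should land close to the analytical power reported above, with some Monte Carlo noise that shrinks as n_sims grows.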
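# Compute power across a range of per-group sample sizes
sample_sizes <- seq(5, 300, by = 5)
power_t_test <- pwr.t.test(n = sample_sizes, d = 0.3, sig.level = 0.05,
                           power = NULL, type = "two.sample")

# Plot power against sample size
par(mfrow = c(1, 1), oma = c(0, 0, 0, 0), mar = c(4.5, 4.5, 0.5, 0.5))
plot(sample_sizes, power_t_test$power, type = 'b', pch = 16, las = 1,
     ylab = 'statistical power', xlab = 'sample size',
     cex.lab = 1.4, cex.axis = 1.2, ylim = c(0, 1))

# Dashed red line marks the target power of 0.80
abline(h = 0.80, lty = 2, lwd = 2, col = 'red')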
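The point where the curve crosses the dashed line at 0.80 can also be read off programmatically. The following sketch recomputes the power for each sample size on the same grid and picks the smallest one that reaches the target. Calling pwr.t.test() once per value through sapply() is our own choice to keep the example explicit.

```r
library(pwr)

sample_sizes <- seq(5, 300, by = 5)

# Power at each per-group sample size on the grid
power_curve <- sapply(sample_sizes, function(n) {
  pwr.t.test(n = n, d = 0.3, sig.level = 0.05, type = "two.sample")$power
})

# Smallest grid value whose power is at least 0.80
min_n <- sample_sizes[which(power_curve >= 0.80)[1]]
min_n
```

Because the grid moves in steps of 5, this gives 180 rather than the exact requirement of about 175.4 per group found earlier.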
The pwr package also allows power calculations for ANOVAs, chi-squared tests, general linear models, and differences of proportions.
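As a brief illustration, the calls below solve for the sample size in each of these designs. The effect sizes and degrees of freedom are placeholder values chosen only to make the calls runnable. Note that each function uses its own effect size measure (f, w, f2, and h respectively), so these values are not interchangeable with Cohen's d.

```r
library(pwr)

# One-way ANOVA with k = 3 groups; f is Cohen's f (placeholder value)
anova_res <- pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.8)

# Chi-squared test with 2 degrees of freedom; w is Cohen's w (placeholder value)
chisq_res <- pwr.chisq.test(w = 0.3, df = 2, sig.level = 0.05, power = 0.8)

# General linear model; u = numerator degrees of freedom, f2 = effect size,
# and the function solves for v, the denominator degrees of freedom
glm_res <- pwr.f2.test(u = 2, f2 = 0.15, sig.level = 0.05, power = 0.8)

# Difference of two proportions with equal group sizes;
# ES.h() converts the two proportions into the effect size h
prop_res <- pwr.2p.test(h = ES.h(p1 = 0.6, p2 = 0.5), sig.level = 0.05, power = 0.8)

anova_res$n   # per-group sample size
chisq_res$N   # total number of observations
glm_res$v     # denominator degrees of freedom
prop_res$n    # per-group sample size
```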