Hypothesis testing

Last revised by Stefan Tigges on 3 Jan 2024

Hypothesis testing is a statistical method used to evaluate clinical trial results and consists of four consecutive steps:

  1. specification of the null hypothesis and the alternative hypothesis

  2. data collection

  3. statistics and p-value calculation

  4. rejection or failure to reject the null hypothesis

The null hypothesis (H0) is that there is no difference between the groups being evaluated, while the alternative hypothesis (HA) is that there is a difference between the groups. For example, in the National Lung Screening Trial (NLST), H0 is that there was no difference in lung cancer mortality between subjects screened for lung cancer with low-dose CT and those screened with chest X-ray, while HA is that there was a mortality difference. Hypothesis testing evaluates the plausibility of the null hypothesis in light of the data gathered in the trial.

Carry out the clinical trial and gather data in a way suitable to test the hypothesis.

Calculate a test statistic and its p-value. The p-value is the probability, assuming that the null hypothesis is true, of getting a result at least as extreme as the one observed in the trial. It is therefore a conditional probability: P(data at least as extreme as observed | H0 true). Since the p-value is calculated assuming that the null hypothesis is true, the "expected" trial result is that there is no difference between groups. Because of random sampling error, however, small differences between identical groups do occur.
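The idea that identical groups still differ by chance can be illustrated with a small simulation. The sketch below is not taken from any trial: the group size, event rate, and observed differences are hypothetical numbers chosen only for illustration, and the function names are invented for this example. It repeatedly simulates a trial in which H0 is true and estimates a two-sided p-value as the share of simulated differences at least as extreme as an observed one.

```python
import random

def simulate_null_differences(n_per_group, event_rate, n_trials, seed=0):
    """Simulate trials in which both groups share the same true event rate (H0 true).

    Returns the differences in observed event proportions (group A minus group B).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        events_a = sum(rng.random() < event_rate for _ in range(n_per_group))
        events_b = sum(rng.random() < event_rate for _ in range(n_per_group))
        diffs.append((events_a - events_b) / n_per_group)
    return diffs

def empirical_p_value(null_diffs, observed_diff):
    """Two-sided p-value: share of null differences at least as extreme as observed."""
    extreme = sum(abs(d) >= abs(observed_diff) for d in null_diffs)
    return extreme / len(null_diffs)

# Hypothetical inputs: 500 subjects per group, 10% true event rate in both groups.
null_diffs = simulate_null_differences(n_per_group=500, event_rate=0.10, n_trials=2000)
p_small = empirical_p_value(null_diffs, observed_diff=0.01)  # 1 percentage point
p_large = empirical_p_value(null_diffs, observed_diff=0.06)  # 6 percentage points
print(f"p for a 1-point difference: {p_small:.3f}")
print(f"p for a 6-point difference: {p_large:.3f}")
```

A small observed difference is matched or exceeded by many of the simulated null trials, so its p-value is large; a large observed difference is rarely matched, so its p-value is small.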

Compare the p-value with alpha (α), the predetermined level at which we "reject" the null. Alpha is usually set at 0.05 or 5%. This means that if our p-value is <0.05, we conclude that H0 is implausible and we reject the null. Remember, under the assumption that the null hypothesis is true, we expected no difference between groups: if the probability of seeing a difference at least as large as the one we saw is small, perhaps the null hypothesis is incorrect. In the NLST for example, there were 20% fewer lung cancer deaths in the low-dose CT group compared to the chest X-ray group, resulting in a p-value of 0.004, so the null hypothesis was rejected. If the p-value is ≥0.05, we conclude that we have insufficient evidence to reject the null. Because the p-value is calculated assuming H0 is true, it cannot show that H0 is true, so it is incorrect to "accept" the null when p is ≥0.05.
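As a minimal sketch of steps 3 and 4, the following computes a two-sided p-value for a difference in event proportions and compares it with α. It assumes a pooled two-proportion z-test with a normal approximation, which is one common choice and not necessarily the analysis used in the NLST. The event counts are hypothetical and the function names are invented for this example.

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation).

    Assumes at least one event and one non-event overall, so the standard error is nonzero.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only if p < alpha; never 'accept' H0."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical counts: deaths / subjects in each arm.
p_clear = two_proportion_p_value(80, 1000, 120, 1000)
p_unclear = two_proportion_p_value(95, 1000, 105, 1000)
print(f"p = {p_clear:.4f}: {decide(p_clear)}")
print(f"p = {p_unclear:.4f}: {decide(p_unclear)}")
```

The decision function deliberately returns "fail to reject H0" rather than "accept H0", mirroring the wording above.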

Hypothesis testing is a nearly universal feature of articles in the medical literature but is unsatisfactory for multiple reasons:

  • the p-value gives no information regarding the size of an effect

  • the p-value gives no information regarding the variability of an effect, i.e. how precisely the effect has been estimated. For this reason, confidence intervals are often used instead of or in addition to p-values

  • p-values don’t tell us about what we’re actually interested in. We want to know how likely the alternative hypothesis is given the data that we observed, but p-values tell us how likely our observation is assuming that the null hypothesis is true

  • statistical significance (p-value < 0.05) is not the same as clinical significance

  • p-values cannot account for the effects of non-random error (bias)

  • overreliance on the p-value to determine the plausibility of the null hypothesis ignores the pre-experiment likelihood that H0 or HA is true. For example, if one were to perform 100 experiments on the ability of individuals to perform a psychic feat, the likelihood of at least one of these experiments showing a statistically significant effect due to random error alone is high, even if no such ability exists

  • rejecting H0 is not the same as accepting HA

  • p-values do not address the probability of making a beta (type II) error, i.e. a false-negative clinical trial result
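The psychic-feat example in the list above can be made concrete with a short calculation. The sketch below assumes that the experiments are independent, that H0 is true in every one of them, and that each is tested at the same α; the function name is invented for this example. Under those assumptions, the probability of at least one false-positive result among n experiments is 1 − (1 − α)^n.

```python
def prob_at_least_one_false_positive(n_experiments, alpha=0.05):
    """Chance of one or more p < alpha results when H0 is true in every experiment.

    Assumes the experiments are independent and each is tested at level alpha.
    """
    return 1 - (1 - alpha) ** n_experiments

for n in (1, 10, 100):
    print(n, round(prob_at_least_one_false_positive(n), 3))
```

With 100 experiments at α = 0.05 this probability is about 0.994, so a single "significant" result is close to certain even when no real effect exists.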
