If your shoes are a little too tight, they might cause a little pain, but not enough to bother about. But somewhere there’s a threshold: if the shoes are much too small, you go out and buy new ones.

Something similar happens with statistical tests such as the chi-squared test. To do the test, we set up a null hypothesis, which is that the data fits the model.

If your calculated test statistic (here, the chi-squared statistic) is only a “little bit” big, that’s not enough to contradict the null hypothesis. But if it’s a LOT too big, then it does matter: the result is “significant”.
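To make “a little bit big” versus “a LOT too big” concrete, here is a minimal Python sketch of Pearson’s chi-squared statistic: the sum, over all categories, of (observed − expected)² / expected. The function name `chi_squared_statistic` and the die-rolling counts are hypothetical, made up for illustration; the expected counts assume a fair die rolled 60 times.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times. Under the null hypothesis
# (a fair die) each of the six faces is expected 10 times.
expected = [10] * 6
nearly_fair = [9, 11, 10, 8, 12, 10]   # small deviations from 10
suspicious = [5, 8, 9, 8, 10, 20]      # one face comes up far too often

print(chi_squared_statistic(nearly_fair, expected))  # ≈ 1.0: only a "little bit" big
print(chi_squared_statistic(suspicious, expected))   # ≈ 13.4: much bigger
```

Both sets of counts add up to 60, so the only difference is how far they stray from the expected 10 per face. Whether 13.4 counts as a LOT too big is exactly what the p-value decides.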

I know this is still rather vague, so hang on. Statisticians measure the strength of the evidence against the null hypothesis using what they call a “p-value” (p stands for “probability”, not “pain”). The p-value is the probability of getting a test statistic at least as extreme as the one you calculated, assuming the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis; if it is small enough, the null hypothesis is rejected. A large p-value indicates weak evidence against the null hypothesis, so it is not rejected. Because p-values are probabilities, they lie between 0 and 1. p-values are typically compared with a significance level of 0.05, that is, α = 0.05. In fact, this is the famous α = 0.05 threshold that so many scientists use (well, not famous like The X Factor, but trust me, famous among statisticians and scientists). So, to summarise:
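For the chi-squared test, the p-value is the area in the right-hand tail of the chi-squared distribution beyond your statistic. The sketch below computes that tail probability in plain Python using the standard closed-form series for whole-number degrees of freedom. The function name `chi2_p_value` is hypothetical, and in practice you would normally let a statistics package do this. The example values (a statistic of 13.4 with 5 degrees of freedom, as in a six-category goodness-of-fit test) are made up for illustration.

```python
import math


def chi2_p_value(statistic, df):
    """Right-tail probability P(X >= statistic) for X ~ chi-squared(df).

    Uses the closed-form series, valid only for whole-number df >= 1.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    df = int(df)
    if statistic <= 0:
        return 1.0
    x = statistic
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^((2j-1)/2) / (1*3*...*(2j-1))
    term, total = math.sqrt(x), 0.0   # j = 1 term is x^(1/2) / 1
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)   # step from the (j-1)th term to the jth
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


print(chi2_p_value(13.4, 5))   # roughly 0.02: below 0.05
print(chi2_p_value(1.0, 5))    # roughly 0.96: nowhere near 0.05
print(chi2_p_value(3.841, 1))  # roughly 0.05: the familiar 5% critical value for 1 df
```

As a cross-check, Excel’s `CHISQ.DIST.RT(x, deg_freedom)` returns the same right-tail probability.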

With the p-value calculated on the assumption that the null hypothesis is true:

i) if the p-value is less than or equal to α (typically α = 0.05), then reject the null hypothesis

ii) if the p-value is greater than α, then do not reject the null hypothesis.
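The two-part rule above is simple enough to write as a tiny function. This is a minimal sketch: the name `decide` is hypothetical, and the default α = 0.05 just mirrors the conventional threshold. Because the rule says “less than or equal to”, a p-value of exactly 0.05 leads to rejection at α = 0.05.

```python
def decide(p_value, alpha=0.05):
    """Reject H0 if p <= alpha; otherwise do not reject it."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"


for p in (0.003, 0.05, 0.2):
    print(p, decide(p))        # reject, reject, do not reject
print(decide(0.02, alpha=0.01))  # do not reject H0: a stricter threshold
```

The function deliberately says “do not reject” rather than “accept”: a large p-value means the data give weak evidence against the null hypothesis, not that the null hypothesis has been proved.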

p-values are usually calculated using software, for example, a statistics package or a spreadsheet such as Excel.

You might like to check out the Statistics Learning Centre’s YouTube video on p-values.