If your shoes fit a little poorly, they might cause a little pain, but not enough to pay attention to.
But somewhere there’s a threshold. If the shoes are too small, you go out and buy new ones.
In the same way, finding that something is only 30% likely to occur under the null hypothesis
is not enough pain to say that the data does not fit the model. But there is a threshold, usually called
the significance level, and the probability you compare against it is the p-value (p stands for
"probability", not "pain"). In practice, most scientists use a 5% threshold.
Usually when we talk about p-values, we’re in the middle of doing some sort of statistics (like a t-test,
a regression, or a chi-square test). But p-values work just as well for the brute-force simulation we’re
discussing here. The idea is, if the hypothesized process (in this case, random sickdays) produces data
at least as extreme as what was observed less than 5% of the time, then the hypothesized process is
probably NOT responsible for the data.
How could the 5% threshold be applied to the sickdays problem?
If you were following that closely, you realize that in order to show that the data fits the model,
you need to show that the hypothesized process produces the observed data MORE THAN 5% of the time.
On the other hand, if you are familiar with t-tests or regression, you know that a "good"
result is one in which your p-value is LESS THAN the 5% threshold.
The reason for this discrepancy is that with a t-test you are trying to DISPROVE the null hypothesis.
In a goodness of fit test, you are trying to SUPPORT the null hypothesis. So:
| Test | Goal | What you want |
|---|---|---|
| t-test | disprove the null hypothesis | p-value below the 5% threshold |
| chi-square | show that the data fits the null hypothesis | p-value above the 5% threshold |
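
To make the goodness-of-fit case concrete, here is a minimal sketch of the brute-force approach in Python. It is only an illustration under stated assumptions: the weekday counts are made up (the real sickdays data is not reproduced here), the names `chi_square_stat` and `simulated_p_value` are hypothetical, and using a chi-square-style statistic to measure "how badly the data fits" is one reasonable choice rather than the only one. The null hypothesis assumed here is that each sick day is equally likely to land on any of the five weekdays.

```python
import random

ALPHA = 0.05  # the 5% threshold discussed above


def chi_square_stat(counts, expected):
    """Measure misfit: total squared deviation from the expected count, scaled by it."""
    return sum((c - expected) ** 2 for c in counts) / expected


def simulated_p_value(observed, trials=10_000, seed=0):
    """Fraction of simulated datasets that fit the null at least as badly as `observed`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_days = sum(observed)
    n_bins = len(observed)
    expected = n_days / n_bins
    observed_stat = chi_square_stat(observed, expected)

    at_least_as_extreme = 0
    for _ in range(trials):
        # Simulate the hypothesized process: every sick day picks a weekday at random.
        counts = [0] * n_bins
        for _ in range(n_days):
            counts[rng.randrange(n_bins)] += 1
        if chi_square_stat(counts, expected) >= observed_stat:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials


# Hypothetical sick-day counts for Monday through Friday (assumed, not real data).
observed = [12, 8, 9, 7, 14]
p = simulated_p_value(observed)
verdict = "consistent with" if p > ALPHA else "not consistent with"
print(f"simulated p-value = {p:.3f}; data are {verdict} random sick days")
```

With these made-up counts the simulated p-value lands well above 5%, so the data would count as fitting the "random sickdays" model. A lopsided dataset such as `[50, 0, 0, 0, 0]` would fall far below the threshold instead.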