So our tcalc was about 5.1. Is this big enough? Could we have something this big just by chance? Well, once again it depends on our sample sizes. We still need someone to tell us just how big the number has to be to count as “statistically significant”. And finally, we get a break: statisticians have created a simple table, shown below.
degrees of freedom (df) | tcrit (for p-value = 0.05) |
---|---|
1 | 12.7 |
2 | 4.3 |
3 | 3.2 |
4 | 2.8 |
5 | 2.6 |
6 | 2.4 |
7 | 2.4 |
8 | 2.3 |
9 | 2.3 |
10 | 2.2 |
20+ | 2.0 |
Why do we need a table? Because (unless you decide to pursue theoretical statistics) it’s pretty hard to explain the complicated formulas used to calculate the numbers in it.
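That said, you don’t have to take the table on faith. Most statistical software can compute these critical values directly; here is a minimal sketch, assuming Python with SciPy is available, that recomputes the two-tailed, p-value = 0.05 column shown above:

```python
from scipy import stats

alpha = 0.05  # significance level used by the table above

# Two-tailed critical value: split alpha evenly between the two tails
for df in range(1, 11):
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    print(f"df = {df:2d}  tcrit = {t_crit:.1f}")

# For larger samples (df of 20 or more) the critical value settles down
# toward roughly 2.0, which is the "20+" row of the table.
```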
But how do we use this table? Well, first we have to find the correct row in the table. The rows are again sort of related to sample size. They are labelled “degrees of freedom”, which is a common concept throughout all of statistics.
Degrees of freedom tell you how many independent (“free to vary”) pieces of information are provided by the data in an experiment.
That word “independent” will cause us some grief, because we have done a lot of calculations with our data and made some statistical assumptions about the populations. There’s a very complicated formula built into most statistical software that keeps track of how many independent pieces of information were used. If no software is available, a nice and very simple rule that statisticians tell us we can use is this:
The number of degrees of freedom is one less than the smaller sample size.
For our initial experiment this is df = 7, which corresponds to a tcrit value of 2.4 in the table. In other words, if there’s no real difference between treatment and control, then by chance alone the ratio of observed effect to error should exceed 2.4 only rarely (about 5% of the time). Conversely, if our effect is more than 2.4 times bigger than the error, we conclude that our treatment really was different from (better than, in this case) the control.
So, we compare our value tcalc = 5.1 to the number in the table tcrit = 2.4, and since our calculated value is bigger than the critical value, we can say that the observed effect is statistically significant. That’s it, end of the road, the test is complete – we have supported our hypothesis that Fish-2-Whale is better than the control.
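If you would rather let the computer do this last comparison, it only takes a few lines. This is a sketch that uses the summary numbers from our experiment (tcalc = 5.1, df = 7) rather than the raw fish weights:

```python
from scipy import stats

t_calc = 5.1   # observed effect divided by error, from our experiment
df = 7         # one less than the smaller sample size
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value, about 2.4

if abs(t_calc) > t_crit:
    print(f"t_calc = {t_calc} > t_crit = {t_crit:.1f}: statistically significant")
else:
    print(f"t_calc = {t_calc} <= t_crit = {t_crit:.1f}: not significant")

# Equivalently, the two-tailed p-value for this t_calc:
p_value = 2 * stats.t.sf(abs(t_calc), df)
print(f"p = {p_value:.4f}")  # well below 0.05
```

Reporting the p-value directly, rather than comparing against a tabulated tcrit, is how most software presents the result, but the conclusion is the same.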
At this point a few WARNINGS might be appropriate. There are different tables available depending on what you want to do. This table uses α = 0.05, which means it tells you how large t has to be before you can say there is only a small probability (5%) that the effect we are seeing is due to random chance alone. This table also gives the results for a two-tailed test, which means we are not assuming in advance whether the difference we are looking for is positive or negative (i.e. before we collect the data, we allow that the new fish food could be better or it could be worse!). There are other tables for other values of α and for one-tailed tests.
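If you ever need one of those other tables, the same kind of software call covers them. A small sketch showing how the critical value changes with the choice of α and with a one- versus two-tailed test (the α values here are just illustrative, not tied to our experiment):

```python
from scipy import stats

df = 7  # degrees of freedom from our experiment

for alpha in (0.05, 0.01):
    two_tailed = stats.t.ppf(1 - alpha / 2, df)  # direction of difference not assumed in advance
    one_tailed = stats.t.ppf(1 - alpha, df)      # direction of difference assumed in advance
    print(f"alpha = {alpha}: two-tailed tcrit = {two_tailed:.2f}, "
          f"one-tailed tcrit = {one_tailed:.2f}")
```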