Luckily there is a statistical test that is perfect for your situation, and (big surprise here) it is called a t-test.
You use a t-test to find out if the averages of two sets of data are the same or not.
The first thing to get out of the way is that statisticians obviously have a different meaning for the word “same” than the rest of us. “Same” for a statistician means something like “close enough that any differences are due to chance.”
For example, let’s say you fed an identical diet to two identical tanks of fish. After a few weeks you found that some fish gained a few extra grams and some didn’t, so overall the average weight gain was 299 grams in the first tank and 300 grams in the second tank. A statistician could probably show that the weight gains were the “same” in both tanks, even though any preppy would tell you that the numbers “299” and “300” are not “the same”.
Statisticians even have a word for this (three words, actually): “not statistically significant”. In this context, if a difference is “not statistically significant”, that means “the averages are so close that any differences are probably just due to chance”.
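To see the two-tank example in numbers, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the weight gains are made up (chosen so the averages come out at 299 and 300 grams), the helper name `welch_t` is hypothetical, and it uses Welch's version of the t-test, which is just one common variant.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (hypothetical helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance (n - 1) divided by n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # t is the difference in averages, measured in standard errors.
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up weight gains in grams, ten fish per tank (illustration only).
tank_1 = [295, 301, 298, 303, 297, 300, 296, 302, 299, 299]  # average 299
tank_2 = [296, 302, 299, 304, 298, 301, 297, 303, 300, 300]  # average 300

t, df = welch_t(tank_1, tank_2)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With these made-up numbers, t comes out at about -0.87 with 18 degrees of freedom. A t table puts the conventional two-sided 5% cutoff for 18 degrees of freedom at roughly 2.1, so a t this close to zero is “not statistically significant”: the 1 gram gap is small compared with how much the individual fish vary.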
On the other hand, what you (the bright young fish food innovator) are trying to prove is that the differences between fish that eat and fish that don’t eat Fish-2-Whale ARE statistically significant.
So, getting back to our fish data, we can look at the graph and see that the distributions do not seem to overlap much, and the averages look like they are far apart. The t-test will (we hope) allow us to confirm this visual impression. It is important to realise that the t-test has no super powers — if your data do not contain a real difference, a t-test will not magically make one for you. Look at the data below:
[Figure: graphs comparing pairs of distributions. Surviving panel labels: “will NOT be” and “might be stat. sig.”]
What is the take-home message? If your data has an obvious difference, or an obvious lack of difference, a t-test will simply confirm it. BUT … if your data falls in a “grey zone”, like in the middle graph above, then the difference between the distributions MAY be statistically significant. Since it is hard to tell by eye, that’s when the t-test will actually help you make a decision.
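The three situations can be sketched in Python with nothing but the standard library. All of it is assumption for illustration: the measurements are made up, the helper names (`welch_t`, `t_pdf`, `two_sided_p`) are hypothetical, the test is Welch's variant of the t-test, and the p-value is worked out by simple numerical integration, which is fine for small samples like these. A p-value below 0.05 is the conventional (not the only possible) cutoff for calling a difference statistically significant.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test (hypothetical helper)."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: the area in both tails beyond |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)


# Made-up measurements, eight fish per group (illustration only).
control = [10, 12, 11, 13, 9, 11, 12, 10]
scenarios = {
    "obvious difference": [20, 22, 21, 23, 19, 21, 22, 20],
    "grey zone": [11, 13, 13, 14, 10, 12, 14, 12],
    "no real difference": [11, 10, 12, 9, 13, 11, 11, 12],
}

p_values = {}
for name, treated in scenarios.items():
    t, df = welch_t(control, treated)
    p_values[name] = two_sided_p(t, df)
    print(f"{name}: t = {t:.2f}, p = {p_values[name]:.4f}")
```

With these made-up numbers, the “obvious difference” p-value is far below 0.05, the “no real difference” one is far above it, and the “grey zone” one lands just above 0.05. The first and last only confirm what your eyes already told you; the middle one is the case where you actually needed the test.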
photo credits: goldfish