The difference between the means of the two groups is 267g – 173g = 94g. In other words, fish that got our new food grew, on average, 94g bigger.

But how accurate is this difference? We have two sources of errors: there is 16g error in the first number and 10g error in the second number.

If we were very pessimistic or very unlucky, we would add all the errors together and claim that we might be out by as much as 16g + 10g = 26g.

If we were very optimistic or very lucky, maybe our errors compensated each other and partially cancelled, so we could claim our final error might be as small as only 16g – 10g = 6g.

Like Goldilocks, statisticians avoid either extreme and choose something in between. The three triangles below explain how statisticians think about combining errors – $\mathrm{SEM}_t$ is the treatment error, and $\mathrm{SEM}_c$ is the control error.

The red lines show the possible sizes of the combined error. The triangle on the left with the small angle gives an optimistic estimate of the combined error; the triangle on the right with the large angle gives a pessimistic estimate of the combined error. Statisticians compromise and choose the one in the middle with a right-angle. The hypotenuse of this triangle gives what is called the Standard Error in the Difference of the Means, or SEDM.

This way of combining standard errors uses the Pythagorean theorem, which can help you remember it.
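One way to make the three triangles concrete is the law of cosines, which gives the third side of a triangle from the other two sides and the angle between them. The sketch below assumes the triangles can be read that way; the function name `combined_error` and the specific angles are illustrative choices, not part of the original explanation. It uses the rounded errors of 16g and 10g from the text.

```python
import math

def combined_error(sem_t, sem_c, angle_degrees):
    """Third side of a triangle with sides sem_t and sem_c and the given angle between them."""
    angle = math.radians(angle_degrees)
    squared = sem_t**2 + sem_c**2 - 2 * sem_t * sem_c * math.cos(angle)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))

sem_t, sem_c = 16.0, 10.0  # rounded treatment and control errors, in grams

optimistic = combined_error(sem_t, sem_c, 0)     # errors cancel: 16 - 10
compromise = combined_error(sem_t, sem_c, 90)    # right angle: Pythagoras
pessimistic = combined_error(sem_t, sem_c, 180)  # errors add: 16 + 10

print(f"optimistic  {optimistic:.1f} g")
print(f"compromise  {compromise:.1f} g")
print(f"pessimistic {pessimistic:.1f} g")
```

The two extreme angles reproduce the 6g and 26g bounds worked out earlier, and the right angle lands in between.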

The formula for SEDM using the Pythagorean theorem is:

$$\mathrm{SEDM}^2 = \mathrm{SEM}_c^2 + \mathrm{SEM}_t^2$$

The SEDM comes out to be 18.5 g (don’t forget the units). Let’s add these new calculations to our table.

| | n | mean | standard deviation (SD) | standard error in the mean (SEM) | difference of means | SEDM |
|---|---|---|---|---|---|---|
| Fish-2-Whale | 8 | 267g | 44g | 15.6g | 94g | 18.5g |
| Control | 8 | 173g | 28g | 9.9g | | |
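As a quick check, the difference of means and the SEDM can be recomputed from the means and SEM values in the table. This is only a sketch of the arithmetic; the variable names are made up for illustration.

```python
import math

# Values taken from the table (grams).
mean_treatment, mean_control = 267, 173
sem_treatment, sem_control = 15.6, 9.9

difference = mean_treatment - mean_control

# Pythagorean combination of the two standard errors.
sedm = math.sqrt(sem_treatment**2 + sem_control**2)

print(f"difference = {difference} g, SEDM = {sedm:.1f} g")
# prints "difference = 94 g, SEDM = 18.5 g"
```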

We could now write something in our final report like:

*The difference in the average weight gain between the two groups of fish was 94 ± 18 g.*

This means that the Fish-2-Whale group gained approximately 94g more than the control group, with an error of about 18g.

## How important is sample size?

Suppose our boss wasn’t happy with an error as large as 18g. Would we have done better with bigger sample sizes? Remember that the formula for SEM contains the square-root of the sample size, so to get 10 times more accuracy we would need a sample that was 100 times bigger! So if we use 800 fish in each treatment, our table of calculations might look like this:

| | n | mean | standard deviation (SD) | standard error in the mean (SEM) | difference of means | SEDM |
|---|---|---|---|---|---|---|
| Treatment | 800 | 267g | 44g | 1.56g | 94g | 1.85g |
| Control | 800 | 173g | 28g | 0.99g | | |

We could now write something much more impressive like:

*The difference in the average weight gain between the two groups of fish was 94 ± 2 g.*

Notice that in our final report we just give a simple indication of the size of the error (you could put 1.8 g if you really wanted to).
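The square-root effect can be seen by recomputing the SEDM for a few sample sizes. The sketch below assumes the usual formula SEM = SD / √n and assumes the standard deviations stay at 44g and 28g whatever the sample size; the helper names are hypothetical. Because it starts from the rounded SDs rather than the rounded SEMs, the last digit may differ slightly from the tables.

```python
import math

def sem(sd, n):
    """Standard error in the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

def sedm_for(n, sd_treatment=44.0, sd_control=28.0):
    """SEDM when both groups have the same sample size n."""
    return math.sqrt(sem(sd_treatment, n) ** 2 + sem(sd_control, n) ** 2)

for n in (8, 80, 800):
    print(f"n = {n:>3}: SEDM = {sedm_for(n):.2f} g")
```

Going from 8 to 800 fish per group multiplies the sample size by 100 but shrinks the SEDM only by a factor of 10.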

What happens if the sample sizes are not the same? You calculate the SEM for each group in the same way as usual, and combine them using Pythagoras, just as before. Here is one final example with different sample sizes:

| | n | mean | standard deviation (SD) | standard error in the mean (SEM) | difference of means | SEDM |
|---|---|---|---|---|---|---|
| Treatment | 1000 | 267g | 44g | 1.39g | 94g | 1.8g |
| Control | 600 | 173g | 28g | 1.14g | | |

Notice that the SEM for the treatment group (the fish fed with Fish-2-Whale) went down a bit because of the bigger sample size, and the SEM for the control group went up a bit because of its smaller sample size. The combined error or SEDM in the end went down a bit but not much.
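The unequal-sample-size example follows the same recipe, sketched here with the usual formula SEM = SD / √n (variable names are illustrative only):

```python
import math

# Each group gets its own SEM from its own SD and sample size.
sem_treatment = 44.0 / math.sqrt(1000)
sem_control = 28.0 / math.sqrt(600)

# The two SEMs are then combined with Pythagoras, exactly as before.
sedm = math.sqrt(sem_treatment**2 + sem_control**2)

print(f"SEM treatment = {sem_treatment:.2f} g")
print(f"SEM control   = {sem_control:.2f} g")
print(f"SEDM          = {sedm:.1f} g")
```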

### Caveats

In general, increasing your sample size is a good way of getting better results. However, it's not a magic solution to all of your problems. Remember, statistics doesn't have super-powers – 800 fish won't help you a bit if Fish-2-Whale is a failure. And if Fish-2-Whale produces spectacular results, then you don't need the expense of a huge sample size to prove it. But if your results fall into a "grey area", where the distributions overlap but not by a lot, a large sample size will help you figure out if you're really onto something.

The second caveat is trickier. A lot of people (not just students) think that, because the sample size appears in the calculation of the Standard Deviation, the SD should go down as the sample size goes up. **But that is an illusion** – if you add together 8 squared deviations, then you divide by 7; if you add together 800 squared deviations, then you divide by 799. The bigger sum is offset by the bigger divisor. **In fact, standard deviation does not change in any predictable way as sample size increases.** It stays approximately the same, because it is measuring how variable the population itself is. If the population is highly variable, then the SD should be high no matter how large a sample you take. Likewise, if the population has little variability, then the SD should be low even if you only take a small sample.
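A small simulation can illustrate this. The sketch below draws imaginary fish weights from a normal distribution; the population mean of 267g and SD of 44g are borrowed from the table purely for illustration, and the normal distribution itself is an assumption. The SD of each sample hovers around the population value, while the SEM keeps shrinking as the sample grows.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

POPULATION_MEAN = 267.0  # hypothetical population values, in grams
POPULATION_SD = 44.0

def sample_sd_and_sem(n):
    """Draw n simulated weights and return (sample SD, SEM)."""
    weights = [random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(n)]
    sd = statistics.stdev(weights)  # sample SD: divides by n - 1
    return sd, sd / math.sqrt(n)

results = {n: sample_sd_and_sem(n) for n in (8, 800, 80000)}

for n, (sd, sem) in results.items():
    print(f"n = {n:>5}  SD = {sd:5.1f} g  SEM = {sem:5.2f} g")
```

With only 8 fish the SD estimate is noisy and can land above or below 44g, but it does not systematically fall as the sample grows; only the SEM does.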