Let’s start with the 42% M/F sick days. For simplicity, we’ll assume this means 42 out of 100 (rather than 84 out of 200 or 420 out of 1000, etc.). This is the data that was observed. Using the laws of probability and assuming individuals are equally likely to take a sick day on any day of the five-day work week, Monday and Friday account for 2 of the 5 workdays, so 2/5 × 100 = 40 out of 100 sick days should fall on Monday or Friday. This is the expected value.
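If you want to check that arithmetic, here’s a minimal Python sketch of the expected-value calculation. It assumes the five-day work week described above; the variable names are just for illustration.

```python
# Expected sick-day counts, assuming sick days are equally likely
# on each of the 5 workdays: Mon/Fri are 2 of those 5 days.
total_sick_days = 100
expected_mon_fri = total_sick_days * 2 / 5   # 2 of 5 workdays
expected_midweek = total_sick_days * 3 / 5   # Tue, Wed, Thu

print(expected_mon_fri, expected_midweek)    # 40.0 60.0
```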
What we want to do is test how far apart the “observed” and “expected” answers are, right? So a logical first step is to subtract one from the other — that tells us how different they are. We’ll do this both for M/F sick days and for midweek sick days:
| | observed (o) | expected (e) | difference (o – e) |
|---|---|---|---|
| Mon/Fri | 42 | 40 | +2 |
| Midweek | 58 | 60 | −2 |
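As a quick sanity check, here’s that subtraction in Python (a sketch; the dictionary layout is just one way to hold the table’s numbers):

```python
# Observed and expected counts from the table above.
observed = {"Mon/Fri": 42, "Midweek": 58}
expected = {"Mon/Fri": 40, "Midweek": 60}

# Difference (o - e) for each row.
differences = {day: observed[day] - expected[day] for day in observed}
print(differences)  # {'Mon/Fri': 2, 'Midweek': -2}
```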
Then we want to know how important this difference is. Is it big compared to what we expected, or small? To compare the sizes of two numbers, we take a ratio; in other words, we divide. We want to know how big the difference is relative to the number we expected to get, so we divide the difference (between the observed and expected) by the expected value:
| | observed (o) | expected (e) | difference (o – e) | difference compared to expected, (o – e)/e |
|---|---|---|---|---|
| Mon/Fri | 42 | 40 | +2 | +2/40 = +0.05 |
| Midweek | 58 | 60 | −2 | −2/60 ≈ −0.033 |
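And the last column, the relative deviation (o – e)/e, computed the same way (again just a sketch using the table’s numbers):

```python
# Observed and expected counts from the table above.
observed = {"Mon/Fri": 42, "Midweek": 58}
expected = {"Mon/Fri": 40, "Midweek": 60}

# Relative deviation (o - e) / e for each row.
relative = {day: (observed[day] - expected[day]) / expected[day]
            for day in observed}
print(relative)  # {'Mon/Fri': 0.05, 'Midweek': -0.0333...}
```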
The last column in the table shows the relative size of the deviations as well as whether each difference is positive or negative. If we ignore the negative signs and simply add up the magnitudes (or absolute values) in that column, we have a way of measuring the TOTAL deviation for all the data, in this case 0.05 + 0.033 = 0.083, or about 8.3%. A big total deviation would mean that we probably have the wrong explanation, whereas a small total deviation would suggest we’re on the right track. Since we’re trying to show that sick days are RANDOM, big deviations are bad for our case, while small deviations are good for our case.
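To make the “add up the magnitudes” step concrete, here’s the whole calculation in one Python expression (a sketch; `total_deviation` is just my name for this quantity, not a standard term):

```python
# Observed and expected counts from the tables above.
observed = {"Mon/Fri": 42, "Midweek": 58}
expected = {"Mon/Fri": 40, "Midweek": 60}

# Sum of the absolute relative deviations |o - e| / e.
total_deviation = sum(abs(observed[d] - expected[d]) / expected[d]
                      for d in observed)
print(total_deviation)  # 0.0833... (about 8.3%)
```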