Regression to the mean describes how extreme results tend to be followed by more ordinary ones. If a variable is extreme the first time you measure it, it will tend to be closer to the average the next time you measure it. This is a tendency, not a guarantee, and it happens because any single measurement mixes a stable component with random variation that is unlikely to repeat. For example, your odds of winning on a slot machine stay the same on every play. You might hit a “winning streak,” which is, technically speaking, a run of outcomes far above the machine’s average payout. The machine does not compensate for the streak, but later plays are likely to be far more typical, so over many plays your results regress to the mean (i.e. “return to normal”). Because the average payout is less than what you pay in, you’ll end up losing.
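To make the idea concrete, here is one standard way to write it down. This is a sketch under assumptions that are not in the text above: the two measurements X_1 and X_2 are taken to be jointly normal with the same mean \mu, the same variance, and correlation \rho. Under those assumptions, the expected second measurement, given a first measurement of x, is:

```latex
\mathbb{E}[X_2 \mid X_1 = x] = \mu + \rho\,(x - \mu),
\qquad
|\rho| < 1 \;\Longrightarrow\;
\bigl|\mathbb{E}[X_2 \mid X_1 = x] - \mu\bigr| < |x - \mu|
\quad \text{for } x \neq \mu .
```

In words: unless the two measurements are perfectly correlated, the expected second value sits closer to the mean than the first one did. The weaker the correlation (the more the result depends on chance), the stronger the pull back toward the average.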

The Sports Illustrated jinx is an excellent example of regression to the mean. The jinx states that whoever appears on the cover of SI is going to have a poor following year (or years). But the “jinx” is actually regression towards the mean. All players have good games and bad games, and an exceptional season usually reflects both real skill and a good deal of luck. That exceptional season is what leads to being on the cover of SI. The skill carries over to the next season, but the luck is unlikely to repeat, so the season is statistically likely to be followed by a fall back toward the player’s average performance.
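The cover effect can be illustrated with a small simulation. This is only a sketch under made-up assumptions: each hypothetical player's season score is a fixed skill value plus independent luck, both drawn from a standard normal distribution, and the "cover" goes to the top scorers of the first season. The function name and all numbers are illustrative, not taken from real sports data.

```python
import random
import statistics


def simulate_cover_jinx(n_players=2000, top_n=20, seed=0):
    """Compare the top first-season performers with their own second season."""
    rng = random.Random(seed)

    # Assumption: skill is fixed per player; luck is redrawn every season.
    skill = [rng.gauss(0, 1) for _ in range(n_players)]
    season1 = [s + rng.gauss(0, 1) for s in skill]
    season2 = [s + rng.gauss(0, 1) for s in skill]

    # "Cover athletes": the players with the best first-season scores.
    ranked = sorted(range(n_players), key=lambda i: season1[i], reverse=True)
    cover = ranked[:top_n]

    return {
        "league_mean": statistics.mean(season1),
        "cover_season1": statistics.mean([season1[i] for i in cover]),
        "cover_season2": statistics.mean([season2[i] for i in cover]),
    }


result = simulate_cover_jinx()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this setup the cover group should still beat the league average in the second season, because their skill is real, but by a clearly smaller margin than in the season that put them on the cover. No jinx is coded anywhere; the drop comes entirely from selecting on a score that includes luck.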

Regression to the mean often shows up because of how a sample was chosen. A good sampling technique is to sample randomly from the population. If you don’t (i.e. if you sample asymmetrically), your results may be abnormally high or low compared with the population average, and a repeat measurement would be expected to fall back toward the mean. The effect is strongest when you select a small, unrepresentative group on the basis of extreme scores (say, the highest 1 percent of the population or the lowest 10 percent).
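The difference between random and extreme sampling can be sketched in a few lines. The model below is an assumption made for illustration: every individual has a stable true value (mean 100, standard deviation 10) and each measurement adds independent noise of the same size. The function name and parameters are hypothetical.

```python
import random
import statistics


def retest_shift(n=20000, seed=1):
    """Average change from first to second measurement for three samples."""
    rng = random.Random(seed)

    # Assumption: measurement = stable true value + independent noise.
    true_value = [rng.gauss(100, 10) for _ in range(n)]
    first = [t + rng.gauss(0, 10) for t in true_value]
    second = [t + rng.gauss(0, 10) for t in true_value]

    def mean_change(indices):
        return statistics.mean([second[i] - first[i] for i in indices])

    by_first = sorted(range(n), key=lambda i: first[i])
    k = n // 100  # size of a 1 percent group

    return {
        "random_sample": mean_change(rng.sample(range(n), k)),
        "top_1_percent": mean_change(by_first[-k:]),
        "bottom_1_percent": mean_change(by_first[:k]),
    }


changes = retest_shift()
for group, change in changes.items():
    print(f"{group}: {change:+.1f}")
```

Under these assumptions the randomly chosen group should show an average change near zero, while the top group drops and the bottom group rises on the second measurement, even though nothing about the individuals changed between measurements.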