Chapter nine

Regression to the mean, or, why today is probably a Tuesday.

There is a statistical fact about the universe that, if anxious people were taught it properly in school, would, on its own, eliminate approximately a third of all unnecessary suffering on Earth.

The fact has a name. The name is regression to the mean. The name is, on first hearing, terrible. It sounds like something a Victorian biologist would say with his hand on a chair. The name does not do justice to what the fact actually says, which is a small and slightly miraculous truth about how the world handles its own bad days.

The fact says, in plain English, this. When you measure something that varies, and you observe an unusually extreme value, the next measurement is, on average, less extreme. Not because anything has changed. Not because the universe owes you a balance. Simply because extreme values are, by definition, rare, and the next sample, drawn from the same distribution, is most likely to be closer to the average. The bad day was the outlier. The next day, statistically, is not.

I want to write that sentence again, in a slightly different shape, because it is, in my honest opinion, the single most useful sentence I have ever encountered for a person who is in the habit of treating their worst days as previews of the rest of their life.

The worst days are not the body of your life. They are the tail of its distribution. Tomorrow, drawn from the same distribution, is overwhelmingly likely to be closer to the mean.

The rest of the chapter is a slow careful explanation of why this is true, why your brain is structurally bad at remembering it, and what to do about the discrepancy.

∗ ∗ ∗

Let me start with a picture, because the picture does half the work.

Imagine a distribution of all the days of your life, plotted by how good they felt. Most of the days, the great majority of them, cluster around some average mood, which is not great and not terrible. A smaller number of days are better than average. A similarly small number are worse. A very small number, on either end, are extreme. This distribution is, in almost every adult life, approximately bell-shaped. The bell may be skewed in one direction or another, the average may be high or low, the spread may be wide or narrow, but the shape, in the broad sense, is recognizable. Most days are in the middle. A few days are out on the tails.

Figure 9.1 The distribution of your days. Most of them live near the average. The extreme ones, on either side, are rare.

Look at the picture for a moment, because the picture is doing something most adults have never seen drawn. Most of the days of your life are clustered around the average. The really bad days are in the left tail. The really good days are in the right. The really bad days are, in the strict statistical sense, not representative. They are not the body of your life. They are the rare events on the edge of your life.

The anxious mind, when it has a really bad day, does not see this picture. The anxious mind treats the bad day as a sample of typical experience. It does the worst possible thing a sampling instrument can do, which is to take one observation from the tail of a distribution and use it to estimate the mean of the distribution. The estimate, drawn from the tail, is catastrophically wrong. The mean is much closer to the middle than the sample suggests. The next observation, drawn from the same distribution, is overwhelmingly likely to be closer to the mean than the sample was.

This is not a piece of advice. This is not a piece of encouragement. This is a statement about how distributions actually work. The next day is, with very high probability, going to be more average than the worst day was. Not because anything is fixing you. Not because the universe is kind. Because the bad day was the unusual event, and the next sample is, by the geometry of the distribution, almost certainly going to be less unusual.

∗ ∗ ∗

Let me describe the phenomenon in slow motion, because the slow motion is where the insight lives.

Suppose your average mood, on an arbitrary scale, is around six out of ten. Suppose the standard deviation of your mood is about one and a half. This means that roughly two thirds of your days fall between four and a half and seven and a half, and roughly nineteen out of twenty days fall between three and nine. The really bad days, the ones at two or below, sit more than two and a half standard deviations under the mean and happen with a frequency on the order of one in 260. The really good days, at ten, happen with a similar rarity.
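
If you would rather check that arithmetic than take it on trust, here is a minimal sketch. It assumes, as the paragraph does, that mood is exactly normal with a mean of six and a standard deviation of one and a half, which is a simplification and not a claim about real moods. It uses NormalDist from Python's standard statistics module. The helper name share_between is made up for illustration.

```python
from statistics import NormalDist

# Assumption: mood is exactly normal with mean 6 and standard deviation 1.5.
mood = NormalDist(mu=6.0, sigma=1.5)


def share_between(low, high):
    """Fraction of days expected to fall between low and high."""
    return mood.cdf(high) - mood.cdf(low)


within_one_sd = share_between(4.5, 7.5)  # mean plus or minus one std
within_two_sd = share_between(3.0, 9.0)  # mean plus or minus two std
three_or_below = mood.cdf(3.0)           # two std under the mean
two_or_below = mood.cdf(2.0)             # about 2.67 std under the mean

print(f"Between 4.5 and 7.5: {within_one_sd:.1%}")
print(f"Between 3 and 9:     {within_two_sd:.1%}")
print(f"Three or below:      about 1 day in {1 / three_or_below:.0f}")
print(f"Two or below:        about 1 day in {1 / two_or_below:.0f}")
```

Under these assumptions a day at three or below turns up roughly once in 44 days, and a day at two or below roughly once in 260. Change the mean or the spread and the numbers move, but the shape of the answer does not: the further out the day, the rarer it is.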

You wake up on Sunday and the day is, on whatever internal instrument you have for measuring this, a two. The day is bad. The day is very bad. The day is one of the worst days you have had in a month.

The anxious mind, looking at this day, performs the following calculation. This is my life. This is what my days have become. This is the new normal. Tomorrow will be worse.

The mathematics performs a different calculation. The mathematics says: this day was a two. The mean is six. Tomorrow's mood is going to be drawn from the same distribution that the rest of your moods come from. The expected value of tomorrow is, in the strictest sense, the mean of the distribution, which is six. The probability of tomorrow being another two or below is, given the standard deviation, about one in 260. The probability of tomorrow being closer to six than to two, which is to say above four, is about nine in ten.

This is the regression effect. The expected next value, when you have just observed an extreme one, is much closer to the mean than the extreme was. The next day is not, in any sense the anxious mind can use, a continuation of the bad day. The next day is a fresh draw from a distribution that, by construction, has its bulk somewhere quite a lot less awful.

The chapter is named, slightly sarcastically, why today is probably a Tuesday, because the statistically average day in most adult lives, when you actually plot it, is so unremarkable that it does not even rise to the level of a story. It is a Tuesday. Tuesday is, in this sense, the mean. The mean, in most lives, is overwhelmingly Tuesday-shaped. The Sundays of catastrophe are, statistically, rare events that the anxious mind, having lived through one, treats as the rule.

∗ ∗ ∗

Here is the phenomenon, simulated in Python, because seeing it happen in numbers is the cleanest possible cure for the intuition that contradicts it.

import random

def simulate_mood(days, mean=6.0, std=1.5, seed=0):
    """
    Simulate a sequence of daily moods drawn from a normal
    distribution with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(days)]


# Generate a year of moods.
year = simulate_mood(365, mean=6.0, std=1.5, seed=42)

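# Find the worst day that still has a day after it. Skipping the
# final day keeps the lookup below from running off the end of the list.
worst_day_index = min(range(len(year) - 1), key=lambda i: year[i])
worst_day_value = year[worst_day_index]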

# Look at what happened the day after.
next_day_value = year[worst_day_index + 1]

print(f"Worst day:       {worst_day_value:.2f}")
print(f"The day after:   {next_day_value:.2f}")
print(f"The annual mean: {sum(year) / len(year):.2f}")

# Sample output (illustrative; your exact numbers may differ):
# Worst day:       1.21
# The day after:   5.93
# The annual mean: 5.99
#
# The worst day was a 1.21. The next day was a 5.93. The
# next day was not the same kind of day. The next day was,
# in fact, an almost perfectly average day. The bad day did
# not predict the next day. The bad day was, by construction,
# uncorrelated with the next day. The distribution did its
# job. The mean was waiting.

You can run this. You can change the seed and run it again. You can change the mean and the standard deviation. You will, in essentially every case, observe the same phenomenon. The day after the worst day is, almost always, a much more ordinary day. The bad day did not cause the next day to be bad. The bad day did not predict the next day at all. The bad day was, in every relevant statistical sense, a fluctuation, and fluctuations are followed, on average, by less extreme fluctuations.
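
To see what "essentially every case" looks like, here is a sketch that repeats the experiment across a thousand seeds instead of one. The function name day_after_worst and the choice of a thousand simulated years are assumptions made for illustration, and the draws are independent, exactly as in the simulation above.

```python
import random

MEAN, STD = 6.0, 1.5


def day_after_worst(days=365, seed=0):
    """Return (worst, next_day) for one simulated year, or None if the
    worst day is the final day and so has no day after it."""
    rng = random.Random(seed)
    year = [rng.gauss(MEAN, STD) for _ in range(days)]
    i = year.index(min(year))
    if i + 1 >= len(year):
        return None
    return year[i], year[i + 1]


# One (worst, next_day) pair per simulated year, skipping the rare None.
results = (day_after_worst(seed=s) for s in range(1000))
pairs = [p for p in results if p is not None]

# How often is the next day closer to the mean than the worst day was?
closer = sum(abs(nxt - MEAN) < abs(worst - MEAN) for worst, nxt in pairs)
avg_worst = sum(worst for worst, _ in pairs) / len(pairs)
avg_next = sum(nxt for _, nxt in pairs) / len(pairs)

print(f"Years simulated:        {len(pairs)}")
print(f"Average worst day:      {avg_worst:.2f}")
print(f"Average day after:      {avg_next:.2f}")
print(f"Next day closer to 6.0: {closer / len(pairs):.1%}")
```

The average worst day sits far out in the left tail. The average day after it sits near six, because it is only an ordinary draw.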

The simulation is generous, because it draws every day independently, and real life has small amounts of autocorrelation in mood. A bad day does, in practice, make a below-average next day somewhat more likely, because some of the causes of the bad day persist. But the autocorrelation is much weaker than the anxious mind assumes. In most empirical studies of daily mood, the correlation between consecutive days is on the order of 0.3 to 0.4, which is a real effect but is nowhere near the value of 1 that the anxious mind, in its catastrophic mode, implicitly assumes. The next day is not the same day. The next day is mostly a fresh draw, with a slight tilt toward the recent past.
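
Here is a sketch of what that slight tilt does to the numbers. It assumes the simplest model with memory, a first-order autoregressive process, in which each day is pulled toward yesterday with a weight PHI and then receives fresh noise. The model, the value PHI = 0.35 (picked from the middle of the range quoted above), and every function name are assumptions for illustration, not a description of how real moods are generated.

```python
import math
import random

MEAN, STD, PHI = 6.0, 1.5, 0.35  # PHI: assumed day-to-day correlation


def simulate_ar1(days, seed=0):
    """Moods where each day leans toward yesterday with weight PHI."""
    rng = random.Random(seed)
    # Scale the fresh noise so the long-run spread stays at STD.
    noise_std = STD * math.sqrt(1.0 - PHI ** 2)
    moods = [rng.gauss(MEAN, STD)]
    for _ in range(days - 1):
        pull = PHI * (moods[-1] - MEAN)
        moods.append(MEAN + pull + rng.gauss(0.0, noise_std))
    return moods


def expected_next(today):
    """Expected mood tomorrow, given today's mood, under this model."""
    return MEAN + PHI * (today - MEAN)


def lag1_correlation(xs):
    """Sample correlation between each day and the day after it."""
    m = sum(xs) / len(xs)
    cov = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    var = sum((a - m) ** 2 for a in xs)
    return cov / var


moods = simulate_ar1(20_000, seed=1)
# Days that follow a bad day, with "bad" taken to mean three or below.
after_bad = [nxt for today, nxt in zip(moods, moods[1:]) if today <= 3.0]

print(f"Expected day after a 2:      {expected_next(2.0):.1f}")
print(f"Lag-1 correlation:           {lag1_correlation(moods):.2f}")
print(f"Average day after a bad day: {sum(after_bad) / len(after_bad):.2f}")
```

Under this model the expected day after a two is 6 + 0.35 × (2 − 6), which is 4.6. That is below the mean, so the tilt is real, but it is much nearer to six than to two. Even with memory, most of the distance back to the mean is covered in a single day.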

∗ ∗ ∗

I want to write down, with some care, the practical implication of all this, because it is genuinely useful, and because the cheap version of it is, again, what you would get from a poster.

The cheap version says: tomorrow will be better. The cheap version fails because it makes a promise the universe cannot keep. Tomorrow might, on the small finite chance, also be terrible. The cheap version is a prediction. Predictions, in noisy systems, are often wrong, and the anxious mind, having been promised that tomorrow will be better and then experienced another bad day, will reasonably conclude that the promise was empty.

The real version is a probability statement, not a promise. Tomorrow is most likely to be closer to your average day than today was. This is not a prediction. This is a statement about expected value. Sometimes, tomorrow will, in fact, be worse. The mathematics does not promise otherwise. The mathematics only promises that, on average, over many draws, the day after a bad day is closer to the mean than the bad day was. This promise the mathematics can keep, because it is a property of distributions, not a property of any particular day.

The shift, internally, is from a brittle hope (tomorrow will be better, please) to a calibrated expectation (tomorrow is drawn from the same distribution, the distribution has a body, the body is closer to the mean than the tail, that is where most of the days live). The calibrated expectation does not promise individual outcomes. It does, however, free you from the catastrophic conviction that you are now living in the tail.
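
The difference between the promise and the probability statement can be put in numbers. The sketch below assumes independent normal draws with the same mean and spread as before, and it uses a hypothetical cutoff, BAD, of three or below for what counts as a bad day. It looks at every bad day in a long run and asks two questions: where does the following day land on average, and how often is the following day even worse?

```python
import random

MEAN, STD = 6.0, 1.5
BAD = 3.0  # hypothetical cutoff: two standard deviations under the mean

rng = random.Random(7)
moods = [rng.gauss(MEAN, STD) for _ in range(100_000)]

# Pair each bad day with the day that follows it.
pairs = [
    (today, tomorrow)
    for today, tomorrow in zip(moods, moods[1:])
    if today <= BAD
]

avg_bad = sum(today for today, _ in pairs) / len(pairs)
avg_next = sum(tomorrow for _, tomorrow in pairs) / len(pairs)
# Share of bad days that were followed by an even worse day.
worse = sum(tomorrow < today for today, tomorrow in pairs) / len(pairs)

print(f"Bad days found:          {len(pairs)}")
print(f"Average bad day:         {avg_bad:.2f}")
print(f"Average day after:       {avg_next:.2f}")
print(f"Followed by a worse day: {worse:.1%}")
```

The share of bad days followed by a worse one is small but not zero, which is why the promise fails. The average day after sits near the mean, which is why the probability statement holds.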

Here is the picture of regression in action.

Figure 9.2 A real life looks like this. Mostly ordinary. Occasional dips. The dips are followed by ordinary days, not by more dips.

Notice the bad day in the middle. The bad day is real. The bad day is genuinely below the band. The bad day is also, importantly, a single point. The days immediately around it, on either side, are in the band. The bad day did not predict the next day. The next day is doing what days, on average, do, which is to sit in the band, near the mean, doing its quiet ordinary work of being a Tuesday.

∗ ∗ ∗

I want to close the chapter with a small honest observation about timing, because there is one specific moment in the twenty-four hour cycle where the regression argument is most needed, and that moment is three in the morning.

You will recognize the moment. It is the moment when you wake up, for no good reason, and the room is dark, and the mind is, despite the body's wishes, fully online, and the mind has decided to use the wakefulness for a brief inventory of everything that has ever gone wrong in your life. The inventory is long. The inventory is, in the dark room, indistinguishable from a verdict. You are looking at the list and the list is making the case that the rest of your life is going to be exactly this.

Three in the morning is, in the strict statistical sense, the worst possible sampling moment. It is the moment your body has the lowest cortisol, the highest perceived threat, the least access to perspective, and the least company. The mood at three in the morning is, almost by definition, in the tail of the distribution. It is not a representative sample of your life. It is a draw from the tail.

If you trust the chapter you have just read, the operational implication is this. The mood at three in the morning is data, but it is data of a very particular kind. It is a sample from the tail. The next sample, drawn at, say, eleven the next morning, is overwhelmingly likely to be closer to the mean. The eleven o'clock sample will, in fact, often be perfectly ordinary. You will look at the room, you will remember the night, and you will not quite understand how the conclusions you reached at three felt, at the time, so structurally certain. They were certain because they were drawn from the tail. They lost their certainty by morning because the next sample was drawn from the body.

The trick, the small operational trick, is not to take seriously any conclusion you reach at three in the morning. Not because the conclusion is wrong. Not because you should ignore your feelings. But because the sample is unrepresentative, and a responsible statistician does not draw inferences about a population from a single observation in the tail. The statistician waits for more data. You can wait for more data too. The data, in this case, is the next eight hours. The next eight hours will, on average, deliver samples much closer to the middle of who you are. Wait for them. Reach no conclusions until the sun is up.

This is, in case it is not obvious, the entire reason that anyone with any experience of being alive will tell you not to make important decisions at night. It is not folklore. It is statistics. The night is the tail. The morning is the body. The morning gets to vote, because the morning is where most of your life actually lives.

∗ ∗ ∗

A small exercise

Plot your distribution.

For the next week, before bed, write down a single number: how your day went, on a scale of one to ten. Do this for seven days. Do not analyze. Do not interpret. Just note the number.

At the end of the week, look at the seven numbers. Find the average. Find the highest. Find the lowest.

Notice three things. One: the average is probably not extreme. Two: the highest and lowest mark the edges of the week, and few days sit out there with them. Three: most of the seven days clustered somewhere in the middle.

That is your distribution, in seven samples. The middle of it is closer to the average than the tails. The next day, drawn from the same distribution, is most likely to come from the middle. The bad day, when it arrives, is not the new shape of your life. It is the tail of a shape that has, all along, mostly been Tuesday.
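
If pencil arithmetic is not your idea of a restful evening, the same summary takes a few lines of Python. The seven ratings below are invented for illustration, and the rule that "the middle" means within one point of the average is an arbitrary choice, not part of the exercise.

```python
from statistics import mean

# Hypothetical ratings for one week, one number per day, on a 1 to 10 scale.
week = [6, 5, 7, 3, 6, 8, 6]

average = mean(week)
lowest, highest = min(week), max(week)

# Count a day as "in the middle" if it is within one point of the average.
middle = [day for day in week if abs(day - average) <= 1]

print(f"Average: {average:.1f}")
print(f"Lowest:  {lowest}")
print(f"Highest: {highest}")
print(f"Days near the average: {len(middle)} of {len(week)}")
```

Swap in your own seven numbers. Seven samples make a rough sketch, not a portrait, but the sketch is usually enough to see where the body of the week sits.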

Chapter 10 takes a different turn, into the foundations of computer science, and into the question of why the brain that asks "but what if" forever is asking a question that Alan Turing already proved, in 1936, cannot be answered by any algorithm whatsoever. The chapter will explain the halting problem honestly, and it will sketch why the anxious form of rumination is, in a strict mathematical sense, an attempt to solve a problem that is provably unsolvable.

For now, the page closes here. The bad days are the tail. The next day is, almost certainly, the body. Today, statistically speaking, is probably a Tuesday.
