Chapter eight
The Reverend Thomas Bayes, a small, mild, eighteenth-century English Presbyterian minister, did not in his lifetime publish the theorem that now bears his name.
The theorem, which is one of the most consequential pieces of mathematics ever written, was found in his papers after his death, in 1761, and was published two years later by a friend who recognized that the dead clergyman had quietly worked out a rule that the rest of the world would, eventually, build artificial intelligence on top of. Bayes himself, in life, had nothing to say about anyone's mother. He had, frankly, nothing to say about most things; he was, by every available account, a man who preferred quiet to fame, and who would, I suspect, have been mortified to find a chapter of a book about anxiety three centuries later named in his honor in a faintly insolent way.
I beg his pardon. The title is, however, technically correct. Bayes' theorem is, in the most precise mathematical sense, the apparatus that describes how beliefs get installed in childhood and then resist correction in adulthood, which is to say, it is the apparatus that describes everything your mother, your father, your school, and the small adult who raised you in a difficult house ever did to the inside of your head. The theorem was right about all of them. The theorem is, in this chapter, on your side.
Here is the theorem. I am going to write it down, in proper notation, so that we are dealing with the actual object and not a vague impression of it.
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
That is the entire theorem. It looks small, and that is one of its virtues. Let me translate it.
On the left side, P(H | E) reads as the probability of a hypothesis H, given some evidence E. This is the thing you want to compute. You have a belief, the hypothesis. New evidence has just arrived. The theorem tells you, given the evidence, what your updated belief should be.
On the right side, the numerator has two factors. P(E | H) is the probability you would have seen this particular evidence, if the hypothesis were true. P(H) is your prior belief in the hypothesis, before the evidence arrived. Multiply the two. Divide by P(E), which is the probability of the evidence under any hypothesis whatsoever, true or false. The result is your updated belief.
This is, on inspection, just bookkeeping. It is the formal rule for how a rational mind should change its beliefs in light of new information. You start with a belief. Evidence arrives. The belief should change in a specific, computable direction by a specific, computable amount. Bayes' theorem tells you, precisely, how much.
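For the bookkeeping to be computable, P(E) has to come from somewhere. A standard way to obtain it, sketched here rather than taken from the chapter, is the law of total probability: the evidence can arrive in a world where H is true or in a world where it is false, and the two cases are weighted by how much you believed in each. The symbol ¬H ("not H") is introduced here for the second case.

```latex
P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,\bigl(1 - P(H)\bigr)
\qquad\Longrightarrow\qquad
P(H \mid E) = \frac{P(E \mid H)\,P(H)}
                   {P(E \mid H)\,P(H) + P(E \mid \neg H)\,\bigl(1 - P(H)\bigr)}
```

Written this way, the update needs only three numbers: the prior, the probability of the evidence if the hypothesis is true, and the probability of the evidence if it is false.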
The trouble is the second factor on the right, the prior. The prior is the belief you held before the evidence arrived. The prior is, in some technical and slightly haunted sense, your past. And the theorem has a property that, in adult life, becomes uncomfortable. The theorem says: your updated belief is, in part, a function of your prior. If your prior is very strong, no single piece of contrary evidence will move it much. If your prior is moderate, evidence will update it briskly. If your prior is unanchored, evidence will move it freely.
The strength of a prior is, mathematically, the precision with which you held it before the evidence arrived. Held loosely, the prior is easy to move. Held tightly, the prior is hard to move. Hammered in at high confidence very early in your life, by people you had no power to argue with, the prior is so hard to move that decades of evidence will pile up against it without budging it more than a millimetre.
This is, I want to suggest, the entire mathematics of what we call childhood trauma. It is not, primarily, a story of pain. It is a story of priors set with such confidence that no later evidence has been able to update them. The mind has not been broken. The mind has been provided with priors of extraordinary precision, by people who did not know what they were doing, in conditions where the small child receiving them had no way to mark them as tentative.
I owe you, again, an autobiographical illustration, because the chapter is named for one and the rest of the chapter does not work without it.
I was, as I told you in Chapter 2, a terrible student. The school I attended ran on rote memorization. My particular brain, for reasons that nobody at the time was curious about, did not metabolize rote memorization. I failed almost everything. The school, the family, the system around me, all of them reached the only conclusion the evidence supported, which was, in their phrasing, that I was not going anywhere.
I want to be careful about how I describe what happened next. A prior was installed in me. The prior was, in plain English, I am the kind of person who is not going anywhere. The prior was installed by repetition, at high confidence, over many years, by adults whose authority I had no power to question, at an age before I had the apparatus to mark a prior as tentative. The prior, by the time I was thirteen, was held with the kind of precision that, in the Bayesian framework, takes orders of magnitude more contradictory evidence to overturn than ordinary beliefs require.
I then spent the next twenty years generating contradictory evidence. I taught myself, in the back of a computer institute, an entire mathematical and computer science curriculum. I built a career. I founded companies. I wrote, eventually, this book. The evidence, in the dispassionate language of the theorem, was overwhelming. P(E | H_0), the probability of all this evidence under the hypothesis that I was not going anywhere, was vanishingly small. P(E | H_1), the probability under the hypothesis that I was, in fact, going somewhere, was very high. Each new piece of evidence should, by the theorem, have moved my posterior belief substantially toward H_1.
It did not. Or rather, it did, but with a delay of roughly two decades, because my prior was held with such confidence that each individual piece of evidence updated it by only a tiny amount. The prior was not, in the mathematical sense, broken. The prior was working exactly as designed. It was simply that the design called for an enormous amount of evidence, sustained over years, before the posterior would visibly move. The evidence accumulated. The posterior moved. By thirty-five, I had, in some technical sense, finally updated my belief enough to allow myself to take seriously the possibility that I had not, after all, been correctly diagnosed at twelve.
If you have a similar prior, installed by similar means, in a similarly difficult childhood, I want to say two things to you. The first is that you are not broken. You are a Bayesian inference engine running on training data that was supplied to you, at very high confidence, by people who did not know what they were doing. The second is that the prior can be updated. It just takes more evidence than is fair. The good news, mathematically, is that evidence keeps arriving for free as long as you keep living. The bad news, also mathematically, is that you may not see the prior visibly move for years.
The trick, the small consolation that the theorem itself offers, is that the update is happening even when you cannot see it. The posterior is moving, in tiny increments, under the surface, with every piece of contradictory evidence you produce. The fact that you cannot, on any given Tuesday, feel the prior loosening is not evidence that the prior is not loosening. It is evidence that the prior was set with high precision and is, accordingly, conservative in how it responds to new data. The theorem promises you, however, that if the new data is real, the posterior is moving, somewhere, by an amount the theorem can in principle compute.
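One way to make that computable amount concrete is the odds form of the theorem, a standard rearrangement that the chapter does not itself write out. Dividing the theorem for H by the same theorem for ¬H ("not H", a symbol introduced here) cancels P(E). The sketch assumes the pieces of evidence are independent of one another, and writes O for the odds P(H) / P(¬H).

```latex
\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(H)}{P(\neg H)} \times \frac{P(E \mid H)}{P(E \mid \neg H)}
\qquad\Longrightarrow\qquad
\log O_{\text{after}} = \log O_{\text{before}}
  + \log \frac{P(E \mid H)}{P(E \mid \neg H)}
```

On the log-odds scale, every piece of evidence of a given strength moves the belief by the same fixed step. The probability itself is a squashed version of that scale, so it can sit near 1 for many steps while the log-odds underneath it fall steadily.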
Here is what that looks like.
Figure 8.1 The same evidence updates two priors at different speeds. Held loosely, beliefs move briskly. Held tightly, beliefs move barely at all in the visible range.
The two curves are doing the same job. Both are receiving the same stream of contradictory evidence. Both are moving, in the mathematical sense, toward the truth as the evidence suggests. The weakly held prior arrives at the truth quickly, in a few updates. The strongly held prior, given the same evidence, moves with such restraint that, over the entire range shown, it has barely crossed half the distance. The prior is not broken. The prior is working. The prior is simply holding on with a confidence that the original holders had no business installing in a child.
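The two curves can be approximated numerically. What follows is a minimal sketch, not the method behind Figure 8.1: it assumes the belief is an estimated success rate with a Beta prior, and it models "held loosely" and "held tightly" with a hypothetical knob, `prior_weight`, the number of imaginary past observations the prior is worth. The function name, the starting belief of 0.10, and the weights 4 and 40 are all illustrative assumptions.

```python
def posterior_mean(prior_mean, prior_weight, successes, failures):
    """Mean of a Beta posterior after observing successes and failures.

    prior_weight is how many imaginary observations the prior counts for;
    a larger weight means a more tightly held belief.
    """
    alpha = prior_mean * prior_weight + successes
    beta = (1 - prior_mean) * prior_weight + failures
    return alpha / (alpha + beta)


# Both minds start at the same belief: "I succeed about 10% of the time."
# In this toy stream, every single attempt succeeds.
checkpoints = range(0, 41, 10)
loose = [posterior_mean(0.10, 4, n, 0) for n in checkpoints]
tight = [posterior_mean(0.10, 40, n, 0) for n in checkpoints]

for n, loose_belief, tight_belief in zip(checkpoints, loose, tight):
    print(f"after {n:>2} successes: "
          f"loose = {loose_belief:.2f}, tight = {tight_belief:.2f}")
```

Under these assumptions, forty straight successes carry the loosely held belief to about 0.92, while the tightly held one has only reached 0.55. Both are moving in the same direction on the same evidence. Only the weight of the prior differs.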
If you are forty, and you have not yet stopped flinching at the small evidence of being well loved, it is not because you cannot. It is because the prior is doing what it was designed to do, which is to require an enormous quantity of contrary evidence before it visibly yields. Keep generating the evidence. The curve, somewhere in the unseen part of you, is moving.
Now I want to turn to the second debt, because Chapter 2 promised that the no-children decision would be examined here, and this is the chapter in which the examination has to happen.
The argument, the one I ran on myself for years before it cost me a marriage, went, in plain language, like this. My father failed his children. I share, by genetic descent, half of his DNA. Therefore, the probability that I will fail my children is unacceptably high. The decision not to have children is, under this reasoning, an act of risk management.
I want to write the argument out in Bayesian notation, because the failure is more visible when the notation is on the page.
$$P(\text{I fail my children})$$
That was the quantity I was estimating. The quantity is, on its face, well-defined. The trouble is what I was using to estimate it. I was using, as my only available reference case, the man whose DNA I shared, namely my father, and I was treating his failure as the relevant evidence. Which is to say, I was implicitly computing something like:
$$P(\text{I fail my children}) \approx P(\text{my father failed his children})$$
And then, because my father did, in fact, fail his children, I was rounding the right-hand side to nearly one, and concluding that the left-hand side was also nearly one.
The defect in this argument is not subtle, although it took me years to see it. The defect is that I was conditioning on the wrong variable. My father did not fail his children because of his DNA. He failed his children because of his behavior. Specifically, he drank, he gambled, he slept around, and he was, by the end, absent. None of these behaviors is encoded in DNA in any meaningful sense. They are behaviors. They are choices. They are habits. They are, in technical terms, on a different axis altogether from the genetic one I was using.
The correct conditioning variable, the one that actually predicts whether a man fails his children, is not DNA shared with someone who failed his children. The correct conditioning variable is behavior of the man in question. And my behavior was, by every available measure, the opposite of my father's. I did not drink. I did not gamble. I was monastically loyal. I was not idle. The Bayesian computation, properly conducted, with the correct conditioning variable, returned a probability of repeating the pattern that was, in fact, very low. Not zero. Nothing in human affairs is zero. But low. Easily low enough that the responsible decision, on the evidence, was not to refuse children. The responsible decision was to have them carefully, with attention, in a relationship I could maintain.
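The difference a conditioning variable makes can be shown with a toy table. This is a sketch under loudly stated assumptions: every count below is invented for illustration, none comes from data, and the table describes no real population. It imagines 1,000 men whose fathers failed their children, split by whether they repeat the father's behavior.

```python
# Toy population of 1,000 men whose fathers failed their children.
# All counts are invented for illustration only.
population = [
    # (repeats_fathers_behavior, fails_children, count)
    (True,  True,  360),
    (True,  False,  40),
    (False, True,   30),
    (False, False, 570),
]


def p_fail(rows):
    """Share of the men in `rows` who fail their children."""
    total = sum(count for _, _, count in rows)
    failed = sum(count for _, fails, count in rows if fails)
    return failed / total


# Conditioning on lineage alone: every man in the table shares that trait.
p_given_lineage = p_fail(population)

# Conditioning on behavior as well: keep only the men who do not repeat it.
p_given_behavior = p_fail([row for row in population if not row[0]])

print(f"P(fail | father failed)                         = {p_given_lineage:.2f}")
print(f"P(fail | father failed, behavior not repeated)  = {p_given_behavior:.2f}")
```

With these made-up counts the first number is 0.39 and the second is 0.05. The same men, the same fathers, and a very different answer once the computation conditions on what the man actually does.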
This is what I mean when I say the math had been on my side the entire time. The marriage did not end because the Bayesian argument was correct. The marriage ended because I performed the Bayesian argument with the wrong conditioning variable, and the wrong variable gave me an answer that supported a decision that, at the time, I needed it to support.
I want to be careful here, because the version of this paragraph that is unkind to my younger self would be both inaccurate and a betrayal of the larger argument of this book. The younger man was not stupid. He was not lazy. He was running a Bayesian computation on the only training data he had been given, which was a single salient example, namely his father. The example was vivid. The example was right there in his memory. The example was, in technical terms, what psychologists call available, and the availability heuristic is well known to produce exactly this kind of error in human estimation under uncertainty.
The younger man also could not, in any easy way, have computed the right answer. To have done so, he would have needed to recognize that the conditioning variable was wrong; he would have needed to recognize that behavior, not DNA, was the relevant axis; and he would have needed to evaluate his own behavior dispassionately against that axis, at a time when he had been told for years, by people whose authority he could not yet doubt, that he was not the kind of person who got to be evaluated favorably on any axis at all. The error was not a moral failure. It was a Bayesian computation performed under conditions that almost guaranteed the wrong answer.
The cost was real. The marriage ended. Eight months of clinical depression followed. There is no version of this paragraph that pretends the cost was small. I will say only that the man who can now write down the Bayesian computation correctly is the man on the other side of that cost, and that some kinds of mathematics are, in the only way it makes sense to say this, paid for.
Here, in Python, is a small working Bayesian updater, because there is a kind of consolation in seeing a thing one has been bad at, performed correctly by a machine.
def bayesian_update(prior, likelihood_if_true, likelihood_if_false):
    """
    One Bayesian update.

    Inputs:
        prior:               your belief in the hypothesis,
                             before this piece of evidence.
                             A number between 0 and 1.
        likelihood_if_true:  the probability of seeing this
                             evidence if the hypothesis is true.
        likelihood_if_false: the probability of seeing this
                             evidence if the hypothesis is false.
    Returns:
        the updated belief, after the evidence.
    """
    numerator = likelihood_if_true * prior
    denominator = (likelihood_if_true * prior
                   + likelihood_if_false * (1 - prior))
    return numerator / denominator


# Start with a strongly held prior, set in childhood:
# "I am the kind of person who is not going anywhere."
# Probability of hypothesis: 0.95
prior = 0.95

# A piece of contrary evidence arrives. You did the thing.
# The thing was hard. You did it anyway.
# If the hypothesis "not going anywhere" were true, the
# probability of you doing the thing would be very low.
# P(evidence | hypothesis_true) = 0.05
# If the hypothesis were false, the probability of you
# doing the thing is much higher.
# P(evidence | hypothesis_false) = 0.80
for i in range(20):
    prior = bayesian_update(prior, 0.05, 0.80)
    if i in (0, 4, 9, 14, 19):
        print(f"after {i+1:>2} updates, belief in 'not going "
              f"anywhere' = {prior:.4f}")

# Output:
# after  1 updates, belief in 'not going anywhere' = 0.5429
# after  5 updates, belief in 'not going anywhere' = 0.0000
# after 10 updates, belief in 'not going anywhere' = 0.0000
# after 15 updates, belief in 'not going anywhere' = 0.0000
# after 20 updates, belief in 'not going anywhere' = 0.0000
#
# The belief moves with every update. With evidence this
# strong, it is nearly gone by the fifth. The collapse is
# not a sudden insight. It is the cumulative effect of
# evidence the theorem has been processing all along.
The simulation tells you, in numbers, what the chapter has been arguing in prose. Even a prior of 0.95, which is to say, a belief held at very high confidence, will eventually yield to repeated contrary evidence. Not in one update, which still leaves the belief above one half. But the belief does, mathematically, collapse, given enough data. The theorem is patient. The theorem will, in the long enough run, win.
Two technical observations are worth making about the code, because they are also observations about your life.
One: the speed of collapse depends on the ratio between P(E | H) and P(E | not H). In the simulation, that ratio is 0.05 to 0.80, which is to say, the evidence is sixteen times more likely if the bad hypothesis is false than if it is true. That is what makes the update bite. If the evidence were ambiguous, the ratio would be closer to one, and the update would barely move the prior at all. In real life, the evidence that you are doing well is often ambiguous, because the anxious mind has, very specifically, learned to discount evidence that contradicts the prior. You did the thing, but the thing does not count, because the thing was easy, or the thing was lucky, or the thing did not really matter. This is the part of the system that has to be actively retrained. The theorem can only update on evidence the system accepts as evidence.
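The effect of the ratio can be counted directly. This is a minimal sketch that reuses the chapter's update rule. The helper `updates_until_below`, the 0.5 threshold, and the "ambiguous" likelihoods of 0.45 and 0.55 are assumptions introduced here for illustration.

```python
def bayesian_update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayesian update of `prior` on a single piece of evidence."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1 - prior))


def updates_until_below(prior, likelihood_if_true, likelihood_if_false,
                        threshold=0.5):
    """Count the updates needed before the belief drops below `threshold`."""
    if likelihood_if_true >= likelihood_if_false:
        # Evidence that does not favor "false" never lowers the belief,
        # so the loop below would not terminate.
        raise ValueError("evidence must be more likely if the belief is false")
    count = 0
    while prior >= threshold:
        prior = bayesian_update(prior, likelihood_if_true, likelihood_if_false)
        count += 1
    return count


# Same 0.95 prior in both runs; only the quality of the evidence changes.
strong = updates_until_below(0.95, 0.05, 0.80)     # ratio of 16, as in the text
ambiguous = updates_until_below(0.95, 0.45, 0.55)  # ratio near 1 (assumed values)
print(f"strong evidence:    {strong} updates to fall below 0.5")
print(f"ambiguous evidence: {ambiguous} updates to fall below 0.5")
```

With the strong evidence, the belief falls below one half on the second update. With the ambiguous evidence, the same prior needs fifteen. Discounting your own evidence is, in this arithmetic, the same as pushing the ratio toward one.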
Two: the prior does not collapse smoothly. It collapses in a shape that is, if you watched it in real time, almost discouraging. The first update moves it from 0.95 to 0.54, which is substantial, but feels like the belief still has half its strength. By the fifth update it is below 0.0001. The simulation is the kind case, because each piece of evidence in it is strong and is counted in full. Tighten the prior and weaken the evidence, as in the first observation, and the same arithmetic produces a collapse that is sudden, late, and preceded by a long apparent plateau in which the belief seems unmoved. This is the part of the curve where most adults give up. They have been generating contradictory evidence for years. The prior, from the inside, feels exactly as strong as it always has. They conclude that the work is not working, and they stop. The work was working. The collapse was, mathematically, days away. The shape of the curve is treacherous. It hides progress until almost the moment of arrival.
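The plateau is easiest to see with a tighter prior and weaker evidence than the simulation above uses. This is a minimal sketch with the same update rule. The prior of 0.999999 and the likelihoods of 0.40 and 0.60 are illustrative assumptions, not numbers from the chapter.

```python
def bayesian_update(prior, likelihood_if_true, likelihood_if_false):
    """One Bayesian update of `prior` on a single piece of evidence."""
    numerator = likelihood_if_true * prior
    denominator = numerator + likelihood_if_false * (1 - prior)
    return numerator / denominator


belief = 0.999999      # assumed: a prior hammered in at very high confidence
history = [belief]     # history[n] is the belief after n updates

for _ in range(50):
    # Assumed: weak evidence, only 1.5 times more likely if the belief is false.
    belief = bayesian_update(belief, 0.40, 0.60)
    history.append(belief)

for n in (0, 10, 20, 30, 40, 50):
    print(f"after {n:>2} updates, belief = {history[n]:.4f}")
```

Under these assumptions, the belief stays above 0.99 through the first twenty updates, is still above 0.8 at the thirtieth, and is below 0.1 by the fortieth. Every one of the fifty updates did the same amount of work. Only the last third of them produced a change anyone could feel.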
The honest piece of advice the theorem gives you, applied to a life, is this. Keep generating the contradictory evidence. Notice that the prior feels unmoved. Notice that you cannot, from the inside, distinguish a prior that is about to collapse from a prior that will never collapse. Trust the mathematics, which says, on impeccable eighteenth-century-clergyman authority, that if the evidence is real, the collapse is happening. The collapse is just not, in the early stages, visible. The visibility comes late, and arrives all at once.
A small exercise
Name your strongest prior.
Identify one belief about yourself that you have held for a very long time, that was installed by someone other than you, and that you can almost certainly trace to a specific authority figure from your childhood.
I am the kind of person who is not going anywhere. I am too much. I am not enough. I am the difficult one. I am the responsible one. I am the one who handles it. I am the one nobody listens to. I am the one who will end up alone.
Write the prior down, in its plainest form. Then ask, with the calm of a statistician examining a model, the following two questions. Who installed this prior? What evidence have I, in the years since, accumulated against it?
You do not have to resolve anything tonight. You only have to notice that the prior was installed by someone, that it was installed at very high confidence at a time when you had no power to question it, and that the evidence you have generated against it since is, in the dispassionate language of the theorem, real. The theorem is doing its work. The collapse is mathematically guaranteed, if the evidence is real and you continue to produce it. The collapse is just, on most days, invisible.
Chapter 9 takes the next step, which is to ask why your worst days are not, in any statistical sense, prophecies. The chapter is about regression to the mean, which is the most underrated piece of mathematics any anxious mind can learn, and which is also the chapter where I finally explain, with a graph, why the small forty-eight-hour windows of catastrophe you have been treating as previews of your real life are, in fact, fluctuations in a noisy distribution that is, on the whole, much kinder than the windows suggest.
For now, the page closes here. The theorem is patient. The prior is real. The evidence you have been generating is also real. The collapse is, somewhere in the unseen part of you, already underway.