Test driven living.

In the late 1990s, a group of working programmers, frustrated with the way software was usually written, began advocating for a discipline that, on first hearing, sounds absurd.

The discipline was this. Before you write any code at all, write a test that the code is supposed to pass. Run the test. The test will fail, because the code does not exist yet. Then, and only then, write the smallest possible amount of code that makes the test pass. Run the test again. The test now passes. Write the next test. The test fails. Write more code. The new test passes. Continue, in this red-then-green rhythm, until the entire feature is complete.

This sounds, on first encounter, like an inefficient way to write software. It sounds like extra work. It sounds like the kind of thing only a person with too much time on their hands would propose. The programmers who adopted the practice discovered, over the next ten years, something different. They discovered that code written this way had fewer bugs. They discovered that the act of writing the test first forced them to specify, precisely, what they wanted the code to do, before they tried to do it. They discovered, perhaps most importantly, that having a test in hand made their relationship with their own code much less anxious. The code was either passing the test or not. There was no ambiguity about whether the code worked. There was a green light or a red light, and there was no third option.

The practice is called test driven development, and it is, in my honest estimation, one of the most useful pieces of engineering discipline that has ever been invented for any purpose, including, as I want to argue in this chapter, the purpose of changing your own behavior.

∗ ∗ ∗

Let me describe, briefly, what the standard pattern of attempted behavioral change looks like, because the chapter pivots on the contrast.

The standard pattern goes something like this. You decide, on some morning, that you would like to change a particular habit. You would like to stop micromanaging. You would like to stop checking your phone the moment you wake up. You would like to stop, after twenty years of the habit, eating your lunch at your desk. You make the decision. You spend, perhaps, a few minutes thinking about what success would look like. You then begin attempting the change, in the ordinary way that adults attempt changes, by paying attention to it and trying.

A few weeks later, you find yourself wondering whether you have actually changed. You think back. You try to remember whether you have, or have not, been micromanaging less, or checking your phone less, or eating lunch elsewhere. The memory is, on inspection, unclear. You have some memory of having tried. You have some memory of having failed on at least one occasion. You have, on a particular Tuesday, the strong feeling that you have made progress. On a particular Thursday, you have the equally strong feeling that you have not. The feelings disagree. You have no reliable way to settle the disagreement. The change is, in technical terms, unmeasured, and you are left to evaluate it by whatever combination of memory and mood is available to you on the day you happen to be asking.

This pattern is, on inspection, almost designed to fail. The mind has no neutral instrument with which to measure its own behavioral change. The mind is, simultaneously, the system being changed and the system attempting to measure whether the change has happened. The mind, on a good day, will report success. The mind, on a bad day, will report failure. Neither report is reliable, because the reporter and the reported are the same machine, and the machine has, by long habit, learned to flatter or punish itself depending on which mood is currently dominant.

The anxious mind, particularly, has a specific pathology in this domain. It tends to evaluate behavioral changes against the standard of have I been perfect? rather than against the standard of have I been measurably better? Any imperfection becomes, in the anxious accounting, a failure of the entire effort. One missed evening becomes proof that the change was never real. One slip becomes the verdict on a month of work. The accounting is, in engineering terms, catastrophically unforgiving, and it explains, I think, why so many honest attempts at behavioral change end with the anxious mind concluding that it cannot change at all. The conclusion is not based on what happened. The conclusion is based on an instrument that was unsuitable for the measurement in the first place.

∗ ∗ ∗

Now I want to describe what test driven living actually looks like, because the alternative to the standard pattern is more specific than the standard advice acknowledges.

The discipline, applied to human behavior, has four parts.

The first part is to write the test before attempting the change. The test is, by construction, a small, specific, observable assertion about behavior. Not I will be calmer. Not I will be a better partner. Not I will be less anxious. These are not testable. They are wishes, in the sense Chapter 18 used the word. A testable assertion is something like on three of the next seven evenings, I will go to bed by eleven o'clock. That is testable. You can, at the end of the week, check the test by looking at the clock. There is a definite answer. The answer is, in the technical sense, falsifiable. Either the assertion was true, or it was not. There is no third option.

The second part is to write the test in advance, in calm hours, when the anxious mind is not currently invested in the outcome. The test, like the algorithms of Chapter 18, has to be specified when load is low, so that when load is high, the test is a fixed external reference rather than a moving internal target. The test in advance is the thing that the future, possibly demoralized version of you cannot adjust to make itself feel better. The test is, in this sense, a contract with yourself, written by the calm version of you, on behalf of the version of you who will be tempted to revise the criteria the moment the criteria look unflattering.

The third part is to make the test pass before declaring success. This sounds obvious. It is, in practice, the part that most attempts at behavioral change quietly skip. You do not get to declare you have changed because you feel different. You do not get to declare you have changed because you tried hard. You get to declare you have changed when the test passes. The test, written in advance, is the only reliable signal. Feelings, however genuine, do not enter the calculation. The anxious mind will, by long habit, attempt to substitute feeling for evidence, and the discipline is to refuse the substitution.

The fourth part, which is the most useful, is to accept that a failing test is information, not a verdict. In test driven development, a red test is not a moral failure. A red test is, technically, the entire reason you wrote the test. The test is supposed to fail at the start. The test failing is what tells you what you need to build. The behavioral application is the same. A test that fails at the end of the week is not evidence that you cannot change. It is evidence that the change you attempted was, in its current form, insufficient. You go back. You either revise the test, because the test was too ambitious, or you revise the procedure, because the procedure was not robust enough. Either way, the red is not a verdict. The red is a signal that says here is the next thing to address.

∗ ∗ ∗

Here is the rhythm, in picture form, because the rhythm is the most important thing in this chapter.

the cycle repeats red write a test that fails green do the minimum to pass refactor improve, while keeping it green

Figure 19.1   The red-green-refactor cycle, drawn as a loop. Three stations. Each one is the entry condition for the next. The cycle is the unit of progress, not any single station.

The cycle is the unit of progress, and the unit of progress is the thing the chapter wants you to internalize. You do not measure progress by how the most recent station felt. You measure progress by how many cycles you have completed. Each cycle moves you, by a small testable increment, from where you were to where you wished to be. The increment is small. The cycle is small. The repetition is the engine.

∗ ∗ ∗

I owe you a worked example, because abstract discussion of test driven living evaporates the moment you put the book down, the same way it would for any abstract piece of engineering advice.

Let me use a small, domestic, embarrassingly minor example, because the smallness of the example is the point. The point is that this discipline, applied to small things, scales naturally to large things. Trying it first on something large is a way of guaranteeing that you fail the discipline before you have learned it. The right way to learn is on something small enough that the failure, if it comes, is a piece of information rather than a piece of evidence about your whole life.

The example is one I have, with some difficulty, actually applied to myself. I have, since the dogs left, lived in an apartment with a kitchen that I am, by my own standards, slightly too willing to let drift. I described some of this in Chapter 13, when we talked about entropy. The kitchen drifts. I noticed, over a period of months, that the drift was, on inspection, larger than it needed to be, and that the drift was producing a small but persistent background hum of low-level guilt, in the manner the chapter on entropy specifically warned against.

I decided, in a calm hour, to install a small new behavior. The behavior, stated as a wish, would have been I will be tidier. This is, in test driven terms, useless. Untestable. So I converted it into a test.

The test was: on five of the next seven evenings, by eleven o'clock, the kitchen sink will contain no more than two dirty dishes.

That is a test. It has a definite measurement (the number of dishes in the sink). It has a definite threshold (two or fewer). It has a definite time window (eleven o'clock). It has a definite frequency (five evenings out of seven). The test can be evaluated, at any moment after the seventh evening, by walking into the kitchen and counting. There is no ambiguity. There is no room for the anxious mind to negotiate. The test is either passing or failing, by an instrument that is not subject to mood.

I then attempted the change. The first week, the test failed. I had only managed three evenings, not five. The test was red. In the standard pattern, this would have been the point at which I concluded I could not change, and the project would have died. In the test driven pattern, it was the point at which I asked a different question, which was: was the test too ambitious, or was the procedure insufficient?

The procedure, on inspection, was sound. The procedure was: do the dishes immediately after dinner, every evening. The procedure was failing because, on some evenings, I was simply too tired to apply it. I did not need a better procedure. I needed a smaller test.

I revised the test. On four of the next seven evenings, by eleven o'clock, the kitchen sink will contain no more than two dirty dishes. The next week, the test passed. The week after, it passed again. After three consecutive green weeks, I revised the test upward to five out of seven. That passed. I let it run, untouched, for a month. It kept passing. After a month of consistent green, the behavior had, in the relevant sense, been installed. The kitchen, on most evenings, was now lower-entropy than it had been before the experiment started. The change had been measured. The change was, technically, real.

I want to emphasize something specific about this example, because it would be easy to miss. The change was small. The change was domestic. The change was not, in any meaningful sense, important to anyone but me. But the change happened. It happened because the test made it happen. The test was the difference between an unverifiable wish and a checked behavior. The discipline of writing the test in advance, the discipline of accepting the first red, the discipline of revising the test rather than abandoning the project, was the entire reason the change ever became real. Without the discipline, the change would have been, like dozens of previous attempts, a piece of well-intentioned mood that evaporated within a fortnight.

∗ ∗ ∗

Here is the same idea, written out as a tiny piece of code, because seeing it as code makes the logical structure visible.

def attempt_change_with_test(behavior, test, weeks=4):
    """
    Run the test driven living loop for the given behavior.

    Inputs:
        behavior: the procedure you are attempting to install
        test:     a callable returning True if the week passed,
                  False otherwise
        weeks:    how many consecutive green weeks you require
                  before declaring the behavior installed

    Returns:
        'installed' if you achieve `weeks` consecutive green weeks,
        'needs revision' if you accumulate too many reds.
    """
    green_streak = 0
    total_reds = 0
    cycles = 0

    while green_streak < weeks:
        # do the behavior for one week
        run_for_one_week(behavior)

        # at the end of the week, check the test
        if test():
            green_streak += 1
            print(f"week {cycles+1}: GREEN ({green_streak} in a row)")
        else:
            green_streak = 0
            total_reds += 1
            print(f"week {cycles+1}: RED (total reds: {total_reds})")

            if total_reds >= 3:
                # the test or procedure needs revision
                return 'needs revision'

        cycles += 1

    return 'installed'

The function is, like the Chapter 18 algorithms, not real code in the strict sense. The behaviors and tests being run are not Python functions. But the structure is exact. The loop runs the cycle. The cycle either accumulates green weeks toward installation, or accumulates red weeks toward revision. Three reds is the threshold at which the function declares that something needs to be redesigned. The threshold is, by construction, biased toward giving the experiment a fair number of attempts before forcing a revision, but biased against running indefinitely on a procedure that is, on the evidence, not working.

The function makes one specific decision visible that the chapter has been making implicitly. There is no path through this function that ends in you failed. Every path ends either in installed or in needs revision. The notion of personal failure is, in test driven living, structurally absent. The system is doing its job. The reds are doing their job. The revisions are doing their job. The only thing that would constitute failure, in this framework, is refusing to run the cycle. And refusing to run the cycle is, on inspection, the failure most attempts at behavioral change actually make.

∗ ∗ ∗

I want to extract one piece of practical wisdom from this chapter, because the wisdom is, I think, the chapter's most useful gift.

The wisdom is this. The discipline of test driven living turns behavioral change from a story you tell yourself into a process you run. The story version says I am a person who is trying to change. The process version says I am running cycle four, the test is currently passing, I will revise upward after one more green week. The story has no exit. The story can be told endlessly, in different versions, with no resolution. The process has a defined termination. The process either installs the behavior, or surfaces the need for revision. Either way, the process produces a verifiable result.

I want to say this gently, because the implication is uncomfortable. Most attempts at behavioral change are, in the technical sense the chapter has been using, not attempts at all. They are stories. The story has its own purposes. The story makes the teller feel something. The story justifies inaction. The story occupies the mental space that an actual procedure would otherwise have to occupy. Switching from story to process is, on inspection, the entire move of growing up in this domain. Story is what young adults do with the question of changing themselves. Process is what experienced adults do. The fact that almost nobody is taught the difference is, I think, one of the quieter scandals of how we educate one another into adulthood.

∗ ∗ ∗

A small exercise

Write the test.

Pick one behavioral change you have, at some point, tried and failed to make. Not a large one. A small one. Something domestic. Something small enough that the failure, when it has happened, has been a piece of background guilt rather than a major event in your life.

Write down, in one sentence, the test that the change would have to pass for you to consider it installed. Be specific. On X of the next Y days, Z will be observably true. X, Y, and Z are the parameters you have to choose. Choose them so that the test is, in your honest estimation, achievable but not trivial.

Run the test for one week. Do not aim for perfection. Aim to run the cycle.

At the end of the week, look at the result. If the test passed, run it again next week. If the test failed, ask whether the test was too ambitious or whether the procedure was insufficient, and revise the appropriate one.

The point of the exercise is not, in the first week, to install the behavior. The point is to install the cycle. The cycle is the thing that, over months, will install all the behaviors that have ever been worth installing in your life. The single cycle is small. The repeated cycle is, on inspection, what character actually is.

Chapter 20 turns to a related but distinct problem in self-modification, which is the problem of what to do with the parts of the old self that the new procedures are slowly making obsolete. The chapter is named for the engineering concept of garbage collection, the automatic reclamation of memory that has been allocated but is no longer needed, and it applies the concept, with care, to the small specific question of how to let go of identities, relationships, and self-images that the new code has, technically, replaced.

For now, the page closes here. The wish has no test. The test is the form a wish takes when it becomes checkable. Write the test in advance. Run the cycle. Trust the cycle, not the mood. The mood will lie. The cycle, in the long run, will not.

← Chapter 18                   Chapter 20 →