Appendix B

Mathematical notation, decoded for the rusty and the rebellious.

A note before we start, because the audience of this appendix is one I take particularly seriously.

If you are reading this, you are, in some specific sense, either rusty or rebellious about mathematical notation. The rusty are people who, at some point in their education, did know what these symbols meant, and have, in the intervening years, forgotten in the way one forgets the conjugation tables of a language one no longer speaks. The rebellious are people who, often because of how the notation was first introduced to them, came to suspect that the symbols were a piece of professional intimidation rather than a piece of communication, and have remained suspicious ever since.

I want to say something to both of you. The rust is not your fault. The rebellion is, on inspection, often correct. Mathematical notation is, much of the time, taught badly. It is taught as a thing you must learn to manipulate, rather than as a thing you might find useful for saying particular kinds of things more cleanly than ordinary language allows. The notation, taught well, is a series of small inventions that allow human beings to write down certain ideas without ambiguity and without the gigantic paragraphs that the same ideas would require in English. Each piece of notation exists for a reason. Each reason is, in most cases, a perfectly defensible piece of intellectual labor-saving.

I will, in this appendix, go through the notation this book has used, with one example for each piece, and one sentence per piece about what it is for. The book has not used much notation. This appendix is correspondingly short. By the end of it, the notation in the chapters should be, if not familiar, at least no longer in the way.

I have, throughout, kept the examples small and the explanations short. If at any point an explanation makes you feel stupid, that is a failure of the explanation, not a fact about you. Skip to the next entry. The order does not matter.

∗ ∗ ∗

Functions and their parts

The notation that says "thing depends on thing."

f(x)

a function of x.

Read aloud: f of x. The letter f is a name for some specific procedure. The thing in the parentheses, here x, is the input that procedure is being given. The whole expression f(x) is the output the procedure produces when handed that input.

The notation exists so that you can talk about the procedure separately from any particular input. Suppose f is the temperature in your kitchen, considered as a thing that varies over time. Then f(noon) is the temperature at noon, and f(midnight) is the temperature at midnight. The notation lets you write down the relationship without having to commit, yet, to which moment you are asking about.

f : A → B

a function from A to B.

Read aloud: f maps A to B. This is shorthand for saying that f is a procedure that takes inputs from the set A and produces outputs in the set B. The arrow is doing the same job as maps to, and the colon is doing the same job as where.

You do not need to write functions in this form most of the time. The notation exists because, in some contexts, knowing what kinds of things a function accepts and what kinds of things it produces is more important than knowing what it does step by step.

f(x) = x

the function that returns whatever you give it.

This is the identity function, and it is the function the Epilogue of this book is named for. Whatever value you put in, the same value comes out. The notation says, with extreme economy, exactly that.

A fixed point of a function is, more generally, any value x for which f(x) = x. The function leaves the value alone. Feed the function that value and you get the same value back, however many times you repeat the exercise.
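For readers who find a few lines of code easier to read than symbols, here is a minimal Python sketch of the identity function and of a fixed point. The names identity, halve_and_add_one, and is_fixed_point are my own illustrative choices, and the second function is a made-up example rather than anything from the chapters.

```python
def identity(x):
    """The identity function: f(x) = x."""
    return x


def halve_and_add_one(x):
    """A hypothetical example function: f(x) = x / 2 + 1."""
    return x / 2 + 1


def is_fixed_point(f, x):
    """True when f leaves x alone, that is, when f(x) equals x."""
    return f(x) == x


# Every input is a fixed point of the identity function.
print(is_fixed_point(identity, 7))            # True
# For f(x) = x / 2 + 1, the value 2 is left alone, and 10 is not.
print(is_fixed_point(halve_and_add_one, 2))   # True
print(is_fixed_point(halve_and_add_one, 10))  # False
```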

Rates of change

The notation of calculus, in two doses.

dy / dx

the derivative of y with respect to x.

Read aloud: dee why dee ex. This is the notation that Leibniz, Newton's contemporary and rival, introduced for the derivative, and it has survived more than three centuries because it is, on inspection, hard to improve on. It says: how much does y change for a small change in x?

If y is your position on a road and x is time, then dy/dx is your speed. If y is the temperature in your kitchen and x is time, then dy/dx is how fast the temperature is changing. The notation lets you talk about the rate, separately from the value, which is the entire move of Chapter 4.
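The road example can be tried numerically. The sketch below assumes a made-up position function (three times the square of the elapsed time, in metres) and estimates dy/dx by taking a small step either side of the moment of interest. The function names and the step size are illustrative assumptions, and the estimate is an approximation rather than an exact derivative.

```python
def position(t):
    """Hypothetical position on the road, in metres, after t seconds."""
    return 3.0 * t ** 2


def approximate_derivative(f, x, dx=1e-6):
    """Estimate dy/dx at x from a small symmetric step of size dx."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)


# Speed at t = 2 seconds: the rate at which position is changing.
speed_at_2 = approximate_derivative(position, 2.0)
print(round(speed_at_2, 3))  # 12.0
```

The position at t = 2 is 12 metres, and the speed at t = 2 happens to be 12 metres per second. These are different quantities that coincide here by accident of the example, which is a fair reminder that the rate and the value are separate things.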

f′(x)

an alternative notation for the same thing.

Read aloud: f prime of x. This means exactly the same thing as dy/dx in the case where y = f(x). The small tick mark, called a prime, is shorthand for the derivative of. It exists because dy/dx is sometimes typographically clumsy, and a single tick is easier on the page.

You will see both in the wild. They mean the same thing. Use whichever you find less ugly.

∗ ∗ ∗

Probabilities

The notation that says "given that."

P(A)

the probability of A.

Read aloud: the probability of A. Here A is some event, and P(A) is a number between 0 and 1 representing how likely the event is. Zero means it definitely will not happen. One means it definitely will. A half means it could go either way.

P(A | B)

the probability of A, given B.

Read aloud: the probability of A given B. The vertical bar is doing the work of the word given. This is the probability of A happening, in the world where you already know B happened. It can be very different from P(A).

Example. P(it will rain today) might be 0.3. P(it will rain today | the sky is dark grey at noon) might be 0.85. The information about the sky has changed what you should expect. The vertical bar lets you write down conditional probabilities without rewriting half a paragraph each time.
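One way to see what the vertical bar does is to count. The sketch below uses an invented record of 100 days, chosen only so that the two numbers in the example come out as 0.3 and 0.85. The data and the names days and probability are hypothetical.

```python
# Hypothetical record of 100 days, invented purely for illustration.
days = (
    [{"dark_sky": True, "rain": True}] * 17
    + [{"dark_sky": True, "rain": False}] * 3
    + [{"dark_sky": False, "rain": True}] * 13
    + [{"dark_sky": False, "rain": False}] * 67
)


def probability(event, given=lambda day: True):
    """Estimate P(event | given) by counting days in the record."""
    relevant = [day for day in days if given(day)]
    return sum(1 for day in relevant if event(day)) / len(relevant)


p_rain = probability(lambda d: d["rain"])
p_rain_given_dark = probability(lambda d: d["rain"],
                                given=lambda d: d["dark_sky"])
print(p_rain)             # 0.3
print(p_rain_given_dark)  # 0.85
```

The only difference between the two calls is which days are allowed into the count. That restriction is all the word given means.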

P(H | E) = P(E | H) · P(H) / P(E)

Bayes' theorem.

This is the entire content of Chapter 8, written in the formal notation. Left side: the probability of a hypothesis H given evidence E. Right side: the probability of seeing that evidence under that hypothesis, multiplied by your prior belief in the hypothesis, divided by the overall probability of the evidence.

The dot in P(E | H) · P(H) is multiplication. The fraction bar (a horizontal line in a typeset formula, a slash when the formula is written on one line) is division. Everything else has been explained above. The notation says, with great compression, how a rational mind should update beliefs in light of new information.
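The theorem is short enough to fit in a one-line function. In the sketch below every number is a hypothetical of mine, not a figure from Chapter 8, and the overall probability of the evidence is computed with the standard law of total probability, which this appendix has not otherwise introduced.

```python
def bayes_posterior(p_e_given_h, p_h, p_e):
    """P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e


# Hypothetical numbers, chosen only to illustrate the update.
p_h = 0.3              # prior belief in the hypothesis H
p_e_given_h = 0.5      # chance of seeing evidence E if H is true
p_e_given_not_h = 0.1  # chance of seeing evidence E if H is false

# Overall probability of the evidence, over both possibilities for H.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

posterior = bayes_posterior(p_e_given_h, p_h, p_e)
print(round(posterior, 3))  # 0.682
```

With these made-up inputs, seeing the evidence moves the belief in H from 0.3 to roughly 0.68.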

∗ ∗ ∗

Distributions and averages

The notation for what you should expect.

μ

the mean. Greek letter mu.

The Greek letter μ is the standard symbol for the average of a distribution. For a finite set of numbers, the average is what you get when you add them up and divide by how many there are. μ is that same idea applied to a whole distribution.

It is written with a Greek letter rather than an English one to distinguish it from any particular average of any particular collected sample. μ is the average of the underlying distribution. A sample drawn from the distribution has its own average, often written with a bar over the variable, like x̄. The two are usually close to each other. They are not identical.

σ

the standard deviation. Greek letter sigma.

If μ is where the distribution is centered, σ is how spread out it is. A small σ means most values are close to the mean. A large σ means values are spread further out.

In Chapter 9, the example used μ = 6 and σ = 1.5 for a mood distribution. If the distribution is roughly bell-shaped, this means about two-thirds of the values fall within 1.5 of 6 (that is, between 4.5 and 7.5), with progressively rarer extremes further out. The chapter's argument that the worst days are rare events at the tails of the distribution is, in technical terms, an argument about how much of the distribution's mass lies more than two or three σ from μ.
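How much mass lies in the tails can be computed directly, provided one is willing to assume a shape. The sketch below assumes the mood distribution is a normal (bell-shaped) curve with μ = 6 and σ = 1.5. That assumption is mine, made for illustration, and the name mass_beyond is hypothetical. NormalDist is part of Python's standard statistics module.

```python
from statistics import NormalDist

# Assumption: the mood distribution is normal with mu = 6, sigma = 1.5.
mood = NormalDist(mu=6, sigma=1.5)


def mass_beyond(k):
    """Fraction of the distribution lying more than k sigma from mu."""
    lower_tail = mood.cdf(mood.mean - k * mood.stdev)
    upper_tail = 1 - mood.cdf(mood.mean + k * mood.stdev)
    return lower_tail + upper_tail


within_one = 1 - mass_beyond(1)
print(round(within_one, 3))      # 0.683
print(round(mass_beyond(2), 3))  # 0.046
print(round(mass_beyond(3), 4))  # 0.0027
```

Under that assumption, fewer than five values in a hundred sit more than two σ from the mean, and fewer than three in a thousand sit more than three σ away, counting both tails together.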

∗ ∗ ∗

Combinatorics

The notation for "how many ways."

n!

n factorial.

Read aloud: n factorial. This is the product of all the positive integers from 1 up to and including n. So 3! = 1 · 2 · 3 = 6, and 5! = 120, and 10! = 3,628,800.

The notation matters because n! is one of the most violently expanding functions in mathematics, and it shows up whenever you count the number of ways to arrange n things in a sequence. The traveling salesman problem of Chapter 17 is hard precisely because the number of routes through n cities is, roughly, n!. The exclamation point is there partly because the operation is so striking that it apparently deserved one.
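The growth is easier to believe once you watch it happen. Here is a minimal sketch that computes n! by plain repeated multiplication. The function name factorial is my own, and Python's standard library offers the same thing as math.factorial.

```python
def factorial(n):
    """Product of all the positive integers from 1 up to and including n."""
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result


for n in (3, 5, 10, 20):
    print(n, factorial(n))
# 3 6
# 5 120
# 10 3628800
# 20 2432902008176640000
```

Doubling n from 10 to 20 takes the answer from a few million to a few quintillion.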

∑

sum. Greek capital letter sigma.

The big sigma means add up everything in the following pattern. Below the sigma is usually a starting value, like i = 1. Above it is usually an ending value, like n. To the right is the expression you are adding up, usually written in terms of i.

Example. The notation ∑_{i=1}^{4} i means add up i for i from 1 to 4, which is 1 + 2 + 3 + 4 = 10. The notation exists so that a request like add up i squared for i from 1 to 100 can be written in a few symbols, as ∑_{i=1}^{100} i², rather than as an explicit sum of a hundred terms.
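The big sigma is, in effect, a loop. The sketch below writes both sums from the example in Python. The variable names are illustrative, and note that range(1, 5) stops before 5, so it covers exactly i = 1 to 4.

```python
# Sum of i for i from 1 to 4.
small_sum = sum(i for i in range(1, 5))
print(small_sum)  # 10

# Sum of i squared for i from 1 to 100.
sum_of_squares = sum(i ** 2 for i in range(1, 101))
print(sum_of_squares)  # 338350
```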

∗ ∗ ∗

Linear algebra, briefly

The notation for vectors and matrices.

v⃗

a vector v.

The arrow on top, or sometimes a bold letter v, indicates that v is not a single number but a small ordered list of numbers. A two-dimensional vector might be (3, 4). A three-dimensional vector might be (1, 0, 2). The components, considered together, describe a point in space or a direction.

Chapter 7's eigenvectors are, in this sense, ordinary vectors with a special property under a particular transformation. The notation does not need anything fancier than this to be precise.

M·v

a matrix times a vector.

If M is a matrix, a rectangular grid of numbers, and v is a vector, then M·v is the result of applying the transformation M to the vector v. The result is another vector. The notation exists because, in linear algebra, transformations of space are most usefully written as multiplications by a matrix.

An eigenvector of M is a vector v such that M·v = λ·v for some number λ called the eigenvalue. The transformation, applied to the eigenvector, just stretches it by λ. It does not turn it. That, in compact notation, is the entire content of Chapter 7.
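The eigenvector equation can be checked by hand on a small case. The sketch below uses a 2-by-2 matrix that I have chosen for illustration (it is not taken from Chapter 7), with plain Python lists standing in for vectors and matrices. The helper names mat_vec and scale are hypothetical.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list of numbers)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def scale(lam, v):
    """Multiply every component of v by the number lam."""
    return [lam * x for x in v]


# A hypothetical matrix, chosen so the arithmetic is easy to follow.
M = [[2, 1],
     [1, 2]]

v = [1, 1]
print(mat_vec(M, v))                 # [3, 3]
print(mat_vec(M, v) == scale(3, v))  # True: v is an eigenvector, lambda = 3

u = [1, 0]
print(mat_vec(M, u))                 # [2, 1]: u has been turned, not just stretched
```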

∗ ∗ ∗

Limits and asymptotes

The notation for "approaches but does not reach."

lim_{x → ∞} f(x) = L

the limit, as x goes to infinity, of f(x) is L.

Read aloud: the limit, as x goes to infinity, of f of x, equals L. This says that as x becomes very large, the value of f(x) gets arbitrarily close to L, without necessarily ever reaching it.

Chapter 6's asymptote argument is, in this notation, the observation that lim_{x → ∞} 1/x = 0, even though no finite x produces a value of zero. The line at zero is the asymptote. The curve approaches it forever without arriving.
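A limit cannot be demonstrated by trying finitely many inputs, but trying a few does show the pattern. The sketch below evaluates 1/x at some increasingly large values of x. The particular inputs are arbitrary choices of mine.

```python
def f(x):
    """The function from the example: f(x) = 1 / x."""
    return 1 / x


inputs = [10, 1_000, 100_000, 10_000_000]
values = [f(x) for x in inputs]

for x, y in zip(inputs, values):
    print(x, y)

# Each value is smaller than the last, and none of them is zero.
print(all(y > 0 for y in values))  # True
```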

∗ ∗ ∗

A few small extras

Notation you will encounter elsewhere.

∈

is a member of.

The notation x ∈ A reads x is a member of A, where A is a set of things. So you might write 3 ∈ ℕ to mean three is a natural number, where the symbol ℕ stands for the natural numbers.

∀ ∃

for all, there exists.

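The upside-down A means for all. The backwards E means there exists. The notation ∀ x ∈ ℕ, x + 1 > x reads for all natural numbers x, x plus one is greater than x, which is, on inspection, true. The notation ∃ x ∈ ℕ, x + x = 4 reads there exists a natural number x such that x plus x equals four, which is also true, because x = 2 works. The notation lets you make universal and existential statements compactly.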
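Python's built-in all and any behave like ∀ and ∃ over a finite collection. The sketch below checks two statements over the first thousand natural numbers only, which is a finite stand-in of my choosing. A computer cannot check a claim about all of ℕ this way, so treat it as an illustration of what the symbols ask, not as a proof.

```python
# Finite stand-in for the natural numbers (here taken to start at 0).
naturals = range(0, 1000)

# "For all x, x + 1 > x", checked over the stand-in.
for_all_holds = all(x + 1 > x for x in naturals)
print(for_all_holds)  # True

# "There exists x such that x + x = 4", checked over the stand-in.
exists_holds = any(x + x == 4 for x in naturals)
print(exists_holds)   # True, because x = 2 works

# "There exists x such that x + 1 = x" fails for every x tried.
print(any(x + 1 == x for x in naturals))  # False
```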

≈

approximately equal to.

The wavy equals sign means roughly equal. So π ≈ 3.14. The exact value has infinitely many digits. The approximation is good enough for most ordinary uses. The notation lets you say close enough without having to commit to which decimal place you mean.

⇒

implies.

The double arrow means implies. A ⇒ B means if A is true, then B is true. Logic uses this constantly. So does mathematics. The notation lets you write down chains of inference without having to write the word therefore over and over.

∗ ∗ ∗

This is, more or less, the entire notation the book has used, plus a handful of pieces you are likely to run into the moment you read any other mathematical text. There is, of course, much more notation in mathematics than I have covered here. There are specialized notations for category theory, for differential geometry, for number theory, for topology, each with their own conventions and their own history. None of them are in this book. If, having finished this book, you decide to wander further into mathematical territory, you will encounter them. You will, I suspect, find that they are, like everything in this appendix, small inventions that allow particular kinds of things to be said cleanly. Each one is, in its way, a tool. Each one is, in its way, learnable.

I want to say one last thing, because the appendix's audience deserves it. The notation is not the mathematics. The mathematics is the ideas. The notation is a tool for writing the ideas down efficiently. If you understand an idea but cannot follow the notation, you do not understand the idea less. You merely have not yet learned the shorthand that other people, also working on the same idea, have agreed to use. The shorthand is learnable. The shorthand is, in fact, in your hands now, in this appendix, to the extent that this book has used it. You have the dictionary. The dictionary is small. The dictionary is, on inspection, sufficient.

If a piece of notation in any future text blocks you, look it up. Do not, under any circumstances, conclude that the notation is the test by which you are entitled to read mathematics. The notation is the floor. The mathematics is the building. The floor is what you stand on. The mathematics is where you live.
