I have been accumulating these observations for many years. Writing textbooks has led me to think about the choices I make in writing mathematics. I have also noted writing errors commonly made by my thesis students and in papers submitted to journals. Here I collect my conclusions.
My first aim in creating this document was to educate my students, thereby reducing the time needed to edit their theses. Since it exists, I am making it publicly available in case others may find it useful. If you don't find it useful (or if you object to it on principle), then please ignore it. I hope to make some writers of mathematics (especially students) aware of issues they may not have considered, where small changes can produce mathematical writing that wider audiences can read more easily.
After an introductory explanation, I discuss (1) mathematical style, (2) notation and terminology, (3) punctuation and English grammar as used in mathematical writing, and (4) English usage for non-native speakers. Some points are minor distinctions, but even these make mathematical writing clearer when used consistently. My intent is not to make writing rigid, but rather to make it transparent so the reader is not distracted by ambiguities or awkwardness in the flow of the narrative.
In live mathematical conversations, one takes many shortcuts that are inappropriate in precise mathematical writing. The context is known by all participants, and shortcuts evolve to save time. Also, the speaker can immediately clarify anything ambiguous. Without immediate access to the author, written mathematics must use language more carefully. In addition, mathematical concepts are abstract, without context from everyday experience, so the writing must be more consistent to make the meaning clear. Outside mathematics, imprecise writing of English can still be understood because the objects and concepts discussed are familiar.
Many mathematicians will object to some of my recommendations. Many time-honored practices in the writing of mathematics are grammatically incorrect. These mistakes in writing cause no difficulty for readers with sufficient mathematical sophistication or familiarity with the subject. I believe it is unnecessary to restrict the audience to such readers. A modicum of care leads to clearer writing that makes mathematics more easily accessible and readable to a wider and less specialized audience.
Various languages other than English have conventions of usage or grammar that lead to typical errors in English mathematical writing by their native speakers. I have put discussion of these special items in a separate section at the end. Meanwhile, in my explanations I use terms for English parts of speech and punctuation; these terms give technical reasons for my choices, but I hope that readers who are unfamiliar with these terms can still benefit from seeing what the choices are.
Before I start, several disclaimers are in order. I apologize in advance for my own grammatical errors. Habits die hard, and it is easy to err in applying principles of writing. Also, there are inconsistencies between what I propose here and the writing in my earlier books. Those books were written in the previous millennium, and I have learned many things about clear writing since then. Note also that I am a speaker of American English, and some points are consistently different in British English (such as the treatment of "which" vs. "that" and the aversion to serial commas).
Some of my conclusions conflict with manuals of English style. The conclusions I have drawn are intended to produce clear mathematical writing that is more logically consistent than publishers' conventions. This applies especially to punctuation and to words that serve as logical connectives.
I welcome corrections, suggestions, inquiries, and "pet peeves" that may lead to inclusion of further items in later editions.
The first section of the paper is an "Introduction" that should motivate the problem, discuss the related results, state more completely what the results are, and perhaps summarize the techniques or the structure of the paper. In addition, the introduction should contain the concluding remarks or key conjectures.
There is generally little or no value in a separate section of concluding remarks. Such remarks either are redundant or contain information that readers will look to the introduction to find. Readers who study the full details of the proofs are well aware of the statements that summarize what has been done. Readers who do not read the full details have no reason to go on to the concluding remarks. A mathematical research article is not read like a novel or even like an essay that seeks to "persuade" the reader; it does not need an epilogue.
Many definitions are phrased as "An object has property italicized term if condition holds." We use just "if" even though subsequently it is understood that an object has the property if and only if the defining condition holds. The italicization alerts the reader to this situation. The convention can be justified by saying that the property or object does not actually exist until the definition is complete, so one does not yet in the definition say that the named property implies the condition.
Definitions written by non-native speakers sometimes contain errant commas.
In each sentence below, the comma should be deleted.
"A bipartite graph, is a graph that is 2-colorable".
"A graph is bipartite, if it is 2-colorable".
The first example is a mistaken placement of a comma inside a clause (see
discussion of Commas).
Note the difference in italicization above. When written as an adjective-noun combination, the term being defined is the name for structures that have the property; hence the full term bipartite graph is italicized. When the property alone is being defined and is positioned as a predicate adjective, only the adjective is italicized.
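For writers who typeset in LaTeX, the two italicization patterns might be sketched as follows. The use of \emph to mark the defined term is an assumption made for illustration; the discussion above asks only that the term be italicized.

```latex
\documentclass{article}
\begin{document}
% Adjective-noun form: the full term "bipartite graph" is emphasized.
A \emph{bipartite graph} is a graph that is $2$-colorable.

% Predicate-adjective form: only the adjective is emphasized.
% Note "if" (not "if and only if") and no comma before it.
A graph is \emph{bipartite} if it is $2$-colorable.
\end{document}
```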
Of course, readers sufficiently familiar with the context have no trouble understanding what is meant, but why disenfranchise other readers? One can just as easily write "The neighborhood of a vertex v, denoted N(v), is {u: uv∈ E(G)}". Alternatively, one can introduce the notation as an appositive in a conventional position immediately after the term defined: "The neighborhood N(v) of a vertex v is {u: uv∈ E(G)}".
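In LaTeX source, the two acceptable phrasings of the neighborhood definition might look like the sketch below. The choice of \colon for the separator in set-builder notation and of \emph for the defined term are illustrative assumptions, not requirements stated above.

```latex
\documentclass{article}
\begin{document}
% Notation introduced by "denoted", set off by commas.
The \emph{neighborhood} of a vertex $v$, denoted $N(v)$,
is $\{u\colon uv\in E(G)\}$.

% Notation introduced as a brief appositive right after the term.
The \emph{neighborhood} $N(v)$ of a vertex $v$
is $\{u\colon uv\in E(G)\}$.
\end{document}
```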
A common Double-Duty Definition is "Let G=(V,E) be a graph". The sentence defines the equation G=(V,E) to be a graph. Of course, the writer intends simultaneously to introduce notation for a particular graph and its vertex set and edge set, but that is not what the sentence says. It is better to write "Let G be a graph" and use operators V and E to refer to the vertex and edge sets of G as V(G) and E(G) (see also Operators vs. constants).
A more subtle example is "For each 1≤ i≤ n,". The introduction of the notation i has been lost because the inequalities impose conditions on it before it is defined. Since the expression is a unit, grammatically the phrase is referring to each inequality written in this way. Correct alternatives that express the intended meaning include "For all i such that 1≤i≤n", "For i∈[n]", and "For 1≤i≤n". The third option is slightly different from the others; it means "whenever i is such that the conditions hold", implicitly introducing i in a specified range but avoiding the grammatical problem.
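The three correct alternatives can be sketched in LaTeX as follows. The conclusion $a_i\ge 0$ is a hypothetical placeholder added only so that each sentence is complete; it does not come from the discussion above.

```latex
\documentclass{article}
\begin{document}
% Avoid: For each $1\le i\le n$, we have $a_i\ge 0$.

% Alternative 1: introduce i, then restrict it.
For all $i$ such that $1\le i\le n$, we have $a_i\ge 0$.

% Alternative 2: membership in [n], read as "for i in [n]".
For $i\in[n]$, we have $a_i\ge 0$.

% Alternative 3: "whenever i is such that the conditions hold".
For $1\le i\le n$, we have $a_i\ge 0$.
\end{document}
```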
For example, "there exists i<j with xi=xj" ascribes a property to the inequality i<j (and is a Double-Duty Definition of i). Without context, it is hard to tell that the author meant "there exists i such that i<j and xi=xj". Consider also "The number of nonneighbors is n-1-d(u)≥ i." The number of nonneighbors is a number, not an inequality; the author is trying to make two statements in one inequality. For clarity, separate the statements: "The number of nonneighbors is n-1-d(u), which is at least i".
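A LaTeX rendering of the two repairs might look like this sketch. It assumes that "xi" and "xj" in the examples stand for the subscripted variables $x_i$ and $x_j$; the surrounding word "Here" is a placeholder added to complete the first sentence.

```latex
\documentclass{article}
\begin{document}
% Avoid: there exists $i<j$ with $x_i=x_j$.
Here there exists $i$ such that $i<j$ and $x_i=x_j$.

% Avoid: The number of nonneighbors is $n-1-d(u)\ge i$.
The number of nonneighbors is $n-1-d(u)$, which is at least $i$.
\end{document}
```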
Exceptions. Applying this principle with very simple expressions leads
to ponderous writing. Here are two notable exceptions:
1) In "Choose x∈ V(G) such that x has minimum degree," we
are choosing x, not the expression "x∈ V(G)". The
justification for this exception is that the membership or containment symbol
is read as "in", which is not a verb. (One can treat nonmembership in the same
way.)
2) "Let G'=G-x". When introducing notation for an object or expression
by a single imperative verb ("let", "set", "put", "choose", etc.), we read the
equality symbol as the verb "equal", truly an exception. This exception can
be recognized by the lack of any verb outside the notational expression.
Continuing with another verb, as in "Let G'=G-x be ...", would produce
a Double-Duty Definition.
If the introductory part of the sentence is longer, then we may already have a noun and a verb, and the expression again becomes a unit. For example, "Include each vertex independently with probability p=(ln n)/n" should be "Include each vertex independently with probability p, where p=(ln n)/n".
When the second formula just specifies an object, the separation can be accomplished by specifying the type of object, as in "When k=2, the graph G is Eulerian" instead of "When k=2, G is Eulerian." One can always rewrite to avoid notational expressions separated only by a comma. Sometimes it is very easy, as in changing "For every bipartite graph G, χ(G)≤2" to "If G is bipartite, then χ(G)≤2".
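The rewrites in the two preceding paragraphs can be sketched in LaTeX source, where the problem is often easier to see because adjacent formulas appear as adjacent math-mode segments. This is only an illustration of the examples already given.

```latex
\documentclass{article}
\begin{document}
% Avoid: ... with probability $p=(\ln n)/n$.
Include each vertex independently with probability $p$,
where $p=(\ln n)/n$.

% Avoid: When $k=2$, $G$ is Eulerian.
When $k=2$, the graph $G$ is Eulerian.

% Avoid: For every bipartite graph $G$, $\chi(G)\le 2$.
If $G$ is bipartite, then $\chi(G)\le 2$.
\end{document}
```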
Exceptions. With a list of size at least three, omission of "and" does not cause as much confusion, and including it is awkward. Here the objection to the common mathematical convention is much weaker: we accept "Let x,y,z be the vertices of T," although writing "Let {x,y,z} be the vertex set of T" would be more precise.
Another sensible exception is "Choose x,y∈ V(G)". Here the relation is between each variable and the set, and we accept this as a single formula. Again a justification is that we can read ∈ as the single word "in", without a verb. Similarly, many mathematicians write, "For n,m≥2" to mean the conjunction of n≥2 and m≥2. The exception for the membership symbol is consistent with other exceptions for the membership symbol; doing it with inequalities is more questionable. Avoid doing it with equalities (see Variable equal to list); it unnecessarily requires a pause for the reader to figure it out.
Other examples: "Suppose there is an edge xy (≠e) in G such that" should be "Suppose that G has an edge xy other than e such that". Similarly, "For k≤m with k even" improves on "For k≤m (k even)" or "For k≤m, k even", and "Consider ai for 1≤i≤n" is better than "Consider ai (1≤i≤n)". One can also separate by putting words into the parentheses: "For k≤ m (where k is even)". Note that "Suppose that there is an edge xy≠e in G such that" is a Double-Duty Definition; "xy≠e" is not an edge.
The same principle applies to logical symbols. In written mathematics, do not use symbols (∃, ∀, ⇒, iff) to substitute for words in sentences. Shorthand notation used to save space on lecture slides need not follow these restrictions, since the slides summarize the lecture and are accompanied orally by sentences.
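As a sketch of this recommendation in LaTeX, the commented lines below show symbolic shorthand and the uncommented lines show the same statements in words. The second statement is a hypothetical example chosen for illustration; it does not appear elsewhere in this document.

```latex
\documentclass{article}
\begin{document}
% Avoid: $G$ bipartite $\Rightarrow$ $\chi(G)\le 2$.
If $G$ is bipartite, then $\chi(G)\le 2$.

% Avoid: $\forall v\in V(G)$ $\exists u\in V(G)$: $uv\in E(G)$.
For every $v\in V(G)$, there exists $u\in V(G)$
such that $uv\in E(G)$.
\end{document}
```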
Used at the beginning of a sentence, the English word "Then" is temporal, as in "Then we left." Since the implicative sense of "then" is so common in mathematics, the temporal sense should rarely be used, to avoid confusion. Usually the temporal "then" at the beginning of a sentence can be changed to "Now" or "Next" with less confusion and essentially the same (and more accurate) meaning, especially in a proof.
When readability would be improved by omitting "then", the sentence should instead start with "When" or "For", as in this sentence itself. A comma still follows the condition introduced by "When" or "For". The structure of a sentence beginning with "Since" is like those beginning with "When" or "For"; a comma follows the first clause. After "Since" or "Because", the concluding clause cannot begin with "then" or "so"; "then" is used only with "If".
Among these choices, I treat "Therefore" as the most formal, introducing a major conclusion and hence taking a comma. Because "Hence" and "Thus" are single syllables, I use them without commas to indicate the flow of argument without making the writing choppy. This choice modifies strict English punctuation in the service of mathematical understanding. It is not incorrect to put commas after all these introductory words, but it enhances mathematical communication to omit the commas after short words introducing short conclusions that are just a step along the way.
"Suppose" vs. "Suppose that". After words of hypothesis or conclusion ("suppose", "assume", "implies", "conclude", etc.), use "that" when what follows is a clause with an English verb. Omit "that" when what follows is just a noun unit, such as a notational expression. For example, "Assume the hypothesis" is a complete "imperative verb - object" sentence. The principle is the same in "Suppose x+y≤10".
The distinction made here is a matter of some debate. Some authors are more formal and want to use "that" after the introductory word when what follows is a notational formula containing a relational symbol, treating that symbol as a verb. However, I think it is better to maintain the consistency of treating formulas as noun units. In addition, the role in clarification played by "that" when a clause with a verb follows becomes unnecessary when the clause is condensed into notation. Finally, the notation may be displayed, which emphasizes its role as a fact (noun) and makes "that" especially unnecessary. For consistency, the use of "that" should be the same when the formula is not displayed. A related example is "the case k=2", as opposed to "the case that k=2"; here "k=2" is the case, which is a noun, so there is no "that".
In English, we also do not always use "that" when a verb is present. When the instruction is informal, without abstract concepts, "that" is usually dropped to avoid ponderous language. For example, "Suppose the hypothesis is true" would be awkward with "that". Similarly, the very short "Suppose there is" would be awkward with "that" after "Suppose", because the verb is gone before one even notices it; this is almost like "Suppose [notation]".
This exception may seem awkward. A better solution when introducing notation is to avoid "Suppose x is" entirely: "Let G be a graph" is better than "Suppose G is a graph". Compare "Suppose x=1" and "Let x=1"; the second sentence is better. The first assumes the truth of an equality; the equation is a unit. The second is more active. Because we never say "Let that . . .", we either view "Let" as the entire verb or view the equality sign as the verb. This usage of "Let" is an exception to the treatment of expressions as noun units; it is not used with inequalities, because an inequality sign would need to be read as the lengthy "be less than or equal to" to become a verb.
Numbered plural variables cause difficulty. In English, "for every two elements" is awkward because "every" is singular. Thus here it is better to say "for any two elements". The presence of "for" is suggestive of the universal quantification and helps avoid ambiguity. Nevertheless, there may still be confusion: consider the sentence "Form G' from G by adding an edge joining any two vertices with distance 2 in G." Some readers will think that only one edge is added, so this exception must be used with care.
Avoiding "any" is not imperative. Evaluate its use in context, making sure to prevent misinterpretation. "Any" is a good substitute for "an arbitrary", and the meaning of "not any" is fairly clear.
Using an indefinite article ("a" or "an") as a universal quantifier can be dangerous, as in "Prove that a bipartite graph has no odd cycle." Some readers may interpret "a" as "one" or "some", turning universality into existence. Using "every" is clearer. Putting "must" before the conclusion can suggest universality but is usually unnecessary.
Although "This result is best possible" may be grammatically correct, it is a somewhat vague sentence, since it does not specify the sense in which the result cannot be improved. Often it is more informative to say something like "the constant in the upper bound cannot be improved". For this reason, some writers suggest avoiding the term "best possible" in written mathematics.
The usage of "series" in English is contrary to its usage in mathematics. In English a "series" usually consists of finitely many occurrences in order, as in the "World Series" or the title "A Series of Unfortunate Events". In mathematics a series is an infinite sum.
Although HTML does not have a standard character for line-centered dots, the ellipsis in an indexed list with relations should be vertically centered on the line ("\cdots" in TeX), while the ellipsis in an indexed list separated by commas should be on the baseline ("\ldots" in TeX).
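A minimal LaTeX sketch of the two kinds of ellipsis follows. The variable names $x_1,\ldots,x_n$ are placeholders chosen for illustration.

```latex
\documentclass{article}
\begin{document}
% Indexed list joined by relations: dots centered on the line.
Suppose $x_1\le x_2\le\cdots\le x_n$.

% Indexed list separated by commas: dots on the baseline.
Let $x_1,x_2,\ldots,x_n$ be the vertices.
\end{document}
```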
It is tempting for mnemonic reasons to write "We write V=V(G) and Δ=Δ(G)". Admittedly, this usage is not confusing when discussing only one graph at a time; the difference between a graph invariant and a real-valued function is that we rarely focus on the value of a real-valued function at just one point. Nevertheless, it is rare that a paper discusses only one graph, and hence it is better to use V(G) and Δ(G) for objects associated with G. The problem is particularly bad with Δ, since this character also occurs in mathematics as a difference operator. One often sees "Δn" meaning the change in the value of n, so one should not use "Δn" to mean the maximum degree times the number of vertices in a graph. (In my textbook I violated this principle by using n(G) and e(G) for the numbers of vertices and edges in a graph G while using n for the number of vertices of a particular graph and e as a particular edge; the error will be corrected in the third edition.)
Two-word terms used as single concepts to modify nouns must be hyphenated when so located (without the hyphen in this sentence, we would be discussing two "word terms"). This principle applies in the correct sentence "A well-known theorem is a theorem that is well known." The same principle applies to parameters in adjectives: "k connected graphs" would be k graphs that are connected, in contrast to "k-connected graphs". Adverbs behave differently, since they can modify adjectives; for example, we may write "upper chromatic number" without hyphens.
Another hyphenation issue arises in graph theory with analogous concepts for vertices and edges. Often a concept for edges is an analogue of a fundamental concept using vertices. In this setting, we do not need "vertex" as an adjective to specify "connectivity" or "chromatic number", but we add "edge" for the analogous edge concept. We then hyphenate "edge-connectivity" and "edge-chromatic number". This makes sense because in both cases the problem for edges is a special case (for line graphs) of the general coloring or connectivity problem. When comparing "edge-coloring" and "list coloring", the difference is then that we are not coloring the lists, so the format of the term is different from that for edge-coloring.
When an expression involving addition or subtraction is used as a parameter modifying a noun, it should be enclosed in parentheses. For example, write "(k+1)-connected graph", not "k+1-connected graph".
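In LaTeX, the parameter is typeset in math mode and the hyphen in text mode, as in the sketch below. Keeping the hyphen outside the dollar signs matters, since a hyphen inside math mode is typeset as a minus sign.

```latex
\documentclass{article}
\begin{document}
% Parameter with addition: parentheses around the expression.
Every $(k+1)$-connected graph is $k$-connected.

% Avoid: $k+1$-connected graph
% Avoid: $(k+1)-$connected graph  (hyphen becomes a minus sign)
\end{document}
```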
The term "order" for the number of vertices of a graph is not as popular as it once was. Some readers find it confusing and prefer "number of vertices". On the other hand, it is very convenient, while overuse of "number of vertices" becomes quite awkward.
Similarly, one should not use "hyperedges" to refer to the edges of a hypergraph. Hypergraphs generalize graphs by allowing edges to have arbitrary size. Calling them "hyperedges" eliminates the possibility of saying that graphs arise as a special case, since graphs have edges, not hyperedges.
Although this distinction is sensible and has become established in many settings (such as "maximum antichain" and "maximum independent set"), potential confusion can be reduced by using "largest" and "smallest" instead of "maximum" and "minimum". For example, it is harder to misinterpret "a largest matching" than to misinterpret "a maximum matching".
For consistency, then, one should not write "a vertex of maximal degree" or "the maximal number of edges"; that is, "maximal" should not be applied to numerical values. This is consistent with usage in continuous mathematics, where we write that a continuous function "attains its maximum" on a closed and bounded set.
A different problem arises in the induction step. When we cite the induction hypothesis, we must write "By the induction hypothesis", not "By induction". To obtain the smaller object (or a property of it), we are invoking the hypothesis that the claim holds for smaller values; we are not invoking the principle of mathematical induction.
Hence we should never write "a Pn" for a member of that class. We can write that a graph "contains a path with n vertices", because that is a structural description of the subgraph, but we cannot write "contains a Pn" or "consider a Pn in G". We can say "contains ten copies of Pn" to refer to subgraphs that are n-vertex paths; each such subgraph is a member of the isomorphism class denoted by Pn.
Nevertheless, complete strictness about this notation produces very awkward writing. Thus when H is the notation for an isomorphism class, we still write "H⊆G" to mean that some subgraph of G belongs to the isomorphism class or is "isomorphic to H", even though we are not specifying particular subsets of the vertices and edges of G. The reason we accept this slight abuse of the notation "H⊆G" and not the expression "a Pn" is that "a" is an English word whose meaning and grammatical usage cannot be changed, which emphasizes the difficulty that Pn is not a singular object.
Some authors who write extensively about chromatic number and edge-chromatic number drop the word "proper" and use k-[edge-]coloring for the restricted concept. The minor convenience gained by dropping this word is overwhelmed by the negative influence of introducing inconsistency of terminology in combinatorics. Use "proper k-coloring" when that is what is meant. For other variations, such as "acyclic k-coloring" or "dynamic k-coloring", the adjectives replace "proper" by imposing other restrictions on the k-coloring, so the word "proper" is then no longer needed.
When the phrase after the relative pronoun specifies a further restriction of the class that has just been introduced, the correct pronoun is "that", and the subsequent phrase tells which of the items in the class are those being discussed. If the subsequent phrase speaks about the totality of the class, then the proper pronoun is "which". When "that" and "which" both seem usable, use "that" when the sense is "having the property that", and use "which" when the sense is "all of which" or "the only one of which". Usually a comma is appropriate before "which". Usually "that" is correct when an indefinite article ("a" or "an") has been used on the word being modified. Beware: This distinction is not made or is made the opposite way in British English. Some American style manuals don't care, but in mathematics there are two distinct meanings to be expressed.
The word "distinct" has the same meaning as "different". Two things can be distinct, but one thing cannot be distinct. Thus the sentence "Every value is distinct" is incorrect; it has no meaning. Many beginning students think it means that each value is different from every other value, but it does not.
The word "unique" indicates that there is only one of the items being described. It does not mean that this item is different from other items. Some students think that "The function f maps the points in A to unique points in B" is a statement that f is injective, but it is not. Every function from A to B maps each point in A to a unique point in B.
The distinction between the words "distinct" and "unique" is made clear by a typical boast on the World Wide Web. The sentence "Our website has one million unique visitors" makes no sense. The intent is to say that among millions of hits there are one million distinct visitors; if there is a unique visitor, then there is no other visitor.
Functions or parameters assign a number to each domain object. The resulting value is specific for the object; there is only one choice for it. Hence we do not say "the graph has a chromatic number 3" or "the vertex has a degree 3". These sentences suggest that the object may have more than one value of the parameter. The answer to the question "What is the degree of this vertex?" may be "This vertex has degree 3", but it cannot be "This vertex has a degree 3".
We also do not say "This vertex has the degree 3", although "The degree of this vertex is 3" is correct. Several instances occur in the sentence "Every graph has an even number of vertices with odd degree, which means that the list of vertex degrees has even sum." The term "even number" takes the article "an" because we are saying which type of number is being used (it is one of the even numbers). The later "odd degree" and "even sum" do not, because these are properties that the vertices and the list do or do not satisfy. Articles are inappropriate when invoking a property.
Articles also are not used with conceptual nouns. Compare with familiar conversation: we say "This chair has value $100" and not "This chair has the value $100." "Value" and "degree" are abstract properties. Here is another non-mathematical example: We say "I receive compensation for my work," not "I receive a compensation for my work." Compensation is an amount, but here only the abstract concept of receiving compensation is meant, not some number of things. Hence we do not use an article.
Similarly, abstract properties do not take articles. We say "because transitivity of A implies transitivity of B", not "because the transitivity of A implies the transitivity of B". The property in question is "transitivity", not "the transitivity".
When discussing a result by two authors, we cannot put possessives on both names, and making only the second name possessive would be wrong. Hence we write "the Greene–Kleitman Theorem". Here "the" serves as a definite article for the unique object "Greene–Kleitman Theorem". When the result is less celebrated, one can indicate the possessive by "of", as in "the theorem of Greene and Kleitman".
In the examples above, "Theorem" is capitalized. When there is only one instance of an object, and the name of it involves a person, it plays the role of a proper noun and its name is a title. Another example is "the Cauchy–Schwarz Inequality".
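In LaTeX source, the dash joining two authors' names is commonly typed as two hyphens, which produce an en dash. The sketch below assumes that an en dash is intended in both examples above; a single hyphen would instead suggest one person with a hyphenated surname.

```latex
\documentclass{article}
\begin{document}
% Two hyphens in the source give an en dash in the output.
This follows from the Greene--Kleitman Theorem.

We apply the Cauchy--Schwarz Inequality.
\end{document}
```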
Two clauses (in essence, two complete sentences) may be combined using a conjunction; the conjunction must be preceded by a comma. Examples of conjunctions are "and", "but", "then", and "so" (the last two should be treated as conjunctions in mathematical writing). Since a conjunction joins two things, sentences should not begin with these words. This is a logical approach that helps keep writing clear, though strict English usage (especially British) may call some of these words adverbs. See further comments on the use of then and so.
Exception. The situation is more complicated when the second clause itself contains a conjunction. Compare "If A, then B holds and C holds" with "If A, then B holds, and C holds". In the first sentence, it is clear that A implies both B and C. The proper grouping or meaning in the second sentence is unclear. Since we only have one comma symbol and don't parenthesize sentences to indicate grouping, a short conjunction of two sentences within a larger conjunction is written without a comma.
One reason for using the serial comma in lists is to avoid confusion in sentences that do not contain lists. Consider the sentences "Like a, b and c have the same property" and "Later, Early and Jones proved the conjecture". These are not lists, and using a comma would be wrong, but when a document does not use serial commas these examples initially appear to be lists. Similarly, in that context an item in a list that itself joins two subitems with "and" looks like the last two items in a list.
Omitting the serial comma can also cause confusion mathematically, as in "The value of f is positive at 2, negative at 1 and 0 at 0."
When an appositive is short enough or contains essential information, the commas are omitted: "My friend Bob is a student." In mathematical writing, a similar situation applies when notation is introduced: "The degree d(v) of a vertex v is the number of neighbors of v." Here "d(v)" is a brief appositive. One could argue that the notation for "degree" is not essential to the sense of the sentence, but putting commas around very short appositives can produce very choppy sentences. A speaker need not pause for such appositives, and hence one may omit the commas.
The expression "may be" does exist in English, when used as a verb as in "It may be true" or "This may be the only component". However, when it appears at the start of a clause, most likely the word "maybe" is intended, as in "Maybe this proof will work." Note that in this situation there is another verb ("work"), and the initial expression means "Possibly", which is not a verb.