diff --git a/test/spec.txt b/test/spec.txt
@@ -192,8 +192,8 @@ an implementation without writing an abstract syntax tree renderer.
This document is generated from a text file, `spec.txt`, written
in Markdown with a small extension for the side-by-side tests.
-The script `spec2md.pl` can be used to turn `spec.txt` into pandoc
-Markdown, which can then be converted into other formats.
+The script `tools/makespec.py` can be used to convert `spec.txt` into
+HTML or CommonMark (which can then be converted into other formats).
In the examples, the `→` character is used to represent tabs.
@@ -724,13 +724,14 @@ ATX headers can be empty:
## Setext headers
A [setext header](@setext-header)
-consists of a line of text, containing at least one
-[non-space character],
+consists of a line of text, containing at least one [non-space character],
with no more than 3 spaces indentation, followed by a [setext header
underline]. The line of text must be
one that, were it not followed by the setext header underline,
-would be interpreted as part of a paragraph: it cannot be a code
-block, header, blockquote, horizontal rule, or list.
+would be interpreted as part of a paragraph: it cannot be
+interpretable as a [code fence], [ATX header][ATX headers],
+[block quote][block quotes], [horizontal rule][horizontal rules],
+[list item][list items], or [HTML block][HTML blocks].
A [setext header underline](@setext-header-underline) is a sequence of
`=` characters or a sequence of `-` characters, with no more than 3
@@ -1811,7 +1812,7 @@ title], which if it is present must be separated
from the [link destination] by [whitespace].
No further [non-space character]s may occur on the line.
-A [link reference-definition]
+A [link reference definition]
does not correspond to a structural element of a document. Instead, it
defines a label which can be used in [reference link]s
and reference-style [images] elsewhere in the document. [Link
@@ -2587,7 +2588,7 @@ The following rules define [list items]:
1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
blocks *Bs* starting with a [non-space character] and not separated
from each other by more than one blank line, and *M* is a list
- marker *M* of width *W* followed by 0 < *N* < 5 spaces, then the result
+ marker of width *W* followed by 0 < *N* < 5 spaces, then the result
of prepending *M* and the following spaces to the first line of
*Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
list item with *Bs* as its contents. The type of the list item
@@ -2726,7 +2727,7 @@ this example:
Here `two` occurs in the same column as the list marker `1.`,
but is actually contained in the list item, because there is
-sufficent indentation after the last containing blockquote marker.
+sufficient indentation after the last containing blockquote marker.
The converse is also possible. In the following example, the word `two`
occurs far to the right of the initial text of the list item, `one`, but
@@ -2852,7 +2853,7 @@ A list item may contain any kind of block:
2. **Item starting with indented code.** If a sequence of lines *Ls*
constitute a sequence of blocks *Bs* starting with an indented code
block and not separated from each other by more than one blank line,
- and *M* is a list marker *M* of width *W* followed by
+ and *M* is a list marker of width *W* followed by
one space, then the result of prepending *M* and the following
space to the first line of *Ls*, and indenting subsequent lines of
*Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
@@ -3001,7 +3002,7 @@ the above case:
3. **Item starting with a blank line.** If a sequence of lines *Ls*
starting with a single [blank line] constitute a (possibly empty)
sequence of blocks *Bs*, not separated from each other by more than
- one blank line, and *M* is a list marker *M* of width *W*,
+ one blank line, and *M* is a list marker of width *W*,
then the result of prepending *M* to the first line of *Ls*, and
indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
item with *Bs* as its contents.
@@ -3090,7 +3091,7 @@ A list may start or end with an empty list item:
4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
according to rule #1, #2, or #3, then the result of indenting each line
- of *L* by 1-3 spaces (the same for each line) also constitutes a
+ of *Ls* by 1-3 spaces (the same for each line) also constitutes a
list item with the same contents and attributes. If a line is
empty, then it need not be indented.
@@ -4275,8 +4276,8 @@ corresponding codepoints.
[Decimal entities](@decimal-entities)
consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
-entities need to be recognised and tranformed into their corresponding
-UTF8 codepoints. Invalid Unicode codepoints will be written as the
+entities need to be recognised and transformed into their corresponding
+Unicode codepoints. Invalid Unicode codepoints will be written as the
"unknown codepoint" character (`0xFFFD`)
.
@@ -4287,7 +4288,8 @@ UTF8 codepoints. Invalid Unicode codepoints will be written as the
[Hexadecimal entities](@hexadecimal-entities)
consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
-+ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST.
++ `;`. They will also be parsed and turned into the corresponding
+Unicode codepoints in the AST.
.
" ആ ಫ
@@ -4581,14 +4583,16 @@ characters that is not preceded or followed by a `_` character.
A [left-flanking delimiter run](@left-flanking-delimiter-run) is
a [delimiter run] that is (a) not followed by [unicode whitespace],
and (b) either not followed by a [punctuation character], or
-preceded by [unicode whitespace] or a [punctuation character] or
-the beginning of a line.
+preceded by [unicode whitespace] or a [punctuation character].
+For purposes of this definition, the beginning and the end of
+the line count as unicode whitespace.
A [right-flanking delimiter run](@right-flanking-delimiter-run) is
a [delimiter run] that is (a) not preceded by [unicode whitespace],
and (b) either not preceded by a [punctuation character], or
-followed by [unicode whitespace] or a [punctuation character] or
-the end of a line.
+followed by [unicode whitespace] or a [punctuation character].
+For purposes of this definition, the beginning and the end of
+the line count as unicode whitespace.
Here are some examples of delimiter runs.
@@ -4604,20 +4608,20 @@ Here are some examples of delimiter runs.
- right-flanking but not left-flanking:
```
- abc***
- abc_
+ abc***
+ abc_
"abc"**
- _"abc"
+ "abc"_
```
- - Both right and right-flanking:
+ - Both left and right-flanking:
```
- abc***def
+ abc***def
"abc"_"def"
```
- - Neither right nor right-flanking:
+ - Neither left nor right-flanking:
```
abc *** def
@@ -4635,32 +4639,40 @@ are a bit more complex than the ones given here.)
The following rules define emphasis and strong emphasis:
1. A single `*` character [can open emphasis](@can-open-emphasis)
- iff it is part of a [left-flanking delimiter run].
+ iff (if and only if) it is part of a [left-flanking delimiter run].
2. A single `_` character [can open emphasis] iff
it is part of a [left-flanking delimiter run]
- and not part of a [right-flanking delimiter run].
+ and either (a) not part of a [right-flanking delimiter run]
+ or (b) part of a [right-flanking delimiter run]
+ preceded by punctuation.
3. A single `*` character [can close emphasis](@can-close-emphasis)
iff it is part of a [right-flanking delimiter run].
-4. A single `_` character [can close emphasis]
- iff it is part of a [right-flanking delimiter run]
- and not part of a [left-flanking delimiter run].
+4. A single `_` character [can close emphasis] iff
+ it is part of a [right-flanking delimiter run]
+ and either (a) not part of a [left-flanking delimiter run]
+ or (b) part of a [left-flanking delimiter run]
+ followed by punctuation.
5. A double `**` [can open strong emphasis](@can-open-strong-emphasis)
iff it is part of a [left-flanking delimiter run].
-6. A double `__` [can open strong emphasis]
- iff it is part of a [left-flanking delimiter run]
- and not part of a [right-flanking delimiter run].
+6. A double `__` [can open strong emphasis] iff
+ it is part of a [left-flanking delimiter run]
+ and either (a) not part of a [right-flanking delimiter run]
+ or (b) part of a [right-flanking delimiter run]
+ preceded by punctuation.
7. A double `**` [can close strong emphasis](@can-close-strong-emphasis)
iff it is part of a [right-flanking delimiter run].
8. A double `__` [can close strong emphasis]
- iff it is part of a [right-flanking delimiter run]
- and not part of a [left-flanking delimiter run].
+ iff it is part of a [right-flanking delimiter run]
+ and either (a) not part of a [left-flanking delimiter run]
+ or (b) part of a [left-flanking delimiter run]
+ followed by punctuation.
9. Emphasis begins with a delimiter that [can open emphasis] and ends
with a delimiter that [can close emphasis], and that uses the same
@@ -4822,13 +4834,14 @@ aa_"bb"_cc
<p>aa_"bb"_cc</p>
.
-Here there is no emphasis, because the delimiter runs are
-both left- and right-flanking:
+This is emphasis, even though the opening delimiter is
+both left- and right-flanking, because it is preceded by
+punctuation:
.
-"aa"_"bb"_"cc"
+foo-_(bar)_
.
-<p>"aa"_"bb"_"cc"</p>
+<p>foo-<em>(bar)</em></p>
.
Rule 3:
@@ -4939,6 +4952,16 @@ _foo_bar_baz_
<p><em>foo_bar_baz</em></p>
.
+This is emphasis, even though the closing delimiter is
+both left- and right-flanking, because it is followed by
+punctuation:
+
+.
+_(bar)_.
+.
+<p><em>(bar)</em>.</p>
+.
+
Rule 5:
.
@@ -5035,6 +5058,17 @@ __foo, __bar__, baz__
<p><strong>foo, <strong>bar</strong>, baz</strong></p>
.
+This is strong emphasis, even though the opening delimiter is
+both left- and right-flanking, because it is preceded by
+punctuation:
+
+.
+foo-__(bar)__
+.
+<p>foo-<strong>(bar)</strong></p>
+.
+
+
Rule 7:
This is not strong emphasis, because the closing delimiter is preceded
@@ -5138,6 +5172,16 @@ __foo__bar__baz__
<p><strong>foo__bar__baz</strong></p>
.
+This is strong emphasis, even though the closing delimiter is
+both left- and right-flanking, because it is followed by
+punctuation:
+
+.
+__(bar)__.
+.
+<p><strong>(bar)</strong>.</p>
+.
+
Rule 9:
Any nonempty sequence of inline elements can be the contents of an
@@ -5706,7 +5750,7 @@ A [link destination](@link-destination) consists of either
ASCII space or control characters, and includes parentheses
only if (a) they are backslash-escaped or (b) they are part of
a balanced pair of unescaped parentheses that is not itself
- inside a balanced pair of unescaped paretheses.
+ inside a balanced pair of unescaped parentheses.
A [link title](@link-title) consists of either
@@ -5839,8 +5883,8 @@ in Markdown:
URL-escaping should be left alone inside the destination, as all
URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into their UTF-8 codepoints, as usual, and
-optionally URL-escaped when written as HTML.
+the destination will be parsed into the corresponding Unicode
+codepoints, as usual, and optionally URL-escaped when written as HTML.
.
[link](foo%20bä)
@@ -7215,10 +7259,10 @@ foo
## Soft line breaks
A regular line break (not in a code span or HTML tag) that is not
-preceded by two or more spaces is parsed as a softbreak. (A
-softbreak may be rendered in HTML either as a
-[line ending] or as a space. The result will be the same
-in browsers. In the examples here, a [line ending] will be used.)
+preceded by two or more spaces or a backslash is parsed as a
+softbreak. (A softbreak may be rendered in HTML either as a
+[line ending] or as a space. The result will be the same in
+browsers. In the examples here, a [line ending] will be used.)
.
foo