cmark

My personal build of CMark

Commit
42ec0dedaa3bb770a689603c52c12a4c6295b0b2
Parent
bff0466e191b414f81b607dd8ff3e60f04e03d1d
Author
John MacFarlane <jgm@berkeley.edu>
Date

Updated spec.txt.

Diffstat

1 file changed, 90 insertions, 46 deletions

Status File Name N° Changes Insertions Deletions
Modified test/spec.txt 136 90 46
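
The main behavioral change in this commit is the revised definition of left- and right-flanking delimiter runs, together with the relaxed rules for when `_` can open or close emphasis. The following is a minimal sketch of those rules as worded in the diff below, not the cmark implementation. The function names (`classify_run`, `underscore_can_open`, `underscore_can_close`) are hypothetical, `str.isspace()` is used as an approximation of the spec's [unicode whitespace], and [punctuation character] is approximated as ASCII punctuation plus the Unicode `P*` categories.

```python
import string
import unicodedata


def is_space(ch):
    # None stands for the beginning or end of the line, which the
    # revised definition counts as unicode whitespace.
    # Assumption: str.isspace() approximates the spec's whitespace set.
    return ch is None or ch.isspace()


def is_punct(ch):
    # Assumption: ASCII punctuation plus Unicode "P*" general categories.
    return ch is not None and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def neighbours(line, start, end):
    """Characters just before and just after the run line[start:end]."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    return before, after


def classify_run(line, start, end):
    """Return (left_flanking, right_flanking) for the run line[start:end]."""
    before, after = neighbours(line, start, end)
    left = not is_space(after) and (
        not is_punct(after) or is_space(before) or is_punct(before)
    )
    right = not is_space(before) and (
        not is_punct(before) or is_space(after) or is_punct(after)
    )
    return left, right


def underscore_can_open(line, start, end):
    # Rules 2 and 6: left-flanking, and either not right-flanking
    # or preceded by punctuation.
    left, right = classify_run(line, start, end)
    before, _ = neighbours(line, start, end)
    return left and (not right or is_punct(before))


def underscore_can_close(line, start, end):
    # Rules 4 and 8: right-flanking, and either not left-flanking
    # or followed by punctuation.
    left, right = classify_run(line, start, end)
    _, after = neighbours(line, start, end)
    return right and (not left or is_punct(after))
```

Under these assumptions, the opening `_` in `foo-_(bar)_` is both left- and right-flanking but can still open emphasis because it is preceded by `-`, whereas the first `_` in `aa_"bb"_cc` is not left-flanking and so cannot open emphasis.
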
diff --git a/test/spec.txt b/test/spec.txt
@@ -192,8 +192,8 @@ an implementation without writing an abstract syntax tree renderer.
 
 This document is generated from a text file, `spec.txt`, written
 in Markdown with a small extension for the side-by-side tests.
-The script `spec2md.pl` can be used to turn `spec.txt` into pandoc
-Markdown, which can then be converted into other formats.
+The script `tools/makespec.py` can be used to convert `spec.txt` into
+HTML or CommonMark (which can then be converted into other formats).
 
 In the examples, the `→` character is used to represent tabs.
 
@@ -724,13 +724,14 @@ ATX headers can be empty:
 ## Setext headers
 
 A [setext header](@setext-header)
-consists of a line of text, containing at least one
-[non-space character],
+consists of a line of text, containing at least one [non-space character],
 with no more than 3 spaces indentation, followed by a [setext header
 underline].  The line of text must be
 one that, were it not followed by the setext header underline,
-would be interpreted as part of a paragraph:  it cannot be a code
-block, header, blockquote, horizontal rule, or list.
+would be interpreted as part of a paragraph:  it cannot be
+interpretable as a [code fence], [ATX header][ATX headers],
+[block quote][block quotes], [horizontal rule][horizontal rules],
+[list item][list items], or [HTML block][HTML blocks].
 
 A [setext header underline](@setext-header-underline) is a sequence of
 `=` characters or a sequence of `-` characters, with no more than 3
@@ -1811,7 +1812,7 @@ title], which if it is present must be separated
 from the [link destination] by [whitespace].
 No further [non-space character]s may occur on the line.
 
-A [link reference-definition]
+A [link reference definition]
 does not correspond to a structural element of a document.  Instead, it
 defines a label which can be used in [reference link]s
 and reference-style [images] elsewhere in the document.  [Link
@@ -2587,7 +2588,7 @@ The following rules define [list items]:
 1.  **Basic case.**  If a sequence of lines *Ls* constitute a sequence of
     blocks *Bs* starting with a [non-space character] and not separated
     from each other by more than one blank line, and *M* is a list
-    marker *M* of width *W* followed by 0 < *N* < 5 spaces, then the result
+    marker of width *W* followed by 0 < *N* < 5 spaces, then the result
     of prepending *M* and the following spaces to the first line of
     *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
     list item with *Bs* as its contents.  The type of the list item
@@ -2726,7 +2727,7 @@ this example:
 
 Here `two` occurs in the same column as the list marker `1.`,
 but is actually contained in the list item, because there is
-sufficent indentation after the last containing blockquote marker.
+sufficient indentation after the last containing blockquote marker.
 
 The converse is also possible.  In the following example, the word `two`
 occurs far to the right of the initial text of the list item, `one`, but
@@ -2852,7 +2853,7 @@ A list item may contain any kind of block:
 2.  **Item starting with indented code.**  If a sequence of lines *Ls*
     constitute a sequence of blocks *Bs* starting with an indented code
     block and not separated from each other by more than one blank line,
-    and *M* is a list marker *M* of width *W* followed by
+    and *M* is a list marker of width *W* followed by
     one space, then the result of prepending *M* and the following
     space to the first line of *Ls*, and indenting subsequent lines of
     *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
@@ -3001,7 +3002,7 @@ the above case:
 3.  **Item starting with a blank line.**  If a sequence of lines *Ls*
     starting with a single [blank line] constitute a (possibly empty)
     sequence of blocks *Bs*, not separated from each other by more than
-    one blank line, and *M* is a list marker *M* of width *W*,
+    one blank line, and *M* is a list marker of width *W*,
     then the result of prepending *M* to the first line of *Ls*, and
     indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
     item with *Bs* as its contents.
@@ -3090,7 +3091,7 @@ A list may start or end with an empty list item:
 
 4.  **Indentation.**  If a sequence of lines *Ls* constitutes a list item
     according to rule #1, #2, or #3, then the result of indenting each line
-    of *L* by 1-3 spaces (the same for each line) also constitutes a
+    of *Ls* by 1-3 spaces (the same for each line) also constitutes a
     list item with the same contents and attributes.  If a line is
     empty, then it need not be indented.
 
@@ -4275,8 +4276,8 @@ corresponding codepoints.
 
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
-entities need to be recognised and tranformed into their corresponding
-UTF8 codepoints. Invalid Unicode codepoints will be written as the
+entities need to be recognised and transformed into their corresponding
+unicode codepoints. Invalid unicode codepoints will be written as the
 "unknown codepoint" character (`0xFFFD`)
 
 .
@@ -4287,7 +4288,8 @@ UTF8 codepoints. Invalid Unicode codepoints will be written as the
 
 [Hexadecimal entities](@hexadecimal-entities)
 consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
-+ `;`. They will also be parsed and turned into their corresponding UTF8 values in the AST.
++ `;`. They will also be parsed and turned into the corresponding
+unicode codepoints in the AST.
 
 .
 &#X22; &#XD06; &#xcab;
@@ -4581,14 +4583,16 @@ characters that is not preceded or followed by a `_` character.
 A [left-flanking delimiter run](@left-flanking-delimiter-run) is
 a [delimiter run] that is (a) not followed by [unicode whitespace],
 and (b) either not followed by a [punctuation character], or
-preceded by [unicode whitespace] or a [punctuation character] or
-the beginning of a line.
+preceded by [unicode whitespace] or a [punctuation character].
+For purposes of this definition, the beginning and the end of
+the line count as unicode whitespace.
 
 A [right-flanking delimiter run](@right-flanking-delimiter-run) is
 a [delimiter run] that is (a) not preceded by [unicode whitespace],
 and (b) either not preceded by a [punctuation character], or
-followed by [unicode whitespace] or a [punctuation character] or
-the end of a line.
+followed by [unicode whitespace] or a [punctuation character].
+For purposes of this definition, the beginning and the end of
+the line count as unicode whitespace.
 
 Here are some examples of delimiter runs.
 
@@ -4604,20 +4608,20 @@ Here are some examples of delimiter runs.
   - right-flanking but not left-flanking:
 
     ```
-    abc***
-      abc_
+     abc***
+     abc_
     "abc"**
-     _"abc"
+    "abc"_
     ```
 
-  - Both right and right-flanking:
+  - Both left and right-flanking:
 
     ```
-    abc***def
+     abc***def
     "abc"_"def"
     ```
 
-  - Neither right nor right-flanking:
+  - Neither left nor right-flanking:
 
     ```
     abc *** def
@@ -4635,32 +4639,40 @@ are a bit more complex than the ones given here.)
 The following rules define emphasis and strong emphasis:
 
 1.  A single `*` character [can open emphasis](@can-open-emphasis)
-    iff it is part of a [left-flanking delimiter run].
+    iff (if and only if) it is part of a [left-flanking delimiter run].
 
 2.  A single `_` character [can open emphasis] iff
     it is part of a [left-flanking delimiter run]
-    and not part of a [right-flanking delimiter run].
+    and either (a) not part of a [right-flanking delimiter run]
+    or (b) part of a [right-flanking delimiter run]
+    preceded by punctuation.
 
 3.  A single `*` character [can close emphasis](@can-close-emphasis)
     iff it is part of a [right-flanking delimiter run].
 
-4.  A single `_` character [can close emphasis]
-    iff it is part of a [right-flanking delimiter run]
-    and not part of a [left-flanking delimiter run].
+4.  A single `_` character [can close emphasis] iff
+    it is part of a [right-flanking delimiter run]
+    and either (a) not part of a [left-flanking delimiter run]
+    or (b) part of a [left-flanking delimiter run]
+    followed by punctuation.
 
 5.  A double `**` [can open strong emphasis](@can-open-strong-emphasis)
     iff it is part of a [left-flanking delimiter run].
 
-6.  A double `__` [can open strong emphasis]
-    iff it is part of a [left-flanking delimiter run]
-    and not part of a [right-flanking delimiter run].
+6.  A double `__` [can open strong emphasis] iff
+    it is part of a [left-flanking delimiter run]
+    and either (a) not part of a [right-flanking delimiter run]
+    or (b) part of a [right-flanking delimiter run]
+    preceded by punctuation.
 
 7.  A double `**` [can close strong emphasis](@can-close-strong-emphasis)
     iff it is part of a [right-flanking delimiter run].
 
 8.  A double `__` [can close strong emphasis]
-    iff it is part of a [right-flanking delimiter run]
-    and not part of a [left-flanking delimiter run].
+    iff it is part of a [right-flanking delimiter run]
+    and either (a) not part of a [left-flanking delimiter run]
+    or (b) part of a [left-flanking delimiter run]
+    followed by punctuation.
 
 9.  Emphasis begins with a delimiter that [can open emphasis] and ends
     with a delimiter that [can close emphasis], and that uses the same
@@ -4822,13 +4834,14 @@ aa_"bb"_cc
 <p>aa_&quot;bb&quot;_cc</p>
 .
 
-Here there is no emphasis, because the delimiter runs are
-both left- and right-flanking:
+This is emphasis, even though the opening delimiter is
+both left- and right-flanking, because it is preceded by
+punctuation:
 
 .
-"aa"_"bb"_"cc"
+foo-_(bar)_
 .
-<p>&quot;aa&quot;_&quot;bb&quot;_&quot;cc&quot;</p>
+<p>foo-<em>(bar)</em></p>
 .
 
 Rule 3:
@@ -4939,6 +4952,16 @@ _foo_bar_baz_
 <p><em>foo_bar_baz</em></p>
 .
 
+This is emphasis, even though the closing delimiter is
+both left- and right-flanking, because it is followed by
+punctuation:
+
+.
+_(bar)_.
+.
+<p><em>(bar)</em>.</p>
+.
+
 Rule 5:
 
 .
@@ -5035,6 +5058,17 @@ __foo, __bar__, baz__
 <p><strong>foo, <strong>bar</strong>, baz</strong></p>
 .
 
+This is strong emphasis, even though the opening delimiter is
+both left- and right-flanking, because it is preceded by
+punctuation:
+
+.
+foo-__(bar)__
+.
+<p>foo-<strong>(bar)</strong></p>
+.
+
+
 Rule 7:
 
 This is not strong emphasis, because the closing delimiter is preceded
@@ -5138,6 +5172,16 @@ __foo__bar__baz__
 <p><strong>foo__bar__baz</strong></p>
 .
 
+This is strong emphasis, even though the closing delimiter is
+both left- and right-flanking, because it is followed by
+punctuation:
+
+.
+__(bar)__.
+.
+<p><strong>(bar)</strong>.</p>
+.
+
 Rule 9:
 
 Any nonempty sequence of inline elements can be the contents of an
@@ -5706,7 +5750,7 @@ A [link destination](@link-destination) consists of either
   ASCII space or control characters, and includes parentheses
   only if (a) they are backslash-escaped or (b) they are part of
   a balanced pair of unescaped parentheses that is not itself
-  inside a balanced pair of unescaped paretheses.
+  inside a balanced pair of unescaped parentheses.
 
 A [link title](@link-title)  consists of either
 
@@ -5839,8 +5883,8 @@ in Markdown:
 
 URL-escaping should be left alone inside the destination, as all
 URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into their UTF-8 codepoints, as usual, and
-optionally URL-escaped when written as HTML.
+the destination will be parsed into the corresponding unicode
+codepoints, as usual, and optionally URL-escaped when written as HTML.
 
 .
 [link](foo%20b&auml;)
@@ -7215,10 +7259,10 @@ foo
 ## Soft line breaks
 
 A regular line break (not in a code span or HTML tag) that is not
-preceded by two or more spaces is parsed as a softbreak.  (A
-softbreak may be rendered in HTML either as a
-[line ending] or as a space. The result will be the same
-in browsers. In the examples here, a [line ending] will be used.)
+preceded by two or more spaces or a backslash is parsed as a
+softbreak.  (A softbreak may be rendered in HTML either as a
+[line ending] or as a space. The result will be the same in
+browsers. In the examples here, a [line ending] will be used.)
 
 .
 foo