cmark

My personal build of CMark ✏️

Commit: d16cc15572d97c5360d66332ea56b9c6ec295f7f
Parent: 2a9409f587eec1acd7a98cbd5dacc31ac3525812
Author: John MacFarlane <jgm@berkeley.edu>
Date:

Updated test/spec.txt.

Diffstat

1 file changed, 56 insertions, 28 deletions

Status    File Name      N° Changes  Insertions  Deletions
Modified  test/spec.txt  84          56          28
diff --git a/test/spec.txt b/test/spec.txt
@@ -2,7 +2,7 @@
 title: CommonMark Spec
 author: John MacFarlane
 version: 0.21
-date:
+date: 2015-07-14
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...
 
@@ -204,9 +204,13 @@ In the examples, the `→` character is used to represent tabs.
 Any sequence of [character]s is a valid CommonMark
 document.
 
-A [character](@character) is a unicode code point.
+A [character](@character) is a Unicode code point.  Although some
+code points (for example, combining accents) do not correspond to
+characters in an intuitive sense, all code points count as characters
+for purposes of this spec.
+
 This spec does not specify an encoding; it thinks of lines as composed
-of characters rather than bytes.  A conforming parser may be limited
+of [character]s rather than bytes.  A conforming parser may be limited
 to a certain encoding.
 
 A [line](@line) is a sequence of zero or more [character]s
@@ -227,13 +231,13 @@ form feed (`U+000C`), or carriage return (`U+000D`).
 [Whitespace](@whitespace) is a sequence of one or more [whitespace
 character]s.
 
-A [unicode whitespace character](@unicode-whitespace-character) is
-any code point in the unicode `Zs` class, or a tab (`U+0009`),
+A [Unicode whitespace character](@unicode-whitespace-character) is
+any code point in the Unicode `Zs` class, or a tab (`U+0009`),
 carriage return (`U+000D`), newline (`U+000A`), or form feed
 (`U+000C`).
 
 [Unicode whitespace](@unicode-whitespace) is a sequence of one
-or more [unicode whitespace character]s.
+or more [Unicode whitespace character]s.
 
 A [space](@space) is `U+0020`.
 
@@ -247,7 +251,7 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
 
 A [punctuation character](@punctuation-character) is an [ASCII
 punctuation character] or anything in
-the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
+the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
 
 ## Tabs
 
@@ -1648,7 +1652,7 @@ followed by one of the strings (case-insensitive) `address`,
 `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
 `html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`,
 `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`,
-`section`, `source`, `title`, `summary`, `table`, `tbody`, `td`,
+`section`, `source`, `summary`, `table`, `tbody`, `td`,
 `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
 by [whitespace], the end of the line, the string `>`, or
 the string `/>`.\
@@ -2831,8 +2835,8 @@ foo</p>
 .
 
 Laziness only applies to lines that would have been continuations of
-paragraphs had they been prepended with `>`.  For example, the
-`>` cannot be omitted in the second line of
+paragraphs had they been prepended with [block quote marker]s.
+For example, the `> ` cannot be omitted in the second line of
 
 ``` markdown
 > foo
@@ -2851,7 +2855,7 @@ without changing the meaning:
 <hr />
 .
 
-Similarly, if we omit the `>` in the second line of
+Similarly, if we omit the `> ` in the second line of
 
 ``` markdown
 > - foo
@@ -2874,7 +2878,7 @@ then the block quote ends after the first line:
 </ul>
 .
 
-For the same reason, we can't omit the `>` in front of
+For the same reason, we can't omit the `> ` in front of
 subsequent lines of an indented or fenced code block:
 
 .
@@ -2901,6 +2905,30 @@ foo
 <pre><code></code></pre>
 .
 
+Note that in the following case, we have a paragraph
+continuation line:
+
+.
+> foo
+    - bar
+.
+<blockquote>
+<p>foo
+- bar</p>
+</blockquote>
+.
+
+To see why, note that in
+
+```markdown
+> foo
+>     - bar
+```
+
+the `- bar` is indented too far to start a list, and can't
+be an indented code block because indented code blocks cannot
+interrupt paragraphs, so it is a [paragraph continuation line].
+
 A block quote can be empty:
 
 .
@@ -4849,17 +4877,17 @@ foo
 
 With the goal of making this standard as HTML-agnostic as possible, all
 valid HTML entities (except in code blocks and code spans)
-are recognized as such and converted into unicode characters before
+are recognized as such and converted into Unicode characters before
 they are stored in the AST. This means that renderers to formats other
 than HTML need not be HTML-entity aware.  HTML renderers may either escape
-unicode characters as entities or leave them as they are.  (However,
+Unicode characters as entities or leave them as they are.  (However,
 `"`, `&`, `<`, and `>` must always be rendered as entities.)
 
 [Named entities](@name-entities) consist of `&`
 + any of the valid HTML5 entity names + `;`. The
 [following document](https://html.spec.whatwg.org/multipage/entities.json)
 is used as an authoritative source of the valid entity names and their
-corresponding codepoints.
+corresponding code points.
 
 .
 &nbsp; &amp; &copy; &AElig; &Dcaron;
@@ -4874,9 +4902,9 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be replaced by
-the "unknown codepoint" character (`U+FFFD`).  For security reasons,
-the codepoint `U+0000` will also be replaced by `U+FFFD`.
+Unicode code points. Invalid Unicode code points will be replaced by
+the "unknown code point" character (`U+FFFD`).  For security reasons,
+the code point `U+0000` will also be replaced by `U+FFFD`.
 
 .
 &#35; &#1234; &#992; &#98765432; &#0;
@@ -4887,7 +4915,7 @@ the codepoint `U+0000` will also be replaced by `U+FFFD`.
 [Hexadecimal entities](@hexadecimal-entities)
 consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
 + `;`. They will also be parsed and turned into the corresponding
-unicode codepoints in the AST.
+Unicode code points in the AST.
 
 .
 &#X22; &#XD06; &#xcab;
@@ -5179,18 +5207,18 @@ followed by a `*` character, or a sequence of one or more `_`
 characters that is not preceded or followed by a `_` character.
 
 A [left-flanking delimiter run](@left-flanking-delimiter-run) is
-a [delimiter run] that is (a) not followed by [unicode whitespace],
+a [delimiter run] that is (a) not followed by [Unicode whitespace],
 and (b) either not followed by a [punctuation character], or
-preceded by [unicode whitespace] or a [punctuation character].
+preceded by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.
 
 A [right-flanking delimiter run](@right-flanking-delimiter-run) is
-a [delimiter run] that is (a) not preceded by [unicode whitespace],
+a [delimiter run] that is (a) not preceded by [Unicode whitespace],
 and (b) either not preceded by a [punctuation character], or
-followed by [unicode whitespace] or a [punctuation character].
+followed by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.
 
 Here are some examples of delimiter runs.
 
@@ -6511,8 +6539,8 @@ just a backslash:
 
 URL-escaping should be left alone inside the destination, as all
 URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into the corresponding unicode
-codepoints, as usual, and optionally URL-escaped when written as HTML.
+the destination will be parsed into the corresponding Unicode
+code points, as usual, and optionally URL-escaped when written as HTML.
 
 .
 [link](foo%20b&auml;)
@@ -6721,7 +6749,7 @@ characters inside the square brackets.
 
 One label [matches](@matches)
 another just in case their normalized forms are equal.  To normalize a
-label, perform the *unicode case fold* and collapse consecutive internal
+label, perform the *Unicode case fold* and collapse consecutive internal
 [whitespace] to a single space.  If there are multiple
 matching reference link definitions, the one that comes first in the
 document is used.  (It is desirable in such cases to emit a warning.)
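The left- and right-flanking definitions touched by this commit can be checked mechanically. The Python sketch below takes a line plus the `[start, end)` bounds of a delimiter run. The function name `flanking` is hypothetical, and it assumes the caller has already located a maximal run of `*` or `_`. It classifies Unicode whitespace and punctuation using the categories the spec lists, and treats the start and end of the line as whitespace.

```python
import string
import unicodedata

# Unicode general categories the spec lists for punctuation characters.
_PUNCT_CATEGORIES = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}


def is_unicode_whitespace(ch):
    # Zs class, or tab, carriage return, newline, form feed.
    return ch in "\t\r\n\f" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    # string.punctuation is exactly the 32 ASCII punctuation characters.
    return ch in string.punctuation or unicodedata.category(ch) in _PUNCT_CATEGORIES


def flanking(line, start, end):
    """Hypothetical helper: return (left_flanking, right_flanking) for line[start:end]."""
    before = line[start - 1] if start > 0 else None
    after = line[end] if end < len(line) else None
    # The beginning and the end of the line count as Unicode whitespace.
    ws_before = before is None or is_unicode_whitespace(before)
    ws_after = after is None or is_unicode_whitespace(after)
    punct_before = before is not None and is_punctuation(before)
    punct_after = after is not None and is_punctuation(after)
    left = not ws_after and (not punct_after or ws_before or punct_before)
    right = not ws_before and (not punct_before or ws_after or punct_after)
    return left, right
```

For example, `flanking("***abc", 0, 3)` gives `(True, False)` and `flanking('"abc"_"def"', 5, 6)` gives `(True, True)`. These results agree with the spec's own delimiter-run examples.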
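Label matching, whose wording this commit changes to "Unicode case fold", can be sketched in Python with `str.casefold`, which implements full Unicode case folding. The helper names `normalize_label` and `labels_match` are hypothetical, and the input is assumed to have its brackets already removed. The sketch also trims leading and trailing whitespace, which is an assumption: the wording here only mentions collapsing internal whitespace.

```python
import re

# Whitespace characters as the spec defines them: space, tab, newline,
# line tabulation, form feed, carriage return.
_WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")


def normalize_label(label):
    """Hypothetical helper: normalize link label text (brackets already removed)."""
    folded = label.casefold()  # full Unicode case folding, e.g. "ẞ" -> "ss"
    collapsed = _WHITESPACE_RUN.sub(" ", folded)
    # Assumption: leading/trailing whitespace is also dropped.
    return collapsed.strip(" ")


def labels_match(a, b):
    """Two labels match when their normalized forms are equal."""
    return normalize_label(a) == normalize_label(b)
```

With this sketch, `labels_match("ẞ", "SS")` is true because both fold to `"ss"`. `normalize_label("Foo  \n BAR")` returns `"foo bar"`.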