- Commit: d16cc15572d97c5360d66332ea56b9c6ec295f7f
- Parent: 2a9409f587eec1acd7a98cbd5dacc31ac3525812
- Author: John MacFarlane <jgm@berkeley.edu>
- Date
Updated test/spec.txt.
My personal build of CMark
1 file changed, 56 insertions, 28 deletions
Status | File Name | N° Changes | Insertions | Deletions |
Modified | test/spec.txt | 84 | 56 | 28 |
diff --git a/test/spec.txt b/test/spec.txt
@@ -2,7 +2,7 @@
 title: CommonMark Spec
 author: John MacFarlane
 version: 0.21
-date:
+date: 2015-07-14
 license: '[CC-BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/)'
 ...

@@ -204,9 +204,13 @@ In the examples, the `→` character is used to represent tabs.

 Any sequence of [character]s is a valid CommonMark document.

-A [character](@character) is a unicode code point.
+A [character](@character) is a Unicode code point. Although some
+code points (for example, combining accents) do not correspond to
+characters in an intuitive sense, all code points count as characters
+for purposes of this spec.
+
 This spec does not specify an encoding; it thinks of lines as composed
-of characters rather than bytes. A conforming parser may be limited
+of [character]s rather than bytes. A conforming parser may be limited
 to a certain encoding.

 A [line](@line) is a sequence of zero or more [character]s
@@ -227,13 +231,13 @@ form feed (`U+000C`), or carriage return (`U+000D`).
 [Whitespace](@whitespace) is a sequence of one or more [whitespace
 character]s.

-A [unicode whitespace character](@unicode-whitespace-character) is
-any code point in the unicode `Zs` class, or a tab (`U+0009`),
+A [Unicode whitespace character](@unicode-whitespace-character) is
+any code point in the Unicode `Zs` class, or a tab (`U+0009`),
 carriage return (`U+000D`), newline (`U+000A`), or form feed
 (`U+000C`).

 [Unicode whitespace](@unicode-whitespace) is a sequence of one
-or more [unicode whitespace character]s.
+or more [Unicode whitespace character]s.

 A [space](@space) is `U+0020`.

@@ -247,7 +251,7 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,

 A [punctuation character](@punctuation-character) is an [ASCII
 punctuation character] or anything in
-the unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
+the Unicode classes `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.

 ## Tabs

@@ -1648,7 +1652,7 @@ followed by one of the strings (case-insensitive) `address`,
 `footer`, `form`, `frame`, `frameset`, `h1`, `head`, `header`, `hr`,
 `html`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`, `meta`,
 `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `pre`,
-`section`, `source`, `title`, `summary`, `table`, `tbody`, `td`,
+`section`, `source`, `summary`, `table`, `tbody`, `td`,
 `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
 by [whitespace], the end of the line, the string `>`, or
 the string `/>`.\
@@ -2831,8 +2835,8 @@ foo</p>
 .

 Laziness only applies to lines that would have been continuations of
-paragraphs had they been prepended with `>`. For example, the
-`>` cannot be omitted in the second line of
+paragraphs had they been prepended with [block quote marker]s.
+For example, the `> ` cannot be omitted in the second line of

 ``` markdown
 > foo
@@ -2851,7 +2855,7 @@ without changing the meaning:
 <hr />
 .

-Similarly, if we omit the `>` in the second line of
+Similarly, if we omit the `> ` in the second line of

 ``` markdown
 > - foo
@@ -2874,7 +2878,7 @@ then the block quote ends after the first line:
 </ul>
 .

-For the same reason, we can't omit the `>` in front of
+For the same reason, we can't omit the `> ` in front of
 subsequent lines of an indented or fenced code block:

 .
@@ -2901,6 +2905,30 @@ foo
 <pre><code></code></pre>
 .

+Note that in the following case, we have a paragraph
+continuation line:
+
+.
+> foo
+    - bar
+.
+<blockquote>
+<p>foo
+- bar</p>
+</blockquote>
+.
+
+To see why, note that in
+
+```markdown
+> foo
+>     - bar
+```
+
+the `- bar` is indented too far to start a list, and can't
+be an indented code block because indented code blocks cannot
+interrupt paragraphs, so it is a [paragraph continuation line].
+
 A block quote can be empty:

 .
@@ -4849,17 +4877,17 @@ foo

 With the goal of making this standard as HTML-agnostic as possible, all
 valid HTML entities (except in code blocks and code spans)
-are recognized as such and converted into unicode characters before
+are recognized as such and converted into Unicode characters before
 they are stored in the AST. This means that renderers to formats other
 than HTML need not be HTML-entity aware. HTML renderers may either escape
-unicode characters as entities or leave them as they are. (However,
+Unicode characters as entities or leave them as they are. (However,
 `"`, `&`, `<`, and `>` must always be rendered as entities.)

 [Named entities](@name-entities) consist of `&`
    + any of the valid HTML5 entity names + `;`. The
 [following document](https://html.spec.whatwg.org/multipage/entities.json)
 is used as an authoritative source of the valid entity names and their
-corresponding codepoints.
+corresponding code points.

 .
 & © Æ Ď
@@ -4874,9 +4902,9 @@ corresponding codepoints.
 [Decimal entities](@decimal-entities)
 consist of `&#` + a string of 1--8 arabic digits + `;`. Again, these
 entities need to be recognised and transformed into their corresponding
-unicode codepoints. Invalid unicode codepoints will be replaced by
-the "unknown codepoint" character (`U+FFFD`). For security reasons,
-the codepoint `U+0000` will also be replaced by `U+FFFD`.
+Unicode code points. Invalid Unicode code points will be replaced by
+the "unknown code point" character (`U+FFFD`). For security reasons,
+the code point `U+0000` will also be replaced by `U+FFFD`.

 .
 # Ӓ Ϡ � �
@@ -4887,7 +4915,7 @@ the codepoint `U+0000` will also be replaced by `U+FFFD`.
 [Hexadecimal entities](@hexadecimal-entities)
 consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
 + `;`. They will also be parsed and turned into the corresponding
-unicode codepoints in the AST.
+Unicode code points in the AST.

 .
 " ആ ಫ
@@ -5179,18 +5207,18 @@ followed by a `*` character, or a sequence of one or more `_`
 characters that is not preceded or followed by a `_` character.

 A [left-flanking delimiter run](@left-flanking-delimiter-run) is
-a [delimiter run] that is (a) not followed by [unicode whitespace],
+a [delimiter run] that is (a) not followed by [Unicode whitespace],
 and (b) either not followed by a [punctuation character], or
-preceded by [unicode whitespace] or a [punctuation character].
+preceded by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.

 A [right-flanking delimiter run](@right-flanking-delimiter-run) is
-a [delimiter run] that is (a) not preceded by [unicode whitespace],
+a [delimiter run] that is (a) not preceded by [Unicode whitespace],
 and (b) either not preceded by a [punctuation character], or
-followed by [unicode whitespace] or a [punctuation character].
+followed by [Unicode whitespace] or a [punctuation character].
 For purposes of this definition, the beginning and the end of
-the line count as unicode whitespace.
+the line count as Unicode whitespace.

 Here are some examples of delimiter runs.

@@ -6511,8 +6539,8 @@ just a backslash:

 URL-escaping should be left alone inside the destination, as all
 URL-escaped characters are also valid URL characters. HTML entities in
-the destination will be parsed into the corresponding unicode
-codepoints, as usual, and optionally URL-escaped when written as HTML.
+the destination will be parsed into the corresponding Unicode
+code points, as usual, and optionally URL-escaped when written as HTML.

 .
 [link](foo%20bä)
@@ -6721,7 +6749,7 @@ characters inside the square brackets.

 One label [matches](@matches)
 another just in case their normalized forms are equal. To normalize a
-label, perform the *unicode case fold* and collapse consecutive internal
+label, perform the *Unicode case fold* and collapse consecutive internal
 [whitespace] to a single space. If there are multiple
 matching reference link definitions, the one that comes first in the
 document is used. (It is desirable in such cases to emit a warning.)
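
The label-matching rule in the last hunk (Unicode case fold, then collapse internal whitespace, first matching definition wins) can be illustrated with a minimal Python sketch. This is not cmark's implementation. The helper names `normalize_label`, `labels_match`, and `first_definition` are hypothetical, and the whitespace set (space, tab, newline, line tabulation, form feed, carriage return) is an assumption about what the spec means by [whitespace]. Python's `str.casefold()` stands in for the Unicode case fold.

```python
import re

# Assumption: the spec's [whitespace] characters are space, tab, newline,
# line tabulation (U+000B), form feed (U+000C), and carriage return.
_WHITESPACE_RUN = re.compile(r"[ \t\n\x0b\x0c\r]+")


def normalize_label(label: str) -> str:
    """Hypothetical helper: case-fold the label, then collapse each
    run of whitespace into a single space."""
    return _WHITESPACE_RUN.sub(" ", label.casefold())


def labels_match(a: str, b: str) -> bool:
    """Two labels match when their normalized forms are equal."""
    return normalize_label(a) == normalize_label(b)


def first_definition(label, definitions):
    """Return the destination of the first matching definition.

    `definitions` is a list of (label, destination) pairs in document
    order; when several match, the earliest one is used.
    """
    key = normalize_label(label)
    for def_label, destination in definitions:
        if normalize_label(def_label) == key:
            return destination
    return None


if __name__ == "__main__":
    print(labels_match("Foo   Bar", "foo bar"))
    print(first_definition("FOO", [("foo", "/a"), ("Foo", "/b")]))
```

Case folding differs from lowercasing: for example, `"Straße"` and `"STRASSE"` fold to the same string, so under this sketch they would be treated as the same label.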