diff --git a/test/spec.txt b/test/spec.txt
@@ -326,6 +326,9 @@ A [space](@) is `U+0020`.
A [non-whitespace character](@) is any character
that is not a [whitespace character].
+An [ASCII control character](@) is a character in the range `U+0000–1F`
+(inclusive) or the character `U+007F`.
+
An [ASCII punctuation character](@)
is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
`*`, `+`, `,`, `-`, `.`, `/` (U+0021–2F),
@@ -478,3903 +481,3653 @@ bar
For security reasons, the Unicode character `U+0000` must be replaced
with the REPLACEMENT CHARACTER (`U+FFFD`).
-# Blocks and inlines
-
-We can think of a document as a sequence of
-[blocks](@)---structural elements like paragraphs, block
-quotations, lists, headings, rules, and code blocks. Some blocks (like
-block quotes and list items) contain other blocks; others (like
-headings and paragraphs) contain [inline](@) content---text,
-links, emphasized text, images, code spans, and so on.
-## Precedence
+## Backslash escapes
-Indicators of block structure always take precedence over indicators
-of inline structure. So, for example, the following is a list with
-two items, not a list with one item containing a code span:
+Any ASCII punctuation character may be backslash-escaped:
```````````````````````````````` example
-- `one
-- two`
+\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~
.
-<ul>
-<li>`one</li>
-<li>two`</li>
-</ul>
+<p>!&quot;#$%&amp;'()*+,-./:;&lt;=&gt;?@[\]^_`{|}~</p>
````````````````````````````````
-This means that parsing can proceed in two steps: first, the block
-structure of the document can be discerned; second, text lines inside
-paragraphs, headings, and other block constructs can be parsed for inline
-structure. The second step requires information about link reference
-definitions that will be available only at the end of the first
-step. Note that the first step requires processing lines in sequence,
-but the second can be parallelized, since the inline parsing of
-one block element does not affect the inline parsing of any other.
-
-## Container blocks and leaf blocks
-
-We can divide blocks into two types:
-[container blocks](@),
-which can contain other blocks, and [leaf blocks](@),
-which cannot.
-
-# Leaf blocks
+Backslashes before other characters are treated as literal
+backslashes:
-This section describes the different kinds of leaf block that make up a
-Markdown document.
+```````````````````````````````` example
+\→\A\a\ \3\φ\«
+.
+<p>\→\A\a\ \3\φ\«</p>
+````````````````````````````````
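
The two rules above can be sketched in a few lines of Python. This is only an illustration, not the reference implementation: the function name `unescape_backslashes` is hypothetical, and the sketch ignores context (code spans, code blocks, autolinks, and raw HTML), where escapes do not apply.

```python
import re
import string

# string.punctuation holds exactly the 32 ASCII punctuation characters.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_backslashes(text: str) -> str:
    """Drop a backslash that precedes ASCII punctuation (hypothetical helper).

    Backslashes before any other character are left in place.
    """
    return _ESCAPE.sub(r"\1", text)
```

Because the scan runs left to right, an escaped backslash consumes both characters, so the character after it is not treated as escaped.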
-## Thematic breaks
-A line consisting of 0-3 spaces of indentation, followed by a sequence
-of three or more matching `-`, `_`, or `*` characters, each followed
-optionally by any number of spaces or tabs, forms a
-[thematic break](@).
+Escaped characters are treated as regular characters and do
+not have their usual Markdown meanings:
```````````````````````````````` example
-***
----
-___
+\*not emphasized*
+\<br/> not a tag
+\[not a link](/foo)
+\`not code`
+1\. not a list
+\* not a list
+\# not a heading
+\[foo]: /url "not a reference"
+\&ouml; not a character entity
.
-<hr />
-<hr />
-<hr />
+<p>*not emphasized*
+&lt;br/&gt; not a tag
+[not a link](/foo)
+`not code`
+1. not a list
+* not a list
+# not a heading
+[foo]: /url &quot;not a reference&quot;
+&amp;ouml; not a character entity</p>
````````````````````````````````
-Wrong characters:
+If a backslash is itself escaped, the following character is not:
```````````````````````````````` example
-+++
+\\*emphasis*
.
-<p>+++</p>
+<p>\<em>emphasis</em></p>
````````````````````````````````
+A backslash at the end of the line is a [hard line break]:
+
```````````````````````````````` example
-===
+foo\
+bar
.
-<p>===</p>
+<p>foo<br />
+bar</p>
````````````````````````````````
-Not enough characters:
+Backslash escapes do not work in code blocks, code spans, autolinks, or
+raw HTML:
```````````````````````````````` example
---
-**
-__
+`` \[\` ``
.
-<p>--
-**
-__</p>
+<p><code>\[\`</code></p>
````````````````````````````````
-One to three spaces indent are allowed:
-
```````````````````````````````` example
- ***
- ***
- ***
+    \[\]
.
-<hr />
-<hr />
-<hr />
+<pre><code>\[\]
+</code></pre>
````````````````````````````````
-Four spaces is too many:
-
```````````````````````````````` example
- ***
+~~~
+\[\]
+~~~
.
-<pre><code>***
+<pre><code>\[\]
</code></pre>
````````````````````````````````
```````````````````````````````` example
-Foo
- ***
+<http://example.com?find=\*>
.
-<p>Foo
-***</p>
+<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
````````````````````````````````
-More than three characters may be used:
-
```````````````````````````````` example
-_____________________________________
+<a href="/bar\/)">
.
-<hr />
+<a href="/bar\/)">
````````````````````````````````
-Spaces are allowed between the characters:
+But they work in all other contexts, including URLs and link titles,
+link references, and [info strings] in [fenced code blocks]:
```````````````````````````````` example
- - - -
+[foo](/bar\* "ti\*tle")
.
-<hr />
+<p><a href="/bar*" title="ti*tle">foo</a></p>
````````````````````````````````
```````````````````````````````` example
- ** * ** * ** * **
+[foo]
+
+[foo]: /bar\* "ti\*tle"
.
-<hr />
+<p><a href="/bar*" title="ti*tle">foo</a></p>
````````````````````````````````
```````````````````````````````` example
-- - - -
+``` foo\+bar
+foo
+```
.
-<hr />
+<pre><code class="language-foo+bar">foo
+</code></pre>
````````````````````````````````
-Spaces are allowed at the end:
+## Entity and numeric character references
-```````````````````````````````` example
-- - - -
-.
-<hr />
-````````````````````````````````
+Valid HTML entity references and numeric character references
+can be used in place of the corresponding Unicode character,
+with the following exceptions:
+- Entity and character references are not recognized in code
+ blocks and code spans.
-However, no other characters may occur in the line:
+- Entity and character references cannot stand in place of
+ special characters that define structural elements in
+ CommonMark. For example, although `&#42;` can be used
+ in place of a literal `*` character, `&#42;` cannot replace
+ `*` in emphasis delimiters, bullet list markers, or thematic
+ breaks.
-```````````````````````````````` example
-_ _ _ _ a
+Conforming CommonMark parsers need not store information about
+whether a particular character was represented in the source
+using a Unicode character or an entity reference.
-a------
+[Entity references](@) consist of `&` + any of the valid
+HTML5 entity names + `;`. The
+document <https://html.spec.whatwg.org/entities.json>
+is used as an authoritative source for the valid entity
+references and their corresponding code points.
----a---
+```````````````````````````````` example
+&nbsp; &amp; &copy; &AElig; &Dcaron;
+&frac34; &HilbertSpace; &DifferentialD;
+&ClockwiseContourIntegral; &ngE;
.
-<p>_ _ _ _ a</p>
-<p>a------</p>
-<p>---a---</p>
+<p>  &amp; © Æ Ď
+¾ ℋ ⅆ
+∲ ≧̸</p>
````````````````````````````````
-It is required that all of the [non-whitespace characters] be the same.
-So, this is not a thematic break:
+[Decimal numeric character
+references](@)
+consist of `&#` + a string of 1--7 arabic digits + `;`. A
+numeric character reference is parsed as the corresponding
+Unicode character. Invalid Unicode code points will be replaced by
+the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons,
+the code point `U+0000` will also be replaced by `U+FFFD`.
```````````````````````````````` example
- *-*
+&#35; &#1234; &#992; &#0;
.
-<p><em>-</em></p>
+<p># Ӓ Ϡ �</p>
````````````````````````````````
-Thematic breaks do not need blank lines before or after:
+[Hexadecimal numeric character
+references](@) consist of `&#` +
+either `X` or `x` + a string of 1--6 hexadecimal digits + `;`.
+They too are parsed as the corresponding Unicode character (this
+time specified with a hexadecimal numeral instead of decimal).
```````````````````````````````` example
-- foo
-***
-- bar
+&#X22; &#XD06; &#xcab;
.
-<ul>
-<li>foo</li>
-</ul>
-<hr />
-<ul>
-<li>bar</li>
-</ul>
+<p>&quot; ആ ಫ</p>
````````````````````````````````
-Thematic breaks can interrupt a paragraph:
+Here are some nonentities:
```````````````````````````````` example
-Foo
-***
-bar
+&nbsp &x; &#; &#x;
+&#87654321;
+&#abcdef0;
+&ThisIsNotDefined; &hi?;
.
-<p>Foo</p>
-<hr />
-<p>bar</p>
+<p>&amp;nbsp &amp;x; &amp;#; &amp;#x;
+&amp;#87654321;
+&amp;#abcdef0;
+&amp;ThisIsNotDefined; &amp;hi?;</p>
````````````````````````````````
-If a line of dashes that meets the above conditions for being a
-thematic break could also be interpreted as the underline of a [setext
-heading], the interpretation as a
-[setext heading] takes precedence. Thus, for example,
-this is a setext heading, not a paragraph followed by a thematic break:
+Although HTML5 does accept some entity references
+without a trailing semicolon (such as `&copy`), these are not
+recognized here, because doing so would make the grammar too ambiguous:
```````````````````````````````` example
-Foo
----
-bar
+&copy
.
-<h2>Foo</h2>
-<p>bar</p>
+<p>&amp;copy</p>
````````````````````````````````
-When both a thematic break and a list item are possible
-interpretations of a line, the thematic break takes precedence:
+Strings that are not on the list of HTML5 named entities are not
+recognized as entity references either:
```````````````````````````````` example
-* Foo
-* * *
-* Bar
+&MadeUpEntity;
.
-<ul>
-<li>Foo</li>
-</ul>
-<hr />
-<ul>
-<li>Bar</li>
-</ul>
+<p>&amp;MadeUpEntity;</p>
````````````````````````````````
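
The recognition rules for entity and numeric character references can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the name `decode_references` is hypothetical, Python's `html.entities.html5` table stands in for `entities.json`, and surrogate code points are assumed to count as invalid. The sketch also ignores context, so it does not skip code spans or code blocks.

```python
import re
from html.entities import html5  # maps names such as "copy;" to characters

# &#X..; or &#x..; (1-6 hex digits), &#..; (1-7 decimal digits), or &name;
_REF = re.compile(
    r"&(?:#[Xx]([0-9A-Fa-f]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));"
)


def _replace(match: re.Match) -> str:
    hex_digits, dec_digits, name = match.groups()
    if name is not None:
        # Unknown names are left untouched; the semicolon is mandatory.
        return html5.get(name + ";", match.group(0))
    code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    # U+0000 and invalid code points become the REPLACEMENT CHARACTER.
    # Treating surrogates as invalid is an assumption of this sketch.
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        return "\ufffd"
    return chr(code_point)


def decode_references(text: str) -> str:
    """Replace valid references with their characters (hypothetical helper)."""
    return _REF.sub(_replace, text)
```

Requiring the trailing `;` in the pattern is what keeps `&copy` and over-long digit strings as literal text.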
-If you want a thematic break in a list item, use a different bullet:
+Entity and numeric character references are recognized in any
+context besides code spans or code blocks, including
+URLs, [link titles], and [fenced code block][] [info strings]:
```````````````````````````````` example
-- Foo
-- * * *
+<a href="&ouml;&ouml;.html">
.
-<ul>
-<li>Foo</li>
-<li>
-<hr />
-</li>
-</ul>
+<a href="&ouml;&ouml;.html">
````````````````````````````````
-## ATX headings
-
-An [ATX heading](@)
-consists of a string of characters, parsed as inline content, between an
-opening sequence of 1--6 unescaped `#` characters and an optional
-closing sequence of any number of unescaped `#` characters.
-The opening sequence of `#` characters must be followed by a
-[space] or by the end of line. The optional closing sequence of `#`s must be
-preceded by a [space] and may be followed by spaces only. The opening
-`#` character may be indented 0-3 spaces. The raw contents of the
-heading are stripped of leading and trailing spaces before being parsed
-as inline content. The heading level is equal to the number of `#`
-characters in the opening sequence.
-
-Simple headings:
-
```````````````````````````````` example
-# foo
-## foo
-### foo
-#### foo
-##### foo
-###### foo
+[foo](/f&ouml;&ouml; "f&ouml;&ouml;")
.
-<h1>foo</h1>
-<h2>foo</h2>
-<h3>foo</h3>
-<h4>foo</h4>
-<h5>foo</h5>
-<h6>foo</h6>
+<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
````````````````````````````````
-More than six `#` characters is not a heading:
-
```````````````````````````````` example
-####### foo
+[foo]
+
+[foo]: /f&ouml;&ouml; "f&ouml;&ouml;"
.
-<p>####### foo</p>
+<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
````````````````````````````````
-At least one space is required between the `#` characters and the
-heading's contents, unless the heading is empty. Note that many
-implementations currently do not require the space. However, the
-space was required by the
-[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
-and it helps prevent things like the following from being parsed as
-headings:
-
```````````````````````````````` example
-#5 bolt
-
-#hashtag
+``` f&ouml;&ouml;
+foo
+```
.
-<p>#5 bolt</p>
-<p>#hashtag</p>
+<pre><code class="language-föö">foo
+</code></pre>
````````````````````````````````
-This is not a heading, because the first `#` is escaped:
+Entity and numeric character references are treated as literal
+text in code spans and code blocks:
```````````````````````````````` example
-\## foo
+`f&ouml;&ouml;`
.
-<p>## foo</p>
+<p><code>f&amp;ouml;&amp;ouml;</code></p>
````````````````````````````````
-Contents are parsed as inlines:
-
```````````````````````````````` example
-# foo *bar* \*baz\*
+    f&ouml;f&ouml;
.
-<h1>foo <em>bar</em> *baz*</h1>
+<pre><code>f&amp;ouml;f&amp;ouml;
+</code></pre>
````````````````````````````````
-Leading and trailing [whitespace] is ignored in parsing inline content:
+Entity and numeric character references cannot be used
+in place of symbols indicating structure in CommonMark
+documents.
```````````````````````````````` example
-# foo
+&#42;foo&#42;
+*foo*
.
-<h1>foo</h1>
+<p>*foo*
+<em>foo</em></p>
````````````````````````````````
-
-One to three spaces indentation are allowed:
-
```````````````````````````````` example
- ### foo
- ## foo
- # foo
+&#42; foo
+
+* foo
.
-<h3>foo</h3>
-<h2>foo</h2>
-<h1>foo</h1>
+<p>* foo</p>
+<ul>
+<li>foo</li>
+</ul>
````````````````````````````````
+```````````````````````````````` example
+foo&#10;&#10;bar
+.
+<p>foo
-Four spaces are too much:
+bar</p>
+````````````````````````````````
```````````````````````````````` example
- # foo
+&#9;foo
.
-<pre><code># foo
-</code></pre>
+<p>→foo</p>
````````````````````````````````
```````````````````````````````` example
-foo
- # bar
+[a](url &quot;tit&quot;)
.
-<p>foo
-# bar</p>
+<p>[a](url &quot;tit&quot;)</p>
````````````````````````````````
-A closing sequence of `#` characters is optional:
-```````````````````````````````` example
-## foo ##
- ### bar ###
-.
-<h2>foo</h2>
-<h3>bar</h3>
-````````````````````````````````
+# Blocks and inlines
+We can think of a document as a sequence of
+[blocks](@)---structural elements like paragraphs, block
+quotations, lists, headings, rules, and code blocks. Some blocks (like
+block quotes and list items) contain other blocks; others (like
+headings and paragraphs) contain [inline](@) content---text,
+links, emphasized text, images, code spans, and so on.
-It need not be the same length as the opening sequence:
+## Precedence
+
+Indicators of block structure always take precedence over indicators
+of inline structure. So, for example, the following is a list with
+two items, not a list with one item containing a code span:
```````````````````````````````` example
-# foo ##################################
-##### foo ##
+- `one
+- two`
.
-<h1>foo</h1>
-<h5>foo</h5>
+<ul>
+<li>`one</li>
+<li>two`</li>
+</ul>
````````````````````````````````
-Spaces are allowed after the closing sequence:
+This means that parsing can proceed in two steps: first, the block
+structure of the document can be discerned; second, text lines inside
+paragraphs, headings, and other block constructs can be parsed for inline
+structure. The second step requires information about link reference
+definitions that will be available only at the end of the first
+step. Note that the first step requires processing lines in sequence,
+but the second can be parallelized, since the inline parsing of
+one block element does not affect the inline parsing of any other.
+
+## Container blocks and leaf blocks
+
+We can divide blocks into two types:
+[container blocks](@),
+which can contain other blocks, and [leaf blocks](@),
+which cannot.
+
+# Leaf blocks
+
+This section describes the different kinds of leaf block that make up a
+Markdown document.
+
+## Thematic breaks
+
+A line consisting of 0-3 spaces of indentation, followed by a sequence
+of three or more matching `-`, `_`, or `*` characters, each followed
+optionally by any number of spaces or tabs, forms a
+[thematic break](@).
```````````````````````````````` example
-### foo ###
+***
+---
+___
.
-<h3>foo</h3>
+<hr />
+<hr />
+<hr />
````````````````````````````````
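
As a rough illustration of the definition above, a single regular expression can test whether a line is a thematic break. The name `is_thematic_break` is hypothetical, and the sketch looks at one line in isolation: it does not handle the setext heading and list item precedence rules described later in this section.

```python
import re

# 0-3 spaces, then three or more of the same marker (-, _ or *),
# each optionally followed by spaces or tabs, and nothing else.
_THEMATIC_BREAK = re.compile(r" {0,3}([-_*])[ \t]*(?:\1[ \t]*){2,}")


def is_thematic_break(line: str) -> bool:
    """Return True if the line (without its line ending) is a thematic break."""
    return _THEMATIC_BREAK.fullmatch(line) is not None
```

The backreference `\1` is what enforces that every marker character on the line is the same.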
-A sequence of `#` characters with anything but [spaces] following it
-is not a closing sequence, but counts as part of the contents of the
-heading:
+Wrong characters:
```````````````````````````````` example
-### foo ### b
++++
.
-<h3>foo ### b</h3>
+<p>+++</p>
````````````````````````````````
-The closing sequence must be preceded by a space:
-
```````````````````````````````` example
-# foo#
+===
.
-<h1>foo#</h1>
+<p>===</p>
````````````````````````````````
-Backslash-escaped `#` characters do not count as part
-of the closing sequence:
+Not enough characters:
```````````````````````````````` example
-### foo \###
-## foo #\##
-# foo \#
+--
+**
+__
.
-<h3>foo ###</h3>
-<h2>foo ###</h2>
-<h1>foo #</h1>
+<p>--
+**
+__</p>
````````````````````````````````
-ATX headings need not be separated from surrounding content by blank
-lines, and they can interrupt paragraphs:
+One to three spaces of indentation are allowed:
```````````````````````````````` example
-****
-## foo
-****
+ ***
+  ***
+   ***
.
<hr />
-<h2>foo</h2>
+<hr />
<hr />
````````````````````````````````
+Four spaces is too many:
+
```````````````````````````````` example
-Foo bar
-# baz
-Bar foo
+    ***
.
-<p>Foo bar</p>
-<h1>baz</h1>
-<p>Bar foo</p>
+<pre><code>***
+</code></pre>
````````````````````````````````
-ATX headings can be empty:
-
```````````````````````````````` example
-##
-#
-### ###
+Foo
+    ***
.
-<h2></h2>
-<h1></h1>
-<h3></h3>
+<p>Foo
+***</p>
````````````````````````````````
-## Setext headings
-
-A [setext heading](@) consists of one or more
-lines of text, each containing at least one [non-whitespace
-character], with no more than 3 spaces indentation, followed by
-a [setext heading underline]. The lines of text must be such
-that, were they not followed by the setext heading underline,
-they would be interpreted as a paragraph: they cannot be
-interpretable as a [code fence], [ATX heading][ATX headings],
-[block quote][block quotes], [thematic break][thematic breaks],
-[list item][list items], or [HTML block][HTML blocks].
-
-A [setext heading underline](@) is a sequence of
-`=` characters or a sequence of `-` characters, with no more than 3
-spaces indentation and any number of trailing spaces. If a line
-containing a single `-` can be interpreted as an
-empty [list items], it should be interpreted this way
-and not as a [setext heading underline].
-
-The heading is a level 1 heading if `=` characters are used in
-the [setext heading underline], and a level 2 heading if `-`
-characters are used. The contents of the heading are the result
-of parsing the preceding lines of text as CommonMark inline
-content.
-
-In general, a setext heading need not be preceded or followed by a
-blank line. However, it cannot interrupt a paragraph, so when a
-setext heading comes after a paragraph, a blank line is needed between
-them.
-
-Simple examples:
+More than three characters may be used:
```````````````````````````````` example
-Foo *bar*
-=========
-
-Foo *bar*
----------
+_____________________________________
.
-<h1>Foo <em>bar</em></h1>
-<h2>Foo <em>bar</em></h2>
+<hr />
````````````````````````````````
-The content of the header may span more than one line:
+Spaces are allowed between the characters:
```````````````````````````````` example
-Foo *bar
-baz*
-====
+ - - -
.
-<h1>Foo <em>bar
-baz</em></h1>
+<hr />
````````````````````````````````
-The contents are the result of parsing the headings's raw
-content as inlines. The heading's raw content is formed by
-concatenating the lines and removing initial and final
-[whitespace].
```````````````````````````````` example
- Foo *bar
-baz*→
-====
+ ** * ** * ** * **
.
-<h1>Foo <em>bar
-baz</em></h1>
+<hr />
````````````````````````````````
-The underlining can be any length:
-
```````````````````````````````` example
-Foo
--------------------------
-
-Foo
-=
+- - - -
.
-<h2>Foo</h2>
-<h1>Foo</h1>
+<hr />
````````````````````````````````
-The heading content can be indented up to three spaces, and need
-not line up with the underlining:
+Spaces are allowed at the end:
```````````````````````````````` example
- Foo
----
-
- Foo
------
-
- Foo
- ===
+- - - -
.
-<h2>Foo</h2>
-<h2>Foo</h2>
-<h1>Foo</h1>
+<hr />
````````````````````````````````
-Four spaces indent is too much:
+However, no other characters may occur in the line:
```````````````````````````````` example
- Foo
- ---
+_ _ _ _ a
- Foo
----
-.
-<pre><code>Foo
----
+a------
-Foo
-</code></pre>
-<hr />
+---a---
+.
+<p>_ _ _ _ a</p>
+<p>a------</p>
+<p>---a---</p>
````````````````````````````````
-The setext heading underline can be indented up to three spaces, and
-may have trailing spaces:
+It is required that all of the [non-whitespace characters] be the same.
+So, this is not a thematic break:
```````````````````````````````` example
-Foo
- ----
+ *-*
.
-<h2>Foo</h2>
+<p><em>-</em></p>
````````````````````````````````
-Four spaces is too much:
+Thematic breaks do not need blank lines before or after:
```````````````````````````````` example
-Foo
- ---
+- foo
+***
+- bar
.
-<p>Foo
----</p>
+<ul>
+<li>foo</li>
+</ul>
+<hr />
+<ul>
+<li>bar</li>
+</ul>
````````````````````````````````
-The setext heading underline cannot contain internal spaces:
+Thematic breaks can interrupt a paragraph:
```````````````````````````````` example
Foo
-= =
-
-Foo
---- -
+***
+bar
.
-<p>Foo
-= =</p>
<p>Foo</p>
<hr />
+<p>bar</p>
````````````````````````````````
-Trailing spaces in the content line do not cause a line break:
+If a line of dashes that meets the above conditions for being a
+thematic break could also be interpreted as the underline of a [setext
+heading], the interpretation as a
+[setext heading] takes precedence. Thus, for example,
+this is a setext heading, not a paragraph followed by a thematic break:
```````````````````````````````` example
-Foo
------
+Foo
+---
+bar
.
<h2>Foo</h2>
+<p>bar</p>
````````````````````````````````
-Nor does a backslash at the end:
+When both a thematic break and a list item are possible
+interpretations of a line, the thematic break takes precedence:
```````````````````````````````` example
-Foo\
-----
+* Foo
+* * *
+* Bar
.
-<h2>Foo\</h2>
+<ul>
+<li>Foo</li>
+</ul>
+<hr />
+<ul>
+<li>Bar</li>
+</ul>
````````````````````````````````
-Since indicators of block structure take precedence over
-indicators of inline structure, the following are setext headings:
+If you want a thematic break in a list item, use a different bullet:
```````````````````````````````` example
-`Foo
-----
-`
-
-<a title="a lot
----
-of dashes"/>
+- Foo
+- * * *
.
-<h2>`Foo</h2>
-<p>`</p>
-<h2><a title="a lot</h2>
-<p>of dashes"/></p>
+<ul>
+<li>Foo</li>
+<li>
+<hr />
+</li>
+</ul>
````````````````````````````````
-The setext heading underline cannot be a [lazy continuation
-line] in a list item or block quote:
+## ATX headings
-```````````````````````````````` example
-> Foo
----
-.
-<blockquote>
-<p>Foo</p>
-</blockquote>
-<hr />
-````````````````````````````````
+An [ATX heading](@)
+consists of a string of characters, parsed as inline content, between an
+opening sequence of 1--6 unescaped `#` characters and an optional
+closing sequence of any number of unescaped `#` characters.
+The opening sequence of `#` characters must be followed by a
+[space] or by the end of line. The optional closing sequence of `#`s must be
+preceded by a [space] and may be followed by spaces only. The opening
+`#` character may be indented 0-3 spaces. The raw contents of the
+heading are stripped of leading and trailing spaces before being parsed
+as inline content. The heading level is equal to the number of `#`
+characters in the opening sequence.
+Simple headings:
```````````````````````````````` example
-> foo
-bar
-===
+# foo
+## foo
+### foo
+#### foo
+##### foo
+###### foo
.
-<blockquote>
-<p>foo
-bar
-===</p>
-</blockquote>
+<h1>foo</h1>
+<h2>foo</h2>
+<h3>foo</h3>
+<h4>foo</h4>
+<h5>foo</h5>
+<h6>foo</h6>
````````````````````````````````
-```````````````````````````````` example
-- Foo
----
+More than six `#` characters is not a heading:
+
+```````````````````````````````` example
+####### foo
.
-<ul>
-<li>Foo</li>
-</ul>
-<hr />
+<p>####### foo</p>
````````````````````````````````
-A blank line is needed between a paragraph and a following
-setext heading, since otherwise the paragraph becomes part
-of the heading's content:
+At least one space is required between the `#` characters and the
+heading's contents, unless the heading is empty. Note that many
+implementations currently do not require the space. However, the
+space was required by the
+[original ATX implementation](http://www.aaronsw.com/2002/atx/atx.py),
+and it helps prevent things like the following from being parsed as
+headings:
```````````````````````````````` example
-Foo
-Bar
----
+#5 bolt
+
+#hashtag
.
-<h2>Foo
-Bar</h2>
+<p>#5 bolt</p>
+<p>#hashtag</p>
````````````````````````````````
-But in general a blank line is not required before or after
-setext headings:
+This is not a heading, because the first `#` is escaped:
```````````````````````````````` example
----
-Foo
----
-Bar
----
-Baz
+\## foo
.
-<hr />
-<h2>Foo</h2>
-<h2>Bar</h2>
-<p>Baz</p>
+<p>## foo</p>
````````````````````````````````
-Setext headings cannot be empty:
+Contents are parsed as inlines:
```````````````````````````````` example
-
-====
+# foo *bar* \*baz\*
.
-<p>====</p>
+<h1>foo <em>bar</em> *baz*</h1>
````````````````````````````````
-Setext heading text lines must not be interpretable as block
-constructs other than paragraphs. So, the line of dashes
-in these examples gets interpreted as a thematic break:
+Leading and trailing [whitespace] is ignored in parsing inline content:
```````````````````````````````` example
----
----
+# foo
.
-<hr />
-<hr />
+<h1>foo</h1>
````````````````````````````````
+One to three spaces of indentation are allowed:
+
```````````````````````````````` example
-- foo
------
+ ### foo
+  ## foo
+   # foo
.
-<ul>
-<li>foo</li>
-</ul>
-<hr />
+<h3>foo</h3>
+<h2>foo</h2>
+<h1>foo</h1>
````````````````````````````````
+Four spaces are too many:
+
```````````````````````````````` example
- foo
----
+    # foo
.
-<pre><code>foo
+<pre><code># foo
</code></pre>
-<hr />
````````````````````````````````
```````````````````````````````` example
-> foo
------
+foo
+    # bar
.
-<blockquote>
-<p>foo</p>
-</blockquote>
-<hr />
+<p>foo
+# bar</p>
````````````````````````````````
-If you want a heading with `> foo` as its literal text, you can
-use backslash escapes:
+A closing sequence of `#` characters is optional:
```````````````````````````````` example
-\> foo
-------
+## foo ##
+ ### bar ###
.
-<h2>> foo</h2>
+<h2>foo</h2>
+<h3>bar</h3>
````````````````````````````````
-**Compatibility note:** Most existing Markdown implementations
-do not allow the text of setext headings to span multiple lines.
-But there is no consensus about how to interpret
-
-``` markdown
-Foo
-bar
----
-baz
-```
+It need not be the same length as the opening sequence:
-One can find four different interpretations:
+```````````````````````````````` example
+# foo ##################################
+##### foo ##
+.
+<h1>foo</h1>
+<h5>foo</h5>
+````````````````````````````````
-1. paragraph "Foo", heading "bar", paragraph "baz"
-2. paragraph "Foo bar", thematic break, paragraph "baz"
-3. paragraph "Foo bar --- baz"
-4. heading "Foo bar", paragraph "baz"
-We find interpretation 4 most natural, and interpretation 4
-increases the expressive power of CommonMark, by allowing
-multiline headings. Authors who want interpretation 1 can
-put a blank line after the first paragraph:
+Spaces are allowed after the closing sequence:
```````````````````````````````` example
-Foo
-
-bar
----
-baz
+### foo ###
.
-<p>Foo</p>
-<h2>bar</h2>
-<p>baz</p>
+<h3>foo</h3>
````````````````````````````````
-Authors who want interpretation 2 can put blank lines around
-the thematic break,
+A sequence of `#` characters with anything but [spaces] following it
+is not a closing sequence, but counts as part of the contents of the
+heading:
```````````````````````````````` example
-Foo
-bar
-
----
-
-baz
+### foo ### b
.
-<p>Foo
-bar</p>
-<hr />
-<p>baz</p>
+<h3>foo ### b</h3>
````````````````````````````````
-or use a thematic break that cannot count as a [setext heading
-underline], such as
+The closing sequence must be preceded by a space:
```````````````````````````````` example
-Foo
-bar
-* * *
-baz
+# foo#
.
-<p>Foo
-bar</p>
-<hr />
-<p>baz</p>
+<h1>foo#</h1>
````````````````````````````````
-Authors who want interpretation 3 can use backslash escapes:
+Backslash-escaped `#` characters do not count as part
+of the closing sequence:
```````````````````````````````` example
-Foo
-bar
-\---
-baz
+### foo \###
+## foo #\##
+# foo \#
.
-<p>Foo
-bar
----
-baz</p>
+<h3>foo ###</h3>
+<h2>foo ###</h2>
+<h1>foo #</h1>
````````````````````````````````
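
The opening and closing sequence rules above can be sketched like this. It is a simplified illustration, not the reference implementation: `parse_atx_heading` is a hypothetical name, the function returns the raw heading content before inline parsing (so backslash escapes are still present), and it inspects one line without regard to surrounding blocks.

```python
import re
from typing import Optional, Tuple

# 0-3 spaces, then 1-6 '#' characters followed by a space or end of line.
_OPENING = re.compile(r" {0,3}(#{1,6})(?= |$)")
# Optional closing sequence: '#' characters preceded by a space (or the
# start of the content) and followed only by spaces.
_CLOSING = re.compile(r"(?:^| )#+ *$")


def parse_atx_heading(line: str) -> Optional[Tuple[int, str]]:
    """Return (level, raw content) for an ATX heading line, else None."""
    match = _OPENING.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    raw = _CLOSING.sub("", line[match.end():])
    return level, raw.strip(" ")
```

For example, `parse_atx_heading("## foo ##")` yields level 2 with content `foo`, while `#5 bolt` and a line with seven `#` characters are rejected.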
-## Indented code blocks
-
-An [indented code block](@) is composed of one or more
-[indented chunks] separated by blank lines.
-An [indented chunk](@) is a sequence of non-blank lines,
-each indented four or more spaces. The contents of the code block are
-the literal contents of the lines, including trailing
-[line endings], minus four spaces of indentation.
-An indented code block has no [info string].
-
-An indented code block cannot interrupt a paragraph, so there must be
-a blank line between a paragraph and a following indented code block.
-(A blank line is not needed, however, between a code block and a following
-paragraph.)
+ATX headings need not be separated from surrounding content by blank
+lines, and they can interrupt paragraphs:
```````````````````````````````` example
- a simple
- indented code block
+****
+## foo
+****
.
-<pre><code>a simple
- indented code block
-</code></pre>
+<hr />
+<h2>foo</h2>
+<hr />
````````````````````````````````
-If there is any ambiguity between an interpretation of indentation
-as a code block and as indicating that material belongs to a [list
-item][list items], the list item interpretation takes precedence:
-
```````````````````````````````` example
- - foo
-
- bar
+Foo bar
+# baz
+Bar foo
.
-<ul>
-<li>
-<p>foo</p>
-<p>bar</p>
-</li>
-</ul>
+<p>Foo bar</p>
+<h1>baz</h1>
+<p>Bar foo</p>
````````````````````````````````
-```````````````````````````````` example
-1. foo
+ATX headings can be empty:
- - bar
+```````````````````````````````` example
+##
+#
+### ###
.
-<ol>
-<li>
-<p>foo</p>
-<ul>
-<li>bar</li>
-</ul>
-</li>
-</ol>
+<h2></h2>
+<h1></h1>
+<h3></h3>
````````````````````````````````
+## Setext headings
-The contents of a code block are literal text, and do not get parsed
-as Markdown:
-
-```````````````````````````````` example
- <a/>
- *hi*
+A [setext heading](@) consists of one or more
+lines of text, each containing at least one [non-whitespace
+character], with no more than 3 spaces indentation, followed by
+a [setext heading underline]. The lines of text must be such
+that, were they not followed by the setext heading underline,
+they would be interpreted as a paragraph: they cannot be
+interpretable as a [code fence], [ATX heading][ATX headings],
+[block quote][block quotes], [thematic break][thematic breaks],
+[list item][list items], or [HTML block][HTML blocks].
- - one
-.
-<pre><code><a/>
-*hi*
+A [setext heading underline](@) is a sequence of
+`=` characters or a sequence of `-` characters, with no more than 3
+spaces indentation and any number of trailing spaces. If a line
+containing a single `-` can be interpreted as an
+empty [list items], it should be interpreted this way
+and not as a [setext heading underline].
-- one
-</code></pre>
-````````````````````````````````
+The heading is a level 1 heading if `=` characters are used in
+the [setext heading underline], and a level 2 heading if `-`
+characters are used. The contents of the heading are the result
+of parsing the preceding lines of text as CommonMark inline
+content.
+In general, a setext heading need not be preceded or followed by a
+blank line. However, it cannot interrupt a paragraph, so when a
+setext heading comes after a paragraph, a blank line is needed between
+them.
-Here we have three chunks separated by blank lines:
+Simple examples:
```````````````````````````````` example
- chunk1
+Foo *bar*
+=========
- chunk2
-
-
-
- chunk3
+Foo *bar*
+---------
.
-<pre><code>chunk1
-
-chunk2
+<h1>Foo <em>bar</em></h1>
+<h2>Foo <em>bar</em></h2>
+````````````````````````````````
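
A setext heading underline can be recognized with a short check such as the one below. The helper name `setext_underline_level` is hypothetical, and the sketch only classifies the underline itself: it does not verify that the preceding lines form a paragraph, nor does it apply the rule that a single `-` may be an empty list item.

```python
import re
from typing import Optional

# Up to 3 spaces of indentation, a run of '=' or a run of '-',
# then any number of trailing spaces.
_UNDERLINE = re.compile(r" {0,3}(=+|-+) *")


def setext_underline_level(line: str) -> Optional[int]:
    """Return 1 for an '=' underline, 2 for a '-' underline, else None."""
    match = _UNDERLINE.fullmatch(line)
    if match is None:
        return None
    return 1 if match.group(1).startswith("=") else 2
```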
+The content of the heading may span more than one line:
-chunk3
-</code></pre>
+```````````````````````````````` example
+Foo *bar
+baz*
+====
+.
+<h1>Foo <em>bar
+baz</em></h1>
````````````````````````````````
-
-Any initial spaces beyond four will be included in the content, even
-in interior blank lines:
+The contents are the result of parsing the heading's raw
+content as inlines. The heading's raw content is formed by
+concatenating the lines and removing initial and final
+[whitespace].
```````````````````````````````` example
- chunk1
-
- chunk2
+ Foo *bar
+baz*→
+====
.
-<pre><code>chunk1
-
- chunk2
-</code></pre>
+<h1>Foo <em>bar
+baz</em></h1>
````````````````````````````````
-An indented code block cannot interrupt a paragraph. (This
-allows hanging indents and the like.)
+The underlining can be any length:
```````````````````````````````` example
Foo
- bar
+-------------------------
+Foo
+=
.
-<p>Foo
-bar</p>
+<h2>Foo</h2>
+<h1>Foo</h1>
````````````````````````````````
-However, any non-blank line with fewer than four leading spaces ends
-the code block immediately. So a paragraph may occur immediately
-after indented code:
+The heading content can be indented up to three spaces, and need
+not line up with the underlining:
```````````````````````````````` example
- foo
-bar
+ Foo
+---
+
+ Foo
+-----
+
+ Foo
+ ===
.
-<pre><code>foo
-</code></pre>
-<p>bar</p>
+<h2>Foo</h2>
+<h2>Foo</h2>
+<h1>Foo</h1>
````````````````````````````````
-And indented code can occur immediately before and after other kinds of
-blocks:
+Four spaces indent is too much:
```````````````````````````````` example
-# Heading
- foo
-Heading
-------
- foo
-----
+ Foo
+ ---
+
+ Foo
+---
.
-<h1>Heading</h1>
-<pre><code>foo
-</code></pre>
-<h2>Heading</h2>
-<pre><code>foo
+<pre><code>Foo
+---
+
+Foo
</code></pre>
<hr />
````````````````````````````````
-The first line can be indented more than four spaces:
+The setext heading underline can be indented up to three spaces, and
+may have trailing spaces:
```````````````````````````````` example
- foo
- bar
+Foo
+ ----
.
-<pre><code> foo
-bar
-</code></pre>
+<h2>Foo</h2>
````````````````````````````````
-Blank lines preceding or following an indented code block
-are not included in it:
+Four spaces is too much:
```````````````````````````````` example
-
-
- foo
-
-
+Foo
+ ---
.
-<pre><code>foo
-</code></pre>
+<p>Foo
+---</p>
````````````````````````````````
-Trailing spaces are included in the code block's content:
+The setext heading underline cannot contain internal spaces:
```````````````````````````````` example
- foo
+Foo
+= =
+
+Foo
+--- -
.
-<pre><code>foo
-</code></pre>
+<p>Foo
+= =</p>
+<p>Foo</p>
+<hr />
````````````````````````````````
+Trailing spaces in the content line do not cause a line break:
-## Fenced code blocks
-
-A [code fence](@) is a sequence
-of at least three consecutive backtick characters (`` ` ``) or
-tildes (`~`). (Tildes and backticks cannot be mixed.)
-A [fenced code block](@)
-begins with a code fence, indented no more than three spaces.
-
-The line with the opening code fence may optionally contain some text
-following the code fence; this is trimmed of leading and trailing
-whitespace and called the [info string](@). If the [info string] comes
-after a backtick fence, it may not contain any backtick
-characters. (The reason for this restriction is that otherwise
-some inline code would be incorrectly interpreted as the
-beginning of a fenced code block.)
+```````````````````````````````` example
+Foo
+-----
+.
+<h2>Foo</h2>
+````````````````````````````````
-The content of the code block consists of all subsequent lines, until
-a closing [code fence] of the same type as the code block
-began with (backticks or tildes), and with at least as many backticks
-or tildes as the opening code fence. If the leading code fence is
-indented N spaces, then up to N spaces of indentation are removed from
-each line of the content (if present). (If a content line is not
-indented, it is preserved unchanged. If it is indented less than N
-spaces, all of the indentation is removed.)
-The closing code fence may be indented up to three spaces, and may be
-followed only by spaces, which are ignored. If the end of the
-containing block (or document) is reached and no closing code fence
-has been found, the code block contains all of the lines after the
-opening code fence until the end of the containing block (or
-document). (An alternative spec would require backtracking in the
-event that a closing code fence is not found. But this makes parsing
-much less efficient, and there seems to be no real down side to the
-behavior described here.)
+Nor does a backslash at the end:
-A fenced code block may interrupt a paragraph, and does not require
-a blank line either before or after.
+```````````````````````````````` example
+Foo\
+----
+.
+<h2>Foo\</h2>
+````````````````````````````````
-The content of a code fence is treated as literal text, not parsed
-as inlines. The first word of the [info string] is typically used to
-specify the language of the code sample, and rendered in the `class`
-attribute of the `code` tag. However, this spec does not mandate any
-particular treatment of the [info string].
-Here is a simple example with backticks:
+Since indicators of block structure take precedence over
+indicators of inline structure, the following are setext headings:
```````````````````````````````` example
-```
-<
- >
-```
+`Foo
+----
+`
+
+<a title="a lot
+---
+of dashes"/>
.
-<pre><code><
- >
-</code></pre>
+<h2>`Foo</h2>
+<p>`</p>
+<h2><a title="a lot</h2>
+<p>of dashes"/></p>
````````````````````````````````
-With tildes:
+The setext heading underline cannot be a [lazy continuation
+line] in a list item or block quote:
```````````````````````````````` example
-~~~
-<
- >
-~~~
+> Foo
+---
.
-<pre><code><
- >
-</code></pre>
+<blockquote>
+<p>Foo</p>
+</blockquote>
+<hr />
````````````````````````````````
-Fewer than three backticks is not enough:
```````````````````````````````` example
-``
-foo
-``
+> foo
+bar
+===
.
-<p><code>foo</code></p>
+<blockquote>
+<p>foo
+bar
+===</p>
+</blockquote>
````````````````````````````````
-The closing code fence must use the same character as the opening
-fence:
```````````````````````````````` example
-```
-aaa
-~~~
-```
+- Foo
+---
.
-<pre><code>aaa
-~~~
-</code></pre>
+<ul>
+<li>Foo</li>
+</ul>
+<hr />
````````````````````````````````
+A blank line is needed between a paragraph and a following
+setext heading, since otherwise the paragraph becomes part
+of the heading's content:
+
```````````````````````````````` example
-~~~
-aaa
-```
-~~~
+Foo
+Bar
+---
.
-<pre><code>aaa
-```
-</code></pre>
+<h2>Foo
+Bar</h2>
````````````````````````````````
-The closing code fence must be at least as long as the opening fence:
+But in general a blank line is not required before or after
+setext headings:
```````````````````````````````` example
-````
-aaa
-```
-``````
+---
+Foo
+---
+Bar
+---
+Baz
.
-<pre><code>aaa
-```
-</code></pre>
+<hr />
+<h2>Foo</h2>
+<h2>Bar</h2>
+<p>Baz</p>
````````````````````````````````
+Setext headings cannot be empty:
+
```````````````````````````````` example
-~~~~
-aaa
-~~~
-~~~~
+
+====
.
-<pre><code>aaa
-~~~
-</code></pre>
+<p>====</p>
````````````````````````````````
-Unclosed code blocks are closed by the end of the document
-(or the enclosing [block quote][block quotes] or [list item][list items]):
+Setext heading text lines must not be interpretable as block
+constructs other than paragraphs. So, the line of dashes
+in these examples gets interpreted as a thematic break:
```````````````````````````````` example
-```
+---
+---
.
-<pre><code></code></pre>
+<hr />
+<hr />
````````````````````````````````
```````````````````````````````` example
-`````
+- foo
+-----
+.
+<ul>
+<li>foo</li>
+</ul>
+<hr />
+````````````````````````````````
-```
-aaa
+
+```````````````````````````````` example
+ foo
+---
.
-<pre><code>
-```
-aaa
+<pre><code>foo
</code></pre>
+<hr />
````````````````````````````````
```````````````````````````````` example
-> ```
-> aaa
-
-bbb
+> foo
+-----
.
<blockquote>
-<pre><code>aaa
-</code></pre>
+<p>foo</p>
</blockquote>
-<p>bbb</p>
+<hr />
````````````````````````````````
-A code block can have all empty lines as its content:
+If you want a heading with `> foo` as its literal text, you can
+use backslash escapes:
```````````````````````````````` example
-```
-
-
-```
+\> foo
+------
.
-<pre><code>
-
-</code></pre>
+<h2>> foo</h2>
````````````````````````````````
-A code block can be empty:
+**Compatibility note:** Most existing Markdown implementations
+do not allow the text of setext headings to span multiple lines.
+But there is no consensus about how to interpret
-```````````````````````````````` example
-```
+``` markdown
+Foo
+bar
+---
+baz
```
-.
-<pre><code></code></pre>
-````````````````````````````````
+One can find four different interpretations:
-Fences can be indented. If the opening fence is indented,
-content lines will have equivalent opening indentation removed,
-if present:
+1. paragraph "Foo", heading "bar", paragraph "baz"
+2. paragraph "Foo bar", thematic break, paragraph "baz"
+3. paragraph "Foo bar --- baz"
+4. heading "Foo bar", paragraph "baz"
+
+We find interpretation 4 most natural, and it also
+increases the expressive power of CommonMark by allowing
+multiline headings. Authors who want interpretation 1 can
+put a blank line after the first paragraph:
```````````````````````````````` example
- ```
- aaa
-aaa
-```
+Foo
+
+bar
+---
+baz
.
-<pre><code>aaa
-aaa
-</code></pre>
+<p>Foo</p>
+<h2>bar</h2>
+<p>baz</p>
````````````````````````````````
+Authors who want interpretation 2 can put blank lines around
+the thematic break,
+
```````````````````````````````` example
- ```
-aaa
- aaa
-aaa
- ```
-.
-<pre><code>aaa
-aaa
-aaa
-</code></pre>
-````````````````````````````````
+Foo
+bar
+---
-```````````````````````````````` example
- ```
- aaa
- aaa
- aaa
- ```
+baz
.
-<pre><code>aaa
- aaa
-aaa
-</code></pre>
+<p>Foo
+bar</p>
+<hr />
+<p>baz</p>
````````````````````````````````
-Four spaces indentation produces an indented code block:
+or use a thematic break that cannot count as a [setext heading
+underline], such as
```````````````````````````````` example
- ```
- aaa
- ```
+Foo
+bar
+* * *
+baz
.
-<pre><code>```
-aaa
-```
-</code></pre>
+<p>Foo
+bar</p>
+<hr />
+<p>baz</p>
````````````````````````````````
-Closing fences may be indented by 0-3 spaces, and their indentation
-need not match that of the opening fence:
+Authors who want interpretation 3 can use backslash escapes:
```````````````````````````````` example
-```
-aaa
- ```
+Foo
+bar
+\---
+baz
.
-<pre><code>aaa
-</code></pre>
+<p>Foo
+bar
+---
+baz</p>
````````````````````````````````
-```````````````````````````````` example
- ```
-aaa
- ```
-.
-<pre><code>aaa
-</code></pre>
-````````````````````````````````
+## Indented code blocks
+An [indented code block](@) is composed of one or more
+[indented chunks] separated by blank lines.
+An [indented chunk](@) is a sequence of non-blank lines,
+each indented four or more spaces. The contents of the code block are
+the literal contents of the lines, including trailing
+[line endings], minus four spaces of indentation.
+An indented code block has no [info string].
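
The content rule above can be sketched in Python as follows. This is a
non-normative sketch under stated assumptions: the function name
`indented_code_content` is hypothetical, the input is assumed to contain
only the lines already identified as belonging to the block (without line
endings), and tabs are not handled.

```python
def indented_code_content(lines):
    """Return the literal content of an indented code block.

    `lines` is assumed to hold only the block's own lines, without
    line endings. Tabs are not handled in this sketch.
    """
    # Blank lines before or after the block are not part of it.
    while lines and not lines[0].strip():
        lines = lines[1:]
    while lines and not lines[-1].strip():
        lines = lines[:-1]

    out = []
    for line in lines:
        if line.startswith("    "):
            # Drop exactly four spaces; anything beyond that is content.
            out.append(line[4:])
        else:
            # Interior blank line with fewer than four spaces.
            out.append(line.lstrip(" "))
    # Each content line keeps its trailing line ending.
    return "".join(text + "\n" for text in out)
```
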
-This is not a closing fence, because it is indented 4 spaces:
+An indented code block cannot interrupt a paragraph, so there must be
+a blank line between a paragraph and a following indented code block.
+(A blank line is not needed, however, between a code block and a following
+paragraph.)
```````````````````````````````` example
-```
-aaa
- ```
+ a simple
+ indented code block
.
-<pre><code>aaa
- ```
+<pre><code>a simple
+ indented code block
</code></pre>
````````````````````````````````
-
-Code fences (opening and closing) cannot contain internal spaces:
+If there is any ambiguity between an interpretation of indentation
+as a code block and as indicating that material belongs to a [list
+item][list items], the list item interpretation takes precedence:
```````````````````````````````` example
-``` ```
-aaa
+ - foo
+
+ bar
.
-<p><code> </code>
-aaa</p>
+<ul>
+<li>
+<p>foo</p>
+<p>bar</p>
+</li>
+</ul>
````````````````````````````````
```````````````````````````````` example
-~~~~~~
-aaa
-~~~ ~~
+1. foo
+
+ - bar
.
-<pre><code>aaa
-~~~ ~~
-</code></pre>
+<ol>
+<li>
+<p>foo</p>
+<ul>
+<li>bar</li>
+</ul>
+</li>
+</ol>
````````````````````````````````
-Fenced code blocks can interrupt paragraphs, and can be followed
-directly by paragraphs, without a blank line between:
+
+The contents of a code block are literal text, and do not get parsed
+as Markdown:
```````````````````````````````` example
-foo
-```
-bar
-```
-baz
+ <a/>
+ *hi*
+
+ - one
.
-<p>foo</p>
-<pre><code>bar
+<pre><code><a/>
+*hi*
+
+- one
</code></pre>
-<p>baz</p>
````````````````````````````````
-Other blocks can also occur before and after fenced code blocks
-without an intervening blank line:
+Here we have three chunks separated by blank lines:
```````````````````````````````` example
-foo
----
-~~~
-bar
-~~~
-# baz
+ chunk1
+
+ chunk2
+
+
+
+ chunk3
.
-<h2>foo</h2>
-<pre><code>bar
+<pre><code>chunk1
+
+chunk2
+
+
+
+chunk3
</code></pre>
-<h1>baz</h1>
````````````````````````````````
-An [info string] can be provided after the opening code fence.
-Although this spec doesn't mandate any particular treatment of
-the info string, the first word is typically used to specify
-the language of the code block. In HTML output, the language is
-normally indicated by adding a class to the `code` element consisting
-of `language-` followed by the language name.
+Any initial spaces beyond four will be included in the content, even
+in interior blank lines:
```````````````````````````````` example
-```ruby
-def foo(x)
- return 3
-end
-```
+ chunk1
+
+ chunk2
.
-<pre><code class="language-ruby">def foo(x)
- return 3
-end
+<pre><code>chunk1
+
+ chunk2
</code></pre>
````````````````````````````````
+An indented code block cannot interrupt a paragraph. (This
+allows hanging indents and the like.)
+
```````````````````````````````` example
-~~~~ ruby startline=3 $%@#$
-def foo(x)
- return 3
-end
-~~~~~~~
+Foo
+ bar
+
.
-<pre><code class="language-ruby">def foo(x)
- return 3
-end
-</code></pre>
+<p>Foo
+bar</p>
````````````````````````````````
+However, any non-blank line with fewer than four leading spaces ends
+the code block immediately. So a paragraph may occur immediately
+after indented code:
+
```````````````````````````````` example
-````;
-````
+ foo
+bar
.
-<pre><code class="language-;"></code></pre>
+<pre><code>foo
+</code></pre>
+<p>bar</p>
````````````````````````````````
-[Info strings] for backtick code blocks cannot contain backticks:
+And indented code can occur immediately before and after other kinds of
+blocks:
```````````````````````````````` example
-``` aa ```
-foo
+# Heading
+ foo
+Heading
+------
+ foo
+----
.
-<p><code>aa</code>
-foo</p>
+<h1>Heading</h1>
+<pre><code>foo
+</code></pre>
+<h2>Heading</h2>
+<pre><code>foo
+</code></pre>
+<hr />
````````````````````````````````
-[Info strings] for tilde code blocks can contain backticks and tildes:
+The first line can be indented more than four spaces:
```````````````````````````````` example
-~~~ aa ``` ~~~
-foo
-~~~
+ foo
+ bar
.
-<pre><code class="language-aa">foo
+<pre><code> foo
+bar
</code></pre>
````````````````````````````````
-Closing code fences cannot have [info strings]:
+Blank lines preceding or following an indented code block
+are not included in it:
```````````````````````````````` example
-```
-``` aaa
-```
+
+
+ foo
+
+
.
-<pre><code>``` aaa
+<pre><code>foo
</code></pre>
````````````````````````````````
+Trailing spaces are included in the code block's content:
-## HTML blocks
+```````````````````````````````` example
+ foo
+.
+<pre><code>foo
+</code></pre>
+````````````````````````````````
-An [HTML block](@) is a group of lines that is treated
-as raw HTML (and will not be escaped in HTML output).
-There are seven kinds of [HTML block], which can be defined by their
-start and end conditions. The block begins with a line that meets a
-[start condition](@) (after up to three spaces optional indentation).
-It ends with the first subsequent line that meets a matching [end
-condition](@), or the last line of the document, or the last line of
-the [container block](#container-blocks) containing the current HTML
-block, if no line is encountered that meets the [end condition]. If
-the first line meets both the [start condition] and the [end
-condition], the block will contain just that line.
-1. **Start condition:** line begins with the string `<script`,
-`<pre`, or `<style` (case-insensitive), followed by whitespace,
-the string `>`, or the end of the line.\
-**End condition:** line contains an end tag
-`</script>`, `</pre>`, or `</style>` (case-insensitive; it
-need not match the start tag).
+## Fenced code blocks
-2. **Start condition:** line begins with the string `<!--`.\
-**End condition:** line contains the string `-->`.
+A [code fence](@) is a sequence
+of at least three consecutive backtick characters (`` ` ``) or
+tildes (`~`). (Tildes and backticks cannot be mixed.)
+A [fenced code block](@)
+begins with a code fence, indented no more than three spaces.
-3. **Start condition:** line begins with the string `<?`.\
-**End condition:** line contains the string `?>`.
+The line with the opening code fence may optionally contain some text
+following the code fence; this is trimmed of leading and trailing
+whitespace and called the [info string](@). If the [info string] comes
+after a backtick fence, it may not contain any backtick
+characters. (The reason for this restriction is that otherwise
+some inline code would be incorrectly interpreted as the
+beginning of a fenced code block.)
-4. **Start condition:** line begins with the string `<!`
-followed by an uppercase ASCII letter.\
-**End condition:** line contains the character `>`.
+The content of the code block consists of all subsequent lines, until
+a closing [code fence] of the same type as the code block
+began with (backticks or tildes), and with at least as many backticks
+or tildes as the opening code fence. If the leading code fence is
+indented N spaces, then up to N spaces of indentation are removed from
+each line of the content (if present). (If a content line is not
+indented, it is preserved unchanged. If it is indented less than N
+spaces, all of the indentation is removed.)
-5. **Start condition:** line begins with the string
-`<![CDATA[`.\
-**End condition:** line contains the string `]]>`.
-
-6. **Start condition:** line begins the string `<` or `</`
-followed by one of the strings (case-insensitive) `address`,
-`article`, `aside`, `base`, `basefont`, `blockquote`, `body`,
-`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`,
-`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`,
-`footer`, `form`, `frame`, `frameset`,
-`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`,
-`html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
-`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
-`section`, `source`, `summary`, `table`, `tbody`, `td`,
-`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
-by [whitespace], the end of the line, the string `>`, or
-the string `/>`.\
-**End condition:** line is followed by a [blank line].
+The closing code fence may be indented up to three spaces, and may be
+followed only by spaces, which are ignored. If the end of the
+containing block (or document) is reached and no closing code fence
+has been found, the code block contains all of the lines after the
+opening code fence until the end of the containing block (or
+document). (An alternative spec would require backtracking in the
+event that a closing code fence is not found. But this makes parsing
+much less efficient, and there seems to be no real downside to the
+behavior described here.)
-7. **Start condition:** line begins with a complete [open tag]
-(with any [tag name] other than `script`,
-`style`, or `pre`) or a complete [closing tag],
-followed only by [whitespace] or the end of the line.\
-**End condition:** line is followed by a [blank line].
+A fenced code block may interrupt a paragraph, and does not require
+a blank line either before or after.
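
The fence rules above can be sketched in Python. This is a non-normative
illustration: the name `parse_fenced_block` and its return shape are
assumptions of the sketch, the input is assumed to start at the candidate
opening fence, and container blocks and tabs are ignored.

```python
import re

# Opening fence: up to three spaces, then at least three backticks or
# tildes, then an optional info string.
_OPEN = re.compile(r"^( {0,3})(`{3,}|~{3,})(.*)$")


def parse_fenced_block(lines):
    """Parse a fenced code block starting at lines[0].

    Returns (info_string, content_lines), or None if lines[0] is not
    an opening fence. Hypothetical helper for illustration only.
    """
    m = _OPEN.match(lines[0])
    if m is None:
        return None
    indent, fence, info = len(m.group(1)), m.group(2), m.group(3).strip()
    if fence[0] == "`" and "`" in info:
        # Backtick fences may not have backticks in the info string.
        return None

    # Closing fence: same character, at least as long as the opening
    # fence, indented up to three spaces, followed only by spaces.
    close = re.compile(
        r"^ {0,3}" + re.escape(fence[0]) + "{%d,} *$" % len(fence)
    )

    content = []
    for line in lines[1:]:
        if close.match(line):
            break
        # Remove up to `indent` leading spaces from the content line.
        stripped = 0
        while stripped < indent and line[stripped:stripped + 1] == " ":
            stripped += 1
        content.append(line[stripped:])
    # If no closing fence was found, the block runs to the end of input.
    return info, content
```

Because an unclosed block simply runs to the end of the input, the loop
never needs to backtrack.
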
-HTML blocks continue until they are closed by their appropriate
-[end condition], or the last line of the document or other [container
-block](#container-blocks). This means any HTML **within an HTML
-block** that might otherwise be recognised as a start condition will
-be ignored by the parser and passed through as-is, without changing
-the parser's state.
+The content of a fenced code block is treated as literal text, not parsed
+as inlines. The first word of the [info string] is typically used to
+specify the language of the code sample, and rendered in the `class`
+attribute of the `code` tag. However, this spec does not mandate any
+particular treatment of the [info string].
-For instance, `<pre>` within a HTML block started by `<table>` will not affect
-the parser state; as the HTML block was started in by start condition 6, it
-will end at any blank line. This can be surprising:
+Here is a simple example with backticks:
```````````````````````````````` example
-<table><tr><td>
-<pre>
-**Hello**,
-
-_world_.
-</pre>
-</td></tr></table>
+```
+<
+ >
+```
.
-<table><tr><td>
-<pre>
-**Hello**,
-<p><em>world</em>.
-</pre></p>
-</td></tr></table>
+<pre><code><
+ >
+</code></pre>
````````````````````````````````
-In this case, the HTML block is terminated by the newline — the `**Hello**`
-text remains verbatim — and regular parsing resumes, with a paragraph,
-emphasised `world` and inline and block HTML following.
-
-All types of [HTML blocks] except type 7 may interrupt
-a paragraph. Blocks of type 7 may not interrupt a paragraph.
-(This restriction is intended to prevent unwanted interpretation
-of long tags inside a wrapped paragraph as starting HTML blocks.)
-Some simple examples follow. Here are some basic HTML blocks
-of type 6:
+With tildes:
```````````````````````````````` example
-<table>
- <tr>
- <td>
- hi
- </td>
- </tr>
-</table>
-
-okay.
+~~~
+<
+ >
+~~~
.
-<table>
- <tr>
- <td>
- hi
- </td>
- </tr>
-</table>
-<p>okay.</p>
+<pre><code><
+ >
+</code></pre>
````````````````````````````````
+Fewer than three backticks is not enough:
```````````````````````````````` example
- <div>
- *hello*
- <foo><a>
+``
+foo
+``
.
- <div>
- *hello*
- <foo><a>
+<p><code>foo</code></p>
````````````````````````````````
-
-A block can also start with a closing tag:
+The closing code fence must use the same character as the opening
+fence:
```````````````````````````````` example
-</div>
-*foo*
+```
+aaa
+~~~
+```
.
-</div>
-*foo*
+<pre><code>aaa
+~~~
+</code></pre>
````````````````````````````````
-Here we have two HTML blocks with a Markdown paragraph between them:
-
```````````````````````````````` example
-<DIV CLASS="foo">
-
-*Markdown*
-
-</DIV>
+~~~
+aaa
+```
+~~~
.
-<DIV CLASS="foo">
-<p><em>Markdown</em></p>
-</DIV>
+<pre><code>aaa
+```
+</code></pre>
````````````````````````````````
-The tag on the first line can be partial, as long
-as it is split where there would be whitespace:
+The closing code fence must be at least as long as the opening fence:
```````````````````````````````` example
-<div id="foo"
- class="bar">
-</div>
+````
+aaa
+```
+``````
.
-<div id="foo"
- class="bar">
-</div>
+<pre><code>aaa
+```
+</code></pre>
````````````````````````````````
```````````````````````````````` example
-<div id="foo" class="bar
- baz">
-</div>
+~~~~
+aaa
+~~~
+~~~~
.
-<div id="foo" class="bar
- baz">
-</div>
+<pre><code>aaa
+~~~
+</code></pre>
````````````````````````````````
-An open tag need not be closed:
-```````````````````````````````` example
-<div>
-*foo*
+Unclosed code blocks are closed by the end of the document
+(or the enclosing [block quote][block quotes] or [list item][list items]):
-*bar*
+```````````````````````````````` example
+```
.
-<div>
-*foo*
-<p><em>bar</em></p>
+<pre><code></code></pre>
````````````````````````````````
-
-A partial tag need not even be completed (garbage
-in, garbage out):
-
```````````````````````````````` example
-<div id="foo"
-*hi*
+`````
+
+```
+aaa
.
-<div id="foo"
-*hi*
+<pre><code>
+```
+aaa
+</code></pre>
````````````````````````````````
```````````````````````````````` example
-<div class
-foo
+> ```
+> aaa
+
+bbb
.
-<div class
-foo
+<blockquote>
+<pre><code>aaa
+</code></pre>
+</blockquote>
+<p>bbb</p>
````````````````````````````````
-The initial tag doesn't even need to be a valid
-tag, as long as it starts like one:
+A code block can have all empty lines as its content:
```````````````````````````````` example
-<div *???-&&&-<---
-*foo*
+```
+
+
+```
.
-<div *???-&&&-<---
-*foo*
+<pre><code>
+
+</code></pre>
````````````````````````````````
-In type 6 blocks, the initial tag need not be on a line by
-itself:
+A code block can be empty:
```````````````````````````````` example
-<div><a href="bar">*foo*</a></div>
+```
+```
.
-<div><a href="bar">*foo*</a></div>
+<pre><code></code></pre>
````````````````````````````````
+Fences can be indented. If the opening fence is indented,
+content lines will have equivalent opening indentation removed,
+if present:
+
```````````````````````````````` example
-<table><tr><td>
-foo
-</td></tr></table>
+ ```
+ aaa
+aaa
+```
.
-<table><tr><td>
-foo
-</td></tr></table>
+<pre><code>aaa
+aaa
+</code></pre>
````````````````````````````````
-Everything until the next blank line or end of document
-gets included in the HTML block. So, in the following
-example, what looks like a Markdown code block
-is actually part of the HTML block, which continues until a blank
-line or the end of the document is reached:
-
```````````````````````````````` example
-<div></div>
-``` c
-int x = 33;
-```
+ ```
+aaa
+ aaa
+aaa
+ ```
.
-<div></div>
-``` c
-int x = 33;
-```
+<pre><code>aaa
+aaa
+aaa
+</code></pre>
````````````````````````````````
-To start an [HTML block] with a tag that is *not* in the
-list of block-level tags in (6), you must put the tag by
-itself on the first line (and it must be complete):
-
```````````````````````````````` example
-<a href="foo">
-*bar*
-</a>
+ ```
+ aaa
+ aaa
+ aaa
+ ```
.
-<a href="foo">
-*bar*
-</a>
+<pre><code>aaa
+ aaa
+aaa
+</code></pre>
````````````````````````````````
-In type 7 blocks, the [tag name] can be anything:
+Four spaces indentation produces an indented code block:
```````````````````````````````` example
-<Warning>
-*bar*
-</Warning>
+ ```
+ aaa
+ ```
.
-<Warning>
-*bar*
-</Warning>
+<pre><code>```
+aaa
+```
+</code></pre>
````````````````````````````````
+Closing fences may be indented by 0-3 spaces, and their indentation
+need not match that of the opening fence:
+
```````````````````````````````` example
-<i class="foo">
-*bar*
-</i>
+```
+aaa
+ ```
.
-<i class="foo">
-*bar*
-</i>
+<pre><code>aaa
+</code></pre>
````````````````````````````````
```````````````````````````````` example
-</ins>
-*bar*
+ ```
+aaa
+ ```
.
-</ins>
-*bar*
+<pre><code>aaa
+</code></pre>
````````````````````````````````
-These rules are designed to allow us to work with tags that
-can function as either block-level or inline-level tags.
-The `<del>` tag is a nice example. We can surround content with
-`<del>` tags in three different ways. In this case, we get a raw
-HTML block, because the `<del>` tag is on a line by itself:
+This is not a closing fence, because it is indented 4 spaces:
```````````````````````````````` example
-<del>
-*foo*
-</del>
+```
+aaa
+ ```
.
-<del>
-*foo*
-</del>
+<pre><code>aaa
+ ```
+</code></pre>
````````````````````````````````
-In this case, we get a raw HTML block that just includes
-the `<del>` tag (because it ends with the following blank
-line). So the contents get interpreted as CommonMark:
+
+Code fences (opening and closing) cannot contain internal spaces:
```````````````````````````````` example
-<del>
+``` ```
+aaa
+.
+<p><code> </code>
+aaa</p>
+````````````````````````````````
-*foo*
-</del>
+```````````````````````````````` example
+~~~~~~
+aaa
+~~~ ~~
.
-<del>
-<p><em>foo</em></p>
-</del>
+<pre><code>aaa
+~~~ ~~
+</code></pre>
````````````````````````````````
-Finally, in this case, the `<del>` tags are interpreted
-as [raw HTML] *inside* the CommonMark paragraph. (Because
-the tag is not on a line by itself, we get inline HTML
-rather than an [HTML block].)
+Fenced code blocks can interrupt paragraphs, and can be followed
+directly by paragraphs, without a blank line between:
```````````````````````````````` example
-<del>*foo*</del>
+foo
+```
+bar
+```
+baz
.
-<p><del><em>foo</em></del></p>
+<p>foo</p>
+<pre><code>bar
+</code></pre>
+<p>baz</p>
````````````````````````````````
-HTML tags designed to contain literal content
-(`script`, `style`, `pre`), comments, processing instructions,
-and declarations are treated somewhat differently.
-Instead of ending at the first blank line, these blocks
-end at the first line containing a corresponding end tag.
-As a result, these blocks can contain blank lines:
-
-A pre tag (type 1):
+Other blocks can also occur before and after fenced code blocks
+without an intervening blank line:
```````````````````````````````` example
-<pre language="haskell"><code>
-import Text.HTML.TagSoup
-
-main :: IO ()
-main = print $ parseTags tags
-</code></pre>
-okay
+foo
+---
+~~~
+bar
+~~~
+# baz
.
-<pre language="haskell"><code>
-import Text.HTML.TagSoup
-
-main :: IO ()
-main = print $ parseTags tags
+<h2>foo</h2>
+<pre><code>bar
</code></pre>
-<p>okay</p>
+<h1>baz</h1>
````````````````````````````````
-A script tag (type 1):
+An [info string] can be provided after the opening code fence.
+Although this spec doesn't mandate any particular treatment of
+the info string, the first word is typically used to specify
+the language of the code block. In HTML output, the language is
+normally indicated by adding a class to the `code` element consisting
+of `language-` followed by the language name.
```````````````````````````````` example
-<script type="text/javascript">
-// JavaScript example
-
-document.getElementById("demo").innerHTML = "Hello JavaScript!";
-</script>
-okay
+```ruby
+def foo(x)
+ return 3
+end
+```
.
-<script type="text/javascript">
-// JavaScript example
-
-document.getElementById("demo").innerHTML = "Hello JavaScript!";
-</script>
-<p>okay</p>
+<pre><code class="language-ruby">def foo(x)
+ return 3
+end
+</code></pre>
````````````````````````````````
-A style tag (type 1):
-
```````````````````````````````` example
-<style
- type="text/css">
-h1 {color:red;}
-
-p {color:blue;}
-</style>
-okay
+~~~~ ruby startline=3 $%@#$
+def foo(x)
+ return 3
+end
+~~~~~~~
.
-<style
- type="text/css">
-h1 {color:red;}
-
-p {color:blue;}
-</style>
-<p>okay</p>
+<pre><code class="language-ruby">def foo(x)
+ return 3
+end
+</code></pre>
````````````````````````````````
-If there is no matching end tag, the block will end at the
-end of the document (or the enclosing [block quote][block quotes]
-or [list item][list items]):
-
```````````````````````````````` example
-<style
- type="text/css">
-
-foo
+````;
+````
.
-<style
- type="text/css">
-
-foo
+<pre><code class="language-;"></code></pre>
````````````````````````````````
-```````````````````````````````` example
-> <div>
-> foo
+[Info strings] for backtick code blocks cannot contain backticks:
-bar
-.
-<blockquote>
-<div>
+```````````````````````````````` example
+``` aa ```
foo
-</blockquote>
-<p>bar</p>
+.
+<p><code>aa</code>
+foo</p>
````````````````````````````````
+[Info strings] for tilde code blocks can contain backticks and tildes:
+
```````````````````````````````` example
-- <div>
-- foo
+~~~ aa ``` ~~~
+foo
+~~~
.
-<ul>
-<li>
-<div>
-</li>
-<li>foo</li>
-</ul>
+<pre><code class="language-aa">foo
+</code></pre>
````````````````````````````````
-The end tag can occur on the same line as the start tag:
+Closing code fences cannot have [info strings]:
```````````````````````````````` example
-<style>p{color:red;}</style>
-*foo*
+```
+``` aaa
+```
.
-<style>p{color:red;}</style>
-<p><em>foo</em></p>
+<pre><code>``` aaa
+</code></pre>
````````````````````````````````
-```````````````````````````````` example
-<!-- foo -->*bar*
-*baz*
-.
-<!-- foo -->*bar*
-<p><em>baz</em></p>
-````````````````````````````````
+## HTML blocks
-Note that anything on the last line after the
-end tag will be included in the [HTML block]:
+An [HTML block](@) is a group of lines that is treated
+as raw HTML (and will not be escaped in HTML output).
-```````````````````````````````` example
-<script>
-foo
-</script>1. *bar*
-.
-<script>
-foo
-</script>1. *bar*
-````````````````````````````````
+There are seven kinds of [HTML block], which can be defined by their
+start and end conditions. The block begins with a line that meets a
+[start condition](@) (after up to three spaces of optional indentation).
+It ends with the first subsequent line that meets a matching [end
+condition](@), or the last line of the document, or the last line of
+the [container block](#container-blocks) containing the current HTML
+block, if no line is encountered that meets the [end condition]. If
+the first line meets both the [start condition] and the [end
+condition], the block will contain just that line.
+1. **Start condition:** line begins with the string `<script`,
+`<pre`, or `<style` (case-insensitive), followed by whitespace,
+the string `>`, or the end of the line.\
+**End condition:** line contains an end tag
+`</script>`, `</pre>`, or `</style>` (case-insensitive; it
+need not match the start tag).
-A comment (type 2):
+2. **Start condition:** line begins with the string `<!--`.\
+**End condition:** line contains the string `-->`.
-```````````````````````````````` example
-<!-- Foo
+3. **Start condition:** line begins with the string `<?`.\
+**End condition:** line contains the string `?>`.
-bar
- baz -->
-okay
-.
-<!-- Foo
+4. **Start condition:** line begins with the string `<!`
+followed by an ASCII letter.\
+**End condition:** line contains the character `>`.
-bar
- baz -->
-<p>okay</p>
-````````````````````````````````
+5. **Start condition:** line begins with the string
+`<![CDATA[`.\
+**End condition:** line contains the string `]]>`.
+
+6. **Start condition:** line begins with the string `<` or `</`
+followed by one of the strings (case-insensitive) `address`,
+`article`, `aside`, `base`, `basefont`, `blockquote`, `body`,
+`caption`, `center`, `col`, `colgroup`, `dd`, `details`, `dialog`,
+`dir`, `div`, `dl`, `dt`, `fieldset`, `figcaption`, `figure`,
+`footer`, `form`, `frame`, `frameset`,
+`h1`, `h2`, `h3`, `h4`, `h5`, `h6`, `head`, `header`, `hr`,
+`html`, `iframe`, `legend`, `li`, `link`, `main`, `menu`, `menuitem`,
+`nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`,
+`section`, `source`, `summary`, `table`, `tbody`, `td`,
+`tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed
+by [whitespace], the end of the line, the string `>`, or
+the string `/>`.\
+**End condition:** line is followed by a [blank line].
+7. **Start condition:** line begins with a complete [open tag]
+(with any [tag name] other than `script`,
+`style`, or `pre`) or a complete [closing tag],
+followed only by [whitespace] or the end of the line.\
+**End condition:** line is followed by a [blank line].
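Start conditions 1 to 6 are simple prefix tests, so they can be approximated with regular expressions. The following Python sketch is one possible reading, not a reference implementation: the names `START_CONDITIONS` and `html_block_type` are hypothetical, only space indentation is considered, and condition 7 is left out because it requires parsing a complete [open tag] or [closing tag].

``` python
import re

# Tag names listed in start condition 6.
BLOCK_TAGS = (
    "address article aside base basefont blockquote body caption center col "
    "colgroup dd details dialog dir div dl dt fieldset figcaption figure "
    "footer form frame frameset h1 h2 h3 h4 h5 h6 head header hr html iframe "
    "legend li link main menu menuitem nav noframes ol optgroup option p "
    "param section source summary table tbody td tfoot th thead title tr "
    "track ul"
).split()

# Checked in order; the first pattern that matches decides the block type.
START_CONDITIONS = [
    (1, re.compile(r"<(?:script|pre|style)(?:\s|>|$)", re.IGNORECASE)),
    (2, re.compile(r"<!--")),
    (3, re.compile(r"<\?")),
    (4, re.compile(r"<![A-Za-z]")),
    (5, re.compile(r"<!\[CDATA\[")),
    (6, re.compile(r"</?(?:%s)(?:\s|/?>|$)" % "|".join(BLOCK_TAGS),
                   re.IGNORECASE)),
]


def html_block_type(line):
    """Return the start condition (1-6) that `line` meets, or None."""
    line = line.rstrip("\n")
    # Up to three spaces of indentation are allowed before the tag.
    indent = len(line) - len(line.lstrip(" "))
    if indent > 3:
        return None
    rest = line[indent:]
    for kind, pattern in START_CONDITIONS:
        if pattern.match(rest):
            return kind
    return None
```

For example, `html_block_type("<DIV CLASS=\"foo\">")` reports condition 6, while `html_block_type("<a href=\"foo\">")` returns `None` because `a` is not in the list and condition 7 is not modelled here.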
+HTML blocks continue until they are closed by their appropriate
+[end condition], or the last line of the document or other [container
+block](#container-blocks). This means any HTML **within an HTML
+block** that might otherwise be recognised as a start condition will
+be ignored by the parser and passed through as-is, without changing
+the parser's state.
-A processing instruction (type 3):
+For instance, `<pre>` within an HTML block started by `<table>` will not affect
+the parser state; as the HTML block was started by start condition 6, it
+will end at any blank line. This can be surprising:
```````````````````````````````` example
-<?php
-
- echo '>';
+<table><tr><td>
+<pre>
+**Hello**,
-?>
-okay
+_world_.
+</pre>
+</td></tr></table>
.
-<?php
-
- echo '>';
-
-?>
-<p>okay</p>
+<table><tr><td>
+<pre>
+**Hello**,
+<p><em>world</em>.
+</pre></p>
+</td></tr></table>
````````````````````````````````
+In this case, the HTML block is terminated by the blank line (the `**Hello**`
+text remains verbatim), and regular parsing resumes, with a paragraph,
+emphasised `world`, and inline and block HTML following.
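The behaviour in this example can be traced with a small sketch of the end conditions. It is written in Python under simplifying assumptions: the document is a flat list of lines with no container blocks, the block type is already known, and the names `END_CONDITIONS` and `html_block_end` are hypothetical.

``` python
import re

# End conditions for block types 1-5; types 6 and 7 end before a blank line.
END_CONDITIONS = {
    1: re.compile(r"</(?:script|pre|style)>", re.IGNORECASE),
    2: re.compile(r"-->"),
    3: re.compile(r"\?>"),
    4: re.compile(r">"),
    5: re.compile(r"\]\]>"),
}


def html_block_end(lines, start, kind):
    """Return the index of the last line of the HTML block at lines[start]."""
    if kind in END_CONDITIONS:
        # The first line may itself meet the end condition.
        for i in range(start, len(lines)):
            if END_CONDITIONS[kind].search(lines[i]):
                return i
        return len(lines) - 1  # no end condition met: runs to end of input
    for i in range(start, len(lines)):
        # Types 6 and 7: the block ends at a line followed by a blank line.
        if i + 1 < len(lines) and lines[i + 1].strip() == "":
            return i
    return len(lines) - 1
```

Applied to the seven input lines of the example with `kind` 6, the function returns index 2, the `**Hello**,` line: the nested `<pre>` never changes the block type, so the blank line ends the block.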
-A declaration (type 4):
+All types of [HTML blocks] except type 7 may interrupt
+a paragraph. Blocks of type 7 may not interrupt a paragraph.
+(This restriction is intended to prevent unwanted interpretation
+of long tags inside a wrapped paragraph as starting HTML blocks.)
+
+Some simple examples follow. Here are some basic HTML blocks
+of type 6:
```````````````````````````````` example
-<!DOCTYPE html>
+<table>
+ <tr>
+ <td>
+ hi
+ </td>
+ </tr>
+</table>
+
+okay.
.
-<!DOCTYPE html>
+<table>
+ <tr>
+ <td>
+ hi
+ </td>
+ </tr>
+</table>
+<p>okay.</p>
````````````````````````````````
-CDATA (type 5):
-
```````````````````````````````` example
-<![CDATA[
-function matchwo(a,b)
-{
- if (a < b && a < 0) then {
- return 1;
-
- } else {
-
- return 0;
- }
-}
-]]>
-okay
+ <div>
+ *hello*
+ <foo><a>
.
-<![CDATA[
-function matchwo(a,b)
-{
- if (a < b && a < 0) then {
- return 1;
-
- } else {
-
- return 0;
- }
-}
-]]>
-<p>okay</p>
+ <div>
+ *hello*
+ <foo><a>
````````````````````````````````
-The opening tag can be indented 1-3 spaces, but not 4:
+A block can also start with a closing tag:
```````````````````````````````` example
- <!-- foo -->
-
- <!-- foo -->
+</div>
+*foo*
.
- <!-- foo -->
-<pre><code><!-- foo -->
-</code></pre>
+</div>
+*foo*
````````````````````````````````
+Here we have two HTML blocks with a Markdown paragraph between them:
+
```````````````````````````````` example
- <div>
+<DIV CLASS="foo">
- <div>
+*Markdown*
+
+</DIV>
.
- <div>
-<pre><code><div>
-</code></pre>
+<DIV CLASS="foo">
+<p><em>Markdown</em></p>
+</DIV>
````````````````````````````````
-An HTML block of types 1--6 can interrupt a paragraph, and need not be
-preceded by a blank line.
+The tag on the first line can be partial, as long
+as it is split where there would be whitespace:
```````````````````````````````` example
-Foo
-<div>
-bar
+<div id="foo"
+ class="bar">
</div>
.
-<p>Foo</p>
-<div>
-bar
+<div id="foo"
+ class="bar">
</div>
````````````````````````````````
-However, a following blank line is needed, except at the end of
-a document, and except for blocks of types 1--5, [above][HTML
-block]:
-
```````````````````````````````` example
-<div>
-bar
+<div id="foo" class="bar
+ baz">
</div>
-*foo*
.
-<div>
-bar
+<div id="foo" class="bar
+ baz">
</div>
-*foo*
````````````````````````````````
-HTML blocks of type 7 cannot interrupt a paragraph:
-
+An open tag need not be closed:
```````````````````````````````` example
-Foo
-<a href="bar">
-baz
+<div>
+*foo*
+
+*bar*
.
-<p>Foo
-<a href="bar">
-baz</p>
+<div>
+*foo*
+<p><em>bar</em></p>
````````````````````````````````
-This rule differs from John Gruber's original Markdown syntax
-specification, which says:
-
-> The only restrictions are that block-level HTML elements —
-> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from
-> surrounding content by blank lines, and the start and end tags of the
-> block should not be indented with tabs or spaces.
-
-In some ways Gruber's rule is more restrictive than the one given
-here:
-
-- It requires that an HTML block be preceded by a blank line.
-- It does not allow the start tag to be indented.
-- It requires a matching end tag, which it also does not allow to
- be indented.
-
-Most Markdown implementations (including some of Gruber's own) do not
-respect all of these restrictions.
-
-There is one respect, however, in which Gruber's rule is more liberal
-than the one given here, since it allows blank lines to occur inside
-an HTML block. There are two reasons for disallowing them here.
-First, it removes the need to parse balanced tags, which is
-expensive and can require backtracking from the end of the document
-if no matching end tag is found. Second, it provides a very simple
-and flexible way of including Markdown content inside HTML tags:
-simply separate the Markdown from the HTML using blank lines:
-Compare:
+A partial tag need not even be completed (garbage
+in, garbage out):
```````````````````````````````` example
-<div>
-
-*Emphasized* text.
-
-</div>
+<div id="foo"
+*hi*
.
-<div>
-<p><em>Emphasized</em> text.</p>
-</div>
+<div id="foo"
+*hi*
````````````````````````````````
```````````````````````````````` example
-<div>
-*Emphasized* text.
-</div>
+<div class
+foo
.
-<div>
-*Emphasized* text.
-</div>
+<div class
+foo
````````````````````````````````
-Some Markdown implementations have adopted a convention of
-interpreting content inside tags as text if the open tag has
-the attribute `markdown=1`. The rule given above seems a simpler and
-more elegant way of achieving the same expressive power, which is also
-much simpler to parse.
-
-The main potential drawback is that one can no longer paste HTML
-blocks into Markdown documents with 100% reliability. However,
-*in most cases* this will work fine, because the blank lines in
-HTML are usually followed by HTML block tags. For example:
+The initial tag doesn't even need to be a valid
+tag, as long as it starts like one:
```````````````````````````````` example
-<table>
-
-<tr>
-
-<td>
-Hi
-</td>
-
-</tr>
-
-</table>
+<div *???-&&&-<---
+*foo*
.
-<table>
-<tr>
-<td>
-Hi
-</td>
-</tr>
-</table>
+<div *???-&&&-<---
+*foo*
````````````````````````````````
-There are problems, however, if the inner tags are indented
-*and* separated by spaces, as then they will be interpreted as
-an indented code block:
+In type 6 blocks, the initial tag need not be on a line by
+itself:
```````````````````````````````` example
-<table>
-
- <tr>
-
- <td>
- Hi
- </td>
+<div><a href="bar">*foo*</a></div>
+.
+<div><a href="bar">*foo*</a></div>
+````````````````````````````````
- </tr>
-</table>
+```````````````````````````````` example
+<table><tr><td>
+foo
+</td></tr></table>
.
-<table>
- <tr>
-<pre><code><td>
- Hi
-</td>
-</code></pre>
- </tr>
-</table>
+<table><tr><td>
+foo
+</td></tr></table>
````````````````````````````````
-Fortunately, blank lines are usually not necessary and can be
-deleted. The exception is inside `<pre>` tags, but as described
-[above][HTML blocks], raw HTML blocks starting with `<pre>`
-*can* contain blank lines.
+Everything until the next blank line or end of document
+gets included in the HTML block. So, in the following
+example, what looks like a Markdown code block
+is actually part of the HTML block, which continues until a blank
+line or the end of the document is reached:
-## Link reference definitions
+```````````````````````````````` example
+<div></div>
+``` c
+int x = 33;
+```
+.
+<div></div>
+``` c
+int x = 33;
+```
+````````````````````````````````
-A [link reference definition](@)
-consists of a [link label], indented up to three spaces, followed
-by a colon (`:`), optional [whitespace] (including up to one
-[line ending]), a [link destination],
-optional [whitespace] (including up to one
-[line ending]), and an optional [link
-title], which if it is present must be separated
-from the [link destination] by [whitespace].
-No further [non-whitespace characters] may occur on the line.
-A [link reference definition]
-does not correspond to a structural element of a document. Instead, it
-defines a label which can be used in [reference links]
-and reference-style [images] elsewhere in the document. [Link
-reference definitions] can come either before or after the links that use
-them.
+To start an [HTML block] with a tag that is *not* in the
+list of block-level tags in (6), you must put the tag by
+itself on the first line (and it must be complete):
```````````````````````````````` example
-[foo]: /url "title"
-
-[foo]
+<a href="foo">
+*bar*
+</a>
.
-<p><a href="/url" title="title">foo</a></p>
+<a href="foo">
+*bar*
+</a>
````````````````````````````````
-```````````````````````````````` example
- [foo]:
- /url
- 'the title'
+In type 7 blocks, the [tag name] can be anything:
-[foo]
+```````````````````````````````` example
+<Warning>
+*bar*
+</Warning>
.
-<p><a href="/url" title="the title">foo</a></p>
+<Warning>
+*bar*
+</Warning>
````````````````````````````````
```````````````````````````````` example
-[Foo*bar\]]:my_(url) 'title (with parens)'
-
-[Foo*bar\]]
+<i class="foo">
+*bar*
+</i>
.
-<p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p>
+<i class="foo">
+*bar*
+</i>
````````````````````````````````
```````````````````````````````` example
-[Foo bar]:
-<my url>
-'title'
-
-[Foo bar]
+</ins>
+*bar*
.
-<p><a href="my%20url" title="title">Foo bar</a></p>
+</ins>
+*bar*
````````````````````````````````
-The title may extend over multiple lines:
+These rules are designed to allow us to work with tags that
+can function as either block-level or inline-level tags.
+The `<del>` tag is a nice example. We can surround content with
+`<del>` tags in three different ways. In this case, we get a raw
+HTML block, because the `<del>` tag is on a line by itself:
```````````````````````````````` example
-[foo]: /url '
-title
-line1
-line2
-'
-
-[foo]
+<del>
+*foo*
+</del>
.
-<p><a href="/url" title="
-title
-line1
-line2
-">foo</a></p>
+<del>
+*foo*
+</del>
````````````````````````````````
-However, it may not contain a [blank line]:
+In this case, we get a raw HTML block that just includes
+the `<del>` tag (because it ends with the following blank
+line). So the contents get interpreted as CommonMark:
```````````````````````````````` example
-[foo]: /url 'title
+<del>
-with blank line'
+*foo*
-[foo]
+</del>
.
-<p>[foo]: /url 'title</p>
-<p>with blank line'</p>
-<p>[foo]</p>
+<del>
+<p><em>foo</em></p>
+</del>
````````````````````````````````
-The title may be omitted:
-
-```````````````````````````````` example
-[foo]:
-/url
+Finally, in this case, the `<del>` tags are interpreted
+as [raw HTML] *inside* the CommonMark paragraph. (Because
+the tag is not on a line by itself, we get inline HTML
+rather than an [HTML block].)
-[foo]
+```````````````````````````````` example
+<del>*foo*</del>
.
-<p><a href="/url">foo</a></p>
+<p><del><em>foo</em></del></p>
````````````````````````````````
-The link destination may not be omitted:
+HTML tags designed to contain literal content
+(`script`, `style`, `pre`), comments, processing instructions,
+and declarations are treated somewhat differently.
+Instead of ending at the first blank line, these blocks
+end at the first line containing a corresponding end tag.
+As a result, these blocks can contain blank lines:
+
+A pre tag (type 1):
```````````````````````````````` example
-[foo]:
+<pre language="haskell"><code>
+import Text.HTML.TagSoup
-[foo]
+main :: IO ()
+main = print $ parseTags tags
+</code></pre>
+okay
.
-<p>[foo]:</p>
-<p>[foo]</p>
+<pre language="haskell"><code>
+import Text.HTML.TagSoup
+
+main :: IO ()
+main = print $ parseTags tags
+</code></pre>
+<p>okay</p>
````````````````````````````````
- However, an empty link destination may be specified using
- angle brackets:
+
+A script tag (type 1):
```````````````````````````````` example
-[foo]: <>
+<script type="text/javascript">
+// JavaScript example
-[foo]
+document.getElementById("demo").innerHTML = "Hello JavaScript!";
+</script>
+okay
.
-<p><a href="">foo</a></p>
+<script type="text/javascript">
+// JavaScript example
+
+document.getElementById("demo").innerHTML = "Hello JavaScript!";
+</script>
+<p>okay</p>
````````````````````````````````
-The title must be separated from the link destination by
-whitespace:
+
+A style tag (type 1):
```````````````````````````````` example
-[foo]: <bar>(baz)
+<style
+ type="text/css">
+h1 {color:red;}
-[foo]
+p {color:blue;}
+</style>
+okay
.
-<p>[foo]: <bar>(baz)</p>
-<p>[foo]</p>
+<style
+ type="text/css">
+h1 {color:red;}
+
+p {color:blue;}
+</style>
+<p>okay</p>
````````````````````````````````
-Both title and destination can contain backslash escapes
-and literal backslashes:
+If there is no matching end tag, the block will end at the
+end of the document (or the enclosing [block quote][block quotes]
+or [list item][list items]):
```````````````````````````````` example
-[foo]: /url\bar\*baz "foo\"bar\baz"
+<style
+ type="text/css">
-[foo]
+foo
.
-<p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p>
-````````````````````````````````
+<style
+ type="text/css">
+foo
+````````````````````````````````
-A link can come before its corresponding definition:
```````````````````````````````` example
-[foo]
+> <div>
+> foo
-[foo]: url
+bar
.
-<p><a href="url">foo</a></p>
+<blockquote>
+<div>
+foo
+</blockquote>
+<p>bar</p>
````````````````````````````````
-If there are several matching definitions, the first one takes
-precedence:
-
```````````````````````````````` example
-[foo]
-
-[foo]: first
-[foo]: second
+- <div>
+- foo
.
-<p><a href="first">foo</a></p>
+<ul>
+<li>
+<div>
+</li>
+<li>foo</li>
+</ul>
````````````````````````````````
-As noted in the section on [Links], matching of labels is
-case-insensitive (see [matches]).
+The end tag can occur on the same line as the start tag:
```````````````````````````````` example
-[FOO]: /url
-
-[Foo]
+<style>p{color:red;}</style>
+*foo*
.
-<p><a href="/url">Foo</a></p>
+<style>p{color:red;}</style>
+<p><em>foo</em></p>
````````````````````````````````
```````````````````````````````` example
-[ΑΓΩ]: /φου
-
-[αγω]
+<!-- foo -->*bar*
+*baz*
.
-<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
+<!-- foo -->*bar*
+<p><em>baz</em></p>
````````````````````````````````
-Here is a link reference definition with no corresponding link.
-It contributes nothing to the document.
+Note that anything on the last line after the
+end tag will be included in the [HTML block]:
```````````````````````````````` example
-[foo]: /url
+<script>
+foo
+</script>1. *bar*
.
+<script>
+foo
+</script>1. *bar*
````````````````````````````````
-Here is another one:
+A comment (type 2):
```````````````````````````````` example
-[
-foo
-]: /url
+<!-- Foo
+
bar
+ baz -->
+okay
.
-<p>bar</p>
+<!-- Foo
+
+bar
+ baz -->
+<p>okay</p>
````````````````````````````````
-This is not a link reference definition, because there are
-[non-whitespace characters] after the title:
+
+A processing instruction (type 3):
```````````````````````````````` example
-[foo]: /url "title" ok
+<?php
+
+ echo '>';
+
+?>
+okay
.
-<p>[foo]: /url "title" ok</p>
+<?php
+
+ echo '>';
+
+?>
+<p>okay</p>
````````````````````````````````
-This is a link reference definition, but it has no title:
+A declaration (type 4):
```````````````````````````````` example
-[foo]: /url
-"title" ok
+<!DOCTYPE html>
.
-<p>"title" ok</p>
+<!DOCTYPE html>
````````````````````````````````
-This is not a link reference definition, because it is indented
-four spaces:
+CDATA (type 5):
```````````````````````````````` example
- [foo]: /url "title"
+<![CDATA[
+function matchwo(a,b)
+{
+ if (a < b && a < 0) then {
+ return 1;
-[foo]
+ } else {
+
+ return 0;
+ }
+}
+]]>
+okay
.
-<pre><code>[foo]: /url "title"
-</code></pre>
-<p>[foo]</p>
+<![CDATA[
+function matchwo(a,b)
+{
+ if (a < b && a < 0) then {
+ return 1;
+
+ } else {
+
+ return 0;
+ }
+}
+]]>
+<p>okay</p>
````````````````````````````````
-This is not a link reference definition, because it occurs inside
-a code block:
+The opening tag can be indented 1-3 spaces, but not 4:
```````````````````````````````` example
-```
-[foo]: /url
-```
+ <!-- foo -->
-[foo]
+ <!-- foo -->
.
-<pre><code>[foo]: /url
+ <!-- foo -->
+<pre><code><!-- foo -->
</code></pre>
-<p>[foo]</p>
````````````````````````````````
-A [link reference definition] cannot interrupt a paragraph.
-
```````````````````````````````` example
-Foo
-[bar]: /baz
+ <div>
-[bar]
+ <div>
.
-<p>Foo
-[bar]: /baz</p>
-<p>[bar]</p>
+ <div>
+<pre><code><div>
+</code></pre>
````````````````````````````````
-However, it can directly follow other block elements, such as headings
-and thematic breaks, and it need not be followed by a blank line.
+An HTML block of types 1--6 can interrupt a paragraph, and need not be
+preceded by a blank line.
```````````````````````````````` example
-# [Foo]
-[foo]: /url
-> bar
-.
-<h1><a href="/url">Foo</a></h1>
-<blockquote>
-<p>bar</p>
-</blockquote>
+Foo
+<div>
+bar
+</div>
+.
+<p>Foo</p>
+<div>
+bar
+</div>
````````````````````````````````
+
+However, a following blank line is needed, except at the end of
+a document, and except for blocks of types 1--5, [above][HTML
+block]:
+
```````````````````````````````` example
-[foo]: /url
+<div>
bar
-===
-[foo]
+</div>
+*foo*
.
-<h1>bar</h1>
-<p><a href="/url">foo</a></p>
+<div>
+bar
+</div>
+*foo*
````````````````````````````````
+
+HTML blocks of type 7 cannot interrupt a paragraph:
+
```````````````````````````````` example
-[foo]: /url
-===
-[foo]
+Foo
+<a href="bar">
+baz
.
-<p>===
-<a href="/url">foo</a></p>
+<p>Foo
+<a href="bar">
+baz</p>
````````````````````````````````
-Several [link reference definitions]
-can occur one after another, without intervening blank lines.
+This rule differs from John Gruber's original Markdown syntax
+specification, which says:
-```````````````````````````````` example
-[foo]: /foo-url "foo"
-[bar]: /bar-url
- "bar"
-[baz]: /baz-url
+> The only restrictions are that block-level HTML elements —
+> e.g. `<div>`, `<table>`, `<pre>`, `<p>`, etc. — must be separated from
+> surrounding content by blank lines, and the start and end tags of the
+> block should not be indented with tabs or spaces.
-[foo],
-[bar],
-[baz]
-.
-<p><a href="/foo-url" title="foo">foo</a>,
-<a href="/bar-url" title="bar">bar</a>,
-<a href="/baz-url">baz</a></p>
-````````````````````````````````
+In some ways Gruber's rule is more restrictive than the one given
+here:
+- It requires that an HTML block be preceded by a blank line.
+- It does not allow the start tag to be indented.
+- It requires a matching end tag, which it also does not allow to
+ be indented.
-[Link reference definitions] can occur
-inside block containers, like lists and block quotations. They
-affect the entire document, not just the container in which they
-are defined:
+Most Markdown implementations (including some of Gruber's own) do not
+respect all of these restrictions.
+
+There is one respect, however, in which Gruber's rule is more liberal
+than the one given here, since it allows blank lines to occur inside
+an HTML block. There are two reasons for disallowing them here.
+First, it removes the need to parse balanced tags, which is
+expensive and can require backtracking from the end of the document
+if no matching end tag is found. Second, it provides a very simple
+and flexible way of including Markdown content inside HTML tags:
+simply separate the Markdown from the HTML using blank lines.
+
+Compare:
```````````````````````````````` example
-[foo]
+<div>
-> [foo]: /url
+*Emphasized* text.
+
+</div>
.
-<p><a href="/url">foo</a></p>
-<blockquote>
-</blockquote>
+<div>
+<p><em>Emphasized</em> text.</p>
+</div>
````````````````````````````````
-Whether something is a [link reference definition] is
-independent of whether the link reference it defines is
-used in the document. Thus, for example, the following
-document contains just a link reference definition, and
-no visible content:
-
```````````````````````````````` example
-[foo]: /url
+<div>
+*Emphasized* text.
+</div>
.
+<div>
+*Emphasized* text.
+</div>
````````````````````````````````
-## Paragraphs
-
-A sequence of non-blank lines that cannot be interpreted as other
-kinds of blocks forms a [paragraph](@).
-The contents of the paragraph are the result of parsing the
-paragraph's raw content as inlines. The paragraph's raw content
-is formed by concatenating the lines and removing initial and final
-[whitespace].
+Some Markdown implementations have adopted a convention of
+interpreting content inside tags as text if the open tag has
+the attribute `markdown=1`. The rule given above seems a simpler and
+more elegant way of achieving the same expressive power, which is also
+much simpler to parse.
-A simple example with two paragraphs:
+The main potential drawback is that one can no longer paste HTML
+blocks into Markdown documents with 100% reliability. However,
+*in most cases* this will work fine, because the blank lines in
+HTML are usually followed by HTML block tags. For example:
```````````````````````````````` example
-aaa
-
-bbb
-.
-<p>aaa</p>
-<p>bbb</p>
-````````````````````````````````
+<table>
+<tr>
-Paragraphs can contain multiple lines, but no blank lines:
+<td>
+Hi
+</td>
-```````````````````````````````` example
-aaa
-bbb
+</tr>
-ccc
-ddd
+</table>
.
-<p>aaa
-bbb</p>
-<p>ccc
-ddd</p>
+<table>
+<tr>
+<td>
+Hi
+</td>
+</tr>
+</table>
````````````````````````````````
-Multiple blank lines between paragraph have no effect:
+There are problems, however, if the inner tags are indented
+*and* separated by blank lines, as then they will be interpreted as
+an indented code block:
```````````````````````````````` example
-aaa
+<table>
+ <tr>
-bbb
+ <td>
+ Hi
+ </td>
+
+ </tr>
+
+</table>
.
-<p>aaa</p>
-<p>bbb</p>
+<table>
+ <tr>
+<pre><code><td>
+ Hi
+</td>
+</code></pre>
+ </tr>
+</table>
````````````````````````````````
-Leading spaces are skipped:
+Fortunately, blank lines are usually not necessary and can be
+deleted. The exception is inside `<pre>` tags, but as described
+[above][HTML blocks], raw HTML blocks starting with `<pre>`
+*can* contain blank lines.
+
+## Link reference definitions
+
+A [link reference definition](@)
+consists of a [link label], indented up to three spaces, followed
+by a colon (`:`), optional [whitespace] (including up to one
+[line ending]), a [link destination],
+optional [whitespace] (including up to one
+[line ending]), and an optional [link
+title], which if it is present must be separated
+from the [link destination] by [whitespace].
+No further [non-whitespace characters] may occur on the line.
+
+A [link reference definition]
+does not correspond to a structural element of a document. Instead, it
+defines a label which can be used in [reference links]
+and reference-style [images] elsewhere in the document. [Link
+reference definitions] can come either before or after the links that use
+them.
```````````````````````````````` example
- aaa
- bbb
+[foo]: /url "title"
+
+[foo]
.
-<p>aaa
-bbb</p>
+<p><a href="/url" title="title">foo</a></p>
````````````````````````````````
-Lines after the first may be indented any amount, since indented
-code blocks cannot interrupt paragraphs.
-
```````````````````````````````` example
-aaa
- bbb
- ccc
+ [foo]:
+ /url
+ 'the title'
+
+[foo]
.
-<p>aaa
-bbb
-ccc</p>
+<p><a href="/url" title="the title">foo</a></p>
````````````````````````````````
-However, the first line may be indented at most three spaces,
-or an indented code block will be triggered:
-
```````````````````````````````` example
- aaa
-bbb
+[Foo*bar\]]:my_(url) 'title (with parens)'
+
+[Foo*bar\]]
.
-<p>aaa
-bbb</p>
+<p><a href="my_(url)" title="title (with parens)">Foo*bar]</a></p>
````````````````````````````````
```````````````````````````````` example
- aaa
-bbb
+[Foo bar]:
+<my url>
+'title'
+
+[Foo bar]
.
-<pre><code>aaa
-</code></pre>
-<p>bbb</p>
+<p><a href="my%20url" title="title">Foo bar</a></p>
````````````````````````````````
-Final spaces are stripped before inline parsing, so a paragraph
-that ends with two or more spaces will not end with a [hard line
-break]:
+The title may extend over multiple lines:
```````````````````````````````` example
-aaa
-bbb
+[foo]: /url '
+title
+line1
+line2
+'
+
+[foo]
.
-<p>aaa<br />
-bbb</p>
+<p><a href="/url" title="
+title
+line1
+line2
+">foo</a></p>
````````````````````````````````
-## Blank lines
-
-[Blank lines] between block-level elements are ignored,
-except for the role they play in determining whether a [list]
-is [tight] or [loose].
-
-Blank lines at the beginning and end of the document are also ignored.
+However, it may not contain a [blank line]:
```````````````````````````````` example
-
-
-aaa
-
+[foo]: /url 'title
-# aaa
+with blank line'
-
+[foo]
.
-<p>aaa</p>
-<h1>aaa</h1>
+<p>[foo]: /url 'title</p>
+<p>with blank line'</p>
+<p>[foo]</p>
````````````````````````````````
+The title may be omitted:
-# Container blocks
-
-A [container block](#container-blocks) is a block that has other
-blocks as its contents. There are two basic kinds of container blocks:
-[block quotes] and [list items].
-[Lists] are meta-containers for [list items].
-
-We define the syntax for container blocks recursively. The general
-form of the definition is:
-
-> If X is a sequence of blocks, then the result of
-> transforming X in such-and-such a way is a container of type Y
-> with these blocks as its content.
+```````````````````````````````` example
+[foo]:
+/url
-So, we explain what counts as a block quote or list item by explaining
-how these can be *generated* from their contents. This should suffice
-to define the syntax, although it does not give a recipe for *parsing*
-these constructions. (A recipe is provided below in the section entitled
-[A parsing strategy](#appendix-a-parsing-strategy).)
+[foo]
+.
+<p><a href="/url">foo</a></p>
+````````````````````````````````
-## Block quotes
-A [block quote marker](@)
-consists of 0-3 spaces of initial indent, plus (a) the character `>` together
-with a following space, or (b) a single character `>` not followed by a space.
+The link destination may not be omitted:
-The following rules define [block quotes]:
+```````````````````````````````` example
+[foo]:
-1. **Basic case.** If a string of lines *Ls* constitute a sequence
- of blocks *Bs*, then the result of prepending a [block quote
- marker] to the beginning of each line in *Ls*
- is a [block quote](#block-quotes) containing *Bs*.
+[foo]
+.
+<p>[foo]:</p>
+<p>[foo]</p>
+````````````````````````````````
-2. **Laziness.** If a string of lines *Ls* constitute a [block
- quote](#block-quotes) with contents *Bs*, then the result of deleting
- the initial [block quote marker] from one or
- more lines in which the next [non-whitespace character] after the [block
- quote marker] is [paragraph continuation
- text] is a block quote with *Bs* as its content.
- [Paragraph continuation text](@) is text
- that will be parsed as part of the content of a paragraph, but does
- not occur at the beginning of the paragraph.
+However, an empty link destination may be specified using
+angle brackets:
-3. **Consecutiveness.** A document cannot contain two [block
- quotes] in a row unless there is a [blank line] between them.
+```````````````````````````````` example
+[foo]: <>
-Nothing else counts as a [block quote](#block-quotes).
+[foo]
+.
+<p><a href="">foo</a></p>
+````````````````````````````````
-Here is a simple example:
+The title must be separated from the link destination by
+whitespace:
```````````````````````````````` example
-> # Foo
-> bar
-> baz
+[foo]: <bar>(baz)
+
+[foo]
.
-<blockquote>
-<h1>Foo</h1>
-<p>bar
-baz</p>
-</blockquote>
+<p>[foo]: <bar>(baz)</p>
+<p>[foo]</p>
````````````````````````````````
-The spaces after the `>` characters can be omitted:
+Both title and destination can contain backslash escapes
+and literal backslashes:
```````````````````````````````` example
-># Foo
->bar
-> baz
+[foo]: /url\bar\*baz "foo\"bar\baz"
+
+[foo]
.
-<blockquote>
-<h1>Foo</h1>
-<p>bar
-baz</p>
-</blockquote>
+<p><a href="/url%5Cbar*baz" title="foo"bar\baz">foo</a></p>
````````````````````````````````
-The `>` characters can be indented 1-3 spaces:
+A link can come before its corresponding definition:
```````````````````````````````` example
- > # Foo
- > bar
- > baz
+[foo]
+
+[foo]: url
.
-<blockquote>
-<h1>Foo</h1>
-<p>bar
-baz</p>
-</blockquote>
+<p><a href="url">foo</a></p>
````````````````````````````````
-Four spaces gives us a code block:
+If there are several matching definitions, the first one takes
+precedence:
```````````````````````````````` example
- > # Foo
- > bar
- > baz
+[foo]
+
+[foo]: first
+[foo]: second
.
-<pre><code>> # Foo
-> bar
-> baz
-</code></pre>
+<p><a href="first">foo</a></p>
````````````````````````````````
-The Laziness clause allows us to omit the `>` before
-[paragraph continuation text]:
+As noted in the section on [Links], matching of labels is
+case-insensitive (see [matches]).
```````````````````````````````` example
-> # Foo
-> bar
-baz
+[FOO]: /url
+
+[Foo]
.
-<blockquote>
-<h1>Foo</h1>
-<p>bar
-baz</p>
-</blockquote>
+<p><a href="/url">Foo</a></p>
````````````````````````````````
-A block quote can contain some lazy and some non-lazy
-continuation lines:
-
```````````````````````````````` example
-> bar
-baz
-> foo
+[ΑΓΩ]: /φου
+
+[αγω]
.
-<blockquote>
-<p>bar
-baz
-foo</p>
-</blockquote>
+<p><a href="/%CF%86%CE%BF%CF%85">αγω</a></p>
````````````````````````````````
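
The two label rules above (the first matching definition wins, and matching is case-insensitive) can be sketched roughly as follows. This is only an illustration: the helper names `normalize_label` and `collect_definitions` are hypothetical, and the sketch assumes that the normalization described under [matches] amounts to a Unicode case fold plus trimming and collapsing whitespace.

```python
import re


def normalize_label(label):
    """Trim the label, collapse whitespace runs to one space, and case-fold it.

    Assumption: this approximates the label normalization used for matching.
    """
    return re.sub(r"\s+", " ", label.strip()).casefold()


def collect_definitions(pairs):
    """Build a label -> destination map in which the first definition wins."""
    refs = {}
    for label, destination in pairs:
        # setdefault keeps an existing entry, so later duplicates are ignored.
        refs.setdefault(normalize_label(label), destination)
    return refs


refs = collect_definitions([
    ("foo", "first"),
    ("foo", "second"),
    ("ΑΓΩ", "/φου"),
])
```

Looking up `normalize_label("Foo")` or `normalize_label("αγω")` in `refs` then finds the definitions written as `[foo]` and `[ΑΓΩ]`, mirroring the examples above.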
-Laziness only applies to lines that would have been continuations of
-paragraphs had they been prepended with [block quote markers].
-For example, the `> ` cannot be omitted in the second line of
-
-``` markdown
-> foo
-> ---
-```
-
-without changing the meaning:
+Here is a link reference definition with no corresponding link.
+It contributes nothing to the document.
```````````````````````````````` example
-> foo
----
+[foo]: /url
.
-<blockquote>
-<p>foo</p>
-</blockquote>
-<hr />
````````````````````````````````
-Similarly, if we omit the `> ` in the second line of
-
-``` markdown
-> - foo
-> - bar
-```
-
-then the block quote ends after the first line:
+Here is another one:
```````````````````````````````` example
-> - foo
-- bar
+[
+foo
+]: /url
+bar
.
-<blockquote>
-<ul>
-<li>foo</li>
-</ul>
-</blockquote>
-<ul>
-<li>bar</li>
-</ul>
+<p>bar</p>
````````````````````````````````
-For the same reason, we can't omit the `> ` in front of
-subsequent lines of an indented or fenced code block:
+This is not a link reference definition, because there are
+[non-whitespace characters] after the title:
```````````````````````````````` example
-> foo
- bar
+[foo]: /url "title" ok
.
-<blockquote>
-<pre><code>foo
-</code></pre>
-</blockquote>
-<pre><code>bar
-</code></pre>
+<p>[foo]: /url "title" ok</p>
````````````````````````````````
+This is a link reference definition, but it has no title:
+
```````````````````````````````` example
-> ```
-foo
-```
+[foo]: /url
+"title" ok
.
-<blockquote>
-<pre><code></code></pre>
-</blockquote>
-<p>foo</p>
-<pre><code></code></pre>
+<p>"title" ok</p>
````````````````````````````````
-Note that in the following case, we have a [lazy
-continuation line]:
+This is not a link reference definition, because it is indented
+four spaces:
```````````````````````````````` example
-> foo
- - bar
+ [foo]: /url "title"
+
+[foo]
.
-<blockquote>
-<p>foo
-- bar</p>
-</blockquote>
+<pre><code>[foo]: /url "title"
+</code></pre>
+<p>[foo]</p>
````````````````````````````````
-To see why, note that in
+This is not a link reference definition, because it occurs inside
+a code block:
-```markdown
-> foo
-> - bar
+```````````````````````````````` example
+```
+[foo]: /url
```
-the `- bar` is indented too far to start a list, and can't
-be an indented code block because indented code blocks cannot
-interrupt paragraphs, so it is [paragraph continuation text].
+[foo]
+.
+<pre><code>[foo]: /url
+</code></pre>
+<p>[foo]</p>
+````````````````````````````````
-A block quote can be empty:
+
+A [link reference definition] cannot interrupt a paragraph.
```````````````````````````````` example
->
+Foo
+[bar]: /baz
+
+[bar]
.
-<blockquote>
-</blockquote>
+<p>Foo
+[bar]: /baz</p>
+<p>[bar]</p>
````````````````````````````````
+However, it can directly follow other block elements, such as headings
+and thematic breaks, and it need not be followed by a blank line.
+
```````````````````````````````` example
->
->
->
+# [Foo]
+[foo]: /url
+> bar
.
+<h1><a href="/url">Foo</a></h1>
<blockquote>
+<p>bar</p>
</blockquote>
````````````````````````````````
-
-A block quote can have initial or final blank lines:
+```````````````````````````````` example
+[foo]: /url
+bar
+===
+[foo]
+.
+<h1>bar</h1>
+<p><a href="/url">foo</a></p>
+````````````````````````````````
```````````````````````````````` example
->
-> foo
->
+[foo]: /url
+===
+[foo]
.
-<blockquote>
-<p>foo</p>
-</blockquote>
+<p>===
+<a href="/url">foo</a></p>
````````````````````````````````
-A blank line always separates block quotes:
+Several [link reference definitions]
+can occur one after another, without intervening blank lines.
```````````````````````````````` example
-> foo
+[foo]: /foo-url "foo"
+[bar]: /bar-url
+ "bar"
+[baz]: /baz-url
-> bar
+[foo],
+[bar],
+[baz]
.
-<blockquote>
-<p>foo</p>
-</blockquote>
-<blockquote>
-<p>bar</p>
-</blockquote>
+<p><a href="/foo-url" title="foo">foo</a>,
+<a href="/bar-url" title="bar">bar</a>,
+<a href="/baz-url">baz</a></p>
````````````````````````````````
-(Most current Markdown implementations, including John Gruber's
-original `Markdown.pl`, will parse this example as a single block quote
-with two paragraphs. But it seems better to allow the author to decide
-whether two block quotes or one are wanted.)
-
-Consecutiveness means that if we put these block quotes together,
-we get a single block quote:
+[Link reference definitions] can occur
+inside block containers, like lists and block quotations. They
+affect the entire document, not just the container in which they
+are defined:
```````````````````````````````` example
-> foo
-> bar
+[foo]
+
+> [foo]: /url
.
+<p><a href="/url">foo</a></p>
<blockquote>
-<p>foo
-bar</p>
</blockquote>
````````````````````````````````
-To get a block quote with two paragraphs, use:
+Whether something is a [link reference definition] is
+independent of whether the link reference it defines is
+used in the document. Thus, for example, the following
+document contains just a link reference definition, and
+no visible content:
```````````````````````````````` example
-> foo
->
-> bar
+[foo]: /url
.
-<blockquote>
-<p>foo</p>
-<p>bar</p>
-</blockquote>
````````````````````````````````
-Block quotes can interrupt paragraphs:
+## Paragraphs
+
+A sequence of non-blank lines that cannot be interpreted as other
+kinds of blocks forms a [paragraph](@).
+The contents of the paragraph are the result of parsing the
+paragraph's raw content as inlines. The paragraph's raw content
+is formed by concatenating the lines and removing initial and final
+[whitespace].
+
+A simple example with two paragraphs:
```````````````````````````````` example
-foo
-> bar
+aaa
+
+bbb
.
-<p>foo</p>
-<blockquote>
-<p>bar</p>
-</blockquote>
+<p>aaa</p>
+<p>bbb</p>
````````````````````````````````
-In general, blank lines are not needed before or after block
-quotes:
+Paragraphs can contain multiple lines, but no blank lines:
```````````````````````````````` example
-> aaa
-***
-> bbb
+aaa
+bbb
+
+ccc
+ddd
.
-<blockquote>
-<p>aaa</p>
-</blockquote>
-<hr />
-<blockquote>
-<p>bbb</p>
-</blockquote>
+<p>aaa
+bbb</p>
+<p>ccc
+ddd</p>
````````````````````````````````
-However, because of laziness, a blank line is needed between
-a block quote and a following paragraph:
+Multiple blank lines between paragraphs have no effect:
```````````````````````````````` example
-> bar
-baz
+aaa
+
+
+bbb
.
-<blockquote>
-<p>bar
-baz</p>
-</blockquote>
+<p>aaa</p>
+<p>bbb</p>
````````````````````````````````
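
As a rough illustration of how raw paragraph content is formed, the sketch below splits text on runs of blank lines and joins the remaining lines. It is deliberately naive: it assumes the input contains nothing but paragraphs (no headings, code blocks, or containers), and the function name `split_paragraphs` is hypothetical.

```python
import re


def split_paragraphs(text):
    """Naively split text into raw paragraph contents.

    Assumption: every non-blank line belongs to a paragraph; other block
    types are not recognized by this sketch.
    """
    paragraphs = []
    # One or more blank (whitespace-only) lines separate paragraphs.
    for chunk in re.split(r"\n(?:[ \t]*\n)+", text.strip("\n")):
        # Leading whitespace on each line is dropped, then the lines are joined.
        lines = [line.lstrip(" \t") for line in chunk.split("\n")]
        raw = "\n".join(lines).strip()
        if raw:
            paragraphs.append(raw)
    return paragraphs
```

With this sketch, any number of blank lines between `aaa` and `bbb` yields the same two paragraphs.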
-```````````````````````````````` example
-> bar
+Leading spaces are skipped:
-baz
+```````````````````````````````` example
+ aaa
+ bbb
.
-<blockquote>
-<p>bar</p>
-</blockquote>
-<p>baz</p>
+<p>aaa
+bbb</p>
````````````````````````````````
+Lines after the first may be indented any amount, since indented
+code blocks cannot interrupt paragraphs.
+
```````````````````````````````` example
-> bar
->
-baz
+aaa
+ bbb
+ ccc
.
-<blockquote>
-<p>bar</p>
-</blockquote>
-<p>baz</p>
+<p>aaa
+bbb
+ccc</p>
````````````````````````````````
-It is a consequence of the Laziness rule that any number
-of initial `>`s may be omitted on a continuation line of a
-nested block quote:
+However, the first line may be indented at most three spaces,
+or an indented code block will be triggered:
```````````````````````````````` example
-> > > foo
-bar
+ aaa
+bbb
.
-<blockquote>
-<blockquote>
-<blockquote>
-<p>foo
-bar</p>
-</blockquote>
-</blockquote>
-</blockquote>
+<p>aaa
+bbb</p>
````````````````````````````````
```````````````````````````````` example
->>> foo
-> bar
->>baz
+ aaa
+bbb
.
-<blockquote>
-<blockquote>
-<blockquote>
-<p>foo
-bar
-baz</p>
-</blockquote>
-</blockquote>
-</blockquote>
+<pre><code>aaa
+</code></pre>
+<p>bbb</p>
````````````````````````````````
-When including an indented code block in a block quote,
-remember that the [block quote marker] includes
-both the `>` and a following space. So *five spaces* are needed after
-the `>`:
+Final spaces are stripped before inline parsing, so a paragraph
+that ends with two or more spaces will not end with a [hard line
+break]:
```````````````````````````````` example
-> code
-
-> not code
+aaa
+bbb
.
-<blockquote>
-<pre><code>code
-</code></pre>
-</blockquote>
-<blockquote>
-<p>not code</p>
-</blockquote>
+<p>aaa<br />
+bbb</p>
````````````````````````````````
+## Blank lines
-## List items
+[Blank lines] between block-level elements are ignored,
+except for the role they play in determining whether a [list]
+is [tight] or [loose].
-A [list marker](@) is a
-[bullet list marker] or an [ordered list marker].
+Blank lines at the beginning and end of the document are also ignored.
-A [bullet list marker](@)
-is a `-`, `+`, or `*` character.
+```````````````````````````````` example
+
-An [ordered list marker](@)
-is a sequence of 1--9 arabic digits (`0-9`), followed by either a
-`.` character or a `)` character. (The reason for the length
-limit is that with 10 digits we start seeing integer overflows
-in some browsers.)
+aaa
+
-The following rules define [list items]:
+# aaa
-1. **Basic case.** If a sequence of lines *Ls* constitute a sequence of
- blocks *Bs* starting with a [non-whitespace character], and *M* is a
- list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result
- of prepending *M* and the following spaces to the first line of
- *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
- list item with *Bs* as its contents. The type of the list item
- (bullet or ordered) is determined by the type of its list marker.
- If the list item is ordered, then it is also assigned a start
- number, based on the ordered list marker.
+
+.
+<p>aaa</p>
+<h1>aaa</h1>
+````````````````````````````````
- Exceptions:
- 1. When the first list item in a [list] interrupts
- a paragraph---that is, when it starts on a line that would
- otherwise count as [paragraph continuation text]---then (a)
- the lines *Ls* must not begin with a blank line, and (b) if
- the list item is ordered, the start number must be 1.
- 2. If any line is a [thematic break][thematic breaks] then
- that line is not a list item.
-For example, let *Ls* be the lines
+# Container blocks
-```````````````````````````````` example
-A paragraph
-with two lines.
+A [container block](#container-blocks) is a block that has other
+blocks as its contents. There are two basic kinds of container blocks:
+[block quotes] and [list items].
+[Lists] are meta-containers for [list items].
- indented code
+We define the syntax for container blocks recursively. The general
+form of the definition is:
-> A block quote.
-.
-<p>A paragraph
-with two lines.</p>
-<pre><code>indented code
-</code></pre>
-<blockquote>
-<p>A block quote.</p>
-</blockquote>
-````````````````````````````````
+> If X is a sequence of blocks, then the result of
+> transforming X in such-and-such a way is a container of type Y
+> with these blocks as its content.
+So, we explain what counts as a block quote or list item by explaining
+how these can be *generated* from their contents. This should suffice
+to define the syntax, although it does not give a recipe for *parsing*
+these constructions. (A recipe is provided below in the section entitled
+[A parsing strategy](#appendix-a-parsing-strategy).)
-And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says
-that the following is an ordered list item with start number 1,
-and the same contents as *Ls*:
+## Block quotes
-```````````````````````````````` example
-1. A paragraph
- with two lines.
+A [block quote marker](@)
+consists of 0-3 spaces of initial indent, plus (a) the character `>` together
+with a following space, or (b) a single character `>` not followed by a space.
- indented code
+The following rules define [block quotes]:
- > A block quote.
-.
-<ol>
-<li>
-<p>A paragraph
-with two lines.</p>
-<pre><code>indented code
-</code></pre>
-<blockquote>
-<p>A block quote.</p>
-</blockquote>
-</li>
-</ol>
-````````````````````````````````
+1. **Basic case.** If a string of lines *Ls* constitutes a sequence
+ of blocks *Bs*, then the result of prepending a [block quote
+ marker] to the beginning of each line in *Ls*
+ is a [block quote](#block-quotes) containing *Bs*.
+2. **Laziness.** If a string of lines *Ls* constitutes a [block
+ quote](#block-quotes) with contents *Bs*, then the result of deleting
+ the initial [block quote marker] from one or
+ more lines in which the next [non-whitespace character] after the [block
+ quote marker] is [paragraph continuation
+ text] is a block quote with *Bs* as its content.
+ [Paragraph continuation text](@) is text
+ that will be parsed as part of the content of a paragraph, but does
+ not occur at the beginning of the paragraph.
-The most important thing to notice is that the position of
-the text after the list marker determines how much indentation
-is needed in subsequent blocks in the list item. If the list
-marker takes up two spaces, and there are three spaces between
-the list marker and the next [non-whitespace character], then blocks
-must be indented five spaces in order to fall under the list
-item.
+3. **Consecutiveness.** A document cannot contain two [block
+ quotes] in a row unless there is a [blank line] between them.
-Here are some examples showing how far content must be indented to be
-put under the list item:
+Nothing else counts as a [block quote](#block-quotes).
-```````````````````````````````` example
-- one
+Here is a simple example:
- two
+```````````````````````````````` example
+> # Foo
+> bar
+> baz
.
-<ul>
-<li>one</li>
-</ul>
-<p>two</p>
+<blockquote>
+<h1>Foo</h1>
+<p>bar
+baz</p>
+</blockquote>
````````````````````````````````
-```````````````````````````````` example
-- one
+The spaces after the `>` characters can be omitted:
- two
+```````````````````````````````` example
+># Foo
+>bar
+> baz
.
-<ul>
-<li>
-<p>one</p>
-<p>two</p>
-</li>
-</ul>
+<blockquote>
+<h1>Foo</h1>
+<p>bar
+baz</p>
+</blockquote>
````````````````````````````````
-```````````````````````````````` example
- - one
+The `>` characters can be indented 1-3 spaces:
- two
+```````````````````````````````` example
+ > # Foo
+ > bar
+ > baz
.
-<ul>
-<li>one</li>
-</ul>
-<pre><code> two
-</code></pre>
+<blockquote>
+<h1>Foo</h1>
+<p>bar
+baz</p>
+</blockquote>
````````````````````````````````
-```````````````````````````````` example
- - one
+Four spaces give us a code block:
- two
+```````````````````````````````` example
+ > # Foo
+ > bar
+ > baz
.
-<ul>
-<li>
-<p>one</p>
-<p>two</p>
-</li>
-</ul>
+<pre><code>> # Foo
+> bar
+> baz
+</code></pre>
````````````````````````````````
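
The [block quote marker] definition (0-3 spaces of indent, a `>`, and an optional following space) can be approximated with a single regular expression. The sketch below is not a parser: it handles spaces only (tabs are ignored), and the names `BLOCK_QUOTE_MARKER` and `strip_block_quote_marker` are hypothetical.

```python
import re

# 0-3 spaces of initial indent, then '>', then at most one following space.
BLOCK_QUOTE_MARKER = re.compile(r"^ {0,3}> ?")


def strip_block_quote_marker(line):
    """Return the line without its block quote marker, or None if it has none."""
    match = BLOCK_QUOTE_MARKER.match(line)
    return line[match.end():] if match else None
```

A line indented four or more spaces returns `None`, which corresponds to the code block case above. Because the marker consumes only one space after `>`, any further spaces remain part of the quoted content.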
-It is tempting to think of this in terms of columns: the continuation
-blocks must be indented at least to the column of the first
-[non-whitespace character] after the list marker. However, that is not quite right.
-The spaces after the list marker determine how much relative indentation
-is needed. Which column this indentation reaches will depend on
-how the list item is embedded in other constructions, as shown by
-this example:
+The Laziness clause allows us to omit the `>` before
+[paragraph continuation text]:
```````````````````````````````` example
- > > 1. one
->>
->> two
+> # Foo
+> bar
+baz
.
<blockquote>
-<blockquote>
-<ol>
-<li>
-<p>one</p>
-<p>two</p>
-</li>
-</ol>
-</blockquote>
+<h1>Foo</h1>
+<p>bar
+baz</p>
</blockquote>
````````````````````````````````
-Here `two` occurs in the same column as the list marker `1.`,
-but is actually contained in the list item, because there is
-sufficient indentation after the last containing blockquote marker.
-
-The converse is also possible. In the following example, the word `two`
-occurs far to the right of the initial text of the list item, `one`, but
-it is not considered part of the list item, because it is not indented
-far enough past the blockquote marker:
+A block quote can contain some lazy and some non-lazy
+continuation lines:
```````````````````````````````` example
->>- one
->>
- > > two
+> bar
+baz
+> foo
.
<blockquote>
-<blockquote>
-<ul>
-<li>one</li>
-</ul>
-<p>two</p>
-</blockquote>
+<p>bar
+baz
+foo</p>
</blockquote>
````````````````````````````````
-Note that at least one space is needed between the list marker and
-any following content, so these are not list items:
+Laziness only applies to lines that would have been continuations of
+paragraphs had they been prepended with [block quote markers].
+For example, the `> ` cannot be omitted in the second line of
-```````````````````````````````` example
--one
+``` markdown
+> foo
+> ---
+```
-2.two
+without changing the meaning:
+
+```````````````````````````````` example
+> foo
+---
.
-<p>-one</p>
-<p>2.two</p>
+<blockquote>
+<p>foo</p>
+</blockquote>
+<hr />
````````````````````````````````
-A list item may contain blocks that are separated by more than
-one blank line.
+Similarly, if we omit the `> ` in the second line of
-```````````````````````````````` example
-- foo
+``` markdown
+> - foo
+> - bar
+```
+then the block quote ends after the first line:
- bar
+```````````````````````````````` example
+> - foo
+- bar
.
+<blockquote>
<ul>
-<li>
-<p>foo</p>
-<p>bar</p>
-</li>
+<li>foo</li>
+</ul>
+</blockquote>
+<ul>
+<li>bar</li>
</ul>
````````````````````````````````
-A list item may contain any kind of block:
+For the same reason, we can't omit the `> ` in front of
+subsequent lines of an indented or fenced code block:
```````````````````````````````` example
-1. foo
-
- ```
+> foo
bar
- ```
-
- baz
-
- > bam
.
-<ol>
-<li>
-<p>foo</p>
+<blockquote>
+<pre><code>foo
+</code></pre>
+</blockquote>
<pre><code>bar
</code></pre>
-<p>baz</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
+> ```
+foo
+```
+.
<blockquote>
-<p>bam</p>
+<pre><code></code></pre>
</blockquote>
-</li>
-</ol>
+<p>foo</p>
+<pre><code></code></pre>
````````````````````````````````
-A list item that contains an indented code block will preserve
-empty lines within the code block verbatim.
+Note that in the following case, we have a [lazy
+continuation line]:
```````````````````````````````` example
-- Foo
-
- bar
+> foo
+ - bar
+.
+<blockquote>
+<p>foo
+- bar</p>
+</blockquote>
+````````````````````````````````
- baz
-.
-<ul>
-<li>
-<p>Foo</p>
-<pre><code>bar
+To see why, note that in
+```markdown
+> foo
+> - bar
+```
-baz
-</code></pre>
-</li>
-</ul>
-````````````````````````````````
+the `- bar` is indented too far to start a list, and can't
+be an indented code block because indented code blocks cannot
+interrupt paragraphs, so it is [paragraph continuation text].
-Note that ordered list start numbers must be nine digits or less:
+A block quote can be empty:
```````````````````````````````` example
-123456789. ok
+>
.
-<ol start="123456789">
-<li>ok</li>
-</ol>
+<blockquote>
+</blockquote>
````````````````````````````````
```````````````````````````````` example
-1234567890. not ok
+>
+>
+>
.
-<p>1234567890. not ok</p>
+<blockquote>
+</blockquote>
````````````````````````````````
-A start number may begin with 0s:
+A block quote can have initial or final blank lines:
```````````````````````````````` example
-0. ok
+>
+> foo
+>
.
-<ol start="0">
-<li>ok</li>
-</ol>
+<blockquote>
+<p>foo</p>
+</blockquote>
````````````````````````````````
+A blank line always separates block quotes:
+
```````````````````````````````` example
-003. ok
+> foo
+
+> bar
.
-<ol start="3">
-<li>ok</li>
-</ol>
+<blockquote>
+<p>foo</p>
+</blockquote>
+<blockquote>
+<p>bar</p>
+</blockquote>
````````````````````````````````
-A start number may not be negative:
+(Most current Markdown implementations, including John Gruber's
+original `Markdown.pl`, will parse this example as a single block quote
+with two paragraphs. But it seems better to allow the author to decide
+whether two block quotes or one are wanted.)
+
+Consecutiveness means that if we put these block quotes together,
+we get a single block quote:
```````````````````````````````` example
--1. not ok
+> foo
+> bar
.
-<p>-1. not ok</p>
+<blockquote>
+<p>foo
+bar</p>
+</blockquote>
````````````````````````````````
-
-2. **Item starting with indented code.** If a sequence of lines *Ls*
- constitute a sequence of blocks *Bs* starting with an indented code
- block, and *M* is a list marker of width *W* followed by
- one space, then the result of prepending *M* and the following
- space to the first line of *Ls*, and indenting subsequent lines of
- *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
- If a line is empty, then it need not be indented. The type of the
- list item (bullet or ordered) is determined by the type of its list
- marker. If the list item is ordered, then it is also assigned a
- start number, based on the ordered list marker.
-
-An indented code block will have to be indented four spaces beyond
-the edge of the region where text will be included in the list item.
-In the following case that is 6 spaces:
+To get a block quote with two paragraphs, use:
```````````````````````````````` example
-- foo
-
- bar
+> foo
+>
+> bar
.
-<ul>
-<li>
+<blockquote>
<p>foo</p>
-<pre><code>bar
-</code></pre>
-</li>
-</ul>
+<p>bar</p>
+</blockquote>
````````````````````````````````
-And in this case it is 11 spaces:
+Block quotes can interrupt paragraphs:
```````````````````````````````` example
- 10. foo
-
- bar
+foo
+> bar
.
-<ol start="10">
-<li>
<p>foo</p>
-<pre><code>bar
-</code></pre>
-</li>
-</ol>
+<blockquote>
+<p>bar</p>
+</blockquote>
````````````````````````````````
-If the *first* block in the list item is an indented code block,
-then by rule #2, the contents must be indented *one* space after the
-list marker:
+In general, blank lines are not needed before or after block
+quotes:
```````````````````````````````` example
- indented code
-
-paragraph
-
- more code
-.
-<pre><code>indented code
-</code></pre>
-<p>paragraph</p>
-<pre><code>more code
-</code></pre>
-````````````````````````````````
-
-
-```````````````````````````````` example
-1. indented code
-
- paragraph
-
- more code
-.
-<ol>
-<li>
-<pre><code>indented code
-</code></pre>
-<p>paragraph</p>
-<pre><code>more code
-</code></pre>
-</li>
-</ol>
-````````````````````````````````
-
-
-Note that an additional space indent is interpreted as space
-inside the code block:
-
-```````````````````````````````` example
-1. indented code
-
- paragraph
-
- more code
+> aaa
+***
+> bbb
.
-<ol>
-<li>
-<pre><code> indented code
-</code></pre>
-<p>paragraph</p>
-<pre><code>more code
-</code></pre>
-</li>
-</ol>
+<blockquote>
+<p>aaa</p>
+</blockquote>
+<hr />
+<blockquote>
+<p>bbb</p>
+</blockquote>
````````````````````````````````
-Note that rules #1 and #2 only apply to two cases: (a) cases
-in which the lines to be included in a list item begin with a
-[non-whitespace character], and (b) cases in which
-they begin with an indented code
-block. In a case like the following, where the first block begins with
-a three-space indent, the rules do not allow us to form a list item by
-indenting the whole thing and prepending a list marker:
+However, because of laziness, a blank line is needed between
+a block quote and a following paragraph:
```````````````````````````````` example
- foo
-
-bar
+> bar
+baz
.
-<p>foo</p>
-<p>bar</p>
+<blockquote>
+<p>bar
+baz</p>
+</blockquote>
````````````````````````````````
```````````````````````````````` example
-- foo
+> bar
- bar
+baz
.
-<ul>
-<li>foo</li>
-</ul>
+<blockquote>
<p>bar</p>
+</blockquote>
+<p>baz</p>
````````````````````````````````
-This is not a significant restriction, because when a block begins
-with 1-3 spaces indent, the indentation can always be removed without
-a change in interpretation, allowing rule #1 to be applied. So, in
-the above case:
-
```````````````````````````````` example
-- foo
-
- bar
+> bar
+>
+baz
.
-<ul>
-<li>
-<p>foo</p>
+<blockquote>
<p>bar</p>
-</li>
-</ul>
+</blockquote>
+<p>baz</p>
````````````````````````````````
-3. **Item starting with a blank line.** If a sequence of lines *Ls*
- starting with a single [blank line] constitute a (possibly empty)
- sequence of blocks *Bs*, not separated from each other by more than
- one blank line, and *M* is a list marker of width *W*,
- then the result of prepending *M* to the first line of *Ls*, and
- indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
- item with *Bs* as its contents.
- If a line is empty, then it need not be indented. The type of the
- list item (bullet or ordered) is determined by the type of its list
- marker. If the list item is ordered, then it is also assigned a
- start number, based on the ordered list marker.
-
-Here are some list items that start with a blank line but are not empty:
+It is a consequence of the Laziness rule that any number
+of initial `>`s may be omitted on a continuation line of a
+nested block quote:
```````````````````````````````` example
--
- foo
--
- ```
- bar
- ```
--
- baz
+> > > foo
+bar
.
-<ul>
-<li>foo</li>
-<li>
-<pre><code>bar
-</code></pre>
-</li>
-<li>
-<pre><code>baz
-</code></pre>
-</li>
-</ul>
+<blockquote>
+<blockquote>
+<blockquote>
+<p>foo
+bar</p>
+</blockquote>
+</blockquote>
+</blockquote>
````````````````````````````````
-When the list item starts with a blank line, the number of spaces
-following the list marker doesn't change the required indentation:
```````````````````````````````` example
--
- foo
+>>> foo
+> bar
+>>baz
.
-<ul>
-<li>foo</li>
-</ul>
+<blockquote>
+<blockquote>
+<blockquote>
+<p>foo
+bar
+baz</p>
+</blockquote>
+</blockquote>
+</blockquote>
````````````````````````````````
-A list item can begin with at most one blank line.
-In the following example, `foo` is not part of the list
-item:
+When including an indented code block in a block quote,
+remember that the [block quote marker] includes
+both the `>` and a following space. So *five spaces* are needed after
+the `>`:
```````````````````````````````` example
--
+> code
- foo
+> not code
.
-<ul>
-<li></li>
-</ul>
-<p>foo</p>
+<blockquote>
+<pre><code>code
+</code></pre>
+</blockquote>
+<blockquote>
+<p>not code</p>
+</blockquote>
````````````````````````````````
-Here is an empty bullet list item:
-```````````````````````````````` example
-- foo
--
-- bar
-.
-<ul>
-<li>foo</li>
-<li></li>
-<li>bar</li>
-</ul>
-````````````````````````````````
+## List items
+A [list marker](@) is a
+[bullet list marker] or an [ordered list marker].
-It does not matter whether there are spaces following the [list marker]:
+A [bullet list marker](@)
+is a `-`, `+`, or `*` character.
-```````````````````````````````` example
-- foo
--
-- bar
-.
-<ul>
-<li>foo</li>
-<li></li>
-<li>bar</li>
-</ul>
-````````````````````````````````
+An [ordered list marker](@)
+is a sequence of 1--9 Arabic digits (`0-9`), followed by either a
+`.` character or a `)` character. (The reason for the length
+limit is that with 10 digits we start seeing integer overflows
+in some browsers.)
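
A minimal sketch of recognizing an [ordered list marker] at the start of a line follows. It assumes no initial indent and requires a space or end of line after the delimiter. The names `ORDERED_MARKER` and `ordered_list_start` are hypothetical.

```python
import re

# 1 to 9 ASCII digits, then '.' or ')', followed by a space or end of line.
ORDERED_MARKER = re.compile(r"^([0-9]{1,9})([.)])(?= |$)")


def ordered_list_start(line):
    """Return the start number of an ordered list marker, or None if absent."""
    match = ORDERED_MARKER.match(line)
    return int(match.group(1)) if match else None
```

Under these assumptions a ten-digit number is rejected, leading zeros are accepted, and a leading minus sign prevents a match.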
+The following rules define [list items]:
-Here is an empty ordered list item:
+1. **Basic case.** If a sequence of lines *Ls* constitutes a sequence of
+ blocks *Bs* starting with a [non-whitespace character], and *M* is a
+ list marker of width *W* followed by 1 ≤ *N* ≤ 4 spaces, then the result
+ of prepending *M* and the following spaces to the first line of
+ *Ls*, and indenting subsequent lines of *Ls* by *W + N* spaces, is a
+ list item with *Bs* as its contents. The type of the list item
+ (bullet or ordered) is determined by the type of its list marker.
+ If the list item is ordered, then it is also assigned a start
+ number, based on the ordered list marker.
-```````````````````````````````` example
-1. foo
-2.
-3. bar
-.
-<ol>
-<li>foo</li>
-<li></li>
-<li>bar</li>
-</ol>
-````````````````````````````````
+ Exceptions:
+ 1. When the first list item in a [list] interrupts
+ a paragraph---that is, when it starts on a line that would
+ otherwise count as [paragraph continuation text]---then (a)
+ the lines *Ls* must not begin with a blank line, and (b) if
+ the list item is ordered, the start number must be 1.
+ 2. If any line is a [thematic break][thematic breaks] then
+ that line is not a list item.
-A list may start or end with an empty list item:
+For example, let *Ls* be the lines
```````````````````````````````` example
-*
-.
-<ul>
-<li></li>
-</ul>
-````````````````````````````````
-
-However, an empty list item cannot interrupt a paragraph:
-
-```````````````````````````````` example
-foo
-*
-
-foo
-1.
-.
-<p>foo
-*</p>
-<p>foo
-1.</p>
-````````````````````````````````
-
-
-4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
- according to rule #1, #2, or #3, then the result of indenting each line
- of *Ls* by 1-3 spaces (the same for each line) also constitutes a
- list item with the same contents and attributes. If a line is
- empty, then it need not be indented.
-
-Indented one space:
-
-```````````````````````````````` example
- 1. A paragraph
- with two lines.
+A paragraph
+with two lines.
- indented code
+ indented code
- > A block quote.
+> A block quote.
.
-<ol>
-<li>
<p>A paragraph
with two lines.</p>
<pre><code>indented code
@@ -4382,20 +4135,20 @@ with two lines.</p>
<blockquote>
<p>A block quote.</p>
</blockquote>
-</li>
-</ol>
````````````````````````````````
-Indented two spaces:
+And let *M* be the marker `1.`, and *N* = 2. Then rule #1 says
+that the following is an ordered list item with start number 1,
+and the same contents as *Ls*:
```````````````````````````````` example
- 1. A paragraph
- with two lines.
+1. A paragraph
+ with two lines.
- indented code
+ indented code
- > A block quote.
+ > A block quote.
.
<ol>
<li>
@@ -4411,658 +4164,750 @@ with two lines.</p>
````````````````````````````````
-Indented three spaces:
+The most important thing to notice is that the position of
+the text after the list marker determines how much indentation
+is needed in subsequent blocks in the list item. If the list
+marker takes up two spaces, and there are three spaces between
+the list marker and the next [non-whitespace character], then blocks
+must be indented five spaces in order to fall under the list
+item.
-```````````````````````````````` example
- 1. A paragraph
- with two lines.
+Here are some examples showing how far content must be indented to be
+put under the list item:
- indented code
+```````````````````````````````` example
+- one
- > A block quote.
+ two
.
-<ol>
-<li>
-<p>A paragraph
-with two lines.</p>
-<pre><code>indented code
-</code></pre>
-<blockquote>
-<p>A block quote.</p>
-</blockquote>
-</li>
-</ol>
+<ul>
+<li>one</li>
+</ul>
+<p>two</p>
````````````````````````````````
-Four spaces indent gives a code block:
-
```````````````````````````````` example
- 1. A paragraph
- with two lines.
-
- indented code
+- one
- > A block quote.
+ two
.
-<pre><code>1. A paragraph
- with two lines.
-
- indented code
-
- > A block quote.
-</code></pre>
+<ul>
+<li>
+<p>one</p>
+<p>two</p>
+</li>
+</ul>
````````````````````````````````
-
-5. **Laziness.** If a string of lines *Ls* constitute a [list
- item](#list-items) with contents *Bs*, then the result of deleting
- some or all of the indentation from one or more lines in which the
- next [non-whitespace character] after the indentation is
- [paragraph continuation text] is a
- list item with the same contents and attributes. The unindented
- lines are called
- [lazy continuation line](@)s.
-
-Here is an example with [lazy continuation lines]:
-
```````````````````````````````` example
- 1. A paragraph
-with two lines.
-
- indented code
+ - one
- > A block quote.
+ two
.
-<ol>
-<li>
-<p>A paragraph
-with two lines.</p>
-<pre><code>indented code
+<ul>
+<li>one</li>
+</ul>
+<pre><code> two
</code></pre>
-<blockquote>
-<p>A block quote.</p>
-</blockquote>
-</li>
-</ol>
````````````````````````````````
-Indentation can be partially deleted:
-
```````````````````````````````` example
- 1. A paragraph
- with two lines.
+ - one
+
+ two
.
-<ol>
-<li>A paragraph
-with two lines.</li>
-</ol>
+<ul>
+<li>
+<p>one</p>
+<p>two</p>
+</li>
+</ul>
````````````````````````````````
-These examples show how laziness can work in nested structures:
+It is tempting to think of this in terms of columns: the continuation
+blocks must be indented at least to the column of the first
+[non-whitespace character] after the list marker. However, that is not quite right.
+The spaces after the list marker determine how much relative indentation
+is needed. Which column this indentation reaches will depend on
+how the list item is embedded in other constructions, as shown by
+this example:
```````````````````````````````` example
-> 1. > Blockquote
-continued here.
+ > > 1. one
+>>
+>> two
.
<blockquote>
+<blockquote>
<ol>
<li>
-<blockquote>
-<p>Blockquote
-continued here.</p>
-</blockquote>
+<p>one</p>
+<p>two</p>
</li>
</ol>
</blockquote>
+</blockquote>
````````````````````````````````
+Here `two` occurs in the same column as the list marker `1.`,
+but is actually contained in the list item, because there is
+sufficient indentation after the last containing blockquote marker.
+
+The converse is also possible. In the following example, the word `two`
+occurs far to the right of the initial text of the list item, `one`, but
+it is not considered part of the list item, because it is not indented
+far enough past the blockquote marker:
+
```````````````````````````````` example
-> 1. > Blockquote
-> continued here.
+>>- one
+>>
+ > > two
.
<blockquote>
-<ol>
-<li>
<blockquote>
-<p>Blockquote
-continued here.</p>
+<ul>
+<li>one</li>
+</ul>
+<p>two</p>
</blockquote>
-</li>
-</ol>
</blockquote>
````````````````````````````````
+Note that at least one space is needed between the list marker and
+any following content, so these are not list items:
+
+```````````````````````````````` example
+-one
-6. **That's all.** Nothing that is not counted as a list item by rules
- #1--5 counts as a [list item](#list-items).
+2.two
+.
+<p>-one</p>
+<p>2.two</p>
+````````````````````````````````
-The rules for sublists follow from the general rules
-[above][List items]. A sublist must be indented the same number
-of spaces a paragraph would need to be in order to be included
-in the list item.
-So, in this case we need two spaces indent:
+A list item may contain blocks that are separated by more than
+one blank line.
```````````````````````````````` example
- foo
- - bar
- - baz
- - boo
+
+
+ bar
.
<ul>
-<li>foo
-<ul>
-<li>bar
-<ul>
-<li>baz
-<ul>
-<li>boo</li>
-</ul>
-</li>
-</ul>
-</li>
-</ul>
+<li>
+<p>foo</p>
+<p>bar</p>
</li>
</ul>
````````````````````````````````
-One is not enough:
+A list item may contain any kind of block:
```````````````````````````````` example
-- foo
- - bar
- - baz
- - boo
-.
-<ul>
-<li>foo</li>
-<li>bar</li>
-<li>baz</li>
-<li>boo</li>
-</ul>
-````````````````````````````````
+1. foo
+ ```
+ bar
+ ```
-Here we need four, because the list marker is wider:
+ baz
-```````````````````````````````` example
-10) foo
- - bar
+ > bam
.
-<ol start="10">
-<li>foo
-<ul>
-<li>bar</li>
-</ul>
+<ol>
+<li>
+<p>foo</p>
+<pre><code>bar
+</code></pre>
+<p>baz</p>
+<blockquote>
+<p>bam</p>
+</blockquote>
</li>
</ol>
````````````````````````````````
-Three is not enough:
+A list item that contains an indented code block will preserve
+empty lines within the code block verbatim.
```````````````````````````````` example
-10) foo
- - bar
-.
-<ol start="10">
-<li>foo</li>
-</ol>
-<ul>
-<li>bar</li>
-</ul>
-````````````````````````````````
+- Foo
+ bar
-A list may be the first block in a list item:
-```````````````````````````````` example
-- - foo
+ baz
.
<ul>
<li>
-<ul>
-<li>foo</li>
-</ul>
+<p>Foo</p>
+<pre><code>bar
+
+
+baz
+</code></pre>
</li>
</ul>
````````````````````````````````
+Note that ordered list start numbers must be nine digits or fewer:
```````````````````````````````` example
-1. - 2. foo
+123456789. ok
.
-<ol>
-<li>
-<ul>
-<li>
-<ol start="2">
-<li>foo</li>
-</ol>
-</li>
-</ul>
-</li>
+<ol start="123456789">
+<li>ok</li>
</ol>
````````````````````````````````
-A list item can contain a heading:
-
```````````````````````````````` example
-- # Foo
-- Bar
- ---
- baz
+1234567890. not ok
.
-<ul>
-<li>
-<h1>Foo</h1>
-</li>
-<li>
-<h2>Bar</h2>
-baz</li>
-</ul>
+<p>1234567890. not ok</p>
````````````````````````````````
-### Motivation
-
-John Gruber's Markdown spec says the following about list items:
-
-1. "List markers typically start at the left margin, but may be indented
- by up to three spaces. List markers must be followed by one or more
- spaces or a tab."
+A start number may begin with 0s:
-2. "To make lists look nice, you can wrap items with hanging indents....
- But if you don't want to, you don't have to."
+```````````````````````````````` example
+0. ok
+.
+<ol start="0">
+<li>ok</li>
+</ol>
+````````````````````````````````
-3. "List items may consist of multiple paragraphs. Each subsequent
- paragraph in a list item must be indented by either 4 spaces or one
- tab."
-4. "It looks nice if you indent every line of the subsequent paragraphs,
- but here again, Markdown will allow you to be lazy."
+```````````````````````````````` example
+003. ok
+.
+<ol start="3">
+<li>ok</li>
+</ol>
+````````````````````````````````
-5. "To put a blockquote within a list item, the blockquote's `>`
- delimiters need to be indented."
-6. "To put a code block within a list item, the code block needs to be
- indented twice — 8 spaces or two tabs."
+A start number may not be negative:
-These rules specify that a paragraph under a list item must be indented
-four spaces (presumably, from the left margin, rather than the start of
-the list marker, but this is not said), and that code under a list item
-must be indented eight spaces instead of the usual four. They also say
-that a block quote must be indented, but not by how much; however, the
-example given has four spaces indentation. Although nothing is said
-about other kinds of block-level content, it is certainly reasonable to
-infer that *all* block elements under a list item, including other
-lists, must be indented four spaces. This principle has been called the
-*four-space rule*.
+```````````````````````````````` example
+-1. not ok
+.
+<p>-1. not ok</p>
+````````````````````````````````
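The constraints on start numbers shown in the examples above (at most nine digits, leading zeros allowed, no sign, at least one space before the content) can be checked with a small Python sketch. It is a simplification under stated assumptions, not the spec's algorithm: the names `ORDERED_ITEM` and `start_number` are hypothetical, and the pattern only handles an unindented, non-empty item on a single line.

```python
import re
from typing import Optional

# One to nine ASCII digits, a "." or ")" delimiter, at least one space,
# then a non-whitespace character that begins the item's content.
ORDERED_ITEM = re.compile(r"([0-9]{1,9})[.)] +\S")


def start_number(line: str) -> Optional[int]:
    """Return the start number if `line` opens an ordered list item, else None."""
    match = ORDERED_ITEM.match(line)  # match() anchors at the start of the line
    return int(match.group(1)) if match else None


samples = ["123456789. ok", "1234567890. not ok", "0. ok",
           "003. ok", "-1. not ok", "2.two"]
for sample in samples:
    print(repr(sample), "->", start_number(sample))
```

Run on the sample lines, the sketch yields `123456789`, `None`, `0`, `3`, `None`, and `None`, in that order. It does not model empty list items or the indentation allowed before the marker.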
-The four-space rule is clear and principled, and if the reference
-implementation `Markdown.pl` had followed it, it probably would have
-become the standard. However, `Markdown.pl` allowed paragraphs and
-sublists to start with only two spaces indentation, at least on the
-outer level. Worse, its behavior was inconsistent: a sublist of an
-outer-level list needed two spaces indentation, but a sublist of this
-sublist needed three spaces. It is not surprising, then, that different
-implementations of Markdown have developed very different rules for
-determining what comes under a list item. (Pandoc and python-Markdown,
-for example, stuck with Gruber's syntax description and the four-space
-rule, while discount, redcarpet, marked, PHP Markdown, and others
-followed `Markdown.pl`'s behavior more closely.)
-Unfortunately, given the divergences between implementations, there
-is no way to give a spec for list items that will be guaranteed not
-to break any existing documents. However, the spec given here should
-correctly handle lists formatted with either the four-space rule or
-the more forgiving `Markdown.pl` behavior, provided they are laid out
-in a way that is natural for a human to read.
-The strategy here is to let the width and indentation of the list marker
-determine the indentation necessary for blocks to fall under the list
-item, rather than having a fixed and arbitrary number. The writer can
-think of the body of the list item as a unit which gets indented to the
-right enough to fit the list marker (and any indentation on the list
-marker). (The laziness rule, #5, then allows continuation lines to be
-unindented if needed.)
+2. **Item starting with indented code.** If a sequence of lines *Ls*
+ constitutes a sequence of blocks *Bs* starting with an indented code
+ block, and *M* is a list marker of width *W* followed by
+ one space, then the result of prepending *M* and the following
+ space to the first line of *Ls*, and indenting subsequent lines of
+ *Ls* by *W + 1* spaces, is a list item with *Bs* as its contents.
+ If a line is empty, then it need not be indented. The type of the
+ list item (bullet or ordered) is determined by the type of its list
+ marker. If the list item is ordered, then it is also assigned a
+ start number, based on the ordered list marker.
-This rule is superior, we claim, to any rule requiring a fixed level of
-indentation from the margin. The four-space rule is clear but
-unnatural. It is quite unintuitive that
+An indented code block will have to be indented four spaces beyond
+the edge of the region where text will be included in the list item.
+In the following case that is 6 spaces:
-``` markdown
+```````````````````````````````` example
- foo
- bar
-
- - baz
-```
-
-should be parsed as two lists with an intervening paragraph,
-
-``` html
-<ul>
-<li>foo</li>
-</ul>
-<p>bar</p>
-<ul>
-<li>baz</li>
-</ul>
-```
-
-as the four-space rule demands, rather than a single list,
-
-``` html
+ bar
+.
<ul>
<li>
<p>foo</p>
-<p>bar</p>
-<ul>
-<li>baz</li>
-</ul>
+<pre><code>bar
+</code></pre>
</li>
</ul>
-```
-
-The choice of four spaces is arbitrary. It can be learned, but it is
-not likely to be guessed, and it trips up beginners regularly.
-
-Would it help to adopt a two-space rule? The problem is that such
-a rule, together with the rule allowing 1--3 spaces indentation of the
-initial list marker, allows text that is indented *less than* the
-original list marker to be included in the list item. For example,
-`Markdown.pl` parses
+````````````````````````````````
-``` markdown
- - one
- two
-```
+And in this case it is 11 spaces:
-as a single list item, with `two` a continuation paragraph:
+```````````````````````````````` example
+ 10. foo
-``` html
-<ul>
+ bar
+.
+<ol start="10">
<li>
-<p>one</p>
-<p>two</p>
+<p>foo</p>
+<pre><code>bar
+</code></pre>
</li>
-</ul>
-```
+</ol>
+````````````````````````````````
-and similarly
-``` markdown
-> - one
->
-> two
-```
+If the *first* block in the list item is an indented code block,
+then by rule #2, the contents must be indented *one* space after the
+list marker:
-as
+```````````````````````````````` example
+ indented code
-``` html
-<blockquote>
-<ul>
-<li>
-<p>one</p>
-<p>two</p>
+paragraph
+
+ more code
+.
+<pre><code>indented code
+</code></pre>
+<p>paragraph</p>
+<pre><code>more code
+</code></pre>
+````````````````````````````````
+
+
+```````````````````````````````` example
+1. indented code
+
+ paragraph
+
+ more code
+.
+<ol>
+<li>
+<pre><code>indented code
+</code></pre>
+<p>paragraph</p>
+<pre><code>more code
+</code></pre>
</li>
+</ol>
+````````````````````````````````
+
+
+Note that an additional space indent is interpreted as space
+inside the code block:
+
+```````````````````````````````` example
+1. indented code
+
+ paragraph
+
+ more code
+.
+<ol>
+<li>
+<pre><code> indented code
+</code></pre>
+<p>paragraph</p>
+<pre><code>more code
+</code></pre>
+</li>
+</ol>
+````````````````````````````````
+
+
+Note that rules #1 and #2 only apply to two cases: (a) cases
+in which the lines to be included in a list item begin with a
+[non-whitespace character], and (b) cases in which
+they begin with an indented code
+block. In a case like the following, where the first block begins with
+a three-space indent, the rules do not allow us to form a list item by
+indenting the whole thing and prepending a list marker:
+
+```````````````````````````````` example
+ foo
+
+bar
+.
+<p>foo</p>
+<p>bar</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
+- foo
+
+ bar
+.
+<ul>
+<li>foo</li>
</ul>
-</blockquote>
-```
+<p>bar</p>
+````````````````````````````````
-This is extremely unintuitive.
-Rather than requiring a fixed indent from the margin, we could require
-a fixed indent (say, two spaces, or even one space) from the list marker (which
-may itself be indented). This proposal would remove the last anomaly
-discussed. Unlike the spec presented above, it would count the following
-as a list item with a subparagraph, even though the paragraph `bar`
-is not indented as far as the first paragraph `foo`:
+This is not a significant restriction, because when a block begins
+with 1-3 spaces indent, the indentation can always be removed without
+a change in interpretation, allowing rule #1 to be applied. So, in
+the above case:
-``` markdown
- 10. foo
+```````````````````````````````` example
+- foo
- bar
-```
+ bar
+.
+<ul>
+<li>
+<p>foo</p>
+<p>bar</p>
+</li>
+</ul>
+````````````````````````````````
-Arguably this text does read like a list item with `bar` as a subparagraph,
-which may count in favor of the proposal. However, on this proposal indented
-code would have to be indented six spaces after the list marker. And this
-would break a lot of existing Markdown, which has the pattern:
-``` markdown
-1. foo
+3. **Item starting with a blank line.** If a sequence of lines *Ls*
+ starting with a single [blank line] constitutes a (possibly empty)
+ sequence of blocks *Bs*, not separated from each other by more than
+ one blank line, and *M* is a list marker of width *W*,
+ then the result of prepending *M* to the first line of *Ls*, and
+ indenting subsequent lines of *Ls* by *W + 1* spaces, is a list
+ item with *Bs* as its contents.
+ If a line is empty, then it need not be indented. The type of the
+ list item (bullet or ordered) is determined by the type of its list
+ marker. If the list item is ordered, then it is also assigned a
+ start number, based on the ordered list marker.
- indented code
-```
+Here are some list items that start with a blank line but are not empty:
-where the code is indented eight spaces. The spec above, by contrast, will
-parse this text as expected, since the code block's indentation is measured
-from the beginning of `foo`.
+```````````````````````````````` example
+-
+ foo
+-
+ ```
+ bar
+ ```
+-
+ baz
+.
+<ul>
+<li>foo</li>
+<li>
+<pre><code>bar
+</code></pre>
+</li>
+<li>
+<pre><code>baz
+</code></pre>
+</li>
+</ul>
+````````````````````````````````
-The one case that needs special treatment is a list item that *starts*
-with indented code. How much indentation is required in that case, since
-we don't have a "first paragraph" to measure from? Rule #2 simply stipulates
-that in such cases, we require one space indentation from the list marker
-(and then the normal four spaces for the indented code). This will match the
-four-space rule in cases where the list marker plus its initial indentation
-takes four spaces (a common case), but diverge in other cases.
+When the list item starts with a blank line, the number of spaces
+following the list marker doesn't change the required indentation:
-## Lists
+```````````````````````````````` example
+-
+ foo
+.
+<ul>
+<li>foo</li>
+</ul>
+````````````````````````````````
-A [list](@) is a sequence of one or more
-list items [of the same type]. The list items
-may be separated by any number of blank lines.
-Two list items are [of the same type](@)
-if they begin with a [list marker] of the same type.
-Two list markers are of the
-same type if (a) they are bullet list markers using the same character
-(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same
-delimiter (either `.` or `)`).
+A list item can begin with at most one blank line.
+In the following example, `foo` is not part of the list
+item:
-A list is an [ordered list](@)
-if its constituent list items begin with
-[ordered list markers], and a
-[bullet list](@) if its constituent list
-items begin with [bullet list markers].
+```````````````````````````````` example
+-
-The [start number](@)
-of an [ordered list] is determined by the list number of
-its initial list item. The numbers of subsequent list items are
-disregarded.
+ foo
+.
+<ul>
+<li></li>
+</ul>
+<p>foo</p>
+````````````````````````````````
-A list is [loose](@) if any of its constituent
-list items are separated by blank lines, or if any of its constituent
-list items directly contain two block-level elements with a blank line
-between them. Otherwise a list is [tight](@).
-(The difference in HTML output is that paragraphs in a loose list are
-wrapped in `<p>` tags, while paragraphs in a tight list are not.)
-Changing the bullet or ordered list delimiter starts a new list:
+Here is an empty bullet list item:
```````````````````````````````` example
- foo
+-
- bar
-+ baz
.
<ul>
<li>foo</li>
+<li></li>
<li>bar</li>
</ul>
+````````````````````````````````
+
+
+It does not matter whether there are spaces following the [list marker]:
+
+```````````````````````````````` example
+- foo
+-
+- bar
+.
<ul>
-<li>baz</li>
+<li>foo</li>
+<li></li>
+<li>bar</li>
</ul>
````````````````````````````````
+Here is an empty ordered list item:
+
```````````````````````````````` example
1. foo
-2. bar
-3) baz
+2.
+3. bar
.
<ol>
<li>foo</li>
+<li></li>
<li>bar</li>
</ol>
-<ol start="3">
-<li>baz</li>
-</ol>
````````````````````````````````
-In CommonMark, a list can interrupt a paragraph. That is,
-no blank line is needed to separate a paragraph from a following
-list:
+A list may start or end with an empty list item:
```````````````````````````````` example
-Foo
-- bar
-- baz
+*
.
-<p>Foo</p>
<ul>
-<li>bar</li>
-<li>baz</li>
+<li></li>
</ul>
````````````````````````````````
-`Markdown.pl` does not allow this, through fear of triggering a list
-via a numeral in a hard-wrapped line:
+However, an empty list item cannot interrupt a paragraph:
-``` markdown
-The number of windows in my house is
-14. The number of doors is 6.
-```
+```````````````````````````````` example
+foo
+*
-Oddly, though, `Markdown.pl` *does* allow a blockquote to
-interrupt a paragraph, even though the same considerations might
-apply.
+foo
+1.
+.
+<p>foo
+*</p>
+<p>foo
+1.</p>
+````````````````````````````````
-In CommonMark, we do allow lists to interrupt paragraphs, for
-two reasons. First, it is natural and not uncommon for people
-to start lists without blank lines:
-``` markdown
-I need to buy
-- new shoes
-- a coat
-- a plane ticket
-```
+4. **Indentation.** If a sequence of lines *Ls* constitutes a list item
+ according to rule #1, #2, or #3, then the result of indenting each line
+ of *Ls* by 1-3 spaces (the same for each line) also constitutes a
+ list item with the same contents and attributes. If a line is
+ empty, then it need not be indented.
-Second, we are attracted to a
+Indented one space:
-> [principle of uniformity](@):
-> if a chunk of text has a certain
-> meaning, it will continue to have the same meaning when put into a
-> container block (such as a list item or blockquote).
+```````````````````````````````` example
+ 1. A paragraph
+ with two lines.
-(Indeed, the spec for [list items] and [block quotes] presupposes
-this principle.) This principle implies that if
+ indented code
-``` markdown
- * I need to buy
- - new shoes
- - a coat
- - a plane ticket
-```
+ > A block quote.
+.
+<ol>
+<li>
+<p>A paragraph
+with two lines.</p>
+<pre><code>indented code
+</code></pre>
+<blockquote>
+<p>A block quote.</p>
+</blockquote>
+</li>
+</ol>
+````````````````````````````````
-is a list item containing a paragraph followed by a nested sublist,
-as all Markdown implementations agree it is (though the paragraph
-may be rendered without `<p>` tags, since the list is "tight"),
-then
-``` markdown
-I need to buy
-- new shoes
-- a coat
-- a plane ticket
-```
+Indented two spaces:
-by itself should be a paragraph followed by a nested sublist.
+```````````````````````````````` example
+ 1. A paragraph
+ with two lines.
-Since it is well established Markdown practice to allow lists to
-interrupt paragraphs inside list items, the [principle of
-uniformity] requires us to allow this outside list items as
-well. ([reStructuredText](http://docutils.sourceforge.net/rst.html)
-takes a different approach, requiring blank lines before lists
-even inside other list items.)
+ indented code
+
+ > A block quote.
+.
+<ol>
+<li>
+<p>A paragraph
+with two lines.</p>
+<pre><code>indented code
+</code></pre>
+<blockquote>
+<p>A block quote.</p>
+</blockquote>
+</li>
+</ol>
+````````````````````````````````
+
+
+Indented three spaces:
+
+```````````````````````````````` example
+ 1. A paragraph
+ with two lines.
+
+ indented code
+
+ > A block quote.
+.
+<ol>
+<li>
+<p>A paragraph
+with two lines.</p>
+<pre><code>indented code
+</code></pre>
+<blockquote>
+<p>A block quote.</p>
+</blockquote>
+</li>
+</ol>
+````````````````````````````````
+
+
+Four spaces indent gives a code block:
+
+```````````````````````````````` example
+ 1. A paragraph
+ with two lines.
+
+ indented code
+
+ > A block quote.
+.
+<pre><code>1. A paragraph
+ with two lines.
+
+ indented code
+
+ > A block quote.
+</code></pre>
+````````````````````````````````
+
+
+
+5. **Laziness.** If a string of lines *Ls* constitutes a [list
+ item](#list-items) with contents *Bs*, then the result of deleting
+ some or all of the indentation from one or more lines in which the
+ next [non-whitespace character] after the indentation is
+ [paragraph continuation text] is a
+ list item with the same contents and attributes. The unindented
+ lines are called
+ [lazy continuation line](@)s.
-In order to solve the problem of unwanted lists in paragraphs with
-hard-wrapped numerals, we allow only lists starting with `1` to
-interrupt paragraphs. Thus,
+Here is an example with [lazy continuation lines]:
```````````````````````````````` example
-The number of windows in my house is
-14. The number of doors is 6.
-.
-<p>The number of windows in my house is
-14. The number of doors is 6.</p>
-````````````````````````````````
+ 1. A paragraph
+with two lines.
-We may still get an unintended result in cases like
+ indented code
-```````````````````````````````` example
-The number of windows in my house is
-1. The number of doors is 6.
+ > A block quote.
.
-<p>The number of windows in my house is</p>
<ol>
-<li>The number of doors is 6.</li>
+<li>
+<p>A paragraph
+with two lines.</p>
+<pre><code>indented code
+</code></pre>
+<blockquote>
+<p>A block quote.</p>
+</blockquote>
+</li>
</ol>
````````````````````````````````
-but this rule should prevent most spurious list captures.
-There can be any number of blank lines between items:
+Indentation can be partially deleted:
```````````````````````````````` example
-- foo
+ 1. A paragraph
+ with two lines.
+.
+<ol>
+<li>A paragraph
+with two lines.</li>
+</ol>
+````````````````````````````````
-- bar
+These examples show how laziness can work in nested structures:
-- baz
+```````````````````````````````` example
+> 1. > Blockquote
+continued here.
.
-<ul>
-<li>
-<p>foo</p>
-</li>
+<blockquote>
+<ol>
<li>
-<p>bar</p>
+<blockquote>
+<p>Blockquote
+continued here.</p>
+</blockquote>
</li>
+</ol>
+</blockquote>
+````````````````````````````````
+
+
+```````````````````````````````` example
+> 1. > Blockquote
+> continued here.
+.
+<blockquote>
+<ol>
<li>
-<p>baz</p>
+<blockquote>
+<p>Blockquote
+continued here.</p>
+</blockquote>
</li>
-</ul>
+</ol>
+</blockquote>
````````````````````````````````
+
+
+6. **That's all.** Nothing that is not counted as a list item by rules
+ #1--5 counts as a [list item](#list-items).
+
+The rules for sublists follow from the general rules
+[above][List items]. A sublist must be indented the same number
+of spaces a paragraph would need to be in order to be included
+in the list item.
+
+So, in this case we need two spaces indent:
+
```````````````````````````````` example
- foo
- bar
- baz
-
-
- bim
+ - boo
.
<ul>
<li>foo
<ul>
<li>bar
<ul>
-<li>
-<p>baz</p>
-<p>bim</p>
+<li>baz
+<ul>
+<li>boo</li>
+</ul>
</li>
</ul>
</li>
@@ -5072,778 +4917,938 @@ There can be any number of blank lines between items:
````````````````````````````````
-To separate consecutive lists of the same type, or to separate a
-list from an indented code block that would otherwise be parsed
-as a subparagraph of the final list item, you can insert a blank HTML
-comment:
+One is not enough:
```````````````````````````````` example
- foo
-- bar
-
-<!-- -->
-
-- baz
-- bim
+ - bar
+ - baz
+ - boo
.
<ul>
<li>foo</li>
<li>bar</li>
-</ul>
-<!-- -->
-<ul>
<li>baz</li>
-<li>bim</li>
-</ul>
-````````````````````````````````
-
-
-```````````````````````````````` example
-- foo
-
- notcode
-
-- foo
-
-<!-- -->
-
- code
-.
-<ul>
-<li>
-<p>foo</p>
-<p>notcode</p>
-</li>
-<li>
-<p>foo</p>
-</li>
-</ul>
-<!-- -->
-<pre><code>code
-</code></pre>
-````````````````````````````````
-
-
-List items need not be indented to the same level. The following
-list items will be treated as items at the same list level,
-since none is indented enough to belong to the previous list
-item:
-
-```````````````````````````````` example
-- a
- - b
- - c
- - d
- - e
- - f
-- g
-.
-<ul>
-<li>a</li>
-<li>b</li>
-<li>c</li>
-<li>d</li>
-<li>e</li>
-<li>f</li>
-<li>g</li>
+<li>boo</li>
</ul>
````````````````````````````````
-```````````````````````````````` example
-1. a
-
- 2. b
-
- 3. c
-.
-<ol>
-<li>
-<p>a</p>
-</li>
-<li>
-<p>b</p>
-</li>
-<li>
-<p>c</p>
-</li>
-</ol>
-````````````````````````````````
-
-Note, however, that list items may not be indented more than
-three spaces. Here `- e` is treated as a paragraph continuation
-line, because it is indented more than three spaces:
+Here we need four, because the list marker is wider:
```````````````````````````````` example
-- a
- - b
- - c
- - d
- - e
+10) foo
+ - bar
.
+<ol start="10">
+<li>foo
<ul>
-<li>a</li>
-<li>b</li>
-<li>c</li>
-<li>d
-- e</li>
+<li>bar</li>
</ul>
-````````````````````````````````
-
-And here, `3. c` is treated as an indented code block,
-because it is indented four spaces and preceded by a
-blank line.
-
-```````````````````````````````` example
-1. a
-
- 2. b
-
- 3. c
-.
-<ol>
-<li>
-<p>a</p>
-</li>
-<li>
-<p>b</p>
</li>
</ol>
-<pre><code>3. c
-</code></pre>
````````````````````````````````
-This is a loose list, because there is a blank line between
-two of the list items:
+Three is not enough:
```````````````````````````````` example
-- a
-- b
-
-- c
+10) foo
+ - bar
.
-<ul>
-<li>
-<p>a</p>
-</li>
-<li>
-<p>b</p>
-</li>
-<li>
-<p>c</p>
-</li>
+<ol start="10">
+<li>foo</li>
+</ol>
+<ul>
+<li>bar</li>
</ul>
````````````````````````````````
-So is this, with an empty second item:
+A list may be the first block in a list item:
```````````````````````````````` example
-* a
-*
-
-* c
+- - foo
.
<ul>
<li>
-<p>a</p>
-</li>
-<li></li>
-<li>
-<p>c</p>
+<ul>
+<li>foo</li>
+</ul>
</li>
</ul>
````````````````````````````````
-These are loose lists, even though there is no space between the items,
-because one of the items directly contains two block-level elements
-with a blank line between them:
-
```````````````````````````````` example
-- a
-- b
-
- c
-- d
+1. - 2. foo
.
-<ul>
-<li>
-<p>a</p>
-</li>
+<ol>
<li>
-<p>b</p>
-<p>c</p>
-</li>
+<ul>
<li>
-<p>d</p>
+<ol start="2">
+<li>foo</li>
+</ol>
</li>
</ul>
+</li>
+</ol>
````````````````````````````````
-```````````````````````````````` example
-- a
-- b
+A list item can contain a heading:
- [ref]: /url
-- d
+```````````````````````````````` example
+- # Foo
+- Bar
+ ---
+ baz
.
<ul>
<li>
-<p>a</p>
-</li>
-<li>
-<p>b</p>
+<h1>Foo</h1>
</li>
<li>
-<p>d</p>
-</li>
+<h2>Bar</h2>
+baz</li>
</ul>
````````````````````````````````
-This is a tight list, because the blank lines are in a code block:
+### Motivation
-```````````````````````````````` example
-- a
-- ```
- b
+John Gruber's Markdown spec says the following about list items:
+
+1. "List markers typically start at the left margin, but may be indented
+ by up to three spaces. List markers must be followed by one or more
+ spaces or a tab."
+2. "To make lists look nice, you can wrap items with hanging indents....
+ But if you don't want to, you don't have to."
- ```
-- c
-.
+3. "List items may consist of multiple paragraphs. Each subsequent
+ paragraph in a list item must be indented by either 4 spaces or one
+ tab."
+
+4. "It looks nice if you indent every line of the subsequent paragraphs,
+ but here again, Markdown will allow you to be lazy."
+
+5. "To put a blockquote within a list item, the blockquote's `>`
+ delimiters need to be indented."
+
+6. "To put a code block within a list item, the code block needs to be
+ indented twice — 8 spaces or two tabs."
+
+These rules specify that a paragraph under a list item must be indented
+four spaces (presumably, from the left margin, rather than the start of
+the list marker, but this is not said), and that code under a list item
+must be indented eight spaces instead of the usual four. They also say
+that a block quote must be indented, but not by how much; however, the
+example given has four spaces indentation. Although nothing is said
+about other kinds of block-level content, it is certainly reasonable to
+infer that *all* block elements under a list item, including other
+lists, must be indented four spaces. This principle has been called the
+*four-space rule*.
+
+The four-space rule is clear and principled, and if the reference
+implementation `Markdown.pl` had followed it, it probably would have
+become the standard. However, `Markdown.pl` allowed paragraphs and
+sublists to start with only two spaces indentation, at least on the
+outer level. Worse, its behavior was inconsistent: a sublist of an
+outer-level list needed two spaces indentation, but a sublist of this
+sublist needed three spaces. It is not surprising, then, that different
+implementations of Markdown have developed very different rules for
+determining what comes under a list item. (Pandoc and python-Markdown,
+for example, stuck with Gruber's syntax description and the four-space
+rule, while discount, redcarpet, marked, PHP Markdown, and others
+followed `Markdown.pl`'s behavior more closely.)
+
+Unfortunately, given the divergences between implementations, there
+is no way to give a spec for list items that will be guaranteed not
+to break any existing documents. However, the spec given here should
+correctly handle lists formatted with either the four-space rule or
+the more forgiving `Markdown.pl` behavior, provided they are laid out
+in a way that is natural for a human to read.
+
+The strategy here is to let the width and indentation of the list marker
+determine the indentation necessary for blocks to fall under the list
+item, rather than having a fixed and arbitrary number. The writer can
+think of the body of the list item as a unit which gets indented to the
+right enough to fit the list marker (and any indentation on the list
+marker). (The laziness rule, #5, then allows continuation lines to be
+unindented if needed.)
+
+This rule is superior, we claim, to any rule requiring a fixed level of
+indentation from the margin. The four-space rule is clear but
+unnatural. It is quite unintuitive that
+
+``` markdown
+- foo
+
+ bar
+
+ - baz
+```
+
+should be parsed as two lists with an intervening paragraph,
+
+``` html
<ul>
-<li>a</li>
-<li>
-<pre><code>b
+<li>foo</li>
+</ul>
+<p>bar</p>
+<ul>
+<li>baz</li>
+</ul>
+```
+as the four-space rule demands, rather than a single list,
-</code></pre>
+``` html
+<ul>
+<li>
+<p>foo</p>
+<p>bar</p>
+<ul>
+<li>baz</li>
+</ul>
</li>
-<li>c</li>
</ul>
-````````````````````````````````
+```
+The choice of four spaces is arbitrary. It can be learned, but it is
+not likely to be guessed, and it trips up beginners regularly.
-This is a tight list, because the blank line is between two
-paragraphs of a sublist. So the sublist is loose while
-the outer list is tight:
+Would it help to adopt a two-space rule? The problem is that such
+a rule, together with the rule allowing 1--3 spaces indentation of the
+initial list marker, allows text that is indented *less than* the
+original list marker to be included in the list item. For example,
+`Markdown.pl` parses
-```````````````````````````````` example
-- a
- - b
+``` markdown
+ - one
- c
-- d
-.
-<ul>
-<li>a
+ two
+```
+
+as a single list item, with `two` a continuation paragraph:
+
+``` html
<ul>
<li>
-<p>b</p>
-<p>c</p>
+<p>one</p>
+<p>two</p>
</li>
</ul>
+```
+
+and similarly
+
+``` markdown
+> - one
+>
+> two
+```
+
+as
+
+``` html
+<blockquote>
+<ul>
+<li>
+<p>one</p>
+<p>two</p>
</li>
-<li>d</li>
</ul>
-````````````````````````````````
+</blockquote>
+```
+
+This is extremely unintuitive.
+
+Rather than requiring a fixed indent from the margin, we could require
+a fixed indent (say, two spaces, or even one space) from the list marker (which
+may itself be indented). This proposal would remove the last anomaly
+discussed. Unlike the spec presented above, it would count the following
+as a list item with a subparagraph, even though the paragraph `bar`
+is not indented as far as the first paragraph `foo`:
+
+``` markdown
+ 10. foo
+
+ bar
+```
+
+Arguably this text does read like a list item with `bar` as a subparagraph,
+which may count in favor of the proposal. However, on this proposal indented
+code would have to be indented six spaces after the list marker. And this
+would break a lot of existing Markdown, which has the pattern:
+
+``` markdown
+1. foo
+
+ indented code
+```
+
+where the code is indented eight spaces. The spec above, by contrast, will
+parse this text as expected, since the code block's indentation is measured
+from the beginning of `foo`.
+The one case that needs special treatment is a list item that *starts*
+with indented code. How much indentation is required in that case, since
+we don't have a "first paragraph" to measure from? Rule #2 simply stipulates
+that in such cases, we require one space indentation from the list marker
+(and then the normal four spaces for the indented code). This will match the
+four-space rule in cases where the list marker plus its initial indentation
+takes four spaces (a common case), but diverge in other cases.
-This is a tight list, because the blank line is inside the
-block quote:
+## Lists
-```````````````````````````````` example
-* a
- > b
- >
-* c
-.
-<ul>
-<li>a
-<blockquote>
-<p>b</p>
-</blockquote>
-</li>
-<li>c</li>
-</ul>
-````````````````````````````````
+A [list](@) is a sequence of one or more
+list items [of the same type]. The list items
+may be separated by any number of blank lines.
+Two list items are [of the same type](@)
+if they begin with a [list marker] of the same type.
+Two list markers are of the
+same type if (a) they are bullet list markers using the same character
+(`-`, `+`, or `*`) or (b) they are ordered list numbers with the same
+delimiter (either `.` or `)`).
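The comparison of list markers described above can be sketched in a few lines of Python. The function names `marker_type` and `same_type` are hypothetical, and the sketch assumes the marker has already been isolated from the rest of the line.

```python
import re


def marker_type(marker):
    """Return a key identifying the list type of a marker such as '-', '3.' or '7)'."""
    if marker in ("-", "+", "*"):
        # Bullet markers are of the same type only if they use the same character.
        return ("bullet", marker)
    m = re.fullmatch(r"[0-9]{1,9}([.)])", marker)
    if m:
        # Ordered markers are compared by delimiter only; the number is ignored.
        return ("ordered", m.group(1))
    return None


def same_type(a, b):
    """True if both strings are list markers of the same type."""
    ta, tb = marker_type(a), marker_type(b)
    return ta is not None and ta == tb
```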
-This list is tight, because the consecutive block elements
-are not separated by blank lines:
+A list is an [ordered list](@)
+if its constituent list items begin with
+[ordered list markers], and a
+[bullet list](@) if its constituent list
+items begin with [bullet list markers].
-```````````````````````````````` example
-- a
- > b
- ```
- c
- ```
-- d
-.
-<ul>
-<li>a
-<blockquote>
-<p>b</p>
-</blockquote>
-<pre><code>c
-</code></pre>
-</li>
-<li>d</li>
-</ul>
-````````````````````````````````
+The [start number](@)
+of an [ordered list] is determined by the list number of
+its initial list item. The numbers of subsequent list items are
+disregarded.
+A list is [loose](@) if any of its constituent
+list items are separated by blank lines, or if any of its constituent
+list items directly contain two block-level elements with a blank line
+between them. Otherwise a list is [tight](@).
+(The difference in HTML output is that paragraphs in a loose list are
+wrapped in `<p>` tags, while paragraphs in a tight list are not.)
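The loose/tight distinction can be sketched as a predicate over already-parsed list items. This is only an illustration under assumptions: the `Item` class and its two flags are hypothetical, and a real parser would derive them while processing lines rather than receive them ready-made.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # A blank line separates this item from the previous item in the list.
    blank_line_before: bool = False
    # Two block-level elements directly contained in this item are
    # separated by a blank line.
    blank_between_children: bool = False


def is_loose(items):
    """True if the list is loose; otherwise the list is tight."""
    return any(
        (index > 0 and item.blank_line_before) or item.blank_between_children
        for index, item in enumerate(items)
    )
```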
-A single-paragraph list is tight:
+Changing the bullet or ordered list delimiter starts a new list:
```````````````````````````````` example
-- a
+- foo
+- bar
++ baz
.
<ul>
-<li>a</li>
+<li>foo</li>
+<li>bar</li>
</ul>
-````````````````````````````````
-
-
-```````````````````````````````` example
-- a
- - b
-.
-<ul>
-<li>a
<ul>
-<li>b</li>
-</ul>
-</li>
+<li>baz</li>
</ul>
````````````````````````````````
-This list is loose, because of the blank line between the
-two block elements in the list item:
-
```````````````````````````````` example
-1. ```
- foo
- ```
-
- bar
+1. foo
+2. bar
+3) baz
.
<ol>
-<li>
-<pre><code>foo
-</code></pre>
-<p>bar</p>
-</li>
+<li>foo</li>
+<li>bar</li>
+</ol>
+<ol start="3">
+<li>baz</li>
</ol>
````````````````````````````````
-Here the outer list is loose, the inner list tight:
+In CommonMark, a list can interrupt a paragraph. That is,
+no blank line is needed to separate a paragraph from a following
+list:
```````````````````````````````` example
-* foo
- * bar
-
- baz
+Foo
+- bar
+- baz
.
-<ul>
-<li>
-<p>foo</p>
+<p>Foo</p>
<ul>
<li>bar</li>
-</ul>
-<p>baz</p>
-</li>
+<li>baz</li>
</ul>
````````````````````````````````
+`Markdown.pl` does not allow this, for fear of triggering a list
+via a numeral in a hard-wrapped line:
-```````````````````````````````` example
-- a
- - b
- - c
+``` markdown
+The number of windows in my house is
+14. The number of doors is 6.
+```
-- d
- - e
- - f
-.
-<ul>
-<li>
-<p>a</p>
-<ul>
-<li>b</li>
-<li>c</li>
-</ul>
-</li>
-<li>
-<p>d</p>
-<ul>
-<li>e</li>
-<li>f</li>
-</ul>
-</li>
-</ul>
-````````````````````````````````
+Oddly, though, `Markdown.pl` *does* allow a blockquote to
+interrupt a paragraph, even though the same considerations might
+apply.
+In CommonMark, we do allow lists to interrupt paragraphs, for
+two reasons. First, it is natural and not uncommon for people
+to start lists without blank lines:
-# Inlines
+``` markdown
+I need to buy
+- new shoes
+- a coat
+- a plane ticket
+```
-Inlines are parsed sequentially from the beginning of the character
-stream to the end (left to right, in left-to-right languages).
-Thus, for example, in
+Second, we are attracted to a
-```````````````````````````````` example
-`hi`lo`
-.
-<p><code>hi</code>lo`</p>
-````````````````````````````````
+> [principle of uniformity](@):
+> if a chunk of text has a certain
+> meaning, it will continue to have the same meaning when put into a
+> container block (such as a list item or blockquote).
-`hi` is parsed as code, leaving the backtick at the end as a literal
-backtick.
+(Indeed, the spec for [list items] and [block quotes] presupposes
+this principle.) This principle implies that if
+``` markdown
+ * I need to buy
+ - new shoes
+ - a coat
+ - a plane ticket
+```
-## Backslash escapes
+is a list item containing a paragraph followed by a nested sublist,
+as all Markdown implementations agree it is (though the paragraph
+may be rendered without `<p>` tags, since the list is "tight"),
+then
-Any ASCII punctuation character may be backslash-escaped:
+``` markdown
+I need to buy
+- new shoes
+- a coat
+- a plane ticket
+```
-```````````````````````````````` example
-\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~
-.
-<p>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~</p>
-````````````````````````````````
+by itself should be a paragraph followed by a nested sublist.
+Since it is well-established Markdown practice to allow lists to
+interrupt paragraphs inside list items, the [principle of
+uniformity] requires us to allow this outside list items as
+well. ([reStructuredText](http://docutils.sourceforge.net/rst.html)
+takes a different approach, requiring blank lines before lists
+even inside other list items.)
-Backslashes before other characters are treated as literal
-backslashes:
+In order to solve the problem of unwanted lists in paragraphs with
+hard-wrapped numerals, we allow only lists starting with `1` to
+interrupt paragraphs. Thus,
```````````````````````````````` example
-\→\A\a\ \3\φ\«
+The number of windows in my house is
+14. The number of doors is 6.
.
-<p>\→\A\a\ \3\φ\«</p>
+<p>The number of windows in my house is
+14. The number of doors is 6.</p>
````````````````````````````````
-
-Escaped characters are treated as regular characters and do
-not have their usual Markdown meanings:
+We may still get an unintended result in cases like
```````````````````````````````` example
-\*not emphasized*
-\<br/> not a tag
-\[not a link](/foo)
-\`not code`
-1\. not a list
-\* not a list
-\# not a heading
-\[foo]: /url "not a reference"
-\ö not a character entity
+The number of windows in my house is
+1. The number of doors is 6.
.
-<p>*not emphasized*
-<br/> not a tag
-[not a link](/foo)
-`not code`
-1. not a list
-* not a list
-# not a heading
-[foo]: /url "not a reference"
-&ouml; not a character entity</p>
+<p>The number of windows in my house is</p>
+<ol>
+<li>The number of doors is 6.</li>
+</ol>
````````````````````````````````
+but this rule should prevent most spurious list captures.
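The interruption rule can be sketched as a check on the line that follows a paragraph line. This is a simplified Python illustration, not the reference algorithm: `can_interrupt_paragraph` is a hypothetical name, the sketch requires some content after the marker, and it does not distinguish thematic breaks such as `- - -` from bullet items.

```python
import re

# Up to three spaces of indentation, a bullet or ordered marker, at least one
# space, and then some non-whitespace content.
_ITEM = re.compile(r"^ {0,3}(?:([-+*])|([0-9]{1,9})[.)]) +\S")


def can_interrupt_paragraph(line):
    """True if `line` may start a list directly after a paragraph line."""
    m = _ITEM.match(line)
    if m is None:
        return False
    if m.group(1) is not None:
        return True  # bullet lists may always interrupt a paragraph
    return int(m.group(2)) == 1  # ordered lists only when they start at 1
```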
-If a backslash is itself escaped, the following character is not:
+There can be any number of blank lines between items:
```````````````````````````````` example
-\\*emphasis*
-.
-<p>\<em>emphasis</em></p>
-````````````````````````````````
+- foo
+- bar
-A backslash at the end of the line is a [hard line break]:
-```````````````````````````````` example
-foo\
-bar
+- baz
.
-<p>foo<br />
-bar</p>
+<ul>
+<li>
+<p>foo</p>
+</li>
+<li>
+<p>bar</p>
+</li>
+<li>
+<p>baz</p>
+</li>
+</ul>
````````````````````````````````
+```````````````````````````````` example
+- foo
+ - bar
+ - baz
-Backslash escapes do not work in code blocks, code spans, autolinks, or
-raw HTML:
-```````````````````````````````` example
-`` \[\` ``
+ bim
.
-<p><code>\[\`</code></p>
+<ul>
+<li>foo
+<ul>
+<li>bar
+<ul>
+<li>
+<p>baz</p>
+<p>bim</p>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
````````````````````````````````
+To separate consecutive lists of the same type, or to separate a
+list from an indented code block that would otherwise be parsed
+as a subparagraph of the final list item, you can insert a blank HTML
+comment:
+
```````````````````````````````` example
- \[\]
-.
-<pre><code>\[\]
-</code></pre>
-````````````````````````````````
+- foo
+- bar
+<!-- -->
-```````````````````````````````` example
-~~~
-\[\]
-~~~
+- baz
+- bim
.
-<pre><code>\[\]
-</code></pre>
+<ul>
+<li>foo</li>
+<li>bar</li>
+</ul>
+<!-- -->
+<ul>
+<li>baz</li>
+<li>bim</li>
+</ul>
````````````````````````````````
```````````````````````````````` example
-<http://example.com?find=\*>
-.
-<p><a href="http://example.com?find=%5C*">http://example.com?find=\*</a></p>
-````````````````````````````````
+- foo
+
+ notcode
+- foo
-```````````````````````````````` example
-<a href="/bar\/)">
+<!-- -->
+
+ code
.
-<a href="/bar\/)">
+<ul>
+<li>
+<p>foo</p>
+<p>notcode</p>
+</li>
+<li>
+<p>foo</p>
+</li>
+</ul>
+<!-- -->
+<pre><code>code
+</code></pre>
````````````````````````````````
-But they work in all other contexts, including URLs and link titles,
-link references, and [info strings] in [fenced code blocks]:
+List items need not be indented to the same level. The following
+list items will be treated as items at the same list level,
+since none is indented enough to belong to the previous list
+item:
```````````````````````````````` example
-[foo](/bar\* "ti\*tle")
+- a
+ - b
+ - c
+ - d
+ - e
+ - f
+- g
.
-<p><a href="/bar*" title="ti*tle">foo</a></p>
+<ul>
+<li>a</li>
+<li>b</li>
+<li>c</li>
+<li>d</li>
+<li>e</li>
+<li>f</li>
+<li>g</li>
+</ul>
````````````````````````````````
```````````````````````````````` example
-[foo]
+1. a
-[foo]: /bar\* "ti\*tle"
+ 2. b
+
+ 3. c
.
-<p><a href="/bar*" title="ti*tle">foo</a></p>
+<ol>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+<li>
+<p>c</p>
+</li>
+</ol>
````````````````````````````````
+Note, however, that list items may not be indented more than
+three spaces. Here `- e` is treated as a paragraph continuation
+line, because it is indented more than three spaces:
```````````````````````````````` example
-``` foo\+bar
-foo
-```
+- a
+ - b
+ - c
+ - d
+ - e
.
-<pre><code class="language-foo+bar">foo
-</code></pre>
+<ul>
+<li>a</li>
+<li>b</li>
+<li>c</li>
+<li>d
+- e</li>
+</ul>
````````````````````````````````
+And here, `3. c` is treated as an indented code block,
+because it is indented four spaces and preceded by a
+blank line.
+```````````````````````````````` example
+1. a
-## Entity and numeric character references
-
-Valid HTML entity references and numeric character references
-can be used in place of the corresponding Unicode character,
-with the following exceptions:
-
-- Entity and character references are not recognized in code
- blocks and code spans.
+ 2. b
-- Entity and character references cannot stand in place of
- special characters that define structural elements in
- CommonMark. For example, although `*` can be used
- in place of a literal `*` character, `*` cannot replace
- `*` in emphasis delimiters, bullet list markers, or thematic
- breaks.
+ 3. c
+.
+<ol>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+</ol>
+<pre><code>3. c
+</code></pre>
+````````````````````````````````
-Conforming CommonMark parsers need not store information about
-whether a particular character was represented in the source
-using a Unicode character or an entity reference.
-[Entity references](@) consist of `&` + any of the valid
-HTML5 entity names + `;`. The
-document <https://html.spec.whatwg.org/multipage/entities.json>
-is used as an authoritative source for the valid entity
-references and their corresponding code points.
+This is a loose list, because there is a blank line between
+two of the list items:
```````````````````````````````` example
- & © Æ Ď
-¾ ℋ ⅆ
-∲ ≧̸
+- a
+- b
+
+- c
.
-<p> & © Æ Ď
-¾ ℋ ⅆ
-∲ ≧̸</p>
+<ul>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+<li>
+<p>c</p>
+</li>
+</ul>
````````````````````````````````
-[Decimal numeric character
-references](@)
-consist of `&#` + a string of 1--7 arabic digits + `;`. A
-numeric character reference is parsed as the corresponding
-Unicode character. Invalid Unicode code points will be replaced by
-the REPLACEMENT CHARACTER (`U+FFFD`). For security reasons,
-the code point `U+0000` will also be replaced by `U+FFFD`.
+So is this, with an empty second item:
```````````````````````````````` example
-# Ӓ Ϡ �
+* a
+*
+
+* c
.
-<p># Ӓ Ϡ �</p>
+<ul>
+<li>
+<p>a</p>
+</li>
+<li></li>
+<li>
+<p>c</p>
+</li>
+</ul>
````````````````````````````````
-[Hexadecimal numeric character
-references](@) consist of `&#` +
-either `X` or `x` + a string of 1-6 hexadecimal digits + `;`.
-They too are parsed as the corresponding Unicode character (this
-time specified with a hexadecimal numeral instead of decimal).
+These are loose lists, even though there are no blank lines between the items,
+because one of the items directly contains two block-level elements
+with a blank line between them:
```````````````````````````````` example
-" ആ ಫ
+- a
+- b
+
+ c
+- d
.
-<p>" ആ ಫ</p>
+<ul>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+<p>c</p>
+</li>
+<li>
+<p>d</p>
+</li>
+</ul>
````````````````````````````````
-Here are some nonentities:
-
```````````````````````````````` example
-  &x; &#; &#x;
-�
-&#abcdef0;
-&ThisIsNotDefined; &hi?;
+- a
+- b
+
+ [ref]: /url
+- d
.
-<p>&nbsp &x; &#; &#x;
-&#987654321;
-&#abcdef0;
-&ThisIsNotDefined; &hi?;</p>
+<ul>
+<li>
+<p>a</p>
+</li>
+<li>
+<p>b</p>
+</li>
+<li>
+<p>d</p>
+</li>
+</ul>
````````````````````````````````
-Although HTML5 does accept some entity references
-without a trailing semicolon (such as `©`), these are not
-recognized here, because it makes the grammar too ambiguous:
+This is a tight list, because the blank lines are in a code block:
```````````````````````````````` example
-©
+- a
+- ```
+ b
+
+
+ ```
+- c
.
-<p>&copy</p>
+<ul>
+<li>a</li>
+<li>
+<pre><code>b
+
+
+</code></pre>
+</li>
+<li>c</li>
+</ul>
````````````````````````````````
-Strings that are not on the list of HTML5 named entities are not
-recognized as entity references either:
+This is a tight list, because the blank line is between two
+paragraphs of a sublist. So the sublist is loose while
+the outer list is tight:
```````````````````````````````` example
-&MadeUpEntity;
+- a
+ - b
+
+ c
+- d
.
-<p>&MadeUpEntity;</p>
+<ul>
+<li>a
+<ul>
+<li>
+<p>b</p>
+<p>c</p>
+</li>
+</ul>
+</li>
+<li>d</li>
+</ul>
````````````````````````````````
-Entity and numeric character references are recognized in any
-context besides code spans or code blocks, including
-URLs, [link titles], and [fenced code block][] [info strings]:
+This is a tight list, because the blank line is inside the
+block quote:
```````````````````````````````` example
-<a href="öö.html">
+* a
+ > b
+ >
+* c
.
-<a href="öö.html">
+<ul>
+<li>a
+<blockquote>
+<p>b</p>
+</blockquote>
+</li>
+<li>c</li>
+</ul>
````````````````````````````````
+This list is tight, because the consecutive block elements
+are not separated by blank lines:
+
```````````````````````````````` example
-[foo](/föö "föö")
+- a
+ > b
+ ```
+ c
+ ```
+- d
.
-<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
+<ul>
+<li>a
+<blockquote>
+<p>b</p>
+</blockquote>
+<pre><code>c
+</code></pre>
+</li>
+<li>d</li>
+</ul>
````````````````````````````````
-```````````````````````````````` example
-[foo]
+A single-paragraph list is tight:
-[foo]: /föö "föö"
+```````````````````````````````` example
+- a
.
-<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>
+<ul>
+<li>a</li>
+</ul>
````````````````````````````````
```````````````````````````````` example
-``` föö
-foo
-```
+- a
+ - b
.
-<pre><code class="language-föö">foo
-</code></pre>
+<ul>
+<li>a
+<ul>
+<li>b</li>
+</ul>
+</li>
+</ul>
````````````````````````````````
-Entity and numeric character references are treated as literal
-text in code spans and code blocks:
+This list is loose, because of the blank line between the
+two block elements in the list item:
```````````````````````````````` example
-`föö`
-.
-<p><code>f&ouml;&ouml;</code></p>
-````````````````````````````````
-
+1. ```
+ foo
+ ```
-```````````````````````````````` example
- föfö
+ bar
.
-<pre><code>f&ouml;f&ouml;
+<ol>
+<li>
+<pre><code>foo
</code></pre>
+<p>bar</p>
+</li>
+</ol>
````````````````````````````````
-Entity and numeric character references cannot be used
-in place of symbols indicating structure in CommonMark
-documents.
+Here the outer list is loose, the inner list tight:
```````````````````````````````` example
-*foo*
-*foo*
+* foo
+ * bar
+
+ baz
.
-<p>*foo*
-<em>foo</em></p>
+<ul>
+<li>
+<p>foo</p>
+<ul>
+<li>bar</li>
+</ul>
+<p>baz</p>
+</li>
+</ul>
````````````````````````````````
+
```````````````````````````````` example
-* foo
+- a
+ - b
+ - c
-* foo
+- d
+ - e
+ - f
.
-<p>* foo</p>
<ul>
-<li>foo</li>
+<li>
+<p>a</p>
+<ul>
+<li>b</li>
+<li>c</li>
+</ul>
+</li>
+<li>
+<p>d</p>
+<ul>
+<li>e</li>
+<li>f</li>
+</ul>
+</li>
</ul>
````````````````````````````````
-```````````````````````````````` example
-foo bar
-.
-<p>foo
-bar</p>
-````````````````````````````````
+# Inlines
+
+Inlines are parsed sequentially from the beginning of the character
+stream to the end (left to right, in left-to-right languages).
+Thus, for example, in
```````````````````````````````` example
-	foo
+`hi`lo`
.
-<p>→foo</p>
+<p><code>hi</code>lo`</p>
````````````````````````````````
+`hi` is parsed as code, leaving the backtick at the end as a literal
+backtick.
-```````````````````````````````` example
-[a](url "tit")
-.
-<p>[a](url "tit")</p>
-````````````````````````````````
## Code spans
@@ -7461,10 +7466,11 @@ A [link destination](@) consists of either
closing `>` that contains no line breaks or unescaped
`<` or `>` characters, or
-- a nonempty sequence of characters that does not start with
- `<`, does not include ASCII space or control characters, and
- includes parentheses only if (a) they are backslash-escaped or
- (b) they are part of a balanced pair of unescaped parentheses.
+- a nonempty sequence of characters that does not start with `<`,
+ does not include [ASCII control characters][ASCII control character]
+ or [whitespace][], and includes parentheses only if (a) they are
+ backslash-escaped or (b) they are part of a balanced pair of
+ unescaped parentheses.
(Implementations may impose limits on parentheses nesting to
avoid performance issues, but at least three levels of nesting
should be supported.)
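The second form of link destination can be checked with a small scanner. The following Python sketch is an illustration under assumptions: `is_bare_destination` is a hypothetical name, it validates a candidate string that has already been isolated, and it imposes no limit on nesting depth.

```python
import string


def is_bare_destination(dest):
    """True if `dest` satisfies the rules for a destination not wrapped in <...>."""
    if not dest or dest.startswith("<"):
        return False
    depth = 0
    i = 0
    while i < len(dest):
        ch = dest[i]
        # A backslash before ASCII punctuation escapes it, so an escaped
        # parenthesis does not take part in balancing.
        if ch == "\\" and i + 1 < len(dest) and dest[i + 1] in string.punctuation:
            i += 2
            continue
        # No ASCII control characters (U+0000-1F, U+007F) and no spaces.
        if ord(ch) < 0x20 or ord(ch) == 0x7F or ch == " ":
            return False
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing parenthesis without an opener
        i += 1
    return depth == 0  # every unescaped opener must be closed
```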
@@ -7616,6 +7622,13 @@ However, if you have unbalanced parentheses, you need to escape or use the
`<...>` form:
```````````````````````````````` example
+[link](foo(and(bar))
+.
+<p>[link](foo(and(bar))</p>
+````````````````````````````````
+
+
+```````````````````````````````` example
[link](foo\(and\(bar\))
.
<p><a href="foo(and(bar)">link</a></p>
@@ -7923,9 +7936,8 @@ perform the *Unicode case fold*, strip leading and trailing
matching reference link definitions, the one that comes first in the
document is used. (It is desirable in such cases to emit a warning.)
-The contents of the first link label are parsed as inlines, which are
-used as the link's text. The link's URI and title are provided by the
-matching [link reference definition].
+The link's URI and title are provided by the matching [link
+reference definition].
Here is a simple example:
@@ -8018,11 +8030,11 @@ emphasis grouping:
```````````````````````````````` example
-[foo *bar][ref]
+[foo *bar][ref]*
[ref]: /uri
.
-<p><a href="/uri">foo *bar</a></p>
+<p><a href="/uri">foo *bar</a>*</p>
````````````````````````````````
@@ -8070,11 +8082,11 @@ Matching is case-insensitive:
Unicode case fold is used:
```````````````````````````````` example
-[Толпой][Толпой] is a Russian word.
+[ẞ]
-[ТОЛПОЙ]: /url
+[SS]: /url
.
-<p><a href="/url">Толпой</a> is a Russian word.</p>
+<p><a href="/url">ẞ</a></p>
````````````````````````````````
@@ -8707,9 +8719,9 @@ a link to the URI, with the URI as the link's label.
An [absolute URI](@),
for these purposes, consists of a [scheme] followed by a colon (`:`)
-followed by zero or more characters other than ASCII
-[whitespace] and control characters, `<`, and `>`. If
-the URI includes these characters, they must be percent-encoded
+followed by zero or more characters other than [ASCII control
+characters][ASCII control character], [whitespace][], `<`, and `>`.
+If the URI includes these characters, they must be percent-encoded
(e.g. `%20` for a space).
For purposes of this spec, a [scheme](@) is any sequence
@@ -8942,10 +8954,8 @@ consists of the string `<?`, a string
of characters not including the string `?>`, and the string
`?>`.
-A [declaration](@) consists of the
-string `<!`, a name consisting of one or more uppercase ASCII letters,
-[whitespace], a string of characters not including the
-character `>`, and the character `>`.
+A [declaration](@) consists of the string `<!`, an ASCII letter, zero or more
+characters not including the character `>`, and the character `>`.
A [CDATA section](@) consists of
the string `<![CDATA[`, a string of characters not including the string
@@ -9444,7 +9454,7 @@ blocks. But we cannot close unmatched blocks yet, because we may have a
blocks, we look for new block starts (e.g. `>` for a block quote).
If we encounter a new block start, we close any blocks unmatched
in step 1 before creating the new block as a child of the last
-matched block.
+matched container block.
3. Finally, we look at the remainder of the line (after block
markers like `>`, list markers, and indentation have been consumed).