cmark

My personal build of CMark

Commit: ab19f3cf3c247a5216ae7e7e78ef8c2eaac7ce0a
Parent: 980231a345e5a5e9c3b4eb7f84896e6fccb429b5
Author: John MacFarlane <jgm@berkeley.edu>
Date

Clarify that unicode whitespace counts as whitespace in emph rules.

Added a test case with a unicode nonbreaking space.

See #108, though "whitespace" should still be defined more systematically. This is a step forward.
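
For illustration only, here is a minimal Python sketch of rule 1 as clarified by this commit. It is not cmark's implementation. The helper names `is_unicode_whitespace` and `star_can_open_emphasis` are hypothetical, and because the commit message notes that "whitespace" is not yet defined systematically, the definition used here (general category Zs, plus tab, line feed, form feed, and carriage return) is an assumption. The sketch also ignores runs such as `**` and treats end of input like whitespace, which is a further assumption.

```python
import unicodedata


def is_unicode_whitespace(ch: str) -> bool:
    """Assumed definition: category Zs, or tab / LF / FF / CR.

    The spec text at this commit does not define the term precisely.
    """
    return ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def star_can_open_emphasis(text: str, i: int) -> bool:
    """Rule 1 (sketch): a single `*` at index i can open emphasis
    iff it is not followed by whitespace."""
    if text[i] != "*":
        return False
    # Assumption: a `*` at the very end of the input cannot open emphasis.
    if i + 1 >= len(text):
        return False
    return not is_unicode_whitespace(text[i + 1])


if __name__ == "__main__":
    # U+00A0 NO-BREAK SPACE follows the first `*`, as in the new test case.
    print(star_can_open_emphasis("*\u00a0a\u00a0*", 0))
    print(star_can_open_emphasis("*a*", 0))
```

Under these assumptions the opening `*` in the new test case is followed by U+00A0, so it cannot open emphasis and the input is rendered literally.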

Diffstat

1 file changed, 11 insertions, 4 deletions

Status    File Name  N° Changes  Insertions  Deletions
Modified  spec.txt   15          11          4
diff --git a/spec.txt b/spec.txt
@@ -4355,8 +4355,8 @@ The following rules capture all of these patterns, while allowing
 for efficient parsing strategies that do not backtrack:
 
 1.  A single `*` character [can open emphasis](@can-open-emphasis)
-    iff it is not followed by
-    whitespace.
+    iff it is not followed by whitespace.  (For these purposes,
+    any unicode space character counts as whitespace.)
 
 2.  A single `_` character [can open emphasis](#can-open-emphasis) iff
     it is not followed by whitespace and it is not preceded by an
@@ -4378,8 +4378,7 @@ for efficient parsing strategies that do not backtrack:
     ASCII alphanumeric character.
 
 7.  A double `**` [can close strong emphasis](@can-close-strong-emphasis)
-    iff it is not preceded by
-    whitespace.
+    iff it is not preceded by whitespace.
 
 8.  A double `__` [can close strong emphasis](#can-close-strong-emphasis)
     iff it is not preceded by whitespace and it is not followed by an
@@ -4459,6 +4458,14 @@ a * foo bar*
 <p>a * foo bar*</p>
 .
 
+Unicode nonbreaking spaces count as whitespace, too:
+
+.
+* a *
+.
+<p>* a *</p>
+.
+
 Intraword emphasis with `*` is permitted:
 
 .