author Sam Atman 2025-04-30 20:32:23 -0400
committer Sam Atman 2025-04-30 20:32:23 -0400
commit a7164d9e7b3c3ec6813e06a42d82180d766e15ca
tree b9c55a45ddac98e51653cb64d39b6b26cfb50362 /data/unicode/NamesList.html
parent Allocation Failure Tests
Unicode 16.0
Went smoothly, needed to add some scripts and adjust the magic numbers, but other than that, all set.
Diffstat (limited to 'data/unicode/NamesList.html')
-rw-r--r-- data/unicode/NamesList.html 69
1 files changed, 46 insertions, 23 deletions
diff --git a/data/unicode/NamesList.html b/data/unicode/NamesList.html
index d6809e1..a67236e 100644
--- a/data/unicode/NamesList.html
+++ b/data/unicode/NamesList.html
@@ -100,7 +100,7 @@ a.headernav:hover {
  <tbody>
  <tr>
  <td>Revision</td>
- <td>15.1.0</td>
+ <td>16.0.0</td>
  </tr>
  <tr>
  <td>Authors</td>
@@ -108,19 +108,19 @@ a.headernav:hover {
  </tr>
  <tr>
  <td>Date</td>
- <td>2023-08-23</td>
+ <td>2024-08-21</td>
  </tr>
  <tr>
  <td>This Version</td>
  <td >
- <a href="https://www.unicode.org/Public/15.1.0/ucd/NamesList.html">
- https://www.unicode.org/Public/15.1.0/ucd/NamesList.html</a></td>
+ <a href="https://www.unicode.org/Public/16.0.0/ucd/NamesList.html">
+ https://www.unicode.org/Public/16.0.0/ucd/NamesList.html</a></td>
  </tr>
  <tr>
  <td>Previous Version</td>
  <td>
- <a href="https://www.unicode.org/Public/15.0.0/ucd/NamesList.html">
- https://www.unicode.org/Public/15.0.0/ucd/NamesList.html</a></td>
+ <a href="https://www.unicode.org/Public/15.1.0/ucd/NamesList.html">
+ https://www.unicode.org/Public/15.1.0/ucd/NamesList.html</a></td>
  </tr>
  <tr>
  <td>Latest Version</td>
@@ -159,8 +159,8 @@ released form of the NamesList.txt file.</p>
 draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
 declaration in a comment at the head of the file were introduced after Unicode
 6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
-in comments and aliases in the names list format was loosened from the prior
-limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</p>
+in comments and aliases in the names list format was loosened from the earlier
+limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0, and dropped entirely as of Unicode 16.0.0.</p>
 
 <p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
  10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
@@ -281,10 +281,18 @@ CHAR_ENTRY: NAME_LINE | RESERVED_LINE
  charset declaration (see below). Alternatively, or in addition, a BOM may be
  present at the very beginning of the file, forcing the encoding to be
  interpreted as UTF-16 (little-endian only) or UTF-8. When
- declared as UTF-8, the names list format will support use of characters in
- the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
+ declared as UTF-8, the names list format will support use of any Unicode characters in
+ STRING and LABEL elements. Otherwise,
  the supported repertoire is limited to Latin-1, and attempted use of characters outside
  the Latin-1 range will result in data corruption.</p>
+<p>The NamesList file format does not support styled text; each line or other element
+ will usually be displayed in a specific font selected for it. To allow CHAR elements
+ that normally use chart glyphs to better coexist with running text in LABEL and STRING
+ elements, a user defined limit can be set, below which the normal selection of (chart) glyphs
+ for the CHAR element is overridden in favor of equivalent glyphs from a font selected for better
+ readability in running text. Any running text outside that range will use standard chart
+ glyphs, which may result in a ransom note effect. For production of the Unicode Standard
+ Version 16.0.0 and later the limit is set to U+1EFF.</p>
 <p>Several of these elements, while part of the formal definition of the
  file format, do not occur in final published versions of
  NamesList.txt in the <a href="https://www.unicode.org/Public/UCD/latest/">UCD</a>.</p>
@@ -514,14 +522,14 @@ is machine generated and will always explicitly provide any summary subheaders.<
  <li>Because a LINE or an EXPAND_LINE can itself start with a special character followed
  by a SP or LF, an &quot;unmarked&quot; COMMENT_LINE should match the input in lower priority than line
  types that require a special character or have a more restrictive set of characters than EXPAND_LINE.
- Similarly, a SUBHEADER containing TAB &quot;!&quot; LF should match with a higher priority than those
+ Similarly, a SUBHEADER containing TAB &quot;!&quot; LF should match with a higher priority than one
  where the TAB is followed by a LINE.</li>
  </ul>
 
 
 <h3 id="FilePrimitives">2.2 <a href="#FilePrimitives">NamesList File Primitives</a></h3>
 
-<p>The following are the primitives and terminals for the NamesList syntax.</p>
+<p>The following are the primitives and terminals for the NamesList syntax. "Limit" is a user-defined value; see discussion of the implications of Limit in the notes below.</p>
 
 <pre><strong>LINE</strong>: <strong>STRING LF
 COMMENT: &quot;(&quot; LABEL &quot;)&quot;
@@ -533,8 +541,8 @@ COMMENT: &quot;(&quot; LABEL &quot;)&quot;
 
 <strong>TAG</strong>: &lt;sequence of ASCII letters&gt;
 <strong>LCTAG</strong>: &lt;sequence of lowercase ASCII letters&gt;
-<strong>STRING</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls&gt;
-<strong>LABEL</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls, &quot;(&quot; or &quot;)&quot;&gt;
+<strong>STRING</strong>: &lt;sequence of characters, except controls&gt;
+<strong>LABEL</strong>: &lt;sequence of characters, except controls, &quot;(&quot; or &quot;)&quot;&gt;
 <strong>VARSEL</strong>: <strong>CHAR
  | &quot;ALT&quot; ( &quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot; )</strong>
 <strong>VARSEL_LIST</strong>: <strong>&quot;{&quot; CHAR_LIST &quot;}&quot;</strong>
@@ -580,19 +588,27 @@ COMMENT: &quot;(&quot; LABEL &quot;)&quot;
  of following characters.</li>
  <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
  output.</li>
- <li>In a STRING or LABEL, a Unicode character outside the range
- U+0000..U+02FF is displayed as is, with a glyph matching
- the chart font, and not with the font that is otherwise defined for that element.</li>
  <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
  FILE_COMMENT containing the declaration &quot;UTF-8&quot; or any casemap variation
  thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
  detecting the charset declaration (typically: &quot;; charset=utf-8&quot;) the
  remainder of that comment is ignored.
- If the file is not encoded as
- UTF-8, the character repertoire for running text (anything
- other than CHAR) is effectively restricted to the repertoire of Latin-1.
- Otherwise, characters in the range U+0020..U+02FF
- are allowed in STRING or LABEL elements, and elements derived from them.</li>
+ When declared as UTF-8, the NamesList format will support any Unicode character
+ in STRING or LABEL elements, but see further implications below.</li>
+ <li>In a STRING or LABEL element, a Unicode character outside the range
+ U+0020..Limit is displayed with a glyph matching
+ the chart font, and not with the font that is otherwise defined for that element.
+ The Limit value is user defined.
+ For production of the Unicode Standard from Version 16.0.0 and later the Limit
+ value is set to U+1EFF.
+ All code points less than the Limit value can be mapped onto a font selected for best
+ results in running text. However, any CHAR elements contained in an EXPAND_LINE
+ are exempt from this and are always displayed with a glyph matching the chart font.
+ The net effect is a workaround for the fact that the NamesList format does
+ not support style runs within any element that encompasses a single unit of flowed text.</li>
+ <li>When drafting STRING or LABEL elements, one should note that text containing
+ characters outside the range U+0020..Limit may result in a ransom note effect,
+ as the regular text font and charts fonts would be alternated. This is best avoided.</li>
  <li>The code chart layout program
  (<a href="https://www.unicode.org/unibook/">Unibook</a>)
  can accept files in several other formats. These include little-endian UTF-16,
@@ -610,9 +626,16 @@ COMMENT: &quot;(&quot; LABEL &quot;)&quot;
 </ul>
  <h2 id="Modifications"><a href="#Modifications">Modifications</a></h2>
 
+ <p><b>Version 16.0.0</b></p>
+ <ul>
+ <li>Reissued for Unicode 16.0.0</li>
+ <li>Reflect the wider range of possible values for the user defined Limit.</li>
+ <li>Added an explanation of the effect of the Limit value.</li>
+ </ul>
+
  <p><b>Version 15.1.0</b></p>
  <ul>
- <li>Reissued for Unicode 15.0.0.</li>
+ <li>Reissued for Unicode 15.1.0.</li>
  <li>Adjusted NAMELIST definition to account for positions of FILE_COMMENT.</li>
  <li>Added a note to the bullets in Section 2.1 to clarify priority of matching for
  some line types.</li>
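
Reviewer note: the two behaviours this update describes (detecting the UTF-8 charset declaration on the first line, and choosing between the running-text font and the chart font by the Limit value) can be sketched roughly as follows. This is an illustration only, not code from this repository or from Unibook. The names `declares_utf8`, `font_runs` and `LIMIT` are hypothetical, the sketch assumes a FILE_COMMENT starts with ";" as in the typical "; charset=utf-8" declaration, and it treats the range U+0020..Limit as inclusive at both ends.

```python
# Hypothetical sketch of two rules described in the diffed text above.
LIMIT = 0x1EFF  # value the text gives for production of Unicode 16.0.0 and later


def declares_utf8(first_line: str) -> bool:
    """Return True if the first line looks like a FILE_COMMENT declaring UTF-8.

    Assumption: a FILE_COMMENT begins with ';'. Any case variation of
    'UTF-8' is accepted; the rest of the comment is ignored.
    """
    return first_line.startswith(";") and "utf-8" in first_line.lower()


def font_runs(text: str, limit: int = LIMIT) -> list[tuple[str, str]]:
    """Split STRING/LABEL text into (font, substring) runs.

    Characters in U+0020..limit (assumed inclusive) use the running-text
    font; everything else falls back to the chart font.
    """
    runs: list[tuple[str, str]] = []
    for ch in text:
        font = "text" if 0x20 <= ord(ch) <= limit else "chart"
        if runs and runs[-1][0] == font:
            # Extend the current run while the font stays the same.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

A result with more than one run for a single label is the "ransom note effect" the new text warns about: the label alternates between the text font and the chart font.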