Diffstat (limited to 'data/unicode/NamesList.html')
-rw-r--r--data/unicode/NamesList.html776
1 files changed, 776 insertions, 0 deletions
diff --git a/data/unicode/NamesList.html b/data/unicode/NamesList.html
new file mode 100644
index 0000000..d6809e1
--- /dev/null
+++ b/data/unicode/NamesList.html
@@ -0,0 +1,776 @@
1<!doctype html>
2
3<html lang="en-us">
4
5<head>
6<meta charset="utf-8">
7<title>Unicode NamesList Format</title>
8<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
9<style>
10a.headernav {
11 font-size: 90%;
12}
13a.headernav:link {
14 color: white;
15}
16a.headernav:visited {
17 color: white;
18}
19a.headernav:active {
20 color: white;
21}
22a.headernav:hover {
23 color: #B0B0B0;
24}
25.pageheader {
26 margin-top: 0;
27 padding: 0 .5em 0 0;
28 display: flex;
29 flex-direction: row;
30 flex-wrap: nowrap;
31 justify-content: flex-start;
32 background-color: #5555FF;
33 color: white;
34 font-family: arial, geneva, sans-serif;
35 font-weight:bold;
36 align-items: center;
37 }
38.pageicon {
39 padding : 2px 4px 0 2px;
40 }
41.pagelogo {
42 height: 33px; width: 34px;
43 border: 0;
44 padding-bottom: 0px;
45 margin-bottom:-2px;
46 }
47.pagetitle {
48 font-size: 115%;
49 flex-grow: 4;
50 padding-left: 1em;
51 }
52.headernav { padding-top: 0px;
53 font-weight: bold;
54 font-size: 100%;
55 color: white; font-family: arial, geneva, sans-serif;
56 text-align:right;
57 }
58.graybar {
59 width: 100%;padding:0;
60 font-size:50%;
61 background-color: #EEEEFE;
62 }
63.pagecontents {
64 padding-left: 3.25em;
65 padding-right: 3.25em;
66 padding-bottom: 1.75em;
67 padding-top: 1em;
68}
69.pagebottom img
70{
71 padding-top: 2px;
72 width:216px;
73 height:50px;
74 border: 0;
75}
76.pagebottom
77{
78 margin: auto;
79 text-align:center;
80}
81</style>
82</head>
83
84<body>
85
86 <div class="pageheader">
87 <div class="pageicon"><a href="https://www.unicode.org/"><img class="pagelogo"
88 src="https://www.unicode.org/webscripts/logo60s2.gif"
89 alt="[Unicode]" ></a></div>
90
91 <div class="pagetitle"><a class="headernav"
92 href="https://www.unicode.org/ucd/">Unicode Character Database</a></div>
93
94 </div>
95 <div class="graybar">&nbsp;</div>
96
97<div class="body">
98 <h1>Unicode® NamesList File Format</h1>
99 <table class="simple">
100 <tbody>
101 <tr>
102 <td>Revision</td>
103 <td>15.1.0</td>
104 </tr>
105 <tr>
106 <td>Authors</td>
107 <td>Asmus Freytag, Ken Whistler</td>
108 </tr>
109 <tr>
110 <td>Date</td>
111 <td>2023-08-23</td>
112 </tr>
113 <tr>
114 <td>This Version</td>
115 <td >
116 <a href="https://www.unicode.org/Public/15.1.0/ucd/NamesList.html">
117 https://www.unicode.org/Public/15.1.0/ucd/NamesList.html</a></td>
118 </tr>
119 <tr>
120 <td>Previous Version</td>
121 <td>
122 <a href="https://www.unicode.org/Public/15.0.0/ucd/NamesList.html">
123 https://www.unicode.org/Public/15.0.0/ucd/NamesList.html</a></td>
124 </tr>
125 <tr>
126 <td>Latest Version</td>
127 <td><a href="https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
128 </tr>
129 </tbody>
130 </table>
131 <p>&nbsp;</p>
132 <h3><i>Summary</i></h3>
133 <blockquote>
134 <p>This file describes the format and contents of NamesList.txt.</p>
135 </blockquote>
136 <h3><i>Status</i></h3>
137 <blockquote>
138 <p><i>This file and the files described herein are part of the <a href="https://www.unicode.org/ucd/">Unicode
139 Character Database</a> (UCD). The Unicode <a href="https://www.unicode.org/terms_of_use.html">
140 Terms of Use</a> apply.</i></p>
141 </blockquote>
142 <hr style="width:50%">
143
144<h2 id="Introduction">1.0 <a href="#Introduction">Introduction</a></h2>
145
146<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
147text file used to drive the layout of the character code charts in the Unicode
148Standard. The information in this file is a combination of several fields from
149the UnicodeData.txt and Blocks.txt files, together with additional annotations
150for many characters.</p>
151<p>This document describes the syntax rules for the file
152format, but also gives brief information on how each construct is rendered
153when laid out for the code charts. Some of the syntax elements are used only in
154preparation of the drafts of the code charts and are not present in the final,
155released form of the NamesList.txt file.</p>
156
157<p>Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
1585.0. The syntax for marginal sidebar comments is utilized extensively in
159draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
160declaration in a comment at the head of the file were introduced after Unicode
1616.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
162in comments and aliases in the names list format was loosened from the prior
163limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</p>
164
165<p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
166 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
167 information in the name list file that is not needed (and in fact removed
168 during parsing) for the Unicode code charts.</p>
169
170<p>With access to the layout program (<a href="https://www.unicode.org/unibook/">Unibook</a>), it is a simple matter to
171create name lists for formatting working drafts or other documents containing
172proposed characters.</p>
173 <p>The content of the NamesList.txt file is optimized for code chart creation.
174 Some information that can be inferred by the reader from context has been
175 suppressed to make the code charts more readable. See the chapter on Code
176 Charts in the <a href="https://www.unicode.org/versions/latest">Unicode
177 Standard</a>.</p>
178
179<h3 id="Overview">1.1 <a href="#Overview">NamesList File Overview</a></h3>
180
181<p>The NamesList files are plain text files which, in their simplest form, look
182like this:</p>
183
184<p>@@&lt;tab&gt;0020&lt;tab&gt;BASIC LATIN&lt;tab&gt;007F<br>
185; this is a file comment (ignored)<br>
1860020&lt;tab&gt;SPACE<br>
1870021&lt;tab&gt;EXCLAMATION MARK<br>
1880022&lt;tab&gt;QUOTATION MARK<br>
189. . . <br>
190007F&lt;tab&gt;DELETE</p>
191
192<p>The semicolon (as first character), @ and &lt;tab&gt; characters are used
193by the file syntax and must be provided as shown. Hexadecimal digits must be
194in UPPERCASE. A double @@ introduces a block header, which gives the starting code point,
195the title, and the ending code point of the block, in the order shown.</p>
196
197<p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and
198their constituent syntax elements are needed.</p>
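<p>A minimal name list of this kind can be read with very little code. The following is a sketch only, not the parser used by Unibook: the function name <code>parse_minimal</code> and the regular expressions are assumptions made for illustration, and every line type other than BLOCKHEADER, NAME_LINE and FILE_COMMENT is simply skipped.</p>

```python
import re

# CHAR is 4 to 6 uppercase hexadecimal digits; TAB is one or more tab characters.
BLOCKHEADER = re.compile(r"^@@\t+([0-9A-F]{4,6})\t+([^\t]+)\t+([0-9A-F]{4,6})$")
NAME_LINE = re.compile(r"^([0-9A-F]{4,6})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) for a minimal name list.

    blocks is a list of (start, title, end) tuples.
    names maps each code point to the text after the tab.
    File comments and unrecognized lines are skipped (a simplification).
    """
    blocks, names = [], {}
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # FILE_COMMENT
        header = BLOCKHEADER.match(line)
        if header:
            start, title, end = header.groups()
            blocks.append((int(start, 16), title, int(end, 16)))
            continue
        name = NAME_LINE.match(line)
        if name:
            names[int(name.group(1), 16)] = name.group(2)
    return blocks, names


SAMPLE = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks, names = parse_minimal(SAMPLE)
```

<p>With the sample above, <code>blocks</code> holds one entry for BASIC LATIN and <code>names</code> holds the two listed characters.</p>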
199
200<p>The full syntax with all the options is provided in the following sections.</p>
201
202<h2 id="FileStructure">2.0 <a href="#FileStructure">NamesList File Structure</a></h2>
203
204<p>This section defines the overall file structure.</p>
205
206<pre><strong>NAMELIST: FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*</strong>
207
208<strong>TITLE_PAGE: TITLE
209 | TITLE_PAGE SUBTITLE
210 | TITLE_PAGE SUBHEADER
211 | TITLE_PAGE IGNORED_LINE
212 | TITLE_PAGE EMPTY_LINE
213 | TITLE_PAGE NOTICE_LINE
214 | TITLE_PAGE COMMENT_LINE
215 | TITLE_PAGE PAGEBREAK
216 | TITLE_PAGE FILE_COMMENT
217
218
219EXTENDED_BLOCK: BLOCK
220 | BLOCK SUMMARY
221
222
223BLOCK: BLOCKHEADER
224 | BLOCKHEADER INDEX_TAB
225 | BLOCK CHAR_ENTRY
226 | BLOCK SUBHEADER
227 | BLOCK NOTICE_LINE
228 | BLOCK EMPTY_LINE
229 | BLOCK IGNORED_LINE
230 | BLOCK SIDEBAR_LINE
231 | BLOCK PAGEBREAK
232 | BLOCK FILE_COMMENT
233 | BLOCK CROSS_REF
234
235
236CHAR_ENTRY: NAME_LINE | RESERVED_LINE
237 | CHAR_ENTRY ALIAS_LINE
238 | CHAR_ENTRY FORMALALIAS_LINE
239 | CHAR_ENTRY COMMENT_LINE
240 | CHAR_ENTRY CROSS_REF
241 | CHAR_ENTRY DECOMPOSITION
242 | CHAR_ENTRY COMPAT_MAPPING
243 | CHAR_ENTRY IGNORED_LINE
244 | CHAR_ENTRY EMPTY_LINE
245 | CHAR_ENTRY NOTICE_LINE
246 | CHAR_ENTRY FILE_COMMENT
247 | CHAR_ENTRY VARIATION_LINE</strong>
248</pre>
249
250<p>In other words:</p>
251<p>
252 Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
253<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
254 EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
255<ul>
256 <li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS_LINE and FORMALALIAS_LINE lines
257 occurring before the first block header are treated as if they were
258 COMMENT_LINEs.</li>
259</ul>
260<p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted
261 sequence of the following lines may occur (in any order and repeated as often
262 as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
263 EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
264<ul>
265 <li>The conventional order of elements in a char entry (NAME_LINE,
266 FORMALALIAS_LINE, ALIAS_LINE, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally
267 ending in either DECOMPOSITION or COMPAT_MAPPING) is not enforced by the layout program
268 (<a href="https://www.unicode.org/unibook/">Unibook</a>).</li>
269</ul>
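<p>The way annotation lines attach to the preceding NAME_LINE or RESERVED_LINE can be sketched as follows. This is an illustration under simplifying assumptions, not the behavior of Unibook: the names <code>group_entries</code> and <code>MARKERS</code> are hypothetical, NOTICE_LINE and FILE_COMMENT lines inside an entry are skipped, and any other line starting with @ is taken to end the current entry.</p>

```python
import re

ENTRY_START = re.compile(r"([0-9A-F]{4,6})\t+(.*)")
# A CROSS_REF marker must be followed by "(" or by a CHAR.
CROSS_REF = re.compile(r"x (?:\(|[0-9A-F]{4,6}(?: |$))")
MARKERS = {
    "=": "alias", "%": "formal_alias", "x": "cross_ref", "~": "variation",
    ":": "decomposition", "#": "compat_mapping", "*": "comment",
}


def group_entries(lines):
    """Group tab-indented annotation lines under the entry they follow."""
    entries, current = [], None
    for line in lines:
        start = ENTRY_START.fullmatch(line)
        if start:
            current = {"code": int(start.group(1), 16),
                       "name": start.group(2), "annotations": []}
            entries.append(current)
        elif current is not None and line.startswith("\t"):
            body = line.lstrip("\t")
            if body.startswith(";"):
                continue  # IGNORED_LINE
            kind = MARKERS.get(body[:1]) if body[1:2] == " " else None
            if kind == "cross_ref" and not CROSS_REF.match(body):
                kind = None  # e.g. a comment that merely starts with "x "
            text = body[2:] if kind else body
            current["annotations"].append((kind or "comment", text))
        elif line.startswith("@") and not line.startswith("@+"):
            current = None  # headers and page breaks end the entry
    return entries


SAMPLE_LINES = [
    "@@\t0020\tBASIC LATIN\t007F",
    "0021\tEXCLAMATION MARK",
    "\t= factorial",
    "\t= bang",
    "\tx (inverted exclamation mark - 00A1)",
    "00A0\tNO-BREAK SPACE",
    "\t# <noBreak> 0020",
]
entries = group_entries(SAMPLE_LINES)
```

<p>The order of annotations within an entry is preserved as found, which matches the statement that the conventional order is not enforced.</p>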
270<p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
271 FILE_COMMENT, none of these lines may
272 occur in any other place.</p>
273<ul>
274 <li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title
275 or is part of a CHAR_ENTRY.</li>
276 </ul>
277<p>A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY.
278 A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
279 appear after any block header.</p>
280<p>If the first line of a file is a file comment, it may contain a UTF-8
281 charset declaration (see below). Alternatively, or in addition, a BOM may be
282 present at the very beginning of the file, forcing the encoding to be
283 interpreted as UTF-16 (little-endian only) or UTF-8. When
284 declared as UTF-8, the names list format will support use of characters in
285 the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
286 the supported repertoire is limited to Latin-1, and attempted use of characters outside
287 the Latin-1 range will result in data corruption.</p>
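<p>The encoding rules in the preceding paragraph could be applied to the raw bytes of a file roughly as shown below. This is a sketch under stated assumptions: <code>detect_encoding</code> is a hypothetical helper, the return values are Python codec names, and the check for the declaration is a plain case-insensitive search for "utf-8" in the first line.</p>

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess the encoding of a names list from its leading bytes.

    A BOM takes effect first; otherwise a first-line file comment that
    mentions "UTF-8" in any casing selects UTF-8; otherwise Latin-1.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # this codec strips the UTF-8 BOM when decoding
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"      # this codec reads the BOM to pick the byte order
    first_line = data.splitlines()[0] if data else b""
    if first_line.startswith(b";") and b"utf-8" in first_line.lower():
        return "utf-8"
    return "latin-1"


def decode_names_list(data: bytes) -> str:
    """Decode the file contents using the detected encoding."""
    return data.decode(detect_encoding(data))
```

<p>For example, <code>decode_names_list(b"; charset=UTF-8\n0020\tSPACE\n")</code> decodes the bytes as UTF-8 because of the declaration in the first-line comment.</p>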
288<p>Several of these elements, while part of the formal definition of the
289 file format, do not occur in final published versions of
290 NamesList.txt in the <a href="https://www.unicode.org/Public/UCD/latest/">UCD</a>.</p>
291
292<h4>Blocks followed by Summaries</h4>
293<p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
294<pre><strong>
295SUMMARY: ALTGLYPH_SUMMARY
296 | VARIATION_SUMMARY
297 | ALTGLYPH_SUMMARY VARIATION_SUMMARY
298 | MIXED_SUMMARY
299
300ALTGLYPH_SUMMARY: ALTGLYPH_SUBHEADER
301 | ALTGLYPH_SUMMARY SUMMARY_LINE
302
303VARIATION_SUMMARY: VARIATION_SUBHEADER
304 | VARIATION_SUMMARY SUMMARY_LINE
305
306MIXED_SUMMARY: MIXED_SUBHEADER
307 | MIXED_SUMMARY SUMMARY_LINE
308
309SUMMARY_LINE: SUBHEADER
310 | NOTICE_LINE
311 | FILE_COMMENT
312 | EMPTY_LINE</strong>
313</pre>
314
315<p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
316of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
317preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
318follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
319interspersed between items in the summary.</p>
320
321<p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER is
322omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
323as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>
324
325<p>Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to
326supply specific contents for these summary titles and to allow additional
327information to be added via SUBHEADER and NOTICE_LINE elements. The final published version of the Unicode names list
328is machine generated and will always explicitly provide any summary subheaders.</p>
329
330<h3 id="FileElements">2.1 <a href="#FileElements">NamesList File Elements</a></h3>
331
332<p>This section provides the details of the syntax for the individual elements.</p>
333
334<pre><strong>ELEMENT SYNTAX</strong> // How rendered
335
336<strong>NAME_LINE: CHAR TAB NAME LF</strong>
337 // The CHAR and the corresponding image are echoed,
338 // followed by the name as given in NAME
339
340<strong> | CHAR TAB &quot;&lt;&quot; LCNAME &quot;&gt;&quot; LF</strong>
341 // Control and noncharacters use this form of
342 // lowercase, bracketed pseudo character name
343
344<strong> | CHAR TAB NAME SP COMMENT LF</strong>
345 // Names may have a comment, which is stripped off
346 // unless the file is parsed for an ISO style list
347
348<strong> | CHAR TAB &quot;&lt;&quot; LCNAME &quot;&gt;&quot; SP COMMENT LF</strong>
349 // Control and noncharacters may also have comments
350
351<strong>RESERVED_LINE: CHAR TAB &quot;&lt;reserved&gt;&quot; LF</strong>
352 // The CHAR is echoed followed by an icon for the
353 // reserved character and a fixed string e.g. &quot;&lt;reserved&gt;&quot;
354
355<strong>COMMENT_LINE: TAB &quot;*&quot; SP EXPAND_LINE</strong>
356 // * is replaced by BULLET, output line as comment
357
358<strong> | TAB EXPAND_LINE</strong>
359 // Output line as comment
360
361<strong>ALIAS_LINE: TAB &quot;=&quot; SP LINE</strong>
362 // Replace = by itself, output line as alias
363
364<strong>FORMALALIAS_LINE:
365 TAB &quot;%&quot; SP NAME LF</strong>
366 // Replace % by U+203B, output line as formal alias
367
368<strong>CROSS_REF: TAB &quot;x&quot; SP CHAR SP LCNAME LF
369 | TAB &quot;x&quot; SP CHAR SP &quot;&lt;&quot; LCNAME &quot;&gt;&quot; LF</strong>
370 // x is replaced by a right arrow
371
372<strong> | TAB &quot;x&quot; SP &quot;(&quot; LCNAME SP &quot;-&quot; SP CHAR &quot;)&quot; LF
373 | TAB &quot;x&quot; SP &quot;(&quot; &quot;&lt;&quot; LCNAME &quot;&gt;&quot; SP &quot;-&quot; SP CHAR &quot;)&quot; LF</strong>
374 // x is replaced by a right arrow;
375 // (second type as used for control and noncharacters)
376
377 // In the forms with parentheses the &quot;(&quot;,&quot;-&quot; and &quot;)&quot; are removed
378 // and the order of CHAR and LCNAME is reversed;
379 // i.e. all inputs result in the same order of output
380
381<strong> | TAB &quot;x&quot; SP CHAR LF</strong>
382 // x is replaced by a right arrow
383 // (this type is the only one without LCNAME
384 // and is used for ideographs)
385
386<strong>VARIATION_LINE: TAB &quot;~&quot; SP CHAR VARSEL SP LABEL LF
387 | TAB &quot;~&quot; SP CHAR VARSEL SP LABEL &quot;(&quot; LCTAG &quot;)&quot; LF</strong>
388 // output standardized variation sequence or simply the char code in case of alternate
389 // glyphs, followed by the alternate glyph or variation glyph and the label and context
390
391<strong>FILE_COMMENT: &quot;;&quot; LINE</strong>
392
393<strong>EMPTY_LINE: LF</strong>
394 // Empty and ignored lines as well as
395 // file comments are ignored
396
397<strong>IGNORED_LINE: TAB &quot;;&quot; LINE</strong>
398 // Ignore LINE
399
400<strong>SIDEBAR_LINE: &quot;;;&quot; LINE</strong>
401 // Output LINE as marginal note
402
403<strong>DECOMPOSITION: TAB &quot;:&quot; SP EXPAND_LINE
404 | TAB &quot;:&quot; SP &quot;&lt;&quot; TAG &quot;&gt;&quot; SP EXPAND_LINE</strong>
405 // Replace ':' by EQUIV, expand line into decomposition
406 // The &lt;tag&gt; gives optional information,
407 // e.g., about composition exclusion.
408 // by convention the tag has initial lowercase
409
410<strong>COMPAT_MAPPING: TAB &quot;#&quot; SP EXPAND_LINE
411 | TAB &quot;#&quot; SP &quot;&lt;&quot; TAG &quot;&gt;&quot; SP EXPAND_LINE</strong>
412 // Replace '#' by APPROX, output line as mapping
413 // The &lt;tag&gt; is the optional compatibility decomposition tag.
414 // by convention the tag has initial lowercase
415
416<strong>NOTICE_LINE: &quot;@+&quot; TAB LINE</strong>
417 // Output LINE as notice
418
419<strong> | &quot;@+&quot; TAB &quot;*&quot; SP LINE</strong>
420 // Output LINE as notice
421 // &quot;*&quot; expands to a bullet character
422 // Notices following a character code apply to the
423 // character and are indented. Notices not following
424 // a character code apply to the page/block/column
425 // and are italicized, but not indented
426
427<strong>TITLE: &quot;@@@&quot; TAB LINE</strong>
428 // Output LINE as text
429 // Title is used in page headers
430
431<strong>SUBTITLE: &quot;@@@+&quot; TAB LINE</strong>
432 // Output LINE as subtitle
433
434<strong>SUBHEADER: &quot;@&quot; TAB LINE</strong>
435 // Output LINE as column header
436
437<strong>VARIATION_SUBHEADER:</strong> <strong>&quot;@~&quot; TAB LINE</strong>
438 // Output LINE as column header (summary subheader)
439 <strong>| &quot;@~&quot; LF</strong>
440 // Output a default standard variation sequences summary subheader
441 <strong>| &quot;@~&quot; TAB &quot;!&quot; LF</strong>
442 // Suppress output of a default standard variant sequences summary subheader
443 // and disable display of summary
444 <strong>| &quot;@~&quot; TAB &quot;!&quot; VARSEL_LIST LF</strong>
445 <strong>| &quot;@~&quot; TAB &quot;!&quot; VARSEL_LIST LINE</strong>
446 // Output a standard summary subheader, using default or LINE respectively
447 // Suppress any std variation sequences using selectors from the list
448
449<strong>ALTGLYPH_SUBHEADER:</strong> <strong>&quot;@@~&quot; TAB LINE</strong>
450 // Output LINE as column header (summary subheader)
451 <strong>| &quot;@@~&quot; LF</strong>
452 // Output a default alternate glyph summary subheader
453 <strong>| &quot;@@~&quot; TAB &quot;!&quot; LF</strong>
454 // Suppress output of a default alternate glyph summary subheader
455 // and disable display of summary
456
457<strong>MIXED_SUBHEADER: </strong><strong>&quot;@@@~&quot; TAB LINE</strong>
458 // Output LINE as column header (summary subheader)
459 <strong>| &quot;@@@~&quot; LF</strong>
460 // Output a default combined variation and alternate glyph summary subheader
461 <strong>| &quot;@@@~&quot; TAB &quot;!&quot; LF</strong>
462 // Suppress output of a default combined summary subheader
463 // and disable display of summary
464 <strong>| &quot;@@@~&quot; TAB &quot;!&quot; VARSEL_LIST LF</strong>
465 <strong>| &quot;@@@~&quot; TAB &quot;!&quot; VARSEL_LIST LINE</strong>
466 // Output a combined summary subheader, using default or LINE respectively
467 // Suppress any std variation sequences using selectors from the list
468
469<strong>BLOCKHEADER: &quot;@@&quot; TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
470 // Cause a page break and optional
471 // blank page, then output one or more charts
472 // followed by the list of character names.
473 // Use BLOCKSTART and BLOCKEND to define
474 // what characters belong to a block.
475 // Use BLOCKNAME in page and table headers
476
477<strong>BLOCKNAME: LABEL
478 | LABEL SP &quot;(&quot; LABEL &quot;)&quot;</strong>
479 // If an alternate label is present it replaces
480 // the BLOCKNAME when an ISO-style names list is
481 // laid out; it is ignored in the Unicode charts
482
483<strong>BLOCKSTART: CHAR</strong> // First character position in block
484<strong>BLOCKEND: CHAR</strong> // Last character position in block
485<strong>PAGEBREAK: &quot;@@&quot;</strong> // Insert a (column) break
486<strong>INDEX_TAB: &quot;@@+&quot;</strong> // Start a new index tab at latest BLOCKSTART
487
488<strong>EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
489 // Instances of CHAR (see Notes) are replaced by
490 // CHAR NBSP x NBSP where x is the single Unicode
491 // character corresponding to CHAR.
492 // If character is combining, it is replaced with
493 // CHAR NBSP &lt;circ&gt; x NBSP where &lt;circ&gt; is the
494 // dotted circle
495</pre>
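<p>The core of the EXPAND_LINE rule, replacing each CHAR by the code, a no-break space, the character itself and another no-break space, might be sketched as follows. This is a deliberately reduced illustration: <code>expand_chars</code> is a hypothetical name, a character is assumed to be combining when its General_Category starts with M, a backslash only blocks expansion (it is not removed here), and the handling of quotes and of standards names described in Section 2.2 is left out.</p>

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"

# A CHAR that is not part of a longer word and not preceded by a backslash.
CHAR_RE = re.compile(r"(?<![0-9A-Za-z\\])[0-9A-F]{4,6}(?![0-9A-Za-z])")


def expand_chars(line):
    """Replace each CHAR in line by CHAR NBSP x NBSP."""
    def replace(match):
        code = match.group(0)
        value = int(code, 16)
        if value > 0x10FFFF:
            return code  # not a valid code point; leave untouched
        ch = chr(value)
        if unicodedata.category(ch).startswith("M"):
            ch = DOTTED_CIRCLE + ch  # show combining marks on a dotted circle
        return code + NBSP + ch + NBSP
    return CHAR_RE.sub(replace, line)
```

<p>Because any free-standing run of four to six uppercase hexadecimal digits is treated as a CHAR, a sketch like this also expands four-digit years, which is why the notes in Section 2.2 recommend escaping such digits or using a NOTICE_LINE.</p>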
496
497
498 <b>Notes:</b><ul>
499 <li>Blocks must be aligned on a 16-code-point boundary and contain an integer
500 multiple of 16-code-point columns. The exception to that rule is for blocks of
501 ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks
502 must correspond to the last assigned character, and not the actual end of the block.</li>
503 <li>Blocks must be non-overlapping and in ascending order. NAME_LINEs
504 must be in ascending order and follow the block header for the block to
505 which they belong. </li>
506 <li>Reserved entries are optional, and will normally be supplied automatically. They are
507 required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
508 </li>
509 <li>An empty alternative glyph summary subheader expression will result in the default header &quot;Selected Alternative Glyphs&quot;.</li>
510 <li>An empty standard variation subheader expression will result in the default header &quot;Standardized Variation Sequences&quot;.</li>
511 <li>A VARSEL_LIST may only contain code points for standard variation selectors (including script-specific ones).</li>
512 <li>When displaying a VARIATION_LINE for alternate glyphs, the &quot;ALTn&quot; selector is not displayed. </li>
513 <li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE it is replaced by the glyph for U+2591 LIGHT SHADE.</li>
514 <li>Because a LINE or an EXPAND_LINE can itself start with a special character followed
515 by a SP or LF, an &quot;unmarked&quot; COMMENT_LINE should match the input in lower priority than line
516 types that require a special character or have a more restrictive set of characters than EXPAND_LINE.
517 Similarly, a SUBHEADER containing TAB &quot;!&quot; LF should match with a higher priority than those
518 where the TAB is followed by a LINE.</li>
519 </ul>
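<p>The matching priority described in the last note can be made concrete with an ordered table of patterns in which the first match wins and the unmarked COMMENT_LINE comes last. The sketch below is one possible arrangement, not the logic of any existing parser: the names <code>LINE_TYPES</code> and <code>classify</code> are assumptions, and the patterns only inspect the start of each line.</p>

```python
import re

CHAR = r"[0-9A-F]{4,6}"

# Ordered from most to least specific; the first matching pattern wins.
LINE_TYPES = [
    ("EMPTY_LINE", r"$"),
    ("SIDEBAR_LINE", r";;"),
    ("FILE_COMMENT", r";"),
    ("MIXED_SUBHEADER", r"@@@~"),
    ("SUBTITLE", r"@@@\+\t"),
    ("TITLE", r"@@@\t"),
    ("ALTGLYPH_SUBHEADER", r"@@~"),
    ("INDEX_TAB", r"@@\+$"),
    ("BLOCKHEADER", r"@@\t+" + CHAR + r"\t"),
    ("PAGEBREAK", r"@@$"),
    ("VARIATION_SUBHEADER", r"@~"),
    ("NOTICE_LINE", r"@\+\t"),
    ("SUBHEADER", r"@\t"),
    ("RESERVED_LINE", CHAR + r"\t+<reserved>$"),
    ("NAME_LINE", CHAR + r"\t"),
    ("IGNORED_LINE", r"\t+;"),
    ("ALIAS_LINE", r"\t+= "),
    ("FORMALALIAS_LINE", r"\t+% "),
    ("CROSS_REF", r"\t+x (?:\(|" + CHAR + r"(?: |$))"),
    ("VARIATION_LINE", r"\t+~ " + CHAR),
    ("DECOMPOSITION", r"\t+: "),
    ("COMPAT_MAPPING", r"\t+# "),
    ("COMMENT_LINE", r"\t"),  # lowest priority: any other tab-indented line
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in LINE_TYPES]


def classify(line):
    """Return the name of the first line type whose pattern matches."""
    line = line.rstrip("\r\n")
    for name, regex in COMPILED:
        if regex.match(line):
            return name
    return "UNKNOWN"
```

<p>With this ordering, a tab-indented line such as "x marks the spot" falls through to COMMENT_LINE because the text after "x " is neither a CHAR nor an opening parenthesis.</p>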
520
521
522<h3 id="FilePrimitives">2.2 <a href="#FilePrimitives">NamesList File Primitives</a></h3>
523
524<p>The following are the primitives and terminals for the NamesList syntax.</p>
525
526<pre><strong>LINE</strong>: <strong>STRING LF
527COMMENT: &quot;(&quot; LABEL &quot;)&quot;
528 | &quot;(&quot; LABEL &quot;)&quot; SP &quot;*&quot;
529 | &quot;*&quot;</strong>
530
531<strong>NAME</strong>: &lt;sequence of uppercase ASCII letters, digits, space and hyphen&gt;
532<strong>LCNAME</strong>: &lt;sequence of lowercase ASCII letters, digits, space and hyphen&gt; <strong> (&quot;-&quot; CHAR)?</strong>
533
534<strong>TAG</strong>: &lt;sequence of ASCII letters&gt;
535<strong>LCTAG</strong>: &lt;sequence of lowercase ASCII letters&gt;
536<strong>STRING</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls&gt;
537<strong>LABEL</strong>: &lt;sequence of characters in the range U+0020..U+02FF, except controls, &quot;(&quot; or &quot;)&quot;&gt;
538<strong>VARSEL</strong>: <strong>CHAR
539 | &quot;ALT&quot; ( &quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot; )</strong>
540<strong>VARSEL_LIST</strong>: <strong>&quot;{&quot; CHAR_LIST &quot;}&quot;</strong>
541<strong>CHAR_LIST</strong>: <strong>CHAR
542 | CHAR_LIST SP CHAR</strong>
543<strong>CHAR</strong>: <strong>X X X X</strong>
544 <strong>| X X X X X </strong>
545 <strong>| X X X X X X </strong>
546<strong>X</strong>: <strong>&quot;0&quot;|&quot;1&quot;|&quot;2&quot;|&quot;3&quot;|&quot;4&quot;|&quot;5&quot;|&quot;6&quot;|&quot;7&quot;|&quot;8&quot;|&quot;9&quot;|&quot;A&quot;|&quot;B&quot;|&quot;C&quot;|&quot;D&quot;|&quot;E&quot;|&quot;F&quot;</strong>
547<strong>ESC_CHAR</strong>: <strong>ESC CHAR</strong>
548<strong>ESC</strong>: <strong>&quot;\&quot;</strong>
549 // Special semantics of backslash (\) are supported
550 // only in EXPAND_LINE.
551<strong>TAB</strong>: &lt;sequence of one or more ASCII tab characters 0x09&gt;
552<strong>SP</strong>: &lt;ASCII 20&gt;
553<strong>LF</strong>: &lt;any sequence of a single ASCII 0A or 0D, or both&gt;
554</pre>
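<p>Several of these primitives translate directly into regular expressions. The following sketch shows one such translation: the variable and function names are chosen for illustration only, and the patterns follow the productions above without the additional constraints given in the notes.</p>

```python
import re

HEX = "[0-9A-F]"
CHAR = HEX + "{4,6}"                     # CHAR: 4, 5 or 6 uppercase hex digits
VARSEL = "(?:" + CHAR + "|ALT[1-9])"     # VARSEL: CHAR | "ALT" followed by 1..9
CHAR_LIST = CHAR + "(?: " + CHAR + ")*"  # CHAR_LIST: CHARs separated by SP
VARSEL_LIST = r"\{" + CHAR_LIST + r"\}"  # VARSEL_LIST: "{" CHAR_LIST "}"
TAG = "[A-Za-z]+"                        # TAG: ASCII letters
LCTAG = "[a-z]+"                         # LCTAG: lowercase ASCII letters


def is_char(token):
    """True if token is a well-formed CHAR."""
    return re.fullmatch(CHAR, token) is not None


def parse_varsel_list(token):
    """Return the code points in a VARSEL_LIST, or None if it does not match."""
    if re.fullmatch(VARSEL_LIST, token) is None:
        return None
    return [int(code, 16) for code in token[1:-1].split(" ")]
```

<p>For instance, <code>parse_varsel_list("{FE00 FE01}")</code> yields the two code points, while a comma-separated list is rejected.</p>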
555
556<p><b>Notes:</b></p>
557<ul>
558 <li>Multiple or leading spaces, multiple or leading hyphens, as well as
559 word-initial digits in NAMEs or LCNAMEs are illegal.</li>
560 <li>The French version of the names list uses French rules, which allow
561 apostrophe and accented letters in character names.</li>
562 <li>When names containing code points are lowercased to make them LCNAMEs,
563 the code point values remain uppercase. Such code points by convention
564 follow a hyphen and are the last element in the name.</li>
565 <li>Special limited lookbehind logic prevents a four-digit number for a standard, such
566 as ISO 9999, from being misinterpreted as ISO CHAR. Currently recognized are
567 &quot;ISO&quot;, &quot;DIN&quot;, &quot;IEC&quot; and &quot;S X&quot; as well as &quot;S C&quot; for the JIS X and JIS C series of
568 standards. (In addition &quot;EEE&quot; and &quot;S X&quot; are recognized for use with IEEE and KSC X standards. For the GB series of standards, &quot; GB&quot; is defined to prevent conversion to CHAR, but has no effect at the start of a line). For other standards, or for four-digit years in a comment, use a
569 NOTICE_LINE instead, which prevents expansion, or use &quot;\&quot; to escape the digits.</li>
570 <li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
571 Smart apostrophes are supported, but nested quotes are not.
572 Single quotes can only be applied around a single word.</li>
573 <li>A CHAR inside ' or &quot; is expanded, but only its glyph image is printed, the
574 code value is not echoed.</li>
575 <li>Inside an EXPAND_LINE, backslash is treated as an escape character that
576 removes the special meaning of any literal character and also prevents
577 the following digit sequence from being expanded. A backslash character in
578 isolation is never displayed. A sequence of two backslash characters results
579 in display of a single backslash, but has no effect on the interpretation
580 of following characters.</li>
581 <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
582 output.</li>
583 <li>In a STRING or LABEL, a Unicode character outside the range
584 U+0000..U+02FF is displayed as is, with a glyph matching
585 the chart font, and not with the font that is otherwise defined for that element.</li>
586 <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
587 FILE_COMMENT containing the declaration &quot;UTF-8&quot; or any casemap variation
588 thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
589 detecting the charset declaration (typically: &quot;; charset=utf-8&quot;) the
590 remainder of that comment is ignored.
591 If the file is not encoded as
592 UTF-8, the character repertoire for running text (anything
593 other than CHAR) is effectively restricted to the repertoire of Latin-1.
594 Otherwise, characters in the range U+0020..U+02FF
595 are allowed in STRING or LABEL elements, and elements derived from them.</li>
596 <li>The code chart layout program
597 (<a href="https://www.unicode.org/unibook/">Unibook</a>)
598 can accept files in several other formats. These include little-endian UTF-16,
599 prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
600 <li>While the format allows multiple &lt;tab&gt; characters, by convention the
601 actual number of tabs is always one or two, chosen to provide the best
602 layout of the plain text file.</li>
603 <li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous
604 spaces or tab characters; while these are errors in the files, they are not
605 being corrected, to retain stability of the published versions. Anyone
606 writing a parser for older versions of this file may need to be prepared to
607 handle such exceptions.</li>
608 <li>Lines are terminated by \r, \n, \r\n or \n\r. Repeated terminators imply empty lines, e.g. \r\r\n is treated as 2 lines, as is \r\n\r\n.</li>
609 <li>The final LF in the file must be present.</li>
610</ul>
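<p>Two of the notes above lend themselves to short checks: the rules for line terminators and the restrictions on NAME. The sketch below is an interpretation rather than a normative implementation. The function names are hypothetical, a "word" is assumed to be delimited by spaces only (so a code point after a hyphen, as in CJK UNIFIED IDEOGRAPH-4E00, is accepted), and the French rules are not covered.</p>

```python
import re

# Two-character terminators are tried first so that \r\n and \n\r count once.
LINE_END = re.compile(r"\r\n|\n\r|\r|\n")
NAME_CHARS = re.compile(r"[A-Z0-9 -]+")


def split_lines(text):
    """Split text into lines; repeated terminators produce empty lines."""
    parts = LINE_END.split(text)
    if parts and parts[-1] == "":
        parts.pop()  # nothing follows the final terminator
    return parts


def is_valid_name(name):
    """Check a NAME against the character set and the spacing rules."""
    if NAME_CHARS.fullmatch(name) is None:
        return False
    if name[0] in " -":
        return False  # leading space or hyphen
    if "  " in name or "--" in name:
        return False  # multiple spaces or hyphens
    # A digit may not start a space-delimited word.
    return not any(word[:1].isdigit() for word in name.split(" "))
```

<p>With these definitions, <code>split_lines("a\r\r\n")</code> returns two lines, the second of them empty, which agrees with the example given for repeated terminators.</p>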
611 <h2 id="Modifications"><a href="#Modifications">Modifications</a></h2>
612
613 <p><b>Version 15.1.0</b></p>
614 <ul>
615 <li>Reissued for Unicode 15.1.0.</li>
616 <li>Adjusted NAMELIST definition to account for positions of FILE_COMMENT.</li>
617 <li>Added a note to the bullets in Section 2.1 to clarify priority of matching for
618 some line types.</li>
619 <li>In Section 2.2, added a note clarifying the font handling for characters
620 outside the range U+0000..U+02FF occurring in STRING or LABEL elements.</li>
621 <li>Also in Section 2.2, updated the bullet about lookbehind logic
622 for identifying digit sequences that are part of identifiers for various standards,
623 to include the detection of IEEE, KSC X, and GB standards.</li>
624 <li>Added missing quotation marks around * in second expansion for
625 NOTICE_LINE.</li>
626 <li>Corrected and clarified the BNF statement of nameslist syntax.</li>
627 <li>Some literals had not been quoted, and some productions were missing the trailing LF.</li>
628 <li>The LF and LCNAME productions were clarified.</li>
629 <li>Updated to HTML5.</li>
630 </ul>
631 <p><b>Version 15.0.0</b></p>
632 <ul>
633 <li>Reissued for Unicode 15.0.0.</li>
634 </ul>
635 <p><b>Version 14.0.0</b></p>
636 <ul>
637 <li>Reissued for Unicode 14.0.0.</li>
638 <li>Corrected character name LIGHT SCREEN to LIGHT SHADE.</li>
639 </ul>
640 <p><b>Version 13.0.0</b></p>
641 <ul>
642 <li>Reissued for Unicode 13.0.0.</li>
643 <li>Added a second expansion for DECOMPOSITION, for possible future
644 use to designate specific subtypes of canonical decompositions
645 in the names list output.</li>
646 </ul>
647 <p><b>Version 12.1.0</b></p>
648 <ul>
649 <li>Reissued for Unicode 12.1.0.</li>
650 </ul>
651 <p><b>Version 12.0.0</b></p>
652 <ul>
653 <li>Reissued for Unicode 12.0.0.</li>
654 <li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
655 <li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
656 <li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
657 <li>Corrected the documentation regarding which elements allow use of characters
658 in the range U+0020..U+02FF.</li>
659 </ul>
660 <p><b>Version 11.0.0</b></p>
661 <ul>
662 <li>Reissued for Unicode 11.0.0.</li>
663 <li>Loosened the limitation on repertoire allowed in LINE and LABEL
664 elements to include characters outside Latin-1, in the range
665 U+0100..U+02FF.</li>
666 </ul>
667 <p><b>Version 10.0.0</b></p>
668 <ul>
669 <li>Reissued for Unicode 10.0.0.</li>
670 </ul>
671 <p><b>Version 9.0.0</b></p>
672 <ul>
673 <li>Reissued for Unicode 9.0.0.</li>
674 </ul>
675 <p><b>Version 8.0.0</b></p>
676 <ul>
677 <li>Reissued for Unicode 8.0.0.</li>
678 <li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
679 <li>Tweaked BNF and notes for variation summaries.</li>
680 </ul>
681 <p><b>Version 7.0.0</b></p>
682 <ul>
683 <li>Reissued for Unicode 7.0.0.</li>
684 </ul>
685 <p><b>Version 6.3.0</b></p>
686 <ul>
687 <li>Reissued for Unicode 6.3.0.</li>
688 </ul>
689 <p><b>Version 6.2.0</b></p>
690 <ul>
691 <li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
692 <li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
693 <li>Fixed some typographical errors and minor inconsistencies.</li>
694 <li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
695 <li>Edited and reformatted some notes for readability.</li>
696 <li>Documented the permitted presence of CROSS_REF outside character entries within blocks.
697 Such CROSS_REFs have been present in published names lists, but that information was missing in
698 the syntax description. For an example see the Currency Symbols block in the code charts.</li>
699 <li>Added description of UTF-8 charset declaration and file encoding.</li>
700 </ul>
701 <p><b>Version 6.1.0</b></p>
702 <ul>
703 <li>Removed constraint that LCTAG consist only of lowercase letters,
704 because of the existence of the &quot;noBreak&quot; tag.</li>
705 </ul>
706 <p><b>Version 6.0.0</b></p>
707 <ul>
708 <li>Added definitions for ESC_CHAR and ESC primitives.</li>
709 <li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
710 </ul>
711 <p><b>Version 5.2.0</b></p>
712 <ul>
713 <li>Better aligned the rules section with the actual published files and
714 behavior of existing parsers. This included fixing some obvious typos
715 and clarifying some notes as well as the following changes, which are
716 listed individually.</li>
717 <li>Replaced instances of &lt;tab&gt; by TAB throughout.</li>
718 <li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs
719 consisting entirely of &quot;*&quot;.</li>
720 <li>In CROSS_REF added the form without LCNAME, fixed the literal to the
721 correct lowercase &quot;x&quot; and noted that LCNAME may have &quot;&lt;&quot; and &quot;&gt;&quot; around
722 it in the data. Also added missing LF in the rules.</li>
723 <li>Removed a redundant rule for BLOCKHEADER.</li>
724 <li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction
725 on contents.</li>
726 <li>Extended the documentation of lookahead logic for CHAR.</li>
727 <li>Accounted for FILE_COMMENT in overall file structure.</li>
728 </ul>
729 <p><b>Version 5.1.0</b></p>
730 <ul>
731 <li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
732 <li>Provided additional information on allowable characters in names.</li>
733 <li>Added SIDEBAR_LINE.</li>
734 <li>Noted that CROSS_REF must contain a SP and CHAR, and that
735 COMPAT_MAPPING must contain a SP and may contain a &lt;tag&gt;</li>
736 <li>Noted that LCNAME may contain uppercase characters under
737 exceptional circumstances.</li>
738 <li>Relaxed the restriction on lines starting with #, :, %, x and = on
739 the TITLE_PAGE. These are now treated as comments.</li>
740 </ul>
741 <p><b>Version 5.0.0</b></p>
742 <ul>
743 <li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
744 <li>Fixed the list of lines that may appear before a BLOCKHEADER by
745 adding NOTICE_LINE.</li>
746 <li>Minor fixes to the wording of several syntax definitions.</li>
747 </ul>
748 <p><b>Version 4.0.0</b></p>
749 <ul>
750 <li>Fixed syntax to better reflect restrictions on characters
751 in character and block names.</li>
752 <li>Better document treatment of comments in block names, plus
753 French name rules.</li>
754 </ul>
755 <p><b>Version 3.2.0</b></p>
756 <ul>
757 <li>Fixed several broken links, added a left margin,
758 changed version numbering.</li>
759 </ul>
760 <p><b>Version 3.1.0 (2)</b></p>
761 <ul>
762 <li>Use of 4-6 digit hex notation is now supported.</li>
763 </ul>
764</div>
765
766 <div class="pagebottom">
767 <hr style="width:50%">
768 <a href="https://www.unicode.org/copyright.html">
769 <img src="https://www.unicode.org/img/hb_notice.gif"
770 alt="Access to Copyright and terms of use" ></a>
771 </div>
772
773</body>
774
775</html>
776