From 6013b2ded106521ee9cae6bd77dacbd5254ff763 Mon Sep 17 00:00:00 2001
From: Jose Colon Rodriguez
Date: Mon, 19 Feb 2024 09:11:56 -0400
Subject: Cleaned up directory structure

---
 data/unicode/NamesList.html | 776 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 776 insertions(+)
 create mode 100644 data/unicode/NamesList.html

diff --git a/data/unicode/NamesList.html b/data/unicode/NamesList.html
new file mode 100644
index 0000000..d6809e1
--- /dev/null
+++ b/data/unicode/NamesList.html
@@ -0,0 +1,776 @@
+ + + + + + +Unicode NamesList Format + + + + + + + +
 
+ +
+

Unicode® NamesList File Format

+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
Revision: 15.1.0
Authors: Asmus Freytag, Ken Whistler
Date: 2023-08-23
This Version: https://www.unicode.org/Public/15.1.0/ucd/NamesList.html
Previous Version: https://www.unicode.org/Public/15.0.0/ucd/NamesList.html
Latest Version: https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html
+

 

+

Summary

+
+

This file describes the format and contents of NamesList.txt.

+
+

Status

+
+

This file and the files described herein are part of the Unicode Character Database (UCD). The Unicode Terms of Use apply.

+
+
+ +

1.0 Introduction

+ +

The Unicode name list file NamesList.txt (also NamesList.lst) is a plain text file used to drive the layout of the character code charts in the Unicode Standard. The information in this file is a combination of several fields from the UnicodeData.txt and Blocks.txt files, together with additional annotations for many characters.

+

This document describes the syntax rules for the file format and briefly notes how each construct is rendered when laid out for the code charts. Some syntax elements are used only in preparing drafts of the code charts and are not present in the final, released form of the NamesList.txt file.

+ +

Over time, the syntax has been extended with new features. The syntax for formal aliases and index tabs was introduced with Unicode 5.0. The syntax for marginal sidebar comments is used extensively in draft versions of the NamesList.txt file. Support for UTF-8 encoded files, the syntax for the UTF-8 charset declaration in a comment at the head of the file, and the syntax for specifying variation sequences and alternate glyphs with their respective summaries were all introduced after Unicode 6.1.0 was published. As of Unicode 11.0, the repertoire allowed in comments and aliases was widened from the earlier U+0020..U+00FF to U+0020..U+02FF.

+ +

The same input file can be used to prepare drafts and final editions of ISO/IEC 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style required some information in the name list file that is not needed for the Unicode code charts (and is in fact removed during parsing).

+ +

With access to the layout program (Unibook), it is a simple matter to create name lists for formatting working drafts or other documents containing proposed characters.

+

The content of the NamesList.txt file is optimized for code chart creation. Some information that can be inferred by the reader from context has been suppressed to make the code charts more readable. See the chapter on Code Charts in the Unicode Standard.

+ +

1.1 NamesList File Overview

+ +

The NamesList files are plain text files which, in their simplest form, look like this:

+ +

@@<tab>0020<tab>BASIC LATIN<tab>007F
+; this is a file comment (ignored)
+0020<tab>SPACE
+0021<tab>EXCLAMATION MARK
+0022<tab>QUOTATION MARK
+. . .
+007F<tab>DELETE

+ +

The semicolon (as first character), @ and <tab> characters are used by the file syntax and must be provided as shown. Hexadecimal digits must be in UPPERCASE. A double @@ introduces a block header, with the title and the start and end codes of the block provided as shown.

+ +

For a minimal name list, only the NAME_LINE and BLOCKHEADER and their constituent syntax elements are needed.
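As a rough illustration of the minimal form, the following Python sketch reads only BLOCKHEADER and NAME_LINE elements and skips empty lines and file comments. The function name `parse_minimal` and the dictionary layout are assumptions made for this example; they are not part of the file format or of Unibook.

```python
import re

# CHAR is 4 to 6 uppercase hexadecimal digits, per the grammar in this document.
CHAR = r"[0-9A-F]{4,6}"
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+([^\t]+)\t+({CHAR})$")
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal(text):
    """Return a list of blocks; each block is a dict (hypothetical layout)."""
    blocks = []
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # empty lines and file comments are ignored
        header = BLOCKHEADER.match(line)
        if header:
            start, title, end = header.groups()
            blocks.append({"start": int(start, 16), "end": int(end, 16),
                           "title": title, "names": {}})
            continue
        name = NAME_LINE.match(line)
        if name and blocks:
            code, label = name.groups()
            blocks[-1]["names"][int(code, 16)] = label
    return blocks


sample = ("@@\t0020\tBASIC LATIN\t007F\n"
          "; this is a file comment (ignored)\n"
          "0020\tSPACE\n"
          "0021\tEXCLAMATION MARK\n")
blocks = parse_minimal(sample)
```

Any line that is neither a block header nor a name line is silently dropped here, which is only acceptable for a minimal name list.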

+ +

The full syntax with all the options is provided in the following sections.

+ +

2.0 NamesList File Structure

+ +

This section defines the overall file structure.

+ +
NAMELIST:     FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*
+
+TITLE_PAGE:   TITLE 
+		| TITLE_PAGE SUBTITLE 
+		| TITLE_PAGE SUBHEADER 
+		| TITLE_PAGE IGNORED_LINE 
+		| TITLE_PAGE EMPTY_LINE
+		| TITLE_PAGE NOTICE_LINE
+		| TITLE_PAGE COMMENT_LINE
+		| TITLE_PAGE PAGEBREAK 
+		| TITLE_PAGE FILE_COMMENT 
+
+
+EXTENDED_BLOCK:	BLOCK 
+		| BLOCK SUMMARY
+
+
+BLOCK:	BLOCKHEADER 
+		| BLOCKHEADER INDEX_TAB
+		| BLOCK CHAR_ENTRY 
+		| BLOCK SUBHEADER 
+		| BLOCK NOTICE_LINE 
+		| BLOCK EMPTY_LINE 
+		| BLOCK IGNORED_LINE
+		| BLOCK SIDEBAR_LINE
+		| BLOCK PAGEBREAK
+		| BLOCK FILE_COMMENT
+		| BLOCK CROSS_REF
+
+
+CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
+		| CHAR_ENTRY ALIAS_LINE
+		| CHAR_ENTRY FORMALALIAS_LINE
+		| CHAR_ENTRY COMMENT_LINE
+		| CHAR_ENTRY CROSS_REF
+		| CHAR_ENTRY DECOMPOSITION
+		| CHAR_ENTRY COMPAT_MAPPING
+		| CHAR_ENTRY IGNORED_LINE
+		| CHAR_ENTRY EMPTY_LINE
+		| CHAR_ENTRY NOTICE_LINE
+		| CHAR_ENTRY FILE_COMMENT 
+		| CHAR_ENTRY VARIATION_LINE
+
+ +
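The CHAR_ENTRY production above can be read as "a NAME_LINE or RESERVED_LINE, followed by any run of annotation lines". The Python sketch below groups already classified lines on that basis. It assumes each line has been labelled with its element name beforehand; the `(kind, text)` tuple shape and the function name `group_entries` are hypothetical.

```python
# Elements that open a CHAR_ENTRY and elements that may continue one,
# taken from the CHAR_ENTRY production in the grammar.
ENTRY_START = {"NAME_LINE", "RESERVED_LINE"}
ENTRY_CONTINUATION = {
    "ALIAS_LINE", "FORMALALIAS_LINE", "COMMENT_LINE", "CROSS_REF",
    "DECOMPOSITION", "COMPAT_MAPPING", "IGNORED_LINE", "EMPTY_LINE",
    "NOTICE_LINE", "FILE_COMMENT", "VARIATION_LINE",
}


def group_entries(classified):
    """Group (kind, text) pairs into character entries."""
    entries, current = [], None
    for kind, text in classified:
        if kind in ENTRY_START:
            current = [(kind, text)]
            entries.append(current)
        elif kind in ENTRY_CONTINUATION and current is not None:
            current.append((kind, text))
        else:
            current = None  # any other element ends the current entry
    return entries


entries = group_entries([
    ("BLOCKHEADER", "@@\t0020\tBASIC LATIN\t007F"),
    ("NAME_LINE", "0020\tSPACE"),
    ("ALIAS_LINE", "\t= alias text"),
    ("SUBHEADER", "@\tsubheader text"),
    ("CROSS_REF", "\tx 0041"),
    ("NAME_LINE", "0021\tEXCLAMATION MARK"),
])
```

In this sketch a CROSS_REF that follows a SUBHEADER is treated as belonging to the block rather than to a character entry, matching the BLOCK production.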

In other words:

+

Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.

+

Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE, EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.

+ +

Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE, EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.

+ +

Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and FILE_COMMENT, none of these lines may occur in any other place.

+ +

A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY. A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may appear after any block header.

+

If the first line of a file is a file comment, it may contain a UTF-8 charset declaration (see below). Alternatively, or in addition, a BOM may be present at the very beginning of the file, forcing the encoding to be interpreted as UTF-16 (little-endian only) or UTF-8. When declared as UTF-8, the names list format will support use of characters in the range U+0020..U+02FF in LINE and LABEL elements. Otherwise, the supported repertoire is limited to Latin-1, and attempted use of characters outside the Latin-1 range will result in data corruption.

+

Several of these elements, while part of the formal definition of the file format, do not occur in final published versions of NamesList.txt in the UCD.
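The encoding rules described above (BOM first, then a charset declaration in a leading file comment, otherwise Latin-1) could be sketched in Python as follows. The exact spelling of the charset declaration is not quoted in this section, so the `charset=UTF-8` pattern used here is an assumption, as is the function name `detect_encoding`.

```python
import codecs
import re

# Assumed form of the declaration inside the first file comment.
CHARSET_DECL = re.compile(rb"^;.*charset\s*=\s*utf-8", re.IGNORECASE)


def detect_encoding(data: bytes) -> str:
    """Return a Python codec name for the raw bytes of a names list file."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes as UTF-8 and strips the BOM
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"     # the BOM tells the codec to read little-endian
    first_line = re.split(rb"[\r\n]", data, maxsplit=1)[0]
    if CHARSET_DECL.match(first_line):
        return "utf-8"
    return "latin-1"        # fallback repertoire when nothing is declared


raw = b"; charset=UTF-8\n0020\tSPACE\n"
text = raw.decode(detect_encoding(raw))
```

A big-endian UTF-16 BOM is deliberately not handled, since the text above says only little-endian UTF-16 is supported.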

+ +

Blocks followed by Summaries

+

A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:

+

+SUMMARY:   ALTGLYPH_SUMMARY
+		| VARIATION_SUMMARY
+		| ALTGLYPH_SUMMARY VARIATION_SUMMARY
+		| MIXED_SUMMARY
+
+ALTGLYPH_SUMMARY:   ALTGLYPH_SUBHEADER
+		| ALTGLYPH_SUMMARY SUMMARY_LINE
+
+VARIATION_SUMMARY:   VARIATION_SUBHEADER
+		| VARIATION_SUMMARY SUMMARY_LINE
+
+MIXED_SUMMARY:   MIXED_SUBHEADER
+		| MIXED_SUMMARY SUMMARY_LINE
+
+SUMMARY_LINE:   SUBHEADER
+		| NOTICE_LINE
+		| FILE_COMMENT
+		| EMPTY_LINE
+
+ +

When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision for subheaders that are interspersed between items in the summary.

+ +

These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER is omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements as described below, Unibook will automatically generate any required summaries using a default format for the headers.

+ +

Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to supply specific text for these summary titles and to allow additional information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list is machine generated and will always explicitly provide any summary subheaders.

+ +

2.1 NamesList File Elements

+ +

This section provides the details of the syntax for the individual elements.

+ +
ELEMENT		SYNTAX	// How rendered
+
+NAME_LINE:	CHAR TAB NAME LF
+			// The CHAR and the corresponding image are echoed, 
+			// followed by the name as given in NAME
+
+		| CHAR TAB "<" LCNAME ">" LF
+			// Control and noncharacters use this form of
+			// lowercase, bracketed pseudo character name
+
+		| CHAR TAB NAME SP COMMENT LF
+			// Names may have a comment, which is stripped off
+			// unless the file is parsed for an ISO style list
+                        
+		| CHAR TAB "<" LCNAME ">" SP COMMENT LF
+			// Control and noncharacters may also have comments
+
+RESERVED_LINE:	CHAR TAB "<reserved>" LF
+			// The CHAR is echoed followed by an icon for the
+			// reserved character and a fixed string e.g. "<reserved>"
+
+COMMENT_LINE:	TAB "*" SP EXPAND_LINE
+			// * is replaced by BULLET, output line as comment
+
+		| TAB EXPAND_LINE       
+			// Output line as comment
+
+ALIAS_LINE:	TAB "=" SP LINE      
+			// Replace = by itself, output line as alias
+
+FORMALALIAS_LINE:
+		TAB "%" SP NAME LF
+			// Replace % by U+203B, output line as formal alias
+
+CROSS_REF:	TAB "x" SP CHAR SP LCNAME LF
+		| TAB "x" SP CHAR SP "<" LCNAME ">" LF
+			// x is replaced by a right arrow
+
+		| TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
+		| TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
+			// x is replaced by a right arrow;
+			// (second type as used for control and noncharacters)
+
+			// In the forms with parentheses the "(","-" and ")" are removed
+			// and the order of CHAR and LCNAME is reversed;
+			// i.e. all inputs result in the same order of output
+
+		| TAB "x" SP CHAR LF
+			// x is replaced by a right arrow
+			// (this type is the only one without LCNAME 
+			// and is used for ideographs)
+
+VARIATION_LINE:	TAB "~" SP CHAR VARSEL SP LABEL LF   
+		| TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF
+			// output standardized variation sequence or simply the char code in case of alternate
+			// glyphs, followed by the alternate glyph or variation glyph and the label and context
+
+FILE_COMMENT:	";"  LINE
+
+EMPTY_LINE:	LF       
+			// Empty and ignored lines as well as 
+			// file comments are ignored
+
+IGNORED_LINE:	TAB ";" LINE
+			// Ignore LINE
+
+SIDEBAR_LINE: 	";;" LINE
+			// Output LINE as marginal note
+
+DECOMPOSITION:	TAB ":" SP EXPAND_LINE
+		| TAB ":" SP "<" TAG ">" SP EXPAND_LINE
+			// Replace ':' by EQUIV, expand line into decomposition
+			// The <tag> gives optional information,
+			// e.g., about composition exclusion.
+			// by convention the tag has initial lowercase
+
+COMPAT_MAPPING:	TAB "#" SP EXPAND_LINE
+		| TAB "#" SP "<" TAG ">" SP EXPAND_LINE
+			// Replace '#' by APPROX, output line as mapping
+			// The <tag> is the optional compatibility decomposition tag.
+			// by convention the tag has initial lowercase
+
+NOTICE_LINE:	"@+" TAB LINE       
+			// Output LINE as notice
+
+		| "@+" TAB "*" SP LINE   
+			// Output LINE as notice
+			// "*" expands to a bullet character
+			// Notices following a character code apply to the
+			// character and are indented. Notices not following
+			// a character code apply to the page/block/column 
+			// and are italicized, but not indented
+
+TITLE:		"@@@" TAB LINE
+			// Output LINE as text
+			// Title is used in page headers
+
+SUBTITLE:	"@@@+" TAB LINE
+			// Output LINE as subtitle
+
+SUBHEADER:	"@" TAB LINE
+			// Output LINE as column header
+
+VARIATION_SUBHEADER:	"@~" TAB LINE		
+			// Output LINE as column header (summary subheader)
+		| "@~" LF
+			// Output a default standard variation sequences summary subheader
+		| "@~" TAB "!" LF
+			// Suppress output of a default standard variation sequences summary subheader
+			// and disable display of summary
+		| "@~" TAB "!" VARSEL_LIST LF
+		| "@~" TAB "!" VARSEL_LIST LINE
+			// Output a standard summary subheader, using default or LINE respectively
+			// Suppress any std variation sequences using selectors from the list
+
+ALTGLYPH_SUBHEADER:	"@@~" TAB LINE	
+			// Output LINE as column header (summary subheader)
+		| "@@~" LF
+			// Output a default alternate glyph summary subheader
+		| "@@~" TAB "!" LF
+			// Suppress output of a default alternate glyph summary subheader
+			// and disable display of summary
+
+MIXED_SUBHEADER:	"@@@~" TAB LINE
+			// Output LINE as column header (summary subheader)
+		| "@@@~" LF
+			// Output a default combined variation and alternate glyph summary subheader
+		| "@@@~" TAB "!" LF
+			// Suppress output of a default combined summary subheader
+			// and disable display of summary
+		| "@@@~" TAB "!" VARSEL_LIST LF
+		| "@@@~" TAB "!" VARSEL_LIST LINE
+			// Output a combined summary subheader, using default or LINE respectively
+			// Suppress any std variation sequences using selectors from the list
+
+BLOCKHEADER:	"@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF
+			// Cause a page break and optional
+			// blank page, then output one or more charts
+			// followed by the list of character names.
+			// Use BLOCKSTART and BLOCKEND to define
+			// what characters belong to a block.
+			// Use BLOCKNAME in page and table headers
+
+BLOCKNAME:	LABEL
+		| LABEL SP "(" LABEL ")"   
+			// If an alternate label is present it replaces
+			// the BLOCKNAME when an ISO-style names list is
+			// laid out; it is ignored in the Unicode charts
+
+BLOCKSTART:	CHAR	// First character position in block
+BLOCKEND:	CHAR	// Last character position in block
+PAGEBREAK:	"@@"	// Insert a (column) break
+INDEX_TAB:	"@@+"	// Start a new index tab at latest BLOCKSTART
+
+EXPAND_LINE:	{ESC_CHAR | CHAR | STRING | ESC +}+ LF
+			// Instances of CHAR (see Notes) are replaced by
+			// CHAR NBSP x NBSP where x is the single Unicode
+			// character corresponding to CHAR.
+			// If character is combining, it is replaced with
+			// CHAR NBSP <circ> x NBSP where <circ> is the
+			// dotted circle
+
+ + + Notes: + + +
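As an illustration of how the element syntax in section 2.1 can be told apart by leading characters, here is a hedged Python sketch of a line classifier. It is not how Unibook is implemented; the function name `classify` and the table names are assumptions for this example. Two simplifications are worth noting: a COMMENT_LINE whose text happens to start with a marker character and a space is reported as the marked element, and no validation of the remainder of the line is attempted.

```python
# Longest prefixes first so that "@@@~" is not mistaken for "@@@" or "@@".
AT_PREFIXES = [
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@+", "SUBTITLE"),
    ("@@@", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@@+", "INDEX_TAB"),
    ("@@", "BLOCKHEADER_OR_PAGEBREAK"),
    ("@~", "VARIATION_SUBHEADER"),
    ("@+", "NOTICE_LINE"),
    ("@", "SUBHEADER"),
]
# Marker characters that follow the leading TAB of an annotation line.
TAB_MARKERS = {
    "=": "ALIAS_LINE", "%": "FORMALALIAS_LINE", "x": "CROSS_REF",
    "~": "VARIATION_LINE", ":": "DECOMPOSITION", "#": "COMPAT_MAPPING",
    "*": "COMMENT_LINE",
}


def classify(line):
    """Return the grammar element name suggested by the start of the line."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";;"):
        return "SIDEBAR_LINE"
    if line.startswith(";"):
        return "FILE_COMMENT"
    if line.startswith("@"):
        for prefix, kind in AT_PREFIXES:
            if line.startswith(prefix):
                if kind == "BLOCKHEADER_OR_PAGEBREAK":
                    return "PAGEBREAK" if line == "@@" else "BLOCKHEADER"
                return kind
    if line.startswith("\t"):
        body = line.lstrip("\t")
        if body.startswith(";"):
            return "IGNORED_LINE"
        if body[:1] in TAB_MARKERS and body[1:2] == " ":
            return TAB_MARKERS[body[:1]]
        return "COMMENT_LINE"
    if line.endswith("\t<reserved>"):
        return "RESERVED_LINE"
    return "NAME_LINE"
```

Checking the longer "@" prefixes before the shorter ones is the main design point, because every subheader, title and block header marker shares a common start.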

2.2 NamesList File Primitives

+ +

The following are the primitives and terminals for the NamesList syntax.

+ +
LINE:		STRING LF
+COMMENT:	"(" LABEL ")"
+		| "(" LABEL ")" SP "*"
+		| "*"
+
+NAME:	  	<sequence of uppercase ASCII letters, digits, space and hyphen> 
+LCNAME:		<sequence of lowercase ASCII letters, digits, space and hyphen>  ("-" CHAR)?
+
+TAG:		<sequence of ASCII letters>
+LCTAG:		<sequence of lowercase ASCII letters>
+STRING:	  	<sequence of characters in the range U+0020..U+02FF, except controls> 
+LABEL:	  	<sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")"> 
+VARSEL:		CHAR
+		| "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
+VARSEL_LIST:	"{" CHAR_LIST "}"
+CHAR_LIST:	CHAR
+		| CHAR_LIST SP CHAR
+CHAR:		X X X X
+		| X X X X X 
+		| X X X X X X 
+X:	  	"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F" 
+ESC_CHAR:	ESC CHAR
+ESC:	        "\"
+			// Special semantics of backslash (\) are supported
+			// only in EXPAND_LINE.
+TAB:	  	<sequence of one or more ASCII tab characters 0x09>    
+SP:	  	<ASCII 20>
+LF:	  	<any sequence of a single ASCII 0A or 0D, or both>
+
+ +
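The CHAR primitive and the EXPAND_LINE rendering rule from section 2.1 can be combined in a short Python sketch. Several points are assumptions rather than statements of the format: that a CHAR is recognized only as a stand-alone run of 4 to 6 uppercase hexadecimal digits, that a backslash-escaped CHAR is shown as its digits without expansion, and that "combining" means a character whose General Category starts with M. The function name `expand` is hypothetical.

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"
# Optional backslash, then a stand-alone run of 4 to 6 uppercase hex digits.
TOKEN = re.compile(r"(\\?)\b([0-9A-F]{4,6})\b")


def expand(line):
    """Replace each CHAR by 'CHAR NBSP x NBSP', adding a dotted circle for marks."""
    def repl(match):
        escaped, digits = match.groups()
        if escaped:
            return digits  # assumption: an escaped CHAR is left as digits only
        code = int(digits, 16)
        if code > 0x10FFFF:
            return digits  # not a valid code point, leave untouched
        ch = chr(code)
        if unicodedata.category(ch).startswith("M"):
            return f"{digits}{NBSP}{DOTTED_CIRCLE}{ch}{NBSP}"
        return f"{digits}{NBSP}{ch}{NBSP}"
    return TOKEN.sub(repl, line)


rendered = expand("0041 0301")
```

Because uppercase words made only of the letters A to F would also match the CHAR pattern, a real implementation would need the additional disambiguation rules of the format, which this sketch does not attempt.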

Notes:

+ +

Modifications

+ +

Version 15.1.0

+ +

Version 15.0.0

+ +

Version 14.0.0

+ +

Version 13.0.0

+ +

Version 12.1.0

+ +

Version 12.0.0

+ +

Version 11.0.0

+ +

Version 10.0.0

+ +

Version 9.0.0

+ +

Version 8.0.0

+ +

Version 7.0.0

+ +

Version 6.3.0

+ +

Version 6.2.0

+ +

Version 6.1.0

+ +

Version 6.0.0

+ +

Version 5.2.0

+ +

Version 5.1.0

+ +

Version 5.0.0

+ +

Version 4.0.0

+ +

Version 3.2.0

+ +

Version 3.1.0 (2)

+ +
+ +
+
+ + Access to Copyright and terms of use +