From 6013b2ded106521ee9cae6bd77dacbd5254ff763 Mon Sep 17 00:00:00 2001
From: Jose Colon Rodriguez
Date: Mon, 19 Feb 2024 09:11:56 -0400
Subject: Cleaned up directory structure

---
 data/unicode/NamesList.html | 776 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 776 insertions(+)
 create mode 100644 data/unicode/NamesList.html

diff --git a/data/unicode/NamesList.html b/data/unicode/NamesList.html
new file mode 100644
index 0000000..d6809e1
--- /dev/null
+++ b/data/unicode/NamesList.html
@@ -0,0 +1,776 @@
+
+
+
+
+
+
+| Revision | 15.1.0 |
+| Authors | Asmus Freytag, Ken Whistler |
+| Date | 2023-08-23 |
+| This Version | https://www.unicode.org/Public/15.1.0/ucd/NamesList.html |
+| Previous Version | https://www.unicode.org/Public/15.0.0/ucd/NamesList.html |
+| Latest Version | https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html |
+
++This file describes the format and contents of NamesList.txt.
+
++The file and the files described herein are part of the Unicode + Character Database (UCD). The Unicode + Terms of Use apply.
+
The Unicode name list file NamesList.txt (also NamesList.lst) is a plain +text file used to drive the layout of the character code charts in the Unicode +Standard. The information in this file is a combination of several fields from +the UnicodeData.txt and Blocks.txt files, together with additional annotations +for many characters.
+This document describes the syntax rules for the file +format, but also gives brief information on how each construct is rendered +when laid out for the code charts. Some of the syntax elements are used only in +preparation of the drafts of the code charts and are not present in the final, +released form of the NamesList.txt file.
+ +Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode +5.0. The syntax for marginal sidebar comments is utilized extensively in +draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset +declaration in a comment at the head of the file were introduced after Unicode +6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction +in comments and aliases in the names list format was loosened from the prior +limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.
+ +The same input file can be used for the preparation of drafts and final editions for ISO/IEC + 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some + information in the name list file that is not needed (and in fact removed + during parsing) for the Unicode code charts.
+ +With access to the layout program (Unibook), it is a simple matter to +create name lists for formatting working drafts or other documents containing +proposed characters.
+The content of the NamesList.txt file is optimized for code chart creation. + Some information that can be inferred by the reader from context has been + suppressed to make the code charts more readable. See the chapter on Code + Charts in the Unicode + Standard.
+ +The NamesList files are plain text files which in their most simple form look +like this:
+ +@@<tab>0020<tab>BASIC LATIN<tab>007F
+; this is a file comment (ignored)
+0020<tab>SPACE
+0021<tab>EXCLAMATION MARK
+0022<tab>QUOTATION MARK
+. . .
+007F<tab>DELETE
The semicolon (as first character), @ and <tab> characters are used +by the file syntax and must be provided as shown. Hexadecimal digits must be +in UPPERCASE. A double @@ introduces a block header, with the title, and +start and ending code of the block provided as shown.
+ +For a minimal name list, only the NAME_LINE and BLOCKHEADER and +their constituent syntax elements are needed.
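The minimal form above can be read with very little code. The following Python sketch is illustrative only and is not how Unibook parses the file; the function name `parse_minimal` and the regular expressions are assumptions made for this example. It recognizes only BLOCKHEADER and NAME_LINE and skips every other line, so any trailing comment on a name line stays attached to the name.

```python
import re

# CHAR is four to six uppercase hexadecimal digits.
CHAR = r"[0-9A-F]{4,6}"
# "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
# CHAR TAB NAME (also catches "<control>"-style pseudo names as-is)
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal(text):
    """Return (blocks, names) from a minimal name list.

    blocks is a list of (start, title, end) tuples; names maps a code
    point to its name. Lines matching neither pattern (for example
    file comments) are skipped.
    """
    blocks, names = [], {}
    for line in text.splitlines():
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "007F\tDELETE\n"
)
blocks, names = parse_minimal(sample)
```

With the sample input, `blocks` holds one entry for BASIC LATIN and `names` holds the three listed characters.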
+ +The full syntax with all the options is provided in the following sections.
+ +This section defines the overall file structure.
+
+NAMELIST:       FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*
+
+TITLE_PAGE:     TITLE
+                | TITLE_PAGE SUBTITLE
+                | TITLE_PAGE SUBHEADER
+                | TITLE_PAGE IGNORED_LINE
+                | TITLE_PAGE EMPTY_LINE
+                | TITLE_PAGE NOTICE_LINE
+                | TITLE_PAGE COMMENT_LINE
+                | TITLE_PAGE PAGEBREAK
+                | TITLE_PAGE FILE_COMMENT
+
+EXTENDED_BLOCK: BLOCK
+                | BLOCK SUMMARY
+
+BLOCK:          BLOCKHEADER
+                | BLOCKHEADER INDEX_TAB
+                | BLOCK CHAR_ENTRY
+                | BLOCK SUBHEADER
+                | BLOCK NOTICE_LINE
+                | BLOCK EMPTY_LINE
+                | BLOCK IGNORED_LINE
+                | BLOCK SIDEBAR_LINE
+                | BLOCK PAGEBREAK
+                | BLOCK FILE_COMMENT
+                | BLOCK CROSS_REF
+
+CHAR_ENTRY:     NAME_LINE | RESERVED_LINE
+                | CHAR_ENTRY ALIAS_LINE
+                | CHAR_ENTRY FORMALALIAS_LINE
+                | CHAR_ENTRY COMMENT_LINE
+                | CHAR_ENTRY CROSS_REF
+                | CHAR_ENTRY DECOMPOSITION
+                | CHAR_ENTRY COMPAT_MAPPING
+                | CHAR_ENTRY IGNORED_LINE
+                | CHAR_ENTRY EMPTY_LINE
+                | CHAR_ENTRY NOTICE_LINE
+                | CHAR_ENTRY FILE_COMMENT
+                | CHAR_ENTRY VARIATION_LINE
+
In other words:
++ Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.
+Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE, + EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.
+Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted + sequence of the following lines may occur (in any order and repeated as often + as needed): ALIAS_LINE, COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE, + EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.
+Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and + FILE_COMMENT, none of these lines may + occur in any other place.
+A PAGEBREAK may appear anywhere, except the middle of a CHAR_ENTRY. + A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may + appear after any block header.
+If the first line of a file is a file comment, it may contain a UTF-8 + charset declaration (see below). Alternatively, or in addition, a BOM may be + present at the very beginning of the file, forcing the encoding to be + interpreted as UTF-16 (little-endian only) or UTF-8. When + declared as UTF-8, the names list format will support use of characters in + the range U+0020..U+02FF in LINE and LABEL elements. Otherwise, + the supported repertoire is limited to Latin-1, and attempted use of characters outside + the Latin-1 range will result in data corruption.
+Several of these elements, while part of the formal definition of the + file format, do not occur in final published versions of + NamesList.txt in the UCD.
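The encoding rules in the list above (BOM first, then a charset declaration in a leading file comment, otherwise Latin-1) can be sketched as a small Python helper. This is a hedged illustration, not Unibook's logic: the function name `detect_encoding` is hypothetical, and the exact spelling of the declaration, taken here to be `charset=UTF-8` inside the first file comment, is an assumption, since this section does not spell it out.

```python
import codecs


def detect_encoding(raw: bytes) -> str:
    """Guess a Python codec name for a names list file from its first bytes."""
    # A BOM takes effect regardless of any declaration.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # the codec reads the BOM to pick little-endian
    first_line = raw.split(b"\n", 1)[0]
    # Assumption: the declaration looks like "; charset=UTF-8".
    if first_line.startswith(b";") and b"charset=utf-8" in first_line.lower():
        return "utf-8"
    # Without BOM or declaration the repertoire is limited to Latin-1.
    return "latin-1"
```

A caller would then decode with `raw.decode(detect_encoding(raw))` before splitting the text into lines.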
+ +A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:
+
+SUMMARY:           ALTGLYPH_SUMMARY
+                   | VARIATION_SUMMARY
+                   | ALTGLYPH_SUMMARY VARIATION_SUMMARY
+                   | MIXED_SUMMARY
+
+ALTGLYPH_SUMMARY:  ALTGLYPH_SUBHEADER
+                   | ALTGLYPH_SUMMARY SUMMARY_LINE
+
+VARIATION_SUMMARY: VARIATION_SUBHEADER
+                   | VARIATION_SUMMARY SUMMARY_LINE
+
+MIXED_SUMMARY:     MIXED_SUBHEADER
+                   | MIXED_SUMMARY SUMMARY_LINE
+
+SUMMARY_LINE:      SUBHEADER
+                   | NOTICE_LINE
+                   | FILE_COMMENT
+                   | EMPTY_LINE
+
When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements +of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and +preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately +follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are +interspersed between items in the summary.
+ +These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER are +omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements +as described below, Unibook will automatically generate any required summaries using a default format for the headers.
+ +Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to +supply specific contents for these summary titles and to allow additional +information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list +is machine generated and will always explicitly provide any summary subheaders.
+ +This section provides the details of the syntax for the individual elements.
+ +ELEMENT SYNTAX // How rendered
+
+NAME_LINE: CHAR TAB NAME LF
+ // The CHAR and the corresponding image are echoed,
+ // followed by the name as given in NAME
+
+ | CHAR TAB "<" LCNAME ">" LF
+ // Control and noncharacters use this form of
+ // lowercase, bracketed pseudo character name
+
+ | CHAR TAB NAME SP COMMENT LF
+ // Names may have a comment, which is stripped off
+ // unless the file is parsed for an ISO style list
+
+ | CHAR TAB "<" LCNAME ">" SP COMMENT LF
+ // Control and noncharacters may also have comments
+
+RESERVED_LINE: CHAR TAB "<reserved>" LF
+ // The CHAR is echoed followed by an icon for the
+ // reserved character and a fixed string e.g. "<reserved>"
+
+COMMENT_LINE: TAB "*" SP EXPAND_LINE
+ // * is replaced by BULLET, output line as comment
+
+ | TAB EXPAND_LINE
+ // Output line as comment
+
+ALIAS_LINE: TAB "=" SP LINE
+ // Replace = by itself, output line as alias
+
+FORMALALIAS_LINE:
+ TAB "%" SP NAME LF
+ // Replace % by U+203B, output line as formal alias
+
+CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
+ | TAB "x" SP CHAR SP "<" LCNAME ">" LF
+ // x is replaced by a right arrow
+
+ | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
+ | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
+ // x is replaced by a right arrow;
+ // (second type as used for control and noncharacters)
+
+ // In the forms with parentheses the "(","-" and ")" are removed
+ // and the order of CHAR and LCNAME is reversed;
+ // i.e. all inputs result in the same order of output
+
+ | TAB "x" SP CHAR LF
+ // x is replaced by a right arrow
+ // (this type is the only one without LCNAME
+ // and is used for ideographs)
+
+VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
+ | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF
+ // output standardized variation sequence or simply the char code in case of alternate
+ // glyphs, followed by the alternate glyph or variation glyph and the label and context
+
+FILE_COMMENT: ";" LINE
+
+EMPTY_LINE: LF
+ // Empty and ignored lines as well as
+ // file comments are ignored
+
+IGNORED_LINE: TAB ";" LINE
+ // Ignore LINE
+
+SIDEBAR_LINE: ";;" LINE
+ // Output LINE as marginal note
+
+DECOMPOSITION: TAB ":" SP EXPAND_LINE
+ | TAB ":" SP "<" TAG ">" SP EXPAND_LINE
+ // Replace ':' by EQUIV, expand line into decomposition
+ // The <tag> gives optional information,
+ // e.g., about composition exclusion.
+ // by convention the tag has initial lowercase
+
+COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
+ | TAB "#" SP "<" TAG ">" SP EXPAND_LINE
+ // Replace '#' by APPROX, output line as mapping
+ // The <tag> is the optional compatibility decomposition tag.
+ // by convention the tag has initial lowercase
+
+NOTICE_LINE: "@+" TAB LINE
+ // Output LINE as notice
+
+ | "@+" TAB "*" SP LINE
+ // Output LINE as notice
+ // "*" expands to a bullet character
+ // Notices following a character code apply to the
+ // character and are indented. Notices not following
+ // a character code apply to the page/block/column
+ // and are italicized, but not indented
+
+TITLE: "@@@" TAB LINE
+ // Output LINE as text
+ // Title is used in page headers
+
+SUBTITLE: "@@@+" TAB LINE
+ // Output LINE as subtitle
+
+SUBHEADER: "@" TAB LINE
+ // Output LINE as column header
+
+VARIATION_SUBHEADER: "@~" TAB LINE
+ // Output LINE as column header (summary subheader)
+ | "@~" LF
+ // Output a default standard variation sequences summary subheader
+ | "@~" TAB "!" LF
+ // Suppress output of a default standard variation sequences summary subheader
+ // and disable display of summary
+ | "@~" TAB "!" VARSEL_LIST LF
+ | "@~" TAB "!" VARSEL_LIST LINE
+ // Output a standard summary subheader, using default or LINE respectively
+ // Suppress any std variation sequences using selectors from the list
+
+ALTGLYPH_SUBHEADER: "@@~" TAB LINE
+ // Output LINE as column header (summary subheader)
+ | "@@~" LF
+ // Output a default alternate glyph summary subheader
+ | "@@~" TAB "!" LF
+ // Suppress output of a default alternate glyph summary subheader
+ // and disable display of summary
+
+MIXED_SUBHEADER: "@@@~" TAB LINE
+ // Output LINE as column header (summary subheader)
+ | "@@@~" LF
+ // Output a default combined variation and alternate glyph summary subheader
+ | "@@@~" TAB "!" LF
+ // Suppress output of a default combined variation and alternate glyph summary subheader
+ // and disable display of summary
+ | "@@@~" TAB "!" VARSEL_LIST LF
+ | "@@@~" TAB "!" VARSEL_LIST LINE
+ // Output a combined summary subheader, using default or LINE respectively
+ // Suppress any std variation sequences using selectors from the list
+
+BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF
+ // Cause a page break and optional
+ // blank page, then output one or more charts
+ // followed by the list of character names.
+ // Use BLOCKSTART and BLOCKEND to define
+ // what characters belong to a block.
+ // Use BLOCKNAME in page and table headers
+
+BLOCKNAME: LABEL
+ | LABEL SP "(" LABEL ")"
+ // If an alternate label is present it replaces
+ // the BLOCKNAME when an ISO-style names list is
+ // laid out; it is ignored in the Unicode charts
+
+BLOCKSTART: CHAR // First character position in block
+BLOCKEND: CHAR // Last character position in block
+PAGEBREAK: "@@" // Insert a (column) break
+INDEX_TAB: "@@+" // Start a new index tab at latest BLOCKSTART
+
+EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF
+ // Instances of CHAR (see Notes) are replaced by
+ // CHAR NBSP x NBSP where x is the single Unicode
+ // character corresponding to CHAR.
+ // If character is combining, it is replaced with
+ // CHAR NBSP <circ> x NBSP where <circ> is the
+ // dotted circle
+
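The EXPAND_LINE rule can be sketched in Python as follows. This is a rough approximation under stated assumptions, not the Unibook implementation: `expand_line` is a hypothetical name, "combining" is approximated by the Unicode general category starting with `M`, and the backslash escape forms (ESC_CHAR, ESC +) are not handled. The naive token pattern also treats any standalone run of four to six uppercase hex digits as a CHAR, so an uppercase word such as FACE would be expanded by mistake.

```python
import re
import unicodedata

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"
# A CHAR token: 4 to 6 uppercase hex digits standing alone as a word.
CHAR_TOKEN = re.compile(r"\b[0-9A-F]{4,6}\b")


def expand_line(line: str) -> str:
    """Replace each CHAR with CHAR NBSP x NBSP (dotted circle before marks)."""

    def repl(match):
        code = match.group(0)
        value = int(code, 16)
        if value > 0x10FFFF:
            return code  # not a code point; leave the text untouched
        ch = chr(value)
        if unicodedata.category(ch).startswith("M"):
            # Combining character: show it on a dotted circle.
            return f"{code}{NBSP}{DOTTED_CIRCLE}{ch}{NBSP}"
        return f"{code}{NBSP}{ch}{NBSP}"

    return CHAR_TOKEN.sub(repl, line)
```

Applied to the decomposition text `0041 0301`, the sketch emits the letter A after the first code and a dotted circle carrying the combining acute after the second.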
+
+
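Because every element in the syntax above is identified by its leading characters, a line can be classified by prefix alone. The Python sketch below illustrates that idea; the function name `classify` and the returned labels are choices made for this example, and it checks only prefixes without validating the rest of each line. Longer `@` prefixes must be tested before shorter ones, and `;;` before `;`.

```python
import re

# Longest prefixes first, so "@@@~" is not mistaken for "@@@" or "@@".
AT_PREFIXES = [
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@+", "SUBTITLE"),
    ("@@@", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@@+", "INDEX_TAB"),
    ("@+", "NOTICE_LINE"),
    ("@~", "VARIATION_SUBHEADER"),
]
# Markers that follow the leading TAB and are themselves followed by SP.
TAB_MARKERS = {
    "=": "ALIAS_LINE",
    "%": "FORMALALIAS_LINE",
    "~": "VARIATION_LINE",
    ":": "DECOMPOSITION",
    "#": "COMPAT_MAPPING",
}
CROSS_REF = re.compile(r"x (?:\(|[0-9A-F]{4,6}\b)")
CHAR_START = re.compile(r"[0-9A-F]{4,6}\t")


def classify(line: str) -> str:
    """Return the element name for one line of a names list file."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";;"):
        return "SIDEBAR_LINE"
    if line.startswith(";"):
        return "FILE_COMMENT"
    if line.startswith("@"):
        for prefix, kind in AT_PREFIXES:
            if line.startswith(prefix):
                return kind
        if line.startswith("@@"):
            return "BLOCKHEADER" if line.startswith("@@\t") else "PAGEBREAK"
        return "SUBHEADER"
    if line.startswith("\t"):
        body = line.lstrip("\t")
        if body.startswith(";"):
            return "IGNORED_LINE"
        if CROSS_REF.match(body):
            return "CROSS_REF"
        if body[:2] in {marker + " " for marker in TAB_MARKERS}:
            return TAB_MARKERS[body[0]]
        return "COMMENT_LINE"  # includes the TAB "*" SP bullet form
    if CHAR_START.match(line):
        rest = line.split("\t", 1)[1].lstrip("\t")
        return "RESERVED_LINE" if rest.startswith("<reserved>") else "NAME_LINE"
    return "UNKNOWN"
```

A fuller parser would follow the classification with per-element validation and with the ordering constraints given in the file-structure section.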
+ Notes:
+
+The following are the primitives and terminals for the NamesList syntax.
+ +LINE: STRING LF
+COMMENT: "(" LABEL ")"
+ | "(" LABEL ")" SP "*"
+ | "*"
+
+NAME: <sequence of uppercase ASCII letters, digits, space and hyphen>
+LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen> ("-" CHAR)?
+
+TAG: <sequence of ASCII letters>
+LCTAG: <sequence of lowercase ASCII letters>
+STRING: <sequence of characters in the range U+0020..U+02FF, except controls>
+LABEL: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
+VARSEL: CHAR
+ | "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
+VARSEL_LIST: "{" CHAR_LIST "}"
+CHAR_LIST: CHAR
+ | CHAR_LIST SP CHAR
+CHAR: X X X X
+ | X X X X X
+ | X X X X X X
+X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
+ESC_CHAR: ESC CHAR
+ESC: "\"
+ // Special semantics of backslash (\) are supported
+ // only in EXPAND_LINE.
+TAB: <sequence of one or more ASCII tab characters 0x09>
+SP: <ASCII 20>
+LF: <any sequence of a single ASCII 0A or 0D, or both>
+
+
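Several of the terminals above translate directly into regular expressions. The Python patterns below are an illustrative transcription, meant to be used with `fullmatch`; the variable names are chosen for this example and are not part of the format.

```python
import re

# CHAR: four to six uppercase hexadecimal digits.
CHAR = r"[0-9A-F]{4,6}"
CHAR_RE = re.compile(CHAR)
# NAME: uppercase ASCII letters, digits, space and hyphen.
NAME_RE = re.compile(r"[A-Z0-9 -]+")
# LCNAME: lowercase letters, digits, space and hyphen, optionally "-" CHAR.
LCNAME_RE = re.compile(r"[a-z0-9 -]+(?:-" + CHAR + r")?")
# VARSEL: a CHAR, or "ALT" followed by a digit from 1 to 9.
VARSEL_RE = re.compile(CHAR + r"|ALT[1-9]")
# VARSEL_LIST: "{" CHAR (SP CHAR)* "}"
VARSEL_LIST_RE = re.compile(r"\{" + CHAR + r"(?: " + CHAR + r")*\}")
```

The patterns check shape only: `CHAR_RE` accepts values above 10FFFF, which a stricter validator would reject after converting the digits to an integer.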
+Notes:
+Version 15.1.0
+Version 15.0.0
+Version 14.0.0
+Version 13.0.0
+Version 12.1.0
+Version 12.0.0
+Version 11.0.0
+Version 10.0.0
+Version 9.0.0
+Version 8.0.0
+Version 7.0.0
+Version 6.3.0
+Version 6.2.0
+Version 6.1.0
+Version 6.0.0
+Version 5.2.0
+Version 5.1.0
+Version 5.0.0
+Version 4.0.0
+Version 3.2.0
+Version 3.1.0 (2)
+
+