From 6013b2ded106521ee9cae6bd77dacbd5254ff763 Mon Sep 17 00:00:00 2001
From: Jose Colon Rodriguez
Date: Mon, 19 Feb 2024 09:11:56 -0400
Subject: Cleaned up directory structure

---
 unicode/NamesList.html | 776 -------------------------------------------------
 1 file changed, 776 deletions(-)
 delete mode 100644 unicode/NamesList.html

diff --git a/unicode/NamesList.html b/unicode/NamesList.html
deleted file mode 100644
index d6809e1..0000000
--- a/unicode/NamesList.html
+++ /dev/null
@@ -1,776 +0,0 @@
-
-
-
-
-
-
-| Revision | 15.1.0 |
-| Authors | Asmus Freytag, Ken Whistler |
-| Date | 2023-08-23 |
-| This Version | https://www.unicode.org/Public/15.1.0/ucd/NamesList.html |
-| Previous Version | https://www.unicode.org/Public/15.0.0/ucd/NamesList.html |
-| Latest Version | https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html |
-
-
-This file describes the format and contents of NamesList.txt.
-
-
-The file and the files described herein are part of the Unicode
-  Character Database (UCD). The Unicode
-  Terms of Use apply.
-
-The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
-text file used to drive the layout of the character code charts in the Unicode
-Standard. The information in this file is a combination of several fields from
-the UnicodeData.txt and Blocks.txt files, together with additional annotations
-for many characters.
-This document describes the syntax rules for the file
-format, and also gives brief information on how each construct is rendered
-when laid out for the code charts. Some of the syntax elements are used only in
-preparation of the drafts of the code charts and are not present in the final,
-released form of the NamesList.txt file.
-
-Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
-5.0. The syntax for marginal sidebar comments is used extensively in
-draft versions of the NamesList.txt file. Support for UTF-8 encoded files and the syntax for the UTF-8 charset
-declaration in a comment at the head of the file were introduced after Unicode
-6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
-in comments and aliases in the names list format was loosened from the prior
-limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.
-
-The same input file can be used for the preparation of drafts and final editions for ISO/IEC
-  10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
-  information in the name list file that is not needed (and is in fact removed
-  during parsing) for the Unicode code charts.
-
-With access to the layout program (Unibook), it is a simple matter to
-create name lists for formatting working drafts or other documents containing
-proposed characters.
-The content of the NamesList.txt file is optimized for code chart creation.
-  Some information that can be inferred by the reader from context has been
-  suppressed to make the code charts more readable. See the chapter on Code
-  Charts in the Unicode
-  Standard.
-
-The NamesList files are plain text files which in their simplest form look
-like this:
- -@@<tab>0020<tab>BASIC LATIN<tab>007F
-; this is a file comment (ignored)
-0020<tab>SPACE
-0021<tab>EXCLAMATION MARK
-0022<tab>QUOTATION MARK
-. . .
-007F<tab>DELETE
-The semicolon (as first character), @ and <tab> characters are used
-by the file syntax and must be provided as shown. Hexadecimal digits must be
-in UPPERCASE. A double @@ introduces a block header, with the title and the
-start and end codes of the block provided as shown.
-
-For a minimal name list, only the NAME_LINE and BLOCKHEADER and
-their constituent syntax elements are needed.
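
As a rough illustration of the minimal form described above, the following Python sketch reads only BLOCKHEADER and NAME_LINE entries and skips file comments. The function name `parse_minimal` and the dictionary layout are hypothetical choices made for this example; the bracketed `<control>`-style names, trailing name comments, and every other line type are deliberately not handled.

```python
import re

# BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND
BLOCKHEADER_RE = re.compile(r"^@@\t+([0-9A-F]{4,6})\t+([^\t]+)\t+([0-9A-F]{4,6})$")
# NAME_LINE (plain form only): CHAR TAB NAME
NAME_LINE_RE = re.compile(r"^([0-9A-F]{4,6})\t+([A-Z0-9 -]+)$")


def parse_minimal(text):
    """Return a list of blocks with their range, title and (code point, name) pairs."""
    blocks = []
    for line in text.splitlines():
        if line.startswith(";"):
            continue  # file comment: ignored
        m = BLOCKHEADER_RE.match(line)
        if m:
            blocks.append({
                "start": int(m.group(1), 16),
                "name": m.group(2),
                "end": int(m.group(3), 16),
                "chars": [],
            })
            continue
        m = NAME_LINE_RE.match(line)
        if m and blocks:
            # Attach the character to the most recent block header.
            blocks[-1]["chars"].append((int(m.group(1), 16), m.group(2)))
    return blocks


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)
blocks = parse_minimal(sample)
```

Lines that match neither pattern are silently dropped here, which is acceptable for a sketch but would hide errors in a real tool.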
-
-The full syntax with all the options is provided in the following sections.
-
-This section defines the overall file structure.
-
-NAMELIST:       FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*
-
-TITLE_PAGE:     TITLE
-              | TITLE_PAGE SUBTITLE
-              | TITLE_PAGE SUBHEADER
-              | TITLE_PAGE IGNORED_LINE
-              | TITLE_PAGE EMPTY_LINE
-              | TITLE_PAGE NOTICE_LINE
-              | TITLE_PAGE COMMENT_LINE
-              | TITLE_PAGE PAGEBREAK
-              | TITLE_PAGE FILE_COMMENT
-
-EXTENDED_BLOCK: BLOCK
-              | BLOCK SUMMARY
-
-BLOCK:          BLOCKHEADER
-              | BLOCKHEADER INDEX_TAB
-              | BLOCK CHAR_ENTRY
-              | BLOCK SUBHEADER
-              | BLOCK NOTICE_LINE
-              | BLOCK EMPTY_LINE
-              | BLOCK IGNORED_LINE
-              | BLOCK SIDEBAR_LINE
-              | BLOCK PAGEBREAK
-              | BLOCK FILE_COMMENT
-              | BLOCK CROSS_REF
-
-CHAR_ENTRY:     NAME_LINE | RESERVED_LINE
-              | CHAR_ENTRY ALIAS_LINE
-              | CHAR_ENTRY FORMALALIAS_LINE
-              | CHAR_ENTRY COMMENT_LINE
-              | CHAR_ENTRY CROSS_REF
-              | CHAR_ENTRY DECOMPOSITION
-              | CHAR_ENTRY COMPAT_MAPPING
-              | CHAR_ENTRY IGNORED_LINE
-              | CHAR_ENTRY EMPTY_LINE
-              | CHAR_ENTRY NOTICE_LINE
-              | CHAR_ENTRY FILE_COMMENT
-              | CHAR_ENTRY VARIATION_LINE
-
-
In other words:
-
-Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.
-Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
-  EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.
-Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted
-  sequence of the following lines may occur (in any order and repeated as often
-  as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
-  EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.
-Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
-  FILE_COMMENT, none of these lines may
-  occur in any other place.
-A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY.
-  A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
-  appear after any block header.
-If the first line of a file is a file comment, it may contain a UTF-8
-  charset declaration (see below). Alternatively, or in addition, a BOM may be
-  present at the very beginning of the file, forcing the encoding to be
-  interpreted as UTF-16 (little-endian only) or UTF-8. When
-  declared as UTF-8, the names list format will support use of characters in
-  the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
-  the supported repertoire is limited to Latin-1, and attempted use of characters outside
-  the Latin-1 range will result in data corruption.
-Several of these elements, while part of the formal definition of the
-  file format, do not occur in final published versions of
-  NamesList.txt in the UCD.
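
The BOM rule in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `detect_encoding` and `decode_names_list` are hypothetical, and the UTF-8 charset declaration in a first-line file comment is not handled, because its exact syntax is not given in this part of the document.

```python
import codecs


def detect_encoding(raw: bytes):
    """Return (encoding name, BOM length) based on a leading BOM.

    Without a BOM the sketch falls back to Latin-1, the default repertoire
    described in the text.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    return "latin-1", 0


def decode_names_list(raw: bytes) -> str:
    """Decode the raw file contents, dropping the BOM if one is present."""
    encoding, bom_length = detect_encoding(raw)
    return raw[bom_length:].decode(encoding)
```

For example, `decode_names_list(b"\xff\xfe;\x00")` yields the single character `;`, decoded as little-endian UTF-16 with the BOM removed.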
- -A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:
-
-SUMMARY:           ALTGLYPH_SUMMARY
-                 | VARIATION_SUMMARY
-                 | ALTGLYPH_SUMMARY VARIATION_SUMMARY
-                 | MIXED_SUMMARY
-
-ALTGLYPH_SUMMARY:  ALTGLYPH_SUBHEADER
-                 | ALTGLYPH_SUMMARY SUMMARY_LINE
-
-VARIATION_SUMMARY: VARIATION_SUBHEADER
-                 | VARIATION_SUMMARY SUMMARY_LINE
-
-MIXED_SUMMARY:     MIXED_SUBHEADER
-                 | MIXED_SUMMARY SUMMARY_LINE
-
-SUMMARY_LINE:      SUBHEADER
-                 | NOTICE_LINE
-                 | FILE_COMMENT
-                 | EMPTY_LINE
-
-
-When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
-of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
-preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
-follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision for subheaders
-interspersed between items in the summary.
-
-These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER is
-omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
-as described below, Unibook will automatically generate any required summaries using a default format for the headers.
-
-Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to
-supply specific contents for these summary titles and to allow additional
-information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list
-is machine generated and will always explicitly provide any summary subheaders.
- -This section provides the details of the syntax for the individual elements.
- -ELEMENT SYNTAX // How rendered
-
-NAME_LINE: CHAR TAB NAME LF
- // The CHAR and the corresponding image are echoed,
- // followed by the name as given in NAME
-
- | CHAR TAB "<" LCNAME ">" LF
- // Control and noncharacters use this form of
- // lowercase, bracketed pseudo character name
-
- | CHAR TAB NAME SP COMMENT LF
- // Names may have a comment, which is stripped off
- // unless the file is parsed for an ISO style list
-
- | CHAR TAB "<" LCNAME ">" SP COMMENT LF
- // Control and noncharacters may also have comments
-
-RESERVED_LINE: CHAR TAB "<reserved>" LF
- // The CHAR is echoed followed by an icon for the
- // reserved character and a fixed string e.g. "<reserved>"
-
-COMMENT_LINE: TAB "*" SP EXPAND_LINE
- // * is replaced by BULLET, output line as comment
-
- | TAB EXPAND_LINE
- // Output line as comment
-
-ALIAS_LINE: TAB "=" SP LINE
- // Replace = by itself, output line as alias
-
-FORMALALIAS_LINE:
- TAB "%" SP NAME LF
- // Replace % by U+203B, output line as formal alias
-
-CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
- | TAB "x" SP CHAR SP "<" LCNAME ">" LF
- // x is replaced by a right arrow
-
- | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
- | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
- // x is replaced by a right arrow;
- // (second type as used for control and noncharacters)
-
- // In the forms with parentheses the "(","-" and ")" are removed
- // and the order of CHAR and LCNAME is reversed;
- // i.e. all inputs result in the same order of output
-
- | TAB "x" SP CHAR LF
- // x is replaced by a right arrow
- // (this type is the only one without LCNAME
- // and is used for ideographs)
-
-VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
- | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF
- // output standardized variation sequence or simply the char code in case of alternate
- // glyphs, followed by the alternate glyph or variation glyph and the label and context
-
-FILE_COMMENT: ";" LINE
-
-EMPTY_LINE: LF
- // Empty and ignored lines as well as
- // file comments are ignored
-
-IGNORED_LINE: TAB ";" LINE
- // Ignore LINE
-
-SIDEBAR_LINE: ";;" LINE
- // Output LINE as marginal note
-
-DECOMPOSITION: TAB ":" SP EXPAND_LINE
- | TAB ":" SP "<" TAG ">" SP EXPAND_LINE
- // Replace ':' by EQUIV, expand line into decomposition
- // The <tag> gives optional information,
- // e.g., about composition exclusion.
- // by convention the tag has initial lowercase
-
-COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
- | TAB "#" SP "<" TAG ">" SP EXPAND_LINE
- // Replace '#' by APPROX, output line as mapping
- // The <tag> is the optional compatibility decomposition tag.
- // by convention the tag has initial lowercase
-
-NOTICE_LINE: "@+" TAB LINE
- // Output LINE as notice
-
- | "@+" TAB "*" SP LINE
- // Output LINE as notice
- // "*" expands to a bullet character
- // Notices following a character code apply to the
- // character and are indented. Notices not following
- // a character code apply to the page/block/column
- // and are italicized, but not indented
-
-TITLE: "@@@" TAB LINE
- // Output LINE as text
- // Title is used in page headers
-
-SUBTITLE: "@@@+" TAB LINE
- // Output LINE as subtitle
-
-SUBHEADER: "@" TAB LINE
- // Output LINE as column header
-
-VARIATION_SUBHEADER: "@~" TAB LINE
- // Output LINE as column header (summary subheader)
- | "@~" LF
- // Output a default standard variation sequences summary subheader
- | "@~" TAB "!" LF
- // Suppress output of a default standard variation sequences summary subheader
- // and disable display of summary
- | "@~" TAB "!" VARSEL_LIST LF
- | "@~" TAB "!" VARSEL_LIST LINE
- // Output a standard summary subheader, using default or LINE respectively
- // Suppress any std variation sequences using selectors from the list
-
-ALTGLYPH_SUBHEADER: "@@~" TAB LINE
- // Output LINE as column header (summary subheader)
- | "@@~" LF
- // Output a default alternate glyph summary subheader
- | "@@~" TAB "!" LF
- // Suppress output of a default alternate glyph summary subheader
- // and disable display of summary
-
-MIXED_SUBHEADER: "@@@~" TAB LINE
- // Output LINE as column header (summary subheader)
- | "@@@~" LF
- // Output a default combined variation and alternate glyph summary subheader
- | "@@@~" TAB "!" LF
- // Suppress output of a default combined summary subheader
- // and disable display of summary
- | "@@@~" TAB "!" VARSEL_LIST LF
- | "@@@~" TAB "!" VARSEL_LIST LINE
- // Output a combined summary subheader, using default or LINE respectively
- // Suppress any std variation sequences using selectors from the list
-
-BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF
- // Cause a page break and optional
- // blank page, then output one or more charts
- // followed by the list of character names.
- // Use BLOCKSTART and BLOCKEND to define
- // what characters belong to a block.
- // Use BLOCKNAME in page and table headers
-
-BLOCKNAME: LABEL
- | LABEL SP "(" LABEL ")"
- // If an alternate label is present it replaces
- // the BLOCKNAME when an ISO-style names list is
- // laid out; it is ignored in the Unicode charts
-
-BLOCKSTART: CHAR // First character position in block
-BLOCKEND: CHAR // Last character position in block
-PAGEBREAK: "@@" // Insert a (column) break
-INDEX_TAB: "@@+" // Start a new index tab at latest BLOCKSTART
-
-EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF
- // Instances of CHAR (see Notes) are replaced by
- // CHAR NBSP x NBSP where x is the single Unicode
- // character corresponding to CHAR.
- // If character is combining, it is replaced with
- // CHAR NBSP <circ> x NBSP where <circ> is the
- // dotted circle
-
-
-
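
Most elements in the syntax above can be told apart by their leading characters alone. The Python sketch below classifies a single line on that basis. The name `classify` and the prefix tables are hypothetical; the sketch does not validate the remainder of the line, and it assumes that an indented line beginning with one of the marker characters is never meant as a plain COMMENT_LINE.

```python
import re

# Longer prefixes are listed first so that "@@@~" is not mistaken for "@@".
TOP_LEVEL_PREFIXES = [
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@+", "SUBTITLE"),
    ("@@@", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@@+", "INDEX_TAB"),
    ("@@\t", "BLOCKHEADER"),
    ("@@", "PAGEBREAK"),
    ("@~", "VARIATION_SUBHEADER"),
    ("@+", "NOTICE_LINE"),
    ("@\t", "SUBHEADER"),
    (";;", "SIDEBAR_LINE"),
    (";", "FILE_COMMENT"),
]

# Markers that follow the leading TAB of an indented line.
INDENTED_PREFIXES = [
    ("= ", "ALIAS_LINE"),
    ("% ", "FORMALALIAS_LINE"),
    ("x ", "CROSS_REF"),
    ("~ ", "VARIATION_LINE"),
    (": ", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
    (";", "IGNORED_LINE"),
]

CHAR_THEN_TAB_RE = re.compile(r"^[0-9A-F]{4,6}\t")


def classify(line):
    """Return the element name for one line, or "UNKNOWN"."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith("\t"):
        body = line.lstrip("\t")
        for prefix, kind in INDENTED_PREFIXES:
            if body.startswith(prefix):
                return kind
        return "COMMENT_LINE"  # covers both the "*" form and the plain form
    for prefix, kind in TOP_LEVEL_PREFIXES:
        if line.startswith(prefix):
            return kind
    if CHAR_THEN_TAB_RE.match(line):
        rest = line.split("\t", 1)[1].lstrip("\t")
        return "RESERVED_LINE" if rest == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"
```

A structural check against the NAMELIST, BLOCK and CHAR_ENTRY productions would sit on top of such a classifier; it is omitted here.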
-Notes:
-
-The following are the primitives and terminals for the NamesList syntax.
- -LINE: STRING LF
-COMMENT: "(" LABEL ")"
- | "(" LABEL ")" SP "*"
- | "*"
-
-NAME: <sequence of uppercase ASCII letters, digits, space and hyphen>
-LCNAME: <sequence of lowercase ASCII letters, digits, space and hyphen> ("-" CHAR)?
-
-TAG: <sequence of ASCII letters>
-LCTAG: <sequence of lowercase ASCII letters>
-STRING: <sequence of characters in the range U+0020..U+02FF, except controls>
-LABEL: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
-VARSEL: CHAR
- | "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
-VARSEL_LIST: "{" CHAR_LIST "}"
-CHAR_LIST: CHAR
- | CHAR_LIST SP CHAR
-CHAR: X X X X
- | X X X X X
- | X X X X X X
-X: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
-ESC_CHAR: ESC CHAR
-ESC: "\"
- // Special semantics of backslash (\) are supported
- // only in EXPAND_LINE.
-TAB: <sequence of one or more ASCII tab characters 0x09>
-SP: <ASCII 20>
-LF: <any sequence of a single ASCII 0A or 0D, or both>
-
-
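
The CHAR primitive and the basic EXPAND_LINE replacement rule can be sketched in Python as shown below. The name `expand_line` is hypothetical, and several simplifications are assumptions of this example: "combining" is approximated by a general category starting with "M", the backslash escape (ESC) and any quoting rules are ignored, and any standalone run of four to six uppercase hexadecimal digits is treated as a CHAR, so a word such as FACE would also be expanded.

```python
import re
import unicodedata

# CHAR: four to six uppercase hex digits, not embedded in a longer alphanumeric run.
CHAR_TOKEN_RE = re.compile(r"(?<![0-9A-Za-z])[0-9A-F]{4,6}(?![0-9A-Za-z])")

NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"


def expand_line(text):
    """Replace each CHAR by 'CHAR NBSP x NBSP', adding a dotted circle for marks."""

    def replace(match):
        code = match.group(0)
        code_point = int(code, 16)
        # Leave out-of-range values and surrogates untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return code
        char = chr(code_point)
        if unicodedata.category(char).startswith("M"):
            return code + NBSP + DOTTED_CIRCLE + char + NBSP
        return code + NBSP + char + NBSP

    return CHAR_TOKEN_RE.sub(replace, text)
```

For the decomposition text `0041 0301`, this produces the code 0041 followed by the letter A, then the code 0301 followed by a dotted circle carrying the combining acute accent, with no-break spaces as separators.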
-Notes:
-Version 15.1.0
-Version 15.0.0
-Version 14.0.0
-Version 13.0.0
-Version 12.1.0
-Version 12.0.0
-Version 11.0.0
-Version 10.0.0
-Version 9.0.0
-Version 8.0.0
-Version 7.0.0
-Version 6.3.0
-Version 6.2.0
-Version 6.1.0
-Version 6.0.0
-Version 5.2.0
-Version 5.1.0
-Version 5.0.0
-Version 4.0.0
-Version 3.2.0
-Version 3.1.0 (2)
-
-