From 6013b2ded106521ee9cae6bd77dacbd5254ff763 Mon Sep 17 00:00:00 2001
From: Jose Colon Rodriguez
Date: Mon, 19 Feb 2024 09:11:56 -0400
Subject: Cleaned up directory structure

---
 unicode/NamesList.html | 776 ------------------------------------------------------------------------------------------------------------------------
 1 file changed, 776 deletions(-)
 delete mode 100644 unicode/NamesList.html

diff --git a/unicode/NamesList.html b/unicode/NamesList.html
deleted file mode 100644
index d6809e1..0000000
--- a/unicode/NamesList.html
+++ /dev/null
@@ -1,776 +0,0 @@

Unicode® NamesList File Format

Revision          15.1.0
Authors           Asmus Freytag, Ken Whistler
Date              2023-08-23
This Version      https://www.unicode.org/Public/15.1.0/ucd/NamesList.html
Previous Version  https://www.unicode.org/Public/15.0.0/ucd/NamesList.html
Latest Version    https://www.unicode.org/Public/UCD/latest/ucd/NamesList.html

Summary

This file describes the format and contents of NamesList.txt.

Status

This file and the files described herein are part of the Unicode Character Database (UCD). The Unicode Terms of Use apply.

1.0 Introduction

The Unicode names list file NamesList.txt (also NamesList.lst) is a plain text file used to drive the layout of the character code charts in the Unicode Standard. The information in this file combines several fields from the UnicodeData.txt and Blocks.txt files, together with additional annotations for many characters.

This document describes the syntax rules for the file format and also gives brief information on how each construct is rendered when laid out for the code charts. Some of the syntax elements are used only in preparing drafts of the code charts and are not present in the final, released form of the NamesList.txt file.

Over time, the syntax has been extended with new features. The syntax for formal aliases and index tabs was introduced with Unicode 5.0. The syntax for marginal sidebar comments is used extensively in draft versions of the NamesList.txt file. Support for UTF-8 encoded files and the syntax for the UTF-8 charset declaration in a comment at the head of the file were introduced after Unicode 6.1.0 was published. The same applies to the syntax for specifying variation sequences and alternate glyphs and their respective summaries. As of Unicode 11.0, the repertoire restriction on comments and aliases in the names list format was loosened from the prior limit of U+0020..U+00FF to the wider range U+0020..U+02FF.

The same input file can be used to prepare drafts and final editions of ISO/IEC 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style required some information in the names list file that is not needed (and is in fact removed during parsing) for the Unicode code charts.

With access to the layout program (Unibook), it is a simple matter to create name lists for formatting working drafts or other documents containing proposed characters.

The content of the NamesList.txt file is optimized for code chart creation. Some information that the reader can infer from context has been suppressed to make the code charts more readable. See the chapter on Code Charts in the Unicode Standard.

1.1 NamesList File Overview

The NamesList files are plain text files which, in their simplest form, look like this:

@@<tab>0020<tab>BASIC LATIN<tab>007F
; this is a file comment (ignored)
0020<tab>SPACE
0021<tab>EXCLAMATION MARK
0022<tab>QUOTATION MARK
. . .
007F<tab>DELETE

The semicolon (as the first character), the @ sign, and the <tab> character are used by the file syntax and must be provided as shown. Hexadecimal digits must be in UPPERCASE. A double @@ introduces a block header. The header gives the start code, the title, and the end code of the block, in the order shown.

For a minimal name list, only the NAME_LINE and BLOCKHEADER elements and their constituent syntax elements are needed.
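
To illustrate how little a minimal name list requires, the following Python sketch reads only BLOCKHEADER and NAME_LINE elements and skips file comments and empty lines. The function parse_minimal and the Block class are hypothetical helpers. They are not part of Unibook or of any published Unicode tool. The patterns accept one or more tabs as separators, as the TAB primitive allows.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helpers for illustration only; not Unibook code.
# CHAR: four to six uppercase hex digits; TAB: one or more tab characters.
CHAR = r"[0-9A-F]{4,6}"
BLOCKHEADER_RE = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
NAME_LINE_RE = re.compile(rf"^({CHAR})\t+(.+)$")


@dataclass
class Block:
    start: int
    name: str
    end: int
    names: dict = field(default_factory=dict)  # code point -> character name


def parse_minimal(text: str) -> list:
    """Collect blocks and their NAME_LINE entries from a minimal name list."""
    blocks = []
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # EMPTY_LINE and FILE_COMMENT are ignored
        m = BLOCKHEADER_RE.match(line)
        if m:
            blocks.append(Block(int(m[1], 16), m[2], int(m[3], 16)))
            continue
        m = NAME_LINE_RE.match(line)
        if m and blocks:  # name lines before any block header are skipped
            blocks[-1].names[int(m[1], 16)] = m[2]
    return blocks


sample = "@@\t0020\tBASIC LATIN\t007F\n; a file comment\n0020\tSPACE\n0021\tEXCLAMATION MARK\n"
for block in parse_minimal(sample):
    print(f"{block.start:04X}..{block.end:04X} {block.name}: {len(block.names)} names")
# prints: 0020..007F BASIC LATIN: 2 names
```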

The full syntax with all the options is provided in the following sections.

2.0 NamesList File Structure

This section defines the overall file structure.

NAMELIST:     FILE_COMMENT* TITLE_PAGE* EXTENDED_BLOCK*

TITLE_PAGE:   TITLE
		| TITLE_PAGE SUBTITLE
		| TITLE_PAGE SUBHEADER
		| TITLE_PAGE IGNORED_LINE
		| TITLE_PAGE EMPTY_LINE
		| TITLE_PAGE NOTICE_LINE
		| TITLE_PAGE COMMENT_LINE
		| TITLE_PAGE PAGEBREAK
		| TITLE_PAGE FILE_COMMENT

EXTENDED_BLOCK:	BLOCK
		| BLOCK SUMMARY

BLOCK:	BLOCKHEADER
		| BLOCKHEADER INDEX_TAB
		| BLOCK CHAR_ENTRY
		| BLOCK SUBHEADER
		| BLOCK NOTICE_LINE
		| BLOCK EMPTY_LINE
		| BLOCK IGNORED_LINE
		| BLOCK SIDEBAR_LINE
		| BLOCK PAGEBREAK
		| BLOCK FILE_COMMENT
		| BLOCK CROSS_REF

CHAR_ENTRY:   NAME_LINE | RESERVED_LINE
		| CHAR_ENTRY ALIAS_LINE
		| CHAR_ENTRY FORMALALIAS_LINE
		| CHAR_ENTRY COMMENT_LINE
		| CHAR_ENTRY CROSS_REF
		| CHAR_ENTRY DECOMPOSITION
		| CHAR_ENTRY COMPAT_MAPPING
		| CHAR_ENTRY IGNORED_LINE
		| CHAR_ENTRY EMPTY_LINE
		| CHAR_ENTRY NOTICE_LINE
		| CHAR_ENTRY FILE_COMMENT
		| CHAR_ENTRY VARIATION_LINE
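
The ordering imposed by NAMELIST can be checked mechanically. The sketch below is a hypothetical validator, not Unibook code. It recognizes only four kinds of lines: the TITLE ("@@@" TAB), SUBTITLE ("@@@+" TAB) and BLOCKHEADER ("@@" TAB) prefixes defined in section 2.1, and lines that begin with a character code. It reports titles that appear after the first block header and character entries that appear before it. All other elements are skipped.

```python
import re

# Hypothetical validator for illustration; only a few prefixes are recognized.
CHAR_LINE_RE = re.compile(r"^[0-9A-F]{4,6}\t")  # start of a NAME_LINE or RESERVED_LINE


def check_order(lines):
    """Return (line number, message) pairs for ordering violations."""
    problems = []
    seen_block = False
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@@@+\t"):
            kind = "SUBTITLE"
        elif line.startswith("@@@\t"):
            kind = "TITLE"
        elif line.startswith("@@\t"):
            kind = "BLOCKHEADER"
        elif CHAR_LINE_RE.match(line):
            kind = "CHAR_ENTRY"
        else:
            continue  # every other element is ignored by this sketch
        if kind == "BLOCKHEADER":
            seen_block = True
        elif kind in ("TITLE", "SUBTITLE") and seen_block:
            problems.append((lineno, f"{kind} after first BLOCKHEADER"))
        elif kind == "CHAR_ENTRY" and not seen_block:
            problems.append((lineno, "CHAR_ENTRY before first BLOCKHEADER"))
    return problems
```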

In other words:

Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER.

Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE, EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.

Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted sequence of the following lines may occur (in any order and repeated as often as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE, EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.

Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and FILE_COMMENT, none of these lines may occur in any other place.

A PAGEBREAK may appear anywhere except in the middle of a CHAR_ENTRY. A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may appear after any block header.

If the first line of a file is a file comment, it may contain a UTF-8 charset declaration (see below). Alternatively, or in addition, a BOM may be present at the very beginning of the file. The BOM forces the encoding to be interpreted as UTF-16 (little-endian only) or UTF-8. When the file is declared as UTF-8, the names list format supports characters in the range U+0020..U+02FF in LINE and LABEL elements. Otherwise, the supported repertoire is limited to Latin-1, and any attempt to use characters outside the Latin-1 range will result in data corruption.

Several of these elements, while part of the formal definition of the file format, do not occur in final published versions of NamesList.txt in the UCD.
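
A decoder that follows the encoding rules above might look like the sketch below. It assumes the caller has already determined whether the first-line file comment carries the UTF-8 charset declaration. The hypothetical declared_utf8 flag carries that result, because the declaration syntax itself is not reproduced here. Rejecting a big-endian UTF-16 BOM is also an assumption, drawn from the statement that only little-endian UTF-16 is supported.

```python
import codecs


def decode_names_list(data: bytes, declared_utf8: bool = False) -> str:
    """Decode a NamesList file according to its BOM or charset declaration.

    declared_utf8 is a hypothetical flag standing in for detection of the
    UTF-8 charset declaration in the first-line file comment.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Assumption: big-endian UTF-16 is rejected; only little-endian is supported.
        raise ValueError("UTF-16 big-endian input is not supported")
    if declared_utf8:
        return data.decode("utf-8")
    # No BOM and no declaration: the repertoire is limited to Latin-1.
    return data.decode("latin-1")
```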

Blocks followed by Summaries

A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:

SUMMARY:   ALTGLYPH_SUMMARY
		| VARIATION_SUMMARY
		| ALTGLYPH_SUMMARY VARIATION_SUMMARY
		| MIXED_SUMMARY

ALTGLYPH_SUMMARY:   ALTGLYPH_SUBHEADER
		| ALTGLYPH_SUMMARY SUMMARY_LINE

VARIATION_SUMMARY:   VARIATION_SUBHEADER
		| VARIATION_SUMMARY SUMMARY_LINE

MIXED_SUMMARY:   MIXED_SUBHEADER
		| MIXED_SUMMARY SUMMARY_LINE

SUMMARY_LINE:   SUBHEADER
		| NOTICE_LINE
		| FILE_COMMENT
		| EMPTY_LINE

When formatted for display, each summary recaps the information presented in the VARIATION_LINE elements of the preceding block. The entries are grouped into alternate glyph variants and standardized variation sequences, and each group is preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision for subheaders interspersed between items in the summary.

These syntax constructs are entirely optional. The ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER may be omitted from the names list even when the preceding block contains VARIATION_LINE elements (described below). In that case, Unibook will automatically generate any required summaries using a default format for the headers.

Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to supply specific contents for these summary titles. They also allow additional information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list is machine generated and always explicitly provides any summary subheaders.
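
The grouping used for these summaries can be approximated as follows. A VARIATION_LINE whose VARSEL is ALT1..ALT9 describes an alternate glyph. A VARIATION_LINE whose VARSEL is a character code describes a standardized variation sequence. This is a minimal sketch, not Unibook's implementation, and summarize_variations is a hypothetical name. The grammar in section 2.1 writes CHAR VARSEL without an SP between them. The pattern therefore accepts an optional space at that point, which is an assumption about real files.

```python
import re

# Hypothetical sketch; not Unibook code.
# TAB "~" SP CHAR VARSEL SP LABEL, optionally followed by "(" LCTAG ")".
# Assumption: an optional space is accepted between CHAR and VARSEL.
VARIATION_RE = re.compile(
    r"^\t+~ ([0-9A-F]{4,6}) ?([0-9A-F]{4,6}|ALT[1-9]) ([^()]+?)(?:\s*\(([a-z]+)\))?$"
)


def summarize_variations(block_lines):
    """Split a block's VARIATION_LINE entries into the two summary groups."""
    alt_glyphs, sequences = [], []
    for line in block_lines:
        m = VARIATION_RE.match(line)
        if not m:
            continue  # not a VARIATION_LINE
        char, varsel, label, context = m.groups()
        target = alt_glyphs if varsel.startswith("ALT") else sequences
        target.append((char, varsel, label, context))
    return alt_glyphs, sequences
```

Entries keep the order in which they appear in the block. This matches the idea that a summary recaps the preceding block rather than re-sorting it.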

2.1 NamesList File Elements

This section provides the details of the syntax for the individual elements.

ELEMENT		SYNTAX	// How rendered

NAME_LINE:	CHAR TAB NAME LF
			// The CHAR and the corresponding image are echoed,
			// followed by the name as given in NAME

		| CHAR TAB "<" LCNAME ">" LF
			// Control and noncharacters use this form of
			// lowercase, bracketed pseudo character name

		| CHAR TAB NAME SP COMMENT LF
			// Names may have a comment, which is stripped off
			// unless the file is parsed for an ISO-style list

		| CHAR TAB "<" LCNAME ">" SP COMMENT LF
			// Control and noncharacters may also have comments

RESERVED_LINE:	CHAR TAB "<reserved>" LF
			// The CHAR is echoed followed by an icon for the
			// reserved character and a fixed string, e.g. "<reserved>"

COMMENT_LINE:	TAB "*" SP EXPAND_LINE
			// * is replaced by BULLET, output line as comment

		| TAB EXPAND_LINE
			// Output line as comment

ALIAS_LINE:	TAB "=" SP LINE
			// Replace = by itself, output line as alias

FORMALALIAS_LINE:
		TAB "%" SP NAME LF
			// Replace % by U+203B, output line as formal alias

CROSS_REF:	TAB "x" SP CHAR SP LCNAME LF
		| TAB "x" SP CHAR SP "<" LCNAME ">" LF
			// x is replaced by a right arrow

		| TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
		| TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF
			// x is replaced by a right arrow;
			// (second type as used for control and noncharacters)

			// In the forms with parentheses the "(", "-" and ")" are removed
			// and the order of CHAR and LCNAME is reversed;
			// i.e. all inputs result in the same order of output

		| TAB "x" SP CHAR LF
			// x is replaced by a right arrow
			// (this type is the only one without LCNAME
			// and is used for ideographs)

VARIATION_LINE:	TAB "~" SP CHAR VARSEL SP LABEL LF
		| TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF
			// Output the standardized variation sequence, or simply the char code in the case
			// of alternate glyphs, followed by the alternate glyph or variation glyph, the label and the context

FILE_COMMENT:	";" LINE

EMPTY_LINE:	LF
			// Empty and ignored lines as well as
			// file comments are ignored

IGNORED_LINE:	TAB ";" LINE
			// Ignore LINE

SIDEBAR_LINE:	";;" LINE
			// Output LINE as marginal note

DECOMPOSITION:	TAB ":" SP EXPAND_LINE
		| TAB ":" SP "<" TAG ">" SP EXPAND_LINE
			// Replace ':' by EQUIV, expand line into decomposition
			// The <tag> gives optional information,
			// e.g., about composition exclusion.
			// By convention the tag has initial lowercase

COMPAT_MAPPING:	TAB "#" SP EXPAND_LINE
		| TAB "#" SP "<" TAG ">" SP EXPAND_LINE
			// Replace '#' by APPROX, output line as mapping
			// The <tag> is the optional compatibility decomposition tag.
			// By convention the tag has initial lowercase

NOTICE_LINE:	"@+" TAB LINE
			// Output LINE as notice

		| "@+" TAB "*" SP LINE
			// Output LINE as notice
			// "*" expands to a bullet character
			// Notices following a character code apply to the
			// character and are indented. Notices not following
			// a character code apply to the page/block/column
			// and are italicized, but not indented

TITLE:		"@@@" TAB LINE
			// Output LINE as text
			// Title is used in page headers

SUBTITLE:	"@@@+" TAB LINE
			// Output LINE as subtitle

SUBHEADER:	"@" TAB LINE
			// Output LINE as column header

VARIATION_SUBHEADER:	"@~" TAB LINE
			// Output LINE as column header (summary subheader)
		| "@~" LF
			// Output a default standard variation sequences summary subheader
		| "@~" TAB "!" LF
			// Suppress output of a default standard variation sequences summary subheader
			// and disable display of summary
		| "@~" TAB "!" VARSEL_LIST LF
		| "@~" TAB "!" VARSEL_LIST LINE
			// Output a standard summary subheader, using default or LINE respectively
			// Suppress any std variation sequences using selectors from the list

ALTGLYPH_SUBHEADER:	"@@~" TAB LINE
			// Output LINE as column header (summary subheader)
		| "@@~" LF
			// Output a default alternate glyph summary subheader
		| "@@~" TAB "!" LF
			// Suppress output of a default alternate glyph summary subheader
			// and disable display of summary

MIXED_SUBHEADER:	"@@@~" TAB LINE
			// Output LINE as column header (summary subheader)
		| "@@@~" LF
			// Output a default combined variation and alternate glyph summary subheader
		| "@@@~" TAB "!" LF
			// Suppress output of a default alternate glyph summary subheader
			// and disable display of summary
		| "@@@~" TAB "!" VARSEL_LIST LF
		| "@@@~" TAB "!" VARSEL_LIST LINE
			// Output a combined summary subheader, using default or LINE respectively
			// Suppress any std variation sequences using selectors from the list

BLOCKHEADER:	"@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF
			// Cause a page break and optional
			// blank page, then output one or more charts
			// followed by the list of character names.
			// Use BLOCKSTART and BLOCKEND to define
			// what characters belong to a block.
			// Use BLOCKNAME in page and table headers

BLOCKNAME:	LABEL
		| LABEL SP "(" LABEL ")"
			// If an alternate label is present it replaces
			// the BLOCKNAME when an ISO-style names list is
			// laid out; it is ignored in the Unicode charts

BLOCKSTART:	CHAR	// First character position in block
BLOCKEND:	CHAR	// Last character position in block
PAGEBREAK:	"@@"	// Insert a (column) break
INDEX_TAB:	"@@+"	// Start a new index tab at latest BLOCKSTART

EXPAND_LINE:	{ESC_CHAR | CHAR | STRING | ESC +}+ LF
			// Instances of CHAR (see Notes) are replaced by
			// CHAR NBSP x NBSP where x is the single Unicode
			// character corresponding to CHAR.
			// If the character is combining, it is replaced with
			// CHAR NBSP <circ> x NBSP where <circ> is the
			// dotted circle
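
Nearly every element above is identified by its leading characters, so a reader can classify each line before parsing its body. The following sketch is a hypothetical classifier, not Unibook code. It tests the longer @-prefixes first so that, for example, "@@@+" is not mistaken for "@@@" or "@@". It does not validate the rest of the line. A COMMENT_LINE whose text happens to begin with "x " would therefore be reported as a CROSS_REF.

```python
import re

# Hypothetical classifier for illustration; not Unibook code.
NAME_LINE_RE = re.compile(r"^[0-9A-F]{4,6}\t+")

# Prefixes of lines that start in column 0, longest "@" prefixes first.
LEADING = [
    ("@@@+\t", "SUBTITLE"),
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@\t", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@@+", "INDEX_TAB"),
    ("@@\t", "BLOCKHEADER"),
    ("@+\t", "NOTICE_LINE"),
    ("@~", "VARIATION_SUBHEADER"),
    ("@\t", "SUBHEADER"),
    (";;", "SIDEBAR_LINE"),
    (";", "FILE_COMMENT"),
]

# Markers that follow the leading TAB of a line within a character entry.
INDENTED = [
    ("= ", "ALIAS_LINE"),
    ("% ", "FORMALALIAS_LINE"),
    ("x ", "CROSS_REF"),
    ("~ ", "VARIATION_LINE"),
    (": ", "DECOMPOSITION"),
    ("# ", "COMPAT_MAPPING"),
    (";", "IGNORED_LINE"),
]


def classify(line: str) -> str:
    """Return the element name suggested by a line's leading characters."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line == "@@":
        return "PAGEBREAK"
    for prefix, element in LEADING:
        if line.startswith(prefix):
            return element
    if line.startswith("\t"):
        body = line.lstrip("\t")
        for marker, element in INDENTED:
            if body.startswith(marker):
                return element
        return "COMMENT_LINE"  # covers both the "* " form and the plain form
    m = NAME_LINE_RE.match(line)
    if m:
        return "RESERVED_LINE" if line[m.end():] == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"
```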

Notes:

2.2 NamesList File Primitives

The following are the primitives and terminals for the NamesList syntax.

LINE:		STRING LF
COMMENT:	"(" LABEL ")"
		| "(" LABEL ")" SP "*"
		| "*"

NAME:		<sequence of uppercase ASCII letters, digits, space and hyphen>
LCNAME:		<sequence of lowercase ASCII letters, digits, space and hyphen> ("-" CHAR)?

TAG:		<sequence of ASCII letters>
LCTAG:		<sequence of lowercase ASCII letters>
STRING:		<sequence of characters in the range U+0020..U+02FF, except controls>
LABEL:		<sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
VARSEL:		CHAR
		| "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
VARSEL_LIST:	"{" CHAR_LIST "}"
CHAR_LIST:	CHAR
		| CHAR_LIST SP CHAR
CHAR:		X X X X
		| X X X X X
		| X X X X X X
X:		"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"
ESC_CHAR:	ESC CHAR
ESC:		"\"
			// Special semantics of backslash (\) are supported
			// only in EXPAND_LINE.
TAB:		<sequence of one or more ASCII tab characters 0x09>
SP:		<ASCII 0x20>
LF:		<any sequence of a single ASCII 0A or 0D, or both>
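
The EXPAND_LINE rule in section 2.1 relies on the CHAR and ESC primitives above. The following is a minimal sketch of the expansion under two stated assumptions. First, a backslash before a CHAR suppresses expansion, and the code is emitted as plain text. Second, "combining" means any character whose General Category is Mn, Mc or Me. Every standalone run of four to six uppercase hex digits is treated as a CHAR. The function expand_line is a hypothetical name, not part of Unibook.

```python
import re
import unicodedata

# Hypothetical sketch of EXPAND_LINE; escape semantics are an assumption.
NBSP = "\u00A0"
DOTTED_CIRCLE = "\u25CC"
# An optional backslash (ESC) followed by a standalone CHAR.
TOKEN_RE = re.compile(r"(\\?)\b([0-9A-F]{4,6})\b")


def expand_line(text: str) -> str:
    """Replace each CHAR with CHAR NBSP x NBSP, adding a dotted circle for marks."""
    def replace(m):
        escaped, code = m.groups()
        if escaped:
            return code  # assumption: an escaped CHAR is emitted unexpanded
        cp = int(code, 16)
        if cp > 0x10FFFF:
            return code  # six hex digits can exceed the code space
        ch = chr(cp)
        if unicodedata.category(ch).startswith("M"):
            ch = DOTTED_CIRCLE + ch  # assumption: Mn, Mc, Me count as combining
        return f"{code}{NBSP}{ch}{NBSP}"

    return TOKEN_RE.sub(replace, text)
```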

Notes:

Modifications

Version 15.1.0

Version 15.0.0

Version 14.0.0

Version 13.0.0

Version 12.1.0

Version 12.0.0

Version 11.0.0

Version 10.0.0

Version 9.0.0

Version 8.0.0

Version 7.0.0

Version 6.3.0

Version 6.2.0

Version 6.1.0

Version 6.0.0

Version 5.2.0

Version 5.1.0

Version 5.0.0

Version 4.0.0

Version 3.2.0

Version 3.1.0 (2)