From a7164d9e7b3c3ec6813e06a42d82180d766e15ca Mon Sep 17 00:00:00 2001
From: Sam Atman
Date: Wed, 30 Apr 2025 20:32:23 -0400
Subject: Unicode 16.0

Went smoothly, needed to add some scripts and adjust the magic numbers,
but other than that, all set.
---
 data/unicode/NamesList.html | 69 ++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 23 deletions(-)

(limited to 'data/unicode/NamesList.html')

diff --git a/data/unicode/NamesList.html b/data/unicode/NamesList.html
index d6809e1..a67236e 100644
--- a/data/unicode/NamesList.html
+++ b/data/unicode/NamesList.html
@@ -100,7 +100,7 @@ a.headernav:hover {
The same input file can be used for the preparation of drafts and final editions for ISO/IEC 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
@@ -281,10 +281,18 @@ CHAR_ENTRY: NAME_LINE | RESERVED_LINE
charset declaration (see below). Alternatively, or in addition, a BOM may be present at the very beginning of the file, forcing the encoding to be interpreted as UTF-16 (little-endian only) or UTF-8. When
- declared as UTF-8, the names list format will support use of characters in
- the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
+ declared as UTF-8, the names list format will support use of any Unicode characters in
+ STRING and LABEL elements. Otherwise,
the supported repertoire is limited to Latin-1, and attempted use of characters outside the Latin-1 range will result in data corruption.
+The NamesList file format does not support styled text; each line or other element
+ will usually be displayed in a specific font selected for it. To allow CHAR elements
+ that normally use chart glyphs to better coexist with running text in LABEL and STRING
+ elements, a user defined limit can be set, below which the normal selection of (chart) glyphs
+ for the CHAR element is overridden in favor of equivalent glyphs from a font selected for better
+ readability in running text. Any running text outside that range will use standard chart
+ glyphs, which may result in a ransom note effect. For production of the Unicode Standard
+ Version 16.0.0 and later the limit is set to U+1EFF.
Several of these elements, while part of the formal definition of the file format, do not occur in final published versions of NamesList.txt in the UCD.
@@ -514,14 +522,14 @@ is machine generated and will always explicitly provide any summary subheaders.<
-The following are the primitives and terminals for the NamesList syntax.
+The following are the primitives and terminals for the NamesList syntax. "Limit" is a user-defined value; see discussion of the implications of Limit in the notes below.
LINE: STRING LF
COMMENT: "(" LABEL ")"
@@ -533,8 +541,8 @@ COMMENT: "(" LABEL ")"
TAG: <sequence of ASCII letters>
LCTAG: <sequence of lowercase ASCII letters>
-STRING: <sequence of characters in the range U+0020..U+02FF, except controls>
-LABEL: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
+STRING: <sequence of characters, except controls>
+LABEL: <sequence of characters, except controls, "(" or ")">
VARSEL: CHAR
| "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
VARSEL_LIST: "{" CHAR_LIST "}"
@@ -580,19 +588,27 @@ COMMENT: "(" LABEL ")"
of following characters.
The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
output.
- In a STRING or LABEL, a Unicode character outside the range
- U+0000..U+02FF is displayed as is, with a glyph matching
- the chart font, and not with the font that is otherwise defined for that element.
The NamesList.txt file is encoded in UTF-8 if the first line is a
FILE_COMMENT containing the declaration "UTF-8" or any casemap variation
thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
detecting the charset declaration (typically: "; charset=utf-8") the
remainder of that comment is ignored.
- If the file is not encoded as
- UTF-8, the character repertoire for running text (anything
- other than CHAR) is effectively restricted to the repertoire of Latin-1.
- Otherwise, characters in the range U+0020..U+02FF
- are allowed in STRING or LABEL elements, and elements derived from them.
+ When declared as UTF-8, the NamesList format will support any Unicode character
+ in STRING or LABEL elements, but see further implications below.
+ In a STRING or LABEL element, a Unicode character outside the range
+ U+0020..Limit is displayed with a glyph matching
+ the chart font, and not with the font that is otherwise defined for that element.
+ The Limit value is user defined.
+ For production of the Unicode Standard from Version 16.0.0 and later the Limit
+ value is set to U+1EFF.
+ All code points less than the Limit value can be mapped onto a font selected for best
+ results in running text. However, any CHAR elements contained in an EXPAND_LINE
+ are exempt from this and are always displayed with a glyph matching the chart font.
+ The net effect is a workaround for the fact that the NamesList format does
+ not support style runs within any element that encompasses a single unit of flowed text.
+ When drafting STRING or LABEL elements, one should note that text containing
+ characters outside the range U+0020..Limit may result in a ransom note effect,
+ as the regular text font and charts fonts would be alternated. This is best avoided.
The code chart layout program
(Unibook)
can accept files in several other formats. These include little-endian UTF-16,
@@ -610,9 +626,16 @@ COMMENT: "(" LABEL ")"
Modifications
+ Version 16.0.0
+
+ - Reissued for Unicode 16.0.0
+ - Reflect the wider range of possible values for the user defined Limit.
+ - Added an explanation of the effect of the Limit value.
+
+
Version 15.1.0
- - Reissued for Unicode 15.0.0.
+ - Reissued for Unicode 15.1.0.
- Adjusted NAMELIST definition to account for positions of FILE_COMMENT.
- Added a note to the bullets in Section 2.1 to clarify priority of matching for
some line types.
--
cgit v1.2.3
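
The charset and Limit rules described in the hunk at -580 can be sketched in
Python. This is an illustrative sketch under stated assumptions, not Unibook's
implementation: the function names, the "text"/"chart" labels, the inclusive
reading of U+0020..Limit, and the assumption that a FILE_COMMENT line starts
with ";" are all hypothetical choices made for the example.

```python
# Limit value the patch states for production of Unicode 16.0.0 and later.
DEFAULT_LIMIT = 0x1EFF


def declares_utf8(first_line: str) -> bool:
    """Return True if the first line looks like a FILE_COMMENT declaring UTF-8.

    Assumption: a FILE_COMMENT begins with ";". Any case variation of
    "UTF-8" counts; the rest of the comment is ignored.
    """
    line = first_line.lstrip("\ufeff")  # tolerate a leading BOM
    return line.startswith(";") and "utf-8" in line.lower()


def font_for(ch: str, limit: int = DEFAULT_LIMIT) -> str:
    """Pick the font class for one character of a STRING or LABEL.

    Assumption: the range U+0020..Limit is inclusive at both ends.
    """
    return "text" if 0x20 <= ord(ch) <= limit else "chart"


def font_runs(text: str, limit: int = DEFAULT_LIMIT) -> list[tuple[str, str]]:
    """Split text into consecutive (font, substring) runs."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        font = font_for(ch, limit)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

A STRING or LABEL for which font_runs returns more than one run alternates
between the text font and the chart font, which is the "ransom note effect"
the patch advises avoiding. CHAR elements inside an EXPAND_LINE are exempt
from this rule and are not modeled here.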