From a7164d9e7b3c3ec6813e06a42d82180d766e15ca Mon Sep 17 00:00:00 2001
From: Sam Atman
Date: Wed, 30 Apr 2025 20:32:23 -0400
Subject: Unicode 16.0

Went smoothly, needed to add some scripts and adjust the magic numbers, but other than that, all set.
---
 data/unicode/auxiliary/GraphemeBreakTest.html | 63 +++++++++++++--------------
 1 file changed, 31 insertions(+), 32 deletions(-)

(limited to 'data/unicode/auxiliary/GraphemeBreakTest.html')

diff --git a/data/unicode/auxiliary/GraphemeBreakTest.html b/data/unicode/auxiliary/GraphemeBreakTest.html
index f1ab9de..eb93740 100644
--- a/data/unicode/auxiliary/GraphemeBreakTest.html
+++ b/data/unicode/auxiliary/GraphemeBreakTest.html
@@ -6,38 +6,37 @@ td, th { vertical-align: top }

Grapheme_Cluster_Break Chart

-Unicode Version: 15.1.0
-Date: 2023-08-07, 15:52:55 GMT
+Unicode Version: 16.0.0
+Date: 2024-05-02, 15:02:48 GMT

This page illustrates the application of the Grapheme_Cluster_Break specification. The material here is informative, not normative.

The first chart shows where breaks would appear between different sample characters or strings. The sample characters are chosen mechanically to represent the different properties used by the specification.

Each cell shows the break-status for the position between the character(s) in its row header and the character(s) in its column header. The × symbol indicates no break, while the ÷ symbol indicates a break. The cells with × are also shaded to make it easier to scan the table. For example, in the cell at the intersection of the row headed by “CR” and the column headed by “LF”, there is a × symbol, indicating that there is no break between CR and LF.

After the heavy blue line in the table are additional rows, either with different sample characters or for sequences. Some column headers may be composed, reflecting “treat as” or “ignore” rules.

If your browser handles titles (tooltips), then hovering the mouse over the row header will show a sample character of that type. Hovering over a column header will show the sample character, plus its abbreviated general category and script. Hovering over the intersected cells shows the rule number that produces the break-status. For example, hovering over the cell at the intersection of LVT and T shows ×, with the rule 8.0. Checking below the table, rule 8.0 is “( LVT | T) × T”, which is the one that applies to that case. Note that a rule is invoked only when no lower-numbered rules have applied.

Table

- - - - - - - - - - - - - - - - - - - - - - - - - - +
OtherCRLFControlExtendRIPrependSpacingMarkLVTLVLVTExtend_ConjunctLinkingScriptsSpacingMark_ConjunctLinkingScriptsConjunctLinkingScriptsPrepend_ConjunctLinkingScriptsConjunctLinkingScripts_LinkingConsonantExtPictExtend_ExtCccZwjExtend_ConjunctLinkingScripts_ExtCccZwjExtend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwjZWJ_ExtCccZwj
Other÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
CR÷÷×÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
LF÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
Control÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
Extend÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
RI÷÷÷÷××÷×÷÷÷÷÷××÷÷÷÷××××
Prepend×÷÷÷×××××××××××××××××××
SpacingMark÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
L÷÷÷÷×÷÷×××÷××××÷÷÷÷××××
V÷÷÷÷×÷÷×÷××÷÷××÷÷÷÷××××
T÷÷÷÷×÷÷×÷÷×÷÷××÷÷÷÷××××
LV÷÷÷÷×÷÷×÷××÷÷××÷÷÷÷××××
LVT÷÷÷÷×÷÷×÷÷×÷÷××÷÷÷÷××××
Extend_ConjunctLinkingScripts÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
SpacingMark_ConjunctLinkingScripts÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
ConjunctLinkingScripts÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
Prepend_ConjunctLinkingScripts×÷÷÷×××××××××××××××××××
ConjunctLinkingScripts_LinkingConsonant÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
ExtPict÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
Extend_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
Extend_ConjunctLinkingScripts_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
ZWJ_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
 
Other÷÷÷÷×÷÷×÷÷÷÷÷××÷÷÷÷××××
+ + + + + + + + + + + + + + + + + + + + + + + +
OtherCRLFControlExtendRIPrependSpacingMarkLVTLVLVTSpacingMark_ConjunctLinkingScriptsConjunctLinkingScriptsPrepend_ConjunctLinkingScriptsConjunctLinkingScripts_LinkingConsonantExtPictExtend_ExtCccZwjExtend_ConjunctLinkingScripts_ExtCccZwjExtend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwjZWJ_ExtCccZwj
Other÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
CR÷÷×÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
LF÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
Control÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷÷
Extend÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
RI÷÷÷÷××÷×÷÷÷÷÷×÷÷÷÷××××
Prepend×÷÷÷××××××××××××××××××
SpacingMark÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
L÷÷÷÷×÷÷×××÷×××÷÷÷÷××××
V÷÷÷÷×÷÷×÷××÷÷×÷÷÷÷××××
T÷÷÷÷×÷÷×÷÷×÷÷×÷÷÷÷××××
LV÷÷÷÷×÷÷×÷××÷÷×÷÷÷÷××××
LVT÷÷÷÷×÷÷×÷÷×÷÷×÷÷÷÷××××
SpacingMark_ConjunctLinkingScripts÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
ConjunctLinkingScripts÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
Prepend_ConjunctLinkingScripts×÷÷÷××××××××××××××××××
ConjunctLinkingScripts_LinkingConsonant÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
ExtPict÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
Extend_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
Extend_ConjunctLinkingScripts_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
Extend_ConjunctLinkingScripts_ConjunctLinker_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
ZWJ_ExtCccZwj÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××
 
Other÷÷÷÷×÷÷×÷÷÷÷÷×÷÷÷÷××××

Rules

This section shows the rules. They are mechanically modified for programmatic generation of the tables and test code, and thus do not match the UAX rules precisely. In particular:

  1. The rules are cast into a form that is more like regular expressions.
  2. The rules “sot ÷”, “÷ eot”, and “÷ Any” are added mechanically, and have artificial numbers.
  3. The rules are given decimal numbers using tenths, and are written without prefix. For example, rule GB9a is given the number 9.1.
  4. Any “treat as” or “ignore” rules are handled as discussed in UAX #29, and thus reflected in a transformation of the rules usually not visible here. In addition, final rules like “Any ÷ Any” may be recast as the equivalent expression “÷ Any”.
  5. In some cases, the numbering and form of a rule is changed due to “treat as” rules.

For the original rules, see UAX #29.

@@ -111,16 +110,16 @@ td, th { vertical-align: top }
   a    b
 17
-  👶  🏿  👶
+  👶  🏿  👶
 18
-  a  🏿  👶
+  a  🏿  👶
 19
-  a  🏿  👶    🛑
+  a  🏿  👶    🛑
 20
-  👶  🏿  ◌̈    👶  🏿
+  👶  🏿  ◌̈    👶  🏿
 21
   🛑    🛑
-- 
cgit v1.2.3
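
Reviewer note: the chart text in this patch says that a rule is invoked only when no lower-numbered rules have applied, and gives rule 8.0, “( LVT | T) × T”, as an example. The sketch below illustrates that first-match-wins lookup in Python. It is not the code that generates the chart, and it covers only a small subset of rules (the CR/LF/Control and Hangul rules as stated in UAX #29), so pairs involving Extend, ZWJ, and the other classes are not handled. The names `Rule`, `RULES`, and `break_status` are hypothetical, and the number 999.0 for the final “÷ Any” rule is an assumption, since the chart only says such rules get artificial numbers.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    number: float                        # decimal rule number, e.g. 8.0 for GB8
    matches: Callable[[str, str], bool]  # predicate on (class before, class after)
    is_break: bool                       # True for ÷, False for ×


CONTROLS = {"CR", "LF", "Control"}

# Ordered by rule number; only a subset of the rules is modelled here.
RULES = [
    Rule(3.0, lambda a, b: a == "CR" and b == "LF", False),
    Rule(4.0, lambda a, b: a in CONTROLS, True),
    Rule(5.0, lambda a, b: b in CONTROLS, True),
    Rule(6.0, lambda a, b: a == "L" and b in {"L", "V", "LV", "LVT"}, False),
    Rule(7.0, lambda a, b: a in {"LV", "V"} and b in {"V", "T"}, False),
    Rule(8.0, lambda a, b: a in {"LVT", "T"} and b == "T", False),
    # Catch-all "÷ Any"; the number 999.0 is an assumption for illustration.
    Rule(999.0, lambda a, b: True, True),
]


def break_status(before: str, after: str) -> tuple[str, float]:
    """Return the break symbol and the number of the first rule that matches."""
    for rule in RULES:
        if rule.matches(before, after):
            return ("÷" if rule.is_break else "×", rule.number)
    raise AssertionError("unreachable: the final rule matches everything")


if __name__ == "__main__":
    for pair in [("CR", "LF"), ("LF", "CR"), ("LVT", "T"), ("L", "T")]:
        print(pair, break_status(*pair))
```

For the pairs it covers, the result agrees with the chart: CR followed by LF is ×, LVT followed by T is × via rule 8.0, and L followed by T falls through to the final rule and breaks.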