* ASCII (0x0..0x7f) needs no normalization.
* Latin-1 (0x0..0xff) needs no NFC normalization.
* Composition exclusions cannot appear in any 
  normalized string of any normalization form.
* Singleton decompositions are excluded from the
  composition algorithm.
* Non-starter decompositions are excluded from the
  composition algorithm.
* There are no Quick Check MAYBE values for NFD and NFKD.
* Combining class value 255 is not assigned to any character,
  so it is available as a flag.
* Sample Java Quick Check code:

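// YES, NO and MAYBE are distinct int constants. getCanonicalClass(ch) and
// isAllowed(ch) look up the Canonical_Combining_Class and the Quick_Check
// value for the target normalization form. All are assumed to be defined
// elsewhere.
public int quickCheck(String source) {
    short lastCanonicalClass = 0;
    int result = YES;
    for (int i = 0; i < source.length(); ++i) {
        int ch = source.codePointAt(i);
        // A supplementary code point takes two chars; skip the low surrogate.
        if (Character.isSupplementaryCodePoint(ch)) ++i;
        short canonicalClass = getCanonicalClass(ch);
        // Non-zero combining classes must not decrease within a sequence.
        if (lastCanonicalClass > canonicalClass && canonicalClass != 0) {
            return NO;
        }
        int check = isAllowed(ch);
        if (check == NO) return NO;
        if (check == MAYBE) result = MAYBE;
        lastCanonicalClass = canonicalClass;
    }
    return result;
}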
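
The ASCII and Latin-1 notes above suggest a cheap fast path in front of
a full NFC pass. The following is a minimal sketch, not a definitive
implementation: the class and method names (NfcFastPath, isLatin1, toNfc)
are made up for illustration, and java.text.Normalizer stands in for
whatever normalizer is actually in use.

```java
import java.text.Normalizer;

final class NfcFastPath {
    // True if every UTF-16 code unit is in U+0000..U+00FF.
    // Surrogates are above 0xFF, so supplementary characters fail the test.
    static boolean isLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) return false;
        }
        return true;
    }

    // Latin-1 text is already in NFC, so it is returned unchanged;
    // anything else goes through the full normalizer.
    static String toNfc(String s) {
        return isLatin1(s) ? s : Normalizer.normalize(s, Normalizer.Form.NFC);
    }
}
```

The fast path applies to NFC only. For the other forms the same trick
is safe only with the ASCII bound (0x7F).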

* No string expands to more than 3× its length (measured in
  code units) when normalized to NFC.
* When concatenating normalized strings, re-normalize from the 
  last code point in string A with Quick_Check=YES and 
  Canonical_Combining_Class=0 to the first code point in string B
  with Quick_Check=YES and Canonical_Combining_Class=0.
* If input is required to be in Stream-Safe Text Format, a 128-byte
  buffer is all that's needed to normalize.
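
The concatenation rule above can be sketched for NFC as follows. This is
an illustration under stated assumptions, not a definitive implementation.
The names NfcConcat, concat and BELOW_0300 are hypothetical, and the
predicate is a conservative stand-in for a real property lookup: it
relies on every code point below U+0300 having NFC Quick_Check=YES and
Canonical_Combining_Class=0. java.text.Normalizer is used to normalize
the span around the seam.

```java
import java.text.Normalizer;
import java.util.function.IntPredicate;

final class NfcConcat {
    // Stand-in for "Quick_Check=YES and Canonical_Combining_Class=0".
    // Conservative: some code points at or above U+0300 also qualify.
    static final IntPredicate BELOW_0300 = cp -> cp < 0x0300;

    // Concatenates two NFC strings, re-normalizing only around the seam.
    static String concat(String a, String b, IntPredicate safe) {
        // Walk back to the start of the last safe code point in a
        // (or to index 0 if there is none).
        int start = a.length();
        while (start > 0) {
            int cp = a.codePointBefore(start);
            start -= Character.charCount(cp);
            if (safe.test(cp)) break;
        }
        // Walk forward to the first safe code point in b
        // (or to the end of b if there is none).
        int end = 0;
        while (end < b.length()) {
            int cp = b.codePointAt(end);
            if (safe.test(cp)) break;
            end += Character.charCount(cp);
        }
        String seam = a.substring(start) + b.substring(0, end);
        return a.substring(0, start)
                + Normalizer.normalize(seam, Normalizer.Form.NFC)
                + b.substring(end);
    }
}
```

For example, concat("cafe", "\u0301 au lait", NfcConcat.BELOW_0300)
re-normalizes only the two code points "e" and U+0301 and leaves the
rest of both strings untouched.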

* Flags:
  - Combining Class
  - Hangul Syllable Type
  - Full Composition Exclusion
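
One way to store the three flags above is to pack them into a single
int per code point. The bit layout below is an assumption made for
illustration (8 bits of combining class, 3 bits of Hangul syllable
type, 1 bit for full composition exclusion), and the names NormFlags,
HangulType and pack are hypothetical. The notes do not prescribe a
layout.

```java
final class NormFlags {
    // The six Hangul_Syllable_Type values; the ordinal is stored in 3 bits.
    enum HangulType { NA, L, V, T, LV, LVT }

    private static final int HST_SHIFT = 8;
    private static final int EXCLUDED_BIT = 1 << 11;

    // Bits 0..7: combining class, bits 8..10: Hangul type, bit 11: exclusion.
    static int pack(int ccc, HangulType hst, boolean excluded) {
        if (ccc < 0 || ccc > 255) {
            throw new IllegalArgumentException("combining class out of range: " + ccc);
        }
        return ccc | (hst.ordinal() << HST_SHIFT) | (excluded ? EXCLUDED_BIT : 0);
    }

    static int combiningClass(int flags) {
        return flags & 0xFF;
    }

    static HangulType hangulType(int flags) {
        return HangulType.values()[(flags >>> HST_SHIFT) & 0x7];
    }

    static boolean isExcluded(int flags) {
        return (flags & EXCLUDED_BIT) != 0;
    }
}
```

With this layout the combining class field can still hold 255, so that
value remains usable as a flag.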