author: Sam Atman, 2025-05-15 15:19:45 -0400
committer: Sam Atman, 2025-05-15 15:19:45 -0400
commit: 5c6ad6a9de758c2680ba723ed5589927405fafca
tree: 1ee066e59d0891c72ab7b79a72480f49e20e2a06
parent: Maximal Subparts tests
Update NEWS.md
-rw-r--r-- NEWS.md 86
1 files changed, 60 insertions, 26 deletions
diff --git a/NEWS.md b/NEWS.md
index 4a8d651..a432c2f 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -13,20 +13,20 @@ will mainly take place around importing, creating, and deinitializing.
 
 ### The Great Renaming
 
-The most obvious change is on the surface API: more than half of the modules
-have been renamed. There are no user-facing modules with `Data` in the name,
-and some abbreviations have been spelled in full.
+The most obvious change is on the surface API: more than half of the
+modules have been renamed. There are no user-facing modules with `Data`
+in the name, and some abbreviations have been spelled in full.
 
 ### No More Separation of Data and Functionality
 
-It is no longer necessary to separately create, for example, a `GraphemeData`
-structure, in order to use the functionality provided by the `grapheme`
-module.
+It is no longer necessary to separately create, for example, a
+`GraphemeData` structure, in order to use the functionality provided
+by the `grapheme` module.
 
-Instead there's just `Graphemes`, and the same for a couple of other modules
-which worked the same way. This means that the cases where functionality
-was provided by a wrapped pointer are now provided directly from the struct
-with the necessary data.
+Instead there's just `Graphemes`, and the same for a couple of other
+modules which worked the same way. This means that the cases where
+functionality was provided by a wrapped pointer are now provided
+directly from the struct with the necessary data.
 
 This would make user structs larger in some cases, while eliminating a
 pointer chase. If that isn't a desirable trade off for your code,
@@ -45,10 +45,42 @@ Getting up to speed is a matter of passing the allocator to `deinit`.
 This change comes courtesy of [lch361](https://lch361.net), in his
 first contribution to the repo. Thanks Lich!
 
+### `code_point` Now Unicode-Compliant
+
+The `v0.15.x` decoder used a simple, fast, but naïve method to decode
+UTF-8 into codepoints. Concerningly, this interpreted overlong
+sequences, which has been forbidden by Unicode for more than 20 years
+due to the security risks involved.
+
+This has been replaced with a DFA decoder based on the work of [Björn
+Höhrmann][UTF], which has proven itself fast[^1] and reliable. This is
+a breaking change; sequences such as `"\xc0\xaf"` will no longer
+produce the code `'/'`, nor will surrogates return their codepoint
+value.
+
+The new decoder faithfully implements §3.9.6 of the Unicode Standard,
+_U+FFFD Substitution of Maximal Subparts_. While this is itself not
+required to claim Unicode conformance, it is the W3C specification for
+replacement behavior.
+
+Along with this, `code_point.decode` is deprecated, and will be removed
+in a later version of `zg`. It was basically an exposed piece of the
+`Iterator` implementation, and is no longer used in that capacity.
+
+Instead, prefer `decodeAtIndex([]const u8, u32) ?CodePoint`, or better
+yet, `decodeAtCursor([]const u8, *u32)`. The latter advances its
+second argument to the next possible index for a valid codepoint, which
+is good for the fetch pipeline, and more ergonomic in many cases.
+
+[UTF]: https://bjoern.hoehrmann.de/utf-8/decoder/dfa/
+
+[^1]: A bit more than twice as fast as the standard library for
+decoding, according to my (limited) benchmarks.
+
 ### DisplayWidth and CaseFolding Can Share Data
 
-Both of these modules use another module to get the job done, `Graphemes`
-for `DisplayWidth`, and `Normalize` for `CaseFolding`.
+Both of these modules use another module to get the job done,
+`Graphemes` for `DisplayWidth`, and `Normalize` for `CaseFolding`.
 
 It is now possible to initialize them with a borrowed copy of those
 modules, to make it simpler to write code which also needs the base
@@ -102,32 +134,34 @@ so we no longer make user code deal with that unlikely event.
 ### New DisplayWidth options
 
 A `DisplayWidth` can now be compiled to treat `c0` and `c1` control codes
-as having a width. Canonically, terminals don't print them, so they would
-have a width of 0. However, some applications (`vim` for example) need to
-escape control codes to make them visible. Setting these options will let
-`DisplayWidth` return the correct widths when this is done.
+as having a width. Canonically, terminals don't print them, so they
+would have a width of 0. However, some applications (`vim` for example)
+need to escape control codes to make them visible. Setting these
+options will let `DisplayWidth` return the correct widths when this
+is done.
 
 ### Unicode 16.0
 
 This updates `zg` to use the latest Unicode edition. This should be
-the only change which will change behavior of user code, other than through
-the use of the new `DisplayWidth` options.
+the only change which will change behavior of user code, other than
+through the use of the new `DisplayWidth` options.
 
 ### Tests
 
-It is now possible to run all the tests, not just the `unicode-test` subset.
-Accordingly, that step is removed, and `zig build test` runs everything.
+It is now possible to run all the tests, not just the `unicode-test`
+subset. Accordingly, that step is removed, and `zig build test`
+runs everything.
 
 #### Allocations Tested
 
 Every allocate-able now has a `checkAllAllocationFailures` test. This
-process turned up two bugs. Also discovered were 8,663 allocations, which
-were reduced to two, these were also being individually freed on deinit.
-So that's nice.
+process turned up two bugs. Also discovered were 8,663 allocations,
+which were reduced to two, these were also being individually freed
+on deinit. So that's nice.
 
 #### That's It!
 
-I hope you find converting over `zg v0.13` code to be fairly painless and
-straightforward. There should be no need to make changes of this magnitude
-in the future.
+I hope you find converting over `zg v0.13` code to be fairly painless
+and straightforward. There should be no need to make changes of this
+magnitude in the future.
 
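The `code_point` change in this commit (rejecting overlong forms and surrogates, and replacing ill-formed input per _U+FFFD Substitution of Maximal Subparts_) can be illustrated with a small sketch. It is written in Python, is not `zg`'s implementation, and deliberately swaps Höhrmann's table-driven DFA for explicit byte-range checks taken from the Unicode Standard's table of well-formed UTF-8 byte sequences (Table 3-7). The function `decode_at_cursor` is a hypothetical stand-in that only mimics the cursor-advancing style of `decodeAtCursor`. It returns the decoded codepoint together with the next index. On an ill-formed sequence it returns U+FFFD after consuming only the maximal subpart, leaving the offending byte for the next call.

```python
# Sketch of UTF-8 decoding with "U+FFFD substitution of maximal subparts".
# Assumption: this is NOT zg's code. It uses explicit range checks
# (Unicode Table 3-7) instead of Hoehrmann's DFA, and decode_at_cursor is a
# hypothetical helper imitating the cursor style of zg's decodeAtCursor.

REPLACEMENT = 0xFFFD


def second_byte_range(lead):
    """Allowed range for the byte right after `lead` (Unicode Table 3-7)."""
    if lead == 0xE0:
        return 0xA0, 0xBF  # rules out overlong 3-byte forms
    if lead == 0xED:
        return 0x80, 0x9F  # rules out surrogates U+D800..U+DFFF
    if lead == 0xF0:
        return 0x90, 0xBF  # rules out overlong 4-byte forms
    if lead == 0xF4:
        return 0x80, 0x8F  # rules out code points above U+10FFFF
    return 0x80, 0xBF


def decode_at_cursor(data, cursor):
    """Return (code_point, next_cursor), or None once cursor reaches the end."""
    if cursor >= len(data):
        return None
    lead = data[cursor]
    if lead < 0x80:
        return lead, cursor + 1
    if 0xC2 <= lead <= 0xDF:
        length, cp = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        length, cp = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF4:
        length, cp = 4, lead & 0x07
    else:
        # C0, C1, F5..FF and stray continuation bytes can never start a
        # sequence, so each one is replaced on its own.
        return REPLACEMENT, cursor + 1

    lo, hi = second_byte_range(lead)
    i = cursor + 1
    for k in range(1, length):
        if i >= len(data):
            # Truncated at end of input: the whole prefix is one maximal subpart.
            return REPLACEMENT, i
        b = data[i]
        allowed_lo, allowed_hi = (lo, hi) if k == 1 else (0x80, 0xBF)
        if not (allowed_lo <= b <= allowed_hi):
            # The bytes before i form the maximal subpart; byte i is left
            # for the next call, which may decode it as a new sequence.
            return REPLACEMENT, i
        cp = (cp << 6) | (b & 0x3F)
        i += 1
    return cp, i


def decode_all(data):
    """Decode every code point, advancing the cursor after each step."""
    out, cursor = [], 0
    while (step := decode_at_cursor(data, cursor)) is not None:
        cp, cursor = step
        out.append(cp)
    return out
```

With this sketch, `decode_all(b"\xc0\xaf")` gives two U+FFFD values rather than `'/'` (0x2F), and `decode_all(b"\xe1\x80\xe2\xf0\x91\x92\xf1\xbf\x41")` gives four U+FFFD values followed by `0x41`. Each truncated sequence is replaced once as a whole, while bytes that can never begin a sequence are replaced one at a time.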