| author | 2025-05-15 15:19:45 -0400 | |
|---|---|---|
| committer | 2025-05-15 15:19:45 -0400 | |
| commit | 5c6ad6a9de758c2680ba723ed5589927405fafca (patch) | |
| tree | 1ee066e59d0891c72ab7b79a72480f49e20e2a06 | |
| parent | Maximal Subparts tests (diff) | |
Update NEWS.md
| -rw-r--r-- | NEWS.md | 86 |
1 files changed, 60 insertions, 26 deletions
| @@ -13,20 +13,20 @@ will mainly take place around importing, creating, and deinitializing. | |||
| 13 | 13 | ||
| 14 | ### The Great Renaming | 14 | ### The Great Renaming |
| 15 | 15 | ||
| 16 | The most obvious change is on the surface API: more than half of the modules | 16 | The most obvious change is on the surface API: more than half of the |
| 17 | have been renamed. There are no user-facing modules with `Data` in the name, | 17 | modules have been renamed. There are no user-facing modules with `Data` |
| 18 | and some abbreviations have been spelled in full. | 18 | in the name, and some abbreviations have been spelled in full. |
| 19 | 19 | ||
| 20 | ### No More Separation of Data and Functionality | 20 | ### No More Separation of Data and Functionality |
| 21 | 21 | ||
| 22 | It is no longer necessary to separately create, for example, a `GraphemeData` | 22 | It is no longer necessary to separately create, for example, a |
| 23 | structure, in order to use the functionality provided by the `grapheme` | 23 | `GraphemeData` structure, in order to use the functionality provided |
| 24 | module. | 24 | by the `grapheme` module. |
| 25 | 25 | ||
| 26 | Instead there's just `Graphemes`, and the same for a couple of other modules | 26 | Instead there's just `Graphemes`, and the same for a couple of other |
| 27 | which worked the same way. This means that the cases where functionality | 27 | modules which worked the same way. This means that the cases where |
| 28 | was provided by a wrapped pointer are now provided directly from the struct | 28 | functionality was provided by a wrapped pointer are now provided |
| 29 | with the necessary data. | 29 | directly from the struct with the necessary data. |
| 30 | 30 | ||
| 31 | This would make user structs larger in some cases, while eliminating a | 31 | This would make user structs larger in some cases, while eliminating a |
| 32 | pointer chase. If that isn't a desirable trade off for your code, | 32 | pointer chase. If that isn't a desirable trade off for your code, |
| @@ -45,10 +45,42 @@ Getting up to speed is a matter of passing the allocator to `deinit`. | |||
| 45 | This change comes courtesy of [lch361](https://lch361.net), in his | 45 | This change comes courtesy of [lch361](https://lch361.net), in his |
| 46 | first contribution to the repo. Thanks Lich! | 46 | first contribution to the repo. Thanks Lich! |
| 47 | 47 | ||
| 48 | ### `code_point` Now Unicode-Compliant | ||
| 49 | |||
| 50 | The `v0.15.x` decoder used a simple, fast, but naïve method to decode | ||
| 51 | UTF-8 into codepoints. Concerningly, this interpreted overlong | ||
| 52 | sequences, which has been forbidden by Unicode for more than 20 years | ||
| 53 | due to the security risks involved. | ||
| 54 | |||
| 55 | This has been replaced with a DFA decoder based on the work of [Björn | ||
| 56 | Höhrmann][UTF], which has proven itself fast[^1] and reliable. This is | ||
| 57 | a breaking change; sequences such as `"\xc0\xaf"` will no longer | ||
| 58 | produce the code `'/'`, nor will surrogates return their codepoint | ||
| 59 | value. | ||
| 60 | |||
| 61 | The new decoder faithfully implements §3.9.6 of the Unicode Standard, | ||
| 62 | _U+FFFD Substitution of Maximal Subparts_. While this is itself not | ||
| 63 | required to claim Unicode conformance, it is the W3C specification for | ||
| 64 | replacement behavior. | ||
| 65 | |||
| 66 | Along with this, `code_point.decode` is deprecated, and will be removed | ||
| 67 | in a later version of `zg`. It was basically an exposed piece of the | ||
| 68 | `Iterator` implementation, and is no longer used in that capacity. | ||
| 69 | |||
| 70 | Instead, prefer `decodeAtIndex([]const u8, u32) ?CodePoint`, or better | ||
| 71 | yet, `decodeAtCursor([]const u8, *u32)`. The latter advances its | ||
| 72 | second argument to the next possible index for a valid codepoint, which | ||
| 73 | is good for the fetch pipeline, and more ergonomic in many cases. | ||
| 74 | |||
| 75 | [UTF]: https://bjoern.hoehrmann.de/utf-8/decoder/dfa/ | ||
| 76 | |||
| 77 | [^1]: A bit more than twice as fast as the standard library for | ||
| 78 | decoding, according to my (limited) benchmarks. | ||
| 79 | |||
| 48 | ### DisplayWidth and CaseFolding Can Share Data | 80 | ### DisplayWidth and CaseFolding Can Share Data |
| 49 | 81 | ||
| 50 | Both of these modules use another module to get the job done, `Graphemes` | 82 | Both of these modules use another module to get the job done, |
| 51 | for `DisplayWidth`, and `Normalize` for `CaseFolding`. | 83 | `Graphemes` for `DisplayWidth`, and `Normalize` for `CaseFolding`. |
| 52 | 84 | ||
| 53 | It is now possible to initialize them with a borrowed copy of those | 85 | It is now possible to initialize them with a borrowed copy of those |
| 54 | modules, to make it simpler to write code which also needs the base | 86 | modules, to make it simpler to write code which also needs the base |
| @@ -102,32 +134,34 @@ so we no longer make user code deal with that unlikely event. | |||
| 102 | ### New DisplayWidth options | 134 | ### New DisplayWidth options |
| 103 | 135 | ||
| 104 | A `DisplayWidth` can now be compiled to treat `c0` and `c1` control codes | 136 | A `DisplayWidth` can now be compiled to treat `c0` and `c1` control codes |
| 105 | as having a width. Canonically, terminals don't print them, so they would | 137 | as having a width. Canonically, terminals don't print them, so they |
| 106 | have a width of 0. However, some applications (`vim` for example) need to | 138 | would have a width of 0. However, some applications (`vim` for example) |
| 107 | escape control codes to make them visible. Setting these options will let | 139 | need to escape control codes to make them visible. Setting these |
| 108 | `DisplayWidth` return the correct widths when this is done. | 140 | options will let `DisplayWidth` return the correct widths when this |
| 141 | is done. | ||
| 109 | 142 | ||
| 110 | ### Unicode 16.0 | 143 | ### Unicode 16.0 |
| 111 | 144 | ||
| 112 | This updates `zg` to use the latest Unicode edition. This should be | 145 | This updates `zg` to use the latest Unicode edition. This should be |
| 113 | the only change which will change behavior of user code, other than through | 146 | the only change which will change behavior of user code, other than |
| 114 | the use of the new `DisplayWidth` options. | 147 | through the use of the new `DisplayWidth` options. |
| 115 | 148 | ||
| 116 | ### Tests | 149 | ### Tests |
| 117 | 150 | ||
| 118 | It is now possible to run all the tests, not just the `unicode-test` subset. | 151 | It is now possible to run all the tests, not just the `unicode-test` |
| 119 | Accordingly, that step is removed, and `zig build test` runs everything. | 152 | subset. Accordingly, that step is removed, and `zig build test` |
| 153 | runs everything. | ||
| 120 | 154 | ||
| 121 | #### Allocations Tested | 155 | #### Allocations Tested |
| 122 | 156 | ||
| 123 | Every allocate-able now has a `checkAllAllocationFailures` test. This | 157 | Every allocate-able now has a `checkAllAllocationFailures` test. This |
| 124 | process turned up two bugs. Also discovered were 8,663 allocations, which | 158 | process turned up two bugs. Also discovered were 8,663 allocations, |
| 125 | were reduced to two, these were also being individually freed on deinit. | 159 | which were reduced to two, these were also being individually freed |
| 126 | So that's nice. | 160 | on deinit. So that's nice. |
| 127 | 161 | ||
| 128 | #### That's It! | 162 | #### That's It! |
| 129 | 163 | ||
| 130 | I hope you find converting over `zg v0.13` code to be fairly painless and | 164 | I hope you find converting over `zg v0.13` code to be fairly painless |
| 131 | straightforward. There should be no need to make changes of this magnitude | 165 | and straightforward. There should be no need to make changes of this |
| 132 | in the future. | 166 | magnitude in the future. |
| 133 | 167 | ||
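
Reviewer note on the `code_point` hunk: the behavior change it describes (overlong sequences and surrogates no longer decode, and ill-formed input is replaced per _U+FFFD Substitution of Maximal Subparts_) can be sanity-checked against any decoder that follows the same rules. The sketch below uses Python's built-in UTF-8 codec purely as a stand-in reference; it does not call `zg`, and the helper names `strict_decode` and `lossy_decode` are made up for illustration.

```python
# Illustration only: Python's UTF-8 codec stands in for a conforming decoder.
# Nothing here is zg API; strict_decode and lossy_decode are hypothetical helpers.

def strict_decode(data: bytes):
    """Return the decoded string, or None if data is not well-formed UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def lossy_decode(data: bytes) -> str:
    """Decode, substituting U+FFFD for each ill-formed subsequence."""
    return data.decode("utf-8", errors="replace")


overlong_slash = b"\xc0\xaf"       # overlong two-byte encoding of '/'
lone_surrogate = b"\xed\xa0\x80"   # would-be encoding of surrogate U+D800
truncated_euro = b"\xe2\x82" + b"A"  # first two bytes of U+20AC, then 'A'

# A well-formed '/' decodes; the overlong form and the surrogate do not.
print(strict_decode(b"/"))
print(strict_decode(overlong_slash))
print(strict_decode(lone_surrogate))

# The overlong form never yields '/', only replacement characters.
print([hex(ord(c)) for c in lossy_decode(overlong_slash)])

# A truncated sequence collapses to a single U+FFFD before the next valid byte.
print([hex(ord(c)) for c in lossy_decode(truncated_euro)])
```

Feeding the same byte strings to the new `zg` decoder and comparing the results would be a cheap cross-check when migrating code that relied on the old, permissive behavior.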