path: root/src/code_point.zig
Date | Commit message | Author | Files | Lines
2025-07-08 | Add Words.zig example to README | Sam Atman | 1 | -0/+3
2025-06-01 | Add graphemeAtIndex + iterate before and after | Sam Atman | 1 | -2/+58
That completes the set. I do think it's possible to bum a few more cycles from the implementation, but, I'm not going to. It passes the acceptance suite and that's what it needs to do.
2025-05-23 | Make offset size configurable | Sam Atman | 1 | -7/+9
Hopefully I can talk users out of taking advantage of this configuration but I'll have better luck with that if it's available.
2025-05-15 | wordAtIndex passes conformance | Sam Atman | 1 | -1/+0
I removed the initAtIndex functions from the public vocabulary, because the last couple of days of sweat and blood prove that it's hard to use correctly. That's probably it for WordBreak, now to fix the overlong bug on v0.14 and get this integrated with the new reverse grapheme iterator.
2025-05-15 | Add format for CodePoint | Sam Atman | 1 | -2/+10
2025-05-15 | Hooked up break test, some bugs squashed | Sam Atman | 1 | -10/+0
The handling of ignorables is really different, because they 'adhere' to the future of the iteration, not the past.
2025-05-15 | Reverse Word Iterator | Sam Atman | 1 | -1/+1
Next up I hook it to the tests.
2025-05-15 | Begin conformance test | Sam Atman | 1 | -0/+5
I'm not sure the details of this strategy can actually be made to work. But, something can.
2025-05-15 | Various small iterator improvements | Sam Atman | 1 | -4/+51
2025-05-15 | Add reverse CodePoint iterator | Sam Atman | 1 | -1/+67
2025-05-15 | Maximal Subparts tests | Sam Atman | 1 | -37/+114
The decoder now properly returns substitution bytes according to Substitution of Maximal Subparts, with tests to prove it.
2025-05-15 | Replace CodePoint Decoding with Hörhmann Method | Sam Atman | 1 | -59/+204
This still needs a small barrage of tests to confirm that it correctly performs substitution of maximal subparts (Unicode 16.0.0 §3.9.6). I'm pretty sure this edition is 'overly maximal' actually, the name of the algorithm is somewhat misleading as to what it actually does.
2025-05-14 | Add overlong test, which should fail | Sam Atman | 1 | -2/+15
But does not.
2025-05-13 | Various small iterator improvements [work-branch] | Sam Atman | 1 | -9/+46
2025-05-09 | Add reverse CodePoint iterator | Sam Atman | 1 | -6/+75
2024-07-05 | refactor CodePoint.Iterator into a reusable fn | Jonathan Raphaelson | 1 | -57/+79
without changing the algorithm at all, move the responsibility of decoding a u8 slice out of the iterator, and into a reusable function so that it can be used by consumers of the library
2024-06-10 | codepoint: prevent panic when last cp too short | Tim Culverhouse | 1 | -0/+11
If the last codepoint in a byte slice is incomplete (i.e., has a length of 3 but there are only 2 bytes remaining), the iterator will panic. Instead of panicking, prefer to return a replacement character. This strategy is similar to that in the block just above which returns a replacement character if the first byte is not valid. In this latter block, we also consume only one byte and allow the iterator to continue. This allows for sections of text which may have a single byte incorrect near the end of the slice.
2024-02-18 | Back to zg code_point. 4ms faster than Ghostty's Utf8Decoder | Jose Colon Rodriguez | 1 | -29/+39
2024-02-18 | Code point code is now a method not a field. | Jose Colon Rodriguez | 1 | -39/+29
2024-02-18 | Code point and grapheme are now namespaces. | Jose Colon Rodriguez | 1 | -19/+20
2024-02-17 | Fixed isAsciiOnly and CodePointIterator ASCII bugs | Jose Colon Rodriguez | 1 | -3/+3
2024-02-17 | GraphemeIterator ASCII optimization 3x faster | Jose Colon Rodriguez | 1 | -12/+15
2024-02-14 | Removed readCodePoint and StreamingGraphemeIterator | Jose Colon Rodriguez | 1 | -50/+0
2024-02-13 | Removed unreachables from CodePointIterator | Jose Colon Rodriguez | 1 | -0/+131