path: root/src/Words.zig
Date | Commit message | Author | Files | Lines
2025-09-14 | Update codebase to Zig 0.15.1. | Michael Chaten | 1 | -3/+1
Removes compression support
2025-07-08 | Add Words.zig example to README | Sam Atman | 1 | -0/+17
2025-06-01 | Add graphemeAtIndex + iterate before and after | Sam Atman | 1 | -2/+2
That completes the set. I do think it's possible to bum a few more cycles from the implementation, but, I'm not going to. It passes the acceptance suite and that's what it needs to do.
2025-05-23 | Make offset size configurable | Sam Atman | 1 | -6/+8
Hopefully I can talk users out of taking advantage of this configuration but I'll have better luck with that if it's available.
2025-05-23 | Add iterateBefore and iterateAfter | Sam Atman | 1 | -32/+66
These create reverse or forward iterators before or after a Word. So this way, the user can get the word at an index, then iterate forward or back from that word. Also: Fixes #59, which was fixed a while back, but I don't feel like doing repo surgery to tag the fix where it happened. We have blame for that kind of thing.
2025-05-16 | Words module | Sam Atman | 1 | -21/+21
In keeping with the new nomenclature, we're calling the module "Words", not "WordBreak". The latter is Unicode jargon; the module provides word iterators. Words are the figure, word breaks are the ground.
2025-05-16 | Move WordBreak to Words | Sam Atman | 1 | -0/+0
2025-05-16 | Proofread | Sam Atman | 1 | -5/+6
2025-05-15 | wordAtIndex passes conformance | Sam Atman | 1 | -89/+72
I removed the initAtIndex functions from the public vocabulary, because the last couple of days of sweat and blood prove that they're hard to use correctly. That's probably it for WordBreak, now to fix the overlong bug on v0.14 and get this integrated with the new reverse grapheme iterator.
2025-05-15 | Rewrite wordAtIndex to use iterator flipping | Sam Atman | 1 | -24/+83
This also adds helper functions for initializing iterators at an index within the string. Not that user code should do that necessarily, but `wordAtIndex` definitely should, and there's no reason not to make it available to others. With an appropriate warning at least.
2025-05-15 | Add reversal functions for word iterators | Sam Atman | 1 | -2/+81
While of only occasional use in real programs, one thing these are good for is reliably retrieving the word at a given index. Which turns out to be... tricky is the best word.
2025-05-15 | ReverseWordIterator passes conformance test | Sam Atman | 1 | -19/+64
Ended up needing a clone of the codepoint iterator, so that WB4 can ignore points in a manner compatible with backward search. So I created a special SneakIterator which can return WBPs directly, so as to skip ignorables. This is also needed for flag emoji, since the odd-number case must be handled immediately. So we count back in a WB4 compatible way, then store the count on the word iterator, and voila.
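
The flag-emoji point can be illustrated with a minimal sketch. It is written in Python for brevity, not taken from Words.zig, and every name in it is hypothetical. It assumes the UAX #29 regional-indicator rules (WB15/WB16: pair indicators up from the start of a run, no break inside a pair) and, unlike the commit, it ignores WB4 ignorables entirely. When walking backward, whether a boundary falls between two adjacent regional indicators depends on how many indicators precede that spot, so the run has to be counted back:

```python
def is_regional_indicator(ch: str) -> bool:
    """True for U+1F1E6..U+1F1FF, the Regional Indicator block."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def break_between_ris(text: str, i: int) -> bool:
    """Decide whether a word boundary falls just before text[i].

    Assumes text[i - 1] and text[i] are both regional indicators.
    Simplification: Extend/Format/ZWJ characters (WB4) are NOT skipped here.
    """
    count = 0
    j = i
    # Count the unbroken run of regional indicators that ends at text[i - 1].
    while j > 0 and is_regional_indicator(text[j - 1]):
        count += 1
        j -= 1
    # An odd count means text[i - 1] opens a pair that text[i] completes,
    # so there is no boundary. An even count means the pairs before i are
    # complete, so a boundary falls here.
    return count % 2 == 0


# Two flags back to back: U+1F1FA U+1F1F8 (US), then U+1F1EB U+1F1F7 (FR).
flags = "\U0001F1FA\U0001F1F8\U0001F1EB\U0001F1F7"
boundaries = [i for i in range(1, len(flags)) if break_between_ris(flags, i)]
```

Here `boundaries` holds only index 2, the seam between the two flags. A real reverse iterator would cache the count rather than recount at every step, which is presumably why the commit stores it on the word iterator.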
2025-05-15 | Hooked up break test, some bugs squashed | Sam Atman | 1 | -9/+30
The handling of ignorables is really different, because they 'adhere' to the future of the iteration, not the past.
2025-05-15 | Reverse Word Iterator | Sam Atman | 1 | -0/+156
Next up I hook it to the tests.
2025-05-15 | Add wordAtCursor | Sam Atman | 1 | -48/+100
This is not actually the way to do it, and can break on some crafted strings. The way to actually do it: implement a reverse word search iterator, then do next() to find a word break, prev() to find a _valid_ word start, then next() again to find the valid end of said word. Maybe 2+, 2-, 1+ actually. I can probably write a test to see if the cursor spot is ambiguous, and apply an extra round if so. Need to mull the rules over before making any rash moves.
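
The forward, back, forward sequence described above can be sketched as follows. This is a toy in Python, not the module's Zig code: the boundary rule (alphanumeric runs versus everything else) stands in for the real UAX #29 rules, and all function names are hypothetical.

```python
def is_break(text: str, i: int) -> bool:
    """Toy rule: boundaries sit at both ends of the text and wherever an
    alphanumeric run meets a non-alphanumeric run."""
    if i <= 0 or i >= len(text):
        return True
    return text[i - 1].isalnum() != text[i].isalnum()


def next_break(text: str, i: int) -> int:
    """Smallest boundary strictly after index i."""
    j = i + 1
    while not is_break(text, j):
        j += 1
    return j


def prev_break(text: str, i: int) -> int:
    """Largest boundary strictly before index i."""
    j = i - 1
    while not is_break(text, j):
        j -= 1
    return j


def word_at_index(text: str, index: int) -> str:
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    end = next_break(text, index)   # next(): some boundary after the index
    start = prev_break(text, end)   # prev(): start of the word ending there
    end = next_break(text, start)   # next(): end, re-derived from that start
    return text[start:end]


word = word_at_index("hello, world", 8)  # index 8 is the "o" in "world"
```

With this purely local toy rule the third step never changes the result. It is kept only to mirror the sequence in the commit message, where a boundary found from an arbitrary starting index is not trusted until it has been re-derived from a known word start.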
2025-05-15 | Rewrite, passes WordBreakTest | Sam Atman | 1 | -74/+37
After fixing a bug in Runicode which was fenceposting codepoints off the end of ranges. As one does.
2025-05-15 | Begin conformance test | Sam Atman | 1 | -26/+57
I'm not sure the details of this strategy can actually be made to work. But, something can.
2025-05-15 | Implement Word iterator | Sam Atman | 1 | -0/+228
A by-the-book implementation of the word break rules from tr29. This is superficially inefficient, but compilers are more than able to handle the common subexpression folding ignored by this approach. Now to port the WordBreakPropertyTests, and clean up the inevitable bugs in the implementation.
2025-05-15 | Add WordBreakPropertyData | Sam Atman | 1 | -0/+102
Passes some simple lookup tests.