path: root/src
Age | Commit message | Author | Files | Lines
12 days | Prerelease touchup [v0.16.0-rc1] [no-allocation] | Sam Atman | 1 | -3/+3
2026-02-06 | zg module, casing improvements | Sam Atman | 5 | -14/+122
2026-02-06 | Slightly better hash reduction for comptime_map | Sam Atman | 1 | -3/+21
This reduces collisions by 11% while adding no branching, so I'm calling it a win.
2026-02-05 | De-allocate Emoji module | Sam Atman | 1 | -86/+44
2026-02-05 | Base units do not allocate | Sam Atman | 4 | -166/+85
CanonData included. I may still sort out caseless matching without allocation, but that's a stretch goal. Closes #86 Closes #85
2026-02-04 | Teasing out canonicalization | Sam Atman | 2 | -45/+54
After coping with a spuriously broken autohash for a while, I got the one remaining hash table moved into memory, so there's no further reason to put up with allocation of basic structures. So that's nice.
2026-02-04 | Rest of the 'easy' stuff | Sam Atman | 4 | -396/+162
This gets us up to feature parity with Jacob's work. I want to eliminate that last allocation using the comptime hash map, and then see about eliminating allocations from case comparisons as well. That should just about do it.
2026-02-04 | Normalization and case folding | Sam Atman | 7 | -383/+371
Both of which deserve some further attention.
2026-02-04 | Convert Words module to no-allocation | Sam Atman | 2 | -144/+85
2026-02-04 | Port DisplayWidth | Sam Atman | 1 | -184/+105
2026-02-04 | Convert Graphemes to static allocation | Sam Atman | 2 | -99/+68
And DisplayWidth, although untested at present. The plan is to just work through the codegen / module pairings, and move tests over until everything is covered.
2025-12-23 | Use width 2 when skin tone modifier detected | Sam Atman | 1 | -0/+5
Fix: #82
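A minimal sketch of the rule this commit names, written in Python rather than the project's Zig and not taken from the actual diff. The function name `cluster_width` and the `base_width` parameter are hypothetical; `base_width` stands in for whatever the width tables would otherwise report for the cluster. The only hard fact used is that the emoji skin tone modifiers occupy U+1F3FB through U+1F3FF.

```python
# Emoji skin tone (Fitzpatrick) modifiers: U+1F3FB..U+1F3FF inclusive.
SKIN_TONE_MODIFIERS = range(0x1F3FB, 0x1F400)


def cluster_width(cluster: str, base_width: int) -> int:
    """Display width of one grapheme cluster (illustrative only).

    base_width is an assumed input: the width the tables would report
    without looking at modifiers. If any code point in the cluster is a
    skin tone modifier, the cluster renders as an emoji, so report 2.
    """
    if any(ord(ch) in SKIN_TONE_MODIFIERS for ch in cluster):
        return 2
    return base_width
```

For example, U+261D (index pointing up) is narrow on its own, but followed by U+1F3FD it would be drawn as a two-column emoji.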
2025-12-23 | Fix #74: Check for characters before popping in wrap [v0.15.3] | Sam Atman | 1 | -2/+16
2025-11-08 | Use takeDelimiterInclusive to support Zig 0.15.2 | Jay | 1 | -1/+2
2025-09-14 | Embed data files in scripts rather than relying on filesystem access for easier packaging | Michael Chaten | 1 | -17/+6
2025-09-14 | Update codebase to Zig 0.15.1. | Michael Chaten | 15 | -104/+73
Removes compression support
2025-07-20 | Moved part of the `strWidth` into its own `graphemeClusterWidth` function | Lich | 1 | -23/+27
2025-07-08 | Add Words.zig example to README | Sam Atman | 2 | -0/+20
2025-06-24 | fix infinity | Jacob Sandlund | 1 | -1/+1
2025-06-24 | Add Emoji module and codegen/emoji | Jacob Sandlund | 1 | -0/+132
2025-06-01 | Add graphemeAtIndex + iterate before and after | Sam Atman | 4 | -87/+266
That completes the set. I do think it's possible to bum a few more cycles from the implementation, but, I'm not going to. It passes the acceptance suite and that's what it needs to do.
2025-05-23 | Make offset size configurable | Sam Atman | 4 | -26/+34
Hopefully I can talk users out of taking advantage of this configuration but I'll have better luck with that if it's available.
2025-05-23 | Add iterateBefore and iterateAfter | Sam Atman | 2 | -32/+104
These create reverse or forward iterators before or after a Word. So this way, the user can get the word at an index, then iterate forward or back from that word. Also: Fixes #59 Which was fixed awhile back, but I don't feel like doing repo surgery to tag the fix where it happened. We have blame for that kind of thing.
2025-05-16 | Words module | Sam Atman | 2 | -24/+24
In keeping with the new nomenclature, we're calling the module "Words", not "WordBreak". The latter is Unicode jargon; the module provides word iterators. Words are the figure, word breaks are the ground.
2025-05-16 | Move WordBreak to Words | Sam Atman | 1 | -0/+0
2025-05-16 | Proofread | Sam Atman | 1 | -5/+6
2025-05-15 | Merge Grapheme Segmentation Iterator Tests | Sam Atman | 1 | -79/+34
2025-05-15 | wordAtIndex passes conformance | Sam Atman | 3 | -103/+135
I removed the initAtIndex functions from the public vocabulary, because the last couple of days of sweat and blood prove that they're hard to use correctly. That's probably it for WordBreak; now to fix the overlong bug on v0.14 and get this integrated with the new reverse grapheme iterator.
2025-05-15 | Rewrite wordAtIndex to use iterator flipping | Sam Atman | 1 | -24/+83
This also adds helper functions for initializing iterators at an index within the string. Not that user code should do that necessarily, but `wordAtIndex` definitely should, and there's no reason not to make it available to others. With an appropriate warning at least.
2025-05-15 | Add format for CodePoint | Sam Atman | 1 | -2/+10
2025-05-15 | Add reversal functions for word iterators | Sam Atman | 1 | -2/+81
While of only occasional use in real programs, one thing these are good for is reliably retrieving the word at a given index. Which turns out to be.. tricky is the best word.
2025-05-15 | Peek tests for word iterators | Sam Atman | 1 | -0/+19
2025-05-15 | ReverseWordIterator passes conformance test | Sam Atman | 1 | -19/+64
Ended up needing a clone of the codepoint iterator, so that WB4 can ignore points in a manner compatible with backward search. So I created a special SneakIterator which can return WBPs directly, so as to skip ignorables. This is also needed for flag emoji, since the odd-number case must be handled immediately. So we count back in a WB4-compatible way, then store the count on the word iterator, and voila.
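The flag-emoji counting described above can be sketched roughly as follows. This is a Python illustration and not the SneakIterator itself: the function names are hypothetical, and `is_ignorable` is a deliberately tiny stand-in (ZWJ and the variation selectors only) for the full Extend, Format and ZWJ classes that WB4 skips. The idea is that a boundary between two regional indicators is only a word break when an even number of regional indicators precedes it, and that the backward count has to step over ignorables.

```python
def is_regional_indicator(cp: int) -> bool:
    # Regional indicator symbols: U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= cp <= 0x1F1FF


def is_ignorable(cp: int) -> bool:
    # Assumption: a tiny stand-in for WB4's Extend | Format | ZWJ set.
    return cp == 0x200D or 0xFE00 <= cp <= 0xFE0F


def ri_run_before(cps: list[int], i: int) -> int:
    """Count regional indicators immediately before index i, skipping ignorables."""
    count = 0
    j = i - 1
    while j >= 0:
        cp = cps[j]
        if is_ignorable(cp):
            j -= 1
            continue
        if not is_regional_indicator(cp):
            break
        count += 1
        j -= 1
    return count


def break_allowed_between_ris(cps: list[int], i: int) -> bool:
    """True if a break may fall before cps[i], given RIs on both sides.

    An odd count means cps[i] completes a flag pair, so no break there.
    """
    return ri_run_before(cps, i) % 2 == 0
```

With two flags in a row (four regional indicators), only the middle boundary permits a break. Storing the count once, as the commit message describes, avoids recounting the run at every step of a backward iteration.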
2025-05-15 | Hooked up break test, some bugs squashed | Sam Atman | 3 | -34/+64
The handling of ignorables is really different, because they 'adhere' to the future of the iteration, not the past.
2025-05-15 | Reverse Word Iterator | Sam Atman | 2 | -1/+157
Next up I hook it to the tests.
2025-05-15 | Add wordAtCursor | Sam Atman | 1 | -48/+100
This is not actually the way to do it, and can break on some crafted strings. The way to actually do it: implement a reverse word search iterator, then do next() to find a word break, prev() to find a _valid_ word start, then next() again to find the valid end of said word. Maybe 2+, 2-, 1+ actually. I can probably write a test to see if the cursor spot is ambiguous, and apply an extra round if so. Need to mull the rules over before making any rash moves.
2025-05-15 | Rewrite, passes WordBreakTest | Sam Atman | 3 | -78/+40
After fixing a bug in Runicode which was fenceposting codepoints off the end of ranges. As one does.
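The commit does not say exactly where the off-by-one sat, so the following is only a generic illustration of the fencepost class of bug in a range-table lookup, in Python. The table contents and the name `in_ranges` are hypothetical; the point is that when ranges are stored with inclusive ends, the final comparison must be `<=`, otherwise the last code point of every range falls off the end.

```python
from bisect import bisect_right

# Illustrative table of inclusive (first, last) code point ranges, sorted by start.
RANGES = [(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A)]
STARTS = [first for first, _ in RANGES]


def in_ranges(cp: int) -> bool:
    """Return True if cp lies inside any inclusive range in RANGES."""
    # Find the last range whose start is <= cp.
    idx = bisect_right(STARTS, cp) - 1
    if idx < 0:
        return False
    _, last = RANGES[idx]
    # Inclusive end: writing `cp < last` here would drop '9', 'Z' and 'z'.
    return cp <= last
```

Tests that probe the first and last member of each range, plus the code points just outside, are the cheapest way to catch this kind of mistake.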
2025-05-15 | Begin conformance test | Sam Atman | 5 | -58/+361
I'm not sure the details of this strategy can actually be made to work. But, something can.
2025-05-15 | Implement Word iterator | Sam Atman | 1 | -0/+228
A by-the-book implementation of the word break rules from tr29. This is superficially inefficient, but compilers are more than able to handle the common subexpression folding ignored by this approach. Now to port the WordBreakPropertyTests, and clean up the inevitable bugs in the implementation.
2025-05-15 | Vastly simplify peek() | Sam Atman | 1 | -60/+3
Idiomatic Zig takes a while, what can I say (yes I wrote the first one).
2025-05-15 | Refactor in unicode_tests | Sam Atman | 2 | -32/+53
The comments in WordBreak and SentenceBreak tests get really long; the provided buffer would be inadequate. So this just provides a sub-iterator which will strip comments and comment lines, while keeping an eye on line numbers for any debugging.
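A comment-stripping sub-iterator of this kind might look roughly like the sketch below. It is Python, not the project's Zig, and the name `data_lines` is hypothetical. It assumes the usual Unicode test-file convention that `#` starts a comment running to the end of the line, and it yields the original 1-based line number alongside each surviving line so that a failure can be traced back to the file.

```python
from typing import Iterable, Iterator, Tuple


def data_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, content) for lines that carry data.

    Trailing '#' comments are removed, and blank or comment-only lines
    are skipped, but numbering still follows the original file.
    """
    for line_no, raw in enumerate(lines, start=1):
        content = raw.split("#", 1)[0].strip()
        if content:
            yield line_no, content
```

Because only the part before `#` is kept, the consumer never has to buffer the long explanatory comments at all, which is the problem the commit message describes.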
2025-05-15 | Add WordBreakPropertyData | Sam Atman | 1 | -0/+102
Passes some simple lookup tests.
2025-05-15 | Various small iterator improvements | Sam Atman | 1 | -4/+51
2025-05-15 | Add reverse CodePoint iterator | Sam Atman | 1 | -1/+67
2025-05-15 | Maximal Subparts tests | Sam Atman | 1 | -37/+114
The decoder now properly returns substitution bytes according to Substitution of Maximal Subparts, with tests to prove it.
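To make the substitution policy concrete, here is a small Python sketch of "U+FFFD substitution of maximal subparts". It is a plain branching decoder, not the DFA-based method the project adopted, and `decode_lossy` is a hypothetical name. The rule it follows: when a sequence goes wrong, the longest prefix that could still have begun a valid sequence is replaced by a single U+FFFD, and decoding resumes at the first byte that did not fit.

```python
def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, replacing each maximal ill-formed subpart with U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # ASCII
            out.append(chr(b0))
            i += 1
            continue
        # Pick the number of continuation bytes and the allowed range of the
        # FIRST continuation byte (this excludes overlongs, surrogates, > U+10FFFF).
        if 0xC2 <= b0 <= 0xDF:
            need, lo, hi, cp = 1, 0x80, 0xBF, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            need, cp = 2, b0 & 0x0F
            lo = 0xA0 if b0 == 0xE0 else 0x80
            hi = 0x9F if b0 == 0xED else 0xBF
        elif 0xF0 <= b0 <= 0xF4:
            need, cp = 3, b0 & 0x07
            lo = 0x90 if b0 == 0xF0 else 0x80
            hi = 0x8F if b0 == 0xF4 else 0xBF
        else:  # stray continuation byte, or a lead byte that is never valid
            out.append("\ufffd")
            i += 1
            continue
        j, ok = i + 1, True
        for _ in range(need):
            if j >= n or not (lo <= data[j] <= hi):
                ok = False
                break
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
            lo, hi = 0x80, 0xBF  # later continuation bytes use the full range
        out.append(chr(cp) if ok else "\ufffd")
        i = j  # on failure, j is the first byte that did not fit; retry from there
    return "".join(out)
```

Note that an overlong lead such as `C0` falls in the final `else` branch, so each of its bytes is replaced separately, while a truncated but otherwise plausible prefix such as `F1 80 80` collapses to one replacement character.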
2025-05-15 | Replace CodePoint Decoding with Höhrmann Method | Sam Atman | 1 | -59/+204
This still needs a small barrage of tests to confirm that it correctly performs substitution of maximal subparts (Unicode 16.0.0 §3.9.6). I'm pretty sure this edition is 'overly maximal' actually, the name of the algorithm is somewhat misleading as to what it actually does.
2025-05-15 | feat: add reverse grapheme iterator | Matteo Romano | 2 | -0/+294
Closes #53
2025-05-14 | Add overlong test, which should fail | Sam Atman | 1 | -2/+15
But does not.
2025-05-13 | Various small iterator improvements [work-branch] | Sam Atman | 1 | -9/+46
2025-05-12 | fix: State.unset* did toggle the bit instead of unsetting it | Matteo Romano | 1 | -3/+3
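The difference between toggling and unsetting a flag bit, shown as a short Python sketch. The flag names and both helper functions are hypothetical and are not the project's `State` type; the sketch only contrasts XOR, which flips the bit, with AND-NOT, which clears it regardless of its current value.

```python
# Hypothetical flag bits, for illustration only.
FLAG_A = 0b01
FLAG_B = 0b10


def unset_buggy(state: int, flag: int) -> int:
    # XOR toggles: if the flag was already clear, this SETS it.
    return state ^ flag


def unset(state: int, flag: int) -> int:
    # AND with the complement clears the bit whatever its previous value.
    return state & ~flag
```

The two agree when the bit is currently set, which is why a toggle-based "unset" can pass casual testing and only misbehave when it is called on a state where the bit is already clear.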