zg - Mirror of https://codeberg.org/atman/zg/

	Commit message	Author	Age	Files	Lines
*	Merge branch 'master' of https://codeberg.org/atman/zg into graphemeClusterWidth	Lich	2026-01-13	15	-114/+92
\|\
\| *	Use width 2 when skin tone modifier detected	Sam Atman	2025-12-23	1	-0/+5
\| \| \| \| \| \| \| \|	Fix: #82
\| *	Fix #74: Check for characters before popping in wrap (tag: v0.15.3)	Sam Atman	2025-12-23	1	-2/+16
\| \|
\| *	Use takeDelimiterInclusive to support Zig 0.15.2	Jay	2025-11-08	1	-1/+2
\| \|
\| *	Embed data files in scripts rather than relying on filesystem access for easier packaging	Michael Chaten	2025-09-14	1	-17/+6
\| \|
\| *	Update codebase to Zig 0.15.1.	Michael Chaten	2025-09-14	15	-104/+73
\| \| \| \| \| \| \| \|	Removes compression support
* \|	Moved part of the `strWidth` into its own `graphemeClusterWidth` function	Lich	2025-07-20	1	-23/+27
\|/
*	Add Words.zig example to README	Sam Atman	2025-07-08	2	-0/+20
\|
*	Add graphemeAtIndex + iterate before and after	Sam Atman	2025-06-01	4	-87/+266
\| \| \| \| \| \|	That completes the set. I do think it's possible to bum a few more cycles from the implementation, but, I'm not going to. It passes the acceptance suite and that's what it needs to do.
*	Make offset size configurable	Sam Atman	2025-05-23	4	-26/+34
\| \| \| \| \|	Hopefully I can talk users out of taking advantage of this configuration but I'll have better luck with that if it's available.
*	Add iterateBefore and iterateAfter	Sam Atman	2025-05-23	2	-32/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	These create reverse or forward iterators before or after a Word. So this way, the user can get the word at an index, then iterate forward or back from that word. Also: Fixes #59, which was fixed a while back, but I don't feel like doing repo surgery to tag the fix where it happened. We have blame for that kind of thing.
*	Words module	Sam Atman	2025-05-16	2	-24/+24
\| \| \| \| \| \|	In keeping with the new nomenclature, we're calling the module "Words", not "WordBreak". The latter is Unicode jargon; the module provides word iterators. Words are the figure, word breaks are the ground.
*	Move WordBreak to Words	Sam Atman	2025-05-16	1	-0/+0
\|
*	Proofread	Sam Atman	2025-05-16	1	-5/+6
\|
*	Merge Grapheme Segmentation Iterator Tests	Sam Atman	2025-05-15	1	-79/+34
\|
*	Merge commit 'b5d955f' into develop-next	Sam Atman	2025-05-15	2	-3/+297
\|\
\| *	Merge branch 'work-branch' into HEAD	Sam Atman	2025-05-15	1	-9/+46
\| \|\
\| \| *	Various small iterator improvements (work-branch)	Sam Atman	2025-05-13	1	-9/+46
\| \| \|
\| * \|	feat: add reverse grapheme iterator	Matteo Romano	2025-05-15	2	-0/+294
\| \| \| \| \| \| \| \| \| \| \| \|	Closes #53
\| * \|	fix: State.unset* did toggle the bit instead of unsetting it	Matteo Romano	2025-05-12	1	-3/+3
\| \|/
\| *	Add reverse CodePoint iterator	Sam Atman	2025-05-09	1	-6/+75
\| \|
* \|	wordAtIndex passes conformance	Sam Atman	2025-05-15	3	-103/+135
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I removed the initAtIndex functions from the public vocabulary, because the last couple of days of sweat and blood prove that they're hard to use correctly. That's probably it for WordBreak, now to fix the overlong bug on v0.14 and get this integrated with the new reverse grapheme iterator.
* \|	Rewrite wordAtIndex to use iterator flipping	Sam Atman	2025-05-15	1	-24/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This also adds helper functions for initializing iterators at an index within the string. Not that user code should do that necessarily, but `wordAtIndex` definitely should, and there's no reason not to make it available to others. With an appropriate warning at least.
* \|	Add format for CodePoint	Sam Atman	2025-05-15	1	-2/+10
\| \|
* \|	Add reversal functions for word iterators	Sam Atman	2025-05-15	1	-2/+81
\| \| \| \| \| \| \| \| \| \| \| \|	While of only occasional use in real programs, one thing these are good for is reliably retrieving the word at a given index. Which turns out to be... tricky is the best word.
* \|	Peek tests for word iterators	Sam Atman	2025-05-15	1	-0/+19
\| \|
* \|	ReverseWordIterator passes conformance test	Sam Atman	2025-05-15	1	-19/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Ended up needing a clone of the codepoint iterator, so that WB4 can ignore points in a manner compatible with backward search. So I created a special SneakIterator which can return WBPs directly, so as to skip ignorables. This is also needed for flag emoji, since the odd-number case must be handled immediately. So we count back in a WB4 compatible way, then store the count on the word iterator, and voila.
* \|	Hooked up break test, some bugs squashed	Sam Atman	2025-05-15	3	-34/+64
\| \| \| \| \| \| \| \| \| \|	The handling of ignorables is really different, because they 'adhere' to the future of the iteration, not the past.
* \|	Reverse Word Iterator	Sam Atman	2025-05-15	2	-1/+157
\| \| \| \| \| \| \| \|	Next up I hook it to the tests.
* \|	Add wordAtCursor	Sam Atman	2025-05-15	1	-48/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is not actually the way to do it, and can break on some crafted strings. The way to actually do it: implement a reverse word search iterator, then do next() to find a word break, prev() to find a _valid_ word start, then next() again to find the valid end of said word. Maybe 2+, 2-, 1+ actually. I can probably write a test to see if the cursor spot is ambiguous, and apply an extra round if so. Need to mull the rules over before making any rash moves.
* \|	Rewrite, passes WordBreakTest	Sam Atman	2025-05-15	3	-78/+40
\| \| \| \| \| \| \| \| \| \|	After fixing a bug in Runicode which was fenceposting codepoints off the end of ranges. As one does.
* \|	Begin conformance test	Sam Atman	2025-05-15	5	-58/+361
\| \| \| \| \| \| \| \| \| \|	I'm not sure the details of this strategy can actually be made to work. But, something can.
* \|	Implement Word iterator	Sam Atman	2025-05-15	1	-0/+228
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A by-the-book implementation of the word break rules from tr29. This is superficially inefficient, but compilers are more than able to handle the common subexpression folding ignored by this approach. Now to port the WordBreakPropertyTests, and clean up the inevitable bugs in the implementation.
* \|	Vastly simplify peek()	Sam Atman	2025-05-15	1	-60/+3
\| \| \| \| \| \| \| \|	Idiomatic Zig takes a while, what can I say (yes I wrote the first one).
* \|	Refactor in unicode_tests	Sam Atman	2025-05-15	2	-32/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The comments in WordBreak and SentenceBreak tests get really long, so the provided buffer would be inadequate. So this just provides a sub-iterator which will strip comments and comment lines, while keeping an eye on line numbers for any debugging.
* \|	Add WordBreakPropertyData	Sam Atman	2025-05-15	1	-0/+102
\| \| \| \| \| \| \| \|	Passes some simple lookup tests.
* \|	Various small iterator improvements	Sam Atman	2025-05-15	1	-4/+51
\| \|
* \|	Add reverse CodePoint iterator	Sam Atman	2025-05-15	1	-1/+67
\| \|
* \|	Maximal Subparts tests	Sam Atman	2025-05-15	1	-37/+114
\| \| \| \| \| \| \| \| \| \|	The decoder now properly returns substitution bytes according to Substitution of Maximal Subparts, with tests to prove it.
* \|	Replace CodePoint Decoding with Höhrmann Method	Sam Atman	2025-05-15	1	-59/+204
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This still needs a small barrage of tests to confirm that it correctly performs substitution of maximal subparts (Unicode 16.0.0 §3.9.6). I'm pretty sure this edition is 'overly maximal' actually, the name of the algorithm is somewhat misleading as to what it actually does.
* \|	Add overlong test, which should fail	Sam Atman	2025-05-14	1	-2/+15
\|/ \| \| \|	But does not.
*	Make DisplayWidth.setup public (tag: v0.14.0-rc2)	Sam Atman	2025-05-04	1	-1/+7
\| \| \| \|	Also adds setupWithGraphemes variant.
*	Remove inner setup from GeneralCategories	Sam Atman	2025-05-01	1	-10/+1
\| \| \| \| \|	It was one `try` block away from only returning Allocator.Error, so now there's no need to filter errors in an outer `catch`.
*	Update Unicode version in README.md	Sam Atman	2025-04-30	1	-0/+1
\| \| \| \| \| \| \|	Lets me slip these in: Closes #12, Closes #14
*	Unicode 16.0	Sam Atman	2025-04-30	1	-1/+7
\| \| \| \| \|	Went smoothly, needed to add some scripts and adjust the magic numbers, but other than that, all set.
*	Allocation Failure Tests	Sam Atman	2025-04-30	11	-91/+178
\| \| \| \| \| \| \| \| \| \|	These turned up an excessive number of allocations in CanonData and CompatData, which have been reduced to two through the somewhat squirrely use of 'magic numbers'. There are now allocation tests for every allocated structure in the library, and they run to completion in a reasonable amount of time. So, that's nice.
*	Setup variants for all allocating modules	Sam Atman	2025-04-30	7	-146/+228
\| \| \| \| \| \| \| \|	This harmonizes the allocating modules in a couple of ways. All can now be constructed by pointer, and all treat various miscellaneous read failures as `unreachable`, which indeed they should be. The README has been updated to inform users of this option.
*	Update README.md to new API	Sam Atman	2025-04-30	1	-10/+10
\|
*	Rest of the Renamings	Sam Atman	2025-04-30	5	-0/+0
\| \| \| \|	These get different names, but don't otherwise change.
*	Remove FoldData, make CaseFolding	Sam Atman	2025-04-30	4	-167/+218
\| \| \| \| \|	CaseFolding now has the FoldData, and can be initialized with a copy of Normalize if wanted.