zg - Mirror of https://codeberg.org/atman/zg/

	Commit message	Author	Age	Files	Lines
*	Add reversal functions for word iterators	Sam Atman	2025-05-15	1	-2/+81
	While of only occasional use in real programs, one thing these are good for is reliably retrieving the word at a given index. Which turns out to be... tricky is the best word.
*	ReverseWordIterator passes conformance test	Sam Atman	2025-05-15	1	-19/+64
	Ended up needing a clone of the codepoint iterator, so that WB4 can ignore points in a manner compatible with backward search. So I created a special SneakIterator which can return WBPs directly, so as to skip ignorables. This is also needed for flag emoji, since the odd-number case must be handled immediately. So we count back in a WB4-compatible way, then store the count on the word iterator, and voilà.
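	The flag emoji problem mentioned above can be illustrated with a minimal sketch. It is written in Python rather than Zig, it is not the code from this commit, and the names `is_regional_indicator` and `break_between_flags` are hypothetical. It also assumes no WB4 ignorables (Extend, Format, ZWJ) sit between the regional indicators, which the real iterator has to skip. The point is only that a backward search cannot decide locally whether two regional indicators belong together: it has to count back to the start of the run and check the parity.

```python
def is_regional_indicator(ch: str) -> bool:
    """True for U+1F1E6..U+1F1FF, the Regional Indicator symbols."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def break_between_flags(text: str, i: int) -> bool:
    """Decide whether a word break is allowed between text[i-1] and text[i],
    assuming both are regional indicators.

    WB15/WB16 pair regional indicators from the start of the run, so the
    answer depends on how many of them precede position i.
    Simplification: ignorable code points (WB4) are not handled here.
    """
    count = 0
    j = i - 1
    # Count back over the contiguous run of regional indicators.
    while j >= 0 and is_regional_indicator(text[j]):
        count += 1
        j -= 1
    # An odd count means text[i-1] is the first half of a flag: no break.
    return count % 2 == 0


# Two flags: DE followed by FR, four code points in total.
flags = "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7"
inside_first = break_between_flags(flags, 1)   # inside the first flag
between = break_between_flags(flags, 2)        # between the two flags
inside_second = break_between_flags(flags, 3)  # inside the second flag
```

	Storing the count, as the commit message describes, would avoid repeating the backward scan for every regional indicator in a long run.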
*	Hooked up break test, some bugs squashed	Sam Atman	2025-05-15	1	-9/+30
	The handling of ignorables is really different, because they 'adhere' to the future of the iteration, not the past.
*	Reverse Word Iterator	Sam Atman	2025-05-15	1	-0/+156
	Next up I hook it to the tests.
*	Add wordAtCursor	Sam Atman	2025-05-15	1	-48/+100
	This is not actually the way to do it, and can break on some crafted strings. The way to actually do it: implement a reverse word search iterator, then do next() to find a word break, prev() to find a _valid_ word start, then next() again to find the valid end of said word. Maybe 2+, 2-, 1+ actually. I can probably write a test to see if the cursor spot is ambiguous, and apply an extra round if so. Need to mull the rules over before making any rash moves.
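	The next(), prev(), next() strategy described above can be sketched as follows. This is a toy model in Python, not the zg implementation: the boundary rule is a stand-in (a break wherever alphanumeric and non-alphanumeric characters meet) rather than the UAX #29 rules, and `is_break`, `next_break`, `prev_break` and `word_at_cursor` are hypothetical names. With such a simple rule every boundary can be decided locally, so the ambiguity the commit message worries about never arises; the sketch only shows the shape of the three-step search.

```python
def is_break(text: str, i: int) -> bool:
    """Toy boundary rule: break at both ends of the text and wherever
    alphanumeric and non-alphanumeric characters meet."""
    if i == 0 or i == len(text):
        return True
    return text[i - 1].isalnum() != text[i].isalnum()


def next_break(text: str, i: int) -> int:
    """Smallest break position strictly greater than i."""
    for j in range(i + 1, len(text) + 1):
        if is_break(text, j):
            return j
    return len(text)


def prev_break(text: str, i: int) -> int:
    """Largest break position strictly less than i."""
    for j in range(i - 1, -1, -1):
        if is_break(text, j):
            return j
    return 0


def word_at_cursor(text: str, cursor: int) -> str:
    """Return the segment containing the character at index `cursor`."""
    if not text:
        return ""
    cursor = min(max(cursor, 0), len(text) - 1)
    end = next_break(text, cursor)   # step 1: search forward to a break
    start = prev_break(text, end)    # step 2: search backward to the start
    end = next_break(text, start)    # step 3: search forward again to the end
    return text[start:end]


sample = "hello, world"
first = word_at_cursor(sample, 1)
second = word_at_cursor(sample, 8)
gap = word_at_cursor(sample, 5)
```

	In the toy model the third step always reproduces the break found in the first. Under the real rules the first forward search starts from an arbitrary offset and may lack the context it needs, which is presumably why the message proposes re-deriving the end from a known-valid start.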
*	Rewrite, passes WordBreakTest	Sam Atman	2025-05-15	1	-74/+37
	After fixing a bug in Runicode which was fenceposting codepoints off the end of ranges. As one does.
*	Begin conformance test	Sam Atman	2025-05-15	1	-26/+57
	I'm not sure the details of this strategy can actually be made to work. But, something can.
*	Implement Word iterator	Sam Atman	2025-05-15	1	-0/+228
	A by-the-book implementation of the word break rules from TR29. This is superficially inefficient, but compilers are more than able to handle the common subexpression folding ignored by this approach. Now to port the WordBreakPropertyTests, and clean up the inevitable bugs in the implementation.
*	Add WordBreakPropertyData	Sam Atman	2025-05-15	1	-0/+102
	Passes some simple lookup tests.