| Commit message (Collapse) | Author | Age | Files | Lines |
| |\ |
|
| | | |
|
| | |
| |
| |
| |
| |
| | |
That completes the set. I do think it's possible to bum a few more
cycles from the implementation, but I'm not going to. It passes
the acceptance suite and that's what it needs to do.
|
| | | |
|
| | |
| |
| |
| |
| | |
Hopefully I can talk users out of taking advantage of this configuration,
but I'll have better luck with that if it's available.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
These create reverse or forward iterators before or after a Word. So
this way, the user can get the word at an index, then iterate forward
or back from that word.
Also:
Fixes #59
Which was fixed a while back, but I don't feel like doing repo surgery
to tag the fix where it happened. We have blame for that kind of
thing.
|
| | |
| |
| |
| |
| | |
`ziglyph` is no longer maintained and basically abandoned, so there's no
need to keep the comparison between them active going forward.
|
| | |
| |
| |
| | |
Rebasing my way through that again was just not in the cards.
|
| | |
| |
| |
| |
| |
| | |
In keeping with the new nomenclature, we're calling the module "Words",
not "WordBreak". The latter is Unicode jargon; the module provides word
iterators. Words are the figure, word breaks are the ground.
|
| | | |
|
| | | |
|
| | | |
|
| | |\ |
|
| | | |\ |
|
| | | | | |
|
| | | | |
| | | |
| | | |
| | | | |
Closes #53
|
| | | |/ |
|
| | | | |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
I removed the initAtIndex functions from the public vocabulary, because
the last couple of days of sweat and blood prove that they're hard to use
correctly.
That's probably it for WordBreak, now to fix the overlong bug on v0.14
and get this integrated with the new reverse grapheme iterator.
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | | |
This also adds helper functions for initializing iterators at an index
within the string. Not that user code should do that necessarily, but
`wordAtIndex` definitely should, and there's no reason not to make it
available to others. With an appropriate warning at least.
|
| | | | |
|
| | | |
| | |
| | |
| | |
| | |
| | | |
While of only occasional use in real programs, one thing these are good
for is reliably retrieving the word at a given index. Which turns out
to be... tricky is the best word.
|
| | | | |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ended up needing a clone of the codepoint iterator, so that WB4 can
ignore points in a manner compatible with backward search.
So I created a special SneakIterator which can return WBPs directly,
so as to skip ignorables. This is also needed for flag emoji, since
the odd-number case must be handled immediately. So we count back in
a WB4-compatible way, then store the count on the word iterator, and
voila.
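
The odd/even flag case can be illustrated with a small sketch. This is not the zg implementation: it is written in Python rather than Zig, the function names are hypothetical, and it assumes no WB4 ignorables (Extend, Format, ZWJ) sit between the regional indicators, which the real iterator has to skip. Under rules WB15 and WB16, a break between two regional indicators is allowed only when an even number of them precedes the boundary.

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional Indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def flag_break_allowed(text: str, index: int) -> bool:
    """Decide whether a word break may fall just before text[index].

    Assumes text[index - 1] and text[index] are both regional indicators
    and that no ignorable code points are interleaved (a simplification).
    """
    count = 0
    i = index - 1
    # Count back over the run of regional indicators before the boundary.
    while i >= 0 and is_regional_indicator(text[i]):
        count += 1
        i -= 1
    # An odd count means the boundary would split a flag pair.
    return count % 2 == 0


# Two flags back to back: regional indicators U S, then F R.
flags = "\U0001F1FA\U0001F1F8\U0001F1EB\U0001F1F7"
print([flag_break_allowed(flags, i) for i in (1, 2, 3)])
```

Only the boundary at index 2, between the two complete pairs, permits a break.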
|
| | | |
| | |
| | |
| | |
| | | |
The handling of ignorables is really different, because they 'adhere'
to the future of the iteration, not the past.
|
| | | |
| | |
| | |
| | | |
Next up I hook it to the tests.
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This is not actually the way to do it, and can break on some crafted
strings. The way to actually do it: implement a reverse word search
iterator, then do next() to find a word break, prev() to find a
_valid_ word start, then next() again to find the valid end of said
word. Maybe 2+, 2-, 1+ actually.
I can probably write a test to see if the cursor spot is ambiguous,
and apply an extra round if so. Need to mull the rules over before
making any rash moves.
|
| | | |
| | |
| | |
| | |
| | | |
After fixing a bug in Runicode which was fenceposting codepoints off the
end of ranges. As one does.
|
| | | |
| | |
| | |
| | |
| | | |
I'm not sure the details of this strategy can actually be made to work.
But, something can.
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A by-the-book implementation of the word break rules from tr29. This is
superficially inefficient, but compilers are more than able to handle
the common subexpression folding ignored by this approach.
Now to port the WordBreakPropertyTests, and clean up the inevitable bugs
in the implementation.
|
| | | |
| | |
| | |
| | | |
Idiomatic Zig takes a while, what can I say (yes I wrote the first one).
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | | |
The comments in WordBreak and SentenceBreak tests get really long, and
the provided buffer would be inadequate. So this just provides a
sub-iterator which will strip comments and comment lines, while keeping
an eye on line numbers for any debugging.
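
A minimal sketch of that idea, in Python rather than Zig and with a hypothetical function name: it assumes `#` starts a comment, as in the Unicode break test files, and yields each surviving line together with its original line number so failures can be traced back to the source file.

```python
from typing import Iterable, Iterator, Tuple


def strip_comments(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, content) for lines that carry data.

    Trailing comments are cut off, and lines that are blank or
    comment-only are skipped. Line numbers are 1-based and refer to
    the original input, not to the filtered output.
    """
    for line_number, raw in enumerate(lines, start=1):
        content = raw.split("#", 1)[0].strip()
        if content:
            yield line_number, content


sample = [
    "# WordBreakTest-style header comment",
    "\u00f7 0001 \u00f7 0001 \u00f7\t# a long explanatory comment",
    "",
    "\u00f7 0001 \u00d7 0308 \u00f7 0001 \u00f7\t# another long comment",
]
for number, content in strip_comments(sample):
    print(number, content)
```

Because the comment text never reaches the caller, the caller's buffer only needs to hold the data portion of each line.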
|
| | | |
| | |
| | |
| | | |
Passes some simple lookup tests.
|
| | | | |
|
| | | | |
|
| |\ \ \
| |/ /
|/| |
| | |
| | |
| | |
| | | |
into v0.14-beta
Reviewed-on: https://codeberg.org/atman/zg/pulls/56
Reviewed-by: atman <atman@noreply.codeberg.org>
|
| |/ / |
|
| | |
| |
| |
| | |
Fixes #55
|
| | | |
|
| | |
| |
| |
| |
| | |
The decoder now properly returns substitution bytes according to
Substitution of Maximal Subparts, with tests to prove it.
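
For reference, the expected behaviour can be observed with CPython's built-in UTF-8 decoder, which applies the same recommended practice when asked to replace errors. This is an illustration in Python, not the zg decoder, and `decode_lossy` is a hypothetical helper name. The byte sequence is the worked example from the Unicode Standard's discussion of U+FFFD substitution.

```python
def decode_lossy(data: bytes) -> str:
    # errors="replace" emits one U+FFFD per maximal ill-formed subpart.
    return data.decode("utf-8", errors="replace")


# 61 | F1 80 80 | E1 80 | C2 | 62 | 80 | 63 | 80 | BF | 64
sample = bytes.fromhex("61 F1 80 80 E1 80 C2 62 80 63 80 BF 64")
decoded = decode_lossy(sample)

# Each truncated multi-byte prefix collapses to a single U+FFFD,
# while each stray continuation byte gets its own.
print([f"U+{ord(ch):04X}" for ch in decoded])
```

The three truncated sequences (`F1 80 80`, `E1 80`, `C2`) each become one replacement character, and the three stray continuation bytes each become one as well, giving six U+FFFD in total among the four ASCII letters.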
|
| | |
| |
| |
| |
| |
| |
| |
| | |
This still needs a small barrage of tests to confirm that it correctly
performs substitution of maximal subparts (Unicode 16.0.0 §3.9.6).
I'm pretty sure this edition is 'overly maximal' actually; the name of
the algorithm is somewhat misleading as to what it actually does.
|
| |/
|
|
| |
But does not.
|
| |
|
|
| |
Also adds setupWithGraphemes variant.
|
| |
|
|
|
| |
It was one `try` block away from only returning Allocator.Error, so now
there's no need to filter errors in an outer `catch`.
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
|
|
| |
Closes #52
|
| |
|
|
|
| |
Also replaces the obsolete HTML/CSS version of the Unicode License
with the plain text version found on unicode.org.
|