| Commit message (Collapse) | Author | Age | Files | Lines |
| | |
|
| |
|
|
|
|
| |
That completes the set. I do think it's possible to bum a few more
cycles from the implementation, but I'm not going to. It passes
the acceptance suite and that's what it needs to do.
|
| |
|
|
|
| |
Hopefully I can talk users out of taking advantage of this configuration,
but I'll have better luck with that if it's available.
|
| |
|
|
|
|
|
|
|
| |
I removed the initAtIndex functions from the public vocabulary, because
the last couple of days of sweat and blood prove that they're hard to use
correctly.
That's probably it for WordBreak. Next: fix the overlong bug on v0.14
and get this integrated with the new reverse grapheme iterator.
|
| | |
|
| |
|
|
|
| |
The handling of ignorables is really different, because they 'adhere'
to the future of the iteration, not the past.
|
| |
|
|
| |
Next up I hook it to the tests.
|
| |
|
|
|
| |
I'm not sure the details of this strategy can actually be made to work.
But, something can.
|
| | |
|
| | |
|
| |
|
|
|
| |
The decoder now properly returns substitution bytes according to
Substitution of Maximal Subparts, with tests to prove it.
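
For readers unfamiliar with the practice, here is a minimal Python sketch of U+FFFD substitution of maximal subparts. It is not this library's decoder: the names `decode_one` and `decode_lossy` are hypothetical, and the second-byte ranges follow the Unicode Standard's table of well-formed UTF-8 byte sequences. On an ill-formed sequence it consumes only the bytes that could still begin a valid sequence, then emits one replacement character.

```python
REPLACEMENT = "\ufffd"

def decode_one(data: bytes, i: int) -> tuple[str, int]:
    """Decode one scalar value at data[i]; return (character, bytes consumed)."""
    b0 = data[i]
    if b0 < 0x80:
        return chr(b0), 1
    # Pick the continuation count and the allowed range for the SECOND byte,
    # which is narrower for some lead bytes (no overlongs, no surrogates,
    # nothing above U+10FFFF).
    if 0xC2 <= b0 <= 0xDF:
        need, cp, lo, hi = 1, b0 & 0x1F, 0x80, 0xBF
    elif 0xE0 <= b0 <= 0xEF:
        need, cp = 2, b0 & 0x0F
        lo = 0xA0 if b0 == 0xE0 else 0x80
        hi = 0x9F if b0 == 0xED else 0xBF
    elif 0xF0 <= b0 <= 0xF4:
        need, cp = 3, b0 & 0x07
        lo = 0x90 if b0 == 0xF0 else 0x80
        hi = 0x8F if b0 == 0xF4 else 0xBF
    else:
        # Stray continuation byte, or a lead byte that is never valid.
        return REPLACEMENT, 1
    used = 1
    for _ in range(need):
        if i + used >= len(data) or not (lo <= data[i + used] <= hi):
            # The bytes consumed so far form the maximal subpart.
            return REPLACEMENT, used
        cp = (cp << 6) | (data[i + used] & 0x3F)
        used += 1
        lo, hi = 0x80, 0xBF  # later continuation bytes use the full range
    return chr(cp), used

def decode_lossy(data: bytes) -> str:
    """Decode a whole byte string, substituting U+FFFD for ill-formed parts."""
    out, i = [], 0
    while i < len(data):
        ch, used = decode_one(data, i)
        out.append(ch)
        i += used
    return "".join(out)
```

With this sketch, `decode_lossy(b"a\xe2\x82")` gives `"a\ufffd"`: the truncated two-byte prefix is one maximal subpart, so it yields a single replacement character rather than two.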
|
| |
|
|
|
|
|
|
| |
This still needs a small barrage of tests to confirm that it correctly
performs substitution of maximal subparts (Unicode 16.0.0 §3.9.6).
I'm pretty sure this edition is actually 'overly maximal'; the name of
the algorithm is somewhat misleading as to what it actually does.
|
| |
|
|
| |
But does not.
|
| |
|
|
|
|
| |
Without changing the algorithm at all, move the responsibility for
decoding a u8 slice out of the iterator and into a reusable function,
so that consumers of the library can use it.
|
| |
|
|
|
|
|
|
|
|
|
| |
If the last codepoint in a byte slice is incomplete (i.e., it has a
length of 3 but only 2 bytes remain), the iterator panics.
Instead of panicking, return a replacement character. This strategy
mirrors the block just above, which returns a replacement character if
the first byte is not valid. The new block likewise consumes only one
byte and lets the iterator continue. This tolerates text with a single
incorrect byte near the end of the slice.
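
The behavior described above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the library's code: `sequence_length` and `iter_codepoints` are hypothetical names, and Python's strict decoder stands in for the real per-sequence validation.

```python
from collections.abc import Iterator

REPLACEMENT = "\ufffd"

def sequence_length(lead: int) -> int:
    """Expected sequence length for a lead byte, or 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0

def iter_codepoints(data: bytes) -> Iterator[str]:
    """Yield one character per step, never raising on malformed input."""
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if n == 0 or i + n > len(data):
            # Invalid first byte, or the slice ends before the sequence does:
            # emit a replacement character and consume a single byte.
            yield REPLACEMENT
            i += 1
            continue
        try:
            ch = data[i:i + n].decode("utf-8")
        except UnicodeDecodeError:
            # Bad continuation byte: same recovery, one byte at a time.
            yield REPLACEMENT
            i += 1
            continue
        yield ch
        i += n
```

Because only one byte is consumed per failure, `list(iter_codepoints(b"a\xe2\x82"))` gives `["a", "\ufffd", "\ufffd"]`: the truncated lead byte and the orphaned continuation byte are each replaced in turn, and iteration reaches the end of the slice without panicking.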
|