The current data for GB18030's double-byte (DBCS) subset, aka "GBK":
- should be updated for the U+E7C7/U+1E3F swap introduced in GB18030-2005 (0xA8BC now maps to U+1E3F, and U+E7C7 moves to the four-byte sequence 0x8135F437);
- may use the 24 Unicode 4.1 mappings instead of the PUA code points for decoder output. This can be implemented as an extra converter filter.
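The "extra converter filter" idea can be sketched as a pure post-pass over decoder output. This is a minimal sketch, assuming a Python-style pipeline where decoding produces a `str`; `make_unicode41_filter` is a hypothetical name, and the 24 PUA-to-Unicode-4.1 pairs are deliberately not reproduced here. They should be taken from the GB18030-2005 / Unicode 4.1 mapping data.

```python
from typing import Callable, Mapping


def make_unicode41_filter(pua_to_unicode: Mapping[int, int]) -> Callable[[str], str]:
    """Build a hypothetical post-decode filter that rewrites PUA code points.

    `pua_to_unicode` is assumed to hold the 24 PUA -> Unicode 4.1 pairs;
    the real values must come from the standard's mapping data.
    """
    for pua in pua_to_unicode:
        # Only BMP Private Use Area code points are expected as keys.
        if not 0xE000 <= pua <= 0xF8FF:
            raise ValueError(f"U+{pua:04X} is not in the BMP Private Use Area")
    # str.translate accepts a {code point: code point} dict directly.
    table = dict(pua_to_unicode)

    def apply(text: str) -> str:
        # Characters without an entry in `table` pass through unchanged.
        return text.translate(table)

    return apply
```

Keeping the filter separate from the base table means the PUA-preserving behavior stays available for callers that need byte-exact round trips.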
The four-byte offset table should be replaced with one that conforms to GB18030-2005, such as WHATWG's index gb18030 ranges. The current table omits many tiny gaps, such as the range break at U+00A5, which leads to off-by-a-few errors. (Alternatively, you can "do it the glibc way" for speed: treat only the larger contiguous blocks as linear and hardcode the smaller fragments. These holes are annoying indeed.)
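A ranges-table lookup in the style of WHATWG's "index gb18030 ranges code point" can be sketched as follows. This is a sketch, not a drop-in implementation: `RANGES` is only an excerpt of the first entries (a real decoder needs the full table), and the function names are hypothetical. The special case for pointer 7457 (0x8135F437) is what makes the linear lookup agree with the GB18030-2005 U+E7C7/U+1E3F swap.

```python
import bisect
from typing import Optional

# Excerpt of a (pointer, first code point) ranges table; only the first
# entries are shown. Each entry starts a new linear run after a gap, e.g.
# U+00A4 is a two-byte character, so a new run starts at U+00A5.
RANGES = [
    (0, 0x0080),
    (36, 0x00A5),
    (38, 0x00A9),
    (45, 0x00B2),
    (50, 0x00B8),
    (81, 0x00D8),
]
EXCERPT_END = 89  # the next run in the full table starts at pointer 89
_POINTERS = [p for p, _ in RANGES]


def four_byte_pointer(b1: int, b2: int, b3: int, b4: int) -> int:
    """Linear index of a four-byte sequence (b1, b3 in 0x81..0xFE; b2, b4 in 0x30..0x39)."""
    return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30)


def ranges_code_point(pointer: int) -> Optional[int]:
    """Map a four-byte pointer to a code point, or None if it is unassigned."""
    if 39419 < pointer < 189000 or pointer > 1237575:
        return None
    if pointer == 7457:  # 0x8135F437 -> U+E7C7 per GB18030-2005
        return 0xE7C7
    if pointer >= 189000:  # supplementary planes form one linear block
        return 0x10000 + pointer - 189000
    if pointer >= EXCERPT_END:
        raise LookupError("pointer not covered by this excerpt of the table")
    # Last range start <= pointer, then offset linearly within that run.
    i = bisect.bisect_right(_POINTERS, pointer) - 1
    offset, cp_offset = RANGES[i]
    return cp_offset + pointer - offset
```

For example, `ranges_code_point(four_byte_pointer(0x81, 0x30, 0x84, 0x36))` resolves pointer 36 to U+00A5; a table that ignored the gap at U+00A4 would be off by one from that point on.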
A "web gb18030" decoder, exposed as GB18030-WEB, that tolerates GBK's single-byte mapping 0x80 → U+20AC would also be helpful.
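The only first-byte difference between such a web decoder and a strict GB18030 decoder can be sketched as a dispatch on the first byte, modeled on the WHATWG gb18030 decoder. `classify_first_byte` and its `web` flag are hypothetical; two- and four-byte sequences are only flagged as `"lead"` here and would be handled by the normal table lookups.

```python
from typing import Optional, Tuple


def classify_first_byte(byte: int, web: bool = True) -> Tuple[str, Optional[int]]:
    """Classify the first byte of a sequence for a hypothetical GB18030-WEB decoder.

    Returns ("char", code point), ("lead", None) or ("error", None).
    """
    if byte <= 0x7F:
        return ("char", byte)  # ASCII passes through
    if byte == 0x80:
        # Strict GB18030 has no single-byte 0x80; the web/GBK variant
        # tolerates it as the euro sign.
        return ("char", 0x20AC) if web else ("error", None)
    if byte == 0xFF:
        return ("error", None)  # never valid as a first byte
    return ("lead", None)  # 0x81..0xFE start a two- or four-byte sequence
```

The tolerance is one-directional in this sketch: only decoding changes, and an encoder would still emit the standard multi-byte form for U+20AC.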