Conversation
CompositeProps behind feature flag
Are there some of those attributes that packtab itself should generate? We already generate some. Also, is a trailing newline missing? I can fix.
nicoburns
left a comment
Not sure if this is waiting on approval. Consider this a rubber stamp approval (+ a review of the "plumbing"). I can't comment on whether the new data is actually valid.
2, 114, 2, 2, 2, 2, 2, 2, 2, 2, 115, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 116, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 117, 2, 118, 0, 0, 0, 0, 2, 119, 0, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 120, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
This data looks very repetitive, which seems weird for compressed data...
PackTab just converts the data to a branchless multi-level lookup table. In this case, the shape of the table (from the comment in the generated code) is [2^8, 2^5, 2^3, 2^1]. That's how it is broken down. You still get lots of repetition because, e.g., all the different blocks of 32 numbers are encoded separately.
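To make the scheme concrete, here is a tiny sketch of a branchless multi-level lookup in the style PackTab emits. The bit split, table names, and values are invented for illustration and use a much smaller shape than the real [2^8, 2^5, 2^3, 2^1] tables:

```rust
// Hypothetical two-level lookup: LEVEL0 maps the high bits of the codepoint
// to a block index, and the low bits offset into that block in LEVEL1.
// Blocks are encoded independently, so repeated values still appear.
const LEVEL0: [u8; 4] = [0, 1, 1, 0]; // indexed by cp >> 2
const LEVEL1: [u8; 8] = [
    9, 9, 9, 9, // block 0
    7, 7, 8, 8, // block 1
];

fn lookup(cp: usize) -> u8 {
    // Branchless: two array reads, no comparisons.
    let block = LEVEL0[cp >> 2] as usize;
    LEVEL1[block * 4 + (cp & 0b11)]
}
```

Note how block 0 is referenced twice from LEVEL0, which is where the deduplication (and the remaining repetition inside each block) comes from.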
Interesting. So presumably it would be possible to further decrease the binary size impact by running the packtab'd data through a general purpose compression algorithm (LZ4, gzip, etc), at the cost of slightly higher RAM usage and a small one-time runtime cost to decompress the data.
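As a sketch of that decompress-once idea: the table could ship compressed and be expanded into RAM lazily on first use. A trivial run-length decoder stands in here for a real LZ4/gzip decompressor, and the table bytes are made up:

```rust
use std::sync::OnceLock;

// Made-up "compressed" table: pairs of (count, value) expand into the flat
// lookup data. A real build would store LZ4/gzip bytes here instead.
static TABLE_RLE: &[(u8, u8)] = &[(3, 2), (1, 114), (5, 0)];

static TABLE: OnceLock<Vec<u8>> = OnceLock::new();

fn table() -> &'static [u8] {
    TABLE.get_or_init(|| {
        // One-time decompression cost; every later call is a plain slice read.
        TABLE_RLE
            .iter()
            .flat_map(|&(count, value)| std::iter::repeat(value).take(count as usize))
            .collect()
    })
}
```

After the first call, lookups pay no per-access decompression cost, at the price of holding the expanded table in RAM.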
If that kind of minimization is desired, I'm also curious to see what compression=10 does.
If my understanding is correct, the cost of compression=10 would be paid on every lookup? Whereas if the data was compressed, it could be uncompressed into the compression=5 (or whatever) format at a one-time cost?
You are correct. I just became curious.
If the primary concern is woff serving, isn't that handled by compression in the transport layer?
If that kind of minimization is desired, I'm also curious to see what compression=10 does.
Performance seems a bit worse and the binary decreased in size by only 128 bytes.
If my understanding is correct, the cost of compression=10 would be paid on every lookup? Whereas if the data was compressed, it could be uncompressed into the compression=5 (or whatever) format at a one-time cost?
Great idea! I'll create an issue after this is merged documenting it.
.github/workflows/ci.yml
Outdated
run: cargo run --locked -p parley_data_gen -- ./parley_data/src/generated

- name: regenerate unicode data (PackTab)
  run: cargo run --locked -p parley_data_gen -- ./parley_data/src/generated_packtab --packtab --compression=5
Why level 5? Why not level 9? Is it because higher compression levels also affect decompression speed?
Correct.
In this case, it's unlikely that level 9 generates any different result. There's level 10, which gives you the absolute smallest data size with whatever speed comes with it. The level numbers from 1 to 9 tune the heuristic that picks a solution in the tradeoff space (number of lookups vs. data size).
it's unlikely that level 9 generates any different result
Behdad is right! Levels 5 through 9 produced the same result.
parley_data_gen/src/lib.rs
Outdated
code.push_str(&format!(
    "#[inline]\npub fn composite_get(cp: u32) -> u32 {{\n {namespace}_get(cp as usize)\n}}\n"
));
Nit: you should be able to write!(code, ...) (or writeln!) directly into code.
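For reference, a minimal sketch of that suggestion: formatting directly into the output String via std::fmt::Write instead of allocating an intermediate String with format!. The emit_getter function and the namespace value are hypothetical stand-ins for the generator's code:

```rust
use std::fmt::Write; // brings write!/writeln! for String into scope

// Hypothetical helper mirroring the generated-getter emission. Writing into
// `code` directly avoids the temporary allocation that `format!` + `push_str`
// would make. `write!` into a String cannot fail, so `unwrap` is safe.
fn emit_getter(code: &mut String, namespace: &str) {
    writeln!(code, "#[inline]").unwrap();
    writeln!(code, "pub fn composite_get(cp: u32) -> u32 {{").unwrap();
    writeln!(code, "    {namespace}_get(cp as usize)").unwrap();
    writeln!(code, "}}").unwrap();
}
```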
Hey @behdad - I noticed that packtab generates these fields into separate tables. I'm curious what your thoughts on this approach are. Is it better to split the data into separate tables (which are presumably better compressed?) but suffer additional lookups, or is it better to do something as we've done here? Happy to benchmark both approaches in time.
Oh, I meant the Rust attributes. I think you call them lints. As for separate tables vs. combined: I think if you can look up once, this approach is better in general. Can you tell me which properties you are packing?
Ah, yes. I think it's probably better for PackTab to generate these.
I think we should also use PackTab for CodePointMap/Trie data here: parley/parley/src/analysis/mod.rs, lines 29 to 95 in 6d5e2a8
CompositeProps behind feature flag
Which is that? Given what you have, if they all fit in a 32-bit value, keep it that way. Some (variation selector, regional indicator) are a few operations to compute, but removing them from the composite probably won't save any bytes.
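As a hedged sketch of what "keep them all in a 32-bit value" can look like: several properties packed as bitfields in one u32, so a single table lookup yields all of them. The field names and bit widths here are invented for illustration, not parley's actual layout:

```rust
// Invented field widths: low 8 bits for a script id, next 5 bits for a
// category, one flag bit above them. All fit comfortably in a u32.
const SCRIPT_BITS: u32 = 8;
const CATEGORY_BITS: u32 = 5;

fn pack(script: u32, category: u32, is_variation_selector: bool) -> u32 {
    debug_assert!(script < (1 << SCRIPT_BITS) && category < (1 << CATEGORY_BITS));
    script
        | (category << SCRIPT_BITS)
        | ((is_variation_selector as u32) << (SCRIPT_BITS + CATEGORY_BITS))
}

fn unpack_category(packed: u32) -> u32 {
    (packed >> SCRIPT_BITS) & ((1 << CATEGORY_BITS) - 1)
}
```

Splitting into separate per-property tables might compress each table better, but this combined layout needs only one lookup per codepoint.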
How's this? I can make a release if it looks good.
Intent
Adds https://github.com/behdad/packtab.rs for generating CompositeProps behind a feature flag.

Results

I'm seeing roughly 115 kB savings on binary size (building vello_cpu_render with and without PackTab). Overall layout performance seems a touch faster.

Tested via:

cargo export target/benchmarks -- bench --bench=main
cargo bench -q --bench=main --features parley_bench/packtab -- compare target/benchmarks/main -p --time 5

These results seem unsurprising considering taj-p#4.
Happy to remove the feature flagging and release to main directly. Also happy to keep it defensively behind a flag for a few weeks.
cc @behdad