Add PackTab for CompositeProps #569

Open
taj-p wants to merge 8 commits into main from tajp/packTab

Conversation


@taj-p taj-p commented Mar 3, 2026

Intent

Adds https://github.com/behdad/packtab.rs for generating CompositeProps behind a feature flag.

Results

I'm seeing roughly 115 kB savings on binary size (building vello_cpu_render with and without PackTab):

image

Overall layout performance seems a touch faster:

image

Tested via:

cargo export target/benchmarks -- bench --bench=main
cargo bench -q --bench=main --features parley_bench/packtab -- compare target/benchmarks/main -p --time 5

These results seem unsurprising considering taj-p#4.

Happy to remove the feature flagging and release to main directly. Also happy to keep it defensively behind a flag for a few weeks.

cc @behdad

@taj-p taj-p changed the title Add PackTab behind feature flag Add PackTab for CompositeProps behind feature flag Mar 3, 2026

behdad commented Mar 4, 2026

Are there some of those attributes that packtab itself should generate? We already generate some.

Also, is a trailing newline missing? I can fix.

Collaborator

@nicoburns nicoburns left a comment


Not sure if this is waiting on approval. Consider this a rubber stamp approval (+ a review of the "plumbing"). I can't comment on whether the new data is actually valid.

2, 114, 2, 2, 2, 2, 2, 2, 2, 2, 115, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 116, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 117, 2, 118, 0, 0, 0, 0, 2, 119, 0, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 120, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 121, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
Collaborator


This data looks very repetitive, which seems weird for compressed data...


PackTab just converts the data to a branchless multi-level lookup table. In this case, the shape of the table (from the comment in the generated code) is [2^8, 2^5, 2^3, 2^1]; that is how it is broken down. You still get lots of repetition because, e.g., all the distinct blocks of 32 numbers are encoded separately.
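To make the mechanism concrete, here is a minimal two-level sketch of the same idea. The data and the `lookup` helper are invented for illustration; the actual packtab output for this table uses four levels, but the principle is identical.

```rust
// Size of each block at the innermost level (invented for this sketch).
const BLOCK: usize = 32;

// Level 1: one entry per 32-codepoint block, indexing into the
// deduplicated blocks below (invented data).
const STAGE1: [u8; 4] = [0, 1, 0, 0];

// Level 2: the distinct data blocks. Repetition in the source data shows
// up here because each 32-entry block is stored whole.
const STAGE2: [[u8; BLOCK]; 2] = [[2; BLOCK], [7; BLOCK]];

// Branchless lookup: two array indexes, no conditionals on the data.
fn lookup(cp: usize) -> u8 {
    STAGE2[STAGE1[cp / BLOCK] as usize][cp % BLOCK]
}

fn main() {
    assert_eq!(lookup(0), 2);
    assert_eq!(lookup(40), 7); // codepoint 40 falls in the second block
    println!("lookup(40) = {}", lookup(40));
}
```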

Collaborator


Interesting. So presumably it would be possible to further decrease the binary-size impact by running the packtab'd data through a general-purpose compression algorithm (LZ4, gzip, etc.), at the cost of slightly higher RAM usage and a small one-time runtime cost to decompress the data.
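A minimal sketch of that decompress-once pattern, with a toy run-length scheme and invented data standing in for a real compressor such as LZ4 or gzip:

```rust
use std::sync::OnceLock;

// Toy "compressed" table: (run length, value) pairs (invented data).
const COMPRESSED: &[(u8, u8)] = &[(4, 2), (1, 114), (3, 0)];

static TABLE: OnceLock<Vec<u8>> = OnceLock::new();

// The first call pays the one-time decompression cost; later calls return
// the cached, fully expanded table, so per-lookup cost is unchanged.
fn table() -> &'static [u8] {
    TABLE.get_or_init(|| {
        COMPRESSED
            .iter()
            .flat_map(|&(n, v)| std::iter::repeat(v).take(n as usize))
            .collect()
    })
}

fn main() {
    assert_eq!(table(), &[2, 2, 2, 2, 114, 0, 0, 0][..]);
    println!("{} entries", table().len());
}
```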


If that kind of minimization is desired, I'm also curious to see what compression=10 does.

Collaborator


If my understanding is correct, then the cost of compression=10 would be paid on every lookup? Whereas if the data were compressed, it could be decompressed into the compression=5 (or whatever) format at a one-time cost?


You are correct. I just became curious.

If the primary concern is WOFF serving, isn't that handled by compression in the transport layer?

Contributor Author


If that kind of minimization is desired, I'm also curious to see what compression=10 does.

Performance seems a bit worse and the binary decreased in size by only 128 bytes.

image

If my understanding is correct, then the cost of compression=10 would be paid on every lookup? Whereas if the data were compressed, it could be decompressed into the compression=5 (or whatever) format at a one-time cost?

Great idea! I'll create an issue after this is merged documenting it.

run: cargo run --locked -p parley_data_gen -- ./parley_data/src/generated

- name: regenerate unicode data (PackTab)
run: cargo run --locked -p parley_data_gen -- ./parley_data/src/generated_packtab --packtab --compression=5
Collaborator


Why level 5? Why not level 9? Is it because higher compression levels also affect decompression speed?


Correct.

In this case, it's unlikely that level 9 generates any different result. Level 10 gives you the absolute smallest data size, with whatever speed comes with it. The level numbers from 1 to 9 tune the heuristic that picks which solution in the tradeoff space (number of lookups vs. data size) to use.

Contributor Author


it's unlikely that level 9 generates any different result

Behdad is right! Levels 5 through 9 produced the same result.

Comment on lines +109 to +111
code.push_str(&format!(
"#[inline]\npub fn composite_get(cp: u32) -> u32 {{\n {namespace}_get(cp as usize)\n}}\n"
));
Collaborator


Nit: you should be able to write!(code, ...) (or writeln!) directly into code.
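For reference, a sketch of that pattern; the `emit_getter` helper is invented for illustration, and note that `write!` into a `String` needs the `std::fmt::Write` trait in scope:

```rust
use std::fmt::Write as _; // brings write!/writeln! for String into scope

// Hypothetical helper mirroring the generator snippet above: writeln!
// formats straight into the String, skipping the temporary allocation that
// push_str(&format!(...)) makes.
fn emit_getter(namespace: &str) -> String {
    let mut code = String::new();
    writeln!(code, "#[inline]").unwrap();
    writeln!(
        code,
        "pub fn composite_get(cp: u32) -> u32 {{\n    {namespace}_get(cp as usize)\n}}"
    )
    .unwrap();
    code
}

fn main() {
    let code = emit_getter("composite");
    assert!(code.contains("composite_get(cp as usize)"));
    print!("{code}");
}
```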

Contributor Author


Thank you! Updated in 6d5e2a8


taj-p commented Mar 10, 2026

Are there some of those attributes that packtab itself should generate? We already generate some.

Hey @behdad - I noticed that packtab generates these fields into separate tables. Our parley_data_gen creates a single table of packed u32 values containing all the properties we care about, so we only perform one lookup on this table during analysis.

I'm curious what your thoughts on this approach are. Is it better to split the data into separate tables (which presumably compress better?) but pay for additional lookups, or is it better to do what we've done here?

Happy to benchmark both approaches in time.
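A sketch of the single-table idea, with invented field widths and helper names (not parley_data's actual layout):

```rust
// Hypothetical layout packing several per-codepoint properties into one
// u32, so analysis does a single table lookup. Widths are invented.
const SCRIPT_BITS: u32 = 8;
const GC_BITS: u32 = 5;

// Combine the fields into one word with shifts and ORs.
fn pack(script: u32, gc: u32, is_emoji: bool) -> u32 {
    debug_assert!(script < (1 << SCRIPT_BITS) && gc < (1 << GC_BITS));
    script | (gc << SCRIPT_BITS) | ((is_emoji as u32) << (SCRIPT_BITS + GC_BITS))
}

// Each accessor is a shift and a mask, so reading any property after the
// one table lookup costs only a couple of ALU operations.
fn script(v: u32) -> u32 {
    v & ((1 << SCRIPT_BITS) - 1)
}

fn gc(v: u32) -> u32 {
    (v >> SCRIPT_BITS) & ((1 << GC_BITS) - 1)
}

fn is_emoji(v: u32) -> bool {
    (v >> (SCRIPT_BITS + GC_BITS)) & 1 != 0
}

fn main() {
    let v = pack(23, 5, true);
    assert_eq!((script(v), gc(v), is_emoji(v)), (23, 5, true));
    println!("packed = {v:#x}");
}
```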


behdad commented Mar 10, 2026

Are there some of those attributes that packtab itself should generate? We already generate some.

Oh, I meant the Rust attributes. I think you call them lints:

for lint in [
        "unsafe_code",
        "trivial_numeric_casts",
        "missing_docs",
        "clippy::allow_attributes_without_reason",
        "clippy::unseparated_literal_suffix",
        "clippy::double_parens",
        "clippy::unnecessary_cast",
    ] 

As for separate tables or combined, I think if you can look up once, this approach is better in general. Can you tell me which properties you are packing?


taj-p commented Mar 10, 2026

Are there some of those attributes that packtab itself should generate? We already generate some.

Oh, I meant the Rust attributes. I think you call them lints:

for lint in [
        "unsafe_code",
        "trivial_numeric_casts",
        "missing_docs",
        "clippy::allow_attributes_without_reason",
        "clippy::unseparated_literal_suffix",
        "clippy::double_parens",
        "clippy::unnecessary_cast",
    ] 

Ah, yes. I think it's probably better for PackTab to generate these.

As for separate tables or combined, I think if you can look up once, this approach is better in general. Can you tell me which properties you are packing?

  • script
  • gc
  • gib
  • bidi
  • is_emoji_or_pictograph (isEmoji || isExtendedPictograph)
  • is_variation_selector
  • is_region_indicator
  • is_mandatory_linebreak (LineBreak is MandatoryBreak, CarriageReturn, LineFeed, or NextLine)

I think we should also use PackTab for CodePointMap/Trie data here:

impl AnalysisDataSources {
    pub(crate) fn new() -> Self {
        Self
    }

    #[inline(always)]
    pub(crate) fn properties(&self, c: char) -> Properties {
        Properties::get(c)
    }

    #[inline(always)]
    pub(crate) fn grapheme_segmenter(&self) -> GraphemeClusterSegmenterBorrowed<'_> {
        const { GraphemeClusterSegmenter::new() }
    }

    #[inline(always)]
    fn word_segmenter(&self) -> WordSegmenterBorrowed<'static> {
        const { WordSegmenter::new_for_non_complex_scripts(WordBreakInvariantOptions::default()) }
    }

    #[inline(always)]
    fn line_segmenter(&self, word_break_strength: WordBreak) -> LineSegmenterBorrowed<'static> {
        match word_break_strength {
            WordBreak::Normal => {
                const {
                    let mut opt = LineBreakOptions::default();
                    opt.word_option = Some(LineBreakWordOption::Normal);
                    LineSegmenter::new_for_non_complex_scripts(opt)
                }
            }
            WordBreak::BreakAll => {
                const {
                    let mut opt = LineBreakOptions::default();
                    opt.word_option = Some(LineBreakWordOption::BreakAll);
                    LineSegmenter::new_for_non_complex_scripts(opt)
                }
            }
            WordBreak::KeepAll => {
                const {
                    let mut opt = LineBreakOptions::default();
                    opt.word_option = Some(LineBreakWordOption::KeepAll);
                    LineSegmenter::new_for_non_complex_scripts(opt)
                }
            }
        }
    }

    #[inline(always)]
    fn composing_normalizer(&self) -> CanonicalCompositionBorrowed<'_> {
        const { CanonicalComposition::new() }
    }

    #[inline(always)]
    fn decomposing_normalizer(&self) -> CanonicalDecompositionBorrowed<'_> {
        const { CanonicalDecomposition::new() }
    }

    #[inline(always)]
    pub(crate) fn script_short_name(&self) -> PropertyNamesShortBorrowed<'static, Script> {
        PropertyNamesShort::new()
    }

    #[inline(always)]
    fn brackets(&self) -> CodePointMapDataBorrowed<'_, BidiMirroringGlyph> {
        const { CodePointMapData::new() }
    }
}

@taj-p taj-p changed the title Add PackTab for CompositeProps behind feature flag Add PackTab for CompositeProps Mar 10, 2026
@taj-p taj-p marked this pull request as ready for review March 10, 2026 20:37

behdad commented Mar 10, 2026

  • gib

Which is that?

Given what you have, if they all fit in a 32-bit value, keep it that way. Some (variation selector, regional indicator) take only a few operations to compute, but removing them from the composite probably won't save any bytes.


behdad commented Mar 11, 2026

for lint in [
        "unsafe_code",
        "trivial_numeric_casts",
        "missing_docs",
        "clippy::allow_attributes_without_reason",
        "clippy::unseparated_literal_suffix",
        "clippy::double_parens",
        "clippy::unnecessary_cast",
    ] 

Ah, yes. I think it's probably better for PackTab to generate these.

How's this?
behdad/packtab.rs@126e0dd

I can make a release if it looks good.
