perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume #152843
Open
TKanX wants to merge 2 commits into rust-lang:main
Contributor
Author
@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler
Member
r? codegen
scottmcm requested changes on Feb 22, 2026
Collaborator
Reminder, once the PR becomes ready for a review, use
…= 0` for non-ZST DSTs
… on non-ZST DSTs
a9ec27f to 8339cfe
Collaborator
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed; this note is just to help reviewers.
Contributor
Author
@rustbot ready
scottmcm reviewed on Feb 22, 2026
Comment on lines +183 to +189:

```rust
// Alignment rounding can only increase the size, never decrease it:
// `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the
// addition above, LLVM can therefore deduce
// `full_size >= unrounded_size >= offset`, which proves `full_size > 0`
// for types with a non-zero-sized prefix (#152788).
let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size);
bx.assume(size_ge);
```
Member
Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...
Member
I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.
Member
r? scottmcm
Summary:
Problem:
`size_of_val(p) == 0` fails to optimize away for DST types that have a statically known non-zero-sized prefix: `Foo` has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters because `Box<dyn T>` drop emits this exact check to guard the deallocation call: for types with a guaranteed non-zero prefix, the branch should vanish, but it doesn't.

The slice-tail variant `Foo<[i32]>` already optimized correctly; `Foo<dyn Trait>` and `Foo<[u8]>` did not.

Root Cause:
In `size_and_align_of_dst` (the ADT/Tuple branch), the size computation is:

LLVM cannot prove `full_size > 0` because:

- `offset + unsized_size` used a plain `add` with no NUW flag, so LLVM cannot conclude that the result is ≥ `offset`.
- For the rounding step `(x + addend) & -align`, LLVM has no information that alignment rounding never reduces the value below `x`.

Additionally, the vtable alignment range metadata was `[1, u64::MAX]` (only non-zero), even though the actual bound is `[1, 1 << (ptr_width - 1)]` (all alignments are powers of two, with a tighter upper bound).

Solution:
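The two arithmetic facts the changes below rely on can be checked in plain Rust. This is a sketch, not the compiler code: `round_up`, `dst_size`, and `check_invariants` are hypothetical helpers that mirror the `(x + addend) & -align` rounding and the `offset + unsized_size` addition, with `checked_add` standing in for the no-unsigned-wrap (`nuw`) guarantee.

```rust
/// Hypothetical helper: rounds `x` up to a multiple of `align` with the
/// `(x + addend) & -align` bit trick, where `addend = align - 1`.
/// Returns `None` if `x + addend` would wrap, because `round_up(x, a) >= x`
/// only holds when it does not.
fn round_up(x: usize, align: usize) -> Option<usize> {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    let addend = align - 1;
    // `align.wrapping_neg()` is the two's-complement `-align`: a mask that
    // clears the low log2(align) bits.
    x.checked_add(addend).map(|sum| sum & align.wrapping_neg())
}

/// Hypothetical helper mirroring `offset + unsized_size` followed by rounding.
/// `checked_add` models the claim behind `add nuw`: the sum never wraps.
fn dst_size(offset: usize, unsized_size: usize, align: usize) -> Option<usize> {
    let unrounded = offset.checked_add(unsized_size)?;
    round_up(unrounded, align)
}

/// Exercises the chain `full_size >= unrounded_size >= offset` on small
/// inputs; panics if any step fails.
fn check_invariants() {
    // Two operands <= isize::MAX never overflow usize when added.
    let max = isize::MAX as usize;
    assert_eq!(max.checked_add(max), Some(usize::MAX - 1));

    for offset in 0..64usize {
        for unsized_size in 0..64usize {
            for shift in 0..7 {
                let align = 1usize << shift;
                let unrounded = offset + unsized_size;
                let full = dst_size(offset, unsized_size, align).unwrap();
                assert!(full >= unrounded && unrounded >= offset);
                assert_eq!(full % align, 0);
                // A non-zero prefix therefore forces a non-zero total size.
                if offset > 0 {
                    assert!(full > 0);
                }
            }
        }
    }
}
```

The `None` branch is why the `assume` needs the no-overflow argument alongside it: with wrapping arithmetic, rounding `usize::MAX` up to an alignment of 2 would produce 0.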
Three minimal additions, each grounded in a precise invariant:
1. `add nuw` on `offset + unsized_size`: sound because both operands are ≤ `isize::MAX` for any valid Rust object, so their sum is at most `usize::MAX - 1` and unsigned overflow is impossible. This tells LLVM that `unrounded_size ≥ offset`.
2. `assume(full_size ≥ unrounded_size)`: `round_up(x, a) ≥ x` holds for every power-of-two `a` as long as `x + (a - 1)` does not wrap, which is guaranteed here because the rounded size of a valid object is at most `isize::MAX`. This tells LLVM that `full_size ≥ unrounded_size ≥ offset`; if `offset > 0`, the chain proves `full_size > 0`.
3. Tighten the vtable alignment range from `[1, u64::MAX]` to `[1, 1 << (ptr_width - 1)]`, consistent with Rust's alignment constraints. This is applied in both `size_of_val.rs` and the `vtable_align` intrinsic in `mir/intrinsic.rs`.

LLVM IR Comparison:
`Foo<dyn Debug>`, before (godbolt):

`Foo<dyn Debug>`, after:

`Foo<[u8]>`, before:

`Foo<[u8]>`, after:

Changes:
- `compiler/rustc_codegen_ssa/src/size_of_val.rs`: `add` → `unchecked_uadd` (NUW) on `offset + unsized_size`; add `assume(full_size ≥ unrounded_size)`; tighten the vtable alignment range.
- `compiler/rustc_codegen_ssa/src/mir/intrinsic.rs`: tighten the alignment range on the `vtable_align` intrinsic, consistent with the above.
- `tests/codegen-llvm/dst-vtable-align-nonzero.rs`: update the FileCheck metadata expectation to match the new, tighter range.
- `tests/codegen-llvm/dst-size-of-val-not-zst.rs`: new codegen test verifying that `size_of_val == 0` folds to `ret i1 false` for `Foo<dyn Debug>`, `Foo<[u8]>`, and `Foo<[i32]>`.

Fixes #152788.
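For anyone wanting to poke at the pattern, here is a runtime sketch of the three tail kinds the new test covers. It assumes a hypothetical `Foo` whose prefix is three `u32` fields (12 bytes); the PR's actual `Foo` definition is not reproduced here. The sketch only shows that `size_of_val` can never be zero for such types; whether LLVM folds the comparison to `false` still has to be checked in the emitted IR.

```rust
use std::fmt::Debug;

/// Hypothetical DST with a 12-byte sized prefix and an unsized tail `T`.
#[allow(dead_code)]
struct Foo<T: ?Sized> {
    a: u32,
    b: u32,
    c: u32,
    tail: T,
}

/// The comparison that should fold to `false` when the prefix is non-zero-sized.
fn is_zero_sized<T: ?Sized>(p: &T) -> bool {
    std::mem::size_of_val(p) == 0
}

/// Builds one value per tail kind and returns their dynamic sizes.
fn sizes() -> [usize; 3] {
    // Unsizing coercions: `Foo<u8>` -> `Foo<dyn Debug>`, `Foo<[_; 0]>` -> `Foo<[_]>`.
    let d: &Foo<dyn Debug> = &Foo { a: 1, b: 2, c: 3, tail: 0u8 };
    let s: &Foo<[u8]> = &Foo { a: 1, b: 2, c: 3, tail: [0u8; 0] };
    let i: &Foo<[i32]> = &Foo { a: 1, b: 2, c: 3, tail: [0i32; 0] };
    assert!(!is_zero_sized(d) && !is_zero_sized(s) && !is_zero_sized(i));
    [
        std::mem::size_of_val(d),
        std::mem::size_of_val(s),
        std::mem::size_of_val(i),
    ]
}
```

Compiling `is_zero_sized` for each instantiation with optimizations and inspecting the IR is roughly what a FileCheck-based codegen test automates.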