perf(codegen): Eliminate size_of_val == 0 for DSTs with Non-zero-sized Prefix via NUW and Assume #152843

Open
TKanX wants to merge 2 commits into rust-lang:main from TKanX:bugfix/152788-codegen-dst-size-nuw-assume

@TKanX
Contributor

TKanX commented Feb 19, 2026

Summary:

Problem:

size_of_val(p) == 0 fails to optimize away for DST types that have a statically-known non-zero-sized prefix:

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

pub fn demo(p: &Foo<dyn std::fmt::Debug>) -> bool {
    std::mem::size_of_val(p) == 0  // always false, but LLVM can't prove it
}

Foo has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison survives in the optimized LLVM IR as a runtime computation. This matters because Box<dyn T> drop emits this exact check to guard the deallocation call: for types with a guaranteed non-zero prefix, the branch should vanish, but it doesn't.

The slice tail variant Foo<[i32]> already optimized correctly; Foo<dyn Trait> and Foo<[u8]> did not.
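A runnable version of the example, as a minimal sketch: it restates the PR's Foo and demo, adds demo_lessalignedslice (the name used in the IR below) and a hypothetical demo_slice for the [i32] tail, and builds each variant with an empty tail. Even then the 12-byte prefix keeps size_of_val at 12, so all three checks are always false; the code only shows what LLVM is being asked to prove, not how rustc compiles it.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;

// Same shape as the PR's example: a 12-byte sized prefix plus an unsized tail.
pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

// Trait-object tail (reported as not folding before the change).
pub fn demo(p: &Foo<dyn Debug>) -> bool {
    size_of_val(p) == 0
}

// `[u8]` tail (reported as not folding before the change).
pub fn demo_lessalignedslice(p: &Foo<[u8]>) -> bool {
    size_of_val(p) == 0
}

// `[i32]` tail (reported as already folding). Hypothetical name.
pub fn demo_slice(p: &Foo<[i32]>) -> bool {
    size_of_val(p) == 0
}

// Builds each unsized variant from a sized value with an empty tail
// and returns the three dynamic sizes.
pub fn sizes_with_empty_tails() -> [usize; 3] {
    let d_sized = Foo([1, 2, 3], ());
    let b_sized = Foo([1, 2, 3], [0u8; 0]);
    let i_sized = Foo([1, 2, 3], [0i32; 0]);
    // Unsizing coercions: the last field becomes `dyn Debug`, `[u8]`, `[i32]`.
    let d: &Foo<dyn Debug> = &d_sized;
    let b: &Foo<[u8]> = &b_sized;
    let i: &Foo<[i32]> = &i_sized;
    [size_of_val(d), size_of_val(b), size_of_val(i)]
}
```

For example, sizes_with_empty_tails() returns [12, 12, 12]: the empty tail adds nothing, and 12 is already a multiple of the 4-byte alignment.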

Root Cause:

In size_and_align_of_dst (the ADT/Tuple branch), the size computation is:

full_size = (offset + unsized_size + (align-1)) & -align
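A minimal sketch of the formula above in plain Rust, assuming 64-bit sizes and flag-free wrapping arithmetic (what a plain LLVM add, sub and and compute). full_size_wrapping is a hypothetical helper, not the rustc function; it only makes the two's-complement mask -align explicit.

```rust
// Hypothetical helper mirroring
// `full_size = (offset + unsized_size + (align-1)) & -align`
// with wrapping (flag-free) 64-bit arithmetic.
pub fn full_size_wrapping(offset: u64, unsized_size: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two(), "alignments are powers of two");
    let unrounded_size = offset.wrapping_add(unsized_size);
    // In two's complement, `-align` (i.e. `0 - align`) equals the mask `!(align - 1)`.
    let neg_align = 0u64.wrapping_sub(align);
    unrounded_size.wrapping_add(align - 1) & neg_align
}
```

For a 12-byte prefix followed by a 1-byte tail with alignment 4, full_size_wrapping(12, 1, 4) is 16.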

LLVM cannot prove full_size > 0 because:

  1. offset + unsized_size uses a plain add with no NUW flag, so LLVM cannot conclude that the result is ≥ offset.
  2. In (x + addend) & -align, LLVM has no information that alignment rounding never reduces the value below x; in wrapping arithmetic it can, if x + addend overflows.
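To see why LLVM cannot fold the check on its own: with wrapping arithmetic the expression really can evaluate to 0 even though offset is 12. The inputs below exceed isize::MAX, so no valid Rust object has them, but nothing in the flag-free IR tells LLVM that. A sketch with hypothetical helper names, one input per point above:

```rust
// Same computation as the IR without `nuw` or `assume`: every step wraps modulo 2^64.
fn size_without_facts(offset: u64, unsized_size: u64, align: u64) -> u64 {
    let unrounded_size = offset.wrapping_add(unsized_size);
    unrounded_size.wrapping_add(align - 1) & 0u64.wrapping_sub(align)
}

// Point 1: `offset + unsized_size` wraps to 0, so the result is not >= offset.
pub fn add_wraps() -> u64 {
    size_without_facts(12, u64::MAX - 11, 4)
}

// Point 2: the addition does not wrap (the unrounded size is u64::MAX - 2),
// but adding `align - 1` during rounding does, so the result drops below it.
pub fn rounding_wraps() -> u64 {
    size_without_facts(12, u64::MAX - 14, 4)
}
```

Both functions return 0, which is exactly the outcome the nuw flag and the assume rule out.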

Additionally, the range metadata on the vtable alignment load was [1, u64::MAX] (i.e. merely non-zero), even though all alignments are powers of two bounded by 1 << (ptr_width - 1), so the tighter range [1, 1 << (ptr_width - 1)] also holds.
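A small sketch of the two ranges, assuming the PR's bracket notation means a closed interval (LLVM's own !range metadata stores half-open pairs). max_align, in_tight_range and all_powers_of_two_fit are hypothetical helpers: the upper end 1 << (ptr_width - 1) is the largest power of two a ptr_width-bit integer can hold, so every valid alignment stays inside the tighter range.

```rust
// Upper end of the tightened alignment range for a given pointer width.
pub fn max_align(ptr_width: u32) -> u64 {
    1u64 << (ptr_width - 1)
}

// True if `align` lies in the closed range [1, max_align(ptr_width)].
pub fn in_tight_range(align: u64, ptr_width: u32) -> bool {
    (1..=max_align(ptr_width)).contains(&align)
}

// Every power of two representable in `ptr_width` bits falls inside the range.
pub fn all_powers_of_two_fit(ptr_width: u32) -> bool {
    (0..ptr_width).all(|k| in_tight_range(1u64 << k, ptr_width))
}
```

Note that in_tight_range(3, 64) is also true: an interval can encode the bounds but not the power-of-two property itself.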

Solution:

Three minimal additions, each grounded in a precise invariant:

  1. add nuw on offset + unsized_size: sound because both operands are ≤ isize::MAX for any valid Rust object, and isize::MAX + isize::MAX = usize::MAX - 1, so unsigned overflow is impossible. This tells LLVM that unrounded_size ≥ offset.

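  2. assume(full_size ≥ unrounded_size): round_up(x, a) ≥ x holds for power-of-two a whenever the rounding does not wrap, which it cannot for the size of a valid Rust object (at most isize::MAX). This tells LLVM that full_size ≥ unrounded_size ≥ offset; if offset > 0, the chain proves full_size > 0.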

  3. Tighten the vtable alignment range from [1, u64::MAX] to [1, 1 << (ptr_width - 1)], consistent with Rust's alignment constraints. This is applied both in size_of_val.rs and in the vtable_align intrinsic in mir/intrinsic.rs.
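The facts above can be checked together in a sketch that replaces the LLVM flags with runtime assertions. round_up and full_size_checked are hypothetical helpers; they use usize and Rust's checked arithmetic to stand in for nuw and assume, not the rustc builder calls the PR actually emits.

```rust
// Round `x` up to a multiple of the power-of-two `a`; `None` if that would overflow.
pub fn round_up(x: usize, a: usize) -> Option<usize> {
    assert!(a.is_power_of_two());
    Some(x.checked_add(a - 1)? & !(a - 1))
}

// The size computation with the invariants spelled out as runtime checks.
pub fn full_size_checked(offset: usize, unsized_size: usize, align: usize) -> Option<usize> {
    let max = isize::MAX as usize;
    // A valid alignment is a power of two (hence at most 1 << (usize::BITS - 1)).
    assert!(align.is_power_of_two());
    // Invariant behind `add nuw`: both operands are at most isize::MAX,
    // and isize::MAX + isize::MAX == usize::MAX - 1, so this addition never wraps.
    assert!(offset <= max && unsized_size <= max);
    let unrounded_size = offset + unsized_size;
    // Rounding up can only fail by overflowing; when it succeeds it never decreases the value.
    let full_size = round_up(unrounded_size, align)?;
    // The fact supplied via `assume`, chained with the `nuw` fact.
    assert!(full_size >= unrounded_size && unrounded_size >= offset);
    Some(full_size)
}
```

full_size_checked(0, 0, 8) returns Some(0), which is why the argument needs offset > 0: with an empty prefix the chain only proves full_size ≥ 0.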

LLVM IR Comparison:

Foo<dyn Debug> — before (godbolt):

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  %0 = getelementptr inbounds nuw i8, ptr %p.1, i64 8
  %1 = load i64, ptr %0, align 8, !range !3, !invariant.load !4
  %2 = getelementptr inbounds nuw i8, ptr %p.1, i64 16
  %3 = load i64, ptr %2, align 8, !range !5, !invariant.load !4
  %4 = tail call i64 @llvm.umax.i64(i64 %3, i64 4)
  %5 = add nuw i64 %1, 11
  %6 = add i64 %5, %4
  %7 = sub i64 0, %4
  %8 = and i64 %6, %7
  %_0 = icmp eq i64 %8, 0
  ret i1 %_0
}

Foo<dyn Debug> — after:

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  ret i1 false
}

Foo<[u8]> — before:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  %0 = add i64 %p.1, 15
  %_0 = icmp ult i64 %0, 4
  ret i1 %_0
}

Foo<[u8]> — after:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  ret i1 false
}
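The Foo<[u8]> before IR is easier to read next to the source formula. For a [u8] tail the offset is 12 and the struct alignment is 4, so size == 0 means ((len + 15) & -4) == 0, which LLVM appears to have rewritten as len + 15 <u 4. Both forms are true only when len + 15 wraps around. The sketch below (hypothetical helper names, 64-bit lengths assumed) checks that equivalence and shows that once the computation is known not to wrap, the result is constantly false.

```rust
// "Before" IR for `Foo<[u8]>`: `size == 0` appears as `icmp ult (len + 15), 4`.
pub fn before_ir(len: u64) -> bool {
    len.wrapping_add(15) < 4
}

// Source-level check with wrapping adds: ((12 + len + 3) & -4) == 0,
// where 12 is the prefix size and 3 is `align - 1` for align 4.
pub fn size_is_zero_wrapping(len: u64) -> bool {
    (12u64.wrapping_add(len).wrapping_add(3) & !3u64) == 0
}

// If the additions cannot wrap, `len + 15 >= 15`,
// so the comparison against 4 can never be true.
pub fn size_is_zero_no_wrap(len: u64) -> bool {
    matches!(len.checked_add(15), Some(x) if x < 4)
}
```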

Changes:

  • compiler/rustc_codegen_ssa/src/size_of_val.rs: add unchecked_uadd (NUW) on offset + unsized_size; add assume(full_size ≥ unrounded_size); tighten the vtable alignment range.
  • compiler/rustc_codegen_ssa/src/mir/intrinsic.rs: tighten alignment range on the vtable_align intrinsic, consistent with the above.
  • tests/codegen-llvm/dst-vtable-align-nonzero.rs: update FileCheck metadata expectation to match the new tighter range.
  • tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifying size_of_val == 0 folds to ret i1 false for Foo<dyn Debug>, Foo<[u8]>, and Foo<[i32]>.

Fixes #152788.

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Feb 19, 2026
@TKanX
Contributor Author

TKanX commented Feb 20, 2026

@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler

rustbot added the A-codegen (Area: Code generation), A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) and C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such) labels Feb 20, 2026
@fmease
Member

fmease commented Feb 21, 2026

r? codegen

rustbot assigned dianqk and unassigned fmease Feb 21, 2026
rustbot added S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) and removed S-waiting-on-review labels Feb 22, 2026
@rustbot
Collaborator

rustbot commented Feb 22, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a9ec27f to 8339cfe February 22, 2026 05:32
@rustbot
Collaborator

rustbot commented Feb 22, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@TKanX
Contributor Author

TKanX commented Feb 22, 2026

@rustbot ready

rustbot added S-waiting-on-review and removed S-waiting-on-author labels Feb 22, 2026
TKanX requested a review from scottmcm February 22, 2026 05:34
Comment on lines +183 to +189
// Alignment rounding can only increase the size, never decrease it:
// `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the
// addition above, LLVM can therefore deduce
// `full_size >= unrounded_size >= offset`, which proves `full_size > 0`
// for types with a non-zero-sized prefix (#152788).
let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size);
bx.assume(size_ge);
Member

Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

Member

I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

@dianqk
Member

dianqk commented Feb 22, 2026

r? scottmcm

rustbot assigned scottmcm and unassigned dianqk Feb 22, 2026
Labels

A-codegen, A-LLVM, C-optimization, S-waiting-on-review, T-compiler

Development

Successfully merging this pull request may close these issues.

size_of_val(p) == 0 doesn't optimize out for clearly-not-ZST values

6 participants