Skip to content

Use niche-filling optimization even when multiple variants have data.#94075

Merged
bors merged 2 commits intorust-lang:masterfrom
mikebenfield:wip-enum
Sep 8, 2022
Merged

Use niche-filling optimization even when multiple variants have data.#94075
bors merged 2 commits intorust-lang:masterfrom
mikebenfield:wip-enum

Conversation

@mikebenfield
Copy link
Contributor

Fixes #46213

@rust-highfive
Copy link
Contributor

Some changes occured to the CTFE / Miri engine

cc @rust-lang/miri

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 17, 2022
@rust-highfive
Copy link
Contributor

r? @michaelwoerister

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 17, 2022
@mikebenfield
Copy link
Contributor Author

Several static size checks needed the size lowered. I'm unsure how to handle this since they'll fail with a bootstrapped compiler, so I've just commented them out for now.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@oli-obk oli-obk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing! Will review in detail later, but starting a perf run now

@oli-obk
Copy link
Contributor

oli-obk commented Feb 17, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 17, 2022
@bors
Copy link
Collaborator

bors commented Feb 17, 2022

⌛ Trying commit 31b86437df15d7b335697d3f971721104335d04e with merge dbc0f1b7dd1014d9c7e56714953a0daf1b4f4c35...

@michaelwoerister
Copy link
Member

r? @eddyb -- reviewing this is out of my league ;)

Before merging something like this, we need to make sure that debuginfo generation handles this correctly (especially the cpp-like encoding). cc @wesleywiser

@rust-log-analyzer

This comment has been minimized.

@RalfJung
Copy link
Member

Wow that's cool. :D

It would probably be good to also have a ui test ensuring specific cases are handled (like some of the examples from the issue). I imagine there would be an existing ui test that can be extended, but I am not sure.

@mikebenfield
Copy link
Contributor Author

I added a simple test case.

not(bootstrap) doesn't seem to do the trick; I still get assertion failures with that.

@mikebenfield
Copy link
Contributor Author

Nevermind; not(bootstrap) works fine.

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case seems to have crept in from another PR...?

@joshtriplett
Copy link
Member

This is incredible work! I've wanted to see this level of niche optimization for a long time. I think this makes niche optimization work more like people often expect it'll work.

@mikebenfield
Copy link
Contributor Author

@nnethercote What do you find unsatisfactory about the output now?

@nnethercote
Copy link
Contributor

The main problem is that there are things marked as padding that aren't padding. E.g. for Call and Path, bytes 8..16 contain the tag.

The smaller problem is that it's not shown how the tag is stored for MethodCall.

Something like this would be much clearer:

print-type-size     variant `MethodCall`: 64 bytes
print-type-size         field `.0`: 24 bytes
print-type-size         field `.1`: 8 bytes (includes niche discriminant)
print-type-size         field `.2`: 24 bytes
print-type-size         field `.3`: 8 bytes
print-type-size     variant `Path`: 64 bytes 
print-type-size         discriminant: 8 bytes
print-type-size         padding: 8 bytes
print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
print-type-size         field `.1`: 40 bytes

@pnkfelix
Copy link
Contributor

On the subject of the performance regression, it seems like the impact to syn here might be severe: 9.44% regression for incr-patched:println sounds bad to me.

From the detailed info, it seems like a big portion of that is blamed on time in LLVM_lto_optimize. Maybe there's nothing that can be done to address that, but it seemed worth noting.

(Is it conceivable that the change here might actually be causing LLVM_lto_optimize to be doing new useful work when compiling syn? )

@mikebenfield
Copy link
Contributor Author

Is it conceivable that the change here might actually be causing LLVM_lto_optimize to be doing new useful work when compiling syn?

Not in a way that is obvious to me.

@mikebenfield
Copy link
Contributor Author

I don't understand the linked performance results. The command says the cachegrind profiler is used, but the results don't look like those produced by cachegrind. Here the results are measured in seconds and organized by query. When I run the cachegrind profiler the results are measured in CPU cycles and organized by individual function.

@bjorn3
Copy link
Member

bjorn3 commented Sep 14, 2022

The cachegrind commands are if you want to investigate what changed. They are not the actual commands that ran. What you see in the table is the result of rustc -Zself-profile being formatted by the summarize command of the rust-lang/measureme repo. This is noisier than cachegrind, but a lot faster. Cachegrind is way too slow to run on the perf bot and doesn't account for ipc changes.

@RalfJung
Copy link
Member

RalfJung commented Sep 15, 2022 via email

@jdahlstrom
Copy link

jdahlstrom commented Nov 1, 2022

Hmm, should the following enum be eligible for niche optimization now (courtesy u/gitpy on Reddit)?

enum Data {
    A(f64),
    B(Buf),
}

struct Buf(*const u8, i16);

fn main() {
    println!("{}", std::mem::size_of::<Buf>());  // 16
    println!("{}", std::mem::size_of::<Data>()); // 24 even on nightly
}

@cuviper
Copy link
Member

cuviper commented Nov 1, 2022

Padding is not used as a niche -- it could contain anything, including the value that you would want to indicate the other variant.

@jdahlstrom
Copy link

jdahlstrom commented Nov 1, 2022

@cuviper Ah, right. Amusingly you can now offer the compiler a place that works as a niche, though =)

enum Data {
    A(f64),
    B(Buf),
}
struct Buf(*const u8, i16, bool);
//                       ++++++

println!("{}", std::mem::size_of::<Buf>());  // 16
println!("{}", std::mem::size_of::<Data>()); // 16 on nightly!

@cuviper
Copy link
Member

cuviper commented Nov 1, 2022

Yep! And if you want even more niche, you can use a univariant enum, with a repr to keep it from being a ZST:

#[repr(u8)] // or bigger!
enum Pad { Empty }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize enums by niche-filling even when the smaller variants also have data.