Deterministic query cycles for parallel front-end by zetanumbers · Pull Request #149849 · rust-lang/rust

zetanumbers · 2025-12-10T13:58:44Z

The mechanism is similar to cycle detection in single-threaded mode. We traverse the deadlocked query graph from the top active query down through its subqueries until we visit some query a second time, thus finding a cycle. With the multi-threaded front-end enabled, one query may now have more than one active subquery, i.e. it used one of the parallel interfaces (parallel!, join, par_for_each, etc.). As such, we have to follow the "leftmost" active subquery to recover the sequential behavior these parallel interfaces have in single-threaded mode. The new TreeNodeIndex saves implicit context information about which join (or scope) task we entered while executing a query, which we then use in break_query_cycle.
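
As a rough illustration of the traversal described above (not the code in this PR), the sketch below walks a toy graph and always follows the leftmost active subquery until a query repeats. `QueryId`, `ActiveGraph` and `find_leftmost_cycle` are hypothetical names, and the sketch assumes the subqueries are already stored in single-threaded order.

```rust
use std::collections::HashMap;

/// Hypothetical query identifier; the compiler has its own job id type.
pub type QueryId = u32;

/// Assumed representation: for each active query, its active subqueries in the
/// order a single-threaded run would reach them ("leftmost" first).
pub type ActiveGraph = HashMap<QueryId, Vec<QueryId>>;

/// Walks down from `root`, always following the leftmost active subquery, and
/// returns the cycle found once some query is visited a second time.
/// Returns `None` if the walk reaches a query with no active subquery.
pub fn find_leftmost_cycle(graph: &ActiveGraph, root: QueryId) -> Option<Vec<QueryId>> {
    let mut path: Vec<QueryId> = Vec::new();
    // Position of each visited query within `path`.
    let mut position: HashMap<QueryId, usize> = HashMap::new();
    let mut current = root;
    loop {
        if let Some(&start) = position.get(&current) {
            // `current` was seen before: everything from its first visit on is the cycle.
            return Some(path[start..].to_vec());
        }
        position.insert(current, path.len());
        path.push(current);
        // Follow the leftmost active subquery, or stop if there is none.
        current = *graph.get(&current)?.first()?;
    }
}
```

Because the walk only ever takes the first subquery, the cycle it reports does not depend on which worker thread happened to block first, which is the property the description is after.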

However, we then have to guarantee that the query stack from single-threaded mode is included in the active query graph. This is true for the join function, as its first task will be completed on the same thread, and the same will be attempted for the second task unless it is stolen, which is fine for us. scope places tasks in a local queue and pops them in LIFO manner, while other worker threads can only steal from that queue in FIFO manner, so we can guarantee that the next task is either stolen or available for execution.
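
The queue discipline mentioned above can be pictured with a toy model. This is only a sketch of the LIFO-pop / FIFO-steal idea, not the thread pool's actual data structure; `LocalQueue` and its methods are hypothetical names.

```rust
use std::collections::VecDeque;

/// Toy model of one worker's local task queue (hypothetical type).
pub struct LocalQueue<T> {
    tasks: VecDeque<T>,
}

impl<T> LocalQueue<T> {
    pub fn new() -> Self {
        LocalQueue { tasks: VecDeque::new() }
    }

    /// The owning worker pushes new tasks at the back.
    pub fn push(&mut self, task: T) {
        self.tasks.push_back(task);
    }

    /// The owning worker takes the most recently pushed task (LIFO).
    pub fn pop(&mut self) -> Option<T> {
        self.tasks.pop_back()
    }

    /// Another worker takes the oldest task (FIFO).
    pub fn steal(&mut self) -> Option<T> {
        self.tasks.pop_front()
    }
}
```

In this model a task that is no longer in the queue was either popped by its owner or stolen from the front, matching the "either stolen or available for execution" claim.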

Fixes #142064
Fixes #142063
Fixes #127971

UPDATE: commits are sliced to the finest detail

rustbot · 2025-12-10T13:58:49Z

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

rustbot · 2025-12-10T13:58:51Z

r? @eholk

rustbot has assigned @eholk.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer


matthiaskrgr · 2025-12-10T16:32:29Z

@bors try @rust-timer queue

Deterministic query cycles for parallel front-end

Kivooeo · 2025-12-10T16:53:27Z

~~@matthiaskrgr there is a compilation error in this PR, not sure if this can be benchmarked?~~

Ok this is surprising

rust-bors · 2025-12-10T18:49:39Z

☀️ Try build successful (CI)
Build commit: 306a768 (306a768e2f03217e62aee6655739249be1d8aa95, parent: 377656d3dd3f9c23a9c8713e163f4365a5261a84)

rust-timer · 2025-12-10T19:30:52Z

Finished benchmarking commit (306a768): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.2%, 1.8%]	155
Regressions ❌ (secondary)	0.7%	[0.2%, 1.9%]	138
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.0%	[-0.0%, -0.0%]	1
All ❌✅ (primary)	0.5%	[0.2%, 1.8%]	155

Max RSS (memory usage)

Results (secondary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.3%	[2.3%, 2.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.1%	[-3.5%, -0.9%]	3
All ❌✅ (primary)	-	-	0

Cycles

Results (primary 2.2%, secondary 2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.2%	[2.0%, 2.4%]	3
Regressions ❌ (secondary)	3.6%	[2.8%, 5.3%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-4.6%	[-4.6%, -4.6%]	1
All ❌✅ (primary)	2.2%	[2.0%, 2.4%]	3

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.42s -> 474.532s (0.66%)
Artifact size: 389.04 MiB -> 389.08 MiB (0.01%)

Zoxc · 2025-12-14T18:51:43Z

It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle to use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single threaded execution, but I don't think that point is guaranteed to be resumable.

Currently it looks like we're not deterministically picking which query in a cycle to resume when multiple are present. That's fairly simple to improve, though I think the queries available for resumption are non-deterministic to start with.

zetanumbers · 2025-12-14T19:31:43Z

> It's a bit unclear to me what source of non-determinism this tries to fix. Does this alter parallel execution in any way other than picking which query in the cycle to use for resumption? It looks to me like you're trying to pick the point in the cycle to break which corresponds to a single threaded execution, but I don't think that point is guaranteed to be resumable.

I assume that when we get a query cycle there is a "point in the cycle to break which corresponds to a single threaded execution". Currently rustc_thread_scope::scope violates this assumption due to the order in which it executes tasks (I plan to replace it with join). If you traverse the query graph in a "single-threaded manner", you will eventually find a thread waiting for an already visited query to finish, because the graph is finite and every active query during a deadlock has a subquery (either a direct one or one it is waiting on) to go to next. That last query wait, which closes the loop, is the one that has to be resumed. I also assume that any query can be reached by traversing down through subqueries from the root query, and that the root query is unique.

> though I think the queries available for resumption are non-deterministic to start with.

That's what I am trying to make deterministic.

rustbot · 2026-01-13T15:28:27Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

rustbot · 2026-02-24T15:42:37Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

zetanumbers · 2026-02-24T18:40:30Z

Added a big doc comment about new query cycle breaking code.

zetanumbers · 2026-02-24T18:44:47Z

@rustbot ready
@rustbot reroll

zetanumbers · 2026-02-24T18:48:33Z

r? @nnethercote

This PR has been stalled for 3.5 months. Please, I need any tangible feedback.

nnethercote · 2026-02-25T04:59:01Z

@zetanumbers: I see this PR hasn't been handled well and has been frustrating for you. I'm sorry about that. I also know very little about the query cycle handling code but I will do my best to do a close review, and try to get this PR back on track.

nnethercote

First review is just looking at the TreeNodeIndex. Seems reasonable, though there is scope for more comments plus some tests.



nnethercote · 2026-02-25T04:43:19Z

compiler/rustc_data_structures/src/tree_node_index.rs

+
+impl Display for BranchingError {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        "TreeNodeIndex's free bits have been exhausted, make sure recursion is used carefully"


Do you know if we might ever reach this limit? A balanced binary tree that is 64 deep is very large, but is it possible we might get a highly unbalanced binary tree of that depth?

For this to happen, there would have to be a recursive function that isn't a query and that calls par_join or par_slice recursively, or someone would have to write 64 nested par_join calls or something like that. Each query starts with a fresh binary tree, so I don't expect this to ever happen.

This would be good information to put into a comment.

Done in 6e39f24
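
To make the depth limit discussed in this thread concrete, here is a minimal sketch of a 64-bit path index that spends one bit per binary branch. It is an assumed encoding for illustration only. The real TreeNodeIndex layout is not shown in this thread and may differ, for instance in how many branches fit before the error.

```rust
use std::fmt;

/// Hypothetical encoding: branch choices are packed from the most significant
/// bit downwards, and `depth` counts how many bits are already used.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct TreeNodeIndex {
    bits: u64,
    depth: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct BranchingError;

impl fmt::Display for BranchingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(
            "TreeNodeIndex's free bits have been exhausted, make sure recursion is used carefully",
        )
    }
}

impl TreeNodeIndex {
    /// Each query would start from a fresh root in this model.
    pub fn root() -> Self {
        TreeNodeIndex { bits: 0, depth: 0 }
    }

    /// Descends into the left (`false`) or right (`true`) branch, using one bit.
    pub fn branch(self, right: bool) -> Result<Self, BranchingError> {
        if self.depth >= 64 {
            // No free bits left: the nesting is deeper than the index can encode.
            return Err(BranchingError);
        }
        let bit = (right as u64) << (63 - self.depth);
        Ok(TreeNodeIndex { bits: self.bits | bit, depth: self.depth + 1 })
    }
}
```

With this encoding the derived ordering places a left branch, and everything nested under it, before the matching right branch, and only a chain of more than 64 nested branches within one query can hit the error.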



nnethercote · 2026-02-25T05:06:15Z

compiler/rustc_data_structures/src/sync/parallel.rs

    ret
 }

-fn serial_join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)


It would be helpful in the commit message to add a brief explanation why these functions are being moved. Something like "Because the next commit will modify them to use ImplicitCtxt which is not available in rustc_data_structures."


nnethercote · 2026-02-25T05:14:06Z

compiler/rustc_middle/src/sync.rs

    })
 }
+
+fn raw_branched_join<A, B, RA: Send, RB: Send>(oper_a: A, oper_b: B) -> (RA, RB)


This needs a comment.

I've inlined this function into par_join, as it is only used there after the recent rebase. Done in 15674b4

I believe branch_context's doc-comment should give a reader enough information.

nnethercote · 2026-02-25T05:14:15Z

compiler/rustc_middle/src/sync.rs

+    rustc_thread_pool::join(|| branch_context(0, 2, oper_a), || branch_context(1, 2, oper_b))
+}
+
+fn branch_context<F, R>(branch_num: u64, branch_space: u64, f: F) -> R


And this needs a comment.

Done in 15674b4
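
For readers without the diff at hand, the following is a guess at the general shape of such a helper, not the PR's implementation. It assumes a thread-local branch path in place of the ImplicitCtxt the real code uses, and a serial stand-in for rustc_thread_pool::join. `BRANCH_PATH`, `current_branch_path` and `serial_branched_join` are hypothetical names.

```rust
use std::cell::RefCell;

thread_local! {
    /// Hypothetical stand-in for the branch information kept in the implicit context.
    static BRANCH_PATH: RefCell<Vec<u64>> = RefCell::new(Vec::new());
}

/// Runs `f` with `branch_num` (one of `branch_space` siblings) appended to the
/// current branch path, then restores the previous path.
/// Sketch only: the path is not restored if `f` panics.
pub fn branch_context<F, R>(branch_num: u64, branch_space: u64, f: F) -> R
where
    F: FnOnce() -> R,
{
    assert!(branch_num < branch_space);
    BRANCH_PATH.with(|p| p.borrow_mut().push(branch_num));
    let result = f();
    BRANCH_PATH.with(|p| {
        p.borrow_mut().pop();
    });
    result
}

/// Returns the branch path the current code is running under.
pub fn current_branch_path() -> Vec<u64> {
    BRANCH_PATH.with(|p| p.borrow().clone())
}

/// Serial stand-in for a two-way join: the first task is branch 0 of 2 and the
/// second is branch 1 of 2, mirroring the call shape in the snippet above.
pub fn serial_branched_join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
where
    A: FnOnce() -> RA,
    B: FnOnce() -> RB,
{
    (branch_context(0, 2, oper_a), branch_context(1, 2, oper_b))
}
```

A thread-local would not follow a task that is stolen by another worker, which is presumably one reason the real code threads this information through the implicit context instead.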


nnethercote · 2026-02-25T05:32:18Z

I had lots of little comments, but in general the first four commits seem reasonable. They are basically adding a bunch of plumbing to track node ordering.

The final commit is the important one, where the plumbing is put to use to change the cycle handling. This is difficult for me to evaluate because I know very little about cycle handling. AFAIK @Zoxc and @zetanumbers are the only ones who do know about cycle handling. I do like the expanded comment with the examples.

@Zoxc: you asked some questions earlier, and @zetanumbers gave what seem like reasonable answers. Do you have any other observations or concerns that would prevent this PR from being merged? (It appears that this PR served as at least partial inspiration for #152229, which speaks to its merits.)

@zetanumbers You said this fixes #142064, #142063, and #127971. Is it possible to write tests for these issues? It would be good to have test-based evidence that the problems are fixed, and won't regress in the future.

zetanumbers · 2026-02-25T17:06:52Z

> Is it possible to write tests for these issues? It would be good to have test-based evidence that the problems are fixed, and won't regress in the future.

I believe it requires a rewrite of the ./x test utility so that it executes some UI tests sequentially (to retain multi-threading behavior) and repeatedly (to reproduce bugs more reliably). The latter requirement is handled in #143953, while I am unaware of the former being implemented anywhere. However, the majority of parallel compiler bugs should be consistently reproducible this way.

I plan to work on a parallel test suite one way or another.

nnethercote · 2026-02-26T01:23:19Z

@zetanumbers: thanks for the additional commits, they have good improvements. I have marked most of my comments as resolved. There are still a small number that need action.

@Zoxc: any thoughts?

rustbot assigned eholk Dec 10, 2025