
Fix spinlock race condition (rwlock) #560

Merged
hawkw merged 2 commits into hawkw:main from trim-the-main:patch-1
Mar 13, 2026
Conversation

@trim-the-main
Contributor

Readers try to acquire the lock using `fetch_add(2)` (the least significant bit is reserved for writers), and they check whether a writer exists after they perform the addition. If there is a writer, they undo their change using `fetch_sub(2)`. However, if the writer unlocks before the reader's `fetch_sub(2)`, setting the state to `UNLOCKED = 0`, then the reader's subsequent `fetch_sub` corrupts the state, overflowing and wrapping around.

With this change, `unlock_exclusive` just clears the writer bit from the state, leaving the reader count unchanged. The caller must hold the lock, so the state must have the writer bit set beforehand.
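The difference between the two unlock strategies can be sketched with plain `std` atomics. This is a minimal illustration, not the crate's actual code: the constant values, the `State` type, and the method names are assumptions, and the real fix may clear the writer bit with `fetch_and` or an equivalent `fetch_sub` of the writer bit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative layout: the least-significant bit flags a writer,
// and each reader adds 2 to the state.
const WRITER: usize = 0b01;
const READER: usize = 0b10;

struct State(AtomicUsize);

impl State {
    // Buggy unlock: storing UNLOCKED = 0 discards a reader's optimistic
    // fetch_add(2) that raced in while the writer still held the lock.
    fn unlock_exclusive_buggy(&self) {
        self.0.store(0, Ordering::Release);
    }

    // Fixed unlock: clear only the writer bit, preserving reader counts.
    fn unlock_exclusive_fixed(&self) {
        self.0.fetch_and(!WRITER, Ordering::Release);
    }
}

fn main() {
    // A reader optimistically added 2 while the writer held the lock.
    let state = State(AtomicUsize::new(WRITER | READER));

    // Buggy path: the unlock zeroes the state, so the reader's undo
    // (fetch_sub(2)) wraps around, corrupting the state.
    state.unlock_exclusive_buggy();
    state.0.fetch_sub(READER, Ordering::Release);
    assert_eq!(state.0.load(Ordering::Relaxed), usize::MAX - 1);

    // Fixed path: only the writer bit is cleared, so the reader's
    // fetch_sub(2) brings the state back to a consistent value.
    let state = State(AtomicUsize::new(WRITER | READER));
    state.unlock_exclusive_fixed();
    assert_eq!(state.0.load(Ordering::Relaxed), READER);
    state.0.fetch_sub(READER, Ordering::Release);
    assert_eq!(state.0.load(Ordering::Relaxed), 0);
    println!("fixed unlock keeps the state consistent");
}
```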

```
$ RUSTFLAGS="--cfg loom" cargo test blocking::rwlock

running 2 tests
test blocking::rwlock::tests::read_write ... ok
test blocking::rwlock::tests::write ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 43 filtered out; finished in 0.00s
```

@trim-the-main
Contributor Author

trim-the-main commented Feb 5, 2026

I went back to see why the loom tests did not catch this. I realized that `LOOM_MAX_PREEMPTIONS=2` is not enough to hit this; the test `blocking::rwlock::tests::read_write` fails with 3 preemptions. I can suggest either bumping this limit to 3, which comes with longer test runs, or adding another test which requires one fewer preemption to hit this case. Something like:

```rust
// Existing tests:
//     blocking::rwlock::tests::write
//     blocking::rwlock::tests::read_write

#[test]
fn write_read() {
    loom::model(|| {
        let lock = Arc::new(RwLock::<usize>::new(0));
        let r_thread = {
            let reader_lock = lock.clone();
            thread::spawn(move || {
                let guard = reader_lock.read();
                assert!(*guard == 0 || *guard == 1);
            })
        };
        writer(lock)();

        r_thread.join().expect("reader thread mustn't panic")
    });
}
```

I just changed the order: the writer runs on the main thread and the reader is spawned. This way it requires one fewer preemption to hit the bug.

@hawkw
Owner

hawkw commented Mar 13, 2026

Good catch, thanks for fixing this!

Re:

> I went back to see why the loom tests did not catch this. I realized that `LOOM_MAX_PREEMPTIONS=2` is not enough to hit this; the test `blocking::rwlock::tests::read_write` fails with 3 preemptions. I can suggest either bumping this limit to 3, which comes with longer test runs, or adding another test which requires one fewer preemption to hit this case. […]
>
> I just changed the order: the writer runs on the main thread and the reader is spawned. This way it requires one fewer preemption to hit the bug.

I think adding a new test where the writer is in the main thread seems like a good approach. It's unfortunate that the ordering affects whether or not the test catches the bug, but I think increasing the global preemption bound on CI would make some of the other loom models take a really long time to run.

Another option might be to use `loom::model::Builder`'s `preemption_bound` setting to override the preemption bound for individual tests, so that the tests that can run in a reasonable time with a higher preemption bound get that value, and the tests where we rely on `LOOM_MAX_PREEMPTIONS=2` to keep CI from taking multiple hours don't.
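As a sketch of that per-test override, assuming loom's `loom::model::Builder` API (its public `preemption_bound` field and `check` method; the test body is elided here), it might look like:

```rust
use loom::model::Builder;

#[test]
fn write_read() {
    // Override the global LOOM_MAX_PREEMPTIONS for this model only;
    // other tests keep the cheaper default bound.
    let mut builder = Builder::new();
    builder.preemption_bound = Some(3);
    builder.check(|| {
        // ... body of the proposed `write_read` test above ...
    });
}
```

This keeps the expensive exploration localized to the models that need it, rather than raising the bound globally on CI.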

@hawkw hawkw enabled auto-merge (squash) March 13, 2026 16:54
@hawkw hawkw merged commit a4ae029 into hawkw:main Mar 13, 2026
20 checks passed
