
Fix spinlock race condition (rwlock) #560

Merged
hawkw merged 2 commits into hawkw:main from trim-the-main:patch-1
Mar 13, 2026
Conversation

@trim-the-main
Contributor

Readers try to acquire the lock using `fetch_add(2)` (the least significant bit is reserved for writers), and they check whether a writer exists after they perform the addition. If there is a writer, they undo their change using `fetch_sub(2)`. However, if the writer unlocks before the reader's `fetch_sub(2)`, setting the state to `UNLOCKED = 0`, then the reader's subsequent `fetch_sub` corrupts the state, overflowing and wrapping around.

With this change, `unlock_exclusive` just clears the writer bit from the state, leaving the reader count unchanged. The caller must hold the lock, so the state must have the writer bit set beforehand.
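The difference between the two unlock strategies can be sketched with plain `std` atomics. This is a minimal illustration, not the crate's actual code: the constant values, the `State` type, and the method names are assumptions, and the real fix may clear the writer bit with `fetch_and` or an equivalent `fetch_sub` of the writer bit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative layout: the least-significant bit flags a writer,
// and each reader adds 2 to the state.
const WRITER: usize = 0b01;
const READER: usize = 0b10;

struct State(AtomicUsize);

impl State {
    // Buggy unlock: storing UNLOCKED = 0 discards a reader's optimistic
    // fetch_add(2) that raced in while the writer still held the lock.
    fn unlock_exclusive_buggy(&self) {
        self.0.store(0, Ordering::Release);
    }

    // Fixed unlock: clear only the writer bit, preserving reader counts.
    fn unlock_exclusive_fixed(&self) {
        self.0.fetch_and(!WRITER, Ordering::Release);
    }
}

fn main() {
    // A reader optimistically added 2 while the writer held the lock.
    let state = State(AtomicUsize::new(WRITER | READER));

    // Buggy path: the unlock zeroes the state, so the reader's undo
    // (fetch_sub(2)) wraps around, corrupting the state.
    state.unlock_exclusive_buggy();
    state.0.fetch_sub(READER, Ordering::Release);
    assert_eq!(state.0.load(Ordering::Relaxed), usize::MAX - 1);

    // Fixed path: only the writer bit is cleared, so the reader's
    // fetch_sub(2) brings the state back to a consistent value.
    let state = State(AtomicUsize::new(WRITER | READER));
    state.unlock_exclusive_fixed();
    assert_eq!(state.0.load(Ordering::Relaxed), READER);
    state.0.fetch_sub(READER, Ordering::Release);
    assert_eq!(state.0.load(Ordering::Relaxed), 0);
    println!("fixed unlock keeps the state consistent");
}
```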

```
$ RUSTFLAGS="--cfg loom" cargo test blocking::rwlock

running 2 tests
test blocking::rwlock::tests::read_write ... ok
test blocking::rwlock::tests::write ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 43 filtered out; finished in 0.00s
```

@trim-the-main
Contributor Author

trim-the-main commented Feb 5, 2026

I went back to see why the loom tests did not catch this. I realized that `LOOM_MAX_PREEMPTIONS=2` is not enough to hit this; the test `blocking::rwlock::tests::read_write` fails with 3 preemptions. I can suggest either bumping this limit to 3, which comes with longer test runs, or adding another test which requires one fewer preemption to hit this case. Something like:

```rust
// Existing tests:
//     blocking::rwlock::tests::write
//     blocking::rwlock::tests::read_write

#[test]
fn write_read() {
    loom::model(|| {
        let lock = Arc::new(RwLock::<usize>::new(0));
        let r_thread = {
            let reader_lock = lock.clone();
            thread::spawn(move || {
                let guard = reader_lock.read();
                assert!(*guard == 0 || *guard == 1);
            })
        };
        writer(lock)();

        r_thread.join().expect("reader thread mustn't panic")
    });
}
```

I just changed the order: the writer runs on the main thread and the reader is spawned. This way it requires one fewer preemption to hit the bug.

@hawkw
Owner

hawkw commented Mar 13, 2026

Good catch, thanks for fixing this!

Re:

> I went back to see why the loom tests did not catch this. I realized that `LOOM_MAX_PREEMPTIONS=2` is not enough to hit this; the test `blocking::rwlock::tests::read_write` fails with 3 preemptions. I can suggest either bumping this limit to 3, which comes with longer test runs, or adding another test which requires one fewer preemption to hit this case. […]
>
> I just changed the order: the writer runs on the main thread and the reader is spawned. This way it requires one fewer preemption to hit the bug.

I think adding a new test where the writer is in the main thread seems like a good approach. It's unfortunate that the ordering affects whether or not the test catches the bug, but I think increasing the global preemption bound on CI would make some of the other loom models take a really long time to run.

Another option might be to use `loom::model::Builder`'s `preemption_bound` setting to override the preemption bound for individual tests, so that the tests that can run in a reasonable time with a higher preemption bound get that value, and the tests where we rely on `LOOM_MAX_PREEMPTIONS=2` to keep CI from taking multiple hours don't.
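As a sketch of that per-test override, assuming loom's `loom::model::Builder` API (its public `preemption_bound` field and `check` method; the test body is elided here), it might look like:

```rust
use loom::model::Builder;

#[test]
fn write_read() {
    // Override the global LOOM_MAX_PREEMPTIONS for this model only;
    // other tests keep the cheaper default bound.
    let mut builder = Builder::new();
    builder.preemption_bound = Some(3);
    builder.check(|| {
        // ... body of the proposed `write_read` test above ...
    });
}
```

This keeps the expensive exploration localized to the models that need it, rather than raising the bound globally on CI.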

@hawkw hawkw enabled auto-merge (squash) March 13, 2026 16:54
@hawkw hawkw merged commit a4ae029 into hawkw:main Mar 13, 2026
20 checks passed
