fix: data branch create table + DROP DATABASE race condition #23788
mergify[bot] merged 13 commits into matrixorigin:main
Conversation
Two-layer fix for orphan records in mo_tables caused by concurrent data branch create table (CLONE) and DROP DATABASE, which leads to an ExpectedEOB panic on checkpoint replay.

Fix 1 - Lock Service: Lock mode sync on holder promotion
Lock is a Go value type. When addHolder is called during tryHold, modifications to l.value (the lock mode byte) don't propagate to the store. Add a setMode() method to correctly update the stored lock entry's mode when a waiter is promoted to holder (newHolder=true). This ensures Exclusive locks actually block Shared requests.
Files: lock.go, lock_table_local.go, types.go

Fix 2 - DDL Layer: Snapshot refresh after exclusive lock (clock.Now)
After acquiring the exclusive lock on mo_database in DropDatabase, call refreshSnapshotAfterLock, which uses clock.Now() + WaitLogTailAppliedAt + UpdateSnapshot to advance the transaction snapshot. This ensures Relations() sees all tables committed before the lock was acquired. It uses clock.Now() instead of GetLatestCommitTS() (v1) because doWrite defer LIFO ordering causes unlock to execute before updateLastCommitTS, making the v1 conditional check always false in the critical scenario.
Files: ddl.go, ddl_test.go
Review Summary by Qodo
Fix race condition between CLONE and DROP DATABASE via lock mode sync
Walkthrough
Description
• Fix lock mode synchronization when waiter promoted to holder
  - Add setMode() method to update stored lock entry mode
  - Ensures Exclusive locks block Shared requests correctly
• Fix snapshot staleness in DROP DATABASE with concurrent CLONE
  - Use clock.Now() instead of GetLatestCommitTS() for snapshot refresh
  - Prevents orphan records in mo_tables and ExpectedEOB panic
• Add comprehensive tests for lock mode transitions
  - Verify Exclusive holder blocks Shared requests
  - Verify Shared holder allows subsequent Shared requests

Diagram

```mermaid
flowchart LR
    A["Concurrent CLONE<br/>and DROP DATABASE"] -->|Lock mode<br/>not synced| B["Stale lock mode<br/>in storage"]
    B -->|Shared requests<br/>incorrectly allowed| C["Orphan records<br/>in mo_tables"]
    C -->|Checkpoint replay| D["ExpectedEOB panic"]
    E["Fix 1: setMode()"] -->|Sync mode on<br/>holder promotion| F["Correct lock mode<br/>in storage"]
    G["Fix 2: clock.Now()<br/>snapshot refresh"] -->|Advance snapshot<br/>after exclusive lock| H["Relations() sees<br/>all committed tables"]
    F --> I["Race condition<br/>resolved"]
    H --> I
```
File Changes
1. pkg/lockservice/lock.go
Code Review by Qodo
…e only refreshSnapshotAfterLock was inside lockMoDatabase and triggered for all Exclusive lock acquisitions. CreateDatabase also calls lockMoDatabase with Exclusive mode, so during restore cluster operations the snapshot advance caused CreateDatabase to see already-restored tables, producing:
Duplicate entry '(0,mo_catalog,mo_branch_metadata)' for key '__mo_cpkey_col'
Move refreshSnapshotAfterLock to DropDatabase directly, since only DropDatabase needs the snapshot advance (to enumerate tables via Relations()). lockMoDatabase is now a simple wrapper around doLockMoDatabase.
…eordering doWrite defers
Two fixes for the data-branch-create-table + DROP DATABASE race that
leaves orphan rows in mo_tables:
1. lockservice: sync mode on both ends of range lock (lock_table_local.go)
When a waiter is promoted to holder, setMode updates Lock.value on
the conflicting key only. But a range lock is stored as two
independent btree entries (range-start and range-end) with separate
Lock.value bytes. The paired entry retains stale mode, causing
isLockModeAllowed to incorrectly allow or deny Shared requests.
Add setModePairedRangeLock helper that scans past interleaved row
locks to find and update the paired entry. Called from both
acquireRowLockLocked and addRangeLockLocked when newHolder=true.
No-op for row locks.
2. txn: reorder defers in doWrite to update latestCommitTS before unlock (operator.go)
Original code registered two independent defers in doWrite:
defer tc.unlock(ctx) // LIFO: executes first
defer func() { closeLocked(); mu.Unlock() }() // executes second
This meant unlock released the lock-service lock before closeLocked
triggered updateLastCommitTS. In the window between unlock and
closeLocked, DROP DATABASE could acquire the Exclusive lock and call
GetLatestCommitTS, getting a stale value. The UpdateSnapshot fix
(layer 2) would then not trigger because SnapshotTS >= stale
latestCommitTS.
Merge unlock into the first defer so execution order becomes:
closeLocked() -> mu.Unlock() -> unlock(ctx)
This ensures latestCommitTS is updated before the lock is released.
…ne into fix_clone_drop_race
Merge Queue Status
Rule:
This pull request spent 10 seconds in the queue, with no time running CI. Required conditions to merge
What type of PR is this?
Which issue(s) this PR fixes:
issue #23766
What this PR does / why we need it:
Two-layer fix for orphan records in mo_tables caused by concurrent data branch create table (CLONE) and DROP DATABASE, which leads to an ExpectedEOB panic on checkpoint replay (see the full description at the top of this PR).