Skip to content

revert refreshSnapshotAfterLock#23794

Merged
heni02 merged 1 commit intomatrixorigin:3.0-devfrom
LeftHandCold:revert_refreshSnapshotAfterLock
Mar 3, 2026
Merged

revert refreshSnapshotAfterLock#23794
heni02 merged 1 commit intomatrixorigin:3.0-devfrom
LeftHandCold:revert_refreshSnapshotAfterLock

Conversation

@LeftHandCold
Copy link
Contributor

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #23766

What this PR does / why we need it:

revert refreshSnapshotAfterLock

@qodo-code-review
Copy link

Review Summary by Qodo

Refine snapshot refresh logic for DROP DATABASE race condition

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Replace generic snapshot refresh with conditional logic based on transaction isolation level
• Add explicit snapshot advancement only for pessimistic RC transactions when stale
• Remove unused refreshSnapshotAfterLock function and moruntime import
• Update tests to verify conditional snapshot refresh behavior
Diagram
flowchart LR
  A["DROP DATABASE acquires<br/>exclusive lock"] -->|Check isolation level| B{"Pessimistic RC<br/>transaction?"}
  B -->|Yes| C{"Snapshot stale?"}
  B -->|No| D["Skip refresh"]
  C -->|Yes| E["Get latest commit TS"]
  C -->|No| D
  E --> F["Wait for logtail applied"]
  F --> G["Update snapshot"]
  G --> H["Call Relations<br/>with fresh snapshot"]
  D --> H
Loading

Grey Divider

File Changes

1. pkg/sql/compile/ddl.go 🐞 Bug fix +33/-27

Implement conditional snapshot refresh for pessimistic RC transactions

• Removed moruntime import and refreshSnapshotAfterLock function
• Replaced unconditional snapshot refresh with conditional logic that checks transaction isolation
 level and snapshot staleness
• Added inline snapshot advancement code that only updates snapshot for pessimistic RC transactions
 when snapshotTS < latestCommitTS
• Enhanced code comments explaining the race condition between concurrent CLONE and DROP DATABASE
 operations

pkg/sql/compile/ddl.go


2. pkg/sql/compile/ddl_test.go 🧪 Tests +29/-35

Update tests for conditional snapshot refresh logic

• Removed comments about clock.Now() implementation approach
• Renamed first test from "snapshot_always_refreshed_via_clock_now" to
 "snapshot_refreshed_when_stale"
• Added snapshotTS and latestCommitTS test variables to verify conditional behavior
• Added mock expectations for GetLatestCommitTS() and WaitLogTailAppliedAt() calls
• Renamed second test from "snapshot_refresh_error_propagated" to
 "snapshot_not_refreshed_when_fresh"
• Added assertions verifying UpdateSnapshot is NOT called when snapshot is already fresh
• Removed refreshSnapshotAfterLock stub and replaced with direct mock setup

pkg/sql/compile/ddl_test.go


Grey Divider

Qodo Logo

@qodo-code-review
Copy link

qodo-code-review bot commented Mar 3, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. GetLatestCommitTS race window 🐞 Bug ✓ Correctness
Description
DropDatabase now conditionally refreshes snapshot based on TxnClient.GetLatestCommitTS(), but that
value can lag behind a just-committed txn because it’s updated only on ClosedEvent, which fires
after commit unlock. DROP can acquire the exclusive lock during this window and skip snapshot
refresh, reintroducing stale Relations() reads and orphaned catalog rows.
Code

pkg/sql/compile/ddl.go[R137-148]

+		txnOp := c.proc.GetTxnOperator()
+		if txnOp.Txn().IsPessimistic() && txnOp.Txn().IsRCIsolation() {
+			latestCommitTS := c.proc.Base.TxnClient.GetLatestCommitTS()
+			if txnOp.Txn().SnapshotTS.Less(latestCommitTS) {
+				newTS, err := c.proc.Base.TxnClient.WaitLogTailAppliedAt(c.proc.Ctx, latestCommitTS)
+				if err != nil {
+					return err
+				}
+				if err := txnOp.UpdateSnapshot(c.proc.Ctx, newTS); err != nil {
+					return err
+				}
+			}
Evidence
DropDatabase’s new logic relies on GetLatestCommitTS and only refreshes when SnapshotTS <
latestCommitTS. However, latestCommitTS is maintained by txnClient.updateLastCommitTS, which is
registered on ClosedEvent; during commit, tc.unlock(ctx) is deferred after tc.closeLocked() but
executes first (LIFO), meaning locks may be released before ClosedEvent runs and before
latestCommitTS is advanced. This creates a real timing window where DROP can lock successfully yet
observe an old latestCommitTS and skip the refresh, leaving the snapshot stale relative to
concurrent commits that happened just before the lock was granted.

pkg/sql/compile/ddl.go[117-150]
pkg/txn/client/client.go[327-333]
pkg/txn/client/operator.go[998-1012]
pkg/txn/client/client.go[424-476]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`DropDatabase` refreshes the transaction snapshot based on `TxnClient.GetLatestCommitTS()` and may skip the refresh when `latestCommitTS` is stale. `latestCommitTS` is updated via a `ClosedEvent` callback, but transaction commit unlock is deferred to run before `closeLocked()` triggers `ClosedEvent`, creating a window where a concurrent txn can commit+unlock and DROP can acquire the exclusive lock before `latestCommitTS` is advanced.

### Issue Context
This can reintroduce the stale-snapshot catalog race (DROP missing recently created tables and leaving orphan records).

### Fix Focus Areas
- Reintroduce/restore the previous approach that advances snapshot based on a monotonic current timestamp (HLC/clock) and `WaitLogTailAppliedAt(now)` after the exclusive lock is acquired.
- Keep the refresh scoped to the same txn modes intended (pessimistic + RC), but do not depend on `GetLatestCommitTS()` being up-to-date in the immediate post-commit window.

### Fix Focus Areas (code locations)
- pkg/sql/compile/ddl.go[117-150]
- (reference) pkg/txn/client/operator.go[998-1012]
- (reference) pkg/txn/client/client.go[327-333]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

@matrix-meow matrix-meow added the size/M Denotes a PR that changes [100,499] lines label Mar 3, 2026
@mergify mergify bot requested a review from XuPeng-SH March 3, 2026 10:28
@mergify mergify bot added the kind/bug Something isn't working label Mar 3, 2026
Comment on lines +137 to +148
txnOp := c.proc.GetTxnOperator()
if txnOp.Txn().IsPessimistic() && txnOp.Txn().IsRCIsolation() {
latestCommitTS := c.proc.Base.TxnClient.GetLatestCommitTS()
if txnOp.Txn().SnapshotTS.Less(latestCommitTS) {
newTS, err := c.proc.Base.TxnClient.WaitLogTailAppliedAt(c.proc.Ctx, latestCommitTS)
if err != nil {
return err
}
if err := txnOp.UpdateSnapshot(c.proc.Ctx, newTS); err != nil {
return err
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Getlatestcommitts race window 🐞 Bug ✓ Correctness

DropDatabase now conditionally refreshes snapshot based on TxnClient.GetLatestCommitTS(), but that
value can lag behind a just-committed txn because it’s updated only on ClosedEvent, which fires
after commit unlock. DROP can acquire the exclusive lock during this window and skip snapshot
refresh, reintroducing stale Relations() reads and orphaned catalog rows.
Agent Prompt
### Issue description
`DropDatabase` refreshes the transaction snapshot based on `TxnClient.GetLatestCommitTS()` and may skip the refresh when `latestCommitTS` is stale. `latestCommitTS` is updated via a `ClosedEvent` callback, but transaction commit unlock is deferred to run before `closeLocked()` triggers `ClosedEvent`, creating a window where a concurrent txn can commit+unlock and DROP can acquire the exclusive lock before `latestCommitTS` is advanced.

### Issue Context
This can reintroduce the stale-snapshot catalog race (DROP missing recently created tables and leaving orphan records).

### Fix Focus Areas
- Reintroduce/restore the previous approach that advances snapshot based on a monotonic current timestamp (HLC/clock) and `WaitLogTailAppliedAt(now)` after the exclusive lock is acquired.
- Keep the refresh scoped to the same txn modes intended (pessimistic + RC), but do not depend on `GetLatestCommitTS()` being up-to-date in the immediate post-commit window.

### Fix Focus Areas (code locations)
- pkg/sql/compile/ddl.go[117-150]
- (reference) pkg/txn/client/operator.go[998-1012]
- (reference) pkg/txn/client/client.go[327-333]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@heni02 heni02 merged commit f693275 into matrixorigin:3.0-dev Mar 3, 2026
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants