fix(data-branch): improve diff correctness, memory control, and output summary support #23789
gouhongshen wants to merge 20 commits into matrixorigin:main
# Conflicts: # pkg/frontend/stmt_kind.go
# Conflicts: # pkg/sql/parsers/dialect/mysql/mysql_sql.go
Review Summary by Qodo
Refactor data branch operations with memory management, hash-based diff, and output formatting improvements
Walkthroughs
Description
• Refactored data branch operations with comprehensive memory management improvements, including branchHashmapAllocator and branchHashmapDeallocator for throttled memory allocation
• Redesigned in-memory hashmap storage from linked-list buckets to hash-indexed memStore with LRU eviction and tombstone support, plus spillStore for efficient disk-based overflow
• Implemented hash-based diff algorithm for comparing data branches with LCA (Lowest Common Ancestor) support and conflict detection/resolution
• Added new data branch output operations module supporting multiple output modes: summary statistics, row count, limited rows, and file-based exports (CSV/SQL)
• Fixed commit timestamp indexing in in-memory committed insert filtering and added GetObjectCreateTS method for object creation timestamp retrieval
• Added support for OUTPUT SUMMARY and OUTPUT COUNT syntax in diff operations with comprehensive parsing and validation
• Consolidated data branch type definitions and constants into dedicated file for improved code organization
• Added comprehensive test coverage for diff output modes, summary validation, update splitting, and complex type handling
• Improved block data read function with proper error handling and DataSource parameter propagation
• Added debug logging for commit timestamp placeholder scenarios to diagnose TN nonappendable block issues
Diagram
flowchart LR
A["Data Branch Operations"] --> B["Memory Management"]
A --> C["Hash-based Diff"]
A --> D["Output Formatting"]
B --> B1["branchHashmapAllocator"]
B --> B2["memStore with LRU"]
B --> B3["spillStore for Overflow"]
C --> C1["LCA Resolution"]
C --> C2["Conflict Detection"]
D --> D1["Summary Statistics"]
D --> D2["CSV/SQL Export"]
D --> D3["Batch Processing"]
File Changes
1. pkg/frontend/data_branch.go
What type of PR is this?
Which issue(s) this PR fixes:
Fixes #23751
What this PR does / why we need it:
OUTPUT SUMMARY syntax for data branch diff.
Behavior changes
data branch diff ... output summary is now supported.
DELETE + INSERT instead of REPLACE INTO for clearer semantics.
Tests
OUTPUT SUMMARY.
test/distributed/cases/git4data/branch/{diff,merge}.
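As an illustration of the DELETE + INSERT behavior change, the sketch below renders one updated row as two explicit statements. It assumes the change concerns how updates appear in exported SQL; the PR text does not spell this out. `splitUpdate` and `quote` are hypothetical helpers, not functions from the PR, and the quoting is deliberately simplified (it only doubles single quotes).

```go
package main

import (
	"fmt"
	"strings"
)

// quote renders a value as a single-quoted SQL literal by doubling embedded
// single quotes. Simplified: real code should use the engine's own escaping.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// splitUpdate renders an update of one row as a DELETE of the old row
// followed by an INSERT of the new values (hypothetical helper).
func splitUpdate(table, pkCol, pk string, cols, vals []string) []string {
	quoted := make([]string, len(vals))
	for i, v := range vals {
		quoted[i] = quote(v)
	}
	del := fmt.Sprintf("DELETE FROM %s WHERE %s = %s;", table, pkCol, quote(pk))
	ins := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s);",
		table, strings.Join(cols, ", "), strings.Join(quoted, ", "))
	return []string{del, ins}
}

func main() {
	stmts := splitUpdate("t", "id", "1", []string{"id", "name"}, []string{"1", "o'neil"})
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Splitting the update keeps the removal of the old row and the arrival of the new one visible as separate steps, whereas REPLACE INTO folds both into a single statement.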