mlcommons · wvaske · Nov 21, 2025 · Dec 9, 2025 · Dec 19, 2025 · Dec 19, 2025
@@ -0,0 +1,74 @@
+name: Tests
+
+on:
+  push:
+    branches: [main, master]
+  pull_request:
+    branches: [main, master]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ['3.10', '3.11', '3.12']
+
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install system dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y libopenmpi-dev openmpi-common
+
+    - name: Install package and test dependencies
+      run: |
+        python -m pip install --upgrade pip
+        # Install the package in editable mode without DLIO
+        pip install -e ".[test]"
+
+    - name: Run unit tests
+      run: |
+        pytest tests/unit -v --tb=short
+
+    - name: Run unit tests with coverage
+      run: |
+        pytest tests/unit -v --cov=mlpstorage --cov-report=xml --cov-report=term-missing
+
+    - name: Upload coverage to Codecov
+      uses: codecov/codecov-action@v4
+      with:
+        files: ./coverage.xml
+        fail_ci_if_error: false
+        verbose: true
+      env:
+        CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
+
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.11'
+
+    - name: Install lint dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install ruff
+
+    - name: Run ruff check
+      run: |
+        ruff check mlpstorage/ --output-format=github || true  # non-blocking: findings are reported but never fail the job
+
+    - name: Run ruff format check
+      run: |
+        ruff format --check mlpstorage/ || true  # non-blocking: formatting drift is reported but never fails the job
@@ -0,0 +1,39 @@
+# Python cache
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+
+# Distribution / packaging
+dist/
+build/
+*.egg-info/
+
+# Virtual environments
+venv/
+.venv/
+env/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+
+# Test artifacts
+.pytest_cache/
+.coverage
+htmlcov/
+*.html
+
+# OS files
+.DS_Store
+Thumbs.db
+
+
+# Coding Agents
+.agent/
+.roo/
+.vscode/
+CLAUDE.md
+.roomodes
@@ -0,0 +1,100 @@
+# MLPerf Storage Benchmark Suite v3.0
+
+## What This Is
+
+A benchmark orchestration framework for the MLCommons MLPerf Storage working group. The suite runs storage benchmarks according to the MLPerf rules and reports results together with a check that each run complies with those rules.
+
+## Core Value
+
+**The ONE thing that must work:** Orchestrate multiple benchmark types (training, checkpointing, kv-cache, vectordb) across distributed systems and produce verified, rules-compliant results.
+
+## Context
+
+### Current State
+- v2.0 release with Claude Code enhancements
+- Training and checkpointing benchmarks use DLIO as the underlying engine
+- KV cache benchmark exists in separate directory (`kv_cache_benchmark/`)
+- VectorDB benchmark code exists in external branch
+- MPI-based execution and host collection for DLIO benchmarks
+- Existing error handling and validation pipeline
+
+### Target State (v3.0)
+- Fully integrated KV cache and VectorDB benchmarks as Benchmark subclasses
+- New training models (dlrm, retinanet, flux)
+- Package version management with lockfiles
+- SSH-based host collection for non-MPI benchmarks
+- Time-series /proc/ data collection during benchmark execution
+- Improved error messaging and user guidance
+
+### Timeline
+- **Feature freeze:** after 6 weeks
+- **Bugfix period:** the following 6 weeks
+- **Code freeze:** after 12 weeks total
+
+## Requirements
+
+### Validated (Existing)
+
+- ✓ Training benchmark orchestration via DLIO — existing
+- ✓ Checkpointing benchmark orchestration via DLIO — existing
+- ✓ MPI-based distributed execution — existing
+- ✓ Rules validation pipeline — existing
+- ✓ Report generation — existing
+- ✓ CLI with nested subcommands — existing
+- ✓ Benchmark registry pattern — existing
+
+### Active
+
+- [ ] Package version lockfile management
+- [ ] Remove GPU package dependencies (not used)
+- [ ] KV cache Benchmark class (wraps kv-cache.py)
+- [ ] KV cache MPI execution across hosts
+- [ ] VectorDB Benchmark class (wraps load_vdb.py, compact_and_watch.py, simple_bench.py)
+- [ ] SSH-based host collection for non-MPI benchmarks
+- [ ] New training models: dlrm, retinanet, flux
+- [ ] Improved error messaging for missing commands/packages
+- [ ] Clear user guidance for resolving dependency issues
+- [ ] Time-series /proc/ collection (diskstats, vmstat, cpuinfo, etc.)
+- [ ] Parallel collection process (10-second intervals) that does not affect benchmark performance
+
+### Out of Scope
+
+- GPU support — deliberately not supporting GPU execution
+- Rewriting KV/VDB as native benchmarks — v3.0 wraps existing scripts
+- Real-time monitoring UI — collection only, no visualization
+- Cloud provider integrations — on-premise/bare-metal focus
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|----------|-----------|---------|
+| Lockfile for package versions | Reproducibility across systems; avoids MPI version issues | Pending |
+| Benchmark subclasses for KV/VDB | Minimal integration effort; reuses the existing CLI and reporting infrastructure | Pending |
+| SSH for non-MPI host collection | KV cache and VectorDB don't require MPI execution | Pending |
+| Parallel process for time-series | Must not impact benchmark performance | Pending |
+
+## Constraints
+
+- **No GPU dependencies** — storage benchmark, not compute
+- **MPI compatibility** — must work with various MPI implementations
+- **Cross-platform** — Linux primarily, various distributions
+- **Minimal dependencies** — reduce version conflict surface area
+
+## External Code References
+
+| Component | Location | Notes |
+|-----------|----------|-------|
+| KV cache benchmark | `kv_cache_benchmark/` (local) | Also: `mlcommons/storage/TF_KVCache` branch |
+| VectorDB benchmark | `mlcommons/storage/TF_VDBBench` branch | Scripts: load_vdb.py, compact_and_watch.py, simple_bench.py |
+| DLIO benchmark | External package | Upstream dependency for training/checkpointing |
+
+## Success Metrics
+
+- All four benchmark types (training, checkpointing, kv-cache, vectordb) are runnable from a unified CLI
+- Package lockfile prevents version conflicts in CI
+- Error messages guide users to resolution for common issues
+- Host data collected for all benchmark types (MPI or SSH)
+- Time-series collection runs without measurable benchmark impact
+
+---
+*Last updated: 2026-01-23 after initialization*
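
The time-series item in the plan above (sample `/proc` files every 10 seconds in a parallel process so the benchmark itself is not disturbed) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from this PR: `read_proc_snapshot`, `collect_loop`, `run_with_collection`, the `PROC_FILES` tuple, and the `None` sentinel protocol are all hypothetical names and choices.

```python
import multiprocessing as mp
import time
from pathlib import Path

# Hypothetical default set; the plan lists diskstats, vmstat, cpuinfo, etc.
PROC_FILES = ("diskstats", "vmstat", "cpuinfo")


def read_proc_snapshot(names=PROC_FILES, proc_root="/proc"):
    """Return {name: text} for each readable file; unreadable files are skipped."""
    snapshot = {}
    for name in names:
        try:
            snapshot[name] = (Path(proc_root) / name).read_text()
        except OSError:
            continue
    return snapshot


def collect_loop(stop_event, out_queue, interval_s=10.0, proc_root="/proc"):
    """Push (timestamp, snapshot) tuples until stop_event is set, then a None sentinel."""
    while not stop_event.is_set():
        out_queue.put((time.time(), read_proc_snapshot(proc_root=proc_root)))
        # wait() returns early once the event is set, so shutdown is prompt.
        stop_event.wait(interval_s)
    out_queue.put(None)


def run_with_collection(benchmark_fn, interval_s=10.0):
    """Run benchmark_fn while a child process samples /proc.

    Returns (benchmark result, list of samples).
    """
    stop_event = mp.Event()
    out_queue = mp.Queue()
    worker = mp.Process(target=collect_loop, args=(stop_event, out_queue, interval_s))
    worker.start()
    try:
        result = benchmark_fn()
    finally:
        stop_event.set()
        # Drain up to the sentinel before joining so the child can flush its queue.
        samples = list(iter(out_queue.get, None))
        worker.join()
    return result, samples
```

Sampling in a child process keeps file reads and queue traffic off the benchmark's own interpreter. Whether that overhead is truly unmeasurable would still need to be checked against the success metric above.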
@@ -0,0 +1,92 @@
+# MLPerf Storage v3.0 Requirements
+
+## v1 Requirements
+
+### Package Management
+
+- [x] **PKG-01**: Lockfile for Python dependencies with pinned versions
+- [x] **PKG-02**: Remove GPU package dependencies from default install
+- [x] **PKG-03**: Validate package versions match lockfile before benchmark execution
+
+### Benchmark Integration
+
+- [x] **BENCH-01**: KVCacheBenchmark class extending Benchmark base (wraps kv-cache.py)
+- [x] **BENCH-02**: KV cache MPI execution across multiple hosts
+- [x] **BENCH-03**: VectorDBBenchmark class extending Benchmark base (wraps VDB scripts)
+- [x] **BENCH-04**: VectorDB CLI commands (run, datagen operations)
+- [x] **BENCH-05**: Integration with existing validation/reporting pipeline
+
+### Training Updates
+
+- [x] **TRAIN-01**: Add dlrm model configuration
+- [x] **TRAIN-02**: Add retinanet model configuration
+- [x] **TRAIN-03**: Add flux model configuration
+- [x] **TRAIN-04**: Update DLIO to support Parquet for data loaders, readers, and data generation
+- [x] **TRAIN-05**: Production-ready Parquet reader with memory-efficient I/O
+- [x] **TRAIN-06**: Update pyproject.toml to reference DLIO fork
+
+### Host Collection
+
+- [x] **HOST-01**: SSH-based host collection for non-MPI benchmarks
+- [x] **HOST-02**: Collect /proc/ data (diskstats, vmstat, cpuinfo, filesystems, cgroups)
+- [x] **HOST-03**: Collection at benchmark start and end
+- [x] **HOST-04**: Time-series collection (10-second intervals) during execution
+- [x] **HOST-05**: Parallel collection process without benchmark performance impact
+
+### Error Handling & UX
+
+- [x] **UX-01**: Detect missing commands/packages with actionable error messages
+- [x] **UX-02**: Suggest installation steps for missing dependencies
+- [x] **UX-03**: Validate environment before benchmark execution (fail-fast)
+- [x] **UX-04**: Clear progress indication during long operations
+
+---
+
+## v2 Requirements (Deferred)
+
+- [ ] Deeper KV cache integration (native implementation instead of a wrapper)
+- [ ] Deeper VectorDB integration (native implementation instead of a wrapper)
+- [ ] Real-time monitoring dashboard for time-series data
+- [ ] Cloud provider integrations (AWS, GCP, Azure)
+
+---
+
+## Out of Scope
+
+- **GPU support** — Storage benchmark, deliberately not supporting GPU execution
+- **Rewriting KV/VDB as native benchmarks** — v3.0 wraps existing scripts
+- **Real-time visualization** — Collection only, no visualization in v3.0
+- **Windows support** — Linux-only target
+
+---
+
+## Traceability
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| PKG-01 | Phase 1 | Complete |
+| PKG-02 | Phase 1 | Complete |
+| PKG-03 | Phase 1 | Complete |
+| UX-01 | Phase 2 | Complete |
+| UX-02 | Phase 2 | Complete |
+| UX-03 | Phase 2 | Complete |
+| BENCH-01 | Phase 3 | Complete |
+| BENCH-02 | Phase 3 | Complete |
+| BENCH-03 | Phase 4 | Complete |
+| BENCH-04 | Phase 4 | Complete |
+| BENCH-05 | Phase 5 | Complete |
+| HOST-01 | Phase 6 | Complete |
+| HOST-02 | Phase 6 | Complete |
+| HOST-03 | Phase 6 | Complete |
+| HOST-04 | Phase 7 | Complete |
+| HOST-05 | Phase 7 | Complete |
+| TRAIN-01 | Phase 8 | Complete |
+| TRAIN-02 | Phase 8 | Complete |
+| TRAIN-03 | Phase 8 | Complete |
+| TRAIN-04 | Phase 9 | Complete |
+| UX-04 | Phase 10 | Complete |
+| TRAIN-05 | Phase 11 | Complete |
+| TRAIN-06 | Phase 11 | Complete |
+
+---
+*Last updated: 2026-01-25*
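
To make PKG-03 and UX-01 through UX-03 concrete, a fail-fast environment check might look roughly like the sketch below. It is an illustration only, not code from this PR: the helper names (`find_missing_commands`, `find_version_mismatches`, `format_report`) and the `{name: version}` shape of the pins are assumptions, and exact string equality stands in for real version comparison on the premise that a lockfile pins exact versions.

```python
import shutil
from importlib import metadata


def find_missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def find_version_mismatches(pins):
    """Compare installed distributions with pinned versions.

    pins maps distribution name -> expected version string (hypothetical shape).
    Returns {name: (expected, installed_or_None)} for every mismatch.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


def format_report(missing_commands, mismatches):
    """Turn the findings into actionable one-line messages."""
    lines = []
    for cmd in missing_commands:
        lines.append(f"Missing command '{cmd}': install it and make sure it is on PATH.")
    for name, (expected, installed) in sorted(mismatches.items()):
        found = installed or "not installed"
        lines.append(
            f"Package '{name}': expected {expected}, found {found}. "
            f"Try: pip install {name}=={expected}"
        )
    return lines
```

A caller would run these checks before launching a benchmark and abort if `format_report` returns any lines, so the user sees every problem at once instead of hitting them one at a time mid-run.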