Closed
35 commits
- 39246aa Merge pull request #214 from hazemawadalla/TF_KVCache (FileSystemGuy, Nov 25, 2025)
- 92d5e89 feat: Replace legacy spillover logic with Waterfall LRU architecture … (hazemawadalla, Dec 22, 2025)
- d1fc97a feat(kv-cache): MLPerf v3.0 compliance and configuration overhaul (hazemawadalla, Jan 27, 2026)
- d9715bc feat(wrapper): config integration and workload automation (hazemawadalla, Jan 27, 2026)
- 001fd3b test(kv-cache): comprehensive pytest suite for v3.0 features (hazemawadalla, Jan 27, 2026)
- 2956288 docs(readme): comprehensive documentation for v3.0 (hazemawadalla, Jan 27, 2026)
- 166f2b2 test(results): add pytest HTML test report (hazemawadalla, Jan 27, 2026)
- 99b42f0 feat(xlsx): extended metrics export for v3.0 (hazemawadalla, Jan 27, 2026)
- 1bfe885 deps(requirements): add pyyaml for config support (hazemawadalla, Jan 27, 2026)
- 8a6aa50 config: add default YAML configuration file (hazemawadalla, Jan 27, 2026)
- 3db89bd Refactor monolithic kv-cache.py into modular kv_cache/ package (hazemawadalla, Feb 10, 2026)
- e38cfe9 Fix DeepSeek-V3 MLA values in README, move validate.sh to utils/ (hazemawadalla, Feb 10, 2026)
- f4c10a2 docs: fix decode_batch_size shown as hardcoded in proposal (hazemawadalla, Feb 10, 2026)
- f7ecca1 docs: clarify eviction mechanisms in proposal (hazemawadalla, Feb 10, 2026)
- 0bf572b Merge hazem/modular-refactor into TF_KVCache with conflict resolution (FileSystemGuy, Feb 18, 2026)
- 059c494 Merge pull request #244 from mlcommons/feature/hazem-refactor-merge (dslik, Feb 18, 2026)
- 1f6fbca feat: Add s3dlio integration for MLPerf Storage with s3torchconnector… (Feb 7, 2026)
- 95d1396 feat: Add multi-library S3 storage support (s3torchconnector, minio, … (Feb 13, 2026)
- cfa584a refactor: Organize integration tests into tests/integration/ (Feb 16, 2026)
- 366904a docs: Add branch strategy and PR management infrastructure (Feb 16, 2026)
- 2b8cf25 feat: Integrate dgen-py for 155x faster checkpoint data generation (Feb 16, 2026)
- 79a9849 feat: Add StreamingCheckpointing implementation for producer-consumer… (Feb 17, 2026)
- 0271bc3 feat: Add multi-library streaming checkpoint support (Feb 19, 2026)
- af8e3fd test: Add comprehensive streaming checkpoint tests and demos (Feb 19, 2026)
- b5eb1fe docs: Consolidate and enhance documentation (Feb 19, 2026)
- 1f818c9 security: Remove hardcoded credentials and internal IPs from test files (Feb 19, 2026)
- afb6f1f docs: Remove unnecessary TWO_PR_WORKFLOW.md (Feb 19, 2026)
- d7e73fe Point to russfellows/dlio_benchmark fork for integrated setup (Feb 19, 2026)
- 923df54 docs: Clean up outdated documentation and remove azstoragetorch refer… (Feb 19, 2026)
- ac1a07f refactor: Remove all azstoragetorch references from codebase (Feb 19, 2026)
- 0eee558 deps: Update s3dlio requirement to version 0.9.50 (Feb 19, 2026)
- 0c5a0a4 refactor: Remove dlio_benchmark from git tracking (Feb 19, 2026)
- ce62e8b Add required dependencies and remove native Azure backend (Feb 19, 2026)
- 690e6b8 Merged TF_KVCache into main, accepting all incoming changes for kv_ca… (Feb 25, 2026)
- ac5970c feat: add --io-trace-log trace mode with tensor-parallel / multi-GPU … (Feb 26, 2026)
77 changes: 77 additions & 0 deletions .gitignore
@@ -0,0 +1,77 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Virtual Environments
.venv/
venv/
ENV/
env/
.env
env-*

# uv
.uv/
uv.lock

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/

# DLIO outputs
hydra_out/
results/
*.log
*.history

# MLPerf Storage outputs
results_dir/
mlperf.history

# Temporary files
*.tmp
.tmp/
*.bak
*.backup
*.OLD_*/

# OS
.DS_Store
Thumbs.db

# Test artifacts
hydra_log/
minio_test/
Test-Backup/

# Dependencies (installed via pip)
dlio_benchmark/
19 changes: 19 additions & 0 deletions README.md
@@ -4,6 +4,7 @@ MLPerf® Storage is a benchmark suite to characterize the performance of storage
- [Overview](#overview)
- [Prerequisite](#prerequisite)
- [Installation](#installation)
- [Testing and Demos](#testing-and-demos)
- [Configuration](#configuration)
- [Workloads](#workloads)
- [U-Net3D](#u-net3d)
@@ -76,6 +77,24 @@ The working directory structure is as follows

The benchmark simulation will be performed through the [dlio_benchmark](https://github.com/argonne-lcf/dlio_benchmark) code, a benchmark suite for emulating I/O patterns for deep learning workloads. [dlio_benchmark](https://github.com/argonne-lcf/dlio_benchmark) is listed as a prerequisite pinned to a specific git branch. A future release will update the installer to pull DLIO from PyPI. The DLIO configuration of each workload is specified through a YAML file. You can see the configs of all MLPerf Storage workloads in the `configs` folder.

## Testing and Demos

The `tests/` directory contains validation scripts and demonstrations of new features:

### Quick Demos

- **StreamingCheckpointing Demo**: Run `./tests/scripts/demo_streaming_checkpoint.sh` to see:
  - dgen-py integration (155x faster data generation)
  - StreamingCheckpointing (192x memory reduction)
  - Comparison of old vs new checkpoint methods

- **Backend Validation**: Test multi-library support:
  ```bash
  python tests/checkpointing/test_streaming_backends.py --backends s3dlio minio
  ```
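
The memory reduction that the StreamingCheckpointing demo highlights comes from a producer-consumer design, per the commit history. The sketch below illustrates the general idea: one thread produces checkpoint chunks into a bounded queue while another writes them out, so only a few chunks are resident at any time. It makes several assumptions. `stream_checkpoint`, `CHUNK_SIZE`, and `QUEUE_DEPTH` are hypothetical names, not this repository's StreamingCheckpointing API. Zero-filled `bytes` stand in for generated data (the project uses dgen-py). Any object with a `write` method can act as the sink.

```python
import io
import queue
import threading

# Hypothetical defaults for illustration; not the project's real settings.
CHUNK_SIZE = 64 * 1024  # bytes per chunk passed from producer to consumer
QUEUE_DEPTH = 4         # maximum chunks waiting in the bounded buffer
_END = object()         # sentinel marking the end of the stream


def stream_checkpoint(total_size, sink, chunk_size=CHUNK_SIZE, depth=QUEUE_DEPTH):
    """Write total_size bytes to sink through a bounded producer-consumer queue.

    Returns the peak number of chunks alive at once (queued, just produced,
    or being written). That peak is at most depth + 2, whatever total_size is.
    """
    buf = queue.Queue(maxsize=depth)
    lock = threading.Lock()
    live = 0
    peak = 0

    def track(delta):
        nonlocal live, peak
        with lock:
            live += delta
            peak = max(peak, live)

    def producer():
        remaining = total_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            track(+1)
            chunk = bytes(n)  # stand-in for generated checkpoint data
            buf.put(chunk)    # blocks while the queue is full (backpressure)
            remaining -= n
        buf.put(_END)

    def consumer():
        while True:
            chunk = buf.get()
            if chunk is _END:
                break
            sink.write(chunk)  # any file-like object with a write() method
            track(-1)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    out = io.BytesIO()
    peak = stream_checkpoint(10 * 1024 * 1024, out)
    print(len(out.getvalue()), peak)  # first value is 10485760; second is at most 6
```

With these defaults, at most `QUEUE_DEPTH + 2` = 6 chunks (384 KiB) are resident regardless of checkpoint size. Materializing the whole checkpoint first would need `total_size` bytes. The bounded queue also throttles the producer whenever the storage write path is the bottleneck. This sketch does not propagate errors: a failing `write` would leave the producer blocked.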

See [tests/README.md](tests/README.md) for complete documentation of all test scripts and demos.

## Operation
The benchmark uses nested commands to select the workload category, workload, and workload parameters.
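
A category → workload → parameters command tree like this maps naturally onto Python's `argparse` subparsers. This is a minimal sketch under stated assumptions. The program name `storage-bench`, the category and workload names, and the `--num-accelerators` flag are hypothetical placeholders, not the benchmark's actual commands.

```python
import argparse

# Hypothetical command tree for illustration; the category names, workload
# names and the --num-accelerators flag are assumptions, not the real CLI.
WORKLOADS = {
    "training": ["unet3d", "resnet50", "cosmoflow"],
    "checkpointing": ["llama3-8b"],
}


def build_parser():
    """Build a three-level parser: category -> workload -> parameters."""
    parser = argparse.ArgumentParser(prog="storage-bench")  # hypothetical name
    categories = parser.add_subparsers(dest="category", required=True)
    for category, workloads in WORKLOADS.items():
        category_parser = categories.add_parser(category)
        workload_sub = category_parser.add_subparsers(dest="workload", required=True)
        for workload in workloads:
            workload_parser = workload_sub.add_parser(workload)
            # Workload parameters live at the innermost level of the tree.
            workload_parser.add_argument("--num-accelerators", type=int, default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["training", "unet3d", "--num-accelerators", "8"])
    print(args.category, args.workload, args.num_accelerators)  # training unet3d 8
```

Nesting the subparsers keeps each workload's parameters scoped to that workload. An invalid combination, such as a checkpointing workload under `training`, is rejected by the parser itself.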
