Hey! Some Science Guy here. Welcome to HSPMN v3.0 - an LLM architecture built directly for the NVIDIA Blackwell (RTX 5090) GPU. We stop burning cycles on every single token and borrow a trick from the mammalian brain: route the predictable stuff fast, and save the heavy matrix math for the complex problems.
"The brain does not use the full neocortex to process a simple 'hello'. I just apply this exact same rule to my models."
Instead of imposing a monolithic attention pass on every token, each HSPMN block routes work through two parallel streams:

```mermaid
graph TD
    A[Input Stream] -->|Token Embeddings| B{ALF-LB Router}
    B -->|Predictable & Routine 80%| C["Reflexive Stream<br/>(Linear Attn + SwiGLU O(N))"]
    B -->|Complex Anomaly 20%| D["Contextual Stream<br/>(Full SQSK Attention O(N²))"]
    C --> E{Merge & Add}
    D --> E
    E --> F[Output to Next Block]
    style A fill:#2d3436,stroke:#fff,color:#fff
    style B fill:#e17055,stroke:#fff,color:#fff
    style C fill:#74b9ff,stroke:#fff,color:#fff,font-weight:bold
    style D fill:#a29bfe,stroke:#fff,color:#fff,font-weight:bold
    style E fill:#00b894,stroke:#fff,color:#fff
    style F fill:#2d3436,stroke:#fff,color:#fff
```
Imagine processing hundreds of thousands of routine firewall logs per second. A standard transformer runs full quadratic attention over every one of them, allocating massive VRAM tensors to map relations between routine [INFO] pings until it crashes with an Out-of-Memory (OOM) error.
HSPMN sidesteps this limit gracefully: background [INFO] logs are compressed linearly (near-zero memory footprint). But the millisecond a rogue [SQL_INJECTION_ATTACK] is parsed, the router snaps the anomaly into the heavy Contextual Stream for deep, focused reasoning.
Result: sustained parsing of massive sequences where routine data incurs a flat O(1) per-token VRAM cost, dramatically expanding your effective context window without exploding memory.
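To see why the split pays off, here is a back-of-the-envelope cost model in plain Python. It is not from the repo: the 80/20 split comes from the diagram above, and abstract "work units" stand in for real kernel costs, which depend on heads, dims, and hardware.

```python
# Back-of-the-envelope cost model for the 80/20 hybrid split.
# Illustrative only -- real costs depend on heads, dims, and hardware.

def full_attention_cost(n: int) -> int:
    """Every token attends to every token: O(N^2)."""
    return n * n

def hybrid_cost(n: int, sparsity_k: float = 0.2) -> float:
    """80% of tokens take the linear Reflexive Stream (O(N));
    20% take full attention over the selected set (O((kN)^2))."""
    routine = (1 - sparsity_k) * n        # linear-cost tokens
    complex_ = sparsity_k * n             # quadratic-cost tokens
    return routine + complex_ * complex_

n = 4096
print(f"full:   {full_attention_cost(n):>12,}")
print(f"hybrid: {hybrid_cost(n):>12,.0f}")
print(f"speedup ~{full_attention_cost(n) / hybrid_cost(n):.1f}x")
```

The quadratic term shrinks by a factor of `sparsity_k**2`, which is why the routine traffic stops dominating the budget.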
- Hybrid Execution: FlexAttention for training + custom Triton kernels for inference.
- Hardware Sparsity: Custom Triton kernels built ground-up for the Blackwell architecture.
- 328k Context Window: Tested on the RTX 5090 using just 30.24 GB VRAM via a True Sparse KV Cache.
- Silly Fast: 1.33M tokens/sec at BF16 precision.
- ALF-LB Routing: A bias-based routing method without that annoying gradient/Gumbel noise.
- Dual Entropy Loss: Forces strict 0-or-1 token choices while keeping the hardware load totally even across batches.
- Zero Graph Breaks: Native static routing (`torch.topk`), so `torch.compile(fullgraph=True)` actually does its job.
- CUDAGraphs Compatible: Sparsity targets stored as core Python floats (no `.item()` sync!). Captured neatly in precisely 2 partitions.
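The static-routing idea is easy to sketch. The real ALF-LB implementation lives in `hspmn_v3_0.py` and operates on GPU tensors; the version below is a hypothetical, dependency-free illustration of the core mechanic as described above: add a load-balancing bias to the router scores, then take a fixed-size top-k, so selection is deterministic (no Gumbel noise) and the graph shape never changes.

```python
# Minimal sketch of bias-based top-k routing (no Gumbel noise).
# Hypothetical illustration -- not the repo's ALF-LB code.
import math

def route(scores, bias, sparsity_k=0.2):
    """Pick the top ceil(sparsity_k * N) tokens for the Contextual Stream.

    scores: raw router logits, one per token
    bias:   load-balancing offset added before selection (adjusted between
            steps, so no noise flows through the gradient path)
    """
    k = math.ceil(sparsity_k * len(scores))
    biased = [s + bias for s in scores]
    # Static top-k: the winner-set size is fixed, which keeps the graph
    # shape constant for torch.compile / CUDAGraphs capture.
    order = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)
    contextual = set(order[:k])
    return [1 if i in contextual else 0 for i in range(len(scores))]

mask = route([0.1, 2.3, -0.5, 0.9, 1.7, 0.0, -1.2, 0.4], bias=0.0)
print(mask)  # 1 marks tokens sent to the heavy stream
print(sum(mask), "of", len(mask), "tokens routed to full attention")
```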
| Metric | Value | Notes |
|---|---|---|
| Throughput | 1,329,516 tok/s | Batch=64, Seq=4096, Dim=2048 |
| VRAM (throughput) | 12.28 GB | CUDAGraphs 2 partitions |
| Max Context | 335,872 tokens | Batch=1, Dim=2048 (30.24 GB VRAM) |
| Latency | 197.17 ms avg | Full forward pass (P95: 197.81 ms) |
| Training Speed | ~980k tok/s | Real training speed using FlexAttention |
The codebase is strictly modularized into core architectural models, hardware-accelerated execution pipelines, and rigorous validation suites. This ensures a clean separation between the mathematical framework and its runtime components.
Here is the high-level topology of the repository:
```mermaid
graph LR
    Root["HSPMN-v3"]
    subgraph Core ["Core Architecture"]
        A1["hspmn_v3_0.py<br/><small>Main Architecture</small>"]
        A2["hspmn_hf_wrapper.py<br/><small>HuggingFace Wrap</small>"]
        A3["kernels_v3_0.py<br/><small>Triton Magic!</small>"]
    end
    subgraph Runners ["Execution & Training"]
        B1["benchmark_v3_0.py<br/><small>Go Fast</small>"]
        B2["train_v3_0.py<br/><small>Get Smart</small>"]
        B3["utils_v3_0.py<br/><small>Helper Logic</small>"]
    end
    subgraph Testing ["Validation & Tests"]
        C1["test_v3_0.py<br/><small>Unit Tests</small>"]
        C2["test_kernels_v3_0.py"]
        C3["needle_test.py<br/><small>Context Check</small>"]
        C4["verify_models.py"]
    end
    subgraph Docs ["Documentation & Config"]
        D1["README.md<br/><small>You are here</small>"]
        D2["HSPMN_v3_0.tex & .pdf<br/><small>Architecture Paper</small>"]
        D3["requirements.txt"]
        D4["LICENSE"]
    end
    Root --> Core
    Root --> Runners
    Root --> Testing
    Root --> Docs
    %% Colors optimized for dark mode (white/bright text, saturated dark backgrounds)
    style Root fill:#d63031,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold
    style A1 fill:#0984e3,stroke:#fff,color:#fff
    style A2 fill:#0984e3,stroke:#fff,color:#fff
    style A3 fill:#0984e3,stroke:#fff,color:#fff
    style B1 fill:#00b894,stroke:#fff,color:#fff
    style B2 fill:#00b894,stroke:#fff,color:#fff
    style B3 fill:#00b894,stroke:#fff,color:#fff
    style C1 fill:#e17055,stroke:#fff,color:#fff,font-weight:bold
    style C2 fill:#e17055,stroke:#fff,color:#fff,font-weight:bold
    style C3 fill:#e17055,stroke:#fff,color:#fff,font-weight:bold
    style C4 fill:#e17055,stroke:#fff,color:#fff,font-weight:bold
    style D1 fill:#6c5ce7,stroke:#fff,color:#fff
    style D2 fill:#6c5ce7,stroke:#fff,color:#fff
    style D3 fill:#6c5ce7,stroke:#fff,color:#fff
    style D4 fill:#6c5ce7,stroke:#fff,color:#fff
    classDef default font-family:sans-serif,font-size:14px;
    classDef title font-weight:bold,color:#fff;
```
Running things on bleeding-edge tech like the NVIDIA GB202 (RTX 5090) isn't without quirks. Here's what I fixed under the hood:
- TF32 Math Errors: PyTorch defaults to TF32, which broke our router sigmoid gate math due to reduced precision. Forced FP32 via `set_float32_matmul_precision('highest')`. Boom. Sorted.
- Quantization Noise Gate: Fast MXFP8 math was bleeding noise. I added a `< 0.05` hard floor to protect the routing logic.
- SiLU NaN Errors: Deep padding into Blackwell SiLU kernels crashed them. Fixed with a good old clamp and `nan_to_num`.
- TMA Stride Protection: Replaced `tl.load` with `tl.make_block_ptr` to stop massive L2 cache misses dead in their tracks.
- CUDAGraphs `.item()` Fix: Gutted `tensor.item()` from the router forward path. CUDAGraphs now captures properly since sparsity targets are standard Python floats (`_sparsity_float`).
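The first three fixes are all precision hygiene, and the pattern is simple to show. This is a condensed, hypothetical sketch, not the repo's code: `stabilize_gate` and its exact thresholds are illustrative stand-ins for the guards inside `hspmn_v3_0.py` and the kernels.

```python
import torch

# Force true FP32 matmuls -- TF32's 10-bit mantissa is too coarse for the
# router's sigmoid gate (fix #1).
torch.set_float32_matmul_precision('highest')

def stabilize_gate(gate: torch.Tensor, floor: float = 0.05) -> torch.Tensor:
    """Defensive guards around the routing gate (fixes #2 and #3).

    - clamp + nan_to_num so SiLU edge cases on padded inputs cannot
      propagate NaNs into the router
    - zero out sub-threshold activations so MXFP8 quantization noise
      cannot flip routing decisions
    """
    gate = torch.nan_to_num(gate.clamp(-10.0, 10.0))
    return torch.where(gate.abs() < floor, torch.zeros_like(gate), gate)
```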
Prerequisites: NVIDIA Driver 570+, CUDA 12.8+, Python 3.10+, PyTorch 2.10+ (nightly)
Pro-tip for reproducible benchmarks (OS tuning):

```shell
# GPU: persistence + power limit
sudo nvidia-smi -pm 1 && sudo nvidia-smi -pl 500

# CPU: performance governor + boost
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/boost

# Memory: reduce OS jitter
sudo sysctl vm.swappiness=10
```

```shell
# Clone the repository
git clone https://github.com/NetBr3ak/HSPMN.git
cd HSPMN

# Install dependencies
pip install -r requirements.txt
```

Test how fast your rig really is:

```shell
python benchmark_v3_0.py --mode all
```

For direct integration or testing the core block programmatically:
```python
import torch

from hspmn_v3_0 import HSPMNBlock
from utils_v3_0 import HSPMNConfig

# Initialize configuration
config = HSPMNConfig(dim=2048, num_heads=16, num_kv_heads=4, sparsity_k=0.2)
model = HSPMNBlock(config).cuda().bfloat16()

# Compile the model
model = torch.compile(model, mode="max-autotune", fullgraph=True)

# Process a dummy sequence
x = torch.randn(1, 4096, 2048).cuda().bfloat16()
output, aux_loss, kv_cache = model(x)
print(f"Output shape: {output.shape}")
```

Launch a training run:

```shell
python train_v3_0.py \
    --batch 32 \
    --seq_len 4096 \
    --dim 2048 \
    --steps 1000 \
    --grad_accum 4 \
    --wandb "hspmn-experiment-1"
```

Tear it down to see if it breaks:

```shell
python test_kernels_v3_0.py
python verify_models.py
```

Author: Some Science Guy (Szymon Jędryczko)
License: Proprietary / All Rights Reserved - Non-Commercial Use Only
Source-available for portfolio viewing only. Commercial use, unauthorized modification, reproduction, or distribution is strictly prohibited. But feel free to look around!