ML Architecture Researcher · Non-Transformer Models · Systematic Learning
Researching non-transformer architectures through systematic implementation.
Not just reading papers: building Mamba, RWKV, and Flash Attention from scratch.
Questioning the assumption that transformers are the only viable solution.
State-Space Models → Mamba & RWKV implementations (sketch after this list)
- Linear-time inference vs quadratic attention complexity
- Selective state mechanisms for efficient memory
- Comparing trade-offs: speed vs expressiveness
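
Below, a minimal NumPy sketch of the selective-scan recurrence behind Mamba-style models: the sequential form only, not the hardware-efficient parallel scan, with all shapes and names chosen for illustration.

```python
import numpy as np

def selective_scan(x, A, B, C, dt):
    """Sequential form of a (simplified) selective SSM, Mamba-style.

    x  : (T,)   input sequence for one channel
    A  : (d,)   diagonal state matrix (decay rates)
    B  : (T, d) input projection per step
    C  : (T, d) output projection per step
    dt : (T,)   step sizes per step

    B, C, dt being functions of the input is the "selective" part.
    Runs in O(T) time with O(d) state, vs O(T^2) for full attention.
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        A_bar = np.exp(dt[t] * A)              # zero-order-hold discretization
        h = A_bar * h + (dt[t] * B[t]) * x[t]  # O(d) state update
        y[t] = C[t] @ h                        # readout
    return y

# toy usage: in a real model, B, C, dt come from learned projections of x
rng = np.random.default_rng(0)
T, d = 8, 4
A = -np.exp(np.linspace(0.0, 2.0, d))          # negative => stable decay
y = selective_scan(rng.standard_normal(T), A,
                   rng.standard_normal((T, d)),
                   rng.standard_normal((T, d)),
                   np.full(T, 0.1))
```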
Hybrid Architectures → RNN + Attention combinations (sketch after this list)
- Exploring best of both worlds: recurrence + selectivity
- Custom memory systems for long-context tasks
- Implementation-first approach to understanding
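
A sketch of one way to wire this up (my own naming, not a specific published architecture): a cheap recurrent pass carries a compressed summary of the whole past, then causal attention on top handles precise retrieval.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rnn_layer(x, W_h, W_x):
    """Plain tanh RNN: O(T) pass compressing all history into one state."""
    h = np.zeros(W_h.shape[0])
    out = np.empty((x.shape[0], W_h.shape[0]))
    for t in range(x.shape[0]):
        h = np.tanh(W_h @ h + W_x @ x[t])
        out[t] = h
    return out

def causal_attention(x, W_q, W_k, W_v):
    """Single-head causal attention: exact, selective recall of past tokens."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    s = q @ k.T / np.sqrt(q.shape[-1])
    s[np.triu(np.ones(s.shape, dtype=bool), k=1)] = -np.inf  # causal mask
    return softmax(s) @ v

def hybrid_block(x, p):
    """Recurrence first (global summary), attention second (precise
    retrieval), joined by residual connections."""
    h = x + rnn_layer(x, p["W_h"], p["W_x"])
    return h + causal_attention(h, p["W_q"], p["W_k"], p["W_v"])

# toy usage
rng = np.random.default_rng(1)
T, d = 6, 8
p = {k: 0.1 * rng.standard_normal((d, d))
     for k in ["W_h", "W_x", "W_q", "W_k", "W_v"]}
y = hybrid_block(rng.standard_normal((T, d)), p)
```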
Flash Attention v2 → From-scratch CUDA optimization (sketch after this list)
- Understanding memory-efficient attention at kernel level
- Production inference optimization
- 10x speedup through proper memory access patterns
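
The real speedup lives in the CUDA kernels, but the algorithmic core, online softmax over K/V tiles, fits in a few lines. A NumPy reference sketch, with tile size and names chosen for illustration:

```python
import numpy as np

def flash_attention_ref(Q, K, V, tile=64):
    """Tiled attention with online softmax: never materializes the
    full (T, T) score matrix, only (T, tile) blocks.

    Returns softmax(Q K^T / sqrt(d)) V exactly (up to float error).
    """
    T, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((T, d))
    m = np.full(T, -np.inf)      # running row-wise max of scores
    l = np.zeros(T)              # running row-wise sum of exp(score - m)
    for j in range(0, T, tile):
        S = (Q @ K[j:j+tile].T) * scale       # (T, tile) partial scores
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)             # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ V[j:j+tile]
        m = m_new
    return O / l[:, None]

# sanity check against the naive O(T^2)-memory version
rng = np.random.default_rng(2)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
S = Q @ K.T / np.sqrt(16)
P = np.exp(S - S.max(axis=1, keepdims=True))
assert np.allclose(flash_attention_ref(Q, K, V, tile=32),
                   (P / P.sum(axis=1, keepdims=True)) @ V)
```

In the actual kernel the outer loop runs per Q tile held in SRAM; the sketch above only shows the softmax bookkeeping that makes the tiling exact.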
Projects
Open-source alternative architecture research. Implementation-first approach to understanding non-transformer models. Building Mamba, RWKV, and hybrid systems from scratch to compare trade-offs.
Structured habit & progress management. SQLite-based system for breaking vibe coding habits. Daily plans, evening reports, focus zone management, and streak tracking with AI coach integration.
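
A hypothetical schema sketch for flavor; the table and column names below are my illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real system persists to a file
conn.executescript("""
CREATE TABLE IF NOT EXISTS daily_plan (
    day        TEXT PRIMARY KEY,   -- ISO date
    focus_zone TEXT NOT NULL,      -- what the day is dedicated to
    plan       TEXT NOT NULL       -- the morning plan, free text
);
CREATE TABLE IF NOT EXISTS evening_report (
    day       TEXT PRIMARY KEY REFERENCES daily_plan(day),
    completed INTEGER NOT NULL CHECK (completed IN (0, 1)),
    notes     TEXT                 -- feeds the AI coach
);
""")

def current_streak(conn):
    """Length of the trailing run of completed days."""
    streak = 0
    for (done,) in conn.execute(
            "SELECT completed FROM evening_report ORDER BY day DESC"):
        if not done:
            break
        streak += 1
    return streak
```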
Distributed P2P File Sync (Archived). Autonomous peer-to-peer synchronization with ML-based anomaly detection, delta-sync algorithms, and genetic topology remeshing for fault tolerance.
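
The repo is archived, but the delta-sync idea sketches nicely: content-defined chunking with a rolling hash, so an edit invalidates only nearby chunks and only unmatched chunks cross the wire. The hash and constants below are toy choices, not the project's actual algorithm.

```python
import os

def chunk_boundaries(data: bytes, window=16, mask=0x3FF):
    """Content-defined chunking with a toy polynomial rolling hash.

    A chunk boundary is declared wherever the hash of the last `window`
    bytes has its low bits all zero, so boundaries follow content:
    inserting bytes reshuffles only nearby chunks, and delta sync only
    transfers chunks the peer does not already have.
    """
    BASE, MOD = 257, 1 << 32
    top = pow(BASE, window - 1, MOD)   # weight of the byte leaving the window
    h, last, cuts = 0, 0, []
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= window:                # drop the byte that left the window
            h = (h - data[i - window] * top * BASE) % MOD
        if i + 1 - last >= window and (h & mask) == 0:
            cuts.append(i + 1)
            last = i + 1
    cuts.append(len(data))
    return cuts

# expected chunk size is roughly mask + 1 bytes (boundary probability
# is ~1 / (mask + 1) per position for well-mixed input)
print(chunk_boundaries(os.urandom(8192))[:5])
```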
Implementation Notes & Architecture Analysis. Documenting the learning journey: from-scratch implementations, architecture comparisons, and trade-off analysis. Focus on practical insights over theory.
Open to research collaboration & technical discussions
Interested in alternative architectures, efficient inference, or systematic ML learning?
What I'm looking for:
• Co-researchers on non-transformer architectures
• Code review & implementation feedback
• Trade-off discussions: speed vs accuracy vs memory
What I'm not interested in:
❌ Wrapper apps without novel architecture
❌ "Just use ChatGPT API" projects
❌ Hype-driven development
"Implementation over theory. Trade-offs over hype. Systematic learning over vibe coding."
Star repos if you find them useful | Building in public


