LessUp


Shenzhen

LessUp · AI Infrastructure & HPC Developer

Building efficient AI infrastructure, high-performance systems, and data pipelines.
Focused on hands-on engineering for AI infrastructure and high-performance computing.

⚡ Focus: CUDA Kernel Optimization · LLM Inference · GPU Computing
🌱 Currently exploring: AI inference acceleration & large-scale pipeline orchestration
📍 Open to: Research collaboration & open-source co-building


👨‍💻 About Me

I build AI infrastructure and high-performance systems with C++/CUDA, Python, and Go.
My main focus is hands-on engineering in AI infrastructure, GPU kernel optimization, and high-performance computing.

GPU Kernel Engineering: CUDA/Triton kernel optimization, FlashAttention, GEMM, quantization
AI Inference Systems: LLM inference engines, model quantization (W8A16/FP8), KV Cache
High-Performance Computing: N-body simulation, ray tracing, image processing pipelines
Real-time Systems: WebRTC signaling, real-time detection, digital human platform

🚀 Projects

⚡ GPU Kernel Optimization

TensorCraft-HPC

Modern C++17/CUDA AI kernel library — Elementwise, GEMM, FlashAttention, Conv2D, SpMV, FP8 quantization

SGEMM Optimization

Progressive SGEMM optimization, from a naive triple loop to Tensor Cores, reaching 40% of cuBLAS performance

Triton Fused Ops

RMSNorm+RoPE fusion, Gated MLP fusion, FP8 GEMM with auto-tuning for Transformers

LLM-Speed

CUDA LLM kernel library — FlashAttention (online softmax), FP16/INT8 GEMM with Tensor Core
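
The online softmax named in the LLM-Speed entry can be illustrated with a minimal Python sketch. It is not code from the repository: the function name `online_softmax` is hypothetical, and the sketch assumes finite inputs. It shows only the general idea, which is to compute the running maximum and the normalizer together in one pass by rescaling the accumulated sum whenever the maximum changes.

```python
import math

def online_softmax(scores):
    """Softmax whose max and normalizer are computed together in one pass."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new max, then add the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # A second, cheap pass normalizes each score with the final max and sum.
    return [math.exp(x - running_max) / running_sum for x in scores]

if __name__ == "__main__":
    print(online_softmax([1.0, 2.0, 3.0]))
```

Subtracting the running maximum keeps every exponent at or below zero, so large scores such as 1000.0 do not overflow.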

🧠 AI Inference Engines

Tiny-LLM

Lightweight LLM inference engine — W8A16 quantization, KV Cache, multi-sampling strategies

Mini Inference Engine

7-level GEMM optimization (Naive→Tensor Core), reaching 72% of cuBLAS performance, with an MNIST demo

Tiny-DL-Inference

WebGPU micro inference engine — Conv2d, kernel fusion, Im2Col, MNIST classification

YOLO-Toys

Multi-model real-time vision — YOLO/DETR/OWL-ViT/BLIP with WebSocket streaming
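
As a rough illustration of the W8A16 quantization listed for Tiny-LLM above, the sketch below stores weights as INT8 with one scale per output row and keeps activations in floating point. It is an assumption-laden example, not the engine's implementation: the helper names `quantize_row` and `w8a16_matvec` are hypothetical, symmetric per-row scaling is one common choice among several, and plain Python floats stand in for FP16 activations.

```python
def quantize_row(weights):
    """Symmetric per-row INT8 quantization: returns (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def w8a16_matvec(q_rows, scales, activations):
    """y = W @ x with W held as INT8 plus per-row scales.

    Activations stay in floating point (standing in for FP16 here);
    each row's integer dot product is dequantized by its scale.
    """
    return [scale * sum(q * a for q, a in zip(row, activations))
            for row, scale in zip(q_rows, scales)]

if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    quantized = [quantize_row(row) for row in weights]
    q_rows = [q for q, _ in quantized]
    scales = [s for _, s in quantized]
    print(w8a16_matvec(q_rows, scales, [1.0, 2.0, 3.0]))
```

The result approximates the full-precision product; the error per weight is bounded by half of that row's scale.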

🎮 GPU Computing & Simulation

CUDA Ray Tracer

Pure CUDA ray tracer — Phong shading, path tracing, BVH acceleration, warp divergence optimization

N-Body Simulation

Million-particle GPU simulation — Direct N², Barnes-Hut, Spatial Hash with CUDA-OpenGL interop

Particle Fluid Sim

10K-particle real-time fluid simulation using WebGPU compute shaders with trail effects

Mini-OpenCV

CUDA image processing library — convolution, morphology, geometric transforms, pipeline processing

Mini-ImagePipe

DAG-based heterogeneous image pipeline — multi-stream scheduling, pinned memory pool

🌐 Applications

MetaHuman

3D digital human platform — Three.js rendering, voice interaction, behavior control, emotion FSM

WebRTC

Minimal WebRTC demo — Go WebSocket signaling, room management, peer-to-peer media

Note Sync Now

E2E encrypted note sync — AES-256, 12-word mnemonic, real-time collaboration via WebSocket

Mind Gym

Browser-based memory training — N-back, spaced reinforcement, adaptive difficulty, PWA

🎓 Education & Experience

🎓 Education

Xidian University

Background in communications and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Medical imaging, real-time audio/video communication (RTC), and genomic data engineering.

🛠️ Tech Stack

Category	Technologies
Languages
AI & HPC	CUDA · Triton · cuBLAS · Tensor Core · WebGPU
System & DevOps
Web & Frontend


📫 Connect with me

Feel free to reach out by email for collaboration, technical discussions, or open-source ideas.

Pinned

bookmarks-cleaner (Public)

🧹✨ Smart bookmark cleanup and deduplication tool with rule-based filtering and one-click organization of browser bookmarks.

Python 2
awesome-cursorrules-zh (Public)

💻✨ A collection of Cursor AI coding rules optimized for Chinese-speaking developers.

Python 111 15
meta-human (Public)

🧑‍🚀 A digital human platform integrating 3D modeling, voice interaction, and behavior control.

TypeScript 7 1
webrtc (Public)

📡 WebRTC signaling and example-service repository, focused on connection negotiation, room management, and real-time communication workflows.

JavaScript 1
yolo-toys (Public)

🎯 Real-time video-stream object detection project built on FastAPI and YOLOv8.

Python 1