  • Shenzhen (UTC+08:00)


LessUp · AI Infrastructure & HPC Developer

Building efficient AI infrastructure, high-performance systems, and data pipelines, with a hands-on engineering focus on AI infrastructure and high-performance computing.


⚡ Focus: CUDA Kernel Optimization · LLM Inference · GPU Computing
🌱 Currently exploring: AI inference acceleration & large-scale pipeline orchestration
📍 Open to: Research collaboration & open-source co-building



About · Projects · Experience · Tech Stack · Stats · Contact



👨‍💻 About Me

I build AI infrastructure and high-performance systems with C++/CUDA, Python, and Go, with a particular focus on GPU kernel optimization.

  • GPU Kernel Engineering: CUDA/Triton kernel optimization, FlashAttention, GEMM, quantization
  • AI Inference Systems: LLM inference engines, model quantization (W8A16/FP8), KV Cache
  • High-Performance Computing: N-body simulation, ray tracing, image processing pipelines
  • Real-time Systems: WebRTC signaling, real-time detection, digital human platform

🚀 Projects

⚡ GPU Kernel Optimization

Modern C++17/CUDA AI kernel library — Elementwise, GEMM, FlashAttention, Conv2D, SpMV, FP8 quantization

C++17 CUDA Tensor Core

From a naive triple loop to Tensor Cores: progressive SGEMM optimization reaching 40% of cuBLAS performance

CUDA WMMA Roofline
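The Roofline tag refers to the standard model for deciding whether a kernel is compute-bound or memory-bound. As a minimal Python sketch (not taken from the project), assuming ideal data reuse (A and B read from DRAM once, C written once) and hypothetical hardware figures of 20 TFLOP/s FP32 peak and 1 TB/s bandwidth:

```python
def sgemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """FLOPs per byte for C = A @ B, assuming ideal reuse:
    A (m x k) and B (k x n) are read from DRAM once and C (m x n) is written once."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_attainable(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Roofline bound: the lower of the compute ceiling and intensity * memory bandwidth."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    PEAK = 20e12      # hypothetical FP32 peak, FLOP/s
    BANDWIDTH = 1e12  # hypothetical DRAM bandwidth, bytes/s
    ai = sgemm_arithmetic_intensity(1024, 1024, 1024)
    print(f"intensity = {ai:.1f} FLOP/byte, ridge = {PEAK / BANDWIDTH:.0f} FLOP/byte")
    print(f"attainable = {roofline_attainable(ai, PEAK, BANDWIDTH) / 1e12:.1f} TFLOP/s")
```

Under these assumed figures the ridge point is 20 FLOP/byte, so a 1024³ SGEMM (about 171 FLOP/byte with ideal reuse) lands in the compute-bound region. A naive triple-loop kernel refetches A and B from global memory many times, so its effective intensity is far lower; tiling and Tensor Core stages are ways to recover it.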

RMSNorm+RoPE fusion, Gated MLP fusion, FP8 GEMM with auto-tuning for Transformers

Triton FP8 Python
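The math that an RMSNorm + RoPE fusion has to reproduce can be written out in plain Python. This is a sketch under two assumptions the description does not state: the fusion targets a per-head RMSNorm applied to query/key vectors just before RoPE (as in QK-norm style Transformers), and RoPE rotates interleaved pairs (x[2i], x[2i+1]) with base 10000. The name `rmsnorm_rope_fused_reference` is hypothetical.

```python
import math


def rmsnorm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then scale elementwise by a learned weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope(x, pos, base=10000.0):
    """Rotary embedding: rotate each interleaved pair (x[2i], x[2i+1]) by pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rmsnorm_rope_fused_reference(x, weight, pos, eps=1e-6):
    """Hypothetical numerical reference for a fused kernel: normalize, then rotate, one head vector.
    A real fused kernel keeps the row in registers between the two steps instead of
    writing the normalized vector back to global memory."""
    return rope(rmsnorm(x, weight, eps), pos)
```

Since RoPE is a pure rotation, it preserves the vector norm, which gives a cheap sanity check when validating a fused kernel against this reference.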

CUDA LLM kernel library — FlashAttention (online softmax), FP16/INT8 GEMM with Tensor Core

CUDA PyTorch FlashAttention
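The online softmax idea behind FlashAttention can be illustrated for a single query: score/value pairs are consumed one at a time while a running max, denominator, and output accumulator are rescaled, so the full softmax row never has to be stored. A pure-Python sketch (function names are illustrative, not the library's API), checked against a two-pass reference:

```python
import math


def attention_two_pass(scores, values):
    """Reference: softmax(scores) @ values, computed with a separate max pass."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def attention_online(scores, values):
    """One pass over (score, value) pairs, rescaling running state whenever the max grows."""
    m = -math.inf                 # running max of the scores seen so far
    l = 0.0                       # running softmax denominator, relative to m
    acc = [0.0] * len(values[0])  # running unnormalized sum of exp(s - m) * v
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # exp(-inf) == 0.0 on the first step
        p = math.exp(s - m_new)
        l = l * scale + p
        acc = [a * scale + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

Because every exponent is taken relative to the running max, large scores such as 1000.0 do not overflow. FlashAttention applies the same recurrence per tile of keys held in on-chip memory rather than per element.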

🧠 AI Inference Engines

Lightweight LLM inference engine — W8A16 quantization, KV Cache, multi-sampling strategies

CUDA C++17 INT8
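W8A16 means INT8 weights with FP16 activations. A minimal sketch of one common scheme, assuming symmetric per-output-channel quantization (the engine's actual granularity and rounding are not stated); Python floats stand in for FP16 activations:

```python
def quantize_per_channel(weight_rows):
    """Symmetric INT8 quantization with one scale per output channel (weight row).
    Note: Python's round() rounds exact halves to even."""
    q_rows, scales = [], []
    for row in weight_rows:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0  # all-zero rows get a dummy scale
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def w8a16_matvec(q_rows, scales, x):
    """y = W @ x using INT8 weights; each row's scale is applied once after the dot product.
    On a GPU, x would be FP16 and the INT8 weights dequantized on the fly in registers."""
    return [scale * sum(q * xi for q, xi in zip(row, x)) for row, scale in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    q, s = quantize_per_channel(w)
    print(q, w8a16_matvec(q, s, [1.0, 2.0, 3.0]))
```

Each weight's rounding error is at most half a quantization step, so a row's output error is bounded by that half-step times the sum of activation magnitudes; per-channel scales keep the step small for rows with small ranges.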

7-level GEMM optimization (Naive→Tensor Core), reaching 72% of cuBLAS performance, with an MNIST demo

CUDA C++17 FP16
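The description does not list the seven levels, but moving from a naive triple loop to a blocked loop is the core step behind most shared-memory GEMM optimizations. A Python sketch of that step, where each tile of A and B corresponds to data a CUDA thread block would stage in shared memory (tile size and function names are illustrative):

```python
def gemm_naive(a, b):
    """Textbook triple loop: C[i][j] = sum_p A[i][p] * B[p][j]."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def gemm_tiled(a, b, tile=2):
    """Blocked GEMM: each output tile accumulates over K in tile-sized chunks."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # On a GPU, the A tile (rows i0.., cols p0..) and B tile (rows p0.., cols j0..)
                # would be loaded into shared memory once here and reused by the whole block.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Both versions add the products for each output element in the same order, so they return identical results; on a GPU the benefit comes from reusing each staged tile across many outputs instead of refetching it from global memory.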

WebGPU micro inference engine — Conv2d, kernel fusion, Im2Col, MNIST classification

WebGPU TypeScript WGSL

Multi-model real-time vision — YOLO/DETR/OWL-ViT/BLIP with WebSocket streaming

FastAPI YOLOv8 Docker

🎮 GPU Computing & Simulation

Pure CUDA ray tracer — Phong shading, path tracing, BVH acceleration, warp divergence optimization

CUDA Path Tracing BVH

Million-particle GPU simulation — Direct N², Barnes-Hut, Spatial Hash with CUDA-OpenGL interop

CUDA OpenGL Barnes-Hut
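Of the three methods listed, the spatial hash is the simplest to show. A 2D pure-Python sketch, assuming a uniform grid whose cell size is at least the interaction radius so only the 3x3 neighboring cells need checking; the function names are hypothetical and the project's CUDA implementation is not reproduced here:

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Hash each 2D point into the integer grid cell that contains it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid


def close_pairs(points, cell, radius):
    """Index pairs (i, j), i < j, with distance < radius, scanning only the 3x3 neighboring cells.
    This is exact only when radius <= cell."""
    if radius > cell:
        raise ValueError("radius must not exceed the cell size")
    grid = build_grid(points, cell)
    r2 = radius * radius
    pairs = set()
    for i, (x, y) in enumerate(points):
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j > i:
                        px, py = points[j]
                        if (px - x) ** 2 + (py - y) ** 2 < r2:
                            pairs.add((i, j))
    return pairs
```

For evenly spread particles this cuts the O(N²) all-pairs check of the direct method to roughly O(N), whereas Barnes-Hut instead approximates distant groups of bodies with a tree in O(N log N).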

10K-particle real-time fluid simulation using WebGPU compute shaders with trail effects

WebGPU TypeScript WGSL

CUDA image processing library — convolution, morphology, geometric transforms, pipeline processing

CUDA C++17 Image Processing

DAG-based heterogeneous image pipeline — multi-stream scheduling, pinned memory pool

CUDA C++17 DAG
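One simple way to schedule such a DAG is to split it into waves with Kahn's topological sort: every stage in a wave depends only on earlier waves, so stages within a wave could be issued on separate CUDA streams. The stage names below are hypothetical, and the project may track dependencies with CUDA events rather than wave barriers:

```python
def schedule_levels(deps):
    """Group DAG nodes into waves; every node's dependencies sit in earlier waves.
    deps maps node -> list of nodes it depends on."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for node, ds in deps.items():
        for d in ds:
            indegree[node] += 1
            children[d].append(node)
    wave = sorted(n for n in nodes if indegree[n] == 0)
    levels, seen = [], 0
    while wave:
        levels.append(wave)
        seen += len(wave)
        nxt = []
        for n in wave:
            for ch in children[n]:
                indegree[ch] -= 1
                if indegree[ch] == 0:  # all inputs of ch are now scheduled
                    nxt.append(ch)
        wave = sorted(nxt)
    if seen != len(nodes):
        raise ValueError("cycle detected; the pipeline graph must be a DAG")
    return levels


if __name__ == "__main__":
    pipeline = {
        "decode": [], "resize": ["decode"], "blur": ["decode"],
        "edges": ["resize"], "merge": ["blur", "edges"], "encode": ["merge"],
    }
    print(schedule_levels(pipeline))
    # → [['decode'], ['blur', 'resize'], ['edges'], ['merge'], ['encode']]
```

Wave barriers are easy to reason about but can leave streams idle; event-based dependencies let each stage start as soon as its own inputs are ready.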

🌐 Applications

3D digital human platform — Three.js rendering, voice interaction, behavior control, emotion FSM

React Three.js TypeScript

Minimal WebRTC demo — Go WebSocket signaling, room management, peer-to-peer media

Go WebRTC Docker

E2E encrypted note sync — AES-256, 12-word mnemonic, real-time collaboration via WebSocket

React Express Socket.IO

Browser-based memory training — N-back, spaced reinforcement, adaptive difficulty, PWA

JavaScript Tailwind PWA
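The N-back rule itself is compact: a trial is a target when the current stimulus equals the one shown n steps earlier. A sketch with a hypothetical staircase rule for adaptive difficulty (the 0.8/0.5 thresholds are assumptions, not the app's actual settings):

```python
def nback_targets(seq, n):
    """Trial indices whose stimulus matches the one shown n steps earlier."""
    return [i for i in range(n, len(seq)) if seq[i] == seq[i - n]]


def score_round(seq, n, responses):
    """Accuracy over scorable trials; responses[i] is True if the user pressed 'match' on trial i."""
    targets = set(nback_targets(seq, n))
    trials = range(n, len(seq))
    if not trials:
        return 0.0
    correct = sum((i in targets) == responses[i] for i in trials)
    return correct / len(trials)


def next_level(n, accuracy, up=0.8, down=0.5):
    """Hypothetical staircase: step n up after a strong round, down (never below 1) after a weak one."""
    if accuracy >= up:
        return n + 1
    if accuracy < down:
        return max(1, n - 1)
    return n
```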


🎓 Education & Experience

🎓 Education

Xidian University

Background in communication and information engineering.

💼 Experience

Mindray · ZEGO · BGI

Medical imaging, real-time audio/video (RTC), and genomic data engineering.


🛠️ Tech Stack

| Category | Technologies |
| --- | --- |
| Languages | |
| AI & HPC | CUDA · Triton · cuBLAS · Tensor Core · WebGPU |
| System & DevOps | |
| Web & Frontend | |

📊 GitHub Stats

(Images: GitHub stats, top languages, contribution streak, and activity graph.)


📫 Connect with me

Feel free to reach out by email for collaboration, technical discussions, or open-source ideas.

Email · GitHub


Pinned

  1. bookmarks-cleaner

    🧹✨ Smart bookmark cleaning and deduplication tool with rule-based filtering and one-click organization of browser bookmarks.

    Python 2

  2. awesome-cursorrules-zh

    💻✨ A collection of Cursor AI coding rules optimized for Chinese-speaking developers

    Python 111 15

  3. meta-human

    🧑‍🚀 A digital human platform integrating 3D modeling, voice interaction, and behavior control

    TypeScript 7 1

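  4. webrtc

    📡 WebRTC signaling and example service repository, focused on hands-on practice with connection negotiation, room management, and real-time communication flows.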

    JavaScript 1

  5. yolo-toys

    🎯 Real-time video-stream object detection project built on FastAPI and YOLOv8

    Python 1