End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
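One of the core torchao recipes these repos build on is int8 weight-only quantization. The sketch below is illustrative only, assuming symmetric per-tensor scaling; the helper names are hypothetical, not torchao's internals.

```python
# Hypothetical sketch of symmetric int8 weight-only quantization, the idea
# behind torchao-style weight compression. Not torchao's actual code.

def int8_weight_only_quantize(weights):
    """Quantize floats to int8: returns (int8 values, float scale)."""
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.003, 1.27]
q, s = int8_weight_only_quantize(w)
w_hat = dequantize(q, s)  # close to w, within one quantization step
```

At inference time the int8 weights are dequantized (or consumed by int8 kernels) on the fly, trading a small accuracy loss for roughly 4x smaller weight storage versus FP32.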
Updated Jan 8, 2026 - Python
A survey of modern quantization formats (e.g., MXFP8, NVFP4) and inference optimization tools (e.g., TorchAO, GemLite), illustrated through the example of Llama-3.1 inference.
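The block-scaling idea behind MX formats like MXFP8 can be sketched in a few lines. This is an illustrative simplification, not code from the survey: per the OCP MX specification, each block of 32 elements shares one power-of-two scale, and 448 is assumed as the element range (the FP8 E4M3 maximum); element rounding to FP8 is omitted here.

```python
import math

BLOCK_SIZE = 32  # MX formats share one scale across blocks of 32 elements

def mx_block_scale(xs, elem_max=448.0):
    """Pick a shared power-of-two scale so the block fits the element range,
    then return (scale, scaled elements). elem_max=448 assumes FP8 E4M3."""
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return 1.0, list(xs)
    scale = 2.0 ** math.ceil(math.log2(amax / elem_max))
    return scale, [x / scale for x in xs]  # |scaled| <= elem_max
```

Because the scale is a power of two, applying it is exact in binary floating point; the only lossy step is the (omitted) rounding of each scaled element to the narrow element format.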
Deploy AI models behind an API using quantization and containerization.
This repository contains code for benchmarking ModernBERT, RoBERTa, and OPT-350m on multi-class emotion classification using 8-bit quantization, backbone freezing, and LoRA-based PEFT.
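The LoRA-based PEFT mentioned above adds a trainable low-rank update to each frozen weight matrix: the effective weight is W + (alpha / r) * B @ A, where A and B have rank r. A minimal pure-Python sketch of the merge step, with hypothetical helper names:

```python
# Illustrative LoRA weight merge: W + (alpha / r) * B @ A. The frozen backbone
# weight W stays untouched during training; only the low-rank factors A and B
# are updated, and they can be merged into W for inference as shown here.

def matmul(a, b):
    """Naive matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(W, A, B, alpha, r):
    """Return the merged weight W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

With backbone freezing, only A and B (a tiny fraction of the parameters) receive gradients, which is what makes 8-bit backbones practical to fine-tune.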