Local OCR powered by LightOnOCR-2-1B, a vision-language model that extracts text from images and PDFs. Runs entirely on-device using MLX (Apple Silicon) or Transformers.
- Single OCR — upload an image or PDF and extract text, with an optional prompt to guide extraction.
- Batch OCR — process multiple images/PDFs at once; produces per-page text files, merged per-document files, and a downloadable `.zip` archive.
- PDF support — pages are rendered at 200 DPI via `pypdfium2` and resized to fit the model input.
- Gradio UI — web interface for both single and batch modes.
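The 200 DPI rendering and fit-to-model resizing reduce to simple geometry. A minimal sketch follows; the function names and the 1540 px cap are illustrative assumptions, not the actual `pdf_utils.py` API:

```python
POINTS_PER_INCH = 72  # PDF user space is defined as 72 points per inch

def render_scale(dpi: int = 200) -> float:
    """Scale factor to hand the PDF renderer for a target DPI."""
    return dpi / POINTS_PER_INCH

def fit_within(width: int, height: int, max_side: int = 1540) -> tuple[int, int]:
    """Shrink (width, height) to fit max_side, preserving aspect ratio.
    The 1540 px cap is an assumption, not the model's documented limit."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    ratio = max_side / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

For example, a US Letter page (612 × 792 pt) renders to 1700 × 2200 px at 200 DPI, which the assumed cap would shrink to 1190 × 1540 px.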
Requirements:

- Python 3.12+
- uv (recommended) or pip
- Apple Silicon Mac (for MLX backend) or CUDA GPU (for Transformers backend)
```bash
uv sync
uv run app/gradio_app.py
```

Opens a web UI with Single OCR and Batch OCR tabs. Backend selection is automatic:

- `mlx` on Apple MPS (Metal)
- `transformers` on CUDA (with CPU fallback when CUDA is unavailable)
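That automatic selection can be sketched as a small probe; the function name and checks here are illustrative, not the app's actual code:

```python
import importlib.util
import platform

def pick_backend() -> str:
    """Prefer 'mlx' on Apple Silicon when the mlx package is installed;
    fall back to 'transformers' everywhere else (CUDA vs. CPU is then
    decided by the Transformers backend itself)."""
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    if on_apple_silicon and importlib.util.find_spec("mlx") is not None:
        return "mlx"
    return "transformers"
```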
Run OCR on `data/quaderno.jpeg` directly from the command line:
```bash
# MLX backend (Apple Silicon)
uv run scripts/run_mlx.py

# Transformers backend (MPS / CUDA / CPU)
uv run scripts/run_transformers.py
```

The project also includes a Modal-ready OCR job worker powered by the same LightOnOCR-2-1B model:
```bash
# run locally against a remote Modal function
modal run scripts/doc_ocr_jobs.py

# deploy the worker app
modal deploy scripts/doc_ocr_jobs.py
```

`scripts/doc_ocr_jobs.py` exposes `parse_document`, which accepts document bytes (PDF or image) and returns OCR output in `markdown`, `html`, `json`, or `chunks` format.
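Because `parse_document` takes raw bytes, the worker must distinguish PDFs from images before rendering. A magic-byte sniff like the following would suffice; this helper is a hypothetical sketch, not the actual worker code:

```python
def sniff_kind(data: bytes) -> str:
    """Classify document bytes as 'pdf' or 'image' via magic numbers."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):       # JPEG
        return "image"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return "image"
    raise ValueError("unsupported document type")
```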
```
app/
  gradio_app.py         # Gradio web UI
  ocr_core.py           # Model loading, single & batch OCR logic
  pdf_utils.py          # PDF rendering and image resizing
scripts/
  run_mlx.py            # Minimal MLX inference script
  run_transformers.py   # Minimal Transformers inference script
  doc_ocr_jobs.py       # Modal OCR worker using LightOnOCR
data/
  quaderno.jpeg         # Example image
```