Local OCR powered by LightOnOCR-2-1B, a vision-language model that extracts text from images and PDFs. Runs entirely on-device using MLX (Apple Silicon) or Transformers.
- Single OCR — upload an image or PDF and extract text, with an optional prompt to guide extraction.
- Batch OCR — process multiple images/PDFs at once; produces per-page text files, merged per-document files, and a downloadable `.zip` archive.
- PDF support — pages are rendered at 200 DPI via `pypdfium2` and resized to fit the model input.
- Gradio UI — web interface for both single and batch modes.
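The 200 DPI rendering and fit-to-model resizing reduce to simple geometry. A minimal sketch follows; the function names and the 1540 px cap are illustrative assumptions, not the actual `pdf_utils.py` API:

```python
POINTS_PER_INCH = 72  # PDF user space is defined as 72 points per inch

def render_scale(dpi: int = 200) -> float:
    """Scale factor to hand the PDF renderer for a target DPI."""
    return dpi / POINTS_PER_INCH

def fit_within(width: int, height: int, max_side: int = 1540) -> tuple[int, int]:
    """Shrink (width, height) to fit max_side, preserving aspect ratio.
    The 1540 px cap is an assumption, not the model's documented limit."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    ratio = max_side / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

For example, a US Letter page (612 × 792 pt) renders to 1700 × 2200 px at 200 DPI, which the assumed cap would shrink to 1190 × 1540 px.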
Requirements:

- Python 3.12+
- uv (recommended) or pip
- Apple Silicon Mac (for MLX backend) or CUDA GPU (for Transformers backend)
```bash
uv sync
uv run app/gradio_app.py
```

Opens a web UI with Single OCR and Batch OCR tabs. Backend selection is automatic:

- `mlx` on Apple MPS (Metal)
- `transformers` on CUDA (with CPU fallback when CUDA is unavailable)
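That automatic selection can be sketched as a small probe; the function name and checks here are illustrative, not the app's actual code:

```python
import importlib.util
import platform

def pick_backend() -> str:
    """Prefer 'mlx' on Apple Silicon when the mlx package is installed;
    fall back to 'transformers' everywhere else (CUDA vs. CPU is then
    decided by the Transformers backend itself)."""
    on_apple_silicon = (
        platform.system() == "Darwin" and platform.machine() == "arm64"
    )
    if on_apple_silicon and importlib.util.find_spec("mlx") is not None:
        return "mlx"
    return "transformers"
```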
Run OCR on `data/quaderno.jpeg` directly from the command line:
```bash
# MLX backend (Apple Silicon)
uv run scripts/run_mlx.py

# Transformers backend (MPS / CUDA / CPU)
uv run scripts/run_transformers.py
```

The project also includes a Modal-ready OCR job worker powered by the same LightOnOCR-2-1B model:
```bash
# run locally against a remote Modal function
modal run scripts/doc_ocr_jobs.py

# deploy the worker app
modal deploy scripts/doc_ocr_jobs.py
```

`scripts/doc_ocr_jobs.py` exposes `parse_document`, which accepts document bytes (PDF or image) and returns OCR output in `markdown`, `html`, `json`, or `chunks` format.
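Because `parse_document` takes raw bytes, the worker must distinguish PDFs from images before rendering. A magic-byte sniff like the following would suffice; this helper is a hypothetical sketch, not the actual worker code:

```python
def sniff_kind(data: bytes) -> str:
    """Classify document bytes as 'pdf' or 'image' via magic numbers."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):       # JPEG
        return "image"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG
        return "image"
    raise ValueError("unsupported document type")
```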
```
app/
  gradio_app.py         # Gradio web UI
  ocr_core.py           # Model loading, single & batch OCR logic
  pdf_utils.py          # PDF rendering and image resizing
scripts/
  run_mlx.py            # Minimal MLX inference script
  run_transformers.py   # Minimal Transformers inference script
  doc_ocr_jobs.py       # Modal OCR worker using LightOnOCR
data/
  quaderno.jpeg         # Example image
```