
Simple OCR implementation in Python with a web UI


tatop/ocr


OCR

Local OCR powered by LightOnOCR-2-1B, a vision-language model that extracts text from images and PDFs. Runs entirely on-device using MLX (Apple Silicon) or Transformers.

Features

  • Single OCR — upload an image or PDF and extract text, with an optional prompt to guide extraction.
  • Batch OCR — process multiple images/PDFs at once; produces per-page text files, merged per-document files, and a downloadable .zip archive.
  • PDF support — pages are rendered at 200 DPI via pypdfium2 and resized to fit model input.
  • Gradio UI — web interface for both single and batch modes.
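
The Batch OCR output layout described above (per-page text files, one merged file per document, and a .zip archive) could be produced along these lines. This is a minimal sketch, not the code in app/ocr_core.py: the function name `write_batch_outputs`, the file-naming scheme, and the archive name `ocr_results.zip` are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def write_batch_outputs(results: dict[str, list[str]], out_dir: Path) -> Path:
    """Write per-page files, a merged file per document, and a zip of everything.

    `results` maps a document name (e.g. "scan.pdf") to its OCR text,
    one string per page. File names below are hypothetical.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []

    for doc_name, pages in results.items():
        stem = Path(doc_name).stem
        # One text file per page, numbered from 1.
        for i, text in enumerate(pages, start=1):
            page_path = out_dir / f"{stem}_page_{i:03d}.txt"
            page_path.write_text(text, encoding="utf-8")
            written.append(page_path)
        # One merged file per document, pages separated by a blank line.
        merged_path = out_dir / f"{stem}.txt"
        merged_path.write_text("\n\n".join(pages), encoding="utf-8")
        written.append(merged_path)

    # Bundle every written file into a single downloadable archive.
    archive_path = out_dir / "ocr_results.zip"
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in written:
            zf.write(path, arcname=path.name)
    return archive_path
```

For a two-page `scan.pdf`, this sketch writes `scan_page_001.txt`, `scan_page_002.txt`, and `scan.txt`, then adds all three to the archive.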

Requirements

  • Python 3.12+
  • uv (recommended) or pip
  • Apple Silicon Mac (for MLX backend) or CUDA GPU (for Transformers backend; it falls back to CPU when CUDA is unavailable)

Setup

uv sync

Usage

Gradio app (auto backend)

uv run app/gradio_app.py

Opens a web UI with Single OCR and Batch OCR tabs. Backend selection is automatic:

  • mlx on Apple MPS (Metal)
  • transformers on CUDA (and CPU fallback when CUDA is unavailable)
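
The selection rules above can be sketched as a small pure function. This is an illustration of the stated rules, not the logic in app/gradio_app.py: the names `choose_backend` and `detect_backend` are hypothetical, and treating "Darwin on arm64" as the test for Apple Silicon is an assumption.

```python
import platform


def choose_backend(system: str, machine: str, cuda_available: bool) -> tuple[str, str]:
    """Return (backend, device) following the rules listed above."""
    # Assumption: Apple Silicon is detected as macOS ("Darwin") on arm64.
    if system == "Darwin" and machine == "arm64":
        return ("mlx", "metal")
    if cuda_available:
        return ("transformers", "cuda")
    # CPU fallback when CUDA is unavailable.
    return ("transformers", "cpu")


def detect_backend() -> tuple[str, str]:
    """Probe the current machine and apply choose_backend."""
    try:
        import torch  # imported lazily so the probe works without PyTorch installed

        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False
    return choose_backend(platform.system(), platform.machine(), cuda_available)
```

Keeping the decision separate from the hardware probe makes the rules easy to check on any machine.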

CLI scripts

Run OCR on data/quaderno.jpeg directly from the command line:

# MLX backend (Apple Silicon)
uv run scripts/run_mlx.py

# Transformers backend (MPS / CUDA / CPU)
uv run scripts/run_transformers.py

Modal job queue script

The project also includes a Modal-ready OCR job worker powered by the same LightOnOCR-2-1B model:

# run locally against a remote Modal function
modal run scripts/doc_ocr_jobs.py

# deploy the worker app
modal deploy scripts/doc_ocr_jobs.py

scripts/doc_ocr_jobs.py exposes parse_document, which accepts document bytes (PDF or image) and returns OCR output in markdown, html, json, or chunks format.
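
Once the worker is deployed, parse_document could be called from another Python process roughly as follows. This is a hedged sketch: the app name `doc-ocr-jobs`, the helper names, and the argument order passed to `parse_document` are assumptions, so check scripts/doc_ocr_jobs.py for the actual app name and signature.

```python
ALLOWED_FORMATS = ("markdown", "html", "json", "chunks")


def validate_output_format(output_format: str) -> str:
    """Normalize and check the requested output format before a remote call."""
    normalized = output_format.strip().lower()
    if normalized not in ALLOWED_FORMATS:
        raise ValueError(
            f"Unsupported format {output_format!r}; expected one of {ALLOWED_FORMATS}"
        )
    return normalized


def parse_remote(
    document: bytes,
    output_format: str = "markdown",
    app_name: str = "doc-ocr-jobs",  # hypothetical app name
):
    """Look up the deployed function by name and run it on Modal."""
    import modal  # imported lazily; requires the modal package and credentials

    fmt = validate_output_format(output_format)
    parse_document = modal.Function.from_name(app_name, "parse_document")
    # Assumption: parse_document takes (document_bytes, output_format).
    return parse_document.remote(document, fmt)
```

A call might look like `parse_remote(Path("data/quaderno.jpeg").read_bytes(), "json")`, with `Path` imported from `pathlib`.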

Project structure

app/
  gradio_app.py   # Gradio web UI
  ocr_core.py     # Model loading, single & batch OCR logic
  pdf_utils.py    # PDF rendering and image resizing
scripts/
  run_mlx.py      # Minimal MLX inference script
  run_transformers.py  # Minimal Transformers inference script
  doc_ocr_jobs.py # Modal OCR worker using LightOnOCR
data/
  quaderno.jpeg   # Example image
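
As an illustration of what pdf_utils.py is described as doing (render at 200 DPI with pypdfium2, then resize to fit the model input), here is a minimal sketch. It is not the repository's implementation: the function names are hypothetical, and the `MAX_SIDE` limit of 1540 pixels is an assumed value chosen for the example.

```python
from pathlib import Path

TARGET_DPI = 200
MAX_SIDE = 1540  # assumed longest-side limit in pixels; not taken from the repository


def dpi_to_scale(dpi: float) -> float:
    """PDF units are 72 points per inch; pypdfium2 renders at a scale relative to that."""
    return dpi / 72


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Shrink (width, height) so the longest side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    ratio = max_side / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))


def render_pdf_pages(pdf_path: Path, dpi: float = TARGET_DPI, max_side: int = MAX_SIDE):
    """Render every page of a PDF to a PIL image, resized to fit max_side."""
    import pypdfium2 as pdfium  # imported lazily; requires pypdfium2 and Pillow

    pdf = pdfium.PdfDocument(str(pdf_path))
    images = []
    for index in range(len(pdf)):
        page = pdf[index]
        # convert() returns an independent RGB copy of the rendered bitmap.
        image = page.render(scale=dpi_to_scale(dpi)).to_pil().convert("RGB")
        page.close()
        size = fit_within(image.width, image.height, max_side)
        if size != image.size:
            image = image.resize(size)
        images.append(image)
    pdf.close()
    return images
```

Under these assumptions, a US Letter page rendered at 200 DPI is 1700 × 2200 pixels and is scaled down to 1190 × 1540.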
