feat(ocr): tree-like structure with text levels#44

Merged
martsokha merged 8 commits into main from feature/tree-ocr
Mar 9, 2026

@martsokha
Member

No description provided.

martsokha and others added 7 commits March 8, 2026 07:51
…/Word tree

Preserves hierarchical structure from each provider's API instead of
discarding it during conversion. Adds BoundingBox::enclosing() to
nvisy-core and updates all six provider backends to build the tree.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
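
The PR does not show the signature of BoundingBox::enclosing(), so the following is only a minimal sketch of what such a helper might look like. The field names (x, y, width, height), the f32 coordinate type, and the Option return are assumptions made for illustration, not the nvisy-core API. The idea is that a parent node in the tree can derive its box from the boxes of its children.

```rust
/// Hypothetical axis-aligned box; the real nvisy-core type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BoundingBox {
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
}

impl BoundingBox {
    /// Returns the smallest box that contains every input box,
    /// or `None` when the input is empty.
    pub fn enclosing<I>(boxes: I) -> Option<BoundingBox>
    where
        I: IntoIterator<Item = BoundingBox>,
    {
        let mut iter = boxes.into_iter();
        let first = iter.next()?;
        let (mut min_x, mut min_y) = (first.x, first.y);
        let (mut max_x, mut max_y) = (first.x + first.width, first.y + first.height);
        // Grow the extents to cover each remaining box.
        for b in iter {
            min_x = min_x.min(b.x);
            min_y = min_y.min(b.y);
            max_x = max_x.max(b.x + b.width);
            max_y = max_y.max(b.y + b.height);
        }
        Some(BoundingBox {
            x: min_x,
            y: min_y,
            width: max_x - min_x,
            height: max_y - min_y,
        })
    }
}
```

Under these assumptions, a line's box would be computed by passing the boxes of its words, and an empty child list would yield None rather than a degenerate zero-size box.
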
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete the mindee_doctr backend, params, and module. Update OcrProvider
enum, doc examples, and README to reflect five remaining providers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also renames to_u32() to to_pixel() for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the upstream `page` and `image_bbox` fields instead of deriving
page number from enumeration index and leaving dimensions empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
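
As a rough illustration of the change described above, the sketch below reads the page number and dimensions from upstream fields instead of using the enumeration index. The commit names only the `page` and `image_bbox` fields; the UpstreamPage and PageInfo types, and the [x0, y0, x1, y1] layout of `image_bbox`, are assumptions made for this example.

```rust
/// Hypothetical shape of one page in the provider's response.
pub struct UpstreamPage {
    pub page: u32,
    /// Assumed layout: [x0, y0, x1, y1] in pixels.
    pub image_bbox: [f32; 4],
}

/// Hypothetical converted page metadata.
#[derive(Debug, PartialEq)]
pub struct PageInfo {
    pub number: u32,
    pub width: u32,
    pub height: u32,
}

/// Converts upstream pages, trusting the upstream `page` value rather than
/// the position of the page in the slice.
pub fn convert_pages(pages: &[UpstreamPage]) -> Vec<PageInfo> {
    pages
        .iter()
        .map(|p| {
            let [x0, y0, x1, y1] = p.image_bbox;
            PageInfo {
                number: p.page,
                width: (x1 - x0).round() as u32,
                height: (y1 - y0).round() as u32,
            }
        })
        .collect()
}
```

This matters when a response covers a non-contiguous subset of pages, where an enumeration index would report the wrong page number.
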
Replace the flat DocumentType enum with nested format-specific enums:
ImageFormat, WordFormat, PresentationFormat, SpreadsheetFormat,
AudioFormat, and TextFormat. Pdf and Html remain standalone variants.

Each sub-enum owns its own from_mime/mime_type methods, keeping
DocumentType::from_mime as a concise chain of delegates. Remove
Archive from ContentKind. Unify nvisy-ocr's ImageFormat with the
new core ImageFormat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
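
The nested layout can be sketched as follows. Only two of the sub-enums are shown, and their variants (Png, Jpeg, Plain, Markdown) are assumptions chosen for illustration; the PR does not list the actual variants. The point is that each sub-enum resolves its own MIME types and DocumentType::from_mime only chains the delegates.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ImageFormat {
    Png,
    Jpeg,
}

impl ImageFormat {
    pub fn from_mime(mime: &str) -> Option<Self> {
        match mime {
            "image/png" => Some(Self::Png),
            "image/jpeg" => Some(Self::Jpeg),
            _ => None,
        }
    }

    pub fn mime_type(self) -> &'static str {
        match self {
            Self::Png => "image/png",
            Self::Jpeg => "image/jpeg",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TextFormat {
    Plain,
    Markdown,
}

impl TextFormat {
    pub fn from_mime(mime: &str) -> Option<Self> {
        match mime {
            "text/plain" => Some(Self::Plain),
            "text/markdown" => Some(Self::Markdown),
            _ => None,
        }
    }

    pub fn mime_type(self) -> &'static str {
        match self {
            Self::Plain => "text/plain",
            Self::Markdown => "text/markdown",
        }
    }
}

/// Pdf and Html stay standalone; other kinds wrap a format-specific enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentType {
    Pdf,
    Html,
    Image(ImageFormat),
    Text(TextFormat),
}

impl DocumentType {
    pub fn from_mime(mime: &str) -> Option<Self> {
        match mime {
            "application/pdf" => return Some(Self::Pdf),
            "text/html" => return Some(Self::Html),
            _ => {}
        }
        // Delegate to each sub-enum in turn; the first match wins.
        ImageFormat::from_mime(mime)
            .map(Self::Image)
            .or_else(|| TextFormat::from_mime(mime).map(Self::Text))
    }

    pub fn mime_type(self) -> &'static str {
        match self {
            Self::Pdf => "application/pdf",
            Self::Html => "text/html",
            Self::Image(f) => f.mime_type(),
            Self::Text(f) => f.mime_type(),
        }
    }
}
```

With this shape, adding a format touches only its sub-enum, and a crate that handles only images can accept ImageFormat directly instead of matching on the full DocumentType.
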
Remove turbofish type parameters from Client::builder() calls to
match the new builder API where the HTTP client type is set via
.http_client() rather than as a generic parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
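
The commit does not name the crate that provides Client::builder(), so the sketch below uses a stand-in Client defined locally purely to illustrate the pattern. All types here (DefaultHttp, CustomHttp, ClientBuilder) and the api_key method are hypothetical. It shows how a builder can start with a default HTTP client type and switch types through .http_client(), so callers no longer write a turbofish on builder().

```rust
/// Hypothetical default transport.
#[derive(Debug, PartialEq)]
pub struct DefaultHttp;

/// Hypothetical custom transport.
#[derive(Debug, PartialEq)]
pub struct CustomHttp {
    pub timeout_secs: u64,
}

pub struct Client<H> {
    pub api_key: String,
    pub http: H,
}

pub struct ClientBuilder<H> {
    api_key: String,
    http: H,
}

impl Client<DefaultHttp> {
    /// No generic parameter is needed here: the builder starts with the
    /// default transport, so `Client::builder()` infers `H = DefaultHttp`.
    pub fn builder() -> ClientBuilder<DefaultHttp> {
        ClientBuilder {
            api_key: String::new(),
            http: DefaultHttp,
        }
    }
}

impl<H> ClientBuilder<H> {
    pub fn api_key(mut self, key: &str) -> Self {
        self.api_key = key.to_string();
        self
    }

    /// Swapping the transport changes the builder's type parameter.
    pub fn http_client<N>(self, http: N) -> ClientBuilder<N> {
        ClientBuilder {
            api_key: self.api_key,
            http,
        }
    }

    pub fn build(self) -> Client<H> {
        Client {
            api_key: self.api_key,
            http: self.http,
        }
    }
}
```

In this sketch, a call such as Client::builder().api_key("k").http_client(CustomHttp { timeout_secs: 30 }).build() yields a Client<CustomHttp> with the type inferred from the argument.
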
@martsokha martsokha self-assigned this Mar 9, 2026
@martsokha martsokha changed the title from "Feature/tree ocr" to "feat(ocr): tree-like structure with text levels" Mar 9, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@martsokha martsokha merged commit 4d868e9 into main Mar 9, 2026
5 checks passed
@martsokha martsokha deleted the feature/tree-ocr branch March 9, 2026 14:51

1 participant