Conversation

@sneakybatman (Contributor)

Summary

This PR adds support for configurable word-level confidence score aggregation methods in text recognition models. Previously, models used either arithmetic mean or minimum for aggregating character-level confidence scores into word-level confidence, with no way for users to customize this behavior.

Motivation

Different use cases may require different confidence aggregation strategies (a numeric sketch comparing them follows this list):

  • Arithmetic mean: Good general-purpose default, balances all character confidences
  • Geometric mean: More sensitive to low-confidence characters, useful when any low-confidence character should significantly lower the word score
  • Harmonic mean: Even more conservative, heavily penalizes low-confidence characters
  • Minimum: Most conservative approach, the word confidence equals the weakest character's (good for high-precision requirements)
  • Maximum: Most optimistic, useful when you want the best-case confidence
  • Custom callable: Full flexibility for specialized use cases
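
To make these differences concrete, here is a purely illustrative NumPy comparison on made-up character confidences (independent of the utility added in this PR):

import numpy as np

char_probs = np.array([0.99, 0.95, 0.40])  # one weak character in the word

arithmetic = char_probs.mean()                          # ~0.78
geometric = np.exp(np.log(char_probs).mean())           # ~0.72
harmonic = char_probs.size / (1.0 / char_probs).sum()   # ~0.66
minimum = char_probs.min()                              # 0.40
maximum = char_probs.max()                              # 0.99
# The more conservative the method, the harder the single weak character
# pulls the word-level confidence down.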

Changes

  • Add aggregate_confidence() utility function in core.py with support for 5 built-in methods plus custom callables (a hedged sketch of such a helper follows this list)
  • Add ConfidenceAggregation type alias for type hints
  • Add confidence_aggregation parameter to RecognitionPostProcessor base class
  • Update all PyTorch PostProcessors: PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR
  • Update all TensorFlow PostProcessors: PARSeq, ViTSTR, SAR, MASTER
  • Update remap_preds() for split crop handling to use configurable aggregation
  • Add comprehensive unit tests (20 new test cases)
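
As a rough sketch of what such a helper could look like (the actual signature and behaviour in core.py may differ; the names and parameter order below simply mirror the description above):

from typing import Callable, Union

import numpy as np

# Assumed shape of the ConfidenceAggregation alias described above
ConfidenceAggregation = Union[str, Callable[[np.ndarray], float]]


def aggregate_confidence(probs: np.ndarray, method: ConfidenceAggregation = "mean") -> float:
    """Aggregate character-level confidences into one word-level score (sketch)."""
    if callable(method):
        return float(method(probs))
    if probs.size == 0:
        return 0.0
    clipped = np.clip(probs, 1e-12, None)  # guard log/division against zeros
    if method == "mean":
        return float(probs.mean())
    if method == "geometric_mean":
        return float(np.exp(np.log(clipped).mean()))
    if method == "harmonic_mean":
        return float(probs.size / (1.0 / clipped).sum())
    if method == "min":
        return float(probs.min())
    if method == "max":
        return float(probs.max())
    raise ValueError(f"Unsupported aggregation method: {method}")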

Usage Example

from doctr.models import recognition

# Use default aggregation (model-specific)
model = recognition.parseq(pretrained=True)

# Or customize at the PostProcessor level
from doctr.models.recognition.parseq.pytorch import PARSeqPostProcessor

# Use geometric mean for more conservative confidence scores
processor = PARSeqPostProcessor(vocab, confidence_aggregation="geometric_mean")

# Use custom aggregation function
import numpy as np
processor = PARSeqPostProcessor(vocab, confidence_aggregation=lambda probs: np.percentile(probs, 25))

Test plan

  • All existing tests pass
  • New unit tests for aggregate_confidence() function cover all 5 methods
  • Tests verify correct handling of edge cases (empty arrays, single values, zeros); see the illustrative test sketch after this list
  • Tests verify custom callable support
  • PyTorch postprocessor tests updated and passing
  • TensorFlow postprocessor tests updated and passing
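
For illustration only, an edge-case test in this spirit might look like the following sketch (the import path and the exact fallback behaviour for empty input are assumptions, not copied from the PR's test suite):

import numpy as np
import pytest

# Assumed import path, based on "aggregate_confidence() utility function in core.py";
# adjust to the actual module location.
from doctr.models.recognition.core import aggregate_confidence


def test_aggregate_confidence_edge_cases():
    # A single character: every built-in method should just return that value.
    single = np.array([0.7])
    for method in ("mean", "geometric_mean", "harmonic_mean", "min", "max"):
        assert aggregate_confidence(single, method) == pytest.approx(0.7)

    # A zero confidence must not break the geometric/harmonic means (no -inf or div-by-zero).
    with_zero = np.array([0.0, 0.9])
    assert 0.0 <= aggregate_confidence(with_zero, "geometric_mean") <= 0.9

    # Custom callables receive the raw probabilities.
    assert aggregate_confidence(np.array([0.2, 0.8]), lambda p: float(np.median(p))) == pytest.approx(0.5)

    # Empty input should be handled gracefully rather than raising
    # (the exact fallback value depends on the implementation).
    aggregate_confidence(np.array([]), "mean")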

Commit message

Add support for configurable word-level confidence score aggregation
methods in text recognition models. Users can now choose how to
aggregate character-level confidence scores into word-level confidence.

Supported aggregation methods:
- "mean": Arithmetic mean (default for transformer models)
- "geometric_mean": Geometric mean (sensitive to low values)
- "harmonic_mean": Harmonic mean (even more sensitive to low values)
- "min": Minimum confidence (most conservative, default for CTC/attention models)
- "max": Maximum confidence (most optimistic)
- Custom callable: User-defined aggregation function

Changes:
- Add `aggregate_confidence()` utility function in core.py
- Add `confidence_aggregation` parameter to RecognitionPostProcessor
- Update all PyTorch PostProcessors (PARSeq, ViTSTR, CRNN, SAR, MASTER, VIPTR)
- Update all TensorFlow PostProcessors (PARSeq, ViTSTR, SAR, MASTER)
- Update `remap_preds()` for split crop handling
- Add comprehensive unit tests for aggregation methods
- Maintain backward compatibility with sensible defaults per model type (illustrated in the sketch below)
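
To make the backward-compatibility point concrete, a hedged illustration (PARSeqPostProcessor, confidence_aggregation, and the per-model defaults come from this PR's description; the toy vocab string is made up):

from doctr.models.recognition.parseq.pytorch import PARSeqPostProcessor

vocab = "0123456789abcdefghijklmnopqrstuvwxyz"  # toy vocab for illustration

# Omitting confidence_aggregation keeps the model-specific default ("mean" for
# transformer models such as PARSeq), so existing code behaves exactly as before.
default_processor = PARSeqPostProcessor(vocab)

# Opting in to a different strategy is a one-argument change.
conservative_processor = PARSeqPostProcessor(vocab, confidence_aggregation="min")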

@sneakybatman (Contributor, Author)

@felixdittrich92 anything else needed for this PR to be approved?

@felixdittrich92 (Collaborator)

> @felixdittrich92 anything else needed for this PR to be approved?

Hi @sneakybatman 👋,

Excuse the late reply.
I had some delays last year, but I will be back soon and will then review your PR as well.
Thanks in advance for opening this and for working on doctr 👍

@felixdittrich92 felixdittrich92 self-assigned this Feb 12, 2026
@felixdittrich92 felixdittrich92 added this to the 1.1.0 milestone Feb 12, 2026
@felixdittrich92 felixdittrich92 added the topic: documentation, module: models, ext: tests, topic: text recognition, type: new feature, and ext: docs labels Feb 12, 2026
@felixdittrich92 (Collaborator) left a comment


Hi @sneakybatman 👋

First of all, thanks a lot for the PR!
Could you please rebase your branch onto the main branch?
