High memory usage when reading a specific PDF #1268

@arnestockmans-itp

Describe the bug

When extracting text from PDFs, I see very high memory usage for one specific file. This may indicate a memory leak in the library.

To Reproduce

Code to reproduce the issue:

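    import com.lowagie.text.pdf.PdfReader
    import com.lowagie.text.pdf.parser.PdfTextExtractor

    val reader = PdfReader(contents)
    val extractor = PdfTextExtractor(reader)

    // Page numbers are 1-based
    for (i in 1..reader.numberOfPages) {
        val text = extractor.getTextFromPage(i)
        onPageParsed(text)
    }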

On reaching page 4 of the attached PDF, memory usage climbs to ~17 GB. With other PDFs it is far lower.
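
To narrow down which page triggers the growth, heap usage could be sampled around each extraction call. This is a minimal sketch that assumes a JVM runtime. `usedHeapBytes` and `measureHeapDelta` are hypothetical helper names, not part of OpenPDF, and the numbers are only approximate because garbage collection can run at any time:

```kotlin
// Hypothetical helpers for illustration; not part of OpenPDF.

// Approximate heap currently in use by the JVM, in bytes.
fun usedHeapBytes(): Long {
    val runtime = Runtime.getRuntime()
    return runtime.totalMemory() - runtime.freeMemory()
}

// Runs the block and returns its result with the approximate change in used heap.
// The delta can be negative if a garbage collection happens while the block runs.
fun <T> measureHeapDelta(block: () -> T): Pair<T, Long> {
    val before = usedHeapBytes()
    val result = block()
    return result to (usedHeapBytes() - before)
}

fun main() {
    // Stand-in workload; in the reproduction above this would be
    // extractor.getTextFromPage(i) inside the page loop.
    val (text, delta) = measureHeapDelta { "page text".repeat(1_000) }
    println("chars=${text.length} heapDelta=${delta / (1024 * 1024)} MiB")
}
```

Wrapping `extractor.getTextFromPage(i)` in `measureHeapDelta` and logging the delta per page would show whether the spike is confined to page 4 or builds up across pages.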

Expected behavior

Text extraction should not use this much memory; usage should be comparable to other PDFs.

System

  • OS: macOS, Google Cloud Run
  • OpenPDF version: 2.0.3

Your real name

Arne Stockmans

Additional context

0a715c43-5b76-4bc7-9d91-38ebae902f97_paper.pdf
