[Bug]: chardet.detect() blocks asyncio event loop in _handle_http #1751

@coderJASFK

Description

crawl4ai version

0.7.8 (also confirmed in main branch / 0.8.0)

Expected Behavior

chardet.detect() in _handle_http should not block the asyncio event loop.

chardet.detect() is a CPU-bound synchronous call that can take several seconds on large pages, so it should be wrapped in await asyncio.to_thread() to keep the event loop free, similar to how PDF processing already uses asyncio.to_thread in the codebase.

Suggested fix in async_crawler_strategy.py line 2451:

Before (blocking):

encoding = chardet.detect(content.tobytes())['encoding'] or 'utf-8'

After (non-blocking):

detected = await asyncio.to_thread(chardet.detect, content.tobytes())
encoding = detected['encoding'] or 'utf-8'
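
To illustrate why the offloaded form helps, here is a minimal, self-contained sketch that measures event-loop lag with and without asyncio.to_thread. It does not use chardet or crawl4ai: slow_detect is a hypothetical stand-in that simply sleeps, and measure_max_lag is an illustrative helper, not part of either library. One caveat: a real pure-Python detector running in a worker thread still competes for the GIL, so the loop would stay responsive but the work would not run fully in parallel.

```python
import asyncio
import time


def slow_detect(data: bytes) -> dict:
    """Hypothetical stand-in for chardet.detect: blocks the calling thread."""
    time.sleep(0.3)  # simulates a slow, synchronous detection pass
    return {"encoding": "utf-8" if data else None}


async def measure_max_lag(offload: bool, payload: bytes = b"<html></html>") -> float:
    """Return the worst event-loop lag (in seconds) seen while detection runs."""
    max_lag = 0.0
    done = asyncio.Event()
    interval = 0.01

    async def heartbeat() -> None:
        # Sleeps in short steps and records how late each wake-up is.
        nonlocal max_lag
        while not done.is_set():
            start = time.perf_counter()
            await asyncio.sleep(interval)
            max_lag = max(max_lag, time.perf_counter() - start - interval)

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # give the heartbeat a chance to start

    if offload:
        # Non-blocking: detection runs in a worker thread.
        detected = await asyncio.to_thread(slow_detect, payload)
    else:
        # Blocking: detection runs directly on the event-loop thread.
        detected = slow_detect(payload)
    encoding = detected["encoding"] or "utf-8"
    assert encoding == "utf-8"

    done.set()
    await task
    return max_lag


if __name__ == "__main__":
    print("blocking lag:  ", asyncio.run(measure_max_lag(offload=False)))
    print("offloaded lag: ", asyncio.run(measure_max_lag(offload=True)))
```

In the blocking case the heartbeat wakes up roughly as late as the stand-in takes to return; in the offloaded case the lag should stay close to zero.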

Current Behavior

When crawling pages with the HTTP strategy, _handle_http calls chardet.detect(content.tobytes()) synchronously on the event loop (line 2451 in async_crawler_strategy.py). For large pages, this blocks the event loop for multiple seconds, delaying all concurrent async tasks.

We detected this using an event loop watchdog thread that captures the blocking call stack in real-time. Here is the stack trace:

Event loop lag detected: >0.500s

[BLOCKED STACK TRACE of event-loop thread]:
  File "crawl4ai/async_crawler_strategy.py", line 2451, in _handle_http
    encoding = chardet.detect(content.tobytes())['encoding'] or 'utf-8'
  File "chardet/__init__.py", line 49, in detect
    detector.feed(byte_str)
  File "chardet/universaldetector.py", line 274, in feed
    if prober.feed(byte_str) == ProbingState.FOUND_IT:
  File "chardet/charsetgroupprober.py", line 70, in feed
    state = prober.feed(byte_str)
  File "chardet/sbcharsetprober.py", line 122, in feed
    self._last_order = order

This was observed repeatedly (8 consecutive lag warnings in a single page crawl), indicating the chardet detection blocked the event loop for ~8 seconds total.
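
The watchdog itself is not included in this report. For readers who want to reproduce the measurement, the following is a rough sketch of one way such a thread could work, assuming CPython (it relies on sys._current_frames()). LoopWatchdog, blocking_call, and demo are hypothetical names, and the log line only mimics the format shown above.

```python
import asyncio
import sys
import threading
import time
import traceback


class LoopWatchdog:
    """Hypothetical watchdog: reports when the event loop stops responding."""

    def __init__(self, loop, threshold=0.5, interval=0.1):
        self.loop = loop
        self.threshold = threshold  # seconds without a reply before reporting
        self.interval = interval    # pause between probes
        # Must be constructed on the event-loop thread so this id is the loop's.
        self.loop_thread_id = threading.get_ident()
        self.reports = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Note: join() may block the caller for up to `threshold` seconds.
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            answered = threading.Event()
            try:
                # A healthy loop runs this trivial callback almost immediately.
                self.loop.call_soon_threadsafe(answered.set)
            except RuntimeError:
                return  # loop already closed
            if not answered.wait(self.threshold) and not self._stop.is_set():
                # No reply in time: capture what the loop thread is executing.
                frame = sys._current_frames().get(self.loop_thread_id)
                stack = "".join(traceback.format_stack(frame)) if frame else ""
                self.reports.append(stack)
                print(f"Event loop lag detected: >{self.threshold:.3f}s\n{stack}",
                      file=sys.stderr)
            self._stop.wait(self.interval)


def blocking_call():
    time.sleep(0.4)  # stands in for a slow synchronous call on the loop


async def demo():
    watchdog = LoopWatchdog(asyncio.get_running_loop(), threshold=0.1, interval=0.02)
    watchdog.start()
    await asyncio.sleep(0.05)
    blocking_call()  # blocks the loop; the watchdog should report it
    await asyncio.sleep(0.05)
    watchdog.stop()
    return watchdog.reports


if __name__ == "__main__":
    print(f"{len(asyncio.run(demo()))} lag report(s) captured")
```

Because the probe repeats while the loop stays blocked, a single long blocking call can produce several consecutive warnings, as in the report above.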

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Linux

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Labels: 🐞 Bug (Something isn't working), 🩺 Needs Triage (Needs attention of maintainers)
