
feat(browser): Add CDP Headers Support for Remote Browser Integration #1417

Open
NickMandylas wants to merge 1 commit into unclecode:main from NickMandylas:main

NickMandylas commented Aug 20, 2025

Summary

This update adds support for CDP headers in the browser configuration, enabling integration with remote browser services that require authentication headers. One example use case is Amazon Bedrock AgentCore Browser: with this enhancement, Crawl4AI can connect seamlessly to managed browser instances in AWS environments.

The change introduces a new cdp_headers parameter to the BrowserConfig class that allows passing authentication headers when connecting to remote browsers via CDP, expanding Crawl4AI's compatibility with cloud-based browser services.

List of files changed and why

Core Configuration Changes:

  • crawl4ai/async_configs.py - Added cdp_headers parameter to BrowserConfig class constructor, initialization, from_kwargs method, and to_dict method to support authentication headers for CDP connections
  • crawl4ai/browser_manager.py - Modified the CDP connection logic to pass the cdp_headers to Playwright's connect_over_cdp method

Documentation and Examples:

  • docs/examples/amazon_bedrock_agentcore_browser_integration.py - New example demonstrating integration with Amazon Bedrock AgentCore Browser using the new CDP headers functionality
  • docs/md_v2/core/examples.md - Updated examples documentation to include the new Amazon Bedrock integration example

Testing:

  • tests/browser/test_cdp_headers.py - Comprehensive test suite for the CDP headers functionality, including integration tests with Amazon Bedrock AgentCore Browser

Dependencies:

  • pyproject.toml - Added new optional dependency group aws = ["bedrock-agentcore"] to support Amazon Bedrock integration

How Has This Been Tested?

I've tested this integration with Amazon Bedrock AgentCore Browser, which uses CDP headers to authorise Crawl4AI's connection.

The test is available in tests/browser/test_cdp_headers.py.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added/updated unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Summary by CodeRabbit

  • New Features

    • Support custom headers for remote browser (CDP) connections via browser settings, improving compatibility with managed remote browsers (e.g., Amazon Bedrock AgentCore).
  • Documentation

    • Added example demonstrating integration with Amazon Bedrock AgentCore Browser for remote sessions.
    • Updated examples directory and docs with a link to the new sample.
  • Tests

    • Added functional test covering CDP connections using authentication headers.
  • Chores

    • Introduced optional aws dependency group to simplify installing the Bedrock AgentCore integration.

coderabbitai bot commented Aug 20, 2025

Walkthrough

Adds CDP headers support to BrowserConfig and propagates it through serialization and BrowserManager’s CDP connection. Introduces an example integrating Amazon Bedrock AgentCore Browser, updates docs to reference it, adds an optional aws dependency group, and adds a functional test exercising CDP headers with a remote Playwright session.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Browser config: CDP headers support (crawl4ai/async_configs.py) | Adds cdp_headers parameter and attribute to BrowserConfig; updates from_kwargs and to_dict to handle cdp_headers. |
| Browser manager: pass headers to CDP (crawl4ai/browser_manager.py) | start now calls connect_over_cdp(cdp_url, headers=self.config.cdp_headers). |
| Docs: Bedrock AgentCore example (docs/examples/amazon_bedrock_agentcore_browser_integration.py, docs/md_v2/core/examples.md) | Adds new example demonstrating remote Playwright via Bedrock AgentCore and links it in the examples table. |
| Packaging: optional deps (pyproject.toml) | Adds aws optional dependency group with bedrock-agentcore. |
| Tests: CDP headers via AWS (tests/browser/test_cdp_headers.py) | New async test using BrowserClient to obtain CDP URL and headers; runs AsyncWebCrawler with BrowserConfig(cdp_url, cdp_headers). |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User Code
  participant AM as AsyncWebCrawler
  participant BM as BrowserManager
  participant PW as Playwright (CDP)
  participant RB as Remote Browser (Bedrock AgentCore)

  U->>AM: arun(url, config with cdp_url + cdp_headers)
  AM->>BM: start(config)
  BM->>PW: connect_over_cdp(cdp_url, headers=cdp_headers)
  PW-->>RB: WebSocket handshake + headers
  RB-->>PW: Session established
  BM-->>AM: Browser ready
  AM->>RB: Navigate and extract
  RB-->>AM: Page content/markdown
  AM-->>U: Result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at headers neat,
Hop-hop, I pass them, quick and fleet.
A warren of websockets spun in air,
Bedrock winds beneath my hare.
Chromium burrows, markdown gleams—
I nibble bytes and chase new streams. 🥕✨

NickMandylas changed the title from "feat(browser): Add" to "feat(browser): Add CDP Headers Support for Remote Browser Integration" on Aug 20, 2025

coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crawl4ai/async_configs.py (1)

562-596: Redact sensitive headers in to_dict to prevent secret leakage

Serializing BrowserConfig currently emits cdp_headers verbatim. If configs are logged or persisted, Authorization and similar secrets leak.

Apply this diff to mask common sensitive headers on serialization:

     def to_dict(self):
-        result = {
+        # Redact sensitive header values to avoid leaking secrets in logs/telemetry.
+        def _redact_headers(d: dict) -> dict:
+            if not isinstance(d, dict):
+                return d
+            sensitive = {
+                "authorization",
+                "proxy-authorization",
+                "x-api-key",
+                "x-amzn-authorization",
+                "x-aws-ec2-metadata-token",
+            }
+            redacted = {}
+            for k, v in d.items():
+                redacted[k] = "***" if isinstance(k, str) and k.lower() in sensitive else v
+            return redacted
+
+        result = {
             "browser_type": self.browser_type,
             "headless": self.headless,
             "browser_mode": self.browser_mode,
             "use_managed_browser": self.use_managed_browser,
             "cdp_url": self.cdp_url,
-            "cdp_headers": self.cdp_headers,
+            "cdp_headers": _redact_headers(self.cdp_headers),
             "use_persistent_context": self.use_persistent_context,
             "user_data_dir": self.user_data_dir,
             "chrome_channel": self.chrome_channel,
             "channel": self.channel,
             "proxy": self.proxy,
             "proxy_config": self.proxy_config,
             "viewport_width": self.viewport_width,
             "viewport_height": self.viewport_height,
             "accept_downloads": self.accept_downloads,
             "downloads_path": self.downloads_path,
             "storage_state": self.storage_state,
             "ignore_https_errors": self.ignore_https_errors,
             "java_script_enabled": self.java_script_enabled,
             "cookies": self.cookies,
             "headers": self.headers,
             "user_agent": self.user_agent,
             "user_agent_mode": self.user_agent_mode,
             "user_agent_generator_config": self.user_agent_generator_config,
             "text_mode": self.text_mode,
             "light_mode": self.light_mode,
             "extra_args": self.extra_args,
             "sleep_on_close": self.sleep_on_close,
             "verbose": self.verbose,
             "debugging_port": self.debugging_port,
             "host": self.host,
             "enable_stealth": self.enable_stealth,
         }
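A standalone sketch of the same redaction logic, written as a helper that callers could apply only at logging time. Two assumptions are flagged in the comments: `x-amz-security-token` is added beyond the list above because SigV4-signed requests can carry temporary-credential session tokens in that header, and keeping redaction out of `to_dict()` matters only if `to_dict()` output is ever fed back into `from_kwargs()` (for example by a clone-style helper), where masked values would silently replace working credentials.

```python
from typing import Any, Mapping, Optional

# Header names are compared case-insensitively.
SENSITIVE_HEADERS = frozenset({
    "authorization",
    "proxy-authorization",
    "x-api-key",
    "x-amzn-authorization",
    "x-aws-ec2-metadata-token",
    # Assumption: added beyond the review's list; SigV4 session tokens travel here.
    "x-amz-security-token",
})


def redact_headers(
    headers: Optional[Mapping[str, Any]], mask: str = "***"
) -> Optional[dict]:
    """Return a copy of `headers` with sensitive values replaced by `mask`.

    The input mapping is never mutated, so the live config keeps its real
    credentials; only the returned copy is safe to log or persist.
    """
    if headers is None:
        return None
    return {
        key: mask if isinstance(key, str) and key.lower() in SENSITIVE_HEADERS else value
        for key, value in headers.items()
    }
```

For example, `redact_headers({"Authorization": "Bearer secret", "User-Agent": "crawl4ai"})` returns `{'Authorization': '***', 'User-Agent': 'crawl4ai'}` and leaves the input dict unchanged.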
🧹 Nitpick comments (15)
pyproject.toml (2)

67-77: Add bedrock-agentcore to the “all” extra for consistency

You added the aws extra, but “all” doesn’t include bedrock-agentcore. Users installing crawl4ai[all] will miss this dependency.

Apply this diff to include bedrock-agentcore in the all extra:

 [project.optional-dependencies]
 pdf = ["PyPDF2"]  
 torch = ["torch", "nltk", "scikit-learn"]
 transformer = ["transformers", "tokenizers", "sentence-transformers"]
 cosine = ["torch", "transformers", "nltk", "sentence-transformers"]
 sync = ["selenium"]
 aws = ["bedrock-agentcore"]
 all = [
     "PyPDF2",
     "torch",
     "nltk",
     "scikit-learn",
     "transformers",
     "tokenizers",
     "sentence-transformers",
-    "selenium"
+    "selenium",
+    "bedrock-agentcore"
 ]

67-67: Document the extra name in examples

Since the new example relies on this extra, please ensure docs mention installation via:

  • pip install 'crawl4ai[aws]'

I can add a short prerequisites note to the example section if you’d like.

docs/md_v2/core/examples.md (2)

25-25: Fix product casing: “AgentCore”, not “Agentcore”

Use “Amazon Bedrock AgentCore Browser” consistently.

-| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock Agentcore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |
+| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock AgentCore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |

105-121: Add note about installing extras for the Bedrock example (optional)

Consider adding a brief mention that some examples (e.g., Bedrock AgentCore) require extras like:

  • pip install 'crawl4ai[aws]'

I can raise a small doc PR if preferred.

crawl4ai/async_configs.py (1)

405-406: Type narrow cdp_headers to Mapping[str, str] (optional)

Using Optional[Mapping[str, str]] instead of bare dict communicates intent and avoids unexpected mutability.

-        cdp_headers: dict = None,
+        cdp_headers: Optional[Mapping[str, str]] = None,
...
-        self.cdp_headers = cdp_headers if cdp_headers is not None else {}
+        self.cdp_headers = dict(cdp_headers) if cdp_headers is not None else {}
crawl4ai/browser_manager.py (1)

663-671: Improve error handling for CDP connect (optional)

Wrap connect_over_cdp in try/except to surface header-related failures clearly (401/403), and avoid silent fall-throughs.

-            self.browser = await self.playwright.chromium.connect_over_cdp(cdp_url, headers=self.config.cdp_headers)
+            try:
+                self.browser = await self.playwright.chromium.connect_over_cdp(
+                    cdp_url, headers=(self.config.cdp_headers or None)
+                )
+            except Exception as e:
+                if self.logger:
+                    self.logger.error(
+                        message="CDP connect failed: {error}",
+                        tag="BROWSER",
+                        params={"error": str(e)},
+                    )
+                raise
docs/examples/amazon_bedrock_agentcore_browser_integration.py (4)

1-3: Fix product casing in example docstring

Use “AgentCore” consistently.

-"""
-This example demonstrates how to integrate Amazon Bedrock Agentcore Browser, a remote browser session using Playwright.
-"""
+"""
+This example demonstrates how to integrate Amazon Bedrock AgentCore Browser for a remote Playwright session.
+"""

12-16: Graceful lifecycle management for BrowserClient (verify and adopt context manager if available)

Starting the client without closing it may leak resources. Prefer an async context manager or explicit close/shutdown if provided.

Proposed pattern (adjust to the actual API):

-    client = BrowserClient(region="us-east-1")
-    client.start()
-
-    ws_url, headers = client.generate_ws_headers()
+    async with BrowserClient(region="us-east-1") as client:
+        await client.start()  # if required; some context managers auto-start
+        ws_url, headers = client.generate_ws_headers()

If no async CM exists, ensure a corresponding client.stop()/close() is called in a finally block.
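A minimal sketch of the explicit try/finally variant, assuming only the three BrowserClient methods referenced in this review thread (`start()`, `generate_ws_headers()`, `stop()`). The helper `with_remote_session` and the `CdpSessionClient` protocol are hypothetical names for illustration; the point is that `stop()` runs even when crawling fails.

```python
from typing import Awaitable, Callable, Dict, Protocol, Tuple, TypeVar

T = TypeVar("T")


class CdpSessionClient(Protocol):
    """The BrowserClient methods referenced in this review thread."""

    def start(self) -> object: ...
    def generate_ws_headers(self) -> Tuple[str, Dict[str, str]]: ...
    def stop(self) -> object: ...


async def with_remote_session(
    client: CdpSessionClient,
    use: Callable[[str, Dict[str, str]], Awaitable[T]],
) -> T:
    """Start a remote browser session, pass its CDP URL and headers to `use`,
    and always stop the session afterwards, even if `use` raises."""
    client.start()
    try:
        ws_url, headers = client.generate_ws_headers()
        return await use(ws_url, headers)
    finally:
        client.stop()
```

In the Bedrock example, `use` would be an async function that builds `BrowserConfig(cdp_url=ws_url, cdp_headers=headers)` and runs `AsyncWebCrawler` with it.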


23-27: Add runnable entrypoint for convenience (optional)

Including an entrypoint helps users run the example directly.

 async def use_bedrock_agentcore():
@@
         print(result.markdown)
+
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(use_bedrock_agentcore())

5-5: Call out installation prerequisites in the example header (optional)

Add a short comment indicating the required extra and AWS credentials.

Example:

  • Requires: pip install 'crawl4ai[aws]'
  • AWS credentials with permissions to use Bedrock AgentCore Browser (e.g., via environment or config)
tests/browser/test_cdp_headers.py (5)

19-22: Remove unused import flagged by Ruff

CrawlerRunConfig isn’t used here. Clean it up to satisfy Ruff F401.

Apply this diff:

-from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
+from crawl4ai.async_configs import BrowserConfig

37-41: Be explicit: set browser_mode="cdp"

Depending on BrowserManager logic, relying only on cdp_url may work, but being explicit avoids ambiguity and future regressions.

The refactor suggested for test_cdp_headers_with_aws (lines 27-55) already sets it. If you prefer the implicit behavior, please confirm that crawl4ai/browser_manager.py branches to connect_over_cdp solely based on cdp_url being set, independent of browser_mode.


23-25: Reduce CI noise from test logging

Global verbose logger can spam CI output. Make verbosity opt-in via env.

Apply this diff:

-logger = AsyncLogger(verbose=True, log_file=None)
+logger = AsyncLogger(verbose=os.getenv("CRAWL4AI_TEST_VERBOSE") == "1", log_file=None)

57-76: Remove custom runner from tests directory or move it to examples

A bespoke test runner in tests/ won’t be used by pytest and adds confusion. Either delete it or move the script-style runner to docs/examples alongside the Bedrock example.

Apply this diff to remove it:

-async def run_tests():
-    """Run all tests sequentially."""
-    results = []
-
-    results.append(await test_cdp_headers_with_aws())
-
-    # Print summary
-    total = len(results)
-    passed = sum(results)
-    logger.info(f"Tests complete: {passed}/{total} passed", tag="SUMMARY")
-
-    if passed == total:
-        logger.success("All tests passed!", tag="SUMMARY")
-    else:
-        logger.error(f"{total - passed} tests failed", tag="SUMMARY")
-
-
-if __name__ == "__main__":
-    asyncio.run(run_tests())

1-5: Name/intent mismatch: this is an integration test, not a unit test

The docstring labels this as “examples” and the implementation depends on external infrastructure. Marking and gating it (as suggested) keeps CI stable while preserving the example.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between e651e04 and 9315151.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • crawl4ai/async_configs.py (4 hunks)
  • crawl4ai/browser_manager.py (1 hunks)
  • docs/examples/amazon_bedrock_agentcore_browser_integration.py (1 hunks)
  • docs/md_v2/core/examples.md (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/browser/test_cdp_headers.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
docs/examples/amazon_bedrock_agentcore_browser_integration.py (2)
crawl4ai/async_webcrawler.py (1)
  • AsyncWebCrawler (53-852)
crawl4ai/async_configs.py (1)
  • BrowserConfig (329-625)
tests/browser/test_cdp_headers.py (3)
crawl4ai/async_webcrawler.py (1)
  • AsyncWebCrawler (53-852)
crawl4ai/async_configs.py (2)
  • BrowserConfig (329-625)
  • CrawlerRunConfig (837-1672)
crawl4ai/async_logger.py (1)
  • AsyncLogger (79-322)
🪛 Ruff (0.12.2)
tests/browser/test_cdp_headers.py

20-20: crawl4ai.async_configs.CrawlerRunConfig imported but unused

Remove unused import: crawl4ai.async_configs.CrawlerRunConfig

(F401)

🔇 Additional comments (7)
pyproject.toml (1)

67-67: Verify PyPI package existence and pin a minimal version
It wasn’t possible to confirm “bedrock-agentcore” on PyPI using the prior script (the JSON response wasn’t iterable). Please manually verify the correct package name and list of available versions, then pin at least the minimum known‐good release to prevent future breakage.

• To check existence and HTTP status:

curl -o /dev/null -s -w "%{http_code}" https://pypi.org/pypi/bedrock-agentcore/json

• Or visit: https://pypi.org/project/bedrock-agentcore/
• If the package is named differently (e.g. bedrock-agent-core), update the dependency.
• Once confirmed, change the entry in pyproject.toml, for example:

aws = ["bedrock-agentcore>=1.2.3"]
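PyPI's JSON API (`https://pypi.org/pypi/<project>/json`) returns a single JSON object rather than a list: the latest release is under `info.version`, and every published version is a key of the `releases` object. That may explain why the earlier script could not iterate over the response. Below is a stdlib-only sketch of the manual check; the helper names are hypothetical, and nothing here was run against the real package.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def summarize_pypi_payload(payload: dict) -> dict:
    """Extract the fields needed to choose a minimum pin from a PyPI JSON response."""
    return {
        "name": payload["info"]["name"],
        "latest": payload["info"]["version"],
        # Versions are the *keys* of "releases". Not sorted here: plain string
        # sorting would misorder versions such as "0.10" and "0.9".
        "versions": list(payload.get("releases", {})),
    }


def fetch_pypi_summary(project: str, timeout: float = 10.0) -> Optional[dict]:
    """Return a summary for `project`, or None if PyPI answers 404 (no such project)."""
    url = f"https://pypi.org/pypi/{project}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return summarize_pypi_payload(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Calling `fetch_pypi_summary("bedrock-agentcore")` performs the same check as the curl command above and also reports the published versions to choose a pin from.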
crawl4ai/async_configs.py (3)

405-406: Good addition: cdp_headers in BrowserConfig ctor

Accepting CDP headers at construction time is aligned with the new CDP integration.


444-445: Sane default for cdp_headers

Defaulting to {} avoids downstream None checks.


531-532: from_kwargs wiring looks correct

cdp_headers is correctly passed through during deserialization.

crawl4ai/browser_manager.py (2)

663-667: Pass-through of CDP headers to connect_over_cdp is correct

This is the right place to forward headers to the CDP endpoint.


663-667: Playwright headers support confirmed
The headers parameter for BrowserType.connect_over_cdp was introduced in Playwright Python v1.11¹, and our minimum pinned version (>= 1.49.0) is well beyond that². No runtime risk of a missing headers kwarg—no changes needed.

¹ https://playwright.dev/python/docs/api/class-browsertype
² https://playwright.dev/python/docs/release-notes

docs/examples/amazon_bedrock_agentcore_browser_integration.py (1)

17-21: Example is correct and concise

Creating BrowserConfig with cdp_url and cdp_headers demonstrates the new feature well.

Comment on lines +11 to +12
from bedrock_agentcore.tools.browser_client import BrowserClient


🛠️ Refactor suggestion

Make the Bedrock import optional and avoid import-time failures

Unconditionally importing an optional dependency will break test collection in environments that don’t install the extra (CI by default). Gate it behind pytest.importorskip and move it inside the test. Also add contextlib/pytest imports now used by the test changes below.

Apply this diff:

@@
 import asyncio
 import os
 import sys
+import contextlib
+import pytest
 
-from bedrock_agentcore.tools.browser_client import BrowserClient
+
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around lines 11 to 12, the test
unconditionally imports BrowserClient from
bedrock_agentcore.tools.browser_client which causes import-time failures in
environments without that optional extra; change this to use pytest.importorskip
to gate the bedrock dependency and move the import inside the test function so
test collection doesn't fail, and add the necessary imports for pytest and
contextlib at the top of the file to support the new pattern.

Comment on lines +27 to +55
async def test_cdp_headers_with_aws():
    """Test launching a browser and connecting via CDP."""
    logger.info("Testing launch and connect via CDP", tag="TEST")

    try:
        client = BrowserClient(region="us-east-1")
        client.start()

        ws_url, headers = client.generate_ws_headers()

        browser_config = BrowserConfig(
            browser_type="chromium",
            cdp_url=ws_url,
            cdp_headers=headers,
        )

        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(
                url="https://www.nbcnews.com/business",
            )
            print(result.markdown)

        logger.success("CDP headers test passed", tag="TEST")
        return True

    except Exception as e:
        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
        return False


🛠️ Refactor suggestion

⚠️ Potential issue

Test currently always passes (exceptions swallowed); convert to a real pytest-asyncio test with assertions, skip gating, and proper cleanup

Catching all exceptions and returning True/False means pytest will treat the test as passed regardless of failures. Also, the import should be optional, and the external E2E dependency should be opt-in. Add:

  • pytest-asyncio decorator
  • opt-in gating via env var
  • importorskip for the optional extra
  • deterministic assertion(s)
  • finally cleanup for the BrowserClient
  • explicitly set browser_mode="cdp"
  • avoid printing large page content and use a stable URL

Apply this diff:

@@
-async def test_cdp_headers_with_aws():
-    """Test launching a browser and connecting via CDP."""
-    logger.info("Testing launch and connect via CDP", tag="TEST")
-
-    try:
-        client = BrowserClient(region="us-east-1")
-        client.start()
-
-        ws_url, headers = client.generate_ws_headers()
-
-        browser_config = BrowserConfig(
-            browser_type="chromium",
-            cdp_url=ws_url,
-            cdp_headers=headers,
-        )
-
-        async with AsyncWebCrawler(config=browser_config) as crawler:
-            result = await crawler.arun(
-                url="https://www.nbcnews.com/business",
-            )
-            print(result.markdown)
-
-        logger.success("CDP headers test passed", tag="TEST")
-        return True
-
-    except Exception as e:
-        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
-        return False
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():
+    """Test launching a browser and connecting via CDP."""
+    logger.info("Testing launch and connect via CDP", tag="TEST")
+
+    # Opt-in: run only when explicitly enabled
+    if os.getenv("CRAWL4AI_E2E_BEDROCK") != "1":
+        pytest.skip("E2E Bedrock test disabled. Set CRAWL4AI_E2E_BEDROCK=1 to enable.")
+
+    # Optional dependency
+    browser_mod = pytest.importorskip(
+        "bedrock_agentcore.tools.browser_client",
+        reason="Install crawl4ai[aws] to run this test.",
+    )
+    BrowserClient = getattr(browser_mod, "BrowserClient")
+
+    region = os.getenv("BEDROCK_REGION", "us-east-1")
+    client = BrowserClient(region=region)
+    try:
+        client.start()
+
+        ws_url, headers = client.generate_ws_headers()
+
+        browser_config = BrowserConfig(
+            browser_type="chromium",
+            browser_mode="cdp",
+            cdp_url=ws_url,
+            cdp_headers=headers,
+        )
+
+        async with AsyncWebCrawler(config=browser_config, logger=logger) as crawler:
+            result = await crawler.arun(
+                url="https://example.com",
+            )
+
+        # Minimal, deterministic assertions
+        assert result.success is True
+        assert result.markdown is not None and len(result.markdown) > 0
+        logger.success("CDP headers test passed", tag="TEST")
+    finally:
+        with contextlib.suppress(Exception):
+            client.stop()

As a follow-up, consider adding a small unit test (with monkeypatch) to assert that BrowserManager passes cdp_headers to Playwright’s connect_over_cdp. I can draft it if you want.

🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py lines 27-55, the current test swallows
exceptions and always returns True/False; convert it into a proper
pytest-asyncio test: add the pytest.mark.asyncio decorator, gate running via an
environment variable (skip the test unless set), use pytest.importorskip for
optional E2E extras, set BrowserConfig/browser client to explicit
browser_mode="cdp", avoid printing page content and use a stable URL, replace
the top-level try/except with real assertions (e.g., assert that the crawler.arun result
meets expected conditions) and ensure deterministic checks, and add a finally
block to always stop/cleanup the BrowserClient to avoid leaking resources.

logger = AsyncLogger(verbose=True, log_file=None)


async def test_cdp_headers_with_aws():

💡 Verification agent

🧩 Analysis chain

Verify pytest-asyncio configuration is present

Since we’re using @pytest.mark.asyncio, ensure pytest-asyncio is installed and asyncio mode is configured. Run this repo-scoped script to confirm:


🏁 Script executed:

#!/bin/bash
# Check for pytest-asyncio in project files and config of asyncio mode
set -euo pipefail

echo "Searching for pytest-asyncio dependency mentions..."
rg -n -C2 -i 'pytest-asyncio' || true

echo
echo "Looking for asyncio_mode in pytest config files..."
fd -a 'pytest.ini|pyproject.toml|setup.cfg|tox.ini' | while read -r f; do
  echo "---- $f ----"
  rg -n -C2 -i 'asyncio_mode|pytest_plugins|markers' "$f" || true
done

echo
echo "Examples of existing async tests (if any):"
rg -nP --type=py -g 'tests/**' -C3 '@pytest\.mark\.asyncio|async def test_' || true

Length of output: 161871


Add pytest-asyncio and configure asyncio mode

It looks like there’s no explicit pytest-asyncio dependency or asyncio_mode/marker registration in your pytest config, and the new async test in tests/browser/test_cdp_headers.py isn’t decorated—under the default “strict” mode your async def tests will be skipped or error.

• tests/browser/test_cdp_headers.py:27
– Add the missing decorator above the async test.
• Project config (pyproject.toml, pytest.ini or setup.cfg)
– Ensure pytest-asyncio is listed in your dev-dependencies (or requirements).
– Register the asyncio marker or set asyncio_mode to “auto” so bare async def tests run.

Example diff for the test itself:

--- a/tests/browser/test_cdp_headers.py
+++ b/tests/browser/test_cdp_headers.py
@@
-async def test_cdp_headers_with_aws():
+import pytest
+
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():

And in your pytest config (e.g. pyproject.toml):

[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
  "asyncio: mark async tests using pytest-asyncio"
]

or equivalently in pytest.ini / setup.cfg under [pytest] section.

🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around line 27, the new async test function
is missing a pytest-asyncio marker/decorator so it will be skipped or error
under strict asyncio mode; add @pytest.mark.asyncio above the async def
test_cdp_headers_with_aws() (and import pytest if not present). Also add
pytest-asyncio to the dev dependencies and configure pytest to allow bare async
tests by setting asyncio_mode = "auto" or registering the asyncio marker in your
test config (pyproject.toml, pytest.ini, or setup.cfg).
