feat(browser): Add CDP Headers Support for Remote Browser Integration #1417

NickMandylas wants to merge 1 commit into unclecode:main
Conversation
Walkthrough

Adds CDP headers support to BrowserConfig and propagates it through serialization and BrowserManager’s CDP connection. Introduces an example integrating Amazon Bedrock AgentCore Browser, updates docs to reference it, adds an optional aws dependency group, and adds a functional test exercising CDP headers with a remote Playwright session.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User Code
    participant AM as AsyncWebCrawler
    participant BM as BrowserManager
    participant PW as Playwright (CDP)
    participant RB as Remote Browser (Bedrock AgentCore)
    U->>AM: arun(url, config with cdp_url + cdp_headers)
    AM->>BM: start(config)
    BM->>PW: connect_over_cdp(cdp_url, headers=cdp_headers)
    PW-->>RB: WebSocket handshake + headers
    RB-->>PW: Session established
    BM-->>AM: Browser ready
    AM->>RB: Navigate and extract
    RB-->>AM: Page content/markdown
    AM-->>U: Result
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crawl4ai/async_configs.py (1)
562-596: Redact sensitive headers in to_dict to prevent secret leakage

Serializing BrowserConfig currently emits cdp_headers verbatim. If configs are logged or persisted, Authorization and similar secrets leak.
Apply this diff to mask common sensitive headers on serialization:
```diff
 def to_dict(self):
-    result = {
+    # Redact sensitive header values to avoid leaking secrets in logs/telemetry.
+    def _redact_headers(d: dict) -> dict:
+        if not isinstance(d, dict):
+            return d
+        sensitive = {
+            "authorization",
+            "proxy-authorization",
+            "x-api-key",
+            "x-amzn-authorization",
+            "x-aws-ec2-metadata-token",
+        }
+        redacted = {}
+        for k, v in d.items():
+            redacted[k] = "***" if isinstance(k, str) and k.lower() in sensitive else v
+        return redacted
+
+    result = {
         "browser_type": self.browser_type,
         "headless": self.headless,
         "browser_mode": self.browser_mode,
         "use_managed_browser": self.use_managed_browser,
         "cdp_url": self.cdp_url,
-        "cdp_headers": self.cdp_headers,
+        "cdp_headers": _redact_headers(self.cdp_headers),
         "use_persistent_context": self.use_persistent_context,
         "user_data_dir": self.user_data_dir,
         "chrome_channel": self.chrome_channel,
         "channel": self.channel,
         "proxy": self.proxy,
         "proxy_config": self.proxy_config,
         "viewport_width": self.viewport_width,
         "viewport_height": self.viewport_height,
         "accept_downloads": self.accept_downloads,
         "downloads_path": self.downloads_path,
         "storage_state": self.storage_state,
         "ignore_https_errors": self.ignore_https_errors,
         "java_script_enabled": self.java_script_enabled,
         "cookies": self.cookies,
         "headers": self.headers,
         "user_agent": self.user_agent,
         "user_agent_mode": self.user_agent_mode,
         "user_agent_generator_config": self.user_agent_generator_config,
         "text_mode": self.text_mode,
         "light_mode": self.light_mode,
         "extra_args": self.extra_args,
         "sleep_on_close": self.sleep_on_close,
         "verbose": self.verbose,
         "debugging_port": self.debugging_port,
         "host": self.host,
         "enable_stealth": self.enable_stealth,
     }
```
🧹 Nitpick comments (15)
pyproject.toml (2)
67-77: Add bedrock-agentcore to the “all” extra for consistency

You added the aws extra, but “all” doesn’t include bedrock-agentcore. Users installing crawl4ai[all] will miss this dependency.
Apply this diff to include bedrock-agentcore in the all extra:
```diff
 [project.optional-dependencies]
 pdf = ["PyPDF2"]
 torch = ["torch", "nltk", "scikit-learn"]
 transformer = ["transformers", "tokenizers", "sentence-transformers"]
 cosine = ["torch", "transformers", "nltk", "sentence-transformers"]
 sync = ["selenium"]
 aws = ["bedrock-agentcore"]
 all = [
     "PyPDF2",
     "torch",
     "nltk",
     "scikit-learn",
     "transformers",
     "tokenizers",
     "sentence-transformers",
-    "selenium"
+    "selenium",
+    "bedrock-agentcore"
 ]
```
67-67: Document the extra name in examples

Since the new example relies on this extra, please ensure docs mention installation via:
- pip install 'crawl4ai[aws]'
I can add a short prerequisites note to the example section if you’d like.
docs/md_v2/core/examples.md (2)
25-25: Fix product casing: “AgentCore”, not “Agentcore”

Use “Amazon Bedrock AgentCore Browser” consistently.

```diff
-| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock Agentcore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |
+| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock AgentCore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |
```
105-121: Add note about installing extras for the Bedrock example (optional)

Consider adding a brief mention that some examples (e.g., Bedrock AgentCore) require extras like:
- pip install 'crawl4ai[aws]'
I can raise a small doc PR if preferred.
crawl4ai/async_configs.py (1)
405-406: Type narrow cdp_headers to Mapping[str, str] (optional)

Using Optional[Mapping[str, str]] instead of bare dict communicates intent and avoids unexpected mutability.
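For illustration, the defensive-copy behavior this suggestion relies on can be shown in plain Python, independent of crawl4ai (`store_headers` is a hypothetical stand-in for the config assignment):

```python
from typing import Mapping, Optional


def store_headers(cdp_headers: Optional[Mapping[str, str]] = None) -> dict:
    # Copying into a new dict means later caller-side mutation
    # cannot change what the config object holds.
    return dict(cdp_headers) if cdp_headers is not None else {}


caller_headers = {"Authorization": "Bearer token"}
stored = store_headers(caller_headers)
caller_headers["Authorization"] = "tampered"
print(stored["Authorization"])  # still "Bearer token"
```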
```diff
-    cdp_headers: dict = None,
+    cdp_headers: Optional[Mapping[str, str]] = None,
 ...
-    self.cdp_headers = cdp_headers if cdp_headers is not None else {}
+    self.cdp_headers = dict(cdp_headers) if cdp_headers is not None else {}
```

crawl4ai/browser_manager.py (1)
663-671: Improve error handling for CDP connect (optional)

Wrap connect_over_cdp in try/except to surface header-related failures clearly (401/403), and avoid silent fall-throughs.
```diff
-        self.browser = await self.playwright.chromium.connect_over_cdp(cdp_url, headers=self.config.cdp_headers)
+        try:
+            self.browser = await self.playwright.chromium.connect_over_cdp(
+                cdp_url, headers=(self.config.cdp_headers or None)
+            )
+        except Exception as e:
+            if self.logger:
+                self.logger.error(
+                    message="CDP connect failed: {error}",
+                    tag="BROWSER",
+                    params={"error": str(e)},
+                )
+            raise
```

docs/examples/amazon_bedrock_agentcore_browser_integration.py (4)
1-3: Fix product casing in example docstring

Use “AgentCore” consistently.

```diff
-"""
-This example demonstrates how to integrate Amazon Bedrock Agentcore Browser, a remote browser session using Playwright.
-"""
+"""
+This example demonstrates how to integrate Amazon Bedrock AgentCore Browser for a remote Playwright session.
+"""
```
12-16: Graceful lifecycle management for BrowserClient (verify and adopt context manager if available)

Starting the client without closing it may leak resources. Prefer an async context manager or explicit close/shutdown if provided.
Proposed pattern (adjust to the actual API):
```diff
-    client = BrowserClient(region="us-east-1")
-    client.start()
-
-    ws_url, headers = client.generate_ws_headers()
+    async with BrowserClient(region="us-east-1") as client:
+        await client.start()  # if required; some context managers auto-start
+        ws_url, headers = client.generate_ws_headers()
```

If no async CM exists, ensure a corresponding client.stop()/close() is called in a finally block.
23-27: Add runnable entrypoint for convenience (optional)

Including an entrypoint helps users run the example directly.
```diff
 async def use_bedrock_agentcore():
@@
     print(result.markdown)
+
+
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(use_bedrock_agentcore())
```
5-5: Call out installation prerequisites in the example header (optional)

Add a short comment indicating the required extra and AWS credentials.
Example:
- Requires: pip install 'crawl4ai[aws]'
- AWS credentials with permissions to use Bedrock AgentCore Browser (e.g., via environment or config)
tests/browser/test_cdp_headers.py (5)
19-22: Remove unused import flagged by Ruff

CrawlerRunConfig isn’t used here. Clean it up to satisfy Ruff F401.
Apply this diff:
```diff
-from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
+from crawl4ai.async_configs import BrowserConfig
```
37-41: Be explicit: set browser_mode="cdp"

Depending on BrowserManager logic, relying only on cdp_url may work, but being explicit avoids ambiguity and future regressions.
You already set this in the refactor above. If you prefer the implicit behavior, please confirm that crawl4ai/browser_manager.py branches to connect_over_cdp solely based on cdp_url being set, independent of browser_mode.
23-25: Reduce CI noise from test logging

Global verbose logging can spam CI output. Make verbosity opt-in via an environment variable.
Apply this diff:
```diff
-logger = AsyncLogger(verbose=True, log_file=None)
+logger = AsyncLogger(verbose=os.getenv("CRAWL4AI_TEST_VERBOSE") == "1", log_file=None)
```
57-76: Remove the custom runner from the tests directory or move it to examples

A bespoke test runner in tests/ won’t be used by pytest and adds confusion. Either delete it or move the script-style runner to docs/examples alongside the Bedrock example.
Apply this diff to remove it:
```diff
-async def run_tests():
-    """Run all tests sequentially."""
-    results = []
-
-    results.append(await test_cdp_headers_with_aws())
-
-    # Print summary
-    total = len(results)
-    passed = sum(results)
-    logger.info(f"Tests complete: {passed}/{total} passed", tag="SUMMARY")
-
-    if passed == total:
-        logger.success("All tests passed!", tag="SUMMARY")
-    else:
-        logger.error(f"{total - passed} tests failed", tag="SUMMARY")
-
-
-if __name__ == "__main__":
-    asyncio.run(run_tests())
```
1-5: Name/intent mismatch: this is an integration test, not a unit test

The docstring labels this as “examples” and the implementation depends on external infrastructure. Marking and gating it (as suggested) keeps CI stable while preserving the example.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
- crawl4ai/async_configs.py (4 hunks)
- crawl4ai/browser_manager.py (1 hunk)
- docs/examples/amazon_bedrock_agentcore_browser_integration.py (1 hunk)
- docs/md_v2/core/examples.md (1 hunk)
- pyproject.toml (1 hunk)
- tests/browser/test_cdp_headers.py (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (2)
docs/examples/amazon_bedrock_agentcore_browser_integration.py (2)
- crawl4ai/async_webcrawler.py (1): AsyncWebCrawler (53-852)
- crawl4ai/async_configs.py (1): BrowserConfig (329-625)

tests/browser/test_cdp_headers.py (3)
- crawl4ai/async_webcrawler.py (1): AsyncWebCrawler (53-852)
- crawl4ai/async_configs.py (2): BrowserConfig (329-625), CrawlerRunConfig (837-1672)
- crawl4ai/async_logger.py (1): AsyncLogger (79-322)
🪛 Ruff (0.12.2)
tests/browser/test_cdp_headers.py
20-20: crawl4ai.async_configs.CrawlerRunConfig imported but unused
Remove unused import: crawl4ai.async_configs.CrawlerRunConfig
(F401)
🔇 Additional comments (7)
pyproject.toml (1)
67-67: Verify PyPI package existence and pin a minimal version

It wasn’t possible to confirm “bedrock-agentcore” on PyPI using the prior script (the JSON response wasn’t iterable). Please manually verify the correct package name and the list of available versions, then pin at least the minimum known-good release to prevent future breakage.

- To check existence and HTTP status:

```bash
curl -o /dev/null -s -w "%{http_code}" https://pypi.org/pypi/bedrock-agentcore/json
```

- Or visit: https://pypi.org/project/bedrock-agentcore/
- If the package is named differently (e.g. bedrock-agent-core), update the dependency.
- Once confirmed, change the entry in pyproject.toml, for example: aws = ["bedrock-agentcore>=1.2.3"]

crawl4ai/async_configs.py (3)
405-406: Good addition: cdp_headers in BrowserConfig ctor

Accepting CDP headers at construction time is aligned with the new CDP integration.

444-445: Sane default for cdp_headers

Defaulting to {} avoids downstream None checks.

531-532: from_kwargs wiring looks correct

cdp_headers is correctly passed through during deserialization.
crawl4ai/browser_manager.py (2)
663-667: Pass-through of CDP headers to connect_over_cdp is correct

This is the right place to forward headers to the CDP endpoint.

663-667: Playwright headers support confirmed

The headers parameter for BrowserType.connect_over_cdp was introduced in Playwright Python v1.11¹, and our minimum pinned version (>= 1.49.0) is well beyond that². No runtime risk of a missing headers kwarg; no changes needed.

¹ https://playwright.dev/python/docs/api/class-browsertype
² https://playwright.dev/python/docs/release-notes

docs/examples/amazon_bedrock_agentcore_browser_integration.py (1)

17-21: Example is correct and concise

Creating BrowserConfig with cdp_url and cdp_headers demonstrates the new feature well.
```python
from bedrock_agentcore.tools.browser_client import BrowserClient
```
🛠️ Refactor suggestion
Make the Bedrock import optional and avoid import-time failures
Unconditionally importing an optional dependency will break test collection in environments that don’t install the extra (CI by default). Gate it behind pytest.importorskip and move it inside the test. Also add contextlib/pytest imports now used by the test changes below.
Apply this diff:
```diff
@@
 import asyncio
 import os
 import sys
+import contextlib
+import pytest

-from bedrock_agentcore.tools.browser_client import BrowserClient
+
```
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around lines 11 to 12, the test
unconditionally imports BrowserClient from
bedrock_agentcore.tools.browser_client which causes import-time failures in
environments without that optional extra; change this to use pytest.importorskip
to gate the bedrock dependency and move the import inside the test function so
test collection doesn't fail, and add the necessary imports for pytest and
contextlib at the top of the file to support the new pattern.
```python
async def test_cdp_headers_with_aws():
    """Test launching a browser and connecting via CDP."""
    logger.info("Testing launch and connect via CDP", tag="TEST")

    try:
        client = BrowserClient(region="us-east-1")
        client.start()

        ws_url, headers = client.generate_ws_headers()

        browser_config = BrowserConfig(
            browser_type="chromium",
            cdp_url=ws_url,
            cdp_headers=headers,
        )

        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(
                url="https://www.nbcnews.com/business",
            )
            print(result.markdown)

        logger.success("CDP headers test passed", tag="TEST")
        return True

    except Exception as e:
        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
        return False
```
🛠️ Refactor suggestion
Test currently always passes (exceptions swallowed); convert to a real pytest-asyncio test with assertions, skip gating, and proper cleanup
Catching all exceptions and returning True/False means pytest will treat the test as passed regardless of failures. Also, the import should be optional, and the external E2E dependency should be opt-in. Add:
- pytest-asyncio decorator
- opt-in gating via env var
- importorskip for the optional extra
- deterministic assertion(s)
- finally cleanup for the BrowserClient
- explicitly set browser_mode="cdp"
- avoid printing large page content and use a stable URL
Apply this diff:
```diff
@@
-async def test_cdp_headers_with_aws():
-    """Test launching a browser and connecting via CDP."""
-    logger.info("Testing launch and connect via CDP", tag="TEST")
-
-    try:
-        client = BrowserClient(region="us-east-1")
-        client.start()
-
-        ws_url, headers = client.generate_ws_headers()
-
-        browser_config = BrowserConfig(
-            browser_type="chromium",
-            cdp_url=ws_url,
-            cdp_headers=headers,
-        )
-
-        async with AsyncWebCrawler(config=browser_config) as crawler:
-            result = await crawler.arun(
-                url="https://www.nbcnews.com/business",
-            )
-            print(result.markdown)
-
-        logger.success("CDP headers test passed", tag="TEST")
-        return True
-
-    except Exception as e:
-        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
-        return False
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():
+    """Test launching a browser and connecting via CDP."""
+    logger.info("Testing launch and connect via CDP", tag="TEST")
+
+    # Opt-in: run only when explicitly enabled
+    if os.getenv("CRAWL4AI_E2E_BEDROCK") != "1":
+        pytest.skip("E2E Bedrock test disabled. Set CRAWL4AI_E2E_BEDROCK=1 to enable.")
+
+    # Optional dependency
+    browser_mod = pytest.importorskip(
+        "bedrock_agentcore.tools.browser_client",
+        reason="Install crawl4ai[aws] to run this test.",
+    )
+    BrowserClient = getattr(browser_mod, "BrowserClient")
+
+    region = os.getenv("BEDROCK_REGION", "us-east-1")
+    client = BrowserClient(region=region)
+    try:
+        client.start()
+
+        ws_url, headers = client.generate_ws_headers()
+
+        browser_config = BrowserConfig(
+            browser_type="chromium",
+            browser_mode="cdp",
+            cdp_url=ws_url,
+            cdp_headers=headers,
+        )
+
+        async with AsyncWebCrawler(config=browser_config, logger=logger) as crawler:
+            result = await crawler.arun(
+                url="https://example.com",
+            )
+
+        # Minimal, deterministic assertions
+        assert result.success is True
+        assert result.markdown is not None and len(result.markdown) > 0
+        logger.success("CDP headers test passed", tag="TEST")
+    finally:
+        with contextlib.suppress(Exception):
+            client.stop()
```

As a follow-up, consider adding a small unit test (with monkeypatch) to assert that BrowserManager passes cdp_headers to Playwright’s connect_over_cdp. I can draft it if you want.
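A sketch of that follow-up idea in plain Python: a mock chromium object records the keyword arguments so the pass-through can be asserted without a real browser (`FakeChromium` and `connect_with_headers` are invented for illustration; a real test would monkeypatch Playwright inside BrowserManager):

```python
import asyncio
from unittest.mock import AsyncMock


class FakeChromium:
    """Stands in for playwright.chromium; records connect_over_cdp call kwargs."""

    def __init__(self):
        self.connect_over_cdp = AsyncMock(return_value="fake-browser")


async def connect_with_headers(chromium, cdp_url, cdp_headers):
    # Mirrors the pass-through under review: headers are forwarded verbatim.
    return await chromium.connect_over_cdp(cdp_url, headers=cdp_headers)


chromium = FakeChromium()
browser = asyncio.run(
    connect_with_headers(chromium, "ws://localhost:9222", {"Authorization": "Bearer t"})
)
# Assert the headers reached the (mocked) Playwright call.
chromium.connect_over_cdp.assert_awaited_once_with(
    "ws://localhost:9222", headers={"Authorization": "Bearer t"}
)
print(browser)  # "fake-browser"
```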
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py lines 27-55, the current test swallows
exceptions and always returns True/False; convert it into a proper
pytest-asyncio test: add the pytest.mark.asyncio decorator, gate running via an
environment variable (skip the test unless set), use pytest.importorskip for
optional E2E extras, set BrowserConfig/browser client to explicit
browser_mode="cdp", avoid printing page content and use a stable URL, replace
the top-level try/except with real assertions (e.g., assert crawler.run result
meets expected conditions) and ensure deterministic checks, and add a finally
block to always stop/cleanup the BrowserClient to avoid leaking resources.
```python
logger = AsyncLogger(verbose=True, log_file=None)


async def test_cdp_headers_with_aws():
```
💡 Verification agent
🧩 Analysis chain
Verify pytest-asyncio configuration is present
Since we’re using @pytest.mark.asyncio, ensure pytest-asyncio is installed and asyncio mode is configured. Run this repo-scoped script to confirm:
🏁 Script executed:
```bash
#!/bin/bash
# Check for pytest-asyncio in project files and config of asyncio mode
set -euo pipefail

echo "Searching for pytest-asyncio dependency mentions..."
rg -n -C2 -i 'pytest-asyncio' || true
echo

echo "Looking for asyncio_mode in pytest config files..."
fd -a 'pytest.ini|pyproject.toml|setup.cfg|tox.ini' | while read -r f; do
  echo "---- $f ----"
  rg -n -C2 -i 'asyncio_mode|pytest_plugins|markers' "$f" || true
done
echo

echo "Examples of existing async tests (if any):"
rg -nP --type=py -g 'tests/**' -C3 '@pytest\.mark\.asyncio|async def test_' || true
```

Length of output: 161871
Add pytest-asyncio and configure asyncio mode
It looks like there’s no explicit pytest-asyncio dependency or asyncio_mode/marker registration in your pytest config, and the new async test in tests/browser/test_cdp_headers.py isn’t decorated—under the default “strict” mode your async def tests will be skipped or error.
• tests/browser/test_cdp_headers.py:27
– Add the missing decorator above the async test.
• Project config (pyproject.toml, pytest.ini or setup.cfg)
– Ensure pytest-asyncio is listed in your dev-dependencies (or requirements).
– Register the asyncio marker or set asyncio_mode to “auto” so bare async def tests run.
Example diff for the test itself:
```diff
--- a/tests/browser/test_cdp_headers.py
+++ b/tests/browser/test_cdp_headers.py
@@
-async def test_cdp_headers_with_aws():
+import pytest
+
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():
```

And in your pytest config (e.g. pyproject.toml):
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
    "asyncio: mark async tests using pytest-asyncio"
]
```

or equivalently in pytest.ini / setup.cfg under the [pytest] section.
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around line 27, the new async test function
is missing a pytest-asyncio marker/decorator so it will be skipped or error
under strict asyncio mode; add @pytest.mark.asyncio above the async def
test_cdp_headers_with_aws() (and import pytest if not present). Also add
pytest-asyncio to the dev dependencies and configure pytest to allow bare async
tests by setting asyncio_mode = "auto" or registering the asyncio marker in your
test config (pyproject.toml, pytest.ini, or setup.cfg).
Summary
This update adds support for CDP headers in the browser configuration, enabling integration with remote browser services that require authentication headers. One use case for this enhancement is seamless integration with Amazon Bedrock AgentCore Browser, allowing Crawl4AI to connect to managed browser instances in AWS environments.
The change introduces a new cdp_headers parameter to the BrowserConfig class that allows passing authentication headers when connecting to remote browsers via CDP, expanding Crawl4AI's compatibility with cloud-based browser services.
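As a rough illustration of what the new parameter enables, the sketch below mirrors the header pass-through in plain Python; `build_cdp_connect_kwargs` and its key names are hypothetical helpers for illustration, not part of the crawl4ai or Playwright APIs:

```python
from typing import Optional


def build_cdp_connect_kwargs(cdp_url: str, cdp_headers: Optional[dict] = None) -> dict:
    """Assemble keyword arguments for a CDP connection call.

    Mirrors the idea of forwarding optional auth headers alongside the
    endpoint URL; the key names here are illustrative only.
    """
    kwargs = {"endpoint_url": cdp_url}
    if cdp_headers:  # only include headers when the caller supplied some
        kwargs["headers"] = dict(cdp_headers)  # defensive copy
    return kwargs


# With auth headers (e.g., a pre-signed remote browser session):
print(build_cdp_connect_kwargs(
    "wss://example.invalid/session", {"Authorization": "Bearer <token>"}
))
# Without headers, the "headers" key is omitted entirely:
print(build_cdp_connect_kwargs("ws://localhost:9222/devtools"))
```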
List of files changed and why
Core Configuration Changes:
- crawl4ai/async_configs.py: Added the cdp_headers parameter to the BrowserConfig class constructor, initialization, from_kwargs method, and to_dict method to support authentication headers for CDP connections
- crawl4ai/browser_manager.py: Modified the CDP connection logic to pass the cdp_headers to Playwright's connect_over_cdp method

Documentation and Examples:

- docs/examples/amazon_bedrock_agentcore_browser_integration.py: New example demonstrating integration with Amazon Bedrock AgentCore Browser using the new CDP headers functionality
- docs/md_v2/core/examples.md: Updated examples documentation to include the new Amazon Bedrock integration example

Testing:

- tests/browser/test_cdp_headers.py: Comprehensive test suite for the CDP headers functionality, including integration tests with Amazon Bedrock AgentCore Browser

Dependencies:

- pyproject.toml: Added new optional dependency group aws = ["bedrock-agentcore"] to support Amazon Bedrock integration

How Has This Been Tested?
I've tested this integration with Amazon Bedrock AgentCore Browser which utilises CDP Headers in order to have an authorised connection from Crawl4AI.
Test is available in file:
tests/browser/test_cdp_headers.py

Checklist:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores