feat(browser): Add CDP Headers Support for Remote Browser Integration #1417

NickMandylas wants to merge 1 commit into unclecode:main
Conversation
Walkthrough

Adds CDP headers support to BrowserConfig and propagates it through serialization and BrowserManager’s CDP connection. Introduces an example integrating Amazon Bedrock AgentCore Browser, updates docs to reference it, adds an optional aws dependency group, and adds a functional test exercising CDP headers with a remote Playwright session.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User Code
    participant AM as AsyncWebCrawler
    participant BM as BrowserManager
    participant PW as Playwright (CDP)
    participant RB as Remote Browser (Bedrock AgentCore)
    U->>AM: arun(url, config with cdp_url + cdp_headers)
    AM->>BM: start(config)
    BM->>PW: connect_over_cdp(cdp_url, headers=cdp_headers)
    PW-->>RB: WebSocket handshake + headers
    RB-->>PW: Session established
    BM-->>AM: Browser ready
    AM->>RB: Navigate and extract
    RB-->>AM: Page content/markdown
    AM-->>U: Result
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crawl4ai/async_configs.py (1)
562-596: Redact sensitive headers in to_dict to prevent secret leakage

Serializing BrowserConfig currently emits cdp_headers verbatim. If configs are logged or persisted, Authorization and similar secrets leak.
Apply this diff to mask common sensitive headers on serialization:
```diff
 def to_dict(self):
-    result = {
+    # Redact sensitive header values to avoid leaking secrets in logs/telemetry.
+    def _redact_headers(d: dict) -> dict:
+        if not isinstance(d, dict):
+            return d
+        sensitive = {
+            "authorization",
+            "proxy-authorization",
+            "x-api-key",
+            "x-amzn-authorization",
+            "x-aws-ec2-metadata-token",
+        }
+        redacted = {}
+        for k, v in d.items():
+            redacted[k] = "***" if isinstance(k, str) and k.lower() in sensitive else v
+        return redacted
+
+    result = {
         "browser_type": self.browser_type,
         "headless": self.headless,
         "browser_mode": self.browser_mode,
         "use_managed_browser": self.use_managed_browser,
         "cdp_url": self.cdp_url,
-        "cdp_headers": self.cdp_headers,
+        "cdp_headers": _redact_headers(self.cdp_headers),
         "use_persistent_context": self.use_persistent_context,
         "user_data_dir": self.user_data_dir,
         "chrome_channel": self.chrome_channel,
         "channel": self.channel,
         "proxy": self.proxy,
         "proxy_config": self.proxy_config,
         "viewport_width": self.viewport_width,
         "viewport_height": self.viewport_height,
         "accept_downloads": self.accept_downloads,
         "downloads_path": self.downloads_path,
         "storage_state": self.storage_state,
         "ignore_https_errors": self.ignore_https_errors,
         "java_script_enabled": self.java_script_enabled,
         "cookies": self.cookies,
         "headers": self.headers,
         "user_agent": self.user_agent,
         "user_agent_mode": self.user_agent_mode,
         "user_agent_generator_config": self.user_agent_generator_config,
         "text_mode": self.text_mode,
         "light_mode": self.light_mode,
         "extra_args": self.extra_args,
         "sleep_on_close": self.sleep_on_close,
         "verbose": self.verbose,
         "debugging_port": self.debugging_port,
         "host": self.host,
         "enable_stealth": self.enable_stealth,
     }
```
🧹 Nitpick comments (15)
pyproject.toml (2)
67-77: Add bedrock-agentcore to the “all” extra for consistency

You added the aws extra, but “all” doesn’t include bedrock-agentcore. Users installing crawl4ai[all] will miss this dependency.
Apply this diff to include bedrock-agentcore in the all extra:
```diff
 [project.optional-dependencies]
 pdf = ["PyPDF2"]
 torch = ["torch", "nltk", "scikit-learn"]
 transformer = ["transformers", "tokenizers", "sentence-transformers"]
 cosine = ["torch", "transformers", "nltk", "sentence-transformers"]
 sync = ["selenium"]
 aws = ["bedrock-agentcore"]
 all = [
     "PyPDF2",
     "torch",
     "nltk",
     "scikit-learn",
     "transformers",
     "tokenizers",
     "sentence-transformers",
-    "selenium"
+    "selenium",
+    "bedrock-agentcore"
 ]
```
67-67: Document the extra name in examples

Since the new example relies on this extra, please ensure docs mention installation via:
- pip install 'crawl4ai[aws]'
I can add a short prerequisites note to the example section if you’d like.
docs/md_v2/core/examples.md (2)
25-25: Fix product casing: “AgentCore”, not “Agentcore”

Use “Amazon Bedrock AgentCore Browser” consistently.

```diff
-| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock Agentcore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |
+| Remote Browser (with Amazon Bedrock AgentCore Browser) | Shows how to integrate with Amazon Bedrock AgentCore Browser for remote browser sessions. | [View Code](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/amazon_bedrock_agentcore_browser_integration.py) |
```
105-121: Add note about installing extras for the Bedrock example (optional)

Consider adding a brief mention that some examples (e.g., Bedrock AgentCore) require extras like:
- pip install 'crawl4ai[aws]'
I can raise a small doc PR if preferred.
crawl4ai/async_configs.py (1)
405-406: Type narrow cdp_headers to Mapping[str, str] (optional)

Using Optional[Mapping[str, str]] instead of bare dict communicates intent and avoids unexpected mutability.
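For illustration, the defensive-copy behavior this suggestion relies on can be shown in plain Python, independent of crawl4ai (`store_headers` is a hypothetical stand-in for the config assignment):

```python
from typing import Mapping, Optional


def store_headers(cdp_headers: Optional[Mapping[str, str]] = None) -> dict:
    # Copying into a new dict means later caller-side mutation
    # cannot change what the config object holds.
    return dict(cdp_headers) if cdp_headers is not None else {}


caller_headers = {"Authorization": "Bearer token"}
stored = store_headers(caller_headers)
caller_headers["Authorization"] = "tampered"
print(stored["Authorization"])  # still "Bearer token"
```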
```diff
-    cdp_headers: dict = None,
+    cdp_headers: Optional[Mapping[str, str]] = None,
 ...
-    self.cdp_headers = cdp_headers if cdp_headers is not None else {}
+    self.cdp_headers = dict(cdp_headers) if cdp_headers is not None else {}
```

crawl4ai/browser_manager.py (1)
663-671: Improve error handling for CDP connect (optional)

Wrap connect_over_cdp in try/except to surface header-related failures clearly (401/403), and avoid silent fall-throughs.
```diff
-        self.browser = await self.playwright.chromium.connect_over_cdp(cdp_url, headers=self.config.cdp_headers)
+        try:
+            self.browser = await self.playwright.chromium.connect_over_cdp(
+                cdp_url, headers=(self.config.cdp_headers or None)
+            )
+        except Exception as e:
+            if self.logger:
+                self.logger.error(
+                    message="CDP connect failed: {error}",
+                    tag="BROWSER",
+                    params={"error": str(e)},
+                )
+            raise
```

docs/examples/amazon_bedrock_agentcore_browser_integration.py (4)
1-3: Fix product casing in example docstring

Use “AgentCore” consistently.

```diff
-"""
-This example demonstrates how to integrate Amazon Bedrock Agentcore Browser, a remote browser session using Playwright.
-"""
+"""
+This example demonstrates how to integrate Amazon Bedrock AgentCore Browser for a remote Playwright session.
+"""
```
12-16: Graceful lifecycle management for BrowserClient (verify and adopt context manager if available)

Starting the client without closing it may leak resources. Prefer an async context manager or explicit close/shutdown if provided.
Proposed pattern (adjust to the actual API):
```diff
-    client = BrowserClient(region="us-east-1")
-    client.start()
-
-    ws_url, headers = client.generate_ws_headers()
+    async with BrowserClient(region="us-east-1") as client:
+        await client.start()  # if required; some context managers auto-start
+        ws_url, headers = client.generate_ws_headers()
```

If no async CM exists, ensure a corresponding client.stop()/close() is called in a finally block.
23-27: Add runnable entrypoint for convenience (optional)

Including an entrypoint helps users run the example directly.
```diff
 async def use_bedrock_agentcore():
@@
     print(result.markdown)
+
+
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(use_bedrock_agentcore())
```
5-5: Call out installation prerequisites in the example header (optional)

Add a short comment indicating the required extra and AWS credentials.
Example:
- Requires: pip install 'crawl4ai[aws]'
- AWS credentials with permissions to use Bedrock AgentCore Browser (e.g., via environment or config)
tests/browser/test_cdp_headers.py (5)
19-22: Remove unused import flagged by Ruff

CrawlerRunConfig isn’t used here. Clean it up to satisfy Ruff F401.
Apply this diff:
```diff
-from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
+from crawl4ai.async_configs import BrowserConfig
```
37-41: Be explicit: set browser_mode="cdp"

Depending on BrowserManager logic, relying only on cdp_url may work, but being explicit avoids ambiguity and future regressions.
You already set this in the refactor above. If you prefer the implicit behavior, please confirm that crawl4ai/browser_manager.py branches to connect_over_cdp solely based on cdp_url being set, independent of browser_mode.
23-25: Reduce CI noise from test logging

Global verbose logging can spam CI output. Make verbosity opt-in via an environment variable.
Apply this diff:
```diff
-logger = AsyncLogger(verbose=True, log_file=None)
+logger = AsyncLogger(verbose=os.getenv("CRAWL4AI_TEST_VERBOSE") == "1", log_file=None)
```
57-76: Remove the custom runner from the tests directory or move it to examples

A bespoke test runner in tests/ won’t be used by pytest and adds confusion. Either delete it or move the script-style runner to docs/examples alongside the Bedrock example.
Apply this diff to remove it:
```diff
-async def run_tests():
-    """Run all tests sequentially."""
-    results = []
-
-    results.append(await test_cdp_headers_with_aws())
-
-    # Print summary
-    total = len(results)
-    passed = sum(results)
-    logger.info(f"Tests complete: {passed}/{total} passed", tag="SUMMARY")
-
-    if passed == total:
-        logger.success("All tests passed!", tag="SUMMARY")
-    else:
-        logger.error(f"{total - passed} tests failed", tag="SUMMARY")
-
-
-if __name__ == "__main__":
-    asyncio.run(run_tests())
```
1-5: Name/intent mismatch: this is an integration test, not a unit test

The docstring labels this as “examples” and the implementation depends on external infrastructure. Marking and gating it (as suggested) keeps CI stable while preserving the example.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
- crawl4ai/async_configs.py (4 hunks)
- crawl4ai/browser_manager.py (1 hunk)
- docs/examples/amazon_bedrock_agentcore_browser_integration.py (1 hunk)
- docs/md_v2/core/examples.md (1 hunk)
- pyproject.toml (1 hunk)
- tests/browser/test_cdp_headers.py (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (2)
docs/examples/amazon_bedrock_agentcore_browser_integration.py (2)
- crawl4ai/async_webcrawler.py (1): AsyncWebCrawler (53-852)
- crawl4ai/async_configs.py (1): BrowserConfig (329-625)

tests/browser/test_cdp_headers.py (3)
- crawl4ai/async_webcrawler.py (1): AsyncWebCrawler (53-852)
- crawl4ai/async_configs.py (2): BrowserConfig (329-625), CrawlerRunConfig (837-1672)
- crawl4ai/async_logger.py (1): AsyncLogger (79-322)
🪛 Ruff (0.12.2)
tests/browser/test_cdp_headers.py
20-20: crawl4ai.async_configs.CrawlerRunConfig imported but unused
Remove unused import: crawl4ai.async_configs.CrawlerRunConfig
(F401)
🔇 Additional comments (7)
pyproject.toml (1)
67-67: Verify PyPI package existence and pin a minimal version

It wasn’t possible to confirm “bedrock-agentcore” on PyPI using the prior script (the JSON response wasn’t iterable). Please manually verify the correct package name and the list of available versions, then pin at least the minimum known-good release to prevent future breakage.

- To check existence and HTTP status:

```bash
curl -o /dev/null -s -w "%{http_code}" https://pypi.org/pypi/bedrock-agentcore/json
```

- Or visit: https://pypi.org/project/bedrock-agentcore/
- If the package is named differently (e.g. bedrock-agent-core), update the dependency.
- Once confirmed, change the entry in pyproject.toml, for example: aws = ["bedrock-agentcore>=1.2.3"]

crawl4ai/async_configs.py (3)
405-406: Good addition: cdp_headers in BrowserConfig ctor

Accepting CDP headers at construction time is aligned with the new CDP integration.

444-445: Sane default for cdp_headers

Defaulting to {} avoids downstream None checks.

531-532: from_kwargs wiring looks correct

cdp_headers is correctly passed through during deserialization.
crawl4ai/browser_manager.py (2)
663-667: Pass-through of CDP headers to connect_over_cdp is correct

This is the right place to forward headers to the CDP endpoint.

663-667: Playwright headers support confirmed

The headers parameter for BrowserType.connect_over_cdp was introduced in Playwright Python v1.11¹, and our minimum pinned version (>= 1.49.0) is well beyond that². No runtime risk of a missing headers kwarg; no changes needed.

¹ https://playwright.dev/python/docs/api/class-browsertype
² https://playwright.dev/python/docs/release-notes

docs/examples/amazon_bedrock_agentcore_browser_integration.py (1)

17-21: Example is correct and concise

Creating BrowserConfig with cdp_url and cdp_headers demonstrates the new feature well.
```python
from bedrock_agentcore.tools.browser_client import BrowserClient
```
🛠️ Refactor suggestion
Make the Bedrock import optional and avoid import-time failures
Unconditionally importing an optional dependency will break test collection in environments that don’t install the extra (CI by default). Gate it behind pytest.importorskip and move it inside the test. Also add contextlib/pytest imports now used by the test changes below.
Apply this diff:
```diff
@@
 import asyncio
 import os
 import sys
+import contextlib
+import pytest

-from bedrock_agentcore.tools.browser_client import BrowserClient
+
```
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around lines 11 to 12, the test
unconditionally imports BrowserClient from
bedrock_agentcore.tools.browser_client which causes import-time failures in
environments without that optional extra; change this to use pytest.importorskip
to gate the bedrock dependency and move the import inside the test function so
test collection doesn't fail, and add the necessary imports for pytest and
contextlib at the top of the file to support the new pattern.
```python
async def test_cdp_headers_with_aws():
    """Test launching a browser and connecting via CDP."""
    logger.info("Testing launch and connect via CDP", tag="TEST")

    try:
        client = BrowserClient(region="us-east-1")
        client.start()

        ws_url, headers = client.generate_ws_headers()

        browser_config = BrowserConfig(
            browser_type="chromium",
            cdp_url=ws_url,
            cdp_headers=headers,
        )

        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(
                url="https://www.nbcnews.com/business",
            )
            print(result.markdown)

        logger.success("CDP headers test passed", tag="TEST")
        return True

    except Exception as e:
        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
        return False
```
🛠️ Refactor suggestion
Test currently always passes (exceptions swallowed); convert to a real pytest-asyncio test with assertions, skip gating, and proper cleanup
Catching all exceptions and returning True/False means pytest will treat the test as passed regardless of failures. Also, the import should be optional, and the external E2E dependency should be opt-in. Add:
- pytest-asyncio decorator
- opt-in gating via env var
- importorskip for the optional extra
- deterministic assertion(s)
- finally cleanup for the BrowserClient
- explicitly set browser_mode="cdp"
- avoid printing large page content and use a stable URL
Apply this diff:
```diff
@@
-async def test_cdp_headers_with_aws():
-    """Test launching a browser and connecting via CDP."""
-    logger.info("Testing launch and connect via CDP", tag="TEST")
-
-    try:
-        client = BrowserClient(region="us-east-1")
-        client.start()
-
-        ws_url, headers = client.generate_ws_headers()
-
-        browser_config = BrowserConfig(
-            browser_type="chromium",
-            cdp_url=ws_url,
-            cdp_headers=headers,
-        )
-
-        async with AsyncWebCrawler(config=browser_config) as crawler:
-            result = await crawler.arun(
-                url="https://www.nbcnews.com/business",
-            )
-            print(result.markdown)
-
-        logger.success("CDP headers test passed", tag="TEST")
-        return True
-
-    except Exception as e:
-        logger.error(f"CDP headers test failed: {str(e)}", tag="TEST")
-        return False
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():
+    """Test launching a browser and connecting via CDP."""
+    logger.info("Testing launch and connect via CDP", tag="TEST")
+
+    # Opt-in: run only when explicitly enabled
+    if os.getenv("CRAWL4AI_E2E_BEDROCK") != "1":
+        pytest.skip("E2E Bedrock test disabled. Set CRAWL4AI_E2E_BEDROCK=1 to enable.")
+
+    # Optional dependency
+    browser_mod = pytest.importorskip(
+        "bedrock_agentcore.tools.browser_client",
+        reason="Install crawl4ai[aws] to run this test.",
+    )
+    BrowserClient = getattr(browser_mod, "BrowserClient")
+
+    region = os.getenv("BEDROCK_REGION", "us-east-1")
+    client = BrowserClient(region=region)
+    try:
+        client.start()
+
+        ws_url, headers = client.generate_ws_headers()
+
+        browser_config = BrowserConfig(
+            browser_type="chromium",
+            browser_mode="cdp",
+            cdp_url=ws_url,
+            cdp_headers=headers,
+        )
+
+        async with AsyncWebCrawler(config=browser_config, logger=logger) as crawler:
+            result = await crawler.arun(
+                url="https://example.com",
+            )
+
+        # Minimal, deterministic assertions
+        assert result.success is True
+        assert result.markdown is not None and len(result.markdown) > 0
+        logger.success("CDP headers test passed", tag="TEST")
+    finally:
+        with contextlib.suppress(Exception):
+            client.stop()
```

As a follow-up, consider adding a small unit test (with monkeypatch) to assert that BrowserManager passes cdp_headers to Playwright’s connect_over_cdp. I can draft it if you want.
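A sketch of that follow-up idea in plain Python: a mock chromium object records the keyword arguments so the pass-through can be asserted without a real browser (`FakeChromium` and `connect_with_headers` are invented for illustration; a real test would monkeypatch Playwright inside BrowserManager):

```python
import asyncio
from unittest.mock import AsyncMock


class FakeChromium:
    """Stands in for playwright.chromium; records connect_over_cdp call kwargs."""

    def __init__(self):
        self.connect_over_cdp = AsyncMock(return_value="fake-browser")


async def connect_with_headers(chromium, cdp_url, cdp_headers):
    # Mirrors the pass-through under review: headers are forwarded verbatim.
    return await chromium.connect_over_cdp(cdp_url, headers=cdp_headers)


chromium = FakeChromium()
browser = asyncio.run(
    connect_with_headers(chromium, "ws://localhost:9222", {"Authorization": "Bearer t"})
)
# Assert the headers reached the (mocked) Playwright call.
chromium.connect_over_cdp.assert_awaited_once_with(
    "ws://localhost:9222", headers={"Authorization": "Bearer t"}
)
print(browser)  # "fake-browser"
```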
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py lines 27-55, the current test swallows
exceptions and always returns True/False; convert it into a proper
pytest-asyncio test: add the pytest.mark.asyncio decorator, gate running via an
environment variable (skip the test unless set), use pytest.importorskip for
optional E2E extras, set BrowserConfig/browser client to explicit
browser_mode="cdp", avoid printing page content and use a stable URL, replace
the top-level try/except with real assertions (e.g., assert crawler.run result
meets expected conditions) and ensure deterministic checks, and add a finally
block to always stop/cleanup the BrowserClient to avoid leaking resources.
```python
logger = AsyncLogger(verbose=True, log_file=None)


async def test_cdp_headers_with_aws():
```
💡 Verification agent
🧩 Analysis chain
Verify pytest-asyncio configuration is present
Since we’re using @pytest.mark.asyncio, ensure pytest-asyncio is installed and asyncio mode is configured. Run this repo-scoped script to confirm:
🏁 Script executed:
```bash
#!/bin/bash
# Check for pytest-asyncio in project files and config of asyncio mode
set -euo pipefail

echo "Searching for pytest-asyncio dependency mentions..."
rg -n -C2 -i 'pytest-asyncio' || true
echo

echo "Looking for asyncio_mode in pytest config files..."
fd -a 'pytest.ini|pyproject.toml|setup.cfg|tox.ini' | while read -r f; do
  echo "---- $f ----"
  rg -n -C2 -i 'asyncio_mode|pytest_plugins|markers' "$f" || true
done
echo

echo "Examples of existing async tests (if any):"
rg -nP --type=py -g 'tests/**' -C3 '@pytest\.mark\.asyncio|async def test_' || true
```

Length of output: 161871
Add pytest-asyncio and configure asyncio mode
It looks like there’s no explicit pytest-asyncio dependency or asyncio_mode/marker registration in your pytest config, and the new async test in tests/browser/test_cdp_headers.py isn’t decorated—under the default “strict” mode your async def tests will be skipped or error.
• tests/browser/test_cdp_headers.py:27
– Add the missing decorator above the async test.
• Project config (pyproject.toml, pytest.ini or setup.cfg)
– Ensure pytest-asyncio is listed in your dev-dependencies (or requirements).
– Register the asyncio marker or set asyncio_mode to “auto” so bare async def tests run.
Example diff for the test itself:
```diff
--- a/tests/browser/test_cdp_headers.py
+++ b/tests/browser/test_cdp_headers.py
@@
-async def test_cdp_headers_with_aws():
+import pytest
+
+@pytest.mark.asyncio
+async def test_cdp_headers_with_aws():
```

And in your pytest config (e.g. pyproject.toml):
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
    "asyncio: mark async tests using pytest-asyncio"
]
```

or equivalently in pytest.ini / setup.cfg under the [pytest] section.
🤖 Prompt for AI Agents
In tests/browser/test_cdp_headers.py around line 27, the new async test function
is missing a pytest-asyncio marker/decorator so it will be skipped or error
under strict asyncio mode; add @pytest.mark.asyncio above the async def
test_cdp_headers_with_aws() (and import pytest if not present). Also add
pytest-asyncio to the dev dependencies and configure pytest to allow bare async
tests by setting asyncio_mode = "auto" or registering the asyncio marker in your
test config (pyproject.toml, pytest.ini, or setup.cfg).
Summary
This update adds support for CDP headers in the browser configuration, enabling integration with remote browser services that require authentication headers. One use case for this enhancement is seamless integration with Amazon Bedrock AgentCore Browser, allowing Crawl4AI to connect to managed browser instances in AWS environments.
The change introduces a new cdp_headers parameter to the BrowserConfig class that allows passing authentication headers when connecting to remote browsers via CDP, expanding Crawl4AI's compatibility with cloud-based browser services.
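As a rough illustration of what the new parameter enables, the sketch below mirrors the header pass-through in plain Python; `build_cdp_connect_kwargs` and its key names are hypothetical helpers for illustration, not part of the crawl4ai or Playwright APIs:

```python
from typing import Optional


def build_cdp_connect_kwargs(cdp_url: str, cdp_headers: Optional[dict] = None) -> dict:
    """Assemble keyword arguments for a CDP connection call.

    Mirrors the idea of forwarding optional auth headers alongside the
    endpoint URL; the key names here are illustrative only.
    """
    kwargs = {"endpoint_url": cdp_url}
    if cdp_headers:  # only include headers when the caller supplied some
        kwargs["headers"] = dict(cdp_headers)  # defensive copy
    return kwargs


# With auth headers (e.g., a pre-signed remote browser session):
print(build_cdp_connect_kwargs(
    "wss://example.invalid/session", {"Authorization": "Bearer <token>"}
))
# Without headers, the "headers" key is omitted entirely:
print(build_cdp_connect_kwargs("ws://localhost:9222/devtools"))
```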
List of files changed and why
Core Configuration Changes:
- crawl4ai/async_configs.py: Added the cdp_headers parameter to the BrowserConfig class constructor, initialization, from_kwargs method, and to_dict method to support authentication headers for CDP connections
- crawl4ai/browser_manager.py: Modified the CDP connection logic to pass the cdp_headers to Playwright's connect_over_cdp method

Documentation and Examples:

- docs/examples/amazon_bedrock_agentcore_browser_integration.py: New example demonstrating integration with Amazon Bedrock AgentCore Browser using the new CDP headers functionality
- docs/md_v2/core/examples.md: Updated examples documentation to include the new Amazon Bedrock integration example

Testing:

- tests/browser/test_cdp_headers.py: Comprehensive test suite for the CDP headers functionality, including integration tests with Amazon Bedrock AgentCore Browser

Dependencies:

- pyproject.toml: Added new optional dependency group aws = ["bedrock-agentcore"] to support Amazon Bedrock integration

How Has This Been Tested?
I've tested this integration with Amazon Bedrock AgentCore Browser which utilises CDP Headers in order to have an authorised connection from Crawl4AI.
Test is available in file:
tests/browser/test_cdp_headers.py

Checklist:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores