feat: Add web research node with Tavily search integration#56

Open
1wos wants to merge 3 commits into WithModulabs:v2-main from 1wos:v2-main

Conversation


@1wos 1wos commented Feb 17, 2026

#53

  • Add a web research node that runs Tavily searches on the selected keywords
  • Use the langchain-tavily package (TavilySearch), in line with the LangChain v1 migration

Summary by CodeRabbit

  • New Features

    • Added an API health-check endpoint.
    • Integrated web research into the blog-writing workflow, including keyword-based searches and inclusion of search results in generated posts.
  • New Data / Schema

    • Blog post payloads now include title, content, tags, and response timestamps.
  • Style

    • Formatting and readability improvements across the codebase.
  • Chores

    • Added support for an external web-search provider and related configuration.


vercel bot commented Feb 17, 2026

@1wos is attempting to deploy a commit to robertchoi's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai bot commented Feb 17, 2026

📝 Walkthrough

Walkthrough

Adds a health-check GET /api/chk, extends Post schemas with title/content/tags and response metadata, and integrates Tavily-based web research into the blog-writer flow (new WebResearch node, search_with_tavily/web_search functions, SearchProvider enum, and prompt updates).

Changes

Cohort / File(s) / Summary

  • API Health Check (api/chk.py, api/index.py): New GET /api/chk route returning JSON status; minor formatting change in the debug_root response.update call.
  • API Routes & Tools (api/routes/blog.py, api/routes/chat.py, api/routes/tools.py): Cosmetic formatting and docstring spacing; Query(...) params split across lines and trailing commas added.
  • API Schemas & Services (api/schemas/api_keys.py, api/schemas/post.py, api/services/post_service.py): api_keys: added Tavily key field, accessor, and header param; Google key resolution expanded. post.py: PostBase gains title, content, tags; PostUpdate and PostResponse extended (id, timestamps, model_config). post_service.py: whitespace-only edits.
  • Blog Writer Graph & State (casts/blog_writer/CLAUDE.md, casts/blog_writer/graph.py, casts/blog_writer/modules/state.py): Inserted web_research node between keyword selection and writing; added SearchProvider enum (TAVILY) and search_results field in BlogState; graph edges updated and node registered.
  • Blog Writer Nodes, Tools & Prompts (casts/blog_writer/modules/nodes.py, casts/blog_writer/modules/tools.py, casts/blog_writer/modules/prompts.py, casts/blog_writer/modules/models.py): Added WebResearch AsyncBaseNode and wiring; new async functions search_with_tavily(...) and web_search(...); WriteBlog updated to consume search_results and include a search_results section in prompts; prompt requirements expanded to cite web sources.
  • Dependencies, Tests & Minor Edits (pyproject.toml, casts/chat/modules/models.py, tests/cast_tests/blog_writer_test.py): Added dependency langchain-tavily>=0.2.0; test updated to expect the web_research node; small formatting/whitespace tweaks across files.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant HumanSelectKeywords as HumanSelectKeywords
    participant WebResearch as WebResearch
    participant TavilyAPI as Tavily API
    participant WriteBlog as WriteBlog
    participant State as BlogState

    User->>HumanSelectKeywords: request blog with keywords
    HumanSelectKeywords->>State: store selected_keywords
    State-->>WebResearch: trigger with selected_keywords
    WebResearch->>TavilyAPI: search keywords (top N)
    TavilyAPI-->>WebResearch: return search results
    WebResearch->>State: store search_results
    State-->>WriteBlog: provide analyzed_content + selected_keywords + search_results
    WriteBlog->>WriteBlog: format search_results_section and render blog
    WriteBlog-->>User: return blog content with sources

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hopped through lines with nimble feet,
Added pings and searches, tidy and neat,
Tavily peeks and keywords play,
The blog now cites the web today —
A rabbit cheers this tidy feat!

🚥 Pre-merge checks | ✅ 3 passed
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The PR title 'feat: Add web research node with Tavily search integration' directly describes the primary change: introducing a new web research node that integrates Tavily search, which aligns with the substantial file changes across the blog writer modules and schema updates.
  • Docstring Coverage ✅ Passed — Docstring coverage is 91.89%, above the required 80.00% threshold.


@1wos 1wos changed the title ✨ feat: Add web research node with Tavily search integration feat: Add web research node with Tavily search integration Feb 17, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/cast_tests/blog_writer_test.py (1)

29-41: ⚠️ Potential issue | 🟡 Minor

"web_research" is missing from the expected nodes list.

The graph now contains 9 nodes (including the new web_research), but this test still only checks for 8. While the test won't fail (it only verifies listed nodes exist, not exclusivity), it silently skips coverage for the newly added node.

Proposed fix
         expected_nodes = [
             "fetch_content",
             "analyze_content",
             "suggest_keywords",
             "human_select_keywords",
+            "web_research",
             "write_blog",
             "optimize_seo",
             "generate_images",
             "convert_to_html",
         ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cast_tests/blog_writer_test.py` around lines 29 - 41, The
expected_nodes list in the test (variable expected_nodes used in the loop with
node_name) is missing the newly added "web_research" node; update the
expected_nodes array to include "web_research" so the test checks for that node
as well (i.e., add the string "web_research" alongside "fetch_content",
"analyze_content", etc.) to ensure the graph's new node is covered by the
assertions.
🧹 Nitpick comments (7)
api/schemas/post.py (3)

25-25: tags is missing Field(...) for consistency.

title and content both use Field(None, ...) with descriptions/constraints, but tags uses a bare default. This inconsistency makes the schema harder to read and leaves out API documentation for this field.

♻️ Suggested fix
-    tags: Optional[List[str]] = None
+    tags: Optional[List[str]] = Field(None, description="태그 목록")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/schemas/post.py` at line 25, The tags field in the Pydantic model is
using a bare default (Optional[List[str]] = None) and should be made consistent
with title/content by using Field to provide metadata; update the tags
declaration (the tags attribute in the model) to use Field(None,
description="List of tag strings for the post") (or similar wording consistent
with your project's descriptions) so the schema includes API docs and any
constraints.

6-11: Consider adding per-item max_length on tag strings.

tags: List[str] places no upper bound on the length of individual tag strings. Without a per-item constraint, an API client can submit arbitrarily long tag values, which may hit database column limits silently or be exploitable for oversized payloads.

🛡️ Suggested constraint
-    tags: List[str] = Field(default_factory=list, description="태그 목록")
+    tags: List[Annotated[str, StringConstraints(max_length=50)]] = Field(
+        default_factory=list, description="태그 목록"
+    )  # adjust the per-item limit as appropriate

Note that in Pydantic v2, a max_length passed to Field on a List[str] bounds the number of items in the list, so a per-item length limit needs a constrained item type such as Annotated[str, StringConstraints(max_length=50)].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/schemas/post.py` around lines 6 - 11, The tags field on PostBase
currently allows arbitrarily long strings per item; update the tags definition
(PostBase.tags) to include a per-item max_length via Field (e.g.,
Field(default_factory=list, max_length=50, description="태그 목록")) so each tag
string is bounded (choose an appropriate max_length such as 50 or 100 for your
DB constraints) and keep the default_factory and description intact.

1-3: Consider replacing typing generics with built-in equivalents.

The project targets Python 3.11, which natively supports list[str] and str | None syntax without requiring from __future__ import annotations. Migrating from legacy List and Optional aliases improves code readability and aligns with modern Python best practices.

♻️ Suggested migration
-from pydantic import BaseModel, Field, ConfigDict
-from typing import Optional, List
-from datetime import datetime
+from pydantic import BaseModel, Field, ConfigDict
+from datetime import datetime

Then in the schema bodies:

-    tags: List[str] = Field(...)
+    tags: list[str] = Field(...)

-    title: Optional[str] = Field(...)
+    title: str | None = Field(...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/schemas/post.py` around lines 1 - 3, Replace legacy typing aliases with
built-in generics in the post schema: remove imports of Optional and List from
typing and update any type annotations using Optional[...] to the PEP 604 union
form (e.g., str | None) and List[...] to the built-in bracket form (e.g.,
list[str]); keep imports of BaseModel, Field, ConfigDict and datetime intact and
adjust any annotated fields in the BaseModel subclasses to use the new syntax so
the file uses native Python 3.11 types.
api/chk.py (1)

6-8: Consider stabilizing the message string.

"Standalone routing is working!" reads like a debug phrase rather than a stable health-check response. If this endpoint is permanent, a neutral string such as "ok" or "healthy" is more appropriate for consumers.

✏️ Proposed change
-    return {"status": "ok", "message": "Standalone routing is working!"}
+    return {"status": "ok", "message": "ok"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/chk.py` around lines 6 - 8, The health-check handler check (decorated
with `@app.get`("/api/chk")) returns a debug-like message; change its response to
a stable, neutral value (e.g., {"status": "ok", "message": "healthy"} or simply
{"status": "ok"}) so consumers get a consistent, production-safe health string;
update the return value in the check function accordingly.
casts/blog_writer/modules/state.py (1)

53-58: Consider adding search_provider to BlogWriterConfig.

The other provider enums (llm_provider, image_provider, scraper_type) are all configurable via BlogWriterConfig, but SearchProvider is not. Currently there's only one search provider, so it's not urgent, but adding it now keeps the config surface consistent and prepares for future providers.

Suggested addition
 class BlogWriterConfig(BaseModel):
     """Configuration for Blog Writer cast."""

     llm_provider: LLMProvider = LLMProvider.OPENAI
     image_provider: ImageProvider = ImageProvider.DALLE
     scraper_type: ScraperType = ScraperType.BEAUTIFULSOUP
+    search_provider: SearchProvider = SearchProvider.TAVILY
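Since SearchProvider currently has a single member, the addition is small; a sketch of how the config field would parse (a pydantic model containing only the new field — the real BlogWriterConfig carries the other provider enums as well):

```python
from enum import Enum

from pydantic import BaseModel


class SearchProvider(str, Enum):
    TAVILY = "tavily"


class BlogWriterConfig(BaseModel):
    """Configuration for Blog Writer cast (illustrative subset)."""

    search_provider: SearchProvider = SearchProvider.TAVILY


assert BlogWriterConfig().search_provider is SearchProvider.TAVILY
# A str-backed enum also accepts the raw string value:
assert BlogWriterConfig(search_provider="tavily").search_provider is SearchProvider.TAVILY
```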
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@casts/blog_writer/modules/state.py` around lines 53 - 58, Add a new field to
BlogWriterConfig named search_provider of type SearchProvider with a sensible
default (e.g., SearchProvider.DEFAULT) so the config surface matches the other
provider enums; update the imports to include SearchProvider if not present and
add search_provider: SearchProvider = SearchProvider.DEFAULT to the
BlogWriterConfig class definition so code referencing BlogWriterConfig (and any
Pydantic parsing) will include the search provider option.
casts/blog_writer/modules/nodes.py (1)

214-250: WebResearch node: solid error handling, but the config-based API key path is likely dead code.

The api_keys.get_tavily_key() call at lines 227-230 requires api_keys to be an object with a get_tavily_key method. However, BlogState.config is typed as dict, so api_keys obtained via state["config"].get("api_keys") will typically be a dict (or None). The hasattr guard prevents a crash, but it means api_key will always be None here, falling through to the TAVILY_API_KEY env var in search_with_tavily.

This isn't broken (env var fallback works), but if you intend to support runtime API key injection via config, consider aligning with a dict-based approach:

Suggested fix
         api_key = None
         if state.get("config"):
             api_keys = state["config"].get("api_keys")
             if api_keys:
-                api_key = (
-                    api_keys.get_tavily_key()
-                    if hasattr(api_keys, "get_tavily_key")
-                    else None
-                )
+                api_key = (
+                    api_keys.get("tavily_key")
+                    if isinstance(api_keys, dict)
+                    else getattr(api_keys, "get_tavily_key", lambda: None)()
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@casts/blog_writer/modules/nodes.py` around lines 214 - 250, The config path
in WebResearch.execute assumes api_keys is an object with get_tavily_key, but
BlogState.config is a dict so that branch is effectively dead; update the logic
that extracts api_key from state["config"].get("api_keys") to support dict-based
injection (e.g., check if api_keys is a dict and read a well-known key like
"tavily" or "tavily_api_key"), while still preserving the existing
hasattr(api_keys, "get_tavily_key") branch for backward compatibility; ensure
the value you extract is passed through to web_search (SearchProvider.TAVILY) so
runtime config API keys override the env fallback.
casts/blog_writer/modules/tools.py (1)

236-247: Error entries are included as search results and will appear in the LLM prompt.

When a keyword search fails, the error message (line 245: f"검색 실패: {e}") is stored in the content field of a result dict. This entry flows through WriteBlog's formatting (nodes.py line 276) and into the LLM prompt. While not a security issue, the LLM seeing error messages like stack traces or connection errors in the <web_research> block could degrade output quality.

Consider filtering out error entries before passing to the prompt, or using a distinct marker so WriteBlog can skip them.
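Option B's skip-on-marker behavior can be sketched as follows (function and field names are illustrative; the real formatting lives in WriteBlog in nodes.py):

```python
def format_search_results(results: list[dict]) -> str:
    """Build the <web_research> block, skipping entries flagged as errors."""
    lines = []
    for r in results:
        if r.get("is_error"):  # marker set by the except branch in tools.py
            continue
        lines.append(f"- {r.get('title', '')} ({r.get('url', '')})")
        lines.append(f"  {r.get('content', '')}")
    return "<web_research>\n" + "\n".join(lines) + "\n</web_research>"


results = [
    {"title": "LangChain Tavily", "url": "https://example.com", "content": "docs"},
    {"keyword": "tavily", "url": "", "title": "", "content": "검색 실패: timeout", "is_error": True},
]
block = format_search_results(results)
assert "검색 실패" not in block   # error entry never reaches the LLM prompt
assert "LangChain Tavily" in block
```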

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@casts/blog_writer/modules/tools.py` around lines 236 - 247, The current
except block in tools.py appends an error-result dict (with content f"검색 실패:
{e}") into all_results so it flows into WriteBlog (nodes.py WriteBlog
formatting) and into the LLM prompt; change the handling so error entries are
not treated as regular search results: either (A) do not append any result on
exception (remove the all_results.append in the except), or (B) append a clearly
typed marker such as {"keyword": keyword, "url":"", "title":"", "content":"",
"is_error": True} and then update WriteBlog (nodes.py formatting function) to
skip any result where is_error is True before building the <web_research> block.
Ensure you update all call sites that expect the result shape to tolerate the
new marker if you choose option B.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@casts/blog_writer/modules/tools.py`:
- Around line 153-203: The search_with_tavily function currently accepts and
passes an api_key to TavilySearch (which ignores it), doesn't parse the JSON
string returned by tool.ainvoke, and thus returns malformed results; update
search_with_tavily to remove the api_key parameter (or mark it unused),
keep/validate tavily_key via os.getenv("TAVILY_API_KEY") and raise if missing,
instantiate TavilySearch without an api_key argument, and after await
tool.ainvoke({"query": query}) call parse the returned JSON string with
json.loads() before normalizing the "results" list into the expected List[dict]
with keys "url", "title", "content".

In `@pyproject.toml`:
- Line 15: Add a Tavily API key field to the API key schema and wire header
extraction into the per-request key resolution: add a new field (e.g.,
TAVILY_API_KEY) to the APIKeys dataclass/schema in api/schemas/api_keys.py, then
update the get_api_keys dependency/function (where other X-... headers are read)
to accept and map the incoming "X-Tavily-API-Key" header into
APIKeys.TAVILY_API_KEY so multi-tenant callers can supply their own key per
request; ensure any Tavily client initialization consumes APIKeys.TAVILY_API_KEY
if present before falling back to env config.


Comment on lines 153 to 203
async def search_with_tavily(
    query: str, max_results: int = 3, api_key: str | None = None
) -> list[dict]:
    """Search the web using Tavily API.

    Uses langchain-tavily TavilySearch tool (LangChain v1).

    Args:
        query: Search query string
        max_results: Maximum number of results to return
        api_key: Optional Tavily API key (falls back to env var)

    Returns:
        List of search results with url, title, content
    """
    from langchain_tavily import TavilySearch

    tavily_key = api_key or os.getenv("TAVILY_API_KEY")
    if not tavily_key:
        raise ValueError("TAVILY_API_KEY가 설정되어 있지 않습니다.")

    tool = TavilySearch(
        max_results=max_results,
        api_key=tavily_key,
    )

    results = await tool.ainvoke({"query": query})

    # Normalize results to a consistent format
    if isinstance(results, dict) and "results" in results:
        # TavilySearch returns {"query": ..., "results": [...]}
        return [
            {
                "url": r.get("url", ""),
                "title": r.get("title", ""),
                "content": r.get("content", ""),
            }
            for r in results["results"]
        ]
    if isinstance(results, str):
        return [{"content": results, "url": "", "title": ""}]
    if isinstance(results, list):
        return [
            {
                "url": r.get("url", ""),
                "title": r.get("title", ""),
                "content": r.get("content", ""),
            }
            for r in results
        ]
    return []

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

langchain-tavily TavilySearch constructor parameters api_key max_results ainvoke

💡 Result:

langchain_tavily.TavilySearch constructor params

TavilySearch(...) is configured via keyword args like:

  • max_results: int = 5
  • topic: str = "general" (e.g. "general", "news", "finance")
  • include_answer: bool
  • include_raw_content: bool
  • include_images: bool
  • include_image_descriptions: bool
  • include_favicon: bool
  • include_usage: bool
  • search_depth: str (e.g. "basic", "advanced")
  • time_range: str | None (e.g. "day", "week", "month", "year")
  • start_date: str | None / end_date: str | None (format YYYY-MM-DD)
  • include_domains: list[str] | None / exclude_domains: list[str] | None (plus country is shown in the GitHub README example)
    [1][2]

api_key parameter?

The langchain-tavily docs show credentials are read from the TAVILY_API_KEY environment variable (not an api_key= constructor argument in the examples). [1][2]

ainvoke

Like other LangChain tools/runnables, you can call it asynchronously:

result = await tool.ainvoke({"query": "..."})

ainvoke is the async counterpart to invoke (default implementation runs invoke in a thread if native async isn’t provided). [3]

Sources:
[1] Tavily LangChain integration docs (langchain-tavily)
[2] langchain-tavily README / PyPI description (parameters + defaults)
[3] LangChain API docs (Runnable/BaseTool ainvoke)


🌐 Web query:

TavilySearch ainvoke return format langchain-tavily results structure

💡 Result:

In LangChain’s Tavily integration, the async call await tool.ainvoke(...) returns the same shape as tool.invoke(...) (just executed asynchronously).

langchain-tavily (from langchain_tavily import TavilySearch)

1) Invoking with plain args

from langchain_tavily import TavilySearch
import json

tool = TavilySearch(max_results=5)

raw = await tool.ainvoke({"query": "euro 2024 host nation"})
data = json.loads(raw)          # raw is a JSON string

raw is a JSON string representing an object like: (docs.langchain.com)

{
  "query": "...",
  "follow_up_questions": null,
  "answer": null,
  "images": [],
  "results": [
    {
      "title": "...",
      "url": "...",
      "content": "...",
      "score": 0.91,
      "raw_content": null
    }
  ],
  "response_time": 1.67
}

2) Invoking with a ToolCall (model-style)

tool_call = {"args": {"query": "euro 2024 host nation"}, "id": "1", "name": "tavily", "type": "tool_call"}
tool_msg = await tool.ainvoke(tool_call)
raw = tool_msg.content          # JSON string (same structure as above)
data = json.loads(raw)

When invoked as a ToolCall, you get a ToolMessage, whose .content is that same JSON string. (docs.langchain.com)

Legacy langchain-community (from langchain_community.tools import TavilySearchResults)

This older tool can be configured with response_format="content_and_artifact" so the ToolMessage carries a parsed artifact dict (in addition to JSON string content). (api.python.langchain.com)



Fix critical API integration bugs in search_with_tavily function (lines 153-203).

Three critical issues prevent this function from working:

  1. api_key parameter is not accepted by TavilySearch: The constructor does not recognize an api_key argument. The langchain-tavily integration reads the API key exclusively from the TAVILY_API_KEY environment variable. The passed api_key parameter (line 167) is silently ignored, making the function signature misleading.

  2. ainvoke returns a JSON string, not a dict: The call to await tool.ainvoke({"query": query}) returns a JSON string in the format {"query": "...", "results": [...], ...}, not a Python dict. The code does not parse this string. When isinstance(results, str) is True (line 181), it returns the entire JSON string as a single result's content field, producing malformed output instead of the expected list[dict] structure.

  3. The error-handling check is ineffective: The tavily_key variable computed at line 164 is never actually used since the api_key parameter to TavilySearch is invalid. The ValueError will not prevent initialization if the environment variable is set.

Required fix:

  • Remove the api_key parameter from the TavilySearch constructor and ensure TAVILY_API_KEY is set in the environment.
  • Parse the JSON string returned by ainvoke using json.loads() before attempting to access the "results" key.
  • Revise the function signature and documentation to clarify that the api_key parameter is unused or remove it entirely.
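Independent of the Tavily client, the missing json.loads step can be sketched with a small normalizer (the helper name is hypothetical; the payload shape follows the docs quoted above):

```python
import json


def normalize_tavily_results(raw) -> list[dict]:
    """Normalize TavilySearch output, which arrives as a JSON string."""
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)  # ainvoke returns a JSON string, not a dict
        except json.JSONDecodeError:
            # Not JSON at all: keep the raw text as a single result
            return [{"url": "", "title": "", "content": raw}]
    if isinstance(raw, dict):
        raw = raw.get("results", [])
    return [
        {"url": r.get("url", ""), "title": r.get("title", ""), "content": r.get("content", "")}
        for r in raw
    ]


raw = json.dumps(
    {"query": "q", "results": [{"url": "u", "title": "t", "content": "c", "score": 0.9}]}
)
assert normalize_tavily_results(raw) == [{"url": "u", "title": "t", "content": "c"}]
```

With parsing isolated like this, the dict/list/str branches in search_with_tavily all become reachable and return the same list[dict] shape.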


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@casts/blog_writer/modules/tools.py`:
- Around line 168-169: Replace the private import from
langchain_tavily._utilities by using the public API: remove the "from
langchain_tavily._utilities import TavilySearchAPIWrapper" import and instead
import the public wrapper path "from langchain_community.utilities.tavily_search
import TavilySearchAPIWrapper" or, better, refactor to use TavilySearch directly
(the TavilySearch class from langchain_tavily) so you don't instantiate a
private wrapper; update all uses of TavilySearchAPIWrapper in this module (e.g.,
where you construct or call the wrapper) to either use the public
TavilySearchAPIWrapper symbol or call TavilySearch's official methods per the
migration guidance.

---

Duplicate comments:
In `@casts/blog_writer/modules/tools.py`:
- Around line 181-205: The code treats the value returned by tool.ainvoke as
possibly a dict but TavilySearch.ainvoke returns a JSON string, so parse the
JSON string into Python objects before the isinstance checks: call json.loads on
results (handle json.JSONDecodeError and fall back to the original string), then
run the existing branches that expect dict/list/str; update the logic around the
results variable used after tool.ainvoke (and any functions that consume it) so
the dict branch (the {"results": [...]}) is reachable and individual result
objects are extracted rather than wrapping the raw JSON blob into a single
content entry.
