LEADS-240: Token usage should be 0 for a re-run with successful cache #176
xmican10 wants to merge 1 commit into lightspeed-core:main
Conversation
Walkthrough
This PR modifies token count handling for cached API responses. The API client now zeros out token counts when retrieving cached responses, and the custom LLM module updates token tracking logic to check for cache hits and only count tokens when responses are not cached, preventing duplicate token accounting.

Changes
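The accounting rule the walkthrough describes can be sketched end to end. All names below are illustrative stand-ins, not the project's actual API: a tiny client caches responses and serves cache hits with token counts zeroed, so a re-run reports 0 usage instead of double-counting.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    text: str
    input_tokens: int
    output_tokens: int


class CachingClient:
    """Hypothetical client illustrating the cache-hit token rule."""

    def __init__(self) -> None:
        self._cache: dict[str, Response] = {}

    def query(self, prompt: str) -> Response:
        if prompt in self._cache:
            # Cache hit: no API call was made, so report zero token usage.
            return replace(self._cache[prompt], input_tokens=0, output_tokens=0)
        response = self._fake_api_call(prompt)
        self._cache[prompt] = response
        return response

    @staticmethod
    def _fake_api_call(prompt: str) -> Response:
        # Stand-in for a real LLM/API call that reports genuine token counts.
        return Response(text=f"echo: {prompt}", input_tokens=len(prompt), output_tokens=10)


client = CachingClient()
first = client.query("hello")   # miss: real call, real token counts
second = client.query("hello")  # re-run: cache hit, zero token counts
assert first.input_tokens == 5 and first.output_tokens == 10
assert second.input_tokens == 0 and second.output_tokens == 0
assert second.text == first.text  # same content, only usage differs
```

Note the hit path returns a zeroed copy via `replace`, so the cached entry itself keeps its original counts for any later inspection.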
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/api/client.py (1)
284-289: Avoid mutating cached response objects in place.

This works functionally, but mutating cached model instances can create hidden side effects. Prefer returning a copy with zeroed token fields.
♻️ Proposed refactor

```diff
-        # Zero out token counts for cached responses since no API call was made
-        if cached_response is not None:
-            cached_response.input_tokens = 0
-            cached_response.output_tokens = 0
-
-        return cached_response
+        # Return zero token usage for cache hits without mutating cached object state
+        if cached_response is None:
+            return None
+
+        return cached_response.model_copy(
+            update={"input_tokens": 0, "output_tokens": 0}
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/core/api/client.py` around lines 284-289: the current code mutates the cached_response object in place by setting cached_response.input_tokens/output_tokens to 0. Instead, create and return a copy to avoid side effects: make a shallow copy (e.g., via copy.copy, dataclasses.replace, or the model's own clone/copy method) of cached_response, set input_tokens and output_tokens to 0 on the copy, and return that copy while leaving the original cached_response untouched.
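The copy-instead-of-mutate pattern the reviewer asks for can be shown with the stdlib alone. The real code path uses a Pydantic model's `model_copy`, but `dataclasses.replace` on a hypothetical stand-in model demonstrates the same property: the returned object has zeroed counts while the cached original is untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CachedResponse:
    """Hypothetical stand-in for the cached response model."""
    text: str
    input_tokens: int
    output_tokens: int


def zeroed_copy(cached: Optional[CachedResponse]) -> Optional[CachedResponse]:
    """Return a copy with token counts zeroed, leaving the cache entry intact."""
    if cached is None:
        return None
    # replace() builds a new instance; it never mutates the original,
    # which also works on frozen dataclasses.
    return replace(cached, input_tokens=0, output_tokens=0)


original = CachedResponse(text="hi", input_tokens=12, output_tokens=34)
result = zeroed_copy(original)
assert result is not original
assert (result.input_tokens, result.output_tokens) == (0, 0)
assert original.input_tokens == 12  # cached object is unchanged
```

Freezing the dataclass (or the model) makes accidental in-place mutation a hard error rather than a silent side effect, which is one way to enforce the reviewer's point structurally.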
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)

- src/lightspeed_evaluation/core/api/client.py
- src/lightspeed_evaluation/core/llm/custom.py
- tests/unit/core/api/test_client.py
- tests/unit/core/llm/test_custom.py
```diff
     litellm.ssl_verify = False

-    def call(
+    def call(  # pylint: disable=too-many-locals
```
🛠️ Refactor suggestion | 🟠 Major
Remove newly introduced pylint suppression pragmas.
Lines 111 and 190 add # pylint: disable comments. Please refactor to satisfy lint without inline suppression.
As per coding guidelines, **/*.py: "Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead". Based on learnings, the exception for too-many-locals is only acceptable for lazy-import-heavy functions, which does not apply here.
Also applies to: 190-190
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lightspeed_evaluation/core/llm/custom.py` at line 111, Remove the inline
pylint suppression on the newly modified functions (notably the method named
call) and refactor the implementations to satisfy lint rules instead: for the
call method, reduce the number of local variables by extracting helper
functions, grouping related values into small dataclasses/tuples or reusing
existing attributes, and simplifying complex expressions; apply the same
approach to the other location where a "# pylint: disable" was added (refactor
to smaller helper functions or combine variables) so no "# pylint: disable"
pragmas are required. Ensure all logic remains covered by tests and that helper
functions are private and colocated with the original function to preserve
readability.
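The refactor the prompt describes can be sketched in miniature. Everything here is invented (the real `call` in custom.py is more involved): the point is that grouping related locals into a small dataclass and moving the token-extraction logic into a private, colocated helper removes enough locals from the public method that no `too-many-locals` pragma is needed.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Groups what would otherwise be several separate local variables."""
    input_tokens: int = 0
    output_tokens: int = 0


def _extract_usage(response: dict, cache_hit: bool) -> TokenUsage:
    """Private helper: count tokens only for genuine API calls, not cache hits."""
    if cache_hit:
        return TokenUsage()
    usage = response.get("usage", {})
    return TokenUsage(
        input_tokens=usage.get("prompt_tokens", 0),
        output_tokens=usage.get("completion_tokens", 0),
    )


def call(response: dict, cache_hit: bool) -> TokenUsage:
    # Before the refactor, prompt/completion counts, the usage dict, and
    # cache flags would all be locals here; now the method has one call
    # and stays well under pylint's default too-many-locals limit of 15.
    return _extract_usage(response, cache_hit)
```

Usage: `call({"usage": {"prompt_tokens": 3, "completion_tokens": 5}}, cache_hit=False)` yields counts of 3 and 5, while any call with `cache_hit=True` yields zeros, matching the PR's cache-hit rule.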
…such scenarios
Description
- llmcache is enabled: JudgeLLM token counts are not added + unit test for this scenario
- apicache is enabled: API call token counts are zeroed when the response is loaded from cache + unit test for such a scenario

Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit