LEADS-240: Token usage should be 0 for a re-run with successful cache#176

Open
xmican10 wants to merge 1 commit into lightspeed-core:main from xmican10:LEADS-240-0-token-usage-when-using-cache

Conversation

xmican10 (Contributor) commented Feb 27, 2026


Description

  • When the LLM cache is enabled, JudgeLLM token counts are not added for cached responses; a unit test covers this scenario.
  • When the API cache is enabled, API call token counts are zeroed when the response is loaded from the cache; a unit test covers this scenario.
  • DeepEval does not count JudgeLLM tokens at all, so this issue does not apply there; that gap is tracked in a separate ticket, LEADS-241.
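The API-cache half of the fix can be sketched as follows. This is a simplified illustration, not the project's actual code: `APIResponse` and the dict-based cache are hypothetical stand-ins for the real models in `src/lightspeed_evaluation/core/api/client.py`; only the idea of zeroing token counts on a cache hit is taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class APIResponse:
    """Simplified stand-in for the project's API response model."""
    text: str
    input_tokens: int
    output_tokens: int


def get_cached_response(cache: dict, key: str) -> Optional[APIResponse]:
    """Return the cached response with token counts zeroed.

    No API call is made on a cache hit, so a re-run should not
    re-count the tokens recorded when the entry was first stored.
    """
    cached = cache.get(key)
    if cached is None:
        return None
    cached.input_tokens = 0
    cached.output_tokens = 0
    return cached
```

A re-run that hits the cache then reports zero token usage while the response payload itself is unchanged.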

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude (e.g., Claude, CodeRabbit, Ollama, etc.; N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • Bug Fixes
    • Improved accuracy of token usage tracking. Token counts from cached API responses are now properly zeroed to prevent double-counting. Token consumption is now accurately recorded only when responses are freshly fetched from the service, not when retrieved from cache.

coderabbitai bot (Contributor) commented Feb 27, 2026

Walkthrough

This PR modifies token count handling for cached API responses. The API client now zeros out token counts when retrieving cached responses, and the custom LLM module updates token tracking logic to check for cache hits and only count tokens when responses are not cached, preventing duplicate token accounting.

Changes

Cohort / File(s): Summary

  • API Client Cache Token Handling (src/lightspeed_evaluation/core/api/client.py): Modified _get_cached_response to zero out input_tokens and output_tokens on cached APIResponse objects before returning, ensuring cached responses contribute no token counts.
  • LLM Token Tracking Logic (src/lightspeed_evaluation/core/llm/custom.py): Restructured token tracking to occur in a finally block and added cache-hit detection via response._hidden_params["cache_hit"]; tokens are only recorded when the response exists, the tracker is active, and the response is not a cache hit.
  • API Client Tests (tests/unit/core/api/test_client.py): Added test_get_cached_response_zeros_token_counts to verify token counts are zeroed while other response fields remain intact.
  • LLM Tests (tests/unit/core/llm/test_custom.py): Added test_call_does_not_add_tokens_on_cache_hit to verify tokens are not tracked on cache hits; updated an existing test with _hidden_params = {} to exclude the cache-hit path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lightspeed-evaluation#75: Implements the caching mechanism that this PR modifies to handle token counts correctly on cached responses.

Suggested reviewers

  • asamal4
  • tisnik
🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The PR title clearly and specifically describes the main change, zeroing token usage when the cache is successfully hit, which aligns with the primary objectives of the changeset.
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/api/client.py (1)

284-289: Avoid mutating cached response objects in place.

This works functionally, but mutating cached model instances can create hidden side effects. Prefer returning a copy with zeroed token fields.

♻️ Proposed refactor
-        # Zero out token counts for cached responses since no API call was made
-        if cached_response is not None:
-            cached_response.input_tokens = 0
-            cached_response.output_tokens = 0
-
-        return cached_response
+        # Return zero token usage for cache hits without mutating cached object state
+        if cached_response is None:
+            return None
+
+        return cached_response.model_copy(
+            update={"input_tokens": 0, "output_tokens": 0}
+        )
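The copy-based approach the reviewer suggests can be illustrated with a plain dataclass and `dataclasses.replace`; the project's Pydantic-style `model_copy(update=...)` behaves analogously. `APIResponse` here is a simplified hypothetical stand-in for the real model.

```python
from dataclasses import dataclass, replace


@dataclass
class APIResponse:
    """Hypothetical simplified response model."""
    text: str
    input_tokens: int
    output_tokens: int


def zero_tokens_copy(cached: APIResponse) -> APIResponse:
    """Return a copy with zeroed counts, leaving the cache entry intact."""
    return replace(cached, input_tokens=0, output_tokens=0)


original = APIResponse("cached answer", 12, 34)
fresh = zero_tokens_copy(original)
# fresh reports zero tokens; original keeps its stored counts
```

Because the cached object is never mutated, later reads of the cache entry (for example, for debugging or re-serialization) still see the original token counts.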
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/api/client.py` around lines 284 - 289, The
current code mutates the cached_response object in place by setting
cached_response.input_tokens/output_tokens to 0; instead create and return a
copy to avoid side effects—make a shallow copy (e.g., via copy.copy,
dataclass.replace, or the model's own clone/copy method) of cached_response, set
input_tokens and output_tokens to 0 on the copy, and return that copy while
leaving the original cached_response untouched.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e395459 and 691fb68.

📒 Files selected for processing (4)
  • src/lightspeed_evaluation/core/api/client.py
  • src/lightspeed_evaluation/core/llm/custom.py
  • tests/unit/core/api/test_client.py
  • tests/unit/core/llm/test_custom.py

The comment is anchored on this hunk in src/lightspeed_evaluation/core/llm/custom.py:

litellm.ssl_verify = False

-def call(
+def call(  # pylint: disable=too-many-locals

🛠️ Refactor suggestion | 🟠 Major

Remove newly introduced pylint suppression pragmas.

Line 111 and Line 190 add # pylint: disable comments. Please refactor to satisfy lint without inline suppression.

As per coding guidelines, **/*.py: "Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead". Based on learnings, the exception for too-many-locals is only acceptable for lazy-import-heavy functions, which does not apply here.

Also applies to: 190-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/llm/custom.py` at line 111, Remove the inline
pylint suppression on the newly modified functions (notably the method named
call) and refactor the implementations to satisfy lint rules instead: for the
call method, reduce the number of local variables by extracting helper
functions, grouping related values into small dataclasses/tuples or reusing
existing attributes, and simplifying complex expressions; apply the same
approach to the other location where a "# pylint: disable" was added (refactor
to smaller helper functions or combine variables) so no "# pylint: disable"
pragmas are required. Ensure all logic remains covered by tests and that helper
functions are private and colocated with the original function to preserve
readability.
