
Properly increment metrics for /v1/infer #1236

Open
samdoran wants to merge 2 commits into lightspeed-core:main from samdoran:rlsapi-metrics

Conversation


@samdoran samdoran commented Feb 27, 2026

Description

To properly record metrics for the /v1/infer endpoint, call extract_token_usage().
Add provider and model labels to metrics.llm_calls_failures_total, matching the behavior of the other Counters.
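The labeled-Counter change can be sketched with prometheus_client. The metric and label names come from this PR; the surrounding code (provider/model values, module layout) is illustrative:

```python
from prometheus_client import Counter

# Labeled variant of the failure counter described above. The metric and
# label names match the PR; everything else here is illustrative.
llm_calls_failures_total = Counter(
    "llm_calls_failures_total",
    "Total number of failed LLM calls.",
    ["provider", "model"],
)

# Each provider/model pair now gets its own time series:
llm_calls_failures_total.labels(provider="openai", model="gpt-4o").inc()
```

Note that once labelnames are declared, `.inc()` can no longer be called directly on the Counter; every increment must go through `.labels(...)`, which is why all failure paths need the provider/model context threaded through.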

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

pytest tests/integration/endpoints/test_rlsapi_v1_integration.py tests/unit/app/endpoints/test_rlsapi_v1.py

Summary by CodeRabbit

  • Improvements
    • Added token usage tracking for LLM responses to surface consumption patterns.
    • Enhanced failure metrics to include provider and model identifiers for more granular monitoring.
    • Improved error context and logging for clearer visibility into API call failures and issues.

Need to call extract_token_usage() in order to increment the metrics counter

coderabbitai bot commented Feb 27, 2026

Walkthrough

Extracts provider and model from default model IDs, captures token-usage from LLM responses, and propagates provider/model into failure telemetry by converting the failure metric to a labeled Counter and passing provider/model through inference failure paths.

Changes

  • Metrics — src/metrics/__init__.py: Changed llm_calls_failures_total from an unlabeled Counter to a labeled Counter with ["provider", "model"].
  • Inference endpoint & telemetry — src/app/endpoints/rlsapi_v1.py: Added extract_provider_and_model_from_model_id() to derive provider/model, added token usage extraction via extract_token_usage(...), and updated all _record_inference_failure(...) calls and its signature to accept provider and model, ensuring labeled failure increments and telemetry include context.
  • Tests — tests/unit/app/endpoints/test_metrics.py: Removed an assertion expecting a specific Prometheus metric line (gauge) from the metrics endpoint response.
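The token-usage path can be sketched as below. extract_token_usage and the shape of the usage payload are assumed from the summary above, not taken from the repository, so treat every name here as hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Illustrative container; the real return type of extract_token_usage may differ."""

    input_tokens: int
    output_tokens: int


def extract_token_usage(usage: dict, model_id: str) -> TokenUsage:
    # Hypothetical sketch: pull token counts from an LLM response's usage
    # payload. In the real endpoint, calling this function is also what
    # increments the per-provider/model token counters -- which is why the
    # bug fixed by this PR (never calling it) left those counters at zero.
    return TokenUsage(
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
    )
```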

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as rlsapi_v1
    participant LLM as LLM Provider
    participant Metrics as Prometheus/Telemetry

    Client->>API: infer request
    API->>API: extract default_model_id
    API->>API: extract_provider_and_model_from_model_id(model_id)
    API->>LLM: call provider with resolved model
    LLM-->>API: response (including usage)
    API->>API: extract_token_usage(response.usage, model_id)
    API->>Metrics: increment llm_calls_failures_total(provider, model) on error / emit telemetry
    API-->>Client: inference response / error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • umago
  • tisnik
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Properly increment metrics for /v1/infer' directly aligns with the main objectives: ensuring extract_token_usage() is called and properly labeling the llm_calls_failures_total counter with provider and model.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-262: ⚠️ Potential issue | 🟡 Minor

Update _record_inference_failure docstring to include new args.

model and provider were added to the function signature but are missing from Args:.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: Model identifier used for inference.
+        provider: Provider identifier used for inference.

As per coding guidelines, "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 262, The
_record_inference_failure function's docstring is out of date: the Args section
doesn't list the newly added model and provider parameters; update the docstring
for _record_inference_failure to include brief entries for model (e.g., the
model name or identifier) and provider (e.g., the inference provider name),
following the existing Google-style "Args:" formatting used for
background_tasks, infer_request, request, request_id, error, and start_time so
the docstring is complete and consistent with project conventions.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6024a and 338b116.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py

input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)

⚠️ Potential issue | 🟠 Major

Provider/model destructuring is reversed, so failure labels are flipped.

At Line 317, the helper returns (provider, model), but the assignment stores (model, provider). That inverts llm_calls_failures_total labels downstream.

🔧 Proposed fix
-    model, provider = extract_provider_and_model_from_model_id(model_id)
+    provider, model = extract_provider_and_model_from_model_id(model_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple returned by
extract_provider_and_model_from_model_id is being unpacked in reverse; change
the assignment at the call site so the first value is assigned to provider and
the second to model (i.e., provider, model =
extract_provider_and_model_from_model_id(model_id)) so downstream labels like
llm_calls_failures_total use the correct provider and model values.
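The fix is easier to see with the helper's contract in view. A hypothetical sketch of extract_provider_and_model_from_model_id (the real implementation lives elsewhere in the repository and may differ) splits a Llama Stack-style ID on its first slash and returns (provider, model) in that order:

```python
def extract_provider_and_model_from_model_id(model_id: str) -> tuple[str, str]:
    """Hypothetical sketch: split 'provider/model' and return (provider, model)."""
    provider, _, model = model_id.partition("/")
    return provider, model


# Correct unpacking mirrors the return order, so the failure-counter labels
# stay right-side up:
provider, model = extract_provider_and_model_from_model_id("openai/gpt-4o")
```

Flipping the targets (`model, provider = ...`) type-checks fine because both values are strings, which is why the swap only shows up later as inverted metric labels.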

Get the provider and model in order to pass that to _record_inference_failure.
Add model and provider labels to the Counter.
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

282-282: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: too-many-locals (16/15).

The addition of model and provider variables pushed the local variable count over pylint's limit. Consider extracting some logic to reduce locals.

♻️ One approach: inline model_id extraction or extract a helper

Option 1 - Extract provider/model alongside model_id retrieval:

-def _get_default_model_id() -> str:
+def _get_default_model_config() -> tuple[str, str, str]:
-    """Get the default model ID from configuration.
+    """Get the default model ID, provider, and model from configuration.
 
-    Returns the model identifier in Llama Stack format (provider/model).
+    Returns:
+        Tuple of (model_id, provider, model).

Then in infer_endpoint:

-    model_id = _get_default_model_id()
-    provider, model = extract_provider_and_model_from_model_id(model_id)
+    model_id, provider, model = _get_default_model_config()

Option 2 - Suppress the pylint warning with a pragma if the complexity is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 282, The function infer_endpoint has
exceeded pylint's too-many-locals limit after adding model and provider;
refactor to reduce local variables by extracting logic that derives
model_id/model/provider into a helper (e.g., create a helper function
get_model_and_provider_from_request or inline model_id extraction into the call
site) and update infer_endpoint to call that helper, returning the needed values
so infer_endpoint uses fewer locals; alternatively, if the added locals are
acceptable, add a pylint: disable=too-many-locals pragma on the infer_endpoint
definition (prefer extracting the helper to keep lint rules).
♻️ Duplicate comments (1)
src/app/endpoints/rlsapi_v1.py (1)

317-317: ⚠️ Potential issue | 🔴 Critical

Provider/model destructuring is reversed.

The helper extract_provider_and_model_from_model_id returns (provider, model) per the relevant code snippet, but the assignment stores (model, provider). This will cause llm_calls_failures_total labels to be flipped.

🐛 Proposed fix
-    model, provider = extract_provider_and_model_from_model_id(model_id)
+    provider, model = extract_provider_and_model_from_model_id(model_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple from
extract_provider_and_model_from_model_id is being unpacked in the wrong order;
change the destructuring at the call site so it reads provider, model =
extract_provider_and_model_from_model_id(model_id) (not model, provider = ...),
and ensure subsequent uses (e.g., the labels passed to llm_calls_failures_total)
use these corrected variables so provider and model labels are not flipped.
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-265: Docstring missing model and provider parameter descriptions.

The function signature now includes model: str and provider: str, but the Args section doesn't document them. As per coding guidelines, functions should have complete docstrings with Parameters sections.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: The model identifier for metrics labeling.
+        provider: The provider identifier for metrics labeling.

     Returns:
         The total inference time in seconds.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 265, The docstring for the
inference-failure handler is missing descriptions for the new parameters model
and provider; update the Args section to include both model: str — the model
identifier used for the inference request, and provider: str — the
provider/backend name handling the inference, each with a short purpose and
type, so the docstring documents all parameters (background_tasks,
infer_request, request, request_id, error, start_time, model, provider)
consistently with the existing style.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 338b116 and befedd4.

📒 Files selected for processing (3)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • tests/unit/app/endpoints/test_metrics.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py

input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)

I think this might be flipped and it should be provider, model.

AH, I AM SLOWER THAN @CodeRabbit


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the implementation of extract_provider_and_model_from_model_id function
ast-grep --pattern $'def extract_provider_and_model_from_model_id($$$) {
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Also search with rg to find the function definition
rg -n -A 10 'def extract_provider_and_model_from_model_id'

Length of output: 570


🏁 Script executed:

#!/bin/bash
# Get more lines to see the complete return statement
rg -n -A 15 'def extract_provider_and_model_from_model_id'

Length of output: 780


@major 😄 Great minds think alike! Thanks for double-checking - it's always good to have multiple eyes on these details. The function indeed returns (provider, model) based on how it splits the model_id string.
