
Properly increment metrics for /v1/infer #1236

Open
samdoran wants to merge 2 commits into lightspeed-core:main from samdoran:rlsapi-metrics

Conversation


@samdoran samdoran commented Feb 27, 2026

Description

To properly record metrics for the /v1/infer endpoint, call extract_token_usage().
Add provider and model labels to metrics.llm_calls_failures_total, matching the behavior of the other Counters.
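The labeled-Counter change can be sketched with prometheus_client. The metric and label names come from this PR; the surrounding code (provider/model values, module layout) is illustrative:

```python
from prometheus_client import Counter

# Labeled variant of the failure counter described above. The metric and
# label names match the PR; everything else here is illustrative.
llm_calls_failures_total = Counter(
    "llm_calls_failures_total",
    "Total number of failed LLM calls.",
    ["provider", "model"],
)

# Each provider/model pair now gets its own time series:
llm_calls_failures_total.labels(provider="openai", model="gpt-4o").inc()
```

Note that once labelnames are declared, `.inc()` can no longer be called directly on the Counter; every increment must go through `.labels(...)`, which is why all failure paths need the provider/model context threaded through.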

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by Claude

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

pytest tests/integration/endpoints/test_rlsapi_v1_integration.py tests/unit/app/endpoints/test_rlsapi_v1.py

Summary by CodeRabbit

  • Improvements
    • Added token usage tracking for LLM responses to surface consumption patterns.
    • Enhanced failure metrics to include provider and model identifiers for more granular monitoring.
    • Improved error context and logging for clearer visibility into API call failures and issues.

Need to call extract_token_usage() in order to increment the metrics counter

coderabbitai bot commented Feb 27, 2026

Walkthrough

Extracts provider and model from default model IDs, captures token-usage from LLM responses, and propagates provider/model into failure telemetry by converting the failure metric to a labeled Counter and passing provider/model through inference failure paths.

Changes

  • Metrics — src/metrics/__init__.py: Changed llm_calls_failures_total from an unlabeled Counter to a labeled Counter with ["provider", "model"].
  • Inference endpoint & telemetry — src/app/endpoints/rlsapi_v1.py: Added extract_provider_and_model_from_model_id() to derive provider/model, added token usage extraction via extract_token_usage(...), and updated all _record_inference_failure(...) calls and its signature to accept provider and model, ensuring labeled failure increments and telemetry include context.
  • Tests — tests/unit/app/endpoints/test_metrics.py: Removed an assertion expecting a specific Prometheus metric line (gauge) from the metrics endpoint response.
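The token-usage path can be sketched as below. extract_token_usage and the shape of the usage payload are assumed from the summary above, not taken from the repository, so treat every name here as hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Illustrative container; the real return type of extract_token_usage may differ."""

    input_tokens: int
    output_tokens: int


def extract_token_usage(usage: dict, model_id: str) -> TokenUsage:
    # Hypothetical sketch: pull token counts from an LLM response's usage
    # payload. In the real endpoint, calling this function is also what
    # increments the per-provider/model token counters -- which is why the
    # bug fixed by this PR (never calling it) left those counters at zero.
    return TokenUsage(
        input_tokens=int(usage.get("input_tokens", 0)),
        output_tokens=int(usage.get("output_tokens", 0)),
    )
```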

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as rlsapi_v1
    participant LLM as LLM Provider
    participant Metrics as Prometheus/Telemetry

    Client->>API: infer request
    API->>API: extract default_model_id
    API->>API: extract_provider_and_model_from_model_id(model_id)
    API->>LLM: call provider with resolved model
    LLM-->>API: response (including usage)
    API->>API: extract_token_usage(response.usage, model_id)
    API->>Metrics: increment llm_calls_failures_total(provider, model) on error / emit telemetry
    API-->>Client: inference response / error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • umago
  • tisnik
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title 'Properly increment metrics for /v1/infer' directly aligns with the main objectives: ensuring extract_token_usage() is called and properly labeling the llm_calls_failures_total counter with provider and model.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-262: ⚠️ Potential issue | 🟡 Minor

Update _record_inference_failure docstring to include new args.

model and provider were added to the function signature but are missing from Args:.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: Model identifier used for inference.
+        provider: Provider identifier used for inference.

As per coding guidelines, "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 262, The
_record_inference_failure function's docstring is out of date: the Args section
doesn't list the newly added model and provider parameters; update the docstring
for _record_inference_failure to include brief entries for model (e.g., the
model name or identifier) and provider (e.g., the inference provider name),
following the existing Google-style "Args:" formatting used for
background_tasks, infer_request, request, request_id, error, and start_time so
the docstring is complete and consistent with project conventions.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a6024a and 338b116.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py

input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)

⚠️ Potential issue | 🟠 Major

Provider/model destructuring is reversed, so failure labels are flipped.

At Line 317, the helper returns (provider, model), but the assignment stores (model, provider). That inverts llm_calls_failures_total labels downstream.

🔧 Proposed fix
-    model, provider = extract_provider_and_model_from_model_id(model_id)
+    provider, model = extract_provider_and_model_from_model_id(model_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple returned by
extract_provider_and_model_from_model_id is being unpacked in reverse; change
the assignment at the call site so the first value is assigned to provider and
the second to model (i.e., provider, model =
extract_provider_and_model_from_model_id(model_id)) so downstream labels like
llm_calls_failures_total use the correct provider and model values.
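The fix is easier to see with the helper's contract in view. A hypothetical sketch of extract_provider_and_model_from_model_id (the real implementation lives elsewhere in the repository and may differ) splits a Llama Stack-style ID on its first slash and returns (provider, model) in that order:

```python
def extract_provider_and_model_from_model_id(model_id: str) -> tuple[str, str]:
    """Hypothetical sketch: split 'provider/model' and return (provider, model)."""
    provider, _, model = model_id.partition("/")
    return provider, model


# Correct unpacking mirrors the return order, so the failure-counter labels
# stay right-side up:
provider, model = extract_provider_and_model_from_model_id("openai/gpt-4o")
```

Flipping the targets (`model, provider = ...`) type-checks fine because both values are strings, which is why the swap only shows up later as inverted metric labels.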

Get the provider and model in order to pass that to _record_inference_failure.
Add model and provider labels to the Counter.
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)

282-282: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: too-many-locals (16/15).

The addition of model and provider variables pushed the local variable count over pylint's limit. Consider extracting some logic to reduce locals.

♻️ One approach: inline model_id extraction or extract a helper

Option 1 - Extract provider/model alongside model_id retrieval:

-def _get_default_model_id() -> str:
+def _get_default_model_config() -> tuple[str, str, str]:
-    """Get the default model ID from configuration.
+    """Get the default model ID, provider, and model from configuration.
 
-    Returns the model identifier in Llama Stack format (provider/model).
+    Returns:
+        Tuple of (model_id, provider, model).

Then in infer_endpoint:

-    model_id = _get_default_model_id()
-    provider, model = extract_provider_and_model_from_model_id(model_id)
+    model_id, provider, model = _get_default_model_config()

Option 2 - Suppress the pylint warning with a pragma if the complexity is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 282, The function infer_endpoint has
exceeded pylint's too-many-locals limit after adding model and provider;
refactor to reduce local variables by extracting logic that derives
model_id/model/provider into a helper (e.g., create a helper function
get_model_and_provider_from_request or inline model_id extraction into the call
site) and update infer_endpoint to call that helper, returning the needed values
so infer_endpoint uses fewer locals; alternatively, if the added locals are
acceptable, add a pylint: disable=too-many-locals pragma on the infer_endpoint
definition (prefer extracting the helper to keep lint rules).
♻️ Duplicate comments (1)
src/app/endpoints/rlsapi_v1.py (1)

317-317: ⚠️ Potential issue | 🔴 Critical

Provider/model destructuring is reversed.

The helper extract_provider_and_model_from_model_id returns (provider, model) per the relevant code snippet, but the assignment stores (model, provider). This will cause llm_calls_failures_total labels to be flipped.

🐛 Proposed fix
-    model, provider = extract_provider_and_model_from_model_id(model_id)
+    provider, model = extract_provider_and_model_from_model_id(model_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple from
extract_provider_and_model_from_model_id is being unpacked in the wrong order;
change the destructuring at the call site so it reads provider, model =
extract_provider_and_model_from_model_id(model_id) (not model, provider = ...),
and ensure subsequent uses (e.g., the labels passed to llm_calls_failures_total)
use these corrected variables so provider and model labels are not flipped.
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

253-265: Docstring missing model and provider parameter descriptions.

The function signature now includes model: str and provider: str, but the Args section doesn't document them. As per coding guidelines, functions should have complete docstrings with Parameters sections.

📝 Proposed fix
     Args:
         background_tasks: FastAPI background tasks for async event sending.
         infer_request: The original inference request.
         request: The FastAPI request object.
         request_id: Unique identifier for the request.
         error: The exception that caused the failure.
         start_time: Monotonic clock time when inference started.
+        model: The model identifier for metrics labeling.
+        provider: The provider identifier for metrics labeling.

     Returns:
         The total inference time in seconds.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 265, The docstring for the
inference-failure handler is missing descriptions for the new parameters model
and provider; update the Args section to include both model: str — the model
identifier used for the inference request, and provider: str — the
provider/backend name handling the inference, each with a short purpose and
type, so the docstring documents all parameters (background_tasks,
infer_request, request, request_id, error, start_time, model, provider)
consistently with the existing style.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 338b116 and befedd4.

📒 Files selected for processing (3)
  • src/app/endpoints/rlsapi_v1.py
  • src/metrics/__init__.py
  • tests/unit/app/endpoints/test_metrics.py
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_metrics.py

input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)

I think this might be flipped and it should be provider, model.

AH, I AM SLOWER THAN @CodeRabbit


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the implementation of extract_provider_and_model_from_model_id function
ast-grep --pattern $'def extract_provider_and_model_from_model_id($$$) {
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Also search with rg to find the function definition
rg -n -A 10 'def extract_provider_and_model_from_model_id'

Length of output: 570


🏁 Script executed:

#!/bin/bash
# Get more lines to see the complete return statement
rg -n -A 15 'def extract_provider_and_model_from_model_id'

Length of output: 780


@major 😄 Great minds think alike! Thanks for double-checking - it's always good to have multiple eyes on these details. The function indeed returns (provider, model) based on how it splits the model_id string.
