Properly increment metrics for /v1/infer #1236
samdoran wants to merge 2 commits into lightspeed-core:main from
Conversation
Need to call extract_token_usage() in order to increment the metrics counter
Walkthrough

Extracts provider and model from default model IDs, captures token usage from LLM responses, and propagates provider/model into failure telemetry by converting the failure metric to a labeled Counter and passing provider/model through inference failure paths.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant API as rlsapi_v1
    participant LLM as LLM Provider
    participant Metrics as Prometheus/Telemetry
    Client->>API: infer request
    API->>API: extract default_model_id
    API->>API: extract_provider_and_model_from_model_id(model_id)
    API->>LLM: call provider with resolved model
    LLM-->>API: response (including usage)
    API->>API: extract_token_usage(response.usage, model_id)
    API->>Metrics: increment llm_calls_failures_total(provider, model) on error / emit telemetry
    API-->>Client: inference response / error
```
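The flow in the diagram can be sketched in Python. The helper names mirror those in the diagram, but the bodies below are illustrative assumptions, not the project's actual implementation:

```python
# Illustrative sketch of the inference flow above; the counter and helper
# bodies are assumptions, not the project's actual code.
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int


def extract_provider_and_model_from_model_id(model_id: str) -> tuple[str, str]:
    # Llama Stack style IDs are assumed to look like "provider/model".
    provider, model = model_id.split("/", 1)
    return provider, model


def extract_token_usage(usage: dict, model_id: str) -> TokenUsage:
    # Reads usage from the LLM response; in the real endpoint this call
    # is also what increments the token-usage metrics.
    return TokenUsage(usage.get("input_tokens", 0), usage.get("output_tokens", 0))


provider, model = extract_provider_and_model_from_model_id("openai/gpt-4o-mini")
token_usage = extract_token_usage(
    {"input_tokens": 12, "output_tokens": 34}, "openai/gpt-4o-mini"
)
print(provider, model, token_usage.input_tokens, token_usage.output_tokens)
# openai gpt-4o-mini 12 34
```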
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)
253-262: ⚠️ Potential issue | 🟡 Minor
Update the `_record_inference_failure` docstring to include new args.

`model` and `provider` were added to the function signature but are missing from `Args:`.

📝 Proposed fix

```diff
 Args:
     background_tasks: FastAPI background tasks for async event sending.
     infer_request: The original inference request.
     request: The FastAPI request object.
     request_id: Unique identifier for the request.
     error: The exception that caused the failure.
     start_time: Monotonic clock time when inference started.
+    model: Model identifier used for inference.
+    provider: Provider identifier used for inference.
```

As per coding guidelines, "All functions must have complete docstrings with brief descriptions" and "Follow Google Python docstring conventions for modules, classes, and functions with Parameters, Returns, Raises, and Attributes sections".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 262, The _record_inference_failure function's docstring is out of date: the Args section doesn't list the newly added model and provider parameters; update the docstring for _record_inference_failure to include brief entries for model (e.g., the model name or identifier) and provider (e.g., the inference provider name), following the existing Google-style "Args:" formatting used for background_tasks, infer_request, request, request_id, error, and start_time so the docstring is complete and consistent with project conventions.
```python
input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)
```
Provider/model destructuring is reversed, so failure labels are flipped.
At Line 317, the helper returns (provider, model), but the assignment stores (model, provider). That inverts llm_calls_failures_total labels downstream.
🔧 Proposed fix

```diff
- model, provider = extract_provider_and_model_from_model_id(model_id)
+ provider, model = extract_provider_and_model_from_model_id(model_id)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
provider, model = extract_provider_and_model_from_model_id(model_id)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple returned by
extract_provider_and_model_from_model_id is being unpacked in reverse; change
the assignment at the call site so the first value is assigned to provider and
the second to model (i.e., provider, model =
extract_provider_and_model_from_model_id(model_id)) so downstream labels like
llm_calls_failures_total use the correct provider and model values.
Get the provider and model in order to pass that to _record_inference_failure. Add model and provider labels to the Counter.
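Adding provider and model labels to the failure Counter follows the standard labeled-counter pattern. The project uses a real Prometheus counter; a minimal stand-in class is sketched here so the label mechanics are self-contained and runnable:

```python
# Minimal stand-in for a labeled Prometheus Counter (the real code would use
# prometheus_client); illustrates the provider/model labels added in this PR.
class LabeledCounter:
    def __init__(self, name: str, labelnames: tuple[str, ...]):
        self.name = name
        self.labelnames = labelnames
        self.values: dict[tuple[str, ...], float] = {}

    def labels(self, **kwargs: str) -> "LabeledCounter._Child":
        # Build the label key in declared label order, like prometheus_client.
        key = tuple(kwargs[n] for n in self.labelnames)
        return LabeledCounter._Child(self, key)

    class _Child:
        def __init__(self, parent: "LabeledCounter", key: tuple[str, ...]):
            self.parent, self.key = parent, key

        def inc(self, amount: float = 1.0) -> None:
            self.parent.values[self.key] = self.parent.values.get(self.key, 0.0) + amount


llm_calls_failures_total = LabeledCounter(
    "llm_calls_failures_total", ("provider", "model")
)
llm_calls_failures_total.labels(provider="openai", model="gpt-4o-mini").inc()
print(llm_calls_failures_total.values)
# {('openai', 'gpt-4o-mini'): 1.0}
```

With `prometheus_client`, the equivalent step is declaring the Counter with `["provider", "model"]` label names and incrementing via `.labels(provider, model).inc()`.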
Force-pushed: 338b116 to befedd4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/rlsapi_v1.py (1)
282-282: ⚠️ Potential issue | 🟡 Minor
Pipeline failure: too-many-locals (16/15).

The addition of the `model` and `provider` variables pushed the local variable count over pylint's limit. Consider extracting some logic to reduce locals.

♻️ One approach: inline model_id extraction or extract a helper

Option 1 - Extract provider/model alongside model_id retrieval:

```diff
-def _get_default_model_id() -> str:
+def _get_default_model_config() -> tuple[str, str, str]:
-    """Get the default model ID from configuration.
+    """Get the default model ID, provider, and model from configuration.
-    Returns the model identifier in Llama Stack format (provider/model).
+    Returns:
+        Tuple of (model_id, provider, model).
```

Then in `infer_endpoint`:

```diff
-    model_id = _get_default_model_id()
-    provider, model = extract_provider_and_model_from_model_id(model_id)
+    model_id, provider, model = _get_default_model_config()
```

Option 2 - Suppress the pylint warning with a pragma if the complexity is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/rlsapi_v1.py` at line 282, The function infer_endpoint has exceeded pylint's too-many-locals limit after adding model and provider; refactor to reduce local variables by extracting logic that derives model_id/model/provider into a helper (e.g., create a helper function get_model_and_provider_from_request or inline model_id extraction into the call site) and update infer_endpoint to call that helper, returning the needed values so infer_endpoint uses fewer locals; alternatively, if the added locals are acceptable, add a pylint: disable=too-many-locals pragma on the infer_endpoint definition (prefer extracting the helper to keep lint rules).
♻️ Duplicate comments (1)
src/app/endpoints/rlsapi_v1.py (1)
317-317: ⚠️ Potential issue | 🔴 Critical
Provider/model destructuring is reversed.

The helper `extract_provider_and_model_from_model_id` returns `(provider, model)` per the relevant code snippet, but the assignment stores `(model, provider)`. This will cause `llm_calls_failures_total` labels to be flipped.

🐛 Proposed fix

```diff
- model, provider = extract_provider_and_model_from_model_id(model_id)
+ provider, model = extract_provider_and_model_from_model_id(model_id)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/rlsapi_v1.py` at line 317, The tuple from extract_provider_and_model_from_model_id is being unpacked in the wrong order; change the destructuring at the call site so it reads provider, model = extract_provider_and_model_from_model_id(model_id) (not model, provider = ...), and ensure subsequent uses (e.g., the labels passed to llm_calls_failures_total) use these corrected variables so provider and model labels are not flipped.
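To see concretely what the flipped unpacking does to the labels, here is a minimal demonstration; the helper body is an assumption based on the "provider/model" ID format, not the project's exact implementation:

```python
# Minimal demonstration of the bug: unpacking (provider, model) in the
# wrong order swaps every downstream label. The helper body is illustrative.
def extract_provider_and_model_from_model_id(model_id: str) -> tuple[str, str]:
    provider, model = model_id.split("/", 1)  # assumed "provider/model" format
    return provider, model


model_id = "vllm/granite-3.1-8b"

# Buggy call site: the first tuple element (the provider) lands in `model`.
model, provider = extract_provider_and_model_from_model_id(model_id)
print(provider, model)
# granite-3.1-8b vllm  <- flipped!

# Fixed call site: names match the helper's return order.
provider, model = extract_provider_and_model_from_model_id(model_id)
print(provider, model)
# vllm granite-3.1-8b
```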
🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)
253-265: Docstring missing `model` and `provider` parameter descriptions.

The function signature now includes `model: str` and `provider: str`, but the Args section doesn't document them. As per coding guidelines, functions should have complete docstrings with Parameters sections.

📝 Proposed fix

```diff
 Args:
     background_tasks: FastAPI background tasks for async event sending.
     infer_request: The original inference request.
     request: The FastAPI request object.
     request_id: Unique identifier for the request.
     error: The exception that caused the failure.
     start_time: Monotonic clock time when inference started.
+    model: The model identifier for metrics labeling.
+    provider: The provider identifier for metrics labeling.
 Returns:
     The total inference time in seconds.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/app/endpoints/rlsapi_v1.py` around lines 253 - 265, The docstring for the inference-failure handler is missing descriptions for the new parameters model and provider; update the Args section to include both model: str — the model identifier used for the inference request, and provider: str — the provider/backend name handling the inference, each with a short purpose and type, so the docstring documents all parameters (background_tasks, infer_request, request, request_id, error, start_time, model, provider) consistently with the existing style.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/app/endpoints/rlsapi_v1.py
- src/metrics/__init__.py
- tests/unit/app/endpoints/test_metrics.py
💤 Files with no reviewable changes (1)
- tests/unit/app/endpoints/test_metrics.py
```python
input_source = infer_request.get_input_source()
instructions = _build_instructions(infer_request.context.systeminfo)
model_id = _get_default_model_id()
model, provider = extract_provider_and_model_from_model_id(model_id)
```
I think this might be flipped and it should be provider, model.
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Find the implementation of extract_provider_and_model_from_model_id function
ast-grep --pattern $'def extract_provider_and_model_from_model_id($$$) {
  $$$
}'
```

Length of output: 0
🏁 Script executed:

```bash
#!/bin/bash
# Also search with rg to find the function definition
rg -n -A 10 'def extract_provider_and_model_from_model_id'
```

Length of output: 570
🏁 Script executed:

```bash
#!/bin/bash
# Get more lines to see the complete return statement
rg -n -A 15 'def extract_provider_and_model_from_model_id'
```

Length of output: 780
@major 😄 Great minds think alike! Thanks for double-checking - it's always good to have multiple eyes on these details. The function indeed returns (provider, model) based on how it splits the model_id string.
Description
In order to properly record metrics for the `/v1/infer` endpoint, call `extract_token_usage()`.

Add provider and model labels to `metrics.llm_calls_failures_total`, matching the behavior of other Counters.

Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit