
RAG and NO_RAG feature #174

Open
x86girl wants to merge 6 commits into lightspeed-core:main from x86girl:prgutier/rag-feature

Conversation


@x86girl x86girl commented Feb 26, 2026

This PR introduces three features for enhanced RAG experiment control:

  • a no_rag configuration option that disables retrieval/RAG in API calls, so baseline experiments can compare pure-LLM performance against RAG-enhanced performance
  • context threshold flagging that automatically detects and warns when the number of retrieved contexts falls below a configurable threshold
  • a context_warning column in the CSV output to track these issues across evaluation runs
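
For orientation, here is a minimal sketch of how the two new configuration fields might be declared, assuming a pydantic APIConfig model as the commit messages below describe; the exact placement, enclosing model layout, and any validators are not visible on this page.

from pydantic import BaseModel, Field


class APIConfig(BaseModel):
    """Sketch only: the two fields added by this PR, with the defaults stated in the PR."""

    no_rag: bool = Field(
        default=True,
        description="When True, ask the API to skip retrieval so responses are pure-LLM baselines",
    )
    context_threshold: int = Field(
        default=1,
        description="Minimum number of retrieved contexts expected when RAG is enabled",
    )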

Summary by CodeRabbit

Release Notes

New Features

  • Configuration option to disable retrieval/RAG in API calls for baseline experiments
  • Configuration setting for minimum context requirements with automatic flagging
  • New CSV output columns: context warnings (alerts when contexts below threshold) and metrics metadata

Priscila Gutierres and others added 5 commits February 12, 2026 11:05
  This commit introduces a new optional boolean field to the APIConfig model that allows
  disabling retrieval/RAG functionality in API calls. This is useful for
  running baseline experiments without retrieval augmentation.
This commit implements context threshold flagging to detect when
retrieved contexts are below a configurable threshold during RAG
experiments.

Changes:
- Add context_threshold field to APIConfig (default: 1)
- Add context_warning field to EvaluationResult
- Implement flagging logic in MetricsEvaluator.evaluate_metric()
  - Only flags when RAG is enabled (no_rag != True)
  - Creates warning when context_count < context_threshold
- Update system.yaml with context_threshold configuration

The flagging logic helps identify potential gaps in RAG knowledge base
by highlighting queries that retrieve insufficient context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
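
A minimal sketch of the flagging rule described in the commit message above; the names (no_rag, context_threshold, contexts) follow the PR description, the warning wording is inferred from the unit-test assertions later in this PR, and the real wiring inside MetricsEvaluator.evaluate_metric() is not reproduced here.

from typing import Optional


def build_context_warning(
    no_rag: bool, context_threshold: int, contexts: Optional[list[str]]
) -> Optional[str]:
    """Return a warning string when RAG is enabled but too few contexts were retrieved."""
    if no_rag:
        # Flagging only applies when retrieval is actually in use.
        return None
    context_count = len(contexts) if contexts else 0
    if context_count < context_threshold:
        return (
            f"Low context count ({context_count}/{context_threshold}) - "
            "Potential for new content needs"
        )
    return None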
Add three unit tests to verify context warning behavior:
1. test_context_warning_when_zero_contexts_rag_enabled
   - Verifies warning is created when RAG is enabled and contexts are zero
2. test_no_context_warning_when_rag_disabled
   - Verifies no warning when RAG is disabled (no_rag mode)
3. test_no_context_warning_above_threshold
   - Verifies no warning when contexts are above threshold

All tests pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds context_warning to the system configuration
and the list of supported CSV columns to allow tracking
of context-related issues during evaluation.
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

This PR adds RAG disablement and context threshold capabilities to the evaluation framework. It introduces configuration options to disable retrieval-augmented generation and detect when contexts fall below a minimum threshold, includes corresponding model updates, implements context warning generation in the evaluation pipeline, and adds comprehensive test coverage.

Changes

Cohort / File(s) Summary
Configuration and Model Schema
config/system.yaml, src/lightspeed_evaluation/core/constants.py, src/lightspeed_evaluation/core/models/system.py, src/lightspeed_evaluation/core/models/api.py, src/lightspeed_evaluation/core/models/data.py
Added no_rag (boolean, default True) and context_threshold (integer, default 1) configuration fields; extended EvaluationResult with metrics_metadata and context_warning optional string fields; updated SUPPORTED_CSV_COLUMNS to include "context_warning".
API Client Implementation
src/lightspeed_evaluation/core/api/client.py
Propagated no_rag configuration into APIRequest payload during request preparation; refactored HTTP error message formatting (note: potential f-string syntax issue with multiline continuation).
Evaluation Logic
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
Implemented context threshold validation to generate context_warning when RAG is enabled and context count falls below threshold; added metrics_metadata extraction and propagation into EvaluationResult response.
Test Coverage
tests/unit/core/system/test_loader.py, tests/unit/pipeline/evaluation/test_evaluator.py
Added tests verifying config loading with no_rag option; added comprehensive tests for context warning generation under various RAG and threshold scenarios (zero contexts, threshold exceeded, RAG disabled).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive
    Explanation: The title 'RAG and NO_RAG feature' is vague and generic, using broad terms that do not clearly convey what the changeset accomplishes.
    Resolution: Provide a more descriptive title that highlights the main feature, such as 'Add no_rag configuration and context threshold warning' or 'Implement RAG disable toggle and low-context warnings'.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


Author

x86girl commented Feb 26, 2026

Old Behavior:

{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null
  // no_rag parameter does NOT exist
}

New Behavior:

  • New no_rag configuration option available
  • Default is true (RAG disabled by default!)
  • Users can control RAG behavior per experiment

{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null,
  "no_rag": true
}
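
A rough sketch of how the client could fold the flag into the request body shown above; the helper name and signature are invented for illustration, and the actual request construction in client.py is not reproduced on this page.

from typing import Any, Optional


def build_query_payload(
    query: str, provider: str, model: str, no_tools: Optional[bool], no_rag: bool
) -> dict[str, Any]:
    """Assemble the JSON payload, now carrying the no_rag toggle."""
    return {
        "query": query,
        "provider": provider,
        "model": model,
        "no_tools": no_tools,
        "no_rag": no_rag,  # defaults to True per the new APIConfig default
    }


# Reproduces the "new behavior" payload above.
payload = build_query_payload("What is Kubernetes?", "openai", "gpt-4o-mini", None, True)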

Author

x86girl commented Feb 26, 2026

Just to explain the feature: the user (optionally) sets no_rag: true (the default) or false (to use RAG).
When RAG is used, the user should also set the minimum expected number of retrieved contexts via the context_threshold parameter.
Inside ./src/lightspeed_evaluation/core/api/client.py, we extend the query so the request body looks like
{ "query": "What is Kubernetes?", "provider": "openai", "model": "gpt-4o-mini", "no_tools": null, "no_rag": true }

When RAG is enabled, the number of retrieved contexts is checked inside ./src/lightspeed_evaluation/pipeline/evaluation/evaluator.py.

If RAG is turned on and the retrieved context count falls below the configured threshold, a context warning is written to the CSV output.

Tests were added to verify the code works as expected.
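
As an illustration, a result row carrying the new column might look like the following. This is a hypothetical example: the real column set is whatever csv_columns is configured to, and the warning text here only mirrors the phrases asserted in the unit tests rather than the exact evaluator output.

import csv
import io

rows = [
    {
        "conversation_group_id": "test_conv",
        "turn_id": "1",
        "context_warning": "Low context count (0/1) - Potential for new content needs",
    }
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["conversation_group_id", "turn_id", "context_warning"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())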

Author

x86girl commented Feb 26, 2026

@asamal4 can you please take a look here?

- Fix Ruff E714: Change `not (x is True)` to `x is not True`
- Fix Pylint C0325: Remove unnecessary parentheses after `not` keyword
- Fix Pyright reportOptionalMemberAccess: Add assertions for non-None system_config
- Update test assertions to use `.find()` instead of `in` operator for pylint compatibility

This commit addresses all linting and type checking errors introduced in
commits 005d062 and cde2017 without changing functionality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
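
For reference, a hypothetical before/after illustrating the Ruff E714 and Pylint C0325 fixes described in the commit above; the actual expressions touched by the commit are not visible on this page.

no_rag = False

# Before: the pattern described in the commit (E714: use `is not`; C0325: superfluous parentheses after `not`).
rag_enabled_old = not (no_rag is True)

# After: equivalent, lint-clean form.
rag_enabled_new = no_rag is not True

assert rag_enabled_old == rag_enabled_new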
@x86girl x86girl changed the title [draft] RAG and NO_RAG feature RAG and NO_RAG feature Feb 26, 2026
@x86girl x86girl marked this pull request as ready for review February 26, 2026 20:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)

169-169: ⚠️ Potential issue | 🟡 Minor

Duplicate call to _extract_metadata_for_csv.

_extract_metadata_for_csv(request) is called twice in the same return statement - once for metric_metadata (line 169) and again for metrics_metadata (line 197). This results in the same computation being performed twice and the same data being stored in two fields.

♻️ Suggested fix

If both fields are intended to have the same value, extract once and reuse:

+            extracted_metadata = self._extract_metadata_for_csv(request)
+
             return EvaluationResult(
                 **metric_result.model_dump(),
                 conversation_group_id=request.conv_data.conversation_group_id,
                 tag=request.conv_data.tag,
                 turn_id=request.turn_id,
                 metric_identifier=request.metric_identifier,
-                metric_metadata=self._extract_metadata_for_csv(request),
+                metric_metadata=extracted_metadata,
                 # ... other fields ...
-                metrics_metadata=self._extract_metadata_for_csv(request),
+                metrics_metadata=extracted_metadata,
                 context_warning=context_warning,
             )

However, if both fields serve the same purpose, consider removing one of them entirely (see related comment in data.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/pipeline/evaluation/evaluator.py` at line 169, The
return currently calls self._extract_metadata_for_csv(request) twice for
metric_metadata and metrics_metadata; fix by calling
_extract_metadata_for_csv(request) once (e.g., metadata =
self._extract_metadata_for_csv(request)) and reuse that variable for both
metric_metadata and metrics_metadata, or if metrics_metadata is redundant remove
one of the fields (adjust any consumers accordingly) so the computation is not
duplicated; ensure references to metric_metadata and metrics_metadata in the
return and related code use the single variable or the removed field is cleaned
up.
🧹 Nitpick comments (1)
tests/unit/core/system/test_loader.py (1)

170-174: Align this fixture with the new context_warning CSV feature.

Given this PR adds context_warning output tracking, using contexts plus a historical inline comment weakens feature coverage and is misleading.

Proposed test fixture/assertion update
   output:
     output_dir: ./no_rag_output
     csv_columns:
       - conversation_group_id
       - turn_id
-      - contexts  # Changed from 'context_warning' to 'contexts' (valid column)
+      - context_warning
@@
             assert config.api.enabled is True
             assert config.api.no_rag is True
+            assert "context_warning" in config.output.csv_columns
             assert loader.logger is not None

Also applies to: 189-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/core/system/test_loader.py` around lines 170 - 174, Update the
test fixture to reflect the new CSV column name and assertions for the
context_warning feature: replace the 'contexts' entry in the csv_columns fixture
with 'context_warning', and update any related assertions in the test
function(s) that reference csv_columns (or check exported CSV headers/rows) to
expect and validate the new 'context_warning' column and its values; look for
usages of the csv_columns variable and assertions around exported CSV
headers/rows in test_loader.py (and the tests around the same block) and change
them to assert presence and correct values of 'context_warning' instead of
'contexts'.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e395459 and 01e48c8.

📒 Files selected for processing (9)
  • config/system.yaml
  • src/lightspeed_evaluation/core/api/client.py
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/api.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • tests/unit/core/system/test_loader.py
  • tests/unit/pipeline/evaluation/test_evaluator.py

Comment on lines +209 to +210
message=f"Agent API error: {
response.status_code} - {error_msg}",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version requirements in project configuration files
echo "=== Checking pyproject.toml ==="
find . -name 'pyproject.toml' -type f | head -1 | xargs cat 2>/dev/null | head -100

echo -e "\n=== Checking setup.py ==="
find . -name 'setup.py' -type f | head -1 | xargs cat 2>/dev/null | head -50

echo -e "\n=== Checking setup.cfg ==="
find . -name 'setup.cfg' -type f | head -1 | xargs cat 2>/dev/null | head -50

echo -e "\n=== Checking for Python version in tox.ini ==="
find . -name 'tox.ini' -type f | head -1 | xargs cat 2>/dev/null

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 3043


Fix multi-line f-string syntax for Python 3.11 compatibility.

The project requires Python 3.11+ (requires-python = ">=3.11,<3.14" in pyproject.toml), but this f-string syntax with expressions spanning multiple lines is only supported in Python 3.12+ (PEP 701). Reformat the f-string to fit on a single line or use string concatenation to maintain Python 3.11 compatibility.
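
A brief sketch of the compatibility point: before Python 3.12 (PEP 701), a single-quoted f-string cannot contain a line break inside its replacement field, so the multi-line form only parses on 3.12+. The values below are placeholders.

status_code = 500
error_msg = "internal error"

# Parses only on Python 3.12+ (PEP 701 allows multi-line replacement fields):
# message = f"Agent API error: {
#     status_code} - {error_msg}"

# Portable to Python 3.11: keep the replacement field on one line.
message = f"Agent API error: {status_code} - {error_msg}"
print(message)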

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/api/client.py` around lines 209 - 210, The
multi-line f-string in the error message (inside the Agent API error handling)
is invalid in Python 3.11; change the f-string used for message (the expression
that currently spans lines including response.status_code and error_msg) to a
single-line f-string or use string concatenation (e.g., "Agent API error: " +
str(response.status_code) + " - " + error_msg) so the message construction in
the client error handling (the code that builds
message=f"...{response.status_code} - {error_msg}") is compatible with Python
3.11.

Comment on lines +506 to +509
metrics_metadata: Optional[str] = Field(
default=None,
description="Additional metric metadata (JSON-encoded key-value pairs)",
)

⚠️ Potential issue | 🟡 Minor

Clarify distinction between metrics_metadata and metric_metadata.

There's potential confusion between the new metrics_metadata field (line 506) and the existing metric_metadata field (line 473). Both are Optional[str] with JSON-encoded content descriptions.

Looking at the evaluator code, both are populated with self._extract_metadata_for_csv(request) at lines 169 and 197, which appears to be duplicating the same data.

🔍 Suggested investigation

If both fields serve the same purpose, consider removing one to avoid redundancy. If they serve different purposes, please clarify the distinction in the field descriptions.

 metrics_metadata: Optional[str] = Field(
     default=None,
-    description="Additional metric metadata (JSON-encoded key-value pairs)",
+    description="<clarify how this differs from metric_metadata>",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/models/data.py` around lines 506 - 509, The
model contains two similarly named fields, metric_metadata and metrics_metadata,
both Optional[str] with JSON descriptions and both populated from the same
helper (self._extract_metadata_for_csv(request)), causing redundancy; decide
whether to remove one or make them distinct: either delete the unused field
(remove metrics_metadata or metric_metadata) and update any references to the
remaining field, or rename and clarify their Field descriptions and change the
evaluator code so each is assigned from the appropriate extractor (e.g., keep
metric_metadata = self._extract_metadata_for_csv(request) and implement/point
metrics_metadata to a different helper or data source). Update the Field
descriptions for metric_metadata/metrics_metadata to clearly state their
different purposes and adjust all places that set them (references to
self._extract_metadata_for_csv in the evaluator) so they no longer duplicate the
same JSON content.

try:
    loader = ConfigLoader()
    config = loader.load_system_config(temp_path)
    print("something")

⚠️ Potential issue | 🟡 Minor

Remove debug print from test body.

print("something") at Line 188 is a debug artifact and adds noise to test runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/core/system/test_loader.py` at line 188, Remove the debug print
statement print("something") from the test body (delete that line in the failing
test in tests/unit/core/system/test_loader.py) so tests don't output noisy debug
text; keep the test's assertions and logic intact and run the unit tests to
verify no behavior change.

Comment on lines +916 to +1088
    def test_context_warning_when_zero_contexts_rag_enabled(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test context warning is created when RAG is enabled and contexts are zero."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG enabled (no_rag = False) and context_threshold = 1
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = False
        config_loader.system_config.api.context_threshold = 1

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with zero contexts (None = no contexts)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=None,  # Zero contexts
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is not None
        assert isinstance(result.context_warning, str)
        assert result.context_warning.find("Low context count (0/1)") != -1
        assert result.context_warning.find("Potential for new content needs") != -1

    def test_no_context_warning_when_rag_disabled(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test no context warning when RAG is disabled (no_rag mode)."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG disabled (no_rag = True)
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = True
        config_loader.system_config.api.context_threshold = 1

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with zero contexts (expected in no_rag mode)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=None,  # Zero contexts (expected)
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is None  # No warning when RAG is disabled

    def test_no_context_warning_above_threshold(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test no context warning when contexts are above threshold."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG enabled and context_threshold = 2
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = False
        config_loader.system_config.api.context_threshold = 2

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with 3 contexts (above threshold of 2)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=["Context 1", "Context 2", "Context 3"],  # 3 contexts
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is None  # No warning (3 >= 2)

⚠️ Potential issue | 🟠 Major

Tests are misplaced in the wrong test class.

These three new test methods (test_context_warning_when_zero_contexts_rag_enabled, test_no_context_warning_when_rag_disabled, test_no_context_warning_above_threshold) are placed inside the TestTokenTracker class, but they test MetricsEvaluator.evaluate_metric() functionality, not TokenTracker.

The tests use fixtures (config_loader, mock_metric_manager, mock_script_manager) that are designed for MetricsEvaluator tests and create a MetricsEvaluator instance to test context warning behavior.

🐛 Proposed fix - move tests to TestMetricsEvaluator

Move these three test methods from TestTokenTracker to TestMetricsEvaluator:

 class TestMetricsEvaluator:
     """Unit tests for MetricsEvaluator."""
 
     # ... existing tests ...
 
+    def test_context_warning_when_zero_contexts_rag_enabled(
+        self,
+        config_loader: ConfigLoader,
+        mock_metric_manager: MetricManager,
+        mock_script_manager: ScriptExecutionManager,
+        mocker: MockerFixture,
+    ) -> None:
+        """Test context warning is created when RAG is enabled and contexts are zero."""
+        # ... test implementation ...
+
+    def test_no_context_warning_when_rag_disabled(
+        # ... rest of the tests ...
+
+    def test_no_context_warning_above_threshold(
+        # ... rest of the tests ...
+

 class TestTokenTracker:
     """Unit tests for TokenTracker class."""
 
     # ... existing TokenTracker tests only ...
-
-    def test_context_warning_when_zero_contexts_rag_enabled(
-        # Remove from here
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/pipeline/evaluation/test_evaluator.py` around lines 916 - 1088,
The three test methods test_context_warning_when_zero_contexts_rag_enabled,
test_no_context_warning_when_rag_disabled, and
test_no_context_warning_above_threshold are defined inside TestTokenTracker but
exercise MetricsEvaluator.evaluate_metric() using fixtures for MetricsEvaluator;
move these three methods out of the TestTokenTracker class and place them into
the TestMetricsEvaluator class so they run with the correct fixture/context and
clearly reference MetricsEvaluator (keep the method bodies unchanged, only
relocate the methods).

Collaborator

asamal4 commented Feb 27, 2026

@x86girl wondering which API you are using? lightspeed-stack doesn't have these attributes as part of the API request

