
RAG and NO_RAG feature #174

Open
x86girl wants to merge 6 commits into lightspeed-core:main from x86girl:prgutier/rag-feature

Conversation


@x86girl x86girl commented Feb 26, 2026

This PR introduces three features for enhanced RAG experiment control:

  • a no_rag configuration option that disables retrieval/RAG in API calls, so baseline experiments can compare pure-LLM performance against RAG-enhanced performance
  • context threshold flagging that automatically detects and warns when the number of retrieved contexts falls below a configurable threshold
  • a context_warning column in the CSV output to track these issues across evaluation runs
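
For orientation, here is a minimal sketch of how the two new configuration fields might be declared, assuming a pydantic APIConfig model as the commit messages below describe; the exact placement, enclosing model layout, and any validators are not visible on this page.

from pydantic import BaseModel, Field


class APIConfig(BaseModel):
    """Sketch only: the two fields added by this PR, with the defaults stated in the PR."""

    no_rag: bool = Field(
        default=True,
        description="When True, ask the API to skip retrieval so responses are pure-LLM baselines",
    )
    context_threshold: int = Field(
        default=1,
        description="Minimum number of retrieved contexts expected when RAG is enabled",
    )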

Summary by CodeRabbit

Release Notes

New Features

  • Configuration option to disable retrieval/RAG in API calls for baseline experiments
  • Configuration setting for minimum context requirements with automatic flagging
  • New CSV output columns: context warnings (alerts when contexts below threshold) and metrics metadata

Priscila Gutierres and others added 5 commits February 12, 2026 11:05
  This commit introduces a new optional boolean field to the APIConfig model that allows
  disabling retrieval/RAG functionality in API calls. This is useful for
  running baseline experiments without retrieval augmentation.
This commit implements context threshold flagging to detect when
retrieved contexts are below a configurable threshold during RAG
experiments.

Changes:
- Add context_threshold field to APIConfig (default: 1)
- Add context_warning field to EvaluationResult
- Implement flagging logic in MetricsEvaluator.evaluate_metric()
  - Only flags when RAG is enabled (no_rag != True)
  - Creates warning when context_count < context_threshold
- Update system.yaml with context_threshold configuration

The flagging logic helps identify potential gaps in RAG knowledge base
by highlighting queries that retrieve insufficient context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
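
A minimal sketch of the flagging rule described in the commit message above; the names (no_rag, context_threshold, contexts) follow the PR description, the warning wording is inferred from the unit-test assertions later in this PR, and the real wiring inside MetricsEvaluator.evaluate_metric() is not reproduced here.

from typing import Optional


def build_context_warning(
    no_rag: bool, context_threshold: int, contexts: Optional[list[str]]
) -> Optional[str]:
    """Return a warning string when RAG is enabled but too few contexts were retrieved."""
    if no_rag:
        # Flagging only applies when retrieval is actually in use.
        return None
    context_count = len(contexts) if contexts else 0
    if context_count < context_threshold:
        return (
            f"Low context count ({context_count}/{context_threshold}) - "
            "Potential for new content needs"
        )
    return None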
Add three unit tests to verify context warning behavior:
1. test_context_warning_when_zero_contexts_rag_enabled
   - Verifies warning is created when RAG is enabled and contexts are zero
2. test_no_context_warning_when_rag_disabled
   - Verifies no warning when RAG is disabled (no_rag mode)
3. test_no_context_warning_above_threshold
   - Verifies no warning when contexts are above threshold

All tests pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds context_warning to the system configuration
and the list of supported CSV columns to allow tracking
of context-related issues during evaluation.
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

This PR adds RAG disablement and context threshold capabilities to the evaluation framework. It introduces configuration options to disable retrieval-augmented generation and detect when contexts fall below a minimum threshold, includes corresponding model updates, implements context warning generation in the evaluation pipeline, and adds comprehensive test coverage.

Changes

Cohort / File(s) Summary
Configuration and Model Schema
config/system.yaml, src/lightspeed_evaluation/core/constants.py, src/lightspeed_evaluation/core/models/system.py, src/lightspeed_evaluation/core/models/api.py, src/lightspeed_evaluation/core/models/data.py
Added no_rag (boolean, default True) and context_threshold (integer, default 1) configuration fields; extended EvaluationResult with metrics_metadata and context_warning optional string fields; updated SUPPORTED_CSV_COLUMNS to include "context_warning".
API Client Implementation
src/lightspeed_evaluation/core/api/client.py
Propagated no_rag configuration into APIRequest payload during request preparation; refactored HTTP error message formatting (note: potential f-string syntax issue with multiline continuation).
Evaluation Logic
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
Implemented context threshold validation to generate context_warning when RAG is enabled and context count falls below threshold; added metrics_metadata extraction and propagation into EvaluationResult response.
Test Coverage
tests/unit/core/system/test_loader.py, tests/unit/pipeline/evaluation/test_evaluator.py
Added tests verifying config loading with no_rag option; added comprehensive tests for context warning generation under various RAG and threshold scenarios (zero contexts, threshold exceeded, RAG disabled).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive
    Explanation: The title 'RAG and NO_RAG feature' is vague and generic, using broad terms that do not clearly convey what the changeset accomplishes.
    Resolution: Provide a more descriptive title that highlights the main feature, such as 'Add no_rag configuration and context threshold warning' or 'Implement RAG disable toggle and low-context warnings'.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


Author

x86girl commented Feb 26, 2026

Old Behavior:

{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null
  // no_rag parameter does NOT exist
}

New Behavior:

  • New no_rag configuration option available
  • Default is true (RAG disabled by default!)
  • Users can control RAG behavior per experiment

{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null,
  "no_rag": true
}
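
A rough sketch of how the client could fold the flag into the request body shown above; the helper name and signature are invented for illustration, and the actual request construction in client.py is not reproduced on this page.

from typing import Any, Optional


def build_query_payload(
    query: str, provider: str, model: str, no_tools: Optional[bool], no_rag: bool
) -> dict[str, Any]:
    """Assemble the JSON payload, now carrying the no_rag toggle."""
    return {
        "query": query,
        "provider": provider,
        "model": model,
        "no_tools": no_tools,
        "no_rag": no_rag,  # defaults to True per the new APIConfig default
    }


# Reproduces the "new behavior" payload above.
payload = build_query_payload("What is Kubernetes?", "openai", "gpt-4o-mini", None, True)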

Author

x86girl commented Feb 26, 2026

Just to explain the feature: the user (optionally) sets no_rag: true (the default) or false (to use RAG).
When RAG is used, the user should also set the minimum expected number of retrieved contexts via the context_threshold parameter.
Inside ./src/lightspeed_evaluation/core/api/client.py, we extend the query so the request body looks like
{ "query": "What is Kubernetes?", "provider": "openai", "model": "gpt-4o-mini", "no_tools": null, "no_rag": true }

When RAG is enabled, the number of retrieved contexts is checked inside ./src/lightspeed_evaluation/pipeline/evaluation/evaluator.py.

If RAG is turned on and the retrieved context count falls below the configured threshold, a context warning is written to the CSV output.

Tests were added to verify the code works as expected.
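
As an illustration, a result row carrying the new column might look like the following. This is a hypothetical example: the real column set is whatever csv_columns is configured to, and the warning text here only mirrors the phrases asserted in the unit tests rather than the exact evaluator output.

import csv
import io

rows = [
    {
        "conversation_group_id": "test_conv",
        "turn_id": "1",
        "context_warning": "Low context count (0/1) - Potential for new content needs",
    }
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["conversation_group_id", "turn_id", "context_warning"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())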

Author

x86girl commented Feb 26, 2026

@asamal4 can you please take a look here?

- Fix Ruff E714: Change `not (x is True)` to `x is not True`
- Fix Pylint C0325: Remove unnecessary parentheses after `not` keyword
- Fix Pyright reportOptionalMemberAccess: Add assertions for non-None system_config
- Update test assertions to use `.find()` instead of `in` operator for pylint compatibility

This commit addresses all linting and type checking errors introduced in
commits 005d062 and cde2017 without changing functionality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
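
For reference, a hypothetical before/after illustrating the Ruff E714 and Pylint C0325 fixes described in the commit above; the actual expressions touched by the commit are not visible on this page.

no_rag = False

# Before: the pattern described in the commit (E714: use `is not`; C0325: superfluous parentheses after `not`).
rag_enabled_old = not (no_rag is True)

# After: equivalent, lint-clean form.
rag_enabled_new = no_rag is not True

assert rag_enabled_old == rag_enabled_new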
@x86girl x86girl changed the title [draft] RAG and NO_RAG feature RAG and NO_RAG feature Feb 26, 2026
@x86girl x86girl marked this pull request as ready for review February 26, 2026 20:29
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)

169-169: ⚠️ Potential issue | 🟡 Minor

Duplicate call to _extract_metadata_for_csv.

_extract_metadata_for_csv(request) is called twice in the same return statement - once for metric_metadata (line 169) and again for metrics_metadata (line 197). This results in the same computation being performed twice and the same data being stored in two fields.

♻️ Suggested fix

If both fields are intended to have the same value, extract once and reuse:

+            extracted_metadata = self._extract_metadata_for_csv(request)
+
             return EvaluationResult(
                 **metric_result.model_dump(),
                 conversation_group_id=request.conv_data.conversation_group_id,
                 tag=request.conv_data.tag,
                 turn_id=request.turn_id,
                 metric_identifier=request.metric_identifier,
-                metric_metadata=self._extract_metadata_for_csv(request),
+                metric_metadata=extracted_metadata,
                 # ... other fields ...
-                metrics_metadata=self._extract_metadata_for_csv(request),
+                metrics_metadata=extracted_metadata,
                 context_warning=context_warning,
             )

However, if both fields serve the same purpose, consider removing one of them entirely (see related comment in data.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/pipeline/evaluation/evaluator.py` at line 169, The
return currently calls self._extract_metadata_for_csv(request) twice for
metric_metadata and metrics_metadata; fix by calling
_extract_metadata_for_csv(request) once (e.g., metadata =
self._extract_metadata_for_csv(request)) and reuse that variable for both
metric_metadata and metrics_metadata, or if metrics_metadata is redundant remove
one of the fields (adjust any consumers accordingly) so the computation is not
duplicated; ensure references to metric_metadata and metrics_metadata in the
return and related code use the single variable or the removed field is cleaned
up.
🧹 Nitpick comments (1)
tests/unit/core/system/test_loader.py (1)

170-174: Align this fixture with the new context_warning CSV feature.

Given this PR adds context_warning output tracking, using contexts plus a historical inline comment weakens feature coverage and is misleading.

Proposed test fixture/assertion update
   output:
     output_dir: ./no_rag_output
     csv_columns:
       - conversation_group_id
       - turn_id
-      - contexts  # Changed from 'context_warning' to 'contexts' (valid column)
+      - context_warning
@@
             assert config.api.enabled is True
             assert config.api.no_rag is True
+            assert "context_warning" in config.output.csv_columns
             assert loader.logger is not None

Also applies to: 189-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/core/system/test_loader.py` around lines 170 - 174, Update the
test fixture to reflect the new CSV column name and assertions for the
context_warning feature: replace the 'contexts' entry in the csv_columns fixture
with 'context_warning', and update any related assertions in the test
function(s) that reference csv_columns (or check exported CSV headers/rows) to
expect and validate the new 'context_warning' column and its values; look for
usages of the csv_columns variable and assertions around exported CSV
headers/rows in test_loader.py (and the tests around the same block) and change
them to assert presence and correct values of 'context_warning' instead of
'contexts'.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e395459 and 01e48c8.

📒 Files selected for processing (9)
  • config/system.yaml
  • src/lightspeed_evaluation/core/api/client.py
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/api.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • tests/unit/core/system/test_loader.py
  • tests/unit/pipeline/evaluation/test_evaluator.py

Comment on lines +209 to +210
message=f"Agent API error: {
response.status_code} - {error_msg}",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check Python version requirements in project configuration files
echo "=== Checking pyproject.toml ==="
find . -name 'pyproject.toml' -type f | head -1 | xargs cat 2>/dev/null | head -100

echo -e "\n=== Checking setup.py ==="
find . -name 'setup.py' -type f | head -1 | xargs cat 2>/dev/null | head -50

echo -e "\n=== Checking setup.cfg ==="
find . -name 'setup.cfg' -type f | head -1 | xargs cat 2>/dev/null | head -50

echo -e "\n=== Checking for Python version in tox.ini ==="
find . -name 'tox.ini' -type f | head -1 | xargs cat 2>/dev/null

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 3043


Fix multi-line f-string syntax for Python 3.11 compatibility.

The project requires Python 3.11+ (requires-python = ">=3.11,<3.14" in pyproject.toml), but this f-string syntax with expressions spanning multiple lines is only supported in Python 3.12+ (PEP 701). Reformat the f-string to fit on a single line or use string concatenation to maintain Python 3.11 compatibility.
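
A brief sketch of the compatibility point: before Python 3.12 (PEP 701), a single-quoted f-string cannot contain a line break inside its replacement field, so the multi-line form only parses on 3.12+. The values below are placeholders.

status_code = 500
error_msg = "internal error"

# Parses only on Python 3.12+ (PEP 701 allows multi-line replacement fields):
# message = f"Agent API error: {
#     status_code} - {error_msg}"

# Portable to Python 3.11: keep the replacement field on one line.
message = f"Agent API error: {status_code} - {error_msg}"
print(message)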

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/api/client.py` around lines 209 - 210, The
multi-line f-string in the error message (inside the Agent API error handling)
is invalid in Python 3.11; change the f-string used for message (the expression
that currently spans lines including response.status_code and error_msg) to a
single-line f-string or use string concatenation (e.g., "Agent API error: " +
str(response.status_code) + " - " + error_msg) so the message construction in
the client error handling (the code that builds
message=f"...{response.status_code} - {error_msg}") is compatible with Python
3.11.

Comment on lines +506 to +509
metrics_metadata: Optional[str] = Field(
default=None,
description="Additional metric metadata (JSON-encoded key-value pairs)",
)

⚠️ Potential issue | 🟡 Minor

Clarify distinction between metrics_metadata and metric_metadata.

There's potential confusion between the new metrics_metadata field (line 506) and the existing metric_metadata field (line 473). Both are Optional[str] with JSON-encoded content descriptions.

Looking at the evaluator code, both are populated with self._extract_metadata_for_csv(request) at lines 169 and 197, which appears to be duplicating the same data.

🔍 Suggested investigation

If both fields serve the same purpose, consider removing one to avoid redundancy. If they serve different purposes, please clarify the distinction in the field descriptions.

 metrics_metadata: Optional[str] = Field(
     default=None,
-    description="Additional metric metadata (JSON-encoded key-value pairs)",
+    description="<clarify how this differs from metric_metadata>",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lightspeed_evaluation/core/models/data.py` around lines 506 - 509, The
model contains two similarly named fields, metric_metadata and metrics_metadata,
both Optional[str] with JSON descriptions and both populated from the same
helper (self._extract_metadata_for_csv(request)), causing redundancy; decide
whether to remove one or make them distinct: either delete the unused field
(remove metrics_metadata or metric_metadata) and update any references to the
remaining field, or rename and clarify their Field descriptions and change the
evaluator code so each is assigned from the appropriate extractor (e.g., keep
metric_metadata = self._extract_metadata_for_csv(request) and implement/point
metrics_metadata to a different helper or data source). Update the Field
descriptions for metric_metadata/metrics_metadata to clearly state their
different purposes and adjust all places that set them (references to
self._extract_metadata_for_csv in the evaluator) so they no longer duplicate the
same JSON content.

try:
    loader = ConfigLoader()
    config = loader.load_system_config(temp_path)
    print("something")

⚠️ Potential issue | 🟡 Minor

Remove debug print from test body.

print("something") at Line 188 is a debug artifact and adds noise to test runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/core/system/test_loader.py` at line 188, Remove the debug print
statement print("something") from the test body (delete that line in the failing
test in tests/unit/core/system/test_loader.py) so tests don't output noisy debug
text; keep the test's assertions and logic intact and run the unit tests to
verify no behavior change.

Comment on lines +916 to +1088
    def test_context_warning_when_zero_contexts_rag_enabled(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test context warning is created when RAG is enabled and contexts are zero."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG enabled (no_rag = False) and context_threshold = 1
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = False
        config_loader.system_config.api.context_threshold = 1

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with zero contexts (None = no contexts)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=None,  # Zero contexts
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is not None
        assert isinstance(result.context_warning, str)
        assert result.context_warning.find("Low context count (0/1)") != -1
        assert result.context_warning.find("Potential for new content needs") != -1

    def test_no_context_warning_when_rag_disabled(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test no context warning when RAG is disabled (no_rag mode)."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG disabled (no_rag = True)
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = True
        config_loader.system_config.api.context_threshold = 1

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with zero contexts (expected in no_rag mode)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=None,  # Zero contexts (expected)
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is None  # No warning when RAG is disabled

    def test_no_context_warning_above_threshold(
        self,
        config_loader: ConfigLoader,
        mock_metric_manager: MetricManager,
        mock_script_manager: ScriptExecutionManager,
        mocker: MockerFixture,
    ) -> None:
        """Test no context warning when contexts are above threshold."""
        # Mock the handlers
        mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
        )

        mock_ragas = mocker.Mock()
        mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
            return_value=mock_ragas,
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
        )
        mocker.patch(
            "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
        )

        # Configure RAG enabled and context_threshold = 2
        assert config_loader.system_config is not None
        config_loader.system_config.api.no_rag = False
        config_loader.system_config.api.context_threshold = 2

        evaluator = MetricsEvaluator(
            config_loader, mock_metric_manager, mock_script_manager
        )

        # Create turn data with 3 contexts (above threshold of 2)
        turn_data = TurnData(
            turn_id="1",
            query="What is Python?",
            response="Python is a programming language.",
            contexts=["Context 1", "Context 2", "Context 3"],  # 3 contexts
        )
        conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

        request = EvaluationRequest.for_turn(
            conv_data, "ragas:faithfulness", 0, turn_data
        )

        result = evaluator.evaluate_metric(request)

        assert result is not None
        assert result.context_warning is None  # No warning (3 >= 2)

⚠️ Potential issue | 🟠 Major

Tests are misplaced in the wrong test class.

These three new test methods (test_context_warning_when_zero_contexts_rag_enabled, test_no_context_warning_when_rag_disabled, test_no_context_warning_above_threshold) are placed inside the TestTokenTracker class, but they test MetricsEvaluator.evaluate_metric() functionality, not TokenTracker.

The tests use fixtures (config_loader, mock_metric_manager, mock_script_manager) that are designed for MetricsEvaluator tests and create a MetricsEvaluator instance to test context warning behavior.

🐛 Proposed fix - move tests to TestMetricsEvaluator

Move these three test methods from TestTokenTracker to TestMetricsEvaluator:

 class TestMetricsEvaluator:
     """Unit tests for MetricsEvaluator."""
 
     # ... existing tests ...
 
+    def test_context_warning_when_zero_contexts_rag_enabled(
+        self,
+        config_loader: ConfigLoader,
+        mock_metric_manager: MetricManager,
+        mock_script_manager: ScriptExecutionManager,
+        mocker: MockerFixture,
+    ) -> None:
+        """Test context warning is created when RAG is enabled and contexts are zero."""
+        # ... test implementation ...
+
+    def test_no_context_warning_when_rag_disabled(
+        # ... rest of the tests ...
+
+    def test_no_context_warning_above_threshold(
+        # ... rest of the tests ...
+

 class TestTokenTracker:
     """Unit tests for TokenTracker class."""
 
     # ... existing TokenTracker tests only ...
-
-    def test_context_warning_when_zero_contexts_rag_enabled(
-        # Remove from here
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/pipeline/evaluation/test_evaluator.py` around lines 916 - 1088,
The three test methods test_context_warning_when_zero_contexts_rag_enabled,
test_no_context_warning_when_rag_disabled, and
test_no_context_warning_above_threshold are defined inside TestTokenTracker but
exercise MetricsEvaluator.evaluate_metric() using fixtures for MetricsEvaluator;
move these three methods out of the TestTokenTracker class and place them into
the TestMetricsEvaluator class so they run with the correct fixture/context and
clearly reference MetricsEvaluator (keep the method bodies unchanged, only
relocate the methods).

Collaborator

asamal4 commented Feb 27, 2026

@x86girl wondering which API you are using? lightspeed-stack doesn't have these attributes as part of the API request

