Conversation
This commit introduces a new optional boolean field to the APIConfig model that allows disabling retrieval/RAG functionality in API calls. This is useful for running baseline experiments without retrieval augmentation.
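For illustration, a minimal sketch of how such an optional flag can sit on a config model (shown here with a plain dataclass; apart from the no_rag name, the surrounding APIConfig fields are assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class APIConfig:
    """Illustrative subset of the API config model (fields other than no_rag are assumed)."""
    provider: str = "openai"
    model: str = "gpt-4o-mini"
    no_rag: Optional[bool] = None  # None/False: retrieval on; True: retrieval disabled

# Baseline run without retrieval augmentation:
baseline = APIConfig(no_rag=True)
assert baseline.no_rag is True

# Default behavior keeps RAG enabled:
default = APIConfig()
assert default.no_rag is None
```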
This commit implements context threshold flagging to detect when retrieved contexts fall below a configurable threshold during RAG experiments.

Changes:
- Add context_threshold field to APIConfig (default: 1)
- Add context_warning field to EvaluationResult
- Implement flagging logic in MetricsEvaluator.evaluate_metric()
  - Only flags when RAG is enabled (no_rag != True)
  - Creates a warning when context_count < context_threshold
- Update system.yaml with context_threshold configuration

The flagging logic helps identify potential gaps in the RAG knowledge base by highlighting queries that retrieve insufficient context.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
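The flagging rule above can be sketched as a small helper. The function name build_context_warning and its exact signature are assumptions; the warning wording follows the strings asserted in the unit tests below:

```python
from typing import Optional

def build_context_warning(
    contexts: Optional[list],
    no_rag: Optional[bool],
    context_threshold: int = 1,
) -> Optional[str]:
    """Return a warning string when RAG is enabled but too few contexts were retrieved."""
    if no_rag is True:
        # RAG is disabled: zero contexts are expected, so never flag.
        return None
    context_count = len(contexts) if contexts else 0
    if context_count < context_threshold:
        return (
            f"Low context count ({context_count}/{context_threshold}): "
            "Potential for new content needs"
        )
    return None
```

For example, a turn with no retrieved contexts under the default threshold of 1 would yield "Low context count (0/1): ...", while the same turn in no_rag mode yields no warning.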
Add three unit tests to verify context warning behavior:

1. test_context_warning_when_zero_contexts_rag_enabled - Verifies a warning is created when RAG is enabled and contexts are zero
2. test_no_context_warning_when_rag_disabled - Verifies no warning when RAG is disabled (no_rag mode)
3. test_no_context_warning_above_threshold - Verifies no warning when contexts are above threshold

All tests pass successfully.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds context_warning to the system configuration and the list of supported CSV columns to allow tracking of context-related issues during evaluation.
Walkthrough

This PR adds RAG disablement and context threshold capabilities to the evaluation framework. It introduces configuration options to disable retrieval-augmented generation and to detect when contexts fall below a minimum threshold, includes corresponding model updates, implements context warning generation in the evaluation pipeline, and adds comprehensive test coverage.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 inconclusive
Old behavior:

```json
{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null
  // no_rag parameter does NOT exist
}
```

New behavior:

```json
{
  "query": "What is Kubernetes?",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "no_tools": null,
  "no_rag": true
}
```
Just explaining the feature: the user optionally adds no_rag: true (the default) or false (to use RAG). When RAG is enabled, we evaluate the number of contexts inside ./src/lightspeed_evaluation/pipeline/evaluation/evaluator.py. A context warning is then returned in the CSV when RAG is turned on and the number of contexts falls below the defined threshold. Some tests were created to make sure the code works as expected.
@asamal4 can you please take a look here?
- Fix Ruff E714: Change `not (x is True)` to `x is not True`
- Fix Pylint C0325: Remove unnecessary parentheses after the `not` keyword
- Fix Pyright reportOptionalMemberAccess: Add assertions for non-None system_config
- Update test assertions to use `.find()` instead of the `in` operator for pylint compatibility

This commit addresses all linting and type checking errors introduced in commits 005d062 and cde2017 without changing functionality.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
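A standalone illustration of the lint rules being fixed here (the snippet is a sketch, not the project's actual code):

```python
from typing import Optional

x = None

# Ruff E714: negated identity tests written with `not` should use `is not`.
old_style = not (x is True)  # flagged by E714 (and C0325 for the parentheses)
new_style = x is not True    # equivalent, lint-clean form
assert old_style == new_style

# Pyright reportOptionalMemberAccess: assert non-None before member access.
system_config: Optional[dict] = {"api": {}}
assert system_config is not None
api_section = system_config["api"]
assert api_section == {}

# Pylint-compatible substring check used in the updated tests:
message = "Low context count (0/1)"
assert message.find("Low context count") != -1  # instead of `"..." in message`
```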
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
169-169: ⚠️ Potential issue | 🟡 Minor

Duplicate call to _extract_metadata_for_csv.

_extract_metadata_for_csv(request) is called twice in the same return statement - once for metric_metadata (line 169) and again for metrics_metadata (line 197). This results in the same computation being performed twice and the same data being stored in two fields.

♻️ Suggested fix
If both fields are intended to have the same value, extract once and reuse:
```diff
+ extracted_metadata = self._extract_metadata_for_csv(request)
  return EvaluationResult(
      **metric_result.model_dump(),
      conversation_group_id=request.conv_data.conversation_group_id,
      tag=request.conv_data.tag,
      turn_id=request.turn_id,
      metric_identifier=request.metric_identifier,
-     metric_metadata=self._extract_metadata_for_csv(request),
+     metric_metadata=extracted_metadata,
      # ... other fields ...
-     metrics_metadata=self._extract_metadata_for_csv(request),
+     metrics_metadata=extracted_metadata,
      context_warning=context_warning,
  )
```

However, if both fields serve the same purpose, consider removing one of them entirely (see the related comment in data.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lightspeed_evaluation/pipeline/evaluation/evaluator.py` at line 169, The return currently calls self._extract_metadata_for_csv(request) twice for metric_metadata and metrics_metadata; fix by calling _extract_metadata_for_csv(request) once (e.g., metadata = self._extract_metadata_for_csv(request)) and reuse that variable for both metric_metadata and metrics_metadata, or if metrics_metadata is redundant remove one of the fields (adjust any consumers accordingly) so the computation is not duplicated; ensure references to metric_metadata and metrics_metadata in the return and related code use the single variable or the removed field is cleaned up.
🧹 Nitpick comments (1)
tests/unit/core/system/test_loader.py (1)
170-174: Align this fixture with the new context_warning CSV feature.

Given this PR adds context_warning output tracking, using contexts plus a historical inline comment weakens feature coverage and is misleading.

Proposed test fixture/assertion update

```diff
  output:
    output_dir: ./no_rag_output
    csv_columns:
      - conversation_group_id
      - turn_id
-     - contexts # Changed from 'context_warning' to 'contexts' (valid column)
+     - context_warning
@@
  assert config.api.enabled is True
  assert config.api.no_rag is True
+ assert "context_warning" in config.output.csv_columns
  assert loader.logger is not None
```

Also applies to: 189-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/core/system/test_loader.py` around lines 170 - 174, Update the test fixture to reflect the new CSV column name and assertions for the context_warning feature: replace the 'contexts' entry in the csv_columns fixture with 'context_warning', and update any related assertions in the test function(s) that reference csv_columns (or check exported CSV headers/rows) to expect and validate the new 'context_warning' column and its values; look for usages of the csv_columns variable and assertions around exported CSV headers/rows in test_loader.py (and the tests around the same block) and change them to assert presence and correct values of 'context_warning' instead of 'contexts'.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lightspeed_evaluation/core/api/client.py`:
- Around line 209-210: The multi-line f-string in the error message (inside the
Agent API error handling) is invalid in Python 3.11; change the f-string used
for message (the expression that currently spans lines including
response.status_code and error_msg) to a single-line f-string or use string
concatenation (e.g., "Agent API error: " + str(response.status_code) + " - " +
error_msg) so the message construction in the client error handling (the code
that builds message=f"...{response.status_code} - {error_msg}") is compatible
with Python 3.11.
In `@src/lightspeed_evaluation/core/models/data.py`:
- Around line 506-509: The model contains two similarly named fields,
metric_metadata and metrics_metadata, both Optional[str] with JSON descriptions
and both populated from the same helper
(self._extract_metadata_for_csv(request)), causing redundancy; decide whether to
remove one or make them distinct: either delete the unused field (remove
metrics_metadata or metric_metadata) and update any references to the remaining
field, or rename and clarify their Field descriptions and change the evaluator
code so each is assigned from the appropriate extractor (e.g., keep
metric_metadata = self._extract_metadata_for_csv(request) and implement/point
metrics_metadata to a different helper or data source). Update the Field
descriptions for metric_metadata/metrics_metadata to clearly state their
different purposes and adjust all places that set them (references to
self._extract_metadata_for_csv in the evaluator) so they no longer duplicate the
same JSON content.
In `@tests/unit/core/system/test_loader.py`:
- Line 188: Remove the debug print statement print("something") from the test
body (delete that line in the failing test in
tests/unit/core/system/test_loader.py) so tests don't output noisy debug text;
keep the test's assertions and logic intact and run the unit tests to verify no
behavior change.
In `@tests/unit/pipeline/evaluation/test_evaluator.py`:
- Around line 916-1088: The three test methods
test_context_warning_when_zero_contexts_rag_enabled,
test_no_context_warning_when_rag_disabled, and
test_no_context_warning_above_threshold are defined inside TestTokenTracker but
exercise MetricsEvaluator.evaluate_metric() using fixtures for MetricsEvaluator;
move these three methods out of the TestTokenTracker class and place them into
the TestMetricsEvaluator class so they run with the correct fixture/context and
clearly reference MetricsEvaluator (keep the method bodies unchanged, only
relocate the methods).
---
Outside diff comments:
In `@src/lightspeed_evaluation/pipeline/evaluation/evaluator.py`:
- Line 169: The return currently calls self._extract_metadata_for_csv(request)
twice for metric_metadata and metrics_metadata; fix by calling
_extract_metadata_for_csv(request) once (e.g., metadata =
self._extract_metadata_for_csv(request)) and reuse that variable for both
metric_metadata and metrics_metadata, or if metrics_metadata is redundant remove
one of the fields (adjust any consumers accordingly) so the computation is not
duplicated; ensure references to metric_metadata and metrics_metadata in the
return and related code use the single variable or the removed field is cleaned
up.
---
Nitpick comments:
In `@tests/unit/core/system/test_loader.py`:
- Around line 170-174: Update the test fixture to reflect the new CSV column
name and assertions for the context_warning feature: replace the 'contexts'
entry in the csv_columns fixture with 'context_warning', and update any related
assertions in the test function(s) that reference csv_columns (or check exported
CSV headers/rows) to expect and validate the new 'context_warning' column and
its values; look for usages of the csv_columns variable and assertions around
exported CSV headers/rows in test_loader.py (and the tests around the same
block) and change them to assert presence and correct values of
'context_warning' instead of 'contexts'.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- config/system.yaml
- src/lightspeed_evaluation/core/api/client.py
- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/models/api.py
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/core/models/system.py
- src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
- tests/unit/core/system/test_loader.py
- tests/unit/pipeline/evaluation/test_evaluator.py
```python
message=f"Agent API error: {
    response.status_code} - {error_msg}",
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check Python version requirements in project configuration files
echo "=== Checking pyproject.toml ==="
find . -name 'pyproject.toml' -type f | head -1 | xargs cat 2>/dev/null | head -100
echo -e "\n=== Checking setup.py ==="
find . -name 'setup.py' -type f | head -1 | xargs cat 2>/dev/null | head -50
echo -e "\n=== Checking setup.cfg ==="
find . -name 'setup.cfg' -type f | head -1 | xargs cat 2>/dev/null | head -50
echo -e "\n=== Checking for Python version in tox.ini ==="
find . -name 'tox.ini' -type f | head -1 | xargs cat 2>/dev/null
```

Repository: lightspeed-core/lightspeed-evaluation
Length of output: 3043
Fix multi-line f-string syntax for Python 3.11 compatibility.
The project requires Python 3.11+ (requires-python = ">=3.11,<3.14" in pyproject.toml), but this f-string syntax with expressions spanning multiple lines is only supported in Python 3.12+ (PEP 701). Reformat the f-string to fit on a single line or use string concatenation to maintain Python 3.11 compatibility.
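A standalone illustration of the two 3.11-compatible rewrites (PEP 701 only relaxed this in Python 3.12; variable names here are placeholders, not the client's actual code):

```python
status_code = 500
error_msg = "upstream timeout"

# Python 3.11-compatible: keep each replacement field on a single line...
message = f"Agent API error: {status_code} - {error_msg}"

# ...or rely on implicit concatenation of adjacent literals across lines:
message_alt = (
    "Agent API error: "
    f"{status_code} - {error_msg}"
)

assert message == message_alt == "Agent API error: 500 - upstream timeout"
```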
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lightspeed_evaluation/core/api/client.py` around lines 209 - 210, The
multi-line f-string in the error message (inside the Agent API error handling)
is invalid in Python 3.11; change the f-string used for message (the expression
that currently spans lines including response.status_code and error_msg) to a
single-line f-string or use string concatenation (e.g., "Agent API error: " +
str(response.status_code) + " - " + error_msg) so the message construction in
the client error handling (the code that builds
message=f"...{response.status_code} - {error_msg}") is compatible with Python
3.11.
```python
metrics_metadata: Optional[str] = Field(
    default=None,
    description="Additional metric metadata (JSON-encoded key-value pairs)",
)
```
Clarify distinction between metrics_metadata and metric_metadata.
There's potential confusion between the new metrics_metadata field (line 506) and the existing metric_metadata field (line 473). Both are Optional[str] with JSON-encoded content descriptions.
Looking at the evaluator code, both are populated with self._extract_metadata_for_csv(request) at lines 169 and 197, which appears to be duplicating the same data.
🔍 Suggested investigation
If both fields serve the same purpose, consider removing one to avoid redundancy. If they serve different purposes, please clarify the distinction in the field descriptions.
```diff
  metrics_metadata: Optional[str] = Field(
      default=None,
-     description="Additional metric metadata (JSON-encoded key-value pairs)",
+     description="<clarify how this differs from metric_metadata>",
  )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lightspeed_evaluation/core/models/data.py` around lines 506 - 509, The
model contains two similarly named fields, metric_metadata and metrics_metadata,
both Optional[str] with JSON descriptions and both populated from the same
helper (self._extract_metadata_for_csv(request)), causing redundancy; decide
whether to remove one or make them distinct: either delete the unused field
(remove metrics_metadata or metric_metadata) and update any references to the
remaining field, or rename and clarify their Field descriptions and change the
evaluator code so each is assigned from the appropriate extractor (e.g., keep
metric_metadata = self._extract_metadata_for_csv(request) and implement/point
metrics_metadata to a different helper or data source). Update the Field
descriptions for metric_metadata/metrics_metadata to clearly state their
different purposes and adjust all places that set them (references to
self._extract_metadata_for_csv in the evaluator) so they no longer duplicate the
same JSON content.
```python
try:
    loader = ConfigLoader()
    config = loader.load_system_config(temp_path)
    print("something")
```
Remove debug print from test body.
print("something") at Line 188 is a debug artifact and adds noise to test runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/core/system/test_loader.py` at line 188, Remove the debug print
statement print("something") from the test body (delete that line in the failing
test in tests/unit/core/system/test_loader.py) so tests don't output noisy debug
text; keep the test's assertions and logic intact and run the unit tests to
verify no behavior change.
```python
def test_context_warning_when_zero_contexts_rag_enabled(
    self,
    config_loader: ConfigLoader,
    mock_metric_manager: MetricManager,
    mock_script_manager: ScriptExecutionManager,
    mocker: MockerFixture,
) -> None:
    """Test context warning is created when RAG is enabled and contexts are zero."""
    # Mock the handlers
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
    )

    mock_ragas = mocker.Mock()
    mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
        return_value=mock_ragas,
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
    )

    # Configure RAG enabled (no_rag = False) and context_threshold = 1
    assert config_loader.system_config is not None
    config_loader.system_config.api.no_rag = False
    config_loader.system_config.api.context_threshold = 1

    evaluator = MetricsEvaluator(
        config_loader, mock_metric_manager, mock_script_manager
    )

    # Create turn data with zero contexts (None = no contexts)
    turn_data = TurnData(
        turn_id="1",
        query="What is Python?",
        response="Python is a programming language.",
        contexts=None,  # Zero contexts
    )
    conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

    request = EvaluationRequest.for_turn(
        conv_data, "ragas:faithfulness", 0, turn_data
    )

    result = evaluator.evaluate_metric(request)

    assert result is not None
    assert result.context_warning is not None
    assert isinstance(result.context_warning, str)
    assert result.context_warning.find("Low context count (0/1)") != -1
    assert result.context_warning.find("Potential for new content needs") != -1

def test_no_context_warning_when_rag_disabled(
    self,
    config_loader: ConfigLoader,
    mock_metric_manager: MetricManager,
    mock_script_manager: ScriptExecutionManager,
    mocker: MockerFixture,
) -> None:
    """Test no context warning when RAG is disabled (no_rag mode)."""
    # Mock the handlers
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
    )

    mock_ragas = mocker.Mock()
    mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
        return_value=mock_ragas,
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
    )

    # Configure RAG disabled (no_rag = True)
    assert config_loader.system_config is not None
    config_loader.system_config.api.no_rag = True
    config_loader.system_config.api.context_threshold = 1

    evaluator = MetricsEvaluator(
        config_loader, mock_metric_manager, mock_script_manager
    )

    # Create turn data with zero contexts (expected in no_rag mode)
    turn_data = TurnData(
        turn_id="1",
        query="What is Python?",
        response="Python is a programming language.",
        contexts=None,  # Zero contexts (expected)
    )
    conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

    request = EvaluationRequest.for_turn(
        conv_data, "ragas:faithfulness", 0, turn_data
    )

    result = evaluator.evaluate_metric(request)

    assert result is not None
    assert result.context_warning is None  # No warning when RAG is disabled

def test_no_context_warning_above_threshold(
    self,
    config_loader: ConfigLoader,
    mock_metric_manager: MetricManager,
    mock_script_manager: ScriptExecutionManager,
    mocker: MockerFixture,
) -> None:
    """Test no context warning when contexts are above threshold."""
    # Mock the handlers
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager"
    )

    mock_ragas = mocker.Mock()
    mock_ragas.evaluate.return_value = (0.85, "Good faithfulness")
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics",
        return_value=mock_ragas,
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics"
    )
    mocker.patch(
        "lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics"
    )

    # Configure RAG enabled and context_threshold = 2
    assert config_loader.system_config is not None
    config_loader.system_config.api.no_rag = False
    config_loader.system_config.api.context_threshold = 2

    evaluator = MetricsEvaluator(
        config_loader, mock_metric_manager, mock_script_manager
    )

    # Create turn data with 3 contexts (above threshold of 2)
    turn_data = TurnData(
        turn_id="1",
        query="What is Python?",
        response="Python is a programming language.",
        contexts=["Context 1", "Context 2", "Context 3"],  # 3 contexts
    )
    conv_data = EvaluationData(conversation_group_id="test_conv", turns=[turn_data])

    request = EvaluationRequest.for_turn(
        conv_data, "ragas:faithfulness", 0, turn_data
    )

    result = evaluator.evaluate_metric(request)

    assert result is not None
    assert result.context_warning is None  # No warning (3 >= 2)
```
Tests are misplaced in the wrong test class.
These three new test methods (test_context_warning_when_zero_contexts_rag_enabled, test_no_context_warning_when_rag_disabled, test_no_context_warning_above_threshold) are placed inside the TestTokenTracker class, but they test MetricsEvaluator.evaluate_metric() functionality, not TokenTracker.
The tests use fixtures (config_loader, mock_metric_manager, mock_script_manager) that are designed for MetricsEvaluator tests and create a MetricsEvaluator instance to test context warning behavior.
🐛 Proposed fix - move tests to TestMetricsEvaluator
Move these three test methods from TestTokenTracker to TestMetricsEvaluator:
```diff
 class TestMetricsEvaluator:
     """Unit tests for MetricsEvaluator."""
     # ... existing tests ...
+    def test_context_warning_when_zero_contexts_rag_enabled(
+        self,
+        config_loader: ConfigLoader,
+        mock_metric_manager: MetricManager,
+        mock_script_manager: ScriptExecutionManager,
+        mocker: MockerFixture,
+    ) -> None:
+        """Test context warning is created when RAG is enabled and contexts are zero."""
+        # ... test implementation ...
+
+    def test_no_context_warning_when_rag_disabled(
+        # ... rest of the tests ...
+
+    def test_no_context_warning_above_threshold(
+        # ... rest of the tests ...

 class TestTokenTracker:
     """Unit tests for TokenTracker class."""
     # ... existing TokenTracker tests only ...
-
-    def test_context_warning_when_zero_contexts_rag_enabled(
-        # Remove from here
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/pipeline/evaluation/test_evaluator.py` around lines 916 - 1088,
The three test methods test_context_warning_when_zero_contexts_rag_enabled,
test_no_context_warning_when_rag_disabled, and
test_no_context_warning_above_threshold are defined inside TestTokenTracker but
exercise MetricsEvaluator.evaluate_metric() using fixtures for MetricsEvaluator;
move these three methods out of the TestTokenTracker class and place them into
the TestMetricsEvaluator class so they run with the correct fixture/context and
clearly reference MetricsEvaluator (keep the method bodies unchanged, only
relocate the methods).
@x86girl wondering which API you are using? lightspeed-stack doesn't have these attributes as part of the API request.
This PR introduces three features for enhanced RAG experiment control: (1) a no_rag configuration option that allows disabling retrieval/RAG in API calls to run baseline experiments comparing pure LLM vs RAG-enhanced performance; (2) context threshold flagging that detects when retrieved contexts fall below a configurable threshold; and (3) a context_warning column in CSV output to track these issues across evaluation runs.

Summary by CodeRabbit
Release Notes
New Features