[Feature] PPL Highlight Support by RyanL1997 · Pull Request #5234 · opensearch-project/sql

RyanL1997 · 2026-03-14T17:01:23Z

Description

This PR replaces the request-level ThreadLocal approach from #5141. Instead of passing highlight config through thread-local state, we thread a HighlightConfig record through the AST and Calcite plan — following the same pattern as fetch_size (#5109). The highlight config flows from the API request into the AST via StatementBuilderContext, becomes a LogicalHighlight Calcite node, and gets pushed down to the OpenSearch scan via an optimizer rule. This avoids the thread-safety concerns and implicit coupling of ThreadLocal and keeps the data flow explicit and traceable through the plan.

Add PPL search result highlighting via the highlight API parameter (not as a PPL command)
Support both the simple array format (["*"]) and the rich OpenSearch Dashboards object format with pre_tags, post_tags, fields, and fragment_size
Introduce HighlightConfig record to carry highlight options through the full pipeline: PPLQueryRequest → AstStatementBuilder → Highlight AST node → CalciteRelNodeVisitor → LogicalHighlight → HighlightIndexScanRule pushdown → OpenSearch HighlightBuilder
Response includes a top-level highlights array parallel to datarows, containing per-field highlighted fragments
Add endpoint documentation with examples and expected output

Related Issues

Sample Request / Response

Details

curl -s -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{"query":"search source=accounts \"Holmes\"","highlight":{"fields":{"*":{}},"pre_tags":["@opensearch-dashboards-highlighted-field@"],"post_tags":["@/opensearch-dashboards-highlighted-field@"],"fragment_size":2147483647}}' | jq
{
  "schema": [
    {
      "name": "account_number",
      "type": "bigint"
    },
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "balance",
      "type": "bigint"
    },
    {
      "name": "gender",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "employer",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "age",
      "type": "bigint"
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    }
  ],
  "datarows": [
    [
      578,
      "Holmes",
      "969 Metropolitan Avenue",
      34259,
      "M",
      "Aguila",
      "Cubicide",
      "PA",
      37,
      "holmesmcknight@cubicide.com",
      "Mcknight"
    ],
    [
      828,
      "Blanche",
      "605 Stryker Court",
      44890,
      "F",
      "Loomis",
      "Motovate",
      "KS",
      33,
      "blancheholmes@motovate.com",
      "Holmes"
    ],
    [
      1,
      "Amber",
      "880 Holmes Lane",
      39225,
      "M",
      "Brogan",
      "Pyrami",
      "IL",
      32,
      "amberduke@pyrami.com",
      "Duke"
    ]
  ],
  "highlights": [
    {
      "firstname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "firstname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "lastname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "lastname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "address": [
        "880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"
      ]
    }
  ],
  "total": 3,
  "size": 3
}

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-03-14T17:02:22Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	516	low	User-supplied highlight terms are interpolated directly into a Lucene query_string expression with only double-quote wrapping and no further sanitization. A crafted term containing backslash escapes or unbalanced quotes could malform or expand the generated query. Impact is limited to highlighting behavior (not document access control), so this is likely an oversight rather than malicious intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-14T17:03:27Z

PR Reviewer Guide 🔍

(Review updated until commit `c87b5b8`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: HighlightConfig AST and PPL request parsing Relevant files: core/src/main/java/org/opensearch/sql/ast/tree/Highlight.java core/src/main/java/org/opensearch/sql/ast/tree/HighlightConfig.java core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java core/src/main/java/org/opensearch/sql/analysis/Analyzer.java ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java Sub-PR theme: Calcite LogicalHighlight plan node and pushdown rule Relevant files: core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalHighlight.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/HighlightIndexScanRule.java opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownType.java opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/CalciteIndexScanCostTest.java Sub-PR theme: Highlight response formatting, integration tests, and documentation Relevant files: protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteHighlightIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java docs/user/ppl/interfaces/endpoint.md
⚡ Recommended focus areas for review Join Safety In `visitRelation`, the highlight config is cleared after the first scan (`context.setHighlightConfig(null)`). For queries with multiple relations (e.g., JOINs), only the first scanned relation will receive the highlight config. This may silently drop highlighting for subsequent relations without any warning or error. if (context.getHighlightConfig() != null) { RelNode input = context.relBuilder.build(); LogicalHighlight highlight = LogicalHighlight.create(input, context.getHighlightConfig()); context.relBuilder.push(highlight); context.setHighlightConfig(null); // Clear for join safety } Highlight Injection Order In `visitHighlight`, the highlight config is set on the context before visiting children. However, `visitRelation` injects `LogicalHighlight` directly around the scan. This means the `Highlight` AST node wrapping (from `AstStatementBuilder`) and the direct injection in `visitRelation` could both fire, potentially double-applying highlight. The interaction between the `visitHighlight` path and the `visitRelation` path should be validated to ensure highlight is not applied twice. @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); visitChildren(node, context); return context.relBuilder.peek(); } Default Fragment Size When `fragmentSize` is null (simple array format), the code defaults to `Integer.MAX_VALUE`. This silently overrides OpenSearch's own default fragment size (100 characters), which may produce unexpectedly large responses. Consider documenting this behavior or using OpenSearch's default instead. hb.fragmentSize( highlightConfig.fragmentSize() != null ? highlightConfig.fragmentSize() : Integer.MAX_VALUE); Unsafe Cast The unchecked cast `(Map<String, Object>) hlMap` on line 123 is unnecessary since `hlMap` is already declared as `LinkedHashMap<String, Object>`. This cast should be removed to avoid compiler warnings and potential confusion. return (Map<String, Object>) hlMap; Magic String The string literal `"_highlight"` is used directly instead of referencing `HighlightExpression.HIGHLIGHT_FIELD`. This breaks the single-source-of-truth principle and could cause silent bugs if the constant is ever renamed. if ("_highlight".equals(rawPath)) { ExprValue hl = ExprValueUtils.getTupleValue(value).get("_highlight"); return (hl != null && !hl.isMissing()) ? hl : null; }

github-actions · 2026-03-14T17:04:15Z

PR Code Suggestions ✨

Latest suggestions up to c87b5b8

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Guard against duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds `_highlight` to the schema even if it already exists (e.g., when called twice or after `LogicalHighlight.create` already added it). This can cause duplicate column errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema, which could cause duplicate column errors if called multiple times or after `LogicalHighlight.create` already added it. The fix mirrors the guard already present in `LogicalHighlight.create`.	Medium
Possible issue	Prevent highlight config leaking across queries The `visitHighlight` method sets the highlight config on the context and then visits children, but if an exception or early return occurs in `visitChildren`, the config remains set on the context and could leak into subsequent queries. The config is only cleared inside `visitRelation` when a scan is found, but if the plan tree doesn't contain a scan (e.g., VALUES), the config is never cleared. Add a finally block or clear the config after `visitChildren`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); - visitChildren(node, context); + try { + visitChildren(node, context); + } finally { + context.setHighlightConfig(null); + } return context.relBuilder.peek(); } Suggestion importance[1-10]: 5 __ Why: The `highlightConfig` is only cleared inside `visitRelation` when a scan is found, so if the plan tree has no scan node, the config could leak. However, in practice PPL queries always have a scan, making this a low-probability edge case.	Low
General	Guard against non-collection highlight field values The `highlights()` method calls `entry.getValue().collectionValue()` without checking if the value is actually a collection type. If a highlight field contains a non-collection `ExprValue`, this will throw a runtime exception. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue fieldVal = entry.getValue(); + if (fieldVal == null \|\| fieldVal.isMissing() \|\| fieldVal.isNull()) { + continue; + } hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + fieldVal.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } return (Map<String, Object>) hlMap; Suggestion importance[1-10]: 4 __ Why: Calling `collectionValue()` without checking if the value is a collection type could throw a runtime exception for unexpected highlight field formats. The guard improves robustness but is a defensive coding improvement rather than a critical bug fix.	Low
General	Document and verify highlight/fetchSize node ordering The `Highlight` node is injected unconditionally around the entire plan, including when the plan already contains a `Highlight` node (e.g., if PPL ever gains a native `highlight` command). More critically, the `Highlight` node is placed outside `Head` (fetch_size), meaning highlight config is set on context before the scan is visited, but the `Head` node wraps the scan — the ordering `Highlight(Head(scan))` means `visitHighlight` sets the config, then `visitHead` is visited, then `visitRelation` consumes it. This ordering is intentional per the test, but if `fetchSize > 0`, the `Head` is attached first and then `Highlight` wraps it, which means `visitHighlight` → `visitHead` → `visitRelation`. Verify this ordering is correct and document it explicitly to avoid future regressions. ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [40-44] +if (context.getFetchSize() > 0) { + rawPlan = new Head(context.getFetchSize(), 0).attach(rawPlan); +} if (context.getHighlightConfig() != null && context.getHighlightConfig().fields() != null && !context.getHighlightConfig().fields().isEmpty()) { rawPlan = new Highlight(context.getHighlightConfig()).attach(rawPlan); } +// Note: Highlight wraps Head so that visitHighlight sets config before visitRelation is reached Suggestion importance[1-10]: 2 __ Why: This suggestion asks to verify and document existing behavior rather than fix a bug. The `improved_code` is functionally equivalent to the existing code (same ordering), and the suggestion is primarily about adding a comment, which should score low.	Low

Previous suggestions

Suggestions up to commit 124c42a

Category	Suggestion	Impact
Possible issue	Guard against duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds `_highlight` to the schema even if it already exists (e.g., when called twice or after `LogicalHighlight.create` already added it). This can cause duplicate column errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema, which could cause duplicate column errors if called multiple times or after `LogicalHighlight.create` already added it. The fix mirrors the guard already present in `LogicalHighlight.create`.	Medium
Possible issue	Prevent highlight config leaking across queries The `visitHighlight` method sets the highlight config on the context and then visits children, but never clears the config if `visitRelation` is not reached (e.g., in error paths or non-scan plans). More critically, if `visitRelation` is not called (e.g., for `Values` or other non-scan sources), the highlight config leaks into subsequent queries on the same context. The config should be cleared after `visitChildren` regardless of whether it was consumed. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); visitChildren(node, context); + context.setHighlightConfig(null); // Ensure config is cleared even if not consumed by visitRelation return context.relBuilder.peek(); } Suggestion importance[1-10]: 5 __ Why: The `visitHighlight` method sets `highlightConfig` on the context but only clears it inside `visitRelation`. If the child is not a scan (e.g., `Values`), the config leaks. However, `visitRelation` already clears it with `context.setHighlightConfig(null)`, so this is a minor edge-case concern for non-scan sources.	Low
General	Guard against non-collection highlight field values The `highlights()` method calls `entry.getValue().collectionValue()` without checking if the value is actually a collection type. If a highlight field value is not a collection (e.g., a plain string), this will throw a runtime exception. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue val = entry.getValue(); + if (val == null \|\| val.isMissing() \|\| val.isNull()) continue; hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + val.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } -return (Map<String, Object>) hlMap; +return hlMap; Suggestion importance[1-10]: 5 __ Why: Calling `collectionValue()` without checking if the value is a collection type could throw a runtime exception for unexpected highlight field value types. Adding a null/type guard improves robustness, though in practice highlight values from OpenSearch are always arrays.	Low
General	Return null for object format with empty fields When the object format has no `fields` key (or an empty `fields` object), `fields` will be an empty list. This causes `getHighlightConfig()` to return a `HighlightConfig` with empty fields, which will be treated as valid by `AstStatementBuilder` (since it only checks `fields() != null && !fields().isEmpty()`), but `applyHighlightPushDown` will silently skip it. Consider returning `null` when `fields` is empty to be consistent with the null-check in `AstStatementBuilder`. ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [147-154] // Parse fields from "fields" object keys List<String> fields = new ArrayList<>(); JSONObject fieldsObj = obj.optJSONObject("fields"); if (fieldsObj != null) { for (String key : fieldsObj.keySet()) { fields.add(key); } } +if (fields.isEmpty()) { + return null; +} Suggestion importance[1-10]: 4 __ Why: When the object format has no `fields` key or an empty `fields` object, returning `null` instead of an empty-fields `HighlightConfig` makes the behavior consistent with the `AstStatementBuilder` null-check and avoids silently creating a no-op highlight node.	Low

Suggestions up to commit 82443d8

Category	Suggestion	Impact
Possible issue	Guard against duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds `_highlight` to the schema even if it already exists (e.g., when the rule fires twice or the field was already added by `LogicalHighlight.create`). This can cause duplicate column errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema, while `LogicalHighlight.create` already guards against duplicates. If the rule fires multiple times or the field already exists, this could cause duplicate column errors. The fix mirrors the existing guard in `LogicalHighlight.create`.	Medium
Possible issue	Guard against non-collection highlight field values The code calls `entry.getValue().collectionValue()` without checking whether the value is actually a collection type. If a highlight field value is not a collection (e.g., a plain string), this will throw a `ClassCastException` or `UnsupportedOperationException` at runtime. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue val = entry.getValue(); + if (val == null \|\| val.isMissing() \|\| val.isNull()) continue; hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + val.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } -return (Map<String, Object>) hlMap; +return hlMap; Suggestion importance[1-10]: 5 __ Why: Calling `collectionValue()` without checking whether the `ExprValue` is actually a collection type could throw a runtime exception if highlight data is returned in an unexpected format. Adding a null/missing/null check before calling `collectionValue()` improves robustness.	Low
General	Clear highlight config after visiting children The `visitHighlight` method sets the highlight config on the context and then visits children, but never clears the config if `visitRelation` is not reached (e.g., if the child is not a `Relation` node or if an exception occurs). This can cause stale highlight config to leak into subsequent queries. The config should be cleared after `visitChildren` completes, regardless of whether it was consumed. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); visitChildren(node, context); + context.setHighlightConfig(null); // Ensure config is cleared even if not consumed in visitRelation return context.relBuilder.peek(); } Suggestion importance[1-10]: 5 __ Why: The `visitHighlight` method sets `highlightConfig` on the context but only clears it inside `visitRelation` (with a comment "Clear for join safety"). If the child is not a `Relation` node or an exception occurs before `visitRelation`, the config could leak. Adding a cleanup after `visitChildren` is a reasonable defensive measure, though the current code already handles the common case.	Low
General	Return null for empty highlight fields configuration If the OSD object format is provided without a `fields` key (or with an empty `fields` object), `fields` will be an empty list. The `AstStatementBuilder` checks `!context.getHighlightConfig().fields().isEmpty()` before injecting the `Highlight` node, so an empty fields list silently skips highlighting without any error or warning. Consider returning `null` from `getHighlightConfig()` when fields are empty to make the behavior consistent and explicit. ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-155] // Parse fields from "fields" object keys List<String> fields = new ArrayList<>(); JSONObject fieldsObj = obj.optJSONObject("fields"); if (fieldsObj != null) { for (String key : fieldsObj.keySet()) { fields.add(key); } } +if (fields.isEmpty()) { + return null; +} Suggestion importance[1-10]: 4 __ Why: When the OSD object format is provided without a `fields` key or with an empty `fields` object, the current code returns a `HighlightConfig` with an empty fields list, which is silently ignored by `AstStatementBuilder`. Returning `null` in this case makes the behavior consistent with the null check in `AstStatementBuilder` and avoids silent no-ops.	Low

Suggestions up to commit d039af6

Category	Suggestion	Impact
Possible issue	Guard against duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds `_highlight` to the schema even if it already exists (e.g., when called multiple times or after `LogicalHighlight.create` already added it). This can cause duplicate column errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema, which could cause duplicate column errors if called multiple times or if the field already exists. `LogicalHighlight.create` already guards against this, so consistency and safety require the same guard here.	Medium
General	Clear highlight config after visiting children The `visitHighlight` method sets the highlight config on the context and then visits children, but never clears the config if `visitRelation` is not reached (e.g., if the child is not a `Relation` node but a subquery or join). This can cause stale highlight config to leak into subsequent queries. The config should be cleared after `visitChildren` completes regardless of whether it was consumed. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); visitChildren(node, context); + context.setHighlightConfig(null); // Ensure config is cleared even if not consumed return context.relBuilder.peek(); } Suggestion importance[1-10]: 5 __ Why: The `visitHighlight` method sets `highlightConfig` on the context but only clears it inside `visitRelation`. If the child plan doesn't reach `visitRelation` (e.g., subquery or join), the config leaks. Adding a clear after `visitChildren` is a reasonable defensive measure, though the current code already clears it in `visitRelation` for the common case.	Low
	Guard against non-collection highlight field values The `highlights()` method calls `entry.getValue().collectionValue()` without checking if the value is actually a collection type. If a highlight field value is not a collection (e.g., a plain string), this will throw a runtime exception. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue val = entry.getValue(); + if (val == null \|\| val.isMissing() \|\| val.isNull()) continue; hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + val.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } -return (Map<String, Object>) hlMap; +return hlMap; Suggestion importance[1-10]: 5 __ Why: Calling `collectionValue()` without checking if the value is a collection type could throw a runtime exception for unexpected highlight field formats. Adding a null/type guard improves robustness, though in practice OpenSearch highlight responses are always arrays.	Low
	Return null for empty highlight fields in object format When the OSD object format is used but the `fields` key is absent or empty, `getHighlightConfig()` returns a `HighlightConfig` with an empty fields list. The `AstStatementBuilder` checks for non-empty fields before injecting the `Highlight` node, so this case is silently ignored. However, `applyHighlightPushDown` also guards against empty fields, creating inconsistent behavior. Consider returning `null` when fields are empty to make the no-op case explicit and consistent. ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-155] // Parse fields from "fields" object keys List<String> fields = new ArrayList<>(); JSONObject fieldsObj = obj.optJSONObject("fields"); if (fieldsObj != null) { for (String key : fieldsObj.keySet()) { fields.add(key); } } +if (fields.isEmpty()) { + return null; +} Suggestion importance[1-10]: 4 __ Why: When the OSD object format has no `fields` key, an empty `HighlightConfig` is returned and silently ignored downstream. Returning `null` explicitly makes the no-op case consistent with the null checks in `AstStatementBuilder` and `applyHighlightPushDown`, improving clarity.	Low

Suggestions up to commit 9a272c4

Category	Suggestion	Impact
Possible issue	Prevent duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds the `_highlight` field to the schema without checking if it already exists. If `pushDownHighlight` is called more than once (e.g., in a join scenario), the `_highlight` column will be duplicated in the row type, causing downstream errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-104] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); - ... + newScan + .getPushDownContext() + .add( + PushDownType.HIGHLIGHT, + highlightConfig.fields(), + (OSRequestBuilderAction) + requestBuilder -> applyHighlightPushDown(requestBuilder, highlightConfig)); + return newScan; } Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema without checking for duplicates, unlike `LogicalHighlight.create` which has this guard. This could cause issues in join scenarios or if called multiple times, making this a valid correctness concern.	Medium
	Prevent stale highlight config state leakage The `visitHighlight` method sets the highlight config on the context before visiting children, but `visitRelation` consumes and clears it. If the highlight node wraps a non-relation child (e.g., a filter or join), the config will be set but never consumed, leaving stale state. Additionally, the config should be cleared in a `finally` block to ensure cleanup even if `visitChildren` throws. Consider saving and restoring the previous config to handle nested or complex plans safely. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { + HighlightConfig previous = context.getHighlightConfig(); context.setHighlightConfig(node.getHighlightConfig()); - visitChildren(node, context); + try { + visitChildren(node, context); + } finally { + // Restore previous config to avoid leaking state if not consumed + if (context.getHighlightConfig() == node.getHighlightConfig()) { + context.setHighlightConfig(previous); + } + } return context.relBuilder.peek(); } Suggestion importance[1-10]: 6 __ Why: The suggestion raises a valid concern about stale `highlightConfig` state if `visitChildren` doesn't consume it (e.g., non-relation child). The `finally` block approach is a reasonable defensive pattern, though the current design clears it in `visitRelation` which is the expected child. The risk is real but limited to edge cases.	Low
	Guard against non-collection highlight field values The `highlights()` method calls `entry.getValue().collectionValue()` without checking whether the value is actually a collection type. If a highlight field value is not a collection (e.g., it's a string or null), this will throw a runtime exception. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue val = entry.getValue(); + if (val == null \|\| val.isMissing() \|\| val.isNull()) { + continue; + } hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + val.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } -return (Map<String, Object>) hlMap; +return hlMap; Suggestion importance[1-10]: 6 __ Why: Calling `collectionValue()` without checking if the value is actually a collection type could throw a runtime exception for unexpected highlight field value types. The improved code adds null/missing/null checks and also removes the unnecessary cast, making it more robust.	Low
General	Return null for object format with no fields specified If the OSD object format is provided without a `fields` key (e.g., `{"pre_tags": [""],` `"post_tags": ["``"]}`), `fields` will be an empty list. This empty `HighlightConfig` will later be filtered out in `AstStatementBuilder` (due to the `!isEmpty()` check), silently ignoring the highlight request. Consider returning `null` or throwing a validation error when `fields` is empty in the object format to give the user clear feedback. ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-154] // Parse fields from "fields" object keys List<String> fields = new ArrayList<>(); JSONObject fieldsObj = obj.optJSONObject("fields"); if (fieldsObj != null) { for (String key : fieldsObj.keySet()) { fields.add(key); } } +if (fields.isEmpty()) { + return null; // No fields specified; treat as no highlight +} Suggestion importance[1-10]: 4 __ Why: If the OSD object format lacks a `fields` key, the highlight request is silently ignored. Returning `null` explicitly when `fields` is empty makes the behavior more predictable, though the current silent-ignore behavior may be acceptable since the `AstStatementBuilder` already handles empty fields.	Low

Suggestions up to commit 3cfbe9a

Category	Suggestion	Impact
Possible issue	Guard against duplicate highlight column in schema The `pushDownHighlight` method unconditionally adds `_highlight` to the schema even if it already exists (e.g., when called multiple times or after `LogicalHighlight.create` already added it). This can cause duplicate column errors. Add a guard similar to the one in `LogicalHighlight.create`. opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96] public RelNode pushDownHighlight(HighlightConfig highlightConfig) { RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder(); schemaBuilder.addAll(getRowType().getFieldList()); - schemaBuilder.add( - HighlightExpression.HIGHLIGHT_FIELD, - getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) { + schemaBuilder.add( + HighlightExpression.HIGHLIGHT_FIELD, + getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY)); + } CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build()); Suggestion importance[1-10]: 7 __ Why: The `pushDownHighlight` method unconditionally adds `_highlight` to the schema, which could cause duplicate column issues if called multiple times or after `LogicalHighlight.create` already added it. The fix mirrors the guard already present in `LogicalHighlight.create`.	Medium
Possible issue	Guard against non-collection highlight field values The code calls `entry.getValue().collectionValue()` without checking whether the value is actually a collection type. If a highlight field contains a non-collection `ExprValue`, this will throw an exception at runtime. Add a null/type check before calling `collectionValue()`. protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123] Map<String, Object> hlMap = new LinkedHashMap<>(); for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) { + ExprValue fieldVal = entry.getValue(); + if (fieldVal == null \|\| fieldVal.isMissing() \|\| fieldVal.isNull()) { + continue; + } hlMap.put( entry.getKey(), - entry.getValue().collectionValue().stream() + fieldVal.collectionValue().stream() .map(ExprValue::stringValue) .collect(Collectors.toList())); } -return (Map<String, Object>) hlMap; +return hlMap; Suggestion importance[1-10]: 5 __ Why: Calling `collectionValue()` on a non-collection `ExprValue` could throw a runtime exception. Adding a null/type check is a reasonable defensive measure, though in practice highlight values from OpenSearch are always arrays.	Low
General	Prevent highlight config leaking across unrelated queries The `visitHighlight` method sets the highlight config on the context and then visits children, but never clears the config if `visitRelation` is not reached (e.g., in error paths or when the child is not a relation). More critically, if `visitRelation` is not called (e.g., the child is already a built plan), the `LogicalHighlight` node is never injected, silently dropping the highlight. Consider wrapping the child result with `LogicalHighlight` directly here instead of relying on side-effects in `visitRelation`. core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207] @Override public RelNode visitHighlight(Highlight node, CalcitePlanContext context) { context.setHighlightConfig(node.getHighlightConfig()); visitChildren(node, context); + // If visitRelation did not consume the config (e.g. non-relation child), clear it to avoid leaking + if (context.getHighlightConfig() != null) { + context.setHighlightConfig(null); + } return context.relBuilder.peek(); } Suggestion importance[1-10]: 4 __ Why: The suggestion addresses a potential config leak if `visitRelation` is not reached, but the `improved_code` doesn't actually fix the core issue of silently dropping the highlight — it just clears the config. The scenario described is an edge case and the existing code already clears the config in `visitRelation`.	Low
General	Return null for highlight config with no fields If the OSD object format is provided without a `fields` key (or with an empty `fields` object), `fields` will be an empty list. This causes `getHighlightConfig()` to return a `HighlightConfig` with empty fields, which `AstStatementBuilder` will skip (due to the `!isEmpty()` check), but `applyHighlightPushDown` also guards against it. However, returning `null` instead of an empty-fields config would be more consistent and avoid creating a no-op `Highlight` AST node. Consider returning `null` when `fields` is empty. ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [147-154] // Parse fields from "fields" object keys List<String> fields = new ArrayList<>(); JSONObject fieldsObj = obj.optJSONObject("fields"); if (fieldsObj != null) { for (String key : fieldsObj.keySet()) { fields.add(key); } } +if (fields.isEmpty()) { + return null; +} Suggestion importance[1-10]: 4 __ Why: Returning `null` when `fields` is empty would be more consistent and avoids creating a no-op `HighlightConfig`. However, the existing guards in `AstStatementBuilder` and `applyHighlightPushDown` already handle the empty-fields case safely, making this a minor improvement.	Low

github-actions · 2026-03-14T17:29:29Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	519	low	User-supplied highlight terms are wrapped in double quotes but not escaped before being passed to queryStringQuery(). A term containing a literal '"' character could partially break out of the phrase context and alter the Lucene query structure. Because PPL users are already authenticated and can issue arbitrary search queries, the practical exploitability is limited, but proper escaping of the term string (e.g., via QueryParser.escape) would be safer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-14T17:30:33Z

Persistent review updated to latest commit f0dacc0

github-actions · 2026-03-15T19:40:17Z

Persistent review updated to latest commit ab37b5b

github-actions · 2026-03-15T20:51:54Z

Persistent review updated to latest commit a7c44d8

github-actions · 2026-03-15T21:16:05Z

Persistent review updated to latest commit 51356cd

github-actions · 2026-03-15T21:21:58Z

Persistent review updated to latest commit b2d6afe

github-actions · 2026-03-16T16:41:14Z

Persistent review updated to latest commit 7bb526a

github-actions · 2026-03-16T20:38:31Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e3b9c75.

Path	Line	Severity	Description
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java	514	medium	User-supplied field names from the API request's 'highlight' parameter are directly interpolated into a Lucene query string with only double-quote wrapping ('"' + term + '"'). While this helps with phrase matching, it does not sanitize Lucene special characters or escape sequences, creating a potential query string injection vector. A crafted field value like '* OR sensitive_field:*' could alter query semantics. This appears to be an unintentional coding oversight rather than malicious intent, as the pattern is consistent with the rest of the feature and the terms are at least wrapped in quotes.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-16T20:39:50Z

Persistent review updated to latest commit e3b9c75

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

…command Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-16T21:10:49Z

Persistent review updated to latest commit 3cfbe9a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-16T22:34:49Z

Persistent review updated to latest commit 82443d8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-16T23:40:41Z

Persistent review updated to latest commit 124c42a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

github-actions · 2026-03-17T00:35:35Z

Persistent review updated to latest commit c87b5b8

RyanL1997 added PPL Piped processing language v3.6.0 Issues targeting release v3.6.0 feature labels Mar 14, 2026

RyanL1997 added 12 commits March 16, 2026 14:07

initial commit for highlight cmd

0686753

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix the response format

f8f7208

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

pushdown fix (similar to v2)

4c649b5

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix - filter out highlight col in result schema

a596cba

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

add docs for highlight

5c64e04

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

dead code cleanup

49f5598

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

enable no pushdown

fec921d

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix pushdown and add IT and explain IT

a5a2788

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

add more tests for coverage

f46b241

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

add ccr tests

90839b6

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix ccr test

e38284d

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

[GRAMMAR REMOVAL] Internally handle highlight instead of expose as a …

3cfbe9a

…command Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 force-pushed the ppl-highlight-cmd branch from e3b9c75 to 3cfbe9a Compare March 16, 2026 21:08

Add more tests

9a272c4

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

RyanL1997 mentioned this pull request Mar 16, 2026

[Feature] Calcite PPL search result highlighting #5141

Closed

8 tasks

fix doc test

82443d8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix IT

124c42a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

fix tests

c87b5b8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

Conversation

RyanL1997 commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Sample Request / Response

Check List

Uh oh!

github-actions bot commented Mar 14, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit c87b5b8)

Uh oh!

github-actions bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 14, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

github-actions bot commented Mar 15, 2026

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

github-actions bot commented Mar 16, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

github-actions bot commented Mar 16, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

RyanL1997 commented Mar 14, 2026 •

edited

Loading

github-actions bot commented Mar 14, 2026 •

edited

Loading

(Review updated until commit `c87b5b8`)

github-actions bot commented Mar 14, 2026 •

edited

Loading