Skip to content

[Feature] PPL Highlight Support#5234

Open
RyanL1997 wants to merge 17 commits intoopensearch-project:mainfrom
RyanL1997:ppl-highlight-cmd
Open

[Feature] PPL Highlight Support#5234
RyanL1997 wants to merge 17 commits intoopensearch-project:mainfrom
RyanL1997:ppl-highlight-cmd

Conversation

@RyanL1997
Copy link
Collaborator

@RyanL1997 RyanL1997 commented Mar 14, 2026

Description

This PR replaces the request-level ThreadLocal approach from #5141. Instead of passing highlight config through thread-local state, we thread a HighlightConfig record through the AST and Calcite plan — following the same pattern as fetch_size (#5109). The highlight config flows from the API request into the AST via StatementBuilderContext, becomes a LogicalHighlight Calcite node, and gets pushed down to the OpenSearch scan via an optimizer rule. This avoids the thread-safety concerns and implicit coupling of ThreadLocal and keeps the data flow explicit and traceable through the plan.

  • Add PPL search result highlighting via the highlight API parameter (not as a PPL command)
  • Support both the simple array format (["*"]) and the rich OpenSearch Dashboards object format with pre_tags, post_tags, fields, and fragment_size
  • Introduce HighlightConfig record to carry highlight options through the full pipeline: PPLQueryRequestAstStatementBuilderHighlight AST node → CalciteRelNodeVisitorLogicalHighlightHighlightIndexScanRule pushdown → OpenSearch HighlightBuilder
  • Response includes a top-level highlights array parallel to datarows, containing per-field highlighted fragments
  • Add endpoint documentation with examples and expected output

Related Issues

Sample Request / Response

Details
curl -s -X POST "localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{"query":"search source=accounts \"Holmes\"","highlight":{"fields":{"*":{}},"pre_tags":["@opensearch-dashboards-highlighted-field@"],"post_tags":["@/opensearch-dashboards-highlighted-field@"],"fragment_size":2147483647}}' | jq
{
  "schema": [
    {
      "name": "account_number",
      "type": "bigint"
    },
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "string"
    },
    {
      "name": "balance",
      "type": "bigint"
    },
    {
      "name": "gender",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "employer",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "age",
      "type": "bigint"
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    }
  ],
  "datarows": [
    [
      578,
      "Holmes",
      "969 Metropolitan Avenue",
      34259,
      "M",
      "Aguila",
      "Cubicide",
      "PA",
      37,
      "holmesmcknight@cubicide.com",
      "Mcknight"
    ],
    [
      828,
      "Blanche",
      "605 Stryker Court",
      44890,
      "F",
      "Loomis",
      "Motovate",
      "KS",
      33,
      "blancheholmes@motovate.com",
      "Holmes"
    ],
    [
      1,
      "Amber",
      "880 Holmes Lane",
      39225,
      "M",
      "Brogan",
      "Pyrami",
      "IL",
      32,
      "amberduke@pyrami.com",
      "Duke"
    ]
  ],
  "highlights": [
    {
      "firstname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "firstname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "lastname.keyword": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ],
      "lastname": [
        "@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"
      ]
    },
    {
      "address": [
        "880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"
      ]
    }
  ],
  "total": 3,
  "size": 3
}

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java516lowUser-supplied highlight terms are interpolated directly into a Lucene query_string expression with only double-quote wrapping and no further sanitization. A crafted term containing backslash escapes or unbalanced quotes could malform or expand the generated query. Impact is limited to highlighting behavior (not document access control), so this is likely an oversight rather than malicious intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit c87b5b8)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: HighlightConfig AST and PPL request parsing

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/Highlight.java
  • core/src/main/java/org/opensearch/sql/ast/tree/HighlightConfig.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/PPLService.java
  • ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstStatementBuilderTest.java

Sub-PR theme: Calcite LogicalHighlight plan node and pushdown rule

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalHighlight.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/HighlightIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownType.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/CalciteIndexScanCostTest.java

Sub-PR theme: Highlight response formatting, integration tests, and documentation

Relevant files:

  • protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java
  • protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteHighlightIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • docs/user/ppl/interfaces/endpoint.md

⚡ Recommended focus areas for review

Join Safety

In visitRelation, the highlight config is cleared after the first scan (context.setHighlightConfig(null)). For queries with multiple relations (e.g., JOINs), only the first scanned relation will receive the highlight config. This may silently drop highlighting for subsequent relations without any warning or error.

if (context.getHighlightConfig() != null) {
  RelNode input = context.relBuilder.build();
  LogicalHighlight highlight = LogicalHighlight.create(input, context.getHighlightConfig());
  context.relBuilder.push(highlight);
  context.setHighlightConfig(null); // Clear for join safety
}
Highlight Injection Order

In visitHighlight, the highlight config is set on the context before visiting children. However, visitRelation injects LogicalHighlight directly around the scan. This means the Highlight AST node wrapping (from AstStatementBuilder) and the direct injection in visitRelation could both fire, potentially double-applying highlight. The interaction between the visitHighlight path and the visitRelation path should be validated to ensure highlight is not applied twice.

@Override
public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
  context.setHighlightConfig(node.getHighlightConfig());
  visitChildren(node, context);
  return context.relBuilder.peek();
}
Default Fragment Size

When fragmentSize is null (simple array format), the code defaults to Integer.MAX_VALUE. This silently overrides OpenSearch's own default fragment size (100 characters), which may produce unexpectedly large responses. Consider documenting this behavior or using OpenSearch's default instead.

hb.fragmentSize(
    highlightConfig.fragmentSize() != null
        ? highlightConfig.fragmentSize()
        : Integer.MAX_VALUE);
Unsafe Cast

The unchecked cast (Map<String, Object>) hlMap on line 123 is unnecessary since hlMap is already declared as LinkedHashMap<String, Object>. This cast should be removed to avoid compiler warnings and potential confusion.

return (Map<String, Object>) hlMap;
Magic String

The string literal "_highlight" is used directly instead of referencing HighlightExpression.HIGHLIGHT_FIELD. This breaks the single-source-of-truth principle and could cause silent bugs if the constant is ever renamed.

if ("_highlight".equals(rawPath)) {
  ExprValue hl = ExprValueUtils.getTupleValue(value).get("_highlight");
  return (hl != null && !hl.isMissing()) ? hl : null;
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 14, 2026

PR Code Suggestions ✨

Latest suggestions up to c87b5b8

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against duplicate highlight column in schema

The pushDownHighlight method unconditionally adds _highlight to the schema even if
it already exists (e.g., when called twice or after LogicalHighlight.create already
added it). This can cause duplicate column errors. Add a guard similar to the one in
LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema, which could cause duplicate column errors if called multiple times or after LogicalHighlight.create already added it. The fix mirrors the guard already present in LogicalHighlight.create.

Medium
Prevent highlight config leaking across queries

The visitHighlight method sets the highlight config on the context and then visits
children, but if an exception or early return occurs in visitChildren, the config
remains set on the context and could leak into subsequent queries. The config is
only cleared inside visitRelation when a scan is found, but if the plan tree doesn't
contain a scan (e.g., VALUES), the config is never cleared. Add a finally block or
clear the config after visitChildren.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
     context.setHighlightConfig(node.getHighlightConfig());
-    visitChildren(node, context);
+    try {
+        visitChildren(node, context);
+    } finally {
+        context.setHighlightConfig(null);
+    }
     return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 5

__

Why: The highlightConfig is only cleared inside visitRelation when a scan is found, so if the plan tree has no scan node, the config could leak. However, in practice PPL queries always have a scan, making this a low-probability edge case.

Low
General
Guard against non-collection highlight field values

The highlights() method calls entry.getValue().collectionValue() without checking if
the value is actually a collection type. If a highlight field contains a
non-collection ExprValue, this will throw a runtime exception. Add a null/type check
before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+    ExprValue fieldVal = entry.getValue();
+    if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+        continue;
+    }
     hlMap.put(
         entry.getKey(),
-        entry.getValue().collectionValue().stream()
+        fieldVal.collectionValue().stream()
             .map(ExprValue::stringValue)
             .collect(Collectors.toList()));
 }
 return (Map<String, Object>) hlMap;
Suggestion importance[1-10]: 4

__

Why: Calling collectionValue() without checking if the value is a collection type could throw a runtime exception for unexpected highlight field formats. The guard improves robustness but is a defensive coding improvement rather than a critical bug fix.

Low
Document and verify highlight/fetchSize node ordering

The Highlight node is injected unconditionally around the entire plan, including
when the plan already contains a Highlight node (e.g., if PPL ever gains a native
highlight command). More critically, the Highlight node is placed outside Head
(fetch_size), meaning highlight config is set on context before the scan is visited,
but the Head node wraps the scan — the ordering Highlight(Head(scan)) means
visitHighlight sets the config, then visitHead is visited, then visitRelation
consumes it. This ordering is intentional per the test, but if fetchSize > 0, the
Head is attached first and then Highlight wraps it, which means visitHighlight
visitHeadvisitRelation. Verify this ordering is correct and document it
explicitly to avoid future regressions.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstStatementBuilder.java [40-44]

+if (context.getFetchSize() > 0) {
+  rawPlan = new Head(context.getFetchSize(), 0).attach(rawPlan);
+}
 if (context.getHighlightConfig() != null
     && context.getHighlightConfig().fields() != null
     && !context.getHighlightConfig().fields().isEmpty()) {
   rawPlan = new Highlight(context.getHighlightConfig()).attach(rawPlan);
 }
+// Note: Highlight wraps Head so that visitHighlight sets config before visitRelation is reached
Suggestion importance[1-10]: 2

__

Why: This suggestion asks to verify and document existing behavior rather than fix a bug. The improved_code is functionally equivalent to the existing code (same ordering), and the suggestion is primarily about adding a comment, which should score low.

Low

Previous suggestions

Suggestions up to commit 124c42a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against duplicate highlight column in schema

The pushDownHighlight method unconditionally adds _highlight to the schema even if
it already exists (e.g., when called twice or after LogicalHighlight.create already
added it). This can cause duplicate column errors. Add a guard similar to the one in
LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema, which could cause duplicate column errors if called multiple times or after LogicalHighlight.create already added it. The fix mirrors the guard already present in LogicalHighlight.create.

Medium
Prevent highlight config leaking across queries

The visitHighlight method sets the highlight config on the context and then visits
children, but never clears the config if visitRelation is not reached (e.g., in
error paths or non-scan plans). More critically, if visitRelation is not called
(e.g., for Values or other non-scan sources), the highlight config leaks into
subsequent queries on the same context. The config should be cleared after
visitChildren regardless of whether it was consumed.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
   context.setHighlightConfig(node.getHighlightConfig());
   visitChildren(node, context);
+  context.setHighlightConfig(null); // Ensure config is cleared even if not consumed by visitRelation
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 5

__

Why: The visitHighlight method sets highlightConfig on the context but only clears it inside visitRelation. If the child is not a scan (e.g., Values), the config leaks. However, visitRelation already clears it with context.setHighlightConfig(null), so this is a minor edge-case concern for non-scan sources.

Low
General
Guard against non-collection highlight field values

The highlights() method calls entry.getValue().collectionValue() without checking if
the value is actually a collection type. If a highlight field value is not a
collection (e.g., a plain string), this will throw a runtime exception. Add a
null/type check before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue val = entry.getValue();
+  if (val == null || val.isMissing() || val.isNull()) continue;
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      val.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5

__

Why: Calling collectionValue() without checking if the value is a collection type could throw a runtime exception for unexpected highlight field value types. Adding a null/type guard improves robustness, though in practice highlight values from OpenSearch are always arrays.

Low
Return null for object format with empty fields

When the object format has no fields key (or an empty fields object), fields will be
an empty list. This causes getHighlightConfig() to return a HighlightConfig with
empty fields, which will be treated as valid by AstStatementBuilder (since it only
checks fields() != null && !fields().isEmpty()), but applyHighlightPushDown will
silently skip it. Consider returning null when fields is empty to be consistent with
the null-check in AstStatementBuilder.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [147-154]

 // Parse fields from "fields" object keys
 List<String> fields = new ArrayList<>();
 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
+if (fields.isEmpty()) {
+  return null;
+}
Suggestion importance[1-10]: 4

__

Why: When the object format has no fields key or an empty fields object, returning null instead of an empty-fields HighlightConfig makes the behavior consistent with the AstStatementBuilder null-check and avoids silently creating a no-op highlight node.

Low
Suggestions up to commit 82443d8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against duplicate highlight column in schema

The pushDownHighlight method unconditionally adds _highlight to the schema even if
it already exists (e.g., when the rule fires twice or the field was already added by
LogicalHighlight.create). This can cause duplicate column errors. Add a guard
similar to the one in LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
   RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
   schemaBuilder.addAll(getRowType().getFieldList());
-  schemaBuilder.add(
-      HighlightExpression.HIGHLIGHT_FIELD,
-      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+    schemaBuilder.add(
+        HighlightExpression.HIGHLIGHT_FIELD,
+        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  }
   CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema, while LogicalHighlight.create already guards against duplicates. If the rule fires multiple times or the field already exists, this could cause duplicate column errors. The fix mirrors the existing guard in LogicalHighlight.create.

Medium
Guard against non-collection highlight field values

The code calls entry.getValue().collectionValue() without checking whether the value
is actually a collection type. If a highlight field value is not a collection (e.g.,
a plain string), this will throw a ClassCastException or
UnsupportedOperationException at runtime. Add a null/type check before calling
collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue val = entry.getValue();
+  if (val == null || val.isMissing() || val.isNull()) continue;
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      val.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5

__

Why: Calling collectionValue() without checking whether the ExprValue is actually a collection type could throw a runtime exception if highlight data is returned in an unexpected format. Adding a null/missing/null check before calling collectionValue() improves robustness.

Low
General
Clear highlight config after visiting children

The visitHighlight method sets the highlight config on the context and then visits
children, but never clears the config if visitRelation is not reached (e.g., if the
child is not a Relation node or if an exception occurs). This can cause stale
highlight config to leak into subsequent queries. The config should be cleared after
visitChildren completes, regardless of whether it was consumed.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
   context.setHighlightConfig(node.getHighlightConfig());
   visitChildren(node, context);
+  context.setHighlightConfig(null); // Ensure config is cleared even if not consumed in visitRelation
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 5

__

Why: The visitHighlight method sets highlightConfig on the context but only clears it inside visitRelation (with a comment "Clear for join safety"). If the child is not a Relation node or an exception occurs before visitRelation, the config could leak. Adding a cleanup after visitChildren is a reasonable defensive measure, though the current code already handles the common case.

Low
Return null for empty highlight fields configuration

If the OSD object format is provided without a fields key (or with an empty fields
object), fields will be an empty list. The AstStatementBuilder checks
!context.getHighlightConfig().fields().isEmpty() before injecting the Highlight
node, so an empty fields list silently skips highlighting without any error or
warning. Consider returning null from getHighlightConfig() when fields are empty to
make the behavior consistent and explicit.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-155]

 // Parse fields from "fields" object keys
 List<String> fields = new ArrayList<>();
 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
+if (fields.isEmpty()) {
+  return null;
+}
Suggestion importance[1-10]: 4

__

Why: When the OSD object format is provided without a fields key or with an empty fields object, the current code returns a HighlightConfig with an empty fields list, which is silently ignored by AstStatementBuilder. Returning null in this case makes the behavior consistent with the null check in AstStatementBuilder and avoids silent no-ops.

Low
Suggestions up to commit d039af6
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against duplicate highlight column in schema

The pushDownHighlight method unconditionally adds _highlight to the schema even if
it already exists (e.g., when called multiple times or after LogicalHighlight.create
already added it). This can cause duplicate column errors. Add a guard similar to
the one in LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
   RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
   schemaBuilder.addAll(getRowType().getFieldList());
-  schemaBuilder.add(
-      HighlightExpression.HIGHLIGHT_FIELD,
-      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+    schemaBuilder.add(
+        HighlightExpression.HIGHLIGHT_FIELD,
+        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  }
   CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema, which could cause duplicate column errors if called multiple times or if the field already exists. LogicalHighlight.create already guards against this, so consistency and safety require the same guard here.

Medium
General
Clear highlight config after visiting children

The visitHighlight method sets the highlight config on the context and then visits
children, but never clears the config if visitRelation is not reached (e.g., if the
child is not a Relation node but a subquery or join). This can cause stale highlight
config to leak into subsequent queries. The config should be cleared after
visitChildren completes regardless of whether it was consumed.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
   context.setHighlightConfig(node.getHighlightConfig());
   visitChildren(node, context);
+  context.setHighlightConfig(null); // Ensure config is cleared even if not consumed
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 5

__

Why: The visitHighlight method sets highlightConfig on the context but only clears it inside visitRelation. If the child plan doesn't reach visitRelation (e.g., subquery or join), the config leaks. Adding a clear after visitChildren is a reasonable defensive measure, though the current code already clears it in visitRelation for the common case.

Low
Guard against non-collection highlight field values

The highlights() method calls entry.getValue().collectionValue() without checking if
the value is actually a collection type. If a highlight field value is not a
collection (e.g., a plain string), this will throw a runtime exception. Add a
null/type check before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue val = entry.getValue();
+  if (val == null || val.isMissing() || val.isNull()) continue;
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      val.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5

__

Why: Calling collectionValue() without checking if the value is a collection type could throw a runtime exception for unexpected highlight field formats. Adding a null/type guard improves robustness, though in practice OpenSearch highlight responses are always arrays.

Low
Return null for empty highlight fields in object format

When the OSD object format is used but the fields key is absent or empty,
getHighlightConfig() returns a HighlightConfig with an empty fields list. The
AstStatementBuilder checks for non-empty fields before injecting the Highlight node,
so this case is silently ignored. However, applyHighlightPushDown also guards
against empty fields, creating inconsistent behavior. Consider returning null when
fields are empty to make the no-op case explicit and consistent.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-155]

 // Parse fields from "fields" object keys
 List<String> fields = new ArrayList<>();
 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
+if (fields.isEmpty()) {
+  return null;
+}
Suggestion importance[1-10]: 4

__

Why: When the OSD object format has no fields key, an empty HighlightConfig is returned and silently ignored downstream. Returning null explicitly makes the no-op case consistent with the null checks in AstStatementBuilder and applyHighlightPushDown, improving clarity.

Low
Suggestions up to commit 9a272c4
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent duplicate highlight column in schema

The pushDownHighlight method unconditionally adds the _highlight field to the schema
without checking if it already exists. If pushDownHighlight is called more than once
(e.g., in a join scenario), the _highlight column will be duplicated in the row
type, causing downstream errors. Add a guard similar to the one in
LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-104]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
   RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
   schemaBuilder.addAll(getRowType().getFieldList());
-  schemaBuilder.add(
-      HighlightExpression.HIGHLIGHT_FIELD,
-      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+    schemaBuilder.add(
+        HighlightExpression.HIGHLIGHT_FIELD,
+        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  }
   CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
-  ...
+  newScan
+      .getPushDownContext()
+      .add(
+          PushDownType.HIGHLIGHT,
+          highlightConfig.fields(),
+          (OSRequestBuilderAction)
+              requestBuilder -> applyHighlightPushDown(requestBuilder, highlightConfig));
+  return newScan;
 }
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema without checking for duplicates, unlike LogicalHighlight.create which has this guard. This could cause issues in join scenarios or if called multiple times, making this a valid correctness concern.

Medium
Prevent stale highlight config state leakage

The visitHighlight method sets the highlight config on the context before visiting
children, but visitRelation consumes and clears it. If the highlight node wraps a
non-relation child (e.g., a filter or join), the config will be set but never
consumed, leaving stale state. Additionally, the config should be cleared in a
finally block to ensure cleanup even if visitChildren throws. Consider saving and
restoring the previous config to handle nested or complex plans safely.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
+  HighlightConfig previous = context.getHighlightConfig();
   context.setHighlightConfig(node.getHighlightConfig());
-  visitChildren(node, context);
+  try {
+    visitChildren(node, context);
+  } finally {
+    // Restore previous config to avoid leaking state if not consumed
+    if (context.getHighlightConfig() == node.getHighlightConfig()) {
+      context.setHighlightConfig(previous);
+    }
+  }
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion raises a valid concern about stale highlightConfig state if visitChildren doesn't consume it (e.g., non-relation child). The finally block approach is a reasonable defensive pattern, though the current design clears it in visitRelation which is the expected child. The risk is real but limited to edge cases.

Low
Guard against non-collection highlight field values

The highlights() method calls entry.getValue().collectionValue() without checking
whether the value is actually a collection type. If a highlight field value is not a
collection (e.g., it's a string or null), this will throw a runtime exception. Add a
null/type check before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue val = entry.getValue();
+  if (val == null || val.isMissing() || val.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      val.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 6

__

Why: Calling collectionValue() without checking if the value is actually a collection type could throw a runtime exception for unexpected highlight field value types. The improved code adds null/missing/null checks and also removes the unnecessary cast, making it more robust.

Low
General
Return null for object format with no fields specified

If the OSD object format is provided without a fields key (e.g., {"pre_tags": [""],
"post_tags": ["
"]}), fields will be an empty list. This empty HighlightConfig will
later be filtered out in AstStatementBuilder (due to the !isEmpty() check), silently
ignoring the highlight request. Consider returning null or throwing a validation
error when fields is empty in the object format to give the user clear feedback.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [148-154]

 // Parse fields from "fields" object keys
 List<String> fields = new ArrayList<>();
 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
+if (fields.isEmpty()) {
+  return null; // No fields specified; treat as no highlight
+}
Suggestion importance[1-10]: 4

__

Why: If the OSD object format lacks a fields key, the highlight request is silently ignored. Returning null explicitly when fields is empty makes the behavior more predictable, though the current silent-ignore behavior may be acceptable since the AstStatementBuilder already handles empty fields.

Low
Suggestions up to commit 3cfbe9a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against duplicate highlight column in schema

The pushDownHighlight method unconditionally adds _highlight to the schema even if
it already exists (e.g., when called multiple times or after LogicalHighlight.create
already added it). This can cause duplicate column errors. Add a guard similar to
the one in LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [89-96]

 public RelNode pushDownHighlight(HighlightConfig highlightConfig) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema, which could cause duplicate column issues if called multiple times or after LogicalHighlight.create already added it. The fix mirrors the guard already present in LogicalHighlight.create.

Medium
Guard against non-collection highlight field values

The code calls entry.getValue().collectionValue() without checking whether the value
is actually a collection type. If a highlight field contains a non-collection
ExprValue, this will throw an exception at runtime. Add a null/type check before
calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5

__

Why: Calling collectionValue() on a non-collection ExprValue could throw a runtime exception. Adding a null/type check is a reasonable defensive measure, though in practice highlight values from OpenSearch are always arrays.

Low
General
Prevent highlight config leaking across unrelated queries

The visitHighlight method sets the highlight config on the context and then visits
children, but never clears the config if visitRelation is not reached (e.g., in
error paths or when the child is not a relation). More critically, if visitRelation
is not called (e.g., the child is already a built plan), the LogicalHighlight node
is never injected, silently dropping the highlight. Consider wrapping the child
result with LogicalHighlight directly here instead of relying on side-effects in
visitRelation.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [3202-3207]

 @Override
 public RelNode visitHighlight(Highlight node, CalcitePlanContext context) {
   context.setHighlightConfig(node.getHighlightConfig());
   visitChildren(node, context);
+  // If visitRelation did not consume the config (e.g. non-relation child), clear it to avoid leaking
+  if (context.getHighlightConfig() != null) {
+    context.setHighlightConfig(null);
+  }
   return context.relBuilder.peek();
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion addresses a potential config leak if visitRelation is not reached, but the improved_code doesn't actually fix the core issue of silently dropping the highlight — it just clears the config. The scenario described is an edge case and the existing code already clears the config in visitRelation.

Low
Return null for highlight config with no fields

If the OSD object format is provided without a fields key (or with an empty fields
object), fields will be an empty list. This causes getHighlightConfig() to return a
HighlightConfig with empty fields, which AstStatementBuilder will skip (due to the
!isEmpty() check), but applyHighlightPushDown also guards against it. However,
returning null instead of an empty-fields config would be more consistent and avoid
creating a no-op Highlight AST node. Consider returning null when fields is empty.

ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java [147-154]

 // Parse fields from "fields" object keys
 List<String> fields = new ArrayList<>();
 JSONObject fieldsObj = obj.optJSONObject("fields");
 if (fieldsObj != null) {
   for (String key : fieldsObj.keySet()) {
     fields.add(key);
   }
 }
+if (fields.isEmpty()) {
+  return null;
+}
Suggestion importance[1-10]: 4

__

Why: Returning null when fields is empty would be more consistent and avoids creating a no-op HighlightConfig. However, the existing guards in AstStatementBuilder and applyHighlightPushDown already handle the empty-fields case safely, making this a minor improvement.

Low

@RyanL1997 RyanL1997 added PPL Piped processing language v3.6.0 Issues targeting release v3.6.0 feature labels Mar 14, 2026
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java519lowUser-supplied highlight terms are wrapped in double quotes but not escaped before being passed to queryStringQuery(). A term containing a literal '"' character could partially break out of the phrase context and alter the Lucene query structure. Because PPL users are already authenticated and can issue arbitrary search queries, the practical exploitability is limited, but proper escaping of the term string (e.g., via QueryParser.escape) would be safer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit f0dacc0

@github-actions
Copy link
Contributor

Persistent review updated to latest commit ab37b5b

@github-actions
Copy link
Contributor

Persistent review updated to latest commit a7c44d8

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 51356cd

@github-actions
Copy link
Contributor

Persistent review updated to latest commit b2d6afe

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 7bb526a

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e3b9c75.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java514mediumUser-supplied field names from the API request's 'highlight' parameter are directly interpolated into a Lucene query string with only double-quote wrapping ('"' + term + '"'). While this helps with phrase matching, it does not sanitize Lucene special characters or escape sequences, creating a potential query string injection vector. A crafted field value like '* OR sensitive_field:*' could alter query semantics. This appears to be an unintentional coding oversight rather than malicious intent, as the pattern is consistent with the rest of the feature and the terms are at least wrapped in quotes.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit e3b9c75

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…command

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 3cfbe9a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 82443d8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 124c42a

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit c87b5b8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PPL Piped processing language v3.6.0 Issues targeting release v3.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE RFC] PPL Search Result Highlighting [FEATURE] Supporting Query Highlight Feature into PPL API

1 participant