TRT-2471: Consider mass failures during CR query#3285

Open
xueqzhan wants to merge 2 commits into openshift:main from xueqzhan:mass-failure

Conversation

@xueqzhan
Contributor

@xueqzhan xueqzhan commented Feb 19, 2026

We use a list of key tests to detect mass-failure situations. When one of those key tests appears in a job's list of failed tests, we don't consider the job's other tests for the purpose of regression calculation. This clears the board so that only the regression in the key test's component is shown.

Current list of key tests:

	"install should succeed: overall",
	"[sig-cluster-lifecycle] Cluster completes upgrade",
	"[Jira:\"Test Framework\"] there should not be mass test failures",

The list is ordered. If multiple key tests appear among a job's failed tests, only the one with the lowest index is counted.
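The selection rule above can be sketched in plain Go. This is only an illustration of the described semantics, not the PR's implementation (the walkthrough below indicates the real filtering happens inside the BigQuery query); `countedFailures` is a hypothetical helper, and it looks only at a job's failed tests.

```go
package main

import "fmt"

// keyTests mirrors the ordered list from the PR description.
// A lower index means a higher priority.
var keyTests = []string{
	"install should succeed: overall",
	"[sig-cluster-lifecycle] Cluster completes upgrade",
	"[Jira:\"Test Framework\"] there should not be mass test failures",
}

// countedFailures is a hypothetical helper (not part of the PR). Given one
// job's failed tests, it returns the failures that would count toward
// regression calculation: only the highest-priority key test if any key test
// failed, otherwise the original list unchanged.
func countedFailures(failed []string, keys []string) []string {
	failedSet := make(map[string]bool, len(failed))
	for _, name := range failed {
		failedSet[name] = true
	}
	// Walk the keys in priority order so the lowest index wins.
	for _, key := range keys {
		if failedSet[key] {
			return []string{key}
		}
	}
	return failed
}

func main() {
	job := []string{
		"[sig-network] an unrelated networking test", // made-up test name
		"[Jira:\"Test Framework\"] there should not be mass test failures",
		"[sig-cluster-lifecycle] Cluster completes upgrade",
	}
	fmt.Println(countedFailures(job, keyTests))
}
```

In this example two key tests failed in the same job, so only the upgrade test (index 1) is kept and the unrelated failure is dropped.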

Summary by CodeRabbit

Release Notes

  • New Features
    • Added --exclude-mass-failures command-line flag to filter out tests that commonly fail across multiple components, improving accuracy of component readiness reports
    • Enhanced component readiness query engine to support exclusive test filtering for more granular control over reliability metrics

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 19, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 19, 2026

@xueqzhan: This pull request references TRT-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from dgoodwin and stbenjam February 19, 2026 18:27
@coderabbitai
Contributor

coderabbitai bot commented Feb 19, 2026

Walkthrough

The PR implements exclusive test name filtering for component readiness reports. It introduces a new --exclude-mass-failures CLI flag, threads exclusiveTestNames through server initialization and query parameter parsing, and extends BigQuery query generation with priority-based filtering logic for exclusive tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI and Server Initialization: cmd/sippy/component_readiness.go, cmd/sippy/serve.go, pkg/sippyserver/server.go | Added exclusiveTestNames parameter to NewServer and propagated it from GetMassFailureTestNames() into server construction and component report parsing logic. |
| Command-Line Flags: pkg/flags/component_readiness.go | Added ExcludeMassFailures boolean field and GetMassFailureTestNames() method that returns hard-coded test names when the flag is enabled. |
| Query Parameter Handling: pkg/api/componentreadiness/utils/queryparamparser.go, pkg/apis/api/componentreport/reqopts/types.go | Added ExclusiveTestNames field to Advanced struct and updated ParseComponentReportRequest to accept and assign exclusive test names to request options. |
| Query Generation Logic: pkg/api/componentreadiness/query/querygenerators.go, pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go | Extended query generation to accept exclusiveTestNames parameter, introduced priority-based CTEs (exclusive_test_priorities, jobs_with_highest_priority_test), updated BuildComponentReportQuery and FetchTestStatusResults signatures, and threaded exclusive test names through fallback release flow. |
| Tests: pkg/api/componentreadiness/query/querygenerators_test.go, pkg/api/componentreadiness/queryparamparser_test.go | Added comprehensive test coverage for exclusive test filtering behavior, CTE presence, query parameters, and comparative tests between queries with and without exclusive tests; updated existing test calls. |
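As a rough illustration of the flag handling summarized above, here is a minimal sketch. The field name `ExcludeMassFailures`, the method name `GetMassFailureTestNames`, and the flag name `--exclude-mass-failures` come from this PR; the struct name, the `BindFlags` method, the help text, and the use of the standard-library `flag` package are assumptions made to keep the example self-contained, and the real code may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// ComponentReadinessFlags is a simplified stand-in for the flags type in
// pkg/flags/component_readiness.go. Only the field named in the summary is
// shown; the struct name itself is an assumption.
type ComponentReadinessFlags struct {
	ExcludeMassFailures bool
}

// BindFlags registers the flag on a standard-library FlagSet. This method and
// its help text are hypothetical; the real code may use another flag library.
func (f *ComponentReadinessFlags) BindFlags(fs *flag.FlagSet) {
	fs.BoolVar(&f.ExcludeMassFailures, "exclude-mass-failures", false,
		"ignore other tests in jobs where a key mass-failure test failed")
}

// GetMassFailureTestNames returns the ordered key tests when the flag is
// enabled and nil otherwise, matching the behavior described in the summary.
func (f *ComponentReadinessFlags) GetMassFailureTestNames() []string {
	if !f.ExcludeMassFailures {
		return nil
	}
	return []string{
		"install should succeed: overall",
		"[sig-cluster-lifecycle] Cluster completes upgrade",
		"[Jira:\"Test Framework\"] there should not be mass test failures",
	}
}

func main() {
	f := &ComponentReadinessFlags{}
	fs := flag.NewFlagSet("sippy", flag.ContinueOnError)
	f.BindFlags(fs)
	if err := fs.Parse([]string{"--exclude-mass-failures"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The returned slice is what would be passed on as exclusiveTestNames.
	for i, name := range f.GetMassFailureTestNames() {
		fmt.Println(i, name)
	}
}
```

Returning nil when the flag is off means downstream code can treat an empty list as "no exclusive filtering", which fits the nil value the existing tests pass for backward compatibility.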

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Single Responsibility And Clear Naming | ⚠️ Warning | PR adds fields to already oversized structs (Server with 24 fields, Advanced with 10 fields) instead of refactoring into focused sub-types, violating single responsibility principles. | Refactor Advanced struct into sub-structs (AnalysisThresholds, FilterOptions, QueryOptions) and extract Server fields into focused component structs to reduce field counts below recommended ~7-field guideline. |
| Sql Injection Prevention | ❓ Inconclusive | Unable to review the specified file as direct access to code repositories is not available. Web search cannot retrieve private or internal code files. | Access the file directly in your code repository or IDE to inspect the SQL query builders and verify that parameterized queries are used instead of string concatenation for user-provided values. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: introducing consideration of mass failures during component readiness queries. |
| Go Error Handling | ✅ Passed | Go error handling patterns properly implemented with errors checked and wrapped with context. No errors ignored. Pre-existing panic calls not part of new changes. Parameters handled safely without nil pointer dereferences. |
| Excessive Css In React Should Use Styles | ✅ Passed | This custom check is not applicable to the provided pull request. The check validates CSS styling practices in React components using the useStyles pattern, but this PR contains only Go backend code with no React components, JSX, inline CSS styling, or related frontend patterns. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


@openshift-ci
Contributor

openshift-ci bot commented Feb 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xueqzhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go (1)

541-567: ⚠️ Potential issue | 🟡 Minor

Remove unused exclusiveTestNames parameter from FetchTestStatusResults.

The exclusiveTestNames parameter is declared in the function signature but never used in the function body. Query-level filtering for exclusive test names is already handled by BuildComponentReportQuery (which incorporates the filtering into the SQL query itself), making this parameter redundant. Either remove the parameter or document why it's retained.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/middleware/releasefallback/releasefallback.go`
around lines 541 - 567, The FetchTestStatusResults function's signature includes
an unused exclusiveTestNames parameter; remove that parameter from the function
declaration (query.FetchTestStatusResults) and then update all call sites (e.g.,
this file's fallbackTestQueryGenerator.getTestFallbackRelease where it's called
as query.FetchTestStatusResults(ctx, baseQuery,
f.ReqOptions.AdvancedOption.ExclusiveTestNames)) to call the new two-argument
form query.FetchTestStatusResults(ctx, baseQuery). Ensure you update any other
references, associated unit tests, and imports accordingly so compilation
succeeds.
🧹 Nitpick comments (2)
pkg/api/componentreadiness/queryparamparser_test.go (1)

405-405: Consider adding test coverage for non-nil exclusiveTestNames.

The test correctly passes nil for backward compatibility, but there are no test cases verifying behavior when exclusiveTestNames is provided. Consider adding a test case to verify opts.AdvancedOption.ExclusiveTestNames is correctly populated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/queryparamparser_test.go` at line 405, Add a unit
test in queryparamparser_test.go that calls utils.ParseComponentReportRequest
with a non-nil exclusiveTestNames slice (instead of nil) and asserts that the
returned options.AdvancedOption.ExclusiveTestNames is populated with the same
values; specifically exercise ParseComponentReportRequest (the call in the
existing test that currently passes nil) and add assertions on the returned
options.AdvancedOption.ExclusiveTestNames to confirm it contains the expected
test names and preserves order/contents.
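
A minimal sketch of the kind of check this nitpick asks for is shown below. The real `utils.ParseComponentReportRequest` takes more arguments than are visible in this conversation, so `parseRequest` is a hypothetical stand-in and `Advanced`/`RequestOptions` are trimmed-down copies of the request option types. Only the field under test is kept, and a real test would live in queryparamparser_test.go and use the `testing` package.

```go
package main

import (
	"fmt"
	"reflect"
)

// Advanced and RequestOptions are trimmed-down stand-ins for the types in
// pkg/apis/api/componentreport/reqopts; only the field under test is kept.
type Advanced struct {
	ExclusiveTestNames []string
}

type RequestOptions struct {
	AdvancedOption Advanced
}

// parseRequest is a hypothetical stand-in for
// utils.ParseComponentReportRequest, reduced to the one argument of interest.
func parseRequest(exclusiveTestNames []string) RequestOptions {
	return RequestOptions{
		AdvancedOption: Advanced{ExclusiveTestNames: exclusiveTestNames},
	}
}

// checkExclusiveTestNames verifies that the parsed options carry the given
// names through unchanged and in the same order.
func checkExclusiveTestNames(want []string) error {
	got := parseRequest(want).AdvancedOption.ExclusiveTestNames
	if !reflect.DeepEqual(got, want) {
		return fmt.Errorf("ExclusiveTestNames = %v, want %v", got, want)
	}
	return nil
}

func main() {
	cases := [][]string{
		nil, // the backward-compatible case the existing test already covers
		{
			"install should succeed: overall",
			"[sig-cluster-lifecycle] Cluster completes upgrade",
		},
	}
	for _, c := range cases {
		if err := checkExclusiveTestNames(c); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok")
	}
}
```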
pkg/api/componentreadiness/query/querygenerators.go (1)

670-707: Remove unused exclusiveTestNames parameter from FetchTestStatusResults function signature.

The exclusiveTestNames parameter is accepted but never referenced in the function body. Exclusive test filtering is handled at query construction time in BuildComponentReportQuery, so this parameter serves no purpose in FetchTestStatusResults. Removing it will simplify the API.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 670 - 707,
The FetchTestStatusResults function accepts an unused exclusiveTestNames
parameter; remove exclusiveTestNames from the function signature of
FetchTestStatusResults and update all call sites to stop passing that argument
(the query already handles exclusivity in BuildComponentReportQuery), keeping
the function body unchanged other than the signature; ensure any imports or
interfaces that referenced the old signature are updated accordingly so
compilation succeeds.
Comment on lines +50 to 51
// partition.
dedupedJunitTable = `

⚠️ Potential issue | 🟡 Minor

Remove duplicate comment line.

Line 50 appears to be a duplicate of the comment ending on line 49 (// partition.). This looks like an editing artifact.

Proposed fix
 	// So, this sorts the data, partitioning by the 3-tuple of file_path/test_name/testsuite -
 	// preferring flakes, then successes, then failures, and we get the first row of each
 	// partition.
-	// partition.
 	dedupedJunitTable = `
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// partition.
dedupedJunitTable = `
// So, this sorts the data, partitioning by the 3-tuple of file_path/test_name/testsuite -
// preferring flakes, then successes, then failures, and we get the first row of each
// partition.
dedupedJunitTable = `
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/api/componentreadiness/query/querygenerators.go` around lines 50 - 51,
Remove the duplicated comment line "// partition." that appears immediately
before the dedupedJunitTable declaration; locate the comment just above the
dedupedJunitTable variable in querygenerators.go and delete the extra duplicate
so only a single "// partition." comment remains.

@openshift-ci-robot

Scheduling required tests:
/test e2e

@openshift-ci
Contributor

openshift-ci bot commented Feb 19, 2026

@xueqzhan: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
