Add test sharding, proactive clean, and retry logic for self-hosted CI#1171
sbryngelson wants to merge 2 commits into MFlowCode:master
Conversation
- Shard Frontier GPU tests into 2 parts for faster parallel execution
- Add proactive `./mfc.sh clean` in Phoenix test scripts to prevent cross-compiler contamination from stale build artifacts
- Add `--requeue` to Phoenix SLURM jobs for preemption recovery
- Add lint-gate job that must pass before self-hosted tests run
- Add retry logic for GitHub runner tests (retry <=5 failures)
- Add Frontier AMD test support with dedicated submit/test scripts
- Restructure self-hosted matrix with explicit cluster names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI is reviewing your PR.
📝 Walkthrough

Adds test sharding and retry orchestration to CI, updates SLURM job directives (accounts, time, partition, QOS, requeue), introduces shard propagation through submission/test scripts, adds build cleanup, and extends the test CLI with a shard option and shard-aware filtering and failure reporting.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as "GitHub Actions\n(Workflow)"
    participant Runner as "Actions Runner\n(job matrix)"
    participant Submit as "submit.sh\n(cluster-specific)"
    participant SLURM as "SLURM (sbatch)"
    participant Node as "Compute Node\n(mfc.sh)"
    participant Test as "mfc test\n(toolchain test.py)"
    participant GHArtifacts as "GH workspace\n(tests/failed_uuids.txt)"
    GH->>Runner: start job (includes shard)
    Runner->>Submit: run submit.sh (pass shard)
    Submit->>SLURM: sbatch (includes SBATCH directives, --requeue where set)
    SLURM->>Node: allocate node & run job
    Node->>Test: invoke mfc.sh / mfc test --shard (if provided)
    Test->>GHArtifacts: write tests/failed_uuids.txt if failures
    GH->>GHArtifacts: check tests/failed_uuids.txt
    alt failures ≤ threshold
        GH->>Runner: rerun only failed UUIDs (retry flow)
    else too many failures
        GH->>GH: mark job failed
    end
```
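The retry gate at the bottom of the diagram can be sketched as a small decision function. This is a hypothetical illustration, not the workflow's actual implementation; the file name and the 5-failure threshold come from the PR description, while `plan_retry` and its return convention are invented here.

```python
import os

MAX_RETRY_FAILURES = 5  # per the PR: retry only when <= 5 tests failed


def plan_retry(failed_uuids_file="tests/failed_uuids.txt"):
    """Return the UUIDs to rerun, [] if nothing failed, or None to fail the job."""
    if not os.path.exists(failed_uuids_file):
        return []  # no failure file was written: nothing to retry
    with open(failed_uuids_file) as f:
        uuids = [line.strip() for line in f if line.strip()]
    if len(uuids) > MAX_RETRY_FAILURES:
        return None  # too many failures: mark the job failed instead of retrying
    return uuids
```

A workflow step would then rerun only the returned UUIDs, or fail outright when `None` comes back.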
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Nitpicks 🔍
CodeAnt AI finished reviewing your PR.
Pull request overview
This PR enhances the self-hosted CI infrastructure with test sharding, proactive cleanup, and retry mechanisms to improve reliability and reduce execution time. It addresses cross-compiler contamination issues on persistent runners and enables faster parallel test execution on batch partition systems.
Changes:
- Add retry logic for GitHub runner tests (≤5 failures trigger automatic retest)
- Shard Frontier GPU tests into 2 parallel jobs for faster execution
- Add proactive `./mfc.sh clean` to Phoenix test scripts
- Add `--requeue` flag to Phoenix SLURM jobs for preemption recovery
- Wrap Frontier build steps in retry action with automatic cleanup
- Update Frontier SLURM configuration (account, partition, timeout, QOS)
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/test.yml | Add retry logic for ≤5 test failures, add shard parameter to matrix, wrap builds in retry action, remove deprecated environment variables |
| .github/workflows/phoenix/test.sh | Add proactive ./mfc.sh clean to prevent cross-compiler contamination |
| .github/workflows/phoenix/submit.sh | Add --requeue flag for automatic preemption recovery |
| .github/workflows/frontier/test.sh | Add shard parameter handling for test splitting |
| .github/workflows/frontier/submit.sh | Update SLURM config (account, partition, timeout, QOS) and add shard parameter |
| .github/workflows/frontier_amd/test.sh | Add shard parameter handling for test splitting |
| .github/workflows/frontier_amd/submit.sh | Update SLURM config (account, partition, timeout, QOS) and add shard parameter |
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/test.yml (1)
265-274: ⚠️ Potential issue | 🟡 Minor: Log file references don't account for shard and will break if `job_slug` is fixed.

`test-${{ matrix.device }}-${{ matrix.interface }}.out` on line 267 assumes the output filename doesn't include a shard suffix. This is currently consistent with the submit scripts, but if the `job_slug` collision (flagged on `frontier_amd/submit.sh`) is fixed by incorporating the shard, these references must be updated in tandem.

Also, the artifact `name` on line 273 doesn't include the shard, which could cause upload conflicts for sharded matrix entries with the same device/interface (e.g., two `gpu-acc` frontier shards). `strategy.job-index` makes it unique, but adding the shard would improve clarity.

Proposed fix (apply after fixing job_slug in submit scripts)
```diff
   - name: Print Logs
     if: always()
-    run: cat test-${{ matrix.device }}-${{ matrix.interface }}.out
+    run: cat test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}.out

   - name: Archive Logs
     uses: actions/upload-artifact@v4
     if: matrix.cluster != 'phoenix'
     with:
-      name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}
+      name: logs-${{ strategy.job-index }}-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}
-      path: test-${{ matrix.device }}-${{ matrix.interface }}.out
+      path: test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' }}.out
```

Note: the shard value contains `/` (e.g., `1/2`), which is invalid in filenames. The submit script slug sanitization would need to handle this (e.g., replace `/` with `-of-`), and the workflow expressions here would need to match.
/(e.g.,1/2) which is invalid in filenames. The submit script slug sanitization would need to handle this (e.g., replace/with-of-), and the workflow expressions here would need to match.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test.yml around lines 265-274: Update the Print Logs and Archive Logs steps so the logfile and artifact name include the shard-aware slug used by the submit scripts (instead of assuming `test-${{ matrix.device }}-${{ matrix.interface }}.out`). Locate the "Print Logs" and "Archive Logs" steps and change the referenced filename and artifact name to incorporate the sanitized job slug/shard token produced by the submit scripts (the same slug that replaces "/" with a safe separator such as "-of-"); ensure the workflow expressions that build the filename and the artifact "name" use that sanitized slug so filenames and artifact names remain unique and valid across sharded jobs.

.github/workflows/frontier_amd/submit.sh (1)
31-32: ⚠️ Potential issue | 🔴 Critical: Job slug does not include shard, so SLURM output files collide when sharded tests run concurrently.

When multiple shards for the same `device`/`interface` pair run on the same HPC cluster, they produce identical `job_slug` values (e.g., `test-gpu-acc` for both shard `1/2` and `2/2`), resulting in identical `output_file` names. Since both SLURM jobs execute from the same `SLURM_SUBMIT_DIR`, one job's output will silently overwrite the other's. This affects both `.github/workflows/frontier/submit.sh` and `.github/workflows/frontier_amd/submit.sh` at line 31.

Incorporate the shard into the slug:
Proposed fix
```diff
-job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3"
+shard_suffix=""
+if [ -n "$4" ]; then
+    shard_suffix="-$(echo "$4" | sed 's|/|-of-|')"
+fi
+job_slug="`basename "$1" | sed 's/\.sh$//' | sed 's/[^a-zA-Z0-9]/-/g'`-$2-$3${shard_suffix}"
```

Additionally, update `.github/workflows/test.yml` lines 267 and 273 to account for the shard suffix:

- Line 267: `cat test-${{ matrix.device }}-${{ matrix.interface }}.out` → `cat test-${{ matrix.device }}-${{ matrix.interface }}${{ matrix.shard != '' && format('-{0}', matrix.shard) || '' | replace('/', '-of-') }}.out`
- Line 273: include the shard suffix in the artifact name to match

The usage messages in both scripts (line 9) should also be updated to document the `interface` and `shard` parameters.
Verify each finding against the current code and only fix it if needed. In @.github/workflows/frontier_amd/submit.sh around lines 31 - 32, The job_slug currently built by job_slug and used for output_file omits the shard, causing name collisions; update the job_slug generation (the job_slug variable and any references to output_file) to append the shard identifier (formatting the shard like "-{shard}" and replacing "/" with "-of-" for values like "1/2") so each shard produces a unique slug; also update the script usage message (the usage text near the top that lists parameters) to document the interface and shard parameters, and update the workflow steps that read and upload artifacts (the cat command that reads test-${matrix.device}-${matrix.interface}.out and the artifact name) to include the same shard suffix formatting so artifact names and printed output match the new job_slug convention.
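For concreteness, the sanitization the review asks for can be sketched in Python even though the submit scripts are shell. This is a hypothetical illustration: `shard_suffix` and `job_slug` are invented names that mirror the shell slug logic quoted above, not functions in the repository.

```python
import os
import re


def shard_suffix(shard: str) -> str:
    """Turn a shard spec like '1/2' into a filename-safe suffix like '-1-of-2'.

    An empty shard yields an empty suffix so unsharded jobs keep their old names.
    """
    if not shard:
        return ""
    return "-" + shard.replace("/", "-of-")


def job_slug(script: str, device: str, interface: str, shard: str = "") -> str:
    # Mirrors the shell slug: basename without '.sh', non-alphanumerics -> '-'
    base = re.sub(r"[^a-zA-Z0-9]", "-", os.path.basename(script).removesuffix(".sh"))
    return f"{base}-{device}-{interface}{shard_suffix(shard)}"
```

With this, shards `1/2` and `2/2` of the same device/interface produce distinct slugs, so their SLURM output files no longer collide.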
🧹 Nitpick comments (1)
.github/workflows/frontier_amd/submit.sh (1)
8-9: Usage message is outdated: it does not document the interface or shard arguments.

The script accepts up to 4 positional arguments (`$1`=script, `$2`=device, `$3`=interface, `$4`=shard), but the usage string only mentions the first two.

Proposed fix
```diff
 usage() {
-    echo "Usage: $0 [script.sh] [cpu|gpu]"
+    echo "Usage: $0 [script.sh] [cpu|gpu] [none|acc|omp] [shard]"
 }
```
Verify each finding against the current code and only fix it if needed. In @.github/workflows/frontier_amd/submit.sh around lines 8 - 9, The usage() function's message is outdated and only mentions two arguments; update the echo in usage() to document all supported positional params ($1 script, $2 device (cpu|gpu), $3 interface, $4 shard) and any defaults or optional markers (e.g., "[interface]" "[shard]") so callers see the full signature; edit the echo inside usage() to a single clear line listing script, device, interface, and shard and optional/default semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/frontier/submit.sh:
- Around line 31-32: job_slug and output_file are colliding for parallel shards because they only use basename("$1") with $2 and $3; update the job_slug generation (and derived output_file) to include an additional unique shard identifier (for example a shard index/ID passed as another script argument or a runtime value like the process/array task id) so each shard produces a distinct job_slug and output_file; change the construction that sets job_slug and the assignment of output_file to append that unique identifier.
1 issue found across 7 files
Confidence score: 4/5
- Moderate risk only: the cleanup step in `.github/workflows/phoenix/test.sh` doesn't check the `./mfc.sh clean` exit status, so failures could allow stale artifacts to affect builds/tests.
- This is a CI reliability concern rather than a direct product bug, so it's likely safe to merge with minimal risk.
- Pay close attention to `.github/workflows/phoenix/test.sh`: ensure cleanup failures don't silently proceed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".github/workflows/phoenix/test.sh">
<violation number="1" location=".github/workflows/phoenix/test.sh:5">
P2: The `./mfc.sh clean` exit status is not checked. If the clean fails, the script continues and may build/test against stale or corrupted artifacts, defeating the purpose of this proactive cleanup and causing hard-to-diagnose failures.</violation>
</file>
The CI test scripts use `--shard` for splitting Frontier GPU tests across multiple jobs, and `failed_uuids.txt` for retry logic. These toolchain changes were missing from the cherry-pick. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
toolchain/mfc/test/test.py (1)
102-108: Shard filtering is correct; minor readability nit on line 104.

The validation logic handles all edge cases correctly (the short-circuit `or` ensures `int()` is never called on non-digit strings), and `i % shard_count == shard_idx - 1` correctly partitions cases without overlap. The placement after all other filters and before `--percent` is the right ordering.

Optional: the compound condition on line 104 can be split into guard clauses to improve readability:
♻️ Optional readability refactor
```diff
-        if len(parts) != 2 or not all(p.isdigit() for p in parts) or int(parts[1]) < 1 or not 1 <= int(parts[0]) <= int(parts[1]):
-            raise MFCException(f"Invalid --shard '{ARG('shard')}': expected 'i/n' with 1 <= i <= n (e.g., '1/2').")
+        def _bad_shard():
+            if len(parts) != 2 or not all(p.isdigit() for p in parts):
+                return True
+            n, i = int(parts[1]), int(parts[0])
+            return n < 1 or not (1 <= i <= n)
+        if _bad_shard():
+            raise MFCException(f"Invalid --shard '{ARG('shard')}': expected 'i/n' with 1 <= i <= n (e.g., '1/2').")
```
Verify each finding against the current code and only fix it if needed. In `@toolchain/mfc/test/test.py` around lines 102 - 108, The compound validation in the ARG("shard") block is correct but hard to read; refactor the conditional inside the if ARG("shard") is not None: block by splitting the long compound condition into explicit guard checks: first split = ARG("shard").split("/") and verify length == 2, then check that both parts are digits (using parts[0].isdigit() and parts[1].isdigit()), then parse shard_idx = int(parts[0]) and shard_count = int(parts[1]) and validate shard_count >= 1 and 1 <= shard_idx <= shard_count; on any failure raise MFCException with the same message, then compute skipped_cases and cases using shard_idx and shard_count as before.
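A standalone sketch of the `i/n` parse-and-partition behavior described above may help. This is an illustration under stated assumptions: `filter_shard` is an invented helper, `ValueError` stands in for the toolchain's `MFCException`, and the round-robin partition matches the `i % shard_count == shard_idx - 1` rule quoted from the review.

```python
def filter_shard(cases: list, shard: str) -> list:
    """Keep only the cases belonging to shard 'i/n' (1-based), round-robin."""
    parts = shard.split("/")
    # Guard clauses instead of one compound condition, per the review's nit
    if len(parts) != 2 or not all(p.isdigit() for p in parts) \
            or int(parts[1]) < 1 or not 1 <= int(parts[0]) <= int(parts[1]):
        raise ValueError(f"Invalid --shard '{shard}': expected 'i/n' with 1 <= i <= n.")
    shard_idx, shard_count = int(parts[0]), int(parts[1])
    # Round-robin assignment: case i belongs to shard (i % n) + 1
    return [c for i, c in enumerate(cases) if i % shard_count == shard_idx - 1]
```

Every case lands in exactly one shard, so running all `n` shards covers the full suite with no overlap.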
```python
# Write failed UUIDs to file for CI retry logic
failed_uuids_path = os.path.join(common.MFC_TEST_DIR, "failed_uuids.txt")
if failed_tests:
    with open(failed_uuids_path, "w") as f:
        for test_info in failed_tests:
            f.write(test_info['uuid'] + "\n")
elif os.path.exists(failed_uuids_path):
    os.remove(failed_uuids_path)
```
Stale `failed_uuids.txt` when the early-abort path fires.

When `abort_tests.is_set()` causes `test()` to raise `MFCException` at lines 192-203, execution never reaches lines 217-224. A `failed_uuids.txt` left by a previous run is not cleaned up. Depending on how the CI workflow gates the retry step, it could retry stale UUIDs from the prior run rather than (or in addition to) the current failures.

Additionally, unhandled I/O errors (permissions, disk full) in `open()`/`os.remove()` would propagate past `exit(nFAIL)`, masking the real failure count in the process exit code.
🛡️ Suggested fix: clean stale file on abort + guard I/O

```diff
+    # Clean up any stale file from a previous run when aborting early
+    if abort_tests.is_set():
+        ...  # (existing abort exception block)
+        try:
+            if os.path.exists(failed_uuids_path := os.path.join(common.MFC_TEST_DIR, "failed_uuids.txt")):
+                os.remove(failed_uuids_path)
+        except OSError:
+            pass
+        raise MFCException(...)

     # Write failed UUIDs to file for CI retry logic
     failed_uuids_path = os.path.join(common.MFC_TEST_DIR, "failed_uuids.txt")
-    if failed_tests:
-        with open(failed_uuids_path, "w") as f:
-            for test_info in failed_tests:
-                f.write(test_info['uuid'] + "\n")
-    elif os.path.exists(failed_uuids_path):
-        os.remove(failed_uuids_path)
+    try:
+        if failed_tests:
+            with open(failed_uuids_path, "w") as f:
+                for test_info in failed_tests:
+                    f.write(test_info['uuid'] + "\n")
+        elif os.path.exists(failed_uuids_path):
+            os.remove(failed_uuids_path)
+    except OSError:
+        pass  # Non-fatal; CI retry logic may not fire but test results are unaffected
```
+ pass # Non-fatal; CI retry logic may not fire but test results are unaffected🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@toolchain/mfc/test/test.py` around lines 217 - 224, When abort_tests.is_set()
causes test() to raise MFCException the existing cleanup that writes/removes
failed_uuids.txt (the failed_uuids_path handling around failed_tests, open(...),
os.remove(...)) is skipped; modify the exception/exit path to always attempt to
remove stale failed_uuids_path and wrap file I/O (open and os.remove) in
try/except catching OSError (or Exception) so I/O errors are logged but do not
replace the real exit code—i.e., in the MFCException handler and/or finally
block ensure you try to delete failed_uuids_path if it exists and handle/log any
OSError from open()/os.remove() instead of letting it propagate.
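The guarded write/remove pattern the fix proposes can be shown self-contained. This is a minimal sketch, not the toolchain's code: `record_failed_uuids` is an invented name, and the warning print stands in for whatever logging the project uses.

```python
import os


def record_failed_uuids(failed_uuids, path):
    """Write failed UUIDs for CI retry, or remove a stale file; never raise."""
    try:
        if failed_uuids:
            with open(path, "w") as f:
                f.writelines(u + "\n" for u in failed_uuids)
        elif os.path.exists(path):
            # No failures this run: remove any stale file so CI doesn't
            # retry UUIDs left over from a previous run.
            os.remove(path)
    except OSError as e:
        # Non-fatal: the retry step may not fire, but the process exit
        # code still reflects the real failure count.
        print(f"warning: could not update {path}: {e}")
```

Because all I/O is inside the `try`, a permissions or disk-full error degrades to a warning instead of replacing the test suite's exit status.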
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
- Coverage   44.07%   44.05%   -0.02%
==========================================
  Files          70       70
  Lines       20431    20498      +67
  Branches     1974     1990      +16
==========================================
+ Hits         9004     9030      +26
- Misses      10291    10329      +38
- Partials     1136     1139       +3
```

☔ View full report in Codecov by Sentry.
User description
Summary
- Add proactive `./mfc.sh clean` in Phoenix test scripts to prevent cross-compiler contamination from stale build artifacts
- Add `--requeue` to Phoenix SLURM jobs for preemption recovery

Depends on: #1170 (for `monitor_slurm_job.sh` and build script changes)

Test plan
🤖 Generated with Claude Code
CodeAnt-AI Description
Speed up and harden CI runs for Frontier/Frontier (AMD) and Phoenix clusters
What Changed
Impact
- ✅ Faster Frontier GPU test completion (parallel shards)
- ✅ Fewer Phoenix build/test failures due to stale artifacts
- ✅ Fewer whole-suite reruns for sporadic test failures (retries only failed tests)
Summary by CodeRabbit
New Features
Improvements