
Conversation

@goabhinav (Contributor)

No description provided.

@goabhinav goabhinav requested a review from a team as a code owner February 10, 2026 09:24
@gemini-code-assist (Contributor)

Summary of Changes

Hello @goabhinav, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new test to validate the resilience of containerized services on network devices. By simulating a control processor failover, the test checks that deployed containers and their associated volumes retain their state and remain functional, improving confidence in the system's container supervisor capabilities.

Highlights

  • New Failover Test: Introduced a new test case (CNTR-3) to verify the persistence of containers and volumes across a control processor switchover (failover) event.
  • Test Procedure Documentation: Added a comprehensive README file outlining the summary, detailed procedure for building the test container, and the steps for the failover test itself.
  • Robust Recovery Verification: The test includes steps to deploy a container and volume, trigger a control processor switchover, and then verify that the container is still running and the volume persists after the device recovers.
Changelog
  • feature/container/failover/README.md
    • Added a new README file detailing the container supervisor failover test (CNTR-3).
    • Included instructions for building the test container and the step-by-step procedure for the failover test.
    • Documented the OpenConfig RPCs covered by the test.
  • feature/container/failover/failover_test.go
    • Added a new Go test file implementing the container supervisor failover test.
    • Implemented functions to deploy containers and volumes, trigger control processor switchovers, and verify post-failover state.
    • Included helper functions for finding standby RPs, verifying container states, checking volume existence, and waiting for DUT reboot.
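
For orientation, the new test file follows the usual featureprofiles layout of setup, trigger, and verification subtests. Below is a minimal skeleton of that shape; it is an illustration of the structure, not the PR's actual code, and the subtest bodies are placeholders.

```go
package failover_test

import (
	"testing"

	"github.com/openconfig/featureprofiles/internal/fptest"
	"github.com/openconfig/ondatra"
)

func TestMain(m *testing.M) {
	fptest.RunTests(m)
}

// TestContainerSupervisorFailover sketches the three phases described above.
func TestContainerSupervisorFailover(t *testing.T) {
	dut := ondatra.DUT(t, "dut")

	t.Run("Setup", func(t *testing.T) {
		// Deploy the image, create the volume, and start the container via gnoi.Containerz.
		_ = dut
	})

	t.Run("Failover", func(t *testing.T) {
		// Find the standby control processor via gNMI and call gnoi.System.SwitchControlProcessor.
	})

	t.Run("VerifyRecovery", func(t *testing.T) {
		// Re-establish clients, then confirm the container is RUNNING and the volume still exists.
	})
}
```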

@OpenConfigBot commented Feb 10, 2026

Pull Request Functional Test Report for #5071 / fa79887

No tests identified for validation.


@gemini-code-assist bot left a comment (Contributor)


Code Review

This pull request introduces a new test for container supervisor failover (CNTR-3). The changes include a README.md file detailing the test procedure and a Go test file failover_test.go that implements the test. The test correctly follows the structure of setting up a container and volume, triggering a control processor switchover, and then verifying that the container and volume persist after recovery.

My review focuses on adherence to the repository's style guide. I've identified one instance where time.Sleep is used redundantly, which is discouraged. I've provided a suggestion to remove it to improve test efficiency. Otherwise, the code is well-written and the test logic is sound.


1. **Setup**: Using `gnoi.Containerz`, deploy a container image, create a volume, and start a container mounting that volume. Verify the container is running and the volume exists.
2. **Trigger Failover**: Identify the standby control processor using gNMI. Trigger a switchover using `gnoi.System.SwitchControlProcessor`.
3. **Verify Recovery**: Wait for the switchover to complete. Verify that the container started in step 1 is in `RUNNING` state and the volume still exists using `gnoi.Containerz`.
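
As an aside on step 2, one way to locate the standby control processor with gNMI is to scan the CONTROLLER_CARD components for one reporting a SECONDARY redundant-role. The sketch below assumes the featureprofiles internal/components helper and the ondatra gNMI bindings; component modelling varies by platform.

```go
import (
	"testing"

	"github.com/openconfig/featureprofiles/internal/components"
	"github.com/openconfig/ondatra"
	"github.com/openconfig/ondatra/gnmi"
	"github.com/openconfig/ondatra/gnmi/oc"
)

// findStandbyRP returns the name of a controller card whose redundant-role is
// SECONDARY. Sketch only: it assumes the DUT models its supervisors as
// CONTROLLER_CARD components and populates the redundant-role leaf.
func findStandbyRP(t *testing.T, dut *ondatra.DUTDevice) string {
	t.Helper()
	cards := components.FindComponentsByType(t, dut, oc.PlatformTypes_OPENCONFIG_HARDWARE_COMPONENT_CONTROLLER_CARD)
	for _, c := range cards {
		role, ok := gnmi.Lookup(t, dut, gnmi.OC().Component(c).RedundantRole().State()).Val()
		if ok && role == oc.Platform_ComponentRedundantRole_SECONDARY {
			return c
		}
	}
	t.Fatalf("no standby control processor found among %v", cards)
	return ""
}
```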
Contributor:

I think we should check that RPCs to the container work as well.
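
For example, if the test container exposes a gRPC endpoint, the post-failover verification could dial it and issue a standard health-check RPC. This is a sketch: the address is hypothetical, and it assumes the container registers the standard gRPC health service, which the PR does not state.

```go
import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

// checkContainerRPC dials a gRPC endpoint exposed by the test container and
// issues a health check. addr is a hypothetical host:port published by the container.
func checkContainerRPC(ctx context.Context, addr string) error {
	conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return fmt.Errorf("dialing container at %s: %w", addr, err)
	}
	defer conn.Close()

	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		return fmt.Errorf("health check RPC to %s failed: %w", addr, err)
	}
	if got := resp.GetStatus(); got != healthpb.HealthCheckResponse_SERVING {
		return fmt.Errorf("container at %s is not serving, status: %v", addr, got)
	}
	return nil
}
```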

@@ -0,0 +1,71 @@
# CNTR-3: Container Supervisor Failover
Contributor:

This is a good start, but we should test several other scenarios, e.g. what happens when the backup is not available while containers are started. Do those containers get started when the backup returns?

Reply:

The question is how we can map this to gNOI calls on the vendor devices: can we manually kill and restart a containerz instance on the individual supervisors?

Additional scenario to consider:
What happens when the primary returns after a failover? Will the original containers still be available? Are modifications to containers/volumes that only reached the backup properly replicated back to the primary?

This could be simulated by calling SwitchControlProcessor twice, although the SwitchControlProcessor implementation may just switch the handling supervisor without actually restarting the primary.
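
For reference, the switchover itself is a single `gnoi.System.SwitchControlProcessor` call toward the standby component; a minimal sketch follows. Whether the path should be the full components/component[name=...] path or just the component name is vendor-dependent, and, as noted above, calling it twice only approximates a fail-back if the implementation actually restarts the former primary.

```go
import (
	"context"

	spb "github.com/openconfig/gnoi/system"
	tpb "github.com/openconfig/gnoi/types"
)

// switchover asks the DUT to fail over to the control processor named standby.
// Sketch only: the path convention below is one common form, not a guarantee.
func switchover(ctx context.Context, sys spb.SystemClient, standby string) (*spb.SwitchControlProcessorResponse, error) {
	req := &spb.SwitchControlProcessorRequest{
		ControlProcessor: &tpb.Path{
			Elem: []*tpb.PathElem{
				{Name: "components"},
				{Name: "component", Key: map[string]string{"name": standby}},
			},
		},
	}
	return sys.SwitchControlProcessor(ctx, req)
}
```

In the test this would be driven through the raw `dut.RawAPIs().GNOI(t).System()` client quoted further down.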


const (
	imageName = "cntrsrv_image"
	tag       = "latest"


`tag` is unused; it can be dropped.

Comment on lines +91 to +93
if err := verifyVolumeExists(ctx, cli, volName); err != nil {
	t.Fatalf("Volume not found after creation: %v", err)
}


I would consider moving this up, right after the volume creation (slightly easier to read).

if vol.Error != nil {
	return fmt.Errorf("error listing volumes: %w", vol.Error)
}
if vol.Name == name {


no potential leading slash here?
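
If the runtime does report names with a leading slash, a small normalization keeps the comparison robust; a sketch:

```go
import "strings"

// sameVolumeName compares a reported volume name against the expected one while
// tolerating a leading "/" that some container runtimes prepend to names.
func sameVolumeName(reported, want string) bool {
	return strings.TrimPrefix(reported, "/") == strings.TrimPrefix(want, "/")
}
```

The loop above would then test `sameVolumeName(vol.Name, name)` instead of direct equality.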


## CNTR-3.1: Container Supervisor Failover

1. **Setup**: Using `gnoi.Containerz`, deploy a container image, create a volume, and start a container mounting that volume. Verify the container is running and the volume exists.


"start a container mounting that volume": is the container in failover_test.go actually mounting the volume?


time.Sleep(switchoverWait)

// Refresh clients after reconnection.
cli = containerztest.Client(t, dut)
Contributor:

Validate that the standby supervisor is not the same one obtained earlier before attempting reconnection; otherwise we could end up reconnecting to the same supervisor here.
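
One way to guard against that is to confirm, once the device is reachable again, that the component that was standby before the switchover now reports PRIMARY before any clients are reused. A sketch, assuming the redundant-role leaf is populated:

```go
import (
	"testing"

	"github.com/openconfig/ondatra"
	"github.com/openconfig/ondatra/gnmi"
	"github.com/openconfig/ondatra/gnmi/oc"
)

// verifyRolesSwapped checks that the component that was standby before the
// switchover now reports PRIMARY, so the test does not silently reconnect to
// the same supervisor.
func verifyRolesSwapped(t *testing.T, dut *ondatra.DUTDevice, formerStandby string) {
	t.Helper()
	role := gnmi.Get(t, dut, gnmi.OC().Component(formerStandby).RedundantRole().State())
	if role != oc.Platform_ComponentRedundantRole_PRIMARY {
		t.Fatalf("switchover did not take effect: %s still reports %v", formerStandby, role)
	}
}
```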

// Raw system client for switchover.
sysClient := dut.RawAPIs().GNOI(t).System()

t.Run("Setup", func(t *testing.T) {
Contributor:

I would probably like to see either a cleanup function at the end that removes everything this test created, or at least a cleanup for the volume in case this test is run more than once.
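
A cleanup hook registered right after the resources are created would cover both the failed-run and repeated-run cases. A sketch of the shape it could take; `stopContainer`, `removeVolume`, and `containerName` are hypothetical stand-ins for whatever removal helpers and instance name the test actually uses:

```go
// Registered inside the Setup subtest, immediately after the volume and
// container are created. stopContainer and removeVolume are hypothetical
// helpers, not part of the PR.
t.Cleanup(func() {
	if err := stopContainer(ctx, cli, containerName); err != nil {
		t.Logf("cleanup: stopping container %q: %v", containerName, err)
	}
	if err := removeVolume(ctx, cli, volName); err != nil {
		t.Logf("cleanup: removing volume %q: %v", volName, err)
	}
})
```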

