4.22 mc 5s #1469

Draft
jmencak wants to merge 7 commits into openshift:main from jmencak:4.22-mc-5s

@jmencak jmencak commented Feb 9, 2026

No description provided.


openshift-ci bot commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) on Feb 12, 2026
jmencak force-pushed the 4.22-mc-5s branch 8 times, most recently from 2684aff to 0b70a87, on February 18, 2026 15:10
jmencak force-pushed the 4.22-mc-5s branch 2 times, most recently from d1499a6 to e164eb4, on February 19, 2026 19:10
openshift-ci bot added the "do-not-merge/work-in-progress" label (indicates that a PR should not merge because it is a work in progress) on Feb 19, 2026
mrniranjan and others added 7 commits February 20, 2026 09:21
Adds tests related to odd-integer CPU alignment
with full-pcpus-only: false

Assisted-by: Cursor v2.1.39
AI Attribution: AIA PAI Hin v1.0

Signed-off-by: Niranjan M.R <mniranja@redhat.com>
Problem
-------
When a PerformanceProfile is created or updated, two separate MachineConfigs
are written by two independent controllers:

  - 50-nto-*         written by the NTO operator controller
  - 50-performance-* written by the PerformanceProfile controller

MCO batches MachineConfig renders with a 5-second renderDelay window.
If the two writes do not land within that window, MCO renders them
separately, causing two node reboots instead of just one.
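
The effect of the batching window can be illustrated with a toy model. This
is only a sketch under a stated assumption (the first write opens a window of
length renderDelay and every write inside it joins the same render); it is
not MCO's actual code, and `countRenders` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// countRenders is a hypothetical, simplified model of render batching:
// the first write opens a window of length delay, later writes that land
// within that window share the render, and the first write after the
// window opens a new one. Each render is assumed to cost one reboot.
func countRenders(delay time.Duration, writes ...time.Duration) int {
	if len(writes) == 0 {
		return 0
	}
	// Work on a sorted copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), writes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	renders := 1
	windowStart := sorted[0]
	for _, w := range sorted[1:] {
		if w-windowStart > delay {
			renders++
			windowStart = w
		}
	}
	return renders
}

func main() {
	delay := 5 * time.Second
	// 50-nto-* at t=0s and 50-performance-* at t=2s share one render.
	fmt.Println(countRenders(delay, 0, 2*time.Second)) // 1
	// If the second write lands at t=8s, a second render is needed.
	fmt.Println(countRenders(delay, 0, 8*time.Second)) // 2
}
```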

Solution
--------
Replace the racy in-memory bootcmdline cache with a Tuned CR
generation-aware synchronization protocol between the operator controller
and the PerformanceProfile controller.

1. New node annotation `tuned.openshift.io/bootcmdline-deps`
   The operator controller stores a dependency fingerprint in an annotation
   on each Profile CR, in the format:
     <RELEASE_VERSION>,<tuned-name1>:<gen1>,...,<tuned-nameN>:<genN>
   The tuned-daemon controller (pkg/tuned/controller.go) then propagates
   this fingerprint to the Node annotation alongside the existing
   tuned.openshift.io/bootcmdline annotation.

2. `allNodesHaveCurrentBootcmdlineDeps()` replaces the previous unreliable
   in-memory bootcmdline cache by checking that every node in the pool has
   computed its bootcmdline from the same generation of Tuned CRs.

3. New cross-controller synchronization primitive in `pkg/sync/bootcmdline.go`:

   - The PerformanceProfile controller calls
     IsReady(pool, "performance-tuned-cr-name:gen") before writing
     50-performance-*.  If the pool is not ready, the controller returns
     BootcmdlineNotReadyError, which requeues after 30s as a fallback
     mechanism.
   - The operator controller calls SignalReady(pool, bootcmdlineDeps) via
     a deferred call immediately after a successful 50-nto-* MachineConfig
     create/update.  The signal carries the full bootcmdlineDeps string so
     the PerformanceProfile controller can verify it is based on the current
      performance Tuned CR generation.  SignalReady() also sends the pool name
      to a channel to wake up the PerformanceProfile controller
      reconciliation immediately.

4. The PerformanceProfile controller now always creates/updates the
   Tuned CR first and returns early to wait for the bootcmdline-ready
   signal before writing the MachineConfig.

5. GetMutatedTuned now returns (*Tuned, bool, error).  The Tuned object is
   always returned (even when unchanged) so callers can read its
   .Generation field to construct the expected bootcmdline dependency
   string without an extra API call.

The same protocol is applied to the HyperShift handler, using the NodePool
name as the pool key instead of the MCP name.

Unit tests are updated to reflect the new two-phase reconcile sequence
(first reconcile creates/updates Tuned; second reconcile creates
MachineConfig after signalBootcmdlineReady()).
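
The IsReady/SignalReady handshake and the two-phase reconcile described
above can be sketched roughly as below. Only the names IsReady, SignalReady
and BootcmdlineNotReadyError come from the commit message; the
`BootcmdlineSync` type, its `Wake` channel, `reconcileMachineConfig` and all
example values are assumptions for illustration, not the contents of
pkg/sync/bootcmdline.go.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// BootcmdlineNotReadyError tells the caller to requeue (30s in the PR).
type BootcmdlineNotReadyError struct{ Pool string }

func (e BootcmdlineNotReadyError) Error() string {
	return fmt.Sprintf("bootcmdline for pool %q not ready", e.Pool)
}

// BootcmdlineSync is a hypothetical in-process rendezvous keyed by pool name.
type BootcmdlineSync struct {
	mu   sync.Mutex
	deps map[string]string // pool -> last signalled bootcmdlineDeps
	Wake chan string       // pool names, used to trigger a reconcile
}

func NewBootcmdlineSync() *BootcmdlineSync {
	return &BootcmdlineSync{deps: map[string]string{}, Wake: make(chan string, 16)}
}

// SignalReady records the deps string and tries to wake the other controller.
func (s *BootcmdlineSync) SignalReady(pool, bootcmdlineDeps string) {
	s.mu.Lock()
	s.deps[pool] = bootcmdlineDeps
	s.mu.Unlock()
	select {
	case s.Wake <- pool:
	default: // channel full: the periodic requeue remains as the fallback
	}
}

// IsReady reports whether the signalled deps contain the "<name>:<gen>" entry.
func (s *BootcmdlineSync) IsReady(pool, tunedNameGen string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	deps, ok := s.deps[pool]
	if !ok {
		return false
	}
	// The first comma-separated field is the release version; skip it.
	for _, entry := range strings.Split(deps, ",")[1:] {
		if entry == tunedNameGen {
			return true
		}
	}
	return false
}

// reconcileMachineConfig stands in for the second reconcile phase.
func reconcileMachineConfig(s *BootcmdlineSync, pool, tunedNameGen string) error {
	if !s.IsReady(pool, tunedNameGen) {
		return BootcmdlineNotReadyError{Pool: pool}
	}
	// A real controller would create/update 50-performance-* here.
	return nil
}

func main() {
	s := NewBootcmdlineSync()
	pool, want := "worker-cnf", "performance-example:2"
	fmt.Println(reconcileMachineConfig(s, pool, want)) // phase 1: not ready yet
	s.SignalReady(pool, "4.22.0,default:1,performance-example:2")
	fmt.Println(<-s.Wake)                              // wake-up for the pool
	fmt.Println(reconcileMachineConfig(s, pool, want)) // phase 2: proceeds
}
```

Matching on the exact "<name>:<gen>" entry means a signal based on an older
Tuned generation does not unblock the MachineConfig write.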

jmencak commented Feb 20, 2026

/test all


openshift-ci bot commented Feb 20, 2026

@jmencak: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-pao-workloadhints | 7bd2b4c | link | true | /test e2e-gcp-pao-workloadhints |

