[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jmencak
2684aff to 0b70a87
d1499a6 to e164eb4
Adds tests related to odd integer CPU alignment with full-pcpus-only: false
Assisted-by: Cursor v2.1.39
AI Attribution: AIA PAI Hin v1.0
Signed-off-by: Niranjan M.R <mniranja@redhat.com>
Problem
-------
When a PerformanceProfile is created or updated, two separate MachineConfigs
are written by two independent controllers:
- 50-nto-* written by the NTO operator controller
- 50-performance-* written by the PerformanceProfile controller
MCO batches MachineConfig renders with a 5-second renderDelay window.
If the two writes do not land within that window, MCO renders them
separately, causing two node reboots instead of just one.
Solution
--------
Replace the racy in-memory bootcmdline cache with a Tuned CR
generation-aware synchronization protocol between the operator controller
and the PerformanceProfile controller.
1. New node annotation `tuned.openshift.io/bootcmdline-deps`
The operator controller stores a dependency fingerprint in an annotation
on each Profile CR, in the format:
<RELEASE_VERSION>,<tuned-name1>:<gen1>,...,<tuned-nameN>:<genN>
The tuned-daemon controller (pkg/tuned/controller.go) then propagates
this fingerprint to the Node annotation alongside the existing
tuned.openshift.io/bootcmdline annotation.
2. `allNodesHaveCurrentBootcmdlineDeps()`: replaces the previous unreliable
in-memory bootcmdline cache by checking that every node in the pool has
computed its bootcmdline from the same generation of Tuned CRs.
3. Added cross-controller synchronization primitive `pkg/sync/bootcmdline.go`:
- The PerformanceProfile controller calls
IsReady(pool, "performance-tuned-cr-name:gen") before writing
50-performance-*. If not ready, it returns BootcmdlineNotReadyError,
which requeues after 30s as a fallback mechanism.
- The operator controller calls SignalReady(pool, bootcmdlineDeps) via
a deferred call immediately after a successful 50-nto-* MachineConfig
create/update. The signal carries the full bootcmdlineDeps string so
the PerformanceProfile controller can verify it is based on the current
performance Tuned CR generation. SignalReady() also sends the pool name
to a channel, to immediately wake up PerformanceProfile controller
reconciliation.
4. The PerformanceProfile controller now always creates/updates the
Tuned CR first and returns early to wait for the bootcmdline-ready
signal before writing the MachineConfig.
5. GetMutatedTuned now returns (*Tuned, bool, error). The Tuned object is
always returned (even when unchanged) so callers can read its
.Generation field to construct the expected bootcmdline dependency
string without an extra API call.
The same protocol is applied to the HyperShift handler, using the NodePool
name as the pool key instead of the MCP name.
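The IsReady/SignalReady handshake keyed by pool name could look roughly like the sketch below. This is a simplified, hypothetical stand-in for `pkg/sync/bootcmdline.go`, not its actual contents: the `BootcmdlineSync` type, the `Wake` channel field, its buffer size, and the sample pool and Tuned names are all assumptions. The pool key is a plain string, so it can hold either an MCP name or a NodePool name.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// BootcmdlineSync is a hypothetical cross-controller synchronization primitive.
type BootcmdlineSync struct {
	mu   sync.Mutex
	deps map[string]string // pool key -> last signalled bootcmdlineDeps string
	Wake chan string       // pool names whose reconciliation should run promptly
}

func NewBootcmdlineSync() *BootcmdlineSync {
	return &BootcmdlineSync{deps: map[string]string{}, Wake: make(chan string, 16)}
}

// SignalReady records the deps string for pool and tries to wake the waiting
// controller without blocking the caller.
func (s *BootcmdlineSync) SignalReady(pool, deps string) {
	s.mu.Lock()
	s.deps[pool] = deps
	s.mu.Unlock()

	select {
	case s.Wake <- pool:
	default: // channel full; a periodic requeue would still pick this up
	}
}

// IsReady reports whether the last signal for pool contains the wanted
// "<tuned-name>:<generation>" entry.
func (s *BootcmdlineSync) IsReady(pool, want string) bool {
	s.mu.Lock()
	deps, ok := s.deps[pool]
	s.mu.Unlock()
	if !ok {
		return false
	}
	// Skip the first field (release version) and match entries exactly.
	for _, entry := range strings.Split(deps, ",")[1:] {
		if entry == want {
			return true
		}
	}
	return false
}

func main() {
	s := NewBootcmdlineSync()
	fmt.Println(s.IsReady("worker-cnf", "perf:3")) // nothing signalled yet

	s.SignalReady("worker-cnf", "4.17.0,default:1,perf:3")
	fmt.Println(s.IsReady("worker-cnf", "perf:3"))
	fmt.Println(s.IsReady("worker-cnf", "perf:4")) // newer generation not yet signalled
	fmt.Println(<-s.Wake)
}
```

Matching on the exact "name:gen" entry rather than on mere presence of a signal is what lets the waiting controller reject a signal that was computed from an older Tuned CR generation.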
Unit tests are updated to reflect the new two-phase reconcile sequence
(first reconcile creates/updates Tuned; second reconcile creates
MachineConfig after signalBootcmdlineReady()).
/test all
@jmencak: The following test failed, say
No description provided.