CSI node plugin fix for unstaging volumes #3178
Signed-off-by: Beorn Facchini <beornf@gmail.com>
Similarly, this early return in swarmkit/agent/csi/plugin/plugin.go (lines 366 to 373 in ea1a7ce):
if v, ok := np.volumeMap[req.ID]; ok && v.isPublished {
    return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}
I'm not super-familiar with this code, but should this actually proceed if the volume was not found in np.volumeMap?
i.e., the volume should be present if it was successfully staged. If a failure happened during staging, it wouldn't be added. Looking at NodeStageVolume further up;
swarmkit/agent/csi/plugin/plugin.go
Lines 231 to 239 in e8ecf83
Perhaps this should be something like:
if v, ok := np.volumeMap[req.ID]; !ok || v.isPublished {
    // volume not found or is still published
    return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}
Or, if it would be useful to have distinct errors for each situation:
v, ok := np.volumeMap[req.ID]
if !ok {
    return status.Errorf(codes.FailedPrecondition, "Volume %s not found", req.ID)
}
if v.isPublished {
    return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
}
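The two conditions in this thread differ only in how they treat a volume ID that is missing from np.volumeMap. A minimal sketch of that difference follows, assuming a simplified map entry; `volumeState`, `blocksUnstageLenient`, and `blocksUnstageStrict` are hypothetical names for illustration and are not part of swarmkit.

```go
package main

// volumeState is a hypothetical stand-in for the entries kept in np.volumeMap.
type volumeState struct {
	isPublished bool
}

// blocksUnstageLenient mirrors the check quoted from plugin.go:
// only a volume that is known and still published blocks unstaging.
func blocksUnstageLenient(m map[string]*volumeState, id string) bool {
	v, ok := m[id]
	return ok && v.isPublished
}

// blocksUnstageStrict mirrors the suggested alternative:
// an unknown volume blocks unstaging as well.
func blocksUnstageStrict(m map[string]*volumeState, id string) bool {
	v, ok := m[id]
	return !ok || v.isPublished
}
```

Both predicates agree for volumes that are present in the map. They disagree only for an unknown ID, where the lenient form lets the unstage proceed and the strict form rejects it.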
Indeed, the assumptions around state for unstaging would always hold if np.volumeMap were made persistent:
swarmkit/agent/csi/plugin/plugin.go
Lines 60 to 63 in ea1a7ce
I recall from testing that a volume could have been staged before the node daemon restarted, leaving np.volumeMap empty. NodeUnpublishVolume always unpublishes the volume, irrespective of np.volumeMap.
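The restart scenario can be sketched as follows. This is an illustrative model under stated assumptions, not the swarmkit implementation: `fakeDriver`, `publishStatus`, `sketchNode`, and their methods are hypothetical, the in-memory map stands in for np.volumeMap, and plain Go errors replace status.Errorf to keep the example free of external dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// errStillPublished is a hypothetical sentinel standing in for the
// FailedPrecondition status returned by the real plugin.
var errStillPublished = errors.New("volume is not unpublished")

// fakeDriver is a hypothetical plugin driver that counts unstage calls.
type fakeDriver struct {
	unstageCalls int
}

func (d *fakeDriver) NodeUnstageVolume(id string) error {
	d.unstageCalls++
	return nil
}

// publishStatus is a hypothetical stand-in for an np.volumeMap entry.
type publishStatus struct {
	isPublished bool
}

// sketchNode keeps volume state in memory only, so the state is lost
// whenever a new sketchNode is created (modelling a daemon restart).
type sketchNode struct {
	driver  *fakeDriver
	volumes map[string]*publishStatus
}

func newSketchNode(d *fakeDriver) *sketchNode {
	return &sketchNode{driver: d, volumes: make(map[string]*publishStatus)}
}

// stage records the volume as staged and not yet published.
func (n *sketchNode) stage(id string) {
	n.volumes[id] = &publishStatus{}
}

// unstage refuses only when the volume is known and still published.
// An unknown volume is still forwarded to the driver.
func (n *sketchNode) unstage(id string) error {
	if v, ok := n.volumes[id]; ok && v.isPublished {
		return fmt.Errorf("volume %s: %w", id, errStillPublished)
	}
	if err := n.driver.NodeUnstageVolume(id); err != nil {
		return err
	}
	delete(n.volumes, id)
	return nil
}
```

In this model, staging a volume, creating a fresh `sketchNode` with the same driver, and then calling `unstage` still reaches the driver even though the new map has no entry for the volume. A check that also rejected unknown IDs would leave such a volume staged after a restart.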
- What I did
While running a CSI volume plugin that supports staging, I created a new swarm service that initiated an attempt to unpublish a cluster volume. The node agent calls NodeUnpublishVolume, which returns no errors, then NodeUnstageVolume, which also returns no errors. However, there was no call to the underlying plugin driver for NodeUnstageVolume.
- How I did it
Fixed the unpublish check to only return in the failing condition that the volume was not unpublished.
- How to test it
Create a cluster volume using a CSI volume plugin, then trigger a publish and unpublish by creating a swarm service and restarting the service.
- Description for the changelog