CSI node plugin fix for unstaging volumes by beornf · Pull Request #3178 · moby/swarmkit

beornf · 2024-06-12T18:42:56Z

- What I did
While running a CSI volume plugin that supports staging, I created a new swarm service that initiated an attempt to unpublish a cluster volume. The node agent called NodeUnpublishVolume, which returned no error, and then NodeUnstageVolume, which also returned no error. However, the NodeUnstageVolume call was never forwarded to the underlying plugin driver.

- How I did it
Fixed the unpublish check so that it returns early only in the failing case, where the volume has not been unpublished.

- How to test it
Create a cluster volume using a CSI volume plugin, then trigger a publish and unpublish by creating a swarm service and restarting it.

- Description for the changelog

Signed-off-by: Beorn Facchini <beornf@gmail.com>

beornf · 2024-06-12T19:01:34Z

Similarly, this early return in NodeUnpublishVolume skips the log line, which would be helpful for debugging:

swarmkit/agent/csi/plugin/plugin.go

Lines 366 to 373 in ea1a7ce

    if v, ok := np.volumeMap[req.ID]; ok {
    	v.publishedPath = ""
    	v.isPublished = false
    	return nil
    }
    log.G(ctx).Info("volume unpublished")
    return nil
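One way to keep the log line reachable is to update the tracked state without returning from inside the `if`. The following is only a minimal, self-contained sketch of that idea: `markUnpublished` and the `logf` callback are hypothetical names standing in for the tail of NodeUnpublishVolume and for `log.G(ctx).Info`, and the struct is reduced to the two fields shown in the snippet above.

```go
package main

import "fmt"

// volumePublishStatus is a reduced stand-in for the struct in plugin.go,
// keeping only the fields touched by the snippet under discussion.
type volumePublishStatus struct {
	publishedPath string
	isPublished   bool
}

// markUnpublished is a hypothetical helper, not the swarmkit code.
// It clears the publish state if the volume is tracked and then always
// emits the log message, instead of returning early.
func markUnpublished(volumeMap map[string]*volumePublishStatus, id string, logf func(string)) {
	if v, ok := volumeMap[id]; ok {
		v.publishedPath = ""
		v.isPublished = false
	}
	// Reached for tracked and untracked volumes alike.
	logf("volume unpublished")
}

func main() {
	var logs []string
	volumes := map[string]*volumePublishStatus{
		"vol-1": {publishedPath: "/mnt/vol-1", isPublished: true},
	}
	markUnpublished(volumes, "vol-1", func(msg string) { logs = append(logs, msg) })
	fmt.Println(volumes["vol-1"].isPublished, logs)
}
```

Whether the message should also be logged for volumes that are not in the map is a design choice this sketch does not settle.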

thaJeztah

@dperny you're more familiar with this code; PTAL 🤗

thaJeztah · 2024-10-18T10:05:36Z

agent/csi/plugin/plugin.go

+	if v, ok := np.volumeMap[req.ID]; ok && v.isPublished {
+		return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
 	}



I'm not super-familiar with this code, but should this actually proceed if the volume was not found in np.volumeMap?

i.e., the volume should be present if it was successfully staged. If a failure happened during staging, it wouldn't be added. Looking at NodeStageVolume further up;

swarmkit/agent/csi/plugin/plugin.go

Lines 231 to 239 in e8ecf83

    if err != nil {
    	return err
    }

    v := &volumePublishStatus{
    	stagingPath: stagingTarget,
    }

    np.volumeMap[req.ID] = v

Perhaps this should be something like;

    if v, ok := np.volumeMap[req.ID]; !ok || v.isPublished {
    	// volume not found or is still published
    	return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
    }

Or, if it would be useful to have distinct errors for each situation:

    v, ok := np.volumeMap[req.ID]
    if !ok {
    	return status.Errorf(codes.FailedPrecondition, "Volume %s not found", req.ID)
    }
    if v.isPublished {
    	return status.Errorf(codes.FailedPrecondition, "Volume %s is not unpublished", req.ID)
    }

beornf · Oct 18, 2024 (edited)

Indeed, the assumptions around state for unstaging would always hold if np.volumeMap were made persistent:

swarmkit/agent/csi/plugin/plugin.go

Lines 60 to 63 in ea1a7ce

    // volumeMap is the map from volume ID to Volume. Will place a volume once it is staged,
    // remove it from the map for unstage.
    // TODO: Make this map persistent if the swarm node goes down
    volumeMap map[string]*volumePublishStatus

I recall from testing that a volume could have been staged before the node daemon restarted, leaving np.volumeMap empty afterwards. NodeUnpublishVolume, by contrast, always unpublishes the volume irrespective of np.volumeMap.
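To make the restart scenario concrete, here is a small sketch that compares the two conditions discussed in this thread against an empty map, as it would look after a daemon restart. The function names `blocksUnstageLenient` and `blocksUnstageStrict` are hypothetical; the first mirrors the condition in the PR diff (`ok && v.isPublished`) and the second mirrors the reviewer's suggestion (`!ok || v.isPublished`).

```go
package main

import "fmt"

// volumePublishStatus is a reduced stand-in for the struct in plugin.go.
type volumePublishStatus struct {
	stagingPath string
	isPublished bool
}

// blocksUnstageLenient mirrors the condition in the PR diff: only a tracked
// volume that is still marked published blocks unstaging.
func blocksUnstageLenient(volumeMap map[string]*volumePublishStatus, id string) bool {
	v, ok := volumeMap[id]
	return ok && v.isPublished
}

// blocksUnstageStrict mirrors the review suggestion: an untracked volume
// blocks unstaging as well.
func blocksUnstageStrict(volumeMap map[string]*volumePublishStatus, id string) bool {
	v, ok := volumeMap[id]
	return !ok || v.isPublished
}

func main() {
	// Before the restart the volume is staged and tracked in memory.
	beforeRestart := map[string]*volumePublishStatus{
		"vol-1": {stagingPath: "/staging/vol-1"},
	}
	// After the restart the in-memory map starts out empty, even though
	// the volume is still staged on the node.
	afterRestart := map[string]*volumePublishStatus{}

	fmt.Println("before restart:",
		blocksUnstageLenient(beforeRestart, "vol-1"),
		blocksUnstageStrict(beforeRestart, "vol-1"))
	fmt.Println("after restart: ",
		blocksUnstageLenient(afterRestart, "vol-1"),
		blocksUnstageStrict(afterRestart, "vol-1"))
}
```

Under these assumptions the strict check would refuse to unstage a volume that was staged before the restart, while the lenient check lets the call through to the plugin driver. A persistent map, as the TODO suggests, would remove the difference.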

olljanat · 2024-11-23T06:34:36Z

@beornf A new test case for this would be nice, as the CSI logic is still quite new, which is why bugs like this exist. You can find examples in my PRs #3116 and #3123.

CSI node plugin fix for unstaging volumes

13f03ac

Signed-off-by: Beorn Facchini <beornf@gmail.com>

beornf mentioned this pull request Jun 13, 2024

CSI volume bugs in Docker Swarm moby/moby#47974

Open

thaJeztah reviewed Oct 18, 2024

beornf wants to merge 1 commit into moby:master from beornf:node-plugin-unstage-volume
