Skip to content

manager/state/raft/storage: TestReadRepairWAL: fix for etcd v3.5.5#3095

Merged
thaJeztah merged 1 commit intomoby:masterfrom
thaJeztah:fix_TestReadRepairWAL
Nov 23, 2022
Merged

manager/state/raft/storage: TestReadRepairWAL: fix for etcd v3.5.5#3095
thaJeztah merged 1 commit intomoby:masterfrom
thaJeztah:fix_TestReadRepairWAL

Conversation

@thaJeztah
Copy link
Member

etcd server v3.5.5 includes a fix to prevent the reader from trying to read more data than available in the file, introduced in etcd-io/etcd@621cd7b

restrict the max size of each WAL entry to the remaining size of the file

Currently the max size of each WAL entry is hard coded as 10MB. If users
set a value > 10MB for the flag --max-request-bytes, then etcd may run
into a situation that it successfully processes a big request, but fails
to decode it when replaying the WAL file on startup.

On the other hand, we can't just remove the limitation, because if a
WAL entry is somehow corrupted, and its recByte is a huge value, then
etcd may run out of memory. So the solution is to restrict the max size
of each WAL entry as a dynamic value, which is the remaining size of
the WAL file.

This patch updates the test to have sufficient data available in the corrupted file to perform recovery.

etcd server v3.5.5 includes a fix to prevent the reader from trying to read
more data than available in the file, introduced in
etcd-io/etcd@621cd7b

> restrict the max size of each WAL entry to the remaining size of the file
>
> Currently the max size of each WAL entry is hard coded as 10MB. If users
> set a value > 10MB for the flag --max-request-bytes, then etcd may run
> into a situation that it successfully processes a big request, but fails
> to decode it when replaying the WAL file on startup.
>
> On the other hand, we can't just remove the limitation, because if a
> WAL entry is somehow corrupted, and its recByte is a huge value, then
> etcd may run out of memory. So the solution is to restrict the max size
> of each WAL entry as a dynamic value, which is the remaining size of
> the WAL file.

This patch updates the test to have sufficient data available in the corrupted
file to perform recovery.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Comment on lines -234 to -239
fName := filepath.Join(tempdir, files[0].Name())
fileContents, err := os.ReadFile(fName)
require.NoError(t, err)
info, err := files[0].Info()
require.NoError(t, err)
require.NoError(t, os.WriteFile(fName, fileContents[:200], info.Mode()))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced this with os.Truncate() below, which was shorter, and more descriptive what we're doing 😅

//
// - https://github.com/etcd-io/etcd/commit/621cd7b9e5aa2ccf634b555e4ebe0037b8975066 / https://github.com/etcd-io/etcd/pull/14127 (backport of https://github.com/etcd-io/etcd/pull/14122)
// - https://github.com/etcd-io/etcd/issues/14114
require.NoError(t, os.Truncate(filepath.Join(tempdir, files[0].Name()), 300))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Effectively the fix is s/200/300/

@thaJeztah
Copy link
Member Author

Flaky test? (passing locally)

--- FAIL: TestSuccessfulRootRotation (72.33s)
    integration_test.go:698: 
        	Error Trace:	integration_test.go:698
        	Error:      	Received unexpected error:
        	            	remove node: rpc error: code = DeadlineExceeded desc = context deadline exceeded
        	Test:       	TestSuccessfulRootRotation

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@d05b6da). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master    #3095   +/-   ##
=========================================
  Coverage          ?   62.10%           
=========================================
  Files             ?      153           
  Lines             ?    24177           
  Branches          ?        0           
=========================================
  Hits              ?    15014           
  Misses            ?     7613           
  Partials          ?     1550           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@thaJeztah
Copy link
Member Author

@rumpl @corhere @vvoland PTAL

@thaJeztah
Copy link
Member Author

Bringing this one in as well, but happy to do follow-ups if there's issues with the change in the test, or if we need to improve this functionality (/cc @dperny)

@thaJeztah thaJeztah merged commit c137385 into moby:master Nov 23, 2022
@thaJeztah thaJeztah deleted the fix_TestReadRepairWAL branch November 23, 2022 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants