Handle the sequenced entries with a dynamic size in GCP's assignEntries() #862
Open
roger2hk wants to merge 1 commit into transparency-dev:main from
Conversation
AlCutter reviewed Feb 23, 2026
```go
if err := e.Encode(sequencedEntries); err != nil {
	return fmt.Errorf("failed to serialise batch: %v", err)
// If adding this entry would make the batch too big, we need to flush the original batch.
if len(sequencedEntries) > 0 && currentBatchByteSize+len(sequencedEntry.BundleData) > s.seqTableMaxBatchByteSize {
```
Collaborator
Unlikely (impossible?), but, what would happen if someone managed to get a single 10.1MB entry here...?
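One way to make the behaviour explicit for such an entry would be to reject it up front. This is only a sketch: `maxBatchByteSize` and `checkEntrySize` are invented names here, not the PR's actual code, which uses `s.seqTableMaxBatchByteSize`.

```go
package main

import "fmt"

// maxBatchByteSize mirrors the hypothetical per-batch byte budget
// (9 MiB, kept below Spanner's 10 MiB per-cell cap).
const maxBatchByteSize = 9 << 20

// checkEntrySize rejects a single entry whose bundle alone exceeds the
// budget, since it could never fit in any batch.
func checkEntrySize(bundle []byte) error {
	if len(bundle) > maxBatchByteSize {
		return fmt.Errorf("entry of %d bytes exceeds max batch byte size %d", len(bundle), maxBatchByteSize)
	}
	return nil
}

func main() {
	// A 10 MiB entry is refused; a small entry passes.
	fmt.Println(checkEntrySize(make([]byte, 10<<20)) != nil)
	fmt.Println(checkEntrySize(make([]byte, 1024)) == nil)
}
```

With such a guard the oversized entry would fail fast with a clear error instead of tripping Spanner's per-cell limit at write time.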
```go
var entries []*tessera.Entry
for i := range numEntries {
	entries = append(entries, tessera.NewEntry(data))
	_ = i
```
Comment on lines +399 to +410
```go
iter := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"})
defer iter.Stop()
for {
	_, err := iter.Next()
	if err == iterator.Done {
		break
	}
	if err != nil {
		t.Fatalf("Failed to count rows in Seq table: %v", err)
	}
	rowCount++
}
```
Collaborator
Suggested change

```diff
-iter := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"})
-defer iter.Stop()
-for {
-	_, err := iter.Next()
-	if err == iterator.Done {
-		break
-	}
-	if err != nil {
-		t.Fatalf("Failed to count rows in Seq table: %v", err)
-	}
-	rowCount++
-}
+if err := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"}).Do(func(_ *spanner.Row) error {
+	rowCount++
+	return nil
+}); err != nil {
+	...
+}
```
```go
var entries []*tessera.Entry
for i := range numEntries {
	entries = append(entries, tessera.NewEntry(data))
```
Collaborator
Is there an easy way to use data which is unique per leaf such that we can tell when consuming below that we're seeing each leaf coming through exactly once?
E.g. if there were a bug in the code which splits chunks of entries which replaced the first entry in each batch with the last entry from the previous, I think this test would currently pass.
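One way to act on this suggestion (a sketch only; `makeEntries` and `verifyOnceEach` are invented names, and the real test builds `*tessera.Entry` values rather than raw byte slices) is to embed each leaf's index in its data and then assert the consumed sequence is exactly `entry-0 .. entry-(N-1)`:

```go
package main

import "fmt"

// makeEntries gives each leaf unique data so the consumer can tell the
// leaves apart when they come back out.
func makeEntries(numEntries int) [][]byte {
	entries := make([][]byte, 0, numEntries)
	for i := 0; i < numEntries; i++ {
		entries = append(entries, []byte(fmt.Sprintf("entry-%d", i)))
	}
	return entries
}

// verifyOnceEach checks that every leaf is seen exactly once and in order;
// a batch-splitting bug that duplicated or dropped a leaf would be caught here.
func verifyOnceEach(consumed [][]byte) error {
	for i, e := range consumed {
		if want := fmt.Sprintf("entry-%d", i); string(e) != want {
			return fmt.Errorf("entry %d: got %q, want %q", i, e, want)
		}
	}
	return nil
}

func main() {
	entries := makeEntries(5)
	fmt.Println(verifyOnceEach(entries))
}
```

Because each leaf carries its own index, the hypothetical bug described above (the first entry of each batch replaced by the last entry of the previous batch) would fail the check instead of passing silently.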
Cloud Spanner caps the data in a single cell at 10 MiB. Splitting the sequenced entries into chunks of at most 9 MiB, written as multiple rows in the "Seq" table, avoids errors when a single batch contains many huge entries.
https://docs.cloud.google.com/spanner/quotas#tables
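The chunking described above can be sketched as follows. This is an illustration of the technique, not the PR's actual implementation; `chunkEntries` and the byte-slice representation are assumptions.

```go
package main

import "fmt"

// maxChunkBytes is the per-row budget: 9 MiB, safely below Spanner's
// 10 MiB per-cell limit.
const maxChunkBytes = 9 << 20

// chunkEntries greedily packs entries into chunks whose total size stays
// within maxChunkBytes; each chunk would be written as one "Seq" table row.
func chunkEntries(entries [][]byte) [][][]byte {
	var chunks [][][]byte
	var cur [][]byte
	curSize := 0
	for _, e := range entries {
		// Flush the current chunk before this entry would overflow it.
		if len(cur) > 0 && curSize+len(e) > maxChunkBytes {
			chunks = append(chunks, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, e)
		curSize += len(e)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// Three 4 MiB entries: the first two share a chunk (8 MiB), the third
	// would push it past 9 MiB and so starts a new chunk.
	entries := [][]byte{make([]byte, 4<<20), make([]byte, 4<<20), make([]byte, 4<<20)}
	fmt.Println(len(chunkEntries(entries)))
}
```

Note the guard on `len(cur) > 0`: a single entry larger than the budget still gets its own chunk here rather than looping forever, which connects back to the reviewer's question about a lone 10.1 MB entry.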