Handle the sequenced entries with a dynamic size in GCP's assignEntries() #862

Open
roger2hk wants to merge 1 commit into transparency-dev:main from roger2hk:gcp-assignentries

Conversation

roger2hk (Contributor) commented Feb 18, 2026

The maximum size of data per cell in Cloud Spanner is 10 MiB. Splitting the sequenced entries into batches of at most 9 MiB, stored across multiple rows in the "Seq" table, avoids errors when a single batch contains many huge entries.

https://docs.cloud.google.com/spanner/quotas#tables
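The batching idea described above can be sketched in plain Go. This is a hypothetical illustration, not the PR's actual code: `splitBySize` and its parameters are invented names, and tiny byte counts stand in for the real ~9 MiB threshold.

```go
package main

import "fmt"

// splitBySize groups serialised entries into sub-batches whose total
// payload stays under a per-batch byte limit. Spanner caps each cell at
// 10 MiB, so flushing at ~9 MiB leaves headroom for encoding overhead.
func splitBySize(entries [][]byte, maxBatchBytes int) [][][]byte {
	var batches [][][]byte
	var current [][]byte
	size := 0
	for _, e := range entries {
		// Flush the current batch before it would exceed the limit.
		if len(current) > 0 && size+len(e) > maxBatchBytes {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, e)
		size += len(e)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	// Four 4-byte entries with a 9-byte limit split into two batches of two.
	entries := [][]byte{make([]byte, 4), make([]byte, 4), make([]byte, 4), make([]byte, 4)}
	fmt.Println(len(splitBySize(entries, 9))) // 2
}
```

Each sub-batch is then written as its own row in the "Seq" table, keeping every cell under the quota.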

@roger2hk roger2hk requested a review from AlCutter February 18, 2026 11:48
@roger2hk roger2hk requested a review from a team as a code owner February 18, 2026 11:48
@roger2hk roger2hk added the bug Something isn't working label Feb 18, 2026
if err := e.Encode(sequencedEntries); err != nil {
	return fmt.Errorf("failed to serialise batch: %v", err)
}

// If adding this entry would make the batch too big, we need to flush the original batch.
if len(sequencedEntries) > 0 && currentBatchByteSize+len(sequencedEntry.BundleData) > s.seqTableMaxBatchByteSize {
Collaborator
Unlikely (impossible?), but what would happen if someone managed to get a single 10.1 MB entry here...?
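The reviewer's concern can be demonstrated with a toy model of the flush guard quoted above (hypothetical names; tiny byte counts stand in for the MiB-scale limits): because the guard only flushes when the batch is non-empty, a single entry larger than the threshold is appended to an empty batch and never split, so that batch would still exceed the per-cell cap when written.

```go
package main

import "fmt"

const maxBatchBytes = 9 // stand-in for the ~9 MiB flush threshold

// wouldExceedCell reports whether the flush-on-overflow guard can still
// produce a batch over the limit: an oversized single entry lands alone
// in an empty batch, which is never split further.
func wouldExceedCell(entrySizes []int) bool {
	batch := 0
	for _, sz := range entrySizes {
		if batch > 0 && batch+sz > maxBatchBytes {
			batch = 0 // flush the current batch
		}
		batch += sz
		if batch > maxBatchBytes {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(wouldExceedCell([]int{4, 4, 4})) // false: splitting works
	fmt.Println(wouldExceedCell([]int{11}))      // true: one oversized entry
}
```

A separate up-front size check (rejecting any entry above the cell cap) would be one way to close this gap.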

var entries []*tessera.Entry
for i := range numEntries {
	entries = append(entries, tessera.NewEntry(data))
	_ = i
}
Collaborator
?

Collaborator
for range numEntries {}

Comment on lines +399 to +410
iter := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"})
defer iter.Stop()
for {
	_, err := iter.Next()
	if err == iterator.Done {
		break
	}
	if err != nil {
		t.Fatalf("Failed to count rows in Seq table: %v", err)
	}
	rowCount++
}
Collaborator
Suggested change
iter := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"})
defer iter.Stop()
for {
	_, err := iter.Next()
	if err == iterator.Done {
		break
	}
	if err != nil {
		t.Fatalf("Failed to count rows in Seq table: %v", err)
	}
	rowCount++
}

if err := db.Single().Read(ctx, "Seq", spanner.AllKeys(), []string{"seq"}).Do(func(_ *spanner.Row) error {
	rowCount++
	return nil
}); err != nil {
	...
}


var entries []*tessera.Entry
for i := range numEntries {
	entries = append(entries, tessera.NewEntry(data))
}
Collaborator

Is there an easy way to use data which is unique per leaf such that we can tell when consuming below that we're seeing each leaf coming through exactly once?

E.g. if there were a bug in the code which splits chunks of entries that replaced the first entry in each batch with the last entry from the previous batch, I think this test would currently pass.

Labels

bug Something isn't working
