
Optimize transferring array, dict, and composite (e.g., 2x faster for byte arrays) #4448

Merged
fxamacker merged 14 commits into master from fxamacker/optimize-cadence-value-transfer on Mar 11, 2026
@fxamacker fxamacker commented Mar 4, 2026

Updates onflow/flow-go#8447 onflow/flow-go#8401
Requires onflow/atree#633

This optimization benefits EVM, FVM, etc. For example, an open yield vault transaction indirectly calls Transfer() more than 2900 times, and this PR optimizes more than 1800 of those calls.

This PR optimizes copying arrays or maps that don't require deep copying or slab (aka payload) operations by using Atree's new Array.Copy() and OrderedMap.Copy(). For example, copying a byte array is about 2x faster and uses half as many memory allocations.

EVMAddress and other frequently used data such as hashes currently use byte arrays, so the 2x speedup and reduced memory use when copying them can be worthwhile. Other data, such as Cadence enums, are about 1.5x faster to copy.

Currently, ArrayValue, DictionaryValue, and CompositeValue use these Atree functions to copy arrays/maps in Transfer():

  • atree.NewArrayFromBatchData()
  • atree.NewMapFromBatchData()

With this PR, Cadence Transfer() uses Atree's Copy() if all of the following hold:

  • The array or map is small (a single slab container).
  • No element is a container or a reference to another slab.
  • No element wraps a container or a reference to another slab.
  • Every element returns true from Storable.CanCopy().

Otherwise, Transfer() continues to use NewArrayFromBatchData() or NewMapFromBatchData() for arrays or maps that require deep copy or slab operations.
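
The fast-path decision described above can be sketched roughly as follows. This is a simplified illustration, not the actual Cadence or Atree code: the `Storable` interface here is a stand-in with a single method, and `container`, `inlineValue`, `slabRef`, `canUseFastPath`, and `transferPath` are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Storable is a hypothetical, simplified stand-in for an element stored in
// an array or map. Only the CanCopy check from the PR description is modeled.
type Storable interface {
	// CanCopy reports whether the element can be copied without deep
	// copying or slab operations.
	CanCopy() bool
}

// inlineValue models a simple element stored inline (for example, a UInt8).
type inlineValue struct{}

func (inlineValue) CanCopy() bool { return true }

// slabRef models an element that is a container or refers to another slab.
type slabRef struct{}

func (slabRef) CanCopy() bool { return false }

// container is a hypothetical model of an array or map and its slabs.
type container struct {
	slabCount int
	elements  []Storable
}

// canUseFastPath mirrors the listed conditions: a single slab, and every
// element reports that it can be copied.
func canUseFastPath(c container) bool {
	if c.slabCount != 1 {
		return false
	}
	for _, e := range c.elements {
		if !e.CanCopy() {
			return false
		}
	}
	return true
}

// transferPath names the path a transfer would take in this model:
// "copy" for the fast path, "batch" for the existing batch-data path.
func transferPath(c container) string {
	if canUseFastPath(c) {
		return "copy"
	}
	return "batch"
}

func main() {
	bytes := container{slabCount: 1, elements: []Storable{inlineValue{}, inlineValue{}}}
	nested := container{slabCount: 1, elements: []Storable{inlineValue{}, slabRef{}}}
	fmt.Println(transferPath(bytes), transferPath(nested))
}
```

In this model a single non-copyable element is enough to send the whole container down the existing batch-data path, which matches the all-or-nothing wording of the conditions.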

NOTE: To be conservative, some Storable types still use the old way of copying, but it might be OK to use Atree's Copy() for some of them too if the Cadence team determines it is safe. See the "Caveats" section below for more details.

Benchmarks

Benchmarks to transfer Cadence [UInt8]:

                     │  before.txt  │              after.txt              │
                     │    sec/op    │   sec/op     vs base                │
ByteArrayTransfer-12   1528.5n ± 2%   755.6n ± 2%  -50.57% (p=0.000 n=20)

                     │ before.txt  │              after.txt              │
                     │    B/op     │    B/op      vs base                │
ByteArrayTransfer-12   1551.5 ± 0%   1014.0 ± 0%  -34.64% (p=0.000 n=20)

                     │ before.txt  │             after.txt              │
                     │  allocs/op  │ allocs/op   vs base                │
ByteArrayTransfer-12   14.000 ± 0%   7.000 ± 0%  -50.00% (p=0.000 n=20)
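
As a worked check of how the "vs base" column relates to the "2x faster" figure, the sketch below recomputes both from the sec/op values in the ByteArrayTransfer table. The helper names `speedup` and `percentDelta` are made up for this example and are not part of benchstat.

```go
package main

import "fmt"

// speedup returns how many times faster "after" is than "before".
func speedup(before, after float64) float64 {
	return before / after
}

// percentDelta returns the relative change from "before" to "after" in
// percent, the same quantity benchstat reports in its "vs base" column.
func percentDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// sec/op values (in nanoseconds) from the ByteArrayTransfer table.
	before, after := 1528.5, 755.6
	fmt.Printf("%.2fx %.2f%%\n", speedup(before, after), percentDelta(before, after))
	// prints "2.02x -50.57%"
}
```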

Benchmarks to transfer EVMAddress (composite value with [UInt8: 20] field):

                      │ before.txt  │              after.txt              │
                      │   sec/op    │   sec/op     vs base                │
EMVAddressTransfer-12   2.692µ ± 1%   2.114µ ± 1%  -21.49% (p=0.000 n=20)

                      │  before.txt  │              after.txt               │
                      │     B/op     │     B/op      vs base                │
EMVAddressTransfer-12   3.163Ki ± 0%   2.407Ki ± 0%  -23.90% (p=0.000 n=20)

                      │ before.txt │             after.txt              │
                      │ allocs/op  │ allocs/op   vs base                │
EMVAddressTransfer-12   36.00 ± 0%   29.00 ± 0%  -19.44% (p=0.000 n=20)

Benchmarks to transfer Cadence enum:

                │  before.txt  │              after.txt              │
                │    sec/op    │   sec/op     vs base                │
EnumTransfer-12   1228.0n ± 1%   815.6n ± 3%  -33.59% (p=0.000 n=20)

                │ before.txt  │             after.txt              │
                │    B/op     │    B/op     vs base                │
EnumTransfer-12   1719.0 ± 0%   843.5 ± 1%  -50.93% (p=0.000 n=20)

                │ before.txt │             after.txt              │
                │ allocs/op  │ allocs/op   vs base                │
EnumTransfer-12   19.00 ± 0%   13.00 ± 0%  -31.58% (p=0.000 n=20)

Caveats

To be conservative, CanCopy() returns false for these Storable types, and they continue to use the old way of copying:

  • NonStorable
  • AccountCapabilityControllerValue
  • PublishedValue
  • StorageCapabilityControllerValue

If needed, the Cadence team can look into these 4 types to determine whether any of them can be safely copied using the new Atree Copy() function.

TODO

  • Cadence team (not me) needs to add new metering kinds. Done by @turbolent by adding them to this PR. 👍
  • Outside this PR, @janezpodhostnik needs to assign weights to the new metering kinds. 🙏

  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

This commit optimizes copying arrays or maps that don't
require deep copying or slab (aka register) ops by using Atree's
new Array.Copy() and OrderedMap.Copy().

Currently, ArrayValue, DictionaryValue, and CompositeValue
use these Atree functions to copy arrays/maps in Transfer():
- atree.NewArrayFromBatchData()
- atree.NewMapFromBatchData()

With this commit, Cadence Transfer() uses Atree's Copy() if:
- The array or map is small (a single slab container).
- All elements are not containers or references to other slabs.
- All elements don't wrap containers or references to other slabs.
- All elements return true from Storable.CanCopy().

Otherwise, Transfer() continues to use NewArrayFromBatchData() or
NewMapFromBatchData() for arrays or maps that require deep copy or
slab operations.

@fxamacker fxamacker force-pushed the fxamacker/optimize-cadence-value-transfer branch from 3820385 to a6b78c4 Compare March 4, 2026 15:24

@turbolent turbolent left a comment

Great work! Great idea to optimize this common case with a "fast path" 👏

The changes in the Transfer methods look good. Just a few minor TODOs / concerns:

  • Improve method naming (see atree PR)
  • Resolve question re: large values with separate slab
  • Add back computation metering
  • Add back tracing

fxamacker and others added 3 commits March 5, 2026 15:02
This refactor allows fewer types to qualify for skipping calling
`CanCopyNonRefSimple` for all elements in a container.
…into fxamacker/optimize-cadence-value-transfer
@fxamacker

@turbolent BTW, I spotted this while adding more comments.

It seems the raw value for enums is an integer subtype, which can be Int or UInt. Large Int and UInt values can be stored in a separate slab, so we need to check the data to determine whether it is copyable.

So I removed the fast path for enum values.

@turbolent

@fxamacker Good catch! Would it be possible to add a test case for an enum with a very large integer that will trigger it to be stored in a separate slab? Such a regression test would help us avoid accidentally re-adding the fast path / optimization.

Base automatically changed from fxamacker/optimize-array-and-go-byte-slice-conversion to master March 9, 2026 16:59
An enum value can be of any integer subtype, and very large integer
values are not copyable since they are stored in their own slabs.

This commit adds a unit test to test computation metering when
transferring a very large enum value with a raw value stored in
a separate slab.
@turbolent

Triggering stuck CI by closing and reopening

@turbolent turbolent closed this Mar 9, 2026
@turbolent turbolent reopened this Mar 9, 2026
github-actions bot commented Mar 9, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

OpenSSF Scorecard

Package                        │ Version                               │ Score
gomod/github.com/onflow/atree  │ 0.13.1-0.20260305200207-dad78366916d  │ 🟢 8.8

Details

Check               │ Score  │ Reason
Code-Review         │ 🟢 10  │ all changesets reviewed
Maintained          │ 🟢 10  │ 30 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 10
Packaging           │ ⚠️ -1  │ packaging workflow not detected
Security-Policy     │ 🟢 10  │ security policy file detected
Dangerous-Workflow  │ 🟢 10  │ no dangerous workflow patterns detected
Binary-Artifacts    │ 🟢 10  │ no binaries found in the repo
Token-Permissions   │ 🟢 10  │ GitHub workflow tokens follow principle of least privilege
CII-Best-Practices  │ ⚠️ 0   │ no effort to earn an OpenSSF best practices badge detected
Pinned-Dependencies │ 🟢 10  │ all dependencies are pinned
License             │ 🟢 10  │ license file detected
Fuzzing             │ ⚠️ 0   │ project is not fuzzed
Signed-Releases     │ ⚠️ -1  │ no releases found
Branch-Protection   │ ⚠️ -1  │ internal error: error during branchesHandler.setup: internal error: some github tokens can't read classic branch protection rules: https://github.com/ossf/scorecard-action/blob/main/docs/authentication/fine-grained-auth-token.md
SAST                │ 🟢 10  │ SAST tool is run on all commits

Scanned Files

  • go.mod
  • tools/compatibility-check/go.mod

github-actions bot commented Mar 9, 2026

Benchstat comparison

  • Base branch: onflow:master
  • Base commit: 3f3470e
Results

time/op                                   │ old.txt     │ new.txt     │ delta
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ByteArrayValueToByteSlice-4               │ 81.1ns ± 0% │ 78.8ns ± 0% │ ~ (p=1.000 n=1+1)
ByteSliceToByteArrayValue-4               │ 933ns ± 0%  │ 940ns ± 0%  │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ContractFunctionInvocation-4              │ 412µs ± 0%  │ 405µs ± 0%  │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
Emit-4                                    │ 4.51ms ± 0% │ 4.61ms ± 0% │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ExportType/composite_type-4               │ 266ns ± 0%  │ 282ns ± 0%  │ ~ (p=1.000 n=1+1)
ExportType/simple_type-4                  │ 77.7ns ± 0% │ 77.7ns ± 0% │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ImperativeFib-4                           │ 24.2µs ± 0% │ 22.5µs ± 0% │ ~ (p=1.000 n=1+1)
InterpretRecursionFib-4                   │ 2.23ms ± 0% │ 2.20ms ± 0% │ ~ (p=1.000 n=1+1)
NewInterpreter/new_interpreter-4          │ 911ns ± 0%  │ 857ns ± 0%  │ ~ (p=1.000 n=1+1)
NewInterpreter/new_sub-interpreter-4      │ 343ns ± 0%  │ 324ns ± 0%  │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
RuntimeFungibleTokenTransferInterpreter-4 │ 632µs ± 0%  │ 635µs ± 0%  │ ~ (p=1.000 n=1+1)
RuntimeFungibleTokenTransferVM-4          │ 703µs ± 0%  │ 693µs ± 0%  │ ~ (p=1.000 n=1+1)
RuntimeResourceDictionaryValues-4         │ 2.71ms ± 0% │ 2.66ms ± 0% │ ~ (p=1.000 n=1+1)
RuntimeResourceTracking-4                 │ 11.3ms ± 0% │ 9.5ms ± 0%  │ ~ (p=1.000 n=1+1)
RuntimeScriptNoop-4                       │ 15.3µs ± 0% │ 15.0µs ± 0% │ ~ (p=1.000 n=1+1)
RuntimeVMInvokeContractImperativeFib-4    │ 40.2µs ± 0% │ 41.0µs ± 0% │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ValueIsSubtypeOfSemaType-4                │ 63.3ns ± 0% │ 58.7ns ± 0% │ ~ (p=1.000 n=1+1)

alloc/op                                  │ old.txt     │ new.txt     │ delta
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ByteArrayValueToByteSlice-4               │ 32.0B ± 0%  │ 32.0B ± 0%  │ ~ (all equal)
ByteSliceToByteArrayValue-4               │ 839B ± 0%   │ 857B ± 0%   │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ContractFunctionInvocation-4              │ 153kB ± 0%  │ 146kB ± 0%  │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
Emit-4                                    │ 1.49MB ± 0% │ 1.49MB ± 0% │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ExportType/composite_type-4               │ 120B ± 0%   │ 120B ± 0%   │ ~ (all equal)
ExportType/simple_type-4                  │ 0.00B       │ 0.00B       │ ~ (all equal)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ImperativeFib-4                           │ 8.30kB ± 0% │ 8.30kB ± 0% │ ~ (all equal)
InterpretRecursionFib-4                   │ 1.19MB ± 0% │ 1.19MB ± 0% │ ~ (p=1.000 n=1+1)
NewInterpreter/new_interpreter-4          │ 976B ± 0%   │ 976B ± 0%   │ ~ (all equal)
NewInterpreter/new_sub-interpreter-4      │ 232B ± 0%   │ 232B ± 0%   │ ~ (all equal)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
RuntimeFungibleTokenTransferInterpreter-4 │ 170kB ± 0%  │ 171kB ± 0%  │ ~ (p=1.000 n=1+1)
RuntimeFungibleTokenTransferVM-4          │ 192kB ± 0%  │ 192kB ± 0%  │ ~ (p=1.000 n=1+1)
RuntimeResourceDictionaryValues-4         │ 1.76MB ± 0% │ 1.76MB ± 0% │ ~ (p=1.000 n=1+1)
RuntimeResourceTracking-4                 │ 8.58MB ± 0% │ 6.97MB ± 0% │ ~ (p=1.000 n=1+1)
RuntimeScriptNoop-4                       │ 8.07kB ± 0% │ 8.05kB ± 0% │ ~ (p=1.000 n=1+1)
RuntimeVMInvokeContractImperativeFib-4    │ 13.3kB ± 0% │ 13.3kB ± 0% │ ~ (all equal)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ValueIsSubtypeOfSemaType-4                │ 32.0B ± 0%  │ 32.0B ± 0%  │ ~ (all equal)

allocs/op                                 │ old.txt     │ new.txt     │ delta
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ByteArrayValueToByteSlice-4               │ 1.00 ± 0%   │ 1.00 ± 0%   │ ~ (all equal)
ByteSliceToByteArrayValue-4               │ 5.00 ± 0%   │ 5.00 ± 0%   │ ~ (all equal)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ContractFunctionInvocation-4              │ 2.47k ± 0%  │ 2.42k ± 0%  │ ~ (p=1.000 n=1+1)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
Emit-4                                    │ 40.0k ± 0%  │ 40.0k ± 0%  │ ~ (all equal)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
ExportType/composite_type-4               │ 3.00 ± 0%   │ 3.00 ± 0%   │ ~ (all equal)
ExportType/simple_type-4                  │ 0.00        │ 0.00        │ ~ (all equal)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ImperativeFib-4                           │ 176 ± 0%    │ 176 ± 0%    │ ~ (all equal)
InterpretRecursionFib-4                   │ 17.7k ± 0%  │ 17.7k ± 0%  │ ~ (all equal)
NewInterpreter/new_interpreter-4          │ 15.0 ± 0%   │ 15.0 ± 0%   │ ~ (all equal)
NewInterpreter/new_sub-interpreter-4      │ 4.00 ± 0%   │ 4.00 ± 0%   │ ~ (all equal)
pkg:github.com/onflow/cadence/runtime goos:linux goarch:amd64
RuntimeFungibleTokenTransferInterpreter-4 │ 3.27k ± 0%  │ 3.27k ± 0%  │ ~ (all equal)
RuntimeFungibleTokenTransferVM-4          │ 3.73k ± 0%  │ 3.73k ± 0%  │ ~ (all equal)
RuntimeResourceDictionaryValues-4         │ 36.7k ± 0%  │ 36.7k ± 0%  │ ~ (all equal)
RuntimeResourceTracking-4                 │ 144k ± 0%   │ 129k ± 0%   │ ~ (p=1.000 n=1+1)
RuntimeScriptNoop-4                       │ 114 ± 0%    │ 114 ± 0%    │ ~ (all equal)
RuntimeVMInvokeContractImperativeFib-4    │ 424 ± 0%    │ 424 ± 0%    │ ~ (all equal)
pkg:github.com/onflow/cadence/interpreter goos:linux goarch:amd64
ValueIsSubtypeOfSemaType-4                │ 1.00 ± 0%   │ 1.00 ± 0%   │ ~ (all equal)


@fxamacker

> @fxamacker Good catch! Would it be possible to add a test case for an enum with a very large integer that will trigger it to be stored in a separate slab? Such a regression test would help us avoid accidentally re-adding the fast path / optimization.

@turbolent Good idea! I pushed commit f685701 to add tests for enum value transfers for both small and large (over 500 bytes) enum values.

@turbolent

Looks good! I added one more assertion in 64ed851. Should be ready to be merged now.

@SupunS Can you also PTAL?

@fxamacker

Thanks @turbolent for adding new metering kinds.

Outside this PR, @janezpodhostnik will need to assign weights to the new metering kinds. 🙏


@turbolent turbolent left a comment

Great work! 👏


@SupunS SupunS left a comment

LGTM! 🚀

This commit adds a new internal error StorableCopyError, which
is returned from Storable.CopyNonRefSimple() if the given
storable type can't be copied.

CopyNonRefSimple() is an optimized path for Transfer(), and both
approaches should return the same value.  For example, if Transfer()
returns a shallow copy, CopyNonRefSimple() should also return
a shallow copy.

This commit updates CopyNonRefSimple() to return the same value
as Transfer().  This commit also adds comments in both functions
to mention that returned values from these two functions should
be kept in sync.
@fxamacker fxamacker merged commit b5b2fb9 into master Mar 11, 2026
13 of 17 checks passed