perf: use aligned pointer reads for SparkUnsafeRow field accessors #3670
Conversation
`SparkUnsafeRow` field offsets are always 8-byte aligned (the JVM guarantees 8-byte alignment on the base address, `bitset_width` is a multiple of 8, and each field slot is 8 bytes). This means we can safely use `ptr::read()` instead of the `from_le_bytes(slice)` pattern for all typed accesses, avoiding slice creation and `try_into` overhead.

Move the primitive accessor implementations out of the `SparkUnsafeObject` trait defaults and into each concrete impl via a macro parameterized on the read method:
- `SparkUnsafeRow` uses `ptr::read()` (aligned)
- `SparkUnsafeArray` uses `ptr::read_unaligned()` (may be unaligned when nested in a row's variable-length region)

Also switch `is_null_at`/`set_not_null_at` in `SparkUnsafeRow` from `read_unaligned`/`write_unaligned` to aligned `read`/`write`, since the null bitset is always at 8-byte aligned offsets within the row.
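A minimal sketch of the pattern described above, assuming simplified stand-in types: `RowLike`, `ArrayLike`, and `get_field_addr` are illustrative, not the actual Comet types, and only two accessors are shown. The macro takes the read method as an identifier so each impl picks the one matching its alignment guarantee:

```rust
// Sketch: typed accessors generated by one macro, parameterized on the
// pointer-read method (`read` for aligned data, `read_unaligned` otherwise).
macro_rules! impl_primitive_accessors {
    ($read_fn:ident) => {
        fn get_int(&self, index: usize) -> i32 {
            // SAFETY: caller guarantees `index` is in bounds and the chosen
            // read method matches this type's alignment guarantee.
            unsafe { self.get_field_addr(index).cast::<i32>().$read_fn() }
        }
        fn get_long(&self, index: usize) -> i64 {
            unsafe { self.get_field_addr(index).cast::<i64>().$read_fn() }
        }
    };
}

// Stand-in for a row: base is 8-byte aligned, fields start after the bitset.
struct RowLike { base: *const u8, bitset_width: usize }
impl RowLike {
    fn get_field_addr(&self, index: usize) -> *const u8 {
        // bitset_width is a multiple of 8 and each slot is 8 bytes, so the
        // resulting address stays 8-byte aligned when `base` is.
        unsafe { self.base.add(self.bitset_width + index * 8) }
    }
    impl_primitive_accessors!(read); // aligned reads are sound here
}

// Stand-in for an array nested in a row's variable-length region.
struct ArrayLike { base: *const u8 }
impl ArrayLike {
    fn get_field_addr(&self, index: usize) -> *const u8 {
        unsafe { self.base.add(index * 8) }
    }
    impl_primitive_accessors!(read_unaligned); // base may be unaligned
}

fn main() {
    #[repr(align(8))]
    struct Buf([u8; 24]);
    let mut buf = Buf([0u8; 24]);
    buf.0[8..16].copy_from_slice(&42i64.to_le_bytes());
    buf.0[16..20].copy_from_slice(&5i32.to_le_bytes());
    let row = RowLike { base: buf.0.as_ptr(), bitset_width: 8 };

    // Deliberately unaligned base: still correct via read_unaligned.
    let mut raw = [0u8; 16];
    raw[1..9].copy_from_slice(&7i64.to_le_bytes());
    let arr = ArrayLike { base: unsafe { raw.as_ptr().add(1) } };

    println!("{} {} {}", row.get_long(0), row.get_int(1), arr.get_long(0));
}
```

This keeps one definition of the accessor bodies while letting each type choose the cheapest read that is sound for its layout.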
The `test_append_null_struct_field_to_struct_builder` test used a plain `[u8; 16]` stack buffer with no alignment guarantee. Since `is_null_at` performs aligned `i64` reads, Miri flags this as undefined behavior when the buffer lands at a non-8-byte-aligned address. Wrap the buffer in a `#[repr(align(8))]` struct to match the alignment that real Spark UnsafeRow data always has from JVM memory.
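A sketch of the fix and the kind of read it makes sound. The `AlignedBuf` wrapper and `is_null_at` helper here are illustrative, not the exact test or Comet code, and the bit layout (one `i64` word per 64 fields, low bit first) is the conventional UnsafeRow scheme, assumed on a little-endian target:

```rust
// A plain [u8; 16] on the stack has only 1-byte alignment, so an aligned
// i64 read from it is UB under Miri. repr(align(8)) forces the base address
// to 8-byte alignment, matching what the JVM guarantees for UnsafeRow data.
#[repr(align(8))]
struct AlignedBuf([u8; 16]);

// Illustrative null-bitset check: read the i64 word containing this field's
// bit. The word offset is a multiple of 8, so an aligned read is sound
// whenever `base` itself is 8-byte aligned.
fn is_null_at(base: *const u8, index: usize) -> bool {
    let word = unsafe { base.add((index >> 6) * 8).cast::<i64>().read() };
    word & (1i64 << (index & 0x3f)) != 0
}

fn main() {
    let mut buf = AlignedBuf([0u8; 16]);
    // Mark field 1 as null in the first bitset word (little-endian bytes).
    buf.0[0..8].copy_from_slice(&(1i64 << 1).to_le_bytes());
    assert_eq!(buf.0.as_ptr() as usize % 8, 0); // repr(align(8)) holds
    println!("{} {}", is_null_at(buf.0.as_ptr(), 0), is_null_at(buf.0.as_ptr(), 1));
}
```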
Force-pushed from 729f044 to 8bc5761
mbutrovich left a comment
Thanks @andygrove! Like I mentioned before, this seems to compile down the same so not expecting performance differences, but it's more readable/maintainable.
Thanks for the review @mbutrovich
I was just looking into this PR.

AFAIK arrow-rs' position on big endian support has not changed (i.e., it does not support big endian): apache/arrow-rs#6917 (comment). It's possible we've acquired some code that supports both targets, but we likely only need to consider little endian.
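The endianness point can be made concrete: `ptr::read` returns bytes in native order, so it only agrees with `from_le_bytes` on little-endian targets. A small illustrative check (not from the PR):

```rust
fn main() {
    let bytes = [1u8, 0, 0, 0];
    // Explicit little-endian interpretation: always 1.
    let via_le = i32::from_le_bytes(bytes);
    // Native-order read: equals via_le only on little-endian targets,
    // which is why dropping from_le_bytes assumes little endian.
    let via_ptr = unsafe { bytes.as_ptr().cast::<i32>().read_unaligned() };
    assert_eq!(via_le, 1);
    if cfg!(target_endian = "little") {
        assert_eq!(via_ptr, via_le);
    }
    println!("ok");
}
```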
Which issue does this PR close?
Performance optimization for native shuffle row-to-columnar conversion.
Rationale for this change
The `SparkUnsafeObject` trait previously used a one-size-fits-all approach for reading primitive values: creating a byte slice via `from_raw_parts`, then calling `from_le_bytes(slice.try_into().unwrap())`. This incurs unnecessary overhead from slice creation, `try_into`, and `unwrap` on every field access.

The two implementors of `SparkUnsafeObject` have fundamentally different alignment guarantees:
- `SparkUnsafeRow`: the JVM guarantees 8-byte alignment on the base address, `bitset_width` is a multiple of 8, and each field slot is 8 bytes. This means aligned `ptr::read()` is safe and optimal.
- `SparkUnsafeArray`: the data may be unaligned when nested in a row's variable-length region, so `ptr::read_unaligned()` is required for correctness.

What changes are included in this PR?
- Move the primitive accessor implementations (`get_int`, `get_long`, `get_float`, `get_double`, etc.) out of the trait defaults and into each concrete `impl` block via an `impl_primitive_accessors!` macro parameterized on the read method (`read` vs `read_unaligned`).
- `SparkUnsafeRow` uses `ptr::read()` (aligned), avoiding the `from_le_bytes` + slice overhead.
- `SparkUnsafeArray` uses `ptr::read_unaligned()`, which is correct for potentially unaligned data.
- Switch `is_null_at` and `set_not_null_at` in `SparkUnsafeRow` from `read_unaligned`/`write_unaligned` to aligned `read`/`write`, since the null bitset words are always at 8-byte aligned offsets within the row.

How are these changes tested?
Existing tests cover these code paths. The change is purely an optimization of pointer read methods — no behavioral change.
`cargo clippy` and `cargo check` pass cleanly.