Contributor

@gbrgr gbrgr commented Feb 4, 2026

  • Adds a transaction API supporting a FastAppend action
  • Adds write support by exposing a data file writer
  • Introduces test_config.jl to avoid repetitive test config values in tests.

Transaction API

  • Transaction(table) - Synchronous, creates transaction handle
  • FastAppendAction - Accumulates data files from multiple writers to avoid commit conflicts
  • add_data_files(action, data_files) - Consumes DataFiles (moves contents via std::mem::take, frees container, sets ptr = C_NULL)
  • with_fast_append(tx) do action ... end - Helper: creates action, runs code, applies, cleans up
  • with_transaction(table, catalog) do tx ... end - Helper: creates tx, runs code, commits, cleans up
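The two `with_*` helpers above share one shape: create a handle, run the caller's code, finalize, and always clean up. A minimal Rust sketch of that shape follows. `ScopedTx` and `with_transaction_sketch` are hypothetical names for illustration, not the package's API, and skipping the commit when the body fails is an assumption rather than something this PR states.

```rust
// Hypothetical stand-in for a transaction handle (not the RustyIceberg type).
#[derive(Debug, Default)]
pub struct ScopedTx {
    pub pending: Vec<String>, // names of data files queued for commit
}

/// Creates a handle, runs the caller's closure, records a "commit" only if
/// the closure succeeded (an assumption), and records cleanup in every case,
/// which is the role a try/finally block plays on the Julia side.
pub fn with_transaction_sketch<F>(
    log: &mut Vec<&'static str>,
    body: F,
) -> Result<usize, String>
where
    F: FnOnce(&mut ScopedTx) -> Result<(), String>,
{
    log.push("create");
    let mut tx = ScopedTx::default();
    let outcome = body(&mut tx);
    let result = match outcome {
        Ok(()) => {
            log.push("commit");
            Ok(tx.pending.len())
        }
        Err(e) => Err(e),
    };
    // Runs on both the success and the failure path.
    log.push("cleanup");
    result
}
```

The `log` parameter exists only to make the order of steps observable; a real helper would free the native handle at the point where the sketch pushes `"cleanup"`.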

Writer API

  • DataFileWriter - Tracks: Rust ptr, table ref, column metadata, associated DataFiles
  • close_writer(writer) - Returns DataFiles, stores in writer for cleanup
  • free_writer!(writer) - Frees writer AND any unconsumed DataFiles
  • with_data_file_writer(table) do ... end - Detaches returned DataFiles from writer; caller responsible
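The ownership rules in this list (the writer frees any DataFiles it still holds, while detached DataFiles become the caller's responsibility) can be sketched in Rust with `Option::take` and `Drop`. All names here are hypothetical, and the shared counter is only there to make frees observable; the actual Julia implementation tracks raw pointers instead.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for the DataFiles container; counts how often it is freed.
pub struct DataFilesSketch {
    freed: Rc<Cell<u32>>,
}

impl Drop for DataFilesSketch {
    fn drop(&mut self) {
        self.freed.set(self.freed.get() + 1);
    }
}

// Hypothetical writer that keeps the result of close() until it is detached.
pub struct WriterSketch {
    data_files: Option<DataFilesSketch>,
    freed: Rc<Cell<u32>>,
}

impl WriterSketch {
    pub fn new(freed: Rc<Cell<u32>>) -> Self {
        Self { data_files: None, freed }
    }

    /// Produces the data files and stores them in the writer for later cleanup.
    pub fn close(&mut self) {
        self.data_files = Some(DataFilesSketch { freed: self.freed.clone() });
    }

    /// Hands the data files to the caller; the writer no longer frees them.
    pub fn detach(&mut self) -> Option<DataFilesSketch> {
        self.data_files.take()
    }
}
// Dropping a WriterSketch also drops any data files that were never detached.
```

With this layout, freeing the writer after `detach` cannot free the data files a second time, because the writer's slot is already empty.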

@codecov-commenter

Codecov Report

❌ Patch coverage is 84.09091% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.03%. Comparing base (f6b0bcb) to head (11a7294).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/transaction.jl | 72.97% | 10 Missing ⚠️ |
| src/data_file.jl | 75.00% | 2 Missing ⚠️ |
| src/writer.jl | 95.34% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
+ Coverage   83.84%   87.03%   +3.19%     
==========================================
  Files           5        9       +4     
  Lines         359      594     +235     
==========================================
+ Hits          301      517     +216     
- Misses         58       77      +19     

@gbrgr gbrgr marked this pull request as ready for review February 9, 2026 11:41
@gbrgr gbrgr requested a review from vustef February 9, 2026 11:41
Collaborator

@vustef vustef left a comment

Thanks Gerald

}

/// Replace the inner transaction
pub fn replace(&mut self, tx: Transaction) {
Collaborator

Why replace the transaction instead of creating a new IcebergTransaction? I'm asking about the trade-offs so that we can pick the better of the two approaches; I'm not sure at this point.

Contributor Author

It looks like iceberg-rust consumes a transaction when an action is applied and returns a new one. Hence, we use this method to always keep track of the most recently returned transaction.
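A minimal sketch of the pattern described in this reply, assuming the wrapper stores the transaction in an `Option` so it can be moved out and put back. `TxSketch`, `TxHandleSketch`, and `apply_action` are hypothetical names; the only method the diff actually shows is `replace(&mut self, tx: Transaction)`.

```rust
// Hypothetical stand-in for a transaction whose actions consume it.
#[derive(Debug, PartialEq)]
pub struct TxSketch {
    pub applied: Vec<String>,
}

impl TxSketch {
    /// Takes `self` by value and returns the successor transaction.
    pub fn apply(mut self, action: &str) -> TxSketch {
        self.applied.push(action.to_string());
        self
    }
}

/// Wrapper that would sit behind the FFI pointer; holds the latest transaction.
pub struct TxHandleSketch {
    inner: Option<TxSketch>,
}

impl TxHandleSketch {
    pub fn new(tx: TxSketch) -> Self {
        Self { inner: Some(tx) }
    }

    /// Moves the current transaction out, leaving the slot empty.
    pub fn take(&mut self) -> Option<TxSketch> {
        self.inner.take()
    }

    /// Stores the transaction returned by the most recent action.
    pub fn replace(&mut self, tx: TxSketch) {
        self.inner = Some(tx);
    }

    /// Actions applied so far; empty if the transaction was moved out.
    pub fn applied(&self) -> &[String] {
        self.inner
            .as_ref()
            .map(|t| t.applied.as_slice())
            .unwrap_or(&[])
    }
}

/// Applies one action: take the old transaction, consume it, store the new one.
pub fn apply_action(handle: &mut TxHandleSketch, action: &str) -> Result<(), String> {
    let tx = handle.take().ok_or("transaction already consumed")?;
    handle.replace(tx.apply(action));
    Ok(())
}
```

Keeping one long-lived wrapper means the pointer held on the Julia side stays valid across actions, whereas creating a new wrapper per action would require handing a fresh pointer back each time.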

error_msg = unsafe_string(error_message_ptr[])
@ccall rust_lib.iceberg_destroy_cstring(error_message_ptr[]::Ptr{Cchar})::Cint
end
throw(IcebergException(error_msg))
Collaborator

Will it be safe to log this exception in raicode, given that it may contain an arbitrary message that we just propagate?

Contributor Author

We decided the other day to log all exceptions from our fork in raicode as well. I took a quick look at all the error messages, and nothing looked unsafe to me. We might give error messaging in general a second thought, but as long as this is still experimental, we can go with that.
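For context on the snippet under review: the Julia side copies the message with `unsafe_string` and then calls `iceberg_destroy_cstring`, so the string must have been allocated on the Rust side. A sketch of what such a pair of functions could look like is below. The function names and the `0`/`-1` return codes are assumptions for illustration, not the library's actual implementation; export attributes such as `#[no_mangle]` are omitted.

```rust
use std::ffi::{c_char, CStr, CString};

/// Turns an error message into a heap-allocated C string owned by the caller.
pub fn error_to_c(msg: &str) -> *mut c_char {
    // Interior NUL bytes would make CString::new fail, so replace them first.
    let cleaned = msg.replace('\0', " ");
    CString::new(cleaned)
        .expect("interior NUL bytes were replaced")
        .into_raw()
}

/// Copies the message out without taking ownership (what `unsafe_string` does
/// on the Julia side).
///
/// Safety: `ptr` must point to a valid NUL-terminated string.
pub unsafe fn read_c_message(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

/// Frees a string produced by `error_to_c`. Returns 0 on success, -1 for null.
///
/// Safety: `ptr` must come from `error_to_c` and must not be freed twice.
pub unsafe extern "C" fn destroy_cstring_sketch(ptr: *mut c_char) -> i32 {
    if ptr.is_null() {
        return -1;
    }
    // Reconstructing the CString hands the memory back to Rust's allocator.
    unsafe { drop(CString::from_raw(ptr)) };
    0
}
```

The message has to be copied before the destroy call, which matches the order in the Julia snippet: read first, free second, then throw.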

# Free the now-empty DataFiles container and mark as consumed
# The Rust side took the Vec<DataFile> contents via std::mem::take,
# but we still need to free the IcebergDataFiles struct itself
free_data_files!(data_files)
Collaborator

It seems a bit out of the ordinary that we do this here. Should we have a separate call for this? And maybe have a top-level function that cleans up? Orthogonal to those, the cleanup should probably be in a finally block.

Collaborator

Also wondering: could we free the IcebergDataFiles struct at the time we call std::mem::take? Probably not, but just in case...

Contributor Author

Added a finally block. I added this here, because manually freeing data files seemed annoying. One cannot really have a global function that cleans up everything, because the objects themselves are not really stored within one single object afaict.

Contributor Author

Regarding freeing data files at mem::take: we can't free it there, because that only takes the inner vector of the struct, while the struct itself is still allocated, and needs to be freed via an explicit call.
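The point in this reply can be shown with a small Rust sketch: `std::mem::take` moves the inner `Vec` out and leaves an empty one behind, but the boxed struct behind the raw pointer stays allocated until it is freed explicitly. `DataFilesBox` and the three functions are hypothetical stand-ins, not the actual `IcebergDataFiles` FFI.

```rust
// Hypothetical stand-in for the struct that lives behind the FFI pointer.
pub struct DataFilesBox {
    pub files: Vec<String>,
}

/// Allocates the container on the heap and returns a raw pointer to it.
pub fn new_data_files(names: &[&str]) -> *mut DataFilesBox {
    let files = names.iter().map(|s| s.to_string()).collect();
    Box::into_raw(Box::new(DataFilesBox { files }))
}

/// Moves the Vec out of the struct. The struct itself stays allocated and
/// now holds an empty Vec.
///
/// Safety: `ptr` must be a live pointer from `new_data_files`.
pub unsafe fn take_files(ptr: *mut DataFilesBox) -> Vec<String> {
    unsafe { std::mem::take(&mut (*ptr).files) }
}

/// Number of files still stored in the container.
///
/// Safety: `ptr` must be a live pointer from `new_data_files`.
pub unsafe fn remaining(ptr: *const DataFilesBox) -> usize {
    let files: &Vec<String> = unsafe { &(*ptr).files };
    files.len()
}

/// Frees the container itself; this is the separate, explicit call.
///
/// Safety: `ptr` must come from `new_data_files` and must not be freed twice.
pub unsafe fn free_data_files(ptr: *mut DataFilesBox) {
    if !ptr.is_null() {
        unsafe { drop(Box::from_raw(ptr)) };
    }
}
```

After `take_files`, the pointer is still valid and `remaining` reports zero, so skipping `free_data_files` would leak the struct even though its contents are gone.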

# Now use updated_table for subsequent operations
```
"""
function commit(tx::Transaction, catalog::Catalog)
Collaborator

nit: I'd pull (this) Transaction's function nearer to its struct

Contributor Author

done


# Test 5: Create transaction and append data files
println("\nTest 5: Committing data files via transaction...")
updated_table = RustyIceberg.transaction(table, catalog) do tx
Collaborator

nit: should we name these functions that do cleanup something like with_transaction?

Contributor Author

done

value = [10.1, 20.2, 30.3]
)
# Include field ID metadata matching the Iceberg schema (id=1, name=2, value=3)
colmeta = Dict(
Collaborator

Is there any way to get the schema from the existing table, possibly convert it, and use it here for arrow.table?

Contributor Author

Good question. I added such a helper function.

@gbrgr gbrgr merged commit 9b43d00 into main Feb 11, 2026
3 checks passed
@gbrgr gbrgr deleted the gb/transaction-api branch February 11, 2026 08:57