-
Notifications
You must be signed in to change notification settings - Fork 0
Add basic write support #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #48 +/- ##
==========================================
+ Coverage 83.84% 87.03% +3.19%
==========================================
Files 5 9 +4
Lines 359 594 +235
==========================================
+ Hits 301 517 +216
- Misses 58 77 +19 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
vustef
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Gerald
| } | ||
|
|
||
| /// Replace the inner transaction | ||
| pub fn replace(&mut self, tx: Transaction) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why replace transaction instead of creating new IcebergTransaction? Just asking about the trade-offs so that we may pick the better out of the two approaches, I'm not sure at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like transactions are consumed when applying actions and a new transaction is returned by iceberg-rust. Hence, we always keep track of the latest returned such object via this here.
src/transaction.jl
Outdated
| error_msg = unsafe_string(error_message_ptr[]) | ||
| @ccall rust_lib.iceberg_destroy_cstring(error_message_ptr[]::Ptr{Cchar})::Cint | ||
| end | ||
| throw(IcebergException(error_msg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it be safe to log this exception in raicode, given that it may contain arbitrary message that we just propagate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have decided the other day to log all exceptions from our fork here also in raicode. I took a short glimpse at all the error messages, and nothing unsafe appeared to me. We might give error messaging in general a second thought, but until this is still experimental, we can go with that
src/transaction.jl
Outdated
| # Free the now-empty DataFiles container and mark as consumed | ||
| # The Rust side took the Vec<DataFile> contents via std::mem::take, | ||
| # but we still need to free the IcebergDataFiles struct itself | ||
| free_data_files!(data_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a bit out of the ordinary that we do this here. Should we have a separate call for this? And maybe have a top level function that cleanups? Orthogonal to those, the cleanup should probably be in finally block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondering could we free the IcebergDataFiles struct at the time we call std::mem::take? Probably not, but just in case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a finally block. I added this here, because manually freeing data files seemed annoying. One cannot really have a global function that cleans up everything, because the objects themselves are not really stored within one single object afaict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding freeing data files at mem::take: we can't free it there, because that only takes the inner vector of the struct, while the struct itself is still allocated, and needs to be freed via an explicit call.
src/transaction.jl
Outdated
| # Now use updated_table for subsequent operations | ||
| ``` | ||
| """ | ||
| function commit(tx::Transaction, catalog::Catalog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd pull (this) Transaction's function nearer to its struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/writer_tests.jl
Outdated
|
|
||
| # Test 5: Create transaction and append data files | ||
| println("\nTest 5: Committing data files via transaction...") | ||
| updated_table = RustyIceberg.transaction(table, catalog) do tx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we call these functions that do cleanup like with_transaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
test/writer_tests.jl
Outdated
| value = [10.1, 20.2, 30.3] | ||
| ) | ||
| # Include field ID metadata matching the Iceberg schema (id=1, name=2, value=3) | ||
| colmeta = Dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any way to get schema from existing table, possibly convert it, and use it here for arrow.table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I added such a helper function
FastAppendactiontest_config.jlto avoid repetitive test config values in tests.Transaction API
Transaction(table)- Synchronous, creates transaction handleFastAppendAction- Accumulates data files from multiple writers to avoid commit conflictsadd_data_files(action, data_files)- ConsumesDataFiles(moves contents viastd::mem::take, frees container, setsptr = C_NULL)with_fast_append(tx) do action ... end- Helper: creates action, runs code, applies, cleans upwith_transaction(table, catalog) do tx ... end- Helper: creates tx, runs code, commits, cleans upWriter API
DataFileWriter- Tracks: Rust ptr, table ref, column metadata, associatedDataFilesclose_writer(writer)- ReturnsDataFiles, stores in writer for cleanupfree_writer!(writer)- Frees writer AND any unconsumedDataFileswith_data_file_writer(table) do ... end- Detaches returnedDataFilesfrom writer; caller responsible