chore(deps): upgrade to DataFusion 52#1997

Merged
kevinjqliu merged 8 commits into apache:main from ethan-tyler:chore/datafusion-52-validation
Feb 26, 2026

@ethan-tyler (Contributor) commented Jan 6, 2026

Which issue does this PR close?

Validates and adopts DataFusion 52

What changes are included in this PR?

  • Upgrade DataFusion integration from 51.x to 52.x
  • Keep the DataFusion Python dependency dynamic within major version 52
  • Update the Python FFI table provider bridge for DataFusion 52 API/ABI expectations:
    • session-aware __datafusion_table_provider__(session) integration
    • DataFusion 52-compatible FFI table provider construction, with task context and logical codec handling
  • Update sqllogictest physical plan expectations for DataFusion 52 planner output changes
  • Refresh lockfiles impacted by the upgrade
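
Keeping the dependency "dynamic within major version 52" amounts to accepting any 52.x release and rejecting 51.x or 53.x. The sketch below shows one way to express that check with only the standard library. `parse_version` and `within_major` are hypothetical helpers written for illustration and are not part of this PR; the bindings themselves compare with `Version(...)`. The sketch also shows why comparing version strings directly is unreliable.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '52.1.0' into a tuple of ints.

    Hypothetical helper: it only reads the leading numeric components
    and ignores any trailing suffix such as 'rc1'.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a numeric version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def within_major(text: str, major: int) -> bool:
    """Return True when the version belongs to the given major series."""
    return parse_version(text)[0] == major


# String comparison is character by character, so it misjudges these inputs:
# '5' sorts after '4', and '1' sorts before '4'.
string_says_5_is_new_enough = "5.0.0" >= "45"
string_says_100_is_new_enough = "100.0.0" >= "45"

# Numeric comparison on parsed tuples behaves as intended.
numeric_says_5_is_new_enough = parse_version("5.0.0") >= parse_version("45")
numeric_says_100_is_new_enough = parse_version("100.0.0") >= parse_version("45")
```

With these helpers, `within_major("52.1.0", 52)` holds while `within_major("51.0.0", 52)` does not.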

DataFusion FFI API change

DataFusion 52 expanded table provider FFI construction to include task context and optional logical codec parameters.

Updated the Rust/Python bridge accordingly so that filter/logical expression serialization remains compatible across the FFI boundary.
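
To make the shape of the session-aware hook concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: `LegacyTable`, `SessionAwareTable`, and `export_provider` are hypothetical names, the returned strings stand in for the real capsule object, and the signature-based dispatch is not how DataFusion is known to resolve the hook.

```python
import inspect


class LegacyTable:
    """Hypothetical table exposing the older hook shape (no session)."""

    def __datafusion_table_provider__(self):
        return "capsule-without-session"


class SessionAwareTable:
    """Hypothetical table exposing the session-aware hook shape."""

    def __datafusion_table_provider__(self, session):
        # A real implementation would use the session to obtain the task
        # context needed to build the FFI table provider. Here the value
        # is only echoed back so the flow is visible.
        return f"capsule-for-{session}"


def export_provider(table, session):
    """Call the hook with or without the session, based on its signature.

    Hypothetical dispatcher, shown only to contrast the two hook shapes.
    """
    hook = table.__datafusion_table_provider__
    # inspect.signature on a bound method already excludes `self`.
    accepts_session = len(inspect.signature(hook).parameters) >= 1
    return hook(session) if accepts_session else hook()
```

Under these assumptions, `export_provider(SessionAwareTable(), "s1")` passes the session through, while a `LegacyTable` is still called without arguments.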

Are these changes tested?

Yes

@ethan-tyler
Contributor Author

The audit failure (RUSTSEC-2026-0001 for rkyv) is unrelated to this PR - it's being addressed in #1994. Will rebase once that lands.

@ethan-tyler
Contributor Author

Fix for Python Bindings CI Failure

The initial PR failed the Bindings Python CI workflow due to a breaking API change in DataFusion 52's FFI module.

Root cause: FFI_TableProvider::new signature changed from 3 to 5 arguments.

Fix (commit 33d5608):

  • Added datafusion and datafusion-execution dependencies to bindings/python/Cargo.toml
  • Updated datafusion_table_provider.rs to create a TaskContextProvider from SessionContext and pass it to FFI_TableProvider::new

The core iceberg-rust crates were already compatible with DataFusion 52 - only the Python bindings needed this update.

@timsaucer
Member

Please let me know if you run into difficulties with this PR also regarding the FFI change. I think that my approach in apache/datafusion-python#1337 will help resolve the missing elements here.

@ethan-tyler force-pushed the chore/datafusion-52-validation branch 2 times, most recently from ae7b70e to 723e3a6, January 23, 2026 23:59

@Smith-Cruise

Is there any progress now?

@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 723e3a6 to a19062d, February 21, 2026 18:03

@ethan-tyler
Contributor Author

> Is there any progress now?

I rebased and got CI cleaned up. DF 52 Python wheels should be landing in the next day or so - https://lists.apache.org/thread/76v9pmqh7cflgjwx4wnqsmdzw00v62bl. To avoid an additional follow-up, I am waiting to include that and will then open the PR.

@timsaucer
Member

Let me know if you need any help with this PR.

@ethan-tyler
Contributor Author

> Let me know if you need any help with this PR.

Thanks Tim - it's been a hot minute since I worked on Iceberg and got myself into a CI pickle. I think we should be good after this most recent push. I'll let you know if I run into any trouble.

@ethan-tyler changed the title from "[WIP] chore(deps): validate DataFusion 52 compatibility" to "chore(deps): upgrade to DataFusion 52", Feb 23, 2026
@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 19690bf to 3309d6c, February 23, 2026 19:59
@ethan-tyler marked this pull request as ready for review, February 23, 2026 20:15
@mbutrovich self-requested a review, February 23, 2026 20:20
assert (
datafusion.__version__ >= "45"
) # iceberg table provider only works for datafusion >= 45
if Version(datafusion.__version__) < Version("52.0.0"):
Contributor

Not related to this PR, but is it possible to extract the version from the workspace Cargo.toml?

Contributor Author

yes - I can do this on a follow up PR

Contributor

+1 should take from datafusion-ffi

@ethan-tyler force-pushed the chore/datafusion-52-validation branch from 3309d6c to 196c528, February 24, 2026 04:22

@kevinjqliu (Contributor) left a comment

mostly lgtm, a couple nits

arrow-schema = "57.0"
arrow-select = "57.0"
arrow-string = "57.0"
arrow-arith = "57.1"
Contributor

👍
datafusion 52.1.0 uses arrow-* 57.1
https://crates.io/crates/datafusion/52.1.0/dependencies

Contributor

I think we usually bump these during release.

The previous upgrade PR didn't include this: #1899


# monkey patch the __datafusion_table_provider__ method to the iceberg table
def __datafusion_table_provider__(self):
def __datafusion_table_provider__(self, session):

use crate::runtime::runtime;

fn validate_pycapsule(capsule: &Bound<PyCapsule>, name: &str) -> PyResult<()> {

datafusion = "51.0"
datafusion-cli = "51.0"
datafusion-sqllogictest = "51.0"
datafusion = "52.1"

Don't know if it needs to block, but 52.2 is close to release: apache/datafusion#20287. It should be a trivial bump from 52.1, so we can always do it later?

Contributor

yep! let's do that :)


blackmwk pushed a commit that referenced this pull request Feb 26, 2026
## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

While reviewing #1997, I noticed a couple of improvements we can make:

1. `make install` should install the local editable `pyiceberg-core` so
that I can do `make install && make test`
2. CI should fail on warnings
3. Tests should not read local env files (we made a similar fix in
pyiceberg: apache/iceberg-python#3006).
Otherwise, tests read `~/.pyiceberg.yaml`, which pollutes the runs

## Are these changes tested?

Yes. 

For (2), see that [it fails in
CI](https://github.com/apache/iceberg-rust/actions/runs/22410058261/job/64880767855?pr=2178)
before removing the use of deprecated `register_table_provider`
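
One common way to make a Python test run fail on warnings is to promote them to exceptions. The sketch below illustrates that idea with the standard `warnings` module; whether this project uses this mechanism, a pytest `filterwarnings` setting, or something else is not stated here. `register_table_provider_legacy`, `run_strict`, and `fails_under_strict_mode` are hypothetical names, with the first standing in for any deprecated call.

```python
import warnings


def register_table_provider_legacy(name: str) -> str:
    """Hypothetical stand-in for a deprecated API."""
    warnings.warn(
        "register_table_provider_legacy is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )
    return name


def run_strict(func, *args):
    """Run func with every warning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return func(*args)


def fails_under_strict_mode(func, *args) -> bool:
    """Report whether func trips a DeprecationWarning in strict mode."""
    try:
        run_strict(func, *args)
    except DeprecationWarning:
        return True
    return False
```

In this sketch the deprecated stand-in fails under strict mode, while a call that emits no warning, such as `str.upper`, passes unchanged.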
ethan-tyler and others added 2 commits February 25, 2026 21:27

@kevinjqliu (Contributor) left a comment

LGTM!

Thanks a bunch

@kevinjqliu merged commit b24ab63 into apache:main Feb 26, 2026
20 checks passed

@kevinjqliu
Contributor

Thanks for the PR @ethan-tyler, let's move this forward and follow up with 52.2 if necessary.

Thanks everyone for the review

@ethan-tyler deleted the chore/datafusion-52-validation branch February 26, 2026 20:26