Skip to content

feat: check schema capability when adding parquet files#2158

Open
charlesdong1991 wants to merge 1 commit intoapache:mainfrom
charlesdong1991:feat/schema-compat-check
Open

feat: check schema capability when adding parquet files#2158
charlesdong1991 wants to merge 1 commit intoapache:mainfrom
charlesdong1991:feat/schema-compat-check

Conversation

@charlesdong1991
Copy link
Contributor

Which issue does this PR close?

What changes are included in this PR?

Adds schema compatibility validation when converting existing parquet files to data files
via parquet_files_to_data_files.

Are these changes tested?

Yes, tests are passing locally

@kevinjqliu kevinjqliu force-pushed the feat/schema-compat-check branch from 0141cdc to 354d9b5 Compare February 24, 2026 02:21
Copy link
Contributor

@jdockerty jdockerty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite reasonable to me 👍

Comment on lines +437 to +438
/// in the file schema (looked up by field ID). Follows the same semantics as
/// iceberg-python's `_check_schema_compatible`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how likely it is that this method name would change, but it is perhaps worth linking to the specific commit that you've referenced this from?

That way the details are clear from looking at the reference implementation.

/// Returns `true` if `self` can be promoted to `target` per the Iceberg spec's
/// valid type promotion rules.
///
/// See <https://iceberg.apache.org/spec/#schema-evolution>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great reference 💯

Comment on lines +328 to +329
.map(|s| serde_json::from_str::<NameMapping>(s))
.transpose()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this transpose usage required?

If I'm understanding this rightly, it is being used to flip the map output on Err (after from_str) for the map_err to bubble up afterwards.

Although it seems like some indirection when a match would be more explicit/easy to follow too.

This may just be a preference thing though, feel free to disregard 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check parquet file schema compatibility when adding parquet file in fast append action.

2 participants