Open
Description
Currently, our Parquet validation process does not support Avro schemas (AVSC files). To enhance our data validation capabilities, we need to implement support for validating Parquet files using Avro schemas.
AC:
- Implement a Parquet validation process that accepts Avro schema files (AVSC) as input. There is a TODO for this in src/data_toolset/utils/parquet.py.
- Ensure that the validation process can validate Parquet files against the provided Avro schema.
- Make sure it supports complex and nested data structures.
- If a Parquet file doesn't conform to the Avro schema, the validation process should identify and report schema violations or data inconsistencies.
- Integrate the Avro schema validation process into the tool.
- It should support the existing sample data files in the tests/data directory.
- Write unit and integration tests to validate the correctness of the function.
- Integrate the function into the interface.
- Ensure the function is easy to use with a clear and well-documented API.
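To make the acceptance criteria concrete, here is a minimal sketch of the schema-comparison step, including nested records, arrays, and nullable unions. It is not the project's implementation. The function name `find_violations`, the `AVRO_TO_PARQUET` mapping, and the nested-dict description of the Parquet schema are all assumptions made for illustration. The sketch does not read Parquet files itself; in practice the actual schema could be obtained with something like `pyarrow.parquet.read_schema` and converted to this shape. Maps, enums, fixed, and logical types are deliberately left out.

```python
import json

# Assumed mapping from Avro primitives to Parquet-side type names
# (names modelled on Arrow's type strings; adjust to the real reader).
AVRO_TO_PARQUET = {
    "boolean": {"bool"},
    "int": {"int32"},
    "long": {"int64"},
    "float": {"float"},
    "double": {"double"},
    "string": {"string"},
    "bytes": {"binary"},
}


def find_violations(avro_type, actual, path="$"):
    """Compare an Avro type with a description of the actual Parquet type.

    `actual` is a hypothetical nested description: a type-name string for
    primitives, a dict for structs, a one-element list for lists, and
    None when the column is absent. Returns a list of messages.
    """
    if isinstance(avro_type, list):  # Avro union, e.g. ["null", "string"]
        if actual is None:
            return [] if "null" in avro_type else [f"{path}: missing required column"]
        results = [find_violations(t, actual, path) for t in avro_type if t != "null"]
        if not results:
            return [f"{path}: expected null, found {actual!r}"]
        return min(results, key=len)  # report the closest-matching branch
    if actual is None:
        return [f"{path}: missing required column"]
    if isinstance(avro_type, str):  # primitive type name
        allowed = AVRO_TO_PARQUET.get(avro_type)
        if allowed is None:
            return [f"{path}: unsupported Avro type {avro_type!r}"]
        if not isinstance(actual, str) or actual not in allowed:
            return [f"{path}: expected {avro_type}, found {actual!r}"]
        return []
    kind = avro_type["type"]
    if kind == "record":
        if not isinstance(actual, dict):
            return [f"{path}: expected record, found {actual!r}"]
        out = []
        for field in avro_type["fields"]:
            name = field["name"]
            out += find_violations(field["type"], actual.get(name), f"{path}.{name}")
        return out
    if kind == "array":
        if not (isinstance(actual, list) and len(actual) == 1):
            return [f"{path}: expected array, found {actual!r}"]
        return find_violations(avro_type["items"], actual[0], f"{path}[]")
    # Wrapped type such as {"type": "string"}; logical types are ignored here.
    return find_violations(kind, actual, path)


# Example AVSC content (made up for this sketch, not taken from tests/data).
AVSC = json.loads("""{
  "type": "record", "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"]},
    {"name": "address", "type": {"type": "record", "name": "Address", "fields": [
        {"name": "city", "type": "string"},
        {"name": "zip", "type": "int"}]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}}
  ]
}""")

# Hypothetical description of what a Parquet file actually contains.
parquet_schema = {
    "id": "int64",
    "address": {"city": "string", "zip": "string"},  # zip has the wrong type
    "tags": ["string"],
}

violations = find_violations(AVSC, parquet_schema)
for message in violations:
    print(message)
```

Returning a list of path-qualified messages, rather than raising on the first mismatch, is one way to satisfy the "identify and report schema violations" criterion, since a caller can show every problem in a file at once. Validating the data values themselves (not only the schema) would need a separate pass over the rows.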