Make parquet validation work with avsc schemas #7

@luminousmen

Description

Currently, our Parquet validation process does not support Avro schemas (AVSC files). To enhance our data validation capabilities, we need to implement support for validating Parquet files using Avro schemas.

AC:

  • Implement a Parquet validation process that accepts Avro schema files (AVSC) as input. There is a TODO for this in src/data_toolset/utils/parquet.py.
  • Ensure that the validation process can validate Parquet files against the provided Avro schema.
  • Make sure it supports complex and nested data structures.
  • If a Parquet file doesn't conform to the Avro schema, the validation process should identify and report schema violations or data inconsistencies.
  • Integrate the Avro schema validation process into the tool
  • It should support existing data sample files from the tests/data directory
  • Write unit and integration tests to validate the correctness of the function.
  • Integrate the function into the interface
  • Ensure the function is easy to use with a clear and well-documented API.
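
To make the criteria above more concrete, here is a minimal sketch of the schema comparison step, not the project's actual implementation. It assumes the Parquet file's schema has already been converted into a simplified description (a type name string, a dict for a struct, a one-element list for a repeated field); a real implementation would probably derive that from the file itself, for example with a library such as pyarrow. The names `AVRO_TO_PARQUET`, `unwrap_nullable` and `find_violations` are hypothetical, and the type mapping covers only a few primitive types.

```python
import json

# Assumed mapping from Avro primitive types to the Parquet type names
# accepted by this sketch. Logical types, maps, enums etc. are not covered.
AVRO_TO_PARQUET = {
    "boolean": {"bool"},
    "int": {"int32"},
    "long": {"int64"},
    "float": {"float"},
    "double": {"double"},
    "string": {"string"},
    "bytes": {"binary"},
}


def unwrap_nullable(avro_type):
    """Turn an Avro union like ["null", T] into T; leave other types as-is."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) == 1:
            return non_null[0]
    return avro_type


def find_violations(avro_type, parquet_type, path="$"):
    """Return a list of human-readable mismatches between the two schemas.

    parquet_type is a simplified description (an assumption of this sketch):
    a type name string, a dict of field name -> type for a struct,
    or a one-element list for a repeated field.
    """
    avro_type = unwrap_nullable(avro_type)
    if isinstance(avro_type, list):
        return [f"{path}: unsupported Avro union {avro_type!r}"]
    if isinstance(avro_type, dict):
        kind = avro_type.get("type")
        if kind == "record":
            if not isinstance(parquet_type, dict):
                return [f"{path}: expected struct, found {parquet_type!r}"]
            violations = []
            for field in avro_type["fields"]:
                name = field["name"]
                child = f"{path}.{name}"
                if name not in parquet_type:
                    violations.append(f"{child}: missing column")
                else:
                    violations += find_violations(
                        field["type"], parquet_type[name], child
                    )
            return violations
        if kind == "array":
            if not (isinstance(parquet_type, list) and len(parquet_type) == 1):
                return [f"{path}: expected list, found {parquet_type!r}"]
            return find_violations(avro_type["items"], parquet_type[0], f"{path}[]")
        # Wrapped form such as {"type": "string"}: compare the inner type.
        return find_violations(kind, parquet_type, path)
    allowed = AVRO_TO_PARQUET.get(avro_type)
    if allowed is None:
        return [f"{path}: unsupported Avro type {avro_type!r}"]
    if not isinstance(parquet_type, str) or parquet_type not in allowed:
        return [f"{path}: expected {avro_type}, found {parquet_type!r}"]
    return []


# Hypothetical AVSC content with a nullable field, a nested record and an array.
avsc = json.loads("""
{
  "type": "record", "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": ["null", "string"]},
    {"name": "address", "type": {"type": "record", "name": "Address",
                                 "fields": [{"name": "zip", "type": "string"}]}},
    {"name": "tags", "type": {"type": "array", "items": "string"}}
  ]
}
""")

# Hypothetical Parquet schema description; "id" is deliberately too narrow.
parquet_columns = {
    "id": "int32",
    "email": "string",
    "address": {"zip": "string"},
    "tags": ["string"],
}

violations = find_violations(avsc, parquet_columns)
for violation in violations:
    print(violation)
```

Returning a list of messages keyed by a path, rather than raising on the first mismatch, is one way to report all schema violations at once. Validating the data values themselves would need a separate pass over the rows.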

Metadata

Assignees: No one assigned
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests