Create_sample function #8

@luminousmen

Description

We need a create_sample function that generates sample data for testing, experimentation, and demonstration. The generated data is synthetic but should closely mimic real data.

Acceptance criteria (AC):

  • Create a create_sample function for both the Avro and Parquet util classes that generates synthetic data from a specified data schema or structure. Interface:
from pathlib import Path

def create_sample(schema_path: Path, n: int):
    ...
  • Implement options to specify the number of records or rows to generate in the sample data.
  • Ensure that the generated sample data adheres to the provided data schema or structure.
  • Provide flexibility to generate both structured and semi-structured data, including support for nested structures if applicable.
  • Support the existing sample data files in the tests/data directory.
  • Write unit and integration tests to validate the correctness of the function.
  • Integrate the function into the interface.
  • Ensure the function is easy to use with a clear and well-documented API.
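
To make the criteria above concrete, here is a minimal sketch of what the Avro side could look like. It assumes the schema is a JSON .avsc file and uses only the standard library. The helpers `_PRIMITIVES` and `_generate` and the extra `seed` parameter are hypothetical additions for illustration, not part of the proposed interface. Parquet schemas, Avro named-type references, and logical types are not handled here.

```python
import json
import random
import string
from pathlib import Path
from typing import Any, Dict, List, Optional

# Hypothetical table of generators for Avro primitive types.
_PRIMITIVES = {
    "null": lambda rng: None,
    "boolean": lambda rng: rng.random() < 0.5,
    "int": lambda rng: rng.randint(-2**31, 2**31 - 1),
    "long": lambda rng: rng.randint(-2**63, 2**63 - 1),
    "float": lambda rng: rng.uniform(-1e6, 1e6),
    "double": lambda rng: rng.uniform(-1e12, 1e12),
    "string": lambda rng: "".join(rng.choices(string.ascii_letters, k=8)),
    "bytes": lambda rng: bytes(rng.getrandbits(8) for _ in range(8)),
}


def _generate(schema: Any, rng: random.Random) -> Any:
    """Return one random value conforming to an Avro schema node."""
    if isinstance(schema, list):  # union: pick one branch at random
        return _generate(rng.choice(schema), rng)
    if isinstance(schema, str):  # primitive referenced by name
        # Assumption: named-type references are not resolved (KeyError).
        return _PRIMITIVES[schema](rng)
    kind = schema["type"]
    if kind == "record":  # nested structure: recurse into each field
        return {f["name"]: _generate(f["type"], rng) for f in schema["fields"]}
    if kind == "array":
        return [_generate(schema["items"], rng) for _ in range(rng.randint(0, 3))]
    if kind == "map":
        size = rng.randint(0, 3)
        return {f"key{i}": _generate(schema["values"], rng) for i in range(size)}
    if kind == "enum":
        return rng.choice(schema["symbols"])
    if kind == "fixed":
        return bytes(rng.getrandbits(8) for _ in range(schema["size"]))
    # Wrapped type such as {"type": "string"} or {"type": {...}}.
    return _generate(kind, rng)


def create_sample(
    schema_path: Path, n: int, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Generate n synthetic records for the Avro schema at schema_path.

    `seed` is a hypothetical extra argument that makes output reproducible.
    """
    schema = json.loads(Path(schema_path).read_text())
    rng = random.Random(seed)
    return [_generate(schema, rng) for _ in range(n)]
```

A call such as `create_sample(Path("tests/data/<schema>.avsc"), 10, seed=42)` would return ten dicts shaped like the schema. Writing them out as an Avro or Parquet file would be left to the respective util class. A fixed seed keeps unit tests deterministic.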
