Open
Description
We need a create_sample function that generates sample data for testing, experimentation, or demonstration purposes. The generated data is synthetic but should closely mimic real data.
Acceptance criteria:
- Create a create_sample function for both the Avro and Parquet util classes that generates synthetic data based on a specified data schema or structure. Interface:
    def create_sample(schema_path: Path, n: int):
        ...
- Implement an option to specify the number of records or rows to generate in the sample data.
- Ensure that the generated sample data adheres to the provided data schema or structure.
- Provide flexibility to generate both structured and semi-structured data, including support for nested structures if applicable.
- Support the existing sample data files in the tests/data directory.
- Write unit and integration tests to validate the correctness of the function.
- Integrate the function into the interface
- Ensure the function is easy to use with a clear and well-documented API.
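
To make the criteria above concrete, here is a rough sketch of what the Avro side could look like. It is not a proposed final implementation. It assumes schema_path points to an Avro schema stored as JSON (an .avsc file), uses only the standard library, and returns plain Python dicts rather than writing an Avro file. The helper _random_value and the seed parameter are hypothetical additions for illustration and are not part of the interface above. Logical types, default values, and references to previously defined named types are not handled.

```python
import json
import random
import string
from pathlib import Path
from typing import Any, Optional


def _random_value(schema: Any, rng: random.Random) -> Any:
    """Return one random value conforming to an Avro schema fragment (hypothetical helper)."""
    if isinstance(schema, list):  # union: pick one branch at random
        return _random_value(rng.choice(schema), rng)
    if isinstance(schema, str):  # primitive type given by name only
        schema = {"type": schema}
    kind = schema["type"]
    if isinstance(kind, (dict, list)):  # type definition nested one level down
        return _random_value(kind, rng)
    if kind == "null":
        return None
    if kind == "boolean":
        return rng.random() < 0.5
    if kind == "int":  # 32-bit signed
        return rng.randint(-2**31, 2**31 - 1)
    if kind == "long":  # 64-bit signed
        return rng.randint(-2**63, 2**63 - 1)
    if kind in ("float", "double"):
        return rng.uniform(-1e6, 1e6)
    if kind == "string":
        return "".join(rng.choices(string.ascii_letters, k=8))
    if kind == "bytes":
        return bytes(rng.getrandbits(8) for _ in range(8))
    if kind == "fixed":
        return bytes(rng.getrandbits(8) for _ in range(schema["size"]))
    if kind == "enum":
        return rng.choice(schema["symbols"])
    if kind == "array":
        return [_random_value(schema["items"], rng) for _ in range(rng.randint(0, 3))]
    if kind == "map":
        return {f"key{i}": _random_value(schema["values"], rng) for i in range(rng.randint(0, 3))}
    if kind == "record":  # recurse into each field, which covers nested structures
        return {f["name"]: _random_value(f["type"], rng) for f in schema["fields"]}
    raise ValueError(f"unsupported Avro type: {kind!r}")


def create_sample(schema_path: Path, n: int, seed: Optional[int] = None) -> list:
    """Generate n synthetic records that follow the Avro schema at schema_path.

    seed is an assumed extra parameter that makes the output reproducible in tests.
    """
    schema = json.loads(Path(schema_path).read_text())
    rng = random.Random(seed)
    return [_random_value(schema, rng) for _ in range(n)]
```

Usage would be along the lines of create_sample(Path("tests/data/<some schema>.avsc"), 10), where the file name depends on what is actually in tests/data. A Parquet variant could follow the same recursive shape but would need to read the schema through a Parquet library instead of JSON.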