Duckdb #75
out of the generator proposer
DuckDB usage documentation Fixed DuckDB parquet output dump-data --output as-directory choice proposal crash fixed
- Generator writers `go_to` can cope with table names that have dots.
- `dump-data --parquet` can cope with `TIMESTAMP`s.
- Foreign keys to ignored tables fixed.
Finally this all works! It's actually fairly easy to fake Parquet files now; see the
Ahh nice, I'll pencil in some time next week hopefully to review. Appreciate it Tim
stefpiatek left a comment
Oooh this is very fun. Thanks for working on this and getting the translation so it works ❤️
except TypeError:
    pass
Ooh when are we expecting this to happen, and if so do we want to log it?
RowCounts = Counter[str]

@compiles(CreateColumn, "duckdb")
ooh this is fun
Pretty nasty actually. But yes, fun that this hook exists!
if fk_bits[0] not in tables_dict:
    return False
return bool(tables_dict[fk_bits[0]].get("ignore", False))
(table, _column) = split_column_full_name(fk)
Ooh I'm not too sure what was happening before but I think this makes sense
Yes, one of those "did this ever work?" moments...
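For readers following this thread, here is a minimal sketch of the kind of check being discussed. It is not the code from this PR: the body of `split_column_full_name` is an assumption (splitting on the last dot so that dotted table names survive), and `fk_refers_to_ignored_table` is a hypothetical name for the surrounding function.

```python
from typing import Any


def split_column_full_name(full_name: str) -> tuple[str, str]:
    """Assumed behaviour: split 'table.column' on the LAST dot.

    A table name that itself contains dots ('schema.person') stays whole.
    With no dot at all, the table part is empty and the whole string is
    treated as the column (the "last part").
    """
    table, _sep, column = full_name.rpartition(".")
    return table, column


def fk_refers_to_ignored_table(fk: str, tables_dict: dict[str, Any]) -> bool:
    """Hypothetical helper: True if the FK target table is marked 'ignore'."""
    table, _column = split_column_full_name(fk)
    if table not in tables_dict:
        return False
    return bool(tables_dict[table].get("ignore", False))


config = {"schema.person": {"ignore": True}, "visit": {}}
print(fk_refers_to_ignored_table("schema.person.id", config))
print(fk_refers_to_ignored_table("visit.id", config))
```

Indexing with the first dot-separated piece, as `fk_bits[0]` appears to do in the old lines, would look up `schema` rather than `schema.person` for a dotted table name.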
column_types = {
    column: _dtype_to_sql(dtype) for column, dtype in table.dtypes.items()
}
name_pref = name[: name.rfind(".")]
Could we get to a point where the name doesn't have a .?
This would be if the file doesn't have an extension such as .parquet, but you are right that this needs some sort of defense.
Actually, it's fine. That expression works even if no dot is found.
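A quick check of how that slice behaves may be useful here. `str.rfind` returns `-1` when no dot is present, so the expression does not raise, but the slice then becomes `name[:-1]` and drops the final character. The sketch below shows this; `strip_extension` is a hypothetical guard, not something from this PR, and the file names are made up.

```python
def strip_extension(name: str) -> str:
    """Hypothetical guard: drop the final extension only if there is one."""
    dot = name.rfind(".")
    return name if dot == -1 else name[:dot]


with_dot = "people.parquet"
without_dot = "people"

# The expression from the diff, applied to both cases.
print(with_dot[: with_dot.rfind(".")])        # people
print(without_dot[: without_dot.rfind(".")])  # peopl (rfind gave -1)

# The guarded version leaves an extension-less name untouched.
print(strip_extension(without_dot))           # people
```

Whether the truncated name matters depends on how `name_pref` is used afterwards, which is not visible in this snippet.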
if last_part in table_names:
    table_names.append(f"{last_part}.")
oh interesting that this has swapped from first to last
It's just that previously if there was only one part it was called `first_part`, now it's called `last_part` because that's how the new `split_column_full_name` function does it.
DuckDB does not work without this change; it uses the PostgreSQL dialect with minor changes, but it really needs a couple more.
This change adds DuckDB as a SQLAlchemy plugin and hooks into the SQL compilation process, removing the PostgreSQL code that DuckDB does not understand.
`dump-data` has also been updated to allow dumping all non-ignored, non-vocabulary tables in one call, and to dump the data as Parquet.
So with `dump-data` for the destination and DuckDB's in-memory database for the source, it is now possible to do Parquet-to-Parquet data faking without interacting directly with DuckDB at all! See `duckdb.rst` for details.