
USE 449 - Update for DuckDB 1.5 compatibility#185

Open
ghukill wants to merge 1 commit into main from USE-449-handle-duckdb-v1.5


ghukill commented Mar 13, 2026

Purpose and background context

Why these changes are being introduced:

DuckDB was updated from 1.4 to 1.5 (release notes), introducing breaking changes that affect this library. DuckDB 1.5 now requires the home_directory setting to be set explicitly when the HOME environment variable is unset or empty (e.g. during AWS Lambda cold starts), and fetch_record_batch() has been deprecated in favor of to_arrow_reader().
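
A minimal sketch of the kind of fallback this implies, assuming only a DuckDB connection object with an execute() method: when HOME is missing or empty, set DuckDB's home_directory before running INSTALL and LOAD. The names install_extensions and home_is_unset, the environ and fallback_home parameters, and the use of the system temp directory (/tmp on Lambda) are hypothetical illustrations, not the library's actual _install_extensions code.

```python
import os
import tempfile
from typing import Any, Iterable, Mapping, Optional


def home_is_unset(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when HOME is missing or empty (e.g. an AWS Lambda cold start)."""
    env = os.environ if environ is None else environ
    return not env.get("HOME")


def install_extensions(
    conn: Any,
    extensions: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
    fallback_home: Optional[str] = None,
) -> None:
    """Hypothetical stand-in for the library's _install_extensions.

    Sets DuckDB's home_directory before INSTALL/LOAD when HOME is unset,
    which this PR reports DuckDB 1.5 requires (otherwise an IOException).
    """
    if home_is_unset(environ):
        # Assumption: the temp dir (/tmp on Lambda) is writable and acceptable.
        home = fallback_home or tempfile.gettempdir()
        # Double any single quotes so the path stays a valid SQL string literal.
        quoted = home.replace("'", "''")
        conn.execute(f"SET home_directory = '{quoted}'")
    for ext in extensions:
        conn.execute(f"INSTALL {ext}")
        conn.execute(f"LOAD {ext}")
```

The environ argument exists only so the HOME check can be exercised without modifying the process environment; with a real connection the call would look like install_extensions(duckdb.connect(), ["httpfs"]).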

How this addresses that need:

  • Set home_directory in the _install_extensions fallback block before installing extensions, fixing the IOException raised in Lambda environments
  • Replace the deprecated fetch_record_batch() with to_arrow_reader() in dataset.py and embeddings.py (its rows_per_batch parameter is renamed batch_size)
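
Assuming, as this PR states, that to_arrow_reader(batch_size=...) replaces fetch_record_batch(rows_per_batch=...) and returns the same kind of reader, a hypothetical compatibility helper that works with either spelling might look like this. The name open_arrow_reader is illustrative only; the PR itself simply calls to_arrow_reader() directly.

```python
from typing import Any


def open_arrow_reader(source: Any, batch_size: int) -> Any:
    """Return a RecordBatchReader from a DuckDB object, across 1.4 and 1.5.

    Hypothetical helper: prefers to_arrow_reader(batch_size=...) and falls
    back to the deprecated fetch_record_batch(rows_per_batch=...).
    """
    to_reader = getattr(source, "to_arrow_reader", None)
    if callable(to_reader):
        # Newer spelling (per this PR); the row-count keyword is batch_size.
        return to_reader(batch_size=batch_size)
    # Older spelling; same meaning, but the keyword is rows_per_batch.
    return source.fetch_record_batch(rows_per_batch=batch_size)
```

Feature-detecting the method with getattr, rather than comparing version strings, keeps a sketch like this independent of exact DuckDB release numbers.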

How can a reviewer manually see the effects of these changes?

Nothing to see! Running make test is sufficient; it was failing after the upgrade to duckdb == 1.5.

Includes new or updated dependencies?

YES | NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.


Side effects of this change:

* None expected; to_arrow_reader() returns the same RecordBatchReader
  interface
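
Because both methods return a pyarrow RecordBatchReader, code that consumes the reader should not need to change. A minimal sketch of such a consumer (count_rows is a hypothetical name) relies only on iterating the reader and reading each batch's num_rows:

```python
from typing import Any, Iterable


def count_rows(reader: Iterable[Any]) -> int:
    """Total the rows streamed by a RecordBatchReader, one batch at a time.

    Iterating a pyarrow RecordBatchReader yields RecordBatch objects, each
    with a num_rows attribute; nothing here depends on which DuckDB method
    created the reader.
    """
    total = 0
    for batch in reader:
        total += batch.num_rows
    return total
```

Accepting any iterable of objects with num_rows also makes a consumer like this easy to test without DuckDB or pyarrow installed.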

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/USE-449

ghukill requested a review from a team as a code owner March 13, 2026 20:52
