[Bug]: IVFFLAT index-creation table function waits for the complete data from the source table before running k-means clustering #23770

@cpegeric

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

main

Commit ID

94128cd

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Currently, we create the IVFFLAT index by using CROSS APPLY between the source table and a table function. With 5 billion vectors, the table function has to wait for the complete dataset from the source table before it can run k-means clustering.
Receiving the 5 billion vectors takes 30 minutes, while the k-means clustering itself takes only 6 minutes.

SELECT f.* from `%s`.`%s` AS %s CROSS APPLY ivf_create('%s', '%s', %s) AS f;

Expected Behavior

Change the table function so that it no longer accepts data from the source table and instead runs its own SELECT statement to fetch a small random sample of the data.

SELECT * from ivf_create('param_json', 'config_json') AS f;
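
As a rough illustration of the proposal above, the sampling SELECT that the table function would issue on its own could be assembled along these lines. This is only a sketch under stated assumptions: the helper names (`quote_ident`, `sample_size`, `build_sample_query`), the `rows_per_centroid` multiplier, and the `ORDER BY RAND() LIMIT n` idiom are all hypothetical and not taken from the codebase. `ORDER BY RAND()` in particular is a generic MySQL-style placeholder that still scans the table inside the engine, so a cheaper engine-native sampling primitive would be preferable if one is available.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL-style identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def sample_size(nlist: int, rows_per_centroid: int = 50) -> int:
    """Number of rows to sample for k-means.

    rows_per_centroid is an assumed heuristic, not a value from the issue.
    """
    if nlist <= 0 or rows_per_centroid <= 0:
        raise ValueError("nlist and rows_per_centroid must be positive")
    return nlist * rows_per_centroid


def build_sample_query(db: str, table: str, column: str, nlist: int) -> str:
    """Build the sampling SELECT that the table function could run itself."""
    limit = sample_size(nlist)
    # ORDER BY RAND() is a placeholder idiom; swap in the engine's own
    # sampling mechanism if it has one.
    return (
        f"SELECT {quote_ident(column)} "
        f"FROM {quote_ident(db)}.{quote_ident(table)} "
        f"ORDER BY RAND() LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical database, table, and column names.
    print(build_sample_query("db1", "items", "embedding", nlist=1000))
```

With this shape, the amount of data handed to k-means depends on the number of centroids rather than on the size of the source table, which is the point of the requested change.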

Steps to Reproduce

Create an IVFFLAT index on a table with 5 billion vectors.

Additional information

No response

Labels

kind/bug
