Is there an existing issue for the same bug?
Branch Name
main
Commit ID
94128cd
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
Currently, we create the IVFFLAT index by using CROSS APPLY to join the source table with a table function. With 5 billion vectors, the table function has to wait until it has received the complete dataset from the source table before it can start k-means clustering.
Receiving the 5 billion vectors takes 30 minutes, whereas the k-means clustering itself takes only 6 minutes.
SELECT f.* from `%s`.`%s` AS %s CROSS APPLY ivf_create('%s', '%s', %s) AS f;
Expected Behavior
Change the table function so that it no longer takes rows from the source table. Instead, it should run its own SELECT query to fetch a small random sample of the data and run k-means clustering on that sample.
SELECT * from ivf_create('param_json', 'config_json') AS f;
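A minimal Go sketch of the proposed flow: train the IVF centroids with plain Lloyd's k-means on a small sample instead of the full table. It assumes the sample has already been returned by the sampling SELECT. The helper names (`trainCentroids`, `nearest`, `sqDist`), the toy sample, and the "first k vectors as seeds" initialization are hypothetical simplifications for illustration, not the actual `ivf_create` implementation.

```go
package main

import "fmt"

// sqDist returns the squared Euclidean distance between a and b.
func sqDist(a, b []float32) float32 {
	var s float32
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// nearest returns the index of the centroid closest to v.
func nearest(v []float32, centroids [][]float32) int {
	best, bestD := 0, sqDist(v, centroids[0])
	for c := 1; c < len(centroids); c++ {
		if d := sqDist(v, centroids[c]); d < bestD {
			best, bestD = c, d
		}
	}
	return best
}

// trainCentroids runs Lloyd's k-means on the sample only (len(sample) >= k).
// Hypothetical simplification: the first k sample vectors seed the centroids;
// random or k-means++ seeding would be more typical.
func trainCentroids(sample [][]float32, k, iters int) [][]float32 {
	dim := len(sample[0])
	centroids := make([][]float32, k)
	for c := 0; c < k; c++ {
		centroids[c] = append([]float32(nil), sample[c]...) // copy, don't alias the sample
	}
	for it := 0; it < iters; it++ {
		sums := make([][]float32, k)
		counts := make([]int, k)
		for c := range sums {
			sums[c] = make([]float32, dim)
		}
		// Assignment step: attach each sampled vector to its nearest centroid.
		for _, v := range sample {
			c := nearest(v, centroids)
			counts[c]++
			for i, x := range v {
				sums[c][i] += x
			}
		}
		// Update step: move each centroid to the mean of its assigned vectors.
		for c := 0; c < k; c++ {
			if counts[c] == 0 {
				continue // keep the old centroid if its cluster is empty
			}
			for i := range sums[c] {
				centroids[c][i] = sums[c][i] / float32(counts[c])
			}
		}
	}
	return centroids
}

func main() {
	// Hypothetical toy sample standing in for the rows returned by the sampling SELECT.
	sample := [][]float32{{0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10}}
	centroids := trainCentroids(sample, 2, 10)
	fmt.Println(centroids)
}
```

Each training iteration touches only the n sampled vectors, so its cost is O(n·k·dim) regardless of how many rows the source table holds. Assigning every source vector to its nearest centroid would still require a separate pass over the table.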
Steps to Reproduce
Create an IVFFLAT index on a table containing 5 billion vectors.
Additional information
No response