Feature signed int tree #3

Open
BasementCat wants to merge 4 commits into mxmlnkn:master from BasementCat:feature-signed-int-tree

@BasementCat
Contributor

Depends on #2. Adds support for signed ints as the values. Use case: I have a lot of phashes that are signed and, for various reasons, I can't change that.

@mxmlnkn
Owner

mxmlnkn commented Nov 17, 2025

Looks good. Thanks for your contribution. I don't really see a reason why the signed int can't just be reinterpreted bit-wise as an unsigned int, but oh well, there also is not much of a reason against adding signed int support.
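To illustrate the bit-wise reinterpretation mentioned above, here is a minimal Python sketch. It assumes 64-bit hashes stored in two's complement; the helper names `to_unsigned`, `to_signed`, and `hamming` are made up for this example and are not part of this library's API. The Hamming distance depends only on the bit pattern, so it comes out the same for the signed values and their unsigned reinterpretations.

```python
def to_unsigned(value: int, bits: int = 64) -> int:
    """Reinterpret a two's-complement signed integer as an unsigned one."""
    return value & ((1 << bits) - 1)


def to_signed(value: int, bits: int = 64) -> int:
    """Inverse of to_unsigned: map an unsigned bit pattern back to signed."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def hamming(a: int, b: int, bits: int = 64) -> int:
    """Count differing bits; masking makes this safe for negative inputs."""
    return bin((a ^ b) & ((1 << bits) - 1)).count("1")


# Hypothetical signed hash values, for illustration only.
signed_a, signed_b = -2, -1

# The distance is identical before and after reinterpretation.
assert hamming(signed_a, signed_b) == hamming(
    to_unsigned(signed_a), to_unsigned(signed_b)
)
# The conversion round-trips without loss.
assert to_signed(to_unsigned(signed_a)) == signed_a
```

Under these assumptions, a caller could convert once on insertion and once on lookup, and convert results back with `to_signed` if the signed form is needed downstream.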

I was also using this library for similarity hashes. I settled on some dHash variant, if I remember correctly. I basically wanted to find duplicates in millions of images. As you can see from the benchmark, though, even a single lookup in millions of hashes took longer than 1 ms, because the hash distance that I required was too large to make proper use of the BK-tree properties, as you can see in the scaling laws.

And you'd need to do this for all images, so you'll end up with a quadratic complexity for the full duplicate check. I assume there exist some more efficient duplicate check algorithms. Or at least, one might leverage the fact that it is similar to a matrix multiplication and implement cache-efficient blocked data processing and SIMD usage.
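As a rough sketch of the quadratic full duplicate check and the blocked processing idea, the following pure-Python function compares every pair of hashes tile by tile. It assumes the hashes are non-negative (unsigned) integers; `find_near_duplicates` and `block_size` are hypothetical names for this example. Pure Python will not actually gain cache or SIMD benefits, so this only shows the loop structure that a compiled implementation might follow.

```python
def find_near_duplicates(hashes, max_distance, block_size=1024):
    """Return index pairs (i, j) with i < j and Hamming distance <= max_distance.

    Assumes non-negative integer hashes. Work is O(n^2) in the number of hashes.
    """
    pairs = []
    n = len(hashes)
    # Iterate over tiles of the (i, j) comparison matrix, upper triangle only.
    for i0 in range(0, n, block_size):
        block_i = hashes[i0:i0 + block_size]
        for j0 in range(i0, n, block_size):
            block_j = hashes[j0:j0 + block_size]
            for di, a in enumerate(block_i):
                i = i0 + di
                for dj, b in enumerate(block_j):
                    j = j0 + dj
                    # XOR marks differing bits; count them for the distance.
                    if i < j and bin(a ^ b).count("1") <= max_distance:
                        pairs.append((i, j))
    return pairs


# Tiny illustrative input, not real image hashes.
example = [0b0000, 0b0001, 0b1111, 0b1110]
print(find_near_duplicates(example, max_distance=1, block_size=2))
```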
