Conversation
Looks good. Thanks for your contribution. I don't really see why the signed int can't just be reinterpreted bit-wise as an unsigned int, but there also isn't much of a reason against adding signed int support.

I was also using this library for similarity hashes. I settled on some dHash variant, if I remember correctly. I wanted to find duplicates in millions of images, but as you can see from the benchmark, even a single lookup in millions of hashes was slower than 1 ms, because the hash distance I required was too large to make proper use of the BK-tree properties (see the scaling laws). And you need to do this lookup for every image, so the full duplicate check ends up with quadratic complexity. I assume more efficient duplicate-check algorithms exist. Or at least, one might leverage the fact that the all-pairs comparison is structurally similar to a matrix multiplication and implement cache-efficient blocked data processing with SIMD.
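To illustrate the blocked all-pairs idea mentioned above, here is a minimal sketch. It is not part of this library: `hamming_distance`, `near_duplicates`, and the `block_size` parameter are hypothetical names, and the hashes are assumed to be non-negative Python ints. Pure Python gains nothing from cache blocking or SIMD, so this only shows the loop structure that a native implementation would tile.

```python
from typing import Iterator, Sequence, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def near_duplicates(
    hashes: Sequence[int], max_distance: int, block_size: int = 1024
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j), i < j, whose hashes differ by at most
    max_distance bits.

    The work is still quadratic in len(hashes). The two outer loops walk
    over tiles of the pair matrix, as in a blocked matrix multiplication,
    so that a native version could keep both tiles in cache.
    """
    n = len(hashes)
    for i0 in range(0, n, block_size):
        block_i = hashes[i0:i0 + block_size]
        # Start at i0: only the upper triangle of the pair matrix is needed.
        for j0 in range(i0, n, block_size):
            block_j = hashes[j0:j0 + block_size]
            for di, a in enumerate(block_i):
                i = i0 + di
                for dj, b in enumerate(block_j):
                    j = j0 + dj
                    if j > i and hamming_distance(a, b) <= max_distance:
                        yield (i, j)
```

For example, `list(near_duplicates(my_hashes, max_distance=10))` would collect all candidate pairs, where `my_hashes` is whatever list of hashes you have at hand.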
99a0d6f to 2283735
Depends on #2. Adds support for signed ints as the values. Use case: I have a lot of pHashes that are stored as signed ints and, for various reasons, can't change that.
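For context, the following sketch shows why signedness does not affect the Hamming distance between two hashes. It assumes 64-bit two's-complement values, which is an assumption about the pHashes and not something stated in this PR; `to_unsigned64` and `hamming64` are hypothetical helper names, not functions from this library.

```python
MASK64 = (1 << 64) - 1  # assumption: hashes are 64 bits wide


def to_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as unsigned (same bit pattern)."""
    return value & MASK64


def hamming64(a: int, b: int) -> int:
    """Hamming distance over the low 64 bits; accepts signed or unsigned input."""
    return bin((a ^ b) & MASK64).count("1")
```

Because the XOR operates on the bit pattern, `hamming64(a, b)` gives the same result whether `a` and `b` are passed as signed values or first converted with `to_unsigned64`.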