The test does not look realistic: https://github.com/Cranot/grouped-simd-hashtable/blob/master...
It would be better to use a few key distributions from production-like datasets, e.g., from ClickBench. Most of them are Zipfian and they also differ in temporal locality.
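For what it's worth, here's a rough sketch of generating a Zipf-skewed key stream for such a benchmark. Everything here is made up for illustration (parameters, PRNG, function names); a real test would replay actual ClickBench columns rather than synthesize keys.

```rust
// Sketch: sample keys from a Zipf distribution so a few hot keys dominate.
// Plain Rust, no crates; parameters are arbitrary.
fn zipf_keys(n_keys: usize, skew: f64, samples: usize, mut seed: u64) -> Vec<u64> {
    // Cumulative distribution: weight of rank r is 1 / r^skew.
    let mut cdf = Vec::with_capacity(n_keys);
    let mut total = 0.0;
    for rank in 1..=n_keys {
        total += 1.0 / (rank as f64).powf(skew);
        cdf.push(total);
    }

    // xorshift64* PRNG: good enough for a benchmark key stream.
    let mut next = || {
        seed ^= seed >> 12;
        seed ^= seed << 25;
        seed ^= seed >> 27;
        seed.wrapping_mul(0x2545F4914F6CDD1D)
    };

    (0..samples)
        .map(|_| {
            // Uniform value in [0, total), then binary-search the CDF for its rank.
            let u = (next() >> 11) as f64 / (1u64 << 53) as f64 * total;
            cdf.partition_point(|&c| c < u) as u64
        })
        .collect()
}

fn main() {
    // e.g. 1M distinct keys, skew 1.0, 10M lookups
    let keys = zipf_keys(1_000_000, 1.0, 10_000_000, 0x9E3779B97F4A7C15);
    println!("first keys: {:?}", &keys[..8]);
}
```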
Static size, no deleting. Everyone already knew that you can make faster hash tables when they never need to be resized, but nobody bothers doing that because it is pretty useless or at best niche.
Well, not to be completely dismissive here... It's clearly a prototype project to try and make quadratic probing a thing.
I'm not convinced this methodology is better than linear probing (which can then easily be optimized into Robin Hood hashing).
The only line I see about linear probing is:
> Linear jumps (h, h+16, h+32...) caused 42% insert failure rate due to probe sequence overlap. Quadratic jumps spread groups across the table, ensuring all slots are reachable.
Which just seems entirely erroneous to me. How can linear probing fail? Just keep jumping until you find an open spot. As long as there is at least one open spot, you'll find it in O(n) time because you're just scanning the whole table.
Linear probing has a clustering problem. But IIRC modern CPUs have these things called L1 Cache/locality, meaning scanning all those clusters is stupidly fast in practice.
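Roughly the loop I have in mind, as a sketch (hypothetical Table type, not the project's code; assumes a power-of-two capacity kept below 100% load):

```rust
// Plain stride-1 linear probing. The probe walks every slot in order,
// so as long as one slot is empty an insert cannot fail.
struct Table {
    slots: Vec<Option<(u64, u64)>>, // (key, value); length is a power of two
    mask: usize,                    // length - 1
}

impl Table {
    fn new(capacity_pow2: usize) -> Self {
        Table { slots: vec![None; capacity_pow2], mask: capacity_pow2 - 1 }
    }

    fn insert(&mut self, key: u64, value: u64) {
        let mut idx = (mix(key) as usize) & self.mask;
        loop {
            // Option<(u64, u64)> is Copy, so read the slot out and inspect it.
            let slot = self.slots[idx];
            match slot {
                // Empty slot: claim it. Reached eventually if any slot is free.
                None => {
                    self.slots[idx] = Some((key, value));
                    return;
                }
                // Same key: overwrite.
                Some((k, _)) if k == key => {
                    self.slots[idx] = Some((key, value));
                    return;
                }
                // Occupied by another key: step to the next slot, wrapping around.
                // (Robin Hood would additionally swap here whenever the resident
                // entry is closer to its home slot than the incoming one.)
                Some(_) => idx = (idx + 1) & self.mask,
            }
        }
    }

    fn get(&self, key: u64) -> Option<u64> {
        let mut idx = (mix(key) as usize) & self.mask;
        loop {
            match self.slots[idx] {
                None => return None, // hit an empty slot: key is absent
                Some((k, v)) if k == key => return Some(v),
                Some(_) => idx = (idx + 1) & self.mask, // keep scanning the cluster
            }
        }
    }
}

// Hypothetical stand-in for a real hash function.
fn mix(key: u64) -> u64 {
    key.wrapping_mul(0x9E3779B97F4A7C15)
}
```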
The comments don't make sense to you because you know what you are talking about and Claude does not; this code was all written by Claude.
Hmmm. That makes me sad, but it does explain the uneasy feeling I got when reading the GitHub page.
Should it be possible in Rust?
As far as I understand, hashbrown already does this. Hashbrown is based on Google's SwissTable, and this project references that SwissTable already does this optimisation.
To elaborate, hashbrown uses quadratic-ish probing over groups, and each group covers 16 slots on SSE2 (rough sketch after the links):
https://github.com/rust-lang/hashbrown/blob/master/src/contr...
https://github.com/rust-lang/hashbrown/blob/6efda58a30fe712a...
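Not hashbrown's actual code, just a sketch of the probe-sequence idea (made-up ProbeSeq type and constants): the stride grows by one group width per step, so the offsets from the starting group are triangular numbers of groups, and with a power-of-two number of groups the first N probes visit all N groups.

```rust
// Group-wise triangular probing, SwissTable-style.
const GROUP: usize = 16;

struct ProbeSeq {
    pos: usize,    // current group-aligned position
    stride: usize, // grows by GROUP each step
    mask: usize,   // table_len - 1; table_len is a power-of-two multiple of GROUP
}

impl ProbeSeq {
    fn new(hash: u64, table_len: usize) -> Self {
        ProbeSeq {
            pos: (hash as usize) & (table_len - 1) & !(GROUP - 1),
            stride: 0,
            mask: table_len - 1,
        }
    }
}

impl Iterator for ProbeSeq {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        let current = self.pos;
        self.stride += GROUP;                        // 1, 2, 3, ... groups
        self.pos = (self.pos + self.stride) & self.mask; // offsets 1, 3, 6, ... groups
        Some(current)
    }
}

fn main() {
    // 8 groups of 16 slots: the first 8 probes cover all 8 group starts.
    let visited: Vec<usize> = ProbeSeq::new(0xDEADBEEF, 128).take(8).collect();
    println!("{:?}", visited);
}
```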
Is there like a list of canned responses that low-effort commenters just rotate through? Here lemme try:
This would've been 39,000X better if written in Mojo.
Does this work in WebAssembly?
Nice to see people focusing on efficiency instead of web/electron bloat.