Impressive work: anti-diagonal DP on CUDA, clean MCUPS framing, and the multi-language shipping is legit. The “109× faster than NVIDIA on H100” line is accurate for your chosen case (cuDF/nvtext, long strings), but it’s not a blanket “faster than NVIDIA,” and readers will assume it is, so tighten the scope. The bio results are a good baseline, not SOTA; Hopper’s DPX instructions and WFA-style tiling/bucketing would likely move you a tier up. The hashing and 52-bit MinHash are clever, but you need full SMHasher reports and retrieval-quality metrics, not just entropy/collision counts. Publish exact versions, params, and end-to-end timings (I/O + marshaling), plus short-string vs. long-string batches. If you add those and rename the headline to reflect the setup, the claims will be hard to poke holes in.
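For anyone skimming who hasn't seen the metric: MCUPS is millions of DP cell updates per second, i.e. the sum of len(a) * len(b) over all aligned pairs divided by wall-clock seconds. A rough sketch of how I'd compute it (names are mine, not from the repo), which is also why end-to-end timings matter, since the number changes a lot depending on whether the denominator includes I/O and marshaling:

    # MCUPS = million DP cell updates per second for pairwise alignment.
    # Each pair (a, b) fills a len(a) x len(b) DP matrix, so total work is
    # the sum of those products; divide by elapsed wall-clock seconds.
    def mcups(pairs, elapsed_seconds):
        total_cells = sum(len(a) * len(b) for a, b in pairs)
        return total_cells / (elapsed_seconds * 1e6)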
Duplicate of https://news.ycombinator.com/item?id=45304807
AI slop meter is off the charts