Impressive work: anti-diagonal DP on CUDA, clean MCUPS framing, and the multi-language shipping is legit. The “109× faster than NVIDIA on H100” line is accurate for your chosen case (cuDF/nvtext, long strings), but it’s not a blanket “faster than NVIDIA,” and readers will assume it is, so tighten the scope. The bio results are a good baseline, not SOTA; Hopper’s DPX instructions and WFA-style tiling/bucketing would likely move you a tier up. The hashing and 52-bit MinHash are clever, but you need full SMHasher reports and retrieval-quality metrics, not just entropy/collision counts. Publish exact versions, params, and end-to-end timings (I/O + marshaling), plus short-string vs. long-string batches. If you add those and rename the headline to reflect the setup, the claims will be hard to poke holes in.
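For anyone skimming who hasn't seen the metric: MCUPS is millions of DP cell updates per second, i.e. the sum of len(a) * len(b) over all aligned pairs divided by wall-clock seconds. A rough sketch of how I'd compute it (names are mine, not from the repo), which is also why end-to-end timings matter, since the number changes a lot depending on whether the denominator includes I/O and marshaling:

    # MCUPS = million DP cell updates per second for pairwise alignment.
    # Each pair (a, b) fills a len(a) x len(b) DP matrix, so total work is
    # the sum of those products; divide by elapsed wall-clock seconds.
    def mcups(pairs, elapsed_seconds):
        total_cells = sum(len(a) * len(b) for a, b in pairs)
        return total_cells / (elapsed_seconds * 1e6)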
Duplicate of https://news.ycombinator.com/item?id=45304807
AI slop meter is off the charts