Processing Strings 109x Faster Than Nvidia on H100

(ashvardanian.com)

34 points | by samspenc 4 days ago

3 comments

  • ozgrakkurt 19 hours ago
  • trilogic 13 hours ago

    Impressive work: anti-diagonal DP on CUDA, clean MCUPS framing, and the multi-language shipping is legit. The “109× faster than NVIDIA on H100” line is accurate for your chosen case (cuDF/nvtext, long strings), but it’s not a blanket “faster than NVIDIA,” and readers will assume it is, so tighten the scope. Bio results are a good baseline, not SOTA; Hopper’s DPX and WFA-style tiling/bucketing would likely move you a tier up. Hashing and 52-bit MinHash are clever, but you need full SMHasher reports and retrieval-quality metrics, not just entropy/collisions. Publish exact versions, params, and end-to-end timings (I/O + marshaling), plus short-string vs long-string batches. If you add those and rename the headline to reflect the setup, the claims will be hard to poke holes in.
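
For readers unfamiliar with two terms in the comment above, here is a minimal Python sketch of anti-diagonal dynamic programming for Levenshtein distance, with an MCUPS (millions of cell updates per second) helper. It is not the article's CUDA kernel: the function names `levenshtein_antidiagonal` and `mcups` are hypothetical, and the full matrix is kept only for readability. The point it illustrates is that every cell on one anti-diagonal depends only on the two previous anti-diagonals, so the inner loop has no dependencies between its iterations and could be run in parallel.

```python
def levenshtein_antidiagonal(a: str, b: str) -> int:
    """Levenshtein distance computed by sweeping anti-diagonals (i + j = d)."""
    n, m = len(a), len(b)
    # Full (n+1) x (m+1) matrix for clarity; a parallel kernel would
    # typically keep only the last three anti-diagonals.
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete i characters from a
    for j in range(m + 1):
        dist[0][j] = j  # insert j characters of b

    # Interior cells start at d = 2, i.e. cell (1, 1).
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)  # keeps j = d - i <= m
        i_hi = min(n, d - 1)  # keeps j = d - i >= 1
        # Cells in this loop are mutually independent.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[n][m]


def mcups(len_a: int, len_b: int, seconds: float) -> float:
    """Millions of DP cell updates per second for one len_a x len_b alignment."""
    return (len_a * len_b) / seconds / 1e6
```

As a usage example, `levenshtein_antidiagonal("kitten", "sitting")` evaluates to 3, and aligning two 1000-character strings in half a second corresponds to `mcups(1000, 1000, 0.5)`, which is 2.0.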

    • klysm 12 hours ago

      AI slop meter is off the charts