5 comments

  • jjcm 4 hours ago

    Looks like there is some quality reduction, but nonetheless 2s to generate a 5s video on a 5090 for WAN 2.1 is absolutely crazy. Excited to see more optimizations like this moving into 2026.

    • villgax 4 hours ago

      That’s not the actual time if you run it; encoding and decoding are extra.

      • Lerc 22 minutes ago

        Nevertheless, it does seem that generation will fairly soon become fast enough to extend a video clip in real time, autoregressively, second by second. Integrated with a multimodal input model, you would be very close to an AI avatar that would be extremely compelling.

  • redundantly 2 hours ago

    Now if someone could release an optimization like this for the M4 Max I would be so happy. Last time I tried generating a video it was something like an hour for a 480p 5-second clip.

  • villgax 4 hours ago

    I mean, the baselines were deliberately made worse and are not how anyone would actually use these to begin with, except maybe noobs. The quoted number is only for the DiT steps, not for the other encoding and decoding steps, which are actually still quite expensive. No actual use of FA4/CUTLASS-based kernels or TRT at any point.