16 comments

  • singpolyma3 19 minutes ago

    Love this.

    It says MIT license, but the README has a separate section on prohibited use that may add restrictions and make it non-free? Not sure of the legal implications here.

  • dust42 29 minutes ago

    Good quality, but unfortunately it is English-only.

    • phoronixrly 25 minutes ago

      I echo this. For a TTS system to be in any way useful outside the tiny population of the world that speaks exclusively English, it must be multilingual and dynamically switch between languages pretty much per word.

      Cool tech demo though!

  • lukebechtel 23 minutes ago

    Nice!

    Just made it an MCP server so Claude can tell me when it's done with something :)

    https://github.com/Marviel/speak_when_done

  • armcat an hour ago

    Oh this is sweet, thanks for sharing! I've been a huge fan of Kokoro and even set up my own fully local voice assistant [1]. Will definitely give Pocket TTS a go!

    [1] https://github.com/acatovic/ova

    • gropo 11 minutes ago

      Kokoro is better for TTS by far.

      For voice cloning, Pocket TTS is walled, so I can't tell.

    • amrrs 44 minutes ago

      Thanks for sharing your repo. Looks super cool; I'm planning to try it out. Is it based on MLX or just HF Transformers?

      • armcat 34 minutes ago

        Thank you, just transformers.

  • tschellenbach 30 minutes ago

    It's cool how lightweight it is. Recently added support to Vision Agents for Pocket. https://github.com/GetStream/Vision-Agents/tree/main/plugins...

  • syntaxing 30 minutes ago

    Is there something similar for STT? I’m using distilled Whisper models and they work OK, but sometimes they get what I say completely wrong.

  • GaggiX an hour ago

    I love that everyone is making their own TTS model, since they are not as expensive to train as many other models. Also, there are plenty of different architectures.

    Another recent example: https://github.com/supertone-inc/supertonic

  • snvzz an hour ago

    Relative to AmigaOS translator.device + narrator.device, this sure seems bloated.