Building voice agents with Nvidia open models

(daily.co)

114 points | by kwindla a day ago

15 comments

smusamashah 4 hours ago
Do any of the top models let you pause and think while speaking? I have to speak non-stop to Gemini Assistant and ChatGPT, which makes voice mode unnatural and close to useless, especially for non-native English speakers. I sometimes need extra time to translate my thoughts into English.
- fragmede 3 hours ago
  Have you tried talking to ChatGPT in your native tongue? I was blown away by my mother speaking her native tongue to ChatGPT and having it respond in that language. (It's ever so slightly not a mainstream one.)
amelius a day ago
I've been using festival under Linux.
https://manpages.ubuntu.com/manpages/trusty/man1/festival.1....
But it is quite old now and pre-dates the DL/AI era.
Does anybody know of a good modern replacement that I can "apt install"?
- sigmonsays a day ago
  I used piper with a model I found online. It's _A LOT_ better than festival afaik. I'm not sure you can apt install it though.
  echo "hello" | piper --model ~/.local/share/piper/en_US-lessac-medium.onnx --output_file - | aplay
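For anyone who wants to call this from a script instead of the shell, here is a minimal Python sketch of the same pipeline. It assumes `piper` and `aplay` are both on your PATH and reuses only the two flags from the one-liner above (`--model` and `--output_file -`). The helper names `build_piper_command` and `speak` are made up for illustration and are not part of piper.

```python
import subprocess
from pathlib import Path


def build_piper_command(model_path: str) -> list[str]:
    """Return the argv for piper, mirroring the flags in the shell one-liner."""
    # expanduser() resolves a leading "~", which the shell would otherwise do for us.
    return ["piper", "--model", str(Path(model_path).expanduser()),
            "--output_file", "-"]


def speak(text: str, model_path: str) -> None:
    """Synthesize `text` with piper and play the resulting audio through aplay.

    Assumption: piper writes the audio to stdout when --output_file is "-",
    as in the one-liner above.
    """
    piper = subprocess.run(
        build_piper_command(model_path),
        input=text.encode("utf-8"),   # same role as `echo "hello" |`
        stdout=subprocess.PIPE,
        check=True,                   # raise if piper exits non-zero
    )
    # Feed the captured audio bytes to aplay, same role as `| aplay`.
    subprocess.run(["aplay"], input=piper.stdout, check=True)
```

Calling `speak("hello", "~/.local/share/piper/en_US-lessac-medium.onnx")` should then behave like the shell version, though it buffers the whole clip in memory before playback rather than streaming it.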
  - gunalx a day ago
    You can in fact apt install piper.
    - amelius a day ago
      That's a different piper.
      piper - GTK application to configure gaming devices
rickydroll 9 hours ago
<pedantic>Voice recognition identifies who you are, speech recognition identifies what you say. </pedantic>
Example:
Voice recognition: arrrrrrgh! (Oh, I know that guy. He always gets irritated when someone uses the terms speech and voice recognition wrong.)
Speech Recognition: "Why can't you guys keep it straight? It is as simple as knowing the difference between hypothesis and theory."
nowittyusername a day ago
This is perfect for me. I just started working on the voice related stuff for my agent framework and this will be of real use. Thanks.
atonse 8 hours ago
Can't wait for this to land in MacWhisper. I like the idea of the streaming dictation especially when dictating long prompts to Claude Code.
jjcm a day ago
These have gotten good enough to really make command-by-voice interactions pleasant. I'd love to try this with Cursor - just use it fully with voice.
deckar01 18 hours ago
It supports Turing T4, but not Ampere…
- nsbk 13 hours ago
  Any ideas on how to add Ampere support? I have a use case in mind that I would love to try on my 3090 rig
  - deckar01 4 hours ago
    Magpie-TTS needs a kernel compiled targeting Ampere, but it appears to be closed source. It was compiled for the 2018 T4, but not 2020-2024 consumer cards, just 2025 consumer cards.
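To make the compatibility point concrete, here is a small Python sketch of CUDA's binary-compatibility rule: a precompiled kernel (cubin) built for compute capability X.y runs only on devices of capability X.z with z >= y. The compute capabilities listed are public NVIDIA specs. The `HYPOTHETICAL_TARGETS` list is only an assumption inferred from the comment above, not something read from the actual binary, and the sketch also assumes no PTX is shipped for JIT compilation on newer architectures.

```python
# Compute capabilities (major, minor) of GPUs relevant to this thread.
COMPUTE_CAPABILITY = {
    "T4": (7, 5),         # Turing, 2018
    "RTX 3090": (8, 6),   # Ampere, 2020
    "RTX 4090": (8, 9),   # Ada Lovelace, 2022
    "RTX 5090": (12, 0),  # Blackwell, 2025
}


def cubin_runs_on(target, device):
    """True if a cubin built for `target` can execute on `device`.

    Rule: same major version, and the device's minor version is at least
    the target's minor version.
    """
    return target[0] == device[0] and device[1] >= target[1]


def supported_gpus(compiled_targets):
    """Names of the GPUs above that at least one compiled target covers."""
    return [
        name
        for name, cc in COMPUTE_CAPABILITY.items()
        if any(cubin_runs_on(t, cc) for t in compiled_targets)
    ]


# ASSUMPTION: illustrative targets guessed from the comment (T4 plus 2025 cards).
HYPOTHETICAL_TARGETS = [(7, 5), (12, 0)]
```

Under that assumption, `supported_gpus(HYPOTHETICAL_TARGETS)` leaves out the 3090 and 4090. Adding an `(8, 0)` or `(8, 6)` target to the list would cover the 3090, but that requires rebuilding the kernel from source.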
jauntywundrkind a day ago
There's also the excellent, also open-source unmute.sh, which alas is also Nvidia-only at this point. https://unmute.sh/
- vikboyechko a day ago
  The game show is pretty good. Have a feeling this project will consume all my attention this week, thanks for the tip.