Andrew Seagraves, VP of Research at Deepgram | AIMinds

59 views

In this episode, our very own Andrew Seagraves, VP of Research at Deepgram, explores text-to-speech (TTS) technology and language modeling. With a PhD from MIT and a background in AI-driven explosive design, Andrew now leads advanced speech recognition research. He discusses the challenges of creating natural-sounding TTS systems, the role of context conditioning, and his career journey from MIT to Deepgram.
Episode Highlights:
- Andrew Seagraves shares his insights on why language modeling poses such a complex challenge, particularly in the domain of text-to-speech systems.
- Seagraves discusses how future developments promise to address these challenges.
- From his initial steps at Deepgram working on speech recognition and diarization to his current focus on scaling models for varied languages and contexts, discover Andrew Seagraves' transformative journey in AI.
- Andrew’s fascinating career trajectory, from designing defense technologies at MIT to spearheading voice technology innovations used by global leaders like Spotify and NASA.
- Demetrios and Seagraves express excitement for the near future of TTS technology, hinting at groundbreaking features that will redefine our interaction with digital devices.
-------------------------------------------------------------
Connect with Andrew Seagraves:
/ seagravesan
Connect with Demetrios:
/ dpbrinkm
Connect with Deepgram:
deepgram.com/
/ deepgram
x.com/deepgramai

Comments: 5

@lets-talk-ai 17 days ago

Loved it!

@mgevirtz 20 days ago

Best AI show I've seen to date. The STT as an information discarder is a great observation.

@lets-talk-ai 17 days ago

Trueee

@mgevirtz 20 days ago

How will the asymmetry between languages in training data affect AI and its business uses in the 5-10 year time scale?

@scott_stephenson 19 days ago

The biggest change will be the ability to generate expressive data in any language (via audio generation, the most constrained version of that being TTS), as a way to produce massive scale datasets for any language.