Views: 39,322
In this episode of 'Deep Learning with PolyAI,' we welcome Shawn Wen, co-founder and CTO at PolyAI. Shawn provides an in-depth overview of the AI tech stack needed to build high-quality AI voice assistants. Inspired by Andreessen Horowitz's recent publication on AI voice agents, the discussion covers the key components of this complex system: speech recognition, voice activity detection, the application of generative AI models, and the integration of these technologies into practical applications. Shawn also explores the challenges of managing latency, how the input affects which speech recognition models are selected, and the future of end-to-end AI systems. Join us as we unravel the complexities behind creating and optimizing effective voice AI solutions!
00:18 Understanding the AI Tech Stack
00:49 Building and Buying Voice Assistants
02:10 Speech Recognition Challenges
07:29 Voice Activity Detection (VAD)
10:31 Generative AI and Guardrails
15:58 Tooling and Function Calls
22:39 Future of End-to-End Models
#ai #voiceai #texttospeech #asr #aitechnology #deeplearning