0:38 Regarding the mention of DistilBERT and how a chunk got omitted because of it: will there potentially be a way to input new words for zero-shot fine-tuning?

2:25 I noticed that the model decided to omit the word "like". This is very good for voice agents, but there are some use cases where it's necessary to include certain misspoken words (needed for an accurate transcript or additional nuance), so I suggest a threshold slider, perhaps?

Will it ever be able to focus on one speaker if another speaker talks over the first one? Or can it account for the second speaker at the same time? Or will it always assume a single speaker?
@Linguflex 4 days ago
Good observations. At 0:38 it seems the sentence got detected too quickly. It needs some deeper analysis of those edge cases to figure out why the algorithm cuts into the sentence. You can fine-tune Whisper to handle new words and then use the fine-tuned model directly with RealtimeSTT: huggingface.co/blog/fine-tune-whisper

For 2:25, this looks like a Whisper quirk. Whisper tends to remove filler words like "ah" and repetitions. It might improve if I tweak the initial_prompt behavior to use it only for the real-time model while removing it from the final transcription. Whisper tends to lose some accuracy when prompted, but in this case the prompt is necessary for real-time processing. Unfortunately, there's no simple parameter to make Whisper more accurate with filler words. If they're critical, you'd need to switch to a different ASR model; some deliver precise transcriptions even of filler words, though that brings its own set of pros and cons.

Regarding speaker handling, Whisper doesn't support speaker diarization natively; it always assumes a single speaker. Real-time speaker diarization is a whole other rabbit hole. Not impossible, but very complex to pull off.
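If you want to try a fine-tuned model, something like this should work. It's a rough, untested sketch: the checkpoint path is a placeholder, and since RealtimeSTT runs on faster-whisper under the hood, the fine-tuned checkpoint needs to be converted to CTranslate2 format first (e.g. with ct2-transformers-converter).

from RealtimeSTT import AudioToTextRecorder

# Placeholder path: your fine-tuned Whisper checkpoint, already
# converted to CTranslate2 format for faster-whisper.
FINETUNED_MODEL = "path/to/whisper-small-finetuned-ct2"

recorder = AudioToTextRecorder(
    model=FINETUNED_MODEL,           # accurate model for the final transcription
    realtime_model_type="tiny",      # smaller stock model for the live preview
    enable_realtime_transcription=True,
    language="en",
)

def handle_final_text(text):
    print("Final transcript:", text)

# Blocks until a full sentence is detected, then invokes the callback.
while True:
    recorder.text(handle_final_text)

The idea is the same split I use anyway: a small, fast model for the real-time preview and the larger (here: fine-tuned) model for the final pass.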
@jayr7741 2 days ago
Does it support multiple languages?
@Linguflex 2 days ago
Nope, the dataset I used to train the classification model only has English sentences.