Transcription and speaker identification with OpenAI Whisper and Pyannote [Python]

  16,439 views

Mastering Python


Hello guys, in this video I will show you how to transcribe audio and identify the speakers using OpenAI Whisper, Pyannote and Pydub.
For Pyannote, you must register on the Hugging Face website to get an access token.
Support me by subscribing to my channel and leaving a like.
GitHub repository for the source code:
github.com/Mas...
OpenAI Whisper GitHub link:
github.com/ope...
Pyannote GitHub link:
github.com/pya...
Pydub GitHub link:
github.com/jia...
#openai
#openai_whisper
#pyannote
#pydub
#python
#speaker_identification
#transcription
#diarization

Comments: 38
@bhuvneshsaini93 2 months ago
Please provide a requirements.txt; otherwise it is really hard to get this working.
@Yacine_zaki_abderrazzak 1 year ago
Thanks man, you deserve the best
@hrishikeshnamboothiri.v.n2195 1 year ago
Try to include its requirements.txt as well... Thanks
@userrjlyj5760g 10 months ago
Masha'Allah, God bless you, brother Mohamed... Thank you.
@chungrandy780 7 months ago
Is there a Colab version?
@ryanschwartz3340 1 year ago
Nice video. Is the repo hard-coded to your directory structure? When I tried to change it, it said the format wasn't recognized.
@masteringpython 1 year ago
Do you mean the segment file?
@sakibzaman7719 4 days ago
Does it work with any other language?
@Hirotodoroki 1 year ago
I'm trying to run this but getting "File contains data in an unknown format." I tried several files, including a WAV file, but no luck.
@masteringpython 1 year ago
I advise you to use Anaconda to create a Python development environment. Then install OpenAI Whisper and run a simple test to check that everything works correctly. After that, install the pyannote library and run a simple test for it too. Read the installation guides carefully; you may have missed a step while installing a library.
@nadeembaig5943 4 months ago
@Hirotodoroki were you able to resolve the error ("File contains data in an unknown format")?
@ThePikkutyyppi 1 year ago
Can I use this program to split the speakers into their own files, or is it only for transcription?
@masteringpython 1 year ago
Read more about pyannote to see how to split speakers.
@ThePikkutyyppi 1 year ago
@masteringpython What? Where?
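One possible way to split speakers into their own files, sketched under assumptions rather than taken from the video's repository: it assumes you already have the diarization result as (start_ms, end_ms, speaker) tuples, and it uses Pydub to slice and join the audio. The function names `group_by_speaker` and `export_speakers` and the output file naming are hypothetical.

```python
import os
from collections import defaultdict


def group_by_speaker(segments):
    """Map each speaker label to its list of (start_ms, end_ms) spans."""
    spans = defaultdict(list)
    for start_ms, end_ms, speaker in segments:
        spans[speaker].append((start_ms, end_ms))
    return dict(spans)


def export_speakers(audio_path, segments, out_dir="."):
    """Write one WAV file per speaker and return the paths written."""
    # Imported lazily so group_by_speaker works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(audio_path)
    written = []
    for speaker, spans in group_by_speaker(segments).items():
        track = AudioSegment.empty()
        for start_ms, end_ms in spans:
            # Pydub slices audio by milliseconds.
            track += audio[start_ms:end_ms]
        out_path = os.path.join(out_dir, f"{speaker}.wav")
        track.export(out_path, format="wav")
        written.append(out_path)
    return written
```

Calling `export_speakers("audio.mp3", segments)` would then produce files such as SPEAKER_00.wav and SPEAKER_01.wav, each containing only that speaker's turns joined back to back.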
@EhsanEslahchi 1 year ago
Does this model work on languages other than English?
@masteringpython 1 year ago
Only English
@PaweDuzy 8 months ago
@masteringpython Only English? What if I change model = whisper.load_model("small.en") to "small", according to the Whisper GitHub documentation?
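A small sketch of the model switch asked about above. In Whisper, the names ending in ".en" are the English-only models and the names without the suffix are multilingual; "large" has no English-only variant. The helper names `model_name` and `transcribe` are hypothetical, and whether the rest of the video's code works unchanged with a multilingual model is not confirmed here.

```python
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def model_name(size, english_only=False):
    """Return the Whisper model name for a size, e.g. 'small' or 'small.en'."""
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only model for size {size!r}")
        return f"{size}.en"
    return size


def transcribe(audio_path, size="small", english_only=False, language=None):
    """Transcribe a file; language=None lets Whisper detect the language."""
    # Imported lazily so model_name can be used without whisper installed.
    import whisper

    model = whisper.load_model(model_name(size, english_only))
    result = model.transcribe(audio_path, language=language)
    return result["language"], result["text"]
```

With `english_only=False` the multilingual "small" model is loaded, so `transcribe("audio.mp3")` should report the detected language alongside the text.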
@leoncezammit2502 10 months ago
I'm really struggling to get this working. Would I be able to send you my output log?
@enriqueleonmacias249 1 year ago
Wow, the transcription takes about twice the duration of the file to process. I guess this solution wouldn't work for monitoring hours of call recordings unless you use GPU servers.
@masteringpython 1 year ago
It is recommended to use CUDA (an NVIDIA GPU) for speed; the CPU is very slow.
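A hedged sketch of how the GPU recommendation could be applied to Whisper: pick "cuda" when PyTorch reports a usable NVIDIA GPU and fall back to "cpu" otherwise. The helper names `pick_device` and `load_whisper` are hypothetical and not part of the video's code.

```python
def pick_device():
    """Return 'cuda' if PyTorch can see an NVIDIA GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_whisper(name="small.en"):
    """Load a Whisper model on the fastest available device."""
    # Imported lazily so pick_device works without whisper installed.
    import whisper

    return whisper.load_model(name, device=pick_device())
```

This only moves the Whisper model; placing the pyannote pipeline on the GPU is a separate step described in the pyannote documentation.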
@ghulamshabbir9532 1 year ago
Does this work offline?
@masteringpython 1 year ago
Yes
@kmillanr 4 months ago
No code in the video
@lawrencemedina5593 1 year ago
"conda activate open_chatting" does not work on my computer: "EnvironmentNameNotFound: Could not find conda environment: open_chatting You can list all discoverable environments with `conda info --envs`."
@masteringpython 1 year ago
Install conda, then create an environment called open_chatting by typing: conda create --name open_chatting. After that, install the libraries that I mentioned in the video, then run the code.
@JasminePlows-r4y 1 year ago
Thanks for the demo. I am getting the following error, even while using your audio.mp3 file:
    end = int(millisec(j[3]))
    return (int)((int(spl[0]) * 60 * 60 + int(spl[1]) * 60 + float(spl[2])) * 1000)
    ValueError: invalid literal for int() with base 10: ''
@JasminePlows-r4y 1 year ago
@mamido mami Yes, I did that, still getting the same error
@auflute 1 year ago
Same problem
@lunarl-l1k 1 year ago
Same problem
@jbatista2008 1 year ago
From the error message and the code, it seems that the error is happening because the millisec function is trying to convert an empty string to an integer. The millisec function splits a time string, given in the format "hh:mm:ss.sss", into hours, minutes, and seconds, and then converts these components to milliseconds. Here is an example of the string being parsed:
    ['[', '00:00:00.998', '-->', '', '00:00:20.622]', 'G', 'SPEAKER_01']
When this loop runs, it reads an empty 'end' string:
    for l in range(len(k)):
        j = k[l].split(" ")
        start = int(millisec(j[1]))
        end = int(millisec(j[3]))
The array position you want for 'end' is 4, not 3. Plus, it has a ']' symbol, so it must be cleaned up:
    for l in range(len(k)):
        j = k[l].split(" ")
        start = int(millisec(j[1].rstrip(']')))  # remove trailing ']'
        end = int(millisec(j[4].rstrip(']')))  # remove trailing ']'
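An alternative sketch, not the repository's code: parsing each diarization line with a regular expression avoids depending on list positions at all, so a doubled space after "-->" cannot shift the index. It assumes lines shaped like the example above ("[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01"); the names `SEGMENT_RE`, `to_millisec` and `parse_segment` are hypothetical.

```python
import re

# Two "hh:mm:ss.sss" timestamps inside brackets, separated by "-->",
# followed somewhere later on the line by a SPEAKER_NN label.
SEGMENT_RE = re.compile(
    r"\[\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s*-->"
    r"\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s*\]"
    r".*?(SPEAKER_\d+)"
)


def to_millisec(hours, minutes, seconds):
    """Convert hour/minute/second strings to whole milliseconds."""
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    # round() avoids losing a millisecond to floating-point error.
    return int(round(total * 1000))


def parse_segment(line):
    """Return (start_ms, end_ms, speaker) for one diarization line."""
    match = SEGMENT_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized segment line: {line!r}")
    h1, m1, s1, h2, m2, s2, speaker = match.groups()
    return to_millisec(h1, m1, s1), to_millisec(h2, m2, s2), speaker


print(parse_segment("[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01"))
# prints (998, 20622, 'SPEAKER_01')
```

Unlike the original int(...) truncation, rounding keeps 00:00:00.998 at 998 ms even when the floating-point product comes out slightly below 998.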
@WhiteShark010 4 months ago
You have chance.
@patoyrigoyen 1 year ago
Does this need a GPU?
@masteringpython 1 year ago
In this video I did not use a GPU, but if you want to use one, read the pyannote documentation.
@bootneck2222 1 year ago
Great video. Thank you. Can the output be displayed on screen whilst it is processing?
@ApparaoMulpuri-d6m 10 months ago
Hi, thanks for the video. I need an approach for implementing this solution on large audio files with a duration of around 3 hours.
@KamilKaczmarekSolutions 10 months ago
Chunks
@KamilKaczmarekSolutions 10 months ago
Split the audio into chunks and save the text from each chunk to its own .txt file. Add logic that checks which chunks are already done, so that if you hit an error and come back later, you don't have to start over and can continue where it left off.
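The chunk-and-resume idea above can be sketched roughly as follows. This is an illustration under assumptions, not code from the video: `transcribe_chunk` stands for whatever function you use to transcribe the audio between two millisecond offsets, and the chunk file naming is made up for the example.

```python
import os


def chunk_bounds(total_ms, chunk_ms):
    """Split [0, total_ms) into consecutive (start_ms, end_ms) chunks."""
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


def transcribe_in_chunks(total_ms, chunk_ms, out_dir, transcribe_chunk):
    """Transcribe chunk by chunk, skipping chunks saved by an earlier run."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    bounds = chunk_bounds(total_ms, chunk_ms)
    for index, (start_ms, end_ms) in enumerate(bounds):
        path = os.path.join(out_dir, f"chunk_{index:04d}.txt")
        paths.append(path)
        if os.path.exists(path):
            continue  # already transcribed; resume from the next chunk
        text = transcribe_chunk(start_ms, end_ms)
        # Write to a temporary name first so a crash never leaves a
        # half-written file that would be mistaken for a finished chunk.
        tmp_path = path + ".part"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)
    return paths
```

For a 3-hour recording, `total_ms` would be 10,800,000; with 10-minute chunks (`chunk_ms=600_000`) that gives 18 text files, and rerunning after a failure only processes the chunks whose files are missing.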