Real-time Speech to Text with DeepSpeech - Getting Started on Windows and Transcribe Microphone Free

Views: 135,808

Federico Terzi

Comments: 290

@FedericoTerzi 3 years ago

If you are interested in these topics, you can also follow me on Twitter :) twitter.com/terzi_federico

@jeongwonkim247 3 years ago

Was there a video on how to transcribe audio files into text? Please let me know, and thank you!

@dayworkhard 4 years ago

Thank you for sharing. I donated my voice there. This is so cool!

@FedericoTerzi 4 years ago

That's great! :) We are one little step closer to an open voice model

@Moe_Posting_Chad 3 months ago

*WHY THE FUCK IS PYTHON CONSTANT UPDATING BUT ALL OF THE GODDAMN TIME EVERY PROJECT IS INCOMPATIBLE WITH EVERYTHING BUT A BESPOKE PYTHON VERSION???????* Serious question. Why keep fucking updating it? Why do it if its going to cause incompatibility at all ever? Fucking I know its a scripting language and not "programming" but that really doesn't explain the why.

@tommyboy3164 2 years ago

I was wondering if you could help. I'm getting this error: ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none). Also, where do you put the two model files after you download them?

@KPawan108 1 year ago

I am also getting the same error. Did you get the answer now?

@marly1017 3 years ago

Can you please do a video about implementing this code in a project?

@dhanushabuddhikasandaruwan2677 2 years ago

I think DeepSpeech is not compatible with Windows (Windows 10). Try a Linux environment. Neither "pip install deepspeech" nor "pip3 install deepspeech" worked on Windows 10: ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none) ERROR: No matching distribution found for deepspeech

@Steven-jf4cs 2 years ago

I just ran 'pip3 install deepspeech' from cmd and it ran like a champ. What version of Python are you running? Depending upon your version you may need to install up/down accordingly. I'm running Python V. 3.8.5
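
A minimal sketch of the version check this reply points at. It assumes that the last deepspeech release (0.9.3) ships wheels for CPython 3.5 through 3.9 only; that range is an assumption consistent with the reports in this thread (3.8 works, 3.10 does not), so check the file list on PyPI before relying on it. The function name and constants are hypothetical.

```python
import sys

# Assumption (not confirmed in this thread): deepspeech 0.9.3 ships wheels
# for CPython 3.5 through 3.9 only. Check the file list on PyPI to be sure.
ASSUMED_MIN = (3, 5)
ASSUMED_MAX = (3, 9)


def wheel_likely_available(version=None, lowest=ASSUMED_MIN, highest=ASSUMED_MAX):
    """Return True if the interpreter version falls in the assumed wheel range."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return lowest <= major_minor <= highest


if __name__ == "__main__":
    if wheel_likely_available():
        print("pip should be able to find a deepspeech wheel for this Python")
    else:
        print("pip will probably report 'from versions: none' on this Python")
```

If the check fails, creating the virtual environment from an older interpreter is usually simpler than trying to force the install.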

@izufarahiyahizzuddin2119 1 year ago

I already ran the code, but it cannot recognize my voice. Does anyone have a solution for it?

@KuboF 3 years ago

Thanks for this short, straightforward, to-the-point video! After reading the manual I thought I would need to take a vacation just to learn how to run DeepSpeech. Now I am very confident about doing it quite quickly!

@FedericoTerzi 3 years ago

Thanks! Running it is pretty easy with the prebuilt model. Things start to get real complex when you want to train your own :)

@KuboF 3 years ago

@FedericoTerzi Yeah, using a pre-built model is my first step towards training my own 😅

@FedericoTerzi 3 years ago

Good luck! If you succeed, please let me know how hard it was :)

@KuboF 3 years ago

@FedericoTerzi I very much hope I can one day 😅

@PatiPatataFra 3 years ago

Soo have you finished? :))))

@sebastianochipocomancini1853 3 years ago

What should I do if I want to use an application like this one for another language, like Spanish?

@stefang5639 3 years ago

You can download the language model for other languages as well from the source shown in the video.

@vasanthmaisa293 11 months ago

How did you get the mic_vad_streaming folder inside the deepspeech folder directly, without doing anything?

@abdullamasud4278 7 months ago

He cut that part out of the video. After downloading the file, he simply copy-pasted it into the folder.

@aznperswazinable 2 years ago

(deepspeech) C:\Users\user\Documents\deepspeech>pip3 install deepspeech ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none) ERROR: No matching distribution found for deepspeech. Neither pip nor pip3 works on Python 3.10. Any ideas?

@silversurfer8057 3 years ago

Really helpful for me (I think your video is the only one on the subject?). In addition to this, a tutorial on Mozilla's TTS would be great; I would like something more detailed for that. I currently don't understand how to use new datasets to get other voices. I guess you have to train a model with a dataset. A tutorial on this would be really cool! Maybe you have also dealt with it? In any case, DeepSpeech and TTS can theoretically be combined well.

@shampoo888 2 years ago

File "C:\Users\jose\deepspeech\lib\site-packages\deepspeech\__init__.py", line 23, in from deepspeech.impl import Version as version File "C:\Users\jose\deepspeech\lib\site-packages\deepspeech\impl.py", line 13, in from . import _impl ImportError: DLL load failed: The specified module could not be found.

@niharjani9611 4 months ago

Hey, please answer my question: how many languages does it support? English, Spanish... could you provide a list? I tried to find one on GitHub and Reddit, but was unsuccessful!

@chaitanyamalpure6226 3 years ago

Thank you for the video. Nice tutorial to get familiar with! Also, I have found a German pre-trained model. Could you please explain how to work with a German or any other pre-trained model?

@FedericoTerzi 3 years ago

You should be able to simply pass the German model and scorer and you should be ready to go :)

@chaitanyamalpure6226 3 years ago

@FedericoTerzi Thanks a lot. It worked!
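
A short sketch of what "passing the model and scorer" can look like when launching the example script from Python. The -m, -s and -f options are how I recall the mic_vad_streaming.py script from DeepSpeech-examples; confirm them with --help. The file names are hypothetical placeholders and the helper name is made up for illustration.

```python
import sys


def build_command(model, scorer=None, wav_file=None, script="mic_vad_streaming.py"):
    """Build the argument list for the example script with a chosen model."""
    cmd = [sys.executable, script, "-m", model]
    if scorer:
        cmd += ["-s", scorer]      # language-specific scorer, optional
    if wav_file:
        cmd += ["-f", wav_file]    # read from a WAV file instead of the mic
    return cmd


# Hypothetical file names for a German model; use whatever you downloaded.
command = build_command("german.pbmm", scorer="german.scorer")
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) from the folder that contains the script.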

@SivaShankarsss 4 years ago

I was looking for this kind of video. I am currently working on creating an AI assistant. This will help me a lot.

@LukeHildreth 3 years ago

Is it possible to write these commands into a python file and just run that?

@FedericoTerzi 3 years ago

Sure! You can simply edit that script file to fit your needs :)
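
For anyone who would rather start from a small Python file than edit the example script, here is a minimal sketch using the deepspeech Python package (the 0.7+ API: Model, enableExternalScorer, sampleRate, stt). It transcribes a WAV file rather than the microphone, assumes deepspeech and numpy are installed, and the function name is hypothetical.

```python
import wave


def transcribe_wav(model_path, wav_path, scorer_path=None):
    """Transcribe a WAV file with the deepspeech Python package (0.7+ API)."""
    # Imported here so this file can be loaded even where the packages are missing.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path:
        model.enableExternalScorer(scorer_path)

    with wave.open(wav_path, "rb") as wav:
        if wav.getframerate() != model.sampleRate():
            raise ValueError("WAV sample rate does not match the model")
        frames = wav.readframes(wav.getnframes())

    # The model takes 16-bit signed samples as a NumPy array.
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

Calling transcribe_wav with the paths of the downloaded .pbmm and .scorer files and a recording should return the recognized text as a string.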

@boomieboo 3 months ago

Fk all this. I just want a speech transcribing ext to download for FF. Simple.

@DOKOTV 4 years ago

Is this only for the English language?

@FedericoTerzi 4 years ago

You can search online for other pre-trained models, try to google "Deepspeech model"

@adribmahmud 2 years ago

Can you please make a video on how to train a model?

@rosarangithalagahawatta6300 2 years ago

How can I download mic_vad_streaming?

@watevakid 4 years ago

hmmm after I install DeepSpeech into my venv, I do not see "mic_vad_streaming"... any idea on how to install it?

@FedericoTerzi 4 years ago

You have to download it from the deepspeech examples: github.com/mozilla/DeepSpeech-examples

@khalidelgazzar 4 months ago

Great video. Thank you 😊

@Luc_Skywalker 2 years ago

ERROR: Cannot install deepspeech==0.9.3 and numpy>=1.15.1 because these package versions have conflicting dependencies. deepspeech 0.9.3 depends on numpy=1.12.0. I am unable to get around this. Any idea?

@tobiaskarl4939 3 years ago

1) Python 3.6.5 doesn't work; I updated to 3.6.7. 2) activate gives an error: edit activate.bat in the Scripts folder, put a '.' after "delims=:" in line 4, then execute Scripts\activate.bat explicitly.

@Cezar-on8lb 8 months ago

Hello! How does DeepSpeech compare with OpenAI Whisper?

@FedericoTerzi 7 months ago

No reason not to use Whisper today! It's amazing

@shampoo888 2 years ago

Help: ImportError: DLL load failed: The specified module could not be found.

@wellingtonfurtado2074 3 years ago

Can you do a tutorial on how to use DeepSpeech in Unreal Engine?

@CULTURE_dz 4 years ago

Hello, I installed everything like you did, but in the end a missing DLL message appears: ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed. Can you help me please? Thanks.

@at-ro9217 4 years ago

same issue here

@FedericoTerzi 4 years ago

Hey guys, try with these steps: github.com/tensorflow/tensorflow/issues/23683#issuecomment-532522740

@vallu-Tech 6 months ago

DeepSpeech no longer installs now.

@ragnov3286 2 years ago

Can you also integrate DeepSpeech into a web app with some API? Thanks.

@FedericoTerzi 2 years ago

If you're using Chrome or Safari, you might want to check out the Web Speech API, which is much simpler for web apps :) developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

@Monsieur.Nobody. 5 months ago

Do you think we can run whisper or fast whisper llm on esp32's? Sort of in a form factor like the carputer or beepberry?

@samiyeelalim 25 days ago

ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none) ERROR: No matching distribution found for deepspeech

@FedericoTerzi 24 days ago

Try to use whisper, it's a much better option nowadays

@arpitv2003 13 days ago

@FedericoTerzi Hi, the tutorials I have found on running Whisper locally take .wav audio files as input. Is there a way to do real-time speech-to-text with Whisper running locally on my device?

@FedericoTerzi 12 days ago

Try this :) github.com/ggerganov/whisper.cpp They have a "stream" example you could start from

@tobiaskarl4939 3 years ago

Different numpy versions requirements make it fail for me. deepspeech 0.9.3 numpy 1.14.4 pip 10.0.1 PyAudio 0.2.11 scipy 1.5.4

@balajicmb1132 2 years ago

Is code available for an open-source speech-to-text transcription library using Python in PyCharm or another IDE, bro?

@egyfirst 2 years ago

Very useful

@muntazirmehdi503 3 years ago

Why do you have background music in an educational video? Weird and disturbing.

@ramsimmha8672 3 years ago

It's really cool! I tried this and it's working, but it's not printing the text it heard. Has anyone here faced this? Please help me fix it.

@doodlearsh739 2 years ago

Hi, I can't install requirements.txt with pip. Can you help me?

@liamblu 3 years ago

I get stuck at installing the requirements.txt ERROR: Could not find a version that satisfies the requirement deepspeech~=0.8.0 ERROR: No matching distribution found for deepspeech~=0.8.0 Edit: I already downgraded to Python 3.9.0 which is said to be compatible...

@abdulbaqi6170 3 years ago

There is an article on the internet about how to make SRT files for movies with DeepSpeech. I can't get that working on Windows. Can you make a video on how to convert audio files into text or SRT with DeepSpeech, please? It would be very useful and would increase your video views.

@robc3863 3 years ago

Many users will need to install pipwin and pyaudio, etc. to get this to work by the way.

@sauravprashar 3 years ago

Mine is still giving me a DLL error

@nasocha1494 2 years ago

I wish I had read your comment before watching this tutorial, It would have saved me 3 days trying to fix conflicting dependencies. Thank u so much!

@dibu28 2 years ago

Thank you. Started DeepSpeech in minutes.

@ahmedsaeed5149 2 years ago

Thank you thank you thank you

@yohannesayana9456 2 years ago

How can we build a speech to text model from scratch in other less resourced languages using deepspeech?

@sibyllasystem1209 1 year ago

Hope we can use it in the Windows environment so that I can study foreign languages easily someday : )

@ALZlper 3 years ago

I really like that you mention the platform at the end!

@sayyidumarshiddiq2397 2 years ago

What should I do if my laptop has Python 3.8 installed?

@ritwikghorui2731 3 years ago

Thank you so much, but if anyone has done this in a python file kindly please share the link. I'm facing some problems kindly please if anyone has done please provide the link. I have a deadline coming up, please help me.

@murtazahussain8224 3 years ago

Is DeepSpeech compatible with the Nvidia RTX 3090?

@harshagowda618 3 years ago

Text to speech?

@drin1drin 2 years ago

How can I implement an Italian Recognizer?

@FedericoTerzi 2 years ago

You might prefer Vosk with an Italian model for that :) alphacephei.com/vosk/

@GustavAgar 8 months ago

Is this better than whisper? hmmm

@FedericoTerzi 7 months ago

Not at all, use whisper :)

@amrousimen7170 3 years ago

good video

@Luc_Skywalker 2 years ago

You know, it would have been nice if you had mentioned in your title that this would be in Python!

@fashadahmedsiddique8412 2 years ago

Hey, is this possible using the Colab environment?

@abhignaconscience358 3 years ago

At 5:04 you said you're going to show a nice little project. What is it?

@LukeHildreth 3 years ago

Got this working on windows! thanks for the tut!

@FedericoTerzi 3 years ago

Glad to hear that :)

@sauravprashar 3 years ago

Could you please help me I am getting a DLL error

@LukeHildreth 3 years ago

@sauravprashar I'm actually not sure how to answer that. I'm pretty new to programming. Hope you find the answer!

@waquezemerson4863 2 years ago

Hi, can I ask how I can integrate this into my application? My application currently runs in an Ionic environment. Is it possible to integrate this?

@ariefsaferman 2 years ago

Does the VAD streaming work outside DeepSpeech? I want to use it in another ASR framework.

@husein4458 2 years ago

Could I make an Arabic version?

@FedericoTerzi 2 years ago

Not sure about DeepSpeech, but Vosk (a very good alternative) seems to have an Arabic model :) alphacephei.com/vosk/models

@freegsbox 3 years ago

Awesome!! can it recognize from files too? and how, please?

@FedericoTerzi 3 years ago

If I'm not mistaken, the script used in the video also accepts an argument for WAV files :)

@robc3863 3 years ago

Thanks for the video! Is there any guidance on how to integrate DeepSpeech into an application on Windows? I'm sure that would be very useful for developers! Thanks!

@FedericoTerzi 3 years ago

Hey, if your app is written in Python, the integration would be pretty easy. Otherwise, your best bet is to look at "tensorflow-lite deepspeech", although I don't have any experience with that

@robc3863 3 years ago

@FedericoTerzi Hi, thanks, but our app is C++, and so far we have not found any example of binding DeepSpeech to it. We also don't have many clients with Nvidia GPUs...

@FedericoTerzi 3 years ago

Nvidia GPUs are really not needed (as long as you are not training the model on the client's PC), CPU will handle inferring ok for most use-cases. Regarding the lack of examples, I'm sorry about that, probably the recent Mozilla layoffs did not help the project...

@sslaia 3 years ago

Excellent. It would be great if you could make a tutorial on how to train your own model. The big players have already done that for well-known languages. In contrast, this could help with neglected languages like mine. So a tutorial on how to train your own model in a new language would be very helpful.

@FedericoTerzi 3 years ago

Thank you! Unfortunately, I don't know the model that well...

@Karma-vf2qu 4 years ago

Uuu, really good content here! Grandee

@FedericoTerzi 4 years ago

Thanks :)

@jacobkelley257 3 years ago

So I followed everything you did. I originally started with Python 3.7 and it did eventually run into an error trying to install requirements.txt, so I downgraded to 3.6.8, deleted the folder, and started over. This time I got everything to work, and when I run mic_vad_streaming.py with the downloaded files, it says "listening..." and whenever I speak it says "Recognized: " but nothing after that. It is clearly hearing me, because it only prints "Recognized: " when I say something, but then it doesn't print what I said. Do you have any idea what it might be? I'm a beginner at Python and coding in general, but I tried to troubleshoot by changing line 194 to text = stream_context to see if my words were somehow in that, and it still just says "Recognized: ". Not sure what that means.

@FedericoTerzi 3 years ago

Perhaps it does not hear you loud enough, can you try with another microphone? If I recall correctly, there is a "device" option in the script to specify it
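
To find the index of another microphone, something like the sketch below can help. It assumes PyAudio is installed (the example script depends on it) and uses PyAudio's get_device_count and get_device_info_by_index; the helper names are hypothetical. As I recall it, the script takes the index through a -d/--device option, so confirm that with --help.

```python
def input_devices(device_infos):
    """Keep only devices that can record (at least one input channel)."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_microphones():
    """Ask PyAudio for every device and return the recording-capable ones."""
    import pyaudio  # imported lazily; assumed to be installed

    audio = pyaudio.PyAudio()
    try:
        infos = [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        audio.terminate()
    return input_devices(infos)
```

Printing the result of list_microphones() gives (index, name) pairs to try one at a time.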

@anujsharma-my5ll 3 years ago

Hello, I am a visually impaired person. How can I get a setup file of Mozilla TTS for the screen reader called NVDA? Is it possible?

@FedericoTerzi 3 years ago

Hey, unfortunately, I don't think the deepspeech project is good enough yet for your needs...

@simgplusnervt4698 2 years ago

Nice video. Can you make a video about using it on Android?

@parinaypanwar2027 3 years ago

Bro, I am getting the error: Could not find a version that satisfies the requirement deepspeech

@FedericoTerzi 3 years ago

Make sure you have python 3.6

@mo9204 2 years ago

How much work and time does it need to create a library for new language with its own rules which are not in these libraries?

@FedericoTerzi 2 years ago

A lot of time, effort and computational power :) You might also want to check out Vosk alphacephei.com/vosk/models

@mo9204 2 years ago

@FedericoTerzi Are there tutorials for creating and training your own model?

@at-ro9217 4 years ago

Everything is fine except that it is not working: Traceback (most recent call last): File "C:/Users/Administrator/PycharmProjects/Deepspeech/mic_vad_streaming/mic_vad_streaming.py", line 9, in import deepspeech File "C:\ProgramData\Anaconda3\envs\DeepSpeechEnv\lib\site-packages\deepspeech\__init__.py", line 23, in import deepspeech.impl File "C:\ProgramData\Anaconda3\envs\DeepSpeechEnv\lib\site-packages\deepspeech\impl.py", line 13, in from . import _impl ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed. Process finished with exit code 1

@FedericoTerzi 4 years ago

Hey, are you sure you are using Python 3.6?

@at-ro9217 4 years ago

@FedericoTerzi Yes. Maybe you could make a video about how you set up your environment from bare metal?

@DaeOh 1 year ago

Thanks. I can't find the follow-up video though

@DaeOh 1 year ago

Nevermind, I used Whisper for this application!

@imsteven3044 3 years ago

What is the function of the scorer?

@soulkingdom4600 3 years ago

What is the difference between Deep Speech and Deep Speech 2?

@Dumpitzz 4 years ago

"Scripts\activate" is not working. I get an error: "parameter wrong -850"

@sebastianochipocomancini1853 3 years ago

Hi! You are using a pre-trained model for this speech-to-text application. But what if you want to train the model with another dataset, for example in Spanish or Italian? What steps would you take to train the model to recognize speech in a language other than English?

@ThesongsIlikeThemost 3 years ago

Hi, you can find already-trained models for Spanish, Italian, German, Polish, and French here: gitlab.com/Jaco-Assistant/deepspeech-polyglot

@sebastianochipocomancini1853 3 years ago

@ThesongsIlikeThemost Thank you so much, I finally found the Spanish model here: drive.google.com/drive/folders/1-3UgQBtzEf8QcH2qc8TJHkUqCBp5BBmO (a link that was on the page you sent me). After replacing the .pbmm and .scorer files in the command line, it works fine for Spanish!

@roshanjustin2426 3 years ago

Does it work offline?

@FedericoTerzi 3 years ago

Yes

@istiyakahamedmilon6512 3 years ago

Can I use it to generate Bengali language?

@explorefoodculture 2 years ago

Hi Terzi, can this software run on Mac? And can it translate movie videos into any language? Thanks in advance!

@potpu 3 years ago

Hi Federico, thank you for your video. Do you know how to integrate DeepSpeech into Talon?

@lemon3335 1 year ago

How to integrate into UE4

@SivaShankarsss 3 years ago

How to train with an Indian accent?

@sisfabricio 4 months ago

Works on Windows after struggling for a while, many thanks

@shravanhegde2237 3 months ago

What struggles were they? Could you please tell me? I need to set it up for my project, so it would be really helpful.

@byeblogsinc.7655 3 years ago

very interesting, could you teach me :)

@samuelige9368 3 years ago

Can you use deepspeech for a diacritic system

@esakkisundar 3 years ago

@Federico Terzi, I'm from India. It is not recognizing an Indian English accent. Any thoughts on how to get DeepSpeech to recognize it?

@FedericoTerzi 3 years ago

Unfortunately, there's not much we can do about it. That model was trained on American English, so it struggles with other accents

@purushothaman2783 3 years ago

python api of deepspeech

@PatiPatataFra 3 years ago

I'm working on a college project and I need to make the speech-to-text in my language. Any idea how to use deepspeech in Romanian? I saw the language is available

@mozes_ma 3 years ago

Hey, similar challenge here, any ideas so far?

@weweweqeqeqe3240 2 years ago

Can this be used for movies?

@bouchradahamni9881 4 years ago

very nice . plz make a video of how you train your own model

@purushothaman2783 3 years ago

please put how to use as python api

@jaun-pierrevermeulen6978 3 years ago

it is not allowing me to download deepspeech through pip

@FedericoTerzi 3 years ago

Sorry to hear that! Make sure you are running python 3.6

@droidsons1371 3 years ago

Nice tutorial! I have a custom-trained language model with a .model extension. How do I convert it into a .scorer file?

@FedericoTerzi 3 years ago

Thanks! They are two different things, you can't convert one into the other :)

@mouradtoumi7296 3 years ago

I have no skills in Python, I'm trying to read from wav file instead of mic and display metadata, I tried -f arg but didn't work :( any help ?

@tamgaming9861 3 years ago

I haven't got it to work because I can't install Python 3.6; my Python is already higher. But from what I read, you need a specific WAV format. If I remember correctly it was 8-bit, mono, and 16 kHz, but I'm not sure. MP3 does not work so far. There are some online tools that can convert from MP3 to WAV. Hope it helps.
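
On the WAV format: the released English models expect 16 kHz, mono, 16-bit PCM audio (16-bit rather than 8-bit). Below is a small sketch that checks a file against that format using only the standard library; the function name is hypothetical, and the 16000 default is an assumption that should be changed if your model reports a different sample rate.

```python
import wave


def wav_problems(path, rate=16000):
    """Return a list of reasons why the file does not match the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append("expected mono, found %d channels" % wav.getnchannels())
        if wav.getsampwidth() != 2:
            problems.append("expected 16-bit samples, found %d-bit" % (8 * wav.getsampwidth()))
        if wav.getframerate() != rate:
            problems.append("expected %d Hz, found %d Hz" % (rate, wav.getframerate()))
    return problems
```

An empty list means the file can be passed to the script as it is; otherwise convert it first with a tool such as sox or ffmpeg.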

@LukeHildreth 3 years ago

dumb question, will this work on macOS?

@FedericoTerzi 3 years ago

As long as you have the required libraries, it should :) (though I did not try it myself)

@LukeHildreth 3 years ago

Federico Terzi sweet, I think I’m close to getting it working. I was getting an error with some syntax inside one of the voice library training files.

@TTTrouble 3 years ago

Thanks so much for making this video, it was exactly what I was looking for!

@aman-hl9re 3 years ago

What about other languages? Does it generate text for them too?

@FedericoTerzi 3 years ago

Yes, it can also do other languages, but you have to find the right model. A good starting point is googling "deep speech spanish model" (replacing "spanish" with your language of choice) :)