Build your own real-time voice command recognition model with TensorFlow

60,806 views

AssemblyAI


1 day ago

Comments: 45
@donahue1187 · 1 year ago
This is fantastic. I'm a newbie to Python and neural nets, but your explanations are great and pretty straightforward. Question: what additional steps would I take to run this on my own local device (Pi 4)? And what else would I need to do to introduce new commands, such as a trigger word and "turn off the lights"? Would I need to create my own audio samples, save them to new folders, and retrain the model? Thanks for any guidance! (If you couldn't tell, I'm DONE with Google Home latency and am recreating my own. Ambitious! Need help!)
@cornpop3340 · 1 year ago
This is an incredibly helpful video.
@nguyent3465 · 2 years ago
The code on the TensorFlow website was changed :(
@seanadin386 · 1 year ago
Can you do a video covering the newer version? The run interface now has different code.
@tvartalk · 8 months ago
😊
@gokhanersoz5239 · 2 years ago
Thank you very much for the trainings. But I think there should be a more complex and advanced training series on voice recognition, voice classification, and similar topics, if you see fit. You know, trainings on audio are limited.
@Cyka_Blyatus · 1 year ago
What did you do so the program doesn't pick up ambient noise and actually works with the given commands? It seems the model lacks an ambient-noise dataset, and whenever it runs it just keeps spamming the first command, but yours works perfectly. How do you achieve this?
@obi666 · 3 months ago
I have the same issue: files from the dataset work perfectly, but when I try to use my mic like in the video, or record audio with my mic using PyAudio, it gives me the first class every time.
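A likely cause (my assumption, not confirmed in the video): the Mini Speech Commands dataset has no silence/background-noise class, so every microphone buffer, however quiet, gets mapped to some command. A minimal sketch of an energy gate that skips prediction on quiet buffers (the 0.01 threshold is a guess and depends on your microphone):

    import numpy as np

    def rms(signal: np.ndarray) -> float:
        """Root-mean-square energy of a mono audio buffer in [-1, 1]."""
        return float(np.sqrt(np.mean(np.square(signal, dtype=np.float64))))

    def is_speech(signal: np.ndarray, threshold: float = 0.01) -> bool:
        """Energy gate: only treat the buffer as speech above the threshold."""
        return rms(signal) > threshold

    # Silence stays below the gate; a 440 Hz tone at half amplitude passes it.
    fs = 16000
    silence = np.zeros(fs, dtype=np.float32)
    tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs).astype(np.float32)
    print(is_speech(silence))  # False
    print(is_speech(tone))     # True

Only buffers that pass the gate would then be handed to the model for classification.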
@erickd4816 · 1 year ago
Good video, excellent explanation. I have a question: can the same program be trained to recognize only a specific voice? If so, could you explain how? I would be very grateful.
@clumsycoder1907 · 1 year ago
It's not working for me.
@geekyprogrammer4831 · 2 years ago
Can you please post a video on building text-to-speech models from scratch?
@MrIlvis · 1 year ago
Which TensorFlow version was this made with? Colab uses the latest, but an older one should work without problems.
@obi666 · 3 months ago
Colab runs a Unix-like OS, so you can use commands like pip install, apt-get install, etc.
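For instance, commands of this sort (assuming a standard, Ubuntu-based Colab runtime; the package names are just examples, and inside a notebook cell each line is prefixed with "!" to hand it to the shell):

    # Install a system library with the OS package manager:
    apt-get install -y libportaudio2
    # Install a Python package into the notebook's environment:
    pip install sounddevice
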
@loydvincentbutron4345 · 10 months ago
Is it for English voices only?
@swasthikk3655 · 1 year ago
Can I get something similar for the English alphabet?
@danielbogemann1598 · 1 year ago
They changed the code. Could you do a quick update?
@TheSaukkio · 1 year ago
How can it be that in the video it outputs nothing when nobody is speaking, while if I run the code from GitHub it predicts random stuff when I'm not speaking?
@obi666 · 3 months ago
I have the same problem: it works perfectly with audio files from the dataset, but not when I give it input from my mic.
@obi666 · 3 months ago
I've managed to fix that problem. Here are the fragments of my code:

    import os
    from typing import Union

    import numpy as np
    import sounddevice as sd
    import tensorflow as tf
    from scipy.io.wavfile import write

    @staticmethod
    def record_audio(filename: Union[str, None] = None,
                     duration: int = Config.DURATION,
                     fs: int = Config.INPUT_LEN) -> np.array:
        print(f"Record sound for {duration} seconds...")
        audio = sd.rec(int(duration * fs), samplerate=fs, channels=1, dtype=np.float32)
        sd.wait()
        # Convert audio to 16-bit PCM format (in the range [-32768, 32767])
        audio_pcm = np.int16(audio * 32767)
        if filename:
            write(filename, fs, audio_pcm)
            return np.array([])
        return audio_pcm

    def preprocess_mic_data(self, waveform: np.array) -> EagerTensor:
        waveform = tf.squeeze(waveform, axis=-1)
        spectrogram = self.get_spectrogram(waveform=waveform)
        normalized_spectrogram = tf.expand_dims(spectrogram, axis=0)
        return normalized_spectrogram

    def get_prediction(self, audio_data: Union[str, os.PathLike, np.array]) -> str:
        if isinstance(audio_data, (str, os.PathLike)):
            normalized_spectrogram = self.preprocess_file_data(file_path=audio_data)
        else:
            normalized_spectrogram = self.preprocess_mic_data(waveform=audio_data)
        prediction = self.model(normalized_spectrogram)
        classid = np.argmax(prediction)
        return self.commands[classid]

    @staticmethod
    def calculate_rms(signal: np.ndarray) -> float:
        """Computes the RMS (root mean square) of the signal."""
        return np.sqrt(np.mean(signal ** 2))

    def is_voice_present(self, signal: np.ndarray, rms_threshold: float = 20) -> bool:
        """
        Checks whether the signal contains speech, based on RMS.
        :param signal: NumPy array with the recorded audio.
        :param rms_threshold: RMS threshold below which no speech is assumed.
        :return: True if speech was detected; False otherwise.
        """
        rms_value = self.calculate_rms(signal)
        print(f"RMS value: {rms_value}")
        return rms_value > rms_threshold  # check whether RMS exceeds the threshold

    while True:
        res = predictor.record_audio()
        print(res.shape)
        voice = predictor.is_voice_present(signal=res)
        if voice:
            print(predictor.get_prediction(audio_data=res))

It's not perfect, but at least it works.
@TheSaukkio · 3 months ago
@@obi666 Can you send me your code somehow 🤔 or contact me so I can get it 🤣🤣
1 month ago
@@obi666 Do you still have that solution? Because I see you're from Poland.
@obi666 · 1 month ago
I replied, not sure if it tagged you.
@oxydol3456 · 11 months ago
This tutorial is great. I find that the key to building an accurate model is gathering a lot of quality data, and that sounds like arduous work; I didn't get good results with 200 examples. Edit: I found the model's accuracy is way poorer than I expected. Maybe it's due to the microphone I'm using, and that needs to be taken care of before the prediction step.
@tankado_ndakota · 8 months ago
Got the error: "Could not import the PyAudio C module 'pyaudio._portaudio'." And couldn't find the solution... MacBook M1 Pro.
@tankado_ndakota · 8 months ago
I saw a note for M1 in another video :) let me try that first :D
@tankado_ndakota · 8 months ago
I did everything I found on the web, but I still get the error: "symbol not found in flat namespace '_PaMacCore_SetupChannelMap'"
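A workaround that is often suggested for Apple Silicon (an assumption on my part, not verified in this thread) is to install the native PortAudio library with Homebrew and rebuild PyAudio against it:

    # Assumes Homebrew is installed; portaudio provides the missing native symbols
    brew install portaudio
    # Point the compiler and linker at Homebrew's copy, then rebuild PyAudio from source
    export CFLAGS="-I$(brew --prefix portaudio)/include"
    export LDFLAGS="-L$(brew --prefix portaudio)/lib"
    pip install --no-cache-dir --force-reinstall pyaudio
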
@sanjeetjha9177 · 1 year ago
Please provide me the model, I need it urgently; I am stuck on it.
@rediet.f261 · 2 years ago
What is sample_file here? 8:38
@clumsycoder1907 · 1 year ago
same doubt
@arqamrafay · 1 year ago
Exactly, I think there is a file of recorded audio.
@LukasKofler · 1 year ago
See the first line at 5:38 🙂
@itsrairamones · 2 years ago
Thank you dude, it works a hundred percent for me, but after a couple of minutes it crashed :(
@Yvtq8K3n · 1 year ago
It's a shame you can't train your own model.
@threepe0 · 1 year ago
Of course you can.
@Yvtq8K3n · 1 year ago
@@threepe0 The last time I used this, you were unable to create a custom model and use it. TensorFlow provided you with an already trained model (0-1, left, right), and that's exactly what most people use.