Train your own speech commands model in 10 minutes, and run it on ESP32 with MicroPython

Views: 13,371

3 years ago

This video shows how to train a deep learning speech command recognition model on your own words and run it on an ESP32-powered M5StickC with MicroPython to control a light switch. Model training and Python coding are done in the Tinkerdoodle online environment, with zero installation. It does not use any cloud speech service such as Alexa or Google Assistant. Access the tutorial at www.tinkerdoodle.cc/user/junf....

Comments: 86

@alzalame 3 years ago

Perfect work, well done.

@altitude1039 2 years ago

OMG This is GREAT! :) Thanks for posting

@tinkerdoodlediy 1 year ago

Glad you enjoyed it!

@jomfawad9255 6 months ago

Can you explain how the DFRobot voice recognition module records samples and trains a model on the module itself?

@hummmingbear 3 years ago

Awesome work, thanks for sharing this. Is it possible to record samples longer than 1-second for multiple word commands?

@tinkerdoodlediy 3 years ago

Thanks for your interest. The model is trained to take 1-second audio only.

@dans-designs 1 year ago

@@tinkerdoodlediy Is it possible to combine multiple models? for example, speak 1 of 5 commands from first model, which then opens 2nd model and waits for next 1 of 5 commands from second model?

@tinkerdoodlediy 1 year ago

@@dans-designs it should work if your esp32 has enough memory. Try renaming the second model file as speech_model_2.py, and invoke it as speech_model_2.predict() in the code.
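The chaining idea in this exchange can be sketched in plain Python. This is a minimal sketch, not the author's code: the class name `TwoStageRecognizer`, the `max_wait` timeout, and the assumption that each model's predict function takes one audio window and returns a label string or `None` are all hypothetical. The real `speech_model.predict()` and `speech_model_2.predict()` may take different arguments and return something else, so the two callables are passed in rather than imported.

```python
class TwoStageRecognizer:
    """Chain two classifiers: a label from the first arms the second.

    predict_first / predict_second are hypothetical callables that take one
    audio window and return a label string, or None when nothing is heard.
    """

    def __init__(self, predict_first, predict_second, max_wait=3):
        self.predict_first = predict_first
        self.predict_second = predict_second
        self.max_wait = max_wait   # windows to wait for the second command
        self.pending = None        # label accepted by the first model
        self.waited = 0

    def step(self, audio):
        """Feed one audio window; return (first, second) once both are heard."""
        if self.pending is None:
            label = self.predict_first(audio)
            if label is not None:
                self.pending = label
                self.waited = 0
            return None
        label = self.predict_second(audio)
        if label is not None:
            result = (self.pending, label)
            self.pending = None
            return result
        self.waited += 1
        if self.waited >= self.max_wait:
            self.pending = None    # timed out: go back to the first model
        return None


# Stand-in predictors so the sketch runs without hardware.
def fake_first(audio):
    return audio if audio in ("lights", "fan") else None


def fake_second(audio):
    return audio if audio in ("on", "off") else None


recognizer = TwoStageRecognizer(fake_first, fake_second, max_wait=3)
results = [recognizer.step(w) for w in ["lights", "noise", "on"]]
```

On a device the two callables would wrap the two generated model files, and both models have to fit in memory at the same time, which is the constraint mentioned in the reply.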

@SA-oj3bo 2 years ago

Amazing, so simple and yet so powerful! Great tutorial and achievement! If I want to detect 1 specific sound, for example my own dog barking, what other sounds should I upload to the model then? How to get the best results to detect 1 sound only? Thanks!

@tinkerdoodlediy 2 years ago

You can give it a try. But since the base model was trained on speech commands, it is probably not a good feature extractor for non-speech sounds. You can start with the dog bark as the positive sample and any other sounds as negative samples. If that does not yield good results, I would suggest training a model on an audio data set like research.google.com/audioset/. You'll need to use TensorFlow.

@SA-oj3bo 2 years ago

@@tinkerdoodlediy thanks for your advice, I will study it, all new but very interesting!

@saj_zamani 2 years ago

Hey, thanks for your great work and this video. I just made this with an ESP32 devkit and an INMP441 and it works :D I just started learning MicroPython and got some questions...
1. There must be a limit on uploading the trained model. I mean, how big can it be, or how many samples? Is it saved to flash (or SPIFFS)? If so, it shouldn't be more than around 1.5 MB.
2. I have a problem with the I2S microphone. The captured voice is clear but too weak and needs to be amplified (digitally, in code). I saw some examples in C++ that shifted the data in the buffer to amplify the voice, but since I'm new to Python I don't know how to do it right now... So I tried capturing some samples (1 sec, 16-bit, WAV, 16000 Hz rate) with this gear in C code to make the model, and I thought I could upload them for training, but I couldn't. I just figured out that the Upload Samples button expects previously saved samples in .txt format. My only option now is to stay away from the PC microphone, capture some weak data and test it (cuz idk how to play with the last cell (that JavaScript thing) you made) :D
Thanks again, tips and ideas are welcome :D

@tinkerdoodlediy 2 years ago

Glad to know the tutorial works for you! For your questions: 1. The trained model is of fixed size, no matter how many audio samples are used in the training. The more audio samples you have, the better the model you get. 2. As long as it is clear, the sound volume does not matter for recognition. The speech model takes a spectrogram as input, not the raw sound data. If you want to prepare audio samples using your own program and upload them to the training page, you can refer to the generated speech_model.py to see how the audio samples are captured in MicroPython, and convert that into C/C++.
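For readers who still want the digital amplification asked about above, the C++ "shift the buffer" trick can be sketched in plain Python. This is a minimal sketch under stated assumptions: the buffer holds little-endian signed 16-bit mono PCM, and the function name `amplify_pcm16` is hypothetical. It is not taken from the generated speech_model.py, and per the reply it may not be needed for recognition at all.

```python
import struct


def amplify_pcm16(buf, shift=2):
    """Return a copy of `buf` with every sample multiplied by 2**shift.

    `buf` is assumed to hold little-endian signed 16-bit PCM samples.
    Results are clipped to the int16 range so loud samples saturate
    instead of wrapping around.
    """
    n = len(buf) // 2
    samples = struct.unpack("<%dh" % n, buf[:n * 2])
    out = []
    for s in samples:
        v = s << shift                       # same idea as the C++ bit shift
        v = max(-32768, min(32767, v))       # clip to int16
        out.append(v)
    return struct.pack("<%dh" % n, *out)


quiet = struct.pack("<3h", 100, -100, 20000)
louder = amplify_pcm16(quiet, shift=2)
```

A shift of 2 is a gain of 4. The clipping step matters: without it a sample such as 20000 would overflow and turn into noise rather than a louder signal.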

@aboudezoa 1 year ago

How did you wire your INMP441 and initialize I2S in the code? I'm getting the error " TypeError: extra keyword arguments given " at the channel type. Can you share your code please?

@dans-designs 1 year ago

this is great! will the firmware you compiled work with any esp32? for example i have an esp32-wroom-32e, will it be compatible?

@tinkerdoodlediy 1 year ago

It should work for any esp32.

@dans-designs 1 year ago

@@tinkerdoodlediy I keep getting this error, any idea how to fix it? type object 'I2S' has no attribute 'RX'

@tinkerdoodlediy 1 year ago

It seems your firmware version is different. Can you try flashing the firmware I published in the same folder as the notebook?

@OldManSparkplug 3 years ago

This is excellent. Is this an opensource project? I'd like to learn how to build this sort of tool.

@tinkerdoodlediy 3 years ago

Yes. If you want to learn how to build the custom speech commands model and how to use it in MicroPython, just follow this video; the link in the video description has the code and instructions. This article covers the implementation details: www.hackster.io/tinkerdoodle/deep-learning-speech-commands-recognition-on-esp32-b85c28. Training the base model and building the MicroPython firmware are a lot harder. I can make another tutorial if more people are interested.

@OldManSparkplug 3 years ago

@@tinkerdoodlediy very interested.

@nickdaves3467 2 years ago

@@tinkerdoodlediy me too

@hasibal-ahmed7385 2 years ago

@@tinkerdoodlediy Interested

@SA-oj3bo 2 years ago

@@tinkerdoodlediy Great project, yes if you have links to good tutorials they are very welcome! Thanks!

@cleverdickrick 2 years ago

Cool stuff. Seems like a wake-word would be useful. It would be a shame to turn off the lights just because I used the word "dark" in a sentence.

@tinkerdoodlediy 2 years ago

That is doable. You can train the wake word and the command words in the same model. Then, in the code, ignore command words unless the wake word has been spoken previously.
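The gating logic described in this reply can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: `make_wake_gate`, the `window` parameter (how many 1-second predictions stay "armed" after the wake word), and the idea that the model yields one label string or `None` per window are all assumptions made for illustration.

```python
def make_wake_gate(wake_word, commands, window=3):
    """Return a function that filters per-window labels through a wake word.

    A command is passed through only if the wake word was heard within the
    previous `window` predictions; otherwise it is ignored.
    """
    remaining = 0  # predictions left in which a command is still accepted

    def on_label(label):
        nonlocal remaining
        if label == wake_word:
            remaining = window
            return None
        if remaining > 0:
            remaining -= 1
            if label in commands:
                remaining = 0      # one command per wake word
                return label
        return None

    return on_label


gate = make_wake_gate("hey", {"dark", "bright"}, window=2)
heard = [gate(lbl) for lbl in ["dark", "hey", "dark", "dark"]]
```

In this run the first "dark" is ignored because nothing armed the gate, the one right after "hey" is accepted, and the last is ignored again, which addresses the concern about saying "dark" in an ordinary sentence.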

@frankvanhooft5849 2 years ago

Looks amazing - I'm trying to build one. It appears M5Stack may have changed the LCD. When I run your code, I get junk on the LCD. The code does: lcd = m5stickc_lcd.ST7735() But the M5StickC-Plus units now shipping use a ST7789v2 controller IC. How do we support this display? Thanks.

@tinkerdoodlediy 2 years ago

It seems this might work for you: github.com/russhughes/st7789_mpy. But I don't have a M5StickC-Plus to test. Let me know if you make it work. I'll be more than happy to update the Tinkerdoodle shared notebook to include a section for M5StickC-Plus.

@frankvanhooft5849 2 years ago

@@tinkerdoodlediy Thanks, I'll try it, but I'm wondering if there's more to it than that. I discovered that if I didn't import / run your m5stickc_lcd.ST7735(), then the microphone didn't work either. I suspect your LCD code might also be doing PMIC initializations or something?

@tinkerdoodlediy 2 years ago

Indeed, I ran into a similar issue. There was a thread on the M5Stack forum (forgot the link) that talked about interference between the LCD and the microphone in M5StickC. I did not write m5stickc_lcd. The code is from github.com/lukasmaximus89/M5Stick-C-Micropython-1.12/blob/master/m5stickc_lcd.py.

@user-ws5id8et2g 7 months ago

Does this module work offline?

@spacecdr 2 years ago

Well done. I tried it on my M5StickC Plus and it works (some changes just for the new display). I'm not using MicroPython, but analyzing your code it seems the "main library" is the "speech-commands-firmware.bin" (I followed your link to compile it on Python). Do you know how to use this "engine" in Arduino IDE/Visual Studio? I suppose I can't "include it"... any alternative solutions? Let me know please! Thank you ;-)

@spacecdr 2 years ago

OK, i understood it's a micropython environment + numpy and some tensorflow libraries...

@tinkerdoodlediy 2 years ago

If you don't use MicroPython, you can compile the model with your C/C++ code. See github.com/majianjia/nnom/tree/master/examples/keyword_spotting for details. The model training part can be reused. I also have kzbin.info/www/bejne/f4mrh2mul7iCZ5Y if you want to learn more about the internals of the model training and inference.

@saranyaasuresh5710 1 year ago

Hi, nice demonstration. I am using an ESP32-S3 Korvo-2 board, which has a mic embedded on it. How can I give live audio input to it and display it on a PC?

@tinkerdoodlediy 1 year ago

Refer to the sample notebook to read audio input from mic. You can transfer the audio data to PC using WiFi connection.

@saranyaasuresh5710 1 year ago

@@tinkerdoodlediy Thank you. Can I transfer data without WiFi? Most of the notebook reads audio data from an external mic.

@tinkerdoodlediy 1 year ago

I don't have a good way. One thing you can try is to print the audio data in MicroPython to standard output, and capture that data in the notebook. But I'm not sure how well that works with streaming.
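One way to try the "print to standard output" idea is to send the audio as text lines, since raw bytes do not survive a serial console well. The sketch below is an assumption-laden illustration, not something from the notebook: it base64-encodes fixed-size chunks so each `print()` emits one line, and the receiving side joins the lines back into bytes. The helper names `encode_chunks` and `decode_lines` and the 512-byte chunk size are hypothetical. It is written for CPython; MicroPython ships a similar `binascii` module, but that is untested here.

```python
import binascii


def encode_chunks(pcm, chunk_size=512):
    """Yield one base64 text line per `chunk_size` bytes of audio."""
    for i in range(0, len(pcm), chunk_size):
        chunk = pcm[i:i + chunk_size]
        yield binascii.b2a_base64(chunk).decode().strip()


def decode_lines(lines):
    """Rebuild the original bytes from the captured text lines."""
    return b"".join(binascii.a2b_base64(line) for line in lines)


# Device side (sketch): for line in encode_chunks(audio_buffer): print(line)
# PC side (sketch): collect the printed lines, then call decode_lines(lines).
audio = bytes(range(256)) * 5              # stand-in for 1280 bytes of PCM
lines = list(encode_chunks(audio, 512))
restored = decode_lines(lines)
```

Base64 grows the data by about a third, so at 16000 samples per second and 16 bits per sample the serial link needs to carry roughly 43 KB of text per second, which is the streaming concern raised in the reply.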

@hazimalias 1 year ago

Who came after 2 years? Great work. Is there any update for the ESP32-S3 board? Does it support a wake word?

@tinkerdoodlediy 1 year ago

You can add multiple commands and choose one of the commands to be the wake word. Another option is to train two models, but that is harder to implement. I haven't had a chance to test on the ESP32-S3. I wish the speech module could be built as an external C module so that it can be loaded dynamically and does not need to be built into the firmware. Memory usage may be a concern. docs.micropython.org/en/latest/develop/cmodules.html#cmodules

@yasirali9190 3 years ago

Can I use it on an ESP32 module with an external microphone?

@tinkerdoodlediy 3 years ago

Yes, an external I2S microphone should work. Refer to tinkerdoodle.cc/user/_/notebooks/Shared/Tinkerdoodle/Speech%20Commands%20Model.ipynb for the source code.

@jasonhedtke659 1 year ago

Could this in theory be used to make a very basic AI? I want to make a pair of smart glasses and have a "virtual" assistant that can do basic requests like responding to a text, letting me decline a phone call, music controls, etc. Do you think this could help me achieve that?

@tinkerdoodlediy 1 year ago

Yes it is doable. The speech model can only handle 1-second audio at a time. So keep your commands short.

@ebrahemkhalifa3675 9 months ago

How can I run this command "%flash esp32 ~/Shared/Junfeng/speech-commands-firmware.bin" on an ESP32-S3?

@tinkerdoodlediy 9 months ago

I haven't tried on esp32 s3 so not sure if it works.

@aatifmohd8678 2 years ago

Hello, can I use the same example with M5stack-core? It has a built-in mic too! If it is possible, can you make a tutorial video of it?

@tinkerdoodlediy 2 years ago

It should work. You just need to update the pin number for the mic. I don't have a M5Stack core so cannot verify. Let me know if you make it work!

@aatifmohd8678 2 years ago

@@tinkerdoodlediy Hello, thank you for replying. Initially, I had an issue running the first command. Error: Failed to flash: Failed to connect to ESP32: Invalid head of packet (0x08). Please disconnect the MCU, reload the page, and retry. Solution: I downloaded the binary file (speech-commands-firmware.bin) and uploaded it with esptool from the command line (Windows). After that, I was able to run the second command, "%writefile speech_model.py ~/speech_model.py". Lastly, I can't modify the code much, as my device requires modules from "m5stack" and other libraries that are not present in the libraries. I also tried to upload the M5Stack libraries to the modules folder, but it seems that is restricted. Please help me make it work!

@tinkerdoodlediy 1 year ago

@@aatifmohd8678 You can upload the libraries to your home folder, and "%writefile .py ~/.py" will work. The module folder is read only.

@sahinahadli8681 9 months ago

thanks, but speech_model library is nowhere. where can we download it?

@tinkerdoodlediy 5 months ago

You'll need to download and flash the MicroPython firmware. See the notebook for instructions.

@SuperSmosh123 10 months ago

There's no connect button on the page. Is it a problem with my browser or is it something else? I tried following a tutorial on installing the firmware and on that page there's the connect button but no progress on installing the firmware even after following the instructions.

@tinkerdoodlediy 10 months ago

Can you share a screenshot of your page?

@SuperSmosh123 10 months ago

@@tinkerdoodlediy Sure. But where would I send it to?

@aboudezoa 1 year ago

Very nice work! The firmware doesn't work on ESP32C3, any recommendations?

@tinkerdoodlediy 1 year ago

You'll need to build a new firmware for ESP32C3 with the speech recognition module. The script to build the firmware is linked in the notebook. When you make it work for ESP32C3, please share the firmware, and I'll publish it on Tinkerdoodle so others can use it too!

@aboudezoa 1 year ago

@@tinkerdoodlediy yes I feel stupid asking the question and then find the link to your firmware build on colab. Will definitely share the firmware once I test it

@aboudezoa 1 year ago

@@tinkerdoodlediy Can you please update the firmware to a newer version of MicroPython? I tried and it's just not working for me :/ Also, MicroPython on ESP32C3 doesn't support I2S.

@tinkerdoodlediy 1 year ago

@@aboudezoa I don't have ESP32C3 so cannot test. You may want to read the MicroPython documentation on how to build a new firmware. Maybe their instructions changed recently.

@sltechgalaxy1677 2 years ago

can I make it using INMP 441 I2S microphone module ?? plz reply soon

@tinkerdoodlediy 2 years ago

Any I2S microphone should work. But I haven't tried anything other than M5StickC yet.

@sltechgalaxy1677 2 years ago

Ok friend but what pins i connect I2S microphone module plzzz reply

@selahattinbabadag2804 2 years ago

The "Add audio sample" button doesn't work?

@tinkerdoodlediy 2 years ago

Fixed! Turns out the latest tensorflow.js package was bad. Rolled back to use a previous version.

@elafefy 1 year ago

hello , I tried again and again from more than one computer and also using a VPN, but nothing works. I press the start audio recording button, but nothing happens. I hope to fix the problem

@tinkerdoodlediy 1 year ago

Can you use Chrome browser? If that still does not work, go to menu "More Tools" -> "Developer Tools", and check if there are any error messages in the dev console.

@elafefy 1 year ago

@@tinkerdoodlediy yes i use chrome , and the error is speech-commands.html:392 Uncaught (in promise) ReferenceError: tf is not defined at updatePrediction (speech-commands.html:392:23) at addSample (speech-commands.html:542:5)

@elafefy 1 year ago

@@tinkerdoodlediy any update please?

@tinkerdoodlediy 1 year ago

It works on my browser. Can you check if you can open cdn.jsdelivr.net/npm/@tensorflow/tfjs@3.9.0 in your browser?

@elafefy 1 year ago

@@tinkerdoodlediy This service is prohibited in my country, but I use a VPN, so the CDN service works for me. Even with the VPN, your tool is still not working 😢

@poojadubey8172 2 years ago

I'm not able to run "%flash esp32 ~/Shared/Junfeng/speech-commands-firmware.bin", plz help

@tinkerdoodlediy 2 years ago

What is the error you got? Make sure you use a M5StickC. The flash functionality is experimental. If it does not work, use esptool to flash the firmware.
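For the esptool fallback mentioned here, a typical invocation can be assembled from Python as below. This is a sketch under assumptions: the serial port name is a placeholder, the helper `esptool_flash_cmd` is hypothetical, and the `0x1000` offset is the one the MicroPython documentation gives for standard firmware images on the original ESP32. Whether this particular `.bin` expects the same offset is not confirmed anywhere in the thread, so check the notebook first.

```python
import subprocess  # only needed if you uncomment the run() call below


def esptool_flash_cmd(port, firmware, chip="esp32", offset="0x1000",
                      baud=460800):
    """Build an esptool command line that writes `firmware` at `offset`."""
    return [
        "esptool.py",
        "--chip", chip,
        "--port", port,
        "--baud", str(baud),
        "write_flash", "-z", offset, firmware,
    ]


# "/dev/ttyUSB0" is a placeholder; on Windows it would look like "COM3".
cmd = esptool_flash_cmd("/dev/ttyUSB0", "speech-commands-firmware.bin")
print(" ".join(cmd))

# Uncomment to actually flash (requires esptool installed and a board attached):
# subprocess.run(cmd, check=True)
```

The printed line can also be pasted straight into a terminal, which is essentially what the Windows user earlier in the thread did by hand.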

@dans-designs 1 year ago

@@tinkerdoodlediy can you provide a download link for the firmware please? I'm having trouble finding how to download the .bin file to flash via esptool, thank you

@tinkerdoodlediy 1 year ago

@@dans-designs there is no direct download link. You'll need to log in first, and download at tinkerdoodle.cc/user/tinkerdoodle.cc@gmail.com/tree/Shared/Tinkerdoodle. Find the bin file in the list.