Build Your Own Voice-Controlled Robot with ESP32 & TensorFlow Lite

Views: 32,753

atomic14

We're building a voice-controlled robot!
In this video, I show how you can build a voice-controlled robot using the ESP32 and TensorFlow Lite.
We'll train up a neural network to recognise the basic command words "Left", "Right", "Forward" and "Backward".
We'll then run this model on the ESP32 using TensorFlow Lite and drive a little robot around.
All the code is in GitHub here - github.com/atomic14/voice-con...
Let me know if you try it out in the comments - I'd love to know if you get it working and build a robot!
Components:
INMP441 I2S Microphone: amzn.to/3cicuiv
ICS-43434 I2S Microphone: www.tindie.com/products/21519/
ESP32 Dev board: amzn.to/3gb6fyc
Continuous 360 degree Servo: amzn.to/3imwx35
---
Want to help support the channel? I'm accepting coffee on ko-fi.com/atomic14

Comments: 147
@atomic14 3 years ago
Interested in ESP32 Audio: kzbin.info/aero/PL5vDt5AALlRfGVUv2x7riDMIOX34udtKD Looking for all my ESP32 projects: kzbin.info/aero/PL5vDt5AALlRdN2KyL30l8j7kLCxhDUrNw
@ynssnp8788 1 year ago
M u7 üzü ui
@pascallongpre7797 3 years ago
I've been playing with your Alexa code for wake word detection for the last couple of weeks. What you did is really impressive. Your code works and it is very clear! I'm still trying to train the model to recognize a wake word of my own that is not in Google's list. Not there yet but getting close. For those who want to speed the training up, Google Colab (paid version, $14/month) is really worth it. It brought training time down from 2 hours to a few minutes (30 epochs). Thanks again and keep going, you've got a new fan here!
@atomic14 3 years ago
That's fantastic - Colab is a really amazing service. For this one I did have to spin up a GPU instance in the cloud - my machine kept running out of memory as the datasets are getting really big. I think there might be a better way of augmenting the data so it's done on the fly rather than generating it all up front.
@Regimantas_Baublys 3 years ago
Nice demonstration. Thanks :)
@atomic14 3 years ago
Thanks! It was great fun making it - and shouting at the disobedient robot...
@TheDiverJim 3 years ago
I love this series!
@atomic14 3 years ago
Crazy projects! What's not to like :)
@haoyuren7596 3 years ago
Amazing!
@jjaannnniikk 2 years ago
First of all, great work @atomic! Your videos are great. We are trying to build a system similar to your robot. We were able to train a model that detects 3 different command words, export the model to C code and run a prediction on the basis of data that we processed in Matlab and fed to the C code as a hard-coded input array. For this, we are using the eloquent tinyml library, which is pretty straightforward to use. We are now trying to feed 16000 samples to the ESP32 (again hard-coded and extracted from the .wav files of the Google dataset, not from our mic). Uploading the data and processing it to generate the input of the neural network on the ESP32 works fine. The processed input array is the same as the one we generated in Matlab. However, if we now additionally upload our trained model together with our raw data array of 16000 samples to perform the prediction, we run out of memory (dram0_0_seg). We don't really know how to deal with this, since we need to have 16000 samples available at the end to perform our scaling operation, right? Is there a way to avoid the overflow of dram0_0_seg? How did you manage to store one second of data (16000 samples) and scale it without overflowing? We tried to go through your code, but unfortunately it didn't help to answer the questions. Maybe you can give us a little hint here? Thanks, and keep it up! Greetings from Germany :)
@atomic14 2 years ago
Hi, thanks for the kind words. Are you trying to create a static variable to hold the 16000 samples? e.g. float samples[16000]; ? What you should do instead of this is allocate it on the heap. If your device supports PSRAM then you should definitely enable that and allocate it in PSRAM. Or do you need some hardcoded samples to test the model against to see if it works on the ESP32? How many bytes does your model take up? If you want you can contact me via email on the about page of the channel.
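A minimal sketch of the heap allocation suggested in the reply above, not code taken from the project. The names kSampleCount, SampleBuffer and allocate_samples are hypothetical. On an ESP32 build it assumes PSRAM has been enabled in the project configuration and tries heap_caps_malloc with MALLOC_CAP_SPIRAM first; everywhere else, or if that fails, it falls back to the ordinary heap.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

#if defined(ESP_PLATFORM)
#include "esp_heap_caps.h"
#endif

// One second of audio at 16 kHz, the buffer size discussed in the thread.
constexpr std::size_t kSampleCount = 16000;

// Releases the buffer with the allocator family that created it.
struct SampleDeleter {
  void operator()(float *p) const {
#if defined(ESP_PLATFORM)
    heap_caps_free(p);
#else
    std::free(p);
#endif
  }
};
using SampleBuffer = std::unique_ptr<float[], SampleDeleter>;

// Allocate the buffer at run time instead of declaring a static
// `float samples[16000];`, which is placed in the fixed-size DRAM segment.
SampleBuffer allocate_samples(std::size_t count) {
  float *raw = nullptr;
#if defined(ESP_PLATFORM)
  // Try external PSRAM first; this returns nullptr if no PSRAM is available.
  raw = static_cast<float *>(
      heap_caps_malloc(count * sizeof(float), MALLOC_CAP_SPIRAM));
#endif
  if (raw == nullptr) {
    // Fall back to the normal heap.
    raw = static_cast<float *>(std::malloc(count * sizeof(float)));
  }
  return SampleBuffer(raw);
}
```

The caller should still check the returned pointer for null before using it, since either allocation can fail on a device that is short of memory.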
@yaiirable 3 years ago
Looks awesome! Subbed
@atomic14 3 years ago
Thanks!
@edgull_tlt 2 years ago
Thanks for the video. It was interesting.
@owolabidaniel 1 year ago
Hello, in the config.h file the pins for the microphone are chosen, but I didn't see where the motors are configured... Can I get a pointer to what I am missing?
@muchittt 1 year ago
Great project sir! Does it have to be connected to the internet for voice commands?
@ZaheerAhmed-rb2be 2 years ago
Hi, Sir, I can't find the tf_lite notebook code in your repo. I just want to know how you implemented the spectrogram on the ESP32. I want to understand the STFT more precisely.
@ayanmalik926 3 years ago
Hello, can we capture an image and do face detection or object detection with the Alexa project?
@safa4321 2 years ago
Hi Atomic14 Thanks for the video! I have a question: how did you create the audio file? Did you record 1000 voices for each word or is there a program for that? Thanks in advance
@atomic14 2 years ago
I used a public data set from Google - it's not great quality - but it's usable: ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
@RG-jc1uq 16 days ago
Hello. I compiled this project and uploaded it to the ESP32, but I see a backtrace error in the terminal. What is my mistake?
@NAVAP_IAS 3 years ago
Thanks man😇😇😇. It will be useful for my academic Project. Can I use it? With your permission. Thanks again. Keep going.
@atomic14 3 years ago
Definitely! Report back on how well it works. If you make any improvements then make a pull request.
@NAVAP_IAS 3 years ago
@atomic14 I will write to you after finishing up. Thank you 😇😇😇
@RPhulNewsTrue 3 years ago
@NAVAP_IAS Is your project completed?
@5VoltChannel 3 years ago
Amazing.
@akashsarkar3851 2 years ago
Great project, sir. If I want the robot not to stop until I give the next command, how do I do that? E.g. I say 'forward' and it goes forward without stopping, then I say 'turn right' and it turns right without stopping.
@Rumade 3 years ago
Now we just need to fit it with a tea machine...
@atomic14 3 years ago
You can use it to chase your naughty chickens round the garden kzbin.info/www/bejne/aKPaoIialNtkmKc
@Rumade 3 years ago
@atomic14 I needed it today! They were being terrors again!
@saydiy1528 11 months ago
Good job. Do you have an Arduino IDE version? Thanks.
@user-ii7jb2kz4b 1 year ago
Very cool
@gaucho804 3 years ago
Hi, awesome vid. Is it possible to get this to work with the MAX9814, using the firmware that you showed in the last video, "ESP32 Audio Input Using I2S and Internal ADC"?
@atomic14 3 years ago
Thanks! Definitely possible - the code should support both I2S mics and the ADC as well.
@jomfawad9255 6 months ago
How many KB was the whole code when you uploaded it to the ESP32?
@nektarioskourakis9117 2 years ago
Great job! THANKS. Sorry for a question that is off topic. I want to give an exact, specific RPM to a robot car. With Arduino I do it the usual way with PWM, but I am not pleased with the result - I want both wheels to run at the same RPM to within 99%. Please suggest a way to do it with the ESP32: i) DAC or LEDwrite...? ii) a motor driver? Is it worth trying with the ESP32?
@atomic14 2 years ago
Without some kind of feedback telling you how fast the wheels are rotating this is going to be quite difficult - especially with cheap servos. You could use stepper motors which can be controlled very accurately.
@anlpereira 2 years ago
Hi, I have a question. I saw you generate the C array of the tflite model. From this file I would like to extract all the weights and biases of the NN. Is it possible? How do I interpret this array? Thank you very much.
@atomic14 2 years ago
It's probably easier to get these directly from tensorflow - github.com/google/prettytensor/issues/6
@sirriz8876 8 months ago
Sorry, I am still a newbie at applying this code. First of all, thanks for your kindness in sharing this knowledge. I just want to ask: can this be used as it is, or does it need modifying? And how do I run all the code in PlatformIO using Visual Studio Code? Should I open the whole folder or only certain files? I just want to study more about TFLite. Freshie here.
@whatever292 4 months ago
Hello, did you find out?
@sanchaykumar9903 4 months ago
Sir, being honest, I tried for a month and used your code as it is, but my device wasn't able to catch a single word. I don't know why it doesn't work, please help. Is it because of the training data? I am Indian and 19, so is it a voice mismatch, or what?
@tektronix475 3 years ago
Wow, does it need to be connected to the cloud, or does it run offline on the ESP32? Thanks.
@atomic14 3 years ago
It's all running on the ESP32
@rsdosev 3 years ago
@atomic great work, mate! I have just found it, searching for way to automate my coffee machine. I am planning to make it start/stop with voice commands, start remotely and display some stats like boiler temp and brew timer on external display. Do you think the ESP32 will be powerful enough to listen for voice commands, run http server and power a display? I can use pi zero but really don't want to run whole linux distro for this
@atomic14 3 years ago
I think it's probably doable. You might also want to take a look at the DIY Alexa project which would give you a bit more power on the command side: kzbin.info/www/bejne/qJaQlYaMlMZjqq8. It runs wake word detection locally and then hands off the command interpretation to an external service. So you could say "Make me a coffee". Using the robot project you might get a few random coffees coming out of the machine as it does have a bit of a mind of its own... I'm going to have another look at the TensorFlow microspeech examples soon. They are now available on the ESP32 and will give better results.
@rsdosev 3 years ago
Thank you for the reply! :)
@user-rn3tr6pv3t 3 years ago
Hi, it was a very good video. What is the name of the microphone module?
@atomic14 3 years ago
I used an ICS43434 (which I have for sale on eBay and Tindie - www.tindie.com/products/21519/) but any microphone will do the job - the INMP441 is very good.
@kezbankozan1276 2 years ago
Hello, can it detect different tones? Or can it only match our own tone in the dataset? In short, can it detect the voices of different people?
@atomic14 2 years ago
In theory it should be able to work with anyone's voice. The training data was collected from a large number of people.
@PaulS648 3 years ago
Hello, there has been a TensorFlow package update and your model training gives an error: "module 'tensorflow_io.core.python.api.experimental' has no attribute 'audio'". What do you recommend? Thanks
@PaulS648 3 years ago
Removing .experimental did the job with the new update.
@shufnagl 3 years ago
Hi, another great project!! I ported it easily to the AtomEcho (without a robot car) because everything (I2S mic/speaker) is included. After some testing, it seems that your mic from Tindie is better (more sensitive) than the Atom mic. Would it be possible to use more than one mic like the Alexa devices have? What do you think about using the additional 4/8 MB PSRAM of an "ESP32 Wrover"... would it make it possible to have more wake words? Thanks, and looking forward to the next amazing project from you.
@atomic14 3 years ago
I think so yes, at the moment the model gets baked into the code which means there are quite tight limits on the size you can have. As far as I know, there's no reason why the model could not be loaded from SPIFFS into RAM and used from there - though I have not tried that yet. The only other limit is how long the model and audio processing takes to run. There is currently some scope for optimising the audio processing - we're using floating-point for the FFT and could switch that to fixed point. Definitely worth some investigation.
@shufnagl 3 years ago
@atomic14 Hi, correct me if I'm wrong... from my point of view SPIFFS is a simple filesystem located on the 4 MB SPI flash? But PSRAM is different: we can use it like RAM (with ps_malloc(...)), but you will need an ESP32 Wrover module. The PSRAM is 8 MB and we could use 4 MB. And PSRAM is fast (compared with SPIFFS). But PSRAM is a future step for me... the next step is to deep dive into TensorFlow. :-)
@atomic14 3 years ago
Yes, I need to get hold of a Wrover module to do some experiments. Good luck with TensorFlow!
@shufnagl 3 years ago
@atomic14 Hi, yes the Wrover is very interesting, the cheapest is the ESP32-CAM. But there are some problems with older ESP32 Wrover chips and PSRAM. ESP32 revision 3 ("ECO V3") fixes the PSRAM cache issue found in rev. 1 (quoting Espressif). Please add "-DBOARD_HAS_PSRAM" and "-mfix-esp32-psram-cache-issue" in PlatformIO if you have an older ESP32 version. It seems that TensorFlow could use the PSRAM... Thx
@amirmahdisoltani1 3 years ago
Hi, your channel and your tutorials are amazing! I have a "GY-SPH0645" MEMS microphone. Can I use it for this project?
@atomic14 3 years ago
Yes, there are some weird things with the SPH0645, but it should work. There's a flag that you need to pass when you create the I2SMicSampler in main.cpp which should make it work. Change this line in main.cpp from:
I2SSampler *i2s_sampler = new I2SMicSampler(i2s_mic_pins, false);
to:
I2SSampler *i2s_sampler = new I2SMicSampler(i2s_mic_pins, true);
@amirmahdisoltani1 3 years ago
@atomic14 thanks!
@faezu2959 1 year ago
Hi sir, could you tell me the name of the 3D design software you used?
@dijitalhobim 1 year ago
Cousin, can we do this with just the ESP32, without Wi-Fi or Bluetooth?
@yasirali9190 3 years ago
Can you guide me how to train the model with different sounds?
@atomic14 3 years ago
Your biggest challenge will be collecting enough examples of the sounds you want to train it on. Each sample should be 1 second long and sampled at 16 kHz.
@mayurhegde6903 1 year ago
Which files did you upload to the Arduino?
@kind3rthen00b 3 years ago
Wouldn't the extra 4M of PSRAM on the WROVER module help? Or is it too slow for the purpose?
@atomic14 3 years ago
Definitely very useful - especially for doing things like HTTPS and buffering audio. You pretty quickly run out of memory when you want to hold a few seconds of audio in RAM.
@kind3rthen00b 3 years ago
@atomic14 HTTPS's motto should be: "But why is the RAM gone?" :D I'm building a generic WiFi to BLE gateway, so WiFi, BLE, HTTPS, Websockets, mDNS, some AES plus the Arduino overhead... a very tight fit but I managed to make it work without using PSRAM.
@dexnug 2 years ago
For the feature extraction, why don't you use MFCC or MFE instead of just using spectrogram? btw is it spectrogram or mel spectrogram?
@atomic14 2 years ago
I went for a very simple spectrogram to avoid having to do too much processing.
@dexnug 2 years ago
@atomic14 Another question: I see your dataset clips are exactly 1 second. Mine are longer, say 3 seconds or more, stereo, with a 44.1 kHz sample rate. Can I still run the preprocessing?
@safa4321 2 years ago
Hi atomic14! I used your program with other command words in German. But there is a problem: the spectrograms are empty. It seems that the program does not read the audio files. Can you help me? Thanks in advance!
@atomic14 2 years ago
Make sure the audio files are 16 kHz - if you want to open up an issue on the GitHub repo and share some of your files I can check them.
@ants00 3 years ago
Does this need internet to work, or is the recognition done offline on the board?
@atomic14 3 years ago
It all works offline.
@shubhamsrivastava2562 2 years ago
How do I modify this project to run on a Raspberry Pi 3? Thanks for the help :)
@atomic14 2 years ago
Hi there, you can probably run the model on the Raspberry Pi, but the code is really designed for running on the ESP32.
@nekodesu1844 3 years ago
Hello, I am a student studying ESP32 and Arduino in Japan. I have made a WiFi radio car and am currently trying to make a voice command radio car. I had struggled to find out how to do wake word commands on the ESP32, and then I found your channel. Thank you very much. This is very helpful for my study. Could you let me know how I should go about adding a couple more command words? How does [command_window] work in CommandDetector? I have trained the model with two more words, stop and go. Where should I look in the program to modify it to accept two more wake words? I started learning these subjects at my school this May, so I am quite new to these fields. Could you possibly provide me with some guidance? Thank you very much. Best regards,
@atomic14 3 years ago
Hi Neko, that's amazing, well done! It should be straightforward: the command window is just used to average the detection over a period of time to really make sure a word has been detected. To change the code to support more commands you should just need to change NUMBER_COMMANDS in the CommandDetector.h file and line 94 in CommandProcess.cpp. Feel free to open up an issue on GitHub and I can help you get it up and running.
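The averaging idea described in the reply above can be sketched roughly as follows. This is an illustration only and not the project's CommandDetector: the class name CommandWindow, the window length, the threshold and the command count are all assumed values (the real count lives in NUMBER_COMMANDS in CommandDetector.h).

```cpp
#include <array>
#include <cstddef>

// Illustrative values only, not taken from the repository.
constexpr std::size_t kNumberCommands = 5;  // assumed number of model outputs
constexpr std::size_t kWindowSize = 3;      // assumed number of recent predictions kept
constexpr float kThreshold = 0.9f;          // assumed average score needed to accept a word

class CommandWindow {
 public:
  // Store the newest per-command scores, overwriting the oldest entry.
  void add(const std::array<float, kNumberCommands> &scores) {
    history_[next_] = scores;
    next_ = (next_ + 1) % kWindowSize;
  }

  // Return the index of the command whose average score over the window is
  // highest and above the threshold, or -1 if no command qualifies.
  int detect() const {
    int best = -1;
    float best_average = kThreshold;
    for (std::size_t c = 0; c < kNumberCommands; ++c) {
      float sum = 0.0f;
      for (std::size_t w = 0; w < kWindowSize; ++w) {
        sum += history_[w][c];
      }
      const float average = sum / static_cast<float>(kWindowSize);
      if (average > best_average) {
        best_average = average;
        best = static_cast<int>(c);
      }
    }
    return best;
  }

 private:
  // Zero-initialised so a single high score cannot trigger a detection.
  std::array<std::array<float, kNumberCommands>, kWindowSize> history_{};
  std::size_t next_ = 0;
};
```

Averaging trades a little latency for fewer false triggers: one noisy prediction is diluted by the rest of the window, while a word that is really present scores highly in several consecutive predictions.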
@nekodesu1844 3 years ago
@atomic14 Thank you very much, Atomic14. I was able to add the wake words. It was easy. The only problem was that I did not know how to use PlatformIO.
@atomic14 3 years ago
@nekodesu1844 I find it much easier for larger projects - once you have more than a couple of files the Arduino IDE becomes quite hard to use. But it does take a bit of getting used to.
@nekodesu1844 3 years ago
@atomic14 OK, thank you. By the way, can I ask a bit more? How does _problem_noise_ work? There is no data in it, and when I put some WAV files there, an error occurred. Likewise, when I put some WAV files in _background_noise_, an error occurred. Are these files not easy to handle? Sorry to trouble you, I am in the early learning stage.
@atomic14 3 years ago
@nekodesu1844 No problem! Happy to help. Make sure the WAV files are 16 kHz, 16-bit signed PCM and mono, not stereo. I used problem noise just for some low-frequency humming sounds and rustling sounds that seemed to cause confusion.
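A quick way to check a file against the format named in the reply above is to inspect its header. The sketch below is not part of the project; the function name is_training_wav is hypothetical, and it assumes the canonical 44-byte PCM WAV header, so files with extra chunks before the format data would be rejected even if the audio itself is fine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read an n-byte little-endian unsigned value starting at `offset`.
static std::uint32_t read_le(const std::vector<std::uint8_t> &data,
                             std::size_t offset, std::size_t n) {
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < n; ++i) {
    value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
  }
  return value;
}

// Returns true when the header describes 16 kHz, 16-bit PCM, mono audio.
// Assumes the canonical layout: "RIFF", size, "WAVE", "fmt ", then the
// format fields at fixed offsets.
bool is_training_wav(const std::vector<std::uint8_t> &data) {
  if (data.size() < 44) return false;
  if (std::memcmp(data.data(), "RIFF", 4) != 0) return false;
  if (std::memcmp(data.data() + 8, "WAVE", 4) != 0) return false;
  if (std::memcmp(data.data() + 12, "fmt ", 4) != 0) return false;

  const std::uint32_t audio_format = read_le(data, 20, 2);  // 1 means PCM
  const std::uint32_t channels = read_le(data, 22, 2);
  const std::uint32_t sample_rate = read_le(data, 24, 4);
  const std::uint32_t bits_per_sample = read_le(data, 34, 2);

  return audio_format == 1 && channels == 1 && sample_rate == 16000 &&
         bits_per_sample == 16;
}
```

Reading the first 44 bytes of each file into a vector and passing it to this function is enough to flag stereo or 44.1 kHz recordings before they reach the training notebook.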
@RonSmits 3 years ago
I wonder how well this would work on a raspberry pi pico as it is a faster processor and it has more memory
@atomic14 3 years ago
Hey Ron - do you mean the Pico or the Zero - I think the Pico is actually a bit slower than the ESP32 as it's only clocked at 80MHz?
@RonSmits 3 years ago
@atomic14 No, the Pico is clocked at 133 MHz, and there are already people overclocking it too.
@atomic14 3 years ago
@RonSmits You are right! Very interesting - I'll have to have a look at the overclocking people.
@tryssss 3 years ago
Hi, I have this problem when I want to use ADC input with an analog mic. How can I fix it?
src/main.cpp: In function 'void setup()':
src/main.cpp:84:58: error: 'i2s_sampler' was not declared in this scope
CommandDetector *commandDetector = new CommandDetector(i2s_sampler, command_processor);
^
*** [.pio/build/esp32doit-devkit-v1/src/main.cpp.o] Error 1
@tryssss 3 years ago
I did this to fix it! Is it OK?
//CommandDetector *commandDetector = new CommandDetector(i2s_sampler, command_processor);
CommandDetector *commandDetector = new CommandDetector(i2sSampler, command_processor);
@atomic14 3 years ago
@tryssss Hi Patrice, just pushed up a fix to the repo. Let me know how you get on - if you hit any problems then can you open an issue on GitHub? Thx!
@user-ws5id8et2g 6 months ago
Is it possible to train it to recognise minor languages, for example Armenian?
@atomic14 6 months ago
You would need training data for it.
@user-ws5id8et2g 5 months ago
@atomic14 Can you give me some links with examples of how to do this? Thanks in advance
@georgecardell9061 2 years ago
Does it work with all the ESP32 models or only the ESP32-C3?
@atomic14 2 years ago
I've not actually tested it on a C3. In theory it should work on any ESP32 model, but I would recommend using one of the dual-core versions to give you more processing power.
@fabiodavide1513 3 years ago
Hi atomic14, I'm a student and I would like to use your project, but when I compile your code it doesn't work. I'm using an ESP32 and the microphone is the INMP441. I have this problem in the serial monitor:
Average detection time 155ms
E (1462716) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (1462716) task_wdt: - IDLE0 (CPU 0)
E (1462716) task_wdt: Tasks currently running:
E (1462716) task_wdt: CPU 0: Command Detect
E (1462716) task_wdt: CPU 1: loopTask
E (1472716) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (1472716) task_wdt: - IDLE0 (CPU 0)
E (1472716) task_wdt: Tasks currently running:
E (1472716) task_wdt: CPU 0: Command Detect
E (1472716) task_wdt: CPU 1: loopTask
Average detection time 156ms
E (1482716) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (1482716) task_wdt: - IDLE0 (CPU 0)
E (1482716) task_wdt: Tasks currently running:
E (1482716) task_wdt: CPU 0: Command Detect
E (1482716) task_wdt: CPU 1: loopTask
I hope that you will help me.
@atomic14 3 years ago
Can you raise an issue on the GitHub repository? It's a bit easier to communicate there.
@fabiodavide1513 3 years ago
@atomic14 Dear Atomic14, I have just raised this issue on GitHub, but you haven't answered me.
@atomic14 3 years ago
@fabiodavide1513 Sorry - just saw it now.
@dexnug 3 years ago
Between the two I2S microphones, which one is the best? The most sensitive to sound?
@atomic14 3 years ago
That will really depend on the microphone. The datasheet for each mic will list its sensitivity, and there is an accepted standard for how this is measured, so you will be able to compare the two microphones.
@dexnug 3 years ago
I mean, your video description mentions two I2S microphones, the INMP441 and the ICS-43434. Which one is better? And what about the limitations of the ESP32? How big a TensorFlow model can it handle?
@atomic14 3 years ago
@dexnug Ah sorry :) There's a table here of all the InvenSense microphones - invensense.tdk.com/products/digital/ - both have the same sensitivity but the ICS-43434 has better SNR. The two constraints on the model are time and memory use. I found time to be my main constraint as I was trying to get real-time detection to work.
@dexnug 3 years ago
@atomic14 I thought so, the ICS-43434 is slightly better. Since I can hardly find the ICS-43434 in my local e-commerce, maybe I will go for the INMP441. How about the code? Is there anything I need to change? Do you have another board recommendation? Based on www.tensorflow.org/lite/microcontrollers, there are 13 boards capable of running TensorFlow Lite. I'll go for the SparkFun Edge. My question is, is it possible to run the tutorial from this video on the SparkFun Edge or another board in the list? Regards,
@atomic14 3 years ago
@dexnug Both microphones are I2S so will just work. Most of my tutorials are built around the ESP32 series of devices, so the code will not be the same. You could definitely run through the model training and run the same model on a different microcontroller, but the code for doing that would be different. If you want to investigate other boards then I would probably run the specific examples from TensorFlow for that board first.
@elaith9 3 years ago
Why would you bin frequencies like this with pooling? Why not just use mel filters given that lower frequencies are more important than higher. What you are doing here is binning all frequencies linearly, which kinda goes against logic. I have also noticed that tf pooling is slower than just applying mel filters with librosa. I gotta be honest, I'm surprised this works so good.
@atomic14 3 years ago
You're correct we might get better results by using mel filters and logarithmically binning the frequencies. One of the issues is that we are running on quite a constrained device. Any processing that we do for training needs to be recreated in the embedded side of the code. We would need to port librosa over to the ESP32 (this might be straightforward) or implement mel filters ourselves - this is also probably quite straightforward but would still involve a fair amount of extra coding. Ideally, we might want to take the mel frequency and generate MFCCs from them. One thing to remember is that we are doing quite a well-constrained problem, we're only trying to recognise a limited subset of words so we should expect reasonable results. Worth having a go though - there is still some CPU time available for a bit more processing of the audio data.
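To make the trade-off in this exchange concrete, here is a rough sketch of equal-width (linear) pooling of FFT energy bins next to the mel mapping that mel filters are built on. It is not the project's audio processor: the function names pool_linear and hz_to_mel are hypothetical, the pool size is whatever the caller chooses, and the mel conversion is the commonly used 2595 * log10(1 + f / 700) form.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Average adjacent FFT energy bins in equal-width groups. Every output value
// covers the same number of hertz, whether it sits at low or high frequency.
std::vector<float> pool_linear(const std::vector<float> &energy,
                               std::size_t pool_size) {
  std::vector<float> pooled;
  if (pool_size == 0) return pooled;
  for (std::size_t start = 0; start < energy.size(); start += pool_size) {
    const std::size_t end = std::min(start + pool_size, energy.size());
    float sum = 0.0f;
    for (std::size_t i = start; i < end; ++i) {
      sum += energy[i];
    }
    pooled.push_back(sum / static_cast<float>(end - start));
  }
  return pooled;
}

// Convert a frequency in hertz to the mel scale. Equal steps in mel cover few
// hertz at low frequencies and many at high frequencies, which is why mel
// filter banks give the low end finer resolution.
float hz_to_mel(float hz) {
  return 2595.0f * std::log10(1.0f + hz / 700.0f);
}
```

Linear pooling is only a running sum and a divide per bin, which suits a constrained device. A mel filter bank would need precomputed, overlapping band edges derived from hz_to_mel, and the same computation would have to be reproduced exactly in the training pipeline.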
@yaowang4490 3 years ago
Dear Mr. Zhang: I took your suggestion, cleared all the local changes in VS Code and recompiled. I got the following information from the serial port. Does it mean that I successfully downloaded it to the ESP32? The serial port information is as follows:
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
Starting up
Loading model
12 bytes lost due to alignment. To avoid this loss, please make sure the tensor_arena is 16 bytes aligned.
Used bytes 22644
Output Size 1
Created Neral Net
m_pooled_energy_size=43
Created audio processor
Starting i2s
Average detection time 103ms
Average detection time 103ms
Average detection time 103ms
Average detection time 103ms
..................................................
Average detection time 103ms
@atomic14 3 years ago
That looks promising. But I think this is from the DIY Alexa project? We should move any questions around the code to GitHub issues.
@yaowang4490 3 years ago
@atomic14 Hello engineer, I think this project is a bit difficult for me at the moment, so I am going to study your other project. I hope to be fortunate enough to get your guidance. Here is the link to your project: github.com/atomic14/voice-controlled-robot (I have submitted some issues on GitHub). Yours sincerely, your fan Yao
@tryssss 3 years ago
Hi, did you send me your little PCB with the microphone? I ordered it last week and still don't have it. Patrice Rancé
@tryssss 3 years ago
from Tindie
@atomic14 3 years ago
@tryssss Hi Patrice, yes, it was sent on the 12th December - I have been hearing that a lot of post seems to be delayed. It should only take 3-5 days, but seems to be taking a lot longer. Can you let me know if it doesn't arrive in the next few days and I can send it again?
@tryssss 3 years ago
@atomic14 OK thanks, I will let you know.
@leocastro4140 3 years ago
Hello, very good video. I was able to replicate it using a microphone connected to ADC_0 (pin 36). I had to modify the code so that the gain was right: after several hours of testing and debugging with serial.print() I found that my problem was with the gain. Congratulations, this is just what I have been working on for a while. I was using the frequencies to try to identify the voice commands, but without good results, and now with this I will be able to talk to my little robot, haha. If anyone needs help in Spanish, I can help with what I modified to make it work for me. I don't get on well with English, so I use Google to translate.
@atomic14 3 years ago
That's brilliant! Would it be possible to put the changes you made in a pull request on the GitHub repository? And congratulations on getting it working!
@leocastro4140 3 years ago
@atomic14 Yes, I have no problem sending the files that I modified and describing the experience. If you want, I will give you my email so that you can contact me, and I will send you the files so you can upload them to your GitHub account. My email: leocastro5364@gmail.com Thanks...
@enriquecalzadajimenez7879 1 year ago
Leo, I would like to replicate it too. How can I contact you?
@tryssss 3 years ago
Hi, how can I compile this project in Arduino? Do I need PlatformIO?
@atomic14 3 years ago
Hi Patrice, I'm afraid you'll need to use PlatformIO - it's pretty easy to get it set up and it's so much nicer to use!
@tryssss 3 years ago
Serial port /dev/cu.usbmodem14331
Traceback (most recent call last):
  File "/Users/blackswan/.platformio/penv/lib/python3.8/site-packages/serial/serialposix.py", line 322, in open
    self.fd = os.open(self.portstr, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/cu.usbmodem14331'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/blackswan/.platformio/packages/tool-esptoolpy/esptool.py", line 3969, in <module>
    _main()
  File "/Users/blackswan/.platformio/packages/tool-esptoolpy/esptool.py", line 3962, in _main
    main()
  File "/Users/blackswan/.platformio/packages/tool-esptoolpy/esptool.py", line 3551, in main
    esp = chip_class(each_port, initial_baud, args.trace)
  File "/Users/blackswan/.platformio/packages/tool-esptoolpy/esptool.py", line 271, in __init__
    self._port = serial.serial_for_url(port)
  File "/Users/blackswan/.platformio/penv/lib/python3.8/site-packages/serial/__init__.py", line 90, in serial_for_url
    instance.open()
  File "/Users/blackswan/.platformio/penv/lib/python3.8/site-packages/serial/serialposix.py", line 325, in open
    raise SerialException(msg.errno, "could not open port {}: {}".format(self._port, msg))
serial.serialutil.SerialException: [Errno 2] could not open port /dev/cu.usbmodem14331: [Errno 2] No such file or directory: '/dev/cu.usbmodem14331'
*** [upload] Error 1
@tryssss 3 years ago
I get this error when I try to upload. Any idea?
@tryssss 3 years ago
@atomic14 OK, I did it :) The code compiles OK, but I still have an issue when uploading to the ESP32.
@atomic14 3 years ago
In the platformio.ini file, comment out (or delete) these lines:
upload_port = /dev/cu.SLAB_USBtoUART
monitor_port = /dev/cu.SLAB_USBtoUART
You may need to change them to where your device is if it can't find it.
@egorgorelyy6109 6 months ago
Hi, thank you for sharing this project. I am able to build and upload it but I only get _invalid as an output. I am using an INMP441 mic. Is this microphone compatible with the code by default or are there changes I need to make? I am using identical pins to your code. I have also tested the mic with the plotter and I can see the resultant graph. Could you please help me with this? EDIT: I checked and my microphone is getting a mean value of 0. I also get this output in my serial monitor: Backtrace: 0x400DF5CB:0x3FFD2FCC |
@fernandoasenjovisiedo556 6 months ago
Hello there , have you fixed the problem ? I am having the same
@lauracastro6493 6 months ago
hi there!!! I have the same issue, did you manage to solve it ? Thanks !!
@egorgorelyy6109 6 months ago
It depends on your ESP32 library version. You need one under a certain release. Will check soon.
@lauracastro6493 6 months ago
@egorgorelyy6109 Thanks for answering! But what do you mean by that? I don't understand.
@lauracastro6493 6 months ago
nevermind, problem solved!
@kirillyatsenko5365 3 years ago
Great project. I was struggling to make the micro speech project from the official TensorFlow repository work: github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech The system was only recognizing the word "no" and the response was very slow. I'm not sure if the problem is in the model or in the code. However, your system's response is fast and works well.
@atomic14 3 years ago
I need to have a look at micro speech as it's moved on from when I did this project. It's on the list of future videos. I'll check back here when I get going and let you know if I can get it to work.
@kirillyatsenko5365 3 years ago
@atomic14 thanks
@stark9397 2 years ago
@kirillyatsenko5365 Hello master. I am also working on this project and it is very important to me. I will be grateful if you could help me. Do you have a Discord or email where I can reach you?
Рет қаралды 335 М.