How to Do Speech Recognition with Arduino | Digi-Key Electronics

49,468 views

3 years ago

Speech recognition is the process of using computers to recognize and understand human speech. Being able to understand full sentences or questions requires a lot of processing power, as it often relies on the complex algorithms found in natural language processing (NLP).
Most microcontrollers (and Arduino boards) cannot run NLP due to their limited resources. However, we can train a neural network to perform basic keyword spotting, which still has many uses (such as enabling a smart speaker by saying “Alexa” or shouting “stop” to halt a machine).
In this video, we will use Edge Impulse to train a neural network to identify and classify a few custom keywords. We will then deploy this trained model to an Arduino Nano 33 BLE Sense to perform keyword spotting in real time.
To begin, we collect samples of the keywords we wish to identify. These can be collected on any number of recording devices and then edited using Audacity to create 1-second snippets. We recommend collecting at least 50 samples to start.
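The video does the trimming by hand in Audacity. As a rough illustration of what "1-second snippet" means in terms of samples, here is a minimal Python sketch that forces a clip to an exact length. The function name `fit_to_length` and the 16 kHz default sample rate are assumptions made for this example and are not taken from the video.

```python
def fit_to_length(samples, sample_rate=16000, duration_s=1.0):
    """Return a copy of `samples` that is exactly `duration_s` seconds long.

    Longer clips are center-cropped; shorter clips are zero-padded at the end.
    The 16 kHz default is an assumption for this sketch.
    """
    target = int(sample_rate * duration_s)
    if len(samples) >= target:
        # Keep the middle of the recording, where the spoken word usually sits.
        start = (len(samples) - target) // 2
        return list(samples[start:start + target])
    # Pad short recordings with silence.
    return list(samples) + [0] * (target - len(samples))
```

For example, a 20,000-sample recording at 16 kHz would be cropped to its middle 16,000 samples.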
Afterward, we run a custom Python script that mixes the samples with random snippets of background noise and curates the custom keywords along with keywords found in the Google Speech Commands dataset.
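The idea of mixing a keyword clip with a random snippet of background noise can be sketched as follows. This is not the curation script linked below, only a simplified illustration: the function name `mix_with_noise`, the `noise_gain` parameter, and the float sample range of -1.0 to 1.0 are all assumptions for this example.

```python
import random


def mix_with_noise(keyword, noise, noise_gain=0.3, rng=None):
    """Overlay a random, equally long slice of `noise` onto `keyword`.

    Both inputs are sequences of float samples in [-1.0, 1.0].
    The result is clipped to that range. `noise_gain` is a hypothetical
    parameter that scales the background level.
    """
    rng = rng or random.Random()
    if len(noise) < len(keyword):
        raise ValueError("noise clip must be at least as long as the keyword clip")
    # Pick a random starting point so each mix uses a different noise snippet.
    offset = rng.randint(0, len(noise) - len(keyword))
    mixed = []
    for i, sample in enumerate(keyword):
        value = sample + noise_gain * noise[offset + i]
        mixed.append(max(-1.0, min(1.0, value)))
    return mixed
```

Running this several times over the same keyword clip with different noise recordings would yield several distinct training samples from one recording.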
You can download the Google Speech Commands dataset here: storage.cloud.google.com/down...
The dataset curation Python script can be found here: github.com/ShawnHymel/ei-keyw...
From there, we upload our curated dataset to Edge Impulse. We use Edge Impulse as a tool to extract features from the audio samples, which are the Mel frequency cepstral coefficients (MFCCs). We then use it to train a neural network to identify our target keywords. Once done, we can test the model and download it as part of an Arduino library.
We load the library into Arduino and use it to perform inference in real time. The Arduino example code continually captures audio data, extracts features (computes MFCCs), and uses those MFCCs as inputs to the trained model. The model returns what are essentially the probabilities that it heard each of our target keywords.
We can compare those output values to thresholds and take action whenever the model hears the desired keyword! To start, we’ll blink a simple LED (because who doesn’t love an overly complicated blinky program?).
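The thresholding step can be sketched in a few lines. The video's actual code is an Arduino sketch built on the generated library; the Python below only illustrates the decision logic, and the function name `detect_keyword`, the label names, and the threshold values are assumptions for this example.

```python
def detect_keyword(scores, thresholds):
    """Return the best-scoring label that meets its own threshold, else None.

    scores:     dict mapping label -> model output (roughly a probability)
    thresholds: dict mapping label -> minimum score needed to act on it;
                labels missing from `thresholds` are never reported
    """
    best_label = None
    best_score = 0.0
    for label, minimum in thresholds.items():
        score = scores.get(label, 0.0)
        if score >= minimum and (best_label is None or score > best_score):
            best_label = label
            best_score = score
    return best_label


# Hypothetical labels and thresholds, for illustration only.
thresholds = {"go": 0.85, "stop": 0.6}
if detect_keyword({"go": 0.91, "stop": 0.03, "noise": 0.06}, thresholds) == "go":
    pass  # this is where the LED would be toggled on the real board
```

Keeping one threshold per label makes it possible to raise the bar for a keyword that triggers too easily without affecting the others.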
Product Links:
Arduino Nano 33 BLE Sense - www.digikey.com/en/products/d...
Related Videos:
What is Edge AI?
• Intro to Edge AI: Mach...
Intro to TensorFlow Lite Part 1: Wake Word Feature Extraction
• Intro to TensorFlow Li...
Intro to TensorFlow Lite Part 2: Speech Recognition Model Training
• Intro to TensorFlow Li...
Intro to TensorFlow Lite Part 3: Speech Recognition on Raspberry Pi
• Intro to TensorFlow Li...
Getting Started with TensorFlow Lite for Microcontrollers
• TinyML: Getting Starte...
Related Project Links:
How to Use Embedded Machine Learning to Do Speech Recognition on Arduino - www.digikey.com/en/maker/proj...
Related Articles:
What is Edge AI? - www.digikey.com/en/maker/proj...
TensorFlow Lite Tutorial Part 1: Wake Word Feature Extraction - www.digikey.com/en/maker/proj...
TensorFlow Lite Tutorial Part 2: Speech Recognition Model Training - www.digikey.com/en/maker/proj...
TensorFlow Lite Tutorial Part 3: Speech Recognition on Raspberry Pi - www.digikey.com/en/maker/proj...
Getting Started with TensorFlow Lite for Microcontrollers - www.digikey.com/en/maker/proj...

Comments: 51

@VA7AYG 3 years ago

I have come to greatly appreciate Digi’s dedication to education, not to mention how amazing teacher Shawn is! Keep up the good work and Cheers

@ShawnHymel 3 years ago

Thank you!

@eloquentarduino5988 2 years ago

Very detailed tutorial, good job!

@userou-ig1ze 3 years ago

thank you! Great and timely content, great speed and information content (at 2x)

@harrytsai0420 3 years ago

This is really good content!!! Thanks very much

@MuhammadDaudkhanTV100 3 years ago

Fantastic idea and cool content bro

@ercost60 3 years ago

Fantastic! TYVM for this vid.

@5_inchc594 3 years ago

I just found GOLD

@janjongboom7561 3 years ago

With so many uncertain results in the test set, lower the minimum confidence rating to 0.6 to get much better results.

@honamyim 3 years ago

It's very interesting to see the original founder's comment here.

@sumitmamoria 3 years ago

Nicely done.

@adhamelrouby6445 2 years ago

Can this method explained in the video be used to recognize a specific sound rather than specific text, e.g., a clap sound?

@resatyigen3430 3 years ago

Thank you dude. Very cool tutorial. Please make an STM32F4 speech recognition example.

@ShawnHymel 3 years ago

Here's the workshop I did where I show the process end-to-end: kzbin.info/www/bejne/f4PEkIZ-YpiAias. Granted it's an STM32L4, but the same principles should apply.

@mri3884 3 years ago

Can I use the data_curation for my dataset and profit out of the model I develop?

@ashwinis7513 3 years ago

Thank You So Much. I have added the file as per Edge Impulse, but I am still facing an issue: "The filename or extension is too long. Error compiling for board Arduino Nano 33 BLE."

@rohanmanchanda5250 1 year ago

It'd be nice if you could show how to run an audio classifying tflite model on an Arduino Nano / Raspberry Pi Pico *using an Analog Microphone* . There's no proper video that I could find on the web that does that or even resembles this concept remotely.

@bertbrecht7540 3 years ago

Hi Shawn, Thanks for the great tutorial. I dusted off my BLE, went through every one of your steps and got it all working in near flawless fashion, six hours later. I am now in a good place to start experimenting. My keyword was 'shut-down' (only 48 samples) which turns the LED on and 'go' turns the LED off. This could be a light bulb controller or a TV power switch. A lot of sounds get confused for 'go' so increasing the threshold for 'go' to 85% worked well. Perhaps it's just my OS but I had to replace all your '\' with '/' in the curation script command line. Oddly, my feature extraction time took less than half of yours (123ms) but I had the same BLE Sense as you. I want to try expanding the # of wake words. What do you think held you back: processor speed, RAM, Edge Impulse, data....? Looking forward to your next videos

@ShawnHymel 3 years ago

That's awesome you were able to get it working! Thank you for the heads up on the '/' vs '\' in the script--I thought I had made it OS agnostic, but I guess I forgot a few spots. I was able to get the BLE Sense to do 4 key words. It has to do with processing speed and RAM. The more classes you look for, the more speed/RAM is required. At some point, the DSP and inferencing will take longer than the allotted 333 ms, and you'll start overflowing the audio buffer :-/

@janjongboom7561 3 years ago

The Edge Impulse SDK has had some recent improvements that sped up DSP code by 200% on this target, which explains it!

@akkutyagi16 3 years ago

Is this using tflite micro in the backend?

@eonoire 3 years ago

I need some help. While setting things up in Anaconda, I'm getting this error: "dataset-curation.py: error: the following arguments are required: d". I really don't know what this could be and I would really appreciate any help, thanks

@sureshtiwari2158 2 years ago

I want to use this method with an ESP32. How can I make the program use the audio data coming via I2S?

@hamishgrant2802 3 years ago

Hi, thanks for the tutorial. I want to use the method with an ESP32. How can I make the program use the audio data coming via I2C?

@johnabonilla1266 2 years ago

Did you ever figure out how to do this?

@hamishgrant2802 2 years ago

@@johnabonilla1266 I got speech recognition working with about an 80% success rate. There is a video on my channel and a link to the code in the description.

@brayanaquino4727 1 year ago

Good day, is there a guide like this one but using only a Raspberry Pi Pico and the analog channels connected to a microphone?

@nikonissinen6772 3 years ago

I could have told you a much faster way to do the audio samples, but you already did it, so

@F4LL__ 2 years ago

Well...?

@minhphuongnguyen8117 1 month ago

hey bro, can this be used with an ESP32?

@withIn40 3 years ago

Hi sir, can I use it with a Voice Recognition Module? Thanks

@withIn40 3 years ago

And please create a video about it; that would be so helpful, thank you

@MilSimVipers 3 years ago

Why do I not have AppData under my user :(

@imsteven3044 3 years ago

Hi! I want to convert the speech to text and then work with the text in Python. Would this module work for me for this?

@djtomoy 1 year ago

"hand me my patching trowel, boy!"

@topgearIQ 2 years ago

Uno work or not

@dhupee 2 years ago

"I've got 68, which should work for this prototype" Ckckck, I'm disappointed Shawn

@cokeforever 3 years ago

the entire concept of doing speech recognition locally is outdated and non-effective; you can use google api to simply pass the sound sample and get the recognition result back as a string; 2021... cloud computing... sas... hello)

@dannyash3805 3 years ago

It's not non-effective if you don't have access to the internet. If you're not interested in the topic of the video then watch a different video!

@cokeforever 3 years ago

@@dannyash3805 what a stupid thing to say... how do you think people discuss things and eliminate obsolete knowledge and synthesize new knowledge?! p.s. your point is quite funny in the age of IoT (your fridge has internet access) and if you make "sesame, open" switch for your home/cave/volcano - wouldn't it have wifi? will you only power your voice activated device in your tree-house using batteries?!

@dannyash3805 3 years ago

@@cokeforever What enlightenment! Evidently you have already eliminated the obsolete knowledge of DSP and convolutional neural networks so why don't you eliminate yourself from this comment thread and go be too smart somewhere else.

@cokeforever 3 years ago

@@dannyash3805 why? because some inefficient low skill troll tells me to? )) not gonna happen, buddy... but you are free to participate in civilized, argumented discussion ;)

@tacticalmanx759 3 years ago

You do know some people are trying to make offline devices, right?