How to Extract Audio Features

Views: 87,643

A day ago

Learn the steps needed to extract acoustic features from audio signals in both the time and frequency domains. I also explain key audio processing concepts like spectral leakage, windowing, frames, and hop length.
Slides:
github.com/musikalkemist/Audi...
Join The Sound Of AI Slack community:
valeriovelardo.com/the-sound-...
Interested in hiring me as a consultant/freelancer?
valeriovelardo.com/
Follow Valerio on Facebook:
/ thesoundofai
Connect with Valerio on Linkedin:
/ valeriovelardo
Follow Valerio on Twitter:
/ musikalkemist

Comments: 110

@adi_1o1 1 year ago

Some people are born to teach and you are one of them. It feels like finishing a semester course on signal processing in a couple of mins.

@ValerioVelardoTheSoundofAI 1 year ago

Thank you!

@amruthgadag4813 3 years ago

You cleared up the concept of frames and the need for overlapping for me, along with several other doubts.

@Hiyori___ 3 years ago

I am so HAPPY I've found your video. No book could have explained this better, probably not even many professors or tutors. I'm gonna check your other videos!

@fujinafiul6044 2 years ago

I just started yesterday and I am just loving it as it is helping me a great deal with understanding the audio concepts to do r&d on my ongoing work.

@sharonm1261 2 years ago

thanks! great explanation of windowing and overlapping frames. I had to rewatch it a couple of times to understand how overlapping frames solved the windowing problem but so much easier and more fun than reading a book!

@rajaniras1970 3 years ago

I am doing my research in Signal processing along with ML. Thank you for clearly explaining the concepts

@kobyfr 1 year ago

Thank you very much for these videos. It is very generous of you to make them. I think there is another aspect of spectral leakage: if the signal's harmonics do not fall exactly "on top of" the frequencies sampled by the DFT (its bins), then the energy of each harmonic gets spread out across ALL the bins, with most of it leaking into the bins around the closest one.
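The effect described in this comment can be reproduced with a few lines of standard-library Python. This is an illustrative sketch, not code from the video: the helper names (`dft_magnitudes`, `sine_frame`, `bins_above`), the 64-sample frame, and the 1% threshold are all assumptions chosen for the demo. A sine that completes a whole number of cycles in the frame lands on a single DFT bin, while one with 8.5 cycles spreads its energy over many bins.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of a naive DFT for the bins below the Nyquist bin."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def sine_frame(cycles, n=64):
    """One frame of a sine wave that completes `cycles` periods in n samples."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


def bins_above(mags, fraction=0.01):
    """Count bins whose magnitude exceeds a fraction of the peak (hypothetical metric)."""
    peak = max(mags)
    return sum(1 for m in mags if m > fraction * peak)


on_bin = dft_magnitudes(sine_frame(8.0))   # integer number of periods
off_bin = dft_magnitudes(sine_frame(8.5))  # non-integer number of periods

print("bins with energy, 8.0 cycles:", bins_above(on_bin))
print("bins with energy, 8.5 cycles:", bins_above(off_bin))
```

The naive DFT is O(n^2) and is used here only to keep the example dependency-free; a real pipeline would use an FFT routine.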

@ardavalilable 1 year ago

I just love these videos. They clearly show the theoretical part and simplify it into language that non-specialists (laymen) understand. I am by no means an audio engineer, though my field overlaps with the topics of sound/audio engineering: I work with machinery vibrations, which uses basically every tool that you mentioned in this series. I am currently working on my thesis and have used some of these functions already, along with the wavelet packet transform (which makes my work so easy to do, and it took me quite some time to get the hang of it). Great work, may your work help more people like myself for eternity (or till climate change takes us all out)! Thank you again, and God bless!

@priyanshipal9077 3 years ago

Great series! You give a good overview of the process, the problems which we face and solutions while keeping the video duration short and optimum so one can keep watching the series. If one is curious enough, they can also read up more on the concepts as well. Great work, thank you! :)

@ValerioVelardoTheSoundofAI 3 years ago

Thank you!

@MrOpossumx3 3 years ago

Amazing intuition about why windowing mitigates spectral leakage, great series of videos!

@sathyanarayananvittal7832 5 months ago

Fantastic description. Loved the clarity in concept

@namanagrawal9765 1 year ago

You're an amazing teacher, thanks for the videos! They are really helpful to me!

@venkatesanr9455 3 years ago

Enjoyed as usual.... and waiting for the next

@yuvaramsingh3773 3 years ago

I like your work. I have worked with DL and images for quite some time. Your series showed me all I need to know about sound. Thanks, and keep doing this.

@ValerioVelardoTheSoundofAI 3 years ago

Glad you like this!

@user-yx3wp9xf9o 3 years ago

Excellent explanation. Solved all my problems. Thank you, Sir!

@user-si7lr8es6s 4 months ago

I really enjoyed your explanations

@user-si7lr8es6s 4 months ago

Such a great series! You are an awesome teacher Velardo I appreciate you sooo much :)

@jaypople8885 16 days ago

That's exactly what I was looking for, you made my day thanks☺

@TheRainshine79 3 years ago

Thank you for the awesome explanation. Very clear and understandable! :)

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@juniorsilva5713 2 months ago

Thank you very much, Valerio!

3 years ago

A perfect explanation, which I had not been able to find for months in blog posts etc. Thanks!

@ValerioVelardoTheSoundofAI 3 years ago

Thank you!

@tanyajain3068 1 year ago

Incredibly helpful! It would be immensely helpful if you could post a video of the code for framing and windowing for an audio with the explanation.

@esthermdzitiro31 3 years ago

Super helpful, thank you for the great work.

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@vaibhavdixit4377 8 months ago

Big thumbs up, solid video brother!

@4abdoulaye 3 years ago

Overlapping and the signal loss caused by Hann windows are well explained. Thanks

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@tiagoramos121 3 years ago

Very nice!! Thanks a lot!! Just an observation: in the middle of the video, I was surprised by a Slack ring, hahaha... I couldn't find any Slack notification on my phone, and it took me a while to discover that the sound comes from the video. That's all. Minute 12:42

@Bigman74066 3 years ago

Very good, thanks for this video!

@dhruvbishnoi8840 3 years ago

Thank you! I hope I will get a better rank with this information :)

@hlamzar 11 months ago

This is an awesome explanation. Now I understand why you can't use a digital filter that's just rectangular

@TheMagicmagic290 3 years ago

If you have overlapping frames, given that you apply the Hann window function, wouldn't you create some form of amplitude modulation that is not present in the original signal?
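One way to probe this question numerically is to sum the overlapping windows and see whether the total weight stays flat. The sketch below is not from the video; it assumes a periodic Hann window (denominator equal to the frame size) and a hop of half the frame, and the helper names are hypothetical. Under those assumptions the overlapped windows add up to a constant in the interior of the signal, so every sample there receives the same total weight. This matters mainly when frames are added back together; features computed per frame never see the sum.

```python
import math


def periodic_hann(size):
    """Periodic Hann window: 0.5 - 0.5 * cos(2*pi*n / size)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def overlap_add_envelope(window, hop, num_frames):
    """Sum shifted copies of the window, one copy per frame."""
    total = [0.0] * (hop * (num_frames - 1) + len(window))
    for f in range(num_frames):
        for n, w in enumerate(window):
            total[f * hop + n] += w
    return total


frame_size, hop = 8, 4  # 50% overlap (assumed for this demo)
envelope = overlap_add_envelope(periodic_hann(frame_size), hop, num_frames=5)

# Skip the first and last hop, which are covered by a single frame only.
interior = envelope[hop:-hop]
print([round(v, 6) for v in interior])
```

With other windows or hop sizes the sum is generally not constant, which is one reason the window and hop are chosen together.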

@yashdeepshetty3401 3 years ago

Best teacher ever 🙏🙏

@SAWLENE44 3 years ago

Wow! This is so good. Wish I had seen this before. Merci! :)

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@rekreator9481 3 years ago

I can feel this is highly inspired by Muller's book about Digital Signal Processing :DD I used it for my thesis and I recognized a lot of stuff like... I've heard (read) this somewhere before :DD But it's described very well there and you explained/summarized it perfectly ;)

@ValerioVelardoTheSoundofAI 3 years ago

That is a great book, which I love to refer people to for more info/details.

@mishachandar3965 3 years ago

Great content delivered! Keep going !!!!!

@ValerioVelardoTheSoundofAI 3 years ago

Thanks Misha!

@mishachandar3965 3 years ago

@ValerioVelardoTheSoundofAI Looking forward to more videos.

@kewtomrao 3 years ago

This is hands down the best tut series! Where did you learn so much? Could you recommend some books or websites? Thank you!

@maipyaar 2 years ago

Thank you for this awesome series. Please, what is the difference between frame size and window size? These two terms confuse me a lot

@filippobuoncompagni2715 1 year ago

Incredible series! One question, why do we need framing? Can't we apply windowing directly to the whole signal?

@haitranminh6307 3 years ago

Hi sir, we apply the overlapping because of the spectral leakage when we apply the FFT for the frequency domain. But in time domain, this is not the case. Then why do we still apply the overlapping in the time domain? Thank you, sir

@blaze-pn6fk 3 years ago

as usual, amazing stuff.

@tyhuffman5447 3 years ago

Very good stuff. Thank you.

@ValerioVelardoTheSoundofAI 3 years ago

Glad you like it!

@user-oh9gp3is9e 3 years ago

Thank you, though I didn't understand everything and had to read another article as well to learn it. Still, you helped me a lot. I never realized there was leakage in the FFT before. LOL

@dionisiuspratama7152 3 years ago

Thank you for your explanation! Sorry, but may I have the name of the book you are referencing for this video?

@markusbuchholz3518 3 years ago

OMG. You and your channel are unique. Amazing. Thanks

@rightcurve9761 1 year ago

Great content! With windowing (overlapping) you can minimize the "lost period", but it does not seem to solve the problem of losing signal. To me, by applying the Hann function, the edges of the original signal are lost anyway. So I am not fully sure what we have truly achieved with overlapping, except minimizing the useless/lost signal. The frequency domain will still be missing bars for those overlapped portions of frequencies, won't it?

@nerd3788 3 years ago

How do you choose the overlap length? If it is more than required, wouldn't part of the signal repeat? If I understood correctly, it doesn't matter, since we are extracting features from the STFT of each frame. Is that right? Also, your videos are really good

@ridamsrivastava9502 3 years ago

Highly comprehensive and informative series! I had a question: how can we directly input low-level descriptors to Keras models?

@ValerioVelardoTheSoundofAI 3 years ago

You can check out my "DL for Audio with Python" series for that. In this series, I don't use Keras, and focus only on audio features.

@6tyelement979 3 years ago

Hi, what is the difference between short-term and mid-term feature extraction, please? Is mid-term feature extraction like averaging the short-term frames?

@rakshithv5073 2 years ago

In the frequency domain, overlapping frames are introduced to compensate for the loss of signal during windowing, so why do we also overlap frames in the time domain? Does the discontinuity (spectral leakage) happen only at the start or end of a frame? Is it only relevant for the frequency domain?

@missmudassarnizam5158 3 years ago

Thank you for making this video with the basic information...

@myloveallmine_ 1 month ago

Thanks for your help, I can understand audio analysis better now

@tetlleyplus 5 months ago

According to the sampling theorem, a signal must be sampled at least twice as fast as the highest frequency component in the signal, but in practice what are the recommended sampling frequencies?

@beto5720 3 years ago

Great video mate, I've seen Librosa use 2048 for frame size and 512 for hop size as default for many of their functions (e.g. melspectrogram). Any recommendation on what to use as a general reference? I'm not sure how Librosa does their windowing though so idk.

@ValerioVelardoTheSoundofAI 3 years ago

A usual ratio between frame size and hop length is 2:1. Beyond this it's difficult to provide a general rule. For some problems you want higher temporal resolution, hence you should use a hop size of 512. For others, 768 or 1024 is totally fine. You'll have to treat the hop length as a hyperparameter that needs to be optimised empirically during training.
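To make the trade-off in this reply concrete, here is a minimal framing sketch in plain Python. It is an illustration under stated assumptions, not Librosa's implementation: `frame_signal` is a hypothetical helper, no padding is applied, a trailing partial frame is dropped, and the 22050 Hz sample rate is just an example value. A smaller hop yields more frames per second, which is what higher temporal resolution means here.

```python
def frame_signal(signal, frame_size, hop_length):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_size
    return [signal[s:s + frame_size] for s in range(0, last_start + 1, hop_length)]


sample_rate = 22050                # assumed example rate
signal = [0.0] * sample_rate       # one second of placeholder samples

for hop_length in (512, 1024):
    frames = frame_signal(signal, frame_size=2048, hop_length=hop_length)
    hop_ms = 1000.0 * hop_length / sample_rate
    print(f"hop={hop_length}: {len(frames)} frames, one every {hop_ms:.1f} ms")
```

Halving the hop roughly doubles the number of frames, and therefore the size of any per-frame feature matrix passed to a model.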

@beto5720 3 years ago

Valerio Velardo - The Sound of AI thanks! I’ve noticed that it varies indeed so I’ll keep playing around with it.

@YarkoFFXI 11 months ago

I'm a bit confused about the denominator in the Hann windowing function. Shouldn't it just be K, i.e. the frame size, or number of samples in a frame, instead of K-1?
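Both denominators appear in practice, so a small comparison may help. The sketch below uses hypothetical function names and is not taken from the video. The symmetric form divides by K - 1 and has zeros at both ends of the frame; the periodic form divides by K, so its last sample is not zero, and it is the variant often preferred for spectral analysis.

```python
import math


def hann_symmetric(size):
    """Symmetric Hann window: denominator is size - 1."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (size - 1))) for k in range(size)]


def hann_periodic(size):
    """Periodic Hann window: denominator is size."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / size)) for k in range(size)]


K = 8  # small frame size so the values are easy to read
symmetric = hann_symmetric(K)
periodic = hann_periodic(K)

print("symmetric:", [round(v, 3) for v in symmetric])
print("periodic: ", [round(v, 3) for v in periodic])
```

For realistic frame sizes such as 2048 samples, the two variants differ only slightly.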

@rustombhesania7265 3 years ago

17:56 was the aha moment for me, loved it, thanks for making the video

@Drew_7 1 year ago

At 3:51, could you explain how you got this number again? I tried to do the calculation, but I'm not getting the same thing. Is this because of A/D conversion? Is that single sample number based off of a sample/hold technique or is it just a normal length for one sample in a "one second" length?

@diamondcutterandf598 11 months ago

1 / 44100 Hz ≈ 0.0000227 s = 0.0227 ms
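The same arithmetic extends from one sample to a whole frame. A tiny sketch, with a hypothetical helper name and a 512-sample frame chosen only as an example:

```python
def samples_to_ms(num_samples, sample_rate):
    """Duration in milliseconds of num_samples at the given sample rate (Hz)."""
    return 1000.0 * num_samples / sample_rate


print(round(samples_to_ms(1, 44100), 4))    # one sample: 0.0227 ms
print(round(samples_to_ms(512, 44100), 2))  # a 512-sample frame: 11.61 ms
```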

@ceyhunyldrm7392 2 years ago

Hello, my friend, I didn't get one point and I want to ask you. Let's say I have extracted 30 overlapping windows from the audio signal. While extracting features, should I treat these 30 as 30 different windows with the same target, or as 30 windows = 1 feature x 30 times = 30 features => my audio? Which one do you think is right?

@diamondcutterandf598 11 months ago

11:56 I'm kinda lost. What do you mean by "not an integer number of periods"?

@elyorjonmannonov6114 3 years ago

Hello Valerio. Thank you for another great video. I would like to ask for your advice for my project and already emailed you about it. Could you take some time and give your thoughts on it please? Would really appreciate it. Thank you

@DANstudiosable 2 years ago

Can you please explain spectral leakage in layman's terms here?

@missestherm9454 3 years ago

Hi. Great video. Are time-domain and frequency-domain audio features extracted at the same time, or one then the other?

@ValerioVelardoTheSoundofAI 3 years ago

Usually, you'll extract them at different times. In audio processing libraries like Librosa, there are different functions for different audio features, that you can apply sequentially. Or, perhaps, you may just want to extract one!

@missestherm9454 3 years ago

Valerio Velardo - The Sound of AI Thank you.

@Waffano 2 years ago

@12:00: What exactly does it mean that "Processed signal isn't an integer number of periods"? I get that if you loop the sound frame that is shown on the slide, it does not result in a continuous wave: it goes from around -3 amplitude directly to around +4, which results in a high frequency. But I can't really get what is meant by "Processed signal isn't an integer number of periods". Can someone help please? :)

@Waffano 1 year ago

@paladise Thanks, Andrei. I realised what confused me is that I didn't know that a period was an actual feature of the sound wave. Thought it was more abstract than that.

@daviddemmers130 1 year ago

Around 19:39 you explain what hop length and frame size are. Confusingly, the librosa package uses different parameter names, plus another parameter, n_fft. Could you clear up what n_fft means and how they differ?

@josecosta3878 2 years ago

Are there any videos where you show how to extract and measure silent pauses with Librosa or another Python library? I would appreciate your help. Thanks.

@ValerioVelardoTheSoundofAI 2 years ago

I don't have such a video. But I would use a feature like RMSE for that.
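A rough idea of how a root-mean-square energy feature could flag silent pauses, written in plain Python so it runs without extra libraries. This is only a sketch under assumptions: the function names, the synthetic test signal, the non-overlapping 512-sample frames, and the 0.01 threshold are all hypothetical, and a real recording would need a threshold tuned to its noise floor.

```python
import math


def frame_rms(signal, frame_size, hop_length):
    """Root-mean-square energy of each frame (no padding)."""
    values = []
    for start in range(0, len(signal) - frame_size + 1, hop_length):
        frame = signal[start:start + frame_size]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return values


def silent_frames(rms_values, threshold):
    """Indices of frames whose RMS energy falls below the threshold."""
    return [i for i, v in enumerate(rms_values) if v < threshold]


# Synthetic signal: tone, silence, tone (2048 samples each).
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(2048)]
signal = tone + [0.0] * 2048 + tone

rms = frame_rms(signal, frame_size=512, hop_length=512)
quiet = silent_frames(rms, threshold=0.01)

# Convert frame indices to start times in seconds.
print([round(i * 512 / sample_rate, 3) for i in quiet])
```

Runs of consecutive quiet frames can then be merged to measure the length of each pause.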

@DEEPAKYADAV-vb2ee 2 years ago

Can you tell me the whole feature extraction pipeline for audio data? I want to make a feature extraction class which takes an audio signal. Then what do I have to do?

@ValerioVelardoTheSoundofAI 2 years ago

I covered audio preprocessing pipelines in this video: kzbin.info/www/bejne/hWGXp2aZnK2Bm68

@shafagh_projects 3 years ago

perfect!

@thechuong1448 3 years ago

Super!

@CULTURE_dz 3 years ago

Hello sir, can you help me with some links on how to do windowing to pick up speech or audio from a microphone for preprocessing? And can you make a tutorial on this soon, please? Thanks

@ValerioVelardoTheSoundofAI 3 years ago

That's something I'd like to cover in the future. Stay tuned :)

@CULTURE_dz 3 years ago

@ValerioVelardoTheSoundofAI OK, I am your student. Thanks

@heddshot87 1 year ago

I assume spectral leakage simply means a "click/pop" from clipping when the sample is cut off abruptly?

@notallama1868 1 year ago

So if I'm understanding correctly, the main purpose of windowing is to deal with spectral leakage. But it sounds like spectral leakage only occurs at the beginning and end of the entire signal. So why bother windowing every frame? It seems like we could save a lot of time and effort by just clipping the erroneous bits off the ends of the signal. I'm guessing there's a reason, but what is it?

@saurabhdeshmukh2182 6 months ago

As per my understanding, your suggestion will work only for a static waveform. But in order to calculate a spectrogram (STFT), we need to calculate the FFT of every frame, and collectively these give us a nice spectrogram to work with.
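The per-frame procedure this reply describes can be sketched in dependency-free Python: slice the signal into overlapping frames, multiply each frame by a Hann window, and take the DFT magnitude of each. This is a toy illustration with hypothetical helper names, tiny frame sizes, and a naive O(n^2) DFT standing in for the FFT; it is not the code used in the video or in Librosa.

```python
import cmath
import math


def hann(size):
    """Symmetric Hann window: 0.5 * (1 - cos(2*pi*k / (size - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (size - 1))) for k in range(size)]


def dft_magnitudes(frame):
    """Magnitudes of a naive DFT for bins 0 .. n/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def stft_magnitudes(signal, frame_size, hop_length):
    """Window every frame, then take its DFT: one spectrum per frame."""
    window = hann(frame_size)
    spectrogram = []
    for start in range(0, len(signal) - frame_size + 1, hop_length):
        frame = signal[start:start + frame_size]
        windowed = [x * w for x, w in zip(frame, window)]
        spectrogram.append(dft_magnitudes(windowed))
    return spectrogram


# Toy signal: a sine completing 4 cycles every 32 samples.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft_magnitudes(signal, frame_size=32, hop_length=16)
peak_bins = [row.index(max(row)) for row in spec]
print(len(spec), "frames; peak bin per frame:", peak_bins)
```

Because every frame gets its own DFT, every frame has its own edges, which is why the window is applied frame by frame rather than once over the whole signal.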

@tyhuffman5447 3 years ago

Valerio, I'm using your code on my github python notebook to process some machine sounds and I want to give credit where credit is due so I'm going to mention you, the book you're working on, and the YT series and a link to the YT series. Is that good enough for you? Would you like me to add anything else? I'll make a boilerplate that I will put at the top of the code.

@ValerioVelardoTheSoundofAI 3 years ago

That is very kind of you :) It's perfect!

@4abdoulaye 3 years ago

But I noticed an error in the slide titles: first you titled them "Time domain" in the example at 2:49, then at the end, at 20:41, you titled them "Frequency domain".

@ValerioVelardoTheSoundofAI 3 years ago

Thank you for pointing that out!

@4abdoulaye 3 years ago

@ValerioVelardoTheSoundofAI I re-watched the video; this is not a mistake.

@matveysafronov2813 1 year ago

Thanks! PS: I think you need nerdy music in the intro. Mozart, I think )

@user-cm7bb1cc4g 2 years ago

please make a video list

@ValerioVelardoTheSoundofAI 2 years ago

This video is already part of a playlist.

@lucasa.w.romeiro2136 2 years ago

Hello, my name is Lucas. I'm Brazilian and I'm trying to make an algorithm that differentiates one noise from another. For example, determining how much water is falling from the sound the rain makes as it hits the ground. Things like that, always involving noise. Is anyone familiar with this who could give me some directions? Thanks

@aashansamuel1173 3 years ago

Can you please tell how to extract features from an audio signal by using MATLAB?

@ValerioVelardoTheSoundofAI 3 years ago

I don't use MATLAB; I prefer to work in Python.

@svalente2524 2 years ago

This is all very cute, but how do we actually implement these things? Seems like everybody talks about them, but nobody shows it

@chacmool2581 2 years ago

Spectral leakage. Spectral contamination is more like it.

@saigeeta1993 3 years ago

Hello Sir, SaiGeeta here from India. Please suggest a good laptop specification for text-to-speech synthesis using deep learning algorithms. Please, please