Mel-Frequency Cepstral Coefficients Explained Easily

130,654 views

Valerio Velardo - The Sound of AI

Days ago

Comments: 225
@Elonimous · 3 days ago
Your explanation of the intuition for the cepstrum is so freaking elegant, thank you!
@aberone_library · 6 months ago
I cannot express how thankful I am to you for making this video! This is my favorite style of explanation, one that I myself have adopted over the years. You took an hour to explain a concept that could, in principle, have been explained in 15 minutes or so, but you did it so clearly and thoroughly that by the end of the video I had a spotless, complete understanding not only of the process of extracting the MFCCs but also of the intuition and meaning behind it. That is something a lot of other explanatory videos lack these days. So thank you again for your effort!
@ValerioVelardoTheSoundofAI · 6 months ago
Thanks a lot :)
@IbrahimCikrikcioglu · 1 year ago
I was watching the video and at some point I stopped and started talking to ChatGPT to understand those concepts. I found myself learning about convolutions, cepstral coefficients, and their intuition. Once I got back to the lecture, the first thing Valerio started talking about was convolutions and the intuition behind cepstral coefficients. The moral of this story: he is an amazing teacher, so just finish the lecture first and then search for the stuff you did not get :)
@thebigVLOG · 3 years ago
This is one of the best lectures I've ever effin watched, thank you so much for making this series!
@emrecan9271 · 8 months ago
You are a perfect man. These videos are literally worth gold. I will watch them from the start. Thank you very much.
@ayishanayyar1283 · 3 years ago
Described in a pleasant, untiring manner, with a relaxing effect on the nerves. Thank you, Valerio Velardo
@Bluephoton · 3 years ago
Better than a Speech Signal Processing lecture in terms of explanation and ease of understanding!! Highly recommend watching it for speech-related projects!
@ValerioVelardoTheSoundofAI · 3 years ago
Thank you!
@klausjurgenfolz4323 · 3 years ago
I've learned more watching this video than in a whole semester at my university. Thank you!!!!
@johnmin3821 · 3 years ago
This video gives explanations that I couldn't find or understand on hundreds of websites. You're a legend
@nataliakalashnikova1269 · 3 years ago
I did my master's thesis in NLP on automatic emotion recognition, comparing CNN and SVM performance using MFCCs. I didn't really "get" the meaning of MFCCs, how they work, why they are so popular, etc. Now I'm doing my PhD thesis, also on emotion classification in speech, and I was really struggling to understand these basic concepts. Thank you so much for your work and your clear, vivid explanations! You helped me a lot to move forward in my project. P.S. Sorry for my English if there are a lot of errors. P.P.S. I am a linguist, "I believe it's called" :)
@zahraamuhsen1310 · 2 years ago
Please, I am a student and my thesis is also about speech emotion recognition using CNN and MFCC based on GA using entropy >>>> can you send me your thesis or help me understand?
@BlackHermit · 2 years ago
This was really, exceptionally good. A rather lengthy video, but worth every second. Thank you so much!
@ValerioVelardoTheSoundofAI · 2 years ago
Glad you liked it!
@ashwinalinkil7328 · 1 year ago
Straight up dude, you are an absolute beast! Every other sentence just blows my mind. You made it so easy to understand and gain an intuition on such abstruse concepts. Thank you so much!
@abhi88mcet · 3 years ago
I am more of a Reinforcement Learning guy with a bad squeaky voice trying to start a YouTube channel. I was researching the use of RL to create a realistic vocoder to substitute for my voice, and stumbled upon this gem... awesome work, keep up the good work..
@ValerioVelardoTheSoundofAI · 3 years ago
Thanks a lot and good luck with the YT channel -- you're on the verge of starting an amazing journey :)
@MegaCadr · 3 years ago
20 minutes in, my mind started melting. Amazing video!
@drillsargentadog · 2 years ago
Nice explanation and great course! One comment: I'm pretty sure big X, E, and H at 27:19 should be functions of frequency, not time, and should be multiplied, not convolved.
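A quick numerical check of the point in the comment above, using toy signals: `e` stands in for the glottal excitation and `h` for the vocal-tract impulse response (both are hypothetical sequences, not real speech). With the DFT, circular convolution in time becomes element-wise multiplication of the spectra, and taking the log of the magnitudes turns that product into a sum, which is what lets the cepstrum pull the two parts apart. (Linear convolution would need zero-padding for the same identity to hold exactly.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256

# Toy stand-ins (assumptions, not real speech):
# e ~ excitation / glottal source, h ~ short decaying vocal-tract impulse response.
e = rng.standard_normal(n)
h = np.zeros(n)
h[:8] = 0.7 ** np.arange(8)

# Circular convolution x[m] = sum_k e[k] * h[(m - k) mod n], written out directly.
x = np.array([sum(e[k] * h[(m - k) % n] for k in range(n)) for m in range(n)])

X, E, H = np.fft.fft(x), np.fft.fft(e), np.fft.fft(h)

# Time-domain convolution corresponds to frequency-domain multiplication ...
print(np.allclose(X, E * H))
# ... and the log of the magnitude turns that product into a sum.
print(np.allclose(np.log(np.abs(X)), np.log(np.abs(E)) + np.log(np.abs(H))))
```

So E, H, and X are functions of frequency, and in that domain the relation is a product, not a convolution.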
@naveenfrancis444 · 2 years ago
At 14:38, doesn't the IDFT map a signal onto the time domain? If so, shouldn't the axis be pseudo-time instead of pseudo-frequency?
@st0a · 1 year ago
That's exactly what I was wondering...
@Underscore_1234 · 6 months ago
That's AWESOME STUFF. I expected good stuff, but not stuff that good. You did a really good job explaining cepstrums and the way to separate glottal pulses from the vocal tract. It really made sense.
@quincydelp9586 · 2 years ago
This is an incredibly helpful video that taught me how to implement an MFCC algorithm and gave me intuition for why it is useful information. I can't recommend it enough.
@ValerioVelardoTheSoundofAI · 2 years ago
Thank you Quincy!
@seathru1232 · 3 years ago
Dear Valerio, I don't get one point. Shouldn't you get time on the x-axis if you apply an IDFT to a signal represented in the frequency domain? If I take a signal x(t), take the FFT, and then the IDFT, don't I get back a reconstructed x(t)? Is the log of the FFT the reason behind what you explained?
@pedrobotsaris2036 · 1 year ago
That is right. I think he is misusing the term inverse Fourier transform here. If you apply an IDFT you get back to the time domain.
@Walsh2571 · 1 year ago
@pedrobotsaris2036 Not if you change the scale before performing an IFFT.
@非常大的圆白菜 · 8 months ago
@Walsh2571 Why? If you change the scale before performing the IFFT, you just get back to the time domain with a different scale, right?
@tutatis96 · 7 months ago
@非常大的圆白菜 I think the point is that we got rid of the phase with the log, but I'm not sure.
@vaitom6078 · 2 years ago
You're a genius of popularization, thank you for the effort
@Jamazon · 6 months ago
Your channel is a gold mine, thank you so much for what you do!
@Goriuable · 2 years ago
Thank you so much. I searched a lot on the topic of MFCCs and did not find very good explanations. Your video is really a masterpiece and I now have a good understanding of the concepts :) For sure I will have a look at some of your other videos. Keep up the amazing work!
@ValerioVelardoTheSoundofAI · 2 years ago
Thank you - glad I could help!
@chacmool2581 · 3 years ago
What I don't understand is why one takes an inverse FT instead of an FT to get to the quefrency domain. If it's indeed a spectrum of a spectrum, shouldn't one take an FT of an FT?
@satyajeetprabhu · 1 year ago
Same thought
@booky6149 · 11 months ago
We are taking the inverse Fourier transform to represent the log spectrum the way the human ear hears it (i.e., frequency domain to quefrency domain). The FT only takes a time-domain signal as input; an FT of an FT violates the rules.
@DiogoSanti · 3 years ago
But is it OK to call the inverse Fourier transform a spectrum? I ask because the inverse Fourier transform brings the frequency domain back to the time domain, and in my head a spectrum is represented by slices of the frequency domain. Or am I missing the point?
@jdavibedoya · 4 months ago
I'm not an expert, but I believe the conventional way to calculate the cepstrum uses the IDFT because of its scaling factor. Both the DFT and IDFT are quite similar and indeed produce results with the same shape.
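A small check of the DFT-versus-IDFT question raised in this thread, assuming the cepstrum is computed from the log *magnitude* spectrum of a real frame (the real cepstrum). The log is applied element-wise to each bin's magnitude. Because that log-magnitude spectrum is real and even-symmetric (|X[k]| = |X[N-k]| for a real signal), its forward DFT is real and the IDFT equals the forward DFT divided by N, so both conventions give the same shape. The random frame is just a stand-in for any real signal.

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.standard_normal(512)   # toy real-valued frame (any real signal would do)

# Element-wise log of each DFT bin's magnitude.
log_mag = np.log(np.abs(np.fft.fft(frame)))

# log_mag is real and even, so the IDFT is just the DFT scaled by 1/N.
n = len(log_mag)
via_idft = np.fft.ifft(log_mag)
via_dft = np.fft.fft(log_mag) / n

print(np.allclose(via_idft, via_dft))            # the two conventions agree
print(np.max(np.abs(via_idft.imag)) < 1e-10)     # and the result is (numerically) real
```

This only holds because the phase was discarded; the IDFT of the full complex spectrum X would of course return the time signal.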
@jeevanreji7290 · 2 years ago
I absolutely love the way you explain these concepts! Thank you !
@LauraSpinu · 3 years ago
This was so helpful, can't thank you enough for your time and effort. Simply amazing - and your enthusiasm makes it so easy to watch and enjoy through the end!
@zhenxinghu4889 · 3 years ago
My question is: why not apply the DFT rather than the IDFT again on log(F(x(t)))?
@sasankkottapalli6822 · 8 months ago
Same question here
@tutatis96 · 7 months ago
@sasankkottapalli6822 I think it works because we're not considering the phase after the log.
@yuefenggao7483 · 2 months ago
@sasankkottapalli6822 Both the IDFT and DFT are basically equivalent here because they have the same distribution. kzbin.info/www/bejne/iXvSaKmGnLefeLMsi=I664FcrQVgml_Amf&t=77
@zeyuyang2053 · 4 years ago
Best MFCC explanation I've ever seen! Thank you!
@ValerioVelardoTheSoundofAI · 4 years ago
Thank you!
@williameden5675 · 24 days ago
This is so well done!! I finally think I understand MFCCs :)
@4abdoulaye · 4 years ago
VERY VERY VERY CLEAR, best video I've ever seen.
@ValerioVelardoTheSoundofAI · 4 years ago
Thanks!
@AlBeebe · 4 years ago
Excellent video. 42:08 ended up making me wonder what happened to the Slack message I thought I got. :)
@JigarRajpopatOfficial · 4 years ago
Very informative. Thank you!
@ValerioVelardoTheSoundofAI · 4 years ago
Thank you Jigar :)
@beincheekym8 · 4 years ago
Awesome course, so complete, and very clear visualizations. Really amazing. Thank you!
@ValerioVelardoTheSoundofAI · 4 years ago
Thanks!
@subrahmanyamkunapuli1860 · 4 years ago
👏Excellent way to explain intricate details!! Thanks for the video series.
@ValerioVelardoTheSoundofAI · 4 years ago
Thank you!
@akanshmaurya1568 · 4 years ago
I am confused about why it is a spectrum of a spectrum. When we take the Fourier transform we go from time to spectrum, so in the last step of calculating the cepstrum, shouldn't we call it an inverse of a spectrum?
@Erosis · 4 years ago
Yeah, the inverse is kinda confusing me. I thought we'd use another Fourier transform to get to quefrency, not the inverse (which puts it back into the time domain). I read a post about this ( dsp.stackexchange.com/questions/5940/mfcc-process-confusion ) where they say that both are going to produce relatively the same thing, so it doesn't matter in the end.
@bijan8705 · 2 years ago
He clearly doesn't know that the inverse FT is not the same as the FT at 14:00
@RudraSingh-pb5ls · 2 years ago
@bijan8705 Who doesn't know, Valerio or Akansh, the guy who asked this question here?
@非常大的圆白菜 · 8 months ago
@Erosis Thank you for this information. I read the post but I'm still confused... why are both going to produce the same thing? One is the inverse of the other.
@katsiarynaruksha9381 · 1 year ago
Extremely useful series of lectures. Thanks a ton!
@IngridKnoch · 3 years ago
That was so clearly explained!! Thank you for this, Valerio
@avidreader100 · 3 years ago
I am still stalled at this video. I feel the founders of the concept confused us by naming these unique parameters the way they did. Quefrency as a metric measured in seconds was a big source of confusion for me. I am gradually coming to terms with it. Let me share my thoughts so that others can correct me if I am off. In the Fourier transform that gave us the spectrum, we say we convert a signal from the time domain to the frequency domain. We look at the time-domain signal as a sum of multiple uniform/steadier frequency components (all taken within a short time frame). The amplitude on the vertical axis is expressed in different units (dB etc.), but is conceptually the same: magnitude. The Fourier transform inverted the x-axis: from time it went to the inverse of time, which is frequency. The cepstrum is basically looking at the up and down shifts of the spectrum as we scan along frequency. These are the formants in speech. The amplitude is again not tampered with beyond expressing it as a log etc. The x-axis is now flipped once again, from cycles per unit time to time. In both the spectrum and the cepstrum we flipped the x-axis. The first time around, it analyzed the signal and gave us all the frequency components. The second time, it gave us all the formants. The amplitude of a spike in the cepstrum gives us the significant components, and the quefrency (time value) at which the spikes occur, when inverted, gives us the formant frequency corresponding to this spectrum. Does this sound right?
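To make quefrency concrete, here is a sketch with a synthetic "voiced" frame: an impulse train at 100 Hz (standing in for glottal pulses) convolved with a single damped 500 Hz resonance (standing in for the vocal tract). All signal parameters (sample rate, frame length, resonance, the -60 dB floor, the 70-400 Hz search range) are assumptions chosen for illustration. The real cepstrum shows a peak at a quefrency of 80 samples, i.e. 10 ms, which is the pitch period; inverting it gives the pitch (100 Hz) rather than a formant frequency. The envelope/formant information sits at low quefrencies instead.

```python
import numpy as np

def real_cepstrum(frame, floor_db=-60.0):
    """IDFT of the log-magnitude spectrum of a Hann-windowed frame."""
    mag = np.abs(np.fft.fft(frame * np.hanning(len(frame))))
    # Floor very small magnitudes so spectral nulls do not become log(0) = -inf.
    mag = np.maximum(mag, mag.max() * 10 ** (floor_db / 20))
    return np.fft.ifft(np.log(mag)).real

fs = 8000        # sample rate in Hz (assumed)
f0 = 100         # pulse rate in Hz, i.e. a period of fs // f0 = 80 samples
n = 1024         # frame length in samples

# Source: impulse train standing in for glottal pulses.
source = np.zeros(n)
source[:: fs // f0] = 1.0

# Filter: one damped 500 Hz resonance standing in for the vocal tract.
k = np.arange(64)
vocal_tract = 0.9 ** k * np.sin(2 * np.pi * 500 / fs * k)

voiced = np.convolve(source, vocal_tract)[:n]
ceps = real_cepstrum(voiced)

# Look for the peak only at quefrencies matching a 70-400 Hz pitch range.
q_min, q_max = fs // 400, fs // 70          # 20 and 114 samples
peak = q_min + int(np.argmax(ceps[q_min:q_max]))

print(peak, peak / fs, fs / peak)   # quefrency in samples, in seconds, implied F0 in Hz
```

Restricting the search range keeps the low-quefrency envelope terms and the rahmonics (multiples of the period) out of the way.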
@Waffano · 2 years ago
The IDFT part is a typo if you ask me. To me it only makes sense that the cepstrum is a spectrum of a spectrum, meaning a DFT applied to a spectrum. This is the only way we can collect the frequencies of the formants. If it were the IDFT, it would just result in a complex waveform with no information about frequencies. In the end Valerio also specifically uses the discrete cosine transform, and NOT the inverse discrete cosine transform, to get the final MFCCs, which makes sense. So I strongly believe the IDFT at the beginning is just a mistake and should be a DFT.
@thierrydesot1164 · 4 years ago
Thanks a lot for this brilliant explanation. I have read several papers to grasp the concepts of MFCCs, mel scaling, delta derivatives, etc. But after watching this YouTube tutorial, it is the first time I have the feeling I 'got' it. So I am on my way to watch your other tutorials.
@ritwickjha3954 · 2 years ago
Maybe you should be a bit clearer: taking the IFFT of the frequency domain gives us the time domain. Quefrency is in the time domain. I was a bit confused because you kept saying the IFFT will give something like a frequency domain. Also, I am not sure taking the log of the signal in the time domain is correct: since it is a convolution of E and H, the log should be taken in the frequency domain, where it is a multiplication of E and H. Please correct me if I am wrong. Great video
@Jononor · 4 years ago
In "Computing Mel-Frequency Cepstral Coefficients" (approx. time 38:00) you put Waveform->DFT->Log-Amp->Mel-filterbank->DCT. Isn't it more conventional to apply the mel filterbank to the linear magnitude spectrogram and then do the log transform? But maybe the order is not so important between those two steps?
@ValerioVelardoTheSoundofAI · 4 years ago
It's really a matter of "preference". Both approaches work.
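A from-scratch sketch of one MFCC frame that lets you switch between the two orders discussed here. It is not librosa's (or any other library's) exact implementation: the triangular filterbank construction, the HTK-style mel formula, 26 mel bands, 13 coefficients, the Hann window, using log power instead of log amplitude (a factor of 2), and `eps` are all assumptions. The two orders are not mathematically identical (a weighted sum of logs is not the log of a weighted sum), so the coefficients differ, but both give a compact description of the smoothed log spectrum.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters equally spaced on the mel scale (one common construction)."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def dct_ii_ortho(x):
    """Orthonormal DCT-II written out with NumPy."""
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    y = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x
    y[0] *= np.sqrt(1.0 / n)
    y[1:] *= np.sqrt(2.0 / n)
    return y

def mfcc_frame(frame, sr, n_mels=26, n_mfcc=13, log_first=False):
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    fb = mel_filterbank(n_mels, len(frame), sr)
    eps = 1e-10
    if log_first:
        # Order in the video's diagram: log spectrum first, then the mel filterbank.
        mel = fb @ np.log(power + eps)
    else:
        # Order used by many libraries: mel filterbank on the power spectrum, then log.
        mel = np.log(fb @ power + eps)
    return dct_ii_ortho(mel)[:n_mfcc]

sr = 16000
t = np.arange(512) / sr
frame = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

a = mfcc_frame(frame, sr, log_first=False)
b = mfcc_frame(frame, sr, log_first=True)
print(a.shape, b.shape)
print(np.round(a[:4], 2))
print(np.round(b[:4], 2))
```

If you need to match a specific toolkit's numbers, follow that toolkit's order and filterbank definition rather than mixing the two.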
@6tyelement979 · 4 years ago
4:32 When you can't answer a question you got asked in front of the whole class. BTW great vid
@ValerioVelardoTheSoundofAI · 4 years ago
LOL
@ashokdhingra4 · 3 years ago
Hi, the Fourier transform of a time-domain signal is a series of terms, not a single number. What then is the meaning of the log of the Fourier transform? Or is it the log of each term in the Fourier transform? Further, when we take the inverse Fourier transform, we should go back to the time domain. So it is not really a 'spectrum of a spectrum'.
@AsEnIxX-wtf · 1 year ago
Excellent presentation & explanation
@keem.studio · 4 years ago
This video just saved my engineering final project
@ValerioVelardoTheSoundofAI · 4 years ago
Nice :)
@AlexTuduran · 2 years ago
Why not just call it a *meta-spectrum*, which is literally a spectrum of a spectrum? Also, this is one of the best explanations I've come across. Well done.
@ValerioVelardoTheSoundofAI · 2 years ago
Meta-spectrum sounds really cool!
@dver89 · 2 months ago
Incredible video. Thank you!!
@Jononor · 2 years ago
You should put (MFCC) in the title, I think. It would help people discover the video. Not everyone knows what the abbreviation stands for :)
@user-yo4kd7zy9j · 2 years ago
So well explained! Thanks a lot for your amazing work.
@harutyunyansaten · 3 years ago
I want to learn more deeply. Can you please provide references for where you took this info?
@MrOpossumx3 · 3 years ago
Another great vid! I would have appreciated a bit more intuition about the meaning of the MFCC coefficients/time matrix presented around 48:37. While a spectrogram is intuitive, I found the MFCC-coefficients-over-time matrix harder to interpret. Do you have some intuition for MFCC coefficients over time from a psychoacoustic perspective? In a spectrogram, the intensity of a given frequency in a frame nicely links to our perception of a sound as high- or low-pitched. What would the perceptual equivalent be for MFCC coefficients over time?
@Waffano · 2 years ago
Say we wanted to identify a specific individual from their speech, for example to unlock a device by voice. In this case spectral detail would be more important than the spectral envelope, right? Because spectral detail tells us something about the unique pitch of the speaker's voice? In contrast, in ASR, where we only care about the words that are spoken and not who speaks them, it makes sense to use the spectral envelope?
@dataista7717 · 2 years ago
Thanks for the series, man. You sped up my jump into this field a lot. Like A LOT. Really, you rock 🙌
@advaithpillai · 2 years ago
Mate you are a life-saver!
@amitrege502 · 3 years ago
Around 50:53 you say that MFCCs ignore fine spectral structures like pitch, which we generally don't care about. Then you also say that MFCCs work well in speech and music. I think in music, pitch is the most relevant information people are interested in, because the musical notes themselves are defined around pitch frequencies. I think there is a contradiction in these statements. Could you please clarify? Thank you for such a nice video.
@ValerioVelardoTheSoundofAI · 3 years ago
It's not a contradiction. For tasks like music genre, mood, and instrument classification, timbre is more important than pitch.
@amitrege502 · 3 years ago
@ValerioVelardoTheSoundofAI Thank you
@RahulSharma2501 · 2 years ago
This is absolutely amazing.
@annazaitseva6213 · 3 years ago
If the cepstrum is a spectrum of a spectrum, why is the inverse Fourier transform applied to the log spectrum of the signal, and not the forward one?
@pratyushsaha8482 · 3 years ago
Very well explained. You are awesome, man!
@seanperman2000 · 2 months ago
Why do we get a cepstrogram when computing the MFCCs? Are we using the time-frequency domain? I'm a little confused going from the cepstrums to the MFCC part of the video.
@shereenelmetwally522 · 3 years ago
Thank you very much. It was really wonderful!
@shaidhasan6895 · 4 years ago
Thanks a lot. Was waiting for this.
@ValerioVelardoTheSoundofAI · 4 years ago
Glad you liked the video!
@lingarajmishra8981 · 1 year ago
After applying the Fourier transform, how were the vocal tract response and glottal pulse still in the time domain? Please explain.
@ChirsXu-n9o · 10 months ago
I am wondering whether the 1st rahmonic represents the envelope (formants) or the glottal pulse in the latter part of this video? I am a little bit confused here at 16:12.
@AlexTuduran · 2 years ago
I watched the suggested video on how to compute the envelope, but I find it unfit for this problem, or I'm missing something. Basically, to compute the envelope, you take the max of a frame. This works well in general with audio, but when constructing the envelope of a spectrum, the data is rather short/scarce (e.g. FFT 1024 => 512 points), and breaking it down into frames increases the chances of computing a rather "false" envelope. How do you manage to avoid the local minima and account only for the actual peaks? And since we're talking about speech, we'll have a lot of local minima. Applying a low-pass filter kind of does it, but it obviously has the disadvantage of potentially shaving off important peaks. So how do you do it properly?
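One common alternative to framed max-tracking, offered here as a suggestion rather than as the method from the suggested video, is cepstral liftering. Keep only the first `n_lifter` real-cepstrum coefficients (low quefrency = slow variation across frequency), transform back, and you get a smooth envelope that does not latch onto individual harmonics; subtracting it from the log spectrum leaves the fine detail (the harmonics). The envelope has the same length as the spectrum, so the element-wise subtraction asked about in another comment further down just works. The cut-off `n_lifter=30` and the synthetic frame are assumptions; the cut-off has to stay below the pitch-period quefrency (80 samples here) or harmonics leak into the envelope.

```python
import numpy as np

def envelope_and_detail(frame, n_lifter=30):
    """Split a log-magnitude spectrum into a smooth envelope and fine detail
    by low-quefrency liftering of the real cepstrum.

    n_lifter (how many low-quefrency coefficients to keep) is a tuning assumption.
    """
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame * np.hanning(n))) + 1e-10)
    ceps = np.fft.ifft(log_mag).real
    lifter = np.zeros(n)
    lifter[:n_lifter] = 1.0
    lifter[n - n_lifter + 1:] = 1.0      # mirror half, so the result stays real and even
    envelope = np.fft.fft(ceps * lifter).real
    detail = log_mag - envelope          # same length as log_mag, so subtraction just works
    return log_mag, envelope, detail

# Synthetic voiced frame (assumed parameters): 100 Hz pulse train through a 500 Hz resonance.
fs, n = 8000, 1024
source = np.zeros(n)
source[:: fs // 100] = 1.0
k = np.arange(64)
frame = np.convolve(source, 0.9 ** k * np.sin(2 * np.pi * 500 / fs * k))[:n]

log_mag, envelope, detail = envelope_and_detail(frame)

# The envelope is much smoother than the raw log spectrum (smaller bin-to-bin changes).
print(np.sum(np.diff(envelope) ** 2) < np.sum(np.diff(log_mag) ** 2))
```

Because it works on the whole log spectrum at once, there is no per-frame max to get stuck on local minima; the trade-off moves into the choice of `n_lifter`.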
@Waffano · 2 years ago
@37:44 You mention that we get a mel spectrum. However, most of the resources I found don't mention any mel spectrum at that step; instead they mention a 1D mel vector of length M, where M is the number of mel bands and m is the band number. The m-th element of the mel vector then contains the sum of the products between the m-th mel filter bank and the power spectrum. Is this mel vector the same as a mel spectrum? And what are the pros and cons of using either, if they are different?
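For what it's worth, the 1D mel vector described in those resources is one frame (one column) of the mel spectrum: each entry is the m-th triangular filter times the power spectrum, summed over bins, and stacking the vectors of successive frames gives the M x frames mel spectrogram. A sketch, with the filterbank construction, window, hop length, and noise input all assumed for illustration:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters equally spaced on the mel scale (one common construction)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

sr, n_fft, hop, n_mels = 16000, 512, 256, 40
rng = np.random.default_rng(0)
signal = rng.standard_normal(sr)      # 1 s of noise as a stand-in signal

fb = mel_filterbank(n_mels, n_fft, sr)
window = np.hanning(n_fft)

mel_vectors = []
for start in range(0, len(signal) - n_fft + 1, hop):
    power = np.abs(np.fft.rfft(signal[start:start + n_fft] * window)) ** 2
    # Entry m = sum over bins of (m-th triangular filter * power spectrum).
    mel_vectors.append(fb @ power)

mel_spectrogram = np.stack(mel_vectors, axis=1)   # shape: (n_mels, n_frames)
print(mel_vectors[0].shape, mel_spectrogram.shape)
```

So the per-frame vector and the "mel spectrum" are the same numbers viewed at different scopes: one frame versus all frames.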
@yatosaurio · 2 years ago
Fantastic explanation, very didactic, thank you very much
@rakhshandamujib2793 · 1 year ago
Absolutely loved it!
@ruanjiayang · 1 year ago
Do we apply the Fourier transform or the inverse Fourier transform to the log power spectrum? They are completely different things!
@RickNance · 3 years ago
Sorry... to start with, I might have gotten confused. When you say the mel spectral analysis shows a _perceptually relevant scale for "pitch"_, you mean frequency, right? If not, I've misunderstood something at the start.
@brandonlincolnsnyder · 2 years ago
This video is blowing my mind!
@MrHowdai · 3 years ago
I never understood it this clearly until watching your videos!! Really appreciate it. :)) After watching this I have 2 little questions: 1. According to the Nyquist theorem, when extracting MFCCs, do we need more mel filter banks when processing audio signals at higher sampling rates? Because I found that the MFCCs of an audio file sampled at 44.1 kHz are NOT the same as those of the downsampled version at 16 kHz. 2. Is it right to say that MFCCs are volume-independent audio features? Thanks for the great videos again! And I hope someone can help with my questions, thanks in advance!!
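On question 2, a sketch under simplifying assumptions: a pure gain change g multiplies the power spectrum by g², which adds the same constant 2·ln(g) to every log band energy. The DCT-II of a constant vector is zero everywhere except index 0, so only c0 changes and c1 onward stay identical. The equal-width rectangular bands below are a stand-in for mel triangles (the argument holds for any fixed linear filterbank); level changes that come with compression, clipping, or background noise are not covered by this.

```python
import numpy as np

def dct_ii_ortho(x):
    """Orthonormal DCT-II written out with NumPy."""
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    y = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x
    y[0] *= np.sqrt(1.0 / n)
    y[1:] *= np.sqrt(2.0 / n)
    return y

def cepstral_coeffs(frame, n_bands=20, n_coeffs=13):
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    # Stand-in filterbank: equal-width rectangular bands (real MFCCs use mel triangles).
    band_energies = np.array([b.sum() for b in np.array_split(power, n_bands)])
    return dct_ii_ortho(np.log(band_energies))[:n_coeffs]

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)

quiet = cepstral_coeffs(frame)
loud = cepstral_coeffs(4.0 * frame)   # same sound at 4x the amplitude (about +12 dB)

print(np.allclose(quiet[1:], loud[1:]))   # c1..c12 are unchanged
print(loud[0] - quiet[0])                 # only c0 moves, by sqrt(20) * ln(16), about 12.4
```

This is why c0 is often treated as a (log) energy term and sometimes dropped or replaced when gain invariance is wanted.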
@sagarparmar6715 · 3 years ago
I greatly admire this video. It's quite detailed. Thanks a lot
@pohjanakka1 · 3 years ago
Thank you so much. This was clearly explained.
@sharonm1261 · 1 year ago
Could anyone perhaps tell me which video to watch next on how to use MFCCs from different speakers to tell the speakers apart... no worries if there isn't one, I will also search and google, thank you :)
@TanupatBoon · 4 years ago
Is the DCT just another Fourier transform? Why is it the inverse one?
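On the DCT question: the DCT-II commonly used in MFCC recipes is a real-valued relative of the DFT. It equals (after a phase factor and a factor of 1/2) the DFT of the input mirrored into an even-symmetric sequence, which is why it gives real coefficients for real input. The sketch below checks that identity numerically; the test vector is arbitrary. For real, even-symmetric input the forward and inverse DFT coincide up to scaling, so whether a step is called "inverse" is largely a matter of convention.

```python
import numpy as np

def dct_ii_direct(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x

def dct_ii_via_fft(x):
    """The same transform computed as a DFT of the mirrored (even) extension of x."""
    n = len(x)
    y = np.concatenate([x, x[::-1]])          # length 2N, symmetric about N - 1/2
    Y = np.fft.fft(y)[:n]
    phase = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    return 0.5 * (phase * Y).real

log_mel = np.log(np.linspace(1.0, 50.0, 26))  # any real vector, e.g. log mel energies
print(np.allclose(dct_ii_direct(log_mel), dct_ii_via_fft(log_mel)))
```

The mirroring avoids the artificial jump at the frame edges that a plain DFT would see, which helps the DCT compact the smooth log-mel envelope into its first few coefficients.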
@ourissueanniversary · 1 year ago
Hello! May I ask a question about MFCCs? You said that MFCCs are not so great for synthesis. So does that mean plain mel spectrograms are mostly used in speech synthesis tasks?
@ValerioVelardoTheSoundofAI · 1 year ago
Correct. (Mel) spectrograms and, lately, raw audio directly.
@MarkEdwardsGreenside · 8 months ago
Autocorrelation? This feels like achieving autocorrelation using the FFT and IFFT. Is there a relationship between the cepstrum and autocorrelation? I'm a newbie to this and doing my best to self-learn, so I'd appreciate knowing whether this observation is correct!
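Both are "IDFT of a function of the magnitude spectrum": by the Wiener-Khinchin relation, the autocorrelation is the IDFT of |X|², while the real cepstrum is the IDFT of log|X|. Both discard phase, and for voiced sounds both tend to show a peak at the pitch period, which is likely why they feel alike; only the log, however, turns the source-filter product into a sum. A sketch on a random test signal (an assumption), zero-padded so the FFT route gives linear rather than circular autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
n = len(x)

# Zero-pad to 2N so the FFT-based result is a linear (not circular) autocorrelation.
X = np.fft.fft(x, 2 * n)

autocorr = np.fft.ifft(np.abs(X) ** 2).real[:n]      # Wiener-Khinchin: IDFT of |X|^2
cepstrum = np.fft.ifft(np.log(np.abs(X))).real[:n]   # real cepstrum: IDFT of log|X|

# Check the FFT route against the direct definition: sum_t x[t] * x[t + lag].
direct = np.correlate(x, x, mode="full")[n - 1:]
print(np.allclose(autocorr, direct))
```

Swapping `np.abs(X) ** 2` for `np.log(np.abs(X))` is the whole difference between the two computations.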
@kaziasifahmed2443 · 4 years ago
Nice video, sir. I have understood lots of things about MFCCs. So if I want to make a speech recognizer with an RNN, what should I do: feed only spectrograms, or MFCCs? I am not an expert in this area, just asking. Again, great job providing valuable information.
@ValerioVelardoTheSoundofAI · 4 years ago
These days we tend to use (Mel) spectrograms more than MFCCs for speech recognition tasks.
@muntazirmehdi503 · 3 years ago
@ValerioVelardoTheSoundofAI Can we use a mel spectrogram with an RNN instead of a CNN?
@parismitasarma1572 · 3 years ago
Amazing, you explain the underlying concept in a much easier way. Thank you so much, sir.
@virendrawadher8006 · 4 years ago
Any resources to learn more about MFCCs? And resources on what each coefficient corresponds to, like MFCC[1] -> energy, MFCC[2] -> spectral envelope, etc.?
@ValerioVelardoTheSoundofAI · 4 years ago
There isn't a direct mapping between each coefficient and a perceptual/acoustic attribute. Unfortunately, I haven't found many comprehensive resources on MFCCs.
@alappattjacobantony · 3 years ago
At around 30:45 you say we care about the vocal tract frequency response component more than the component carrying F0 information, because the former gets us closer to the identity/timbre of the sound source. This makes sense to me with respect to speech processing in non-tonal languages like English/Hindi... would this perspective change when we look at speech in tonal languages, or am I conflating ideas here?
@ValerioVelardoTheSoundofAI · 3 years ago
That's a great intuition! Relative F0 information would actually be important for tonal languages.
@desikharkara9407 · 3 years ago
Very good and well explained, thanks 👍
@anupambhattarai8765 · 4 years ago
Great explanation 👍
@stefanhopman9176 · 3 years ago
Great Video! Thank you.
@praburocking2777 · 2 years ago
Hi, I thought we use decibels (a log scale) for amplitude because pow(10,-12) to pow(10,1) watts/meter^2 is a huge range, so we use the decibel log scale to compress it. Is that true? Also, amplitude is a physical property of sound, unlike its counterpart loudness, which is a perceived property of sound. If not for the reason above, why should we use a log scale for measuring amplitude?
@ValerioVelardoTheSoundofAI · 2 years ago
You're spot on. We use a log scale because the amplitude range is enormous, and because, just like pitch, we perceive amplitude (I should say loudness) logarithmically.
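The arithmetic behind the 10^-12 to 10^1 W/m^2 range mentioned above, as a sketch: sound intensity level in decibels is 10·log10(I / I_ref), with the conventional reference I_ref = 10^-12 W/m^2, so that range maps to roughly 0-130 dB. For amplitude-like quantities such as sound pressure, the factor is 20 instead of 10. Decibels are a physical level scale; perceived loudness only roughly follows it and also depends on frequency.

```python
import math

I_REF = 1e-12  # W/m^2, conventional reference intensity (approx. threshold of hearing)

def intensity_to_db(intensity_w_m2):
    """Sound intensity level in dB: 10 * log10(I / I_ref)."""
    return 10.0 * math.log10(intensity_w_m2 / I_REF)

def amplitude_ratio_to_db(ratio):
    """Level difference for amplitude-like quantities (e.g. pressure): 20 * log10(ratio)."""
    return 20.0 * math.log10(ratio)

print(intensity_to_db(1e-12))       # about 0 dB
print(intensity_to_db(10.0))        # about 130 dB
print(amplitude_ratio_to_db(2.0))   # doubling the amplitude adds about 6 dB
```

Thirteen orders of magnitude in intensity collapse into a 0-130 range, which is the compression the question describes.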
@ahmadkhadra992 · 3 years ago
Hi Valerio, I have a little question: when we apply the DFT to the signal, why do we get a power spectrum? Why not just a spectrum?
@praburocking2777 · 2 years ago
39:40 Why are we taking the log of the amplitude? I think we perceive only frequency on a logarithmic scale, right? Or do we perceive amplitude on a logarithmic scale as well?
@ValerioVelardoTheSoundofAI · 2 years ago
Loudness is also perceived logarithmically.
@praburocking2777 · 2 years ago
@ValerioVelardoTheSoundofAI Do we have any scale that measures the perceived level of amplitude, like the mel scale for frequency?
@ValerioVelardoTheSoundofAI · 2 years ago
@praburocking2777 decibels
@praburocking2777 · 2 years ago
@ValerioVelardoTheSoundofAI Thanks. I am getting started in audio processing. Your videos are very helpful. Keep up the good work.
@sharonm1261 · 2 years ago
This is really interesting, great explanation, thanks! Now I just have to work out how to relate this to blossom bat squeaks 🤔 (their frequencies are a lot higher)
@amitrege502 · 3 years ago
This is a good video. However, my question is: in the section on 'Formalizing Speech', why are you using the (t) variable in the transform domain as well? The domain should be frequency.
@zweiteid3340 · 1 year ago
Hello, we are currently doing a project on verification using the human voice (speaker recognition). Would MFCCs be useful here at all, when they are actually about filtering out phonemes?
@DOMINIK32110 · 3 years ago
Great video as always! Could you recommend books or other sources (ideally ones available on the Internet) to read more about MFCCs? Especially in the context of speech.
@mohamadhanifomarsaifuddin4578 · 3 years ago
Good explanation, 5 stars
@fabricejumel4630 · 2 years ago
Thanks a lot. Just perfect
@amoghshekharhiremath6627 · 3 years ago
Very Astounding!!!!!!!!!!!!!!!!
@parasharparikh9352 · 3 years ago
Can I use MFCCs to extract features from the current signal?
@kxiong4021 · 3 years ago
Thank you for sharing this amazing content. Very informative and specific. Came for copper and found gold!
@bigpenguin8457 · 2 years ago
Thank you for the video. I wanted to ask if you have any documents or code related to extracting "spectral detail", or the entire procedure you described in the video (spectrum --> log amplitude spectrum --> spectral envelope --> spectral detail). I have applied an amplitude envelope to the log power spectrum, which in theory is a spectral envelope, but it gives me fewer values, so I cannot do element-wise subtraction with the log power spectrum to get the spectral detail. Please tell me if I am going wrong somewhere. Thank you.
@kaziasifahmed2443 · 3 years ago
What does the color in the MFCC visualization mean? Does it represent the intensity of a certain coefficient value in each frame?
@ValerioVelardoTheSoundofAI · 3 years ago
Yes, it's the "intensity", or better said the value, of a coefficient.
@harisbournas6600 · 3 years ago
Great explanation
@evrenbingol7785 · 3 years ago
We take the log of the amplitude, and we also take another log in mel scaling. Doesn't that add up to log twice eventually?
@avidreader100 · 3 years ago
The mel scale conversion is not a second mathematical log of the measured magnitude, but a different scale based on equal perception, which has a logarithmic relationship. Do not think of mel as a magnitude scale of sound pressure; think of it as a scale of sound perception.
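To put avidreader100's answer in code: the mel formula warps the *frequency axis* (it decides where the triangular filters sit), whereas the log-amplitude step compresses the *values*. They act on different axes, so they do not compound into a double log. A sketch using one common (HTK-style) formula; other variants with different constants exist (e.g. Slaney's):

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style mel formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

freqs = np.array([100.0, 1000.0, 4000.0, 8000.0])
mels = hz_to_mel(freqs)
print(np.round(mels))   # roughly [150, 1000, 2146, 2840]

# Equal steps in mel correspond to ever wider steps in Hz: this is where filters are placed,
# independently of any log applied to the amplitudes inside each filter.
print(np.round(mel_to_hz([0.0, 500.0, 1000.0, 1500.0, 2000.0])))
```

The amplitude log, by contrast, is applied afterwards to the energy collected by each filter.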