your explanation of the intuition for the cepstrum is so freaking elegant thank you!
@aberone_library6 ай бұрын
I cannot express how much I'm thankful to you for making this video! This is my favorite style of explanation that I myself have adopted over the years. You took an hour to explain a concept that could, in principle, have been explained in 15 mins or so, but you did it so clearly and thoroughly that by the end of the video I had a spotless, complete understanding not only of the process of extracting the MFCCs but also of the intuition and the meaning of it. Which is something that a lot of other explanatory videos lack these days. So thank you again for your effort!
@ValerioVelardoTheSoundofAI6 ай бұрын
Thanks a lot :)
@IbrahimCikrikcioglu Жыл бұрын
I was watching the video and at some point I stopped and started talking to chatGPT to understand those concepts. I found myself learning about convolutions and cepstral coefficients and its intuition. Once, I got back to the lecture, the first thing Valerio started talking about was convolutions and the intuition behind cepstral coefficients. The moral of this story is he is an amazing teacher and just finish the lecture first and then search for stuff that you did not get in the lecture :)
@thebigVLOG3 жыл бұрын
This is one of the best lectures I've ever effin watched, thank you so much for making this series!
@emrecan92718 ай бұрын
You are a perfect man. These videos are literally worth gold. I will watch them from the start. Thank you very much.
@ayishanayyar12833 жыл бұрын
Description in a pleasant manner, untiring, relaxing effect on nerves. Thank you Valerio Velardo
@Bluephoton3 жыл бұрын
Better than Speech Signal Processing Lecture in terms of explanation and ease of understanding !! Highly recommend to watch for speech related projects!
@ValerioVelardoTheSoundofAI3 жыл бұрын
Thank you!
@klausjurgenfolz43233 жыл бұрын
I've learned more watching this video than in a whole semester at my university. Thank you!!!!
@johnmin38213 жыл бұрын
This video is doing explanations that I couldn't find or understand from hundreds of websites. You're a legend
@nataliakalashnikova12693 жыл бұрын
I did my master's thesis in NLP on automatic emotion recognition comparing CNN and SVM performance using MFCC. I didn't really "get" the meaning of MFCC, how it works, why it is so popular, etc. Now I'm doing my PhD thesis also on emotion classification in speech and I was really struggling to understand these basic concepts. Thank you so much for your work, your clear and vivid explanations! You helped me a lot to move forward in my project. P.S. Sorry for my English, if there are a lot of errors. P.P.S. I am a linguist "I believe it's called" :)
@zahraamuhsen13102 жыл бұрын
Please, I am a student and my thesis is also about speech emotion recognition using CNN and MFCC, based on GA using entropy. Can you send me your thesis, or can you help me understand?
@BlackHermit2 жыл бұрын
This was really, exceptionally good. A rather lengthy video, but worth every second. Thank you so much!
@ValerioVelardoTheSoundofAI2 жыл бұрын
Glad you liked it!
@ashwinalinkil7328 Жыл бұрын
Straight up dude, you are an absolute beast! Every other sentence just blows my mind. You made it so easy to understand and gain an intuition on such abstruse concepts. Thank you so much!
@abhi88mcet3 жыл бұрын
I am more of a Reinforcement Learning guy with a bad squeaky voice trying to start a youtube channel. I was researching the use of RL to create a realistic vocoder to substitute my voice, and stumbled upon this gem... awesome work, keep up the good work.
@ValerioVelardoTheSoundofAI3 жыл бұрын
Thanks a lot and good luck with the YT channel -- you're on the verge of starting an amazing journey :)
@MegaCadr3 жыл бұрын
20 minutes in, my mind started melting. Amazing video!
@drillsargentadog2 жыл бұрын
Nice explanation and great course! One comment: I'm pretty sure big X, E, and H at 27:19 should be functions of frequency, not time, and should be multiplied, not convolved.
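For readers following this point, the standard source-filter relations can be written out as below. This is only a sketch of the textbook identities, not a transcription of the slide: x, e and h follow the comment, while f for frequency and \tau for the convolution variable are notation assumed here.

```latex
\begin{align}
x(t) &= (e * h)(t) = \int e(\tau)\, h(t - \tau)\, d\tau \\
X(f) &= E(f)\, H(f) \\
\log |X(f)| &= \log |E(f)| + \log |H(f)|
\end{align}
```

Convolution in time becomes multiplication in frequency, and the log turns that product into a sum, which is what makes the two parts separable in the cepstrum.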
@naveenfrancis4442 жыл бұрын
at 14:38, doesn't the IDFT map a signal on to the time domain? If so, shouldn't the axis be pseudo time instead of pseudo frequency?
@st0a Жыл бұрын
That's exactly what I was wondering...
@Underscore_12346 ай бұрын
That's AWESOME STUFF. I expected good stuff, but not that good. You did really well explaining cepstrums and the way to separate glottal pulses from the vocal tract. It really made sense.
@quincydelp95862 жыл бұрын
This is an incredibly helpful video that taught me how to implement an MFCC algorithm and intuition for why it is useful information. I can't recommend it enough.
@ValerioVelardoTheSoundofAI2 жыл бұрын
Thank you Quincy!
@seathru12323 жыл бұрын
Dear Valerio, I don't get a point. Shouldn't you get time on the x-axis if you apply an IDFT to a signal represented in the frequency domain? If I take a signal x(t) and take the FFT, and then the IDFT, didn't I get back a reconstructed x(t)? Is the log of the FFT the reason behind what you explained?
@pedrobotsaris2036 Жыл бұрын
That is right. I think he is misusing the term inverse Fourier transform here. If you apply an IDFT you get back to the time domain.
@Walsh2571 Жыл бұрын
@@pedrobotsaris2036 not if you change the scale before performing an IFFT
@非常大的圆白菜8 ай бұрын
@@Walsh2571 Why? If you change the scale before performing IFFT, you just get back to the time domain with a different scale, right?
@tutatis967 ай бұрын
@@非常大的圆白菜 I think the point is that we got rid of the phase with the log, but I'm not sure
@vaitom60782 жыл бұрын
you're a genius at popularizing science, thank you for the effort
@Jamazon6 ай бұрын
your channel is a gold mine, thank you so much for what you do!
@Goriuable2 жыл бұрын
Thank you so much. I searched a lot about the topic of MFCC and did not find very good explanations. Your video is really a masterpiece and I now have a good knowledge of the concepts :) For sure I will have a look at some of your other videos. Keep up the amazing work!
@ValerioVelardoTheSoundofAI2 жыл бұрын
Thank you - glad I could help!
@chacmool25813 жыл бұрын
What I don't understand is why one takes an inverse FT instead of a FT to get to the quefrency domain. If it's indeed a spectrum of a spectrum shouldn't one take a FT of a FT?
@satyajeetprabhu Жыл бұрын
Same thought
@booky614911 ай бұрын
We are taking inverse Fourier transform to represent the log spectrum in the same way as the human ear hear (i.e Frequency domain to Quefrency domain). FT only takes the Time domain signal as input. FT of FT violates the rules.
@DiogoSanti3 жыл бұрын
But is it ok to call the inverse Fourier a Spectrum? I tell that because the inverse Fourier brings back the Frequency Domain to Time Domain, and in my head, spectrum is represented by slices of frequency domain, or am i missing the point?
@jdavibedoya4 ай бұрын
I'm not an expert, but I believe the conventional way to calculate the cepstrum uses the IDFT because of its scaling factor. Both the DFT and IDFT are quite similar and indeed produce results with the same shape.
@jeevanreji72902 жыл бұрын
I absolutely love the way you explain these concepts! Thank you !
@LauraSpinu3 жыл бұрын
This was so helpful, can't thank you enough for your time and effort. Simply amazing - and your enthusiasm makes it so easy to watch and enjoy through the end!
@zhenxinghu48893 жыл бұрын
My question is why not apply DFT rather than IDFT again on Log(F(x(t))
@sasankkottapalli68228 ай бұрын
Same question here
@tutatis967 ай бұрын
@@sasankkottapalli6822 i think it works because we're not considering the phase after the log
@yuefenggao74832 ай бұрын
@@sasankkottapalli6822 Both IDFT and DFT are basically equivalent here because they have the same distribution. kzbin.info/www/bejne/iXvSaKmGnLefeLMsi=I664FcrQVgml_Amf&t=77
@zeyuyang20534 жыл бұрын
Best MFCC explanation I‘ve seen ever!Thank you!
@ValerioVelardoTheSoundofAI4 жыл бұрын
Thank you!
@williameden567524 күн бұрын
this is so well done!! I finally think i understand MFCCs :)
@4abdoulaye4 жыл бұрын
VERYVERYVERY CLEAR, Best video I've ever seen.
@ValerioVelardoTheSoundofAI4 жыл бұрын
Thanks!
@AlBeebe4 жыл бұрын
Excellent video. 42:08 ended up making me wonder what happened to the slack message i thought i got. :)
@JigarRajpopatOfficial4 жыл бұрын
Very informative. Thank you!
@ValerioVelardoTheSoundofAI4 жыл бұрын
Thank you Jigar :)
@beincheekym84 жыл бұрын
awesome course, so complete, and very clear visualization. really amazing. thank you!
@ValerioVelardoTheSoundofAI4 жыл бұрын
Thanks!
@subrahmanyamkunapuli18604 жыл бұрын
👏Excellent way to explain intricate details!! Thanks for the video series.
@ValerioVelardoTheSoundofAI4 жыл бұрын
Thank you!
@akanshmaurya15684 жыл бұрын
I am confused about why it is a spectrum of a spectrum, when we take Fourier transform, we go from time to spectrum, so according to last step while calculating cepstrum, should we not call as inverse of spectrum?
@Erosis4 жыл бұрын
Yeah, the inverse is kinda confusing me. I thought we'd use another Fourier Transform to get quefrency, not the inverse (which puts it back into time domain). I read a post about this ( dsp.stackexchange.com/questions/5940/mfcc-process-confusion ) where they say that both are going to produce relatively the same thing, so it doesn't matter in the end.
@bijan87052 жыл бұрын
He clearly doesn't know that inverse FT is not the same as FT at 14:00
@RudraSingh-pb5ls2 жыл бұрын
@@bijan8705 who doesn't know, Valerio or Akansh, the guy who asked this question here ?
@非常大的圆白菜8 ай бұрын
@@Erosis Thank you for this information. I read the post but I'm still confused... why are both going to produce the same thing? One is the inverse of the other
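A small numerical check may help with the "why are both the same" question. It is a sketch under assumptions not taken from the video: a random frame stands in for real audio, and the cepstrum is taken from the log magnitude spectrum. Because that log magnitude is real and symmetric in the bin index, its DFT and IDFT differ only by the 1/N scaling factor.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)          # stand-in for one frame of real-valued audio

# Log magnitude spectrum: real-valued and symmetric around bin 0 for a real input
log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)

via_idft = np.fft.ifft(log_mag)       # the usual cepstrum definition
via_dft = np.fft.fft(log_mag)         # "spectrum of a spectrum", taken literally

# The two results agree once the DFT is divided by the frame length
same_up_to_scale = np.allclose(via_dft / len(x), via_idft)

# Both are (numerically) real, so no phase information is involved
imag_negligible = np.max(np.abs(via_idft.imag)) < 1e-9

print(same_up_to_scale, imag_negligible)
```

The equivalence only holds because the phase was discarded before the second transform. Applying the IDFT to the complex spectrum itself would simply return the original frame.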
@katsiarynaruksha9381 Жыл бұрын
Extremely useful series of lectures. Thanks a ton!
@IngridKnoch3 жыл бұрын
That was so clearly explained!! Thank you for this, Valerio
@avidreader1003 жыл бұрын
I am still stalled at this video. I feel the founders of the concept have confused us by naming these unique parameters the way they did. Quefrency as a metric measured in seconds was quite a big factor confusing me. I am gradually coming to terms with it. Let me share my thoughts so that others can correct me if I am off. In the Fourier transform that gave us the spectrum, we say we convert a signal from the time domain to the frequency domain. We look at the time domain signal as a sum of multiple steady frequency components (all taken within a short time frame). The amplitude on the vertical axis is expressed in different units (dB etc.), but is conceptually the same: magnitude. The Fourier transform inverted the x axis. From time it went to the inverse of time, which is frequency. The cepstrum basically looks at the up and down shifts of the spectrum as we scan along the frequency axis. These are the formants in speech. The amplitude is again not tampered with beyond expressing it as a log etc. The x axis is now flipped once again, from cycles per unit time to time. In both spectrum and cepstrum we flipped the x axis. The first time around it analyzed the signal and gave all the frequency components. The second time it gave all the formants. The amplitude of a spike in the cepstrum gives us the significant components, and the quefrency or time value at which the spike occurs, when inverted, gives us the formant frequency corresponding to this spectrum. Does this sound right?
@Waffano2 жыл бұрын
The IDFT part is a typo if you ask me. For me it only makes sense that the cepstrum is a spectrum of a spectrum, meaning DFT applied to a spectrum. This is the only way we can collect the frequencies of the formants. If it was IDFT it would just result in a complex waveform with no information of frequencies. In the end Valerio also specifically uses discrete cosine transform and NOT inverse discrete cosine transform, to get the final MFCCs, which makes sense. So I strongly believe the IDFT in the beginning is just a mistake and should be DFT.
@thierrydesot11644 жыл бұрын
Thanks a lot for this brilliant explanation. I have read several papers to grasp the concept of mfcc, mel scaling, delta derivates etc. But after watching this youtube tutorial it is the first time I have the feeling I 'got' it. So I am on my way to watch your other tutorials.
@ritwickjha39542 жыл бұрын
Maybe you should be a bit clearer: taking the IFFT of a frequency-domain signal gives us the time domain, and quefrency is in the time domain. I was a bit confused because you kept saying the IFFT will give something like a frequency domain. Also, I am not sure taking the log of the signal in the time domain is correct. Since the signal is a convolution of E and H in time, the log should be taken in the frequency domain, where it is a multiplication of E and H. Please correct me if I am wrong. Great video!
@Jononor4 жыл бұрын
In "Computing Mel-Frequency Cepstral Coefficients" (approx time 38:00) you put Waveform->DFT->Log-Amp->Mel-filterbank->DCT. Is it not more conventional to apply the Mel filterbank to linear magnitude spectrogram, and then do the log transform? But maybe the order is not so important between those two steps?
@ValerioVelardoTheSoundofAI4 жыл бұрын
It's really a matter of "preference". Both approaches work.
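To make the ordering question concrete, here is a minimal sketch of the order described in the question as more conventional: mel filterbank on the power spectrum, then log, then DCT. Everything in it is an assumption for illustration rather than the code used in the video: the function names, the Hann window, 26 mel bands, 13 coefficients, and the 2595 * log10(1 + f / 700) mel formula.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f_hz):
    """One commonly used Hz-to-mel formula (assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters equally spaced in mel, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc_frame(frame, sr, n_mels=26, n_mfcc=13):
    """MFCCs of a single frame: power spectrum -> mel bands -> log -> DCT-II."""
    windowed = frame * np.hanning(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energies = mel_filterbank(n_mels, frame.size, sr) @ power
    log_mel = np.log(mel_energies + 1e-10)   # small floor avoids log(0)
    return dct(log_mel, type=2, norm="ortho")[:n_mfcc]


rng = np.random.default_rng(0)
frame = rng.standard_normal(512)              # stand-in for 32 ms of audio at 16 kHz
print(mfcc_frame(frame, sr=16000).shape)
```

Swapping the log and the filterbank steps gives numerically different coefficients, since the log of a weighted sum is not the weighted sum of logs, so the two orders should not be mixed between training and inference.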
@6tyelement9794 жыл бұрын
4:32 When u cannot answer a question u got asked in front of whole class btw great vid
@ValerioVelardoTheSoundofAI4 жыл бұрын
LOL
@ashokdhingra43 жыл бұрын
Hi, Fourier transform of a time domain signal is a series of terms, and not a single number. What then is the meaning of Log of the Fourier transform? Or is it Log of each term in the Fourier transform? Further, when we take inverse Fourier transform, we should go back in time domain. So it is not really 'spectrum of a spectrum'.
@AsEnIxX-wtf Жыл бұрын
Excellent presentation & explanation
@keem.studio4 жыл бұрын
this video just saved my engineering final project
@ValerioVelardoTheSoundofAI4 жыл бұрын
Nice :)
@AlexTuduran2 жыл бұрын
Why not just call it a *meta-spectrum*, which is literally a spectrum of a spectrum? Also, this is one of the best explanations I came across. Well done.
@ValerioVelardoTheSoundofAI2 жыл бұрын
Meta-spectrum sounds really cool!
@dver892 ай бұрын
Incredible video. Thank you!!
@Jononor2 жыл бұрын
You should put (MFCC) in the title, I think. It should help people discover the video. Not everyone knows what the abbreviation stands for :)
@user-yo4kd7zy9j2 жыл бұрын
So well explained! Thanks a lot for your amazing work.
@harutyunyansaten3 жыл бұрын
I want to learn this in more depth. Can you please provide references for where you took this info from?
@MrOpossumx33 жыл бұрын
Another great vid! I would have appreciated a bit more intuition about the meaning of the MFCC coefficients / time matrix presented around 48:37. While a spectrogram is intuitive, I find an MFCC coefficients over time matrix harder to interpret. Do you have some intuition about MFCC coefficients over time from a psychoacoustic perspective? In a spectrogram, the intensity of a given frequency at a frame nicely links to the perception we have of a sound's high or low pitch. What would be a perceptual equivalent for MFCC coefficients over time?
@Waffano2 жыл бұрын
Say we wanted to identify a specific individual from their speech, to use for unlocking a device with speech for example. In this case spectral detail would be more important than spectral envelope right? Because spectral detail tells us something about the unique pitch of the speakers voice? In contrast, in ASR, where we only care about the words that are spoken, and not by whom they come from, it makes sense to use spectral envelope?
@dataista77172 жыл бұрын
Thanks for the series, man. You accelerated my speed jumping into this field a lot. Like A LOT. Really, u rock 🙌
@advaithpillai2 жыл бұрын
Mate you are a life-saver!
@amitrege5023 жыл бұрын
Around 50:53 you tell that MFCC ignores fine spectral structures like "pitch" which we don't care about, generally. Then you also say that MFCC works well in speech and music. I think in music, pitch is the most relevant information, people are interested in, because the musical notes themselves are defined around pitch frequencies. I think there is a contradiction in the statements. Will you please clarify. Thank you for such a nice video.
@ValerioVelardoTheSoundofAI3 жыл бұрын
It's not a contradiction. For tasks like music genre, mood, and instrument classification, timbre is more important than pitch.
@amitrege5023 жыл бұрын
@@ValerioVelardoTheSoundofAI Thank you
@RahulSharma25012 жыл бұрын
This is absolutely amazing.
@annazaitseva62133 жыл бұрын
If cepstrum is a spectrum of a spectrum why inverse Fourier transform is applied to a log spectrum of a signal not forward?
@pratyushsaha84823 жыл бұрын
Very well explained. You are awesome man !
@seanperman20002 ай бұрын
Why are we getting a cepstrogram when doing the MFCC? Are we using the time-frequency domain? I'm a little confused going from the cepstrum to the MFCC part of the video.
@shereenelmetwally5223 жыл бұрын
Thank you very much. It was really wonderful!
@shaidhasan68954 жыл бұрын
Thanks a lot. Was waiting for this.
@ValerioVelardoTheSoundofAI4 жыл бұрын
Glad you liked the video!
@lingarajmishra8981 Жыл бұрын
After applying the Fourier transform, how are the vocal tract response and the glottal pulse still in the time domain? Please explain.
@ChirsXu-n9o10 ай бұрын
I am wondering whether the 1st rahmonic represents the envelope (formants) or the glottal pulse in the later part of this video. I am a little bit confused here at 16:12.
@AlexTuduran2 жыл бұрын
I watched the suggested video for how to compute the envelope, but I find it unfit for this problem or I'm missing something. Basically, to compute the envelope, you take the max of a frame. This works well in general with audio, but in constructing the envelope of a spectrum, the data is rather short / scarce (e.g. FFT 1024 => 512 points) and breaking it down into frames increases the chances of computing a rather "false" envelope. How do you manage to avoid the local minima and account only for the actual peaks? And since we're talking about speech, we'll have a lot of local minima. Applying a low-pass filter kind of does it, but it obviously has the disadvantage of potentially shaving off important peaks. So how to do it properly?
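One common alternative to a frame-wise max is to smooth the log spectrum through the cepstrum itself, which is sometimes called low-time liftering. The sketch below is an assumption about how that could look, not the method from the suggested video: `split_log_spectrum` and the cutoff `n_keep=30` are hypothetical, and a random frame stands in for real speech.

```python
import numpy as np


def split_log_spectrum(frame, n_keep=30):
    """Split a log magnitude spectrum into a smooth envelope and fine detail.

    Keeps only the first n_keep cepstral coefficients (plus their mirror
    image) and transforms back, which acts as a low-pass filter on the
    log spectrum.
    """
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real

    lifter = np.zeros_like(cepstrum)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0          # mirrored half keeps the result real

    envelope = np.fft.fft(cepstrum * lifter).real
    detail = log_mag - envelope               # what the envelope leaves out
    return log_mag, envelope, detail


rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)             # stand-in for one frame of speech
log_mag, envelope, detail = split_log_spectrum(frame)
print(envelope.shape, detail.shape)
```

Because the envelope and the log spectrum stay on the same scale, the element-wise subtraction for the spectral detail is well defined. The cutoff is a trade-off: too low and formant peaks are flattened, too high and pitch harmonics leak into the envelope.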
@Waffano2 жыл бұрын
@37:44 You mention that we get a mel spectrum. However, most of the resources I found don't mention any mel spectrum at that step; instead they mention a 1D mel vector of length M, where M is the number of mel bands and m is the band number. The m'th element of the mel vector then contains the sum of the products between the m'th mel filter and the power spectrum. Is this mel vector the same as a mel spectrum? And what are the pros and cons of using either, if they are different?
@yatosaurio2 жыл бұрын
fantastic explanation, very didactic, thank you very much
@rakhshandamujib2793 Жыл бұрын
Absolutely loved it!
@ruanjiayang Жыл бұрын
Do we apply the Fourier transform or the inverse Fourier transform to the log power spectrum? Completely different things!
@RickNance3 жыл бұрын
Sorry... to start with, I might have gotten confused. When you say the MEL spectral analysis shows _perceptually relevant scale for "pitch"_ you mean frequency, right? If not, I've misunderstood something at the start.
@brandonlincolnsnyder2 жыл бұрын
this video is blowing my mind!
@MrHowdai3 жыл бұрын
I never understand it this clear until watching your videos!! Really appreciated it. :)) After watching this I got 2 little questions, 1. According to Nyquist theorem, when extracting the MFCCs, do we need more Mel filter banks when processing audio signals in higher sampling rates? Cuz I found the MFCCs of an audio sampled at 44.1KHz are NOT the same as the down-sampled one, which is at 16Khz. 2. Is it right to say that MFCCs is volume-independent audio features? Thanks for the great videos again! And I hope there's someone can help with my questions, thanks in advance!!
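On the second question, a quick numerical experiment suggests an answer under idealized assumptions. It is only a sketch: the mel band energies are made up, the gain is a pure amplitude scaling with no clipping or noise floor, and the DCT-II with `norm="ortho"` is an assumed choice. A gain multiplies every band energy by the same factor, the log turns that into an added constant, and the DCT pushes a constant entirely into coefficient 0.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(1)
mel_energies = rng.uniform(0.1, 10.0, size=26)   # made-up band energies for one frame
gain = 3.0                                       # amplitude gain applied to the waveform

# Amplitude gain g scales power (and hence band energies) by g**2
quiet = dct(np.log(mel_energies), type=2, norm="ortho")
loud = dct(np.log(gain**2 * mel_energies), type=2, norm="ortho")

c0_shift = loud[0] - quiet[0]                    # only coefficient 0 moves
others_unchanged = np.allclose(loud[1:], quiet[1:])

print(c0_shift, others_unchanged)
```

So in this idealized setting all coefficients except the first are unaffected by volume, which is one reason the first coefficient is sometimes dropped or replaced by a separate energy term.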
@sagarparmar67153 жыл бұрын
greatly admire this video. it's quite detailed. thanks a lot
@pohjanakka13 жыл бұрын
Thank you so much. This was clearly explained.
@sharonm1261 Жыл бұрын
could anyone perhaps tell me which is the next video to watch for how to use MFCCs from different speakers to tell the speakers apart....no worries if there's not one, I will also search and google, thank you :)
@TanupatBoon4 жыл бұрын
is DCT just another Fourier transform? Why is it the inverse one?
@ourissueanniversary Жыл бұрын
Hello! May i have a question about MFCCs? You said that MFCCs are not so great for synthesis. So it means that usual mel spectrograms are mostly used in speech synthesis tasks?
@ValerioVelardoTheSoundofAI Жыл бұрын
Correct. (Mel) spectrograms, and, lately, directly raw audio.
@MarkEdwardsGreenside8 ай бұрын
autocorrelation? This feels like achieving autocorreation using fft and ifft. Is there a relationship between cepstrum and autocorrelation? I'm a newbie to this and doing my best to self-learn - would appreciate understanding if this observation is correct!
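The observation can be checked numerically. The sketch below assumes a random frame and circular (wrap-around) autocorrelation: the inverse FFT of the power spectrum reproduces the autocorrelation, and the cepstrum follows the same recipe with a log inserted before the inverse transform.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(256)                     # stand-in for one frame of audio
power = np.abs(np.fft.fft(x)) ** 2

# Inverse FFT of the power spectrum gives the circular autocorrelation
acf_fft = np.fft.ifft(power).real
acf_direct = np.array([np.dot(x, np.roll(x, -lag)) for lag in range(x.size)])
matches = np.allclose(acf_fft, acf_direct)

# Same recipe with a log before the inverse FFT gives the (power) cepstrum
cepstrum = np.fft.ifft(np.log(power + 1e-12)).real

print(matches, cepstrum.shape)
```

The log is the key difference: it turns the product of source and filter spectra into a sum, so their contributions end up at different quefrencies instead of staying mixed as they do in the autocorrelation.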
@kaziasifahmed24434 жыл бұрын
Nice video sir, I have understood lots of things about MFCC. If I want to make a speech recognizer with an RNN, what should I feed it: only spectrums, or MFCCs? I am not that much of an expert in this field. Just asking. Again, great job providing valuable information.
@ValerioVelardoTheSoundofAI4 жыл бұрын
These days we tend to use (Mel) spectrograms more than MFCCs for speech recognition tasks.
@muntazirmehdi5033 жыл бұрын
@@ValerioVelardoTheSoundofAI can we use mel spectrogram with RNN instead of CNN
@parismitasarma15723 жыл бұрын
Amazing, you are explaining the underlying concept in much easier way. Thank you so much Sir.
@virendrawadher80064 жыл бұрын
Any resources to learn more about MFCC? And resources to know what each coefficient corresponds to, like MFCC[1] -> energy, MFCC[2] -> spectral envelope, etc.?
@ValerioVelardoTheSoundofAI4 жыл бұрын
There isn't a direct mapping between each coefficient and a perceptual / acoustic attribute. Unfortunately, I haven't found many comprehensive resources on MFCCs.
@alappattjacobantony3 жыл бұрын
At around 30:45 you say we care about the vocal tract frequency response component more than the component carrying F0 information because the former gets us closer to the identity/timbre of the sound source. This makes sense to me with respect to speech processing in non-tonal languages like English/Hindi......would this perspective change when we're trying to look at speech in tonal languages, or am I conflating ideas here?
@ValerioVelardoTheSoundofAI3 жыл бұрын
That's a great intuition! Relative F0 information would actually be important for tonal languages.
@desikharkara94073 жыл бұрын
Very good and great explained thanks 👍
@anupambhattarai87654 жыл бұрын
Great explanation.👍
@stefanhopman91763 жыл бұрын
Great Video! Thank you.
@praburocking27772 жыл бұрын
Hi, I thought we use decibels (a log scale) for amplitude because pow(10,-12) to pow(10,1) watts/meter^2 is a huge range, so to reduce that we use the decibel log scale. Is that true? Also, amplitude is a physical property of sound, unlike its counterpart loudness, which is a perceived property of sound. If not for the reason stated above, why should we use a log scale for measuring amplitude?
@ValerioVelardoTheSoundofAI2 жыл бұрын
You're spot on. We use log scale because the amplitude range is enormous, and because, just like pitch, we perceive amplitude (I should say loudness) logarithmically.
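A tiny worked example of the compression being discussed, assuming the standard definition of sound intensity level with the reference intensity of 10^-12 W/m^2 mentioned in the question. The function name `intensity_to_db` is made up for illustration.

```python
import numpy as np

REFERENCE_INTENSITY = 1e-12   # W/m^2, the quietest intensity in the question


def intensity_to_db(intensity):
    """Sound intensity level in decibels relative to REFERENCE_INTENSITY."""
    return 10.0 * np.log10(np.asarray(intensity, dtype=float) / REFERENCE_INTENSITY)


# Thirteen orders of magnitude in intensity collapse to a 0-130 dB range
for intensity in (1e-12, 1e-6, 1.0, 10.0):
    print(intensity, float(intensity_to_db(intensity)))
```

Each factor of 10 in intensity adds 10 dB, so equal ratios become equal steps, which matches the logarithmic way loudness is perceived.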
@ahmadkhadra9923 жыл бұрын
Hi Valerio, I have a little question, when we apply DFT on the signal, why we got a power spectrum ? Why not just a spectrum ?
@praburocking27772 жыл бұрын
39:40 why we are taking log of the amplitude ? I think we perceive only frequency in logarithmic scale, right ? or do we perceive amplitude also in logarithmic scale?
@ValerioVelardoTheSoundofAI2 жыл бұрын
Loudness is also perceived logarithmically.
@praburocking27772 жыл бұрын
@@ValerioVelardoTheSoundofAI do we have any scale than measure the perceived level of amplitude like Mel scale for frequency ?
@ValerioVelardoTheSoundofAI2 жыл бұрын
@@praburocking2777 decibels
@praburocking27772 жыл бұрын
@@ValerioVelardoTheSoundofAI thanks. I am getting started in audio processing. Ur videos are very helpful. Keep up the good work.
@sharonm12612 жыл бұрын
this is really interesting, great explanation, thanks! now I just have to work out how to relate this to blossom bat squeaks 🤔 (their frequencies are a lot higher)
@amitrege5023 жыл бұрын
This is a good video. However the question is, in the section on 'Formalizing Speech' why are you using the (t) variable in the transform domain also. The domain should be frequency.
@zweiteid3340 Жыл бұрын
Hello, We are currently doing a project on verification using the human voice (speaker recognition). Would mfcc be useful here at all, when it is actually about filtering out phonemes?
@DOMINIK321103 жыл бұрын
Great video as always! Could you recommend books or other sources (it'd be great if they could be found on the Internet) to read more about MFCCs? Especially in the context of speech.
@mohamadhanifomarsaifuddin45783 жыл бұрын
Good Explanation 5star
@fabricejumel46302 жыл бұрын
Thanks a lot . Just perfect
@amoghshekharhiremath66273 жыл бұрын
Very Astounding!!!!!!!!!!!!!!!!
@parasharparikh93523 жыл бұрын
Can I use MFCCs for extracting features from the current signal?
@kxiong40213 жыл бұрын
Thank you for sharing this amazing content. Very informative and specific. Came for copper and found gold!
@bigpenguin84572 жыл бұрын
Thank you for the video, i wanted to ask if you have any documents or codes related to extracting "spectral detail" or the entire procedure that you described in the video (spectrum-->log amplitude spectrum-->spectral envelope-->spectral detail) i have applied amplitude envelope on log power spectrum which is a spectral envelope by theory but it gives me lesser values so i cannot do element wise subtraction with log power spectrum to get spectral detail, please suggest me if i am wrong somewhere. Thank you.
@kaziasifahmed24433 жыл бұрын
What does the color mean when visualising MFCCs? Does it represent the intensity of a certain coefficient value in each frame?
@ValerioVelardoTheSoundofAI3 жыл бұрын
Yes, it's the "intensity", better said value, of a coefficient.
@harisbournas66003 жыл бұрын
Great explanation
@evrenbingol77853 жыл бұрын
We take the log of the amplitude and we also apply another log in the Mel scaling. Doesn't that eventually add up to taking the log twice?
@avidreader1003 жыл бұрын
The Mel scale conversion is not a second mathematical log of the measure magnitude, but a different scale based on equal perception which has a logarithmic relationship. Do not think of Mel as a magnitude scale of sound pressure, but think of it as a scale of sound perception.
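The distinction can be made concrete with a few lines of code. This is a sketch assuming one commonly used mel formula, 2595 * log10(1 + f / 700); other variants exist, and the function name is made up for illustration. The mel mapping is applied to the frequency axis, while the other log in the pipeline is applied to the amplitude values, so the two never stack on the same quantity.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Map frequency in Hz to mels using one common formula (assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


# The mapping warps the frequency axis: equal steps in Hz shrink in mel
low_step = hz_to_mel(4000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(4000.0)

print(float(hz_to_mel(1000.0)), float(low_step), float(high_step))
```

With this formula 1000 Hz lands at roughly 1000 mel, and the 4 kHz to 8 kHz range covers far fewer mels than 0 to 4 kHz, reflecting the coarser pitch resolution of hearing at high frequencies.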