Types of Audio Features for Machine Learning

71,008 views

Valerio Velardo - The Sound of AI

Comments: 95
@MilanaShkhanukova 3 years ago
I've completed a course in uni covering nearly the same info, but the structure and logic of your videos make a whole understanding and I don't feel scared to answer questions. Thanks!
@Drew_7 1 year ago
19:42, this is by far the BEST and easiest rundown I've heard on different types of ML. Without this, I was beginning to lose faith in my ability to figure out how to summarize it all. Ty my friend! TY TY TY!
@cs306labevaluation3 3 years ago
The series is such a gem! Very well articulated and is really helping me with my literature review. Thanks so much! Keep posting.
@suyashramteke3588 4 years ago
I am able to understand the principles in a way I've never before. Thank you!!
@ValerioVelardoTheSoundofAI 4 years ago
Glad I could help!
@pk-bb8cq 3 years ago
@@ValerioVelardoTheSoundofAI ur level of explanation and ur content is absolutely amazing.......im working on music genre classification project nd u r my savior
@ValerioVelardoTheSoundofAI 3 years ago
@@pk-bb8cq thank you :)
@abhishek-shrm 4 years ago
The only resource on KZbin. Great Work! Keep making videos.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you!
@thatchessguy7072 3 years ago
This is a godsend, I’m using this to learn background for a senior project.
@rahuldeora5815 3 years ago
At 14:11: The Fourier transform shows an amplitude at around 128 Hz but it does not show up in the spectrogram. Why is this when others are visible? The highest amplitude is said to be 256 in the video which does appear.
@zamundacrypto 10 months ago
This series is FIRE 🔥 🔥 I have been looking for a solid course that connects the dots between Audio Signal Processing & Machine Learning/AI. This is it! THANKS 💪🙏🙏
@xyc6090 3 years ago
Deeply thanks! Better than any of the other series courses offered on Coursera.
@ridamsrivastava9502 4 years ago
Highly comprehensive and informative series! I had a question: How can we directly input Low Level Descriptors to Keras models?
@wesleymorris1573 2 years ago
"Nobody uses MFCCs for machine learning anymore"... Enter HuBERT!!!! All seriousness, your videos are amazing and perfect, never stop!
@ValerioVelardoTheSoundofAI 2 years ago
Thanks!
@Lhtokbgkmvfknv 4 years ago
Thank you very much for this amazing series. I'm going to check all the content on the channel. ✌
@ValerioVelardoTheSoundofAI 4 years ago
Thank you!
@albertgabrielmatei138 9 months ago
So interesting, I'm doing a Robotic Intelligence degree and I have never programmed audio, but you explain so well that you make me want to try it.
@ValerioVelardoTheSoundofAI 9 months ago
Amazing!
@drewpriebe5372 2 years ago
Amazing content man! Much appreciated! You explained concepts in a logical fashion that made it easy to learn and understand the concepts. Subscribed and looking forward to any future videos!
@tkorting 3 years ago
Dear Valerio, thanks for the good material presented in your channel. I would like to find occurrences of a pattern (0.5 seconds) in a long audio file (e.g. 15 minutes). In your channel you provide source code to extract audio features, using for example Mel-frequency cepstral coefficients (MFCC). So, I computed MFCCs from my pattern, and from small parts of my long audio, comparing both using DTW. Do you think this is a good method to find the pattern occurrences? Thanks in advance. Regards
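For readers curious what the approach in this question might look like, here is a minimal sketch, not the commenter's actual code. It assumes the MFCCs have already been extracted elsewhere into arrays of shape (frames, coefficients); random numbers stand in for real features, and `dtw_cost` and `find_pattern` are hypothetical helper names introduced only for illustration.

```python
import numpy as np


def dtw_cost(a, b):
    """Classic DTW alignment cost between two (frames, features) arrays."""
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


def find_pattern(pattern, long_features, hop=1):
    """Slide a pattern-sized window over long_features; return (best_start, costs)."""
    width = len(pattern)
    starts = range(0, len(long_features) - width + 1, hop)
    costs = np.array(
        [dtw_cost(pattern, long_features[s : s + width]) for s in starts]
    )
    return starts[int(np.argmin(costs))], costs


rng = np.random.default_rng(0)
# Stand-in for 13 MFCCs over 200 frames (assumption: real MFCCs come from elsewhere).
long_features = rng.normal(size=(200, 13))
# Plant the pattern at frame 120 so the search has a known answer.
pattern = long_features[120:140].copy()

best_start, costs = find_pattern(pattern, long_features)
print("best matching start frame:", best_start)
```

On real recordings the minimum cost will not be zero, so a threshold on `costs` (or picking several local minima) would be needed to report multiple occurrences; a larger `hop` trades accuracy for speed on a 15-minute file.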
@ragavans85 1 year ago
Thanks for the extremely informative course. What audio features, when extracted, would help in comparing two recitations? This is my use case. In India there is a tradition of memorizing texts. A teacher recites them and students recite back 2-3 times. The teacher corrects the pronunciation if there are any mistakes. And the students memorize as the process gets repeated. I am involved in developing an app that could potentially replace the teacher. For that the app would play a recording of the teacher's recital and wait for the learner to recite. The app would then compare the recital of the learner with the recorded recital. What audio features would help in identifying matches and mismatches?
@ValerioVelardoTheSoundofAI 1 year ago
Difficult to say without trying out. But often Mel Spectrograms tend to work best ;)
@dudusash 4 years ago
Great video as usual - will you be covering compression techniques before feeding to dense NNs? Any good books I can refer to along with the videos?
@ValerioVelardoTheSoundofAI 4 years ago
I still haven't decided. As for a reference book, I suggest you check out Fundamentals of Music Processing.
@tentyluaysari3393 3 years ago
This series explains the types of audio features really well and even gives references for them. I hope you will start another exciting series soon! :)
@ValerioVelardoTheSoundofAI 3 years ago
I will! Stay tuned :)
@lakshaykhanna9811 2 years ago
@14:30 Shouldn't the y-axis be the time axis and the x-axis be frequency, with the brightness indicating the amplitude? Because if what you said is true, then that would mean that if we move along the time axis, at one particular instant I would have multiple frequencies, which is certainly not possible. It would be great if you could clarify this doubt.
@zeldisuryady1541 4 years ago
Informative and impressive video, thanks a lot Valerio
@ValerioVelardoTheSoundofAI 4 years ago
Thanks!
@sophalchan775 3 years ago
Great video. Thanks so much for sharing. I wonder why some WAV files have a blue background and some have a black one when I transform them to spectrograms?
@anshkadiyan5170 3 years ago
Sooo happy to find this series
@dapr98 1 year ago
Thanks Valerio, this is brilliant. Could I actually create my own playlists and have a model identify a pattern in each playlist as if it was classifying music by genre, but instead let's say it's classifying playlist1, playlist2, playlist3, etc.? So then it would add songs automatically to each playlist. How could I approach this?
@c.mirashi 3 months ago
am learning this for our project Speech Emotion Recognition so any help and guidance would be greatly appreciated
@wixor_69 4 years ago
Hi will you cover the topic of wavelet transform or Hilbert transform? Very good content btw.
@ValerioVelardoTheSoundofAI 4 years ago
I haven't planned to cover wavelet transform soon. However, I'll cover it in the future. Stay tuned :)
@subramanyabhattm4626 3 years ago
If we are using traditional ml and solving a problem of classifying emotions based on audio which are the 3 features to be considered?
@gabriellara7456 2 years ago
Valerio, could you please provide bibliographic references for the taxonomy presented in this video?
@santhosh20071993 2 years ago
Excellent video. Liked all your channel's videos.
@rxz8862 4 years ago
Hey brother, your videos are so amazing, thank you a lot🙌🙌
@venkatesanr9455 4 years ago
Hi Valerio, thanks for sharing your knowledge. Why is it STFT (short-time Fourier transform)? Thanks
@ValerioVelardoTheSoundofAI 4 years ago
STFT is a particular transformation that enables us to calculate a spectrogram. You can think of it as a series of spectra calculated sequentially on a small subset of the audio signal. I'll cover this in detail moving forward!
@venkatesanr9455 4 years ago
@@ValerioVelardoTheSoundofAI Thanks for your response and the series
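The description of the STFT in this thread, a series of spectra computed one after another on small slices of the signal, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code used in the series: the function name `stft_magnitude`, the frame and hop sizes, and the two-tone test signal are all assumptions chosen for the example.

```python
import numpy as np


def stft_magnitude(signal, frame_size=256, hop_size=128):
    """Return a (num_frames, frame_size // 2 + 1) array of magnitude spectra."""
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(num_frames)
    ])
    # One FFT per frame: each row is the spectrum of one short slice of audio.
    return np.abs(np.fft.rfft(frames, axis=1))


sample_rate = 8000  # Hz, illustrative value
t = np.arange(sample_rate) / sample_rate  # one second of audio
# First half of the signal is a 500 Hz tone, second half a 1500 Hz tone.
signal = np.where(
    t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t)
)

spectrogram = stft_magnitude(signal)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)  # frequency of each FFT bin
first_peak = freqs[np.argmax(spectrogram[0])]
last_peak = freqs[np.argmax(spectrogram[-1])]
print("frames x bins:", spectrogram.shape)
print("peak frequency in first frame:", first_peak)
print("peak frequency in last frame:", last_peak)
```

A single Fourier transform of the whole second would show both tones but not when each one occurs; splitting the signal into short frames is what lets the spectrogram place each frequency in time.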
@vivekmankar9643 4 years ago
Your way of explaining is amazing !!
@ValerioVelardoTheSoundofAI 4 years ago
Thank you!
@shubham6867 4 years ago
Can we also detect locust sounds with a voice recognition model, and if yes, how?
@parisaahmadzadeh6866 11 months ago
Hi, thanks a lot for your great video. Is MFCC in the same category as the spectrogram? I mean, is it a handcrafted feature?
@ValerioVelardoTheSoundofAI 11 months ago
I'd say so. However, MFCC is more "handcrafted" than a spectrogram, in that there are more manipulations of the original signal.
@laidbackmedia 1 year ago
What is the source of Deep Learnings tuning reference?
@rrrjo4137 2 years ago
Thank you for the super kind lecture!
@shafagh_projects 3 years ago
You are amazing. The best tutorial ever made. Thanks a lot.
@ValerioVelardoTheSoundofAI 3 years ago
Thank you!
@raffaelrameh14 1 year ago
Appreciate this content very much! Thanks!
@tyhuffman5447 4 years ago
Very good stuff, thanks for making this. Question: how practical would it be to use the entire list of Amp Env, Root-mean Square, Zero crossing,... to initially train a smallish model and slowly reduce the list to get to the features that work best with the data we are looking at? That would be instead of attempting to guess our way through the data, since our instincts about sound are very different from an ML model's.
@ValerioVelardoTheSoundofAI 4 years ago
I would suggest doing a pre-processing analysis, investigating correlation between the different audio features and data samples. This way you can have at a glance an idea of which features are the most promising.
@tyhuffman5447 4 years ago
Valerio Velardo - The Sound of AI thank you! Pre-processing analysis is in a lesson coming up. Good to know.
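As a rough illustration of the pre-processing analysis suggested in this thread, the sketch below computes three simple time-domain features on synthetic clips and checks how each one correlates with the class label and with the other features. Everything here is an assumption made for the example: the clips are synthetic (tones versus white noise), and the helper names `rms`, `zero_crossing_rate`, and `peak_amplitude` are hypothetical, not functions from the series.

```python
import numpy as np


def rms(clip):
    """Root-mean-square energy of the whole clip."""
    return float(np.sqrt(np.mean(clip ** 2)))


def zero_crossing_rate(clip):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(clip)
    return float(np.mean(signs[1:] != signs[:-1]))


def peak_amplitude(clip):
    """Largest absolute sample value in the clip."""
    return float(np.max(np.abs(clip)))


rng = np.random.default_rng(0)
sr = 8000  # Hz, illustrative value
t = np.arange(sr) / sr
clips, labels = [], []
for _ in range(20):
    gain = rng.uniform(0.2, 1.0)
    clips.append(gain * np.sin(2 * np.pi * 200 * t))  # class 0: low-pitched tone
    labels.append(0)
    clips.append(gain * rng.normal(size=sr))          # class 1: white noise
    labels.append(1)

names = ["rms", "zcr", "peak"]
features = np.array(
    [[rms(c), zero_crossing_rate(c), peak_amplitude(c)] for c in clips]
)
labels = np.array(labels)

# Correlation of each feature with the label: a quick hint at which are promising.
label_corr = {
    name: float(np.corrcoef(features[:, i], labels)[0, 1])
    for i, name in enumerate(names)
}
# Correlation between features: highly correlated pairs are likely redundant.
feature_corr = np.corrcoef(features, rowvar=False)

for name, value in label_corr.items():
    print(f"{name}: correlation with label = {value:.2f}")
```

Linear correlation only catches linear relationships, so a feature with low correlation is not necessarily useless; this is a first glance at the data, not a replacement for validating features with an actual model.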
@ektabajaj1683 3 years ago
Hello sir. I am doing a research study using machine learning. Can you please clarify why traditional machine learning is still in use if, in deep learning, we don't need to define features, as you described in the video?
@ValerioVelardoTheSoundofAI 3 years ago
It depends on the use case. Sometimes DL applications are impractical. DL takes a lot of computational resources. Traditional ML techniques require fewer resources and less specialised talent. If you have a small dataset, again it doesn't make much sense to go with a DL approach.
@ektabajaj1683 3 years ago
@@ValerioVelardoTheSoundofAI okay. thanks a lot for your response and time. The series is really great and being helpful in my study. Much appreciated.
@shivshakti_1111_ 3 months ago
sir please bring series for wifi signal csi data
@kaziasifahmed2443 4 years ago
Sir, which audio feature extraction processes are trending for feeding into an RNN model or a CNN-LSTM model?
@ValerioVelardoTheSoundofAI 4 years ago
In DNN, we generally use (Mel) Spectrograms. A lot more on this in coming videos!
@kaziasifahmed2443 4 years ago
thanks for helping us by clearing concepts of sound processing
@6tyelement979 4 years ago
Hi man thank you you should do a course on audio classification it would be awesome
@ValerioVelardoTheSoundofAI 4 years ago
Thank you! I have a series called "DL for Audio with Python", where I tackle an audio/music classification problem.
@vidyagopal3431 2 years ago
well organized
@priataoshru3900 3 years ago
Hello sir, would it be possible for you to tell me what features are necessary for specific voice recognition part ? it would consist classifying age, gender and also individuals. I am doing my ML project on this and I am very very confused. It would be a great help if you tell me. Thank you.
@ValerioVelardoTheSoundofAI 3 years ago
You could use either MFCCs or (Mel) Spectrograms for those tasks.
@priataoshru3900 3 years ago
@@ValerioVelardoTheSoundofAI thank you so much. really appreciate it.
@priataoshru3900 3 years ago
@@ValerioVelardoTheSoundofAI also do you have any audio dataset cleaning video ?
@markusbuchholz3518 4 years ago
Hello Valerio, As always impressive work and effort! Not sure about your schedule but it would be very interesting (maybe in the future) to see your approach to removing noise from a signal (using deep learning). Normally such an approach (removing noise) should work in RT; however, I am not convinced that it is feasible to run such an application in RT. Thanks and have a good day.
@ValerioVelardoTheSoundofAI 4 years ago
Thank you! Denoising is definitely an application that I'd like to cover in the future. Not sure about having it RT though...
@akshaykumar-yx9ic 2 years ago
Great video 🙏
@ektabajaj1683 3 years ago
Sir, I am doing a higher-studies project on Alzheimer's detection, so I need datasets of patients' speech. There are some organizations which provide datasets, like DementiaBank, but they require an authentication procedure. Can you please help me find datasets, even small ones?
@ValerioVelardoTheSoundofAI 3 years ago
I'm sorry Ekta, but I'm not familiar with these types of datasets. Have you tried Kaggle? Other option, you could ask this question in The Sound of AI Slack group. Somebody there may know about this... PS: Please call me Valerio ;)
@ektabajaj1683 3 years ago
@@ValerioVelardoTheSoundofAI Kaggle has many datasets easily available but sadly it doesn't have a speech dataset for Alzheimer's. I will surely ask in the Slack group. Thank you.
@subhamkundu5043 4 years ago
Great video. Excellent explanation. Can you make one video about some projects which we can make.
@ValerioVelardoTheSoundofAI 4 years ago
That's a nice idea! I'll add this topic to the backlog of my "Tips and Tricks" videos.
@subhamkundu5043 4 years ago
@@ValerioVelardoTheSoundofAI hope to see the video very soon.
@marioandresheviacavieres1923 10 months ago
Pure gold!
@shafagh_projects 3 years ago
I have a question: how can we convert a time-dependent signal to an audio file in python?
@ValerioVelardoTheSoundofAI 3 years ago
You can use librosa.
@shafagh_projects 3 years ago
@@ValerioVelardoTheSoundofAI I found librosa.org/doc/0.7.1/generated/librosa.output.write_wav.html, but the librosa.output.write_wav function was removed in version 0.8. I have attached the file in Slack for your consideration and would appreciate it if you could give me some hints.
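Since `librosa.output.write_wav` is no longer available, one option that needs no third-party packages is Python's standard `wave` module; the `soundfile` package's `write` function is a commonly suggested replacement as well. The sketch below is a minimal example under stated assumptions, not an answer given in the thread: the helper name `write_wav`, the file name, and the 440 Hz test tone are all chosen for illustration, and the signal is assumed to be a sequence of floats in the range [-1.0, 1.0].

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=22050):
    """Write floats in [-1.0, 1.0] as a mono 16-bit PCM WAV file."""
    # Clip out-of-range values so they cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(
        struct.pack("<h", int(round(s * 32767))) for s in clipped
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)


sample_rate = 22050
duration = 0.5  # seconds
# A time-dependent signal: half a second of a 440 Hz sine at half amplitude.
signal = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(int(sample_rate * duration))
]
write_wav("tone_440hz.wav", signal, sample_rate)
```

If the signal comes from a measurement rather than audio, it usually needs to be scaled into [-1.0, 1.0] first, and the chosen sample rate determines how fast the data is played back.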
@juniorsilva5713 8 months ago
Thanks a lot!!!
@mohammadjadidi233 17 days ago
perfect 👌 👌
@manojrana009 3 years ago
A super thanks to you,🙏
@efeozkaya1372 3 years ago
Great content! Hope to collaborate with you at some point on a project.
@sushruthbhat5727 3 years ago
Suppose I am given a task regarding voice identification. There is a database that contains audio files (voices) of all my customers. When a person calls my company for any reason, I must authenticate that this person, based on the audio files (voices), is the same person calling. If anyone could direct me to solve this problem I’d really appreciate it.
@ValerioVelardoTheSoundofAI 3 years ago
Search for "speaker verification".
@sarabhian2270 2 years ago
the logo of channel tells that this guy is huge fan of Neural style transfer learning 😂🤣
@ValerioVelardoTheSoundofAI 2 years ago
Indeed I am!
@manojnoochila 8 months ago
Can u do a project regarding deepfake audio detection
@wudaqin4310 4 years ago
Every time the speaker says "focus" I mishear it as an "f-word"...
@niyanderniyago7577 3 years ago
500th like❤