3 - Audio Feature Extraction using Python

  44,005 views

Prabhjot Gosal


Comments: 37
@fuchunhsieh7396 1 year ago
This is a really helpful video for someone who is just starting out with signal processing and classification. Thank you for your effort; it really helped me understand spectrograms and signal processing better!
@kashan-hussain3948 2 years ago
Thank you for sharing your effort.
@soumyadrip 2 years ago
Thank you for the video. Very excited for this video series. One of your videos, on YOLO to COCO conversion, was very helpful.
@prabhjotgosal2489 2 years ago
I am glad you find the content helpful. Thanks for the feedback!
@lindaflow5434 8 months ago
this was really helpful! thank you very very much!!
@PlabonTalukder 1 year ago
Thanks for the video!
@Dr.AhmedQusaySabri 1 year ago
Thanks a lot
@sawinderkaur5057 2 years ago
Excellent Keep it up beta
@RKYT0 9 months ago
If you want to increase the resolution on the x axis you can increase the sr. But how do you increase the frequency resolution on the y axis?
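One way to look at this question: in an STFT-based spectrogram, the spacing between frequency bins is the sampling rate divided by the FFT window length, and the spacing between time frames is the hop length divided by the sampling rate. A longer window therefore gives finer frequency resolution, at the cost of each frame averaging over a longer stretch of time. The sketch below only does that arithmetic; the function name and the `sr`, `n_fft` and `hop_length` values are illustrative assumptions, not settings taken from the video.

```python
def stft_resolution(sr, n_fft, hop_length):
    """Return (frequency bin spacing in Hz, time step between frames in seconds)."""
    freq_step_hz = sr / n_fft
    time_step_s = hop_length / sr
    return freq_step_hz, time_step_s


# Illustrative values only: a larger n_fft narrows the frequency bins.
for n_fft in (512, 2048, 8192):
    freq_step, time_step = stft_resolution(sr=22050, n_fft=n_fft, hop_length=n_fft // 4)
    print(f"n_fft={n_fft}: {freq_step:.2f} Hz per bin, {time_step * 1000:.1f} ms per frame")
```

Raising the sampling rate on its own widens the frequency range that can be represented but does not narrow the bins; for a fixed `n_fft` it actually makes each bin wider.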
@demkut 11 months ago
I want to analyze a frequency signal with a fairly large bandwidth. Will this method suit my task?
@KhanhNguyen-dn8bm 1 year ago
Hi Prabhjot, can you make a video about the LPC algorithm in feature extraction, please?
@prabhjotgosal2489 1 year ago
Hello, I sure will. It might take some time though as I wrap up the direct speech to speech translation project.
@jamilamuhammad5717 1 year ago
Hello Prabhjot, this is indeed amazing work. Thank you for taking the time to share your knowledge with the world. Could you please guide me on how to save batches of spectrograms? I have created a TensorFlow dataset of audio files and passed them through a data pipeline, in line with the kind of decoding my work requires. I want to plot and save each spectrogram from the generated dataset. Thank you in anticipation of your kind response. Cheers!
@devisthesis343 2 years ago
Hi. Can I have the slide presentation? Really nice presentation. At 5:47 you mention spectral leakage; what exactly is it?
@hydraulicgames2493 1 year ago
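I think that's when the energy of one frequency smears into neighboring frequency bins because the analysis window cuts the signal off partway through a cycle; it isn't signal that is 'lost' when converting from analog to digital.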
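Spectral leakage is easiest to see numerically: a sine that completes a whole number of cycles inside the FFT window lands in a single frequency bin, while one that is cut off mid-cycle spreads its energy across many bins. A minimal sketch, assuming NumPy is available; the 64-sample window, the 1% threshold and the function names are illustrative choices, not values from the video.

```python
import numpy as np


def magnitude_spectrum(cycles_in_window, n=64):
    """Magnitude spectrum of a sine with the given number of cycles in an n-sample window."""
    t = np.arange(n)
    signal = np.sin(2 * np.pi * cycles_in_window * t / n)
    return np.abs(np.fft.rfft(signal))


def count_significant_bins(spectrum, fraction=0.01):
    """Count bins holding more than `fraction` of the peak magnitude."""
    return int(np.sum(spectrum > fraction * spectrum.max()))


on_bin = magnitude_spectrum(8.0)    # whole number of cycles: no leakage
off_bin = magnitude_spectrum(8.5)   # window ends mid-cycle: energy leaks

print("bins above 1% of peak, 8.0 cycles:", count_significant_bins(on_bin))
print("bins above 1% of peak, 8.5 cycles:", count_significant_bins(off_bin))
```

Multiplying each frame by a tapered window (for example a Hann window) before the FFT is the usual way to reduce this effect.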
@Bekeyurious 1 year ago
Hello, it's really helpful, but can you please tell me how I should run the code and where? (I know, silly question, but I'm new to it.)
@prabhjotgosal2489 1 year ago
Hi - The code can be run either in a Jupyter notebook or in Google Colab. Colab might be easier if you don't already have Python/Jupyter installed. To run the code in Colab, just download the code file from my GitHub and drop it in your Google Drive. Then adjust the file paths, etc. I hope this helps!
@helios4.257 11 months ago
Hello, we are working on an assignment related to gender recognition from voice. However, we want to extract values such as "mean frequency, standard deviation, spectral flatness" from a person's voice using the data you use. How can we achieve this?
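One possible starting point for the quantities asked about here, assuming they are defined as power-weighted moments of the spectrum (mean frequency as the spectral centroid, standard deviation as the spread around it) and spectral flatness as the ratio of the geometric to the arithmetic mean of the power spectrum. Other toolkits may define these terms slightly differently. The sketch uses NumPy only and a synthetic tone and noise signal instead of the video's data; `spectral_summary` is a hypothetical helper name.

```python
import numpy as np


def spectral_summary(signal, sr):
    """Return (mean frequency in Hz, spectral std in Hz, spectral flatness in 0..1)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    weights = power / power.sum()                     # treat the spectrum as a distribution
    mean_freq = float(np.sum(freqs * weights))
    std_freq = float(np.sqrt(np.sum(weights * (freqs - mean_freq) ** 2)))
    eps = 1e-12                                       # avoids log(0) for empty bins
    geometric_mean = np.exp(np.mean(np.log(power + eps)))
    flatness = float(geometric_mean / (np.mean(power) + eps))
    return mean_freq, std_freq, flatness


# Synthetic inputs: a pure tone is "peaky" (flatness near 0), white noise is flat.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr)

print("tone :", spectral_summary(tone, sr))
print("noise:", spectral_summary(noise, sr))
```

For real speech these values are normally computed per short frame and then averaged, rather than over the whole recording at once.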
@JeffT-n9u 1 year ago
Is it possible to share a link to the 'h_1.wav' file used in your YouTube demo, please? 🙂
@prabhjotgosal2489 1 year ago
I have uploaded the sample file on my GitHub page: github.com/PrabhjotKaurGosal/Helpful-scripts-for-MachineLearning
@drjfilix3192 2 years ago
Hi Prabhjot Gosal, thank you for your hot video, which turned out to be very interesting! A practical case: if I have to change the bpm of a song to make it constant for its entire duration (avoid drifting tempo). How tight is my library?
@prabhjotgosal2489 2 years ago
Hi, thanks for the comment! I am not well versed in processing music audio and unfortunately cannot help at this time.
@Techieadi 1 year ago
Just time-stretch the song. Slowing a 100 bpm song to 0.75× speed will result in 75 bpm.
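To make the tempo arithmetic concrete: tempo scales with playback rate, so going from 100 bpm to 75 bpm means a rate of 0.75, which lengthens the track by a third instead of shortening it. A small sketch of that relationship; the function names and the 180-second duration are hypothetical, and a real tempo correction would also need a pitch-preserving time-stretch (librosa's `librosa.effects.time_stretch` takes a `rate` argument of this kind).

```python
def stretch_rate(source_bpm, target_bpm):
    """Playback-rate factor that moves source_bpm to target_bpm (>1 speeds up, <1 slows down)."""
    return target_bpm / source_bpm


def stretched_duration(duration_s, rate):
    """Duration of the audio after time-stretching by `rate`."""
    return duration_s / rate


rate = stretch_rate(100, 75)
print(rate)                              # 0.75
print(stretched_duration(180.0, rate))   # 240.0
```

A song whose tempo drifts would need a different rate for each section, estimated from a beat tracker, which this sketch does not attempt.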
@durgaganesh423 2 years ago
Hi, very interesting. I have a query: how can we find audio abnormalities, like missing samples for a specific duration and glitches in between, in an audio file? Could you help me? Thank you in advance.
@prabhjotgosal2489 2 years ago
Hi, that is an interesting question... If by missing samples you mean the audio is silent for a specific duration, then that is easy to detect. We can simply check where the amplitude is 0. The glitches will be a little trickier, in my opinion, as we would need some way of telling whether a spike/abnormality in the audio is part of the audio or truly a glitch. If you have a footprint of what a glitch looks like, then you could check whether your audio has a similar footprint.
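The silence check described above can be sketched as follows, assuming the audio is already loaded as a NumPy array of floats. Real recordings rarely contain exact zeros, so the sketch uses a small threshold instead of testing for 0 exactly. The spike detector is a simple heuristic added here for illustration (it flags sample-to-sample jumps far above the median jump), not a method from the video; the function names, threshold values and synthetic test signal are all assumptions.

```python
import numpy as np


def find_silent_runs(samples, sr, threshold=1e-4, min_duration_s=0.05):
    """Return (start_s, end_s) pairs where |amplitude| stays below threshold."""
    quiet = np.abs(samples) < threshold
    # Pad with False so runs touching either end of the file are still closed.
    padded = np.concatenate(([False], quiet, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[0::2], edges[1::2]
    min_len = int(min_duration_s * sr)
    return [(float(s / sr), float(e / sr)) for s, e in zip(starts, ends) if e - s >= min_len]


def find_spikes(samples, factor=8.0):
    """Indices where the sample-to-sample jump is far above the median jump."""
    jumps = np.abs(np.diff(samples))
    typical = np.median(jumps) + 1e-12
    return np.flatnonzero(jumps > factor * typical) + 1


# Synthetic test signal: a 220 Hz tone with a dropout and a one-sample spike.
sr = 8000
t = np.arange(sr) / sr
audio = 0.5 * np.sin(2 * np.pi * 220 * t)
audio[2000:2800] = 0.0   # dropout from 0.25 s to 0.35 s
audio[6000] = 1.0        # single-sample spike at 0.75 s

print("silent runs (s):", find_silent_runs(audio, sr))
print("spike times (s):", find_spikes(audio) / sr)
```

As the reply notes, a heuristic like this cannot tell a genuine transient (a drum hit, a plosive) from a glitch; matching against a known glitch footprint would be more reliable when one is available.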
@durgaganesh423 2 years ago
@prabhjotgosal2489 Hi, thanks for the reply. Glitches like in-between spikes or a suddenly reduced time interval.
@simhan2895 2 years ago
Hi, good day. Can we use this audio feature extraction to compare two voice samples from the same speaker for authentication? Would saving the log mel output as logMel.out, saving the same speaker's voice at a different time as logMel2.out, and comparing the two outputs work for authentication? Is that possible, and would it give good results for this use case? Regards, Simhan
@prabhjotgosal2489 2 years ago
Hi, it is certainly possible to compare the two log mels, and they may yield what you are looking for. However, MFCC features are generally better features for speech than log mel spectrograms. Thanks!
@simhan2895 2 years ago
@prabhjotgosal2489 Thanks for your response. Do you have any references for MFCC feature extraction? It would be of great help for my research.
@prabhjotgosal2489 2 years ago
@simhan2895, this video may be helpful: kzbin.info/www/bejne/jXusXmRtgrOjmqc
@juicetin942 1 year ago
Hi Prabhjot, I want to compare audio of person A with audio of person B and get a match percentage. Can you guide me on how to achieve this? Just pointing me in the right direction would be a great help.
@prabhjotgosal2489 1 year ago
Hi, you may want to check out pypi.org/project/speaker-verification/. I have not tested it myself, but this could be a good start.
@juicetin942 1 year ago
@prabhjotgosal2489 I went through this; sorry, my question was wrong. I don't want to verify the speaker. I want to keep person A as the reference and then match the pronunciation of person B against A, then get a percentage match score.
@prabhjotgosal2489 1 year ago
@juicetin942 Very interesting problem! I can think of a few things which may or may not work depending on the dataset. I would start with the basics:
1. Use cross-correlation to compare the two audios (en.wikipedia.org/wiki/Cross-correlation).
2. Compare the spectrograms of the two audios: first convert each audio waveform to its spectrogram, then find the difference between the two spectrograms. The difference spectrogram will indicate how different the two audios are.
3. Formulate this as a template matching problem. The audio (waveform or its spectrogram) of person A can be considered a template. The goal is then to find how closely the audio of person B matches this template. Look for research papers on template matching in the CV space.
Please note this problem becomes harder if the two audios are drastically different, because then we need to deal with many more variables besides the pronunciation differences, such as different recording environments (which means dealing with different background noise), different audio lengths (if the content is the same but is said at different timestamps in the audios), different speaker voices, etc.
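The first suggestion, cross-correlation, can be sketched as below. This is a minimal illustration assuming NumPy and two mono signals at the same sampling rate; `peak_normalized_xcorr` is a hypothetical helper, and the random test signals stand in for real recordings. The score is the peak of the normalized cross-correlation (close to 1 for a shifted copy, near 0 for unrelated signals); turning it into a "percentage match" for pronunciation is an assumption that would need validating on real data.

```python
import numpy as np


def peak_normalized_xcorr(a, b):
    """Return (peak normalized cross-correlation, lag in samples of a relative to b)."""
    a = a - a.mean()
    b = b - b.mean()
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    xcorr = np.correlate(a, b, mode="full")   # O(N^2); fine for short clips
    best = int(np.argmax(xcorr))
    lag = best - (len(b) - 1)                 # positive lag: a is delayed relative to b
    return float(xcorr[best]), lag


# Stand-in signals: a reference, a copy delayed by 50 samples, and unrelated noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(2000)
delayed = np.concatenate((np.zeros(50), reference[:-50]))
unrelated = rng.standard_normal(2000)

print("delayed copy:", peak_normalized_xcorr(delayed, reference))
print("unrelated   :", peak_normalized_xcorr(unrelated, reference))
```

Raw-waveform correlation is sensitive to exactly the variables listed in the reply (speaker voice, timing, background noise), so for two different speakers it is usually more informative to correlate features such as spectrogram frames than the waveforms themselves.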
@juicetin942 1 year ago
@prabhjotgosal2489 Thank you
@kenturkey1971 8 months ago
What a waste! You showed literally nothing! Well, you showed you have no clue what you’re doing. Ugh.
@vialomur__vialomur5682 1 year ago
Thanks a lot for this!