Hi Jon. Great presentation. I am absolutely new to machine learning and found your talk really clear and useful. Thanks for sharing.
@Captura22 · 8 months ago
Hi Jon, I am doing a final year undergraduate project on bioacoustics, I am new to signal processing as well as your channel! I was just wondering - do you have a paper covering some of the stuff you've talked about, which I could reference?
@Jononor · 8 months ago
Hi! Yes, this work is mostly in my master's thesis. If you search Google Scholar for "Environmental sound classification on microcontrollers using Convolutional Neural Networks" you should find it. I would give you a link, but YouTube tends to shadowblock messages with links...
@michaelwirtzfeld7847 · 4 years ago
Thank you. A very good presentation. Is the Keras model code you showed (i.e. "block_1", "block_2", etc.) on a couple of your slides available in one of your GitHub repositories?
@Jononor · 4 years ago
Thank you Michael. Yes, all the Keras models I tested in my thesis are in the following repo/folder. The one in question is probably in "strided.py" or "sbcnn.py": github.com/jonnor/ESC-CNN-microcontroller/tree/0d3a1231831d3ee61c22a4f8b461a7511fae3de7/microesc/models
@GadisaGemechu-j2u · 8 months ago
Perfect! Could we exchange ideas on how to prepare a dataset?
@jayshaligram4474 · 4 years ago
Hi... great work! Thank you for uploading this video. If you had the exact frequency-vs-time data for a particular sample in text or CSV format, how could it be used to improve the accuracy of a CNN? Can image data be correlated with the corresponding frequency data to get more accurate predictions?
@jayshaligram4474 · 4 years ago
Also.. is data augmentation (time shift, pitch shift, etc.) manual, or is there an automated process for achieving this?
@Jononor · 4 years ago
Hi Jay. The spectrograms contain basically all the time-versus-frequency data. But if you have some additional information available, there are ways to incorporate it. If the data is always available (both at training time and prediction time), then you can use it as an additional input to the neural network.
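A minimal sketch of such a two-input network using the Keras functional API (all shapes, layer sizes, and names here are hypothetical, just to show the wiring, not the model from the talk):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 32 mel bands x 64 frames, plus 4 extra scalar features
spec_in = keras.Input(shape=(32, 64, 1), name="spectrogram")
x = layers.Conv2D(8, 3, activation="relu")(spec_in)
x = layers.GlobalAveragePooling2D()(x)

extra_in = keras.Input(shape=(4,), name="extra_features")

# Concatenate the CNN embedding with the additional features
h = layers.Concatenate()([x, extra_in])
h = layers.Dense(16, activation="relu")(h)
out = layers.Dense(10, activation="softmax")(h)

model = keras.Model(inputs=[spec_in, extra_in], outputs=out)
probs = model.predict([np.zeros((2, 32, 64, 1)), np.zeros((2, 4))])
```

The key point is that both inputs must be provided at prediction time as well, so this only works when the extra data is always available.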
@Jononor · 4 years ago
Data augmentation is basically always automated. Either as a pre-processing batch job, or done on-the-fly while training the neural network. This post shows the code for common audio augmentations: medium.com/@makcedward/data-augmentation-for-audio-76912b01fdf6
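As a small illustration of the on-the-fly style, here is a sketch of one common augmentation (a random circular time shift) in plain NumPy; the function name and parameters are my own, not from the talk:

```python
import numpy as np

def time_shift(samples, max_shift_fraction=0.2, rng=None):
    """Randomly roll the waveform left or right (wrap-around time shift)."""
    rng = rng or np.random.default_rng()
    max_shift = int(len(samples) * max_shift_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(samples, shift)
```

In an on-the-fly setup, a function like this would be applied to each training example as it is loaded, so every epoch sees slightly different data.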
@Woofawoof_wwooaaf · 1 year ago
Hi, can you please explain how we can convert an MP3 audio file into a WAV file?
@Jononor · 1 year ago
For a single file, use Audacity. For multiple files you can use ffmpeg and a shell script. To do it from Python, use librosa.load and soundfile.write.
@sidalibourenane5377 · 2 years ago
Hey, hope you are doing well! Can you help me? How can we use speech recognition to detect falls in elderly people? And one more question: how can we combine audio with images to implement fall detection? Thank you
@sigitpriyohartanto2129 · 3 years ago
Thank you for the great presentation. I have a question: how can one compare one person's voice with another's?
@Jononor · 3 years ago
Search for "speaker recognition". I recommend looking into pretrained models based on x-vectors or i-vectors.
@sigitpriyohartanto2129 · 3 years ago
@@Jononor ok thanks
@a2sirmotivationdoses782 · 2 years ago
Respected Sir, my project is to remove noise from audio. How can I train an ML model for this, and how should I proceed? Please help me.
@cookingcriss · 4 years ago
Thank you so much for sharing the presentation with us! I'm new to machine learning and I have some questions. Where could I download or use audio datasets for my project? Thank you in advance!
@Jononor · 4 years ago
A good overview of environmental audio datasets can be found at www.cs.tut.fi/~heittolt/datasets
@weirjwerijrweurhuewhr588 · 4 years ago
Interesting talk! In the example you showed, lots of the sounds are quite different from each other, e.g. the children playing, a siren, and a jackhammer. Does it also work for sounds that are very similar? For example different crow calls or different types of chimpanzee sounds?
@Jononor · 4 years ago
Hi Ramon. Yes, the same basic approach can be used in such a case. Whether good results can be achieved depends on how hard the task is and how good the data is.
@girishraghunathan2221 · 4 years ago
Interesting Presentation !
@chacmool2581 · 3 years ago
Great stuff. How's the job market for this type of knowledge and skills? I am an old EE just starting a DS masters and I've turned my attention to audio classification.
@Jononor · 3 years ago
Hi Chac. For audio, image, video and similar processing, the kind of companies that would previously hire for Digital Signal Processing skills are today hiring for Machine Learning. If you have an EE background with embedded systems skills, that is a very good complement for many such companies. At the moment the demand for ML engineers is high - many are trying to build new ML-based products and functionality - and there is a lack of skilled people. So pretty good, I would say - but you need to go for the places that match your skill profile. A master's degree will set you apart from the large number of self-learners, in terms of demonstrated qualifications.
@chacmool2581 · 3 years ago
@@Jononor Thank you very much for that. Much appreciated.
@chacmool2581 · 3 years ago
Jon, hate to bug you again, but I am actually kind of serious about this. My DS program is not geared towards or focused on 'TinyML', so I need to supplement it with other learning. What online program or set of courses would you recommend to get into 'TinyML'?
@Jononor · 3 years ago
@@chacmool2581 There is a TinyML book. I have not read it, but it is probably a good start. The TinyML YouTube channel has many good talks, but they are on bleeding-edge research - not a pedagogical resource. Apart from the usual embedded/DSP topics, the main part of TinyML is computationally efficient and small models, so focus on understanding how to choose and optimize such models. For CNNs my master's thesis has some pointers on that
@Jononor · 3 years ago
@@chacmool2581 Also, do a few practical projects. Get an ESP32 board and build something fun (does not have to be useful)
@sadeghmohammadi5567 · 3 years ago
Thank you very much for your very informative presentation. However, I have a question regarding one of your slides, specifically on aggregation of analysis windows. Could you please explain further (possibly with an example)? For instance, is windows = 6 the number of segments extracted from the audio signal, or the length of a window (6 * sampling_rate)? And bands = 32? Moreover, regarding the base model: is it the model presented on the previous slide (3-layer CNN)? So the logic is that we take the audio signal, convert it into a sequence of windows, pass them through the SB-CNN, propagate over time, compute the average pooling, and use the output of the average pooling in the softmax to make the prediction. Is this logic correct? Thank you in advance for your consideration.
@idrisseahamadiabdallah7669 · 3 years ago
Hello Jon, you did a great presentation. Thanks for sharing. I am working on my master's thesis, specifically on lung sound classification using a CNN. I am using MFCC features and getting about 88% accuracy. Do you think a mel-spectrogram could give higher accuracy than 88%?
@Jononor · 3 years ago
Hi Idrisse! Thank you. Yes, I think that using the mel-spectrogram instead of MFCC might give you a slight increase in performance for your use case; at least it is worth trying out!
@idrisseahamadiabdallah7669 · 3 years ago
@@Jononor thank you
@idrisseahamadiabdallah7669 · 3 years ago
@@Jononor Thanks sir, I would like to ask something, please bear with me. Step 1: original dataset of 177 samples (3 classes, each class has 59 audio files). Because of the small size of the data, I did data augmentation. Step 2: after data augmentation, I extracted MFCC features of the audio files with their respective labels in order to create a useful dataset. Step 3: I split the new dataset into training, validation, and testing sets. Step 4: fed the CNN with the training and validation sets for the training process. Step 5: evaluated the CNN with the testing set; we are able to reach an accuracy around 90-93%. Is it correct (logical) to test the model with the testing data I got in step 3? Or should I split the data into training and testing sets before doing the data augmentation? Doing so, I got an accuracy of around 40-43%. Thanks a lot for replying to me.
@Jononor · 3 years ago
@@idrisseahamadiabdallah7669 The testing set should be kept unmodified. Data augmentation should only be applied to the training data. It sounds like your data augmentation may have introduced bigger changes than planned. Check the statistics of the data; they should still be very similar between the augmented train set and the original train/test sets, otherwise you will get into trouble
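To illustrate the ordering, here is a sketch (with a placeholder noise augmentation and made-up array shapes, matching the 177-sample / 3-class setup described above) of splitting first and then augmenting only the training set:

```python
import numpy as np

def augment(x, rng):
    # Placeholder augmentation: add a little Gaussian noise to the features
    return x + 0.01 * rng.standard_normal(x.shape)

rng = np.random.default_rng(42)
X = rng.standard_normal((177, 40))   # e.g. 177 clips x 40 MFCC features
y = np.repeat([0, 1, 2], 59)

# 1) Split FIRST, on the original samples only
idx = rng.permutation(len(X))
test_idx, train_idx = idx[:30], idx[30:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# 2) Augment ONLY the training set; the test set stays untouched
X_aug = np.concatenate([X_train] + [augment(X_train, rng) for _ in range(3)])
y_aug = np.concatenate([y_train] * 4)
```

Because no augmented copy of a test sample ever enters training, the test accuracy remains an honest estimate of generalization.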
@idrisseahamadiabdallah7669 · 3 years ago
@@Jononor Okay, I understood, thanks a lot. One other question: do you think that 177 WAV files may be enough to train a CNN model efficiently?
@xXDarQXx · 3 years ago
I was quite surprised that for classification you didn't feed the feature embeddings of the windows to an RNN and instead just used a post-processing trick. Wouldn't an RNN work better? What about a transformer? Also, I know that mel spectrograms work better than feeding raw audio, but how much better? Is it like +5% accuracy or is it game-changing? nvm 😅 both of these questions were answered at the end. Another question that came to mind though: what about speech recognition models or something similar - are spectrogram-based models still dominating, or is it a different story?
@Jononor · 3 years ago
Temporal aggregation using mean or majority voting is simple and works pretty well. It can be done with an RNN, or AutoPool, or an attention function - and it can increase performance a bit
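A minimal sketch of the two simple aggregation strategies over per-window softmax outputs (the probability values are made up for illustration):

```python
import numpy as np

def aggregate_mean(window_probs):
    """Average per-window class probabilities, then pick the argmax."""
    return int(np.mean(window_probs, axis=0).argmax())

def aggregate_majority(window_probs):
    """Each window votes for its argmax class; the most-voted class wins."""
    votes = np.argmax(window_probs, axis=1)
    return int(np.bincount(votes).argmax())

# 4 analysis windows x 3 classes of hypothetical softmax outputs
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
])
```

Mean aggregation uses the full probability mass per window, so it is usually a bit more robust than majority voting when individual windows are uncertain.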
@Jononor · 3 years ago
Whether mel-spectrogram or raw audio works best depends on the task and dataset. It is much more challenging, and more data-intensive, to make a system that learns from raw audio - but it sometimes performs better once it works. Combining both tends to work best, though it is not always worth the complexity
@xXDarQXx · 3 years ago
@@Jononor Jesus, that was quick XD thank you so much for the reply! I really appreciate it. And that was a great presentation btw, it was very easy to follow. I hope you have a nice day man, cheers :D.
@Jononor · 3 years ago
@@xXDarQXx Thank you :) Happy learning, have a nice day!
@peterm.4026 · 3 years ago
I'm new to machine learning, and I feel like I have watched so many audio machine learning videos; the tips & tricks section at the end of this is the most practical and unique stuff I've seen. Thanks! Does the simple audio recognition tutorial by TensorFlow still exist? I can't seem to find it. Also, in the audio augmentation slide you talk about adding noise to your data for the benefit of the model, but in the Q&A you talk about how de-noising is helpful. Could you clarify the different cases where you use each?
@Jononor · 3 years ago
Hi Peter. The TensorFlow simple audio tutorial still exists, but they keep moving it around and renaming it. Currently it is called "Simple audio recognition: Recognizing keywords" at www.tensorflow.org/tutorials/audio/simple_audio
@Jononor · 3 years ago
Training with noise via data augmentation is almost always beneficial (a possible exception: if one of your classes is very noise-like). Given sufficient data this works well, and it is the simplest solution. However, if one (1) has a small amount of data and (2) there are well-known denoising methods that work well for the case, denoising may be worth a try. An example use case where I have seen a denoising step work well is bird audio spotting in remote monitoring settings (forests etc.) - here it is often very quiet and the noise floor can be significant. It may be that the noise is that of the microphones and electronics themselves, which is near constant and relatively simple to denoise
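As a sketch of the "training with noise" idea, here is one common way to add white noise at a chosen signal-to-noise ratio; the function name and defaults are my own, not from the talk:

```python
import numpy as np

def add_noise_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise at a target signal-to-noise ratio (in dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10*log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(len(signal)) * np.sqrt(noise_power)
    return signal + noise
```

During training one would typically draw the SNR randomly per example (say between 10 and 40 dB), so the model sees a range of noise conditions.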
@tranthanh3060 · 4 years ago
I really like your presentation. Thank you very much. Since I'm trying to classify sound for my project now, could I ask you some more questions?
@Jononor · 4 years ago
Just ask here, or create Stack Overflow questions and link them here. Then I can respond :)
@tranthanh3060 · 4 years ago
Could you explain the mel spectrogram in more detail, more mathematically?
@Jononor · 4 years ago
@@tranthanh3060 Here is a good intro: haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
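For readers who want the math without following the link, here is a rough from-scratch NumPy sketch of the usual pipeline (STFT power spectrum, then a triangular mel filterbank, then log compression); the parameter defaults are illustrative, not the ones used in the talk:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for j in range(left, center):
            fbank[i, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fbank[i, j] = (right - j) / max(right - center, 1)
    return fbank

def mel_spectrogram(samples, sr=16000, n_fft=512, hop=256, n_mels=32):
    # 1) Short-time Fourier transform: windowed FFT power per frame
    window = np.hanning(n_fft)
    frames = [samples[i:i + n_fft] * window
              for i in range(0, len(samples) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2) Project onto the mel filterbank, 3) compress with log
    mels = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mels + 1e-10).T  # shape: (n_mels, n_frames)
```

In practice one would use librosa.feature.melspectrogram instead, but the computation underneath is essentially this.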
@tranthanh3060 · 4 years ago
@@Jononor Thank you so much for your prompt response, this is exactly what I need. Hope you have a nice day!
@saleemjamali3521 · 3 years ago
Sir, can you share the code of your model?
@Jononor · 3 years ago
Hi Saleem. You can find the code here: github.com/jonnor/ESC-CNN-microcontroller
@saleemjamali3521 · 3 years ago
@@Jononor thank you so much sir
@doyourealise · 2 years ago
I am here again, with one question: why don't you upload audio processing videos weekly? Thanks!
@Jononor · 2 years ago
Several reasons. But the main one is that I do not have the time right now. It takes around 10 hours to make a 10 minute lecture with solid content.
@doyourealise · 2 years ago
@@Jononor You are right! It's hard and sometimes a headache haha. Anyway, loved the old content!