Build a Deep Audio Classifier with Python and Tensorflow

185,512 views

Nicholas Renotte

1 day ago

Comments: 243
@captainlennyjapan27 2 years ago
41 minutes into the video and I haven't been bored for even a second. Amazing.
@sheikhshafayat6984 2 years ago
This is exactly what I have been looking for over the past month, and it suddenly popped up in my recommendations! Can't thank you enough for this. You saved my semester!!
@captainlennyjapan27 2 years ago
1 minute into the video, absolutely amazed by the high quality of this video. You are my favorite programming YouTuber along with FireShip and NomadCoders! Thanks so much Nicholas!
@IronChad_ 2 years ago
You're the absolute best with these tutorials
@guillaumegalante 2 years ago
Thanks so much for all these great tutorials! I discovered your channel a few days ago; your way of teaching makes it really easy to understand and learn. I was wondering if you'd be able to do a series or video on recommender systems: building a recommendation engine (content-based, collaborative filtering), whether Netflix (movie) recommendations, Spotify's music recommendations (could include audio modeling) or Amazon (purchase) predictions. Many thanks! Keep up the amazing tutorials :)
@NicholasRenotte 2 years ago
Definitely! I'm doing my own little deep learning challenge atm, will add it to the list!
@prajiltp8852 1 year ago
Can we use the same approach if I want to separate my bpos call recordings from conversation files? For example, if I train it on my bpos recordings and then give it an audio file, will it separate out my bpos sound? Please help
@dwiechannel3196 1 year ago
@@NicholasRenotte please answer my question, I really need some direction.🙏🙏🙏
@adarshd249 2 years ago
More great content from Nick. Thrilled to do a project on this
@NicholasRenotte 2 years ago
Yess! Let me know how you go with it!!
@Maxwell-fm8jf 2 years ago
I worked on a similar audio classification project three months ago, hooked up to a Raspberry Pi with some sensors, but using RCNN and librosa. A different approach from yours, but basically the same steps. Thumbs up mate!!
@NicholasRenotte 2 years ago
Woahhh, nice! What was the latency like on the RPi? I noticed that when I started throwing more hardcore stuff at it, it kinda struggled a little.
@farhankhan5951 1 year ago
What did you develop in your project?
@ellenoorcastricum 10 months ago
What were you using the Pi for, and do you have any tips on how to make a system that recognizes a certain sound in real time?
@pedrobotsaris2036 1 year ago
Good tutorial. Note that the sample rate has nothing to do with the amplitude of an audio file; it is the number of times the audio is sampled per second.
@12imr 1 year ago
If anyone gets an error at 22:14 for
pos = tf.data.Dataset.list_files(POS+'\*.wav')
neg = tf.data.Dataset.list_files(NEG+'\*.wav')
just use / instead of \:
pos = tf.data.Dataset.list_files(POS+'/*.wav')
neg = tf.data.Dataset.list_files(NEG+'/*.wav')
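The separator fix above can be sketched without TensorFlow. This is a minimal illustration, not the tutorial's code: the `data` parent folder and the helper name `wav_pattern` are assumptions made for the example, and the clip folder names are taken from an error message quoted further down the thread.

```python
import os

# Hypothetical directory layout; adjust to wherever the clips actually live.
POS = os.path.join('data', 'Parsed_Capuchinbird_Clips')
NEG = os.path.join('data', 'Parsed_Not_Capuchinbird_Clips')

def wav_pattern(folder):
    """Return a glob pattern matching every .wav file in `folder`.

    Forward slashes work on every platform, whereas a backslash may be
    interpreted as an escape character by the pattern matcher.
    """
    return folder.replace(os.sep, '/') + '/*.wav'

pos_pattern = wav_pattern(POS)
neg_pattern = wav_pattern(NEG)
print(pos_pattern)
print(neg_pattern)
```

The resulting strings can then be passed to `tf.data.Dataset.list_files` in place of the hand-built `POS+'\*.wav'` pattern.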
@TheHearts567 9 months ago
thank you
@mayankt28 5 months ago
If you're encountering a shape issue when calling model.fit and getting the error "cannot take length of shape with unknown rank," the solution might be to explicitly set the shape of your tensors during preprocessing.
@MahmoudSayed-hg8rb 4 months ago
Can you elaborate further?
@johndaniellet.castor7189 2 months ago
@@MahmoudSayed-hg8rb great! @mayankt28, thank you very much for your insight! Basically, we can add the line spectogram = tf.image.resize(spectogram, [1491, 257]) right before the "return spectogram, label" line in the preprocess() function.
@gregoryshklover3088 1 year ago
Nice tutorial. A few inaccuracies there, though, about stft() usage: abs() is not there to get rid of negatives but to take the amplitude (magnitude) of the complex values. frame_length would probably be better as a power of 2...
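Both points can be checked with a small standard-library sketch. It uses a naive DFT in place of `tf.signal.stft`, and the shape helper assumes that each frame is zero-padded to the next power of two and that the end of the signal is not padded, which to my knowledge matches the `tf.signal.stft` defaults. The function names are made up for this example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive DFT of one frame; returns the magnitude of each non-negative-frequency bin."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        mags.append(abs(coeff))   # abs() of a complex number is its magnitude
    return mags

def spectrogram_shape(num_samples, frame_length, frame_step):
    """(frames, bins) for an STFT with no end padding and a power-of-two FFT size."""
    frames = 1 + (num_samples - frame_length) // frame_step
    fft_length = 1 << (frame_length - 1).bit_length()   # next power of two >= frame_length
    return frames, fft_length // 2 + 1

# A cosine completing 2 cycles over 8 samples: all energy lands in bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
mags = dft_magnitudes(frame)
shape = spectrogram_shape(48000, 320, 32)
print([round(m, 3) for m in mags])
print(shape)
```

With 48000 samples, `frame_length=320` and `frame_step=32`, the helper gives the `(1491, 257)` shape that appears in the model's `input_shape` elsewhere in the thread: a frame of 320 samples is padded to 512 before the transform, so picking a power of two directly avoids that padding.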
@orange-dd5rw 4 months ago
How can I detect when a capuchin call starts and ends, and get this information in the end result (e.g. the result should show "capuchin call - 2.3s")?
@gregoryshklover3088 4 months ago
The classification works on sliding windows of a fixed size (3 seconds in this tutorial). One can slide the window with overlap to approximate the start of the matching sequence, or use other signal processing methods to find the start of the sequence.
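The overlapping-window idea can be sketched as follows. The 0.5-second hop, the threshold and the score list are assumptions chosen for illustration; in practice the scores would come from running the trained model on each window.

```python
def window_starts(num_samples, window, hop):
    """Start index of every full window when sliding by `hop` samples."""
    return list(range(0, num_samples - window + 1, hop))

def first_detection_seconds(scores, starts, sample_rate, threshold=0.5):
    """Start time (s) of the first window whose score exceeds the threshold, else None."""
    for score, start in zip(scores, starts):
        if score > threshold:
            return start / sample_rate
    return None

SAMPLE_RATE = 16000
WINDOW = 48000   # 3-second window, as in the tutorial
HOP = 8000       # 0.5-second hop (assumed), so consecutive windows overlap

starts = window_starts(10 * SAMPLE_RATE, WINDOW, HOP)
# Hypothetical classifier outputs, one per window; a real model would supply these.
scores = [0.1, 0.2, 0.1, 0.3, 0.9, 0.95, 0.8] + [0.1] * (len(starts) - 7)
detected_at = first_detection_seconds(scores, starts, SAMPLE_RATE)
print(len(starts), detected_at)
```

The reported time is the start of the first matching window, so the call itself begins somewhere inside that window; a smaller hop narrows the estimate at the cost of more predictions.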
@abrh2793 2 years ago
Nice one! Looking forward to a multi-label text classification video if you can! Thanks
@NicholasRenotte 2 years ago
Yup, code is ready, should be out this week or next!
@abrh2793 2 years ago
@@NicholasRenotte yo thanks a lot! The way you get input from the community and interact is nice to see
@gaspardbos 1 year ago
Mc Shubap is spinning the decks in your memory palace 😆 Great tutorial so far.
@rachitjasoria9041 2 years ago
A much needed tutorial!! Btw, can you make a tutorial on TTS synthesis? Not with pyttsx3... train a model to speak from provided voice data of a human
@NicholasRenotte 2 years ago
You got it!
@rachitjasoria9041 2 years ago
@@NicholasRenotte 😃
@urielcalderon1661 2 years ago
It's him, he is back.
@NicholasRenotte 2 years ago
Ayyyyy Uriel!! What's happening!! Thanks a mill!
@urielcalderon1661 2 years ago
@@NicholasRenotte Always faithful man, as long as there are deep learning tutorials we will be there
@enzy7497 2 years ago
Just discovered this channel in my recommended. Really awesome stuff man! Thanks for the great content.
@guruprasadkulkarni635 2 years ago
Can I use this for classifying audio of different guitar chords?
@lakshman587 1 year ago
This video is awesome!!! I learned from this video that we convert audio data to image data to approach audio-related tasks in ML!!!
@ronakttawde 2 years ago
Very cool video Nick bro!!
@NicholasRenotte 2 years ago
Thanks homie! Good to see you @Ronak!
@akashmishrahaha 10 months ago
Why are we reducing the sample rate from 44 kHz to 16 kHz? It was not clear to me.
@ChrisKeller 10 months ago
Super, super helpful getting my project off the ground!
@alfasierra95 3 months ago
It gives me an error when I try model.fit
@Ricemuncher17 1 month ago
same
@Computer.Music.And.I 6 months ago
Hello Nicholas, I have been using this great video in my beginners' courses, and last year everything was fine. Unfortunately, in today's lecture the code did not run on any of my machines or configurations... The load_wav_16k_mono function is not able to resample the audio files, and much worse, the model.fit function complains about an input that could not be -1, 0 or 1. Are you willing to check and update your code? (Spent 6 hours now trying to find the error 😊) Thx jtm
@sanjay6013 2 months ago
Did you find the fix for this? Please help
@johndaniellet.castor7189 2 months ago
@@sanjay6013 @MahmoudSayed-hg8rb had a great insight for this model.fit issue. Basically, we can add the line spectogram = tf.image.resize(spectogram, [1491, 257]) right before the "return spectogram, label" line in the preprocess() function. This explicitly sets the shape.
@johndaniellet.castor7189 2 months ago
I think I also had the issue of tensorflow-io not installing, so we used alternative code instead. In my case:
original_length = tf.shape(wav)[0]
new_length = tf.cast(16000 / tf.cast(sample_rate, tf.float32) * tf.cast(original_length, tf.float32), tf.int32)
new_indices = tf.linspace(0.0, tf.cast(original_length - 1, tf.float32), new_length)
new_indices = tf.cast(new_indices, tf.int32)
wav = tf.gather(wav, new_indices)
return wav
Credits to @shafagh_projects for this!
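The index-picking idea in that workaround can be shown in plain Python. This is a sketch of the same technique, not the tensorflow-io resampler: it simply keeps evenly spaced samples and applies no low-pass filter, so content above the new Nyquist frequency can alias. The function name is made up for the example.

```python
def resample_by_index(samples, rate_in, rate_out):
    """Resample by picking evenly spaced source indices (no low-pass filtering)."""
    new_length = len(samples) * rate_out // rate_in   # integer math avoids rounding surprises
    if new_length <= 1:
        return list(samples[:new_length])
    last = len(samples) - 1
    # Evenly spaced positions from the first to the last sample, floored to integers.
    indices = [i * last // (new_length - 1) for i in range(new_length)]
    return [samples[i] for i in indices]

one_second = list(range(44100))   # stand-in for 1 s of audio at 44.1 kHz
resampled = resample_by_index(one_second, 44100, 16000)
print(len(resampled), resampled[0], resampled[-1])
```

One second of input stays one second long: only the number of samples representing it drops from 44100 to 16000.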
@DarceyLloyd 1 year ago
Great video. Would love to see a version of this done using the GPU, with multiple classifications, not just binary.
@0e0 1 year ago
TensorFlow has GPU builds
@henkhbit5748 2 years ago
Awesome sound classification project 👍 I need a capuchino break after hearing the capuchin bird sound 😎
@asfandiyar5829 1 year ago
Had a lot of issues getting this to work. You need Python 3.8.18 for this to work. I had that version of Python in my conda env.
@matthewcastillo8775 11 months ago
I need help getting compatible versions of tensorflow and tensorflow-io. The latest release of tensorflow-io is 0.35.0, however my OS is saying that only up to 0.31.0 is available. My tensorflow is updated to the latest version and I have Python 3.11.6.
@rajkumaraj6848 2 years ago
@NicholasRenotte "The kernel appears to have died. It will restart automatically." I got this error while running model.fit. How can I solve this?
@dimmybandeira7 2 years ago
Very smart! Can you identify a person speaking in the midst of others speaking more quietly?
@tatvamkrishnam6691 2 years ago
Tried to recreate the same. Somehow the program abruptly stops at hist = model.fit(train, epochs=4, validation_data=test). It has to do with using a lot of RAM. Any way around this for me? Thanks!
@karlwatkins7054 2 years ago
Did you solve this issue?
@thetechmachine5446 7 months ago
Why do we need to calculate the mean, min and max at 30:20?
@MahmoudSayed-hg8rb 4 months ago
To pick a reasonable threshold for the length of our audio files, since they vary in length and ML models often expect equally sized inputs. So we chose 48000 samples as the threshold. Anything longer is cropped, and anything shorter is zero-padded.
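The crop-or-pad step described above can be sketched in a few lines. The helper name is hypothetical, and prepending the zeros (rather than appending them) is an assumption; either choice fixes the length.

```python
def pad_or_crop(samples, target_length=48000):
    """Crop clips longer than `target_length`; pad shorter ones with leading zeros."""
    cropped = list(samples[:target_length])
    padding = [0.0] * (target_length - len(cropped))
    return padding + cropped

long_clip = [1.0] * 50000    # longer than 3 s at 16 kHz, so it is cropped
short_clip = [1.0] * 30000   # shorter, so zeros are added
print(len(pad_or_crop(long_clip)), len(pad_or_crop(short_clip)))
```

Every clip comes out with exactly 48000 samples, which is what gives all spectrograms the same shape.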
@ayamekajou291 2 years ago
Hey Nicholas, this project is great, but how do I classify multiple animal calls using this model? I can classify the audio as capuchin or not capuchin this way, but if I included more audio classes, how could I classify the audio file by animal as well as count the calls?
@GuidoOliveira 2 years ago
Incredible video, much appreciated. On a side note, I love your face cam, and the audio is excellent!
@orange-dd5rw 4 months ago
How can I detect when a capuchin call starts and ends, and get this information in the end result (e.g. the result should show "capuchin call - 2.3s")?
@sederarandi1507 6 months ago
Bro, you are absolute gold. Thank you so much for all the effort you put into your videos and teaching. +1 subscriber
@benbelkacemdrifa-ft1xr 1 year ago
It's a very interesting video. But can we do the test using a sound sensor?
@TheOfficalPointBlankE 10 months ago
Hey Nicholas, I was wondering if there was a way to change the code to print the timestamps in the audio clip at which each sound is recognized?
@davidcastellotejera442 2 years ago
Man, these tutorials are amazing. Congrats on creating such great content. And thanks!!
@luisalmazan4183 2 years ago
Thank you so much for these tutorials, Nicholas. A tutorial about few-shot learning would be great. Greetings from México!
@mendjevanelle9549 7 months ago
Hello sir! I installed TensorFlow as presented, but I don't understand the reason for the error message "no module named tensorflow".
@faresbecheikh7052 2 years ago
15:00 file_contents = tf.io.read_file(CAPUCHIN_FILE) is not working. Can you tell me why, please?
@SaiCharan-ev8hu 7 months ago
Hey Nicholas, I'm trying to execute this but I'm facing an issue, as you haven't done any preprocessing on the training data. Looking for help from you.
@ellenoorcastricum 10 months ago
Is it possible to run this with my mic always listening and do live processing on that? Btw, this will be my first project and I know it's a lot.
@ChristianErwin01 11 months ago
I've gotten through to the part where you start testing the predictions and my validation_data isn't showing up. The epochs run fine, but I have no val_precision or val_loss values. All I have are loss and precision_2. Any fixes?
@primaryanthonychristian2419 1 year ago
Bro, great video and very good detailed explanation. 👍👍👍
@tims.4396 1 year ago
I'm not sure about the batch and prefetch part. For me it generates empty training sets afterwards, and does it only take 8 prefetched files for training?
@vishalm2338 2 years ago
How do you decide the values of frame_length and frame_step in tf.signal.stft(wav, frame_length=320, frame_step=32)? Appreciate any help!
@thewatersavior 2 years ago
58:00 - Another great one, thank you, already looking forward to applying it. Quick question: why mix the audio channels on the MP3? I get that it gets us to one channel; is there a way to just process one channel at a time? I'm imagining that would allow for some spatial awareness in the model? Or perhaps too many variables, because we are just looking for the one sound? Thinking that it would be useful to associate density with directionality... but not sure that's accurate if the original recordings were not set up to actually be directional...
@cadsonmikael9119 2 years ago
I think this might also introduce distortion in the result, since we have to deal with stereo microphone separation, ideally about 100-150 mm for human perception. I think the best idea is to just look at one channel in the case of stereo, at least if the microphone separation is high or unknown.
@asimbhaivlog108 2 years ago
Hi, I have to detect tree-cutting sounds in a forest using IoT (illegal tree-cutting detection). Can you make a detailed video on it, and on which hardware sensors and modules can be used?
@farhankhan5951 1 year ago
Are you done with your project?
@stevew2418 2 years ago
Amazing content and explanations. You have a new subscriber and fan!
@NicholasRenotte 2 years ago
Welcome to the team @Steve, glad you liked it!
@eggwarrior5630 1 year ago
Hi, I am working with a new audio dataset which does not require the audio slicing part. What should I modify to loop through the folder for the last part? Any help would be greatly appreciated.
@Ammmmmmmman 9 months ago
import tensorflow_io as tfio: I'm getting an error with this code, can someone help me out?
@johndaniellet.castor7189 2 months ago
Have you figured this out? I can't install tensorflow-io either.
@mosharofhossain3504 1 year ago
Thanks for such a great tutorial. I have a question: what happens when an audio file is resampled? Does its total duration change, or its number of samples, or both, or does it depend on the specific algorithm?
@andycastellon919 1 year ago
We humans can hear up to approximately 20 kHz, and by the Nyquist criterion you need to sample at least twice as fast as the highest frequency you want to capture, hence the 44100 Hz you may have seen. However, in audio analysis most of the useful information is found below 8000 Hz, so we resample down to 16000 Hz, losing the higher frequencies. The length of the audio does not change. What changes is the number of samples (and therefore bits) needed to store the audio.
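The arithmetic behind that answer is short enough to write out. The helper names are made up for this example; the 3-second duration matches the window length used in the tutorial.

```python
def nyquist_hz(sample_rate):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate / 2

def num_samples(duration_s, sample_rate):
    """Number of samples needed to store `duration_s` seconds of audio."""
    return int(round(duration_s * sample_rate))

for rate in (44100, 16000):
    print(rate, nyquist_hz(rate), num_samples(3.0, rate))
```

A 3-second clip needs 132300 samples at 44.1 kHz but only 48000 at 16 kHz, while its duration stays at 3 seconds; the price is that nothing above 8000 Hz survives.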
@SA-oj3bo 2 years ago
What if I want to count how many times one specific dog barks per day? It is clear that samples of this dog barking are needed, but how many? And what other sounds must be sampled, and how many, if many other sounds can be heard at the same place? Thx!
@NicholasRenotte 2 years ago
Heya, I would suggest starting out with 100+ samples of the dog; you can then augment with different background sounds and white noise to build out the dataset. This is purely a starting point though; you would need to do error analysis from there to determine where to put emphasis next!
@SA-oj3bo 2 years ago
@@NicholasRenotte It would be very interesting if an accurate counter could be made. This is a case of animal abuse (a dog in a small cage, neglected by the owner and barking for attention for over 2 years), so an accurate counter would be very helpful, and useful for other projects. What I don't understand is why 1 sample/spectrogram of the barking dog would not work well enough to detect it in a recording of, for example, 24 hours, because there must be very few or no sounds that have the same spectrogram. I understand it will always be different, but can 2 different sounds (cat and dog, for example) have 2 spectrograms that are very similar? So my question is why it is not possible to identify a specific sound in a recording by comparing the spectrogram of the sound to detect against all possible spectrograms in the recording, millisecond after millisecond? If you accept paid projects I would love your help, because this is all new to me. Regards!
@NicholasRenotte 2 years ago
@@SA-oj3bo I think you could. If there were multiple dogs in the sample, you would probably need a ton of data to be able to clearly identify which dog is barking. This would allow the net to pick up the nuances of that particular bark.
@tatvamkrishnam6691 2 years ago
23:30 What is the significance of len(pos) or len(neg)? When len(pos) is replaced with 2, I expect only the first 2 samples to have the label '1'. However, when I run positives.as_numpy_iterator().next(), I get the label '1' not only for the first 2 samples but also for the rest.
@GArvinthKrishna 7 months ago
What approach is best for finding the number of blows in a recording of a jackhammer?
@jawadmansoor2456 1 year ago
Thank you for the great comment. How do you classify multiple sounds in a file and get time information as well? For example, if one sound was made 5 seconds into the audio file and another at 8 seconds, how do we get the time and class?
@plazon8499 2 years ago
Dear Mr. Renotte, I'm trying to use this tutorial as a basis to build a classifier over several music genres. The only thing that I don't know how to adapt is the last layer of the CNN. How should I modify it so that it outputs, let's say, 10 different labels? Should the labeling be modified upstream? (I want to have 10 outputs instead of 1 at the last Dense layer, but I can't just change it like that, so I'm wondering how I should do it.) Thanks a lot!
@armelayimdji 2 years ago
Since the time you asked the question, you have probably solved it. However, my guess is that for your multi-class problem you should first have data for the 10 classes (samples of each of the 9 music genres, plus an unclassified genre), and the last layer should be a Dense layer with 10 neurons activated by a softmax function (instead of sigmoid), which gives the predicted probability of each class. You also have to change the loss function to one of the categorical cross-entropy losses available in tf.keras.
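To make the softmax and cross-entropy suggestion concrete, here is a plain-Python sketch of what those two pieces compute for a 10-class output. The logits are invented for illustration; in Keras the final Dense layer and the loss function would do this work.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                         # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)

NUM_CLASSES = 10
logits = [0.0] * NUM_CLASSES
logits[3] = 2.0                                # hypothetical raw outputs favouring class 3
probs = softmax(logits)

target = [0] * NUM_CLASSES
target[3] = 1                                  # one-hot label for class 3
loss = categorical_crossentropy(target, probs)
predicted = probs.index(max(probs))
print(predicted, round(sum(probs), 6))
```

Unlike a single sigmoid output, the ten probabilities compete with each other, so the labels must be one-hot vectors (or integer class ids with the sparse variant of the loss) rather than 0/1 flags.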
@plazon8499 2 years ago
@@armelayimdji Hey Armel, thanks a lot for the advice! Obviously I'm done with my project, and I went for something else: instead of taking the spectrograms as input to my CNN, I extracted features from the sound wave and all the physical aspects of the music to get an input feature vector that I passed through an MLP, and it worked well!
@farhankhan5951 1 year ago
I have a similar kind of problem, can you help me?
@gauranshluthra7520 6 months ago
How did you upload the files, as Colab does not support folder upload unless it is in zip file format?
@supphachaithaicharoen7929 5 months ago
Thank you very much for your hard work. I really enjoy the video.
@vigneshm4916 2 years ago
Thanks for a great video. Could you please explain why we need tf.abs in the preprocess function?
@HananAhmed0311 2 months ago
@NicholasRenotte can we use the same thing for human voice classification?
@badcatprod 2 years ago
1k! ) //THANK YOU! 🤗
@kundansaha2369 7 months ago
I am getting an error and have tried to debug it in many ways but have not solved it. The error is "The procedure entry point could not be located in the dynamic link library - to Positive and Negati a', 'Parsed_Capuchinbird_Clips') a', 'Parsed_Not_Capuchinbird_Clips') C:\ProgramData\anaconda3\lib\site-packages\tensorflow_io\ python\ops\libtensorflow_i0.50."
@UzairKhan-gs3nq 1 year ago
How can we use linear predictive coding in the preprocessing function of this code?
@oaydas 2 years ago
Great content, keep it up man!
@riyazshaik4006 1 year ago
Thanks so much sir. One request: can you explain how to classify audio as positive, negative and neutral?
@paulj9833 1 year ago
In the cell 'hist = model.fit(train, epochs=1, validation_data=test)' the kernel crashes in my case. Seems to be a TensorFlow problem. I tried installing different versions of TensorFlow, but it didn't work. Does anyone have any advice?
@Varadi6207 1 year ago
Awesome explanation. Please help me create audio augmentation for health records without losing information. I worked with time_shift (-0.5 to 0.5 variation in the wav), but the model accuracy is not up to the mark.
@gangs0846 2 months ago
Thank you. Can you do real-time audio classification?
@allinoneplayz5588 24 days ago
Can anyone say whether it's a resume-worthy project?
@JunaidAnsari-my2cx 17 days ago
Same question, is it? Or do I have to improve it?
@insidecode 1 year ago
Amazing job
@Uncle19 2 years ago
What an amazing video. Definitely earned my sub.
@Sachinkenny 2 years ago
What happens when there are multiple birds in the dataset? How good is a CNN model on this kind of dataset? Also, the source training audio samples can vary in length, sometimes running for minutes. How can we do the preprocessing in such cases?
@abhishekmistry933 1 year ago
Hey, I'm facing an issue while compiling the model:
model = Sequential([
Conv2D(16, (3,3), activation='relu', input_shape=(1491, 257, 1)),
Conv2D(16, (3,3), activation='relu'),
Flatten(),
Dense(units=128, activation='relu'),
Dense(units=1, activation='sigmoid')
])
How do I avoid the ResourceExhaustedError from the above code? I cannot install tensorflow-gpu, I guess because I only have a GeForce MX450 and am unable to install CUDA. Can anyone help me out?
@shivankamadushan 11 months ago
Same problem. Did you fix it somehow?
@abhishekmistry933 11 months ago
@@shivankamadushan I used Google Colab, but that can also exhaust the resources, so reduce the number of units in each Dense layer. Or, if you want, you could purchase the premium version, where they provide a GPU to work with.
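A quick parameter count shows why the model quoted above is so memory-hungry. The sketch assumes the Keras defaults of stride 1 and 'valid' padding for both Conv2D layers, and float32 weights; the helper name is made up.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

height, width, filters = 1491, 257, 16
for _ in range(2):                                       # two Conv2D(16, (3,3)) layers
    height, width = conv_valid(height), conv_valid(width)

flattened = height * width * filters                     # inputs to the first Dense layer
dense_units = 128
dense_params = flattened * dense_units + dense_units     # weights + biases
approx_gib = dense_params * 4 / 2**30                    # float32 = 4 bytes per parameter

print(flattened, dense_params, round(approx_gib, 2))
```

The Flatten layer hands about 6 million values to Dense(128), which therefore holds roughly 770 million weights, close to 3 GB before the optimizer keeps any extra copies. Halving the Dense units halves that figure; adding pooling layers before Flatten is another common way to shrink it, though the tutorial's model as quoted does not use any.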
@Ashwinmahajan-ld6qt 1 day ago
@@shivankamadushan same issue... what to do?
@zainhassan8421 2 years ago
Awesome. Kindly make a video on a speech recognition model using deep learning.
@vidyagopal3431 2 years ago
While fitting the model, the code shows: InvalidArgumentError: Graph execution error: Can only read 16-bit WAV files, but received 24 [[{{node DecodeWav}}]] [[IteratorGetNext]] [Op:__inference_train_function_3224] Can you help me in this case?
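Since the error says the decoder only accepts 16-bit WAV files, one way to find the offending files is to read the bit depth from each header before training. This is a sketch using the standard-library `wave` module, which handles PCM files; files in other encodings may fail to open. The helper name and the demo file are made up so the example runs without external data.

```python
import os
import struct
import tempfile
import wave

def wav_bit_depth(path):
    """Bits per sample of a PCM WAV file, read from its header."""
    with wave.open(path, 'rb') as wf:
        return wf.getsampwidth() * 8

# Write a tiny 16-bit mono file so the sketch is self-contained.
demo_path = os.path.join(tempfile.gettempdir(), 'bit_depth_demo.wav')
with wave.open(demo_path, 'wb') as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)            # 2 bytes per sample = 16 bits
    wf.setframerate(16000)
    wf.writeframes(struct.pack('<4h', 0, 1000, -1000, 0))

print(wav_bit_depth(demo_path))
```

Any file that reports something other than 16 would need to be converted to 16-bit PCM with an audio tool before it is passed to the decoder.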
@abdullahalhammadi2940 9 months ago
Thank you so much for this. However, I have one question: I get an error when I execute this line: wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000). It states "unable to open file: libtensorflow_io.so". Could you help me please?
@oskitarmixtomilcadad 7 months ago
I have the same problem
@ahmedgon1845 1 year ago
Great video, thanks so much. I have a small question: in the line spectrogram = tf.signal.stft, why did you choose frame_length=320 and frame_step=32? Can someone explain how to choose these, please?
@benliu5858 2 years ago
When I run file_contents = tf.io.read_file(CAPUCHIN_FILE) I get UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 89: invalid start byte
@biblicalrevelations6383 2 years ago
Hi, is it possible for you to do an audio background cleaning project? I'm stuck in that area and I don't know how to do it using deep learning.
@NicholasRenotte 2 years ago
Sure, will take a look and see what we can come up with!
@biblicalrevelations6383 2 years ago
@@NicholasRenotte thank you so much☺️
@unteejo3678 2 years ago
Maybe you need to prepare a bunch of audio with noise and the same audio without noise. Now you have inputs and targets. Convert both sets of audio files to spectrograms. Then, for each input, prepare a window of the input's spectrogram for each column of the target's spectrogram.
For example, let's say my spectrogram has 128 frequency bins and each column corresponds to 0.01 second of audio. Column 0 covers 0-0.01 s, column 1 covers 0.01-0.02 s, column 2 covers 0.02-0.03 s, and so on. Let's say that I need 0.05 second per window. A window of the input's columns 0-4 will be used to predict the target's column 0. The input's columns 1-5 predict the target's column 1. The input's columns 2-6 predict the target's column 2, and so on. Pad the windowed spectrogram with 0 if necessary, so that all windows have shape (128, 5). The target shape is (128). If the input audio is 3 seconds long, there will be 300 input-target pairs.
Now you can train the model. A CNN is recommended because it's fast. Try the U-Net architecture if you prefer.
FYI, there's a competition for separating music audio into 4 stems: vocals, bass, drums, and others. The winner's code is open source too, so you may learn from that. Search for 'spleeter' too. You may be interested in audio-segmentation-related topics.
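The windowing scheme described in that comment can be sketched with plain lists. The spectrograms here are zero-filled placeholders, and `make_pairs` is a hypothetical helper; each window is stored as 5 columns of 128 bins, i.e. the transpose of the (128, 5) shape mentioned above.

```python
def make_pairs(noisy_cols, clean_cols, window=5):
    """Pair clean column i with noisy columns i .. i+window-1, zero-padding the end."""
    bins = len(noisy_cols[0])
    zero_col = [0.0] * bins
    padded = noisy_cols + [zero_col] * (window - 1)
    return [(padded[i:i + window], clean_cols[i]) for i in range(len(clean_cols))]

BINS, COLS = 128, 300   # 128 frequency bins; 3 s of audio at 0.01 s per column
noisy = [[0.0] * BINS for _ in range(COLS)]   # placeholder noisy spectrogram
clean = [[0.0] * BINS for _ in range(COLS)]   # placeholder clean spectrogram

pairs = make_pairs(noisy, clean)
first_window, first_target = pairs[0]
print(len(pairs), len(first_window), len(first_target))
```

Three seconds of audio yields 300 input-target pairs, matching the count in the comment; the last few windows are filled out with zero columns so every input has the same size.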
@biblicalrevelations6383 2 years ago
@@NicholasRenotte Thank you so much
@biblicalrevelations6383 2 years ago
@@unteejo3678 Thank you so much. Actually, I am new to deep learning and I will look into what you said. Thank you so much again.
@toni3124 2 years ago
Hey, I have a question. I am working with the Mozilla Common Voice dataset and I converted the audio files to WAV files. Now here comes my problem: I want an MFCC of each file with the shape (128,), but I can't get it into this shape. I always get a shape like (128, some random number). My code is:
y, sr = librosa.load(os.path.splitext(f"{base_name}\\{f[0]}")[0] + ".wav")
y = librosa.to_mono(y)
y = librosa.resample(y, orig_sr=sr, target_sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=128)
f is the filename extracted from the CSV file.
@Ankur-be7dz 2 years ago
In the data = data.map(preprocess) step I'm getting an error: TypeError: tf__decode_wav() got an unexpected keyword argument 'rate_in', although rate_in is a parameter of tfio.audio.resample.
@Yy.Srinivasan 3 months ago
How do I sort this out? I'm getting the same error.
@mrsilver8151 8 months ago
Thanks for the great tutorial as always, sir. If I want to do voice recognition to identify which person a voice belongs to, will these steps help me get there, or do I need to look for something more specific to this task?
@iPhoneekillerr 10 months ago
Please help me: why doesn't Colaboratory open this code? How should it be changed so that it can be opened in Colaboratory?
@danielcolombaro6645 1 year ago
When I try to retrieve the dataset with "tf.data.Dataset.list_files" I get "Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true.". I am mounting Google Drive, as I have loaded the data there. I tried everything to make it work, but I cannot seem to find a solution. Any help would be greatly appreciated, thanks!
@annmariyafrancis5630 10 months ago
Hi, did you resolve the issue?
@venkatakrishnanramesh4718 2 years ago
What's the Python version being used in the tutorials?
@NuncNuncNuncNunc 2 years ago
Maybe a basic question, but what does zero padding do when getting the frequency spectrum?
@Kishi1969 2 years ago
Always an inspiration and a source of new knowledge... You are great, thanks. Advice please: can I buy a 2 GB NVIDIA graphics card to start with computer vision? My PC is too slow when I'm using my CPU.
@NicholasRenotte 2 years ago
I would be looking at something with more RAM if you can swing it @Benya, 2 GB is a little too small for most computer vision tasks.
@unteejo3678 2 years ago
Use Google Colab's TPU (Tensor Processing Unit). It's very fast if your model uses only CNN layers.
@Kishi1969 2 years ago
@@NicholasRenotte Thanks 🙏
@raktimdey3154 2 years ago
Hey Nick. I'm getting a UnicodeDecodeError when I try to grab a single batch of training data using the numpy iterator. Can you please help?
@manthanshah4303 1 year ago
I am not able to install the tensorflow-gpu library in Jupyter Notebook
@md.abdullaalmamun3965 2 years ago
Please make a video on image segmentation (semantic and instance).
@NicholasRenotte 2 years ago
Coming soon!
@jesusherrerosjimenez8618 2 years ago
How can I add a new class to the model? I want to train with 3 classes. Thanks
@tando90 2 years ago
Hey Nick, I tried to do the same as you but with a different dataset, and I got an error: "Bad bytes per sample, expected 2 but got 4". I checked my data and it's a WAV file at 16000 Hz. What should I do?
@empedocle 1 year ago
Amazing job Nicholas!! I have just one question: why didn't you also calculate the standard deviation of the files' lengths, so as to have a more precise interval for your window?
@Lhkk28 1 year ago
Hello Nicholas, thanks for your video :) I have a question. I am aiming to build a model for sound detection using deep learning algorithms (I am thinking about using an LSTM). For now I am done with the preprocessing step. I have the spectrograms of the sounds (generated using the short-time Fourier transform) and I also have the labels (binary labels as arrays: 0s where there are no events and 1s where events are present). I am now confused about how to feed this data to the model. The shape of each spectrogram is (257, 626) and the shape of each label is (626,). How should I give this data to the LSTM? Can I build a model that takes the spectrograms in their current shape and the labels as a sequence of ones and zeros, or do I have to segment the spectrograms and give each segment a label?
@harsh9558 1 year ago
This was awesome 🔥
@srilankanfox 2 years ago
How do you save the model and use it somewhere else?
@周淼-k3u 1 year ago
Thank you so much for these nice tutorials! They are quite helpful! I have a small question. I saw your process of building up models and training and testing them. If I want to spend less time in classifying the model, do you think it's possible to introduce some existing datasets such as ESC-10 or ESC-50 into your method?
@OwaiseAhmed-g8j 1 year ago
Hello, for some reason wave = load_wav_16k_mono(CAPUCHIN_FILE) nwave = load_wav_16k_mono(NOT_CAPUCHIN_FILE) is giving the error "the procedure entry point cannot be located", though I have stored the 3 audio folders in the same directory.
@kaushikk2270 1 year ago
Did you get an answer? Please share if you did.