What an outstanding explanation. GOATed this one. Keep going!
@bengonoobiang6633 · 1 month ago
Thank you for the explanation!! Also thank you for the blog.
@AzeAltmi · 2 months ago
How accurate?
@MAdMAX-rz8gx · 2 months ago
Hello, I need help with speech-to-speech translation. Can you contact me? I have a final project.
@drm8164 · 2 months ago
love you
@李薇-w3p · 2 months ago
How did you prepare the parallel speech data (SRC speech - TGT speech pairs)? Is there a detailed instructional video available? Thank you very much for sharing and your help.
@MAdMAX-rz8gx · 2 months ago
Hello, can you help me? I am doing a final college project on speech-to-speech machine translation. Can anyone contact me?
@cgyh68748 · 3 months ago
Nice!
@anbanglee98 · 3 months ago
Hey, waiting for your update!
@baazjhaj8125 · 3 months ago
Hello Prabhjot, were you able to get good results with your model? I trained mine on Fairseq's Japanese-to-English data, and after training for approximately 200 epochs the model only regenerated the training data when I ran inference.
@leemark7739 · 4 months ago
Will you update?
@jatinjoshi7549 · 4 months ago
Can I get the dataset you used?
@songli5145 · 4 months ago
How did you clip the fall and non-fall samples? Are they already sampled as fall and non-fall clips?
@shreyojitdas9333 · 4 months ago
Never seen someone explain it better. I love your content.
@medhnhadush9703 · 4 months ago
Thank you very much, great content.
@abdulwali2235 · 4 months ago
Thanks a lot, ma'am. Your research work is highly useful for me. I am working in this domain and facing difficulties; my area is also direct speech-to-speech translation. Please, I need help. Best regards.
@MAdMAX-rz8gx · 2 months ago
Same here. Can you contact me? I need help and don't know where to start.
@anbanglee2910 · 5 months ago
The training process is long and the data is large. Can I just terminate early and use one of the intermediate checkpoints?
@baazjhaj8125 · 5 months ago
Hello. I am attempting to preprocess the data from FLEURS as well. I was wondering how you went about renaming the wav files so that they have the same names in each folder? I am running into issues with reading from the tsv file. Thanks!
@prabhjotgosal2489 · 5 months ago
I will make a new video about processing the FLEURS data into the format that Meta's S2UT code needs. I have prior commitments at the moment and can't get to video-making for the next 2 weeks or so.
@baazjhaj8125 · 5 months ago
@@prabhjotgosal2489 that would be great. I feel like there are some issues with the dataset. I had to manually go through the en_us tsv file and update it which was pretty annoying.
@leemark7739 · 3 months ago
Did you make it?
@baazjhaj8125 · 3 months ago
@@leemark7739 yes. However, I ran into issues with model performance and training time.
@leemark7739 · 3 months ago
@@baazjhaj8125 Could you please share your code for processing the data? Which languages do you focus on?
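For anyone stuck on the same FLEURS preprocessing step discussed in this thread, here is a minimal sketch of one way to pair source- and target-language wavs by sample id. This is not the video author's actual code: the tab-separated layout, the column indices, and the id-based naming scheme are all assumptions to adjust for your split.

```python
# Sketch: give matching wav files the same name in each language folder by
# renaming them to "<sample_id>.wav", using a FLEURS-style tsv manifest.
# ASSUMPTIONS: tab-separated rows, sample id in column 0, wav filename in
# column 1. Check your tsv and adjust id_col/file_col accordingly.
import csv
import os
import shutil


def build_id_map(tsv_path, id_col=0, file_col=1):
    """Map sample id -> wav filename from a FLEURS-style tsv."""
    mapping = {}
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) > max(id_col, file_col):
                mapping[row[id_col]] = row[file_col]
    return mapping


def rename_to_ids(tsv_path, wav_dir, out_dir, id_col=0, file_col=1):
    """Copy wavs into out_dir renamed to '<id>.wav', so the same sample
    ends up with the same filename in the source and target folders."""
    os.makedirs(out_dir, exist_ok=True)
    for sample_id, fname in build_id_map(tsv_path, id_col, file_col).items():
        src = os.path.join(wav_dir, fname)
        if os.path.exists(src):  # skip manifest rows whose audio is missing
            shutil.copy(src, os.path.join(out_dir, f"{sample_id}.wav"))
```

Running this once per language (with the matching tsv) yields folders where `0001.wav` in the source directory corresponds to `0001.wav` in the target directory.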
@leemark7739 · 5 months ago
What is the second column in dev.tsv? How do I get it?
@leemark7739 · 5 months ago
Will you talk about training unwritten languages like Hokkien?
@leemark7739 · 5 months ago
Hey, can you please share your code and files? By the way, what if a language is unwritten? How do you evaluate it with ASR?
@prabhjotgosal2489 · 5 months ago
Hi - I will share the code. Give me about 2 weeks or so. If we are dealing with an unwritten language, evaluation becomes harder. Take a look at the following post: ai.meta.com/blog/ai-translation-hokkien/ The authors developed their own method to evaluate an unwritten language, Hokkien. However, I have not read the details of their method yet, and it could be that it was specific to Hokkien.
@leemark7739 · 4 months ago
@@prabhjotgosal2489 thanks
@leemark7739 · 4 months ago
@@prabhjotgosal2489 Hey, are you free now?
@anbanglee98 · 3 months ago
@@prabhjotgosal2489 You are so busy 😂
@anbanglee98 · 22 days ago
@@prabhjotgosal2489 Waiting for you 😂
@SaifAbbas-c9p · 5 months ago
Clear, thanks.
@coryspencer7007 · 5 months ago
Hello Prabhjot! Thank you for your videos covering the SeamlessM4T library. I'm currently working with a cancer patient who suffered extensive damage to her vocalization abilities over the course of her cancer treatment. We are exploring whether we could use the SeamlessM4T model to help translate her current speech patterns back into a more easily understandable voice. Are you available for a quick chat via video call, by any chance? I would love to have your perspective.
@prabhjotgosal2489 · 5 months ago
Hi @coryspencer7007 - That is a very valuable use case! Kudos to you for using technology to give someone a better living experience. I doubt that SeamlessM4T can do what you are trying to do, but I have only scratched its surface so far. A new model called GAMA (arxiv.org/pdf/2406.11768) was just released today. Demos: 92c08e835dbe471711.gradio.live/ and 326e9a808b0c28063d.gradio.live/ Project website: lnkd.in/giZjmugU It is not an ASR model, however, so it cannot transcribe speech. But it can take non-speech audio as input and reason about it. I would explore this one: use the demo link, upload a sample audio you have, and see what the model can do for you. I would love to chat with you about this, but I am preoccupied with personal commitments for another 3 weeks. My LinkedIn contact is on my channel page; please feel free to use that for further discussion. In the near future, I plan to explore the model I shared above and make a video.
@Htk-es1ky · 5 months ago
Thanks for explaining this; there are very few resources available on Audio ML. Watched all of your videos in one go. Please come up with a video on a real-life audio classification problem.
@prabhjotgosal2489 · 5 months ago
Yes, will share an example.
@tiendungmai2292 · 5 months ago
Can you tell me how you get $SRC_AUDIO and $TGT_AUDIO? And what is the txt file in $TGT_AUDIO?
@santosotoso4287 · 6 months ago
Thanks for the explanation
@dr-bijay-kumar-singh · 6 months ago
I got no error message, but at print(len(transcripts_all_NoFall)) the output is zero, although I have kept 10 English audio samples in the directory. Because of this, when I proceed further, the output for WER is empty, with no error message. Can you share how you kept the audio samples in your directory? I am using your code on Colab.
@prabhjotgosal2489 · 6 months ago
Hi - The audio samples are simply kept as .wav files in the directory that I am reading from. This directory exists in my Google Drive. There are a few ways to root-cause your issue further. 1. Before you do any ASR with Whisper, check whether the audios are read properly. You could use librosa.load() to do so; check its output to see whether the values in the audio files are non-zero. 2. Make sure the audio files contain speech. If the Whisper model detects no speech in the audio, it will output nothing for the transcripts.
@dimtsel4044 · 6 months ago
You explained it very well, bravo
@ABAnuSaraReality · 7 months ago
Hi, I have a doubt. I uploaded the files to Google Colab via Google Drive, but the uploaded files and folders are not displayed even though I authenticated Google Drive with Colab. Is it okay to use VS Code, as I am familiar with it? Or is it compulsory to submit via Colab? Can I upload the VS Code-executed files to Colab?
@prabhjotgosal2489 · 7 months ago
Hi - What error do you get when you try to run the code in Colab? I am not sure I understand what you mean by "the VS Code executed files in Google Colab". Are you referring to audio files or something else?
@ABAnuSaraReality · 7 months ago
@@prabhjotgosal2489 I am referring to mounting the whole project in Google Colab, but I can't properly authenticate Google Drive in Colab and mount it.
@prabhjotgosal2489 · 7 months ago
@@ABAnuSaraReality You can run the entire code in VS Code locally on your machine. You may have to adjust some syntax and, of course, the file paths. I suggest creating a new file in VS Code and copying sections of code from my file little by little, rather than running the original file all at once. It will make debugging and understanding the code easier.
@arashroshanpoor1682 · 7 months ago
Really, this is the best presentation of attention. I googled many videos, but this one stands out.
@tarazdxs · 7 months ago
Thank you so much. Really good explanation!
@chezhian4747 · 7 months ago
Very good explanation. Thank you 🙏
@EngineerAnandu · 7 months ago
Clear.
@EngineerAnandu · 7 months ago
Very useful.
@itsworks723 · 7 months ago
Please help: when using expressivity_predict I get the error that expressivity_predict is gated. How do I fix it?
@yusufcan1304 · 7 months ago
Thanks, madam.
@yusufcan1304 · 8 months ago
Thank you, lifesaver.
@lindaflow5434 · 8 months ago
This was really helpful! Thank you very, very much!!
@Tarzan_of_the_Ocean · 8 months ago
HuBERT proposed by Hubert 😄
@prabhjotgosal2489 · 8 months ago
Haha! That would have been a great title for this paper.
@jacobdavies3761 · 8 months ago
Thanks for the help 👍
@pavangnetrakarnetrakar8019 · 8 months ago
Please make a video on DASpeech: arxiv.org/pdf/2310.07403.pdf
@kenturkey1971 · 8 months ago
What a waste! You showed literally nothing! Well, you showed you have no clue what you’re doing. Ugh.
@nutifafaakpeleasi2543 · 8 months ago
Thank you very much for this video. I love it.
@randalthor69 · 8 months ago
Can you provide the code or the GitHub repository? It would be helpful for a college project.
@SagarVerma-jl3cj · 9 months ago
Hi, thank you for this video. But I am getting an error that m4t_prediction was not found. Could you please help me resolve this issue?
@sauravmeher4899 · 9 months ago
Could you please create a tutorial on fairseq? Various modules, how to use it, how to use pre-trained models, etc. It would be really helpful.
@prabhjotgosal2489 · 9 months ago
Yes!! I will create more tutorials on fairseq in the near future. I am taking a short pause from making videos due to my upcoming PhD defense.
@sauravmeher4899 · 9 months ago
@@prabhjotgosal2489 Sure. Thanks for the tutorials so far! Best of luck with your defence! I wish that you walk out of that room with a PhD!