🚀 Sign up for AssemblyAI's free API token using my link 🚀 www.assemblyai.com/? I cover the fundamental ideas behind all of the recent big ML models you must have heard of, like Meta's OPT-175B, BigScience's BLOOM-176B, EleutherAI's GPT-NeoX-20B and GPT-J, OpenAI's GPT-3, Google's PaLM, DeepMind's Chinchilla/Gopher models, etc. Do let me know your thoughts on this one! I'm so excited about the knowledge I've accumulated over the past weeks. You can expect exciting videos going forward - that's all I'll say! :))
@na7515 1 year ago
I'll give one criticism, which is that this video is mostly for people who already have some basic understanding of the different techniques out there, so the best way to watch it is to follow along and do your own research as the instructor goes through the topics. Having said that, this is a fantastic overview of all the techniques being used in the industry for large model training. It's really awesome how much you were able to cover in an 80-minute video, so huge props.
@theresabarton858 2 years ago
Very excited to see DeepSpeed covered on YouTube
@TheAIEpiphany 2 years ago
🥳🦄
@unsaturated8482 3 months ago
FYI, at 27:00 we are still adding them, just all at the same time instead of at separate times. Great diagram regardless, thanks!
@btnt5209 1 year ago
18:18 the reason that skip connections are replicated is because at the time PyTorch did not support Tensor stashing/popping. That meant that the skip connections, which aren't sequential, required copying outputs for certain layers throughout multiple GPUs or nodes, which is more time consuming than simply keeping the skip connection on each node. Now that torch.distributed supports Tensor stashes, the skip connections need not be duplicated.
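To picture the workaround described above, here is a minimal sketch (plain PyTorch with hypothetical stage modules, not code from the video) of a residual block split across two pipeline stages, where the first stage forwards the skip tensor alongside its activation instead of relying on a stash:

import torch
import torch.nn as nn

class Stage1(nn.Module):
    # First pipeline stage: without Tensor stashing, the residual input has to
    # travel across the stage boundary together with the activation.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(512, 512)

    def forward(self, x):
        return self.layer(x), x  # (activation, skip) both get sent to the next stage

class Stage2(nn.Module):
    # Second pipeline stage: receives both tensors and does the residual add locally.
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(512, 512)

    def forward(self, inputs):
        h, skip = inputs
        return self.layer(h) + skip

out = Stage2()(Stage1()(torch.randn(4, 512)))  # single-process sanity check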
@brianpulfer4159 2 years ago
Went through the whole video. Absolutely amazing stuff Aleksa! Learning with you is very enjoyable! Never stop making videos :)
@TheAIEpiphany 2 years ago
Thanks a lot Brian!
@iNTERnazionaleNotizia589 10 months ago
Bro, I think you should start a new Playlist: "Paper Walkthrough", because you can explain most Deep Learning papers better than my Professor!
@oneman7094 2 years ago
Ah, how many times have I searched for ZeRO... finally something 🔥
@SatoBois 1 year ago
Thank you for making something that seemed so daunting so much more approachable! King behaviour 😤👌
@sacramentofwilderness6656 2 years ago
Thanks, Aleksa, for the great job! Very thorough, self-contained and understandable explanation.
@TheAIEpiphany 2 years ago
Thank you!
@vaishnavisonawane8559 10 months ago
This is very helpful. Thanks for sharing, Aleksa!
@fouriertransformationsucks438 11 months ago
Amazing video, love it!🤗
@everythinganime867 2 years ago
Was always wondering this. Thank you
@TheAIEpiphany 2 years ago
You're welcome!
@beegbrain 1 year ago
Incredible knowledge that you share in this video, thank you very much for your clear explanations!
@DistortedV12 2 years ago
You guys are smart. I don't know if I'd have the patience for a career that involves learning new topics on a weekly basis
@TheAIEpiphany 2 years ago
After a while you start asking yourself the opposite: would I be able to do something that doesn't involve continual learning 😅
@armish4197 2 years ago
@@TheAIEpiphany That would be a dream job
@ayushjain3391 9 months ago
Literally loved the video :)
@erkinsagroglu8519 11 months ago
Hello, this is one of the most amazing materials I've seen in years. One thing I didn't get at 28:20: why did we do a row-wise split rather than a column-wise split? What changed from the first part of the feed-forward block, where we did a vertical/column-wise split?
@saidtaghadouini6225 10 months ago
Because in the first part we had the same input (X), which was duplicated, while in the second part we already have separate inputs (Y1 and Y2), so we need to split the B matrix row-wise, otherwise we cannot compute the product. The result is Y1B1 (device 1) + Y2B2 (device 2), so we need an all-reduce to get the result on the same device.
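To make the algebra in this answer concrete, here is a small single-process numerical check (a sketch, not actual Megatron code) that column-splitting the first FFN matrix and row-splitting the second reproduces the unsharded result; the final "+" is exactly what the all-reduce computes across devices:

import torch
import torch.nn.functional as F

X = torch.randn(4, 8)          # input, duplicated on both "devices"
A = torch.randn(8, 16)         # first FFN matrix, split column-wise
B = torch.randn(16, 8)         # second FFN matrix, split row-wise

Z_ref = F.gelu(X @ A) @ B      # unsharded reference

A1, A2 = A.chunk(2, dim=1)     # column split: GeLU stays element-wise per shard
B1, B2 = B.chunk(2, dim=0)     # row split: matches the sharded inner dimension of Y
Y1, Y2 = F.gelu(X @ A1), F.gelu(X @ A2)  # Y1 lives on device 1, Y2 on device 2
Z = Y1 @ B1 + Y2 @ B2          # the "+" is what the all-reduce performs

assert torch.allclose(Z, Z_ref, atol=1e-5)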
@anishbhanushali 2 years ago
I was just wondering how to gather info on large-scale distributed DL training frameworks... and you, sir, just read my mind!!!
@jakekalstad2494 2 years ago
Great stuff as always
@TheAIEpiphany 2 years ago
Thanks Jake!
@rachadlakis 15 days ago
Can you add a tutorial on Distributed Training (FSDP) on AWS? It would be great if you added it :)
@MengLin-l8b 10 months ago
For the DP method, is averaging usually the default? That seems unusual to me, because for samples within a batch the gradients are usually summed instead. I'd be very grateful if you could answer my question.
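For what it's worth, PyTorch's DistributedDataParallel does average gradients across replicas by default; together with a mean-reduced loss over each local batch, the update then matches single-GPU training on the combined batch. A minimal sketch of that synchronization step (assuming torch.distributed is already initialized):

import torch.distributed as dist

def sync_gradients(model):
    # What data parallelism does after each backward pass:
    # sum the gradients across replicas, then divide by the replica count.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size  # averaging keeps gradient scale independent of world size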
@nicom9853 2 years ago
Hi Aleksa, great video! I have a question I've been putting off for a while: what kind of software do you use for opening, annotating, drawing on, and grouping multiple PDF documents? The one at the beginning of your video looks cool, but maybe you have another suggestion? I'm preparing for PhD studies, so I'm looking for organizational software to handle hundreds of documents haha. Thanks in advance!
@TheAIEpiphany 2 years ago
Hi, thanks! I use OneNote for taking notes, and I simply group PDFs into directories. Take a look at Notion as well, it might help you.
@nicom9853 2 years ago
@@TheAIEpiphany Great, thank you!
@DED_Search 1 year ago
34:55 Is it an add-up or a concatenation? I think it should be a concat.
@MariuszWoloszyn 2 years ago
You've accidentally linked to the old (v2) version of the ZeRO paper in the description. The one shown in the video is here: arxiv.org/pdf/1910.02054v3.pdf
@TheAIEpiphany 2 years ago
Oops thanks! Will update it
@hongtaoyang3759 1 year ago
Thanks for the great video! Can you explain more about ZeRO-3 model parallelism vs Megatron tensor parallelism? It sounds to me like ZeRO-3 includes Megatron tensor parallelism; or are they different techniques that can be applied together?
@ChiragAhuja1 1 year ago
Do you also share the annotated papers?
@bodasadala3516 2 years ago
Great work, thanks for the effort!
@TheAIEpiphany 2 years ago
Thanks!
@bingbingsun6304 2 years ago
3D U-Net with a 1024 by 1024 by 1024 input, any suggestions?
@ahmadhamdan44 2 years ago
TOP!!!!!
@TheAIEpiphany 2 years ago
Gotta stop uploading on Sunday lol 😂
@stephennfernandes 2 years ago
I really wanted to learn in depth how Mellanox InfiniBand switches work: the networking, routing, and configuration, and how to set up your own GPU cluster from scratch. But after searching the internet for months I couldn't find anything. Does anyone have any good resources on this?
@mraihanafiandi 1 year ago
Up, I have the same concern as you
@mraihanafiandi 1 year ago
@TheAIEpiphany
@stephennfernandes 2 years ago
🎉🎉🎉✨✨
@ShishilKumar 8 months ago
The video doesn't demonstrate the actual steps to deploy large models using DeepSpeed, which is much more important than understanding all the theory.
@eugeneku3239 4 months ago
Nah theory reigns supreme
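For readers who want the practical side the comment above asks about, here is a minimal training-loop sketch with DeepSpeed; the toy model and config values are placeholders (not from the video), but deepspeed.initialize, engine.backward, and engine.step are the standard entry points:

import deepspeed
import torch

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},  # ZeRO-3: partition params, grads, optimizer state
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 1024).to(engine.device)
loss = engine(x).pow(2).mean()  # dummy loss
engine.backward(loss)           # DeepSpeed handles gradient partitioning/sync
engine.step()                   # optimizer step + ZeRO bookkeeping

Launched with the DeepSpeed CLI, e.g. "deepspeed train.py".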
@juliusvalentinas 2 months ago
An A100 GPU is 30k USD, so is all this offloading theoretical nonsense? Where are the apps that let you run an actual Llama 3.1 on one or two 3090s, offloading the unused parts to an NVMe SSD?