What runs GPT-4o? | Inside the Biggest AI Supercomputer in the cloud with Mark Russinovich

  65,882 views

Microsoft Mechanics

1 day ago

Microsoft has built the world’s largest cloud-based AI supercomputer that is already exponentially bigger than it was just 6 months ago, paving the way for a future with agentic systems.
For example, its AI infrastructure is capable of training and inferencing the most sophisticated large language models like GPT-4o at massive scale on Azure. In parallel, Microsoft is also developing some of the most compact small language models with Phi-3, capable of running offline on your mobile phone.
Watch Azure CTO and Microsoft Technical Fellow Mark Russinovich demonstrate this hands-on and go into the mechanics of how Microsoft is able to optimize and deliver performance with its AI infrastructure to run AI workloads of any size efficiently on a global scale.
This includes a look at: how it designs its AI systems to take a modular and scalable approach to running a diverse set of hardware including the latest GPUs from industry leaders as well as Microsoft’s own silicon innovations; the work to develop a common interoperability layer for GPUs and AI accelerators, and its work to develop its own state-of-the-art AI-optimized hardware and software architecture to run its own commercial services like Microsoft Copilot and more.
► QUICK LINKS:
00:00 - AI Supercomputer
01:51 - Azure optimized for inference
02:41 - Small Language Models (SLMs)
03:31 - Phi-3 family of SLMs
05:03 - How to choose between SLM & LLM
06:04 - Large Language Models (LLMs)
07:47 - Our work with Maia
08:52 - Liquid cooled system for AI workloads
09:48 - Sustainability commitments
10:15 - Move between GPUs without rewriting code or building custom kernels.
11:22 - Run the same underlying models and code on Maia silicon
12:30 - Swap LLMs or specialized models with others.
13:38 - Fine-tune an LLM
14:15 - Wrap up
► Unfamiliar with Microsoft Mechanics?
As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
• Subscribe to our YouTube: / microsoftmechanicsseries
• Talk with other IT Pros, join us on the Microsoft Tech Community: techcommunity.microsoft.com/t...
• Watch or listen from anywhere, subscribe to our podcast: microsoftmechanics.libsyn.com...
► Keep getting this insider knowledge, join us on social:
• Follow us on Twitter: / msftmechanics
• Share knowledge on LinkedIn: / microsoft-mechanics
• Enjoy us on Instagram: / msftmechanics
• Loosen up with us on TikTok: / msftmechanics
GPT-4o is the large language model behind the optional ChatGPT integration in Apple Intelligence and the updated Siri.
#AI #AISupercomputer #LLM #GPT

Comments: 78
@alexpearson415 24 days ago
This is my favorite video that Microsoft makes. So cool
@MSFTMechanics 24 days ago
Thank you so much! Appreciate your taking the time to comment and glad you liked it.
@ThaLiquidEdit 23 days ago
Mark Russinovich is a legend!
@deeplearningpartnership 5 days ago
Oh, he's good alright.
@Breaking_Bold 9 days ago
Very very informative…sent it to my kid who is in college to see and keep seeing till they understand every word!!!
@blitzio 24 days ago
Awesome to see this, especially the hardware, networking and data center breakdown and info.
@MSFTMechanics 24 days ago
Glad you enjoyed it!
@ds920 23 days ago
That’s why I choose to buy their stock; they know what it means to actually work. It was a long way for me from the early 90s, when I, a hardcore Unix user, referred to Windows only with the words “must die”, to spending my spare money on their stock and actually admitting what this company has really been doing all this time. Thank you guys for keeping that spirit!
@Gersberms 2 days ago
They do awesome work, VS Code is basically the best program I've ever used. It's just such a shame Windows 11 is garbage all over again. I just moved to Ubuntu at home and couldn't be happier with it.
@Daniel-es9dq 8 days ago
I’m so glad people much smarter than I are working on this.
@user-sr5dk6rw2k 7 days ago
With Great Power comes Great Capabilities... Microsoft 📲💻🖥🎮
@BigEightiesNewWave 24 days ago
Man, Mark is God-status at Microsoft
@liberty-matrix 9 days ago
"it's funny you know all these AI 'weights'. they're just basically numbers in a comma separated value file and that's our digital God, a CSV file." ~Elon Musk. 12/2023
@ABLwAmazing 12 days ago
Ah, the Sysinternals guy. I owe half my career to this guy. Thx.
@user-gg8we2ot4b 24 days ago
Interesting architecture.
@LouSpironello 24 days ago
Great info about the architecture! Thank you.
@MSFTMechanics 24 days ago
Thank you! Glad it helped on the architecture front.
@ShpanMan 22 days ago
Underrated video, a lot of cool useful details!
@MSFTMechanics 21 days ago
Thank you! Happy that it's useful - and it keeps evolving quickly.
@drivenbycuriosity 22 days ago
Most fascinating part for me is the Multi-LoRA.
@MSFTMechanics 21 days ago
It is. It's a little like differencing disks with the additional state/data.
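The differencing-disk analogy can be made concrete with a little arithmetic. The sketch below assumes the usual LoRA formulation, in which each adapter stores two low-rank matrices alongside a shared, frozen base weight matrix. The layer size, rank, and number of skills are hypothetical values chosen only for illustration; they are not figures from the video.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter: B is (d_out, rank), A is (rank, d_in)."""
    return rank * (d_in + d_out)


def storage_for_skills(d_in: int, d_out: int, rank: int, skills: int) -> dict:
    """Compare N fully fine-tuned copies with one shared base plus N adapters."""
    base = full_params(d_in, d_out)
    adapter = lora_params(d_in, d_out, rank)
    return {
        "full_copies": skills * base,
        "base_plus_adapters": base + skills * adapter,
        "adapter_fraction_of_base": adapter / base,
    }


# Hypothetical sizes: one 4096 x 4096 layer, rank-8 adapters, 100 skills.
report = storage_for_skills(d_in=4096, d_out=4096, rank=8, skills=100)
for name, value in report.items():
    print(f"{name}: {value}")
```

Under these assumed sizes each adapter is well under 1% of the base layer, which is why many of them can sit next to a single copy of the base weights.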
@SuperRider-RS 24 days ago
Great session, Thank you
@MSFTMechanics 24 days ago
Appreciate the compliment, thank you!
@jeetmajumdar7588 17 days ago
Great session, Mark is as always the best❤
@MSFTMechanics 17 days ago
Thanks so much! Appreciate your taking the time to comment.
@IshaqIbrahim3 24 days ago
Timeline: 9:00 What happens to the heat energy extracted during cooling? Does it get used to generate electricity to power other devices or supply energy to some of the cooling fans, or is it not used for anything?
@jamieknight326 1 day ago
It’s not reused. The heat is distributed across millions of litres of water and it can’t be concentrated back into a single spot. Sadly we can’t take 2 litres of 50 °C water and turn it into 1 litre of 100 °C water… The water is heated, but not heated enough to be very useful for much beyond heating offices / nearby buildings. I’m curious if someone will use the heat for some sort of low energy industrial process like drying cement.
@IshaqIbrahim3 1 day ago
@@jamieknight326 like keeping the tea, coffee, eggs etc. warm. 🤣
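The point about lightly heated water can be checked against the Carnot limit, which caps the fraction of heat any heat engine can turn into work at 1 - T_cold / T_hot (temperatures in kelvin). A minimal sketch follows, assuming 50 °C coolant as in the reply above, 20 °C surroundings, and 550 °C steam for comparison. The last two temperatures are illustrative assumptions, not figures from the video.

```python
def carnot_efficiency(hot_c: float, cold_c: float) -> float:
    """Upper bound on the fraction of heat a heat engine can turn into work."""
    hot_k = hot_c + 273.15
    cold_k = cold_c + 273.15
    if hot_k <= cold_k:
        raise ValueError("hot reservoir must be warmer than the cold one")
    return 1.0 - cold_k / hot_k


# 50 C coolant comes from the comment; 20 C surroundings are an assumption.
warm_water = carnot_efficiency(hot_c=50.0, cold_c=20.0)
# 550 C steam is an assumed comparison point.
steam = carnot_efficiency(hot_c=550.0, cold_c=20.0)

print(f"50 C coolant: at most {warm_water:.1%} of the heat becomes work")
print(f"550 C steam:  at most {steam:.1%} of the heat becomes work")
```

With these assumptions the ceiling for the coolant is roughly 9%, against roughly 64% for the steam, and real machines fall short of both ceilings.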
@duran5533 6 days ago
Did I understand correctly: "Today, 6 months later, we deploy the equivalent of 5 of those supercomputers every month"!?!?
@MSFTMechanics 4 days ago
That's right. 30+ instances have been built since November 2023
@phobosmoon4643 23 days ago
Great video. I have a maybe annoying question; how can we know that cloud ai services are selling us what they say they are? For example, context length could easily be fudged.
@phobosmoon4643 22 days ago
@@test-zg4hv yea I'm asking how you test it? Is it kind of like an error-checking algorithm?
@MSFTMechanics 22 days ago
You can stipulate that in code or using the Azure AI Studio, and you can test it. We cover that to some extent in this episode kzbin.info/www/bejne/aZm9oKWCrpt9n6M
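One common way to test an advertised context length yourself is a "needle in a haystack" probe: bury a fact early in a long prompt and check whether the answer still recalls it. The sketch below is only an illustration of that idea. `complete` is a hypothetical stand-in for whatever client call you use, not an Azure API, and `truncating_stub` is a fake model that silently drops everything but the last N words so the probe has something to catch.

```python
from typing import Callable


def build_prompt(filler_words: int, needle: str, depth: float) -> str:
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) in filler."""
    words = ["lorem"] * filler_words
    words.insert(int(len(words) * depth), needle)
    return " ".join(words) + "\nWhat is the secret code mentioned above?"


def recalls_needle(complete: Callable[[str], str], filler_words: int,
                   secret: str, depth: float) -> bool:
    """Return True if the model's answer contains the buried secret."""
    needle = f"The secret code is {secret}."
    answer = complete(build_prompt(filler_words, needle, depth))
    return secret in answer


def truncating_stub(max_words: int) -> Callable[[str], str]:
    """Fake model (hypothetical) that only sees the last `max_words` words."""
    def complete(prompt: str) -> str:
        visible = " ".join(prompt.split()[-max_words:])
        marker = "The secret code is "
        if marker in visible:
            return visible.split(marker, 1)[1].split(".", 1)[0]
        return "I could not find a secret code."
    return complete


stub = truncating_stub(max_words=1000)
short_ok = recalls_needle(stub, filler_words=500, secret="7421", depth=0.1)
long_ok = recalls_needle(stub, filler_words=5000, secret="7421", depth=0.1)
print("prompt within the window recalled the secret:", short_ok)
print("prompt beyond the window recalled the secret:", long_ok)
```

Against a real service you would swap the stub for your own client call, count tokens rather than words, and repeat the probe at several depths and lengths.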
@sceptic33 6 days ago
on the subject of cooling and power requirements, i've been saying for ages that the "waste heat" is only waste if you don't use it. most electricity generators work by using heat to drive turbines. instead of using burning fuel or nuclear reactions to create heat, we should use the heat generated by compute as the source for generating electricity. pump and compress the heat from the cooling fluid into a reservoir which a second heat exchanger uses to vaporise a second working fluid which drives the turbines turning generators that feed electricity back to the GPU clusters. recycle the power endlessly.
@jamieknight326 1 day ago
The physics problem is around concentrating energy / heat into one spot. While the total heat energy is in the MW range, it’s distributed across millions of litres of fluid (water / air) which is lightly heated and can’t be concentrated into a single place. Thermodynamics doesn’t allow for addition of heat between working fluids. You can’t use 2 litres of 50 °C water to create 1 litre of 100 °C water. In a nutshell, we can’t take the distributed heat and convert it into the high-pressure, high-volume steam needed to run an electricity turbine. The heat may be useful for an industrial process like drying cement. But that ends up being uneconomical, as power from the grid is much cheaper than recovered heat. I wish this process worked. It would be amazing, but the physics doesn’t work out. :(
@sceptic33 1 day ago
@@jamieknight326 people always say it can't be done. i'm not convinced. low grade heat is raised when compressed by a heat pump. using a multi stage setup where a chain of pumps uses the increased temp from the previous pump as the base to concentrate further, i see no reason why a final reservoir of compressed heat shouldn't be hot enough to drive a turbine and generate electricity. you can generate electricity with a Stirling engine and a cup of tea. a data centre converting 100MW of electricity into 99.9MW of heat, should be able to provide 99.9MW of heat to a heat engine.
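The heat-pump idea can be bounded with two textbook limits: an ideal heat pump delivers at most T_hot / (T_hot - T_cold) units of heat per unit of work, and an ideal heat engine recovers at most 1 - T_cold / T_hot of the heat it takes in. A minimal sketch follows, assuming ideal (Carnot) machines, a 50 °C source, a 550 °C target, and an engine that rejects its waste heat back at 50 °C. The temperatures and the 10 MW input are illustrative assumptions.

```python
def kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


def ideal_heat_pump_cop(hot_c: float, cold_c: float) -> float:
    """Best-case heat delivered at hot_c per unit of work put in."""
    return kelvin(hot_c) / (kelvin(hot_c) - kelvin(cold_c))


def ideal_engine_efficiency(hot_c: float, cold_c: float) -> float:
    """Best-case work recovered per unit of heat taken in at hot_c."""
    return 1.0 - kelvin(cold_c) / kelvin(hot_c)


def round_trip(work_in_mw: float, hot_c: float, cold_c: float) -> float:
    """Work an ideal engine recovers from heat that an ideal pump lifted."""
    heat_at_hot = work_in_mw * ideal_heat_pump_cop(hot_c, cold_c)
    return heat_at_hot * ideal_engine_efficiency(hot_c, cold_c)


# Assumed: lift heat from 50 C to 550 C, then run an engine between the
# same two temperatures. 10 MW of pump work is an arbitrary input.
recovered = round_trip(work_in_mw=10.0, hot_c=550.0, cold_c=50.0)
print(f"10.00 MW into the pump -> {recovered:.2f} MW out of the engine")
```

The two factors cancel, so even ideal machines only hand back the work the pump consumed, and real ones return less. Rejecting the engine's heat to cooler surroundings helps, but the net gain is then bounded by the small Carnot fraction available from the 50 °C source itself.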
@lifeslooker 24 days ago
What would it take to take a 175B model to shrink it to run on a mobile phone? What are the limitations? The language used in the model? Can a compression be used or a language be developed that doesn't take up much space?
@MSFTMechanics 24 days ago
The closest correlation to size is the parameter count, so Phi-3-mini has 3.8bn parameters and is roughly 2.2GB file size to run locally on the phone as demonstrated by Mark in the video. There are things that the larger models will do in terms of reasoning and built-in knowledge, as Mark said. One example that we actually hit while planning this show is that the slightly larger Phi-3 models could phrase the cookie recipe in the writing style of Yoda from Star Wars. Because mini didn't have the pop culture references in its training set, we made the tone sarcasm instead.
@lifeslooker 24 days ago
@@MSFTMechanics funny I’m watching Star Wars episode 1 right now on Apple TV+😂😂😂😂 Sarcasm is something that is very rich in style, and in different languages it would be interesting to see how this is done in, say, Italian or French
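The relationship between parameter count and file size in this thread is simple arithmetic: size is roughly parameters times bits per weight. The sketch below uses the 3.8bn-parameter and 2.2GB figures from the reply and the 175B figure from the question. Treating 1 GB as 10^9 bytes and ignoring file-format overhead are assumptions, so the results are estimates only.

```python
def model_size_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (assuming 1 GB = 1e9 bytes)."""
    return parameters * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(parameters: float, size_gb: float) -> float:
    """Average bits per parameter implied by a given file size."""
    return size_gb * 1e9 * 8 / parameters


PHI3_MINI_PARAMS = 3.8e9     # from the reply in this thread
PHI3_MINI_FILE_GB = 2.2      # from the reply in this thread
LARGE_MODEL_PARAMS = 175e9   # from the question in this thread

bits = implied_bits_per_weight(PHI3_MINI_PARAMS, PHI3_MINI_FILE_GB)
print(f"Phi-3-mini: about {bits:.1f} bits per parameter")

sizes = {width: model_size_gb(LARGE_MODEL_PARAMS, width) for width in (16, 8, 4)}
for width, size in sizes.items():
    print(f"175B model at {width}-bit weights: about {size:.1f} GB")
```

The quoted file size works out to between 4 and 5 bits per parameter, which suggests the on-phone build is quantized. At a similar width a 175B model would still need tens of gigabytes for its weights alone.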
@Rafael555888 3 days ago
So they can now run the same LLM on different GPUs (Nvidia vs Maia vs AMD)?
@sachoslks 21 days ago
5 times the Azure supercomputer deployed each month, that's insane!!! What does that mean for training next gen frontier models? 30x November 2023 does it mean you can train it 30x longer, 30x bigger or 30x faster or what? Will this continue up to the end of the year reaching almost 65x compute in one year?
@MSFTMechanics 21 days ago
Good questions. We have deployed 30x total or on average 5 additional instances per month of the November 2023 Top 500 submission with 14k networked GPUs, 1.1m cores and 561 petaflops. These will continue getting bigger and more instances provisioned in the future. And now there are more options for GPUs and AI accelerators, too, plus the Nvidia H200 and Blackwell architectures are coming soon with more speed, power and efficiency.
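The figures in this reply can be multiplied out. The sketch below takes 561 petaflops, roughly 14k GPUs, 30 instances, and 5 instances per month from the reply. The seven-month horizon is an assumption chosen to match the "end of the year" in the question, and the projection is a plain linear extrapolation, not a Microsoft forecast.

```python
SUBMISSION_PETAFLOPS = 561   # from the reply
SUBMISSION_GPUS = 14_000     # "14k" in the reply; the exact count is not given
INSTANCES_DEPLOYED = 30      # from the reply
INSTANCES_PER_MONTH = 5      # from the reply


def aggregate_exaflops(instances: int) -> float:
    """Sum of per-instance benchmark throughput, in exaflops."""
    return instances * SUBMISSION_PETAFLOPS / 1000


def aggregate_gpus(instances: int) -> int:
    """Approximate GPU count across all instances."""
    return instances * SUBMISSION_GPUS


def projected_instances(months_ahead: int) -> int:
    """Linear extrapolation, assuming the 5-per-month rate simply continues."""
    return INSTANCES_DEPLOYED + INSTANCES_PER_MONTH * months_ahead


print(f"Aggregate throughput: about {aggregate_exaflops(INSTANCES_DEPLOYED):.2f} exaflops")
print(f"Aggregate GPUs: about {aggregate_gpus(INSTANCES_DEPLOYED):,}")
# Assumed horizon: seven more months to reach the end of the year.
print(f"Instances after 7 more months: {projected_instances(7)}")
```

The total is a sum over separate instances rather than one benchmarked system, so it says more about available capacity than about how large a single training run can be.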
@Jj-du8ls 24 days ago
5 times the Azure supercomputer deployed each month? Is that a typo..
@MSFTMechanics 24 days ago
It's not. We just announced 30x have been added since November 2023
@Hashtag-Hashtagcucu 23 days ago
What he isn’t saying is for how long this rate goes on
@guruware8612 23 days ago
@@Hashtag-Hashtagcucu For ever, as long as there are people thinking that it's a great idea to chat with a machine or have a robot-dog. Insanity is the new norm.
@coreystrait513 21 days ago
@@MSFTMechanics Stargate and quantum computing, hurry up
@kylev.8248 24 days ago
This is awesome
@MSFTMechanics 24 days ago
Glad you liked it and thank you!
@GhostyDog 5 hours ago
What’s that again..? You’re adding the capacity of the third most powerful supercomputer every month! 😮
@MDFnyny 21 days ago
Thanks, quite impressive!
@MSFTMechanics 21 days ago
Thanks for watching and commenting!
@nestorreveron 24 days ago
Thanks.
@MSFTMechanics 24 days ago
You're welcome
@RohanKumar-vx5sb 20 days ago
cool stuff!
@jamieknight326 1 day ago
It’s amazing… impressive budget for buying chips from NVIDIA. But is it worth it? Curious to see if AI will take off or not.
@bfg5244 24 days ago
that's inspiring
@MSFTMechanics 22 days ago
Glad you liked it. Thanks for taking the time to comment.
@synthwave7 20 days ago
Glad Microsoft is making sure there is co-existence between all hardware manufacturers, otherwise AI hardware will become chaos.
@kyber.octopus 20 days ago
Nice
@jeffreyrh 23 days ago
Wouldn't it be possible to create a distributed computer system like SETI or that Protein folding project, and use this computing power to train AI systems? Those projects used people's personal computers when they had idle time.
@Zreknarf 18 days ago
it's called a botnet and yeah you can do that. these are purpose built AI chips though, nobody has those at home because they are not for sale yet.
@Zreknarf 18 days ago
also, from the video, inferencing requires high bandwidth memory, not so much compute power, which would suffer greatly from latency
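The bandwidth point can be illustrated by asking how long it takes to move a model-sized block of data once over different links. Every number in the sketch below is an assumed round figure for illustration (a 100 GB model, 3000 GB/s for on-package HBM, 50 GB/s for desktop memory, a 1 Gbit/s home connection). None of them comes from the video, and this is not a model of how distributed inference would actually be scheduled.

```python
def seconds_to_stream(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to move `data_gb` gigabytes once at the given bandwidth."""
    return data_gb / bandwidth_gb_per_s


MODEL_GB = 100.0  # hypothetical model size

# All bandwidths are assumed round figures, in GB per second.
LINKS = {
    "accelerator HBM (assumed 3000 GB/s)": 3000.0,
    "desktop memory (assumed 50 GB/s)": 50.0,
    "home broadband (assumed 1 Gbit/s)": 0.125,
}

timings = {name: seconds_to_stream(MODEL_GB, bw) for name, bw in LINKS.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s per pass over the weights")
```

Under these assumptions the gap between on-package memory and a home connection is more than four orders of magnitude, before network latency is even considered.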
@Crunch_dGH 9 days ago
I prefer the much more reliable/resilient iOS. Just replacing my trusty Air with a 2TB M3 iPad Pro.
@youtuberun6626 3 days ago
Rubén godoy islas 4:35
@Rkcuddles 23 days ago
This dude AI?
@DeployJeremy 21 days ago
Mark has been trained on at least 175 billion parameters, but he isn't AI 🙂
@amg2u 5 days ago
iPhone?
@MSFTMechanics 4 days ago
Yes, iPhone 15 Pro Max in this case.
@Arcticwhir 22 days ago
13:38 you used the same exact joke a year ago with mark
@MSFTMechanics 22 days ago
Yes, that was intentional, because Multi-LoRA would allow Neo to have hundreds or thousands of skills added simultaneously, not just the one like last year.
@Danrchy 1 day ago
Mark looks like ai generated when he talks
@ArronLorenz 22 days ago
Solid organic joke.
@donelson52 7 days ago
How much CO2 does this cost? EXACTLY how bad is it now and EXACTLY HOW will you power this by 2030
@MSFTMechanics 6 days ago
Check out the Microsoft sustainability site for details: www.microsoft.com/en-us/corporate-responsibility/sustainability-journey
@oldtimer2192 2 days ago
Quick question. Why do we even need this kind of technology? Society seemed to get by decades ago without all this sudden rushing for AI by mega corporations! Social media “sorry it’s actually anti social media which has been responsible for many teen suicides and other issues such as cyber bullying to say the least and that is putting it mildly” is mostly a curse on recent generations! SMH yes seriously Shaking My Head!!!! 👎👎👎