Your ability to synthesize complex topics and make them digestible is remarkable - no mistaking that you're a fantastic professor. Thank you!
@FrankBurke-u6s3 күн бұрын
Uuiuuuuupuooooooooououoopportunity to get the job and the kids okoooooooooyoyyykykkkkkkkkkkkkkkkykykkky PThis
@jajessee3 күн бұрын
Thanks!
@jajessee2 күн бұрын
Really appreciate the depth and effort. Best
@tlan773 күн бұрын
This was a wonderful presentation of beyond cutting-edge network technology created by Tesla. Dr. Know-it-all did a masterful job on this one! Kudos!!! ❤
@Ask-a-Rocket-Scientist3 күн бұрын
As a TSLA shareholder, I think TSLA should have gotten $5B worth of xAI or more for the use of our patent that was central to xAI success.
@johnfitzpatrick83103 күн бұрын
Seems like IP theft unless xAI is licensing this tech from Tesla. Hoping Elon has all of his bases covered on these technology transfers between Tesla, a public company, and his private ones. I understand that IP has actually flowed in both directions over the years, including SpaceX.
@redneckcoder2 күн бұрын
I suspect you'll get more than 5B in value back from what xAI is doing.
@ericmckinney58982 күн бұрын
@redneckcoder. Thru what mechanism?
@spadjustersshubert28722 күн бұрын
@@ericmckinney5898 By the time Tesla installs the Blackwells, they are going to be up to speed with the learnings of X, and it will probably be invaluable for Tesla. Imagine that they will have the largest computer in existence and the second largest.
@redneckcoder2 күн бұрын
@@ericmckinney5898 R&D for effectively free.
@khut2u3 күн бұрын
This discussion isn’t necessarily new. It just gets shut down due to the high costs of implementation. We talked about such a system for a network application for ISS and it never got out of the talking phase. Non-technical managers at typical companies never want to absorb the cost or the risk.
@jacobuserasmus3 күн бұрын
Hi Dr. Know-It-All, it seems there may be some confusion here. First, you mentioned sub-millisecond latency, but the xAI node cluster actually uses a **400GbE backbone** between nodes, achieving **sub-microsecond (30ns)** latency, which is about **30,000 times faster** than what you referenced. To put it into perspective, the latency for a round trip over Ethernet is in the **same range** as a **DDR5** local memory lookup, meaning a cluster machine can access memory on another node with nearly the same latency as accessing its own local memory, with a reduction in bandwidth (to about half). You stated in your video that TCP can only be implemented in software. This is incorrect. High-end routers and switches have **hardware-enabled TCP** and can process **TCP packets at wire speed**. In the case of Tesla, the protocol is interesting because it removes the **TCP/IP overhead**, making Ethernet a **reliable transport protocol**. It also addresses some of the limitations in the standard **TCP flow** and **handshake**. **In other words, they can use less expensive switches** while still achieving high performance. Thank you for bringing this to my attention. I’m interested to see how this develops and whether they have a deal to manufacture Ethernet cards with the protocol built in, or if they simply used RoCE or have hardware built that can do TTP.
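As a rough check on the "about 30,000 times" figure in the comment above, here is a minimal sketch. It assumes "sub-millisecond" is taken at its 1 ms upper bound and uses the 30 ns value exactly as quoted by the commenter; neither number is a measurement, and the constant and function names are made up for illustration.

```python
# Figures taken from the comment; assumptions, not measurements.
SUB_MILLISECOND_BOUND_S = 1e-3  # upper bound of "sub-millisecond"
QUOTED_LATENCY_S = 30e-9        # the 30 ns figure quoted in the comment


def speedup(slow_s: float, fast_s: float) -> float:
    """Return how many times shorter fast_s is than slow_s."""
    return slow_s / fast_s


ratio = speedup(SUB_MILLISECOND_BOUND_S, QUOTED_LATENCY_S)
print(f"{ratio:,.0f}x")  # roughly 33 thousand
```

With these assumed inputs the ratio comes out a little above 33,000, which is consistent with the commenter's rounded "about 30,000".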
@shuriken48523 күн бұрын
Not sure why you did not know about this, as Tesla presented Dojo and the Tesla Transport Protocol (TTP) at the Hot Chips 34 conference in 2022, where they showcased the microarchitecture of the Dojo supercomputer and its custom protocol.
@OnigoroshiZero3 күн бұрын
The fact that they found a way to scale a single cluster as much as they want, while all the others are limited to 25-30k GPUs, is massive. This, along with the fact that they built it so fast, gives them at minimum a 14-18 month advantage in compute power (a massive advantage, especially if they scale it to 200-500k GPUs). Grok will not have competition in 2025 and most likely well into 2026, as the scaling laws are far from reaching their limits, and xAI will be the only company (well, also Tesla, but it's not direct competition) to reap the massive benefits of all this concentrated power.
@Paulctan3 күн бұрын
Key invention is the replay data path. OMG, that is brilliant. I never would have thought of that!!
@Nurse_Diesel4 күн бұрын
At least he didn't call the new patent, Skynet!
@KilgoreTrout-w3n4 күн бұрын
I am very sure he at least thought about it 🙂
@taavetmalkov32953 күн бұрын
this is the best AI news of the year... that xAI did this leap!
@ranig28483 күн бұрын
I wonder if Tesla now has silicon implementing TTP that is being deployed in Texas and Memphis. If they do, this can give them a 1-2 year lead on large coherent AI clusters. It also explains why Elon is buying as many GPUs as he can, as he can leverage that advantage in this 1-2 year window. If so, it would be a HUGE!! Win!
@slwiser14 күн бұрын
A million cpus not a billion I think is what was said.
@Pleiades7214 күн бұрын
Yeah, I was thinking the exact same thing. Pretty sure it was a million. If he did say a billion, well all that it represents then is that this method of supercomputing is scalable, based solely on human's time, effort, and resource availability. If so, humans can make it happen to the billion.
@SirHargreeves4 күн бұрын
You’d need a Dyson sphere to power 1 billion
@favesongslist4 күн бұрын
Musk said a million, yet maybe this new hardware architecture can support a billion?
@Pleiades7214 күн бұрын
@@SirHargreeves It all depends on hardware efficiencies happening over time, for example making processors with smaller nanometer wires. Maybe we get down below the nanometer, etc., and you can pack more in at a higher energy savings. It all depends on where our roadblocks are.
@iandavies48534 күн бұрын
Musk did reply to a post on X re billion GPUs. It’s a possible future, not that far away. Everything is crazy anyway.
@sluggo3slug3 күн бұрын
I am impressed by your very concise summation
@NitroReviewsMN4 күн бұрын
John I love this content! keep it up!
@musicman533 күн бұрын
Just an awesome explanation John, you are the best mate!
@islabonita41934 күн бұрын
DO you think the various media houses will ever publish for the public to understand that Tesla is miles ahead of anyone? Elon is truly the Techno King
@Pleiades7214 күн бұрын
They'll claim it at some point when it's undeniable, then magically say they've been saying the whole time.
@juliahello66733 күн бұрын
No because politics. And billionaire bad.
@solomongrundy13 күн бұрын
So it sounds like they effectively made a new NIC/transceiver for L1 - L4. Makes sense. They just added 2 more OSI layers in the HW to reduce latency in the network and transport layers to wire speed, rather than having the OS process it. There actually are innovations that do something similar, just not for this purpose. The timing system really makes the system scale and cohere.
@jamesbond_0073 күн бұрын
Thank you for the detailed explanation. I think Mesochronous computing is a stroke of genius -- so very out of the box thinking, and once you see it, the benefits of it are obvious. Some of the aspects of this architecture seem to incorporate the notion of bounded eventual consistency -- where "eventual" here is a very short time interval.
@mpd86333 күн бұрын
Very interesting video. I'm not a tech guy, but this video gave me a really simplified and basic idea of what's going on with all this coherence within AI/GPU clusters stuff. Thanks, Doc! ❤
@janh-r8h3 күн бұрын
❤
@Itsallmeagain2 күн бұрын
Good job on explaining the use of hardware vs software in the management of packet distribution and reception confirmation. Well done.
@stilllearning74343 күн бұрын
Oh my, how my head hurts.
@coreycoddington81324 күн бұрын
Fantastic breakdown as always! If you're competing against Musk, prepare for innovation; things are not going to stay the same.
@natpainter81854 күн бұрын
gonna need to watch that twice. thanks doc
@Optimusbeard-r7q3 күн бұрын
I always like it whenever @drknowitall says "In other words". That's usually where he dumbs it down for me 😅😅😅😅
@rustyfox813 күн бұрын
This is basically like an FPGA computer vs. software, maybe 100X faster
@NitroReviewsMN3 күн бұрын
I'm not a networking engineer, but hearing you lay this out, it sounds to me like the same process that happened when we went from single-core CPU chips to more and more multi-core chips. I wonder if that was some of the thinking behind the correlation.
@Balance20973 күн бұрын
I know likening a supercomputer to a human brain has been done a billion times, but this new development is REALLY starting to mean we are creating something analogous (and more powerful).
@darwinboor13003 күн бұрын
Today's supercomputers are still digital deterministic devices. We are learning ways to make them perform like an analog non-deterministic brain. One of the next quantum leaps in computing will be the return to analog computing.
@briolewale72303 күн бұрын
you are an amazing teacher
@darwinboor13003 күн бұрын
There is still a wall. Tesla just moved the wall to the back 40. New hardware provides most of the leap in scaling and coherence. AI-based system configuration and monitoring (pytorch) provides in cluster virtual machine design and monitoring, off cluster dataset organization and data compression/expansion, and training cycle control. Once again, Tesla and the Xs are showing the world that factory design is far more important and far more complex than the products produced by the factory.
@robschultz926223 сағат бұрын
I think the bigger point in this information is the TTP (Tesla Transport Protocol) that is in hardware. I'm thinking about this, and by the time you recognize the issue, think of possible options with current technology, design the chip and protocol and submit it for manufacturing, and test and get it into production, Tesla must have started this well over a year (or two) ago. They recognized and projected a bottleneck that they would not hit for a year or two and worked on a solution. This is a crazy number of steps ahead of their competition, which today didn't think it was possible. This is truly a company looking into the future to see what is needed now to remove roadblocks tomorrow. Crazy.
@bzn2sfo3 күн бұрын
Is this also the networking technology they use in the cybertruck?
@gregbailey452 күн бұрын
No.
@erics211221 сағат бұрын
156.25 MHz may be the frequency, but that's only counting the up or down edge; if both edges are counted, the clock rate can be doubled
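The doubling described in the comment above is simple arithmetic, sketched here. The 156.25 MHz value is the one quoted by the commenter; whether the hardware under discussion actually transfers on both clock edges is not stated anywhere in this thread, so the two-edge case is an assumption for illustration, and the names are hypothetical.

```python
CLOCK_HZ = 156.25e6  # frequency quoted in the comment


def transfers_per_second(clock_hz: float, edges_per_cycle: int) -> float:
    """Transfers per second when data moves on the given number of clock edges per cycle."""
    return clock_hz * edges_per_cycle


single_edge = transfers_per_second(CLOCK_HZ, 1)  # rising (or falling) edge only
double_edge = transfers_per_second(CLOCK_HZ, 2)  # assumption: both edges used

print(f"single edge: {single_edge / 1e6} MT/s")
print(f"double edge: {double_edge / 1e6} MT/s")
```

The clock frequency itself does not change in the two-edge case; only the number of transfers per cycle does.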
@iDarekZ3 күн бұрын
3:25 Your lossy system is your wife system? 😂😂😂
@JoelSapp3 күн бұрын
While I think your assessment is correct that the current TTP can’t be flexible, that’s by design criteria. I would expect designers could model for the 90% case in other environments initially between servers in server farms, then along hops. They’d be most determined to do so as traffic could be increased with more efficient transport. Then finally the last mile would benefit greatly.
@coachkevinyoungКүн бұрын
It seems as though the timing parameters, such as the background timers, could be programmable so that it could be used in slower environments, similar to the way TCP/IP works. Back in my days as a software developer, anything you could do in software could be done in hardware. It just costs more!
@JoelSappКүн бұрын
@@coachkevinyoung Yeah. The innovation that Elon's team is taking advantage of is the ability to make chips much more cheaply than in previous times. I think your saying should be updated to say: You can do anything in hardware or software, but it'll cost you money or compute cycles.
@chrisgilpin1942 күн бұрын
That was a great explanation
@pdbsprinter95874 күн бұрын
Brain exploded 🤯
@Advoc8te4Truth3 күн бұрын
So often, Elon, or aka Elon companies, create geometric innovations in technology by applying basic physics principles to real-world conundrums that have alluded some of the biggest companies in technology for decades. 😂❤ I mean if your going to have a mantra first principles thinking is a pretty good one. 😊❤
@imconsequetau52753 күн бұрын
alluded -> eluded
@gregbailey452 күн бұрын
*eluded.
@chlistens77424 күн бұрын
my first computer had 48K of ram total. it is a next step and i hope it works well.
@RobertLoPinto3 күн бұрын
Your first computer had 48K of RAM? That was huge compared to my first computer, a Commodore VIC-20 with a whopping 3K of RAM!
@imconsequetau52753 күн бұрын
A lot of early computers had large programs segmented into 4096-byte blocks. The maximum addressable program+data space was 0 to 4095 (12 bits). So a 16-bit virtual address space was a huge improvement, especially when each word address increased to store 16 bits, or two bytes.
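The address-space sizes in the comment above follow from powers of two. A minimal sketch, assuming one 16-bit (two-byte) word per address in the 16-bit case as the commenter describes; the function and constant names are made up for illustration.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct addresses reachable with the given address width."""
    return 1 << address_bits


BYTES_PER_WORD = 2  # assumption: one 16-bit word stored per address

block_12 = addressable_locations(12)  # addresses 0..4095
space_16 = addressable_locations(16)  # addresses 0..65535

print(block_12, space_16, space_16 * BYTES_PER_WORD)
```

Under that assumption, moving from 12-bit to 16-bit addressing gives 16 times as many addresses, and two-byte words double the storage again.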
@ericew4 сағат бұрын
At tens of thousands of GPUs you also end up with hardware failures as a significant issue.
@ghrosenb3 күн бұрын
I just learned so much.
@marriagepartnersministry59422 күн бұрын
So if Meta wants to do it Teslas way they would have to pay Tesla for the patent usage????
@janh-r8h3 күн бұрын
Elon‘s companies are just amazing ❤
@Mojo16011973Күн бұрын
Great content. I think i missed SMR's video on this.😁
@formytots01289 сағат бұрын
Thanks for this great video, Dr Gibbs. Learned something new. :)
@TheBestNameEverMade3 күн бұрын
Most good video calls often use the lower layer UDP with a smart layer on top. TCP makes things worse.
@mckirkus4 күн бұрын
I wonder if the whole Mixture of Experts approach was just an attempt to scale to more GPUs by abandoning coherence.
@Garageshelves3 күн бұрын
That was great
@nattydred25932 күн бұрын
I think having layer 3 IPv4 and IPv6 in hardware is very common. Am I mistaken? Sure, years ago you could run that layer in software on, say, a Linux box, and that becomes your router, but nowadays everyone implements that in hardware, I think. There may be exceptions like software-defined networks, but you'd never use an SDN for a high-performance cluster or network.
@Michael-il5wd4 күн бұрын
The doc helps us, Laura help the doc with a like and comment
@gregbailey452 күн бұрын
I suspect they are already doing this in the DOJO system.
@jasonmoss68913 күн бұрын
So the race is on to see who can spend the most and get them connected en masse? By submitting the patent, did the secret get out? Who has the best AI today and into the future? What does this mean for power consumption? Should this back and forth save heat and therefore be more efficient? Where do the lost nodes go?
@gregbailey452 күн бұрын
Watch it again. And again. And again...
@joe_limon3 күн бұрын
Can this be used for training multi modal transformer based models?
@gregbailey452 күн бұрын
As John explained, it would be totally unnecessary for LLM's!
@joe_limon2 күн бұрын
@@gregbailey45 I am not thinking inference. i am thinking how much more powerful training can be
@MiaSoreryOF3 күн бұрын
So if this approach is the only way to string together hundreds of thousands of GPUs, and Tesla holds the patterns, does that mean Tesla for the foreseeable future will be the only ones in town with super massive data centers?
@gregbailey452 күн бұрын
Patterns?
@ranig28483 күн бұрын
Ford invented the production pipeline, Tesla/AI invented the AI training pipeline that’s constantly moving along, not a “step” at time but continuously flowing 🥳😱
@simbabuilds93383 күн бұрын
Power move would be to give this tech to OpenAI so he can crush them in a fair fight
@Guytron95Күн бұрын
Glad to signup and hungry for more!
@VivekLuna2 күн бұрын
I'm 54 and my wife and I are VERY worried about our future, gas and food prices rising daily. We have had our savings dwindle with the cost of living into the stratosphere, and we are finding it impossible to replace them. We can get by, but can't seem to get ahead. My condolences to anyone retiring in this crisis, 30 years nonstop just for a crooked system to take all you worked for.
@ЕленаФирсова-ц6м2 күн бұрын
I feel your pain mate, as a fellow retiree, I’d suggest you look into passive index fund investing and learn some more. For me, I had my share of ups and downs when I first started looking for a consistent passive income so I hired an expert advisor for aid, and following her advice, I poured $30k in value stocks and digital assets, Up to 200k so far and pretty sure I'm ready for whatever comes.
@VivekLuna2 күн бұрын
@@ЕленаФирсова-ц6м That's actually quite impressive, I could use some Info on your FA, I am looking to make a change on my finances this year as well
@ЕленаФирсова-ц6м2 күн бұрын
@@VivekLuna My advisor is *MARGARET MOLLI ALVEY*
@ЕленаФирсова-ц6м2 күн бұрын
You can look her up online
@IbrahimIsabella-002 күн бұрын
@@ЕленаФирсова-ц6м The crazy part is that those advisors are probably outperforming the market and raising good returns but some are charging fees over fees that drain your portfolio. Is this the case with yours too?
@321Misfits2 күн бұрын
He’s the reason why there is no sales on gpu’s at Best Buy ):
@patitofeo19364 күн бұрын
fpga for flexibilitiy?
@philipp5943 күн бұрын
Why does it matter how fast they train if they are data constrained. (Tesla).
@kazedcat3 күн бұрын
This is not about Tesla but xAI
@philipp5943 күн бұрын
@ Tesla is literally the first word in the video title.
@kazedcat3 күн бұрын
@philipp594 Clickbait. xAI has very low Mindshare.
@imconsequetau52753 күн бұрын
They found out that training longer on a given set of data will improve the inference dataset. The resulting inference neural net works better, faster. Training on faster hardware also helps. @@philipp594
@darwinboor13003 күн бұрын
Many data constraints on training FSD can be reduced using generative AI to expand the existing datasets. Every time you change the model for your AI, you have to train the new version of the model and then validate the model for performance while screening for new emergent errors and reemergent old errors. Then you need to make corrections to the new model and repeat the process. It is much like the old days when program steps were punched into cards one card at a time. You did not learn you had an error until you waited overnight for the program to run. Reducing the time between runs to less than the time to correct an error changed computer programming forever. We are still looking for a method to achieve the same transformation to the process of building AIs. At this time using AIs to design, generate data for, train, and validate new AIs holds the greatest promise to approach this transformation.
@Gargamel-n-Rudmilla3 күн бұрын
There is no spoon. 😊
@allanoas523Күн бұрын
I thought Elon’s words were 30 to 100 thousand and up to a “million” GPUs, not a “billion”. A billion is way too many, even for Elon’s optimistic past estimates.
@paulmuriithi91953 күн бұрын
this is bigger than google's gemini 2.0 or openai's pro mode releases. elon's computer clusters will give agentic generative workflows superior reasoning due to this coherency. 2025 will end up being a big big win for reasoning auto agents and agentic swarms.
@nicoxis4 күн бұрын
you think this will trigger a massive demand for Nvidia GPUs from other players trying to follow XAi ??
@imconsequetau52753 күн бұрын
Some vendors use an entire silicon die to hold n* GPU cores, communicating with n*n on-die photonic fabric channels. This is another excellent scalable technology. It reconfigures around failing cores and cache, so resiliency is great.
@breyrey76124 күн бұрын
The next question is.... when are they selling the protocol in digital form like bitcoin? :-)
@Martin-se3ij2 күн бұрын
When I look up META it says their computer has 600,000 GPUs. So where does your 32,000 limit come from?
@gregbailey452 күн бұрын
Good question. Maybe they operate in parallel.
@Martin-se3ij2 күн бұрын
@@gregbailey45 I've heard of that, I think it's in Kansas.
@ClayBellBrews3 күн бұрын
Some of this smells like bullshit. I’m pretty sure we’ve been doing TCP/IP on server NIC hardware for about 15 years now; some of these NICs are so powerful they can run VMs on the NIC. And to be clear, your internet is hardware based, and has been for the last 20-25 years, or whenever it was that the Cisco BFR came out. Now, a faster, higher-frequency clock is a real improvement. Same with caching more on hardware. But again, it’s still incoherent and will crap out once your timing shift exceeds your clock frequency (minus the error rate). I really like the idea of the clock offset creating a wave of partitions that have enough bandwidth to complete their ops before the next wave of ops re-uses the same bandwidth. But a clock tick is a cycle, and what we can stick in that cycle depends on how finely we can divide the tick.
@ClayBellBrews3 күн бұрын
Sorry for all the edits, my train of thought is also incoherent.
@darwinboor13003 күн бұрын
YES. They did not eliminate the wall they just moved it to a point where it is almost non-relevant.
@jimbert503 күн бұрын
What is a GPU here? A GPU chip and even many processor chips contain many individual GPUs. So in this context, is a GPU an entire Nvidia chip, for example, or just one of the GPUs in a chip?
@kazedcat3 күн бұрын
This is warehouse level, so a GPU should be one GPU rack or an 8-GPU package; each package contains multiple GPU chips.
@imconsequetau52753 күн бұрын
When it works *_coherently,_* the entire building / datacenter contains a *_single huge GPU,_* no matter how many • Cores, • Pixel pipelines, • Vertex shaders, • Memory addresses, • Texture mapping units, or • Render output units are contained in any chip, board, rack, or the entire datacenter.
@jimbert503 күн бұрын
@@imconsequetau5275 Uhm - the whole video is about how they broke the 30 thousand GPU limit. So that's 30,000 buildings? ;-)
@seancollins9745Күн бұрын
Software is cheaper than silicon, so that's why TCP/IP is software based. Wait until they can put the entire neural stack on silicon: instead of an inference acceleration device, the driving AI is a dedicated chip built from training. It would probably need something akin to an FPGA with a GPU VLIW structure. Also, the TTP seems to be very bufferbloat-avoidant; the 256K of storage points right to it.
@HàoNguyễnVăn-f4p3 күн бұрын
Set to XAI600K 2$
@christopherprovenzano36544 күн бұрын
How come in that video on the all in pod they act like no one else has been able to get up to 100,000 gpus when meta has as far as I can tell from the headlines.
@taavetmalkov32953 күн бұрын
there could be a difference in the perfect sync method and less efficient gpu cluster architecture
@4CPhạmThịBảoLinh3 күн бұрын
XAI600K will probably be involved
@ThanhNguyen-dj1cq3 күн бұрын
I am bullish for $XAI600K and $NEXO only!!
@eaglegp73 күн бұрын
And there is Willow Quantum chip
@honkytonk44653 күн бұрын
for research only
@hardheadjarheadКүн бұрын
So…looking at the titles of this guy’s videos…it looks like he’s exclusively promoting Elon Musk. Tesla this and Tesla that.
@tungthanh38773 күн бұрын
Let's go XAI600K so much potential to go to the moon.
@ExploringCabinsandMines3 күн бұрын
Could an AI such as this hold off a computer virus?
@LinhTran-w2d3 күн бұрын
Where to buy XAI600K pls
@coachkevinyoungКүн бұрын
Scam
@ĐìnhsanDương3 күн бұрын
XAI600K going up like crazy! Pick up around 0.67 and now it’s hit $1! I wish i had bought more!
@coachkevinyoungКүн бұрын
Scam
@SuperUbuntudude3 күн бұрын
@johnsonjjohnson1004 күн бұрын
Question: Is Tesla making their own factory robots? They bought a German company some time back that we haven't heard from in some time. Is the company they bought helping with Optimus? Unboxed Method?
@Martin-se3ij3 күн бұрын
if Tesla has patented this method how do the other players follow? China of course will just copy it.
@dsmith59404 күн бұрын
Neat
@fungibleunit447718 сағат бұрын
These folks are really not the first to ditch Ethernet and do it in hardware: Inmos OS and DS links (mid 80s), Fujitsu K computer (and its forebears), IBM Blue Gene, etc. Worth noting that INMOS DS links supported wormhole routing in the early 90s.
@michaelbartell11664 күн бұрын
Human brain
@DanFrederiksen4 күн бұрын
hmm, DOGE should abolish the patent system, since it is only a scourge. completely useless for its intended purpose. just a lawyer swamp
@workingTchr2 күн бұрын
So Elon is doing patents now? Little loss of idealism, but life will do that to you.
@frodekleppe38842 күн бұрын
❤
@FfĐ-m4u3 күн бұрын
Where do you buy XAI600K?
@thoughtpolease71832 күн бұрын
Tesla is just a car company
@ElmnopenКүн бұрын
7 minutes in and still all you've done is say they did it. You haven't said any word about how
@NguyễnThịHoa-j8m3 күн бұрын
Ive stared buying XAI600K ,and staked them.
@johnsteichen5239Күн бұрын
I can’t continue to listen to you babble about packets. Your mind is lost.
@user-erick0073 күн бұрын
This video will cause or lead to a lawsuit against Elon Musk and xAI for sharing/stealing Tesla patent technology without Tesla board approval & proper payment/compensation
@allangraham9703 күн бұрын
Tesla and xAI are extremely likely to have a contract in place already for how technology is shared, as technology appears to flow in both directions
@user-erick0073 күн бұрын
@allangraham970 A lawsuit has already been filed against Elon & Tesla for apparently transferring Nvidia GPUs from Tesla to xAI, so idk how many things are actually legally contracted & how much of it is Elon doing his own thing without thinking about bureaucracy
@BrianMosleyUK4 күн бұрын
FSD when?
@HaHoang-ol6gt3 күн бұрын
Do not sleep on XAI600K people
@ThanhvuNguyen-s8o3 күн бұрын
In your opinion, XAI600K for $10? 1 year or so?
@KhuongNguyen-n6b3 күн бұрын
Just swapped all of my last ETH and swapped it into XAI600K. Already up a little bit. Unfortunately I have some other junk staked which won’t free up for a while. Still now I am on the train!