👉👉Want more Chip Stock Investor? Our membership delivers exclusive perks! Join our Discord community, get downloadable show notes, custom emojis, and more. Become a true insider - upgrade your experience today! Join at KZbin: kzbin.info/door/3aD-gfmHV_MhMmcwyIu1wAjoin Join on Ko-Fi: ko-fi.com/chipstockinvestor 🔥🔥🔥
@RedondoBeach27 ай бұрын
Good information but painful to listen to the two of you talk. Do yourself a favor and listen to your own speech patterns. You both have a terrible habit of talking in... a... very... choppy... pattern. This is aside from the obvious editing done to this video.
@alexsassanimd7 ай бұрын
REQUEST: an episode on the details of NVIDIA's most important "partnerships" and what they mean for the company's future. You guys are awesome.
@pierrever7 ай бұрын
Oracle in quantum computers
@oker597 ай бұрын
Cerebras' pace of making smaller and smaller transistors on large-scale wafers suggests they have some systematic understanding of how to deal with thermal/quantum jitter. ASML now has 1 nm feature-size capability (ASML's technology is also a technological miracle, and they can see how to go beyond their current miracle). So I expect Cerebras, for one, to beat their CS-3.
@darrell8577 ай бұрын
Nvidia makes its chips at or near the reticle limit, as does the WSE. Both designs overprovision functional units that can be fused off or routed around and still meet the specification (some estimate about 10-20% of the H100 chip is disabled silicon). Nvidia can bin bad chips into lower Blackwell products to offset costs; the WSE doesn't have this option. The WSE requires a complex cooling system but a lot less networking. Blackwell requires an additional NVLink chip per 8 GPUs or so, advanced packaging for the GPU dies/HBM, and advanced Mellanox networking to get a lot of GPUs to communicate. So it isn't so clear who wins on a cost basis. Cerebras seems to have solved the cooling/mechanical problems, so in theory they can outperform Blackwell on certain models that fit within the chip's memory. However, that is substantially less memory than Blackwell's.
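To put a rough number on the over-provisioning point above, here is a toy binomial sketch. All the numbers (unit count, per-unit defect probability, spare budget) are illustrative assumptions, not actual NVIDIA or Cerebras figures: a die "survives" if at most k of its n functional units are defective and can be fused off.

```python
from math import comb

def survival_rate(n_units: int, p_defect: float, max_fused: int) -> float:
    """P(at most max_fused of n_units are defective), units independent."""
    return sum(
        comb(n_units, k) * p_defect**k * (1 - p_defect)**(n_units - k)
        for k in range(max_fused + 1)
    )

# Toy numbers: 140 functional units, 1% per-unit defect rate.
no_redundancy = survival_rate(140, 0.01, 0)    # die must be perfect
with_spares = survival_rate(140, 0.01, 14)     # ~10% may be fused off
print(f"no spares: {no_redundancy:.3f}, with spares: {with_spares:.6f}")
```

Even this crude model shows why a 10-20% spare budget turns a marginal yield into a near-certain one.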
@chipstockinvestor7 ай бұрын
Exactly. Thank you for the extra detail on the comp. Fun to watch the battle. All good stuff for customers.
7 ай бұрын
Cerebras also has its own networking technology, and it's fast; something to consider when evaluating its smaller memory per chip. Their business models are also different: Cerebras doesn't sell its chips but acts as a cloud provider and supercomputer builder/administrator in different modalities. I'm curious about the difference in operating cost between Blackwell and WSE-3.
@Dmillz1927 ай бұрын
Huang said last week NVIDIA doesn't make or sell chips though lmao, then doubled down and said they're a software company
@v1kt0u55 ай бұрын
@@Dmillz192 it's true... they're designers... TSMC makes the chips ... and in fact the insider knowledge/software part of Nvidia is by far the most valuable!
@Dmillz1925 ай бұрын
@@v1kt0u5 TSMC makes both GPUs and CPUs. Again, Nvidia is known for their GPUs, which will take at least 6 years to catch up and perform at the level of wafer chips; which TSMC is actually manufacturing for Cerebras. Cerebras was the only TSMC client doing wafer-scale chips until recently, with Tesla's wafer-sized 'Dojo' announced a few weeks ago.
@kualakevin7 ай бұрын
Good video, but I hope a future video can elaborate more on how Cerebras has solved problems (3) and (4) in their product. And for problem (5), power consumption: although a larger chip consumes more power per chip, it consumes less power for the equivalent compute than smaller chips stitched together with interconnects or other methods.
@Stan_1444 ай бұрын
I found an interesting book: "Chiplet Design and Heterogeneous Integration Packaging" by John H. Lau. 895 pages. The book focuses on the design, materials, process, fabrication, and reliability of chiplet design and heterogeneous integration packaging. Both principles and engineering practice are addressed, with more weight placed on engineering practice. This is achieved through in-depth study of a number of major topics such as chip partitioning, chip splitting, multiple system and heterogeneous integration with TSV interposers, multiple system and heterogeneous integration with TSV-less interposers, chiplet lateral communication, system-in-package, fan-out wafer/panel-level packaging, and various Cu-Cu hybrid bonding. The book can benefit researchers, engineers, and graduate students in fields of electrical engineering, mechanical engineering, materials science, industrial engineering, etc.
@basamnath28837 ай бұрын
Great video
@sugavanamprabhakaran20287 ай бұрын
Excellent! As always you both are great teachers in this field! Keep up your amazing hard work! ❤
@rastarebel45037 ай бұрын
HIGH QUALITY CONTENT!!! the 5 reasons on chips size limits was excellent... love it!
@valueinvestor85557 ай бұрын
Very interesting video, especially the five reasons for size limitations at the end. #1 was new and interesting to me. But it makes sense. This is something most non-experts would probably not find out by themselves easily. #2 was relatively obvious. Cerebras has at least somewhat of a solution for this, as you mentioned. They are somehow routing around damaged transistors (not sure how effective their solution is). #3 also makes sense. But as with #1, most people wouldn't know by how much exactly this would limit the chip size. #4 also makes sense. Maybe materials science could help here!? Or maybe the optimal available materials are already used. It would seem that Nvidia wouldn't make compromises here given the product price. #5 I guess the previous points all play into this TCO calculation, and it is probably cheaper to cool separate smaller chips. It would be interesting to know whether the Nvidia CEO thinks the size of the Blackwell chips is already optimal, or whether it could make sense to grow chip size further, at least for very large customers who need the most computing power. I asked Gemini why 300 mm is the current standard for wafers. One interesting aspect is that precisely handling 450 mm diameter wafers, for example, would be an immense technological challenge, because the wafers are so fragile.
@majidsiddiqui29067 ай бұрын
Great video. Good basic explanation regarding the 5 main reasons chips cannot easily be made bigger.👍
@mdo51217 ай бұрын
Another plethora of important info....thanks as always
@eversunnyguy7 ай бұрын
Your channel came to my attention at the right time... but I wish I had known about this channel before the AI frenzy two months ago.
@matteoianni937222 күн бұрын
Nice to discover a video of NotebookLM's previous model. It was definitely less lifelike and realistic than today's.
@chipstockinvestor22 күн бұрын
Blackwell upgrade incoming, we'll look and sound no different than real humans!
@zebbie097 ай бұрын
Excellent presentation. Thanks for sharing….
@stgeorgetalk984914 күн бұрын
The problem with Cerebras' approach is that chiplets themselves already have redundancy to help handle defective parts. Also, you need to break out to external devices at some point, and having everything monolithic means you still have to break out for IO, storage, etc. anyway.
@andreinedelcu53307 ай бұрын
Great videos and content! as always
@Stan_1444 ай бұрын
Great content ! I learned a lot from it ,
@chipstockinvestor4 ай бұрын
We're glad to hear it!
@styx12727 ай бұрын
Thanks, crew. I wonder if you might do a video on BrainChip Corp's neuromorphic Akida chip? I'm very curious to understand how the Akida 2000 works, because it has memory embedded in the chip in 4 memory configurations per 'node', or axon, producing a super low-powered chip. I'm wondering why other companies aren't following this design? And does it have the potential to be scaled into training?
@chrisgarner57657 ай бұрын
Nvidia has the fastest interconnect of all the competitors... Nvidia is also the company that started all of this, really, with deep learning! Plus, Nvidia is more than capable of making a wafer-scale chip if they believed it was a better way! Nvidia also has the best software stack and tools for the job!
@limbeh33017 ай бұрын
No, Cerebras has the fastest interconnects between dies. It's basically like communication between two Blackwell dies, but instead of 2 you get 80-90 dies. Also Cerebras inter-die communication is faster than Blackwell since they're not using 2.5D. They're just using a metal layer, it looks like.
@lynnecoles72762 ай бұрын
Think about all the leading companies that are no longer at the top. Cerebras Systems has already released inference that is faster than Nvidia's! This is who I am watching right now!
@chrisgarner57652 ай бұрын
@@lynnecoles7276 Well, to be fair, NVIDIA hasn't bothered to focus on inference as of yet, but you can do inference a lot of ways. I use a 96-core Genoa with 768GB of 12-channel DDR5 and two 3090 FE cards. Genoa feeds the GPUs' memory and works great, and it can run basically any GGUF model that fits within an 800GB environment extremely well.
@aaronb86987 ай бұрын
GREAT PRESENTATION!
@mtoporovsky7 ай бұрын
Do u have some info about firms with develop on semi-light combining solutions?
@heelspurs7 ай бұрын
The entire wafer is etched reticle by reticle before it's cut into chips, so I don't see how problem #1, "the reticle," is a problem for using the entire wafer as one chip. As for defects, the architecture enables bypassing sections that have a defect; Groq does this. It's not simply "infrastructure" that limits wafers to 12 inches, but the inability to make the flow of gases and heat across the entire wafer perfectly even. You could slow each step down to help gases and heat spread more evenly, but that reduces production rate. The only truly fundamental physics problem is that you want as much of the chip as possible synchronized with the clock steps, because parallel computing for generalized algorithms can greatly waste computation. You can't have the entire wafer sync at high clock speeds because, for example, at 1 GHz light can travel only 300 mm, the path across the chip isn't straight, and capacitances greatly reduce that max speed; at 1 GHz you really need everything synced within less than 1/4 of the clock cycle (75 mm max distance). Fortunately, video and matrix multiplication are algorithms that can efficiently do parallel ("non-synced") computation. Training can't be parallelized efficiently, but inference can, although NVDA's GPU architecture can't do it nearly as efficiently as theoretically possible. Groq capitalizes on this, not needing any switches (Jensen was proud of NVDA's new switches being more efficient) or any L2 cache (which at least doubles the energy per compute required), which is why Groq gets 10x more tokens per unit of energy than the H100.
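The 75 mm figure in this comment falls out of a one-line calculation. A minimal sketch, assuming ideal speed-of-light propagation (real on-chip signals are a good deal slower, as the comment notes) and the quarter-cycle timing budget the comment uses:

```python
C_MM_PER_S = 3.0e11  # speed of light, in mm per second

def max_sync_distance_mm(clock_hz: float, frac_of_c: float = 1.0,
                         cycle_budget: float = 0.25) -> float:
    """Farthest apart two points can be while a signal still crosses
    between them within cycle_budget of one clock period."""
    return C_MM_PER_S * frac_of_c * cycle_budget / clock_hz

print(max_sync_distance_mm(1e9))                 # 1 GHz, ideal propagation
print(max_sync_distance_mm(1e9, frac_of_c=0.5))  # slower, RC-limited wires
```

At 1 GHz with a quarter-cycle budget this gives 75 mm, far less than a 300 mm wafer, which is the commenter's point; realistic wire delays shrink it further.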
@SavageBits7 ай бұрын
Reticle size constrains the size of the largest unique design that can be patterned on the wafer. Those identical designs are then repeated across the entire wafer. Blackwell takes 2 maximum-reticle-size dies and connects them together in the same package. My prediction is that Blackwell's successor will connect 4 maximum-reticle-size dies in the same package. Nvidia's approach is more flexible than the WSE-3, which has massive cooling, power distribution, and defect management challenges.
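For a sense of scale, here is a common first-order estimate of how many reticle-limited dies fit on a 300 mm wafer. The ~800 mm² die area is an assumed round number near the reticle limit, not a vendor spec:

```python
from math import pi, sqrt

def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Classic approximation: usable wafer area divided by die area,
    minus a correction term for partial dies lost at the wafer edge."""
    d, a = wafer_diameter_mm, die_area_mm2
    return int(pi * (d / 2) ** 2 / a - pi * d / sqrt(2 * a))

print(dies_per_wafer(300, 800))  # reticle-limit-class die on a 300 mm wafer
print(dies_per_wafer(300, 100))  # a much smaller die on the same wafer
```

Roughly 60-odd reticle-sized candidates per wafer, loosely in line with the "80-90 dies" figure mentioned elsewhere in this thread for a somewhat smaller die.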
@limbeh33017 ай бұрын
The advantage of staying on the wafer is that you have extremely low latency and extremely high bandwidth between the reticles. First, Blackwell only allows 2 reticles to talk to each other using the 2.5D interconnect (which likely has a larger pitch than what Cerebras is doing). Second, the moment the data has to leave the Blackwell package you'll need to use NVLink, and eventually InfiniBand. This is why you see everyone trying to make larger and larger chips: to optimize the communication between compute elements.
@geordiehawkins73727 ай бұрын
Great insight for this non-techie. Still able to get good info that will help with due diligence before investing. Thanks!
@missunique657 ай бұрын
could you cover the build-out of the newer, bigger data centers? I heard Andreessen talk about them.
@IATotal7 ай бұрын
Thanks a lot for the video!
@AdvantestInc7 ай бұрын
How do you see the role of advanced packaging techniques evolving in response to these scaling challenges?
@chipstockinvestor7 ай бұрын
We think advanced packaging companies have a lot to gain to make it all happen.
@johndoh51827 ай бұрын
Bigger chip = higher defect rate. If the chip is designed to deal with failed parts of the die so it can still get to market (pathways through the chip can be disabled, and the chip specs allow for a certain percentage of the chip to fail in production), then it's not terrible. But a wafer-size chip is a nightmare. Pretty much any wafer that comes off a line has defects; it's only a matter of percentage. The prevailing knowledge is that the smaller you can make a die (chip), the smaller the percentage of failed chips off that one wafer. For instance, if a single wafer is used to make ONE chip AND there is no allowance for failed parts of that chip, then the failure rate is pretty much always going to be 100%, and of course that's not feasible.
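The die-size/yield relationship this comment describes is usually sketched with a Poisson defect model: yield ≈ e^(−D0·A). A minimal sketch, where the defect density D0 is an assumed illustrative value, not a foundry figure:

```python
from math import exp

def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return exp(-defects_per_mm2 * die_area_mm2)

D0 = 0.001  # assumed defect density, defects per mm^2
for area in (100, 800, 46000):  # small die, reticle-class die, whole wafer
    print(f"{area} mm^2: {poisson_yield(area, D0):.2%} defect-free")
```

A defect-free full wafer is essentially impossible under this model (e^-46 here), which is exactly why wafer-scale designs must route around bad sections rather than demand perfection.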
@shannonoliver79927 ай бұрын
GREAT video! I can't believe you continue to produce such great content. Job well done, and a BIG thanks!!
@stachowi2 ай бұрын
this was very good.
@chipstockinvestor2 ай бұрын
Thanks, glad you liked it!
@Ronnieleec7 ай бұрын
What about patent limits? Are semiconductor companies and EDA companies patenting variations, like pharma companies do?
@alan-od8hk7 ай бұрын
Little disappointed that you didn't really cover the Cerebras CS-3 chip and compare it to Nvidia's Grace Hopper.
@RoTelnCheese7 ай бұрын
Great work guys. What do you think of Tesla and their developments in robotics and AI? Their stock value is compelling right now
@ronmatthews21647 ай бұрын
Under $ 140 in a year.
@limbeh33017 ай бұрын
Tesla met with Huang to beg for more GPUs. That tells you how well his Dojo supercomputer is doing.
@rahulchahal38247 ай бұрын
Just SUPER
@lightichigo7 ай бұрын
Can you guys do a video about Groq and how it will impact Nvidia's monopoly?
@владши-о8з7 ай бұрын
Thank you!🌹🌹🌹
@1964juls7 ай бұрын
Great information, love your reviews! Can you review ALAB(Astera Labs Inc)?
@groom844 ай бұрын
Heat and SRAM
@limbeh33017 ай бұрын
How is N5 one and a half generation behind N4?? That's like half a generation behind...
@chipstockinvestor7 ай бұрын
The N4 node being utilized isn't the standard one, but a newer "enhanced" N4
@mach15537 ай бұрын
This is GPU bridging by 2 die stitching & gaining an extremely huge boost in performance!
@pieterboots85667 ай бұрын
One more disadvantage: path length, or wire length. Everybody knows these are all steps toward the optimal full-3D chip, not just interconnects. That will have the highest transistor count and the shortest path length.
@limbeh33017 ай бұрын
Problem with stacking vertically is power delivery and cooling. For compute you can't really stack much because the heat density will be too high to cool. This is why you only see memory being stacked on top of compute.
@pieterboots85667 ай бұрын
@@limbeh3301 Chiplets with interconnects also have this problem.
@GustavoNoronha6 ай бұрын
nVidia isn't ahead of the pack in terms of packaging; the new Blackwell double chip is exactly the same thing as the Apple M1 Ultra: 2 really big chips connected together using TSMC CoWoS. What makes nVidia the leader of the pack is their design, and in some cases the software support. For AI that is not a big deal; CUDA is not as relevant, since people aren't writing to those APIs. They are using things like PyTorch, higher-level frameworks that support all of the major vendor APIs these days, so that is not a big competitive advantage. It would be good to do a deep dive into all the technologies used in the MI300; AMD has been in the vanguard when it comes to packaging. That doesn't mean it gets the win, but it should be a good case study for how all of these advanced packaging technologies work, how they can increase cost effectiveness by reducing the size of the dies that are fabricated (yield), and how they provide a lot of flexibility for product-level differentiation. MI300A is a good indication of what the future holds.
@chipstockinvestor6 ай бұрын
Did you see our fab equipment video? We are planning some more detail on what CoWoS entails, as these are the processes all these chips and systems utilize.
@UltimateEnd05 ай бұрын
MI300A = home supercomputer; Cerebras = commercial supercomputer. They aren't even in the same league.
@GustavoNoronha5 ай бұрын
@@UltimateEnd0 MI300A is definitely not for home computers, the El Capitan super computer being installed right now should take the number 1 spot in the TOP500 super computers list when it's fully installed, and it's powered by MI300A.
@DigitalDesignET7 ай бұрын
@9:15 - 4NP for Blackwell is actually 5nm-class technology; it's not 4nm. That's why people need to understand that this naming no longer tells us anything about transistor density. If I misunderstood, someone correct me.
@chipstockinvestor7 ай бұрын
Sorry but we don't make up the names for these manufacturing processes. It is indeed called 4N, regardless of what the transistor sizes actually are, that's the name of it.
@DigitalDesignET7 ай бұрын
@@chipstockinvestor Thanks for replying; it sure is interesting to understand more about this manufacturing process, as the naming can be misleading about which tech is superior.
@alexsassanimd7 ай бұрын
how can one invest in Cerebras? They seem to be a private company
@chipstockinvestor7 ай бұрын
You are correct Cerebras is private
@limbeh33017 ай бұрын
There are some websites that allow transactions in secondary markets. You might get lucky and score some shares.
@JayDee-b5u7 ай бұрын
Is there a native compiler for numpy to cerebras? If they are doing the latter, Nvidia is just fine.
Would like to hear your view on PLTR (Palantir)... or is this channel only for chips?
@t33mtech597 ай бұрын
Why do the hosts seem AI generated lol. Or just oddly calm and consistent in cadence
@seabassmoor7 ай бұрын
I think the video is chopped up
@MsDuketown7 ай бұрын
Monolithic boundaries.. But smaller calculation units are better. ARM already proved that, and now the explosion of diversification will do the rest..
@GuyLakeman7 ай бұрын
WELL, THEY FRY EGGS TOO !!!!
@camronrubin85997 ай бұрын
Nvidia going to stitch waferscales together 😆
@BaylorbetterthanbrownАй бұрын
Yeah Nvidia can do the same thing 😀
@ARIK.R7 ай бұрын
And also CAMT
@elroy18367 ай бұрын
To paraphrase another reactor to a different review of NVDA's Blackwell, I hope at some point there is some discussion of AVGO's (Broadcom) newly produced ASIC chip with 12 HBM stacks versus the 8 on Nvidia’s Blackwell. While the focus seems constantly directed at the innovation of NVDA, the AVGO solution reportedly provides 50% more performance in an accelerator at the same or lower price than NVIDIA's solution.
@tamasberki77587 ай бұрын
So you guys are telling me those pills I bought on a shady webshop won't make my chip bigger? 😉😃
@tarikviaer-mcclymont57627 ай бұрын
May result in chip shrinkage
@chipstockinvestor7 ай бұрын
😂
@danielstevanoski7 ай бұрын
Co-fee?
@jacqdanieles7 ай бұрын
Ko-fi
@suyashmisra74067 ай бұрын
You were doing okay until you said "these are not perfect conductors, they are semiconductors" Good video otherwise, considering the channel is dedicated more towards people who are interested in stocks rather than the tech itself.
@almostdead95677 ай бұрын
Why isn't liquid nitrogen used to cool these chips? I mean, quantum chips use liquid nitrogen, so why not these big ones?
@chipstockinvestor7 ай бұрын
Power consumption. It takes more energy to cool the chips, in addition to the energy to operate them. A poorly designed cooling system can add a huge expense to a data center's operations.
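The thermodynamic floor behind this reply can be sketched with the Carnot limit for a refrigerator. Real cryocoolers are several times worse than ideal, so these are optimistic lower bounds (assumed temperatures: 77 K for liquid nitrogen, ~280 K for chilled water, 300 K ambient):

```python
def carnot_cooling_overhead(cold_k: float, hot_k: float) -> float:
    """Minimum joules of compressor work per joule of heat removed at
    cold_k when rejecting it at hot_k (inverse of the Carnot COP)."""
    return (hot_k - cold_k) / cold_k

print(carnot_cooling_overhead(77, 300))   # liquid-nitrogen temperature
print(carnot_cooling_overhead(280, 300))  # ordinary chilled-water loop
```

Even at the ideal limit, LN2-temperature cooling costs about 2.9 J of work per joule of chip heat removed, versus roughly 0.07 J for a chilled-water loop, before any real-world inefficiency.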
@bounceday7 ай бұрын
Bigger chips are hotter chips. Why is this not a concern? Is it a lower-energy architecture and a smaller die allowance?
@chipstockinvestor7 ай бұрын
New techniques being used to try and keep those monster chips cool. All in the name of tearing through more data. We have some research in queue on Vertiv (VRT).
@TheBestNameEverMade7 ай бұрын
That last point is not correct. Per unit of compute, the Cerebras system uses less power because you need less extra equipment to do the same thing. Did you not research the numbers?
@chipstockinvestor7 ай бұрын
Uh, we don't recall attacking Cerebras and we certainly didn't say that it used more power. What we did say, was that it is possible that a bigger chip may have an increased total cost of ownership. We gave 5 reasons why it is a challenge to make bigger chips work. Did you not watch the whole video? Context is important.
@TheBestNameEverMade7 ай бұрын
Thanks for responding. I did. Go to the section where you talk about TCO, 16:05. I know you said "might," but it doesn't, because when you have 60x as much compute and a huge amount of memory on chip, there is less power in total, even if there is more power per chip for cooling etc. Also, Nvidia needs dozens of chips for communication to do the same as one chip. Communication is much cheaper in power usage when it's just baked into the chip.
@kleanthisgroutides71007 ай бұрын
My issue with Cerebras is them being adamant that they get 100% yield, which is of course BS... they will not disclose how much of the wafer is actually bad/defective. As for power, they are not lower power when running at full tilt on a normalized process... yes, there is an architectural advantage for lower power, but in the grand scheme of things it's not significant. Transistors are transistors; they need to switch, hence consume power.
@UltimateEnd05 ай бұрын
Except that the Cerebras CS-3 uses 200x less energy than the fastest supercomputers currently operating in the world.
@kleanthisgroutides71005 ай бұрын
@@UltimateEnd0 15-25 kW is not low power... and there's no comparison to a supercomputer, since it's not apples to apples.
@christian152137 ай бұрын
Doesn't this all lead to the push for quantum?
@noway82332 ай бұрын
Cerebras is huge; the Oompa Loompas are gonna suffer
@BaylorbetterthanbrownАй бұрын
It's privately owned, so it's not for the public
@anahitaaalami90643 ай бұрын
so….. is cerebras a threat to nvidia or not?
@chipstockinvestor3 ай бұрын
No probably not
@BaylorbetterthanbrownАй бұрын
Yes, yes they are
@BaylorbetterthanbrownАй бұрын
I'm getting in early 😅
@RedondoBeach27 ай бұрын
Why.... do.... you..... talk.... like.... robots?
@anahitaaalami90647 ай бұрын
Intel
@johnsands66525 ай бұрын
When will cerebras go public?
@themusic68083 ай бұрын
Sounds like they're planning for an IPO in October
@godfreycarmichael2 ай бұрын
These people are AI generated. Show me your hands!
@MichaelMantion7 ай бұрын
Just skip to 12:44; this video was such a waste of time I think I will unsub.
@ARIK.R7 ай бұрын
Camtek (NASDAQ:CAMT) said Monday it has received a new order for about $25 million from a tier-1 HBM manufacturer, for the inspection and metrology of High Bandwidth Memory.