Why Hybrid Bonding is the Future of Packaging

Views: 117,832

Days ago

Comments: 354

@DeltaNovum a month ago

I have zero practical use for the information provided in this video, but I still enjoyed it very much. The way you explain everything, and your voice paired with incredible visuals and animations, made for a really easy-to-understand and entertaining watch!

@vladislav_artyukhov a month ago

One practical use is the ability to confidently navigate through marketing confetti when choosing a new system.

@dianapennepacker6854 a month ago

I just watched someone freaking put solder on a chip pretty evenly in what looked like a minute or two. Will I ever have to spread 50 to 100 beads across a microchip? Never. Yet it was so nonchalant that it made it wild. I am only a minute into the video. Didn't know people did that. Thought those things were burnt into the other side. So basically a minute into the video and learning already. Kkk bye.

@m_sedziwoj a month ago

Wisdom is built on knowledge, so it is good to know a lot of things, even if they are not directly useful.

@ApocDevTeam a month ago

The fact that the regular Zen 5 CPUs were mostly received as being very disappointing, but the 9800X3D is flying off shelves like crazy, should tell you how valuable this tech can be for the right applications. Being able to cram all that extra cache in there really makes these CPUs shine in games.

@osopenowsstudio9175 a month ago

Maybe Zen 5 is basically a revamped X3D, but hey, it's still confusing why the normal Zen 5 sucks so bad

@johndoh5182 a month ago

@@osopenowsstudio9175 Two things. First, Windows 11 has issues that affect performance, most of which have been fixed; if you don't keep up with tech news you'll miss this kind of data and get stuck believing something that isn't true. Next, a LOT of reviews were done with the 9700X, and this was the typical CPU used to judge Zen 5 game performance in general, since that's about the most any game can use. But the 9700X is a 65W TDP part. AMD, I believe, has released a new AGESA that allows it to clock a bit faster, and it does so very easily. So Zen 5 doesn't suck bad; in fact it's an excellent CPU generation. If all the CPUs were released NOW instead of when they were, the day 1 reviews would look much better, other than for the 9700X, unless AMD changed that to a 105W TDP part like it should have been. It easily handles running at that power scheme. Oh, except Microsoft STILL would have had issues in Win11 affecting Zen 5, because they weren't going to fix it until a few million consumers started screaming at them. It's been tested running higher power schemes and it does a bit better than when it's set at 65W. Having said that, Arrow Lake also didn't look good at launch, and it seems to have the same problem in that Windows doesn't handle it well. Shocking, huh? You mean a Microsoft OS has issues that affect the performance of these X86-64 CPUs? Gee, I wonder if it's their intention to make those look worse than they are, since after all they're selling their own devices now using ARM processors and they want people to buy THEIR hardware and lock people into a Microsoft ecosystem very much like what Apple has. For more info, watch the different Level1Techs videos dealing with this issue. It seems Zen 5 runs like you'd expect under Linux. Zen 6 is GOING to use Zen 5 cores with maybe one or two changes, so apparently AMD thinks their Zen 5 cores are perfectly fine, but I think Intel and AMD should partner and put out a professional version (distro) of Linux.

@Concorde1059 a month ago

@@osopenowsstudio9175 what I'm reading is some folks believe that the current IO die (memory bandwidth?) is bottlenecking the CPU and preventing a lot of the real gains in the core from actually showing up. The 3D cache masks a lot of that

@multiplyx100 27 days ago

@@osopenowsstudio9175 Zen 5 is still better than Zen 4. It doesn't improve every aspect of performance, which is to be expected. Zen 4 was the best and now Zen 5 improves on it. People wanted more than they got, which was unreasonable, but, at the end of the day, Zen 5 is unsurpassed in x86 performance. Also, of course, there was the power usage reduction to take into account, which many didn't care about.

@osopenowsstudio9175 27 days ago

@@multiplyx100 Fair enough, but Zen 5 is kinda tripping balls when it is so barely faster than Zen 4 (on Windows) that even AMD is surprised.

@kuekejuu5057 a month ago

The possibility of moving I/O into the cache chiplet and then hybrid bonding it with the core chiplet might explain why Zen 5 still uses the old I/O die: they still need to cook it before making sure it's ready. Honestly, I think they really need a new I/O die for future Zen CPUs; even the current one seems to be held back by the I/O die. Integrating I/O into the cache chiplet could give them benefits like lower idle power consumption, higher memory support, lower CCD-to-CCD latency, etc.

@RobBCactive a month ago

In "client", AMD has marketed monolithic chips with good battery life and lowered idle consumption, especially when sleeping; the plans are for a Strix Halo using CCD chiplets and a beefy IOD/GPU. What does Intel have that's better than Zen4? Why would AMD divert resources from Zen6 to make a Zen5+ when they could re-use the cheaper EPYC platform edge computing variant for a new prosumer Threadripper range with quad-channel memory and loads of PCIe lanes? The current cheap IOD and chiplet architecture is already being replaced in Zen6; as was mentioned, MI300 and Ryzen X3D are proving the technology, which should be ready and scaled up for the performance desktop market launch. Perhaps Zen5 replaces the low end and Zen6 will initially be a high-end product only. As general-purpose compute is losing relative importance, we may see core, GPU and NPU chiplets stacked on an IOD/cache interconnect in future.

@bobo-cc1xw a month ago

The mild leak, if you believe it, has a silicon interloper. That would be an easier intermediate step. That approach would not work for Strix Halo, and with Nvidia entering the mega-APU game that would be a concern. Making laptop chips chiplet-based with a low-power interloper seems a logical intermediate step.

@RobBCactive a month ago

@@bobo-cc1xw You mean an interposer, but that was a replacement for an organic substrate with solder bump connections and wires inside, known as Infinity Fabric. They already have hybrid bonding working in V-Cache and stacked MI300, but you need ways to connect up the tiles of larger server chips, which are planned to offer more than compute, such as signal processing for telcos. Without a 2.5D off-die interconnect, the silicon area will be severely limited by the reticle limit.

@shadow7037932 a month ago

Unlikely they'll be able to do this due to thermal constraints.

@LeonardTavast a month ago

it would be interesting with an IOD/SRAM/interposer combo but it would use a lot of die area and may have issues with yields and thermals. The latest processors from Intel are close to this idea though.

@NikilanRz a month ago

This generation demonstrates that AMD was working hard to push boundaries over the past 10 years, and at the same time shows that INTEL was WASTING THE LAST 10 YEARS and trying to scam everyone by selling the same chip every year.

@TuxikCE a month ago

AMD was also trash at one point and lacked innovation. This is why Intel wasn't pressured to compete. You can't blame Intel if there was no competition. Now they are working their ass off, innovating their fabrication.

@SupraSav a month ago

Intel has been such a disappointment. I used to be an Intel fanboy, but after their issues over the last year and questionable business practices... AMD needs competition.

@TuxikCE a month ago

@@SupraSav AMD has competition. Intel is not that far behind. Just wait for Panther Lake with Intel 18A and Xe3.

@monad_tcp a month ago

Yep, I have a 5820K and 10 years later, unless I get server CPUs, nothing much has changed; they put in more cache and that's it. Actually, it also got worse. That E-core idea they took from ARM isn't good (it's good for battery-powered devices, not workstations). I want symmetrical huge cores, not smaller cores to save power. It doesn't even solve the dark silicon problem. Maybe it's for "ESG". Like TVs being stupid slow because they can't use more than 40W in the Soviet, I mean European Union. But I digress.

@jrherita a month ago

Sort of. Intel packaging is actually a bit more advanced than TSMC. Intel isn't fully taking advantage of it on the CPU side yet.

@eruiluvatar236 a month ago

The increase in electrical resistance from having the cache die below the CCDs may not be that bad, or even bad at all. It is true that the distance will be a bit higher, but that can be offset by more TSVs in parallel. Another potential improvement is that at very high frequencies, current flows mostly in the outer layer of conductors, increasing the apparent resistance of thicker ones. I would have to run the numbers to see if this is significant here. The skin effect doesn't apply to DC, but even if the input voltage to a CPU is DC, the current flow isn't. Each time a transistor switches on or off there is a change in current, and that happens a lot at multiples and fractions of the clock. I don't work in the industry, so I don't know how bad that current ripple is, and I have not run any numbers to find the skin depth at the required frequencies, but I wouldn't be surprised if it plays some role, as transistor switching itself can be much faster than the CPU clock and there are harmonics too from using square waves.
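
For anyone who wants to rough out the numbers this comment mentions, here is a minimal sketch. It assumes plain copper at room temperature (resistivity of about 1.68e-8 ohm·m), uses the standard good-conductor skin-depth formula δ = sqrt(ρ / (π f μ)), and treats parallel TSVs as identical resistors. The function names and the example frequencies are illustrative choices, not anything stated in the video or the thread.

```python
import math

MU_0 = 4 * math.pi * 1e-7   # vacuum permeability, H/m
RHO_COPPER = 1.68e-8        # assumed copper resistivity near room temperature, ohm*m


def skin_depth(freq_hz, resistivity=RHO_COPPER, mu_r=1.0):
    """Skin depth in metres for a good conductor at the given frequency."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0 * mu_r))


def parallel_resistance(r_single, count):
    """Combined resistance of `count` identical conductors in parallel."""
    return r_single / count


# Example frequencies are assumptions: a clock-like 5 GHz plus a slower
# and a faster component standing in for sub-harmonics and harmonics.
for f in (1e9, 5e9, 25e9):
    print(f"{f / 1e9:5.1f} GHz -> skin depth {skin_depth(f) * 1e6:.2f} um")
```

With these assumptions the skin depth comes out at roughly 2 µm at 1 GHz and a little under 1 µm at 5 GHz. Whether that matters depends on the actual conductor dimensions, which the thread does not give.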

@mikebruzzone9570 a month ago

"Skin effect", Industry terms; "magnetics", "field effects", cohesion, resistance, repulsion there is definitely analog happening here in the glue and the harmonics regulated not to be destructive. Well-articulated and easy to comprehend, thank you for arousing my thinking on the topic. mb

@kazedcat a month ago

AMD uses an integrated inductor coil in the metal layer. This is a technology they introduced during the Bulldozer era. The inductor smooths out current fluctuations in power delivery.

@doggSMK a month ago

So Bulldozer was not 100% tragedy... 😆

@kazedcat a month ago

@@doggSMK It was a flop, but a lot of the technology they developed during that time is still being used today. For example, Infinity Fabric uses a physical layer called GMI, which was developed during the Bulldozer era.

@mikeb3172 a month ago

Switching frequencies high enough to cause a problem, eg in PCIe, are always accompanied by GND or the inverse frequency... so there's no problem.

@davidgunther8428 a month ago

A hybrid bonded chip also has better thermal conduction between the layers than solder bumps would.

@19CD91 a month ago

Yeah, that's cool and all, but have you tried super glue?

@nameeman207 a month ago

also add cotton wool and baking soda for better adhesion

@usmcp 15 days ago

Reddit leave

@KnowledgePerformance7 a month ago

Thank you for these amazing videos! They have gotten me very excited about chip technology. I currently work adjacently in wearables but having learned about all this amazing stuff, I'd love to consider a second career in chip technology.

@HighYield a month ago

There's some cool chip tech in wearables.

@wveerallday20 a month ago

@@HighYield Ever considered doing a video on something like the Apple Watch S chips (or something similar)? I imagine there have to be some interesting choices and compromises in the interest of efficiency and space.

@spartanfoxie a month ago

3:34 I want my cpu to be glued together with tiny burgers

@gurukarthikc7870 7 days ago

After continuously learning about the semiconductor industry, I can say for sure these blokes have the best sense of humour!

@RaymondDoerr a month ago

Thanks for giving a good background of hybrid bonding. I am so proud to be working in semiconductor packaging.

@e2rqey a month ago

Been waiting for this video!!! Watching it on my 9800X3D 😁

@HighYield a month ago

Can you ask your 9800X3D if it still has a support silicon on top? xD

@pedro.alcatra a month ago

@@HighYield LOL

@TheEVEInspiration a month ago

Does the video play faster?

@Onkar-i4s a month ago

@@TheEVEInspiration 340 fps 😂

@gustavo_vanni a month ago

I was expecting more layers of cache to be stacked. The flip to the bottom was a nice surprise. I knew they would probably do that in the future, just not so soon.

@AbdullahDeveloper-nw9cu a month ago

Sir, I'm a big fan of your channel and addicted to your content. The way you explain CPU technology and architecture with high-definition visuals in such a detailed and easy-to-understand manner is truly inspiring and addictive. I'm a beginner who is self-studying CPU architecture from the basics. Recently, I've been very curious about the MOS 6502 architecture, but I haven't found any video that explains it as well as you explain other architectures or CPU technologies. I kindly request you to make a video on the 6502 architecture. It would be incredibly helpful for me and many others who are eager to learn.

@kevikiru a month ago

An additional thing: on the question of whether AMD will combine cache and IO, it would make sense. It seems that they did not change the IO chiplet architecture, and it is the Achilles heel of base Zen 5 (non-3D). The challenge would be that at higher core counts, 24 and up, you would need a couple of IO dies just to accommodate the numerous CCD chiplets, even with a higher density of cores per CCD. For example, even if you combine the cache (the size of one chiplet) and the IO die (the size of about one and a half CPU chiplets), it would only accommodate two, perhaps three, CPU chiplets. Maybe they will use this method for consumer CPUs but have a separate 'IO die for other IO dies' in Threadripper and EPYC CPUs. That is cumbersome, but it might be worthwhile for the benefits.

@RobBCactive a month ago

Non-consumer parts have a much larger IOD offering far more memory channels and the IF links for far more chiplets. It would be desirable to re-unify the L3 cache between dual CCDs, and hybrid bonding appears to make that possible. On moooaaaahhhhhrrrr cores: L3-less chiplets leave room for that, perhaps with a larger L2 cache reducing the frequency of trips off die. Zen6 is AM5; the platform is fixed. As you observed, V-Cache solved some constraints on performance that the 9700X reportedly suffers from regarding memory latency and bandwidth, not fully feeding 2t per full-fat core. Considering a hybrid of full-fat and dense cores: as it stands, little die area reduction would be gained once L3 is off die, so while 6+6 or 8+4 might seem attractive, a chiplet with heterogeneous cores creates binning problems for less gain than that seen in analysis of the mobile CPU, which re-laid out merged function blocks, allowing them to use the freed empty space created by the dense cores.

@tsclly2377 a month ago

Cache initial access can be of a reduced speed by a factor of 2x (L1) to 4x (L2), as the cache is usually in a dump mode. Direct access out to the DRAM is still a multi-step (25-30 cycles) mode that can make many clock cycles shame the whole concept of the multi-gigahertz process. Fast is on the silicon die, pretty fast is on a bonded chiplet... (and so on)...

@RobBCactive a month ago

@@tsclly2377 part of the reason for the slower access is when you go from L1 using virtual addresses and translated physical addresses.

@b1lleman a month ago

This video has answered all the questions I had about Zen 5's X3D packaging. Thank you.

@winstonsmith478 15 days ago

Just incredible that nm scale ICs can be made in the first place, then made even more incredible that LAYERS of them can be properly aligned and then electrically bonded together.

@osor_io a month ago

Literally forging the chips together. That’s so neat 😊

@davidgunther8428 a month ago

The interesting thing to me is what do the regular Zen5 CCD to package connections look like? For X3D the cache chip has the solder bumps. Where do the solder bumps go on the CCD when it's the only chip?

@HighYield a month ago

You know what, that's a great question I completely missed! My guess would be that the upper metal layers have to be different. But that would be a big BEOL change.

@kazedcat a month ago

You can do copper to copper bonding with an organic redistribution layer.

@davidgunther8428 a month ago

@kazedcat I didn't think fabrication with organic substrates had the precision needed. Also, it would only be the copper pads bonding, the oxide and organic layers wouldn't meld. The bond would be fragile.

@kazedcat a month ago

@@davidgunther8428 Use a different bonding substance for silicon to plastic bond. Precision is not necessary for non-cache connection. Just design the IO and power via to have enough spacing and put the cache vias in a separate region.

@draco10111b a month ago

How about they cluster the TSVs? The density can only be as high as the microbumps on the cache chip anyway. I wonder if a modified microbump process could connect with 4 copper connects.

@marce.fa28 a month ago

I really missed this YouTube channel. Excellent graphics as always! I love it! A lot! And… ☺️😉

@zhenle777 a month ago

Excellent video! From the AMD engineer's comment it does seem like they are still using two reconstituted wafers, but it would be interesting to do a cross-section SEM elemental analysis to try to see whether the oxide layer exists.

@SaccoBelmonte a month ago

Ohh, laser and plasma dicing! I knew there was something more sophisticated than sawing.

@GustavoNoronha a month ago

Awesome work as usual! Thank you for the great information, and the reading recommendations =)

@rjmaas a month ago

Every time you post a video, I learn something new. Thank you so much for all the hard work you put into this.

@TheBackyardChemist a month ago

Called it! In a comment under an earlier video.

@protox07 a day ago

Have a happy new year High Yield

@timparker9174 a month ago

Loved the video. I was wondering if you were planning a deep dive into the M4 chips like you did for the M3 chips. That was very interesting and would be great to see that. Cheers!

@HighYield a month ago

I'm currently still waiting for die shots to appear. As soon as I have something to work with, I'll make a video!

@ThatTrueCJ201 a month ago

Anecdotally, undervolting the 1st-gen 3D V-Cache CPUs had the greatest effect on performance, which makes sense given the many thermal barriers in the chip hierarchy.

@artemhnilov a month ago

Your videos are the best in this field.

@Takashita_Sukakoki a month ago

Unsure if the added complexity of unifying the IO die with the chiplet/cache is something we will see in the next consumer Zen, but it seems like a logical approach in the near future.

@JeffBartlett-kj6sq a month ago

It used to be the case that creating logic structures in a process optimized for memory required a lot more area than in a process optimized for logic. My knowledge is from about 15 years ago, from a layout engineer who was working on the 'on die' calibration circuit of an airbag sensor.

@LCTRgames a month ago

I wonder if the reason we have seen unexpected variations of the 5800X3D (5700X3D, 5600X3D) is due to the final binning of the resultant hybrid bonded chips? Perhaps they saw an increased defect rate and that was another reason they chose to go a completely different route for 9800X3D?

@kevikiru a month ago

Hello good friend @HighYield. It's always awesome to see you do analyses on these chips...and as I said it before, on such a niche subject for most people. Honest question: How are you able to fund all this? I know you have Patreon, but that would not be enough income for you, it would barely pay for the original research!

@Daonexus a month ago

I reckon it is part patreon and part enthusiasm

@HighYield a month ago

There's a reason it sometimes takes weeks for me to release a new video: I work a normal job. YouTube is just my hobby. I'd love to focus more on it, but it's difficult to make the switch.

@b1lleman a month ago

@@HighYield Well, for a hobby it's really exceptional quality!

@shahlegno5890 a month ago

I don't understand most of the microchip acronyms, but it was interesting to learn more about how this super-duper tiny thing is built.

@isaacpiegat6735 a month ago

got all giddy and started kicking my feet up when i saw there's a new vid

@MrJacker1991 a month ago

Great video like always!

@KeinNiemand 25 days ago

Now they just need to add a second cache chiplet on top and make a sandwich

@andersjjensen a month ago

I think they are shooting for a 3-layer approach, which makes V-Cache "mandatory". Memory controllers and PCIe lanes scale the worst with node advances, so those go on the bottom layer. The middle layer will have L3 cache and the various little "household knickknacks" like integrated graphics, voltage rectification, sensors, etc. (and in the case of laptop chips, basically the entire chipset/SoC functionality). The top layer will have the compute chiplets, which only have L1 and L2 cache. If Zen 6 does this it will probably be N6 -> N4 -> N2 (or N3X).

@n.shiina8798 a month ago

Won't having the GPU sandwiched between them create a thermal problem? I mean, the heat will need to go through the top die, and we all know a GPU is quite a power-hungry device. Intel's Foveros seems to be a much better approach, since each die can dissipate heat at more or less the same thermal resistance.

@andersjjensen a month ago

@@n.shiina8798 An IGP is generally a fairly tame beast. The one on Zen4/5 pulls 7W maximum. Obviously AMD will need to make other considerations if they want an APU like the 7840/8840 family of laptop chips with integrated graphics powerful enough to do light gaming.

@andytroo a month ago

One thing i haven't seen is that the cores are the main heat generators - so having the package go Very hot core - warm cache - cool substrate would be less thermal/mechanical stress than the vcache (forced to be core temp)-core-substrate

@TrueThanny a month ago

Right now, AMD doesn't have a wafer bottleneck, but a packaging bottleneck. The only way I see things going the way you suggest, with I/O and cache put on a massive base die (which would need to have room for a large number of CCD chiplets on top) is if TSMC drastically increases their packaging capacity. That's not impossible, but the limited supply of the rather early X3D release shows it's far from a current reality. The cache on bottom did surprise me, but mostly because I didn't know it was essentially the same process. The only real issues were engineering at the front end. I thought being on the bottom would require more advanced packaging, which competes (in terms of allotted time) with the MI300 family of products that AMD is pushing hard to cash in on with the neural machine learning bubble.

@Ashaegen 15 days ago

Great content! Thanks for your work!

@BrandonMeyer1641 a month ago

My reaction to hearing the cache was on the bottom was one of disbelief. Nobody talked about it, and it was made to seem like no big deal from a complexity and manufacturing standpoint. Having TSVs that go through the cache die to connect the CCD to the substrate for power and data, along with the TSVs that connect the cache to the CCD, is drastically more complex than the way 3D cache used to work. It makes sense why the cache die is the same size as the CCD: it's to fit all those TSVs. The word is that AMD is moving to silicon interposers for Zen 6 to connect the IO die to the CCD and V-Cache. There is talk of an increased level of modularity coming with this change, which will lead to cost savings as there will be fewer bespoke designs for CCDs. Rather, they will use a common interposer to connect CCD to IOD, and the IOD will become bespoke based on the product, i.e. a special IOD for various client and commercial products. This is an interesting move alongside AMD's move towards a unified graphics architecture, UDNA. Perhaps we will see the next generation of an MI300-type product that is much less costly to manufacture due to the modularity of forthcoming products. Imagine a server product with a common silicon interposer connecting all CCDs with 3D V-Cache and UDNA dies to the IO. Pretty cool.

@_tsu_ a month ago

Imagine Zen 7 APU with a single unified CPU chiplet, GPU chiplet, and shared memory chiplet. It would be next level.

@notaras1985 a month ago

I do. My PC will be 13995X3D with 6090ti.

@samuelsmith9582 a month ago

This guy really enjoys talking about amD's Packag(ing). Honestly mad interesting. I'm curious what can be done to reduce thermal issues with more stacked layers.

@billwhoever2830 a month ago

So how do you bond two delicate electrical components? TSMC: we heat them until all of their conductors melt and fuse together

@undertone2472 a month ago

You are so good at breaking down how packaging is done. I don't work in the silicon industry professionally, but I find it interesting; thanks for making it palatable for me. I also think it would be awesome if they indeed combine the IO and cache die, dealing with the latency. Note that the current approach allows them to have two different CCDs, so it will be interesting to see how they deal with 16 cores. Maybe a 16-core chiplet for consumer 😊

@andrasrudnai9386 a month ago

I think it may soon be time to see the Butter Donuts in action!

@spodule6000 a month ago

I thought this was going to be boring but I was wrong.

@HighYield a month ago

At some point during the editing, when I have seen the video too many times, I always start to think it will suck. And I'm happy when it doesn't :)

@doggSMK a month ago

For Zen 6 AMD would have thick cache on the bottom with TSVs and I/O on it and 12 core thin CCDs on top. So cache and I/O die will also be the support for the thin 12 core CCD. Also the 12 core CCD may have more L2 cache and no L3 as it could be all moved to the cache chiplet. This way cores get optimal cooling.

@ole7736 a month ago

Really great content, keep it up!

@budthecyborg4575 a month ago

3D Stack IO would solve the memory latency problem. But then we do already have monolithic Zen5 chips (every laptop CPU) and they don't seem to perform massively better than chiplet Zen5.

@cahdoge a month ago

It's not about the performance per se, but about being able to use older tech for the IO and being able to bin multiple processing chiplets for your SKUs, making the whole package significantly cheaper and reducing the massive die-to-die-latency we experience on Ryzen desktop today.

@ococoseco a month ago

16:30 How do they thin the bottom carrier wafer down without damaging the transistors? If the solution is to leave a bit of carrier wafer, then that carrier wafer would also need to have matching power delivery structures in place, right?

@UJ-nt5oo a month ago

Great video. I'll make sure to use hybrid bonding in the next cpu i design instead of the dumb way i do it now.

@JoaoBarbosa1996 a month ago

"Base tile" IO with cache + two CCDs + backside power would be nuts

@maynardburger a month ago

Seems to me there's a lot they can theoretically do going forward, but it's hard to guess because of issues with costs and scalability. Basically what makes practical sense to do for consumer products, without increasing costs too much, that they can also apply to big core server CPU's while still retaining the scaling and reusability of the chips for both purposes. Gonna be a balancing act there unless they want to go the route of creating entirely separate dies for these things.

@tuckerhiggins4336 a month ago

Chip glue go brrrrrrrrtt

@RobBCactive a month ago

At last, a good explanation of what Lisa Su announced way back when she talked about bumpless and micro-bump interconnects and nobody had a clue how troublesome X3D would prove to rivals. How good were sales of Milan-X/Genoa-X? I am wondering about Turin-X: the new IOD there offers much faster memory compatibility, but Zen has always had the issue that 2c would max out a CCD's bandwidth, so for some applications V-Cache was killer, allowing massive scaling within an 8c/16t CCD. Maybe the newer massive-bandwidth AI-aimed accelerators removed the memory limitations which kept calculations that exceeded VRAM off GPUs, and so Turin-X is not a priority market compared to gaming. OTOH, the Turin-X delay could simply be a product of phasing; after all, they have Turin dense, and Turin-X customers are likely to be Genoa-X ones, so a delay may mean CPU upgrades down the line. Using 4nm, one would expect the small chiplets and V-Cache dies soon won't need pairing for known good matches. Given 32MB L3 is standard across all Zen CCDs, it must have built-in redundancy (I never saw 5-core cheaper 30MB models knocking around). Perhaps some screening of the wafer using visual recognition could estimate likely wastage so both approaches could be used together. But it could simply be an artefact of fab procedures: known good dies were always the input to hybrid bonding, not wafers, and scaling up to a different method is on some long optimisation TODO list.

@Media-h4p a month ago

Very educational, thumbs up!

@Skillnoob_ a month ago

I kind of expected AMD to solve the thermal issues and allow for higher clock speeds, because that was the biggest thing holding X3D back. I did not expect them to put the cache on the bottom, though, but it paid off well: the 9800X3D is the current gaming king and, as seen in the GamersNexus stream, it is an overclocking beast as well

@John.Philip.Tan876 a month ago

Interesting. I learned a lot of new stuff from your videos as always. But they are using glass interposers in the future right? What if it's not just in the substrate and it could be used for bonding layers as well?

@davidthiel483 a month ago

Yes, the bonding of the CCXs to the die/interposer/cache/I/O/GPU that can be on a less expensive node where the components on the "interposer" layer/die wouldn't benefit from the same scale as the tiny logic CCXs on top and yet disperse the heat that the interposer layer doesn't produce... This has to be the thought here.

@VADemon a month ago

It's apparent that an optional cache on top is easier logistically. But what replaces the bottom cache die when there's none? The CPU still must meet structural and height requirements, and the power still must flow to the CCD part. Is there always a dummy layer, or are they able to move the CCD layer to the bottom for non-X3D?

@einekleineente1 a month ago

Any news on backside Power delivery?

@HighYield a month ago

Coming with Intel 18A and probably TSMC A16 iirc.

@einekleineente1 a month ago

@@HighYield Great. Thanks a lot! so that should be 2027 for the customer

@BellJH a month ago

@@einekleineente1 2025

@sqr00t 2 days ago

I've enjoyed this deep dive very much and it has reminded me of the joys of my undergraduate (Materials Science, which I have moved far away from hardware + packaging tech since working in Data instead). I wonder if the reconstituted wafers can use silicon carbide wafers for better thermals and structural strength?

@wahdangun a month ago

now AMD can just make core without l3 cache and just add them at the bottom, so chip yield will be higher and we can get bigger l3

@josuemartinez2660 24 days ago

love your videos very informative

@timt.2764 a month ago

Insanely informative video. Current Zen 5 X3D chips must be chip-to-wafer; the evidence for that is their success-to-failure rate.

@leorickpccenter a month ago

I actually expected AMD to move the 3D V-Cache below the CCD when the 5800X3D came out, because it's the only move that made sense to me. Remember, this is a pet/side project that got lucky. Thus, the layout of the CCD did not have any 3D V-Cache in mind at the time.

@essentials1016 a month ago

Maaan, your videos really are something else!!

@devindodge8648 a month ago

This is genius. Trust AMD to come up with every good idea, and other companies will copy them.

@MacGuyver85 a month ago

Excellent explanation, thank you! Will be interesting to see their next move, the IO Die is clearly holding them back in Zen 4 and Zen 5, so Zen 5+ or Zen 6 should have major changes in that regard, no matter if it becomes part of the stack or just renewed.

@notaras1985 Ай бұрын

When are Zen 6-7 coming?

@MacGuyver85 Ай бұрын

@@notaras1985 2 years for each generation is a reasonable assumption so Zen 6 would be 2026, Zen 7 2028.

@johndoh5182 Ай бұрын

No, the IOD is NOT holding their CPUs down. The type of interconnect is holding them down. AMD is SUPPOSED to move to direct connects between chiplets, so you end up with what Intel calls tiles, where each chiplet can be pushed up next to another. If you can move to direct connects, you don't need a parallel to serial conversion just to send data from the CCD to IOD or the other way around although most data on the IOD is probably in serial form. But as a for instance, the CCD has to write data to memory. The cores first have to convert data to serial, then transfer that data over the CPU PCB, then into the IOD (at BEST a serial transfer clock rate of 3GHz for Zen 4 and 3.2GHz for Zen 5). There's already a lot of latency just from that. With direct connects, no need for a data conversion, data moves parallel between chiplets, or that's the way it should work. The transfer clock speed will probably bump up to 4+GHz, so about 25% faster ADDED to the removal of latency from not having to do data conversions. But back to that poor data that got sent to the IOD to store to memory, it has to go into the infinity fabric multiplexer for it to be directed to the correct place, and that multiplexer takes up a lot of space on the IOD. AMD should be able to get rid of that moving to direct connects. So then after the data moves through the multiplexer it can THEN be sent to the mem controller. AMD is using a cheap way to connect die right now which is understandable because they have to compete with Intel and when Zen 2 came out which is when AMD moved to MCM, Intel was the ruler of the world of X86-64 and AMD had to price products under what Intel did. They've kept that same interconnect using the IOD and Infinity Fabric through Zen 5, because it costs less. But that's changing and AMD should be able to move to direct connects between the CCDs and IOD so that whole notion of IOD holds anything back is just not true. 
It's the connection speed of the Infinity Fabric, along with data conversions, that holds back the CPUs, but it didn't so much for Zen 3. It started to pretty clearly for Zen 4, when AMD was able to push core clocks a bit higher. In fact, moving ANYTHING off the IOD and onto a die that sits below the cores means you now have to make the entire product line more expensive, as they would ALL now have to have stacked dies, and AMD doesn't do well when their products are more expensive than Intel's unless they are CLEARLY better. But OEMs won't use those parts to build PCs and laptops because they say they can't price AMD-based products higher than Intel ones. So simply changing the type of interconnect to something slightly more expensive is by far the better option for AMD. They don't exist in a bubble, and most of the market still sees AMD as a budget option, probably even you.
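For anyone checking the "about 25% faster" figure in the comment above, here is a minimal Python sketch of the arithmetic. The 3.0 and 3.2 GHz values are the commenter's stated transfer clocks for Zen 4 and Zen 5, and 4.0 GHz is the commenter's guess for a direct-connect design, not a confirmed AMD spec; the function name is made up for illustration.

```python
def clock_gain_percent(old_ghz: float, new_ghz: float) -> float:
    """Relative increase in transfer clock, in percent."""
    return (new_ghz - old_ghz) / old_ghz * 100

# Clock figures taken from the comment above; 4.0 GHz is a guess, not a spec.
zen5_gain = clock_gain_percent(3.2, 4.0)  # roughly 25%
zen4_gain = clock_gain_percent(3.0, 4.0)  # roughly 33%

print(f"vs Zen 5 (3.2 GHz): {zen5_gain:.1f}% faster")
print(f"vs Zen 4 (3.0 GHz): {zen4_gain:.1f}% faster")
```

This only covers the raw clock bump; the latency saved by skipping the parallel-to-serial conversion would come on top and is not modeled here.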

@notaras1985 Ай бұрын

@@johndoh5182 So what should we expect from the Zen 6 and 7 architectures?

@MacGuyver85 Ай бұрын

@@johndoh5182 So what you're saying is that the IO Die is holding their CPUs back in Zen 4 and 5, got it. (The interconnection is part of that setup, changing the interconnection is changing the IO Die, so would removing the Infinity Fabric multiplexer) Still, appreciate the explanation, thank you. Not sure, seems to me that having the IO Die in the stack with TSVs is cheaper than having a silicon interposer on which you place the IO Die and the chiplets, but it's possible the interposer would be cheaper than the stacking process. Someone else mentioned the Infinity Fabric on High Performance Fanout that they used in RDNA3 as another option. That probably sits somewhere in-between the cost and benefits of stacking/interposer and the current setup. And no, AMD has been the premium gaming option since at least Zen 3 X3D, and depending on workload the premium multi-threaded option too. You know what they say about assumptions buddy.

@SasquatchsCousin33 Ай бұрын

Your theory about future I/O die integration makes sense for consumer/mobile CPUs, and it might explain the stagnation on AM5 chipsets. But PCIe can be power-hungry, so I'm not sure an external chipset can be avoided.

@petrsehnal7990 19 күн бұрын

How do they manage to heat two chiplets to 150 or 300 degrees without burning the transistors?

@AK-vx4dy Ай бұрын

If the difference in cost between wafer-to-wafer and chip-to-wafer is big enough, it could be justifiable to leave space around the smaller chips on the wafer, especially if the required final ratio of chips is 1:1.

@LexBarun Ай бұрын

4:47 Cleaned, processed in vacuum, pressed against each other... Is it... cold welding?

@johnsch8634 Ай бұрын

I wasn't surprised by the move because I've heard about AMD engineers complaining about the heat for some years now.

@vanqy. Ай бұрын

Damn, I can't stop watching these super interesting engineering videos. What's the usual path to becoming a silicon engineer?

@betag24cn Ай бұрын

The only reason I see for AMD to put IO and extra cache under the chiplets is to leave space for a GPU, NPU and possibly ARM cores in the near future.

@Gastell0 Ай бұрын

It seems that this technology can also be used when bonding with glass substrate

@shotgunenvy2657 Ай бұрын

I'm so excited for the future from AMD.

@---.. Ай бұрын

Putting the cache between the cores and RAM seems logical. As far as the data flow goes, the cache needs to connect to the RAM, and the cache to the cores. Access to RAM bypassing cache isn't common, so this doesn't seem like as much of a routing mess as it could be. That said, if that is how it works, making the same CCDs work without the caches on the non-X3D chips is really impressive.

@Kemano24 Ай бұрын

I thought AMD would eventually add more layers of 3D V-Cache. But placing the cache and IO on the base doesn't point in that direction.

@davidgunther8428 Ай бұрын

@@Kemano24 there might be room for IO and cache, but not room for extras like graphics. Maybe the memory controller would go on the cache chiplet, and there's an IO die for USB, PCIe and everything else. I think then you would have NUMA issues with more than 1 CCD. Cache and IO sound perfect for a trailing process node, I just don't know where to divide the pieces.

@benjaminoechsli1941 Ай бұрын

Oh, they'll add more layers. Just not in the direction we expected. ;P

@PanduPoluan Ай бұрын

You know, the thing that amuses me, is that before the 9800X3D launched, _everyone_ scoffed at the rumors that AMD was going to put the cache under the CCD, listing drawbacks that are "much larger" than the benefits. Then the 9800X3D appeared, shocking all the pundits, proving that the tradeoffs are not that bad as the 9800X3D totally annihilated all other "gaming CPUs" with zero exceptions.

@johnpaulbacon8320 Ай бұрын

I knew AMD had to change something about how the CPU wafer and V-Cache wafer were going to interact with each other, but as to what that would be or how it would be done, I had no idea. It's surprising that AMD was able to come up with a better option in such a short amount of time. Maybe AMD had both options for how the CPU and V-Cache would be connected and how that would affect performance, used Option B the first time around, and then found out that the option not yet used was the better one.

@dionysus6081 Ай бұрын

If AMD combines IO and 3D V-Cache, wouldn't that lead to 3D V-Cache on CPUs lower in the stack, like a hypothetical Ryzen 5 10600X3D?

@m_sedziwoj Ай бұрын

Maybe this is a stupid question, but could you stack 3 active dies using hybrid bonding, so you'd have a core die, a cache die, and an IO die at the bottom? And since the cache die is active (of course), do they use only TSVs for the other parts, or do they add some inactive (connection) layer? I doubt it, because it would require each core die to have a cache die to reconnect to the PCB, but in theory they could do it, and then it would be a real 3D construction.

@looncraz 11 күн бұрын

I 100% anticipated AMD moving the cache to below the CCD - it was inevitable. I knew it was necessary before 3D VCache was ever mentioned by AMD, only their first papers on die stacking. I was shocked that they didn't do that the first time - I still can't see why they thought putting the VCache on top was a good idea... which probably means it was by some necessity or the thought that using the filler silicon was cheaper... which it probably was. I was doubly shocked that they didn't metalize the bonding layers with the filler silicon... a simple process that would have prevented an oxide layer that would have insulated the stack... likewise, they didn't even need to use silicon as filler and could have used copper directly, which maybe even could have made the 3D VCache chips easier to cool than their counterparts - assuming the L2 doesn't have the problematic hot spots.

@l3r4f54 Ай бұрын

Surely a dumb question, but is it possible to mix two different process nodes on the same wafer? I mean, making the first layers (transistors and cache) using the most advanced node (N3, N4) and then using a less advanced one like N7 for the V-Cache and the rest of the layers?

@Alex.The.Lionnnnn Ай бұрын

They're screwing with us. The new 3D cache is actually hanging onto the side like a koala.

@MoonDweller1337 Ай бұрын

Really good info!

@collin6526 23 күн бұрын

When will we be getting compute cubes? Cubes of pure processor.

@andreycamper5863 Ай бұрын

If the 3D V-Cache is now the same size as the die, can AMD fit more cache in it? It is about twice the size it was before. Or am I wrong? Second question: for the 99xxX3D there will be 2 dies and 2 V-Caches. Can they act as one large combined cache, so that there would be 200 MB of cache? AMD said in the past that this is not possible because they are too far from the other cores. But that is not true: VRAM in video cards is much farther from the core than any V-Cache is from any die in a CPU. So technically AMD could make a 400 MB cache without any work?

@HighYield Ай бұрын

So first question: yes, in theory AMD could pack more cache into the new X3D chiplet. But as of now they are not doing that. Second question: there are rumors that the 9950X3D will still only have one CPU chiplet with extra cache. And even if both would have cache below, right now you couldn't connect them, because they are not designed that way. But in theory it could be possible.

@FreeOfFantasy Ай бұрын

VRAM is DRAM. There is a giant gap in performance, bandwidth and response time between DRAM and cache. A few numbers from a 5700X (read bandwidth / latency): L1 2206 GB/s, 0.9 ns; L2 1112 GB/s, 2.6 ns; L3 605 GB/s, 11.9 ns; RAM (DDR4-3600 dual channel) 52 GB/s, 62.2 ns. VRAM is usually built with a wider interface, but latency is a lot longer, usually 200 ns and more. GPUs are optimized to work with large amounts of streamed data and perform a lot of calculations on it, but their control flow is quite slow.
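To put the figures quoted above in proportion, here is a small Python sketch that expresses each cache level relative to system RAM. The numbers are copied from the comment (one user's 5700X measurement, not an official spec), and the dictionary and function names are made up for illustration.

```python
# (read bandwidth in GB/s, latency in ns), as quoted in the comment above.
levels = {
    "L1": (2206, 0.9),
    "L2": (1112, 2.6),
    "L3": (605, 11.9),
    "RAM": (52, 62.2),
}

def relative_to_ram(levels):
    """Return (bandwidth multiple, latency reduction factor) versus RAM."""
    ram_bw, ram_lat = levels["RAM"]
    return {
        name: (bw / ram_bw, ram_lat / lat)
        for name, (bw, lat) in levels.items()
    }

ratios = relative_to_ram(levels)
for name, (bw_x, lat_x) in ratios.items():
    print(f"{name}: {bw_x:.1f}x RAM bandwidth, {lat_x:.1f}x lower latency")
```

On these numbers even L3 has more than ten times the bandwidth and about a fifth of the latency of DDR4, which is the gap the comment is pointing at when it compares cache with VRAM.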

@kwerboom Ай бұрын

Great explanation video. This is all well above my head, but I like watching videos like this to try and expand my understanding of technology. If I had to guess about the future of Zen products, I'd say that Zen 6 will be more of the same 2nd-generation 3D packaging, since AMD kept 1st-generation 3D packaging for Zen 3 and Zen 4. If anything, I would think that this is setting AMD up for "Zen 7" (or whatever AMD calls it, or whatever underlying design it uses if it isn't Zen-based) on the next socket after AM5. AMD is going to have to change something, considering how much has to fit on a CPU package and how constrained AM5 feels size-wise. AMD will have to make the socket larger, do more creative stacking to fit more specialized chiplets in, or both. For me, I'm also excited about what this means for RDNA5/UDNA1, since AMD usually moves its technological successes from product stack to product stack.

@Ph42oN Ай бұрын

Well that could be something they will use in next gen ryzen to improve CCD to IO connection. Maybe it will be on consumer GPUs some day as well.

@adr2t Ай бұрын

So in short, this means that AMD could drop the IF and allow chip-to-chip communication in the future. I didn't watch the full video, but this would mean we could see a single-chip-like package vs. the 2-3 chips we see today, along with allowing them to make some of the chips even smaller going forward. Instead of a large L3 and L2 (from base), we could just move those memory levels into their own layer, leaving a ton more space for just the compute. Or, more likely, chiplet -> memory -> chiplet -> memory, etc. This way all the compute and memory would be accessible and addressable across all compute and IO. That way you can have one core completing one task while another core accesses that same memory (without having to reschedule the working thread), even though it's on a totally different CCD. To be fair though, this wouldn't really improve performance per se; maybe power for sure (as it could lower idle power draw), but you would still be limited in compute. More than likely this would allow a smaller-footprint package, followed by a lower cost for the compute chiplets. Everything else would cost the same or more, as the packaging method would add some cost. So SKUs that have two chiplets wouldn't have to fight over the "gaming cores" or whether one has 3D cache or not, as both would have access to that extra memory. If it's all SRAM, I wonder if that layer would just be marked out for the different memory levels as well (L1-L3).

@AmitojP 12 күн бұрын

Time to do an M4 deep dive like the M3 one!! Why is it so much better on the same 3nm?

@phillycheesetake Ай бұрын

24:24 Regarding the "large base chiplet" that's the first thing I thought of when I heard about the cache-flip. It might cut into their budget to manufacture the IO in the same node as the cache, but AMD's memory performance is already lagging even as their logic out-performs, so whatever they do, they can't just keep producing that IO die forever. If I had to guess, we might see the dawn of truly monstrous cache chips which vastly reduce memory calls outside the CPU package.

@Jona-gs7ye Ай бұрын

Can we expect a chip analysis of the Apple m4 SoC family?

@shocka007 Ай бұрын

New materials science could be the next step, Graphene / Diamond semiconductors could allow for Huge clock increases / structural strength and great thermal dissipation characteristics... it probably already exists in "The Military Black Budgets" ?