It Takes Two To Tango: A New Era of Performance

21,141 views

TechTechPotato

1 day ago

Comments: 81
@SwordQuake2 9 days ago
FP co-processors are back on the menu, boys
@IkethRacing 8 days ago
I definitely read that in orc
@jefreyjefrey6349 8 days ago
Nope. They'll just outsource the computing to the cloud and pretend it's all some new AI thing..
@jaymacpherson8167 3 days ago
What? I really prefer hobbit. 😋
@artemis1825 9 days ago
The CEO seems like a lovely guy who knows his stuff. We need more engineers as CEOs.
@3amael 9 days ago
Amen! Engineers make the best CEOs because they focus on the thing they love the most... technology!
@tringuyen7519 9 days ago
@3amael Engineering CEOs can't go off on passion projects and disregard the business. In the end, the company must make a profit.
@artemis1825 9 days ago
@tringuyen7519 Nobody's gonna buy your product if it's just a draconian cash grab 😉
@Brl007 9 days ago
Yeah, Pat likes it.
@esra_erimez 9 days ago
Tell that to Intel; they just got rid of an engineer as CEO... Pat Gelsinger.
@capability-snob 9 days ago
The idea that I could run dataflow directly is exciting enough that I'd happily drop my existing languages, tbh.
@SaccoBelmonte 9 days ago
I would love to see a revolution in software optimization.
@SwordQuake2 9 days ago
Not while companies push you to ship "features" instead of quality code.
@prashanthb6521 8 days ago
Try convincing the software guys.
@notthedroidsyourelookingfo4026 7 days ago
You're painting with an enormously broad brush here. First slide: HPC software. You think those are unoptimised?
@SaccoBelmonte 7 days ago
@notthedroidsyourelookingfo4026 I do, yes. I'd like to see all the current main players actually optimize their software.
@jannegrey 9 days ago
I'm (slowly) chugging through "Computer Architecture: A Quantitative Approach", 6th Edition, thanks to George, and I have to say I'm starting to appreciate the complexity of those gigantic supercomputers. There is a lot that goes into such a system, and so many bottlenecks. So while I wait for the 7th Edition next year, I'm happy that you make interviews like this, especially because everyone talks about AI, which leaves less space for learning about scientific computing.
@3amael 9 days ago
I've been saying this for yonks!!! Drop anything below 64 bits and focus all the silicon on that! Glad others are thinking the same!
@leocomerford 9 days ago
Even back in the day, many users of old 36-bit scientific computers weren't at all happy about having to step down to 32-bit (or up all the way to 64-bit) FP, IIUC.
@TheIntelligentVehicle 9 days ago
Love this guy's enthusiasm! Thx for finding these great interviews.
@TheDoomerBlox 8 days ago
A processor like this sounds like a very fun thing to play around with: seeing how it responds to different ways of performing the same data manipulations, as "performance over time" or however you'd want to quantify that. Watching how quickly it responds, what the performance ceiling is, how quickly it reaches it, all those silly details. Yummy chip, bigbyte/10
@dogonthego 9 days ago
6:43 This may be a dumb question, but what is Wolf?
@brandondraeger7371 9 days ago
Not at all! It's just one of the MANY acronyms we love in the HPC community. :) In this case it's WRF, the Weather Research and Forecasting model. As the name suggests, it's HPC code focused on complex multi-physics weather simulation, used by your favorite national weather services, research institutions, and private companies that are sensitive to weather forecasts (e.g. impacts on supply chains).
@TechTechPotato 9 days ago
WRF (pronounced "Warf") is weather simulation.
@interrobangings 9 days ago
IT TAKES TWO TO MAKE A THING GO RIGHT IT TAKES TWO TO MAKE IT OUTTA SIGHT
@creed5248 9 days ago
Wow, that's an old song... LoL!!!
@lesserlogic9977 9 days ago
You can tell he loves what he does; I bet that'll transfer into the product.
@jaymacpherson8167 3 days ago
I no longer code. I no longer run scientific models. YET THIS IS EXCITING!!! I may need to dig out my 287 as a reminder of this new era!
@or1on89 9 days ago
It's all exciting... but they need to get universities onboard. A new computing paradigm needs everyone to apply the right math to their code in order to take advantage of the hardware. In software development today, a lot of this is taken for granted, and that is a primary source of unoptimised code.
@ishkool8664 9 days ago
They are not using a new architecture; they are implementing an architecture already explored by academia. Programming for custom architectures is what most HPC/kernel engineers do.
@blue-pi2kt 9 days ago
If their package optimises code for the chip, universities will get onboard almost immediately. The amount of compute time wasted in universities running horrendously optimised code hacked together by non-programming specialists is probably somewhere between 60 and 90% of an HPC system's uptime. While GPT and other LLMs have improved this in recent years for CUDA, they won't solve it for bespoke hardware or very niche codes like WRF. What you really need is this sort of co-processor technology with a software implementation that solves this problem.
@ElijahSamsonWiltonChen 9 days ago
Great interview.
@phantomtec 9 days ago
0:15 Why don't high-performance computing systems ever make good DJs? 💕💕 Because no matter how many cores they spin, they can't drop the cache! ☮☮
@likbezpapuasov4888 8 days ago
My computer crashed attempting to compile this interview into meaningful code.
@NathanielStickley 9 days ago
Too vague to be exciting... seems like fancy profile-guided optimization in hardware. I'm interested in seeing more details in the future, though.
@kelownatechkid 9 days ago
What's old is new again. I love it haha
@joseperez-ig5yu 7 days ago
Time to geek out, guys!😅😊
@gadlicht4627 9 days ago
Their field of scientific computing isn't talked about as much as LLMs or image generators nowadays. It's the area where you combine scientific models based on actual physics (or other non-ML models not based on data) with ML models; this includes PINNs (physics-informed neural networks). Some of these models work by making sure the output follows known laws of physics, correcting it if it doesn't and adding that to the loss term in training. Others use ML to speed up parts, take initial guesses, etc. Those are just some examples. Traditionally, non-ML scientific models could benefit most from methods like this, so perhaps hybrid models could really benefit from this architecture compared to ML alone, designing not with plain ML in mind but with these hybrid models.
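The "add that to the loss term" idea mentioned above can be sketched in a few lines. This is a generic illustration of a physics-informed loss, not anything from the video; the weighting factor `lam` and the toy residual values are made up:

```python
# Sketch of a physics-informed loss: a standard data-fitting term plus a
# penalty on the residual of a known physical law. The weighting factor
# `lam` and the example residuals below are illustrative assumptions.
def pinn_loss(pred, target, physics_residual, lam=0.1):
    # Mean squared error against observed data.
    data_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    # Mean squared violation of the physical law (zero residual = law satisfied).
    physics_term = sum(r ** 2 for r in physics_residual) / len(physics_residual)
    return data_term + lam * physics_term

# A perfect data fit with a small physics violation still incurs a penalty.
loss = pinn_loss([1.0, 2.0], [1.0, 2.0], [0.2, -0.2])
```

The physics term is what "corrects" outputs that break known laws: even when predictions match the data exactly, violating the constraint keeps the loss above zero.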
@EriIaz 9 days ago
I am sure FP64 will be relevant to LLMs eventually. There is a trend: the more training a model goes through, the higher the minimum precision at which inference has acceptable quality. For Llama-1, the sweet spot was 4 BPW. You could take any L1 model, say 33B, run it at 4 bits, and know that it works better than both 65B at 2 BPW and 14B at 8 BPW. Fast forward to Llama-3, and it's 5 BPW already, even 8 BPW when it comes to multilingual use cases. The difference between these models, for the most part, is training: L3 got a lot more than L1. Chances are, an undertrained LLM "expects" noisy weights and doesn't mind the precision loss as much, but when you quadruple the training, activations must be much clearer. And sure, it will take A LOT of time to go from 4 or 8 bits all the way back to FP64, but I don't think that's impossible, and if we have a breakthrough in this area, it might accelerate the process. In that case, these guys will have it, while the rest of the competition has what? bf16?
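The comparisons in the comment above (33B at 4 BPW vs. 65B at 2 BPW) are implicitly at roughly equal memory budgets, which a back-of-the-envelope calculation makes concrete. This sketch counts weight storage only, ignoring activations, KV cache, and quantization metadata overhead:

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in GiB (weights only, no overheads)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A 33B model at 4 BPW and a 65B model at 2 BPW occupy nearly the same
# memory (~15 GiB each), which is why they are natural comparison points.
a = weight_gib(33, 4)
b = weight_gib(65, 2)
```

At equal footprint, the question reduces to which allocation of bits (more parameters vs. more precision) gives better quality, which is exactly the trade-off the comment says is shifting toward precision as training increases.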
@TeeTeeNet 3 days ago
I'll believe him when I get to see some examples with data.
@tappy8741 9 days ago
The TechPowerUp listing for the Battlemage B580 says it has 1.7 TFLOPS of FP64 at a 1:8 ratio. A few days ago FP64 was missing entirely from the listing. Can I get confirmation that the B580 does have 1.7 TFLOPS of FP64? Because that would make it the king of FP64 per dollar, especially for a consumer card. Thanks.
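The two figures quoted above are at least internally consistent, which a one-line check shows (this only sanity-checks the arithmetic of the listing; it doesn't confirm the hardware actually exposes that FP64 rate):

```python
# A 1:8 FP64:FP32 ratio means FP64 throughput is one eighth of FP32.
# Working backwards, 1.7 TFLOPS FP64 implies ~13.6 TFLOPS FP32.
def fp64_from_fp32(fp32_tflops: float, ratio: int) -> float:
    """FP64 throughput implied by an FP32 rate at a 1:ratio FP64 ratio."""
    return fp32_tflops / ratio

implied_fp32 = 1.7 * 8          # 13.6 TFLOPS FP32
fp64 = fp64_from_fp32(implied_fp32, 8)  # back to 1.7 TFLOPS FP64
```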
@technicallyme 9 days ago
I have a Trs 😅
@kayaogz 9 days ago
Crysis runs?
@AK-vx4dy 9 days ago
After the optimisation phase
@predabot__6778 7 days ago
LOL...! Well, it actually seems the answer is... yes, actually. xD (After help from the company to optimize the code.)
@davidgunther8428 9 days ago
Sounds like they have a clear long-term direction. It would be fantastic if the optimization cycle they have gets applied to software more broadly.
@luciususer8870 9 days ago
Is this chip a CGRA? Because from the discussion of dataflow, and a reference to FPGAs, this thing sounded very CGRA-shaped.
@toreheidelarsen7155 8 days ago
What's the title of John L. Gustafson's new book?
@baxtermullins1842 8 days ago
How about the computer wind tunnel?
@benmcreynolds8581 9 days ago
I'm curious whether one day we might see some kind of tech that specializes in improving how a graphics card renders each frame, or how the CPU gets utilized. Currently we're seeing a ton of improvements in hardware, BUT it seems we're falling behind in software and in our ability to use hardware to its fullest potential: inefficiencies in rendering prioritization, processing efficiency, finding and fixing code to optimize video games, physics simulation models, and so on.
@ishkool8664 9 days ago
These things are tough because, for any new architecture, development has to happen on both the software and hardware stacks. It's a co-design process, so a lot of manpower is required even to be at par with the state of the art, let alone beat it.
@fletcherluders415 9 days ago
Speaking of scientific compute hardware, I'd be interested in your perspective on the upcoming Dynex Apollo silicon-based quantum-code-compatible chip. This is reportedly going to be packed into a form factor you can plug into your typical PC motherboard.
@predabot__6778 7 days ago
So... something I'm not understanding here is... what kind of a chip is this? The architecture description was very diffuse and opaque, I felt. But my impression is that this is something similar to those almost-FPGA things Ian has mentioned in the past, the ones that have higher reconfigurability than GPUs but more optimized blocks than something like a traditional FPGA? (I mean when compared to a LUT or some other reconfigurable block; I'm aware that most high-performance FPGAs are eFPGAs nowadays, with several hard IPs integrated.) EDIT: OK, he actually mentions this towards the very end of his architecture description; for most of the talk he didn't really give a definition. What were the names of those types of architectures again? I recall there actually being suggested standardized nomenclature.
@TechTechPotato 4 days ago
They're going to do a full architecture deep dive next year. They're purposefully being opaque for now because of timelines. We'll get performance numbers in Q1 to begin with.
@ullibowyer 8 days ago
Please tell me where I can get paid a million dollars a year to write GPU code? 8:08
@infango 9 days ago
Is this similar to the Intel CPU that was cancelled? It was supposed to go from one big core to many small cores...
@predabot__6778 7 days ago
Not really; this is something similar to an FPGA, but with accelerator cores similar to a GPU's, I think! The CEO wasn't very descriptive on that, but he says towards the end that their hardware is somewhere in between an FPGA and a GPU. Intel had (has?) some chips that at least reached the prototyping stage and combined Intel Core and Altera FPGA tiles in the same package, which is at least remotely similar (though quite different in practice).
@awebuser5914 2 days ago
Off-topic question here: there has been a ton of chatter about how Intel's new B580 GPU is horrifically "expensive" to produce compared to others. _Apparently_ the B580 uses the TSMC N5 node at a size of 272 mm² with 19.6B transistors, while, as an example, an Nvidia 4070 Ti is on the N4 node at a size of 294 mm² with 35.8B transistors. The raging "controversy" is that armchair pundits say the B580 die costs just as much as the 4070 Ti's, a card that sells for about triple the MSRP! TL;DR: can you compare simple die-area equivalencies to predict wafer costs in any real way?
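On the die-area part of that question, a classic first-order gross-dies-per-wafer estimate shows how close the two die sizes land. This is only a geometry sketch; real per-die cost also depends on per-wafer node pricing, defect density, and yield, none of which it captures:

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300) -> int:
    """First-order gross dies per wafer: wafer area over die area, minus an
    edge-loss correction term. Ignores scribe lines and yield."""
    d = wafer_diameter_mm
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

b580_dies = dies_per_wafer(272)    # ~219 candidate dies per 300 mm wafer
ad104_dies = dies_per_wafer(294)   # ~201 candidate dies per 300 mm wafer
```

So at similar areas the die counts per wafer differ by under 10%; the bigger cost unknowns are the N5 vs. N4 wafer prices and each design's yield, which is why die area alone is a weak cost predictor.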
@ron5948 4 days ago
Implement domain-specific precision, not in hardware? Yes, Ron, obviously! 1:49
@dieselphiend 9 days ago
That's the biggest problem with digital: everything is stepped, rounded off. It can never be perfect, unlike analog. The future of computing, somehow, someway, depends on analog.
@geoffstrickler 5 days ago
Analog doesn't give the precision that scientific computing depends upon. Analog has its place, but this isn't it.
@dieselphiend 5 days ago
@geoffstrickler I didn't say to abandon digital; that's just not possible. Go look at a digital representation of a wave and compare it to an analog wave. Light is analog. Silicon photonics is analog, and digital. Reality is analog. All I'm saying is that these two things need to be more integrated.
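The "stepped, rounded off" point in the thread above can be made quantitative: uniform digital quantization bounds the error at half a step, and that bound halves with every extra bit. A toy illustration with a sampled sine wave:

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1] to the nearest of 2**bits uniform levels."""
    step = 2.0 / 2**bits
    return round(x / step) * step

# Worst-case quantization error over a sampled sine is at most half a step.
samples = [math.sin(2 * math.pi * t / 1000) for t in range(1000)]
max_err = max(abs(s - quantize(s, 8)) for s in samples)
# At 8 bits the bound is (2 / 2**8) / 2 = 1/256, about 0.4% of full scale.
```

This is also why the precision argument cuts the other way: digital error shrinks exponentially with bit depth, while analog signal fidelity is limited by physical noise floors that don't improve just by adding bits.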
@lowspeed2000 9 days ago
Was that interview in Israel?
@predabot__6778 7 days ago
Huh, I see the company is indeed Israeli, so that makes sense. I think this was shot during SC24 in Atlanta, USA, though, back in November. It's the only big show I can find that makes sense.
@lowspeed2000 6 days ago
@predabot__6778 👍
@vasudevmenon2496 9 days ago
If the Maverick-2 can be added via PCIe x16, then 5800X3D users will be happy. Berkeley, Colorado, and MIT are the universities to target, and you'll see great market share if most code can be accelerated on the FP64 co-processor, just like CUDA. Is Maverick compatible with AMD HIP-generated code? It's not Mavericks but FlexArch 😊
@reinerfranke5436 9 days ago
Sounds like Itanium 2.0
@AK-vx4dy 9 days ago
Dataflow machines were constructed decades ago; also, today's x86 chips are literally dataflow machines when they execute out of order.
@mrrolandlawrence 9 days ago
Could really use some integration with Go. It's faster to develop in than C, and faster to run than Python.
@Matlockization 4 days ago
This took so painfully long that I didn't know if you were ever going to get to the point.