New Silicon for Supercomputers: A Guide for Software Engineers

85,216 views · TechTechPotato · 1 day ago

Comments: 118
@emilinhocorneta · 1 year ago
So much knowledge shared in a single video, wow. I wonder why this channel doesn't have 1M followers!
@ChristianBrugger · 1 year ago
That is a very good talk, giving an excellent overview of what you have covered over the last few years. Thank you for making it and filming it. Much appreciated.
@peterfireflylund · 1 year ago
Whoever was in charge of audio did a very good job!
@mycosys · 1 year ago
That was a really great update on HPC for us plebs, feels like an hour well spent. That datascale switched compute fabric model certainly looks fascinating.
@kehoste · 1 year ago
I swear I saw some of those portraits move during the Q&A bit, very strong Harry Potter vibes in that room... Great talk, very dense on info!
@NPzed · 1 year ago
Thanks for sharing this on YT!
@ethergreen9348 · 1 year ago
Love the Scope of this episode!
@theworddoner · 1 year ago
We really need an open-source CUDA alternative. I'm rooting for either ROCm or oneAPI to take shape. If they become successful, we could see inference GPU cards from third-party vendors with 100 GB of VRAM that can run a lot of AI models. They'd be terrible for training, since they likely won't be as powerful as Nvidia and AMD hardware, but they could work relatively well for inference. Running a trained model is a lot less performance-intensive than training it. We need more competition in this market.
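The 100 GB VRAM figure above is easy to sanity-check: weight storage scales linearly with parameter count and bytes per weight. A quick sketch (the 70B parameter count is an illustrative assumption, and activation/KV-cache overhead is ignored):

```python
# Approximate weight memory for an N-parameter model at various precisions.
def weight_gb(params, bytes_per_param):
    return params * bytes_per_param / 1e9

params = 70e9  # hypothetical 70B-parameter model
for fmt, b in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(fmt, weight_gb(params, b), "GB")
# fp32 -> 280 GB, fp16 -> 140 GB, int8 -> 70 GB: a 100 GB card runs the
# int8 version comfortably, which is why inference-only cards make sense.
```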
@TheDoubleBee · 1 year ago
I agree, desperately, but unfortunately there hasn't been much movement in that direction so far. I heard about SYCL some years ago and got really excited, because it embeds into your code just like CUDA (in fact it's even better, a more modern C++ approach), but all implementations are either completely abandoned or moving at a snail's pace. The likes of OpenCL and Vulkan compute are somewhat of an alternative, but those are way too low-level to become more popular, at least in terms of integrating them into C++ code; using them as a backend for Python bindings might be more viable. I should mention that I work as a C++ programmer in the field of photogrammetry, and we unfortunately have to rely heavily on CUDA.
@m_sedziwoj · 1 year ago
I would say it's something that should get support from others too, because CUDA dependence is bad for the whole industry. Personally I think it's better for Intel and AMD to work on one open-source solution than for each to fight for its own while the winner is already in the room.
@michaeldeakin9492 · 1 year ago
@@TheDoubleBee Have you looked at LLVM's PTX backend? When I last looked, Kokkos (which is not dead) supported compiling to PTX through it. I suspect it's not as performant as compiling with nvcc, though.
@TheDoubleBee · 1 year ago
@@michaeldeakin9492 I haven't heard about Kokkos before, but I had a look and already showed it to my coworker today, and we agreed it seems incredibly interesting for us. Thanks for the heads up! The thing is we're producing software to run on clients' machines, so expanded compatibility of the software would be preferable to a bit lower performance compared to native CUDA.
@herringreen7938 · 1 year ago
Hoping for RISC-V to someday extend into the realm of GPU cores. Perhaps the architecture isn't suitable as a GPU core but Intel did something similar (if memory serves me right): a PCIe card packed with many low power/performance x86 cores.
@WingofTech · 1 year ago
Thank you for your research! 🔬
@123gostly · 1 year ago
Thank you for sharing this. Really interesting stuff!
@mansemi5480 · 1 year ago
I kinda get what this video wants to say. I've worked at a semiconductor manufacturing company and now work at an AI company, and I can see confusion ahead. Everyone knows there should be 'new computers', so they come up with new ideas every year, from gradual ones (tiled CPU-GPU chips with new protocols) to radical ones (like quantum and analog neuromorphic), but no one really seems to know how we (semi + programmers) can get there TOGETHER. Chicken-and-egg problems everywhere... even switching from Nvidia to Gaudi is challenging.
@CTBell-uy7ri · 1 year ago
Good talk!
@dreamphoenix · 1 year ago
Thank you.
@AndersHass · 1 year ago
52:00 I would argue Intel's efficiency cores aren't exactly what they are on phones. Intel mainly added them for multicore performance, not for power-efficient operation. On phones, the point is that they can be really power-efficient for minor tasks, improving battery life, while also giving a bit more multicore performance.
@swenic · 1 year ago
This was great!
@Im1CrazyCow · 1 year ago
DR. IAN Doing his Cool Thing !!!! I want my Samples back...omg Priceless !!!
@mycosys · 1 year ago
I am REALLY surprised TI hasn't made itself a player in analog AI computing. I don't think anyone would object to calling TI the masters of analog ICs and DACs, and they're in edge TPUs; it kind of amazes me they haven't dipped their feet into analog machine learning.
@LawrenceKincheloe · 1 year ago
TI has a leadership problem, I wouldn't be surprised if they get sold for parts in the near future.
@andrewferguson6901 · 1 year ago
TI, if you want to hire me to do analog neural network work: I don't know how to do it, but neither do you, so... let's give it a shot.
@mkvalor · 1 year ago
I don't know what kind of fancy wizardry Qualcomm is doing at 35:49 . But if they can fit 3452 into an 8-bit integer, I'd like to know the radix for that counting system! (and the software libraries that can take advantage of it) Garden-variety binary 8-bit integers max out at 255. Edit: Hmmm, looks like machine learning quantization involves factors and biases (external constants) which can make this possible.
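The edit at the end of the comment above is the key: with an affine quantization scheme, the stored int8 value is only an index into a grid defined by external scale and zero-point constants. A minimal sketch (the scale value 32.0 is a made-up illustration, not Qualcomm's actual scheme):

```python
def quantize(x, scale, zero_point):
    """Map a real value to int8 via an affine transform: q = round(x/scale) + zp."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original real value."""
    return (q - zero_point) * scale

# With scale=32.0 and zero_point=0, the int8 value 108 stands in for 3456,
# close to 3452: the 8 bits only index 256 points on a grid whose span is
# set entirely by the external scale/bias constants.
scale, zp = 32.0, 0
q = quantize(3452.0, scale, zp)   # -> 108
x = dequantize(q, scale, zp)      # -> 3456.0
```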
@TechTechPotato · 1 year ago
Yeah, I got asked that in the Q&A too. :) Biases are cool tho
@kamikakushi_ · 1 year ago
Hi, I came across this channel while looking into PC hardware technicalities (trying to understand more). I just realized I have no idea what you were talking about in the comment :( What kind of topic should I find and learn first, at least as a first step into this (for me) unknown world, so I can explore the rest myself? I'm interested in understanding how the theory or calculation is done in the CPU so that, for example, a program can run. I'm an Android programmer but have never touched embedded, nor advanced graphics using Vulkan. I might use some terms that don't make sense, sorry. If there's a book reference to start with, it would be really helpful.
@xantochroi · 1 year ago
thanks doktor
@alexandertrimm5246 · 1 year ago
Ian, at 1:04:15 you said "training is where the big money is". Did you mean that in terms of investment or hardware sales or both? I've seen metrics saying the amount of compute used for inference is >3x that of training, and I expect that ratio will increase even further once there are more mature, useful models. Obviously there is a lot of competition to train bigger and better models right now, but do you see training continuing to have such significance in a few years?
@bayanzabihiyan7465 · 1 year ago
Training is "once" and inference is paid for every time. Training is a LOT harder, but given how much you actually use your models, inference ends up consuming more cumulative compute.
@philsburydoboy · 1 year ago
Training is worth more money because higher-quality hardware is used for it. All training nodes have to be orchestrated to work together, while inference nodes can be completely independent. Scaling a tightly orchestrated process is much harder, so developers prefer expensive hardware for training.
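The training-versus-inference split in this thread can be sanity-checked with the common back-of-envelope heuristics: roughly 6·N·D FLOPs to train an N-parameter model on D tokens, and roughly 2·N FLOPs per generated token at inference. All concrete numbers below are hypothetical:

```python
N = 70e9    # parameters (hypothetical model)
D = 1.4e12  # training tokens (hypothetical)
train_flops = 6 * N * D                        # ~5.9e23 FLOPs, a one-time cost

tokens_per_day = 5e9                           # hypothetical serving traffic
infer_flops_per_day = 2 * N * tokens_per_day   # ~7e20 FLOPs/day, recurring

# Days of serving until cumulative inference compute matches training:
days_to_match = train_flops / infer_flops_per_day
print(round(days_to_match))  # -> 840
```

At high enough traffic the recurring term wins, which is consistent with the >3x cumulative-inference metric mentioned earlier in the thread.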
@freakinccdevilleiv380 · 1 year ago
Excellent 🤯
@dansadventures5514 · 1 year ago
Although the cost of the CPU can be inconsequential, note that software licensing can be billed per core. This can make many-core CPUs extremely expensive for workloads that aren't CPU-intensive and care more about memory capacity.
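A quick sketch of the point above, with entirely hypothetical prices, showing how per-core licensing can dwarf the silicon cost:

```python
cores = 128
cpu_price = 11_000.0             # hypothetical list price for a 128-core CPU
license_per_core_year = 1_500.0  # hypothetical per-core software license
years = 3

license_cost = cores * license_per_core_year * years  # 576,000 over 3 years
# For a memory-bound workload, a 32-core part with the same RAM capacity
# would cut the license bill by 4x while barely moving performance.
print(license_cost / cpu_price)  # licensing is ~52x the CPU price
```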
@llothar68 · 1 year ago
You could also bill per megahertz of clock frequency. If vendors want money, there is always a way to get it from customers, unless customers are simply not willing to pay; just say no. And in my opinion AI is only morally acceptable if it runs on your own hardware, so we need more silicon to store terabyte-scale LLMs.
@dansadventures5514 · 1 year ago
@@llothar68 That's not practical in today's world with variable frequencies. I'm talking about real-world billing practices that are common today
@llothar68 · 1 year ago
@@dansadventures5514 You wouldn't need to measure it; just say that you oversell your 3 GHz vCore to three people. At the moment, in the web world, you get billed without even knowing the oversell factor. OK, AI is different, I know. But who knows when creative marketing will also start slicing costs down to cents to hide the true cost. I'm surprised we still bill per hour and not per time slice 🙂
@ewenchan1239 · 1 year ago
@ServeTheHome So cool that Dr. Cutress cited your SmartNIC continuum.
@AlexSchendel · 1 year ago
1:14:18 Gracemont is actually only 4 generations from Silvermont, but as you noted, Gracemont was such a huge redesign from Tremont that it might as well have been 6 generations haha.
@konradcomrade4845 · 1 year ago
I am voting for Gustafson's Posits or Unums number systems. Beating Floats at their own game!
@rabbitman7 · 1 year ago
Thanks for that! And it's happening, mainly via RISC-V.
@TechTechPotato · 1 year ago
John, good to see you here! We should connect, would love to get more insight. ian@techtechpotato.com
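For readers meeting posits here for the first time: a toy 8-bit posit decoder (es = 2, following the 2022 posit standard's layout; an illustration, not anyone's production code). The regime field's run-length encoding is what lets posits trade fraction bits for dynamic range:

```python
def decode_posit8(bits, es=2):
    """Decode an 8-bit posit (es=2) to a float. Toy illustration only."""
    n = 8
    if bits == 0x00:
        return 0.0
    if bits == 0x80:
        return float("nan")          # NaR, "not a real"
    sign = -1.0 if bits & 0x80 else 1.0
    if bits & 0x80:                  # negatives decode via two's complement
        bits = (-bits) & 0xFF
    # Bits after the sign, most significant first
    rest = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]
    # Regime: a run of identical bits terminated by the opposite bit
    run = 1
    while run < len(rest) and rest[run] == rest[0]:
        run += 1
    k = run - 1 if rest[0] == 1 else -run
    rem = rest[run + 1:]             # skip the regime terminator bit
    # Exponent: up to `es` bits, zero-padded if the posit ran out of bits
    e = 0
    for i in range(es):
        e = (e << 1) | (rem[i] if i < len(rem) else 0)
    # Fraction: whatever bits remain, with an implicit leading 1
    f = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(rem[es:]))
    return sign * (1.0 + f) * 2.0 ** (k * (1 << es) + e)
```

The tapered layout gives posit8 a range up to 16^6 = 16,777,216 while spending the most fraction bits near ±1, which is the core of the "beating floats at their own game" pitch.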
@hubstrangers3450 · 1 year ago
Thanks....
@misiekt.1859 · 1 year ago
The closest CUDA replacement is ROCm. I don't have the newest hardware, but it has worked for me 1:1 on RDNA2 with PyTorch FP16/FP32/BF16 just fine, for at least six months now.
@wile123456 · 1 year ago
Looking sharp with the fresh haircut
@aDifferentJT · 1 year ago
I wonder what you think of Modular and Mojo
@potat0-c7q · 1 year ago
It's insane that we can even make these without a single transistor being messed up.
@sunroad7228 · 1 year ago
"In any system of energy, Control is what consumes energy the most. Time taken in stocking energy to build an energy system, adding to it the time taken in building the system will always be longer than the entire useful lifetime of the system. No energy store holds enough energy to extract an amount of energy equal to the total energy it stores. No system of energy can deliver sum useful energy in excess of the total energy put into constructing it. This universal truth applies to all systems. Energy, like time, flows from past to future".
@m_sedziwoj · 1 year ago
Why didn't you mention any in-memory compute? Did it die, or is it simply not trendy today?
@TechTechPotato · 1 year ago
Still a way to go - there will be a couple of discussions at Hot Chips on it
@Veptis · 1 year ago
I'm considering adding an "AI" accelerator card to my next PC build, with a GPU for gaming next to it. It's interesting to see this kind of market grow up, and I kind of want to get into it.
@manueladolfoholzmannillane3050 · 1 year ago
@techtechpotato Hello, how are you? How can I talk with you? I want to study for a career developing hardware. Could you guide me?
@gregandark8571 · 1 year ago
Why is there zero news about hardware based on magnonic logic (spintronics) and photonics?
@TechTechPotato · 1 year ago
the funding for spintronic logic rounds to zero. Photonics was mentioned - Lightmatter is the big player.
@gregandark8571 · 1 year ago
@@TechTechPotato "The funding for spintronic logic rounds to zero." That's depressing, because from what I have learned from the papers, a spintronic-based MOSFET can run without any issues at its ordinary speed of 12 GHz. Also, isn't Intel's MESO (today called the Tunnel Falls chip) a spintronics-based device?
@gregandark8571 · 1 year ago
spin wave logic.
@wskinnyodden · 1 year ago
The latest AMD chip is the MI300, which is almost EVERYTHING I have been asking of a CPU since HBM came out, so since Vega, basically...
@cbrunhaver · 1 year ago
FURY
@m_sedziwoj · 1 year ago
6:32 Maybe it's substrate and not EMIB, but it's more advanced because it's not an MCM but chiplets. 14:30 Yeah, I see it, a custom design for a client by Nvidia... are you feeling well?
@Philippe275 · 1 year ago
Servers with only E-cores are going to be great for multiplayer game servers: so many threads that don't need to be ultra-powerful.
@jandrews377 · 1 year ago
Funny that there is such high demand among the hyperscalers for high-core-count CPUs. From my anecdotal experience migrating from Xeon to Graviton3, there appears to be a big performance difference moving from vCPUs backed by hyperthreading to real physical cores. Maybe the marketing BS has caught up with Intel?
@Philippe275 · 1 year ago
@@jandrews377 Depends on the workload. If your workload isn't that demanding per core but scales well, it's better to run on more, weaker cores; if it's the opposite, then fewer, stronger cores are better. Obviously more strong cores is best, but 64 Intel E-cores should cost less than 64 Intel P-cores...
@Lion_McLionhead · 1 year ago
Surprised that so much of AMD's and Intel's sales are supercomputer CPUs. Most of these new technologies are going to fizzle, but no one knows which ones.
@jurepecar9092 · 1 year ago
On the topic of number formats, what happened to Gustafson's unums and posits? Ten years ago they looked like an enlightenment; nowadays they're nowhere to be seen. Why?
@TechTechPotato · 1 year ago
That's one of the questions at the end! Gustafson is a member of the board of Vividsparks, which looks like it wants to create Posit-based silicon.
@alb.1911 · 1 year ago
Why isn't AMD in the hardware list?
@TechTechPotato · 1 year ago
AMD is mentioned many times. But as an end user, good luck buying it, and if you round the number of hardware installs they have specifically for AI to the nearest whole number, it's either 1 or 0. Source: am an analyst for AMD
@alb.1911 · 1 year ago
@@TechTechPotato thank you, and what do you think about the announcement that "AMD to Showcase Next-Generation Data Center and AI Technology at June 13 Livestream Event"? I imagine that we have to wait....
@TechTechPotato · 1 year ago
We will have to wait :) I'll be there at the event
@zyxwvutsrqponmlkh · 1 year ago
I need risk 6, because risk v is so 2023
@wile123456 · 1 year ago
Well, ARM is the most widespread RISC architecture. ARM stood for Acorn RISC Machine, back when the company was still called Acorn Computers and they created their RISC instruction set.
@mycosys · 1 year ago
@@wile123456 No, RISC is a concept, not an instruction set; it's a computing paradigm, and there are many RISC instruction sets. The basic idea of RISC (Reduced Instruction Set Computing) is having only the basic instructions the hardware can directly execute as a single action, rather than the traditional complex instructions (CISC) that encode a series of CPU actions, mostly to conserve expensive RAM. And the original Acorn RISC Machine was an expansion board for the BBC Micro, designed to let students learn to code on a reduced-instruction-set processor at low cost.
@SlyNine · 1 year ago
I like how he mentioned gpt three times lol
@andreapassaglia213 · 1 year ago
I didn't know you had your own nation
@vensroofcat6415 · 1 year ago
Feels like I'm back at university again. The truth is Microsoft PowerPoint doesn't make you great at presentations; that's just their marketing.
@DerekWoolverton · 1 year ago
Those chips-in-Lucite giveaways are not new; I have a 386 in plastic somewhere from Comdex a million years ago.
@TechTechPotato · 1 year ago
Yeah it used to be a thing for a select few, but for most of the 2010s they stopped. It's back again
@radicalrodriguez5912 · 1 year ago
This guy hardly knows a thing about Graphcore. Their SDK is actually pretty good at this point.
@TechTechPotato · 1 year ago
I remember seeing Poplar back at SC18/19. Happy to go get an update
@platin2148 · 1 year ago
So, more matrix computation, sadly not accessible; it would be great to use for data manipulation. Most of that funky stuff is a waste of energy until it can formulate for itself what it's doing, so someone else can optimize it. If we don't get to that state it's over, which I very much don't think is an option, as we've seen how badly these brute-force approaches work. I mean, feeding in data until it learns the rough shape of something one would actually write is a very energy-intensive approach. And can Qualcomm get their licensing mess fixed?
@daverobert7927 · 1 year ago
When are we going to see 3-dimensional chips and 8+ states per bit? Am I dreaming?
@m_sedziwoj · 1 year ago
3D we have today, but if you're talking about more than a few layers, you'll have to wait for photonics, because low power means you can stack layers without thermal problems. 8+ states per bit? A qubit is like a vector in 3D space (direction only), so quantum more than covers it; the problem is that you can only read along one axis (aligned with the field or opposite to it). Photonics crosses over into quantum too, so it could get there as well, but you can also use many parameters of light to carry information (frequency, amplitude, polarisation), so you can process many photons of different wavelengths in one "transistor" doing the same operation, etc. Personally I think photonics will arrive before quantum, and may even evolve into it, because low power is something we need today and many solutions are almost ready to use, needing only more people and money.
@orthodoxNPC · 1 year ago
If a chip doesn't have teeth marks on it... how good could it possibly be?
@drunkgamerdad1423 · 1 year ago
$10B market? Isn't that what Nvidia made alone? *Steve from GN intensifies*
@TechTechPotato · 1 year ago
The total startup funding is 10B or so. More like 12B now. Startup being the key word here. Nvidia isn't a startup.
@drunkgamerdad1423 · 1 year ago
@@TechTechPotato Just making a dull joke based on a severe lack of both accurate perception and updated knowledge. 😅 Thanks for the quick response though! :)
@AndreiNeacsu · 1 year ago
Is Intel still selling glued-together CPUs? I might let it go, but only when my R9 5950X becomes so totally obsolete that nobody in my family can have any use of it any longer.
@m_sedziwoj · 1 year ago
Glued together are AMD CPUs; Intel's are taped together ;) (tape = EMIB)
@lmmlStudios · 1 year ago
49:10 So what was said that needed to be redacted?
@gravitas2974 · 1 year ago
You didn't mention log float, which Bill Dally talks about here: kzbin.info/www/bejne/anfCp5mGe7OYZpY, or int3 and int4, which are used a lot for quantizing large language models for inference on consumer devices.
@nifty6486 · 1 year ago
Am I misinformed, or is Tesla's Dojo chip not quite groundbreaking when it comes to AI compute and scalability? Seems like a miss on very promising tech.
@alb.1911 · 1 year ago
Mal informed, check his videos on the topic.
@Stopinvadingmyhardware · 1 year ago
Are you declaring war on us?
@christopherjackson2157 · 1 year ago
From some sort of palace it seems...... Lol
@TechTechPotato · 1 year ago
I was at Imperial College London, one of the fancy rooms :)
@zakmorgan9320 · 1 year ago
@@TechTechPotato if you ever give a talk at UCL please let me know so I can attend!
@Tgspartnership · 1 year ago
A beautifully fancy room
@numlockkilla · 1 year ago
Yes
@woolfel · 1 year ago
Vecchio looks like it is a hungry hippo
@Tgspartnership · 1 year ago
Kellehers are definitely over represented in your talk.
@grproteus · 1 year ago
Sometimes you need to pause and think: is Ethernet optimized enough for chip-to-chip transfer if you are doing HPC? The answer is a huge no, btw.
@absolute___zero · 1 year ago
Let's just make an open-source FPGA and we won't need to buy these chips anymore. An open-source fab would also be good to have; if there is a Wikipedia, why isn't there an open-source fab? There must be one!
@mycosys · 1 year ago
'final Sparc' *smh* i hate you XD
@tychothefriendlymonolith · 1 year ago
I understand now. Only an AI could have created a kitchen so ugly and full of spatulas...
@TesterAnimal1 · 1 year ago
So. Moore’s Law. Not dead yet?