Why Buying GPUs Is a Disaster

192,650 views

ThePrimeTime

1 day ago

Twitch / theprimeagen
Discord / discord
Become Backend Dev: boot.dev/prime
(plus i make courses for them)
This is also the best way to support me: support yourself by becoming a better backend engineer.
Great news! Want me to research and create a video????: / theprimeagen
Kinesis Advantage 360: bit.ly/Prime-K...
Get production ready SQLite with Turso: turso.tech/dee...

Comments: 968
@kaizer-777 7 days ago
We're not 32X faster than a 980ti today, so expecting us to advance that fast 10 years from now is more than optimistic. The only way this would happen is if we had a radical breakthrough and shifted to an entirely different way of manufacturing chips.
@no_mnom 7 days ago
We can generate 32x the frames in 10 years :)
@hammerheadcorvette4 7 days ago
Most of the articles and people kicking around those rumors are going off of the supposed help AI will bring to architecture and fabrication. Having the AI find the most optimal path for the process and the tape-out is what is supposed to incrementally help . . . *Supposedly*
@TheKingOfSpain 7 days ago
For real, absolutely moronic take.
@monad_tcp 7 days ago
A 3060 would be perfectly fine for ALL LLM inference if it only had 40GB of RAM. No need for stupid H100s. But Nvidia themselves are scalping by not allowing your RAM controller to address more RAM. I bet it's locked and the hardware can in fact address more, just like it's locked for overclocking. (You think that wouldn't be the first thing I do when buying a graphics card? Pick up my hot-air solder station and replace the memory chips with bigger ones.)
@Infiniti.151 7 days ago
32x fake frames maybe😂
@JonLikesStats 7 days ago
I bought a 4090 RTX FE about a year ago not realizing I just bought an appreciating asset.
@addraemyr 6 days ago
I know, I was so hoping I'd be able to score a used one for cheap to replace my 3080, since all the whales would go for a 5090, but even the whales are backing out of this gen more than I thought.
@VioFax 6 days ago
Hell, I got a 3090 refurb for $600... the same card is $1200-2000 now... I'm like, HOW!? I've NEVER seen computer components go UP in value. Even retro tech takes like half a century to climb back to its retail price.
@MsLemons12 6 days ago
@@VioFax Just buy AMD. A 7900 XTX is equivalent to a 4080 / 3090 Ti, and ROCm is more or less on par with CUDA if you use the popular libraries.
@k98killer 6 days ago
The same thing happened during the crypto mining boom in the mid 2010s. Same shit, different asshole.
@sbarter 6 days ago
@@VioFax I sold a GTX 970 for the same price I paid for it after owning it for 6 years.
@RedSntDK 7 days ago
Crypto, COVID, and now LLMs... us normal consumers just can't catch a break.
@__Brandon__ 7 days ago
But they are producing so many chips. The second-hand market is going crazy.
@Blackmamba-ce3nb 7 days ago
Don’t forget about tariffs!
@yeetdeets 6 days ago
Investment and progress are going nuts though. The newer cards are literally hitting the limits of currently known physics.
@Akio-fy7ep 6 days ago
Don't worry, civilization will have collapsed by 2035. It might happen this year; if not, it won't be for lack of certain people trying their hardest.
@shining_cross 6 days ago
@__Brandon__ Most of the US chip factories are in Taiwan. So if China invades Taiwan, it will belong to China 🤣🤣
@ZeroUm_ 7 days ago
Whoever said 16-32x is out of their mind. We got only 6.31x since the Titan X in 10 years (released in 2015), and the pace has slowed way down.
@Swiftriverrunning 7 days ago
But the level of investment in manufacturing and development has increased by almost unimaginable levels, especially in the last two years. Ten years ago, NVIDIA's market cap was 11B and GPUs were a rounding error in global semiconductors. NVIDIA's market cap is now 3000B. The amount of money pouring into this space is wild. R&D at NVIDIA has increased over 10x in that period, and that doesn't even take into account TSMC and every startup in the world working on AI hardware. It's getting harder to shrink transistors, but the effort going into improving the process is increasing at ever faster rates. I don't know if we'll get 16x, but progress is coming.
@Fiercesoulking 7 days ago
The only way to do 16x is by linking 4 cards and going from the current N4 to an N1 process.
@llothar68 7 days ago
@@Swiftriverrunning Market cap has nothing to do with the money a company has on hand for R&D. Often it doesn't even matter to the company at all. You don't understand what stocks are. How many shares has NVIDIA sold in the last 5 years? And I don't mean employees of NVIDIA selling theirs.
@kjell744 7 days ago
@@Swiftriverrunning More investment has strongly diminishing returns on the speed of improvement.
@Veptis 7 days ago
2x every two generations. Which isn't yearly. More like 2.3 years per gen.
@adissentingopinion848 7 days ago
You HAVE to talk to Asianometry. He's the man to talk to about chip developments.
@nickreffner4574 7 days ago
THIS! If you want to deep dive into the geopolitics and the future of CPU/GPU architecture, Asianometry is your guy.
@woofcaptain8212 7 days ago
A Collab between these two would be wild
@petrkinkal1509 7 days ago
This would be awesome.
@Neuroszima 7 days ago
But did he do a face reveal?
@spicybaguette7706 7 days ago
Or Ian Cutress
@Endelin 7 days ago
Gamers Nexus put out a video about how they couldn't find a 5090 on day one. Truly a wild market.
@sokrates297 6 days ago
There's no market if the market never existed to begin with.
@myfakeaccount4523 3 days ago
Yes, it's called a fake paper launch. Companies and the government do paper-only stuff all the fkn time. It's easy to lie about it, and probably to launder/embezzle.
@nicechock 2 days ago
I guarantee you there will be a new GPU with more VRAM coming soon after, even though the guy says we are at the limit. That is what he has heard; it's not necessarily the case.
@Endelin 2 days ago
@ IIRC he was talking about the GPU die itself being at the current limit. The VRAM is just attached to the board. I think the limit there is cost, but I'm not sure. I agree that VRAM should increase a lot per card.
@hamzagoesgym 7 days ago
Pulled the trigger on a 7900 XTX, cause I can't keep waiting forever. Nvidia is only leaving crumbs for consumers.
@danhenderson8253 5 days ago
They work great for inference.
@acidwizard6528 5 days ago
That was probably a smart thing to do. You can't count on getting a new Nvidia card unless you either wait and watch retailers like a hawk day after day, or do the unthinkable and pay a filthy scalper.
@mrquicky 5 days ago
I wonder how much of an incentive the spillover will be for AMD. I really want to see them change their strategy. All I've seen in the next two-year roadmap is more of the same; no 32GB on the roadmap yet.
@Deleteyourself83 4 days ago
Same, about a year ago. I'm not paying Nvidia prices anymore.
@siim1129 4 days ago
I waited too long, and now there aren't any 7900 XTXs or 7900 XTs in my country.
@SakshamG7 7 days ago
As a GPU owner, I approve this message
@EnterpriseKnight 7 days ago
Which kidney did you sell?
@呀咧呀咧 7 days ago
@@EnterpriseKnight Both ☠️
@SakshamG7 7 days ago
@@EnterpriseKnight Why not both?
@br4252 7 days ago
But u have one…
@HylianEvil 7 days ago
Holding onto my 2070 till I see how this shakes out
@Cahnisama 7 days ago
Bought a 7800 XT last month, coming from a 1060. It's pretty good value for the money imo.
@faiz697 7 days ago
What are you doing with that?
@DaveSheeks 7 days ago
@@Cahnisama I went from a 1060 to a 3060.
@ralnivar 7 days ago
Was running a 1080 Ti until winter 2023 :)
@steelpanther88 7 days ago
This was me. I had a 2070S and skipped the entire 3000 gen. Got a 4090 at a discount last year in spring. Can't believe cards are being scalped again like the previous gen.
@nict2 7 days ago
I see Casey, I click. Always an awesome conversation. Thanks for this video, this was exactly what I have been thinking about right now.
@notionSlave 7 days ago
Why doesn't he include his name in the description? Anyways, these two are worthless retards.
@wsippel 7 days ago
If you mostly care about LLM inference, and especially if you're on Linux, AMD is perfectly fine. Ollama works, llama.cpp works, vLLM works. Performance is pretty good, and you get a lot of VRAM for cheap. Things only really get hairy (sometimes) when PyTorch enters the picture. Also, current AMD cards don't support FP8 and FP4, which is a bit of a problem for image generation but doesn't really matter for LLMs. I believe the 9070 will introduce FP8 support at least, but it only has 16GB of VRAM. That said, the upcoming Ryzen AI Max 395 might be a very interesting option for LLM inference, with 128GB of unified RAM and a much wider memory bus than previous APUs.
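For reference, a minimal sketch of the kind of local inference being described, using llama-cpp-python (assumes a build with ROCm/HIP support on AMD; the model path is a placeholder, not a specific recommendation):

    # Minimal llama.cpp inference sketch; assumes llama-cpp-python was
    # built against ROCm (hipBLAS) so the AMD GPU does the work.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder GGUF path
        n_gpu_layers=-1,  # offload every layer to the GPU
        n_ctx=4096,       # context window size
    )

    out = llm("Q: Why is VRAM the key spec for local LLMs? A:", max_tokens=64)
    print(out["choices"][0]["text"])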
@alc5440 7 days ago
I'm by no means a power user, and I've only ever wanted to do inference, but AMD ROCm has always worked fine for me.
@zenko4187 7 days ago
This is solid advice (for LLMs). It's a pain and a half to work with AMD GPUs for image gen (Flux, Stable Diffusion, etc).
@Arcidi225 7 days ago
Didn't somebody run the 405B Llama on 8 mining AMD GPUs at 50 tokens/s? To be honest, the more I look into this stuff, the more AMD cards make sense for inference: cheap and high VRAM.
@comradepeter87 7 days ago
I have an AMD GPU and I'm on Linux. I tried to apt install AMD ROCm, and it asked for 50GB worth of library downloads 💀. Tried to push ahead anyway, and ended up bottlenecked on space in my root partition :(
@leucome 7 days ago
@@comradepeter87 Find where it put the files, then make a symlink to another drive. It might also be possible to install only the ROCm runtime, which is way smaller, but some software may need the full dev package if it has to compile stuff. Anyway, I use symlinks a lot to keep my most-used models/checkpoints on the NVMe drive while offloading everything else to the SATA drive. Temporary download files also get sent to an HDD via a symlink, to avoid filling my home SSD with temporary trash.
@meppeorga 7 days ago
I see chat spamming "Naive" / "Just you wait" at Casey's comment about how we're at a point where we can barely push these new GPUs further. How dumb can people be? They're hearing this from a veteran game developer with some of the greatest insight into these things, and they don't believe him. We're living at a time when a 1080 Ti, a goddamned 8-year-old GPU, can still compete with the lower tier of the current generation of graphics cards (and it was also not that expensive at release, btw, at least before the crypto boom). There's a reason Nvidia is pushing AI and software so hard: they know the current rate of hardware improvement is ass. Moore's Law died a while ago; it's not the early 2000s anymore.
@gggggggggghhhhoost 7 days ago
Moore's law is "dead" because of the Nvidia and/or TSMC monopoly. You can only innovate so far with a single brain. The world needs more fabs in other countries.
@richardnpaul_mob 7 days ago
The reticle limit is a thing, but then so are chiplet designs, from both AMD and Nvidia, even though only AMD has sold such cards as gaming GPUs. Nvidia didn't go for N3 this gen and stuck with N4, an N5-derivative node. So Nvidia is holding back; they could have gone further and didn't. N2 is just about to release, and there are pathways to 18, 16, and 14 angstrom nodes, so while some aspects of chips aren't really scaling anymore, there's more than enough room for logic to keep shrinking, and so for GPUs to get more powerful in the next 10 years.
@adissentingopinion848 7 days ago
@@gggggggggghhhhoost My dawg, FinFET and gate-all-around are literally scraping the bottom of the barrel. Silicon wafers don't have the atomic radii for your precious electrons not to escape your increasingly delicate gates. We're relying on ASML here, not even TSMC! You don't even know what EUV is! There aren't any real nanometers below 9nm; it's all marketing!!!
@DustinShort 7 days ago
@@gggggggggghhhhoost The monopoly for sure doesn't help the situation, but Casey is right about physics. A silicon atom has a diameter of 0.2 nanometers and our best process nodes are right around 2nm. We only have about 10 atoms to play with between traces, and at that scale everything from simple optics (diffraction) to quantum mechanics like tunneling becomes a limiting factor. At 4nm, a single atom out of place is within a 10% (+-5%) manufacturing tolerance, while at 2nm you need a 20% margin of error. Until we have tech that places atoms individually, lithography process improvements will keep slowing down dramatically the closer we get. I also didn't even talk about die size growth and how that affects yields. AMD does chiplet designs, which help mitigate yield defects, but they are currently not that competitive at the top end, and the stranglehold of CUDA adoption hurts them as well.
@adammontgomery7980 7 days ago
I don't know enough to claim that Moore's law is dead, but we are at some physical limits with chip production. Most people spamming "naive" probably don't understand any of the manufacturing challenges. I mean, they already can't use optical lenses because the EUV light won't pass through glass.
@SirSomnolent 7 days ago
"I just want to pay $300-$500 more and have 48GB of VRAM." Nope. Impossible.
@rkan2 7 days ago
I can understand the GPU processing gates limitation relative to price, but more memory lanes and chips should be cheaper in comparison...
@RobBCactive 7 days ago
GDDR is 32 bits wide, moving from 2GB to 3GB modules, which ARE coming in the next year or two. Nvidia may have problems because they moved to GDDR7, which only Samsung supplies currently. The lower-end cards are going to need more VRAM, while bandwidth is improved by large caches.
@RobBCactive 7 days ago
@@rkan2 They aren't; the 5090 is so big because the entire outside of the die is driving I/O memory controllers at 512 bits, so there's no room left. VRAM is organised differently from the DDR modules used with CPUs, to maximise bandwidth.
@rkan2 7 days ago
@@RobBCactive I can understand the 5090 being a bin of the datacenter SKUs that have more RAM or fewer defects, but surely you could still at least double the amount of RAM by limiting the processing performance...
@xwizardx007 7 days ago
@@rkan2 Have you seen the 5090 PCB? It's fucking full; there's no more space. They could make it 48GB if they REALLY wanted, by using 16 3GB GDDR7 chips instead of 16 2GB chips. But the 5080 board could have been larger, with 2-3 more memory chips; they just didn't want to give you 20-24GB of VRAM for $1000 this gen.
@windwalkerrangerdm 7 days ago
If they can't produce enough of those chips, all they have to do is activate MFG x4 to interpolate between two existing chips and it'll all be fine...
@yuu-kun3461 7 days ago
"Why Buying GPUs Is a Disaster". Sorry. Edit: The title used to be "Why Buying GPUs *Are* a Disaster".
@_plaha_ 7 days ago
Good job, A.
@Shywizz 7 days ago
Good job 47, fall back to base.
@kirabee4134 7 days ago
I thought for a sec that "buying GPUs" was a new category, as opposed to "gaming GPUs" 😂
@GerardoScript 7 days ago
As a non-native English speaker, I could understand that he did it on purpose; why couldn't some others?
@theondono 7 days ago
Prime is biting the forbidden apple of rage bait
@jasontang6725 7 days ago
I bought my RTX 4090's two years ago when they came out, and now they are somehow worth 30% - 50% more than I paid for them. Wild times. I feel your disillusionment, prime.
@MrTheFinn 6 days ago
's ??? damn.
@davibelo 7 days ago
Prime, if you feel bad about asking for a 5090... ask for an H100 to use at home. It would be the first YT content about one being used at home 😂 and I want to see that content.
@heyhoe168 6 days ago
Well, in terms of die area, the 5090 vs the H100 is 744 vs 814 square mm. If you ask me, the 5090 is way too bloated for a home GPU.
@snikta564 2 days ago
Or just go for an MI300X, since it's for inference.
@hb-hr1nh 7 days ago
I like the people who said the 4090 would be the 5th-fastest video card after all the 50-series cards were out, thinking it was an own. It didn't quite work out that way.
@soulextracter 7 days ago
I'm so happy that I don't need a beefy GPU. Mine is like a decade old or something.
@HandsOC 7 days ago
The market needs a 48GB $5k Titan to relieve some of the datacenter-market pressure on the 5090.
@4m470 7 days ago
That doesn't benefit Nvidia. They can overprice both datacenters and prosumers with their current strategy.
@Fan_of_Ado 7 days ago
I just bought an RX 570 (2017). Maybe I'll get a 3090 in 10 years' time...
@javierflores09 7 days ago
You should just get an RX 6600; it's $190 at the cheapest right now (or you could splurge 50 bucks more for the XT, though the Arc B570 is the better choice at that price range). That's ~50% more performance than the card you have right now, at a relatively good price.
@deefeeeeefeeeeeeeeee 7 days ago
Just stay on AMD. A 7700 XT is 400 USD and is way more than enough for most users at 1080p and 1440p.
@javierflores09 7 days ago
@ That is still a very decent card; it can play GTA V at very high settings at 1080p with 60+ fps, and Cyberpunk at low settings at 1080p with 50-60 fps (you can squeeze out some more with FSR). People tend to forget about the older cards since everyone wants the latest shiny thing, but these cards still have a lot of potential, especially if you don't plan on playing the latest, most demanding AAA games. Though I don't know how well that one will perform if it was used for mining haha
@Fan_of_Ado 7 days ago
@ I got the RX 570 for $40, and the games I play aren't really that intensive.
@thesenamesaretaken 7 days ago
@@Definesleepalt Yeah, you can use them to play video games, who knew.
@davidding8814 7 days ago
There's a one-word explanation for this phenomenon: MONOPOLY. ASML is a monopoly, which has little incentive to boost production and reduce sales prices. The high prices/scarcity in turn raise the barrier to entry for chip manufacturers, resulting in TSMC being almost a monopoly, with just a little more competition and a little more incentive to reduce scarcity/prices. That in turn makes Nvidia just a little less of a monopoly, for the same reasons. The AI companies and their investors have been hoping that the same dynamic will make them monopolies/oligopolies, which is why the DeepSeek advancements tanked stock prices.
@SPeeSimon 7 days ago
ASML does have competitors, just not for the high-end machines being used to create those chips. They paid a high price to create a machine that makes chips using EUV, which now gives them a competitive edge. Likewise, Nvidia has competitors in AMD and Intel, except AMD has given up on the high-end chips and Intel is just beginning (again). So for now we must wait until the AI hype is over, just like around the 4090 release we had to wait out GPU usage for blockchain.
@Kartman-w6q 7 days ago
26:38 Bear in mind that combining architectures (Ampere and Ada) might give unexpected edge cases. Most often it will result in either disabled Ada features (best case) or, depending on what you're doing, a flat refusal to combine the VRAM.
@Grubse 7 days ago
Hey @prime, I also hear you saying you want to not only run the models (inference) but train models. Training models requires more VRAM than running them for inference. If a 16B model takes 24GB, then for training you'd need about 100GB of VRAM. This is because in training you also need to store the gradients for backpropagation.
@Grubse 7 days ago
Mostly an FYI in case you didn't know.
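A back-of-envelope sketch of that rule of thumb (the multipliers are rough assumptions, not measured numbers; note a 16B model at FP16 is already ~32GB of weights, so the 24GB figure above implies quantization):

    # Rough VRAM estimate; multipliers are ballpark assumptions.
    def vram_gb(params_b: float, bytes_per_param: int = 2) -> tuple[float, float]:
        weights = params_b * bytes_per_param      # 16B params @ FP16 -> 32 GB
        inference = weights * 1.2                 # weights + KV cache and overhead
        training = weights * 4                    # + gradients + Adam optimizer moments
        return inference, training

    inf, train = vram_gb(16)
    print(f"inference ~{inf:.0f} GB, training ~{train:.0f} GB")  # ~38 GB vs ~128 GB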
@mattbillenstein 7 days ago
Database Mart - you can rent a cloud machine there for like $450/mo with an A6000 - I think you can even get an A100 for under $1k/mo. Or LambdaLabs - rent by the hour.
@frankjohannessen6383 5 days ago
You know you can rent datacenter GPUs, right? An A100 80GB costs around $1 per hour. You could rent one for an hour every day for 5 years for less than the price of a 4090.
@Yagami2027 5 days ago
It might really be the best decision, especially in these hard times. It also makes a lot of sense financially, as long as you are not interested in owning the GPU. Are these rental services good though? I mean things like customer support or warranty.
@vrilgod4176 5 days ago
You will lose a lot of time downloading models, and even more if you decide to store data in the cloud. I've thought about it as well and did it for a while, but so much time was lost on these procedures that I'd rather buy hardware now.
@mrquicky 5 days ago
@@vrilgod4176 The hardware appears to be an appreciating asset. A 3090 is $300 more than it was 6 months ago 🤣
@LuisRoel 4 days ago
@@vrilgod4176 Bruh... I use RunPod all the time. It took me all of 2 minutes to download an 800GB model.
@doleo_metal 4 days ago
And that's not private
@Dani_el_Duck 7 days ago
Just wait for the Chinese GPUs to prosper and the US will suddenly have a lot of chips
@Psikeadelic 7 days ago
So another 5+ years then.
@nakashimakatsuyuki4077 7 days ago
​@@Psikeadelic More like 2 years, if not 1. I have solid sources.
@Kwazzaaap 7 days ago
There are GPUs in China you can buy that are ~1080 performance, and have been for more than a year now. They struggle with driver support and aren't really commercially viable, but supporting AI applications in software is a lot easier than supporting all of gaming. China's bottleneck remains whether TSMC is allowed to take orders from China on the latest node or not.
@pr0newbie 6 days ago
@@Kwazzaaap Harbin uni had an EUV lithography breakthrough, so more like 3 years. We don't need Chinese gaming GPUs; all we need are AI ones, to slash Nvidia's margins and make gaming attractive again.
@suminshizzles6951 6 days ago
I agree, but the tariffs will hurt. The 5090 shortage is a manufactured event. The Chinese are at least 10 years behind. With economic espionage they could shorten that gap, and they are trying the espionage route.
@frentsamuel7533 7 days ago
The GTX 980 (165W) was released in 2014 and scores 11,110 benchmark points on PassMark. The RTX 5080 (360W) was released in 2025 and scores 37,287 points. That's basically a 3.34x improvement on this specific software benchmark, and I don't think that's even a fair comparison, because the RTX 5080 consumes 2.19x more power than the GTX 980. There was a bigger jump between the 8800 GTX released in 2006 and the GTX 1080 released in 2016: basically 26.9x. I tend to agree that, unless something special is discovered, this way of building GPUs will not bring much improvement.
@Kwazzaaap 7 days ago
Comparing models that far apart in age has caveats, as they will be 3x in some aspects and 10x in others, but your point stands: it's nowhere near 16x or 32x. AI hypers and Nvidia fanboys will just keep lying for free until the end of time, though.
@sh_chef92 6 days ago
In raster gaming it's more like 6x or more performance. That's not even talking about RT performance, which can leverage something like OptiX for path-traced 3D renders etc. with a much bigger boost than 6x. And lastly, the tensor cores' AI performance is another world entirely; it's incomparable. Games can leverage tensor cores now too, so the performance difference is multiplied into double-digit numbers.
@heyhoe168 6 days ago
@@sh_chef92 This is why I hate RT. This tech should never have been released, let alone aggressively sold for online rendering. RT is beautiful for Blender enjoyers, but games? I say Nvidia halted gaming progress with this one.
@StefanStanic95 4 days ago
According to PassMark, the RTX 5090 is 24.3% faster than an RTX 4070 Ti, and that's simply not true. I don't think the PassMark benchmark is a valid test for newer cards.
@bankmanager 7 days ago
I'm disappointed in the framing of AMD cards here. ROCm is AMD's CUDA equivalent, and it has come a long way in the last few years. I have an RX 7900 XTX and I can run local LLMs and even image generation and text-to-video models (like Hunyuan) without any issues. The stranglehold Nvidia has on the narrative about AI is gross and incorrect.
@bankmanager 7 days ago
Especially disappointed that Casey, as the expert, made the same claims. It's just nonsense.
@bankmanager 7 days ago
It's frustrating to hear Casey talking about "some PyTorch CUDA thing" as if PyTorch code isn't possible to run on non-CUDA cards. My estimation of his expertise has decreased.
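For what it's worth, ordinary PyTorch code is device-agnostic; a minimal sketch (ROCm builds of PyTorch expose the AMD GPU through the same torch.cuda API):

    import torch

    # ROCm builds of PyTorch reuse the torch.cuda namespace, so this same
    # code runs on an RX 7900 XTX, an RTX 4090, or Apple silicon unchanged.
    if torch.cuda.is_available():             # CUDA or ROCm/HIP
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():   # Apple silicon
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    x = torch.randn(4096, 4096, device=device)
    y = x @ x  # runs on whichever accelerator was found
    print(device, y.shape)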
@DF-ss5ep 1 day ago
I hope you're right.
@dycedargselderbrother5353 5 days ago
According to TSMC, deficiencies with the Arizona plant are due to red tape, such as permits, environmental studies, OSHA concerns, etc. They'd like to upgrade faster, but they can't. Adding to this, someone died about a year ago due to an equipment failure, which has slowed things down even more. There isn't much reason to be optimistic about the plant's future, in my opinion. It simply can't be competitive in the existing environment.
@hastyscorpion 7 days ago
Dude, all the people in the chat going "IT'S THE SCALPERS" are so clueless. If you magically snapped your fingers and made scalping impossible, that wouldn't magically make more cards available. They just aren't making enough cards. You wouldn't get one either way.
@Psikeadelic 7 days ago
apparently
@GregoryShtevensh 7 days ago
Not to mention, scalping thrives on supply and demand. This wouldn't be an issue if you were always able to go to the store and get one at MSRP. Scalpers will always exist, but they're only successful when supply falls short of demand.
@monkemode8128 6 days ago
I'd much rather have a lottery system at MSRP than high prices. Although I don't think fighting supply and demand like that will work, at least not without high costs and being intrusive: suing everyone, implementing hardware limitations, background checks, and monitoring individual customers. That's not gonna happen, and even if it did it might just add barriers and raise prices more.
@slebetman 7 days ago
They won't manufacture the 3090 because of the Sinclair lesson: don't compete with your own product, or you will end up with massive inventory you cannot move.
@sevilnatas 7 days ago
First of all, if you are a small pro shop, an individual developer, or a super-serious hobbyist, you should never be buying current-gen hardware. Not only are you competing way above your weight class, but the premium they add onto current-gen chips, at the quantities you'd be buying, means you're just ripping yourself off. I just built a machine with dual Xeons, 256GB of RAM, 4x Nvidia V100 16GB, and 4x Nvidia V100 32GB, for a total of 192GB of VRAM, for less than $6k. The number of CUDA cores and the amount of VRAM a V100 has is super price-effective, because it's two gens old. Also, don't waste your money on consumer cards. You'd be competing with gamers and miners for no good reason, and for whatever reason Nvidia refuses to bump up the VRAM on consumer cards, and when it comes to AI, VRAM is king.
@markdatton1348 7 days ago
Unfortunately, the most cost-effective option for someone not running a continuous service is to rent space in the cloud...
@christianferrario 7 days ago
That's not really unfortunate; it's exactly why the cloud was born.
@markdatton1348 7 days ago
@@christianferrario It IS unfortunate if the whole goal was to run these models offline.
@Sub0x-x40 7 days ago
@@markdatton1348 Well, it is offline, just on someone else's system lol.
@christianferrario 6 days ago
@@markdatton1348 Yeah, but it depends why. If the goal is to run it offline to avoid leaking data to their chatbot, you can still do so by using your own cloud space. If it was to use it without a network connection, then yes, you have to pay for your own machine. Unlucky.
@NegraLi34 7 days ago
The root of the problem seems to be TSMC. It's not like this shortage problem just happened now; this has been going on for years, and TSMC seems either unable or unwilling to scale up production. At these premium prices you'd expect competition to prosper, but we're going in the opposite direction. I still don't understand how billions of USD can't reproduce whatever they're doing there.
@LubosMudrak 7 days ago
Because money is not enough; you need EXTREMELY competent people to do it right.
@paulchamberlain7942 5 days ago
It's not TSMC making 300% margins....
@johnstonefield1935 5 days ago
Like they pointed out early in the discussion, each part is heavily monopolized and monopsonized. There are few (or one) equipment suppliers, one company has all of the expertise and employees, and they only have a handful of (huge) customers. So to enter the market you can't organically start small at the margins; there is no small, and the economics don't make sense. You'd have to enter large in all of those areas, or at least have large contracts for each step. They're also not incompetent; they're just trying to make tomorrow's products for a premium, not today's products for cheap.
@willo1345 5 days ago
TSMC is as much of a monopoly as Nvidia, and they are both taking advantage. I do not subscribe to what the guy was saying about being at the limit when that limit is being set by the companies. It's an artificial limit. If these companies really cared about pushing the limit, they would find a way.
@lLvupKitchen 6 days ago
Putting a tariff on TSMC is absurd when the US doesn't have a competing product. The quantity of imported GPUs will basically remain the same, since big tech can't get enough of them, and prices will rise because of the tariff; but TSMC is not paying for that, the US companies will.
@krilektahn8861 7 days ago
And that's why DeepSeek was such a shake-up. It proved that good AI models don't need CUDA. And if you don't need CUDA, it's *much cheaper* to run AI.
@foley2k2 4 days ago
LM Studio supports AMD ROCm and Apple silicon as well as CUDA for acceleration. A single Mac mini is around half the speed of a 4090.
@TitelSinistrel 6 days ago
Until 2 years ago I worked in HPC, and I can tell you there are 2 classes of cards in the "enterprisey" category. There's the RTX A (Ampere/3000) series, which replaced the Quadro cards; they're built around being put into workstations, have their own fans/coolers, and are more consumer-friendly. And there's stuff like the A10/A40/A400 class of passively cooled cards that go into server chassis. At base, they're pretty much the same thing as the consumer cards of a similar class, but with double the VRAM, same or better TDPs, and higher VRAM bandwidth. They perform almost identically to the consumer cards. The A40 or RTX 6000 is within margin of error of the 3090 for this use case, with the difference that the 3090 uses a lot more power.
@peterprokop 7 days ago
deepseek-r1:70b runs fine on a 64GB M3 MacBook, at around 30 characters/second output, using ollama. To run the full DeepSeek R1 model you'll need 800GB of memory; to train it, 1.5TB. You can use a few big CPUs with 128-256 cores; it will be slow, but it will work. Otherwise you need something like 10 GPUs with 80GB of memory, or 20 with 48GB, to run your model. The first setup might draw up to 5kW of power, the second up to 10kW. That's $1-2 per hour in power alone, $24-50 per day, $1500-3000 per month. Double that if you want to train your model.
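The arithmetic behind those memory figures, as a sketch (the parameter counts and bits per weight are the assumptions):

    # GB of memory needed just to hold the weights, ignoring KV cache/overhead.
    def weights_gb(params_b: float, bits_per_weight: float) -> float:
        return params_b * bits_per_weight / 8

    print(weights_gb(671, 8))  # ~671 GB: full DeepSeek-R1 at 8-bit, before overhead
    print(weights_gb(70, 4))   # ~35 GB: why a 4-bit 70B distill fits a 64GB MacBook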
@llothar68 7 days ago
Well, I can get a 1.5TB RAM server with 44 cores for around $3000 at the moment.
@mryellow6918 7 days ago
@@llothar68 Bro, how?
@NetrunnerAT 5 days ago
@@llothar68 I think cheaper is possible. DDR4 32GB modules are easy to get, and servers with RAM risers are available.
@wwllsswtrustong 6 days ago
The 3090 used Samsung's custom 8nm (8N) process for its GA102 die, packing 28 billion transistors. While powerful, this node was less efficient than TSMC's alternatives, leading to higher power draw and thermal output. What are you even talking about, bro?
@TV-xv1le 5 days ago
Both AMD and Nvidia are to blame. I can't even find a 4070 Ti Super/4080 in stock, and the 7900 XT/XTX are all out of stock too. It's almost like neither company wants to sell to consumers at all. I'm so sick of these artificial shortages. There's no way I'm paying $1200+ for an AIB model either; $1k is already a ripoff.
@DeltaV64 7 days ago
To add to the AMD side: AMD on Windows also works pretty well. Never really ran into torch-directml issues, and ollama itself runs nicely. The XTX is such an underrated card.
@sh_chef92 6 days ago
Now, considering the enormous HW-accelerated AI power, transformer-model DLSS features, RT performance, CUDA, OptiX, and more that Nvidia cards have, the 7900 XTX is much worse value compared to the RTX 4080. Basically it's outdated already.
@danielhoover5169 7 days ago
You may be very interested in the tests done on AMD GPUs with the DeepSeek models. The 7900 XTX outperforms the 4090 on the 14B distilled R1 and all smaller distills, and barely loses on the 32B.
@Abir-Faisal 5 days ago
Yeah, but when you have to spend 2 hours figuring out why your PyTorch code suddenly doesn't work, that performance increase is negated.
@Tenmar. 6 days ago
If I recall, the reviewers get a review unit which they have to send back afterwards. While it is true that some of the big YouTubers do get GPUs like that for free (see the video game industry), it's mainly due to network connections, getting into the big club, and years of toeing the line.
@itmecube 7 days ago
Tariffs are only going to inflate the cost of GPUs. It's about to get a lot worse.
@britneyfreek 7 days ago
Which will just crash this ridiculous hype train. People need to get grounded.
@JonJon69420 5 days ago
Damn, didn't know Mexico and Canada were in the microchip business xDDDDDDDDDDDDD
@jamesmorganmcgill5848 4 days ago
@@JonJon69420 Taiwan is tho
@victorcadillogutierrez7282 6 days ago
Well, it's called 1.58-bit quantization because the model is rounded to ternary weights instead of FP32, FP16, or whatever, and the new weights contain only {-1, 0, 1} elements. This reduces the matrix multiplications in LLMs to binary-operation-level complexity. The 1.58 bits comes from 2^1.58 ≈ 3, and 3 is the size of the ternary weight set. Prime, instead of considering many Apples you could also parallelize many NVIDIA 3090s; it's really hard to get a 4090, or just wait and try to buy them over time. You can also parallelize different NVIDIA models as long as they run on CUDA, with auxiliary PyTorch libraries.
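A toy sketch of that ternary rounding (real BitNet-style quantizers do per-tensor scaling before rounding; this just shows the {-1, 0, 1} mapping and where 1.58 comes from):

    import math
    import torch

    def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
        # Scale by the mean absolute value, then round to {-1, 0, 1},
        # in the style of BitNet b1.58 weight quantization.
        scale = w.abs().mean()
        return (w / scale).round().clamp(-1, 1)

    print(ternary_quantize(torch.randn(4, 4)))  # entries are only -1, 0, or 1
    print(math.log2(3))  # 1.58...: bits needed to encode three states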
@mairex3803 7 days ago
The only things you should look at are: VRAM (nothing matters if you cannot fit the model) and tensor core precision support. You really want BF16, since you keep the exponent size of FP32 at half the cost. Ampere and newer support this. Working with lower precision is annoying if you want to do it yourself; you have to do a lot of work to maintain stability and accuracy.
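You can see that trade-off directly in torch.finfo; a quick check (the printed values are standard float-format properties):

    import torch

    for dtype in (torch.float32, torch.float16, torch.bfloat16):
        fi = torch.finfo(dtype)
        print(dtype, "max:", fi.max, "eps:", fi.eps)
    # bfloat16 keeps float32's ~3.4e38 range (same 8 exponent bits) but has
    # far fewer mantissa bits (much larger eps), at half the storage cost.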
@Sl15555 6 days ago
VRAM is most important, and the motherboard and CPU need to have enough PCIe lane support. A lot of mainboards only run one slot at full PCIe width while the remaining ports are nerfed.
@Gerry484 6 days ago
You know what we are not pushing enough? GAME OPTIMIZATION! It's a joke...
@ErazerPT 7 days ago
Welcome to budgeting hell; have a nice stay. And be REALLY conscious of the VRAM aspect. The moment your model can't fit, you have different parts moving at different speeds, and the fast side will always be waiting on the slow side. Rough comparison: in-memory cache vs. on disk. So you have to weigh what the speedup on the GPU side means against the slowdown on the "non-GPU" side. Ah well, I'm still waiting on the 5050, assuming it comes out; the 5060 if not. It's a blessing when your models can fit on "small stuff", because they're not trying to be everything for everyone.
@monkemode8128 6 days ago
CAN'T U JUST PUT IT IN MAIN SYSTEM MEMORY, AND IF U CAN LOAD IN THE PARAMETERS YOU NEED FASTER THAN THE GPU CAN PROCESS THEM, YOU'RE GOOD, RIGHT? (I'm being fr)
@LorikQuinn 4 days ago
I just want a 75W GPU that doesn't suck too much. No one cares about us in the low end!!! 😂
@ErazerPT 4 days ago
@@LorikQuinn Hell yes, brother, I hear ya. But I think that ship has sailed. The 5050 COULD hit that, but it's way more likely to be close to the 4060, so you're gonna need PCIe power plus some. I've been postponing a new PSU, but now...
@dataolle 7 days ago
For the hardware config, maybe a collab with Wendell from Level1Techs would be cool.
@LtdJorge 7 days ago
Yes, fully agree
@102728 7 days ago
I kinda wanna see Wendell and Casey nerding out on a call.
@leoSaunders 6 days ago
The 5090 costs ~€3,800-€5,000 in Europe; the 5080 is €1,300-€2,300. Europe only shows prices after taxes.
@JonJon69420 5 days ago
20% VAT is insane
@jonton6981 7 days ago
The Mac mini route should be fine for your use cases. You probably only need RAG for the doc search / coding anyway. Finetuning without a sufficient dataset often only hurts performance.
@space.raider.2271 4 days ago
It's frankly confusing how big silicon can design and make these delicate, infinitely intricate devices, but simultaneously be such a pure mess as an industry.
@hanes2 7 days ago
Sad they never could get SLI to work properly, since on the professional cards with NVLink you're just stacking more: 6 GPUs working together as one big GPU.
@taylor-worthington 7 days ago
Yeah, honestly you would think this would be built into the OS (or standard drivers I guess) by now, and SLI would have been a temporary bandaid.
@mryellow6918 7 days ago
They could; it worked, but the method they used put it on the developer to integrate. They have more than enough staff and money to make a functional version now, but they didn't like how you could get 2 cheaper cards and beat the flagship for less. They're 100% gonna move to chiplets, making any form of newer SLI kinda pointless now.
@mryellow6918 7 days ago
@@taylor-worthington That's what DX12 tried to do, but nobody cared.
@RADkate 4 days ago
That only gets you so far because of latency; look at the AMD CCD controllers.
@GirlfightClub 5 days ago
You just earned my like and sub at the 14 min mark, saying you'd feel bad using your influencer status to score a 4090. I've been going crazy looking for one. Btw, no idea who the guy being interviewed is; it's not in the description or anything.
@dualmotkany 5 days ago
His name is Casey Muratori. I had to watch the video until someone in chat said his name lmao.
@HamsterHearthstone 7 days ago
Prime, just give it 1-3 months; you'll be able to get a 5090 by then if you're faster than a snail checking out at an online retailer.
@3choblast3r4 4 days ago
One of the problems is that they stuck with 4nm this gen; with 3nm and new ways to make transistors they could give us much better performance, and over time the prices for manufacturing them would come down. But I don't think actual prices will ever come down. Because, as a former business student, I know the number one "ethical duty" any corpo has is to increase quarterly earnings... and you don't do that by lowering prices when everyone is buying your products at ridiculously high prices. They will never, ever give up their profit margin. Prices will only go up, stabilizing at an equilibrium that maximizes profit from sales. Only if the market crashes and suddenly no one can afford them, or sales drop like a brick, or something else goes wrong, will they be forced to lower prices to a point where people can afford them again. So yeah, we're f'd.
@dataolle 7 days ago
Public cloud GPU instances, perhaps?
@cricketbatman 6 days ago
Never been happier I forked out MSRP for a 4090 over a year+ ago when I found one in stock. Big OOF
@Sl15555 6 days ago
Same. I thought the supply issue was over; guess not.
@trietang2304 7 days ago
Remember the crypto bubble that made me unable to buy the dream PC I saved up for.
@ccayco 4 days ago
Can you believe the 3090 is only a few years old? Feels like it came out a decade ago.
@Telopead 7 days ago
I'd honestly go for an AMD 7900 XTX if I'm just trying to run smaller models. If I'm going for DeepSeek R1 671B, the cheapest way is either a Mac Studio or some retired server parts with huge amounts of RAM. GPUs are too expensive and hard to get rn.
@paulwary 7 days ago
So where is the market for used data centre chips?
@NytronX 7 days ago
Prime, I'll sell you my RTX 4090 for like $2.3k. Would ship from MN. Excellent condition, never used for mining or AI.
@krunkey 7 days ago
Get this to the top
@flipwonderland 6 days ago
I just bought one new for 1.8k :0
@charlesscholton5252 7 days ago
I want to explore using a Project Digits unit instead of setting up old server hardware loaded with GPUs.
@YTaccount11454 7 days ago
If you think it's bad now, imagine the GPU AND CPU shortage (high-end CPUs) if China attacks Taiwan... it would literally mean the evolution of these chips would be delayed for years, if not decades. It's not as easy as it seems to just build a fab and get going; even with the technology blueprint it would probably take at least 5-8 years to get a fab up to the level of what TSMC has...
@LtdJorge 7 days ago
And the Taiwanese have systems in place to burn every plant of the big silicon foundries in case the Chinese set foot on their land. They prefer to destroy everything rather than let them get it.
@Kwazzaaap 7 days ago
Decades? Samsung and Intel are right there, and ASML is in the Netherlands.
@mryellow6918 7 days ago
@@Kwazzaaap Samsung and Intel are not remotely close to what TSMC can do.
@defeqel6537 6 days ago
@@mryellow6918 That's quite an exaggeration.
@Sl15555 6 days ago
Almost every electronic device has at least one chip from TSMC, and all the major brands outsource to TSMC for the highest-tech chips. We need TSMC-like fabrication in the USA.
@timisa58 7 days ago
The 'racket' is the pricing for less performance, not necessarily the current limitations of the tech. I learned about the idea that lower-tier chips are actually the more defective chips. It is nuts.
@thesenamesaretaken 6 days ago
What's nuts about it? You have a factory that spits out products. Some of them have more defects and some have fewer. Are you suggesting only keeping the perfect few and throwing the rest into landfill?
@Ray-gs7dd 7 days ago
I literally started programming Fortran because of you guys' rant lol. I love whenever you two start cookin' on stream.
@Henrik_Holst 7 days ago
I hope you limit every string to 100 chars ;)
@dgo4490 7 days ago
With an ever-increasing amount of transformers, Fortran might be back in business!
@victorgabr 7 days ago
@@Henrik_Holst Yep, the plot twist is that the "benchmark" was done using LLM slop, and the Fortran code seemed faster because it limited the char size to 100. Lol 😂
@theredwedge9446 5 days ago
What annoys me the most is years-old graphics cards being sold for just under their scalped prices, which still end up being higher than the original MSRP.
@lilpepe545 7 days ago
The definition of a monopoly. The government has to step in and divide NVIDIA up.
@LtdJorge 7 days ago
no
@Neuroszima 7 days ago
@@LtdJorge Yes. But the correct answer is they will never do that.
@warasilawombat 7 days ago
Not really. There are alternatives, but they aren't as good. You're absolutely allowed to do that. What they'd step in for would be anti-competitive behavior; hard to say if Nvidia meets that bar.
@MrKlarthums 7 days ago
The problem is partially manufacturing capacity. There's no way TSMC can accommodate demand at this point, let alone allow for a competitive market. The other problem is that companies are using traditional GPU compute rather than ASICs. Nvidia GPU prices will drop like a rock once some company figures out how to build a competitive AI-focused chip: cut costs by not needing 3D graphics support, cut costs by keeping traditional compute hardware external, and transpile CUDA (at least in some capacity) for adoption. It must be a very hard problem, as this has been a needed area for about 15 years, ever since scientific computing needed cheaper, more scalable alternatives to supercomputing clusters with thousands of traditional Intel/AMD CPU cores.
@AONTrappy 2 days ago
I decided to make a comeback into the gaming world in 2025, but I'm not waiting around this time; I barely survived by the hair of my balls during the 1080 wars.
@paprikar 7 days ago
"...Is a Disaster", acshuawwy 🤓☝
@broadestsmiler 7 days ago
He changed the title! Huzzah!
@SteinBeuge 3 days ago
kenadra: "Would love to see the underlying math landscape change and invalidate GPU AIs entirely." Underrated comment. Maybe the problem isn't the physical silicon, but that we are doing the same old things and trying to do them with more energy and more silicon. The reason we have upscaling is that we've hit the limit of how many ones and zeros we can shove through the current technology. But I don't know how it would be possible to completely change the rendering and inference landscape and do things in a different way.
@zetsuyoru662 7 days ago
So many Nvidia apologists, when they are clearly milking each buyer, be it corporate or personal.
@zetsuyoru662 7 days ago
Sorry, but the guy on the left talks a lot of crap to sound clever, using word salad. All people want is to play games, not do LLMs. Everyone's a YouTuber to get a GPU. What a fool. What is his point?
@thesenamesaretaken 6 days ago
@@zetsuyoru662 GPUs are for playing games? You might want to tell Nvidia that; their income has skyrocketed since deep learning took off. Where GPUs used to be gaming devices that could also do some arbitrary computation on the side, now they are AI devices that can also do a bit of rasterisation.
@complexity5545 7 days ago
I made 2 AI builds in 2022. All server parts and welding and 3D printing and fans. This is disheartening: I want a 5090, but the situation is not a smart investment.
@one_step_sideways 7 days ago
Why making minor grammar mistakes are a disaster
@megatronreaction 7 days ago
No, it can increase comment engagement.
@aldi_nh 7 days ago
Clickbait works, with just the title lol.
7 days ago
but the engagement you get from comments... :D
@Talic29 7 days ago
I am at least 500% more likely to click a Prime video when I see Casey. Prime, you're great, but Casey is a GOD.
@theheatdeathiscoming 7 days ago
I have a 4090 and an M3 MacBook Pro with 128GB of shared memory. If ~1.2x the model weight size doesn't fit into the 24GB on the 4090, the MacBook Pro is way, way, waaaaaaaaaaaaaaay faster at inference. If the weights fit on the 4090 then it's faster, but at that point the model is pretty "small" anyway, so it's still acceptably fast on the MacBook Pro. Moving data from RAM to the GPU is just hella slow.
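A sketch of that ~1.2x fit check (the multiplier is the commenter's rule of thumb, not a measured constant):

    # Does a model fit in VRAM with headroom for KV cache and activations?
    def fits_on_gpu(weights_gb: float, vram_gb: float = 24.0, overhead: float = 1.2) -> bool:
        return weights_gb * overhead <= vram_gb

    print(fits_on_gpu(13))  # True: a ~13 GB quantized model fits a 4090
    print(fits_on_gpu(24))  # False: spills past 24 GB and falls off a performance cliff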
@CaptTerrific 7 days ago
TRAINING, broseph!
@misiekt.1859 4 days ago
The Radeon W7900 is cheaper than a 4090 and has 48GB. And stop the BS about software; the AMD stack has been just fine for machine learning ("pytorch/cuda") via HIP at least since '21-'22.
@carnap355 7 days ago
What does "Are" refer to?
@_plaha_ 7 days ago
Just so you can correct him
@emmanuelgoldstein3682 7 days ago
He's H1B-ing
@MaxPicAxe 7 days ago
@@_plaha_ haha
@xilix 5 days ago
23:50 OK, I just need to interject here. Not that I expect or would ever encourage you to get an H100, but the difference between an H100 and a 4090 isn't just a small percentage or even a 30% chunk; it is a MASSIVE MASSIVE jump in throughput. I have a 3090 Ti and a 4090 in my workstations, and whenever I've fired up rented H100s for training runs that my 90s struggle with or can't handle, they absolutely BLOW my mind. I'm talking over 7-10x the speed for certain projects. ZERO hyperbole. The H100 is an absolute juggernaut freight train compared to a 4090 when it comes to AI work. If you don't believe me, go rent one for a few hours, throw some envs up and start screwing around. The difference is SURREAL.
@Unordinary-lg4yt 7 days ago
So not only are people punching air over AI, they're punching air over pricing and availability, despite the fact that this overlap doesn't even care (so they claim, anyway).
@zhe2en171 7 days ago
My hot take / understanding (please correct me if I'm wrong): the fact that you can do 4-bit and below (there are even 2-bit quantizations!!!) suggests that the current LLM architectures are oversized in terms of parameters for their compute ability. I think that if the neurons were "saturated", nearly any further quantization would significantly degrade the model's output.
@DF-ss5ep 1 day ago
No, because using fewer bits has its own benefits. It's not just that multiplications may be faster; more data can fit inside caches, and memory access is a big cost of running these models. So cutting bits may have had a negative impact if the model were kept constant, but at the same time it can allow them to add more layers to the model, for example.
@basic204 5 days ago
I'm a little confused how the 5090 was the limit when they literally sell graphics cards with 98GB of VRAM.
@csh9853 4 days ago
I'm glad I don't have this problem of having too much money.
@edwardallenthree 7 days ago
It used to be crypto. Now it's AI. What will we spin GPU cycles on next that is worthless?
@exploiting4daysd105 5 days ago
Virtual worlds
@MasamuneX 7 days ago
Asianometry would absolutely blow your mind with the depth of the process.
@real_krissetto 7 days ago
@ThePrimeTime Going for the beefiest single GPU you can get is probably the most satisfying setup right now, especially compared to using just 2 or 3 GPUs in total (and not many more). The data transfer rate between the cards during inference puts a pretty hard cap on tokens/sec when the model is spread across multiple GPUs, with the main benefit being that you can run bigger models without parts of the model spilling into RAM. If you can fit a model entirely in one GPU's VRAM, then you can really see it fly on modern GPUs.
@defeqel6537 6 days ago
Strix Halo could also be a good setup, seeing as it can address quite a lot of memory.
@ZDoherty 7 days ago
1:30 NVIDIA calls this Speed of Light; it's a company value.
@xilix 5 days ago
As someone who works in AI with these chips, it is SO refreshing to see the guy on the right talking about GPUs. Screw Gamers Nexus and Linus; they're too addicted to milking the hate train and never talk about the nuance and realities of what's happening.
@JackAnton 5 days ago
2x A6000 with NVLink technically gives you 96GB of VRAM for larger-model inference, because the transfer overhead is low; but for training it's really two GPUs, each with 48GB. With NVLink you're bypassing PCIe for the GPU-to-GPU comms.
@tmerb 7 days ago
Just buy a 7900 XTX. 24GB of VRAM for cheap. I bought one the other day for $829.
@bankmanager 7 days ago
Sensible man. Casey and Prime seem to think PyTorch is exclusive to CUDA? Lmao.
@kalasmournrex1470 7 days ago
I'd expect GPUs to go to 3D layouts. Since they're embarrassingly parallel, it's the perfect case for 3D layouts.
@LtdJorge 7 days ago
Not exactly, because that reduces thermal transfer so much. What AMD is doing is putting their 3D V-cache below the CPUs, and that could be done for the GPUs too. But right now, if you stacked cores vertically, they would cook themselves.
@shaynethomas8880 5 days ago
I figured all this out in a cave.....with a box of scraps
@Velereonics 4 days ago
And this is why advancements in materials science are necessary. We need to be able to stack, or control heat in some better way.
@ShiroKage009 5 days ago
Something as vital to modern data infrastructure as chip production should never be allowed to be monopolized so hard. It's why I'm glad that China is pumping so much money into developing their own fabs, and I'm glad they're doing all the industrial espionage to liberate the patents for the lithography machines from the current monopolies.
@foobar3 7 days ago
14:30 Prime needs to obtain his will to power and email NVIDIA for that 5090. Expend some clout capital.
@TheMetacognologist 7 days ago
Why am I able to preorder a Samsung Galaxy S25 like a month in advance, wait for it to ship directly to me, and have it arrive a few days after launch when it's in stock, but not a GPU? Make it make sense.
@TheMetacognologist 7 days ago
Why, as a customer, do I have to set up email and text alerts, constantly refresh some page, and purchase a GPU in less than a minute? Why isn't there a waitlist or reservation system? Can't there be sanity?
@bankmanager 7 days ago
​@@TheMetacognologistBecause as Prime said, scalpers with their Puppeteer bots have bought them all.
@RADkate 4 days ago
Because Samsung has their own fabs that aren't even remotely at capacity lmao.
@rydmerlin 7 days ago
Did you watch Digital Spaceport's videos? Ask the Twitter guy you interviewed how to obtain the GPUs... he had one in that video. Also, how many GPUs is Nvidia going to sell due to DeepSeek R1?
@Kwazzaaap 7 days ago
They're gonna sell more, because now every company that can make a 30-200k investment in a local AI assistant will. You no longer have to worry about your trade secrets leaking.
@babybrocute 1 day ago
Jensen Huang, CEO of NVIDIA, said thank you; if you buy the RTX 5000 series, his jacket is gonna be even shinier than before.
"... maybe the problem is you" - Linus
1:01:59
ThePrimeTime
Рет қаралды 177 М.
Actually investigating "Gaming is Dying"
1:01:05
NeverKnowsBest
Рет қаралды 238 М.
小丑教训坏蛋 #小丑 #天使 #shorts
00:49
好人小丑
Рет қаралды 54 МЛН
coco在求救? #小丑 #天使 #shorts
00:29
好人小丑
Рет қаралды 120 МЛН
It’s all not real
00:15
V.A. show / Магика
Рет қаралды 20 МЛН
Grunt vs Sorceress - Warcraft 3
4:41
Epic Warcraft
Рет қаралды 75
Semantic Versioning sucks - but we can fix it
21:52
Theo - t3․gg
Рет қаралды 12 М.
The Best Software Engineering Advice | Prime Reacts
55:05
ThePrimeTime
Рет қаралды 501 М.
Happy Hour #97 - Justin Drake - ETH Foundation - Jan  31st 2025
1:25:07
EVMavericks - Ethfinance
Рет қаралды 212
What's Inside The World’s Worst eBike? Reevo Teardown
11:55
Berm Peak Express
Рет қаралды 304 М.
What if you just keep zooming in?
21:29
Veritasium
Рет қаралды 7 МЛН
Prime Reacts: The Flaws of Inheritance
29:05
ThePrimeTime
Рет қаралды 402 М.
小丑教训坏蛋 #小丑 #天使 #shorts
00:49
好人小丑
Рет қаралды 54 МЛН