MI210s vs A100 -- Is ROCm Finally Viable in 2023? Tested on the Supermicro AS-2114GT-DNR

46,187 views

Level1Techs

10 months ago

Wendell discusses the race in machine learning, going over Google's, Nvidia's, and AMD's tech to see who's got what in 2023.
*********************************
Check us out online at the following places!
bio.link/level1techs
IMPORTANT: Any email lacking “level1techs.com” should be ignored and immediately reported to Queries@level1techs.com.
-------------------------------------------------------------------------------------------------------------
Intro and Outro Music: "Earth Bound" by Slynk
Edited by Autumn

Comments: 241
@kazriko 10 months ago
AMD needs two modes, "Accurate" and "Green Team Inaccuracy"
@TheHighborn 10 months ago
More like, red dot accurate, noisy green data
@sinom 10 months ago
Calling it "green mode" and justifying it as "oh it uses less power because it is less accurate" might actually be something they could do
@ac3d657 5 months ago
You need two mode... wrapped, and dropped
@TAP7a 10 months ago
ROCm seems to have planted itself in the scientific HPC world, let’s hope it can grow from there
@tappy8741 10 months ago
With CDNA yes. With RDNA1/2/3 they've severely dropped the ball and didn't adequately make it clear that that was the plan all along. On the consumer side which is where hobbyist compute lives the 6950X was the first card to approach the Radeon VII for a traditional (non-AI ML whatever) scientific workload. The 7000 series is actually worse as they cut FP64 performance and the memory model with infinity cache split 5/6 ways (and/or something else) seems to have hurt this specific (opencl which is why it can be tested) workload. George Hotz to the rescue would be awesome.
@flamingscar5263 10 months ago
Honestly, I'm hopeful for ROCm on consumer hardware soon, and on Windows. If you're someone that uses any creative app like Blender or the Adobe suite, then you know how valuable CUDA is; this really could be the boost AMD needs. I've been trying my best to recommend AMD, but it's surprising how many people go Nvidia because of how much better Nvidia is in creative apps, even if they don't use them. It's always "well, I might want to use them in the future, so I'll just go Nvidia." Soon there will be little excuse not to go AMD, and I'm all for it; competition is good. Not that I'm in any way an AMD fanboy. I know for a fact that if AMD somehow dethroned Nvidia as the market leader, they would pull the same shit Nvidia does, but competition is what's meant to stop that.
@GlacikingTheIceColdKing 10 months ago
Funny enough, they most likely won't use them in the future. I've seen a lot of people use the same argument to go Nvidia, but they don't even install any creative applications after buying their GPUs. Also, I've been using AMD for about 7 months; it isn't necessarily horrible for people who just want to do video editing with Premiere Pro and Illustrator or Photoshop. I use that software regularly and I face no problems with it.
@reekinronald6776 10 months ago
Yup. For about a decade I was scratching my head over why AMD had such a lousy software strategy. It had great hardware, but the drivers and the lack of tools or APIs for programmers just seemed like a huge business mistake. A perfect example was the time and resources spent on AMD's ProRender. Considering the multitude of professional and high-quality open source renderers, ProRender was a pointless exercise; better to spend the manpower and money on driver development, or even on OpenCL when it was viable. At least with ROCm they now seem to understand that everything needed to support the hardware is as important as the hardware itself.
@nickelsey 10 months ago
Tensorflow never directly competed with CUDA; it sits on top of CUDA. Tensorflow's primary competitor was (and still is) Pytorch. Both Tensorflow and Pytorch can run on TPUs, but of course Tensorflow has 1st class support. Both Tensorflow and Pytorch have 1st class support for CUDA. I suspect the real reason Tensorflow hasn't been as popular lately is two-fold. First, a lot of internal Google development resources have moved on to developing JAX instead of TF, and secondly (and more importantly), Pytorch is simply better than Tensorflow. It's significantly more enjoyable and easier to use. And the reason CUDA has beaten out TPUs is also simple: you can only get TPUs on Google Cloud, whereas every cloud, every enterprise datacenter, and every school had direct access to CUDA-capable devices. Everyone uses and develops for them, whereas TPUs and the XLA compiler are basically only developed by Google. Also, in deep learning we actually don't mind the reduced accuracy for many problems. In fact, a mix of 32 bit and 16 bit is the *default* data format for deep learning now. Reduced precision deep learning is extremely important for large scale neural network development, for three reasons. First, obviously, if you use fewer bits for your model, you can fit a larger model in a single GPU's memory, which makes development easier. Second, the Tensor Cores basically double their FLOPs every time you halve the precision of your data. So if you have 256 TOPs using 32 bit floating point data, then you have 512 using FP16 data, and 1024 TOPs using FP8 data. Even further compression work is being done for INT8 and even INT4. Finally, one of the most important and oft-overlooked issues is that many neural net architectures require very high GPU memory bandwidth; that's why datacenter GPUs use HBM. When you reduce your data from 32 bit to 16 bit floats, you cut the memory bandwidth pressure in half.
We won't consider AMD cards until they're competitive at FP16 performance with CUDA, and even then, AMD would REALLY need to convince us that their software stack works as seamlessly as CUDA does - you have to add wasted developer and data scientist time to the total cost of the device to get a proper apples-to-apples comparison. We just started getting our H100 deliveries in, and they are truly beasts. I'm hoping we can get some AMD hardware in for benchmarking at some point soon.
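The memory-footprint half of that argument is easy to sanity-check. A minimal sketch with NumPy (the array size here is arbitrary, just for illustration):

```python
import numpy as np

# A "layer" of 1M weights stored in two precisions.
w32 = np.zeros(1_000_000, dtype=np.float32)
w16 = w32.astype(np.float16)

# Halving the precision halves the bytes that must move over the memory bus.
print(w32.nbytes)  # 4000000
print(w16.nbytes)  # 2000000
```

The same 2x applies to bandwidth pressure: every byte not stored is a byte never fetched from HBM.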
@nexusyang4832 10 months ago
Pin this comment above.
@seeibe 10 months ago
It all sounds viable from the hobbyist / small company standpoint. But come on, if you can afford H100s, you're big and successful enough that you can just invest in AMD as a backup plan. This would basically be the equivalent of Valve saying "All PC gamers are on Windows, so we won't invest in Linux". At a certain point, you're the one who has to make it happen.
@GeekProdigyGuy 1 month ago
Reduced precision is NOT the same as violating the FP standards. Going from FP32 to FP16 is a reduction in precision, but if the hardware implements the standards correctly, an FP16 calculation should have the exact same result no matter what card you run it on. Fudging the calculations probably doesn't make a huge difference for most ML applications, but for companies that need auditability (eg finance) or even big tech companies that want to debug an issue affecting a million users out of their billion users... Standards compliance is important, and Nvidia needs to fix their shit.
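That distinction is easy to see in software: IEEE 754 binary16 rounding is fully specified, so a compliant implementation produces the same (less precise) value everywhere. A quick NumPy illustration:

```python
import numpy as np

# binary16 keeps only 10 mantissa bits, so 0.1 rounds to a fixed nearby value.
x = np.float16(0.1)

# Less precise than FP32, but identical on every standards-compliant device.
print(float(x))  # 0.0999755859375
```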
@dirg3music 10 months ago
I completely agree. If history has shown us anything, it's that when Lisa Su goes all in on something, that something tends to work, and work well. I'm just excited to see the market get more diverse as opposed to "CUDA or gtfo"; closed ecosystems like that are bad for everyone.
@psionx1 10 months ago
Except it was AMD's own fault that CUDA became the standard for GPU compute work, and they still have not learned: adding features to hardware and slapping them on the box is not enough to win. They actually have to provide support and funding to develop third-party software that uses the features of the hardware.
@makisekurisu4674 10 months ago
​@@psionx1Give them a break, they are running on less than half the so of course they'd have to pick and choose their fights
@datapro007 10 months ago
I hope to heck it is. It's been Nvidia or nothing until now. Terrific video, Wendell. I like that you have content for the working folks.
@jannegrey593 10 months ago
Well - I hope for some competition. Standards are fine, but one company owning them is very monopolistic. And AMD's disadvantage seemed to be lack of software rather than hardware.
@ProjectPhysX 10 months ago
We have both an MI210 64GB and an A100 40GB for my FluidX3D OpenCL software. Both cards are fine, the software runs flawlessly, but they are super expensive. Value per unit of VRAM capacity is better for the MI210, yet performance (actual VRAM bandwidth) is better on the A100. Somehow the memory controllers on AMD cards are not up to the task: 1638 GB/s promised, 950-1300 GB/s delivered. The A100 delivers an actual 1500 GB/s. Compute performance for such HPC workloads is irrelevant; only VRAM capacity and bandwidth count.
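As a rough illustration of the measurement behind numbers like those: an effective-bandwidth probe just times a large copy and divides bytes moved by elapsed time. A CPU-side sketch (on a GPU you would time a device-to-device copy instead; the buffer size here is arbitrary):

```python
import time
import numpy as np

n_bytes = 64 * 1024 * 1024             # 64 MiB source buffer
src = np.ones(n_bytes, dtype=np.uint8)

t0 = time.perf_counter()
dst = src.copy()                       # reads src and writes dst: ~2 * n_bytes moved
dt = time.perf_counter() - t0

gb_per_s = 2 * n_bytes / dt / 1e9
print(f"~{gb_per_s:.1f} GB/s effective")
```

Repeating the copy many times and taking the best run gives a steadier number, since the first pass pays page-fault and cache-warmup costs.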
@mdzaid5925 9 months ago
What a time we are living in... ~1000 GB/s is not enough 😅
@ProjectPhysX 9 months ago
@mdzaid5925 Crazy, right? Transistor density, and with it compute power (FLOPs), has grown so fast in the last decade that memory bandwidth cannot keep up. Today almost all compute applications are bandwidth-bound, meaning the CPU/GPU is idle most of the time waiting for data. Even at 2 TB/s.
@mdzaid5925 9 months ago
@ProjectPhysX True... not sure about the performance implications, but computing has evolved very, very rapidly. When I think how small each transistor is, how many there are, and how closely they are packed, it feels impossible. Personally, I feel that eventually analog neural networks will take over, and GPU dependency should be reduced to only training / assisting the analog chipsets. Also, I don't have too much faith in the current generation of "AI" 😅.
@Teluric2 2 months ago
What kind of setup do you use with this software? Windows? Red Hat?
@ProjectPhysX 2 months ago
@@Teluric2 for these servers openSUSE Leap, for others mostly Ubuntu Server minimal installation.
@leucome 10 months ago
I got a 7900 XT when ROCm 5.5 came out, specifically to use with A1111. It works pretty well. To give an idea: 32 images of Danny DeVito at 768px with 20 samples took 2:30 min. That was 8 batches of 4; 16 batches of 2 take 2:40, and 32 single images took 3 min. So yeah, the performance is there. I can only imagine how fast the MI300 will be.
@sebastianguerraty6413 10 months ago
I thought ROCm was only supported on very few 6xxx GPUs and AMD's server-class GPUs.
@chrysalis699 10 months ago
@sebastianguerraty6413 ROCm 5.5 fixed that. It added gfx1100 and thus 7xxx support. I've been custom-compiling PyTorch with every new release of ROCm. Can't wait for them to start leveraging the AI accelerator cores in the 7xxx series. Whether that will be CUDA-compatible and exposed via HIP still needs to be seen.
@sailorbob74133 10 months ago
@chrysalis699 When you compile PyTorch for gfx1100, how much of an uplift do you get over stock PyTorch? What benefits do you see from the custom compile in general?
@chrysalis699 10 months ago
@sailorbob74133 The stock PyTorch compiled against ROCm 5.4.2 doesn't detect my card at all, so the uplift is infinite 🤣. I doubt there is much difference for RX 6xxx cards, and there is still quite a bit of unlocked potential in the RX 7xxx cards, as I haven't seen any HIP APIs for the AI accelerators. There is actually barely any mention of them on AMD's site, just an obscure reference on the RX 7600. We'll probably have to wait for CDNA 3 for them to release those APIs.
@chrysalis699 10 months ago
I just noticed that PyTorch nightly is now compiled against ROCm 5.6, so I'll probably just switch to those. 🤞 the next release will be built against 5.6.
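For anyone trying the same: ROCm builds of PyTorch live on a separate wheel index, so a typical install-and-verify sequence looks roughly like this (the rocm5.6 tag is whichever version pytorch.org currently lists, so treat it as a placeholder):

```shell
# Install a nightly ROCm build of PyTorch from its dedicated wheel index.
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm5.6

# ROCm builds reuse the torch.cuda API, so this prints True once the card is detected.
python3 -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"
```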
@s1ugh34d 10 months ago
We need more high end AI comparisons like this. Hope you get more gear to test!
@Bill_the_Red_Lichtie 10 months ago
I am such a geek, "Can't believe it's not CUDA" made me actually laugh out loud.
@astarothgr 10 months ago
The worst thing about ROCm is the hit-and-miss support for commodity GPUs. Back in the 3.x / 4.x days of ROCm, commodity GPUs were half-heartedly supported, with bugs, and sometimes support was retroactively withdrawn. These days at least they tell you that if you buy anything other than the W-series of GPUs (i.e. W6800), they don't promise anything. This, however, will not increase their mind share; all students and budget-strapped researchers just buy off-the-shelf Nvidia GPUs and go to work. If you've picked a commodity GPU and are trying to get ROCm to work, be ready for tons of frustration; really, this use case is unsupported. Source: my own experience with ROCm 4.x, using the RX 480/580, Vega 56/64 and Radeon VII (the only one that worked reasonably well).
@mytech6779 10 months ago
I would add to the student/budget research thing: they may not be looking for high performance, but they do need the full feature set to do the primary development work; then, once things are working and somewhat debugged, they will upgrade to get performance. Even for big-budget ops it makes no sense to have top-end hardware sitting there depreciating for a year or four while the dev team runs experimental test builds. By the time it comes to a real production run, another purchase will be needed anyway. That core functionality problem has always been AMD's GPU problem: promises that seem good on paper but ultimately don't deliver. "Oh yeah, now that we have your money, it turns out you need this specific version of PCIe with that CPU subfamily, on these motherboards, with this narrow list of our cards (and with our terrible product-line numbering, many in the same apparent series don't work) made in these years, with that specific release of this OS..." Years ago I bought a W7000 (well over $1000, 12 years ago) specifically because I wanted to play with the compute side, and there were claims that it had compatible drivers and such (I use Linux; Nvidia had terrible support). Nah, oops: something in the GCN 1.x arch was screwed up and compute was never usable, even after several major driver changes and supposed open-sourcing. It worked OK for graphics, but my graphics needs are minimal. Later, I switched to a much newer and cheaper consumer AMD card of equivalent performance that claimed OpenCL support; nah, again, not really. Gave me a rather bad taste for AMD. I'm hoping Intel can push some viable non-proprietary alternative to CUDA; I'm due for a new system in the next couple of years.
@solidreactor 10 months ago
Rumor says that ROCm might work for RDNA3 on Windows this fall (repo & comments). However, something similar was said earlier for 5.6, and that might not be true anymore? I really hope the consumer RDNA cards could run ROCm on Windows and act both as an evaluation path for the CDNA platform and as an entry point for AI compute, to democratize AI access. Having ROCm support on consumer cards on Windows might also draw traction from other companies (like Tiny Corp) to embrace the more open solution. Who knows, maybe that will tip the scale in AMD's favor?
@flamingscar5263 10 months ago
Everything points towards it being the case. AMD hasn't said anything officially, but leaked in-development documents mention a fall time frame. It will happen eventually, even if not this fall. AMD knows how far behind they are on the consumer side for creative work; they need this.
@stevenwest1494 10 months ago
I'm hanging my GPU choice on this date, because honestly I don't want an RTX 3060 12GB and NGreedier's horrible GeForce Experience 🤮 but I want to get into Stable Diffusion. A 3080 12GB is still waaaay too much! What I really want is an RX 6800, with ROCm for Windows!
@eaman11 10 months ago
Intel says the same thing: their stack works on Windows too.
@mytech6779 10 months ago
AMD will see a squirrel by then and abandon yet another project with half-implemented "support". Why they would even mess with Windows support at this point is dumbfounding; most systems in this realm run Linux unless they are forced to Windows by some third-party need for proprietary crap. Windows may still be king of Ma and Pa Kettle's desktop, but that isn't this target market segment.
@reekinronald6776 10 months ago
@mytech6779 I would like to see a segment breakdown between corporate GPU computing and consumer. I would still think the number of Windows users running Blender, Adobe, or some other graphics program that uses GPU rendering is quite large.
@tad2021 10 months ago
If you didn't know: in A1111, change the RNG source from GPU to CPU and the optimizer to sdp-no-mem. That should minimize the differences when running on different GPUs. Using xformers on CUDA can be faster (sdp on PyTorch 2 has mostly caught up), but the output isn't deterministic.
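The reason moving the RNG source to the CPU helps: a seeded software generator emits a bit-identical noise stream on any machine, while GPU-side generators can differ across vendors and drivers. A minimal illustration of that property (using NumPy rather than A1111's actual code):

```python
import numpy as np

# Two generators constructed from the same seed produce bit-identical
# "latent noise" -- the property the CPU RNG option relies on.
g1 = np.random.default_rng(seed=1234)
g2 = np.random.default_rng(seed=1234)

noise_a = g1.standard_normal((4, 4))
noise_b = g2.standard_normal((4, 4))
print(np.array_equal(noise_a, noise_b))  # True
```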
@P0WERCosmic 5 months ago
ROCm 6.0 just dropped today! I'd love for you, Wendell, to do an update on this video to show off all the advancements in 6.0 and whether there are any noticeable performance bumps 🙏
@AndreiNeacsu 10 months ago
I am really happy that Ryzen paid off. In 2017 I was one of the earliest adopters who pre-ordered two Ryzen 1700 (non-X) systems with X370 boards; and I never pre-order stuff, did not before and have not since. Now, AMD is a proper force for innovation and competition in both the CPU and GPU spaces, for consumers and datacenters. Also, Intel Arc seems to become more interesting by the day. Got an Acer A770 16GB as a curiosity at the start of this year and I still haven't reached any final conclusions about it; it seems like every second driver update makes things better.
@flamingscar5263 10 months ago
Yeah, it's honestly good Ryzen happened, because there were reports they were on the road to bankruptcy. All of this is thanks to Lisa Su; she really saved AMD.
@peterconnell2496 10 months ago
Well done. Therein lies a tale many of us would like to hear: the buying decision in the market of the day, the cost of an 8-core Intel vs AMD back then, e.g. Let's not forget what a classic the 1600 proved to be.
@MatthewSwabey 10 months ago
According to two senior AMD tech folks, Zen was designed because they had to: Bulldozer etc. was a failure. Originally they aimed for 70% of Intel's performance at 50% of the price, but then TSMC's silicon just kept getting better and Intel stopped innovating. [I had the chance to talk to some senior AMD tech folks when they were recruiting on campus, and they were surprised how great Zen turned out too!]
@MaxHaydenChiz 10 months ago
It'd be easier to get students experienced with AMD hardware, and to get open source support for it, if RDNA had more compatibility with CDNA / better performance parity against Nvidia hardware. Students and hobbyists aren't spending $10k+ on this kind of stuff.
@nexusyang4832 10 months ago
Yeah, the fact that someone can walk into Best Buy, get a prebuilt, and download the CUDA SDK and learn says a lot about how easily and affordably someone can get into AI/ML. If AMD can do the same for their consumer/gaming hardware, then that would be a big game changer.
@levygaming3133 10 months ago
@nexusyang4832 Exactly. There's a lot of hand-wringing about all the various things Nvidia does to needlessly segment their lineup, and that's all well and good, but that's not at all what CUDA is. CUDA's advantage is that it's the same CUDA whether you have an MX iGPU replacement, the same CUDA that's in the old Nvidia GPU you're replacing (assuming you have an Nvidia GPU, obviously), and it's the very same CUDA that's in last year's laptops, this year's laptops, and is certainly going to be in next year's laptops. It's not like AMD makes CDNA laptops, and that's kind of the point.
@nexusyang4832 10 months ago
@@levygaming3133 You're spitting facts. 👍👍👍👍
@steve55619 10 months ago
Excuse me??? Lol
@mytech6779 10 months ago
Hobbyist/student stuff doesn't need performance parity with CDNA. What it needs is ease of access (available as a standard feature on commonly available consumer-priced cards, without hobbling); a similar interface across products, for the user and for software portability between consumer stuff and CDNA; and performance that is good enough not to be frustrating. Reasonable Linux support is also needed. Linux may only make up 2% of total desktops, but Ditzy Sue and Joe Sixpack aren't GPU-compute hobbyists, so total desktops is the wrong stat; in reality, Linux is closer to 50% or more of the relevant market segments.
@mrfilipelaureanoaguiar 10 months ago
That M.2 scanning multiple 4K videos to check for a choice of shape... really nice what it can process and check at that size without cooling on it. As long as it's detected...
@joshxwho 10 months ago
Thank you for producing this content. As always, incredibly interesting
@Owenzzz777 10 months ago
You forgot to mention that George Hotz's discussion started with his frustration with AMD GPUs. The so-called "open source" software isn't so open. Look at the "open" FSR 2 repo: no one is reviewing public pull requests. It's used more as a marketing tool than to support the OSS community.
@tstager1978 10 months ago
They never said that FSR 2 would be an open source project. They said it would be open source, meaning free access to the source code and the ability to modify it for your own needs. They never said they would accept pull requests from the public.
@steve55619 10 months ago
Thanks for this video, this field is moving so quickly it's really hard to keep up to date on the latest advancements, let alone the current status quo
@bennett5436 10 months ago
please do 'tech tubers by Balenciaga' next
@reto 10 months ago
Got SD A1111 to work on an RX 6500 XT and an Arc A770. But I wasn't able to run it on Vega iGPUs. The A770 16GB crushed the 3060 12GB I usually use.
@littlelostchild6767 10 months ago
Hey, if you don't mind, could you please make a short test video on the A770? I'm thinking of getting one.
@usamaizm 10 months ago
I think the subtleties shouldn’t be an issue.
@spuchoa 10 months ago
Great video, Wendell! This is good for the market; let's hope the prices adjust in the next 12 months.
@SomeGuyInSandy 10 months ago
Seeing those giant GPU modules gave me Pentium II flashbacks, lol!
@marktackman2886 10 months ago
These videos empower my team to express ideas to upper management.
@mdzaid5925 9 months ago
ROCm support is definitely needed on consumer-grade hardware.
- This will give AI students some experience in the AMD ecosystem.
- Also, not all AI models run in the cloud. For local use, companies have to consider the available options, and currently that's only Nvidia.
@post-leftluddite 10 months ago
Wendell... this is seriously important work. Making the alternative to what many see as the default choice observably feasible is crucial to easing the hesitancy many people have, and just like in anything else [under the clutches of capitalism], a de facto monopoly can only harm consumers/users.
@Alice_Fumo 10 months ago
This is such a curious way to create spot the difference images.
@jadesprite 10 months ago
But what I really want to know is: can I use it to TRAIN models too? Especially on voice and faces. I don't want to upload my family's private data to a cloud service that could potentially keep it forever; I would only trust that locally.
@cedrust4111 10 months ago
Is ROCm supported on the RDNA3 iGPU? By that I mean, if one has a Minisforum UM790 Pro (with a Ryzen 9 7940HS), can that work?
@wsippel 10 months ago
I run AI workloads on a 7900XTX. It's a bit of a headache sometimes, but it works. But there's so much performance left on the table. I recently played around with AMD's AITemplate fork, and it's really fast on RDNA. But it's also incomplete and unstable. Triton recently got lots of MFMA optimizations, no WMMA though. They're largely the same thing as far as I understand, except MFMA is Instinct, WMMA is Radeon. I think even most AMD engineers don't realize Radeon has 'Tensor Cores' now.
@whoruslupercal1891 10 months ago
>They're largely the same thing as far as I understand
Absolutely not. MFMA is a one-clock MMA for whatever matrix size; WMMA is just running wave64 over however many clocks on double the SIMD width.
@wsippel 10 months ago
@@whoruslupercal1891 Maybe, but the instructions are mostly the same, no? And WMMA on RDNA3 is actually accelerated (CDNA2, CDNA3 and RDNA3 are the only three architectures supported by rocWMMA, so I assume previous RDNA chips simply didn't have an equivalent), so AMD should probably use those instructions wherever possible.
@whoruslupercal1891 10 months ago
@@wsippel >but the instructions are mostly the same, no no. >CDNA2, CDNA3 and RDNA3 are the only three architectures supported by rocWMMA Yea but MFMA is different.
@Stealthmachines 9 months ago
You're simply the best!
@methlonstorm2027 10 months ago
I enjoyed this, thank you.
@zachnilsson4682 10 months ago
I'm going to Argonne National Lab later this week. Let me know if you want to sneak into the new super computer there ;)
@jordanmccallum1234 10 months ago
The promise of ROCm is huge, but better hardware support, and better communication about what is and what is intended to be supported, is needed. I had to buy a GPU a few years back and really wanted an AMD GPU for the Linux drivers, but I needed TensorFlow capability for university. ROCm existed, but there was barely any documentation about what was supported, nothing on what they intended to support, and no timeline for software development, so I got a 2080. I remember, at roughly the same time, AMD touting that "you don't need to buy an Instinct to do datacenter compute"; but how is "datacenter compute is locked to Tesla" any different from "there is no software support for Radeon" when you want to get real work done *now*?
@leucome 10 months ago
Better communication, for sure. One of the main issues is that the list they provide is not about which GPUs work with ROCm but about which GPUs AMD officially supports. It is totally useless for people who want to know whether a GPU will actually run it or not. As far as I know, just about all AMD GPUs since Vega already work, even if AMD doesn't offer official "support".
@shauna996 10 months ago
Thanks!
@Level1Techs 10 months ago
Thank you!!
@outcastp23 10 months ago
Thanks for the stock tip Wendell! I'm selling all my TSLA and buying up AMD stock.
@KeithTingle 10 months ago
love these talks
@AI-xi4jk 10 months ago
Appreciate the work you've put into this, Wendell. I think AMD needs to support not only frameworks like TF and Torch but also model conversion from one framework/hardware to another: basically, mapping the primitives between systems.
@DOGMA1138 10 months ago
I'm pretty sure you're running torch with cu117 or older; the numbers are about 70% lower than what an A100 puts out with these settings on cu118. If you just did a pip install from the default repo, it's cu117.
@jpsolares 10 months ago
Is there a tutorial for AMD Instinct and Stable Diffusion? Thanks in advance.
@ddnguyen278 10 months ago
Kinda hard to build for determinism when your hardware does lossy stochastic compression on compute. Even multiple runs of the same data set wouldn't produce the same output on Nvidia. I suspect if they didn't do that, they would be significantly slower.
@tad2021 10 months ago
We've been using a lot of TPU time the past few months. It's such a weird platform with interesting self-imposed bottlenecks, and it doesn't help that Google will suddenly reboot or take down our nodes for maintenance at least once every few days without any warning.
@Dallen9 10 months ago
Pausing the video at 11:37: if AMD is on the left and Nvidia is on the right, AMD has the better algorithm running. The smartphone in DeVito's hand isn't merging with the spoon, and he has one button on his collar instead of two. It might have taken longer, but the image looks more natural, which is kind of nuts.
@b.ambrozio 4 months ago
Well, why don't we have it on AWS or GCP? I'm really looking forward to seeing it.
@dmoneyballa 10 months ago
Where do you find the model used? I can't find it on Hugging Face. The icantbeliveitsnotphotography safetensors, that is.
@wargamingrefugee9065 10 months ago
Maybe this, Google: civitai ICBINP - "I Can't Believe It's Not Photography". I'm downloading it now. Best of luck.
@dholzric1 10 months ago
Is there any way to get the new version of ROCm to work with the MI25?
@apefu 10 months ago
This some guuud video!
@Icureditwithmybrain 10 months ago
Will ROCm permit me to leverage my AMD 7900 XTX for accelerating the locally executing personal AI LLM on my PC? Presently, it operates on my CPU, causing sluggish responses from the LLM.
@Timer5Tim 10 months ago
As nice as it is and as cool as it is, I expect ROCm for windows and Half Life 3 to come out on the same day.....
@VFPn96kQT 10 months ago
Hopefully SYCL will abstract platform specific APIs like ROCm/CUDA etc.
@mytech6779 10 months ago
I used to think that, but realized I'll grow grey waiting on a decent implementation. SYCL seems to be stuck in some quasi-proprietary limbo with a company that won't or can't make it widely available.
@VFPn96kQT 10 months ago
@mytech6779 The most popular SYCL implementations are OpenSYCL and DPC++. Both are open source and work on many different architectures. What do you mean, "stuck in quasi-proprietary limbo with a company"?
@sinom 10 months ago
I was waiting for this video since the teardown came out
@SlinkyBass0815 3 months ago
Hi, I would like to get started with ML and currently have two offers for graphics cards: an RX 6800 16 GB and an RTX 4060 8 GB. Do you know if the 6800 would be suitable for getting started, or is it better to use the 4060? Thank you in advance!
@SamGib 10 months ago
If AMD wants to get popular, they need to support their consumer-grade GPUs in ROCm. And also the used market.
@CattoRayTube 10 months ago
Big fan of Evelon Techs
@cromefire_ 10 months ago
One big problem for Google was that you only get full TPUs in Google Cloud, otherwise it'd be pretty different.
@callowaysutton 10 months ago
Did you get to test out running LLMs on these GPUs? I'd be curious how many tokens per second these bad boys can push out, especially since it seems like LLMs are going to be a main point of interest for AI companies for at least the next 1-3 years.
@shieldtablet942 10 months ago
AMD keeps dropping old GPUs from ROCm. RDNA has been ignored forever; not even OpenCL worked at launch with regular drivers. So there will be little uptake when Nvidia still has something that performs OK at the lower end. Gaudi 2 is also looking OK, and Intel seems committed to having the software run on potatoes.
@PramitBiswas 10 months ago
Open standards for ML (read TF) kernel API will help massively to achieve cross-hardware support.
@samghost13 10 months ago
Could you use those AI parts on Ryzen? I think it is a notebook CPU.
@jp-ny2pd 10 months ago
I always spun that technical difference as a "One is a more mature, but less complete offering." So then it became a question of what is good enough for their needs.
@Fractal_32
@Fractal_32 10 ай бұрын
I’m glad to be an AMD shareholder, although I guess I might grab a few more shares just in case. (My AMD shares have made a killing so far especially off this AI hype bubble.)
@doppelkloppe
@doppelkloppe 10 ай бұрын
Are the differences in the images really due to different precision levels in the hardware or is it (also partly) due to limited determinism and reproducibility? After all you're not guaranteed to get the same image twice, even when using the same seed and HW.
@vsz-z2428
@vsz-z2428 10 ай бұрын
thoughts on opensycl?
@skilletpan5674
@skilletpan5674 10 ай бұрын
There is a fork of Automatic that supports AMD. It's in the main project readme, or a Google search away. They seem to have randomly dropped ROCm support for some older cards a few months ago. RX 5xx isn't supported and I think Vega was also dropped.
@MrMaximiliansa
@MrMaximiliansa 10 ай бұрын
Very interesting! Do you know why Stable Diffusion seems to use so much more VRAM on the MI210 than on the A100?
@Level1Techs
@Level1Techs 10 ай бұрын
Maybe related to the accuracy stuff? I'm not sure tbh
@aacasd
@aacasd 10 ай бұрын
any benchmarks with AMD Ryzen AI?
@EvanBurnetteMusic
@EvanBurnetteMusic 10 ай бұрын
Would love a better explanation for why the math is different. Could be that floating-point math is not associative: (A + B) + C does not necessarily equal A + (B + C). Optimizing compilers sometimes reorder operations in the name of speed.
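It's associativity, rather than commutativity, that fails in floating point (A * B does equal B * A, but the grouping of a sum matters), and reordering is exactly what optimizing compilers and parallel GPU reductions do. A minimal, hardware-independent Python demonstration:

```python
# IEEE 754 addition is commutative but NOT associative:
# the grouping chosen by the compiler/scheduler changes the rounded result.
a, b, c = 1e16, 1.0, -1e16

left = (a + b) + c   # the 1.0 is absorbed when added to 1e16, then cancelled
right = (a + c) + b  # the large values cancel first, so the 1.0 survives

print(left, right)   # 0.0 1.0
```

GPU reductions sum in whatever order the hardware schedules them, so two accelerators (or even two runs) can legitimately disagree in the low bits.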
@Level1Techs
@Level1Techs 10 ай бұрын
developer.nvidia.com/blog/tensor-cores-mixed-precision-scientific-computing/ - mixed precision instead of full-fat fp64. Usually the mantissa is not as many bits. That's why fp64 is a different compute rate than "fp64" for AI.
@EvanBurnetteMusic
@EvanBurnetteMusic 10 ай бұрын
@@Level1Techs My first thought was that the AMD card was using fp32 instead of bfloat16, but I googled and it looks like bfloat16 has been supported since the MI100. Perhaps the port isn't using bfloat16 yet?
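As Wendell notes above, the mantissa is where the bits go: bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits. A quick way to see the precision loss without any GPU is to emulate the format by truncating a float32's bit pattern (a simplification - real hardware conversions round to nearest rather than truncate):

```python
import math
import struct

def emulate_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32
    (1 sign + 8 exponent + 7 mantissa bits). Real hardware rounds to
    nearest, but the magnitude of the precision loss is the same."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFF0000  # zero the 16 low-order mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(math.pi)                    # 3.141592653589793
print(emulate_bfloat16(math.pi))  # 3.140625 -- only ~3 decimal digits survive
```

With a 7-bit mantissa each intermediate value carries roughly 2-3 decimal digits, which is why mixed-precision runs can diverge visibly from full fp32/fp64 math.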
@VegetableJuiceFTW
@VegetableJuiceFTW 10 ай бұрын
LLMs, please next!
@wecharg
@wecharg 9 ай бұрын
Best in the world at what he does ^
@DSDSDS1235
@DSDSDS1235 10 ай бұрын
To be honest, you suggested that ROCm went from can't train shit to can't train shit, and training is what Nvidia specialises in. There are more inference startups dying each day than MI200s and MI300s combined shipped that day, and every vendor is coming up with their own inference chip. Why would AWS offer the MI200 or MI300 when they can offer their own Inf1 and abstract any software difference under ML frameworks? And if they do, why would anyone use that instead of Inf1, or better yet, build their own?
@stuartlunsford7556
@stuartlunsford7556 10 ай бұрын
AMD's FP64 cores are great, but they still need more dedicated AI silicon, preferably integrated on the same package.
@anarekist
@anarekist 10 ай бұрын
Aw was hoping to use rocm my 6800xt
@leucome
@leucome 10 ай бұрын
Try it... I bet it will work. My 6700 XT and 7900 XT work fine with ROCm, so I guess the 6800 XT will work too.
@Cadambank
@Cadambank 2 ай бұрын
With the new release of ROCm 6.0 can we revisit this topic?
@dearheart2
@dearheart2 10 ай бұрын
I am in AL and never have access to the newest HW. Damn ...
@mr.selfimprovement3241
@mr.selfimprovement3241 10 ай бұрын
......I will never look at Danny DeVito the same again. 😱😳😂
@floodo1
@floodo1 10 ай бұрын
fascinating
@Artificial-Insanity
@Artificial-Insanity 9 ай бұрын
The differences in the images stem from you using an ancestral sampler, not from the GPU you're using.
@SirMo
@SirMo 10 ай бұрын
Open Source > Proprietary Vendor Lock-ins
@zen.mn.
@zen.mn. 10 ай бұрын
"I can't believe it's not CUDA" dead
@WhhhhhhjuuuuuH
@WhhhhhhjuuuuuH 10 ай бұрын
This is really interesting. I want to know how a 4090 vs a 7900 XTX compares for these workflows. I know both are consumer products, but I feel at the top end the line is blurred.
@shrek22
@shrek22 8 ай бұрын
How will the W7900 with 48 GB compare to the MI210?
@ATrollAssNigga
@ATrollAssNigga 10 ай бұрын
As a 7940HS @ 90 W owner, I wonder how the built-in AI processing compares to that M.2 card. I need to test it.
@zherkohler4188
@zherkohler4188 10 ай бұрын
Are you sure that the visual differences are because of the different hardware? Is Xformers disabled? I think it should be disabled for a test like this. I think it would explain the visual differences.
@Verpal
@Verpal 10 ай бұрын
Viable yes, mass-market ready... not really. AMD's best bet is to grow from high-end HPC; those people have the budget to deal with BS, and as others see this initial work getting done, the cost of adopting ROCm will decrease over time.
@willz81
@willz81 10 ай бұрын
Does ROCm work with Radeon 7900 series cards now?
@leucome
@leucome 10 ай бұрын
Yes... I use a 7900 XT + ROCm for generating images with A1111.
@sinom
@sinom 10 ай бұрын
Nvidia going the inaccurate but faster route has always been a problem for AMD. Because Nvidia is the market leader most software actually expects the inaccuracies in implementations of standards that Nvidia has which will lead to the software not working on the technically more accurate implementations on AMD (or even Intel). So to the consumer that will then mean software doesn't work properly on AMD and at the same time runs slower. Before the recent rewrite this also was a problem for OpenGL, some versions of DX etc.
@sayemprodhanananta144
@sayemprodhanananta144 10 ай бұрын
training performance...?
@stevenwest1494
@stevenwest1494 10 ай бұрын
Please, please Wendell, use your mighty powers and shake down the answers from above about when ROCm Windows support is coming. I mean, it'll actually bring more value to AMD's so-far-lackluster RDNA 3!
@MrBillythefisherman
@MrBillythefisherman 10 ай бұрын
Where is a Microsoft DirectX-style layer that sits on top of the GPUs and makes ML vendor-agnostic (even if it makes it OS-dependent)? If you don't like the OS-specific DirectX API, then swap in the Vulkan API. I've heard of DirectCompute and OpenCL, but they don't seem to have gained traction - why? Also, why is ROCm needed when you have those APIs? What is it that makes CUDA win against all of the above?
@WiihawkPL
@WiihawkPL 9 ай бұрын
Working in OpenGL for a long time, I've come to sum it up as Nvidia playing it fast and loose and AMD being more accurate. And then there's Mesa, which is as close to a reference implementation as you'll get.
@Zoragna
@Zoragna 10 ай бұрын
Non-"PhD students at Oak Ridge" I love that
@bartios
@bartios 10 ай бұрын
Hi Wendell, have you been following the Tinygrad stuff and their troubles with ROCm at all? They look like they have some real work™ they'd like to be able to use AMD for in ML so I think it would be interesting for you to check out.
@Level1Techs
@Level1Techs 10 ай бұрын
Someone didn't watch to the end of the video ;)
@bartios
@bartios 10 ай бұрын
@@Level1Techs whoops sorry, don't have the time to watch rn and I know the best time to get an answer is in the first couple hours so I did a stupid
@paulwais9219
@paulwais9219 10 ай бұрын
The demo is for inference, but training is the key advantage for Nvidia. Compute cards need to ship at gamer-card scale for that software support to level out; that's why Ponte Vecchio and TPUs are DOA as consumer products. But let's suppose AMD does catch up on the desktop. For mobile, Apple, Google, and Samsung own their own stacks. For robotics, Nvidia already has Jetson. The market beyond the desktop would need to be big for AMD to really be able to invest and nail AI.
@eddietoro2682
@eddietoro2682 10 ай бұрын
Came here for the tech, stayed for the Danny DeVito AI memes
@grtitann7425
@grtitann7425 10 ай бұрын
Yet the so-called Linux enthusiasts at the Phoronix forums will continue trashing AMD, the company that embraces open source. Go figure.
@marcogenovesi8570
@marcogenovesi8570 10 ай бұрын
Just posting on Phoronix does not make them Linux enthusiasts, or even sane. There are quite a few who have very strange beliefs about hardware and Linux.
@SamGib
@SamGib 10 ай бұрын
Unless Google sells TPUs for enterprises to host themselves, I don't think there will be any large-scale adoption for use in consumer products. See, OpenAI trained their models on GPUs; it's best to assume that's Nvidia hardware.
@sailorbob74133
@sailorbob74133 10 ай бұрын
What'll be interesting will be the MI300C - all CPU chiplets - and Turin-AI with Xilinx AI chiplets... MLID has a video about it. A dual-socket version could have more TOPS than an H100.
@samlebon9884
@samlebon9884 10 ай бұрын
I imagined AMD would develop that kind of chip. I even named it MI300AI. Could you provide a link to the MI300C?
@sailorbob74133
@sailorbob74133 10 ай бұрын
@@samlebon9884 There's a very reliable rumor channel I've tracked for a few years called Moore's Law is Dead, which spoke about the MI300C chip: all CPU chiplets with HBM3. There's also a separate AMD project called Turin-AI, a mix of Zen 5 chiplets with Xilinx AI chiplets on a single package, which in a 2P config would be about as powerful as an H100.