What a cool fucking idea and brilliant explanation, man. You were clear and thorough. I especially loved your implementation in C++. It really solidifies these concepts so they don't just feel like floating abstractions that you learn and accept.
@ProceuTech · 20 days ago
My more recent video has a much better programming tutorial portion if you’re interested!
@parad0x1cal83 · 3 years ago
With this level of content, it's only a matter of time before your channel blows up! Thank you for the explanation!
@charlieike8414 · 8 months ago
I'm taking C++ in college and we're currently learning about arrays. I only looked this video up after an LTT video where they turned AVX off in the BIOS to mess up the PC. Fate brought me here, maybe to ignite a deeper passion for programming.
@ProceuTech · 8 months ago
The other AVX video I made more recently is a much better video than this one if you want better info. Appreciate the support though!
@alvaroj9509 · 21 days ago
Linus inadvertently brought me here too 😂
@salvageddoor · 3 years ago
Damn, this should've gotten more views... I was just searching for the AVX offset feature on YouTube and this video came up. I'm not familiar with any kind of image processing in C++, I'm more into embedded stuff, but your content has given me quite a lot of interesting knowledge! Keep it up and your channel will blow up really soon!
@LegendLength · 1 year ago
First hit for me for avx2. Great video too.
@ponchobob · 7 months ago
At 7:33, did I miss something? Returning pointers to local variables is unsafe and leads to undefined behavior.
@ProceuTech · 7 months ago
Yeah, this video is honestly not great. Check out my more recent AVX video if you want to look into the syntax more deeply! I explain it way better and without any of these goofy mistakes on my end.
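The issue raised above can be sketched in C++. This is a hypothetical reconstruction, not the video's actual code. Returning a pointer to a local array hands the caller a dangling pointer. Returning a std::vector by value instead transfers ownership of the results safely; the author mentions this option later in the thread. The names vector_add and vector_add_bad are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buggy pattern (kept as a comment so this compiles cleanly): `ret` lives in
// this function's stack frame, which is gone once the function returns, so the
// caller receives a dangling pointer and reading through it is undefined behavior.
//
//   float* vector_add_bad(const float* a, const float* b) {
//       float ret[16];
//       for (int i = 0; i < 16; ++i) ret[i] = a[i] + b[i];
//       return ret;  // pointer into a dead stack frame
//   }

// Safe version: std::vector owns a heap buffer, and returning it by value
// moves (or elides) that ownership to the caller.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> ret(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        ret[i] = a[i] + b[i];
    }
    return ret;
}
```

Returning a std::array<float, 16> by value would work just as well when the size is fixed.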
@treelibrarian7618 · 1 year ago
I thought it would be worth noting that just because AVX-512 instructions work on 16 floats in one instruction doesn't make them faster than AVX2 instructions in practice. As far as I know, in desktop and laptop CPUs the AVX-512 instructions are limited to a single execution port. AVX2 instructions, by contrast, can execute on 2 ports simultaneously, since the fast add and FMA capabilities are duplicated. Simpler 256-bit vector ALU operations like AND, XOR, blend and integer arithmetic can even go through 3 ports, for 3 instructions per clock. The biggest benefit of the AVX-512 instruction set seems to be versatility, with selective operation on partial vectors via the k-registers. I believe Sapphire Rapids server and workstation CPUs have 2 AVX-512 execution ports, though. Zen 4 executes AVX-512 instructions in 2 clocks, putting each half through the same 256-bit pipeline in turn.
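The k-register versatility mentioned above can be illustrated with a minimal sketch. It assumes GCC or Clang on x86-64, and the function names add and add_avx512 are hypothetical. The main loop processes 16 floats per iteration, and the last n % 16 elements are handled with a mask instead of a separate scalar loop. A runtime check falls back to plain scalar code on CPUs without AVX-512F.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// Compiled for AVX-512F via a GCC/Clang target attribute (no global -mavx512f
// flag needed); only call it after confirming the CPU supports AVX-512F.
__attribute__((target("avx512f")))
static void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_add_ps(_mm512_loadu_ps(a + i), _mm512_loadu_ps(b + i));
        _mm512_storeu_ps(out + i, v);
    }
    const std::size_t rest = n - i;  // 0..15 elements left over
    if (rest > 0) {
        // Low `rest` bits set: only those lanes are active.
        const __mmask16 m = static_cast<__mmask16>((1u << rest) - 1);
        // Masked-off lanes are zeroed and never read, so no out-of-bounds access.
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        // Only the active lanes are written back.
        _mm512_mask_storeu_ps(out + i, m, _mm512_add_ps(va, vb));
    }
}
#endif

// Adds n floats, using AVX-512F when the running CPU supports it.
void add(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        add_avx512(a, b, out, n);
        return;
    }
#endif
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];  // scalar fallback
}
```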
@ItsAkile · 3 years ago
This video has been open in my browser for about a month+, and I finally watched it. Pretty dank video, thanks brother. I'm still getting into the groove.
@ProceuTech · 3 years ago
Glad you enjoyed! It’s admittedly a pretty niche programming concept.
@ItsAkile · 3 years ago
@@ProceuTech That it is. I had it on my list of things I don't fully understand.
@salvageddoor · 3 years ago
Just one small question: how can you return the array ret[16] from the function linear::vector_add()? It's a local variable, so how can it be returned? Or am I missing something that is possible in C++?
@ProceuTech · 3 years ago
Let me do some coding real quickly and do some tests. I’ll get back to you in a few minutes!
@ProceuTech · 3 years ago
OK, so I just reran the function to see what was actually going on in the array. Turns out it wasn't returning proper values! Thanks for catching that! I'm so used to working with vectors (which can be returned) and don't have as much experience with arrays. Sorry for the confusion!
@salvageddoor · 3 years ago
@@ProceuTech Thanks for clearing up my doubt! At least now I know that it is possible to return a local vector in C++.
@ProceuTech · 3 years ago
Vectors can still be processed using AVX as well. You just have to use “_mm512_set_ps(i[15], i[14], ..., i[0]);” (the first argument goes into the highest lane, so the elements are passed in reverse order), which takes up more space in your program but offers identical performance!
@vytah · 2 years ago
@@ProceuTech With std::vectors, you can just use i.data(), which is the pointer to the internal array. As for returning AVX values, you can just return __m512 directly, or populate a std::vector via data() and return it.
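Here is a minimal sketch of the two ways of getting std::vector data into a SIMD register discussed above, assuming an x86-64 target. It uses the 128-bit SSE intrinsics (_mm_loadu_ps, _mm_set_ps, _mm_storeu_ps) instead of the 512-bit ones, so it runs on any x86-64 CPU without extra compiler flags. _mm512_set_ps follows the same convention, with its first argument going into the highest lane. The helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>
#include <xmmintrin.h>  // SSE: part of the x86-64 baseline

// Loads the first four floats of `v` in memory order through data().
__m128 load_from_vector(const std::vector<float>& v) {
    assert(v.size() >= 4);
    return _mm_loadu_ps(v.data());
}

// Builds the same register with _mm_set_ps. The FIRST argument lands in the
// HIGHEST lane, so the elements have to be passed in reverse memory order.
__m128 set_from_vector(const std::vector<float>& v) {
    assert(v.size() >= 4);
    return _mm_set_ps(v[3], v[2], v[1], v[0]);
}

// Copies a register's lanes back out into a std::vector through data().
std::vector<float> to_vector(__m128 r) {
    std::vector<float> out(4);
    _mm_storeu_ps(out.data(), r);
    return out;
}
```

If you prefer the set-style call to read in memory order, _mm_setr_ps takes its arguments lowest lane first. Loading straight from data() avoids spelling out every element.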
@anshumandhuliya · 2 years ago
Very nice and gentle introduction to the topic :)
@ohmygosh6176 · 5 months ago
Update: every AMD CPU from Zen 4 onward has AVX-512 support. The game Helldivers 2 uses AVX.
@anonymouscommentator · 2 years ago
Amazing video! I was interested in what AVX-512 (and AVX2 in general) actually are, and I found your great video explaining more than I hoped for!
@RoboticusMusic · 1 year ago
I came here because I vaguely remember someone mentioning something that can cause a CPU to overheat insanely fast. Is there something else that can overheat a CPU even faster, or was this it?
@sean8102 · 5 months ago
Well, AVX is very demanding, so the CPU uses a lot of power when executing AVX-heavy instructions, and of course more power means more heat. I believe burn-in apps like Prime95 use (or have the option to use) AVX/AVX-512 during the burn-in test to push the CPU as hard as possible. As for causing a CPU to overheat: no, it should not do that if you have a stable setup.
@dagoberttrump9290 · 8 months ago
What happens if you align the SIMD-processed vector to cache-line boundaries?
@mkvalor · 2 years ago
I know I'm adding to this comment section nearly two years later, BUT... AVX-512 was almost certainly more than 77.5% faster than scalar. The values for the arrays were read "cold" from RAM for the AVX-512 function call, but the memory reads for that operation placed those values in the L1 data cache for the scalar loop. Benchmarking is HARD!
@ProceuTech · 2 years ago
Is there an explanation as to why?
@mkvalor · 2 years ago
@@ProceuTech The first program to load a file from disk pays a time penalty for the disk I/O operations; however, the OS then keeps as much of that file in the system RAM as possible and some of the file even resides within the fast cache of the CPU itself. The next program you run which needs to read that file will retrieve the data very quickly from the CPU cache and system RAM. So that second program doesn't pay the same time penalty for disk I/O operations.
@lupsik1 · 2 years ago
@@mkvalor I have a problem understanding what you mean by the loading from disk. When the program is loaded, those values go straight into RAM. When the variables get initialised, they get pushed onto the stack. When the AVX function is called, all that happens is the address of a gets copied into the RAX register and the address of b gets copied into RDX. The exact same thing happens when we run the linear function. Are you suggesting that the page containing this tiny program gets unloaded mid-execution?
@treelibrarian7618 · 1 year ago
For sure there's a lot wrong with the test. First, the input data is unchanging, so the compiler should optimize out the loop entirely, or maybe just the memory reads. But this would also almost completely invalidate your argument about caching, which would be valid if the test actually had a significant volume of data and was storing the result somewhere. As someone already noted, the compiler may well have used vector instructions for the simple loop as well (more likely with Clang, I think), giving the somewhat poor showing of a 70% speedup. It would all depend on the compiler flags for optimization level and target architecture. If it didn't optimize everything well, there may instead be a lot of overhead from the function calls and the extra memory reads/writes involved. If I were to write assembler code to do what is presented in this test (on multiple data), it wouldn't take 80 µs on a 5 GHz CPU. AFAIK these CPUs are capable of 2 reads, 1 add and 1 write per clock, even at 512 bits, so the whole process should take < 1 µs with AVX-512 instructions. Even with scalar instructions (which still execute on the vector ALU, just through 1 channel), it should have finished in 15 µs, slowed from 8 µs only by the scalar reads of memory. To get a more reliable result, you would probably need a significant chunk of data and > 1,000,000 iterations. You would likely also need hundreds of repetitions of the whole process to account for variations in CPU load, clock frequency (the OS usually keeps clocks low until something starts happening, and it takes a few ms to respond), interrupting operations, etc. And check the disassembly to be sure of what is being executed.
@ProceuTech · 1 year ago
@treelibrarian7618 I made an updated video with this information in it; the tests done in this video were flawed.
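The methodology points in this thread (warm caches, many repeated runs, keeping the compiler from deleting unused work) can be sketched as a small timing harness. This is a minimal illustration, not a substitute for a real benchmarking library or for checking the disassembly. The names best_time_ns and add_and_checksum are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Runs `fn` once untimed (warm-up, so code and data are already cached), then
// times `reps` further runs and returns the fastest one in nanoseconds.
template <typename Fn>
long long best_time_ns(Fn fn, int reps) {
    assert(reps > 0);
    fn();  // warm-up, not timed
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        best = std::min(best, ns);
    }
    return best;
}

// Example workload: a scalar add over a large buffer. The result is stored in
// `out` and folded into a checksum that the caller uses, so the compiler
// cannot discard the loop as dead code.
double add_and_checksum(const std::vector<float>& a, const std::vector<float>& b,
                        std::vector<float>& out) {
    assert(a.size() == b.size() && b.size() == out.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
        sum += out[i];
    }
    return sum;
}
```

Taking the fastest of many runs filters out interrupts and clock ramp-up. The data set still needs to be far larger than a single 16-float vector for the numbers to mean much.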
@vinstontan9502 · 2 years ago
Excellent video! Effectively explains AVX
@opoxious1592 · 10 months ago
To this day, I have never seen a real benefit from a game that needed AVX instructions. A good example is Cyberpunk 2077: at first it would only run on CPUs with AVX support, and a few months later they made the game run without AVX support as well. There is not a single bit of difference with or without AVX regarding graphics or performance in FPS. It's a good thing that more and more games no longer require AVX, since it asks for more resources and energy from your system without any visible gain in performance.
@subbastionbastion2167 · 8 months ago
Sorry, but it sounds like CUDA would be way faster: you can have thousands of threads running at the same time on larger chunks of data.
@ProceuTech · 8 months ago
I've also got a video exploring CUDA and its syntax. In my opinion it's much better put together than this video! :)
@realforest · 2 years ago
Your explanation at the end was very helpful! Me: "Why the hell would I ever use AVX instructions?" AVX: "Umm, you can skip an extra loop to traverse a vector, giving you a lot of performance if you do a lot of vector arithmetic!"
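The "skip an extra loop" idea can be sketched as a vectorized loop with a scalar remainder. This assumes GCC or Clang, and add_arrays is a hypothetical name. The AVX path is only compiled when the compiler targets AVX (for example with -mavx). Otherwise the same function falls back to the plain scalar loop, so it builds anywhere.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Adds n floats. With AVX enabled at compile time, each pass of the main loop
// handles 8 floats with one 256-bit add, and the scalar loop finishes the
// remaining n % 8 elements. Without AVX, only the scalar loop is compiled.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads: any address works
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];  // remainder (or everything, without AVX)
    }
}
```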
@Quancept · 1 year ago
Very underrated video!
@MrMonkeyZMemeZ · 2 years ago
I too am a fan of AVX
@LegendLength · 1 year ago
How important is volatile when coding with AVX?
@treelibrarian7618 · 1 year ago
No more than normal. Volatile is for when something (like another thread) might modify the memory of a variable without the knowledge of the current thread. The compiler should then treat it as volatile (subject to unpredictable change) and re-read it whenever it needs to use it. It should not assume the value stays the same just because the current code hasn't changed it, which rules out certain compiler optimizations that would treat the value as unchanging. AVX memory reads and writes happen in a single cycle like normal register reads and writes, so there's no real difference. It should also be noted that there are no "locked" versions of AVX instructions. So if you are trying to operate on vector data with multiple threads, you should work out some other way to prevent race conditions, like data segmentation or mutexes (preferably with lock elision, since the hardware memory synchronization involved in locks/mutexes is quite slow).
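The data-segmentation approach mentioned above can be sketched with std::thread. This assumes C++11 or newer, and parallel_add is a hypothetical name. Each thread writes only its own contiguous slice of the output, so no locks or atomic instructions are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Adds a and b into out using `workers` threads. Each thread owns a disjoint,
// contiguous slice of `out`, so no two threads ever write the same element.
void parallel_add(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& out, unsigned workers) {
    assert(a.size() == b.size() && b.size() == out.size());
    assert(workers > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back([&, begin, end] {
            // Plain loop over this thread's slice; the compiler may vectorize
            // it, or AVX intrinsics could be used here instead.
            for (std::size_t i = begin; i < end; ++i) out[i] = a[i] + b[i];
        });
    }
    // join() waits for every worker and makes their writes visible here.
    for (auto& t : pool) t.join();
}
```

Giving each thread one large contiguous chunk, rather than interleaving elements, also keeps threads from repeatedly writing to the same cache lines (false sharing).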
@panjak323 · 3 years ago
You should make sure your memory is aligned to 32 bytes (64 bytes for the 512-bit _mm512_load_ps) when using the load function. Also, chances are the non-AVX version got auto-vectorized by the compiler to use AVX/AVX2.
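Getting the alignment right can be sketched with alignas. This assumes C++11 or newer, and Block16 and is_aligned are hypothetical names. _mm256_load_ps requires a 32-byte-aligned address and _mm512_load_ps a 64-byte-aligned one, while the loadu variants accept any address. 64 bytes is also the cache-line size on current x86 CPUs, so 64-byte alignment keeps each 512-bit vector within a single cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16 floats = 64 bytes = one 512-bit register, starting on a 64-byte boundary.
struct alignas(64) Block16 {
    float v[16];
};

static_assert(sizeof(Block16) == 64, "one block is exactly one 512-bit vector");
static_assert(alignof(Block16) == 64, "blocks start on a 64-byte boundary");

// True if `p` is a multiple of `alignment` bytes (alignment must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```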
@KristianDjukic · 1 year ago
Thx for the excellent video!
@naveediqbal5600 · 2 years ago
Is there a way to remove AVX instructions from a game?
@ProceuTech · 2 years ago
Some implementations have a toggle where you can switch between AVX and “Non-AVX” algorithms. Not all of them have this, though :(
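Such a toggle usually comes down to runtime dispatch. This is a minimal sketch, assuming GCC or Clang on x86, and all names in it are hypothetical. The program checks the CPU once with __builtin_cpu_supports and picks either an AVX or a scalar implementation through a function pointer. A user setting can force the non-AVX path.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Portable fallback: the "non-AVX" algorithm.
static void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
// The same operation compiled for AVX via a GCC/Clang target attribute.
__attribute__((target("avx")))
static void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_loadu_ps(a + i),
                                                _mm256_loadu_ps(b + i)));
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
#endif

// Picks an implementation based on what the running CPU reports.
AddFn select_add() {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) return add_avx;
#endif
    return add_scalar;
}

// Hypothetical "AVX toggle": a setting that can force the non-AVX algorithm.
AddFn select_add(bool allow_avx) {
    return allow_avx ? select_add() : add_scalar;
}
```

Selecting the function once and reusing the pointer keeps the feature check out of the hot loop.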
@juanme555 · 2 years ago
i7 10700F vs 11700F: which one is better at AVX-512?
@ProceuTech · 2 years ago
The 11700F! The 10700F only features AVX2!
@juanme555 · 2 years ago
@@ProceuTech Is the 11700F the same as the 11700? I know the F doesn't have an iGPU, but does the iGPU help with AVX-512?
@ProceuTech · 2 years ago
No, AVX-512 units are in the CPU cores themselves!
@Psythik · 4 months ago
Interesting video, but I only understood about 25% of it. What's a "vector element"? "Scalar looping function"? "ISA"? The last time I heard the term "ISA" was in the early 90s, and it referred to a slot on the motherboard. "Segfault"? "bFloat extension"? This is a lot of terms to learn for a layman like me who just wants to know what AVX-512 is and why only AMD CPUs have it now. By the time you got to the C++ part of the video, I had to stop watching entirely, because anything coding-related goes right over my head.
@ProceuTech · 4 months ago
My more recent AVX video goes over it in much more detail and, at least in my opinion, explains it in a way that's more manageable for newcomers. I put more effort into that video specifically to help explain what AVX is and to reach a wider audience.
@Psythik · 4 months ago
@@ProceuTech I'll check it out; thanks.
@MrRayopt · 6 months ago
Where is the beginning tutorial? This makes no sense.
@ProceuTech · 6 months ago
The more recent video I made about AVX-512 (linked in the first few seconds of this video) explains the concept and the program much better.
@SystemCrasher113 · 8 months ago
I still have no clue what avx does after you explained it in detail. 😂 Don't worry though, it's me, not you.
@ProceuTech · 8 months ago
I have another AVX video that goes more in depth as to what the instruction set entails, as well as a better guide on programming for it. Might be worth a watch if you’re confused!
@youtubeshadowbannedmylasta2629 · 2 years ago
AVX hinders performance: it can take things that used to work, and by putting AVX into programs (even decades-old ones), it makes them no longer even launch.
@sean8102 · 5 months ago
As for AVX hindering performance, I'm not a programmer. My only guess is the "AVX offset" a lot of motherboards have, where the CPU is downclocked by some amount when running AVX (though I'm pretty sure that can be turned off on most motherboards). As for AVX causing compatibility problems, I guess that only applies if you have a really old CPU. In the latest Steam hardware survey (June 2024), 97% of Steam users have a PC that supports AVX. From what I understand, Intel and AMD started shipping CPUs with AVX support in 2011.