The Truth About The Fast Inverse Square on N64 | Prime Reacts

121,081 views

ThePrimeTime

a day ago

Recorded live on twitch, GET IN
/ theprimeagen
Reviewed video: • The Truth about the Fa...
By: Kaze Emanuar | / @kazen64
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
Kinesis Advantage 360: bit.ly/Prime-K...
Hey I am sponsored by Turso, an edge database. I think they are pretty neat. Give them a try for free and if you want you can get a decent amount off (the free tier is the best (better than planetscale or any other))
turso.tech/dee...

Comments: 205
@KazeN64 · a year ago
The second column becomes irrelevant because all the instructions to run the entire graphics thread can be cached at the same time, so we have no cache misses = no memory used (basically)
@AlbertBalbastreMorte · a year ago
We are not worthy
@zeckma · a year ago
this needs to get pinned
@konberner170 · a year ago
Bravo!
@marcelocardoso1979 · a year ago
I'm still baffled at how you managed to fit an entire renderer in 12kB. Absolute genius!
@SICunchained · a year ago
Thanks for the awesome vids man. Love watching your shit. Hope to be able to achieve your knowledge and application of math someday. ^.^
@capsey_ · a year ago
The fact that N64 is not bounded by CPU is so funny to me. Instead of punching on Sega because 64 > 32 their marketing team could've just said "our CPU is so fast it's fucking useless"
@0-Kirby-0 · a year ago
It also creates such a deeply fascinating problem space, where you not only optimise for binary size over cycle count, which is already unusual, but you're specifically worried about fitting as much computation as possible into a single cache-load, so you don't have to go back to ram and disturb the renderer. It feels oddly multithread-y, where you bulk-copy what you want to work on into your own little bucket, so you don't have to acquire-release as much, except it happens with the instructions themselves, not just the memory being worked on. Absolutely nuts.
@Takyodor2 · a year ago
@0-Kirby-0 Is it really unusual? Copying data from memory is a couple of orders of magnitude more expensive than your average floating-point calculation (with respect to energy and latency), so I'd expect this type of optimization to be sort of common?
@Trisslotten96 · a year ago
The same kind of thing applies to modern CPUs as well.
@samphunter · a year ago
@Takyodor2 It is unusual. Your CPU will usually predict which instructions it will need ahead of time and load them long before the loading slows anything down. That is why branchless code is a more common optimization nowadays (it avoids guessing what will run next). That said, it is common for data to be optimized to fit in cache. There is a video about why linked lists are almost always slow which explains this.
@kreuner11 · a year ago
They really flunked on the RAM and texture VRAM. If these had been designed with a different, more sensible architecture, the N64 might have been so much further ahead of the competition.
@kuhluhOG · a year ago
This video in a nutshell: Do you still have questions? [ ] No, I understood everything. [x] No, I didn't understand enough to even begin phrasing questions.
@bertram-raven · a year ago
This reminds me of the developer who optimised drum-memory execution by scattering the instructions across the drum in such a way as the read-head was exactly over the next instruction just as it was required. He also added jump instructions which went nowhere so the motion of the drum would flatten loops and reuse code in a way which "increased" drum capacity - effectively storing 6.2KB of code in 6KB. This is next-level optimisation.
@Margen67 · a day ago
birb
@hardbrocklife2 · 12 hours ago
@Margen67 reddit comment
@triplebog · a year ago
As a graphics engineer, this is one of the only primeagen videos that actually makes sense
@khhnator · a year ago
ikr! im a game/systems dev and most of the time i completely go "what is this react stuff" to primagen stuff
@ramy701 · a year ago
can you recommend any resources / books for someone wanting to get into graphics engineering ? :)
@headecas · 11 months ago
@ramy701 dot
@Entropy67 · a year ago
7:50 He rewrote it so that it would all fit on a single page. It's tiny, so the CPU will never evict it from cache. That means we never get a cache miss. I think.
@isodoubIet · a year ago
The reason the number of instructions becomes irrelevant is that, as far as the cpu running is concerned, what matters is the number of cycles. The number of instructions matters _only_ inasmuch as you're having to stream all that stuff from memory into the instruction cache, since the size of the machine code is roughly proportional to the number of instructions (this is not necessarily the case for monstrosities like x86 architecture, but it is the case for the n64 which has a MIPS processor with a very typical RISC pipeline). Since the entire renderer code fits into the cache anyway, you can have as many instructions as you like, and all you're comparing is how many cycles are being spent on running the instructions themselves.
@jeremylakeman · a year ago
And that's because the primary thing he's trying to optimise here is memory bus usage, since the "GPU" needs the bus for texture mapping (etc.), and freeing it up improves frame rates.
@magfal · a year ago
11:52 the aggregate improvements he's made would have ensured a release of Mario 64 2nd adventure or something if Nintendo mysteriously received them back in the day. A 3X improvement would have enabled enough of a difference in capability to justify it. Also check out the Portal 64 project.
@NoNameAtAll2 · 4 months ago
portal64 failed the anonymity and got lawyer-ed
@magfal · 4 months ago
@NoNameAtAll2 I would love for someone like the GOG team to use their experience and contacts to broker a deal. It's not like its existence is diminishing value for any of the parties involved, or that clearing things up contractually would take long if all parties are positive and constructive. It would be free advertisement for all involved parties, if they could all agree to a narrow licence that would allow it to move forward as a non-commercial project (labeling the Patreon funding the development as unrelated subscription revenue from the development streams and videos).
@TheTrienco · a year ago
Not thought about 20 µs in a long time? That depends on what you're working on. Granted, 20 µs over a full frame probably won't matter much, but for a function you call a lot, that can add up quickly. If you think about it: at 100 fps, your frame budget is 10 ms for AI, physics, game logic, rendering and whatever else needs to be done. That's only 500 of those 20 µs chunks.
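A back-of-the-envelope version of that budget math as a small C helper. The name `calls_per_frame` is made up for illustration, and it assumes the whole frame is available to that one function:

```c
/* How many calls costing call_us microseconds fit into one frame at the
   given frame rate, assuming nothing else runs in that frame. */
long calls_per_frame(double fps, double call_us)
{
    double frame_us = 1e6 / fps;        /* 100 fps -> 10,000 us per frame */
    return (long)(frame_us / call_us);  /* truncate: only whole calls count */
}
```

At 100 fps with 20 µs per call this gives 500; at 60 fps (about 16,667 µs per frame) it gives 833.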
@someonespotatohmm9513 · a year ago
In the microelectronics industry it is also quite common to see u (µ) or n as a prefix.
@prgnify · a year ago
@someonespotatohmm9513 I've done a bit of embedded work; when he said "I don't think any of us has thought of 20us in a long time" I felt invisible.
@TrueHolarctic · a year ago
@prgnify tbh we don't think about 20 µs a lot. That's just one eternity. That sweet sweet 100 MHz clock
@prgnify · a year ago
@TrueHolarctic Yes, the first thing I thought of was a joke from the embedded programming subreddit that played with the concept of 20 µs being an eternity to us and completely irrelevant to most others.
@MikkoRantalainen · 8 months ago
20 µs matters even in web programming if you're computing a JSON response that consists of 500 elements. If you take 20 µs per element, that's already 10 ms for the whole list.
@Tabu11211 · a year ago
My favorite programmer streamer covering my favorite niche coder?! What a good day!
@Mempler · a year ago
"The Mario64 command is my favourite linux command"
@halle0327 · 7 months ago
glad I’m not the only one that says this
@whamer100 · a year ago
god i LOVE Kaze, his contributions to the sm64 romhacking community are next level
@zebraforceone · a year ago
I'm pretty sure that second column becomes irrelevant because the instructions and their working memory are stored in the cache, as opposed to being fetched across the memory bus.
@zweitekonto9654 · a year ago
his brain was in so much shambles, he forgot the outro.
@chhihihi · a year ago
Caching and cache misses are an easy concept to understand, and paying attention to them can dramatically increase the speed of your code. I strongly recommend you check it out. Because RAM sits much farther from the CPU than the cache does, cache is blazingly fast by comparison, and keeping everything in cache greatly reduces the number of cycles (time) it takes to complete a set of instructions. What keeps you in cache is having few enough instructions and data to fit. Think of the cache as your primary resource and RAM as a backup.
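To make the cache point concrete, here is a small C sketch (the array size and function names are arbitrary choices for illustration) that sums the same 2D array in two orders. C stores arrays row by row, so the first loop reads memory sequentially while the second strides across it; on hardware with a data cache the strided version tends to miss far more often. The sketch only checks that both orders produce the same sum; it does not time anything.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

int grid[ROWS][COLS];  /* 1 MiB with 4-byte ints: far bigger than a typical L1 cache */

void fill_grid(void)
{
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)(r + c);
}

/* Row-major walk: consecutive reads are adjacent in memory,
   so each cache line fetched is fully used before moving on. */
long sum_row_major(void)
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: each read jumps COLS * sizeof(int) bytes ahead,
   so on a large array most reads land on a different cache line. */
long sum_col_major(void)
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Timing the two functions on a real machine (with optimizations on) is the way to see the difference; the results are identical, only the memory access pattern changes.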
@entertain8648 · a year ago
What about Apple chips that have RAM close to the CPU?
@snooks5607 · a year ago
@entertain8648 Still order(s) of magnitude difference: even Apple's unified memory can have 100-400 ns latency for a fetch, depending on multiple things like TLB lookups, whereas an on-die L1 cache access has cost just a couple of cycles ever since L1 caches were introduced to PCs with the 486DX in 1989 (about 1 ns on a modern ~4 GHz machine).
@Takyodor2 · a year ago
@entertain8648 It's still not _in_ the CPU. Analogy: going to the grocery store in your own town is faster than going to the one in the next closest town, but both are many times slower than accessing your fridge. (The hard drive would be on the moon in this analogy.)
@entertain8648 · a year ago
@Takyodor2 Well, I understand what kind of difference that is. What I am asking is whether anyone knows how Apple's design saves the situation.
@Takyodor2 · a year ago
@entertain8648 It doesn't save the situation, it just makes the huge performance hit slightly less huge.
@glauco_rocha · a year ago
As Rob Pike said: never tune for performance until you have measured, and even then, don't do it unless the code you're measuring OVERWHELMS the rest of the code.
@MikkoRantalainen · 8 months ago
I mostly agree but if you make *everything* super slow then your code ends up really bad until one single part overwhelms all the rest. I think it's better to decide the required latency at the start and then do whatever it takes to keep the final performance at least as good as your decided latency. For a game, the decided latency could be 1/30 or 1/60 seconds to match typical displays. For a web service, the target latency could be 50 or 100 ms. Then you know when you have to start optimizing: whenever your existing code cannot meet the latency requirement. What to optimize? The part that overwhelms everything else at that time.
@SianaGearz · a year ago
N64 is a unified memory gal, there is no VRAM, there's just RAM. The whole RAM is connected to the GPU (semi-custom chip) and when the CPU (off the shelf chip) asks for something from RAM over the CPU bus, the GPU has to stop whatever it's doing to serve that memory request for the CPU. This is why verbose code that consists of a ton of individually fast machine instructions is generally BAD, since instruction loads use up your RAM bandwidth and slow down the GPU rendering. Except when you have a pass in your rendering or other code that munches a lot of data while using all the same code, and all the code is laid out contiguous across a handful pages of memory (as to avoid cache lines fighting for tag space) and all those pages of code end up stored in the CPU's onboard cache for all but the first iteration, then there's no memory hits. You'd call this cache L1 today, but L2 or L3 doesn't exist so it's just Cache. Then you don't care about how verbose the code is in each function, you just need the whole code for that whole pass to fit. So for code that is more sprawling and expected to be uncached, such as gameplay code, you want it to be as terse as possible; while for code that can be formulated as an L1-resident pass or kernel, you can actually make it verbose in places. I myself program on another classic device which also only has L1, so i'm learning from Kaze a lot. I had also been cursed by a very sharp and strict numerics professor 20 years ago, it took me YEARS of hard work to pass that exam, so well i suppose i'm getting a lot more than just hype from floating point trickery, even though i'm more rusty than i'd like to be.
@MikkoRantalainen · 8 months ago
One could also describe the N64 as having no RAM at all, but being able to run CPU instructions from VRAM if the GPU is stopped while that's happening. And there's a 16 kB instruction cache in the CPU itself, so as long as you can keep everything in that space, the GPU doesn't need to be stopped.
@MrAbrazildo · a year ago
7:08 In old hardware, the engine's instructions/data didn't fit entirely in the cache. So, depending on how many instructions an action takes, the CPU had to go out to RAM, which used to be 100x slower (maybe less on a console). On modern hardware, all the instructions/data are in the cache, which has much more memory than an old game requires. However, RAM is still used nowadays for multimedia: images, video, audio, textures and anything else larger than 64 KB. The optimization for these large things aims to load data from RAM into VRAM (the GPU's memory) at a moment the user doesn't care about, like a loading scene, e.g. God of War's Kratos passing through some rocks. Sometimes this is used for loading from files into RAM too. 11:58 But he is doing it for modern hardware, isn't he? The video's goal is just to explain why Quake's algorithm is not meant for all cases. 13:00 The sad truth is that these pointer transformations are UB (undefined behaviour). That's why the guy commented it as "evil": he just wanted to get his job done, leaving the comment for the future masochist who would have to deal with the potential nasty bug. UB means the standard places no requirements on what the operation does. So the app may someday start crashing or giving wrong values (out of nowhere!) if anything changes from the original setup: hardware, OS, any imaginable protocol that interacts with the game. As far as I've heard, not even old C defined what should happen there. 13:52 In math, a negative exponent means division. So x*0.5 == x / 2 == x*2^(-1). Instead of multiplying the whole number, it's possible to change its exponent by addition or subtraction, which are faster operations.
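A minimal C sketch of the 13:52 point, assuming IEEE 754 single-precision floats: halving a float by subtracting 1 from its exponent field instead of multiplying. It copies the bits with memcpy rather than a pointer cast, which sidesteps the aliasing UB mentioned at 13:00. It deliberately ignores zero, subnormals, infinities and NaN, and on most hardware a plain `x * 0.5f` is already a single cheap instruction, so treat it as an illustration of the encoding rather than a guaranteed speed-up. The function name is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Halves a float by subtracting 1 from its biased exponent field.
   Assumes IEEE 754 binary32 and a normal, finite input whose halved
   value is still normal; zero, subnormals, inf and NaN are not handled. */
float half_via_exponent(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined way to read the bits */
    bits -= (uint32_t)1 << 23;       /* exponent field starts at bit 23 */
    memcpy(&x, &bits, sizeof x);     /* write the adjusted bits back */
    return x;
}
```

The sign bit is untouched, so negative inputs work too: -10.0f becomes -5.0f.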
@isodoubIet · a year ago
No, he runs this stuff on original hardware.
@isodoubIet · a year ago
As for it being UB, technically it is, but so many people do it that I don't think most compilers take advantage of it. C++ has only recently added a non-UB way to do it in C++20 with std::bit_cast. Before, the only way to do it was with memcpy, which would defeat the purpose.
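In C, the standard-conforming alternative to the pointer cast is also memcpy. Here is a sketch of the Quake III routine rewritten that way, assuming 32-bit IEEE 754 floats and a positive, normal input. With optimizations on, GCC and Clang typically compile a fixed-size memcpy like this into a plain register move, so it does not necessarily cost anything extra, but it is worth checking the generated code for your own target.

```c
#include <stdint.h>
#include <string.h>

/* Quake III-style fast inverse square root, using memcpy instead of
   *(long *)&y so that no object is accessed through the wrong type.
   Assumes IEEE 754 binary32 floats and a positive, normal input. */
float q_rsqrt(float number)
{
    const float x2 = number * 0.5f;
    const float threehalfs = 1.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* read the float's bits as an integer */
    i = 0x5f3759dfu - (i >> 1);          /* magic constant gives the first guess */
    memcpy(&y, &i, sizeof y);            /* reinterpret the bits as a float again */

    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step */
    return y;
}
```

After the single refinement step the result is within a fraction of a percent of the true 1/sqrt(x), e.g. roughly 0.499 for an input of 4.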
@tannerted · a year ago
Why are these pointer transformations UB? Casting to a different type doesn't change anything about the underlying bit representation. So cast and then shift and then cast back is just fine and deterministic according to the C spec. Am I wrong? (I might be; please teach me if I have something wrong) I feel like this is done all the time in embedded systems and OSs
@MrAbrazildo · a year ago
@isodoubIet - Are you saying that Nintendo didn't try to pack the data into the cache? This seems absurd. I can't imagine a game being made that way. It sounds so amateur. - I'd forgotten about this bit_cast. I need to study C++20 deeply. - Because memcpy would be slower?
@MrAbrazildo · a year ago
@tannerted I heard this in a presentation. I don't read standards. But I also heard that type punning through C unions is now well defined. Since unions were used for type punning, maybe this is now valid C (_still UB in C++, because there are other tools, such as std::bit_cast_).
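For what it's worth, C (C99 as clarified by a later technical corrigendum, and C11) does describe union type punning: reading a member other than the one last stored reinterprets the stored bytes. In C++ the same code is UB, which is where std::bit_cast comes in. A small C sketch, assuming `float` is 32-bit IEEE 754; the function name is just for illustration:

```c
#include <stdint.h>

/* Returns the raw IEEE 754 bit pattern of a float via a union.
   Defined in C (the bytes are reinterpreted); undefined in C++. */
uint32_t float_bits(float f)
{
    union {
        float f;
        uint32_t u;
    } pun;

    pun.f = f;      /* store through the float member */
    return pun.u;   /* read back through the integer member */
}
```

For example, 1.0f comes back as 0x3F800000: sign 0, biased exponent 127, mantissa 0.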
@Kavukamari · a year ago
getting some much needed neck nodding exercises in on this video, Prime is gunna be swole soon
@bozoc2572 · a year ago
He's clueless
@tornoutlaw · a year ago
20ms per frame...does this mean an N64 could run Mario64 in 60fps?
@traister101 · a year ago
Yep. Kaze has a rom that does 60fps on a native N64.
@chainingsolid · a year ago
1000/60 = ~16ms so almost..
@MrAbrazildo · a year ago
@chainingsolid 1000 ms / 20 ms = 50 FPS.
@binguloid · a year ago
guys he said microseconds not milliseconds
@MrAbrazildo · a year ago
@binguloid Yeah, and that was a performance gain, not the entire time for the frame.
@Kiyuja · a year ago
yeah Kaze always does great content
@alexaneals8194 · a year ago
The problem when you optimize for a CPU or GPU is that you should comment which CPU and GPU it was optimized for. Later versions may break your optimization or may offer options that are far better. If the code isn't commented, then when someone goes in to make changes, they don't know whether the optimization still applies or should be changed without spending a few cycles trying to figure out why the optimization was done. The same principle applies to higher-level optimizations. And a note to my past self: this includes personal projects.
@Minty_Meeo · a year ago
Sure, but Kaze's SM64 codebase basically only works on MIPS-GCC and only on N64 with all of the inline asm and illegal code it uses to go fast. He is way beyond the point of cross-compiling his mod.
@starleaf-luna · 4 months ago
hahaha! the comment about "your site runs slower on a faster processor than Mario 64" does not apply, because my website is just CSS, HTML and barely any JS just to not have to split the navigation bar across multiple files to make it easier to update! (there's probably a better way to do that, but still, most of it is just CSS and HTML. the JS runs just once on page load.)
@ShadowZero27 · a day ago
until you realize that loading times have been lagging for 25 years now
@jerichaux9219 · a year ago
I see you and I both have mastered the ancient knowledge of almost-kind-of-remembering-floating-point-formats-but-not-really.
@madmax2069 · a year ago
Kaze just shows how much potential game consoles (in this case the N64) have that was never reached in their active lifetime (active meaning still manufactured, sold and supported by the manufacturer). This is something the modding and homebrew communities are good for: figuring out every little aspect of the hardware inside a game console, making better SDKs, fixing bugs and issues in the games, optimizing code for said games; heck, just look at the person making Portal for the N64.
@Emil_96 · a year ago
Ocarina of Time is just pure nostalgia and it'll always have a spot in my heart
@Tobsson · a year ago
I watched so many videos of him. I'm not even 0.0000000000000001% smarter or more knowledgable since then, but it sure sounds cool.
@JohnSmith-ox3gy · 11 months ago
"I like your funny words, magic man." -JFK
@keyboard_g · a year ago
Outside of the tiny texture cache and the ram latency (high bandwidth, bad latency), the N64 was a computational beast for the time. Nothing else was close.
@skilz8098 · a year ago
I don't know about that. The PS1 and the Sega Dreamcast were both impressive machine architectures too.
@jc_dogen · a year ago
@skilz8098 Dreamcast was the next generation, and the N64 runs at 3x the clock speed of the PS1
@skilz8098 · a year ago
@jc_dogen Yeah, but the PS1 was a breakthrough in its day. It wasn't the first "CD-ROM" type console, because there was the Sega CD ("eh"), then the Sega Saturn, which was okay, around the same time as the PS1. Panasonic even had their own, I think it was the 3DO, but it didn't go over so well. The PS1, with its capabilities and affordable cost plus all of the available game titles, was very successful. Here's the 90s in a nutshell:
Sega Mega Drive/Genesis - 88 (cart)
Commodore 64 - 90 (cart)
Neo Geo - 90-91 (cart)
SNES - 90 (cart)
Philips CD-i - 91 (disc)
Sega CD - 91 (disc add-on to the Mega Drive)
3DO - 93 (disc)
Jaguar - 93 (cart; disc add-on in 95)
Sega 32X (cart add-on to the Genesis)
Neo Geo CD - 94 (disc)
Sega Saturn - 94-95 (disc)
PS1 - 94-95 (disc)
N64 - 96 (cart)
3DO M2 - 98 (disc)
Dreamcast - 98-99 (disc)
PS2 - 2000 (CD/DVD)
Some of them were good, some were okay, some were flops. Some were great. For me, the SNES and the PS2 were some of the best consoles. The Genesis was decent, the PS1 was very good. The N64 and Dreamcast were both good. Some of them I never played, such as the 3DO, CD-i, Neo Geo or the Commodore. The Sega Saturn was okay; it had potential but was inferior to other consoles. The Sega CD was a nice concept but didn't go over too well, and the 32X was a major bust. And it was around this time that PC gaming started to become a commonplace thing too. They were definitely the good old days. I kinda jumped ship from SNES to PS when Final Fantasy dropped Nintendo and migrated to the PS1. And then came games such as Resident Evil 1 & 2, Silent Hill, Parasite Eve, Castlevania: Symphony of the Night, Tony Hawk, Cool Boarders, etc... The PS1 and later the PS2 just took over. The SNES was a very popular, long-time favorite with many great titles... but eventually the PS2 became the champ. I still have both my SNES and my PS2. I don't have my original Atari, NES, Genesis, or PS1 anymore, but I still have my PS1 games and I use my PS3 for them.
I stopped getting into the "console" fad after the PS2 and only picked up a used PS3 about 2 years ago, just for a select few titles and to be able to play my PS1 games. I still have Diablo for PS1. The only game I'm really wanting or missing for my PS1 collection is Ogre Battle. And as for the Dreamcast being next gen... kind of. It only came out 2-3 years after the N64, just before the PS2 released. So the N64 had a head start on them. And it wasn't until 2001 that Nintendo came back with the GameCube. So that's basically the 90s in a nutshell. Well, from about 1988 to 2002. There were a few other consoles, but they're not really worth mentioning, as some of them were really obscure or niche. But yeah, as for performance and being a console with 3D graphics on a cartridge in 64 bit, yes, the N64 was a very nice machine. Mario Kart 64, Bomberman 64, F-Zero was decent but wasn't as good as F-Zero on the SNES, yet the F-Zero version for the GameCube was great. Then you had Metroid. I could go on... What can I say, I've been gaming since Warlords, Pitfall, Circus Circus, Space Invaders, Asteroids, Missile Command, Breakout, Pacman, and much more... Been at it since the early 80s.
@jc_dogen · a year ago
@skilz8098 text dump bro. lmao but I agree, the Dreamcast was in-between gens, though it was still a very big jump. I would also say the N64 was (mostly) much more powerful than the PS1. Some aspects made this less obvious (cartridges, very small 4K texture memory), and the hardware had some serious problems that ate into its performance (Rambus latency and bandwidth problems) that were probably just mistakes. But at the end of the day, performance-sapping features like sub-pixel accurate rendering, perspective-correct texturing, texture filtering, and z-buffering were only possible because of the extra power it had.
@skilz8098 · a year ago
@jc_dogen Well, as the saying goes, a picture is worth 1,000 words, and I have about 1,000 pictures in mind, lol...
@nonetrix3066 · a year ago
I think the cooler thing is that many of the tricks he mentions in other videos were not known at the time the game was made, so we can take more advantage of the hardware today than we could have ever dreamed of in the 90s
@tacokoneko · a year ago
i hope that some day kaze can improve the Linux kernel for N64, because right now it could already theoretically run any Linux program... as long as it fits in 8 MB of RAM alongside everything else. It would be incredible if he could install a web server and then we can really build react for N64
@blarghblargh · a year ago
n64 already runs doom. sometimes you gotta stop and ask why :D
@xdanic3 · a year ago
FINALLY! I've been waiting for you to react to kaze for a while now! And after this we could have a kaze reacts to ThePrimeTime, but you reacting first was more expected, now I gotta watch the video 👀
@JimWitschey · 8 months ago
Kaze Emanuar has maybe the strangest career of any programmer alive
@BeamMonsterZeus · a year ago
As an amateur astrophysicist, I always knew the N64 was a universal anomaly, but not for the reasons discovered here
@CalamityStarForce · a day ago
If you think people aren't gonna notice 3% less girth, you haven't read The Princess and the Peen.
@jelliott3604 · 11 months ago
It's that there is a threehalves label for .. 3 halves .. but the offset just gets a WTF!
@alfiegordon9013 · a year ago
Lets all love Kaze
@noxlupi1 · a year ago
The Fast Inverse Square Root, was ahead of its time, a long time ago.
@FastVideoProdInNash · a year ago
2:24 😂😂😂😂😂😂 That was crazy.
@yxyk-fr · a year ago
There's Newton's algorithm (already devised by the Babylonians before). And then there are Newton-Raphson iterations that converge like crazy...
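For reference, the Babylonian (Heron's) method for square roots is exactly what you get by applying Newton's method to f(y) = y² − a, so for this problem the two are the same iteration. A small C sketch; the starting guess and iteration count are arbitrary choices for illustration, and it assumes a > 0:

```c
/* Babylonian / Heron's method: Newton's method on f(y) = y*y - a,
   since y - f(y)/f'(y) = y - (y*y - a)/(2*y) = (y + a/y) / 2.
   Assumes a > 0. */
double heron_sqrt(double a, int iterations)
{
    double y = a > 1.0 ? a : 1.0;   /* a guess at or above sqrt(a) */
    for (int k = 0; k < iterations; k++)
        y = 0.5 * (y + a / y);      /* average the guess with a / guess */
    return y;
}
```

Once the guess is close, each step roughly doubles the number of correct digits (quadratic convergence), which is the "converges like crazy" part: starting from 2, six steps already give sqrt(2) to double precision.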
@n00blamer · a year ago
You guys... the Devil's in the details but the underlying maths is quite simple: reciprocal is negation of the exponent, 1/(n ^ m) is n ^ -m. sqrt(n ^ m) is n ^ (m / 2), and these can be combined into: n ^ (m * -0.5) == 1.0 / sqrt(n ^ m). The code gets a good initial value and Newton-Raphson iterations converge.
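Written out, assuming IEEE 754 single precision: let $I_x$ be the bits of $x$ read as an integer, $L = 2^{23}$ the mantissa scale, $B = 127$ the exponent bias, and $\sigma$ a small tuning constant (these symbols are labels chosen here, not from the video). The roughly logarithmic integer encoding of a float gives the initial guess, and Newton-Raphson on $f(y) = 1/y^2 - x$ gives the refinement step used in the Quake code:

```latex
\begin{aligned}
y &= \frac{1}{\sqrt{x}} = x^{-1/2}
  &&\Longrightarrow\quad \log_2 y = -\tfrac{1}{2}\log_2 x \\
I_x &\approx L\,(\log_2 x + B - \sigma)
  &&\Longrightarrow\quad I_y \approx \tfrac{3}{2}L\,(B - \sigma) - \tfrac{1}{2} I_x \\
f(y) &= \frac{1}{y^2} - x,\quad f'(y) = -\frac{2}{y^3}
  &&\Longrightarrow\quad y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)} = y_n\left(\tfrac{3}{2} - \tfrac{x}{2}\,y_n^2\right)
\end{aligned}
```

With $\sigma = 0$ the constant $\tfrac{3}{2}LB$ is 0x5F400000; the famous 0x5F3759DF corresponds to $\sigma \approx 0.045$.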
@ThatJay283 · 11 months ago
5:10 "you should be ashamed of yourself" - yup. back before i knew any better, i started a react app with a backend in nodejs, express, and typescript. i thought it'd be the "easy way" and instead i just ended up with a nightmare of react components and pointless middle steps that are too late to leave out now. and on top of all of that, it also runs like shit.
@paulseitz5749 · a day ago
Yep, sorry, I am a fool and I didn't know how to optimize image size. Why am I sending a 2k image when it only ever takes up 1/8th of the screen? Why am I loading so many resources during load and not after? Why did I make all the icons high-quality PNGs and not simple SVGs? So many whys, but that's how we learn sometimes.
@rayanmazouz9542 · a year ago
apparently Mario 64 wasn't even compiled with optimization enabled
@RuySenpai · a year ago
This is a myth and Kaze himself has a video on it.
@felixjohnson3874 · a year ago
@RuySenpai Well, it's not a myth, so if he does have a video and it says what you're saying, it's wrong. I'd love to confirm that, but I can't find the video you're talking about. We have literally reverse engineered the source code and, using the compiler they used at the time, we can generate byte-for-byte identical code... when optimizations are disabled. The PAL version IS optimized, but it's already running about 16% slower anyway.
@RuySenpai · a year ago
@felixjohnson3874 Got it confused: it wasn't Kaze, it was Modern Vintage Gamer who made the video. My bad. It isn't a myth that USA SM64 had compiler optimizations off, but it's overstated how significant it is.
@Bobbias · a year ago
@RuySenpai That's primarily because compilers of the time didn't have great optimizations to begin with. Even without platform-specific optimizations, modern compilers can do far more optimization than the compilers back in the day could, so whether or not it was optimized was not as big an issue back then as it would be now. That said, even with modern optimizations, that's not going to magically buy you a ton more performance. Kaze's massive performance improvements come from the fact that he's basically rewritten the entire game engine (with some of the only untouched code being the actual movement physics and such) with performance in mind.
@jc_dogen · a year ago
@RuySenpai No, he made a video explaining why it was reasonable for the compiler optimizations to be turned off
@dsdy1205 · 9 months ago
10:33 I nearly spat out my drink
@scottbuffington5964 · 2 months ago
Floating points are easy to understand. Just say yes, may I have another until the end of time; or until you don't hear a response. =]
@csabaczcsomps7655 · a year ago
Know your data and now we're dancing.
@jorge28624 · a year ago
2:54 we have come full circle lol
@lukasoliverleo3730 · a year ago
I never expected anyone to react to Kaze
a year ago
I remember when a QNX kernel fit in the CPU cache. Weird times, weren't they?
@antoniogarest7516 · a year ago
Prime and Kaze Subscribed
@dominikmuller4477 · 6 months ago
to be fair, using the fast inverse square root and then inverting it is a dumb way to go about it. A fairer comparison would be to make an analogous fast square root algorithm. The same floating point magic that turns 1/sqrt(x) into (-1/2)* (int x) + magic number would also support turning sqrt(x) into (1/2)* (int x) + different magic number.
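A sketch of that fast square root in C, assuming IEEE 754 single precision and a positive, normal input. Halving the log instead of negating it gives I_y ≈ I_x/2 + 127·2^22, and 127·2^22 = 0x1FC00000. This is the untuned constant: the raw guess is within about 6.1% of the true root, and one Heron step brings that down to roughly 0.2%. Unlike the multiply-only Newton step of the inverse version, this refinement divides. The function name is made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Fast approximate sqrt with the same bit trick as the fast inverse
   square root, but halving the log instead of negating it:
   I_y ~ I_x / 2 + 127 * 2^22, and 127 * 2^22 = 0x1FC00000.
   Assumes IEEE 754 binary32 and a positive, normal input. */
float fast_sqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);        /* float bits as an integer */
    i = (i >> 1) + 0x1FC00000u;      /* halve the "log", re-add half the bias */
    memcpy(&y, &i, sizeof y);        /* initial guess, within ~6.1% */

    return 0.5f * (y + x / y);       /* one Heron (Newton) step refines it */
}
```

For powers of four such as 1 and 4 the bit trick is already exact, so the result is exactly 1 and 2; for 2 it returns about 1.4167.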
@v2ike6udik · a year ago
2:44 MAN DOWN, MAN DOWN!
@afhostie · a day ago
3:00 pre-reacted
@TheHackysack · a year ago
shoutouts to simpleflips
@i3looi2 · 6 months ago
When that guy decides to invent another JS Framework after I just settled on my JS Framework of choice "BgRcky: fuck you"
@adroharv5140 · 2 days ago
Link to the Past is definitely an example of perfection. Ocarina is glorious and I've completed this wonderful game so many times now, but it's not quite perfect. Majora's Mask is understandably seen as better by some, including myself at times, but they are quite different games to be fair
@blipojones2114 · a year ago
"link to the past" is hard, just started playing and am pretty hard stuck
@rodrigoqteixeira · 4 months ago
7:36 if code stored cache code no need read from ram if code no read from ram renderer able use ram so game faster In readable words: if the code is stored in the cache instead of the RAM, then the renderer can use the RAM while the CPU is computing, because the CPU has no need to read from RAM, since the code is in cache, which doesn't occupy the RAM bus, and in the end it's faster.
@apollolux · a year ago
Sweet, it's a Kaze reaction! :)
@stevez5134 · a year ago
this is the one from only 3 weeks ago??? anyways great stuff
@MikkoRantalainen · 8 months ago
5:00 100% agreed!
@skilz8098 · a year ago
I have over a billion transistors that are all rated with a 0.05 nano second propagation delay. One of them is working at 0.09 nano seconds. One of my logic gates is slower than the rest and it is the source of all my bottlenecks. I want a full refund! LOL!!!!
@mfc1190 · a year ago
Ocarina of Time was so much better than majora’s mask like who TF said that
@billynasir3146 · a year ago
Majora's Mask is way more alive and open-world despite having fewer dungeons
@623-x7b · a year ago
The Ocarina of Time is better than GTA 5. GTA 5 is better than Majora's mask
@guilhermeraposo6080 · a year ago
I think about how nice an extra 20us would be every time the wife and I are erm... Playing SM64
@HrHaakon · 10 months ago
My backend runs smooth and it has something like 150 MHz (I have a few millicpus in the cluster to play with) of Xeon time. So a lot more than the N64, but since we moved to a cloud-based platform CPU time got expensive.
@Bliss467 1 year ago
Game dev on limited hardware is truly next fuckin level
@michaelrobb9542 11 months ago
Cause he can't stop hearing the music. 9:04.
@jordixboy 1 year ago
Could someone explain what cycles refer to in the measurements? Is it the number of operations the CPU has to do? (What's the difference from instructions? cycles != instructions?) Or is it the number of times it has to load/store data in registers/RAM?
@ricardoamendoeira3800 1 year ago
Many instructions take several cycles to finish. Interacting with RAM is a good example. One cycle is generally the time needed for the shortest CPU instruction to run.
@bertram-raven 1 year ago
Adding my own piece of magic from the 1970s. a%=b b%=a a%=b Works for all types, structures, and cache swaps.
@n00blamer 1 year ago
If the % is exclusive-or then that would swap in-place.
@MikkoRantalainen 8 months ago
@@n00blamer Typically XOR would use syntax a^=b because usually % means remainder for a division.
@n00blamer 8 months ago
@@MikkoRantalainen That is why I assumed OP mistakenly used % when he meant ^
@aeaehow 8 months ago
15:30
@oserodal2702 1 year ago
Doesn't the original code of the fast inverse square root also technically have undefined behaviour?
@ea_naseer 1 year ago
yeah, it talked about the fact that since he's not going for accuracy, they probably would never divide by zero, so no undefined behaviour like the original.
@isodoubIet 1 year ago
@@ea_naseer The UB is not in any division by zero, it's in accessing an object through a pointer to a different type. That violates the C (and C++) aliasing rules. I don't think it's an actual problem on any real compiler since it's such a common thing to do (and C++ only recently added a standard way to do it, std::bit_cast in C++20), but technically it's UB according to the standard.
@AdhirRamjiawan 1 year ago
I'm ashamed i'm part of the bigger problem :'(
@jim0_o 9 months ago
A Link/link To the past to the past was the pinnacle of 2D Zelda (and adventure/exploration games at the time), Ocarina of Time was the base-point of 3D Zelda games, and Majora's Mask was a good look back at the gritty, darker Zelda games (they all had darkness, but Zelda 2 (Link), Majora's Mask and Twilight Princess were different). So both ALttP and MM were better game-wise, but Ocarina of Time was groundbreaking... now back to the remaining 80% of the video. Edit: AFAIK the mask guy (the Happy Mask Salesman) was a kind of Deus Ex Machina, i.e. he was the "hand of god" that started and ended everything. If you follow the story, it's all his fault (he brought the mask to Clock Town, getting it into the hands of the Skull Kid, but he is also the one that bugs you to get it back). This is probably also why he is designed to look like the creator of the Zelda series... now back to 50% of the video...
@BustinJustin951 11 months ago
"Majoro's Mask" 🙄
@bernicefenton 1 year ago
"take a moment and consider your life and what you've built... versus this... you should be ashamed of yourself" 😂 not so harsh
@BeamMonsterZeus 1 year ago
I admit I've only beat OoT a few times and MM a few as well. I was more into Goldeneye, all of this at age 4-9 btw I'm a babby
@StingSting844 1 year ago
This is a guy who gives imposter syndrome to the ones we get imposter syndromes from
@lMINERl 1 year ago
5:16 im ashamed😢
@TheIridescentFisherMan 1 year ago
My boy was like " DAMN 20 MICRO SECONDS??? THATS WAY MORE THAN I THOUGHT ".
@cweasegaming2692 1 year ago
AHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
@andrewtfluck 1 year ago
Kaze is awesome 😎
@ikirachen 6 months ago
MDK2 FTW :)
@o0shad0oo 1 year ago
Fast inverse square root *
@connorskudlarek8598 1 year ago
Prime playing Ocarina of Time makes me wonder if he's ever played the Randomizer or Online Multi-player Randomizer with his kids.
@wesleymcbob 1 year ago
Yeesss Kaze is the greatest
@cherubin7th 1 year ago
Majora's Mask is my favorite.
@remigoldbach9608 1 year ago
John Carmack is a beast (sqrt approximation in Quake)
@jc_dogen 1 year ago
wasn't his code though
@BigDaddyMort 1 year ago
Incorrect, sir! The best Super Mario game was on the SNES: Super Mario RPG: Legend of the Seven Stars.
@B1GL0NGJ0HN 1 year ago
New Switch remake is a lot of fun!
@tgirlshark 1 year ago
OMG I LOVE KAZE
@CatherineBert 1 year ago
I don’t know who you are, but I’m interested in the video you are showing. Enjoy this evidence that you’re stealing this content’s views. Never seen your channel before.
@chronxdev 1 year ago
Yep, he cooked my noodle
@Raven-fu1zz 1 year ago
I think his talent would be really useful for making a compiler or IDE for new-age video games; modern video games are needlessly using so many resources, unfortunately
@MadaraUchihaSecondRikudo 1 year ago
Remember that all of this is for a game that runs on very specific custom hardware whose entire spec is known and consistent. This would be a lot more difficult today, even if you discount PC (which has so many different processors with different features, cache sizes, hardware optimizations, etc.) and just target modern consoles. It essentially became impossible to do these kinds of optimizations by hand at any real scale a while back, and to be fair the compiler will do many of those optimizations for you (e.g. SIMD, branch prediction, etc.). Instruction cache in particular isn't that big of an issue anymore. What you still need to be aware of today is memory allocation. You generally want to be CPU bound and not memory bound, and the primary reason high-level languages are generally slower than low-level ones is that it's harder to track and control what memory you allocate and when. If you're smart about allocating memory and working with values you've already loaded (as they're cached), you're 95% of the way there, which is generally more than enough.
@sa_lowell 1 year ago
I absolutely hate that I laughed at the vector normies thing. I'm done.
@Valerius123 10 months ago
Was Majoras Mask better than Ocarina Of Time? I didn't think so for the longest time but in objective hindsight... yeah, probably.
@notapplicable7292 1 year ago
Honestly considering prime apparently at some point in his career did embedded programming he seems to know very little about it.
@jameshamann465 1 day ago
Majoras Mask is better than Ocarina of Time
@MemeConnoisseur 1 year ago
Nintendo suing his ass, how dare he make a good mario game
@Georgggg 1 year ago
TL;DR: this all became irrelevant 25 years ago
@catskinner6 1 year ago
Ocarina of Time >>>>>>>>>> Mario64 Fact
@bobanmilisavljevic420 1 year ago
I guess it's True when they say, D = 8 8==D
@dwdadevil 4 months ago
true