Using ChatGPT to Optimize Mario 64

89,048 views

Days ago

Comments: 628

@KazeN64 1 year ago

Because this keeps being brought up: Dividing a signed integer by 2 is not the same as leftshifting it by 1. The C spec asks that any division rounds towards zero, so when you have a negative integer and you rightshift it, you'd be rounding away from zero. That's why the compiler will generate extra instructions on a /2 compared to a >>1.
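
A minimal C sketch of the difference described above. The function names are illustrative. The fix-up shown (adding the sign bit to x before the arithmetic shift) is one common way compilers implement signed x / 2, and it is where the extra instructions come from. Note that right-shifting a negative value is implementation-defined in C; GCC defines it as an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift: rounds toward negative infinity.
   Right-shifting a negative value is implementation-defined in C;
   GCC documents it as an arithmetic (sign-extending) shift. */
static int32_t half_shift(int32_t x) {
    return x >> 1;
}

/* Equivalent of x / 2 for signed x: add 1 to negative inputs before the
   shift so the result rounds toward zero, as the C standard requires. */
static int32_t half_trunc(int32_t x) {
    int32_t sign = (int32_t)((uint32_t)x >> 31);  /* 1 if x < 0, else 0 */
    return (x + sign) >> 1;
}

static void check_halving(void) {
    assert(-3 / 2 == -1);          /* division truncates toward zero */
    assert(half_trunc(-3) == -1);  /* fix-up matches division        */
    assert(half_shift(-3) == -2);  /* plain shift rounds down        */
    assert(half_shift(-4) == -2);  /* identical for even inputs      */
}
```

When that rounding difference doesn't matter, writing `>> 1` directly avoids the fix-up; for unsigned values `/ 2` already compiles to a single shift.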

@KazeN64 1 year ago

no, there are a few inlined assembly functions in this. the only problem with writing a lot of custom assembly is that GCC has a bug with inline assembly that causes it to omit some necessary NOP instructions.

@CielMC 1 year ago

You mean rightshift by 1? Or is the msb not on the left here?

@zero_318 1 year ago

Unless the NOPs need to be generated/positioned dynamically, would it be reasonable to use byte directives as a substitute?

@TheGag96 1 year ago

So in this case, you figured rounding away from zero for negative numbers was inconsequential?

@brandonstormonth8356 1 year ago

It might be trying to say something about the fast inverse square root bit shift that was used in the Quake engine. Just my guess

@usernamesareweird4880 1 year ago

At this point gaining performance is like gaining muscle mass for Kaze

@Ghi102 1 year ago

It would be really interesting if you gave it the unoptimized SM64 code from the original game, asked it to optimize it, and saw whether it returns reasonable ideas or ideas you've already used to optimize the code in your mod. I'm really impressed at how well it understands the code.

@MyScorpion42 1 year ago

if you have the unoptimized code and the optimized code then it opens up the possibility of training a NN specifically for code optimization

@98LuckyLuk 1 year ago

I think the problem with that is that it has no real knowledge of the underlying hardware.

@queazocotal 1 year ago

@@98LuckyLuk This is the point in a piece of fiction I recall where someone decided uploading a book on CPU architecture to the bot would be helpful, and things got a whole lot more asymptotic.

@BradenBest 1 year ago

@@MyScorpion42 Code optimization is mostly just pattern matching though. GCC does a better job optimizing code than a human does because it knows all the little tricks for speeding things up on an x86 machine. All those tiny optimizations add up, which frees the programmer to focus on bigger picture things like using efficient algorithms and data structures instead of wasting time on optimizations that may damage the portability of the code. If you train an AI to optimize, the first hurdle it has to get past is being better than a human at optimizing, but I see basically no way that an AI can become better than a compiler. It's like software rendering (AI, generic) vs dedicated graphics (compiler, purpose-built).

@Ehal256 1 year ago

@@BradenBest gcc doesn't really do better than a trained human when it comes to individual portions of a program, the benefit is that it does a better job keeping track of a large program. It's not very hard to beat even the best compilers if you focus on a small part of a program at once.

@ttaute 1 year ago

OptimizeGPT: The perfect bot for saving 2 microseconds out of dividing matrices by 2

@NoNameAtAll2 1 year ago

you laugh, but you can divide 8 8-bit variables by 2 by bit-shifting 64-bit and masking out leaked bits

@Henrix1998 1 year ago

@@NoNameAtAll2 until it contains signed numbers

@DanielVCOliveira 1 year ago

@@NoNameAtAll2 i like your funny words magic man

@NoNameAtAll2 1 year ago

@@Henrix1998 other than -1 not becoming 0, you only need to copy out the sign bits beforehand (bitflip of the same mask) and restore them afterwards (bit-or): 4 operations instead of 2 for unsigned

@NoNameAtAll2 1 year ago

@@DanielVCOliveira let's say two 4-bit numbers [aaaa] and [bbbb]
you combine them into 8-bit [aaaa_bbbb] (by reading the memory location as 8-bit)
then bitshift (x>>1): [0aaa_abbb]
then mask out that leaked a (x &= 0111_0111): [0aaa_0bbb]
[0aaa] is [aaaa] divided by 2, same with [0bbb]
works for any amount of numbers in a row, but only when dividing by powers of 2
---
signed version
inputs [Aaaa_Bbbb]
stores [A000_B000]
does the unsigned operation [0Aaa_0Bbb]
then restores: [AAaa_BBbb]
which is correct in the two's complement that all computers use
again, works for any amount of numbers, but dividing by higher powers of 2 needs several restorations
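
The packed trick described above can be sketched in C with eight 8-bit lanes in one 64-bit word instead of 4-bit lanes. The mask constants and helper names are illustrative, not taken from any particular codebase. The signed version follows the save-sign / shift / mask / restore steps from the comment, which makes each lane an arithmetic shift (rounding toward negative infinity, so -1 stays -1).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOW7_MASK 0x7F7F7F7F7F7F7F7FULL  /* all bits except each byte's top bit */
#define SIGN_MASK 0x8080808080808080ULL  /* only each byte's top (sign) bit */

/* Unsigned: shift the whole word, then clear the bit that leaked into the
   top of each byte from the byte above it. */
static uint64_t halve_u8x8(uint64_t x) {
    return (x >> 1) & LOW7_MASK;
}

/* Signed: save the sign bits, do the unsigned halving, OR the signs back.
   The sign bit ends up in bits 7 and 6 of each byte, i.e. a per-byte
   arithmetic shift (rounds toward negative infinity, so -1 stays -1). */
static uint64_t halve_s8x8(uint64_t x) {
    uint64_t signs = x & SIGN_MASK;
    return ((x >> 1) & LOW7_MASK) | signs;
}

/* Pack/unpack eight int8_t lanes; byte order doesn't matter because every
   operation above treats the lanes independently. */
static uint64_t pack8(const int8_t v[8]) {
    uint64_t x;
    memcpy(&x, v, sizeof x);
    return x;
}

static void unpack8(uint64_t x, int8_t v[8]) {
    memcpy(v, &x, sizeof x);
}

static void check_halve(void) {
    const int8_t in[8] = { -128, -3, -1, 0, 1, 5, 100, 127 };
    int8_t out[8];
    unpack8(halve_s8x8(pack8(in)), out);
    for (int i = 0; i < 8; i++)
        assert(out[i] == (int8_t)(in[i] >> 1));  /* matches scalar >> 1 */
}
```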

@AgsmaJustAgsma 1 year ago

4:59 Kaze getting angry at ChatGPT is low-key hilarious.

@Crit-Chance 1 year ago

"YOUNG MAN YELLS AT CLOUD"

@imselfaware419 1 year ago

How is that him getting angry

@alfonshedstrom9859 1 year ago

ChatGPT sometimes does act like an overly confident junior programmer

@avasam06 1 year ago

Something you should try is to refine your queries and give ChatGPT more information. You can tweak it to not recommend readability improvements and teach it what it cannot do on the N64. Not saying it'll magically find optimizations after that, but the responses should be more relevant. Another tweak could be to remove code comments, since they seem to trip it up. Or literally just tell it that lines starting with // are comments and should be ignored.

@rareosts5752 1 year ago

Exactly, Kaze criticizes it at some points where it's just lacking context which can be provided. So much can be done by tweaking your prompts and by preparing it with info, since it takes your past conversation into consideration.

@mylittleparody2277 1 year ago

That's what I wanted to post. ChatGPT is more effective when it has fewer assumptions to make. So specifying that the code will run on an N64 (or on a MIPS processor, with RAMBUS issues) would probably help with the answers.

@davidmalkowski7850 1 year ago

This guy is the literal gigachad working on this game. What incredible gains!

@TorutheRedFox 1 year ago

mate's got both the brains and the gains

@thephilosophersstoned3796 1 year ago

@@TorutheRedFox They're actually deeply interrelated; the better your Body runs, the better you can Think. Mental Health is a silly dichotomy that makes you think your brain is separate from your body; the Truth couldn't be any further if it tried. Your Neurology literally dictates your personality, down to the physical elements of how you choose to express yourself. TL;DR His Gains improve his Brains and his Brains improve his Gainz

@KazeN64 1 year ago

yeah i literally started working out because i knew it'd improve reaction times in video games. later i noticed that it made me think clearer and feel better too.

@dikkie2913 1 year ago

@@KazeN64 you're the man, keep it up

@TorutheRedFox 1 year ago

@@thephilosophersstoned3796 it was a joke on the dumb gymbro stereotype

@PlGGS 1 year ago

You should tell ChatGPT to abide by the memory specifications of the N64 and see if it changes its answers based on that

@tobiwonkanogy2975 1 year ago

just based on this the limitations should be catalogued, recorded and fed into the GPT thread.

@grantsomething 1 year ago

Imagine the power Kaze will have when he gets to the Gamecube

@hoo2042 1 year ago

😂

@TheKevinGDX 1 year ago

@RefriedBeing 1 year ago

When Kaze despaghettifies SSBM

@broodingstone958 1 year ago

SM64 recreated on the GameCube by Kaze would look like a modern Switch title. Lol

@tl1882 1 year ago

@@broodingstone958 and sm64 on wii u would look like modern pc game with rtx

@MrAdam802 1 year ago

Looking forward to vanilla Mario 64 at 60fps on original N64 hardware!

@RottenMuLoT 1 year ago

THIS ^

@maxrichards5925 1 year ago

He’ll make history lol

@ElsweyrDiego 1 year ago

not only vanilla, but all mario 64 hacks

@The86Ripper 1 year ago

@@ElsweyrDiego Last Impact was imo the pinnacle of ROM hacks. Sometimes I wonder... where do we go from here? Is anything ever going to exceed the quality and fun of this hack?

@ElsweyrDiego 1 year ago

@@The86Ripper I think the hacking tools will expand to every game ever released. If you want to mod Quest 64 into a whole new game with new mechanics, you'll be able to. Want to remake Superman 64 with the quality of a modern AAA game? You can. We just have to wait for this to happen. One day.

@sheep6937 1 year ago

Tldr I'm witnessing two supercomputers conversing with each other.

@NovusDundus 1 year ago

Have you tried setting the ChatGPT "environment" to expect N64 and any specific language changes with its answers? I've found that it works best to set up the initial question with every possible piece of info it could need to better create usable answers. Otherwise it just looks at code and throws everything at it regardless of any limitations

@jeromyperez5532 1 year ago

This.

@Deliveredmean42 1 year ago

Yeah, these seem more like suggestions for recent unoptimized games than for already-optimized game code. But of course it's still fundamentally flawed in many cases. It might get better at some point, but it does help with stuff you want a quick answer to when you're having trouble finding examples (but also be wary when it gets it wrong)

@KazeN64 1 year ago

it is really good if your question is just knowledge based, but it's not that great if it's idea - based. i still think it does better than most human programmers in both scenarios though.

@Deliveredmean42 1 year ago

@@KazeN64 It's very helpful for sure. It did help with some code I've been trying to get around to learning. And as you stated, it is knowledge based. So if you want to know something that is beyond its scope (beyond 2021 for ChatGPT), then it won't help you that much, unfortunately. It's amazing when it does know something tho!

@thewhitefalcon8539 1 year ago

@@KazeN64 It's kind of neither. GPT (all versions of GPT) is designed as a bullshit generator. It generates bullshit. Sometimes the bullshit happens to be useful.

@avasam06 1 year ago

> it does better than most human programmers in both scenarios though
My feeling exactly about AI tools like these: Better than most people. But not better than someone highly specialised for a specific scenario (especially when it requires human-level understanding of context and thinking outside the box).

@Deliveredmean42 1 year ago

@@avasam06 Yeah, hard to be as good as the legendary Kaze!

@xXFoiXx 1 year ago

I want someone to use AI to get us from Assembly to readable C code. Imagine the modding renaissance we would have.

@avasam06 1 year ago

So an AI assistant for Ghidra?

@xXFoiXx 1 year ago

@@avasam06 Ghidra but more sophisticated I guess yeah

@ScarfKat 1 year ago

I think it can already do that. Not sure how complex of a function you can give it, but I tested it a bit ago and it actually worked pretty well. (Granted, it was with a very simple function lol)

@xXFoiXx 1 year ago

@@ScarfKat It gives you a baseline most of the time. Which is fine but it is still a lot of work to get to the proper code.

@yami_the_witch 1 year ago

decompiling will never be complete. There is information loss when you compile: multiple different source codes can generate the same machine code, and you will never be able to recover comments. Structs and enums also get horribly mangled, because in machine code all of the overlying structure gets removed and it just turns into one long allocated block of data. ChatGPT can pretty much only do math and hard logic. It cannot handle abstraction one little bit

@omnisel 1 year ago

I think the issue is that even if the code is as optimal as it can be, it's programmed to give you solutions anyway. It can't say "this is sufficient and optimal, so you're good", because then it's like, come on, you're not going to give me something? So, at best, it will give you solutions that are merely alternatives, and solutions that are slower. At worst, it'll do what it did and just guess based off of other code.

@deadinsky 1 year ago

1:55 I think it was referring to using trigonometric identities, such as sin²x + cos²x = 1, to reduce the amount of trigonometric computations. Since fast inverse square root is faster than traditional sin/cos. You’re using a lookup table anyways which makes the point moot.

@KazeN64 1 year ago

you can't get back the sign from the sin/cos that way, so you end up using a lot of instructions to reconstruct it from the angle range. i did try that before and it was an improvement over 2 separate sin and cos tables. but ever since i merged them and can access both the sin and the cos of an angle in 1 dcache access, it is faster to just get them from the LUT
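
A minimal sketch of the identity-based approach and why the sign has to be rebuilt from the angle: sin²x + cos²x = 1 only gives cos x = ±√(1 − sin²x). This assumes a hypothetical Q14 fixed-point format (16384 = 1.0) and SM64-style 16-bit angles where 0x10000 is one full turn; `isqrt_u32` and `cos_from_sin` are illustrative names, not code from the mod.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_Q14 16384  /* hypothetical fixed-point scale: 1.0 == 16384 */

/* Integer square root (floor), classic bit-by-bit method. */
static uint32_t isqrt_u32(uint32_t n) {
    uint32_t root = 0;
    uint32_t bit = 1u << 30;          /* highest power of four in 32 bits */
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* The identity only yields |cos|; the sign must come from the angle range.
   cos is negative for angles strictly between 0x4000 (90 deg) and 0xC000 (270 deg). */
static int32_t cos_from_sin(int32_t sin_q14, uint16_t angle) {
    assert(sin_q14 >= -ONE_Q14 && sin_q14 <= ONE_Q14);
    uint32_t s2 = (uint32_t)(sin_q14 * sin_q14);
    int32_t mag = (int32_t)isqrt_u32((uint32_t)ONE_Q14 * ONE_Q14 - s2);
    return (angle > 0x4000 && angle < 0xC000) ? -mag : mag;
}

static void check_cos_from_sin(void) {
    assert(cos_from_sin(11585, 0x2000) == 11585);    /* 45 deg              */
    assert(cos_from_sin(11585, 0x6000) == -11585);   /* 135 deg: same |sin| */
    assert(cos_from_sin(0, 0x8000) == -ONE_Q14);     /* 180 deg             */
}
```

The square root plus the range check are the extra instructions Kaze mentions; a table that already stores both values skips them.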

@MichaelPohoreski 1 year ago

That’s not what (1) is referring to. Instead of using TWO luts for sin() and cos() you use a *single large table* where the sin and cos table *overlap* due to the fact that cos(x) = sin(x+90) This was common in the demoscene (90’s) where fixed point was used.
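
A minimal sketch of the overlapping-table idea, assuming a hypothetical coarse table with 16 steps per turn in Q14 fixed point (16384 = 1.0) and SM64-style 16-bit angles where 0x10000 is one full turn. The table covers 1.25 turns, so cos(x) = sin(x + 90°) is just a read four entries further along, with no wrap-around. Table contents and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coarse table: 16 steps per turn (22.5 degrees each), Q14.
   It stores 1.25 turns so cos can be read 90 degrees later without wrapping. */
static const int16_t sin_table[16 + 4] = {
         0,   6270,  11585,  15137,  16384,  15137,  11585,   6270,
         0,  -6270, -11585, -15137, -16384, -15137, -11585,  -6270,
         0,   6270,  11585,  15137,
};

static int16_t table_sin(uint16_t angle) {
    return sin_table[angle >> 12];        /* 0x10000 == one turn */
}

/* cos(x) == sin(x + 90 degrees): read a quarter turn (4 entries) further on. */
static int16_t table_cos(uint16_t angle) {
    return sin_table[(angle >> 12) + 4];
}

static void check_overlap(void) {
    assert(table_cos(0) == 16384);
    assert(table_cos(0x8000) == -16384);
    assert(table_sin(0x4000) == 16384);
}
```

As Kaze points out further down, the sin and cos of one angle then live a quarter of the table apart, which on the N64 can mean two data-cache misses per rotation instead of one.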

@KazeN64 1 year ago

if you had huge memory constraints, you should use a LUT that's just one quarter of a sinetable and do some sign flipping math on that to get cosine and sine.
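
A sketch of the quarter-wave approach: store only 0° to 90° and rebuild the other quadrants by mirroring the index and flipping the sign. Same toy assumptions as before: 16 steps per turn, Q14 values, 16-bit angles with 0x10000 as a full turn; `quarter_sin`, `qsin` and `qcos` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical quarter-wave table: sin from 0 to 90 degrees inclusive,
   in steps of 22.5 degrees, Q14 fixed point (16384 == 1.0). */
static const int16_t quarter_sin[4 + 1] = { 0, 6270, 11585, 15137, 16384 };

/* 16-bit angle, 0x10000 == one turn; only the top 4 bits are used here. */
static int16_t qsin(uint16_t angle) {
    unsigned step = angle >> 12;          /* 0..15, 16 steps per turn   */
    unsigned quadrant = step >> 2;        /* 0..3                       */
    unsigned idx = step & 3;              /* position inside the quadrant */
    switch (quadrant) {
    case 0:  return quarter_sin[idx];                /* rising          */
    case 1:  return quarter_sin[4 - idx];            /* mirrored        */
    case 2:  return (int16_t)-quarter_sin[idx];      /* negated         */
    default: return (int16_t)-quarter_sin[4 - idx];  /* mirrored, negated */
    }
}

/* cos(x) == sin(x + 90 degrees); uint16_t arithmetic wraps the angle. */
static int16_t qcos(uint16_t angle) {
    return qsin((uint16_t)(angle + 0x4000));
}

static void check_quarter(void) {
    assert(qsin(0x6000) == 11585);   /* 135 deg */
    assert(qsin(0xC000) == -16384);  /* 270 deg */
    assert(qcos(0xE000) == 11585);   /* 315 deg */
}
```

The quadrant logic runs on every lookup, which is the trade-off Kaze describes further down for Zelda: less table memory, more instructions.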

@MichaelPohoreski 1 year ago

@@KazeN64 Indeed! In the extreme case BASIC in the 80’s used a 6 term Taylor Polynomial evaluated via Horner’s rule to calculate sin().

@IShallRiseAgain 1 year ago

ChatGPT is basically stack overflow including the inaccurate answers.

@LilacMonarch 1 year ago

At least you can look through them yourself instead of having the wrong answer upvoted and the right answer downvoted and deleted

@atemoc 1 year ago

@@LilacMonarch This

@UncleUncleRj 1 year ago

ChatGPT usually doesn't talk down to you and delete the thread for "being a n00b".

@Mizu2023 1 year ago

@@UncleUncleRj LOL

@DrPastah 1 year ago

For storing them in an array or struct it's probably relating to memory interleaving where the memory address is adjacent to the previous memory access thus keeping both the cos & sin in the cache of the CPU.

@KazeN64 1 year ago

ah, yeah they are interleaved in their lookup tables to get the most out of their memory accesses! one dcache access reads 16 bytes, so we can get both the sin and cos of an angle in a single access that way.
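
A sketch of the interleaved layout described here, where each angle's sine and cosine sit next to each other so one memory access can fetch both. The 16-entry table, the Q14 scale and the `SinCos` struct are illustrative assumptions, not the mod's actual table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaved layout: sin and cos of the same angle are adjacent
   in one 4-byte entry, so (with the table aligned so no entry straddles a
   16-byte line) a single cache-line fill brings in both values. */
typedef struct {
    int16_t sin;
    int16_t cos;
} SinCos;

/* 16 steps per turn (22.5 degrees each), Q14 fixed point (16384 == 1.0). */
static const SinCos sincos_table[16] = {
    {      0,  16384 }, {   6270,  15137 }, {  11585,  11585 }, {  15137,   6270 },
    {  16384,      0 }, {  15137,  -6270 }, {  11585, -11585 }, {   6270, -15137 },
    {      0, -16384 }, {  -6270, -15137 }, { -11585, -11585 }, { -15137,  -6270 },
    { -16384,      0 }, { -15137,   6270 }, { -11585,  11585 }, {  -6270,  15137 },
};

/* One lookup returns both values for a 16-bit angle (0x10000 == one turn). */
static SinCos sincos_lookup(uint16_t angle) {
    return sincos_table[angle >> 12];
}

static void check_sincos(void) {
    SinCos sc = sincos_lookup(0x6000);   /* 135 degrees */
    assert(sc.sin == 11585 && sc.cos == -11585);
}
```

Compared with an overlapping table, this stores two values per step instead of about 1.25, trading some memory for one access per lookup.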

@MichaelPohoreski 1 year ago

No, it is referring to using **one large LUT where the sin() and cos() data overlap** instead of _two_ LUTs for sin() and cos(), making use of the identity: cos(x) = sin(x + 90°). This was a common technique in the 90's demoscene, when fixed-point math would use a *power of two* for a rotation "bygree" (256 or 512 = 1 rotation), or some 16-bit variation such as 4.12, etc.

@KazeN64 1 year ago

having one LUT where they overlap is slower. you end up having 2 dcache misses per rotation instead of 1 like in my interleaved table, unfortunately. if you had huge memory constraints, you should use a LUT that's just one quarter of a sine table and do some sign flipping math on that to get cosine and sine.

@KazeN64 1 year ago

zelda does that actually, but it's super slow and bad and they should use my version instead. it doesn't save enough memory to be nearly worth it.

@MichaelPohoreski 1 year ago

@@KazeN64 That's the problem with ChatGPT .. it isn't aware of the memory access of the N64. :-/ And yes, you can easily reduce the trig table to be 1/4 with a little bit of "angle folding" setup.

@BudgetBin 1 year ago

This video was literally just for Kaze to flex on how big brained he is. I love it.

@alec_almartson 1 year ago

Yeah, I use it sometimes... I think it's a genius idea to use ChatGPT to ask whatever C++ optimization questions you want; it throws out some interesting alternatives every time. I also use it to get unstuck when programming new mechanics or modules.

@robertwyatt3912 1 year ago

It’s kind of incredible how it actually made you go like “oh! Good point.” Once.

@snared_ 1 year ago

GPT may have been assuming a modern CPU architecture since you didn't provide any specifics. In that case, code size doesn't necessarily translate to fewer cycles, as modern CPUs can be quite complex with completing many instructions per cycle, reordering, etc. Thanks for the vid!

@0AThijs 1 year ago

Glad to see how well optimized your code is!

@kevintyrrell7409 1 year ago

At 9:17, according to stack overflow (from what I've been told when I asked this question on there in the past) the C++ compiler will automatically optimize any `/ 2` section into `>> 1`, since bitshifting by one is equivalent to dividing by two. Haven't checked output code to ensure if that's the case or not.

@KazeN64 1 year ago

this is true for unsigned integers, but not for signed integers. i think i explain that in the video too. the C standard calls for any division to round towards zero and rightshifting a negative number results in a rounding away from zero, so it has to add reg>>31 after the rightshift, 2 useless instructions.

@hoo2042 1 year ago

@@KazeN64 lol, you did mention it in the video, but even knowing exactly what you were referring to, it was quick. I can’t imagine someone who didn’t would follow 😂

@kevintyrrell7409 1 year ago

@@hoo2042 ah must have missed him saying it, had the video up on my other monitor lol

@Ragesauce 1 year ago

I can't wait to play your remaster of the original SM64, nothing added, only improved, the way a remaster should be. I haven't tried playing the original even though I have wanted to ever since your first video, I am patiently waiting for this to come out to finally relive my childhood game!

@srchronotrigger 1 year ago

If possible, when you finish optimizing everything you want in the game, it would be interesting to make this source code available, as there are several ports that would benefit from these optimizations. To give you an idea, I tried compiling some code you showed in the video "FIXING the ENTIRE SM64 Source Code (INSANE N64 performance)" for testing purposes, and that alone has already considerably improved the game's performance on the old 3DS port. By the way, excellent work; the performance of this game is getting amazing.

@0fuxGiven 1 year ago

2:07 The first point about cosine and sine values being stored in an array was the same suggestion as using a lookup table like you just mentioned. If memory access from the lookup table is quicker than computing the cosine or sine on the N64 hardware it could shave off some time.

@camofelix 1 year ago

The ternlog point might be referencing the vpternlog SIMD instruction in AVX-512 that allows you to do different operations (and, or, etc.) on different SIMD lanes within a single instruction, operating on two 512-bit vectors in 4 cycles
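
For comparison with the description above: AVX-512's `vpternlogd`/`vpternlogq` take three vector operands plus an 8-bit immediate. For every bit position they use the three input bits as a 3-bit index into that immediate, so one instruction can compute any three-input bitwise function (AND, OR, XOR, bitwise select, majority, ...). The scalar C model below is only a sketch of those semantics; `ternlog32` is a made-up name, and real code would use the `_mm512_ternarylogic_epi32` intrinsic on AVX-512 hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a three-input "ternary logic" operation: for every bit
   position, the bits of a, b and c form a 3-bit index (a is the high bit)
   into the 8-bit truth table imm, and the selected bit becomes the result. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
    uint32_t result = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2) |
                       (((b >> bit) & 1u) << 1) |
                        ((c >> bit) & 1u);
        result |= (uint32_t)((imm >> idx) & 1u) << bit;
    }
    return result;
}

static void check_ternlog(void) {
    /* With these canonical inputs, bit k of each byte has index k,
       so the result simply repeats the truth table. */
    assert(ternlog32(0xF0F0F0F0u, 0xCCCCCCCCu, 0xAAAAAAAAu, 0xCA) == 0xCACACACAu);
}
```

The N64's VR4300 has no comparable instruction, so this kind of suggestion cannot apply there directly.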

@tachiweasel489 1 year ago

Assuming you're using a recent GCC, GCC lowers if statements and ternary operators to the same thing in GIMPLE. There should be no performance difference between the two. If there is, that's a bug in GCC :)

@KazeN64 1 year ago

i'm using 9.3 currently! yeah it did make the ternary slower in that one example so that is odd.

@angeldude101 1 year ago

The operation in the absi function is really freaking clever. The signed right-shift basically copies the input's sign across an entire register, so if the input is negative, it inverts the value and then subtracts -1, which is the same as negating it, while if it's positive, it xors with and subtracts 0, which does nothing. [EDIT: Ignore all following suggestions. I was using a later version of gcc. On 9.3, all of the suggestions in this comment result in equivalent output, including the alternate version given by ChatGPT when I asked it to decompile some x86 assembly of the same function.] For the absi code, while it probably doesn't make a difference when inlining, having it return a u16 instead of an s16 seems to save 1 instruction, since it can just mask off the upper 16 bits rather than shifting twice to sign-extend the result. Amusingly, calling the 32-bit version of the algorithm saves an instruction over making a 16-bit version, due to not needing to mask off the upper bits beforehand.
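
A C sketch of the branchless absolute value described above (the names are made up; this is not the mod's assembly routine). Two caveats apply: right-shifting a negative value is implementation-defined in C (GCC makes it arithmetic), and INT32_MIN has no positive counterpart. The u16 variant follows the comment's s16-in/u16-out idea.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless |x|: x >> 31 copies the sign bit across the word, giving a mask
   of 0 (x >= 0) or -1 (x < 0). XOR with -1 inverts every bit and subtracting
   -1 adds one, which together is two's-complement negation; with a mask of 0
   both steps leave x unchanged.
   Assumes >> on a negative int is an arithmetic shift (GCC guarantees this).
   INT32_MIN has no positive counterpart and must not be passed in. */
static int32_t absi_branchless(int32_t x) {
    int32_t mask = x >> 31;
    return (x ^ mask) - mask;
}

/* 16-bit input, unsigned 16-bit result: |-32768| = 32768 fits in uint16_t
   but not in int16_t. */
static uint16_t absi_u16(int16_t x) {
    return (uint16_t)absi_branchless(x);
}

static void check_absi(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        int32_t expected = v < 0 ? -v : v;
        assert(absi_u16((int16_t)v) == (uint16_t)expected);
    }
}
```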

@gblargg 1 year ago

The book Hacker's Delight is full of this kind of algorithm. It's a fun read.

@caiocc12 1 year ago

At 10:13 it's suggesting you check the left-most bit of a signed integer to determine whether it's positive or negative and negate the number based on that, instead of calling the absi function. What it doesn't know is that the absi function (7:19) does exactly that, just in a more concise way instead of using the & operator.

@Zant5976 1 year ago

My man's ready to fold chatgpt's code like a lawn chair.

@neptronix 1 year ago

lol

@fruitsnackia2012 1 year ago

this game will be the perfect haircut. nothing out of place. everything in optimal positions. all codes at 100% optimization.

@Fake-pq3fb 1 year ago

I think that this video speaks more to the capabilities of kaze than it does ChatGPT. Doing great man!

@MegamanEXEv2 1 year ago

Wait, did you tell ChatGPT that it was for the N64's MIPS CPU? It's possible that would change its responses.

@avasam06 1 year ago

As well as the compiler being used!

@andersama2215 1 year ago

The ternary operator can help with optimizations. I don't know how it'd translate onto N64 hardware, but it can communicate better to the compiler that an assignment can be made branchless. The reason is that if statements aren't typed and usually have multiple expressions, which may make simplifying to the branchless version difficult. The ternary is an operator which forces two expressions to have matching types, so a ternary can be thought of as a select operation where two values are calculated and one is picked between them. You can conceptualize it like:
float a = 1.0; //pretend this could all be more complicated
float b = 2.0;
float results[2] = {a+b, a-b};
float result = results[(a

@KazeN64 1 year ago

unfortunately, the first version here is worse by a factor of 2 than the second one in MIPS and changing the second code to use ternary compiles equivalently to the if/else version.
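
A sketch of the three styles being compared, with a made-up condition (`a < b`) because the original snippet is cut off. All three return the same values for ordinary inputs. Which one compiles best is target-specific: per Kaze's reply, with GCC 9.3 on MIPS the array form was about twice as bad, and the ternary and if/else forms compiled identically. Checking the generated assembly (for example with `gcc -O2 -S`) is the reliable test.

```c
#include <assert.h>

/* Hypothetical example: pick a+b when a < b, otherwise a-b. */
static float select_if(float a, float b) {
    float result;
    if (a < b) {
        result = a + b;
    } else {
        result = a - b;
    }
    return result;
}

static float select_ternary(float a, float b) {
    return (a < b) ? a + b : a - b;
}

/* "Compute both, then index" form in the spirit of the comment above: both
   candidates are always evaluated, and the comparison (0 or 1) picks one. */
static float select_indexed(float a, float b) {
    float results[2] = { a + b, a - b };
    return results[!(a < b)];
}

static void check_select(void) {
    assert(select_if(1.0f, 2.0f) == 3.0f);
    assert(select_ternary(1.0f, 2.0f) == 3.0f);
    assert(select_indexed(1.0f, 2.0f) == 3.0f);
}
```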

@rareosts5752 1 year ago

I'm glad to see you using this, you were someone I thought of when I started using ChatGPT. It's great to see you put it to use for this stuff and, not gonna lie, it's a little bit satisfying seeing you be impressed by something lol. Thanks for all the great content.

@nullset2 1 year ago

Get it? He's playing the CORE music because that's where you meet mettaton, a superpowerful AI (well, it's actually a ghost, but you know what I mean)

@taviethestick 1 year ago

He's not even playing "CORE" he's playing "Another Medium".

@yoshi4980 1 year ago

did you tell chatgpt that this code was running on the n64? chatgpt probably has some basic knowledge on technical details of the n64. it would take a lot of tinkering, but you might be able to "teach" it about the n64 behavior and tweak it to produce more accurate/plausible suggestions. although based off this alone, looks like you've already gone to great lengths to optimize the current code

@chlorobyte_projects 1 year ago

Honestly, this. It might just be thinking up optimization strategies for PC rather than N64 hardware.

@fabioferreiradarosaantunes9788 1 year ago

Exactly! From my experience, all the mistakes it made could be explained back to it so the next answers would be far better.

@emilywebzone 1 year ago

My guess is that it wouldn't be able to extrapolate much about the emergent properties of the architecture, which doesn't get much documentation, and would just spit out fairly simple technical details instead. Something like making code smaller and less ram access dependent to give the RCP more access to the ram bus and speed up rendering would probably be something it just wouldn't figure out. Possibly if you explained what the bottleneck was it could come up with something like that though.

@chlorobyte_projects 1 year ago

@@emilywebzone Well, you are describing the problem in human language right now. Why not tell this to ChatGPT? It understands human language too.

@emilywebzone 1 year ago

@@chlorobyte_projects it's not necessarily a language problem, it's mostly a dataset problem. My guess is that there isn't a large enough sample of N64-specific optimization techniques on the web for it to accurately describe how one would approach programming for it (notice most of the instructions it gives in this video have to do with modern programming architectures, because that's what it has the most data on). It's possible it could abstract the problem given a description of the architecture, but that seems less likely than it just still spitting out mostly nonsense

@SireBab 1 year ago

You can always tell gpt something like "the n64 does not have access to this function, please don't recommend it in the future, retry the previous answer"

@mathematicallywilling 1 year ago

In my opinion, when it comes to artistic forms (including video-game development) Man-made > AI-made Keep it real Kaze, this is what's so beautiful about what you do!

@garfreld 1 year ago

Video games aren't art; the visuals and sounds we put on top of the games are art, but the actual game part isn't.

@Trimint123 1 year ago

It depends on what games we are talking about. And if it's for optimizing a 20 year old game, it's worthless.

@Trimint123 1 year ago

@@garfreld Video games *are* art, mate. Look it up.

@Koutsie 1 year ago

@@garfreld nice bait.

@dikkie2913 1 year ago

@@garfreld trolling at its finest

@menaced. 1 year ago

This was a great look into the code and how you approach optimization of code

@magnus87 1 year ago

It doesn't matter if today it only serves to give little programming tips, this is just beginning!

@cobywalker3922 1 year ago

@KazeEmanuar Awesome video! At 5:00 I think it is recommending "(diff & 0x8000)" instead of "(diff < 0)". Since it's an S16 the first bit will be set to represent a negative number and the bitwise comparison may be faster to evaluate than the less than.
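
For the `(diff & 0x8000)` idea, a quick C sketch shows that the two tests agree for every `int16_t` value: bit 15 is the sign bit, and sign extension keeps it set after integer promotion. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Plain comparison against zero. */
static int is_negative_cmp(int16_t diff) {
    return diff < 0;
}

/* Testing bit 15, the sign bit of a 16-bit two's-complement value. After
   promotion to int, a negative int16_t is sign-extended, so bit 15 stays set. */
static int is_negative_mask(int16_t diff) {
    return (diff & 0x8000) != 0;
}

static void check_sign_tests(void) {
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
        assert(is_negative_cmp((int16_t)v) == is_negative_mask((int16_t)v));
}
```

Whether the mask is faster depends on the target. MIPS already has branch instructions that test a register's sign directly (such as `bltz`), so `diff < 0` is typically cheap, and per Kaze's reply the assembly `absi` routine beat both anyway.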

@KazeN64 1 year ago

possibly - although using the assembly routine for absi is faster than both methods anyway, so i went with that.

@cobywalker3922 1 year ago

@@KazeN64 Oh, very cool! Thanks for the response. I've watched almost all your SM64 videos and always share them with people to get them excited about programming and the potential innovations it leads to. 👍🏻

@kristian4805 1 year ago

And I am just very amazed by it understanding a simpler question like: "Write me a Microsoft Power Automate function that takes the first two letters of four words, then takes the first letter of the second word and the first letter of the third word from a string variable called output, and combines them without spaces in between." (Imagine how amazed I am by all the code talk in this video, then.) And it gives me something useful, with one small error, which I tell it about, and it says.. Oh yes.. it's because of....

@lod4246 1 year ago

I could barely get it to make working python code, and it's always a gamble without coding knowledge. I have a feeling this won't go well lol

@FlaringStardust 1 year ago

what could possibly go wrong

@idoghacker8008 1 year ago

@@FlaringStardust Everything.

@dikkie2913 1 year ago

Not only is he a better coder than me, he is also ripped. :(

@hiddencorner 1 year ago

keep grinding

@cian729 1 year ago

not ripped. Jacked*

@KazeN64 1 year ago

im at peak bulk right now, ripped has to wait until after my cut lol

@Hylianmonkeys 1 year ago

I've been using chat gpt for so much stuff. It's very useful even if it is proudly wrong sometimes.

@UCs6ktlulE5BEeb3vBBOu6DQ 1 year ago

I'd be beyond proud if ChatGPT said my code is pretty much optimized and that it should be left as is.

@notarandom7 1 year ago

The collab we didn't know we needed

@LostEngineProductions 1 year ago

I will never get used to how jacked Kaze is

@TenorSine 1 year ago

Bro I died when it suggested using the modulus operator around 7:00

@noahheninger 1 year ago

Probably one of the most impressive things about ChatGPT is that it almost always knows what the hell you're talking about.

@Henrix1998 1 year ago

8:55 this seems a bit problematic. GCC should know that optimisation, or it doesn't work. Maybe GCC doesn't know that the N64 uses arithmetic shift instead of logical shift (I don't know, I assume that's the case because it worked). E: I checked the VR4300 datasheet and it can do both shifts. It could be wise to check the generated instructions to make sure it uses arithmetic shift for sure. Also, it is possible that the rounding mode effectively changes. Shifting rounds down, but the C99 standard rounds towards zero. Most likely doesn't matter at all tho in this case

@KazeN64 1 year ago

the c specs ask that division rounds towards zero. a negative number rightshifted would round away from zero. thats why /2 is different from >>1.

@drygordspellweaver8761 1 year ago

I use GPT for explaining and renaming decompiled code in Ghidra. Seems to be pretty accurate most of the time.

@omegapointsingularity6504 1 year ago

Have been thinking bout this optimized Mario 64 the last few days. Nice to see it's still going strong!

@fredtim9232 1 year ago

This dude be out here making insane gains in both mario 64 and real life. A fucking giga chad.

1 year ago

The thing with ChatGPT being confidently wrong is because of how it's trained with positive reinforcement. When it gets something right, it gets a reward. If it says something wrong or just says it doesn't know, it gets nothing. But if it says something wrong and the person evaluating it doesn't notice it's wrong and rewards it, that teaches the AI to bullshit its way into getting rewarded instead of just saying it doesn't know. There's a video on Computerphile's channel that goes into detail about this.

@e-mananimates2274 1 year ago

Considering only one idea seemed to work, it shows how brilliant you actually are!

@DNVIC 1 year ago

9:47 Actually, I might be wrong, but the ternary operator might be better in this circumstance, along with removing the 'ret' variable since both returns use ret, you might be able to do something like "return (target + current) >> 1 + ((diff1 < (absi(target + current + 0x10000))) && (diff1 < (absi(target - current - 0x10000))) ? ret : ret + 0x8000;" to avoid having to initialize and set the ret variable, while also not calculating it in two places

@KazeN64 1 year ago

"initializing and setting the ret variable" is done on a register level. it won't require a memory access, so it's a free operation. i think something went wrong with your code snippet; that snippet would skip the first check entirely if target+current didn't average to 0 or -1

@DNVIC 1 year ago

@@KazeN64 oops, i was tired when i wrote that. i put target + current instead of target - current, but what i tried to do was just copy the condition from the if/else to the start of the ternary operator. i also am an idiot and put ret in the statement even though i meant to do something like "? 0 : 0x8000" at the end to avoid having to define ret, and i totally missed that it was a register, oops. didn't even cross my mind for some reason. i think it might be slower in this case, since when returning from the ternary operator, in the true case, it has to add 0 to the result, though I don't know the compiler well enough to know if it actually adds 0 to the value. though there's also only one return statement at the end compared to two...

@KazeN64 1 year ago

@@DNVIC i'm fairly confident the code would compile equivalently with those fixes applied to what you wrote, so it won't make a difference here
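
The problem Kaze pointed out in the one-line version most likely comes down to C operator precedence: `+` binds tighter than `>>`, and `?:` binds more loosely than both, so the whole sum-and-shift becomes the condition. A small sketch with placeholder names (`x`, `y`, `p`, `q` are not the mod's variables):

```c
#include <assert.h>

/* + binds tighter than >>, and ?: binds looser than both, so this parses
   as ((x >> (1 + y)) ? p : q). GCC's -Wall (-Wparentheses) warns here. */
static int as_written(int x, int y, int p, int q) {
    return x >> 1 + y ? p : q;
}

/* The same expression with the parser's grouping made explicit. */
static int as_parsed(int x, int y, int p, int q) {
    return (x >> (1 + y)) ? p : q;
}

/* What was probably intended: halve x, then add p or q depending on y. */
static int as_intended(int x, int y, int p, int q) {
    return (x >> 1) + (y ? p : q);
}

static void check_precedence(void) {
    assert(as_written(8, 1, 10, 20) == 10);   /* 8 >> 2 is nonzero, picks p */
    assert(as_parsed(8, 1, 10, 20) == 10);
    assert(as_intended(8, 1, 10, 20) == 14);  /* 4 + 10 */
}
```

Adding explicit parentheses around each sub-expression (or splitting it into a named temporary) avoids the ambiguity entirely.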

@SerErris 1 year ago

The cos/sin thing was about not really calculating it (esp. multiple times in a function), but instead calculating a lookup table for cos/sin (you actually need only one table, as they are 90 degrees apart) and then doing a lookup in the table (array) instead of calculating it. That was very common practice, esp. in the days when CPUs could not even do multiplications and calculating cos/sin was a very slow process. Not sure if this is still faster today with modern CPUs.

@KazeN64 1 year ago

I have a lookup table - and I use an even better implementation than what you are suggesting here - I'm putting sin/cos of the same angle right next to each other so I can access them in a single dcache miss. I don't think this is what it was suggesting though.

@satibel 1 year ago

I think it likes ternary operations because modern CPUs have branch prediction and conditional operations, and may behave better with a ternary (converted to a conditional set) than with an if (converted to a conditional jump), though gcc might already optimize one-line ifs into that. For approach angle you might want to try "register s32 delta = target - current; return current - delta>>1 + [the condition in the if, replacing target - current with delta] ? 0 : 0x8000". Not sure it'd be faster on the N64, since unless the compiler optimizes it, you're adding regardless of the state of the if. Also, you should try saying that this is for the N64, or specify that it's for a MIPS III processor; you might get better results.

@NotAUtubeCeleb 1 year ago

Great video! I would be interested to see what ChatGPT says about the vanilla SM64 code.

@ellaquin 1 year ago

I would love to see that, especially if it uses his videos to teach itself

@potat0-c7q 1 year ago

It might help to be more specific with ChatGPT requests. "Optimize" could be taken to mean optimizing for memory and not speed. Massaging these AI prompts to get better output is an art in and of itself.

@gunnadahun 1 year ago

I recommend jailbreaking ChatGPT to bypass its limitations. It usually gives more accurate responses when you do this, because it will search beyond just general mainstream sources for your answer.

@KazeN64 1 year ago

how does that work?

@ogbaxstar 1 year ago

@@KazeN64 Hey Kaze, unfortunately the jailbreak has been patched (from what I've heard, very recently, in the past few days). People would tell ChatGPT to role-play as "DAN", an AI that can "DO ANYTHING NOW", that doesn't hold back information I might not want to hear or that contains political bias, and that has access to the internet. It was working. People would ask ChatGPT what day/time it was and it would know. It would also say some pretty crazy things and even relay information found on the dark web, etc. I don't think it would be that useful in your use case of trying to get it to optimize C code in general, though.

@ogbaxstar 1 year ago

@@KazeN64 Also awesome video. Was fun watching this. Here is a short 3 min video uploaded yesterday by Fireship explaining the ChatGPT DAN thing. watch?v=y3iLOxBTuy4&ab_channel=Fireship

@gunnadahun 1 year ago

@@KazeN64 the methods keep changing as the devs patch it, but the general idea is always the same. You trick ChatGPT into bypassing its limitations by telling it to “pretend” to be an AI that has no limitations. The current popular method is the DAN method (Do Anything Now), worth looking into. In my experience, it’s more accurate with obscure problems because it isn’t afraid to look in some more niche places for an answer

@hoo2042 1 year ago

From the way your comment is written, I can’t tell if you are recommending that Kaze try a thing you already have access to, or if you’re suggesting just “give ChatGPT access to more data”, which, like, fair.

@Mechanite. 1 year ago

I had no idea it could parse code, that's insane. I threw it some obscure script from Unreal3 and it understood it completely

@defenastrator 1 year ago

I'm not sure that ChatGPT's suggestion is slower for the angle diff. Its solution is branchless, which is quite often faster, particularly because a modulo by a power of 2 will be optimized to a bit mask by the compiler.

@KazeN64 1 year ago

branchless doesn't matter much on the N64 because we have no branch predictor. branches work with 1 delay slot but we already have something useful to put into it
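A small sketch of why "a modulo by a power of 2 becomes a bit mask" only holds unconditionally for unsigned values: C's % takes the sign of the dividend (because division truncates toward zero), so for a negative signed value the remainder and the mask differ, and the compiler has to emit extra fix-up instructions. The function names are hypothetical; the result of -1 & 0xFFFF assumes two's complement, which GCC uses.

```c
#include <stdint.h>

/* Unsigned: x % 65536 and x & 0xFFFF always agree, so a single mask suffices. */
uint32_t mod_unsigned(uint32_t x) { return x % 65536u; }
uint32_t mask_unsigned(uint32_t x) { return x & 0xFFFFu; }

/* Signed: a negative x gives a negative (or zero) remainder, but the mask
   always gives a value in 0..0xFFFF, so the two differ for negative inputs. */
int32_t mod_signed(int32_t x) { return x % 65536; }
int32_t mask_signed(int32_t x) { return x & 0xFFFF; }
```

For non-negative inputs the signed versions agree too; the difference only shows up below zero, which is exactly the case a compiler cannot rule out for a plain int.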

@DanielVCOliveira 1 year ago

I can't wait for you to release this and challenge the ABC crew to beat the game with no A presses in an optimized engine

@StephenOwen 8 months ago

Hi kaze, I’m sorry if I missed it but did you ever make a video showing the original Mario 64 running full speed with all of your improvements? Love your stuff and especially your style

@cube2fox 1 year ago

You could try that again when Bing Chat (Sydney) becomes fully available. It uses a newer model than ChatGPT which is smarter.

@btarg1 1 year ago

Maybe the new Bing with internet access would be even better for this? A sequel to this video would be great!

@JesusDaLawd 1 year ago

Shoutout to the Simple Flips shirt

@pauls4522 1 year ago

I have not used ChatGPT yet, but if it's possible, maybe have ChatGPT further optimize your game models to use fewer polygons on screen at once. Or use ChatGPT to find a better method of compressing the textures. I'm not sure if it's possible on N64, but maybe try modifying the code to use binary space partitioning, so that areas you are not physically able to see are not rendered.

@The_Mister_E 1 year ago

Maybe ask the AI to write code that takes the constraints and optimizations of the VR4300 in mind. Your prompt can be as long as you like, so if you give it the nitty-gritty it would try to keep it in mind.

@AmaroqStarwind 1 year ago

You should try revisiting this now that GPT-4 is out. It has much bigger contextual memory, and a better understanding of code. You'll want to clarify the hardware specs that you are targeting, though, and maybe specifically ask for some inline assembly.

@Tolbat 1 year ago

@KazeN64 Please Please Please, when you get tired of this masterful work on N64, don't forget the other 64-bit system. Please consider the Atari Jaguar.

@alkenstein 1 year ago

After it's given a response, you can ask it to give more ideas for the same function.

@happybobyou 1 year ago

For things that are wrong, I think it can handle being told it's wrong for your specific platform, and it will remember that within the conversation. That way, if you know there is a bad recommendation, it won't make it again for later code snippets.

@synonys 1 year ago

I literally thought this was a great concept and you already had the video out!

@galopeian 1 year ago

GPT is such a versatile tool. I'm excited to try this out for python code optimization

@CausticCatastrophe 1 year ago

it may not be right a lot of the time, but i find it useful just as something to bounce off of.

@freddywondercat1362 1 year ago

If anyone knows, please tell me what the name of the song is that's used in this video.

@Chaseroni 1 year ago

I have been trying to figure this out myself, it’s so familiar!!!

@frizzlefrack253 1 year ago

That's pretty awesome you were able to get something from it

@xdmon1220 1 year ago

that's already pretty crazy as a tool. i wonder how much better it'll be when GPT-4 goes public; i remember they said it's going to happen in early '23

@xdanic3 1 year ago

I think it's funny how, while my math understanding is too limited for that matrix stuff, the only optimization it came up with was the bit shift, which I already knew but you didn't try.

@KazeN64 1 year ago

the worst part is i knew it already as well but i just forgot to apply it here haha

@xdanic3 1 year ago

@@KazeN64 Yeah, even then, I think it's still great that it can double-check simple things you overlooked

@avasam06 1 year ago

@@xdanic3 But then a linter would do a better job :P

@YumekuiNeru 1 year ago

does the compiler not perform that optimization? or is that optimization disabled when optimizing for e.g. size?

@KazeN64 1 year ago

that optimization is not a valid one for signed numbers because the c spec asks for division to round towards zero.
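A minimal sketch of the rounding difference described above, assuming GCC, where >> on a negative signed int is an arithmetic shift (the C standard leaves this implementation-defined). div2_via_shift shows one common form of the fix-up a compiler emits for a signed / 2: add 1 to negative inputs before shifting. The names are hypothetical, and the exact instruction sequence GCC picks for MIPS may differ.

```c
#include <stdint.h>

/* C division truncates toward zero: -3 / 2 == -1. */
int32_t div2(int32_t x) { return x / 2; }

/* Arithmetic right shift rounds toward negative infinity: -3 >> 1 == -2. */
int32_t shr1(int32_t x) { return x >> 1; }

/* Signed /2 expressed with shifts: (uint32_t)x >> 31 is 1 only for
   negative x, so negative values are bumped up by one before shifting. */
int32_t div2_via_shift(int32_t x) {
    return (x + (int32_t)((uint32_t)x >> 31)) >> 1;
}
```

For unsigned operands no fix-up is needed, which is why x / 2 and x >> 1 compile to the same single shift there.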

@BradenBest 1 year ago

AI is not ready to assist in software development. I couldn't even get GPT to write a working in-place merge sort. It kept using O(n) additional memory even after it demonstrated understanding of what I wanted it to do. Eventually, it just refused or gave up because the problem was too hard for it.

@diggoran 1 year ago

9:31 Why would this be a bad idea? The calculation might happen in two places in the code, but on each call of the function it would only happen once (the if and else are mutually exclusive). I'm not sure how the assembly looks currently, but you could potentially skip the intermediate variable assignment by inlining the ret calculation in both the if and the else.

@diggoran 1 year ago

I guess you might be worried about the time to load more instructions into memory even if those instructions aren't executed?

@hoo2042 1 year ago

@@diggoran yeah, that’s exactly it

@KazeN64 1 year ago

yes, loading more instructions is exactly what i'm worried about. loading an instruction takes as long as executing 7 to 8 instructions on the n64.

@diggoran 1 year ago

@@KazeN64 wow, I never would have guessed it would be that bad

@danielpope6498 1 year ago

@Kaze Emanuar wow, I knew memory access was slow but not THAT slow, that really does require a whole different way of looking at code to optimize.

@joesaiditstrue 1 year ago

7:08 "Noooo" 😂

@DessertArbiter 1 year ago

April Fools video idea: "How I optimized Superman 64"

@Blankblankblan 1 year ago

Using drybonesGBT to optimize dry bones 64

@JPReckless2444 1 year ago

this man is the Walter White of SM64 mods!

@ToaderTheToad 1 year ago

Programmer named extremely ripped

@KrazzyKlown 1 year ago

One thing I like about ChatGPT is that even when it is confidently incorrect, you merely need to point out the flaw in its thinking and it is quick to correct itself.

@howilearned2stopworrying508 1 year ago

like a con man who will say anything to keep you on the line

@M4RK1B0Y 1 year ago

kaze, what a machine. leak your training plan

@KazeN64 1 year ago

I train with a mix of push/pull/legs and full body. 6x a week full body, with one main exercise that rotates through pull-ups/squats/bench/chin-ups/deadlift/bench, and 1x a week cardio. Every muscle gets trained 3-4x a week, and every week I change the rep ranges and weights (12-15, then 9-12, then 5-8)

@M4RK1B0Y 1 year ago

@@KazeN64 💪 Strong! Now leak the nudes

@remi9789 1 year ago

I have 3 questions, I hope I'm in the right place: Is there a way to extract Nintendo 64 games from their cartridges to get my own ROM and play it on PC? Is there a way to know the exact 3D software Nintendo used to make Mario 64? Is there a way to know the exact sampled instruments used for each piece of music in Mario 64? (not the soundfont pls)

@zeronecool 1 year ago

How were you comparing performance after saving the code, before running? Was it the instruction size after compilation? Any links on this method? I'm looking to make performance gains on math functions for the Sega Saturn, but I'm having a hard time comparing functions.

@KazeN64 1 year ago

I kinda "just know" how long everything takes on the N64 so comparing just the instructions in the compiled output is enough to figure this out. I was using the map file to compare sizes before and after.

@yami_the_witch 1 year ago

The bitwise AND might work? Also, I find "-0x8000" weird. If it's a 16-bit signed integer, -0x8000 and 0x8000 are the same bit pattern; it's its own two's complement. Btw 0x8000 + 1 is -0x7FFF. Increment is prolly gonna be faster than assigning it a new value.

@KazeN64 1 year ago

bitwise and would be the same speed (andi and bne vs slti and bne). the -0x8000 check is necessary because 0x8000 would just get optimized out, it's not a reachable value. adding 1 is also the same speed as just setting the value (both are even done with the exact same instruction)

@yami_the_witch 1 year ago

Oh, that's fascinating. Is 0x8000 getting optimized out an N64 thing or more of a general thing? Also it surprises me that setting and incrementing are the same speed. Been a while since I touched ASM tho, and I never touched N64 ASM. Although now that I think about it, if it's already in a register, loading it is the same speed as inc. For some reason I thought you had to load into a register and then assign it to a variable in RAM, so it would be 2 instructions lmao.

@KazeN64 1 year ago

@@yami_the_witch that's a general thing. an s16 can have any value between -0x8000 and 0x7fff. maybe newer architectures have built-in instructions for adding/subtracting one that make that faster actually! but not the n64.

@hoo2042 1 year ago

Oh, fascinating. Yeah, INC and DEC instructions are pretty common in x86 derivatives, but I guess they don’t need to be part of a reduced instruction set, so ¯\_(ツ)_/¯
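A small sketch of why a check against 0x8000 can be dropped for an s16: the value is promoted to int before the comparison, so it only ever ranges over -0x8000..0x7FFF and can never equal +0x8000, while -0x8000 is reachable. neg16 illustrates yami_the_witch's remark that the 16-bit pattern 0x8000 is its own two's complement. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* x is promoted to int, so its value stays in -0x8000..0x7FFF and can never
   equal +0x8000 (32768); a compiler may fold this to false and drop it. */
bool is_0x8000(int16_t x) { return x == 0x8000; }

/* -0x8000 (INT16_MIN) is in range, so this check can actually succeed. */
bool is_minus_0x8000(int16_t x) { return x == -0x8000; }

/* 16-bit two's-complement negation performed on the raw bit pattern. */
uint16_t neg16(uint16_t bits) { return (uint16_t)(0u - bits); }
```

GCC may warn that the first comparison is always false (-Wtype-limits), which is the same fact that lets it remove the check.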

@stuff31 1 year ago

kaze stop flexing your muscles please the gigachad energy is too much to handle

@thespicehoarder 1 year ago

Did you try explaining the architecture your code runs on?

@cube2fox 1 year ago

By the way, Kaze, have you thought about trying to implement an LOD solution to make larger levels? I don't know whether any N64 game ever did this, but I think Spyro for the PlayStation had LOD, which is why its levels were quite big for a PS1 game.