Timestamps
00:07 - Reimplementing GPT-2 from scratch in C without using libraries
03:25 - Starting with basic math operations and dataset exploration in C
10:19 - Transforming input into a 768-dimensional embedding space in GPT-2
13:44 - Introduction to token embeddings and position embeddings in GPT-2 sequence processing
20:18 - Why layer normalization is essential in Transformer blocks
23:43 - Layer normalization, the attention block, residual connections, and the MLP in the GPT-2 architecture
30:37 - Introduction to the attention block in GPT-2
34:08 - Overview of attention heads and dimensionality transformation
40:39 - The mechanism of attention blocks in the GPT-2 model
43:47 - Overview of attention computation in GPT-2 from scratch in C
50:34 - Function graphs and transformations in GPT-2 programming
53:52 - Model architecture overview and Python implementation planning
1:00:02 - Converting between tokens and the original string in GPT-2
1:03:11 - Introduction to saving weights in the safetensors file format
1:09:27 - Creating a torch tensor from a byte array and reshaping it
1:13:01 - Transforming a 50,000-dimensional vector into 768 dimensions
1:19:05 - Training the sequence model for next-token prediction
1:22:02 - Matrix dimensions and tensor usage in the embedding functions
1:28:26 - The normalization process in the GPT-2 model
1:31:54 - Linear layers and their components
1:38:33 - Splitting the matrix into Q, K, and V parts for 12 attention heads
1:41:55 - Splitting the Q matrix and matrix multiplication
1:48:37 - Implementing attention masking before the softmax calculation
1:51:45 - Setting elements of a matrix to negative infinity
1:58:28 - Weighted sum of the V vectors and the attention C-projection matrix
2:01:31 - Matrix multiplication in C for GPT-2 from scratch
2:08:29 - Implementing the linear layer and residual connection in the attention block
2:11:53 - The layer norm and multi-layer perceptron block
2:18:51 - Implementing multiple layers with input/output connections in a for loop
2:22:13 - Implementing key components of the attention mechanism in the GPT-2 model
2:28:54 - Generating an output matrix based on the tokens in the sequence
2:33:00 - Debugging layer indexing and normalization issues in the model implementation
2:39:27 - Evaluating the C implementation of GPT-2 with a cross-entropy loss
2:42:16 - Moving dataset I/O from the Python implementation to C
2:48:10 - Memory allocation and preprocessing for the data structure in C
2:51:28 - Creating a struct in C to store the offset and size of a string
2:57:47 - Writing UTF-8 encoded struct members to a file
3:01:09 - Data manipulation in C and compiling the program
3:07:49 - Memory allocation and file reading strategy
3:11:13 - Struct allocation with offsets and pointers
3:17:49 - Printing strings in C using offset and size
3:21:44 - Differences in struct size and alignment
3:28:12 - Packing and writing unsigned shorts to a file in C
3:31:44 - Using an array of 16-bit unsigned integers as tokens for file processing
3:37:51 - Aligning data structures and testing for alignment accuracy
3:42:02 - Decoding tokens back into strings using an encoding table
3:48:52 - Optimizing data type consistency for printing operations
3:52:16 - Extracting relevant info from JSON with a predictable format
3:58:48 - Working with pointers in C for data offsets
4:02:16 - Extracting start and end offsets from a JSON string and converting them to integer values
4:09:49 - Defining the expected size of a tensor
4:13:26 - Testing the embedding weight matrix
4:19:26 - Structuring the model parameters in C
4:22:35 - Calculating parameter sizes from the JSON definition and file structure
4:29:10 - The token encoding weight matrix in parameter manipulation
4:32:20 - Preparing for matrix transformations in the code
4:39:18 - Setting up matrices and weights for the neural network layers
4:43:01 - Setting up model parameters and moving on to the calculations
4:49:23 - Creating pointers to select rows in matrices
4:52:56 - Summing the elements of vectors in C
4:59:41 - Implementing mean and standard deviation calculations in C for the GPT-2 model
5:03:04 - Calculating variance and standard deviation
5:09:32 - Implementing the attention block in C for GPT-2
5:13:28 - Matrix multiplication in the GPT-2 transformation process
5:20:01 - Incrementing the output vector based on weight matrix calculations
5:23:05 - Incrementing values in matrix multiplication
5:29:31 - Calculating the dot product of vectors in the C implementation
5:32:53 - The Q and K vectors in the dot-product loop
5:40:24 - Implementing the softmax operation for probability values
5:43:44 - Calculating softmax in C
5:50:21 - Multiplying by the V vectors to generate an output vector
5:53:47 - Implementing the weighted sum calculations efficiently
6:00:58 - Identifying and fixing errors in the code
6:04:22 - Implementing the softmax function with max-value initialization
6:11:47 - The attention block and multiplication with the C-projection matrix
6:14:51 - Key components of data processing in C
6:22:45 - Implementing the FC (fully connected) call in C for the GPT-2 network
6:25:45 - Implementation of the attention block in C
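A minimal illustration of what the chapters around 5:40:24 and 6:04:22 cover: a numerically stable softmax over a row of attention scores, with masked positions set to negative infinity as at 1:51:45. This is only a sketch, not the video's actual code; the function and variable names are made up for the example, and it assumes nothing beyond the C standard library (link with -lm).

```c
#include <math.h>
#include <stdio.h>

/* Softmax over a row of n attention scores, in place.
   Subtracting the row maximum before exponentiating keeps expf() from overflowing. */
static void softmax_row(float *row, int n) {
    float max = row[0];
    for (int i = 1; i < n; i++) {
        if (row[i] > max) max = row[i];
    }
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        row[i] = expf(row[i] - max);   /* expf(-INFINITY - max) == 0, so masked entries vanish */
        sum += row[i];
    }
    for (int i = 0; i < n; i++) {
        row[i] /= sum;
    }
}

int main(void) {
    /* Example scores, e.g. one row of Q.K^T after causal masking. */
    float scores[4] = {2.0f, 1.0f, 0.1f, -INFINITY};
    softmax_row(scores, 4);
    for (int i = 0; i < 4; i++) printf("%f\n", scores[i]);
    return 0;
}
```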
@aleksey2960 • a month ago
Wow this looks long and hard. Will put these in my “to do” playlists. Also just subbed cos your content looks awesome 🤩
@MCroppered • 29 days ago
Is your todo playlist also in the 1000s of hours like mine? 😂
@vpn740 • 29 days ago
@@MCroppered yours is only 1000s of hours? 😄
@aleksey2960 • 28 days ago
@ yeah my “watch later” playlist had like 1000+ videos in it lol 😂
@harshsharma03 • 28 days ago
that's what she said
@4kumetsu • 24 days ago
pause
@s-codes14 • 29 days ago
GigaChad
@hypermeero4782 • a month ago
i sometimes wonder if i'll ever be that smart
@JJGhostHunters • 29 days ago
Don't be so hard on yourself. It is not always "smart" to recreate the wheel when you have deadlines, life and property that may be dependent upon the stability of a machine learning based system.
@ChristosChristides • 28 days ago
@@JJGhostHunters Bro is casually spitting philosophy on a random comment on a random video
@gatogordo4131 • 28 days ago
But that’s the best part, the endless path of knowledge
@1n4f4bl3 • 28 days ago
Building something that already exists is not necessarily smart. However, if it helps this person get an AI job, he might end up overprepared and working under others who know less but hold higher positions. In that sense, it’s not smart at all.. it’s actually quite foolish, but he shows a naive young generosity.
@yotelolailo • 27 days ago
@@1n4f4bl3 Not everything has to be done to get a job. Some people just like learning and build from scratch just for the pleasure of learning or as a challenge.
@actualBIAS • 25 days ago
THIS IS THE CONTENT I'VE BEEN LOOKING FOR! Haha, like Andrej! I love this!
@ibrahimnaser5233 • 22 days ago
most basic project required for new grad in 2025
@speedybonsky • 25 days ago
Premium mental insanity 🔥🔥🔥
@nicholaskomsa1777 • 26 days ago
Hello, I think my comments get deleted, but thank you. I have seen 90% and plan to experience it all much more while writing it in C++. I appreciate your data-oriented C a lot; the next part of this video is on my radar.
@florianionescu185 • 27 days ago
nah, I'll just continue watching netflix, thank you.
@pkmx-um9vb • 27 days ago
🤣
@Abhijith-e5t • 15 days ago
Big L haha
@someone-x64 • a month ago
this is going to be so much fun.
@omarwagmes • a month ago
Thank you for this video love from Morocco
@PauloDutra • a month ago
Bro, this is insane... Thanks
@ditdit-dahdah-ditdit-dah • 24 days ago
Looks good. Keep it up. Waiting for more.
@qaqkirby9781 • 28 days ago
thank you for sharing. love from China
@gatogordo4131 • 29 days ago
Amazing, already subscribed. thanks for sharing this!
@wodniktoja8452 • a month ago
can't wait to see the second part :)))
@MrManGuyManGuy • 6 days ago
Now time for DeepSeek in C 😁
@danielhemmati • 29 days ago
i thought it was Andrej Karpathy
@AmanKumar-jk1qu • 27 days ago
love you please do more such videos
@mamitianasolofo723 • 26 days ago
Damn it, I do not have time for all of those good things :'(
@antixon • 29 days ago
Thanks for sharing!😊
@AntonioLopez8888 • 25 days ago
This is really what work-life balance means for a dev
@Jemal-k2u • 20 days ago
sigma
@mahmoudabdelsattar8860 • 27 days ago
what is your specialty? this vid was great
@abrarmasumabir3809 • a month ago
keep doing the good work.
@pedrogutierrez1430 • 27 days ago
Legend
@KshitijKaushik-w2g • 28 days ago
this is crazy af
@ffaheem • 19 days ago
what's his resume like? I bet this is one of bro's side projects
@yashsehgal1410 • 29 days ago
Pop me some popcorn and I am good to go
@rssszz7208 • 23 days ago
Am learning cpp, this goes over my head 😅
@ogulcan2877 • a month ago
what is this bro i understand nothin
@ashleigh3021 • 28 days ago
Negative IQ
@DriftJunkie • 28 days ago
Black magic
@user-plgmgrs326 • 28 days ago
english please 🥺
@eyzake • 25 days ago
i am 30 minutes in, kinda remembered this comment (about a year ago i had a math basics for ai course). i did understand the math but the rest is going over my head. man i am only here for c but i cant seeeeeeee
@a-medee • 6 days ago
@@eyzake you can't C
@mbarrio • a month ago
What is that text editor? SublimeText?
@joefitahiana • 29 days ago
i think it's the Zed editor
@html69 • 27 days ago
it's the least important thing
@himalayo • 26 days ago
@@html69 it does look pretty and comfy, though
@philtoa334 • 22 days ago
Nice.
@anren7445 • 28 days ago
madlad
@SyedAsifShah-w4h • 4 days ago
What's the name of the code editor, please?
@lmnottryhard5626 • a month ago
Bro legend
@pavelyankouski4913 • 27 days ago
I would start with the application interface. Then I would use Gemini
@benjaminryan9354 • 25 days ago
Does anybody know what IDE/text editor he is using?
@ffaheem • 19 days ago
such monsters really exist, don't they?
@ibrahimnaser5233 • 22 days ago
what is your background
@skope2055 • a month ago
Cool video :)
@uptimehalil • 29 days ago
I think rocket science is easier 😅
@BURN-ADDiCT • 28 days ago
As rocket science, I can confirm I am easier
@richardappow6770 • a month ago
what drawing app are you using here?
@msakg • 25 days ago
Notability
@shakilkhan4306 • 24 days ago
Awesome bro.. ❤
@na50r24 • a month ago
I haven't watched the whole video but 'plan' to do it eventually. To confirm, are you just trying to get inference to work? There is a video about a guy doing the MNIST digit identification problem in Minecraft with redstone. He trained the weights in Python and then had to figure out how to implement something that performs the inference in Minecraft with those weights pre-established. Are you doing the same, but now in C with GPT-2? Because while prepping for an ML exam, I also kind of got into implementing things from scratch, and I ended up figuring out a way to implement backpropagation and got it to work to solve the XOR problem. However, what I implemented was a computational graph, so I had to hardcode the matrix multiplications. This video made me think I could first try to implement my CompGraph in C, which should be doable, then build on top of it from there and figure out how to connect matrix multiplication to my comp graphs, assuming that's even possible... If I actually manage to get that to work, we'd have a PyTorch-like thing for C (it probably already exists, but it's a good exercise anyway to check your ML understanding on a lower level).
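In the spirit of that comment (weights fixed ahead of time, only the forward pass reimplemented by hand), here is a minimal sketch in C of XOR inference with a tiny 2-2-1 ReLU network. The weights are hand-picked for illustration rather than trained, and none of the names come from the video's code.

```c
#include <stdio.h>

static float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Forward pass of a 2-2-1 network with fixed weights that computes XOR:
   h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2*h2 */
static float xor_forward(float x1, float x2) {
    float h1 = relu(x1 + x2);
    float h2 = relu(x1 + x2 - 1.0f);
    return h1 - 2.0f * h2;
}

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("%d XOR %d -> %.0f\n", a, b, xor_forward((float)a, (float)b));
    return 0;
}
```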
@StatisticallySpeaking1 • 26 days ago
how is this even possible
@StatisticallySpeaking1 • 26 days ago
and why are you torturing yourself? Quant dev?
@arunray2986 • a month ago
I have no idea what's going on😅
@gr4ytxx433 • a month ago
Bro WTF;
@santiagomartinez3417 • 29 days ago
Python would be too slow right?
@programmingpillars6805 • 27 days ago
not really, things that are costly in terms of computation are done in C++ even if you're using Python, because libraries like PyTorch use C++ in the backend to do the heavy computation...
@emptycode1782 • a month ago
Wtf
@surajmandal_567 • 29 days ago
I will come back after 15 days and start the series. 🎉
@Kyoz • a month ago
🤍
@Wggwjzjjxmsk • 25 days ago
Wtf
@bArda26 • 24 days ago
weak, you should have built your own transistors first.
@grcreed5480 • 24 days ago
Can anyone tell me what topics to learn to be able to do this... i know DSA in C