DeepSeek R1 Coldstart: How to TRAIN a 1.5B Model to REASON

45,418 views

1 day ago

Comments: 134

@HaraldEngels 11 days ago

I have been using DeepSeek releases for over 9 months. The results have been great all along and keep getting better. I run all the Qwen-based DeepSeek R1 models locally on my Linux PC, and they are all great. The 1.5B model works fantastically when you use the q16 variant. It is really killer. Inference is not very fast, since I run all models (from 1.5B up to 32B) on my Ryzen 5 8600G CPU WITHOUT a dedicated GPU. The 32B model uses up to 40GB of my 64GB RAM. With good prompting the results are fantastic and save me hours of work every day. The dynamic memory allocation of the 8600G is great and lets me run powerful LLMs on a small budget. My PC cost me $900.

@Aurelnpounengong 11 days ago

wait, you're able to run a 32B model on just your CPU? I have an RTX 4060 Ti with 16 GB of VRAM and I'm scared to download a 32B model 😅

@rhadiem 11 days ago

@@Aurelnpounengong The Ryzen 5 8600G has a GPU on the processor and can use system memory as VRAM (40GB out of the 64GB system memory), but much more slowly. He provided the details to research the parts you don't understand.

@gracegoce5295 11 days ago

really ? all this cost you 900 ? 64 gb ram ?

@Aurelnpounengong 11 days ago

@@rhadiem ahhh I see, I did not know it used system memory as VRAM. I also have 64GB of DDR4 memory. Do you think I'll be able to run a 32B model on my graphics card with some of it offloaded to system memory?

@trevoC132 10 days ago

@@Aurelnpounengong It will run, just slow. I can run a 32b on my 4090, but anything larger and it has to swap in and out of memory which is painful.
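The memory figures in this thread can be sanity-checked with a rough back-of-the-envelope calculation. The Python sketch below is only an estimate under stated assumptions: the function name `weights_gb` is made up for illustration, it counts the model weights alone, and it ignores the KV cache and runtime overhead, so real usage will be higher.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (billions of bytes).

    Assumption: every parameter is stored at the same precision.
    KV cache, activations, and runtime overhead are NOT included.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Compare a 32B model at a few common precisions.
    for bits in (16, 8, 4):
        print(f"32B at {bits}-bit: ~{weights_gb(32, bits):.0f} GB of weights")
```

By this estimate, a 32B model needs roughly 16 GB for the weights alone at 4 bits per weight, which is why it does not fit comfortably in 16 GB of VRAM and spills into system memory.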

@songlining 1 day ago

I am so glad I have encountered this series. This is real gold. Thank you so much for the effort. Looking forward to the next episode!

@DriftlessCryptoToo 7 days ago

Chris, this is great. The math training is cool. What we need is a set of coding trained small models that are experts in the top programming languages. Start with Python and Javascript, HTML and CSS. You get the idea. Then, everyone can have a set of these models for the languages they use.

@agenticmark 11 days ago

I've also had luck getting the model to reflect by: reversing the calculation (math); writing the documentation while it codes; writing a tutorial while it codes. This is one of the best videos I have seen in some time, Chris!

@chrishayuk 11 days ago

Awesome, so glad you’ve seen similar results

@greghampikian9286 11 days ago

Thanks for answering all the basic questions I had. Great teaching style, even for the non-programmer.

@chrishayuk 11 days ago

Glad it was useful, I had a lot of fun making this video

@bytemoney5655 7 days ago

this is my official go to youtube channel thanks man for these videos

@chrishayuk 3 days ago

thank you, glad it's useful

10 days ago

Excellent! Bravo! I am spending hours analyzing how DS1-R 32B works with my 4090. I am getting amazing results every day...

@aaronabuusama 11 days ago

It would be awesome if you did a tutorial on fine tuning a reasoning model with tool calling abilities

@chrishayuk 11 days ago

That is a really good shout, I will do that

@zacharielaik8652 11 days ago

Yes that would be awesome !

@punchster289 11 days ago

yes! i want to train a model for z3 use when doing logical reasoning. very powerful solver

@OpenAITutor 9 days ago

Hey Chris, Great video. Really enjoy the way you teach. Keep up the good work. Can't wait for your next video on RLHF.

@PunitPandey 6 days ago

Thanks for the reply Chris. I was able to run your code. I had to make slight adjustments as I am on Windows / RTX 4090 but finally I have my aha moment. I was able to train and infer from my first reasoning model. THANKS once again for the tutorial.

@chrishayuk 6 days ago

Awesomeeeeeee, it's really a great feeling when you train the model and it's reasoning and doing better than much bigger models, glad it helped. I'm hoping to have, for the next video, a dual training setup that works for Mac and Windows

@mikeparker2486 7 days ago

R1 has five main advantages: 1) it gives you the reasoning behind its thoughts, so you can tell it to correct its mistake if you find one; 2) it is much more DEPLOYABLE, nothing short of the "first invented personal computer"!! You don't need a huge data center or a large number of GPUs to run it; in fact, you can even run it on your phone without internet; 3) it is cheaper and faster than o1; 4) most of all, it is free; 5) it is open source, so you can edit and update it any way you like. Any one of the reasons above would be a game changer by itself, but with the combination of all five you get a stock crash like yesterday's.

@PhilWeinmeister 10 days ago

I may no longer be at IBM, but I was curious to hear your thoughts on DeepSeek. Very insightful video, thanks!

@hareram4233 7 days ago

Why left IBM

@PhilWeinmeister 7 days ago

@@hareram4233 Ha - that's for a chat over a pint...

@d.d.z. 11 days ago

Keep doing helpful videos Chris 😊

@chrishayuk 11 days ago

Always, glad it was useful, I was particularly happy with this one

@PunitPandey 11 days ago

Great video. Looking forward to RL video.

@chrishayuk 11 days ago

Coming soon!

@wwkk4964 11 days ago

Brilliant work! Yes, I do remember you mentioning that o1 was MCTS and R1 was not. I agreed with you that R1 was surely not; it will be exciting to see whether o1 or o3 used similar techniques or used MCTS!

@chrishayuk 11 days ago

I'm 100 percent convinced that o1 is using search (specifically MCTS) at inference time, and I'm 100% convinced that R1 will do the same in a future release when they figure it out. But the results they've gotten without it are pretty incredible

@wwkk4964 11 days ago

@chrishayuk It just blows my mind every time I think about it still! That one can converge through search or learning at these endpoints so long as one is bootstrapped with some notion of correctness! Your demo was incredible work. thanks again.

@chrishayuk 11 days ago

thank you, yeah, i came up with the concept of getting the compiler to do the calc, and the ai to do the explanation, a while back, i think i did a video on this in june 2024. so it seemed a natural fit when i saw the long chain of thought coldstart piece from deep seek. felt like a good merge. i was blown away also on how good the results were
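The idea in this reply (code does the calculation, the AI does the explanation) could be sketched roughly as follows. This is a hypothetical illustration, not the code from the video: the `prompt`/`completion` field names, the `<think>`/`<answer>` tag layout, and the templated explanation (standing in for text an LLM would write) are all assumptions.

```python
import json
import operator

# Arithmetic is done by Python, so the answer in each record is always correct.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_example(a: int, b: int, symbol: str) -> dict:
    """Build one cold-start style record (hypothetical schema)."""
    result = OPS[symbol](a, b)  # computed by code, not by a model
    # Placeholder explanation; in practice an LLM would write this part.
    thinking = (
        f"The question asks for {a} {symbol} {b}. "
        f"Carrying out the operation gives {result}."
    )
    return {
        "prompt": f"What is {a} {symbol} {b}?",
        "completion": f"<think>{thinking}</think><answer>{result}</answer>",
    }


def to_jsonl(examples) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(ex) for ex in examples)


if __name__ == "__main__":
    print(to_jsonl([make_example(12, 7, "*"), make_example(9, 4, "-")]))
```

Because the answer comes from the interpreter, every generated chain of thought ends in a verified result, which is the property that makes the data usable for a cold start.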

@dusknone 17 hours ago

no reinforcement, just coldstart. wow... definitely going to look into this with other than just math related data

@sakchais 7 days ago

This video has been amazing. I look forward to the RL video.

@chrishayuk 3 days ago

RL video is cool, i promise, i just can't record as sick at the moment, frustrating

@sakchais 3 days ago

@@chrishayuk Get well soon buddy!

@chrishayuk 3 days ago

thank you, just a cold or a flu or something, but frustrating. appreciate the well wishes

@geocorpsys 10 days ago

Thank you Chris. I am hoping I will be able to replicate this on my old windows laptop. I want to be able to train a base model from scratch like you did here.

@cryptonianbond 7 days ago

Amazing. Thank you so much for this.

@chrishayuk 3 days ago

awesome, glad it was useful

@kishoretvk 11 days ago

Hello Chris Hay! This is crazy, you made this amazing tutorial. That's mind-blowing. While OpenAI is closed, the open source community is actually building it in the open for the community. Although companies like DeepSeek are validating and inspiring, the community is doing its own discovery. You are very inspiring as well. Thanks again for a wonderful video

@chrishayuk 11 days ago

Thank you, I appreciate it, I was pretty pleased with this one, glad it’s useful

@kishoretvk 10 days ago

@@chrishayuk we might not need MoE now, as we only need cold start data for different tasks: 1. function calling 2. coding 3. summarization 4. role play 5. NLQ and others. We can do this on Colab as it's 1.5B, it's going to be crazy

@chrishayuk 10 days ago

It’s cool right

@waneyvin 6 days ago

👍can't wait for the RL part, BTW, can you share the prompt as well?

@chrishayuk 3 days ago

the rl video is coming, just sick at the moment, so can't record, frustrating

@waneyvin 3 days ago

@@chrishayuk Sorry to hear that. Hope you recover quickly! Rest up and take care.

@chrishayuk 3 days ago

@@waneyvin just a cold or a flu or something, but frustrating. appreciate the well wishes

@phillipneal8194 9 days ago

Thank you for a great presentation, especially for your explanation and examples of the 'cold start' part. The 'Incentivizing' paper and the technical report are heavy going, especially the reinforcement learning algorithm. When will you have a video out explaining the RL algorithm?

@chrishayuk 3 days ago

thank you, yeah the RL video will be soon, sick at the moment, frustrating, but i'm pretty pleased with where the RL video will be

@midcore2071 12 hours ago

Great content. Unsloth is an excellent framework for training. You can create the same, or potentially better, CoT reasoning using an advanced system prompt in an Ollama Modelfile and quickly turn most Ollama-supported models into reasoning models using the ollama create command. I've been using the technique for about a month now and it works surprisingly well. No QLoRA training required. The outputs are very similar to DeepSeek R1. My most recent success was using this technique on the latest Mistral-Small LLM. Wondering if anybody else has figured this out or achieved similar results with reasoning system prompts.
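A minimal sketch of the Modelfile approach described in this comment, written as a small Python helper that produces the Modelfile text. The commenter's actual system prompt is not given, so the prompt, the base model name `mistral-small`, the temperature value, and the helper name `build_modelfile` are all placeholders chosen for illustration.

```python
def build_modelfile(base_model: str, system_prompt: str,
                    temperature: float = 0.6) -> str:
    """Return Ollama Modelfile text with a reasoning-style system prompt.

    FROM, PARAMETER, and SYSTEM are standard Modelfile instructions;
    the values passed in here are illustrative assumptions.
    """
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


if __name__ == "__main__":
    # Placeholder prompt; not the commenter's actual prompt.
    prompt = (
        "Think through the problem step by step inside <think> tags, "
        "then give the final answer inside <answer> tags."
    )
    print(build_modelfile("mistral-small", prompt))
```

Saving the output to a file named Modelfile and running `ollama create my-reasoner -f Modelfile` (where `my-reasoner` is an arbitrary name) would register the prompted variant. Whether the results match a trained reasoning model is the commenter's claim, not something this sketch demonstrates.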

@mrd6869 11 days ago

One cool addition: I use the TwinMind AI on-screen assistant to explain exactly what you're doing as I watch the video. (It reads the transcript, I'm guessing.) Anyway, it makes understanding the topic far easier.

@chrishayuk 11 days ago

oooh, that sounds pretty sweet

@Kaushik-RoyChowdhury 7 days ago

From a creator's point of view, I am interested in knowing how you manage to superimpose the screen recording over yourself speaking in the background! The video is quite informative, of course.

@lenreinhart2020 2 days ago

Very informative video, I look forward to the next one. I am currently running the 32B version of R1. I asked it about persistence of what it learned during our session, and it said that unless I saved the session and fed it back, it was lost. It suggested using:

ollama generate --model your_model_name | tee chat_history.txt

Is there any other way you know of for getting it to learn without re-feeding everything back to it after a restart? It also said it did not have access to any files on my computer and it would take modifications to get it to do this by itself.
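The "save the session and feed it back" approach mentioned in this comment can be sketched in a few lines of Python. This only illustrates persisting a conversation between restarts; it does not make the model learn anything. The file name, the helper names, and the `role`/`content` message shape are assumptions picked for the example.

```python
import json
from pathlib import Path


def load_history(path: str) -> list:
    """Load saved chat messages, or return an empty list on first run."""
    file = Path(path)
    if not file.exists():
        return []
    return json.loads(file.read_text(encoding="utf-8"))


def append_turn(path: str, role: str, content: str) -> list:
    """Append one message to the saved history and write it back to disk."""
    history = load_history(path)
    history.append({"role": role, "content": content})
    Path(path).write_text(json.dumps(history, indent=2), encoding="utf-8")
    return history


if __name__ == "__main__":
    append_turn("chat_history.json", "user", "What is 12 * 7?")
    print(append_turn("chat_history.json", "assistant", "12 * 7 = 84"))
```

After a restart, the list returned by `load_history` would be sent along with the next prompt, so the model sees the earlier turns as context; the context window still limits how much history can be replayed.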

@sumitmamoria 10 days ago

Good work. One tiny suggestion - Maybe try using word-wraps for long lines , for better readability when watching a video.

@seanplynch 11 days ago

Fantastic, well done

@chrishayuk 11 days ago

Thank you! Cheers!

@usget 11 days ago

Can a reasoning model figure out that it doesn't know something, and ask for inputs? Or could it be trained to ask?

@chrishayuk 11 days ago

That’s an awesome idea

@johntdavies 9 days ago

I'm sure you're aware of the Qwen-maths models, but using these reasoning techniques it would be interesting to see if a small (Qwen2.5-1.5b) model could be trained to reason about geometry or integration in the same way a mathematician would simply apply the rules they know to see what fits. I think the only limitation with this is the size of the context. I put the DeepSeek-R1-7b (Q4) on my phone and it was good but limited. I increased the context to 8192 and wow, it solved things o1 struggled with and failed.

@ianhaylock7409 11 days ago

14:52 isn't the answer it gives here incorrect?

@ApolloGemini11 10 days ago

Awesome video 👏🏼👏🏼👏🏼

@danson3038 9 days ago

excellent!

@chrishayuk 3 days ago

Thanks!

@SDGwynn 10 days ago

Very much appreciate your videos. Thank you. I noticed your training data jsonl format is different than your validation and test jsonl format. Could you please explain?

@PunitPandey 6 days ago

Hi Chris, do you have your training dataset available on GitHub as well? I am not able to find it. Putting it somewhere would be really helpful for following your instructions.

@chrishayuk 6 days ago

Yeah it’s in the verifiers repo

@AndyHuangCA 11 days ago

Given that the intention is not so much to train new knowledge, but synthesize chain of thought capabilities on existing models, how good would it work if we were to use R1 to generate a bunch of non-math questions/thinking/answers input output as the cold start seed?

@chrishayuk 11 days ago

That’s pretty much what happens with the RL stage.. but I also think you can use verifiers to do this well also

@AndyHuangCA 11 days ago

@@chrishayuk Thanks! I was playing around with Granite 3.1 MoE 3B, found it to be insanely fast even on CPU only. I'd be really curious to see how much "intelligence" we can extract from smaller MoE models like that by synthesizing chain of thought. I'll have to find some time to play around and see what could be extracted. I'm thinking a semi-capable thinking model, with MCP (thanks to your MCP-CLI project), that requires no GPU will be a very powerful local assistant!

@EliSpizzichino 10 days ago

Can you actually fine tune DeepSeek R1? I see you used Qwen-2.5

@aperson1181 3 days ago

Which DeepSeek model is better to download?

@rodnet2703 10 days ago

Thanks for the info! I followed your instructions and it’s training the model but it’s pretty slow on my M1 Mac. Is there a similar software for Linux that I can coldstart train the model on a VPS?

@Memsido 2 days ago

What HW specs do you use for training? Thank you🙏

@chrishayuk 2 days ago

macbook pro m3 max

@blue-y3r 11 days ago

Are you saying there is a math compiler in DeepSeek R1? It's open source, so that can be checked

@chrishayuk 11 days ago

They said in the paper they use a math verifier
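For readers wondering what a math verifier might look like in practice, here is a small rule-based sketch in Python. It is not DeepSeek's implementation: the `<think>`/`<answer>` tag layout, the function names, and the 0/1 reward values are assumptions made for illustration.

```python
import re

# Assumed output layout: <think>...</think><answer>...</answer>
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>.+?</think>", re.DOTALL)


def accuracy_reward(completion: str, expected: float, tol: float = 1e-9) -> float:
    """Return 1.0 if the tagged answer parses as a number equal to `expected`."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    try:
        value = float(match.group(1).strip())
    except ValueError:
        return 0.0  # answer present but not a plain number
    return 1.0 if abs(value - expected) <= tol else 0.0


def format_reward(completion: str) -> float:
    """Return 1.0 if both a non-empty think block and an answer block exist."""
    has_think = THINK_RE.search(completion) is not None
    has_answer = ANSWER_RE.search(completion) is not None
    return 1.0 if has_think and has_answer else 0.0


if __name__ == "__main__":
    sample = "<think>12 * 7 is 84.</think><answer>84</answer>"
    print(accuracy_reward(sample, 84), format_reward(sample))
```

The verifier sits outside the model: it only scores outputs, which is why it can be applied to any base model during training.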

@blue-y3r 11 days ago

In your newly trained Qwen model, what is the verifier step doing, since there is no math compiler in Qwen?

@chrishayuk 11 days ago

I’m not verifying yet, I’ll do that in the RL stage in the next video. I’m just generating long and accurate chain of thoughts for coldstarting training

@snehmehta 11 days ago

Hi Chris, it's pretty cool thanks for sharing. can we try to generate the cold start data from deepseek-r1-zero just like the paper and train lora, what do you think of that?

@chrishayuk 11 days ago

Yes, I plan to do a pure version with RL, so will do that when I have that ready (which should be very soon)

@snehmehta 11 days ago

@@chrishayuk that would be great! I would like to contribute in researching, writing script or generating data if possible

@santoshtelwane1776 10 days ago

WOW Superb

@andrewcameron4172 11 days ago

How about a video on creating a jsonl to finetune a model to write computer code

@chrishayuk 11 days ago

Yeah, I plan to do a new one on that using verifiers

@barefeg 4 days ago

Is this RL though or just SFT?

@chrishayuk 4 days ago

RL is the next video, this is SFT with long chain of thoughts, i.e. the coldstart

@barefeg 4 days ago

Awesome, can't wait! Btw, what hardware are you using?

@chrishayuk 4 days ago

i was hoping to record this weekend, but got a sore throat, so i'm a few days away from recording. i think the RL version is pretty cool, i think you will like. macbook pro m3 max

@danson3038 9 days ago

a video on local agentic ide please.

@blue-y3r 11 days ago

So what you are saying is that R1 will not perform well on non-logical, non-math queries, where they can't use a verifier? Like, what if I want to use R1 in a healthcare domain?

@chrishayuk 11 days ago

Nope, because verifiers work for that also, which I’m gonna show in an upcoming video

@cheesehead9980 7 days ago

i’ve found that the 1.5B model is usually terrible with math or calculation, but it has extensive capabilities in generating humanlike thoughts in an eerie way. don’t play mind games with it unless u wanna spook yourself

@wrusty3767 5 days ago

Chains-of-thought, surely, and not Chain-of-thoughts?

@chrishayuk 4 days ago

lol, no clue, tbh

@andrewcameron4172 9 days ago

Have a look at the Open R1 repo from Hugging Face, as they work with the community to replicate DeepSeek R1 datasets etc.

@user-qe2ps9vm9o 11 days ago

Is NVDA going to die?

@chrishayuk 11 days ago

I think a new grand theft auto game is coming out, they’ll be fine

@leophysics 4 days ago

The 7B model runs on my HP laptop: 16GB RAM, Intel i5, no graphics card

@dalsenov 11 days ago

This resembles "first principles": "Don't teach me how to reason, I will figure it out myself!"

@chrishayuk 11 days ago

Exactly

@anubisai 11 days ago

N.Ireland / N.American accents is wild.

@chrishayuk 11 days ago

Agreed, love those accents. Mine is Scottish though

@wwkk4964 11 days ago

@@chrishayuk haha :)

@mrd6869 11 days ago

@@chrishayuk You look like a musician that got into AI 😂. Like, I can see you on a synthesizer in a music video.

@chrishayuk 11 days ago

hahaha, i'm terrible at music.. but i think there are a lot of synergies. i like using lots of tools and techniques and meshing them together

@clarkcampbell1110 7 days ago

As a fellow Scot - many’s the time I’ve had (usually American tourists) ask “which part of Ireland are you from”. I’m always kind & say I’m Scots. When they get embarrassed I explain the accents can be similar & at its closest point there’s only 12 miles between Ireland & Scotland. If they comment that I’m quite understandable for a Scotsman - I’ll throw in a bit o’ auld Scots leid tae mak a muckle ow ther heids.😂

@did28 11 days ago

real open ai

@drudru3591 7 days ago

Nobel prize for the china man

@HiteshKrishanKumar 11 days ago

Who do you think will win the AI race: China or the US? Please reply.

@chrishayuk 11 days ago

I don’t believe there will be a winner… I believe the game is an infinite game, and players will join and drop off. There are no winners….

@HiteshKrishanKumar 11 days ago

Don't you think it will be like the space race?

@llIllIlI 11 days ago

@@HiteshKrishanKumar To what finish line? AI is already here and people use it every day.

@EliSpizzichino 10 days ago

Unfortunately, I think it's a military race, and we'll never know for sure until it's too late. For the general public, open-source models will win; this video pretty much shows it already

@aiknownc 10 days ago

Unlike the space and nuclear arms race where spies were the only way to get the latest technological advances, DS has OPEN SOURCED everything they did to produce this model. Imagine how much faster the space/nuclear arms race would have been in that case! Open Source has been the biggest if not nearly the biggest accelerator for AI advancement in my opinion, especially within the last ~2 years.

@dmalex321 11 days ago

Wait a minute.. you used a how many billion parameter LLM to solve what a card-sized Casio calculator could solve in the 80s?

@agenticmark 11 days ago

One is hardware, one is ML. ML can do things hardware can't: generalize.

@aiknownc 10 days ago

Obviously this is a toy example. The purpose is to explain how to generate accurate synthetic Chain of Thought data to use during the training process, which is quite valuable. Even better, he walks through it end to end within the context of DeepSeek's COLDSTART methodology.

@LokeKS 11 days ago

how to do this in windows? i guess peft from huggingface. cool.

@agenticmark 11 days ago

bitsnbytes (bnb) releases many small models for ollama on windows/linux, and yeah, peft adapters. i am pretty impressed with mac ml, but I can't imagine not being on linux with direct access to my 4090!

@chrishayuk 11 days ago

I’ll do a regular PyTorch video for the next one

@LokeKS 10 days ago

@@chrishayuk nice

@LokeKS 7 days ago

cool @@chrishayuk