For someone who cannot afford to spend that much time reading papers, your video is just amazing. You are an excellent teacher.
@ArunRamakrishnan 16 days ago
This was a brilliant analysis from a systems perspective. Looks similar to how we designed our large NUMA machines in terms of prefetchers, branch predictors, directory caches and whatnot.
@SiD-hq2fo 17 days ago
amazing video, the quality of content is insane, please keep them coming!
@TrelisResearch 17 days ago
cheers! appreciate that
@86dansu 17 days ago
What a solid video ❤
@wilfredomartel7781 11 days ago
Has anyone successfully executed the model? Perhaps there's a way to further optimize this Kraken of a model?
@TrelisResearch 11 days ago
What do you mean? I show running it on remote GPUs. For local, check out Unsloth, but you still need lots of compute.
@rupnikj 3 days ago
great job! keep it up!
@linuxdevops726 17 days ago
I love the way you explain things. I appreciate that.
@wayne8863 17 days ago
US puts sanctions on GPUs, but Chinese company uses limited resources to build a powerful model and open source it to the world.
@Pure_Science_and_Technology 17 days ago
Yup, this has been going on for some time. Models like this are definitely putting pressure on the big US tech giants, forcing them to slow down. I’ve got an inference server with 4 Nvidia RTX 6000 ADA GPUs, but even with the INT4 quantized version, I can’t run DeepSeek-V3 locally. Guess I’ll invest in these online GPU rental companies. Lol.
@pewpewpistol 17 days ago
what's insane is that US keeps compulsively sanctioning
@ArunRamakrishnan 16 days ago
The sanctions simply won't work, as the mathematical models are seeking computational solutions, and at some point they can jump to purely CPU-based ones if needed, given enough cores and vector-processing assistance. All the supercomputers started with custom accelerators (Blue Gene etc.), and these AI clusters will go down the same path given the ginormous scarcity of GPU boards.
@dr.mikeybee 9 days ago
@pewpewpistol rather than supporting Nvidia, we forced the Chinese to create a competitor. I don't see how this is helpful.
@brentknight9318 17 days ago
Super helpful
@mmasa1 17 days ago
Instruction following in English is definitely not there. As for Chinese, it seems much better. I wonder if these stats were made using Chinese?
@TrelisResearch 17 days ago
Those stats are mostly English benchmarks. But yeah the vibes seem to be that instruction following is not as good as Claude in English
@DDD-yi2ze 16 days ago
Did you test it yourself, or are you just reading slides?
@TrelisResearch 16 days ago
Howdy! What d'you mean?
@Little-bird-told-me 17 days ago
Why do I get this feeling that models are becoming of programmers, by the programmers, for the programmers.
@TrelisResearch 14 days ago
Ah yeah I see now what you mean. Indeed it’s quite coding focused in terms of performance - although that tends to help everything
@Little-bird-told-me 11 days ago
@TrelisResearch Do you think future models will be based on clever design rather than brute-force compute? Looks like we are entering the territory of the law of diminishing returns, and DeepSeek seems to be proving it.
@TrelisResearch 9 days ago
@Little-bird-told-me well there is tons of room left for improvement. Human brains are far more energy efficient per unit compute than LLMs.
@slyracoon23 17 days ago
Do you have a Patreon or something? I want to support the channel and the content, but buying a lifetime membership is a little too much.
@TrelisResearch 16 days ago
Thanks! Yeah I try to focus on just products rather than be donation based. There’s still this Ko-fi here though: ko-fi.com/trelisresearch
@slyracoon23 15 days ago
@ just donated. Keep up the good work!
@semtex6412 17 days ago
try asking it "what's the most significant and memorable event in China in 1989?" i promise you it won't answer it
@suresht1885 17 days ago
😅
@TrelisResearch 14 days ago
It’s true the censorship can be an issue
@GIRcode 13 days ago
It's actually only an issue if you use Chinese API sources. DeepSeek-V3 is less censored at the training level than previous Chinese LLMs.
@TrelisResearch 12 days ago
@GIRcode yeah actually that's an important distinction