How Deepseek v3 made Compute and Export Controls Less Relevant

Views: 4,443

Trelis Research

Days ago

Comments: 31
@PunitPandey 17 days ago
For someone who cannot afford to spend that much time reading the paper, your video is just amazing. You are an excellent teacher.
@ArunRamakrishnan 16 days ago
This was a brilliant analysis from a systems perspective. It looks similar to how we designed our large NUMA machines in terms of prefetchers, branch predictors, directory caches, and whatnot.
@SiD-hq2fo 17 days ago
amazing video, the quality of content is insane, please keep them coming!
@TrelisResearch 17 days ago
cheers! appreciate that
@86dansu 17 days ago
What a solid video ❤
@wilfredomartel7781 11 days ago
Has anyone successfully executed the model? Perhaps there's a way to further optimize this Kraken of a model?
@TrelisResearch 11 days ago
What do you mean? I show running it on remote GPUs. For local use, check out Unsloth, but you still need lots of compute.
@rupnikj 3 days ago
great job! keep it up!
@linuxdevops726 17 days ago
I love the way you explain things. I appreciate that.
@wayne8863 17 days ago
The US puts sanctions on GPUs, but a Chinese company uses limited resources to build a powerful model and open-sources it to the world.
@Pure_Science_and_Technology 17 days ago
Yup, this has been going on for some time. Models like this are definitely putting pressure on the big US tech giants, forcing them to slow down. I’ve got an inference server with 4 Nvidia RTX 6000 Ada GPUs, but even with the INT4 quantized version, I can’t run DeepSeek-V3 locally. Guess I’ll invest in these online GPU rental companies. Lol.
@pewpewpistol 17 days ago
What's insane is that the US keeps compulsively sanctioning.
@ArunRamakrishnan 16 days ago
The sanctions simply won't work, as the mathematical models are seeking computational solutions, and at some point they can jump to purely CPU-based ones if needed, given enough cores and vector processing assistance. All the supercomputers started with custom accelerators (Blue Gene etc.), and these AI clusters will go down the same path given the ginormous scarcity of GPU boards.
@dr.mikeybee 9 days ago
@pewpewpistol Rather than supporting Nvidia, we forced the Chinese to create a competitor. I don't see how this is helpful.
@brentknight9318 17 days ago
Super helpful
@mmasa1 17 days ago
Instruction following in English is definitely not there. As for Chinese, it seems much better. I wonder if these stats were produced using Chinese?
@TrelisResearch 17 days ago
Those stats are mostly English benchmarks. But yeah the vibes seem to be that instruction following is not as good as Claude in English
@DDD-yi2ze 16 days ago
Did you test it yourself, or are you just reading slides?
@TrelisResearch 16 days ago
Howdy! What do you mean?
@Little-bird-told-me 17 days ago
Why do I get the feeling that models are becoming of the programmers, by the programmers, for the programmers?
@TrelisResearch 14 days ago
Ah yeah I see now what you mean. Indeed it’s quite coding focused in terms of performance - although that tends to help everything
@Little-bird-told-me 11 days ago
@TrelisResearch Do you think future models will be based on clever design rather than brute-force compute? It looks like we are entering the territory of the law of diminishing returns, and DeepSeek seems to be proving it.
@TrelisResearch 9 days ago
@Little-bird-told-me Well, there is tons of room left for improvement. Human brains are far more energy efficient per unit compute than LLMs.
@slyracoon23 17 days ago
Do you have a Patreon or something? I want to support the channel and the content, but buying a lifetime membership is a little too much.
@TrelisResearch 16 days ago
Thanks! Yeah, I try to focus on just products rather than be donation based. There’s still this Ko-fi here though: ko-fi.com/trelisresearch
@slyracoon23 15 days ago
@ just donated. Keep up the good work!
@semtex6412 17 days ago
Try asking it "what's the most significant and memorable event in China in 1989?" I promise you it won't answer it.
@suresht1885 17 days ago
😅
@TrelisResearch 14 days ago
It’s true the censorship can be an issue
@GIRcode 13 days ago
It's actually only an issue if you use Chinese API sources. DeepSeek-V3 is less censored at the training level than previous Chinese LLMs.
@TrelisResearch 12 days ago
@GIRcode Yeah, actually that's an important distinction.