How Deepseek v3 made Compute and Export Controls Less Relevant

4,443 views

Trelis Research

A day ago

Comments: 31

@PunitPandey 17 days ago

For someone who cannot afford to spend that much time reading papers, your video is just amazing. You are an excellent teacher.

@ArunRamakrishnan 16 days ago

This was a brilliant analysis from a systems perspective. It looks similar to how we designed our large NUMA machines in terms of prefetchers, branch predictors, directory caches and whatnot.

@SiD-hq2fo 17 days ago

amazing video, the quality of content is insane, please keep them coming!

@TrelisResearch 17 days ago

cheers! appreciate that

@86dansu 17 days ago

What a solid video ❤

@wilfredomartel7781 11 days ago

Has anyone successfully executed the model? Perhaps there's a way to further optimize this Kraken of a model?

@TrelisResearch 11 days ago

What do you mean? I show running it on remote GPUs. For local use, check out Unsloth, but you still need lots of compute.

@rupnikj 3 days ago

great job! keep it up!

@linuxdevops726 17 days ago

I love the way you explain things. I appreciate that.

@wayne8863 17 days ago

The US puts sanctions on GPUs, but a Chinese company uses limited resources to build a powerful model and open-sources it to the world.

@Pure_Science_and_Technology 17 days ago

Yup, this has been going on for some time. Models like this are definitely putting pressure on the big US tech giants, forcing them to slow down. I’ve got an inference server with 4 Nvidia RTX 6000 Ada GPUs, but even with the INT4 quantized version, I can’t run DeepSeek-V3 locally. Guess I’ll invest in these online GPU rental companies. Lol.
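
A rough back-of-the-envelope check of why that setup falls short. This is only a sketch: it assumes DeepSeek-V3's published total of 671B parameters and 48 GB of VRAM per RTX 6000 Ada (neither figure is stated in the thread), and it counts weights only, ignoring KV cache, activations and quantization overhead.

```python
# Weights-only memory estimate for an INT4-quantized model versus total GPU VRAM.
# Assumed figures (not from the thread): 671e9 total parameters, 48 GB per GPU.
TOTAL_PARAMS = 671e9      # assumed DeepSeek-V3 total parameter count
BITS_PER_WEIGHT = 4       # INT4 quantization
NUM_GPUS = 4              # from the comment above
VRAM_PER_GPU_GB = 48      # assumed VRAM of one RTX 6000 Ada


def weight_memory_gb(params: float, bits: int) -> float:
    """Return the memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


weights_gb = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
total_vram_gb = NUM_GPUS * VRAM_PER_GPU_GB
fits = weights_gb <= total_vram_gb

print(f"INT4 weights: {weights_gb:.1f} GB, available VRAM: {total_vram_gb} GB, fits: {fits}")
```

Under these assumptions the weights alone need roughly 335 GB against 192 GB of combined VRAM, so the model cannot be held entirely on the four cards.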

@pewpewpistol 17 days ago

What's insane is that the US keeps compulsively sanctioning.

@ArunRamakrishnan 16 days ago

The sanctions simply won't work, as the mathematical models are seeking computational solutions, and at some point they can jump to purely CPU-based ones if needed, given enough cores and vector processing assistance. All the supercomputers started with custom accelerators (Blue Gene etc.), and these AI clusters will go down the same path given the ginormous scarcity of GPU boards.

@dr.mikeybee 9 days ago

@pewpewpistol Rather than supporting Nvidia, we forced the Chinese to create a competitor. I don't see how this is helpful.

@brentknight9318 17 days ago

Super helpful

@mmasa1 17 days ago

Instruction following in English is definitely not there. As for Chinese, it seems much better. I wonder if these stats were produced using Chinese?

@TrelisResearch 17 days ago

Those stats are mostly English benchmarks. But yeah the vibes seem to be that instruction following is not as good as Claude in English

@DDD-yi2ze 16 days ago

Did you test it yourself, or are you just reading slides?

@TrelisResearch 16 days ago

Howdy! What do you mean?

@Little-bird-told-me 17 days ago

Why do I get this feeling that models are becoming of the programmers, by the programmers, for the programmers?

@TrelisResearch 14 days ago

Ah yeah I see now what you mean. Indeed it’s quite coding focused in terms of performance - although that tends to help everything

@Little-bird-told-me 11 days ago

@TrelisResearch Do you think future models will be based on clever design rather than brute-force compute? It looks like we are entering the territory of the law of diminishing returns, and DeepSeek seems to be proving it.

@TrelisResearch 9 days ago

@Little-bird-told-me Well, there is tons of room left for improvement. Human brains are far more energy efficient per unit compute than LLMs.