With the automatic audio dubbing from YouTube/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
@coldlyanalytical1351 · 1 day ago
Anecdote: I was once working on an oil-rig single-board computer which was simply too weak for what it needed to do. After about four weeks and a major software rewrite it was ALMOST usable, but not quite ... I had run out of road. By chance, one of the hardware techies came by and saw my despair. He told me to wait a moment and then returned with a replacement processor chip that was maybe 5x faster than the original. Problem solved! Apparently he was an official beta tester of Motorola CPUs! Moral of the story: we may need to optimise today, but hardware and software will get better and faster tomorrow!
@coldlyanalytical1351 · 1 day ago
Re: the need for all this compression work: I used to work for BMW Research, and I found most engineers aim for small, fast code. This is an ingrained habit that is probably not really needed: Nvidia is about to produce a $3000 retail AI processor box powerful enough to do what a car needs without tiny models. In quantity, Mercedes could possibly get a smaller version for maybe $2000. This is not a huge sum, especially for fancy cars.
@shaneoseasnain9730 · 1 day ago
It feels like gene editing. Interesting video.
@coldlyanalytical1351 · 1 day ago
Excellent - I have been waiting for this video for months! Everyone is aiming for HUGE models ... but tiny models will have great opportunities too! I have tried machine-control LLMs where I get the LLM to emit and process special strings such as [1,5], which are picked up or emitted by a C wrapper to interface with I/O (rough sketch below). That said, cars might need medium-size rather than tiny models. The tiny models will be needed for toasters and cookers.
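Something like this minimal C sketch, assuming a made-up [device,value] command format (set_io is a placeholder stub, not my actual I/O code):

#include <stdio.h>
#include <string.h>

/* Placeholder for the real I/O layer: device id + value parsed
   out of the LLM's text stream. Names/format are illustrative. */
static void set_io(int device, int value) {
    printf("I/O: device %d <- %d\n", device, value);
}

int main(void) {
    char line[256];
    /* Read the LLM's output line by line (e.g. piped in). */
    while (fgets(line, sizeof line, stdin)) {
        const char *p = line;
        /* Scan each line for embedded control strings like [1,5]. */
        while ((p = strchr(p, '[')) != NULL) {
            int dev, val;
            if (sscanf(p, "[%d,%d]", &dev, &val) == 2)
                set_io(dev, val);
            p++;
        }
    }
    return 0;
}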
@andikunar7183 · 1 day ago
Very cool video, thanks!!! A few thoughts (I'm not an automotive expert at all, but have a tiny bit of embedded background): Q4_0 can be accelerated nicely on current Arm CPUs (llama.cpp was able to accelerate PP 2-3x, and TG a bit, by using special Arm CPU instructions for GEMM/GEMV operations); a GPU/NPU does not help much for SLM inference, as TG/token generation is mainly limited by RAM bandwidth. A current NVIDIA Jetson Orin Nano 4GB embedded module (50 GB/s memory bandwidth, with its 64-bit bus) would be a platform with approximately the performance constraints mentioned in this paper for SLM token generation. To me, it does NOT seem like old/cheap hardware. Its GPU could be used for other tasks, e.g. vision. Probably the SLM runs on hardware dedicated to "dashboard"-centric features; a Jetson Orin Nano would already be a VERY luxurious processor for this (back-of-envelope numbers below). Greetings from Vienna!
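A rough back-of-envelope sketch in C, assuming a 1B-parameter model (my number, not the paper's); Q4_0 in llama.cpp costs about 4.5 bits per weight (18 bytes per block of 32), and every generated token has to stream all weights from RAM once:

#include <stdio.h>

/* Decode speed of a memory-bandwidth-bound SLM:
   tokens/s <= bandwidth / model_bytes, since each token
   reads all weights from RAM once. Example numbers only. */
int main(void) {
    double bandwidth_gbs = 50.0;   /* Jetson Orin Nano class, GB/s */
    double params_b      = 1.0;    /* assumed 1B-parameter SLM     */
    double bytes_per_w   = 0.5625; /* Q4_0: ~4.5 bits per weight   */

    double model_gb = params_b * bytes_per_w;
    printf("Upper bound: %.1f tokens/s\n", bandwidth_gbs / model_gb);
    return 0;
}

This prints an upper bound of about 89 tokens/s; real TG is lower because of KV-cache traffic and imperfect bandwidth utilization.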
@Tuscani2005GT · 23 hours ago
Great video! Can you add the link to the paper(s) in the description for these types of videos, please?
@irbsurfer1585 · 23 hours ago
Tiny LMs? Yes please!
@ozne_2358 · 21 hours ago
"From a Lossless (~1.5:1) Compression Algorithm for Llama2 7B Weights to Variable Precision, Variable Range, Compressed Numeric Data Types for CNNs and LLMs", on vixra (re-arrange the letters).
@fdavis1555 · 1 day ago
Interesting concept!
@wwkk4964 · 1 day ago
Thanks for sharing!
@tantzer6113 · 19 hours ago
“… while preserving its ability to handle both general language tasks and specific vehicle functions.” One wonders, then, what, if anything, was lost. The answer may be that a language model is trained to handle not only general language tasks but also many domains that would be of little interest inside a car, such as vegan cooking, theoretical physics, or the rise and fall of the Abbasid empire. So there must be plenty of domain-specific parameters that could be either eliminated or divested of their domain-specific knowledge and redirected towards the domain of interest, in this case warming the car seat to ensure warm buttocks. So, to reduce the size of a model, an efficient approach might be to figure out where specific domains live in the model and remove them surgically (a toy sketch of the idea below). Yet this might not work if domains are not localized within the model.
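A toy C illustration of that surgical idea, with invented activation numbers (real pruning methods score units on calibration data, e.g. by activation magnitude or importance estimates, rather than hard-coded values):

#include <stdio.h>

#define N_UNITS 8

/* Keep a unit only if it fires more on in-domain (car) prompts
   than on off-domain prompts; prune it otherwise. The scores
   below are invented purely for illustration. */
int main(void) {
    double car_act[N_UNITS]  = {0.9, 0.1, 0.7, 0.05, 0.8, 0.2, 0.6, 0.02};
    double misc_act[N_UNITS] = {0.3, 0.8, 0.2, 0.90, 0.1, 0.7, 0.3, 0.95};
    int kept = 0;

    for (int i = 0; i < N_UNITS; i++) {
        if (car_act[i] >= misc_act[i])
            kept++;  /* unit matters for the car domain: keep it   */
                     /* else: prune it (zero out its weights)      */
    }
    printf("kept %d of %d units\n", kept, N_UNITS);
    return 0;
}

Whether this works at all hinges on the localization assumption in the comment above: if "vegan cooking" and "seat heating" share the same units, there is nothing to cut cleanly.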
@Pure_Science_and_Technology · 21 hours ago
They should just train a model from scratch. They have the money to do it, and it's so domain-specific. I'm not sure what the hell they're doing.