These small models are not only good for low memory situations but also where you can have multiple models run at once. Work is being done where you can run 405B by loading and unloading layers (epochs) in small memory configurations to run more advanced models much slower and run these small models for routing and interactivity at the same time. All this could be done locally in situations where you don’t want to send the data it is working with (like personal information) off the device.
@samwitteveenai2 ай бұрын
Very good point about multiple models, totally agree.
@comfixit2 ай бұрын
Yes please a video on fine tuning these models would be awesome. Also videos showing the tiny models running on edge devices and or in browser would be super cool as well.
@i_accept_all_cookies2 ай бұрын
This is great news! Can't wait to start using the lightweight models.
@ibrahimhalouane81302 ай бұрын
No intro no music right to the point amazing work Sam.I wish to know your opinion about unsloth ?
@samwitteveenai2 ай бұрын
I love unsloth. Its a simple but good way for people to do LoRAs
@aminzarei15572 ай бұрын
Hey Sam, Great video 👌 Will be waiting for fine-tuning 1b json in and out
@samwitteveenai2 ай бұрын
yeah thats a good use case.
@SirajFlorida2 ай бұрын
11 and 90B make since because it's 3b and 20B vision parameters respectively? That's what I would guess right off the bat.
@IvarDaigon2 ай бұрын
Another obvious use case for the mini models is moderation. APIs like OpenAI require you make a moderation call before making the inference call which means two round trips to the server before you get any content you can show to the user. If you can do moderaion on device, then you only need one round trip, making your realtime chats appear faster to the user. Moderation, routing, summarization = mini models for the win.
@autoflujo2 ай бұрын
Nice video! It would be awesome if you can make a video of how to fine tune these small models.
@chenqu7732 ай бұрын
Thank you for this quick update Sam! BTW, "QWen" should probably be pronounced as "qian wen" in original Chinese with the hidden meaning of "capable of answering to thousands of questions". 😀
@samwitteveenai2 ай бұрын
lol I tried to pronounce it like their devrel guy does. Is there an audio some where I can hear it ?
@jmspat14b2 ай бұрын
A video on how to finetune these small models would be great! By the way, being from Denmark I always test these models in Danish as well as in English. Llama 3.2 3B is by far the best small, multilingual model I have tested - far better than Gemma 2 2B!
@pozytywniezakrecony1512 ай бұрын
they all kinda fail in Polish :D but well, in english it's quite nice
@samwitteveenai2 ай бұрын
ohh that is super interesting to know. Is Danish one of the 8-9 prioritized languages or is it just getting better at European languages in general I wonder.
@pozytywniezakrecony1512 ай бұрын
@@samwitteveenai It appears it doesn't understand some language rules or I am using too small models - tried o1-mini:latest / DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.i1-Q4_K_M.gguf:latest / Qwen2.5-14B_Uncencored-Q6_K_L.gguf:latest . I.e. I asked all to write me 4 verse poems in Polish about "Bocian" . It does create some correct lines but in the middle it mixes wrong words here and there and most of the time it doesn't make sense like it would be saying a story of sort. Here o1 mini : Bocian wysoki, z wody unosi się swobodnie, Czerwone dzióbki biały kaptur trzyma. Lecąc lecieli nad pól i lasów brzegi, Piekne słońce oświetla mu skrzydła jak diamenty." "Lecąc lecieli" sounds bad :) It's like "Flying they flew ...."same word repeated. However I think this one is quite good compared to the other output 3/4 actually.
@jmspat14b2 ай бұрын
@@samwitteveenai I feel the need to clarify that its abilities are, of course, no where near what it is in English. But it is the first small language model I have tried, that is able to produce a Danish summary of a Danish text, which is mostly correct and coherent. It does still suffer from making up words (I think it sometimes confuses Danish with Swedish and Norwegian), but gemma 2 and other models are much worse in this regard. Also, its knowledge regarding Denmark is very limited - as you would expect for such a small model, I suppose. If for example I ask it to list the last 5 prime ministers of Denmark it only knows the current one and hallucinates the rest. When asking it to list the last 5 governors of any US state, I find that it typically gets 4-5 right.
@samwitteveenai2 ай бұрын
I looked up both these languages and they aren't in their main multilingual priority languages. Speaking to a friend they pointed out that there aren't huge amounts of Facebook users there, so that might be a reason. Meta themselves are benefiting from all the data they have for training etc. I think it also prioritizes some of their training decisions
@IsmailIfakir2 ай бұрын
is there is a multimodal llm can fine-tuning for sentiment analysis from text, image, video and audio ?
@Nick_With_A_Stick2 ай бұрын
It kind of makes me sad that meta trained llama two on audio and pictures and made it where I can output, audio and pictures, and then Nerfed the model removed the decoders for “safety” reasons. And released it even though L3 was already out, and now they are using that llama three version of the model on their app where you can talk to it, as if it was GPT4 Omni.
@nosuchthing8Ай бұрын
Can you train a model with a new conputer language