Llama 3.2 Deep Dive - Tiny LM & NEW VLM Unleashed By Meta

27,204 views

bycloud

1 day ago

Comments: 54
@bycloudAI 3 months ago
Might be a bit late with the Llama 3.2 news as usual, oops. Working on the new Claude 3.5 Sonnet video now (see u in 3 weeks lol) and check out NVIDIA's Llama-3.2 resources 👇
Jetson AI Lab: nvda.ws/4eO2VFU
Multimodal RAG with Llama 3.2 GitHub: nvda.ws/4eyspY0
NVIDIA NIM for Developers: nvda.ws/4dCXys1
@mmmm768 3 months ago
You missed Pixtral 12B, a literal gem when it comes to multimodality. It is miles ahead of Llama 3.2 11B and comparable to Llama 3.2 90B.
@dronesflier7715 3 months ago
Pixtral is actually insane. It flew under the radar so hard it isn't even funny. AFAIK no major inference backend supports a quantized version of it ;-;
@cdkw2 3 months ago
NVIDIA as a partner goes crazy, I am happy you are getting the attention you deserve 👍
@BrainSlugs83 3 months ago
RAM on mobile devices isn't actually shared between the CPU and GPU like that. They do have dedicated VRAM, it's just on the order of ~12 KB or so (literally orders of magnitude less than the frame buffer takes up, so it's usually not mentioned). The GPU draws the screen as a sequence of tiles, so it never needs the whole frame buffer anyway, but it does spend a ton of time transferring data from RAM to tile memory just to render each frame. Some older phones let you drop the resolution to half or a quarter of native to save power, because it drastically reduces the amount of RAM-to-VRAM transfers needed to render the framebuffer on the GPU.
@IsaacFoster.. 3 months ago
I assume everyone will have an LLM running locally on their phone in 3-5 years.
@yungdkay1008 1 month ago
You will need heavy cloud infrastructure. I believe corporations or businesses will run LLMs rather than individuals. Individuals can use GPT and the other LLMs already available.
@countofst.germain6417 3 months ago
I run Phi-3.5 mini locally on my phone; it is really good for the size and runs pretty well too.
@pneuma23093 3 months ago
What does it help you do? I’m a noob just getting into this. Would love to know what kind of augmentation it would have on the workflow of an average Joe.
@DriftJunkie 3 months ago
What do you mean "I run it on my phone"? What interface do you use? How do you manage the models?
@countofst.germain6417 3 months ago
@DriftJunkie they have apps that handle all that now, you're living in the past lol
@PurpleBird-mh7vb 3 months ago
@countofst.germain6417 name pls
@pneuma23093 3 months ago
@countofst.germain6417 can you gimme the name of the app which I can use to run it on a phone?
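For context, here is a hedged sketch of one generic way to run a small quantized model locally (not necessarily the app the commenters mean), using the llama-cpp-python bindings with a placeholder GGUF file you would download yourself:

```python
# A minimal sketch, assuming llama-cpp-python is installed and a small GGUF
# quant (e.g. Phi-3.5-mini or Llama 3.2 1B) has been downloaded locally.
# The filename below is a placeholder, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3.5-mini-instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,    # context window
    n_threads=4,   # CPU threads; small models run fine on phone-class CPUs
)

out = llm("Explain in one sentence what an on-device LLM is.", max_tokens=64)
print(out["choices"][0]["text"])
```

Wrapper apps on phones generally do the same thing under the hood: bundle a llama.cpp-style runtime, download a quantized model, and expose a chat UI.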
@rawallon 3 months ago
It's like they say: that's cray cray
@laden6675 3 months ago
that's completely delulu
@MilesBellas 3 months ago
LLAMA in ComfyUI = interesting
@theresalwaysanotherway3996 3 months ago
I would love to see a video about how the hell these AI labs are making multi-modal models. It's obviously the direction the industry has been moving in since GPT-4o: increasing scale on text-only data is getting harder and harder, and including other modalities is the next obvious direction to scale in and could lead to models that generalise much better.
@רותםישראלי-כ3ד 3 months ago
In those models, the image tokens are not fed through cross-attention but are instead provided alongside the text as input.
@zekehobbs7738 3 months ago
Llama uses cross-attention. Qwen is as you said.
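To make the distinction in this thread concrete, here is a minimal, hedged PyTorch sketch (toy dimensions, not the actual Llama 3.2 or Qwen-VL code) contrasting the two wiring styles:

```python
# Toy contrast of the two common ways to feed image features into an LLM.
import torch
import torch.nn as nn

d_model = 256
text_tokens = torch.randn(1, 32, d_model)   # [batch, text_len, dim]
image_feats = torch.randn(1, 64, d_model)   # [batch, num_image_patches, dim]

# 1) "Early fusion" (Qwen-VL style): project image patches into token space,
#    concatenate them with the text sequence, and run ordinary self-attention.
proj = nn.Linear(d_model, d_model)
fused = torch.cat([proj(image_feats), text_tokens], dim=1)
self_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
out_early, _ = self_attn(fused, fused, fused)

# 2) Cross-attention (Llama 3.2 style): the text stream keeps its length and
#    periodically attends to the image features as keys/values.
cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
out_cross, _ = cross_attn(query=text_tokens, key=image_feats, value=image_feats)

print(out_early.shape, out_cross.shape)  # (1, 96, 256) and (1, 32, 256)
```

The practical difference: early fusion grows the sequence length with every image, while cross-attention keeps the text sequence fixed and adds extra adapter layers instead.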
@duckyblender 3 months ago
The > signs are in the wrong direction, and Llama 3.2 is the same as 3.1 but with vision; there's no difference in the text weights, so it's not trained from a larger model.
@4.0.4 3 months ago
The 11B and 90B aren't distilled; they're the 8B and 70B with vision encoders on top. Yeah, a 20B vision encoder on the big one.
3 months ago
Definitely do a video on multi-modality, thanks!
@barry_wastaken 3 months ago
Why aren't you covering 1-bit LLMs? It sounds very promising, and the tests are testifying to that too.
@bits360wastaken 3 months ago
Because it's been 8 months and there are still no publicly available models that can compete with even old models, not to mention they need to run at native precision plus a decompression cost anyway due to the lack of hardware?
@barry_wastaken 3 months ago
@bits360wastaken Hmm, you're missing my point though. Even though I agree about the lack of publicly available competitive models, that's not what I'm asking. I'm talking about coverage of the overall technology, since it was open-sourced with a huge update lately and great improvements with promising potential. Besides the llama3-8B-1.58Bit-100B tokens model, which can literally run on a single core at 6-7 tokens per second, there's no public model as good as mainstream models, but generally 1-bit quantization has been shown to get close to floating-point precisions while being a lot more performant and efficient.
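For readers new to the term, here is a minimal sketch of the "1.58-bit" (ternary) weight idea referenced above, using BitNet-style absmean scaling; it is illustrative only, not the actual training recipe:

```python
# Ternary weight quantization sketch: each weight is mapped to {-1, 0, +1}
# with a single per-tensor scale derived from the mean absolute value.
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)   # absmean scaling factor
    w_q = (w / scale).round().clamp(-1, 1)  # values in {-1, 0, +1}
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = ternary_quantize(w)
w_deq = w_q * scale  # dequantized approximation used at matmul time
print(w_q)
print(scale)
```

The appeal is that ternary weights collapse most multiplications into additions and sign flips, which is why the format is attractive for CPU-only or low-power inference; the catch, as noted above, is the lack of hardware and mature runtimes that exploit it.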
@mmmm768 2 months ago
Also, when will you talk about Qwen 2.5?
@jameschen2308 3 months ago
The < should be >. Nice video tho
@AdityaMayukhSom 3 months ago
Could you please make a video about running these models in INT8 for local inference? There seems to be no content on the internet about quantizing the 1B and 3B models for local inference.
@bycloudAI 3 months ago
They just released some new quants: x.com/AIatMeta/status/1849469912521093360
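As a starting point, here is a hedged sketch of loading a small checkpoint in 8-bit with Hugging Face transformers and bitsandbytes; the repo ID is an assumption and the official weights are gated, so adjust it to whatever weights you actually have access to:

```python
# A minimal 8-bit inference sketch, assuming `transformers`, `accelerate`,
# and `bitsandbytes` are installed and the (assumed) repo below is accessible.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # INT8 weights
    device_map="auto",
)

inputs = tokenizer(
    "Summarize: Llama 3.2 adds 1B and 3B text models.",
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The official quantized releases linked above (QLoRA/SpinQuant variants) target on-device runtimes, whereas this kind of bitsandbytes loading is the quickest route on a desktop GPU.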
@shashikanthpatil2686 3 months ago
What kind of hardware do we need to run these models?
@AVX512 2 months ago
1:13 damn, it's bad at all the things I wanted it to be good at (tool use, etc.)
@dragonmares59110 3 months ago
I just wish they would focus on efficiency and smaller models at some point; we're reaching a point where this is getting out of reach of the common mortal and their hardware.
@DouhaveaBugatti 3 months ago
Man 😎 you missed the lord of the underworld, Molmo by Ai2
@vickeythegamer7527 3 months ago
Qwen 0.5B to 70B all have 130k-token context, I think that's what you haven't heard of 😅
@setop123 3 months ago
comment for the algorithm
@MrPicklesAndTea 3 months ago
I wonder when something is going to convince me to abandon mistral.
@mattmmilli8287 3 months ago
Mistral don’t give af 😅
@NoHandleToSpeakOf 3 months ago
Try questions with the wrong preposition. Mistral and Gemma cannot handle them; Llama and Qwen can.
@lule-ahmed 3 months ago
it should've been "Boy Cloud" 😁😹✌
@Tracing0029 3 months ago
It makes sense for the model to be good at summarizing social media posts, because Meta uses their platforms as training data 😂
@JoeVSvolcano 3 months ago
For the life of me I don't understand why the model sizes go from tiny to huge with nothing in between. Why don't they make models that fully utilize 24GB RTX cards? (Rough memory math in the sketch below.)
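As a back-of-the-envelope for why the sizing frustration above comes up, here is a small sketch estimating weight memory at different precisions (weights only; the KV cache and activations add more on top):

```python
# Rough weight-memory estimate: parameters x bytes-per-parameter.
# This ignores KV cache, activations, and runtime overhead, so treat it as a floor.
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for params in (3, 11, 70, 90):
    for bits in (16, 8, 4):
        print(f"{params:>3}B @ {bits:>2}-bit: ~{weight_gib(params, bits):6.1f} GiB")

# e.g. a 70B model at 4-bit is ~32.6 GiB of weights alone,
# which already overflows a 24 GB card before any KV cache.
```

By this arithmetic, the "sweet spot" for a single 24GB card is roughly a 20-30B model at 4-bit, which is exactly the size band Meta skipped in this release.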
@4.0.4 3 months ago
Pixtral, Qwen VL, Phi, there are so many. There's an open one that can ingest videos too, I forgot the name. Sadly, if you ask any of them to OCR Japanese pages, they can't do it properly.
@abdelrahmanmostafa9489 3 months ago
Talk about NVIDIA's new model
@maxarq 3 months ago
Is the video sponsored by NVIDIA?
@telotawa 3 months ago
No, the 90B is the 70B and the 11B is the 8B; you didn't pay attention to the papers, dude. The extra parameters are the vision adapter.
@bycloudAI 3 months ago
I'm pretty sure they only said the image adapter is 100B for the 405B-parameter model. As for the 90B and 11B, they didn't clarify how they did it.
@szebike 3 months ago
In my opinion the 3.2 series is unusable; it's censored to a point where it's absurd. I thought for a second that Meta did something cool, but once I saw 3.2, NOPE.
@da7e 3 months ago
Is there a way to disable it? What examples made you think it's censored?
@szebike 3 months ago
@da7e It's too trigger-happy even on trivial things. If you say to a vision model something like "let's create a UI similar to XY", it says "sorry, I can't do that, it's copyrighted material" (though being inspired by a UI is not illegal). In contrast to prior models, it automatically blocks all further conversation rather than explaining what's up. It can be bypassed with some prompting, but it's annoying to argue with it about every topic where it gets triggered. From what I have seen it's built in; it's how they trained those models (you can optionally add an even stricter layer on top, but you can't take the current layer away without resorting to specific prompts). I guess they implemented this trigger-happiness to make it a better product rather than a better assistant.
@Osanosa 3 months ago
Gosh I'm so sick of this nerd talk