Extending Llama-3 to 1M+ Tokens - Does it Impact the Performance?

11,527 views

Prompt Engineering

1 day ago

In this video, we look at the 1M+ context version of Llama-3, one of the best open LLMs, built by Gradient AI.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become a Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
Model: ollama.com/library/llama3-gra...
Ollama tutorial: • Ollama: The Easiest Wa...
TIMESTAMPS:
[00:00] LLAMA-3 1M+
[00:57] Needle in Haystack test
[02:45] How is it trained?
[03:32] Setting Up and Running Llama3 Locally
[05:45] Responsiveness and Censorship
[07:25] Advanced Reasoning and Information Retrieval
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...

Comments: 43
@engineerprompt • 1 month ago
CORRECTION: There is a mistake in the long-context test in the video [the haystack test towards the end, the Tim Cook and Apple question]. If you set the context length in an Ollama session and then exit it, you have to set the context length again in the new session; parameters set in one session do not persist across sessions. An oversight on my end, and thanks to everyone for pointing it out.
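For anyone reproducing the test: in Ollama's interactive REPL the context window is set per session with /set parameter. A minimal sketch of the flow (the model tag is the one used in the video):

    ollama run llama3-gradient:8b-instruct-q5_K_M
    >>> /set parameter num_ctx 256000
    >>> ... paste the long document and the needle question here ...
    >>> /bye
    # /bye ends the session; the next `ollama run` starts with the default
    # num_ctx, so the parameter has to be set again.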
@antonvinny • 1 month ago
So did it work correctly after setting the context length?
@user-cl7vn1eg3u • 1 month ago
I've been testing it. It has a hallucination issue when large text is put in. However, the writing is good, so even the hallucinations are interesting.
@maxieroo629 • 1 month ago
Have you tried lowering the temperature?
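(For anyone trying this: in the Ollama REPL the temperature is set the same way as the context length; 0.2 below is just an illustrative value, not one taken from the video:

    >>> /set parameter temperature 0.2

A lower temperature makes sampling more deterministic, which usually curbs rambling, though it won't fix genuine retrieval failures.)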
@john_blues • 1 month ago
The AI went from scholarly professor to unintelligible drunk quite quickly.
@abadiev • 1 month ago
Need more information about this model. How does it work with AutoGen or other agent systems?
@sergeaudenaert • 1 month ago
Thank you for the video. When you exited and reran the model, shouldn't you also have reset the context window to 256K?
@engineerprompt • 1 month ago
That's a valid point. I thought (mistakenly) that it persists for the LLM, but it seems you actually have to do it for each session.
@supercurioTube • 1 month ago
Thanks a lot for this showcase! That test you did is fantastic: "A glass door has 'push' on it in mirror writing. Should you push or pull it? Please think out loud step by step." I tried it with several llama3 8b variants, down to the llama3:8b-instruct-q4_1 quantization, which finishes quickly with a spot-on: "So, to answer the question: You should pull the glass door". I'm able to reproduce the infinite output you get with llama3-gradient:8b-instruct-q5_K_M, so something was indeed broken in this fine-tune for larger context. I was hoping to leverage a larger context in an application with llama3, but I guess this won't be the model for that.
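(The comparison above is easy to reproduce, since each quantization is a separate tag on Ollama; a sketch using the tags named in the thread:

    ollama run llama3:8b-instruct-q4_1 "A glass door has 'push' on it in mirror writing. Should you push or pull it? Please think out loud step by step."
    ollama run llama3-gradient:8b-instruct-q5_K_M "A glass door has 'push' on it in mirror writing. Should you push or pull it? Please think out loud step by step."

ollama run also accepts the prompt as a one-shot argument, which makes A/B tests like this scriptable.)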
@supercurioTube • 1 month ago
And I've tried the full fp16 from Ollama too. It does stop consistently, but it answers wrong: "Conclusion: Even though the word is written backwards when looking from within your reflection, I should still try opening the glass doors by pushing them like they would say."
@engineerprompt • 1 month ago
There are 16K and 64K fine-tuned versions as well. It might be interesting to look into those.
@supercurioTube • 1 month ago
@engineerprompt Thanks for the suggestion, I will 😌
@henkhbit5748 • 1 month ago
Thanks for the update. Has anybody tried to do "real" RAG using multiple documents? You can't access it using Groq, can you?
@engineerprompt • 1 month ago
You can look into localGPT :)
@jeffwads • 1 month ago
Yes, without multiple needle runs, the test is pretty weak.
@engineerprompt • 1 month ago
Agree.
@hoblon • 1 month ago
You need to set the context size in each session, not just once. That's why the needle test failed.
@JoeBrigAI • 1 month ago
The setting isn't persistent? Major oversight in the video if this is the case.
@hoblon • 1 month ago
@JoeBrigAI Parameters persist within a session. Once you enter /bye, that's it.
@engineerprompt • 1 month ago
That is true. I thought otherwise. I've added a pinned comment to highlight this.
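(One way to avoid re-setting the parameter every session is to bake it into a custom model with a Modelfile; a sketch, with a hypothetical model name:

    # Modelfile
    FROM llama3-gradient:8b-instruct-q5_K_M
    PARAMETER num_ctx 256000

    # then, in the shell:
    ollama create llama3-gradient-256k -f Modelfile
    ollama run llama3-gradient-256k

A model created this way keeps num_ctx across sessions, because the parameter is part of the model definition rather than the REPL state.)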
@user-gp6ix8iz9r • 1 month ago
Can you do a review of AirLLM? It lets you run a 70B model on 4GB of VRAM.
@engineerprompt • 1 month ago
Haven't seen that before. I will explore what it is.
@HassanAllaham • 1 month ago
Does it let us run a model of that size without a GPU, i.e., on CPU only?
@ikjb8561 • 1 month ago
Due to the autoregressive nature of LLMs, the chance of producing an error compounds exponentially with every passing token. Be careful what you wish for.
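(To put rough numbers on that: if each generated token is wrong independently with probability ε, the chance that an n-token output is error-free is (1 − ε)^n, which decays exponentially. A toy Python calculation with an arbitrarily assumed ε:

    epsilon = 1e-4                      # assumed per-token error probability
    for n in (1_000, 100_000, 1_000_000):
        p_clean = (1 - epsilon) ** n    # probability the whole output is error-free
        print(f"{n:>9} tokens: P(no error) ≈ {p_clean:.3g}")
    # prints ≈ 0.905, ≈ 4.54e-05, ≈ 3.7e-44

The independence assumption is crude, but it illustrates why per-token reliability matters at 1M-token scale.)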
@smartduck904 • 1 month ago
So I guess this will not run on a GTX 1080 Ti?
@Vadinaka • 1 month ago
May I ask which system you are using to run this?
@engineerprompt • 1 month ago
I am using an M2 Max with 96GB to run this.
@GetzAI • 1 month ago
You need to pick up an M4 Mac Studio when it comes out ;)
@engineerprompt • 1 month ago
indeed :D
@unclecode • 1 month ago
Interesting, this one didn't bring a ladder to the party for the joke haha. About the model not stopping: it's probably related to RoPE (Rotary Position Embedding). If someone messed with that, things could go on forever. Anyway, the quantization definitely affects the model's behavior.
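(Background for readers: long-context fine-tunes usually extend RoPE by enlarging its base θ, so the rotation frequencies drop and distant positions stay distinguishable. A minimal sketch of the frequency computation; the enlarged θ here is an assumed illustrative value, not the one Gradient actually used:

    import numpy as np

    def rope_inv_freq(head_dim: int, theta: float) -> np.ndarray:
        # Per-dimension-pair inverse frequencies: 1 / theta^(2i/d).
        return 1.0 / theta ** (np.arange(0, head_dim, 2) / head_dim)

    base = rope_inv_freq(128, 500_000.0)    # Llama-3's published rope_theta
    long = rope_inv_freq(128, 4_000_000.0)  # assumed enlarged theta for long context
    print((long / base).min())              # frequencies shrink by up to ~8x

If the runtime's RoPE settings don't match what the fine-tune expects, attention degrades and the output can run away, which fits the non-stopping behavior seen here.)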
@engineerprompt • 1 month ago
Haha, that's true. Pleasantly surprised by the joke :) That's actually a good point about RoPE.
@vertigoz • 1 month ago
Phi-3 128K got worse compared to the 4K version when I tried to have it analyze a program I gave it.
@R0cky0 • 1 month ago
13:16 It appears the LLM was suffering from schizophrenia at that moment 😅
@8eck • 1 month ago
100+ GB of VRAM for a 4-bit quantized model? 🙄 Are you sure about the quantized one?
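(At 4-bit the 8B model's weights are only about 5GB; what explodes is the KV cache, which grows linearly with context and is typically held in fp16. A back-of-the-envelope estimate from Llama-3-8B's published architecture, assuming fp16 cache entries:

    # KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * context * bytes_per_value
    layers, kv_heads, head_dim = 32, 8, 128  # Llama-3-8B config (GQA)
    bytes_per_value = 2                      # fp16
    for ctx in (8_192, 262_144, 1_048_576):
        gib = 2 * layers * kv_heads * head_dim * ctx * bytes_per_value / 2**30
        print(f"{ctx:>9} tokens -> {gib:6.1f} GiB KV cache")
    # 8192 -> 1.0 GiB, 262144 -> 32.0 GiB, 1048576 -> 128.0 GiB

So a 100+GB figure is plausible for the full 1M-token context even with 4-bit weights.)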
@acekorneya1 • 1 month ago
The issue with all these "benchmarks" is that they are all lies. We need a better, real benchmark for LLMs, because what we get from the people who make these models is all lies. They don't perform well when it comes down to doing real work, or anything in real production. They all suck compared to closed models. It's as if the people who benchmark them show very cherry-picked examples.
@ritpop • 1 month ago
Yes, some models are good for daily use and in some cases better than ChatGPT's GPT-3.5, but I have never used one that comes close to GPT-4. And in some use cases GPT-3.5 is still better than Mistral, in my own experience. So they really should publish the real benchmarks.
@farazfitness • 1 month ago
Lmao, 64GB of VRAM. I'm using an RTX 4070, which only has 8GB of VRAM.
@engineerprompt • 1 month ago
:)
@kecksbelit3300 • 1 month ago
How did you manage to pick up an 8GB 4070? Even the Founders Edition has 12GB.
@farazfitness • 1 month ago
@kecksbelit3300 I'm using an Acer Predator Helios Neo 16 laptop (the laptop 4070 has 8GB).
@jamesvictor2182 • 1 month ago
Why are you using Ollama and not llama.cpp directly?
@engineerprompt • 1 month ago
Just ease of use.
@HappySlapperKid • 1 month ago
64gb vram 😂