Host Your Own Llama 3 Chatbot in Just 10 Minutes! with Runpod & vLLM

3,259 views

Data Centric

1 day ago

Comments: 11
@ShakaZulu77799 9 months ago
Game Changer. Appreciate you mate! Exactly what I was looking for
@Data-Centric 8 months ago
Glad it was helpful.
@msjogren 7 months ago
Wow. I had a feeling a person could spin this up, but your video is the first one I've seen that lays it all out. Makes sense to me. It would be cool to see you train it on some specific data.
@benjamind4343 1 month ago
Hi, is RunPod + vLLM still the best combo for an AI chatbot? Thank you.
@kareemyoussef2304 8 months ago
Great video. I ran into a limitation using Claude's API while building my code. Every time I send a request (trained on my PDFs), it uses up around 37,000 tokens (I've already tried RAG and ways to feed only relevant context), but it still consumes a lot of context per message, which is causing me scaling issues. I wonder, is there a way to run the larger models like Llama 70B on RunPod?
@user-ze3sg6ix1u 7 months ago
Can I use this for serverless? It downloaded and says the worker completed, but when I test it with a request it just sits IN_QUEUE forever.
@legendaryman4336 8 months ago
Unfortunately it doesn't work now; the web terminal fails to start :(
@Data-Centric 8 months ago
I think you may have misunderstood the instructions. It definitely still works; I used it just today.
@w3whq 9 months ago
Humanize...yes!
@PrinzMegahertz 9 months ago
Isn't this very token-intensive, always feeding the whole chat history into every message?
@Data-Centric 9 months ago
This isn't designed for production; it's really just a quick tutorial to get you up and running. You can adapt the script to cut off the history after a certain token length if you wish. To add: because this is an open-source deployment and not a proprietary API, you're only limited by the model's context length, the available RAM on your GPU, and your in-memory cache. The infrastructure is billed hourly regardless of how many tokens you use, not by the token.
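The cut-off the reply describes could be sketched roughly as follows. This is a hypothetical helper, not from the video's script: it keeps only the most recent messages that fit a token budget before the history is sent to the model. Token counts are estimated here by whitespace splitting to stay self-contained; a real deployment would use the model's actual tokenizer (e.g. the Llama 3 tokenizer from Hugging Face).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate; swap in the model's tokenizer for real use."""
    return len(text.split())


def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined estimated size fits max_tokens."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break  # oldest messages beyond the budget are dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order


history = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "Tell me about Llama 3"},
]
print(truncate_history(history, max_tokens=10))
```

Dropping from the oldest end keeps the most recent turns intact, which is usually what matters for conversational coherence; a fancier variant would always preserve the system prompt as well.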