Host Your Own Llama 3 Chatbot in Just 10 Minutes! with Runpod & vLLM

3,259 views

Data Centric

1 day ago

Comments: 11
@ShakaZulu77799 9 months ago
Game Changer. Appreciate you mate! Exactly what I was looking for
@Data-Centric 8 months ago
Glad it was helpful.
@msjogren 7 months ago
Wow. I had a feeling a person could spin this up, but your video is the first one I've seen that lays it all out. Makes sense to me. It would be cool to see you train it on some specific data.
@benjamind4343 1 month ago
Hi, is RunPod + vLLM still the best combo for an AI chatbot? Thank you.
@kareemyoussef2304 8 months ago
Great video. I ran into a limitation using Claude's API while building my code. Every time I send a request (trained on my PDFs), it uses up around 37,000 tokens (I've already tried RAG and ways to feed only relevant context), but it still consumes a lot of context per message, which is causing me scaling issues. I wonder, is there a way to run the larger models like Llama 70B on RunPod?
@user-ze3sg6ix1u 7 months ago
Can I use this for serverless? It downloaded and says the worker completed, but when I test it with a request it just sits IN_QUEUE forever.
@legendaryman4336 8 months ago
Unfortunately it doesn't work now; the web terminal fails to start :(
@Data-Centric 8 months ago
I think you may have misunderstood the instructions. It definitely still works; I used it just today.
@w3whq 9 months ago
Humanize...yes!
@PrinzMegahertz 9 months ago
Isn't this very token-intensive, always feeding the whole chat history into every message?
@Data-Centric 9 months ago
This isn't designed for production; it's really just a quick tutorial to get you up and running. You can adapt the script to cut off the history after a certain token length if you wish. To add: because this is an open-source deployment and not a proprietary API, you're only limited by the model's context length, the available RAM on your GPU, and your in-memory cache. The infrastructure is billed hourly regardless of how many tokens you use, not by the token.
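The cut-off the reply describes could be sketched roughly as follows. This is a hypothetical helper, not from the video's script: it keeps only the most recent messages that fit a token budget before the history is sent to the model. Token counts are estimated here by whitespace splitting to stay self-contained; a real deployment would use the model's actual tokenizer (e.g. the Llama 3 tokenizer from Hugging Face).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate; swap in the model's tokenizer for real use."""
    return len(text.split())


def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose combined estimated size fits max_tokens."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg["content"])
        if total + cost > max_tokens:
            break  # oldest messages beyond the budget are dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order


history = [
    {"role": "user", "content": "Hello there"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "Tell me about Llama 3"},
]
print(truncate_history(history, max_tokens=10))
```

Dropping from the oldest end keeps the most recent turns intact, which is usually what matters for conversational coherence; a fancier variant would always preserve the system prompt as well.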