Game Changer. Appreciate you mate! Exactly what I was looking for
@Data-Centric · 8 months ago
Glad it was helpful.
@msjogren · 7 months ago
Wow. I had a feeling a person could spin this up, but your video is the first one I've seen that lays it all out. Makes sense to me. It would be cool to see you train it on some specific data.
@benjamind4343 · 1 month ago
Hi, is RunPod + vLLM still the best combo for an AI chatbot? Thank you.
@kareemyoussef2304 · 8 months ago
Great video. I ran into a limitation using Claude's API while building my app: every time I send a request (grounded in my PDFs), it uses up around 37,000 tokens. I've already tried RAG and other ways to feed only the relevant context, but each message still consumes a lot of context, which is causing me scaling issues. I wonder, is there a way to run the larger models like Llama 70B on RunPod?
@user-ze3sg6ix1u · 7 months ago
Can I use this for serverless? The model downloaded and the worker reported completed, but when I test it with a request it just sits IN_QUEUE forever.
@legendaryman4336 · 8 months ago
Unfortunately it doesn't work now; the web terminal fails to start :(
@Data-Centric · 8 months ago
I think you have misunderstood the instructions. It definitely still works; I used it just today.
@w3whq · 9 months ago
Humanize...yes!
@PrinzMegahertz · 9 months ago
Isn't it very token-intensive to always feed the whole chat history into every message?
@Data-Centric · 9 months ago
This isn't designed for production; it's really just a quick tutorial to get you up and running. You can adapt the script to cut off the history after a certain token length if you wish. To add: because this is an open-source deployment and not a proprietary API, you're only limited by the model's context length, the available RAM on your GPU, and your in-memory cache. The infrastructure is charged hourly regardless of how many tokens you use, not per token.
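The cutoff mentioned above could look something like this: a minimal sketch that trims an OpenAI-style messages list to a rough token budget before each request. The `trim_history` helper and the 4-characters-per-token heuristic are assumptions for illustration; in practice you'd use the model's real tokenizer.

```python
# Sketch: keep the system prompt plus the most recent messages
# that fit inside an approximate token budget.
# NOTE: the 4-chars-per-token estimate is a rough heuristic,
# not the model's actual tokenizer.

def trim_history(messages, max_tokens=4000):
    """Return a trimmed copy of `messages` that fits the budget."""
    def approx_tokens(msg):
        return max(1, len(msg["content"]) // 4)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Reserve room for the system prompt, then walk from newest
    # to oldest, keeping messages until the budget runs out.
    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept = []
    for msg in reversed(rest):
        cost = approx_tokens(msg)
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(msg)
    return system + list(reversed(kept))
```

Calling `trim_history(history)` right before sending the request bounds the per-message cost while always preserving the system prompt and the most recent turns.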