If you want to build robust RAG applications based on your own datasets, this is for you: prompt-s-site.thinkific.com/courses/rag
@unclecode · 3 months ago
👏 I'm glad to see you're focusing on DevOps options for AI apps. In my opinion, LlamaCpp will remain the best way to launch a production LLM server. One notable feature is its support for hardware-level concurrency. The `-np 4` (or `--parallel 4`) flag runs 4 slots in parallel, where 4 can be any number of concurrent requests you want. One thing to remember: the context window is divided accordingly. For example, if you pass `-c 4096` with `-np 4`, each slot gets a context size of 1024. Adding the `--n-gpu-layers` flag (`-ngl 99`) offloads the model layers to your GPU for the best performance. So a command like `-c 4096 -np 4 -ngl 99` will offer excellent concurrency on a machine with a 4090 GPU.
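A minimal sketch of how those flags fit together, assuming a recent llama.cpp build where the server binary is named `llama-server` (older builds call it `server`); the model path is a placeholder:

```sh
# Start an OpenAI-compatible llama.cpp server with 4 parallel slots,
# a 4096-token total context (1024 tokens per slot), and all model
# layers offloaded to the GPU.
./llama-server \
  -m ./models/your-model.Q4_K_M.gguf \
  -c 4096 \
  -np 4 \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080
```

Requests hitting the server are then handled concurrently across the 4 slots.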
@thecodingchallengeshow · 1 month ago
Can we fine-tune it using LoRA? I need it to be about AI, so I have downloaded data about AI and I want to add it to this model.
@johnkost2514 · 3 months ago
Mozilla's Llamafile format is very flexible for deploying LLMs across operating systems. NIM has the advantage of bundling other types of models, like audio or video.
@Nihilvs · 3 months ago
Amazing, thanks!
@andreawijayakusuma6008 · 3 months ago
Bro, I wanna ask: do I need a GPU to run this?
@sadsagftrwre · 3 months ago
No, llama.cpp specifically enables LLMs on CPUs. It's just going to be a bit slow, mate.
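A minimal sketch of a CPU-only run, assuming the same `llama-server` binary as above: `-ngl 0` keeps every layer on the CPU, and `-t` sets the thread count.

```sh
# CPU-only: no layers offloaded to a GPU, 8 threads
# (tune -t to your physical core count).
./llama-server \
  -m ./models/your-model.Q4_K_M.gguf \
  -c 2048 \
  -ngl 0 \
  -t 8
```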
@andreawijayakusuma6008 · 2 months ago
@sadsagftrwre OK, thanks for the answer. I just wanted to try it but was afraid it wouldn't work without a GPU.
@sadsagftrwre · 2 months ago
@andreawijayakusuma6008 I tried it on CPU and it worked.
@marcaodd · 3 months ago
What server specs did you use?
@engineerprompt · 2 months ago
It's running on an A6000 with 48 GB of VRAM. Hope that helps.