Ollama in a RASPI | Running a Large Language Model in a Raspberry Pi

1,123 views

DevXplaining

A day ago

Hi and welcome back to the DevXplaining channel! In today's video, we'll go through how to install Ollama and a large language model on a Raspberry Pi. I'll also cover the prerequisites needed to make it happen smoothly, and give you a glimpse of the performance and use cases for it.
So join me for a bit, and as always, I appreciate any likes, feedback and comments. Feel free to subscribe to my channel for more content like this! I might get inspired to do a follow-up with some coding as well. (A small sketch of calling the model's local API follows the links below.)
Links in the video:
www.raspberryp...
ollama.com/
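
For reference, here is a minimal sketch of talking to the model once Ollama is installed and running (for example via the install script from ollama.com). It assumes Ollama's local REST API on its default port 11434 and a small model pulled in advance; "tinyllama" below is just an example, pick whatever fits the Pi's RAM.

```python
# Minimal sketch: ask a locally running Ollama server for a completion.
# Assumes Ollama is already installed and serving on its default port (11434),
# and that a small model (here "tinyllama" as an example) has been pulled
# beforehand, e.g. with `ollama pull tinyllama`.
import json
import urllib.request

payload = {
    "model": "tinyllama",  # example model name; pick one that fits the Pi's RAM
    "prompt": "Explain what a Raspberry Pi is in one sentence.",
    "stream": False,       # wait for the full answer instead of streaming tokens
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    answer = json.loads(resp.read())

print(answer["response"])
```

Even a small model will answer at roughly the pace shown in the video, so expect the call to take a while on a Pi.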

Comments: 6
@LanJulio
@LanJulio 3 months ago
Thanks for the Video !!! Will try on my Raspberry Pi 5 with 8GB of RAM !!!
@DevXplaining
@DevXplaining 3 months ago
Perfect! It's gonna be slowwww... But fully local too :)
@tarunarya1780
@tarunarya1780 A month ago
Thanks for this and your "How to run ChatGPT in your own laptop for free? | GPT4All" video showing the practicalities of running a language model. I think an important aspect and benefit of a local model is being able to train it. Please cover this or point us in the right direction. Being able to have it read PDFs to learn from would be great.
@okra3000
@okra3000 28 days ago
I'm trying to turn my RPi 5 into a local virtual assistant that only communicates data from PDFs, with low latency. I've installed a 1TB Samsung 980 PRO PCIe 4.0 NVMe M.2 SSD in it, hoping it will help with all the PDF data as well as whatever LLM I decide to install. But I'm in a rut: I'm not familiar with RAG, I'm not too great at coding, and I'm not even sure the RPi 5 can handle all this (I've been alternatively considering the Jetson Orin Nano Developer Kit) 😮‍💨... Could you please offer your wise counsel?
@DevXplaining
@DevXplaining 27 days ago
Hi, thank you for sharing this, sounds awesome! Your SSD will handle things just fine, but as you can see in my video, the performance on a Raspberry Pi is unfortunately not going to be low latency. Typical speed on a raspi (depending on the model, memory, and LLM being used) tends to be around 1 token per second. Meaning, it's going to be slowwww, and there aren't many ways around it. So it's okay for use cases where the response does not need to be immediate, but it's pretty far from low latency. Mostly great for experimentation, I would say. I've been toying with virtual assistants that I actually use myself, and for that, a Raspberry Pi won't cut it. You want:
- A heavy machine with a lot of oomph, definitely a good graphics card and working CUDA drivers. The more the better; most models run as a service run on CRAZY hardware, but my personal gaming machine does okay.
- A coding approach centered around streaming, i.e. take the tokens immediately as they are generated instead of waiting for the full answer. You have to play a bit with the granularity; a good starting point is to grab the tokens and send them to the speech interface once you have full sentences, otherwise the intonation will be far off (there's a rough sketch of this below).
- The fastest, real-time, GPU-accelerated versions of every part, so use a very low-latency text-to-speech solution, preferably GPU-accelerated, alongside the model.
- Keep in mind that offline models you can run locally are unfortunately somewhat slower and less capable than the big models you use via API. But they make up for that by being more secure, especially with RAG, or at least by letting you control the security, and by being potentially more cost-effective, depending on how you calculate costs.

So, just general advice, but if you see the slowness demonstrated in this video, you can see that a Raspberry Pi is not good for a low-latency case. I built my own virtual assistant on top of a local model, running it on my gaming beast, and it runs with acceptable latency, i.e. a few seconds most of the time. To get a natural dialogue going you actually want faster than that, and that requires heavier hardware. But it's all good for research and experimentation! Latency and speed become an optimization question once you know what you want to build.
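
As a rough illustration of the streaming approach described in the reply above (a minimal sketch, not the actual assistant code): it reads tokens from Ollama's local REST API as they are generated, buffers them into whole sentences, and only then hands them to a speak() placeholder that stands in for a real text-to-speech interface.

```python
# Minimal sketch of the streaming idea: read tokens from Ollama as they are
# generated, buffer them into whole sentences, and only then hand them to TTS.
# Assumes a local Ollama server on the default port 11434; speak() is a
# hypothetical stand-in for whatever speech interface you actually use.
import json
import urllib.request

def speak(sentence: str) -> None:
    # Hypothetical TTS hook; replace with your real text-to-speech call.
    print(f"[TTS] {sentence}")

payload = {"model": "tinyllama", "prompt": "Tell me a short story.", "stream": True}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

buffer = ""
with urllib.request.urlopen(req) as resp:
    for line in resp:                       # one JSON object per streamed chunk
        chunk = json.loads(line)
        buffer += chunk.get("response", "")
        # Naively flush complete sentences so the TTS intonation stays natural.
        while any(p in buffer for p in ".!?"):
            idx = min(i for i in (buffer.find(p) for p in ".!?") if i != -1)
            sentence, buffer = buffer[: idx + 1].strip(), buffer[idx + 1 :]
            if sentence:
                speak(sentence)
        if chunk.get("done"):
            break

if buffer.strip():
    speak(buffer.strip())                   # flush whatever trailing text is left
```

Flushing on sentence boundaries is the simplest granularity that keeps the intonation reasonable; flushing more often gets audio out sooner but tends to sound choppy.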
@okra3000
@okra3000 27 days ago
@DevXplaining Would you recommend Nvidia's Jetson Nano then? And thanks by the way, I appreciate the detailed response.