Self-Hosted AI That's Actually Useful

131,065 views

Days ago

Comments

@TechnoTim 5 months ago

Hey everyone! Thanks for watching and asking for the tutorial! I've just posted it on my new channel! Enjoy! kzbin.info/www/bejne/r6DdlmR_rcl1mq8

@QuizmasterLaw 8 days ago

nice build. i'm still searching for a good easy local rag and fml it appears to be nowhere on the internet. context model only goes so far and no way am i trusting google with my proprietary data.

@JoeyDee86 5 months ago

Are you going to release any how to’s for this? Preferably with you explaining what each step does rather than just going down a list of steps

@TechnoTim 5 months ago

Yes, coming soon on my Techno Tim Tinkers channel! Subscribe there to know when it's available!

@traxeonic3600 5 months ago

@@TechnoTim I'm surprised it will end up on Tinkers, given these videos would seem to hit your core main channel. Interesting.

@iamkiber 5 months ago

If not for reading this comment I would never have known about Tinkers

@espressomatic 5 months ago

@@TechnoTim Any chance that you might post your AI rig's hardware composition in this description before finishing up with the more detailed video on the other channel?

@TechnoTim 5 months ago

I can understand that but this tutorial might be close to 40 minutes long (or longer) 😅. Videos that long do not perform well and ultimately hurt the channel.

@TheMrDrMs 5 months ago

9:51 nah, I work for a company where we've been doing this since before "AI" was mainstream and the e2e models have not only helped accuracy but improved performance, even with our CPU workloads. It's been incredible to be working on this and seeing the sudden rapid development.

@VillSid 5 months ago

There are two features that I am especially looking forward to: a) Video text search: I have security cameras running Frigate NVR, which uses AI image recognition to trigger when a person enters an area, plus an audio AI model that listens for a fire alarm or breaking glass. They are working on implementing text search for video clips, so you could search for clips with a guy in a red jacket. b) Local audio transcription: I tested Whisper large models for transcribing non-English call recordings, and it works but it is sloooow. I ran out of time on Google Colaboratory. I saw that there is an optimized Whisper version that I can run locally on a Google Coral without a GPU, so I still need to test that one out. I would love to be able to search my calls.

@GuilhermeMirandaRaiol 1 month ago

You should try faster-whisper and insanely-fast-whisper. I got good performance in Portuguese even with smaller models, or you can try a fine-tuned model for your language; there are some people on Hugging Face who have already done that for various languages.

@prathamshenoy9840 5 months ago

One of the most useful tech videos of this year. Unlike some other channels that post so many videos but 95% of them are useless

@questionablecommands9423 5 months ago

I'm all about self-hosting these technologies. Ever since DALL-E hit the scene, I've been thinking that artists should train a model on their own art so if they get creatively stuck, they can "ask themselves" for inspiration.

@TechnoTim 5 months ago

That's an awesome idea. I really wish I knew more about training. Maybe soon!

@Breeegz 5 months ago

@@TechnoTim If you can train a dog... actually that's nothing like training an A.I.

@TechnoTim 5 months ago

On top of that one of my dogs still bites me!

@nicoschroder8692 5 months ago

Great video as always :) Would love to see a video about the hardware setup & requirements and some guidelines for which models to choose for different hardware configs

@2ndtlmining 5 months ago

Man this looks interesting, you gotta show us how you set this all up

@showmequick2245 5 months ago

I second this

@droneforfun5384 5 months ago

I approve this

@Nastalas 5 months ago

There is a HACS version of ollama support where you already can control your devices with it in Home Assistant

@tomthompson1198 20 days ago

dude. love it. plz focus on more content that runs locally and free from so many saas providers with their monthly fees. open source and locally run solutions must be the future!

@TechnoTim 20 days ago

@@tomthompson1198 🫡

@ewenchan1239 5 months ago

I've played with Ollama, the open-webui, a different open-webui, and Automatic1111. One of the models ended up needing about 40 GB of VRAM, so I had to use two 3090s to be able to have enough VRAM for the model. Pretty nifty though. Not perfect, but still fun to play with.

@Drkayb 5 months ago

You can also set up piper as a server, and just feed it text by curl (local or remote). Then it generates audio-files super quick. It can also be piped to stdout iirc if you don't need the files.

@TechnoTim 5 months ago

Thank you! I will look into how to connect this to HASS!

@Drkayb 5 months ago

@@TechnoTim I think the problem was that there is just a ton of overhead every time you run the executable file, so by keeping a server running, the .exe is "running" all the time.

@llortaton2834 5 months ago

glasses off? it's about to get serious!

@raul17533 5 months ago

Yeah. Nice try Tim AI

@robotics_and_stuff 1 month ago

Local AI TTS is available now, called F5-TTS

@The_Mup 5 months ago

1:45 - Third option: Let surf shark snoop on you. VPN providers are no more trustworthy than your mobile ISP. VPNs are for getting around region blocks, NOT for privacy.

@TechnoTim 5 months ago

They both have policies on logging, selling, sharing, and trading data... an ISP's policy is to do it; the policy of VPNs like this one is not to.

@lalalalelelele7961 2 months ago

VPNs are for creating encrypted tunnels for sensitive data. Not all privacy revolves around torrenting and hiding IPs.

@Nextrix 5 months ago

I wonder how well these tools work in an offline or no-internet VLAN. Most still tend to connect to third party domains/servers, and we have no clue what data is being sent when it does. I'm not ready to trust these yet. Would make a good video to showcase the endpoints they do try and connect to.

@jwr6796 5 months ago

What are the gpu requirements for all this? Are we talking a recent-enough gaming gpu like a 3060, or do you have to shell out for those enterprise cards with no video output?

@TechnoTim 5 months ago

3060 should work fine. Smaller models should fit fine!

@jwr6796 5 months ago

@@TechnoTim good to hear!

@Unselfless 5 months ago

What hardware are you using to run this?

@Solus-Regnator 5 months ago

this teaser was nice, where is the setup video ? :D

@abhijithabhi58 5 months ago

What are the hardware requirements ?

@jacobnollette85 5 months ago

that dances with wolves earned my thumb

@CharafEddineCHERAA 5 months ago

For anyone who's using Ollama, what's the minimum hardware needed to run a 70b model?

@krurschak2653 5 months ago

I would say an RTX 4090, but with a poor performance experience. For a GPT-like experience you will need something like 4x RTX 4090. But then you could deploy Mixtral 8x7B, which is a GPT-4 class LLM with good performance and context window.

@antaishizuku 5 months ago

I'd say 2 4090s, or a 4090 plus another Nvidia card like a 4060 or 3060. You will need about 40 GB of VRAM for decent quantization, but if you are willing to give up decent responses, go for about 30-ish. Just keep clear of the 2k quantization. The 3k is OK, with 4k being the standard. 8k/q is about the same as the full float 16 model but needs huge amounts of VRAM. Anyway, more VRAM/CUDA = better.

@antaishizuku 5 months ago

Phi3 14b 128k is really good and i heard good things about gemma 2 27b. Though overall im still a fan of llama3

@brandonmansfield4328 5 months ago

It varies since you can adjust the quantization to fit. For the big models (70b) I would suggest > 40 GB if you can swing it, and > 70 GB if you want to run 120b models. A pair of P40s off eBay isn't too bad to buy. Probably the best budget path presently.
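
The VRAM figures quoted in this thread can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes that memory is dominated by the weights (parameter count times bits per weight) and adds a flat 20% for KV cache and runtime buffers. That overhead figure and the function name are illustrative assumptions, not measurements from any particular runtime.

```python
def estimate_vram_gb(params_billion, bits_per_weight, overhead_fraction=0.2):
    """Very rough VRAM estimate in GB for running a quantized model.

    weights: 1e9 params * bits / 8 bits-per-byte = GB
    overhead_fraction: assumed flat allowance for KV cache and buffers
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead_fraction)


# Compare quantization levels for a 70B model
for bits in (2, 3, 4, 8, 16):
    print(f"70B at {bits}-bit: ~{estimate_vram_gb(70, bits):.1f} GB")
```

Under these assumptions a 70B model at 4-bit lands at roughly 42 GB and at 3-bit at roughly 31.5 GB, which lines up with the "about 40 GB" and "30-ish" figures above. Real usage depends on context length and the backend, so treat the numbers as a starting point only.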

@macthiswork3006 5 months ago

what is the project called that you use for the whisper webui?

@spaceco1 5 months ago

Awesome video. Would love to see a follow up video where you go over the hardware for inferencing these models. And what kind of performance changes you noticed when playing around with different components

@wyattarich 5 months ago

This seems to be covered in many other places, and it's almost entirely subject to the models you run. Hard to generalize such a thing. Google for Llama.cpp benchmarks and INT8 performance for GPUs.

@Mishanw 5 months ago

What kind of GPU are you using? I have a Dell R730, and I wanted to try putting a GPU in it to run Ollama. I really wish there was a low-power AI processor that we could plug into any device with sufficient RAM and run models effectively and efficiently at a relatively affordable cost.

@Jairoglyphx 5 months ago

We need info on the hardware setup! Like are Nvidia GPUs the only option or can we use NPUs in the newer Intel processors?

@brandonmansfield4328 5 months ago

NPU performance is going to be bound by memory bandwidth, and DDR5 isn't where you want to be. Soldered LPDDR5X is going to have much better memory bandwidth, and that will be when these chips start to get some reasonable performance. Lunar Lake and Zen 5 should both come in this configuration at some point.

@Techonsapevole 5 months ago

well done, local LLMs are the future

@tohur 5 months ago

I have pretty much been running local AI since the first open source models came out and have run plenty of backends. I'm now on Ollama and plan to stick with it, as it's the fastest backend I have run out of all of them. On Linux it's easy to run the models on AMD or Nvidia. I run 7b-13b models on my little ol' RX 6600 XT with ROCm and tbh it runs great. IMO, 7b-13b running locally is about all anyone needs; just have specialty models at the ready for different tasks, which Ollama makes easy af haha. The best feature of Ollama to me is having it set up to auto-unload models when not in use.
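
The auto-unload behavior mentioned above can also be controlled per request. A minimal sketch, assuming an Ollama server on its default local port and using the `keep_alive` field of the `/api/generate` endpoint; the helper names, the `llama3` model tag, and the prompt are illustrative only.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, keep_alive="5m"):
    """Build a POST request for Ollama's generate endpoint.

    keep_alive controls how long the model stays loaded after the call:
    a duration string such as "5m", or 0 to unload right away.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, keep_alive="5m"):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt, keep_alive)) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running server with the model pulled):
# print(generate("llama3", "Why is the sky blue?", keep_alive=0))
```

Passing `keep_alive=0` frees the VRAM as soon as the answer comes back, which helps when several specialty models share one small card; a longer duration avoids reload time between prompts.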

@amirzo12 2 months ago

What is the hardware stack you are using for your AI solution

@BCKammen 5 months ago

Ok, Tim, where is the guide for how to set this all up ? Especially the Home Assistant stuff....

@TechnoTim 5 months ago

Soon on my Techno Tim Tinkers channel!

@BCKammen 5 months ago

@@TechnoTim Standing by then......

@xythiera7255 5 months ago

If you don't have a really, really powerful GPU, it's not really possible in terms of usability. If you have to wait ages for something to happen, it's kind of pointless.

@TechnoTim 5 months ago

@xythiera7255 It really depends on the GPU, I will cover this in my tutorial!

@krurschak2653 5 months ago

@@xythiera7255 A 4090 is enough for llama3 8B. 4x 4090 or one A100 will work for the 70b version, or even for Mixtral 8x7B, which is nearly as good as GPT-4 and super fast :) But phi-3 and llama3 8B are really not that bad. They are better than GPT-3.5, so I see this as a good starting point. I would recommend waiting for new hardware like LLM-specific GPUs, because they can be much cheaper, like 1/4 of the price.

@underfluked3832 23 days ago

Bring your own model is a big thing with corporates, it’ll come.

@Squirrel4Gir 5 months ago

Love the vid. Please also try to include a notice to help these free models either via training or donations to accelerate their further development

@knutblaise9437 5 months ago

Curious if there is a self-hosted AI which could serve as a replacement for Grammarly? I recently noticed my Office 2016 had a new AI process running. From a privacy perspective I'd prefer not sharing my documents with organizations like MS/Google/Grammarly.

@brandonmansfield4328 5 months ago

You don't need a full AI for grammar. LanguageTool is self-hostable, and they have browser extensions you can configure for your local copy.

@raymondx137 5 months ago

Do you have a parts list and/or setup tutorial?

5 months ago

This is 12 minutes of pure gold, thank you very much. 😊

@Rodent007 5 months ago

Thank you, great video. I wish you would run through what hardware you run this on.

@TechnoTim 5 months ago

Thanks for the feedback. I have a video on it, it's my new All in One HomeLab server. More to come!

@jason-budney7624 5 months ago

Really cool video Tim! I've been wanting to play with some image to image "AI" stuff, but it's been hard to find much about it when self hosting is involved. I'll be poking around with the tools you mentioned to see if I can find something.

@DorZ1983 5 months ago

What is the UI that shows the app stack flow? Is it an actual app or just after effect?

@ezequieligomez2135 2 months ago

Do you know what's the most cost-effective GPU to get this done as I doubt it will work well generating images or processing PDF smoothly on a CPU?

@neponel 10 days ago

great little show case of what you can now do locally. continue to share this with us. perhaps specific tutorial on the mentioned things? tired of wasting my money on siloed saas products/services.

@TechnoTim 10 days ago

@@neponel tutorial is the pinned comment! Thank you!

@WMRamadan 5 months ago

I tried this a while back with an Nvidia RTX 3060 12GB and of course bigger models wouldn't load. Would using two GPUs help load bigger models by giving a combined memory of 24GB? Also, do you know if mixing GPUs works, for example having a 3060 12GB with a 4060 16GB to give a combined 28GB?

@xythiera7255 5 months ago

If you don't have at least a 3090, you are really limited. Yes, that exists, but you could also just buy a workstation card, which means insane costs. So if you really want to play with AI you need a 4090 because of the VRAM; it's the only real option other than going with an NVIDIA RTX 6000 for 6 grand and 48 GB of VRAM.

@WMRamadan 5 months ago

@@xythiera7255 I'm going for the cheapest option, If I can buy two 4060 16GB to have a combined 32GB of GPU memory then I will do that!
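
On the mixed-GPU question: runtimes such as llama.cpp can spread a model's layers over several cards, so the memory effectively adds up, though whether a given tool handles mismatched cards well is worth checking in its own documentation. The sketch below only illustrates the idea of dividing layers in proportion to each card's VRAM. The function name and the 80-layer figure are assumptions for illustration, not any runtime's actual algorithm.

```python
def split_layers(num_layers, vram_gb):
    """Divide num_layers across GPUs in proportion to their VRAM.

    vram_gb is a list with one entry per card. Returns a list of layer
    counts in the same order that always sums to num_layers.
    """
    total = sum(vram_gb)
    # Round each proportional share down first
    shares = [int(num_layers * v / total) for v in vram_gb]
    # Hand any leftover layers to the largest cards first
    leftover = num_layers - sum(shares)
    largest_first = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in largest_first[:leftover]:
        shares[i] += 1
    return shares


# The cards from the question: a 12 GB 3060 and a 16 GB 4060
print(split_layers(80, [12, 16]))
# Two identical 16 GB cards
print(split_layers(80, [16, 16]))
```

With a mixed pair the larger card simply takes more layers, so capacity is close to the 28 GB total, while generation speed tends to be held back by the slower card.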

@matthias3231 2 months ago

Hi! Nice video! Can you dive a bit deeper into how to set it up, what the drawbacks are, and the hardware requirements (CPU/GPU/disk space/...)? The positive things as well, but those are covered a lot here already. Thanks!

@TechnoTim 2 months ago

There’s a link in the description and pinned comment for the full tutorial

@_coderizon 4 months ago

What is the difference between Ollama with WebUI and LangChain for NLP tasks?

@alexjohansson328 5 months ago

Super awesome video - unique cutting edge I can't wait to give it a go

@Arthur-o2y 3 months ago

which rack is that? at 0:44

@angryox3102 5 months ago

You’ve just given me so many ideas. This is awesome.

@tchesnokovn 5 months ago

What’s the nocode workflow looking thing you are using?

@vaidkun 5 months ago

the thing with AI is that even if you are running it locally you need to get the training data from somewhere, so someone still has to give up their privacy :)

@TechnoTim 5 months ago

touché

@FreedomToRoam86 5 months ago

Very cool idea, the private search AI!

@ivlis32 5 months ago

HA Voice integration is, unfortunately, very strange. They insist on using HA "add-ons" for voice, which I really don't want because I do not use HAOS but deploy HA like any other container.

@dragonhunter2475 5 months ago

The addons are just docker containers, you can find them in the rhasspy git repo

@abudi45 5 months ago

It was a great video but U didn't show us how we can install it in our home lab 😢

@skelious 3 months ago

Great video thank you so much for the info. I am a completely new person to this space. (Boomer status front and center) But i am going to try to go all in on a self host scenario and try to have fun and learn including taking some python stuff to enhance my experience. Keep up the great work.

@TechnoTim 3 months ago

@@skelious thank you! It’s never too late!

@santiago69 5 months ago

Hello what is the name of the open source web based version of whisper that is mentioned please?

@user-ic6xf 5 months ago

I was so ready for you to do a video on this.

@l0gic23 5 months ago

Been waiting for this one. Let's go!

@andrewbennett5733 5 months ago

I've watched a few videos on people setting up AI like this, but this just has the perfect blend of information AND instruction. Your 230K follows should be more like 2.30M. Thanks for sharing so much good stuff!

@TechnoTim 5 months ago

Thank you so much! If you can believe it, it's actually more difficult to say less. I had to constantly remind myself to not ramble or go on side quests 😅. Thanks for noticing and a full tutorial will be coming soon on my other channel, @technotimtinkers

@andrewbennett5733 5 months ago

@@TechnoTim I get that! I used to be an educator and it's hard not to tell everyone you meet all of the facts you know, especially when it's stuff that excites you. For the record I would happily listen to all of the side quests haha. And how did I not know about your other channel??? HERE I GO

@TechnoTim 5 months ago

@@andrewbennett5733 Sometimes side quests are more fun than the main quest!

@andrewbennett5733 5 months ago

I need you to go the @JeffGeerling route and start a third channel for side quests 🤣

@TechnoTim 5 months ago

That's what Techno Tim Tinkers is for ;)

@Squirrel4Gir 5 months ago

Gonna need a video of whisper. Also any chance it can be integrated into Plex drafting subtitles

@SyedZainUlHasan 5 months ago

What are the system specs?

@gemargordon6885 5 months ago

I’m loving Gemini for sure! It’s a bit better than llama or ChatGPT.

@FatalSkeptic 5 months ago

haven't been able to get Home Assistant to give me any data back from AI agents, so frustrating

@truckerallikatuk 5 months ago

Why do so many services go with such odd names? Like Sear XNG, which is how I'd pronounce it, not search NG. That's how it's written after all.

@benhillard919 5 months ago

I think in the area it comes out of the "x" makes a "ch" sound.

@TechnoTim 5 months ago

@@benhillard919 I think so too, and I totally guessed so I hope that's how it's pronounced! Also, now that I see it again, it might be "searching". 🤣

@eliaskallelindholm8339 5 months ago

This is the first time I have done something Techno Tim is showing before he did show it :D

@TechnoTim 5 months ago

Ha! It took a while for me to build, integrate, and actually evaluate all of these systems!

@eliaskallelindholm8339 5 months ago

Did you try the 70B model from Llama? (I saw you only used the 8B model.) I read some stuff about running this with 2 RTX 4070s or an Ada 6000, but sadly I don't have the hardware to run that purely on graphics cards yet. The results should be even better than the paid ChatGPT stuff.

@eliaskallelindholm8339 5 months ago

RTX 4090 with 24GB VRam I mean.

@OvernightSuccess721 5 months ago

This is Tim’s evil twin brother NoTechTim. Insert Travolta meme looking for the tech.

@TechnoTim 5 months ago

TechNOTim 😂

@lakshaynz 4 months ago

Thank you 😊

@coletraintechgames2932 5 months ago

I'm ready for the how-to! I have messed with it and have something running, but these features look awesome!

@TechnoTim 5 months ago

Soon on my other channel!

@huseinnashr 5 months ago

You have other channel?

@dcoidua 5 months ago

Would this all run well on a 4090?

@brandonmansfield4328 5 months ago

The bigger models need more vram than a single 4090 provides. You can run the smaller models just fine. You will lose out on some performance the bigger models provide but it runs!

@koevoet7288 5 months ago

You can run homeassistant faster whisper on gpu, ive been doing it for months. I’ve got a dockerfile for this, lmk if you want it

@TechnoTim 5 months ago

Thank you! I found a forked version of wyoming whisper but it didn't seem to help. I figured I'd wait for the official one to get updated.

@koevoet7288 5 months ago

@@TechnoTim I’m also using someones fork, don’t remember if i changed it in any way but its running perfectly on my quadro p2000

@dhmybiker5034 5 months ago

How to define a graphics card on Docker in Ubuntu

@TeambitDK 5 months ago

This was really interesting, now I want to build it :D

@TheJoaolyraaraujo 5 months ago

Mac Whisper is amazing

@TechnoTim 5 months ago

100% agree! I bought it for better models and they work even better for scripted talks (like this). It's so accurate!

@OGH3294 5 months ago

Can I do these things with a 4060 TI 16Gb version ?

@TechnoTim 5 months ago

Yes, just use smaller models.

@xythiera7255 5 months ago

you can but it will be really slow

@OGH3294 5 months ago

Ok. Plan dropped . I will just keep watching TechnoTim 😁.

@jensodotnet 5 months ago

I currently run two 1070 (8gb), while a little slow it works fine, but for image generation you would need more vram, 8b llm models works fine on single 8gb vram. A 3090 is much faster and does images very well and can run larger models. imho integrating search had bigger impact than using a larger model of the same type(not tested 70b)

@sree_nath 5 months ago

Love your videos, even though there are plenty of how to videos on these topics, I would love to hear it with your mesmerizing voice 😊

@TechnoTim 5 months ago

🥰. thank you! Audio in this old wooden / plaster room is hard, so hopefully it sounds ok!

@showmequick2245 5 months ago

Nice, welcome to Minnesota btw 😂

@itaco8066 5 months ago

Great video! ❤

@Rohinthas 5 months ago

I am generally skeptical of the AI hype but your way of going about it has piqued my interest. Hope more in-depth guides on setup and hardware are coming, subscribed ;)

@Rohinthas 5 months ago

Ah I just found your homelab video! That answers some questions!

@tendosingh5682 2 months ago

So power hungry the good AIs Gpus are.

@TechnoTim 2 months ago

@@tendosingh5682 for sure.

@droneforfun5384 5 months ago

Just subscribed for the upcoming guides on local AI 😃🥰😎

@TechnoTim 5 months ago

@@droneforfun5384 soon!!!

@djstraylight 5 months ago

I see a future video of you building a dedicated AI server with multiple GPUs and benchmarking the tokens per second depending on the setup. It would get many views from r/LocalLLM or r/LocalLLaMA groups for sure.

@TechnoTim 5 months ago

Thanks! Sounds awesome! I am always hesitant to share my content on subreddits other than my own, but if you feel this is worthy of it feel free to!

@Mrtrunks 5 months ago

Glass off so we don’t see that DeskPi

@HenryBiglin 5 months ago

Damn, you just sent me down a rabbit hole.. lol

@famousartguymeme 5 months ago

this is awesome!

@voodoochild420ai 5 months ago

nice vid

@TheRowie75 5 months ago

Surfshark privacy??? Open Source?

@yewbacca 5 months ago

What happened to Tim? Who is this imposter?

@TechnoTim 5 months ago

🤓

@hamdibougattaya 5 months ago

That's awesome, I like ur vids...

@alexey_sychev 5 months ago

Sure, electricity is free nowadays

@TechnoTim 5 months ago

It uses a lot less power than a gaming machine since you only use it in spurts, nothing new here, just shifting the workload that's using the card.

@ClayBellBrews 3 months ago

I was promised cookies!!!!

@romayojr 5 months ago

you forgot to mention the script for this video was made by AI 🤖

@TechnoTim 5 months ago

Ha! Nope, 100% me! Bad grammar, bad jokes, stutters were all compliments of HI (Human Intelligence)

@romayojr 5 months ago

@@TechnoTim i love AI but HI will always win my heart. but seriously, thanks for this video, i've been waiting for this one. now i need to integrate more stuff to my open webui!