Stay up-to-date on the latest AI research with my newsletter! → mail.bycloud.ai/ Minor correction: GGUF is not the predecessor to GGML; GGUF is the successor to GGML. (thanks to danielmadstv)
@Sebastian-oz1lj 6 months ago
Please make a step-by-step guide on how to install Mistral-7B locally and privately. I'm trying to do this with multiple guides, and every time I get stuck on something.
@danielmadstv 9 months ago
Thanks for the video! Minor correction: GGUF is not the predecessor to GGML, GGUF is the successor to GGML.
@ambinintsoahasina 9 months ago
The amount of info you give both in the videos and the descriptions is insane, dude! Keep up the good work!
@noobicorn_gamer 9 months ago
I hooonestly don't know how to feel about the thumbnails looking so similar to you-know-who's, which got me accidentally clicking this video, but meh... one's gotta do what one's gotta do, I guess.
@EdissonReinozo 9 months ago
Same
@Dedjkeorrn42 9 months ago
I don't know who. Who?
@nathanfrandon2798 9 months ago
@@Dedjkeorrn42 Fireship
@NIkolla13 9 months ago
Bycloud removed the frame and the grid background from his thumbnails. I think those work great as his signature style; I hope he keeps them.
@seanrodrigues8184 9 months ago
Let's just hope he doesn't get _burned~_
@flexoo7 9 months ago
Poor Faraday nearly always gets overlooked when people talk about local LLMs, but it is without a doubt the easiest "install and run" solution. Unlike nearly all other options, it's near-impossible to mess something up, and the default out-of-the-box settings are not sub-par.
@hablalabiblia 9 months ago
How much is Faraday?
@flexoo7 8 months ago
@@hablalabiblia Like all the best things in life: it's free.
@joure.v 8 months ago
@@hablalabiblia It's free and very easy to use! It's really meant just for chatting; it's basically a SillyTavern kind of app, just without as many options, but it has its own backend with a focus on GGML models. If you're looking to just run models through character cards, I'd say give it a go!
@Elegant-Capybara 5 months ago
Faraday has outdated models, and whenever you download models you have to fumble with model cards and directory structures; plus it's not as fast as other options. LM Studio is better than Faraday.
@jbol2454 3 months ago
@@Elegant-Capybara LM Studio is closed source... no thanks.
@RetroPolly 9 months ago
A thousand thanks! Finding a good LLM was a complete nightmare for me, and it was difficult to figure out which formats are outdated and which are the new hot stuff.
@Leo_Aqua 9 months ago
You can also use Ollama. It even runs on a Raspberry Pi 5 (although slowly).
@siddhubhai2508 4 months ago
Yeah, you're right, Ollama can even run on the Raspberry Pi 5. But don't forget that Ollama is made for running local LLMs, and if you try to run local LLMs like Llama 3 or DeepSeek on one, be ready for the FBI at your home catching you for building an unknown b*mb. Important life lesson: FIRST TRY, THEN CRY. GOOD LUCK! 💣
@NickH-o5l 3 months ago
I got Gemma 2B running on my end. I got faster tokens per second with a really small model from Alibaba (yes, it's biased) with 0.5B parameters; if you prompt it right, maybe there's some use case. But it's kinda dumb.
@siddhubhai2508 3 months ago
@@NickH-o5l Low parameters = Low accuracy 👍
@아잉뀨잉뀨잉-u5q 4 days ago
I've been struggling with this issue for a few months, and it seems this video already had the answer more than half a year ago. Thank you for your awesome vid! I really love your work!
@Veptis 9 months ago
Now we just need a cheap inference card with 128GB of memory to run 70B models locally... Maybe we can hope for Qualcomm.
@cbuchner1 9 months ago
I’d love to see AI inference accelerator cards with dual or quad channel DIMM slots.
@Veptis 9 months ago
@@cbuchner1 The Qualcomm AI 100 Ultra is using LPDDR5.
@nyxilos9167 9 months ago
Groq is using something of the sort, an LPU, although it's only usable through an API. No consumer cards yet that I know of, but it shows the trend toward them.
@Veptis 9 months ago
@@nyxilos9167 You can buy a single Groq card right now. It costs $21k and has 230MB on board, so to run 70B models at fp16 you need something like 572 cards... which is several racks: $14+ million to buy and 30kW to power. It will run the model at 400 tok/s easily. You can buy a ready-made 8x H100 box for maybe $350k and run it at around 8kW, and it might be slower than the Groq setup. None of these are consumer solutions. The one I'm hoping for is the Qualcomm AI 100 Ultra, which comes with 128GB of LPDDR5 at 150W. They say it's for edge inference, but it would be perfect for a workstation.
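The card count above is easy to sanity-check with back-of-envelope math. A rough sketch, counting only the bytes needed to hold the weights (no activations, KV cache, or per-card replication, which is presumably where the quoted 572 differs from the naive division):

```python
import math

# Memory to hold just the model weights, in GB.
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

# Minimum cards needed if weights are simply sharded across them.
def cards_needed(model_gb: float, card_gb: float) -> int:
    return math.ceil(model_gb / card_gb)

fp16_70b = weight_memory_gb(70e9, 2.0)   # fp16 = 2 bytes per parameter
print(fp16_70b)                          # 140.0 (GB)
print(cards_needed(fp16_70b, 0.230))     # 609 cards of 230 MB each
```

The naive split gives ~609 cards of 230 MB, the same ballpark as the comment's 572.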
@Vifnis 9 months ago
idk, Qualcomm SoCs are mostly for phones... maybe the iPhone 30 will have it XD
@Paulo-ut1li 9 months ago
Boy, Chat with RTX is my personal oracle from now on. Its RAG really indexes local documents without all the hallucination of previous tools.
@chevalier5691 5 months ago
I don't get these complaints about the thumbnails. Are you guys new to YouTube? We have been through the era of fake or NSFW thumbnails, and yet you're still complaining about a similar style? If you're not willing to check the uploader's channel name or profile, then enjoy getting scammed by phishing links online.
@NeostormXLMAX 8 months ago
Anyway, this video was very helpful, because no one made it clear which front-end interfaces are best to install. I kept trying to make one myself to no avail, and gave up after a while of testing stuff in the command prompt.
@bossdaily5575 9 months ago
Nice video! Can you do a video about fine tuning a model?
@papakamirneron2514 9 months ago
Immensely helpful video. I hope the future has tonnes of user-controlled, locally run LLMs in store for us!
@remboldt03 9 months ago
Stup osing Fireship thumbnails😭
@idk-dk7bq 5 months ago
Y
@aouyiu 4 months ago
Stop neglecting proofreading comments 😭
@remboldt03 4 months ago
@@aouyiu I apologize. I normally proofread all my comments, but I suspect that I was drunk while writing this one. As I don’t like editing comments afterwards, I didn’t change the spelling mistakes.
@HPTRUE 2 months ago
Never heard of fireship....
@juanantonionieblafigueroa377 8 months ago
Your videos are way more fun than my algebra homework
@robertmazurowski5974 7 months ago
I was pretty sure this was a Fireship video, but the video is great and informative. Exactly what I was looking for.
@magfal 9 months ago
The one thing I hope to see soon is offloading different layers to different GPUs. I have a 4090 Mobile in my laptop and an RX 6800 in my eGPU. I also have 96GB of system memory in addition to these two 16GB cards, so I can do some fun stuff already.
@H1kari_1 9 months ago
I love your ADHD-friendly edits, cloudy.
@the_gobbo 9 months ago
I can finally start my side project to take over the world, thanks!
@johnsarvosky533 8 months ago
Thanks, this is great. Please make a comprehensive video on fine-tuning locally 101. Cheers!
@lunadelinte 9 months ago
that was awesome, thanks for the concise information bycloud! 🔥
@jawbone1218 9 months ago
Curious headcount 🙋 How many of us watching these types of videos are not developers?
@proflead 3 months ago
A video about fine tuning a model would be nice!
@a.........1._.2..__..._.....__ 8 months ago
I've been ham-fisting my way through LLMs for over a year, just ramming squares into circles till it worked, since information is so sporadic. 100% checking out your other videos; I learned more in 5 minutes than in 4 hours of reading GitHub docs.
@trolik9113 7 months ago
Absolutely fantastic and informative video. Well done! I will say the information speaks to the grip OpenAI has, especially from a development standpoint, despite the whole video being about open-source models. The procedures, time, research, and money required for any rando or small (even mid-size) business owner to integrate open-source, local AI without practical knowledge make it near impossible. OpenAI wraps RAG, "fine-tuning", and memory up nice and neat into Assistants, which can easily be called via the API. It would be amazing to have a completely standardized system that allows for the same type of application but geared toward the variety of open-source models out there. Some platforms like NatDev let you compare multiple models on the same input; being able to see how RAG and fine-tuning affect different models, both open-source and not, from the same platform would be unreal.
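The RAG workflow this comment describes can be sketched in a few lines: retrieve the most relevant snippet from local documents, then paste it into the prompt. This toy uses plain word overlap where a real system would use embeddings and a vector store, and the `docs` entries are invented stand-ins for indexed local files:

```python
import re

# Toy retrieval-augmented generation (RAG) flow: word overlap stands in
# for the embedding similarity search a real system would use.
def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, chunk: str) -> int:
    return len(tokens(query) & tokens(chunk))  # shared-word count

def retrieve(query: str, chunks: list[str]) -> str:
    return max(chunks, key=lambda c: score(query, c))

docs = [  # invented stand-ins for indexed local documents
    "GGUF is the successor to GGML and is used by llama.cpp.",
    "EXL2 targets fast GPU inference with flexible quantization.",
    "LoRA fine-tuning trains small adapter matrices on a frozen model.",
]
question = "which format is the successor to GGML"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)  # the GGUF/GGML snippet wins on word overlap
```

The assembled `prompt` is what would be sent to the local model; grounding the answer in retrieved text is what keeps hallucination down.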
@pedrogorilla483 9 months ago
Where ollama?
@sZenji 9 months ago
Agreed; with the new Windows installer it's so easy for everyone to run local models.
@4.0.4 9 months ago
For a while it was Mac-only, so it saw limited use among most AI folks, who have Nvidia cards. If you're stuck on a Mac, I hear it's really the better option there.
@zikwin 9 months ago
wow, now it supports Windows too? @@sZenji
@babbagebrassworks4278 9 months ago
I use it on my Raspberry Pi 5 to run LLMs, which is seriously cool, er, hot when working.
@lintalyor6535 5 months ago
I like this simple explanation with the video editing, thanks!
@WINTERMUTE_AI 9 months ago
LM Studio and Trinity 1.2 are my favorite non-GPT entities!
@vladislava5237 9 months ago
Very nice, tons of useful info. Thank you!
@plagiats 9 months ago
Ollama + Open WebUI is the way to go. Same UI as ChatGPT, plenty of convenient functions. It's a no-brainer.
@NoidoDev 4 months ago
The important part for me is accessing it from the CLI or Python, ideally doing the whole configuration there, because I need it automated (no Node.js, of course).
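For that kind of automation, a minimal sketch of driving a local model from Python is below. It assumes an Ollama server on its default localhost:11434 with the `/api/generate` route; check the Ollama API docs for your version before relying on the field names. Only the payload construction runs here; `ask()` needs a live server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks for one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server (must be running)."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_payload("mistral", "Why is the sky blue?")
print(payload)
```

Everything is stdlib, so it drops straight into a cron job or any Python pipeline with no Node.js in sight.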
@Kevin.Kawchak 6 months ago
Thank you for the discussion.
@WINTERMUTE_AI 9 months ago
I keep canceling my GPT4 subscription and then renewing it... 'Just when I thought I was out, they pull me back in.' GPT4 reminded me of that phrase from The Godfather. :)
@DanielHayes-p2u 2 months ago
The thumbnail style is just like Fireship's.
@Anthonyg5005 9 months ago
EXL2 does support AMD GPUs; Turbo bought a couple just to make sure it runs with ROCm.
@hyposlasher 9 months ago
2:17 Bro lives in the future where M4 is already released
@artursvancans9702 9 months ago
You pay $20 for convenience. Spending a day to set up the flow, waiting 2 minutes for your model to load every time you have a quick question, your GPU + CPU setting your room on fire because of how hot they run... Unless you have some really specific use case that cloud models censor, it's just easier to pay those $20 for instant access.
@thatguyalex2835 8 months ago
Patience is a virtue. I got Mistral 7B running on a 2018 laptop, and it takes two minutes to respond, but it works well. Why have 8 GB of RAM if I don't use all 8 GB? The AI uses all my RAM. :) But for people who have to use AI for a job, $20 is cheap, and workplaces cover the cost. For AI at home, a fast enough computer can work.
@fennecthechoosenone5189 9 months ago
KoboldCpp crying in the corner.
@rotors_taker_0h 9 months ago
Basically, to understand this video, one should already know everything mentioned in it by heart.
@Trahloc 9 months ago
Eh, it provides terms to hunt for, and sometimes that's all someone needs: a starting point. The video is short and covers a lot of ground.
@MonkeeGeenyuss 9 months ago
Dude wants a 16 part lecture to explain it all😂
@rotors_taker_0h 9 months ago
@@MonkeeGeenyuss I mean, I can only follow because I already know it all; I can't imagine someone unfamiliar understanding anything from this firehose, lol.
@fra4897 9 months ago
What about Ollama as a backend? What is your take on that? Thank you so much for the video; sending love from Switzerland.
@zyxwvutsrqponmlkh 9 months ago
I just really, really like how many serious people have to say "oobabooga". It's almost as good a joke on science as when that guy named the seventh planet.
@valeriapadilla5860 8 months ago
Hope this works better than the time I tried to download more RAM
@shoddits2156 9 months ago
Does the giveaway have country restrictions? I mean, maybe you can't send it overseas due to shipping costs or something else.
@krzysztofmaliszewski2589 9 months ago
That's a great question.
@aketo8082 7 months ago
Thank you, very interesting. Is it possible to work with your own files in LM Studio? Or to create your own LLM, or extend an LLM for your own use cases?
@exaltedjoseph7963 8 months ago
Your thumbnail reminds me of fireship
@itisallaboutspeed 9 months ago
as a car content creator i approve this video
@mzafarr 6 months ago
Please make a video about making our locally running LLMs available for others to use, maybe as our own API that people can call, or a web UI for accessing our local LLM.
@tutacat 1 month ago
You don't need fine-tuning; just do more prompting.
@dipereira0123 2 months ago
To be fair, at 8:53, the 10 bucks you'd be "saving" by running your LLM locally instead of paying for GitHub Copilot will probably show up in your energy bill (your GPU will be working at max capacity), and let's not talk about the time it takes to set it all up. Unfortunately, the AI revolution is something that will stay in the hands of big corps.
@abdelkaioumbouaicha 9 months ago
📝 Summary of Key Points: 📌 The video discusses the landscape of AI services in 2024, highlighting the abundance of hiring freezes and the prevalence of subscription-based AI services. 🧐 Various user interfaces for running AI chatbots and language models locally are explored, including Oobabooga, SillyTavern, LM Studio, and Axolotl. 🚀 The importance of choosing the right model format, understanding context length, and utilizing CPU offloading for running local language models efficiently is emphasized. 💡 Additional Insights and Observations: 💬 "Garbage in, garbage out" is a crucial principle highlighted when fine-tuning AI models, emphasizing the significance of quality training data. 📊 Different model formats like GGUF, AWQ, and EXL2 are explained, showcasing how they optimize model size and performance. 📣 Concluding Remarks: The video provides a comprehensive guide on running AI chatbots and language models locally, emphasizing the importance of model selection, context length, and fine-tuning techniques. Understanding these key aspects can help individuals navigate the AI landscape effectively and optimize performance while saving costs. Generated using TalkBud
@Afro__Joe 4 months ago
Lol, nobody reads anymore, judging by these comments. Ooh, shiny picture, click! Thanks for the info; I was looking for a video like this yesterday.
@seasong7655 8 months ago
I'm glad I avoided the $20 subscription by buying a $500 GPU and $100 of RAM.
@rougeseventeen 6 months ago
Thanks, this video is very funny and helpful!
@fmachine86 7 months ago
You're not fireship, cut it out.
@fmachine86 6 months ago
@@MusicalMessagges Then he should make his own videos and not try to copy someone else.
@moresignal 6 months ago
Yep, when you rip off the style of Fireship with your title card you're tricking our brains into thinking we're going to see his content. Develop your own style.
@MusicalMessagges 6 months ago
@@fmachine86 In my defense, I realized it was copying Fireship after I typed the message and didn't realize I'd sent it.
@fmachine86 6 months ago
@@MusicalMessagges Saul Goodman.
@bungamawar-i8m 6 months ago
Yeah he's better
@siddhubhai2508 4 months ago
Oh Fireship's second hidden channel! 😂😂
@u13e12 6 months ago
Just to clarify, then: for inference speed, GDDR6 beats GDDR5, but for fine-tuning, having 2x the amount of GDDR5 beats the faster GDDR6?
@bigglyguy8429 9 months ago
How did you miss Faraday? Very easy to use and runs faster than LM Studio
@rumali_roti7406 4 months ago
You added models in the description but didn't specify their usage. Can you add more details, please?
@alfamari7675 3 months ago
So according to the description, Llama 3 killed DeepSeek Coder, Wizard, and Mistral? I just started getting into this stuff recently, and those were some of the top-performing models I'd heard about (though they existed before Llama 3).
@LinkEX 4 months ago
1:32 I typed "i am new to github" into my search bar, and sure enough, the autocompletion suggested the thread title. Came for the replies, which were tamer and fewer than I had expected. I initially thought this was an older image meme and you had merely reused the screenshot, but since the original post was in fact posted 5 months ago (like this video), and the screenshot was taken 15 minutes after the post, I conclude you probably frequent r/github.
@NeostormXLMAX 8 months ago
I spent so much time trying to get something like this set up but ended up back at GPT. Most of these models are also censored, just like GPT; unlike GPT, they are much slower, and on top of that they cannot use plugins or special APIs that let you access the internet or generate images, etc. It's sad, but currently GPT has no peer.
8 months ago
So, AI is the new computer; everyone will have one? Seems good to me. I wonder what the job market will be like. Hardware will be on top for sure, and OpenAI will still be a giant. The question is how other industries will be affected.
@alan_yong 9 months ago
🎯 Key Takeaways for quick navigation: 00:28 *🤖 Running AI chatbots and LLMs locally provides flexibility and avoids subscription costs.* 00:43 *📊 Choosing the right user interface (UI) for local AI model usage is crucial, depending on individual needs.* 02:05 *🖥️ Oobabooga is a versatile UI choice for running AI models locally, supported across various operating systems and hardware.* 02:33 *💡 Installing Oobabooga enables access to free and open-source models on Hugging Face, simplifying the model selection process.* 05:18 *🤔 Context length is crucial for AI models' effectiveness, affecting their ability to process prompts accurately.* 06:12 *⚙️ CPU offloading allows running large models even with limited VRAM, leveraging CPU and system RAM resources.* 06:52 *🚀 Hardware-acceleration frameworks like the vLLM inference engine and TensorRT-LLM enhance model inference speed significantly.* 07:36 *🎓 Fine-tuning models with tools like QLoRA enables customization for specific tasks, enhancing AI capabilities.* 08:47 *💰 Running local LLMs offers cost-saving benefits and customization options, making it an attractive option in the AI landscape.* Made with HARPA AI
@Paulo-ut1li 9 months ago
Please make a video on how to fine tune a model using local documents.
@4.0.4 9 months ago
Dunno why my comment isn't going through, but try Kobold! Better for GGUF. Current fav is "Crunchy Onion" Q4_K_M GGUF. Give it a taste! 10 t/s on a 3090, and pretty smart.
@violet-trash 6 months ago
Kind of sucks that the GPU brand that works best with AI is the one that skimps on VRAM. 💀
@dungeon4971 9 months ago
what about ollama
@kernsanders3973 9 months ago
In regards to context, would LLM LoRAs help with that? Let's say I'm busy with a story-writer LLM, and the fantasy world I'm working with is as big as Middle-earth from LOTR. Would a LoRA help with that, like if I trained a LoRA on all our past chat history about the story, plus more text covering the lore of places, the history of characters, and family trees? Taking that into consideration, would it assist in keeping the context low, so I don't need to keep a detailed summarized chat history? What would be the requirements for training such a LoRA, and what is the minimum text dataset required for coherent training?
@squfucs 9 months ago
I run LM Studio and I think it's great. Good video, my dude.
@Cergorach 8 months ago
I was highly disappointed, Shōji Kawamori and Kazutaka Miyatake are not on the panel about Transformers... ;)
@bibr2393 9 months ago
>this model list
@cristianionascu 8 months ago
I guess my machine is not good enough (a 2019 Intel iMac), because running any model locally usually lags way behind ChatGPT 3, Gemini, Perplexity, etc.
@GraveUypo 5 months ago
Running LMs on Linux and Windows: for some reason (unknown to me), Linux is over 5 times as fast as Windows at prompt evaluation. It's not even close.
@samuelpeery 5 months ago
Total newbie at running an LLM locally. What is the best LLM for summarizing books and being able to ask questions about them?
@hardik4942 3 months ago
What's the best for investigation and data analysis?
@Zonca2 9 months ago
Wish you made a more down-to-earth guide on how best to chat up waifus in SillyTavern. The community is super small for what you can achieve with minimal knowledge; running something like Noromaid on Google Colab, for completely free and uncensored roleplay, needs to be more well known. Plus, I don't really know my way around the different settings and models, and I'm having a hard time getting the waifus to put in more dialogue over descriptions, for example.
@Quell__ 9 months ago
Jesus, I truly hate how intertwined the AI community is with the anime community.
@kernsanders3973 9 months ago
Mistral's deal with MS was a huge blow to the open-source AI community; the future is looking very corporate-controlled...
@AleNovelasLigeras 9 months ago
We're gonna need a bigger GPU
@Saeed_al-moumen 6 months ago
My brain hurts (I only reached 4:08). I just watched the video to see if there was anything I needed to know about SillyTavern, since that's what I searched for, but I don't think there's any more.
@gregNFL 7 months ago
Stanford's open-source LLaMA model is free. 🎉
@joseph-ianex 8 months ago
With local models, are you able to get much longer responses, given that you have enough RAM and VRAM?
@harrymahon4 8 months ago
We love LM Studio 😫
@kingki1953 6 months ago
I thought it was a video from Fireship 😂
@RedOneM 9 months ago
What 3 models do you recommend with 24 GB of VRAM? Preferably 21-22 GB of the 24 GB in practical usage.
@nyxilos9167 9 months ago
Hugging Face lists models with their respective memory requirements. Any 7B model will likely work very well and be under 21 GB. You could also go with a bigger model at a lower quantization. Mistral models are among the most popular, open source, and very competitive.
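A rough rule of thumb for reading those model cards, as a sketch: it counts weights only, and real usage adds roughly 10-20% on top for the KV cache and runtime overhead.

```python
# Approximate VRAM needed for model weights at a given quantization level.
def weights_gb(n_params_b: float, bits_per_weight: float) -> float:
    """n_params_b: parameters in billions; bits_per_weight: avg bits per weight."""
    return n_params_b * bits_per_weight / 8  # e.g. 7B at 16-bit -> 14 GB

for name, params, bits in [("7B fp16", 7, 16), ("7B Q4_K_M", 7, 4.5),
                           ("13B Q4_K_M", 13, 4.5), ("34B Q4_K_M", 34, 4.5)]:
    print(f"{name}: ~{weights_gb(params, bits):.1f} GB")
```

At ~4.5 bits per weight (a common Q4_K_M figure), even a 34B model's weights come in around 19 GB, which is why 24 GB cards are the sweet spot commenters keep mentioning.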
@Necessarius 9 months ago
Fireship thumbnail is working for me
@keffbarn 7 months ago
OOOGABOOOOGAAAAH 💪😎🍺
@zzador 4 months ago
Damn... I got my 2015 laptop ready with 4GB of RAM and BAM: every model needs at least 8GB of RAM. What a bummer.
@christopheralvarez1090 9 months ago
Where do I upload the photo once GTC comes around ?
@Frab1985 9 months ago
The best RP model atm is Kunoichi v2.
@AshishKumar-kv4hr 9 months ago
Are you the same as fireship?
@swaggitypigfig8413 9 months ago
Different human being.
@I_SEE_RED 9 months ago
it’s fireship experimenting with 100% channel automation
@XxXVideoVeiwerXxX 9 months ago
silly tarven....he used AI voice for this
@NatalyOrtíz-z3m 8 months ago
Good to know.
@felipetesta 3 months ago
Is there any way I can set up a local AI that can access PDF files from my university folder and help me summarize and introduce the topics I have to study, using the PDFs as the primary source of content?
@tja9212 9 months ago
Timecode 1:18 is a very questionable use of footage.
@Evagoesbrrr 9 months ago
Next step: chub chars. Next step: writing your own bots. Next step: quantizing your own models. Next step: uncensoring models with the ability to update context. Next step: moooom, I need money for new NASA servers! 256k context is too small! My waifu is getting stupid!
@dustindustir521 9 months ago
Step 4 is clear, but how can I unlock step 3? I only see question marks. Do I have to do steps 1 and 2 to unlock what I have to do at step 3, or do I just need to gain more XP for the unlock? Maybe I just have to do step 4 twice to make up for the missing third step...
@knoopx 9 months ago
LM Studio/Ollama are probably the simplest ways to get started; not sure why you picked these ones.
@FlafyDev 9 months ago
this isn't fireship.. where am I?
@myname-mz3lo 8 months ago
Same. The thumbnail got me, and then I realized this guy took Fireship's entire style.