Replace GitHub Copilot with a Local LLM

123,381 views

Matthew Grdinic

3 months ago

If you're a coder, you may have heard of or may already be using GitHub Copilot. Recent advances have made running your own LLM for code completions and chat not only possible, but in many ways superior to paid services. In this video you'll see how easy it is to set up and what it's like to use!
Please note that while I'm incredibly lucky to have a higher-end MacBook and a 4090, you do not need such high-end hardware to use local LLMs. Everything shown in this video is free, so you've got nothing to lose by trying it out yourself!
LM Studio - lmstudio.ai/
Continue - continue.dev/docs/intro

Comments: 198
@toofaeded5260
@toofaeded5260 3 ай бұрын
Might want to clarify for potential buyers of new computers that there is a difference between RAM and VRAM. You need lots of VRAM on your graphics card if you want to use "GPU Offload" in the software, which makes it run significantly faster than using your CPU and system RAM for the same task. Great video though.
@chevon5707
@chevon5707 3 ай бұрын
On Macs the RAM is shared.
@MeisterAlucard
@MeisterAlucard 3 ай бұрын
Is it possible to do a partial GPU offload, or is it all or nothing?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Yup. In LM Studio on PC you set the number of layers; the higher the number, the more GPU resources are used. On ARM Macs, yes, RAM is shared, and your option is to enable Metal or not.
@JacobSnover
@JacobSnover 3 ай бұрын
​@@matthewgrdinic1238 I always enable Metal 🤘
@alexdubois6585
@alexdubois6585 3 ай бұрын
And also for that reason: you either pay for Copilot (and give your data) or you pay for VRAM... not really free. Return on investment might be a bit faster with a discrete GPU... which you can upgrade...
@RShakes
@RShakes 3 ай бұрын
Your channel is going to blow up and you deserve it! Fantastic info, concise, and it even gave me hints about things I may not have known, like the LM Studio UI hint about Full GPU Offload. Also, interesting take on paying for cloud spellcheck - I'd agree with you!
@RobertLugg
@RobertLugg 3 ай бұрын
Your last question was amazing. Never thought about it that way.
@programming8339
@programming8339 3 ай бұрын
A lot of great knowledge compressed in this 5 min video. Thank you!
@levvayner4509
@levvayner4509 3 ай бұрын
Excellent work. I was planning to write my own VS Code extension, but you just saved me a great deal of time. Thank you!
@mikee2765
@mikee2765 2 ай бұрын
Clear, concise explanation of the pros/cons of using a local LLM for code assist.
@phobosmoon4643
@phobosmoon4643 3 ай бұрын
3:15 dude, you illustrated the 'break tasks into smaller steps and then build up' thing PERFECTLY. Well done! It's surprisingly hard to do this. I think a lot about how to do it programmatically.
@Gabriel-iq6ug
@Gabriel-iq6ug 3 ай бұрын
So much knowledge compressed into only 5 minutes. Great job! I will give it a try to see if it's possible to make it faster on Apple silicon laptops using MLX.
@Aegilops
@Aegilops 2 ай бұрын
Hey Matthew. First video of yours that YouTube recommended, and I liked and subbed. I tried Ollama with a downloaded model and it ran only on the CPU, so it was staggeringly slow, but I'm very tempted to try this out (lucky enough to have a 4090). I'm also using AWS CodeWhisperer as the price is right, so I'm thinking your suggestion of local LLM + CodeWhisperer might be the cheap way to go. Great pacing, great production quality, you have a likeable personality, it's factual, and it didn't waste the viewer's time. Good job. Deserves more subs.
@MahendraSingh-ko8le
@MahendraSingh-ko8le 3 ай бұрын
Only 538 subscribers? The content is too good for that. Thank you.
@madeniran
@madeniran 3 ай бұрын
Thanks for sharing, I think this is very important when it comes to data security.
@Franiveliuselmago
@Franiveliuselmago 3 ай бұрын
This is great. Didn't know I could use LM Studio like this. Also FYI there's a free alternative to Copilot called Codeium
@square_and_compass
@square_and_compass 3 ай бұрын
Keep up the momentum and you will arguably be among the best-organized content creators. I really liked your explanation and demonstration process.
@therobpratt
@therobpratt 3 ай бұрын
Thank you for covering these topics - very informative!
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Glad it was helpful! I believe these tools will become commonplace in the next few years; it's fun to be here where it all starts.
@bribri546
@bribri546 3 ай бұрын
Great video, Matt! Silly that I came across this video - I have been playing around with integrating LLMs with Neovim, so there's some helpful content here! Hope all is well!
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Had not heard of Neovim before, looks like an outstanding project!
@rosszeiger
@rosszeiger 2 ай бұрын
Great video and tutorial! Very well explained.
@Gunzy83
@Gunzy83 3 ай бұрын
Great video. Just earned my subscription. I'm a heavy Copilot user and have a machine with a great GPU (a little bit short of the 4090's VRAM though), so I'll be keen to see how your testing of completions goes (I'll have time to play myself when I'm back from a long-awaited vacation).
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Thank you so much for the sub, it's very much appreciated! It's late so I can't recall if I mentioned it in this video, but somewhat shockingly the MacBook is actually faster than the 4090 with LLMs. Granted this isn't with the TensorRT-LLM framework (which should give a 2x bump), but for me it shows how lots of RAM and dedicated, modern hardware make inference not only possible, but surprisingly easy. This bodes well for the future of local AI, and I'm super excited to see the PC space evolve.
@TheStickofWar
@TheStickofWar 3 ай бұрын
Been using gen.nvim and Ollama for a while on a MacBook M1 chip. Will try this approach
@paulywalnutz5855
@paulywalnutz5855 3 ай бұрын
Great content! Straight to it, and I learnt something.
@mammothcode
@mammothcode 3 ай бұрын
Hey, this is excellent! This is exactly what I was looking for recently.
@mammothcode
@mammothcode 3 ай бұрын
Is there perhaps any way we can configure that VS Code extension to point to a hosted runtime of the same LLMs? There are a couple of hosted LLM providers that seem to serve LLMs of our choice at very cheap prices.
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Not mentioned in the video, but Continue defaults to hosted LLMs. This video was to show how the local side works, but it's not required.
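For anyone who wants the hosted route instead, here is a hypothetical sketch of what a hosted entry in Continue's config.json might look like, assuming Continue's "openai" provider and your own API key (the exact provider names and fields are documented at continue.dev, so treat this as illustrative only):

{
  "title": "GPT-4",
  "provider": "openai",
  "model": "gpt-4",
  "apiKey": "YOUR_API_KEY"
}

The models block is just an array, so local and hosted entries can sit side by side and you choose between them inside the extension.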
@BauldyBoys
@BauldyBoys 3 ай бұрын
I hate inline completion. Within a couple of days of using Copilot I noticed the way I was coding changed. Instead of typing through something, I would type a couple of letters and see if the AI would read my mind correctly. Sometimes it would, don't get me wrong, but overall it was a bad habit I didn't want to encourage. This tool seems perfect for me as long as I'm working on my desktop.
@andrii_suprunenko
@andrii_suprunenko 3 ай бұрын
I agree
@RickGladwin
@RickGladwin 3 ай бұрын
Same. I trialed GitHub Copilot for a while but ended up ditching it. I found I was spending time debugging generated code rather than actually understanding problems and solutions. And debugging someone else’s code, for me, is NOT my favourite part of software engineering! 😂
@cnmoro55
@cnmoro55 3 ай бұрын
When you really understand how Copilot works, and how to actually BEGIN writing and structuring the code in order to trigger the right completion, then you start speeding things up. At first I was just like you, but then I got the hang of it, and man, Copilot is awesome.
@Fanaz10
@Fanaz10 2 ай бұрын
@@cnmoro55 yeaaah, it's amazing when it starts "getting" you.
@niksingh710
@niksingh710 3 ай бұрын
You are underrated. I like it.
@juj1988
@juj1988 3 ай бұрын
Thanks Matt!!!
@jf3518
@jf3518 3 ай бұрын
Worked like a charm.
@d3mist0clesgee12
@d3mist0clesgee12 3 ай бұрын
Great stuff, thanks.
@AdamDjellouli
@AdamDjellouli 3 ай бұрын
Really interesting. Thank you.
@apolloangevin9743
@apolloangevin9743 3 ай бұрын
Question: would you get much benefit from using multiple mid-range GPUs for the extra VRAM? For instance, I have a few 3060s I could use for a dedicated machine if I wanted to go down that path.
@masnwilliams
@masnwilliams 3 ай бұрын
Really love this format of video. Would love to get your thoughts on Pieces for Developers. We are taking a local-first approach when it comes to developer workflows, allowing users to easily download local LLMs like Llama 2 and Mistral to use in our Pieces Copilot. We also support the latest cloud LLMs.
@Mr_Magnetar_
@Mr_Magnetar_ 3 ай бұрын
Good video. It would be cool to implement a project/model (I don't quite understand this) that would know about the entire codebase of the project. For now, autocomplete performs the function of searching and copying a solution from StackOverflow. I think if an LLM knew all the code and understood what it does, we could get significantly better code.
@DRAM-68
@DRAM-68 3 ай бұрын
Great content. Your videos have gotten me interested in local AI processing. Next computer…M3 Ultra with max RAM.
@steveoc64
@steveoc64 3 ай бұрын
Ha, same here. I've been comfortable with 16/32 GB for my Macs. But now suddenly I can justify a 128GB monster.
@CorrosiveCitrus
@CorrosiveCitrus 3 ай бұрын
"Would you pay a cloud service for spell check?" Well said.
@howardjones543
@howardjones543 2 ай бұрын
Would you pay $1900 for a spell-check co-processor? I know you can do plenty of other things with it, but that's still the price of a 24GB RTX 4090. That's a lot of cloud credit.
@CorrosiveCitrus
@CorrosiveCitrus 2 ай бұрын
@@howardjones543 Yeah, the hardware needs are expensive at the moment, but hopefully (and I would think most definitely) that will start to become less of an issue. I think for the people who already have the hardware today, though, the choice is very obvious.
@skejeton
@skejeton 2 ай бұрын
Well said, but I wouldn't pay $2000 for a GPU to use an LLM only on that computer.
@howardjones543
@howardjones543 2 ай бұрын
@@skejeton Sure but you can play Starfield and encode video for significantly less. THIS is the application that requires all that VRAM - typical game requirements, even for heavy games, don't get near 24GB. The implication at the end of the video is that you would save $10/month with this free local LLM, but that's bending the truth a bit. If things like these new NPU-equipped processors and different models can remove the need for these gigantic GPUs, then it might be interesting.
@paulojose7568
@paulojose7568 2 ай бұрын
@@howardjones543 There's the benefit of freedom, though (and privacy?).
@clpr635
@clpr635 2 ай бұрын
Nice, I liked that you kept it short.
@UnrealFocus-le7ox
@UnrealFocus-le7ox 3 ай бұрын
Great video, subbed.
@NeverCodeAlone
@NeverCodeAlone 2 ай бұрын
Very good. Thx a lot.
@gusdiaz
@gusdiaz 3 ай бұрын
It seems Tab autocomplete is now available in pre-release (experimental) - would you be able to set up a tutorial as a follow-up to this if possible? Thank you so much!
@aGj2fiebP3ekso7wQpnd1Lhd
@aGj2fiebP3ekso7wQpnd1Lhd 3 ай бұрын
I use a good portion of my available computer resources just developing. At $10/mo, it's cheaper than the additional hardware to self-host currently, plus I don't have to manage, configure, or maintain anything. Copilot gets smarter automatically with time. For example, it's made huge strides on PHP lately.
@KucheKlizma
@KucheKlizma 3 ай бұрын
I learned a lot of things I didn't know before; I thought that hosting local LLMs was much more hardware-restrictive. Might give it a spin.
@praveensanap
@praveensanap 2 ай бұрын
Can you make a video demoing how to fine-tune the model with private code repositories?
@ariqahmer5188
@ariqahmer5188 3 ай бұрын
I was wondering... How about having a dedicated server-like PC at home to run these models and have it connected to the network so it's available to most of the devices on the network?
@JasonWho
@JasonWho 3 ай бұрын
Yessss, would love this. Lightweight app for devices, access to AI in VSCode or image gen locally without being on the physical machine.
@aGj2fiebP3ekso7wQpnd1Lhd
@aGj2fiebP3ekso7wQpnd1Lhd 3 ай бұрын
That's what I would do and have a TPM or two in it. Used ones come down in price pretty quickly.
@connoisseurofcookies2047
@connoisseurofcookies2047 2 ай бұрын
You could implement a REST API for a home server. If it only runs on the local network and doesn't talk to the internet at all, maybe on an isolated VLAN, there shouldn't be any security issues to worry about.
@petercooper4536
@petercooper4536 2 ай бұрын
Totally possible. I have Win11, Ollama running Dolphin Mixtral, ollama-webui running in Docker locally (looks just like ChatGPT), and a tunnel set up through Cloudflare to a subdomain in my own DNS. It's available externally and securely. All because work blocks OpenAI 😄 but I can access my own local LLM anywhere. The API could be exposed in the same way.
@CaribSurfKing1
@CaribSurfKing1 2 ай бұрын
How many tokens can be used in the repo context window? Can it do /explain, i.e. understand a repo and app you have never seen before?
@helloworld7796
@helloworld7796 3 ай бұрын
Hey, I see in the comments that people understand LLMs really well. I've never played with them; what would you recommend as a start? I am a software developer, so understanding it won't be an issue.
@AlbertCloete
@AlbertCloete 3 ай бұрын
Interesting that you need such a powerful graphics card for it. I always thought you only needed that kind of power for training the model, not for just using the trained model.
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
A great point - I've got a related comment on the same theme I need to reply to, but in short: I *happen* to have this beefy laptop, but it's not at all required for local inference! The biggest constraint right now is memory, but there are plenty of (much) smaller models that fit just fine on more modest hardware. Long-term, AI is here to stay, and AMD and Intel are already adding AI-specific hardware to their chips. Yes, dedicated hardware will be faster, but in two years local inference on CPU will be more than fast enough.
@subzerosumgame
@subzerosumgame Ай бұрын
I have a MacBook Pro M3 11/14 with 36GB; would that work?
@electroheadfx
@electroheadfx 3 ай бұрын
Hey, thanks for the video. How much RAM did you buy your M3 Max with?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Apologies for not listing this more clearly - 36GB.
@coalacorey
@coalacorey 3 ай бұрын
Interesting!
@dlsniper
@dlsniper 3 ай бұрын
Would it be possible to run this on an AMD 7900 XTX? It has 24GB of VRAM, but I'm not sure whether CUDA is a must for these tasks.
@Phasma6969
@Phasma6969 3 ай бұрын
Yes, there are alternatives which let you use a local inference server with many models. Llama.cpp has one built in too, but I'd recommend another alternative. Some use Ollama as the inference endpoint. You could even use others like Fireworks or a custom endpoint.
@juicygirls3989
@juicygirls3989 3 ай бұрын
Using Dolphin Mixtral with Ollama and a VS Code extension on a 4090 and it's working great. I must say I've never tried Copilot or its alternatives. It helps for boilerplate code; for more serious tasks it sucks, as expected.
@coreyward
@coreyward 2 ай бұрын
I tried Dolphin Mixtral out on my M3 Max and it wasn't all that great at a ReactJS code exercise that I've used in interviews. It came back with code that didn't meet basically any of the requirements, so I nudged it the same way I would have done for a candidate, but it really couldn't produce anything better before it said it wasn't able to fix them with the context it had (incorrect). I tried the same prompt with OpenAI's GPT-4 via ChatGPT and it did better on the initial shot but made some mistakes, which I again prompted about like I would have done with a candidate, and it nailed the exercise on its 3rd response. It took devs with ~3-5 years of total experience (at least 1 year with React) around 25-45 minutes to complete this, so GPT-4 nailing it in about 2 minutes is pretty good.
@jumanjimusic4094
@jumanjimusic4094 2 ай бұрын
Mixtral not mistral.
@RegalWK
@RegalWK 2 ай бұрын
What about privacy? I mean about copyright? Can we use it with company/client code?
@secretythandle
@secretythandle Ай бұрын
To me the cost of Copilot is such a small factor: if you consider the cost of all the hardware required to run a half-decent model, plus the ongoing electricity bills, you're paying FAR more for a local setup. But the real beauty of local LLMs is the privacy - being able to put whatever you want in there and not send it to Microsoft to use for later fine-tuning is a huge win and makes the investment worth it. Not to mention there is no fear of the service one day just being taken away from you, or censored to the point that it's completely useless. Now if only GPUs weren't so damn expensive... thanks, big tech.
@eotikurac
@eotikurac 2 ай бұрын
I bought Copilot about a year ago but was unable to find clients, so I never really used it :( This is an interesting alternative.
@uncleJuancho
@uncleJuancho 2 ай бұрын
This is the first time I've watched one of your videos, and I can't stop noticing that your mic is behind you.
@SuperCombatarms
@SuperCombatarms 2 ай бұрын
Is there anything I can do with 12GB of VRAM that is worth trying?
@kenneth_romero
@kenneth_romero 3 ай бұрын
Pretty cool. I wonder if you could get a model that's specific to one language, to make it small enough to run on most commodity hardware. Don't wanna shell out for a 4090 when Codeium is Copilot-like and is also free.
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
An excellent idea, and the answer is yes! Now, I haven't heard of or tried language-specific models myself, but one project I'm super keen to see released is Nvidia's "Chat With RTX". The basic idea is training a model on data *you* provide - for example, a book on your favorite programming language. www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/
@spaceshipdev
@spaceshipdev 3 ай бұрын
Codeium all the way for me; it outperforms Copilot AND it's free.
@3dus
@3dus 3 ай бұрын
Damn... you missed Code Llama 70B by one day. Nice video!
@merlinwarage
@merlinwarage 3 ай бұрын
Yeah. Run a 70B model locally as Copilot xD
@cureadvocate1
@cureadvocate1 3 ай бұрын
@@merlinwarage Running the 33B deepseek-coder model locally was slow enough. (The 6.7B model is REALLY good.)
@camsand6109
@camsand6109 3 ай бұрын
@@merlinwarage If you have an Apple silicon Mac, totally possible.
@TheStickofWar
@TheStickofWar 3 ай бұрын
@@camsand6109 It's possible, but just not practical. Do you like waiting?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
I know right lol. Downloading the smallest 2-bit version now... Well, it runs (36GB M3, ~16 tokens/second), but... huh. The results are total nonsense. I realize this is the 2-bit version, but still. I guess even though the model loads, it still needs loads of extra RAM, or the low quantization is more punishing than I've seen in other models. The good news: you can try it here: catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-70b I will say: from a few tests I ran I still prefer Dolphin Mixtral - didn't expect that, but it's a *crazy* good model.
@valdisgerasymiak1403
@valdisgerasymiak1403 3 ай бұрын
I am trying to find out how I can run local LLMs to replace Copilot completion - it's very useful to just write the small part of code where I've begun. Any thoughts?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
You bet -
1. Download LM Studio and download a code-centric model that fits on your hardware (Mixtral Instruct or Code Llama are great places to start).
2. In VS Code, install the Continue extension.
3. Click Continue's extension tab on the left side of VS Code's interface, then click the little gear icon (at the bottom of the screen).
4. Add an entry like { "title": "Mixtral", "provider": "lmstudio", "model": "CodeLlama7b" } within the JSON's models block (a fuller sketch is below).
You're ready to go, though please let me know if you run into any issues!
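For reference, here's a minimal sketch of how that entry sits inside Continue's config.json - the models block is a JSON array, so the file ends up looking roughly like this (anything beyond the three fields above can vary between Continue versions, so treat it as illustrative):

{
  "models": [
    {
      "title": "Mixtral",
      "provider": "lmstudio",
      "model": "CodeLlama7b"
    }
  ]
}

With LM Studio's local server running, Continue should be able to use this entry right away.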
@Andy_B.
@Andy_B. 3 ай бұрын
Hi, one question: that local LLM code advisor you run, does it provide contextual coding help? Meaning, does it index/look over a bunch of files and give recommendations, or does it only work within one single (C++, Python...) file? Thanks
@ilearncode7365
@ilearncode7365 3 ай бұрын
This is an important question. With Copilot, it is aware of stuff that I have written in other files in the same project. I don't know if it does so just by storing keystrokes, or if it knows to look at the entire project.
@Andy_B.
@Andy_B. 3 ай бұрын
@@ilearncode7365 Yes, in VS Code Copilot you just need to open all the files which you want to have indexed by Copilot. Closed files won't be indexed.
@hmdz150
@hmdz150 2 ай бұрын
Ollama + Codellama
@berndeckenfels
@berndeckenfels 3 ай бұрын
Does it make sense to share a GPU server remotely among multiple developers - using the same model instance, or dedicated ones?
@merlinwarage
@merlinwarage 3 ай бұрын
You can do that, but calculate the price of the server (for 4-5 developers you will need a 4090 at least, an A100 for 10+ devs) plus electricity cost vs. Copilot's $10-19/month per-user plan. Even with the $19 business plan you could use Copilot for ~2 years with a 5-person team for the price of a server - ~4 years with the personal subscription.
@berndeckenfels
@berndeckenfels 3 ай бұрын
@@merlinwarage But I have to expose my code to Microsoft and can't train on my own codebases. And yes, I would expect more like 200 devs on a single A100, so it pays off. Most use it quite sparingly.
@morsmagne
@morsmagne 3 ай бұрын
I’d have thought that the gold standard would be GPT-4 128k using the Playground's Assistants mode with Code Interpreter enabled?
@D9ID9I
@D9ID9I 3 ай бұрын
So if you buy a CPU like the R7 8700G with 64GB of RAM and dedicate the integrated GPU to model processing, it will have enough RAM to run complex models. And you can use an external GPU for the usual display tasks.
@Bryan-zo6ng
@Bryan-zo6ng 3 ай бұрын
Can the 3000 series run LLMs?
@addictedyounoob3164
@addictedyounoob3164 3 ай бұрын
I also strongly believe - I've noticed it - that GPTs have been dumbed down due to censoring and other mechanisms whose workings we don't exactly know. If OpenAI doesn't account for this, the open-source community might surpass them with this new type of multiple-expert architecture.
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Well said.
@MadHolms
@MadHolms 3 ай бұрын
But Copilot does know the context of your project and applies suggestions based on that. Do the local model and Continue do this?
@airplot3767
@airplot3767 2 ай бұрын
With Continue, you need to manually choose which files to upload with each request. I guess Copilot does this automatically.
@CharlesQueiroz25
@CharlesQueiroz25 2 ай бұрын
Can I do it in the IntelliJ IDE?
@JohnWilliams-gy5yc
@JohnWilliams-gy5yc 3 ай бұрын
Between the M3 Ultra and the 4090 24GB, which wins in this LLM arena? How about the AI accelerator, Intel's Habana Gaudi 3?
@D9ID9I
@D9ID9I 3 ай бұрын
Jetson AGX Orin 64GB wins, lol
@elvemoon
@elvemoon 3 ай бұрын
Can I run LM Studio on my desktop and connect to the server from my laptop using this plugin?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
Unfortunately I haven't tried this yet, but I'd imagine that, just as you can point locally networked computers to, say, a web server, this would be no different.
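If anyone tries it, here is a hypothetical sketch of what the Continue entry might look like when LM Studio runs on a different machine. The IP address, port, and the apiBase field with its /v1 suffix are all assumptions here - LM Studio's server tab shows the real address, and the Continue docs cover apiBase support:

{
  "title": "Remote LM Studio",
  "provider": "lmstudio",
  "model": "CodeLlama7b",
  "apiBase": "http://192.168.1.50:1234/v1"
}

You'd also need LM Studio's server to listen on the local network rather than only on localhost, and the usual firewall rules would apply.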
@DS-pk4eh
@DS-pk4eh 3 ай бұрын
If we set aside being online/offline, the local solution requires an investment in high-end hardware. A 4090 is around 1500 USD, and an M3 Max with 36GB could cost much more than an M3 Pro with less RAM. The monthly fee for Copilot will cost you about $100 a year, which means it will start to cost as much as a RAM/GPU upgrade on a MacBook after about 6-7 years, or the same as the GPU only after 15 years! You would imagine that Copilot's performance will gradually improve over time (on the hardware level), so you will have better underlying hardware after 3 or 4 years, while you would be stuck with the same hardware if you purchased your own. However, if you already have this hardware, then it is a much easier decision. Do not forget, local AI will take some resources on your computer, so you will have less computer for other things.
@a.yashwanth
@a.yashwanth 3 ай бұрын
There is a chance that the Copilot subscription cost will increase from the current $10 to maybe $15 or $20 or more. Each user apparently costs Microsoft $20, even with the limited features it has now.
@burajirusan4146
@burajirusan4146 3 ай бұрын
Is a top-notch video card a requirement? Can lots of PC RAM run these offline AIs?
@merlinwarage
@merlinwarage 3 ай бұрын
For 7B models you can use any GPU with at least 8-10GB of VRAM. For 13B-30B models, 10-16GB of VRAM. For 40-70B models you'll need 16-24GB of VRAM. System RAM doesn't really matter in this case.
@D9ID9I
@D9ID9I 3 ай бұрын
@@merlinwarage An iGPU shares system RAM.
@2xKTfc
@2xKTfc 3 ай бұрын
@@merlinwarage For the $1500 a 4090 costs, you can buy a LOT of DDR5 memory and make that available to the GPU - Windows does it automagically. It's nowhere near as fast as GDDR6 (like, not even close), but you can get 64GB of spare memory quite easily. I'd be curious how usable (or unusable) that would be.
@a.yashwanth
@a.yashwanth 3 ай бұрын
How much CPU/GPU usage does each code completion command take, and for how long?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
A great question: on PC we can separate and scale between the two; on Mac it's all or nothing. On the Mac, then, a response from OpenOrca that generated at 28 tokens/sec pinned the GPU for the duration of that specific 3.2-second response. I'd imagine that's the case for all responses.
@-E42-
@-E42- 3 ай бұрын
Since I don't want all of my code to be transparent to Microsoft, GitHub Copilot is out of the question for me, and a local LLM sounds like a great idea. But how do you feel about the user agreement of VS Code? It seems Microsoft gets to see and evaluate all of your code anyway. Data politics, to me, are a largely under-considered aspect of AI tech.
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
A great point, and one I haven't considered deeply yet. At first blush, then, so long as the code's part of a virtuous feedback loop I'm OK with it. As well, Microsoft has a vested interest in the quality of AI beyond just siphoning data, so again, for now... I'm OK with it.
@hope42
@hope42 3 ай бұрын
I have a 1660 with 6GB of VRAM, a 10th-gen i7, and 32GB of RAM. Is there anything I can run?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
VRAM is the key and yes, that should be more than enough to run Llama 7b.
@freddieventura4382
@freddieventura4382 3 ай бұрын
Just quickly copy-paste from ChatGPT?
@bowenchen4908
@bowenchen4908 Ай бұрын
What are your machine's specs? I have an M2 Pro but I can't even run 3-bit quantization :/
@jonathanozik5442
@jonathanozik5442 3 ай бұрын
I'm so jealous of you having enough hardware to run Mixtral locally.
@KryptLynx
@KryptLynx 2 ай бұрын
I will argue that it is faster to write the code than to write a description of the code for the AI.
@rasalas91
@rasalas91 3 ай бұрын
5:32 - damn.
@vmx200
@vmx200 2 ай бұрын
Would a 3090 and 64GB of RAM be good enough?
@kotekutalia
@kotekutalia 2 ай бұрын
Why don't you just install and try?
@vmx200
@vmx200 2 ай бұрын
@@kotekutalia I will when I have time; I was just curious if someone out there knew.
@marcopfeiffer3032
@marcopfeiffer3032 3 ай бұрын
It’s not exactly free if you need a potent GPU with lots of RAM. My M1 with 16GB is already struggling with my Docker containers. I’d have to calculate how long I could use Copilot before it becomes more expensive than a Mac RAM upgrade.
@LeandroAndrus-fn4pt
@LeandroAndrus-fn4pt 3 ай бұрын
Can it do TDD like Copilot?
@dwhall256
@dwhall256 3 ай бұрын
It is up to you, the driver, to make that happen.
@aidencoder
@aidencoder 3 ай бұрын
Hmm, while not spellcheck... people _do_ pay a cloud service for AI help with grammar and structure. It's closer to that, I think.
@nasarjafri4299
@nasarjafri4299 3 ай бұрын
Yeah, but doesn't a local LLM need at least 64GB of RAM? How am I supposed to get that as a college student? P.S. Correct me if I'm wrong.
@shoobidyboop8634
@shoobidyboop8634 3 ай бұрын
Is Mac RAM field-upgradeable, or does Apple force people to buy their overpriced RAM?
@michaelcoppola1675
@michaelcoppola1675 3 ай бұрын
Extra note: local works in airplane mode - useful for travel.
@gh0stcloud499
@gh0stcloud499 3 ай бұрын
Pretty interesting, but I can't justify giving up that many resources just for code completions. Especially on a Mac, where you already need to dedicate a significant chunk if you are using Docker or some other virtualisation software. I guess on a Windows/Linux machine with a dedicated GPU this won't matter as much, unless you are a game developer.
@fandorm
@fandorm 3 ай бұрын
Well, it's free as long as you first pony up $2,379.99 for the GPU. It will take 230 months (almost 20 years) of Copilot use, but then it will be essentially free!
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
On the Windows side, LM Studio lets you run on CPU, "offload" to the GPU, or a combination of both. On CPU only, Llama 7B on a two-year-old Intel 10850H laptop pulled 5 tokens/second (tokens roughly equate to words). Humans generally read between 5 and 10 words per second, so even this "worst case" is nearly real time. And that's just now - Intel and AMD are already starting to add dedicated AI hardware to their new chips. Faster helps but is not needed, and basically any future hardware is going to be far more performant. Copilot is much faster now, but that won't always be the case.
@ilearncode7365
@ilearncode7365 3 ай бұрын
Imagine paying Microsoft to let you help them train the AI they intend to replace you with.
@tbird81
@tbird81 3 ай бұрын
Yeah, but you'll be able to run GTA 6 when it comes out on PC.
@fandorm
@fandorm 3 ай бұрын
@@tbird81 🤣
@gwrydd
@gwrydd 3 ай бұрын
Does it just support web development, or is it general purpose?
2 ай бұрын
why tho
@angrygreek1985
@angrygreek1985 2 ай бұрын
"would you pay to use spellcheck?" Well, when the minimum hardware specification to run an LLM locally (well) is a $1600 USD GPU, then yeah, I would. For a hobbyist it would take years to make up the cost in paying for the cloud service, and by that time the GPU will be out of date.
@steveoc64
@steveoc64 3 ай бұрын
Great video, thanks. I still find AI to be completely useless for any dev work outside of web dev. It has no idea what's going on when I code in Zig, for example :(
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
I've been dealing with insurance companies lately. When faced with composing yet another email to yet another repair shop, I finally gave in and had chat write one for me. Not only was the result more concise and clearly stated than what I was writing by hand, it was far faster. It's little things like that that are slowly winning me over and, more importantly, freeing up time to do more meaningful work.
@steveoc64
@steveoc64 3 ай бұрын
@@matthewgrdinic1238 My commiserations that you need to deal with insurance companies at all… some of my first career projects were with insurance companies, so I feel your pain :) Yep, agreed - I am a big adopter of AI in my workflow too, and it's been invaluable for web work, particularly dealing with CSS! That's why I'm really keen on setting up a local model, so I can hopefully train it over time to be great at Zig development as well. The challenge here is that we are building a new language and std lib which evolves daily, so prior art is neither available nor helpful. Exciting times.
@zizzyballuba4373
@zizzyballuba4373 3 ай бұрын
There's a perfect LLM in my brain, I'll use that.
@tbird81
@tbird81 3 ай бұрын
Hate to break it to you, it's not perfect.
@zizzyballuba4373
@zizzyballuba4373 3 ай бұрын
@@tbird81 NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO ARGHHHHHHHHHHHHHHHHHHHHH
@kishirisu1268
@kishirisu1268 3 ай бұрын
Do you realize how much VRAM you need to run even the smallest language model? I can say - 32GB! Go buy a consumer GPU with that much memory...
@2xKTfc
@2xKTfc 3 ай бұрын
The smallest language model takes like 1MB of space, so you're blatantly wrong. Note that you said "smallest" and not "best" or even "usable".
@MrBrax
@MrBrax 3 ай бұрын
Neat concept, but Copilot for like 30 years is still cheaper than buying a 4090, haha.
@YA-yr8tq
@YA-yr8tq 2 ай бұрын
As of now, and AFAIK, no tool tops aider-chat.
@dackerman123
@dackerman123 2 ай бұрын
But local is not free.
@YuriMomoiro
@YuriMomoiro 3 ай бұрын
I also recommend the AMD cards, as they come with huge VRAM for much cheaper. Of course they are slower, but if you care more about quality than speed, they're worth considering.
@camsand6109
@camsand6109 3 ай бұрын
"Would you pay a cloud based service for spell check" grammarly not gonna like this take lol
@jeikovsegovia
@jeikovsegovia 3 ай бұрын
“sorry but i can only assist with programming related questions" 🤦
@sergeyagronov9650
@sergeyagronov9650 3 ай бұрын
Could not reproduce the effects, and I copied everything you wrote or said. Can you please paste the text here?
@matthewgrdinic1238
@matthewgrdinic1238 3 ай бұрын
You bet! The prompts were:
1. Create a web page that contains a blue circle. The ball should be centered on the screen horizontally, and placed at the top of the page.
2. Update this page so that the ball drops to the bottom of the page using a realistic gravity equation written in JavaScript. When the ball reaches the bottom of the screen, it stops.
3. Update the logic so that when the ball reaches the bottom of the page it bounces up realistically like a rubber ball. On each bounce it loses some momentum until it stops completely.
Do keep in mind the only model I've found to be truly great at this specific task is Dolphin Mixtral Q3. That is, it basically one-shots everything on the first pass. It also helps, in LM Studio, to check the box in advanced settings that keeps the entire model in memory.
Finally, and to be clear, the more of a conversation you have with the AI the better. That is, it's totally OK to have it clarify its reasoning, tell it something doesn't work, and ask it to fix it - it's very much a case of, well, chatting with it.
@Hobbitstomper
@Hobbitstomper 3 ай бұрын
So, you're telling me that if I don't want to spend $20/month on Copilot, I should instead buy a $2000 graphics card or a $4000 MacBook.
@bcpr
@bcpr 3 ай бұрын
No…if you already have the tools, it’s a good alternative. If you don’t already have them, then pay for Copilot lol
@Hobbitstomper
@Hobbitstomper 3 ай бұрын
@@bcpr Yeah, that kind of title and description would make more sense. The way he described it in the video - "superior to paid services [...] results are absolutely worth it [...] local LLM is free (as opposed to paying $20/month)" - makes it sound like he's saying people should consider a $2000/$4000 upgrade over a $20/month plan.
@ScreentubeR
@ScreentubeR 2 ай бұрын
I have a 4090 which I bought for VR gaming. Now that I know I can use it for local LLMs with good enough output for coding and other tasks, even better. Would I go and buy a 4090 just to be able to use local LLMs? Hell no.
@thethiny
@thethiny 3 ай бұрын
"As you see with the exception of speed" the reason people use coding assistance is speed. Until then, I guarantee you lots of people just watch these videos for the wow factor.
@CorrosiveCitrus
@CorrosiveCitrus 2 ай бұрын
But it's still fast, just not as fast obviously
@thethiny
@thethiny 2 ай бұрын
@@CorrosiveCitrus 20 seconds isn't fast 🤔
@CorrosiveCitrus
@CorrosiveCitrus 2 ай бұрын
@@thethiny something a local LLM can do in 20 secs is most definitely much faster than manual
@thethiny
@thethiny 2 ай бұрын
@@CorrosiveCitrus faster than manual indeed but I'm not gonna wait 20 seconds looking at a blinking cursor while my PC is running out of RAM 😂
@CorrosiveCitrus
@CorrosiveCitrus 2 ай бұрын
@@thethiny Well, to each their own lol. Eventually this stuff will be better and more accessible - it's never going to compete with the cloud for speed - but there are a lot of other benefits, so hopefully it doesn't take too long to improve.