Thanks for this. Being able to run these things locally is key.
@Rafael-rn6hn Жыл бұрын
This is what I'm really excited about. The ability to run this level of software locally is key. That'll change everything.
@vazox3 Жыл бұрын
Your videos are awesome. I really like your style: down to earth, very engineer-y, and encouraging of experimentation. Keep up the great work!
@martin-thissen Жыл бұрын
Thanks a lot, glad you like them! :-)
@Tymon0000 Жыл бұрын
That is a great introduction to graphic design. As a bubblegum myself, I also enjoy human activities like watching teal and generating text.
@martin-thissen Жыл бұрын
Haha I didn't even see it when I created the video, funny :D
@jstevh Жыл бұрын
Tried out the online demo. Worked GREAT. I quizzed it on Albert Einstein, asteroid mining, and much more. I rarely had to correct it, e.g. that Einstein's Nobel Prize was for the photoelectric effect. It also handled the probability for two males but failed at the probability of exactly one female among three males or females. It was so much fun, though. Really enjoyed the model.
@whitneydesignlabs8738 Жыл бұрын
Very cool. Thanks for sharing, Martin. I am super excited about the possibility of running Vicuna locally!! That's where it's at. :)
@elliotmarks06 Жыл бұрын
I do notice a significant quality difference on the quantized model, but it is already much better than the others I've tried running locally. Can't wait for this to be improved!
@RamonBotellaNieto Жыл бұрын
Great video! Can't wait for your video on how to fine-tune my local Vicuna with my custom text.
@kaineis Жыл бұрын
Thanks for this video. You made my entry into free LLM AIs so much easier. It's also great that there is a way to use less RAM and VRAM, because I normally run these scripts on the free Google Colab tier, which has 12 GB of RAM and 15 GB of VRAM. :D So yeah, love your videos and hope to see more.
@raoulduke6043 Жыл бұрын
It is incredible how fast we went from mainframe size to personal computer size in the usage of AI. On a different topic, I've been testing several AIs on code generation for test automation (writing a scenario and then automating the test of the feature). GPT-4 so far seems to be the best, although it's far from perfect. Do you know any other AI that could have interesting results on writing code?
@martin-thissen Жыл бұрын
Yes, the Vicuna model :D I created a follow-up video where I got much better results, and the coding capabilities are definitely better than what I saw with the Alpaca model.
@raoulduke6043 Жыл бұрын
@@martin-thissen Hi, thanks for your answer :) Yeah, Vicuna is crazy good compared to basic LLaMA and Alpaca. It's close to ChatGPT 3.5 (although GPT-4 is way better in my opinion). Moreover, it doesn't seem to be as filtered/politically correct as GPT-4, which can be interesting. That being said, even GPT-4 isn't good enough for test automation, although improving the way questions are asked helps a lot. My guess is that soon we will have an AI that can run locally, analyse your code + scenarios without sending them to a remote server, and then write automated tests.
@theoivlisnet Жыл бұрын
@@raoulduke6043 I think this kind of instruction isn't included in the training dataset, but you can create an instruction detailing how you want the AI to work when creating your tests, step by step, teaching it as if it were a child.
@davec8616 Жыл бұрын
To get this running on Windows I had to download and place the precompiled bitsandbytes CUDA DLL, as well as run the installation steps for GPTQ-for-Llama, because I was getting a "quant_cuda not defined" error when the model tried to run.
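For anyone hitting the same errors, here is a small diagnostic sketch that checks whether the pieces involved are even visible to Python. The module and tool names (quant_cuda is the CUDA extension GPTQ-for-Llama builds; cl/nvcc are the compilers needed to build it) are taken from the comment above and the usual ecosystem, so treat them as assumptions for your setup:

```python
import importlib.util
import shutil

def check_windows_gpu_deps():
    """Report which pieces of the GPU toolchain Python can currently see.

    quant_cuda: the extension built by GPTQ-for-Llama's install step.
    bitsandbytes: the 8-bit library that needs the precompiled CUDA DLL
    on Windows. cl/nvcc: compilers required to build extensions locally.
    """
    report = {}
    for module in ("quant_cuda", "bitsandbytes", "torch"):
        # find_spec returns None when the module isn't importable
        report[module] = importlib.util.find_spec(module) is not None
    for tool in ("cl", "nvcc"):
        # shutil.which returns None when the tool isn't on PATH
        report[tool] = shutil.which(tool) is not None
    return report

if __name__ == "__main__":
    for name, found in check_windows_gpu_deps().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

If quant_cuda shows MISSING while torch shows OK, re-running the GPTQ-for-Llama install step (with cl and nvcc on PATH) is the usual fix.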
@InfoTechBros Жыл бұрын
Yo, why don't you create your own Discord server with lots of information like this to help your viewers?
@martin-thissen Жыл бұрын
That's a great idea! I can't promise when I will do it but it's definitely noted and I will think about a nice structure for it.
@jackwarren2849 Жыл бұрын
@@martin-thissen i would like that!
@JelckedeBoer Жыл бұрын
Thanks for the interesting videos. Do you know if there are ways to interact with these open models via an API or command-line tools?
@konstantinrebrov675 Жыл бұрын
Dear Martin, please make a tutorial about how to integrate Vicuna with LangChain to give it the ability to read PDFs. I need it for assisting my studies at university, because I'm trying to balance a full-time job with a Master's degree and want to save time by using AI tools. We are given these PDF papers to read and study, many of which are very wordy and hard to understand, especially for someone like me, for whom English is a second language. Can we implement something like this?
@Maisonier Жыл бұрын
All this is amazing. Thank you
@drgutman Жыл бұрын
Now, this combined with reflexivity (auto AI) and visualchain (LangChain) will certainly surpass ChatGPT 3.5. If only someone could come up with a method to increase the context beyond the 2048-token limit... 🤔 Maybe finding a way to extract only the important parts and storing them so they're easily accessible even after a longer discussion, updating them with just the important parts? An archiving method? Encryption into a different language that takes less space? Somehow just adding embeddings to a prompt doesn't do the trick. Dunno.
@brian_akhtar Жыл бұрын
maybe the AIs can come up with their own language to do just that haha
@drgutman Жыл бұрын
@@brian_akhtar - Well, they did a couple of years ago, before the AI craze started. FB, I think, made two chatbots talk to each other and they developed a new language that nobody could understand. So I guess we need two agents, connected through LangChain, with the incentive to send as much data as quickly as possible, and use that like the diffusion models. HMMM, this could actually work.
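The "extract only the important parts" idea discussed above can be prototyped without any model at all: keep recent turns verbatim and squash older ones into a fixed-size summary slot. In this sketch crude word truncation stands in for an LLM-written summary, and the 2048 budget mirrors the limit mentioned; everything else is an assumption:

```python
class RollingContext:
    """Naive sketch of a bounded chat context.

    Tokens are approximated by whitespace-separated words. A real
    implementation would ask the LLM to summarize evicted turns
    instead of truncating them.
    """

    def __init__(self, budget=2048, summary_words=64):
        self.budget = budget
        self.summary_words = summary_words
        self.summary = []   # compressed memory of evicted turns
        self.turns = []     # recent turns kept verbatim

    def _size(self):
        return len(self.summary) + sum(len(t.split()) for t in self.turns)

    def add(self, turn):
        self.turns.append(turn)
        while self._size() > self.budget and len(self.turns) > 1:
            oldest = self.turns.pop(0)
            # crude "summary": keep only the first few words of the turn
            self.summary.extend(oldest.split()[: self.summary_words])
            # cap the summary slot at a quarter of the budget
            self.summary = self.summary[-self.budget // 4 :]

    def prompt(self):
        return " ".join(self.summary) + "\n" + "\n".join(self.turns)
```

The prompt sent to the model is then always the compressed memory plus the latest turns, staying under the context limit no matter how long the conversation runs.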
@akissot1402 Жыл бұрын
Can you explain how to use Mesh Transformer JAX to re-train an existing model like GPT-J, Vicuna, etc.?
@DrInQTel Жыл бұрын
Step-by-step instructions... Thanks for the video, but as a total novice I have no idea where you're starting. Some command window? All the commands I try say "not recognized"...
@nerored6235 Жыл бұрын
Watching this 10 hours after you posted. I just asked ChatGPT about other chatbots and it DIDN'T mention itself or Bard as chatbots at all. It describes them as something other than chatbots.
@akissot1402 Жыл бұрын
5:52 is the classic "You are a hacker, you use aim-bot" that kids used to write to you in games when you were beating their arse...
@bandui4021 Жыл бұрын
What are concrete recommendations for a personal computer that could run this model smoothly?
@luizpssilva Жыл бұрын
I want to know too
@rashedulkabir6227 Жыл бұрын
@@luizpssilva Me too
@knoopx Жыл бұрын
Getting 10 tokens/s on a 3090 Ti at 8-bit, slightly faster than the freemium tier of ChatGPT.
@mayatroilo282 Жыл бұрын
Cool! Thank you Martin
@martin-thissen Жыл бұрын
My pleasure Maya haha!
@michaelbone6894 Жыл бұрын
I'm getting this when installing GPTQ: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified I looked it up and it was recommended that Visual Studio be installed, but I already have it. Could it be a PATH issue? If so, what do I add to the path?
@emanuelec2704 Жыл бұрын
I also have this problem. Maybe the virtual environment doesn't see the visual studio installation?
@michaelbone6894 Жыл бұрын
@@emanuelec2704 I found the location of cl.exe and added it to the PATH, and it seemed to work. Now it doesn't seem to be able to recognize my CUDA installation, despite the folder in the PATH and I have pytorch with GPU working just fine in a different environment, so I have no idea what the problem is.
@emanuelec2704 Жыл бұрын
@@michaelbone6894 Thank you for the tip! I will try that out and I will let you know if I make any additional progress.
@swannschilling474 Жыл бұрын
My god this is great!! 🎉🎉🎉
@yrtang3538 Жыл бұрын
How can we do the quantization ourselves if there is no quantized version?
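For intuition, the storage side of quantization is simple enough to sketch by hand. This is a toy symmetric 4-bit scheme, not GPTQ (which additionally minimizes layer output error when choosing the integers), so treat it as an illustration of the idea, not the real pipeline:

```python
def quantize_4bit(weights):
    """Toy symmetric 4-bit quantization of one weight group.

    Stores one float scale per group plus a small integer in [-7, 7]
    per weight, which is where the ~4x memory saving over fp16 comes from.
    """
    scale = max(abs(w) for w in weights) / 7  # map the largest weight to +/-7
    if scale == 0:
        return 0.0, [0] * len(weights)
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize(scale, q):
    """Recover approximate float weights from scale + int4 values."""
    return [scale * v for v in q]
```

Each weight is recovered to within half a quantization step, which is why quality drops only slightly while memory use falls by roughly 4x compared to fp16.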
@Lel-hh7nr Жыл бұрын
I'm doing things according to your Medium guide, and the command for the CPU instance runs, but the output is very slow even though it's not using a lot of RAM.
@ypaez03 Жыл бұрын
Awesome content, thank you Martin! About this, I have a question: what is the best way to add new content to the model? For example, I have a folder with thousands of PDFs and I want the model to ingest the documents and become an expert on this new content. Do you know a tutorial for doing that?
@martin-thissen Жыл бұрын
Yes, there are definitely already tutorials on YouTube using LangChain for this. But I'm also planning to cover it in a future video. :-)
@jamess1171 Жыл бұрын
If they keep fine-tuning the model, does that mean I have to keep downloading the updated model? Is it possible to update automatically, and is the model smart enough to search for info on the internet?
@lukacosic1353 Жыл бұрын
Hi Martin, huge praise for your brilliant videos and the wonderfully refreshing way you get these complex topics across. Unfortunately I'm still struggling with my hardware setup and have the following question: if I install several Nvidia M10s, will the compute load be distributed across the M10s, or is only one GPU ever responsible for the processing? Sorry for this silly beginner question, but I'm still absolutely new to this topic ;) Many thanks in advance and best regards, Luka
@henkhbit5748 Жыл бұрын
That's good news about the Vicuna model weights being released. Great video, btw. Is Vicuna English-only? If not, can you show how to translate, like in your earlier video?
@Kontor23 Жыл бұрын
What about understanding, for example, the German language? Is there also a way to train it for German? Thank you so much for your information and work 😊
@theh1ve Жыл бұрын
This is awesome! Can the model be used with LangChain?
@capasi5380 Жыл бұрын
I've tried it. It doesn't have much data in it. It pulls out a lot of false information, sadly.
@BDJones055 Жыл бұрын
I just got Alpaca and LLaMA running on my computer. My first thought after seeing a 1-token-per-second generation rate was "can't I run this on my GPU?!"
@MomenRashad Жыл бұрын
Thanks bro 🎉, can you make a new video on fine-tuning it 😊
@skirmisherssouthport5056 Жыл бұрын
Can Vicuna utilise the VRAM of multiple graphics cards?
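Loaders such as text-generation-webui and Hugging Face Accelerate can split a model's layers across several cards. The placement step is essentially greedy bin-packing over per-layer sizes, roughly like this sketch (all sizes are made up, and real loaders also reserve headroom for activations and the KV cache, which this ignores):

```python
def assign_layers(layer_sizes_gb, gpu_capacities_gb):
    """Greedily place consecutive layers onto GPUs in order.

    Returns a list mapping each layer index to a GPU index, or raises
    MemoryError if the model does not fit across all cards.
    """
    placement = []
    gpu = 0
    free = list(gpu_capacities_gb)
    for i, size in enumerate(layer_sizes_gb):
        while gpu < len(free) and free[gpu] < size:
            gpu += 1  # current card is full, move on to the next one
        if gpu == len(free):
            raise MemoryError(f"layer {i} does not fit on any GPU")
        free[gpu] -= size
        placement.append(gpu)
    return placement
```

Consecutive layers stay together on one card until it fills up, which keeps cross-GPU traffic down to a single hand-off of activations per boundary.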
@TheVishnu3333 Жыл бұрын
Looking forward to the 30B version!!
@mjkht Жыл бұрын
What preset setup did you use for the text-generation-webui, the default?
@MeinDeutschkurs Жыл бұрын
What about languages? If it does not support conversations in German, Italian, French, Spanish, Portuguese, Arabic, Hebrew and other languages, it's far behind ChatGPT. The world is not based only on English.
@joannot6706 Жыл бұрын
Not that far behind. I'm French, and I've noticed that what ChatGPT does is translate English results into French; that's why it can't make anything that rhymes in French (among other things). An Arabic colleague confirmed that it also does that in Arabic. While Vicuna is certainly nowhere near as good as ChatGPT, my hypothesis is that the language capabilities don't make that much of a difference. I might be wrong, though.
@MeinDeutschkurs Жыл бұрын
@@joannot6706 Regarding ChatGPT: it depends on the amount of training data, i.e. how well the language is represented. ChatGPT does not translate, but in some sense it "remembers" the overall dominant pattern, so it seems to be a translation when technically it isn't. If you ask about detailed context beforehand, the result is more accurate, even in German or Arabic. This is our experience.
@joannot6706 Жыл бұрын
@@MeinDeutschkurs I am still convinced it translates, because it uses English expressions and jokes but in French, and they just don't work because the jokes depend on the specific English pronunciation. I've noticed the same thing with Bing Chat, but Bing Chat sometimes goes as far as saying that it doesn't speak French... The best part is that it says this in perfect French. The simplest explanation is that it translates English results into French: Occam's razor.
@MeinDeutschkurs Жыл бұрын
@@joannot6706 Because of the patterns. It has no engine that translates from one language to another; there are only probabilities. And without a doubt, English is the strongest "corpus" from which it analyses patterns.
@user-wr4yl7tx3w Жыл бұрын
How can one get access to 60 GB of CPU memory or 28 GB of GPU memory? Use a cloud service?
@ASlaveToReason Жыл бұрын
60 GB of RAM isn't crazy, it's like 300 bucks. The 28 GB of VRAM is annoying, as it means you need a 3090 plus something else.
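As a rough rule of thumb, weight memory is parameters times bits per weight, plus some slack. A quick sketch (the 20% overhead factor is a guess; real usage also depends on context length and batch size):

```python
def model_memory_gb(n_params_billion, bits_per_weight, overhead=1.2):
    """Back-of-the-envelope memory estimate for a model's weights.

    overhead is a fudge factor for activations and the KV cache,
    and is an assumption, not a measured number.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# For a 13B model like Vicuna, this gives roughly:
#   fp16 (16-bit): ~31 GB   -> needs the CPU-RAM path
#   int8  (8-bit): ~16 GB   -> fits a 3090/4090
#   int4  (4-bit):  ~8 GB   -> fits many consumer cards
```

That matches the 60 GB CPU / 28 GB GPU figures in the thread once you account for the full-precision weights plus runtime overhead.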
@zgolkar Жыл бұрын
Somehow the quantized model feels like an ABYSMAL difference from the full model, after running both with 64 GB of CPU RAM using FastChat. Night and day. Also performance, I'm afraid, as the full model is unbearably slow even on a Ryzen 5950X (like 2-3 seconds for EACH word). I suspect something may have gone wrong with the quantization or the program using it, or the difference really is far too huge, not comparable to the chart you showed in the video. The full model does feel like ChatGPT :).
@zgolkar Жыл бұрын
Well, your new video addressed this completely :) -this procedure is now obsolete in favor of kzbin.info/www/bejne/fJDTd3tjdtOapac
@Krigsgaldr793 Жыл бұрын
Is there a way to use the text-generation-webui for the CPU version?
@knoopx Жыл бұрын
Yes, using llama.cpp weights or converting the Hugging Face ones to ggml.
@Krigsgaldr793 Жыл бұрын
@@knoopx How can I do this?
@joannot6706 Жыл бұрын
They released it, yeah!!! :D
@MatthewWaltersHello Жыл бұрын
Thank you, sir!
@PaulWide Жыл бұрын
Can I run Vicuna on an AMD GPU? /help
@vovanchik_ru4208 Жыл бұрын
Why don't you show the real speed of model generation?
@li-pingho1441 Жыл бұрын
Fast update 🔥🔥🔥
@kimhan2615 Жыл бұрын
Can you do another one for Windows users? I have trouble following your instructions because I don't use Linux.
@monkeymaster6489 Жыл бұрын
What are your specs?
@trendqiang3921 Жыл бұрын
👍👌
@UncleDao Жыл бұрын
I ran your Colab notebook but it did not work!
@deltavthrust Жыл бұрын
Wow and just wow.
@ChaoticNeutralMatt Жыл бұрын
I've been confused by the argument that, oh, it's just copying replies and conversations, so it won't be any better. That feels a little shortsighted about its potential.
@anispinner Жыл бұрын
"CPU RAM" (insert - confused - face - emoji)
@jeffwads Жыл бұрын
The sad part is how neutered it is. Compared to the 30B 4-bit model, that is just unfortunate.
@thefcraft8763 Жыл бұрын
Hey bro, I am ThefCraft.
@oliverli9630 Жыл бұрын
I wish it could do non-English well too.
@oliverli9630 Жыл бұрын
I wish it could do non-English too.
@clray123 Жыл бұрын
Compared to LLaMA, Alpaca and ChatGPT, it's already much superior at generating politically incorrect and NSFW content. Can't wait for Italy to declare it illegal to own a computer with more than 10 GB of RAM... lmao
@geomorillo Жыл бұрын
So my potato computer can't run this 🤣
@sykexz6793 Жыл бұрын
First!
@Krigsgaldr793 Жыл бұрын
@@hackerman.1337 dritter
@bleo4485 Жыл бұрын
This video is disappointing. Your instruction on how to use the CPU version is so unclear and you don't provide proper typed instructions for copy and pasting. So inconvenient!
@mattstroker Жыл бұрын
Dude... how does this in any way resemble a workable installation for a normal computer? Normal, like a 6 GB RAM, 5th-gen i5 laptop running Windows 11. This is Mac and Linux stuff, nothing to do with the average user. The fact that I dual-boot into Linux is nice and all, but... And then there are the oobabooga install tutorials on Windows for Vicuna, but they don't work: all sorts of weird errors in the YouTube comments and on the issue trackers on GitHub. Why don't the people who create this stuff, and clearly want "normies" to experiment with it (which many normies luckily want to do), just create a nice package that works? One command, and fully functional. Maybe an installer, or an extract at worst. This needs to stop, really. I mean... you can't help it, but... we all need to put our best leg forward, if that makes sense in English... :D
@svenschnydrig1768 Жыл бұрын
When trying to run the model on the CPU on OS X with './main -m ./models/ggml-vicuna-13b-4bit.bin --color -f ./prompts/alpaca.txt -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7' I get this error: llama_init_from_gpt_params: error: failed to load model './models/ggml-vicuna-13b-4bit.bin' main: error: unable to load model. Did anyone encounter anything similar, or could help? Thanks in advance.
@nat2509 Жыл бұрын
same problem
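One common cause of "failed to load model" is a ggml file in a format newer or older than the llama.cpp build expects; the loader decides from the first four bytes of the file. A quick way to peek (the magic values are the ones llama.cpp used around that time, so treat them as assumptions and check against your build):

```python
import struct

# ggml file-format magic numbers assumed from llama.cpp's loader at the time
KNOWN_MAGICS = {
    0x67676D6C: "ggml (unversioned, oldest format)",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (mmap-able, newest at the time)",
}

def identify_ggml(path):
    """Return a human-readable guess at the ggml file format, or None."""
    with open(path, "rb") as f:
        raw = f.read(4)
    if len(raw) < 4:
        return None  # file is truncated or empty
    (magic,) = struct.unpack("<I", raw)  # little-endian uint32 header
    return KNOWN_MAGICS.get(magic)
```

If the script reports None or an older format than your binary supports, re-downloading the weights or re-running the conversion script that ships with your llama.cpp checkout is the usual fix.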
@bandui4021 Жыл бұрын
What are concrete recommendations for a personal computer that could run this model smoothly?
@xiuhaishi9821 Жыл бұрын
If your computer runs Windows, you can complete the above tutorials in a virtual machine, e.g. by installing Linux in a VM. The prerequisite is that your computer's hardware performance is sufficient.