The EASIEST way to RUN Llama2-like LLMs on CPU!!!

12,603 views

1littlecoder

10 months ago

Python bindings for Transformer models implemented in C/C++ using the GGML library.
Models
GPT-2
GPT-J, GPT4All-J
GPT-NeoX, StableLM
Falcon
LLaMA, LLaMA 2
MPT
StarCoder, StarChat
Dolly V2
Replit
Google Colab used in the code - colab.research.google.com/dri...
github.com/marella/ctransformers
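For context, the basic ctransformers usage looks roughly like the sketch below. The Hub repo name is an example (one of TheBloke's GGML conversions), and the import is guarded so the snippet degrades gracefully when the library is not installed:

```python
# Minimal ctransformers sketch: load a GGML model and generate text on CPU.
# The repo name below is an example, not the exact one from the video.
try:
    from ctransformers import AutoModelForCausalLM
except ImportError:
    AutoModelForCausalLM = None  # ctransformers not installed

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Return a completion for `prompt`, or an install hint."""
    if AutoModelForCausalLM is None:
        return "(install with: pip install ctransformers)"
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGML",  # example repo; downloads the model
        model_type="llama",               # which GGML architecture to use
    )
    return llm(prompt, max_new_tokens=max_new_tokens)

# usage: print(generate("AI is going to"))
```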
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1littlecoder

Comments: 52
@user-ps3ip1oc1c 10 months ago
Top-tier content, first thing I check on YouTube tbh
@1littlecoder 10 months ago
Seriously? Thanks a lot!
@moondevonyt 10 months ago
Mad props to the creator for diving deep into ctransformers and the ability to run GGML models on local setups, especially on CPU. But tbh, not everyone might be convinced this is the best method, especially when other tools provide GPU support out of the box. Still, the walkthrough was thorough and helps demystify a lot. Good on ya for sharing this knowledge
@eprd313 8 months ago
What tools for GPU support would you suggest?
@HarshVerma-xs6ux 10 months ago
Hello, excellent video! I was wondering if there are any benefits to using it instead of the llama-cpp-python library?
@bukitsorrento 10 months ago
Thanks for the video. Just wondering if you could do a video on AutoGPT using ctransformers, running locally with GPU.
@41Labs 3 months ago
Can ctransformers be used in a production-level application, or is it just for playing with the model? P.S. I think it's integrated with LangChain, so it should be good to use in production.
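The LangChain wrapper the commenter mentions does exist; a hedged sketch of how the integration is typically wired up (the repo name and config values here are illustrative, not a vetted production setup):

```python
# Sketch of the LangChain CTransformers integration mentioned above.
# Requires `pip install langchain ctransformers`; values are illustrative.
config = {
    "max_new_tokens": 256,  # cap response length
    "temperature": 0.1,     # keep answers near-deterministic
}

def make_llm():
    # Loading the model downloads several GB, so construction is wrapped
    # in a function rather than done at import time.
    from langchain.llms import CTransformers
    return CTransformers(
        model="TheBloke/Llama-2-7B-Chat-GGML",  # example GGML repo
        model_type="llama",
        config=config,
    )

# usage: llm = make_llm(); print(llm("What is GGML?"))
```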
@darshan7673 9 months ago
Please make an end-to-end video where you take a pretrained model, fine-tune it for a specific use case, and deploy it on Streamlit or another UI, all CPU-only and with no OpenAI API key, so it's totally free. That would help students practice and showcase it in their projects.
@Unknown-du4vl 6 months ago
Hi there, been following you for a long time. I tried this same code with another LLM, Orca with 3B parameters, and I'm getting unexpected errors. I wonder if you could help me out. I really need help, as my project is stuck on this error.
@MohitSK 10 months ago
Great! How can I generate a large paragraph when it stops generating in the middle?
@tusharnisal1935 8 months ago
Hi, great video, thank you. I tried the model on Google Colab with 12 GB RAM. The model gave me output, but it was too slow. It also did not use all the available RAM, just 1.3-1.4 GB. How can I speed up the output? I tried increasing the batch size but still had no success. If you could help me with this it would be great.
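For what it's worth, low reported RAM use is normal for memory-mapped GGML files, and CPU speed is mostly governed by thread count rather than batch size. A sketch of the generation knobs ctransformers exposes (the values are starting points to tune, not recommendations):

```python
# Generation knobs that matter most for CPU speed in ctransformers.
# Values below are illustrative starting points, not tuned settings.
import os

gen_kwargs = {
    "threads": os.cpu_count() or 4,  # use all cores; biggest speed lever
    "batch_size": 8,                 # batch size for prompt evaluation
    "max_new_tokens": 256,           # cap output length to bound runtime
}

# usage (assumes an already-loaded ctransformers model `llm`):
# text = llm("Summarize GGML in one line.", **gen_kwargs)
```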
@luigitech3169 10 months ago
Is there an advantage to using it over llama.cpp?
@kwabenaanim7446 10 months ago
Keep up the good work. Also, is there a way to run GGML on Hugging Face Spaces?
@1littlecoder 10 months ago
The same thing should work. Let me try to put a sample Spaces app there.
@walidflux 10 months ago
Is it possible to train QLoRA to speak languages other than English and French?
@rashedulkabir6227 10 months ago
Can you make a video about how to run BatGPT?
@DaRedT 10 months ago
Thank you 👍👋👋
@1littlecoder 10 months ago
You're welcome :)
@winxalex1 10 months ago
Great video, keep up the pace. Please make a video with GPU, thx
@tabarnacus5629 10 months ago
What if I wanna run it on my 4090? That makes a lot more sense to me. BTW, I did manage it, but it wasn't easy to figure out. Also, it's not using 100% of the GPU because 'NVIDIA'. I'm not a YouTuber, so I won't make a video about this, but for pointers: forget Windows, you need Linux or WSL2, and work with CUDA 11.8; that should get you in the right direction at least. Even by using a fraction of the 4090 you'll get around 10 times the performance of CPU. Don't even bother trying 8-bit models, the drivers don't support it 'yet' (it will work but will be incredibly slow, like 10 words per minute); the newer drivers do support 8-bit, but they don't work with Torch as of writing this.
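For readers who want the GPU route described above: CUDA-enabled builds of ctransformers expose a `gpu_layers` option for offloading. A hedged sketch (the repo name is an example and the layer count is a rough guess for a 7B model, not a verified setting):

```python
# GPU offload sketch, assuming a CUDA-enabled ctransformers build
# (e.g. pip install ctransformers[cuda]). gpu_layers sets how many
# transformer layers run on the GPU; 50 should cover a 7B model entirely.
def load_gpu_model():
    from ctransformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGML",  # example repo
        model_type="llama",
        gpu_layers=50,  # 0 = pure CPU; raise until VRAM is the limit
    )

# usage: llm = load_gpu_model(); print(llm("Hello"))
```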
@user-bp2su2wd6j 10 months ago
Can you make a video on running Dolly V2 and Falcon with ctransformers as well?
@1littlecoder 10 months ago
Sure, thanks. Will try to make it!
@csowm5je 10 months ago
The response was very fast on my local CPU. However, when asked 'merge sort code for JavaScript', it gave a response omitting the last few lines of code. Is there a maximum response length?
@csowm5je 10 months ago
Found the answer in the video: max_new_tokens=512. The response started with the Chinese 种于匹配算法 though...
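The truncation both comments hit is the `max_new_tokens` cap: generation stops once that budget is spent, even mid-code. A tiny heuristic for sizing the budget for code answers (the tokens-per-line figure is a rough assumption, not a rule):

```python
# Rough heuristic for picking max_new_tokens so long code answers can
# finish. The ~12 tokens/line estimate is an assumption for illustration.
def generation_budget(expected_code_lines: int, tokens_per_line: int = 12) -> int:
    """Return a token budget, never smaller than the common 512 default."""
    return max(512, expected_code_lines * tokens_per_line)

# e.g. a ~100-line merge sort answer:
budget = generation_budget(100)
# then: llm(prompt, max_new_tokens=budget)
```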
@mfinixone1417 9 months ago
I really want to see the phi-1.5 model from Microsoft supported
@Ryan-yj4sd 10 months ago
How much slower is it than GPU?
@gaurav241182 10 months ago
I need some help. How can I contact you?
@technolysys1998 10 months ago
Hey, plz can u make a video on how to use this code to fine-tune our Llama 2 model? And let's say I want some other LLM that isn't available on Hugging Face, then what to do?
@1littlecoder 10 months ago
Please check this - kzbin.info/www/bejne/m5awZ4lrlrWdns0
@DarkWarren744 10 months ago
Can you please explain the difference between llama.cpp and ctransformers? Honestly, I think both do the same thing.
@IronMechanic7110 10 months ago
You can't run Llama 2-based models with llama.cpp; use ctransformers instead.
@DarkWarren744 10 months ago
@@IronMechanic7110 Nope, you can run them; I use it to run a Llama 2 model
@IronMechanic7110 10 months ago
@@DarkWarren744 Not with LangChain.
@IronMechanic7110 10 months ago
@@DarkWarren744 I was 100% on llama.cpp, but now I use ctransformers 100% for my Python development
@santoshshetty6 3 months ago
I used llama.cpp to run the models and it works well for me. Would like to know the difference too.
@divyamaskar6950 10 months ago
Could you please make a video on how we can give a schema to the Llama 2 model for text-to-SQL?
@zyxwvutsrqponmlkh 10 months ago
Am confused. Are you asking for it to create your SQL queries, or to automate the querying and stuff the results into SQL, or for a SQL query result to trigger a prompt, or for a SQL query result to trigger a prompt that gets saved in a SQL table? Or for the LLM to learn how to speak with squirrels, so you can train them to collect nuts for you? Or for LLaMA 2 to help you catch a Squirtle?
@1littlecoder 10 months ago
Based on a schema, do you want the Llama 2 model to generate SQL?
@divyamaskar6950 10 months ago
@@1littlecoder that's right!
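One hedged way to "give a schema" to a Llama 2 chat model for text-to-SQL is to embed the CREATE TABLE statements in the system prompt, using Llama 2's [INST]/<<SYS>> chat format. The template and table below are made up for illustration:

```python
# Sketch of a text-to-SQL prompt for a Llama 2 chat model: the schema goes
# in the system block, the question in the user turn. Table is hypothetical.
def sql_prompt(schema: str, question: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        "You are a text-to-SQL assistant. Answer only with a SQL query "
        "for this schema:\n"
        + schema
        + "\n<</SYS>>\n\n"
        + question
        + " [/INST]"
    )

schema = "CREATE TABLE users (id INT, name TEXT, signup_date DATE);"
prompt = sql_prompt(schema, "How many users signed up in 2023?")
# then: llm(prompt, max_new_tokens=256)
```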
@abishek7583 10 months ago
Can we prompt this model ?
@1littlecoder 10 months ago
Yeah, why?
@tikendraw 10 months ago
Yes, that's why ABD is not floating dead in space.
@1littlecoder 10 months ago
Did you mean to say AMD processor?
@tikendraw 10 months ago
@@1littlecoder I was commenting on another video, then clicked on this video's notification, so the comment got posted here. But it works here too. AMD* 🤣
@iigdgocydiyoch2929 10 months ago
I run a 30B GGML model on my CPU lmao
@1littlecoder 10 months ago
Woah, impressive. What do you use?
@zyxwvutsrqponmlkh 10 months ago
My name is Giovanni Giorgio, but everybody calls me Giorgio.
@1littlecoder 10 months ago
I'm sorry, I didn't understand
@zyxwvutsrqponmlkh 10 months ago
@@1littlecoder Put it in the YouTube search box, gotta get with the hot new memes, man. I watched the video at 3x speed. Blazing. Also, about the video: personally meh, I went and got me a 3090, so I'm not super interested in 4-bit quantization on a CPU myself. Maybe a quantized 40B 8-bit, or a 70B 4-bit quantized for CUDA to fit in a 24 GB card, could be something I'd poke around at. TBH my biggest interest RN is ways to normalize and deduplicate the dirty corpuses I'm collecting and training on... on a 3090, because umm yeah, I'm selfish.
@PhotoshoppersStop 10 months ago
I want to use this model - llama-2-7b-chat.ggmlv3.q3_K_S.bin - on WSL2 Linux from Windows. How can I do that? Plz help.