The EASIEST way to RUN Llama2-like LLMs on CPU!!!

12,603 views

1littlecoder

10 months ago

Python bindings for Transformer models implemented in C/C++ using the GGML library.
Models
GPT-2
GPT-J, GPT4All-J
GPT-NeoX, StableLM
Falcon
LLaMA, LLaMA 2
MPT
StarCoder, StarChat
Dolly V2
Replit
Google Colab used in the code - colab.research.google.com/dri...
github.com/marella/ctransformers
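For context, the basic ctransformers usage looks roughly like the sketch below. The Hub repo name is an example (one of TheBloke's GGML conversions), and the import is guarded so the snippet degrades gracefully when the library is not installed:

```python
# Minimal ctransformers sketch: load a GGML model and generate text on CPU.
# The repo name below is an example, not the exact one from the video.
try:
    from ctransformers import AutoModelForCausalLM
except ImportError:
    AutoModelForCausalLM = None  # ctransformers not installed

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Return a completion for `prompt`, or an install hint."""
    if AutoModelForCausalLM is None:
        return "(install with: pip install ctransformers)"
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGML",  # example repo; downloads the model
        model_type="llama",               # which GGML architecture to use
    )
    return llm(prompt, max_new_tokens=max_new_tokens)

# usage: print(generate("AI is going to"))
```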
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1littlecoder

Comments: 52
@user-ps3ip1oc1c 10 months ago
Top-tier content, first thing I check on YouTube tbh
@1littlecoder 10 months ago
Seriously? Thanks a lot!
@moondevonyt 10 months ago
Mad props to the creator for diving deep into ctransformers and the ability to run GGML models on local setups, especially on CPU. But tbh, not everyone might be convinced this is the best method, especially when other tools provide GPU support out of the box. Still, the walkthrough was thorough and helps demystify a lot. Good on ya for sharing this knowledge
@eprd313 8 months ago
What tools for GPU support would you suggest?
@HarshVerma-xs6ux 10 months ago
Hello, excellent video! I was wondering if there are any benefits to using it instead of the llama-cpp-python library?
@bukitsorrento 10 months ago
Thanks for the video. Just wondering if you could do a video on AutoGPT using ctransformers, running locally with GPU.
@41Labs 3 months ago
Can ctransformers be used in a production-level application, or is it just for playing with the model? P.S. I think it's integrated with LangChain, so it should be good to use in production.
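The LangChain wrapper the commenter mentions does exist; a hedged sketch of how the integration is typically wired up (the repo name and config values here are illustrative, not a vetted production setup):

```python
# Sketch of the LangChain CTransformers integration mentioned above.
# Requires `pip install langchain ctransformers`; values are illustrative.
config = {
    "max_new_tokens": 256,  # cap response length
    "temperature": 0.1,     # keep answers near-deterministic
}

def make_llm():
    # Loading the model downloads several GB, so construction is wrapped
    # in a function rather than done at import time.
    from langchain.llms import CTransformers
    return CTransformers(
        model="TheBloke/Llama-2-7B-Chat-GGML",  # example GGML repo
        model_type="llama",
        config=config,
    )

# usage: llm = make_llm(); print(llm("What is GGML?"))
```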
@darshan7673 9 months ago
Please make an end-to-end video where you take a pretrained model, fine-tune it for a specific use case, and deploy it on Streamlit or another UI, all CPU-only and with no OpenAI API key, so it's totally free. That would help students practice and showcase it in their projects.
@Unknown-du4vl 6 months ago
Hi there, been following you for a long time. I tried this same code with another LLM, Orca with 3B parameters, and I'm getting unexpected errors. I wonder if you could help me out. I really need help, as my project is stuck on this error.
@MohitSK 10 months ago
Great! How can I generate a large paragraph when it stops generating in the middle?
@tusharnisal1935 8 months ago
Hi, great video, thank you. I tried the model on Google Colab with 12 GB RAM. The model gave me output, but it was too slow. It also did not use all the available RAM, just 1.3-1.4 GB. How can I speed up the output? I tried increasing the batch size but still had no success. If you could help me with this it would be great.
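For what it's worth, low reported RAM use is normal for memory-mapped GGML files, and CPU speed is mostly governed by thread count rather than batch size. A sketch of the generation knobs ctransformers exposes (the values are starting points to tune, not recommendations):

```python
# Generation knobs that matter most for CPU speed in ctransformers.
# Values below are illustrative starting points, not tuned settings.
import os

gen_kwargs = {
    "threads": os.cpu_count() or 4,  # use all cores; biggest speed lever
    "batch_size": 8,                 # batch size for prompt evaluation
    "max_new_tokens": 256,           # cap output length to bound runtime
}

# usage (assumes an already-loaded ctransformers model `llm`):
# text = llm("Summarize GGML in one line.", **gen_kwargs)
```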
@luigitech3169 10 months ago
Is there an advantage to using it over llama.cpp?
@kwabenaanim7446 10 months ago
Keep up the good work. Also, is there a way to run GGML on Hugging Face Spaces?
@1littlecoder 10 months ago
The same thing should work. Let me try to put a sample Spaces app there.
@walidflux 10 months ago
Is it possible to train QLoRA to speak languages other than English and French?
@rashedulkabir6227 10 months ago
Can you make a video about how to run BatGPT?
@DaRedT 10 months ago
Thank you 👍👋👋
@1littlecoder 10 months ago
You're welcome :)
@winxalex1 10 months ago
Great video, keep up the pace. Please make a video with GPU, thx
@tabarnacus5629 10 months ago
What if I wanna run it on my 4090? That makes a lot more sense to me. BTW, I did manage it, but it wasn't easy to figure out. Also, it's not using 100% of the GPU because 'NVIDIA'. I'm not a YouTuber, so I won't make a video about this, but for pointers: forget Windows, you need Linux or WSL2, and work with CUDA 11.8; that should get you in the right direction at least. Even by using a fraction of the 4090 you'll get around 10 times the performance of CPU. Don't even bother trying 8-bit models, the drivers don't support it 'yet' (it will work but will be incredibly slow, like 10 words per minute); the newer drivers do support 8-bit, but they don't work with Torch as of writing this.
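For readers who want the GPU route described above: CUDA-enabled builds of ctransformers expose a `gpu_layers` option for offloading. A hedged sketch (the repo name is an example and the layer count is a rough guess for a 7B model, not a verified setting):

```python
# GPU offload sketch, assuming a CUDA-enabled ctransformers build
# (e.g. pip install ctransformers[cuda]). gpu_layers sets how many
# transformer layers run on the GPU; 50 should cover a 7B model entirely.
def load_gpu_model():
    from ctransformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7B-Chat-GGML",  # example repo
        model_type="llama",
        gpu_layers=50,  # 0 = pure CPU; raise until VRAM is the limit
    )

# usage: llm = load_gpu_model(); print(llm("Hello"))
```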
@user-bp2su2wd6j 10 months ago
Can you make a video on running Dolly V2 and Falcon with ctransformers as well?
@1littlecoder 10 months ago
Sure, thanks. Will try to make it!
@csowm5je 10 months ago
The response was very fast on my local CPU. However, when asked 'merge sort code for JavaScript', it gave a response omitting the last few lines of code. Is there a maximum response length?
@csowm5je 10 months ago
Found the answer in the video: max_new_tokens=512. The response started with the Chinese 种于匹配算法 though...
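The truncation both comments hit is the `max_new_tokens` cap: generation stops once that budget is spent, even mid-code. A tiny heuristic for sizing the budget for code answers (the tokens-per-line figure is a rough assumption, not a rule):

```python
# Rough heuristic for picking max_new_tokens so long code answers can
# finish. The ~12 tokens/line estimate is an assumption for illustration.
def generation_budget(expected_code_lines: int, tokens_per_line: int = 12) -> int:
    """Return a token budget, never smaller than the common 512 default."""
    return max(512, expected_code_lines * tokens_per_line)

# e.g. a ~100-line merge sort answer:
budget = generation_budget(100)
# then: llm(prompt, max_new_tokens=budget)
```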
@mfinixone1417 9 months ago
I really want to see the phi-1.5 model from Microsoft supported
@Ryan-yj4sd 10 months ago
How much slower is it than GPU?
@gaurav241182 10 months ago
I need some help. How can I contact you?
@technolysys1998 10 months ago
Hey, plz can u make a video on how to use this code to fine-tune our Llama 2 model? And let's say I want some other LLM that isn't available on Hugging Face, then what to do?
@1littlecoder 10 months ago
Please check this - kzbin.info/www/bejne/m5awZ4lrlrWdns0
@DarkWarren744 10 months ago
Can you please explain the difference between llama.cpp and ctransformers? Honestly, I think both do the same thing.
@IronMechanic7110 10 months ago
You can't run Llama 2-based models with llama.cpp; use ctransformers instead.
@DarkWarren744 10 months ago
@@IronMechanic7110 Nope, you can run them; I use it to run a Llama 2 model
@IronMechanic7110 10 months ago
@@DarkWarren744 Not with LangChain.
@IronMechanic7110 10 months ago
@@DarkWarren744 I was 100% on llama.cpp, but now I use ctransformers 100% for my Python development
@santoshshetty6 3 months ago
I used llama.cpp to run the models and it works well for me. Would like to know the difference too.
@divyamaskar6950 10 months ago
Could you please make a video on how we can give a schema to the Llama 2 model for text-to-SQL?
@zyxwvutsrqponmlkh 10 months ago
Am confused. Are you asking for it to create your SQL queries, or to automate the querying and stuff the results into SQL, or for a SQL query result to trigger a prompt, or for a SQL query result to trigger a prompt that gets saved in a SQL table? Or for the LLM to learn how to speak with squirrels, so you can train them to collect nuts for you? Or for LLaMA 2 to help you catch a Squirtle?
@1littlecoder 10 months ago
Based on a schema, do you want the Llama 2 model to generate SQL?
@divyamaskar6950 10 months ago
@@1littlecoder that's right!
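One hedged way to "give a schema" to a Llama 2 chat model for text-to-SQL is to embed the CREATE TABLE statements in the system prompt, using Llama 2's [INST]/<<SYS>> chat format. The template and table below are made up for illustration:

```python
# Sketch of a text-to-SQL prompt for a Llama 2 chat model: the schema goes
# in the system block, the question in the user turn. Table is hypothetical.
def sql_prompt(schema: str, question: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        "You are a text-to-SQL assistant. Answer only with a SQL query "
        "for this schema:\n"
        + schema
        + "\n<</SYS>>\n\n"
        + question
        + " [/INST]"
    )

schema = "CREATE TABLE users (id INT, name TEXT, signup_date DATE);"
prompt = sql_prompt(schema, "How many users signed up in 2023?")
# then: llm(prompt, max_new_tokens=256)
```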
@abishek7583 10 months ago
Can we prompt this model ?
@1littlecoder 10 months ago
Yeah, why?
@tikendraw 10 months ago
Yes, that's why ABD is not floating dead in space.
@1littlecoder 10 months ago
Did you mean to say AMD processor?
@tikendraw 10 months ago
@@1littlecoder I was commenting on another video, then clicked on this video's notification, so the comment got posted here. But it works here too. AMD* 🤣
@iigdgocydiyoch2929 10 months ago
I run a 30B GGML model on my CPU lmao
@1littlecoder 10 months ago
Woah, impressive. What do you use?
@zyxwvutsrqponmlkh 10 months ago
My name is Giovanni Giorgio, but everybody calls me Giorgio.
@1littlecoder 10 months ago
I'm sorry, I didn't understand
@zyxwvutsrqponmlkh 10 months ago
@@1littlecoder Put it in the YouTube search box, gotta get with the hot new memes, man. I watched the video at 3x speed. Blazing. Also, about the video: personally meh, I went and got me a 3090, so I'm not super interested in 4-bit quantization on a CPU myself. Maybe a quantized 40B 8-bit, or a 70B 4-bit quantized for CUDA to fit in a 24 GB card, could be something I'd poke around at. TBH my biggest interest RN is ways to normalize and deduplicate the dirty corpuses I'm collecting and training on... on a 3090, because umm yeah, I'm selfish.
@PhotoshoppersStop 10 months ago
I want to use this model - llama-2-7b-chat.ggmlv3.q3_K_S.bin - on WSL2 Linux from Windows. How can I do that? Plz help.