LangChain - Using Hugging Face Models locally (code walkthrough)

116,308 views

Sam Witteveen

1 day ago

Comments: 84
@insightbuilder · 1 year ago
Keep up the great work. And thanks for curating the important HF models that we can use as alternatives to paid LLMs. When learning new tech, using the free LLMs gives the learner a lot of benefits.
@bandui4021 · 1 year ago
Thank you! I am a newbie in this area and your vids are helping me a lot to get a better picture of the current landscape.
@قناة_لحظة_إدراك · 1 year ago
How can the ready-made projects on the platform be linked to Blogger blogs? I have spent long days searching, to no avail.
@sakshikumar7679 · 6 months ago
Saved me from hours of debugging and research! Thanks a ton.
@morespinach9832 · 10 months ago
This is helpful because in some industries like banking or telcos, it's impossible to use externally hosted services, so we need to host the models ourselves.
@tushaar9027 · 1 year ago
Great video Sam, I don't know how I missed this.
@luis96xd · 1 year ago
I have a problem: when I use low_cpu_mem_usage or load_in_8bit, I get an error that I need to install xformers. When I install xformers, I get an error that I need to install accelerate; when I install accelerate, an error that I need to install bitsandbytes; and so on through einops, accelerate, sentence_transformers, bitsandbytes. But finally I get the error *NameError: name 'init_empty_weights' is not defined*. I don't know how to solve this error or why it happens. Could you help me please?
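For anyone hitting the same cascade, a hedged sketch of what commonly fixes it in Colab: install all the dependencies up front, then restart the runtime before loading (init_empty_weights comes from accelerate, and it often fails to resolve if accelerate was installed after transformers was already imported). The model id and flags below are just placeholders.

    # A sketch, assuming a Colab runtime with a CUDA GPU. Install first,
    # then restart the runtime, then run the loading code.
    !pip install -q transformers accelerate bitsandbytes einops sentence_transformers

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-neo-125M",   # placeholder model id
        device_map="auto",           # requires accelerate
        load_in_8bit=True,           # requires bitsandbytes + a CUDA GPU
    )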
@prestigious5s23 · 1 year ago
Great tutorial. I need to train a model on some private company documents that aren't publicly released yet, and this looks like it could be a big help to me. Subbed!!
@intelligenceservices · 5 months ago
Is there a way to consolidate a Hugging Face repo into a single safetensors file? (from a repo that has the separate directories: scheduler, text_encoder, text_encoder_2, tokenizer, etc.)
@atharvaparanjape9585 · 7 months ago
How can I load the model again later, once I've downloaded it to my local drive?
@steev3d · 1 year ago
Nice video. I'm trying to connect an LLM and use Unity 3D as my interface for STT and TTS with 3D characters. I just found a tool that enables connecting to an LLM on Hugging Face, which is how I discovered that you need a paid endpoint with GPU support to even run most of them. I kind of wish I'd found this video when you posted it. Very useful info.
@anubhavsarkar1238 · 6 months ago
Hello. Can you please make a video on how to use the SeamlessM4T Hugging Face model with LangChain? Particularly for text-to-text translation. I am trying to do some prompt engineering with the model using LangChain's LLMChain module, but it does not seem to work...
@Chris-se3nc · 1 year ago
Thanks for the video. Is there any way to get an example using the LangChain JavaScript library? I am new to this area, and I think many developers have a Node rather than a Python background.
@MohamedNihal-rq6cz · 1 year ago
Hi Sam, how do you feed in your personal documents and query them so the response comes back as generative question answering rather than extractive question answering? I'm a bit new to this library and I don't want to use OpenAI API keys. Please provide some guidance on doing this with open-source LLM models. Thanks in advance!
@samwitteveenai · 1 year ago
That would require fine-tuning the model if you want to put the facts in there. That is probably not the best way to go, though.
@binitapriya4976 · 1 year ago
Hi Sam, is there any way to generate questions and answers from a given text in a .txt file and save those questions and answers to another .txt file with the help of a free Hugging Face model?
@botondvasvari5758 · 7 months ago
And how can I use big models from Hugging Face? I can't load them into memory because many of them are bigger than 15 GB, and some are 130 GB+. Any thoughts?
@samwitteveenai · 7 months ago
You need a machine with multiple GPUs.
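With accelerate installed, device_map="auto" will shard a checkpoint across however many GPUs are visible, and it can offload whatever still doesn't fit. A hedged sketch (the model id is a placeholder):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "some-org/very-large-model",  # placeholder id
        device_map="auto",            # shard layers across all visible GPUs
        offload_folder="offload",     # spill layers that still don't fit to disk
    )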
@megz_2387 · 1 year ago
How do I fine-tune this model so that it can follow instructions on the data provided?
@marko-o2-h20 · 1 year ago
If we cannot afford an A100, what cheaper option would you recommend to run these? I understand the models also differ in size. Thanks Sam.
@jzam5426 · 1 year ago
Thanks for the content!! Is there a way to run a HuggingFacePipeline-loaded model using M1/M2 processors on a Mac? How would one set that up?
@DanielWeikert · 1 year ago
I tried to store the YoutubeLoader documents in FAISS using HuggingFace embeddings, but the LLM was not able to do the similarity search, and Colab eventually timed out. Can you share how to do this without using OpenAI? With OpenAI I had no issues, but I'd like to do it with HF models instead, e.g. Flan. br
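For reference, a minimal sketch of the OpenAI-free version, assuming the documents are already loaded and split (class paths match the LangChain versions of that era; the embedding model id is just an example):

    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS

    # Free, local sentence-transformers model instead of OpenAI embeddings
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    db = FAISS.from_documents(docs, embeddings)   # docs: your split documents
    results = db.similarity_search("What is the video about?", k=4)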
@computadorhumano949 · 1 year ago
Hey, why does it take so long to respond? Does my CPU need to be fast for this?
@samwitteveenai · 1 year ago
Yeah, for the local stuff you really need a GPU rather than a CPU.
@induu954 · 1 year ago
Hi, I would like to know: can we chain 2 models, like a classification model and a pretrained model, using LangChain?
@samwitteveenai · 1 year ago
You could do it through a tool. I'm not sure there is anything built into LangChain for classification models, if you mean something like a BERT etc.
@luis96xd · 1 year ago
Amazing video, everything was well explained. I needed it, thank you so much!
@magnanimist · 1 year ago
Just curious, do you need to re-download the model every time you run scripts like these? Is there a way to save the model and use it after it's been downloaded?
@samwitteveenai · 1 year ago
If you are doing this on a local machine, the model is cached after the first download, so Hugging Face will load it from disk after that. You can also do model.save_pretrained('model_name')
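A minimal sketch of that save-then-reload flow (the path and model id are just examples):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_id = "google/flan-t5-large"
    tokenizer = AutoTokenizer.from_pretrained(model_id)   # downloads once, then cached
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Save an explicit local copy so later runs never touch the network
    model.save_pretrained("./flan-t5-large-local")
    tokenizer.save_pretrained("./flan-t5-large-local")

    # Later: load straight from the local directory
    model = AutoModelForSeq2SeqLM.from_pretrained("./flan-t5-large-local")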
@AdrienSales · 1 year ago
Excellent tutorial, and so well explained. Thanks a lot.
@younginnovatorscenterofint8986 · 1 year ago
Hello Sam, how do you solve: "Token indices sequence length is longer than the specified maximum sequence length for this model (2842 > 512). Running this sequence through the model will result in indexing errors." Thank you in advance.
@samwitteveenai · 1 year ago
This is a limitation of the model, not LangChain. There are some models on HF that go to 2048.
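One common workaround is to split the input before it ever reaches the model; a hedged sketch using LangChain's text splitter (the sizes are arbitrary):

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # chunk_size is in characters here, so keep it well under the model's
    # token limit (roughly 4 characters per token for English text)
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(long_text)   # long_text: your 2842-token string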
@yves1893 · 1 year ago
I am using the Hugging Face model chavinlo/alpaca-native. However, when I use it with this pipeline:

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=248,
        temperature=0.4,
        top_p=0.95,
        repetition_penalty=1.2,
    )
    local_llm = HuggingFacePipeline(pipeline=pipe)

my output is always only 1 word long. Can anyone explain this?
@SomuNayakVlogs · 1 year ago
Can you create one for CSV as input?
@samwitteveenai · 1 year ago
I made another video on using CSVs with LangChain; check that out.
@SomuNayakVlogs · 1 year ago
@samwitteveenai Thanks Sam, I already watched that video, but it is with OpenAI. I wanted LangChain with CSV and Hugging Face.
@SomuNayakVlogs · 1 year ago
Can you please help me with that?
@Marvaniamehul · 1 year ago
I am also curious whether we can use the Hugging Face pipeline (run locally) with LangChain to load a CSV file.
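Untested, but the pieces should compose along these lines: LangChain's CSVLoader turns each row into a document, and everything downstream can stay local (the file name and default models are placeholders):

    from langchain.document_loaders import CSVLoader
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS

    docs = CSVLoader(file_path="data.csv").load()   # one document per row
    db = FAISS.from_documents(docs, HuggingFaceEmbeddings())
    hits = db.similarity_search("which row mentions revenue?")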
@brianrowe1152 · 1 year ago
Stupid question, so I'll take a link to another video/docs/anything. Which Python version, CUDA version, and PyTorch version are best for this work? I see many using Python 3.9 or 3.10.6 specifically. The PyTorch site recommends 3.6/3.7/3.8 on the install page. Then the CUDA version, 11.7 or 11.8; it looks like 11.8 is experimental? And when I look at my nvcc output it says 11.5, but nvidia-smi says CUDA Version 12.0... head explodes. I'm on Ubuntu 22.04. I will google some more, but if someone knows the ideal setup, or at least a setup that works, I'd appreciate it!!! Thank you.
@stonez56 · 8 months ago
Please make a video on how to convert safetensors to GGUF format, or another format that can be used with Ollama? Thanks for these great AI videos!
@human_agi · 1 year ago
What kind of Colab do you need? I am using the $10 version with high RAM and the GPU on, and I still cannot run it: ValueError: A device map needs to be passed to run convert models into mixed-int8 format. Please run `.from_pretrained` with `device_map='auto'`
@samwitteveenai · 1 year ago
If you don't have access to a bigger GPU, then go with a smaller T5 model etc.
@rudy.d · 1 year ago
I think you just need to add the argument device_map='auto' to the same list of arguments in your model's *LM.from_pretrained(xxxx) call, where you have load_in_8bit=True.
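i.e. something like this (a sketch only; the model class and id depend on your checkpoint):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,            # your model id
        load_in_8bit=True,
        device_map="auto",   # the missing argument the ValueError asks for
    )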
@hiramcoriarodriguez1252 · 1 year ago
I'm a transformers user and I still don't get the point of learning this new library. Is it just for very specific use cases?
@samwitteveenai · 1 year ago
Think of it as an abstraction layer for prompting and for managing user interactions with your LLM. It is not an LLM in itself.
@hiramcoriarodriguez1252 · 1 year ago
@samwitteveenai I know it's not an LLM. The biggest problem I see is learning a new library that wraps the OpenAI and Hugging Face libraries just to save 3 or 5 lines of code. I will follow your work; maybe that will change my mind.
@insightbuilder · 1 year ago
Consider Transformers as the first layer of abstraction over the neural nets that make up the LLMs. To interface with LLMs we can use many libraries, including HF; the HF Hub and LangChain form the second layer. The USP of LangChain is the ecosystem built around it, especially the Agents and Utility Chains. This ecosystem lets LLMs be connected with the outside world. The devs at LC have done a great job. Do learn it, and share these absolutely brilliant vids with your friends/team members etc.
@samwitteveenai · 1 year ago
Great way of describing it, @Kamalraj M M.
@neilzedd8777 · 1 year ago
@insightbuilder Beyond impressed with how healthy their documentation is. Working on a flan-ul2 + LC app right now; very fun times.
@alexandremarhic5526 · 1 year ago
Thanks for the work. Just letting you know, the Loire Valley is in the north of France ;)
@samwitteveenai · 1 year ago
Good for wine? :D
@alexandremarhic5526 · 1 year ago
@samwitteveenai Depends on your taste. If you love sweet wine, the south is better, especially for white wine like Jurançon.
@srimantht8302 · 1 year ago
Awesome video! I was wondering how I could use LangChain with a custom model running on SageMaker. Is that possible?
@samwitteveenai · 1 year ago
Yeah, that should be possible in a similar way.
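LangChain ships a SagemakerEndpoint wrapper for this; a hedged sketch following the documented pattern (the endpoint name is hypothetical, and the payload shape depends on how your model was deployed):

    import json
    from langchain.llms import SagemakerEndpoint
    from langchain.llms.sagemaker_endpoint import LLMContentHandler

    class ContentHandler(LLMContentHandler):
        content_type = "application/json"
        accepts = "application/json"

        def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
            # Shape of the request body your endpoint expects
            return json.dumps({"inputs": prompt, "parameters": model_kwargs}).encode("utf-8")

        def transform_output(self, output: bytes) -> str:
            # Shape of the response body your endpoint returns
            return json.loads(output.read().decode("utf-8"))[0]["generated_text"]

    llm = SagemakerEndpoint(
        endpoint_name="my-custom-endpoint",   # hypothetical endpoint name
        region_name="us-east-1",
        content_handler=ContentHandler(),
    )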
@hnikenna · 1 year ago
Thanks for this video. You just earned a subscriber.
@halgurgenci4834 · 1 year ago
These are great videos Sam. I am using a Mac M1. Therefore, it is impossible to run any model locally. I understand this is because PyTorch has not caught up with M1 yet.
@samwitteveenai · 1 year ago
Actually I think that's wrong. I use an M1 and M2 as well, but I run models in the cloud. I might try to get them to run on my M2 and make a video if it works.
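Untested on this exact setup, but recent PyTorch builds do expose Apple Silicon through the "mps" device, so a small model should run along these lines:

    import torch
    from transformers import pipeline

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    pipe = pipeline("text2text-generation", model="google/flan-t5-base", device=device)
    print(pipe("Translate English to German: How old are you?"))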
@daryladhityahenry · 1 year ago
Hi! Did you find a way to load the Vicuna GPTQ version with this? I tried your video's approach with GPT-Neo 125M and it works, but not with Vicuna GPTQ. Thank youu.
@venkatesanr9455 · 1 year ago
Thanks for the valuable series, highly informative. Can you provide some discussion of in-context learning (providing context/query), reasoning, and chain of thought?
@samwitteveenai · 1 year ago
Hi, glad it is helpful. I am thinking about doing some vids on Chain of Thought prompting, Self-Consistency, and PAL, going through the basics of the papers and then looking at how they work in practice with an LLM. I will include the basics of in-context learning as well. Let me know if there are any others you think I should cover.
@DarrenTarmey · 1 year ago
It would be nice to have someone do a review for newbies, as there is so much to learn and it's hard to know where to start.
@samwitteveenai · 1 year ago
What exactly would you like me to cover? If you have any questions, I'm happy to make more vids etc.
@fintech1378 · 1 year ago
How do you build a Telegram chatbot with this?
@surajnarayanakaimal · 1 year ago
Thank you for the awesome content. It would be very helpful if you made a tutorial on how to use a custom model with LangChain and embed documents with it. I want to train on some documentation; currently we can use OpenAI or other service APIs, but consuming those APIs is very costly. So could you show how to do it locally? Please consider training on a site's documentation so it can answer from that documentation, with more context awareness and also chat-history memory. Currently we depend on the OpenAI APIs for that, so if it's achievable with a local model it would be very helpful.
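Until then, a hedged sketch of the fully local version of that pipeline: embed the docs locally, retrieve, and answer with an open-source model (all model ids are examples, and docs is assumed to be your already-split documents):

    from transformers import pipeline
    from langchain.llms import HuggingFacePipeline
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.chains import RetrievalQA

    db = FAISS.from_documents(docs, HuggingFaceEmbeddings())
    local_llm = HuggingFacePipeline(pipeline=pipeline(
        "text2text-generation", model="google/flan-t5-large", max_new_tokens=256))
    qa = RetrievalQA.from_chain_type(llm=local_llm, retriever=db.as_retriever())
    print(qa.run("What does the documentation say about authentication?"))

For the chat-history part, LangChain's ConversationalRetrievalChain is the analogous piece.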
@XiOh · 1 year ago
You are not doing it locally in this video...
@samwitteveenai · 1 year ago
The LLMs are running locally on the machine where the code is running. The first bit shows pinging the API as a comparison.
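For anyone confused on this point, the two modes look like this (a minimal sketch; the flan-t5 ids are just examples):

    from langchain.llms import HuggingFaceHub, HuggingFacePipeline

    # Remote: pings the hosted Inference API (needs HUGGINGFACEHUB_API_TOKEN)
    remote_llm = HuggingFaceHub(repo_id="google/flan-t5-xl")

    # Local: downloads the weights and runs them on this machine
    local_llm = HuggingFacePipeline.from_model_id(
        model_id="google/flan-t5-large", task="text2text-generation")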
@SD-rg5mj · 1 year ago
Hello, and thank you very much for this video. The problem, though, is that I am not sure I understood everything; I speak English badly, as I am French.
@samwitteveenai · 1 year ago
Did you try the French subtitles? I upload English subtitles, so I hope YouTube does a decent job translating them. Also feel free to ask any questions if you are not sure.
@pankajkumarmondal4490 · 3 months ago
Really good!
@samwitteveenai · 3 months ago
Thanks! This video is actually a bit out of date now; it is almost 18 months old.
@KittenisKitten · 1 year ago
Would be useful if you explained what program you're using, or what page you're looking at. It seems like a waste of time if you don't know anything about the programs or what you're doing. 1/5
@samwitteveenai · 1 year ago
The Colab is linked in the description; it's all there to use.
@mrsupremegascon · 1 year ago
Ok, great tutorial, but as a Frenchman from Bordeaux, I am deeply disappointed by Google's answer about the best area to grow wine. Loire Valley? Seriously???? Name one great wine coming from the Loire, Google, I dare you. They are in the B league at best. The answer is obviously Bordeaux; I would maybe have accepted Agen (wrong) or even Bourg*gne (very, very wrong). But the Loire? It's outrageous, and this answer made me certain that I will never use this cursed model.
@samwitteveenai · 1 year ago
lol well at least you live in a very nice area of the world.
@nemtii_ · 1 year ago
What always happens with this setup (LangChain + HuggingFaceHub) is that it only generates about 80 characters per call. Anyone else having this problem? I tried max_length: 400 and it's still the same issue.
@nemtii_ · 1 year ago
It's not specific to LangChain; I used the client directly and still get the same issue.
@samwitteveenai · 1 year ago
I think this could be an issue with their API; perhaps on the Pro/paid version they allow more. I am not sure. To be honest, I don't use their API; I tend to load the models locally. You could also try the max_new_tokens setting rather than max_length; that could help.
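For reference, the distinction: max_length counts prompt tokens plus generated tokens, so a long prompt can leave almost no room to generate, whereas max_new_tokens counts only the generated part. A sketch of the adjusted pipeline from the question above (model and tokenizer as already loaded there):

    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=256,      # counts only generated tokens, unlike max_length
        temperature=0.4,
        top_p=0.95,
        repetition_penalty=1.2,
    )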
@nemtii_ · 1 year ago
@samwitteveenai Wow! Thank youuu!! It worked with max_new_tokens.
@nemtii_ · 1 year ago
@samwitteveenai I wish someone would make a list mapping which model sizes run on the free Google Colab versus the paid Colab, to see whether it's worth paying and what you can experiment with in each tier. I'm kind of lost in that sense, at a stage where I just want to evaluate models myself and think about a production environment later.
@samwitteveenai · 1 year ago
This would be good, I agree.
@ELECOEST · 11 months ago
Hello, thanks for your video. As of now it's:

    llm_chain = LLMChain(
        prompt=prompt,
        llm=HuggingFaceHub(
            repo_id="google/flan-t5-xxl",
            model_kwargs={"temperature": 0.9, "max_length": 64},
        ),
    )

The temperature must be > 0, and the model is flan-t5-xxl.
@litttlemooncream5049 · 10 months ago
Thx! Helped a lot! But I'm stuck at loading the model... it says google/flan-t5-xl is too large to be loaded automatically (11GB > 10GB)... qaq
@samwitteveenai · 10 months ago
Try a smaller model if your GPU isn't big enough, e.g. google/flan-t5-small or something like that.
PAL : Program-aided Language Models with LangChain code
19:00
Sam Witteveen
15K views
All You Need To Know About Running LLMs Locally
10:30
bycloud
189K views
LangChain Basics Tutorial #1 - LLMs & PromptTemplates with Colab
19:39
Python RAG Tutorial (with Local LLMs): AI For Your PDFs
21:33
pixegami
331K views
$0 Embeddings (OpenAI vs. free & open source)
1:24:42
Rabbit Hole Syndrome
271K views
Fine-tuning Large Language Models (LLMs) | w/ Example Code
28:18
Shaw Talebi
377K views
Ollama + HuggingFace - 45,000 New Models
7:48
Sam Witteveen
11K views