Tagging and Extraction - Classification using OpenAI Functions

Рет қаралды 20,006

Sam Witteveen

Күн бұрын

Пікірлер: 65

@daffertube Жыл бұрын

Thank you for making these tutorials. They are very helpful!

@jaimemelon2621 Жыл бұрын

Incredible channel on LangChain and AI in general.

@djpremier333 Жыл бұрын

You have the best channel about langchain, I love your content.

@samwitteveenai Жыл бұрын

Thanks.

@borisw1166 Жыл бұрын

Thank you for the video. Did some testing today with Kor and Kor seems to work better in my cases. Tested it with a bill with instruction for the customer reference for the bank transfer. With Kor not only do I get the right customer reference (based on the instructions on the bill), but it also calculates the right amount (since no total amount is on the bill). With functions, it only works with tagging (instead of extraction) but it does not calculate the amount. This is also a good example for a "positiv" prompt-injection, since the instruction on how to use the right customer reference was on the bill and got "injected" into the prompt :D

@samwitteveenai Жыл бұрын

I like Kor and made a video about that in the past, so not surprised it may work better for certain use cases. Be really careful relying on any of the models to calculate correctly, that sounds like it is the kind of thing that could break easily.

@borisw1166 Жыл бұрын

@@samwitteveenai Sure, we are not relying on that. I was actually surprised it did calculate the amount. I am currently testing out different ways for data extraction, and that bill actually failed in my banking app (with the photo transfer feature) so I just gave it a try.

@qingdong801 Жыл бұрын

Thank you so much for sharing the code in colab and github!

@jasonlosser8141 Жыл бұрын

Hi Sam, once again the quality of your videos are amazing. I built an extraction using OpenAI functions this weekend to get excellent JSON returned. I have a few scripts that attempt to do this with basic prompting, but they can hallucinate occasionally. As of now, this function concept is working great. Your tip on enum is phenomenal - I hadn’t thought of that. Also, a criticism I have is turbo isn’t quite as solid as davinci3 on its returns. I don’t have api gpt4 yet - I will try that once granted by OpenAI. Anyways, last thing - do you feel running through langchain is even necessary? I have felt the OpenAI function implementation can eliminate langchain for a great deal of what I do -- perhaps a bit more scripting on my part, but eliminates a friction/fail point

@samwitteveenai Жыл бұрын

You raise a number of key issues here, let me try to address each one. 1. I agree the turbo model is often not as good as Davinci 003 etc. I personally think that is because turbo is a distilled smaller model (but I have no inside knowledge on that) 2. GPT-4 is coming to more people soon. 3. I think for some things LangChain is the right solution and for others not. This week I have worked on a number of things that ping the OpenAI APIs directly just because it was easier for what I was doing. LangChain & Llama Index is still very cool and probably the best ways to go for using data and tools with LLMs, If you can get away without it then that is fine to do

@sachinsarathe1143 6 ай бұрын

Hey Sam ... Awesome Video... Really Helpful. Just want to know is there any mechanism for evaluating it.

@anubiseyeproductions2921 Жыл бұрын

You didn’t start the video with “Okay…”. I aways look forward to that.

@sethhavens1574 Жыл бұрын

also this ^^ 👍

@samwitteveenai Жыл бұрын

lol

@kesavanr5341 Жыл бұрын

Your Videos are awesome, Thank you for the langchain series, I was wondering if there are any tagging chains with open source llm like palm or llama

@samwitteveenai Жыл бұрын

yeah there are some with some of the new Mistral models. I will try to make some new vids over time

@sskarimirelandsskarimirela8750 Жыл бұрын

Dear Sam I'm really quite worried that almost the integration is with open ai which is not open source or limited is that means that you can't or langchain can be used only with open ai ? Thanks

@samwitteveenai Жыл бұрын

LangChain really seems to be OpenAI first and then other models later (though it supports the other models fine as well). The problem is most of the open source models just can't do this kind of task, meaning they don't have the reasoning skills to do it.

@sskarimirelandsskarimirela8750 Жыл бұрын

@@samwitteveenai many thanks dear for your great effort . I feel like we can't run business for small company with dominance from few big companies and pay for each token ....

@samwitteveenai Жыл бұрын

Trust me often paying for tokens is much cheaper than running 4-8GPUs to run your own model etc.

@sskarimirelandsskarimirela8750 Жыл бұрын

@@samwitteveenai many thanks 👍👍👍

@toddnedd2138 Жыл бұрын

Thanks for the explanation. The new features (functions & larger prompt window) of the openAI models are a little bit like you buy a technical device with a lot of buttons but the vendor does only give you an example usage instead of a detailed manual. It would be very helpful if openAI would publish some training data of the models. On the other hand, maybe this try&error attempt creates the new jobs that everybody is talking about ; - )

@sethhavens1574 Жыл бұрын

now that’s some clever stuff 👌

@kunalmundada8754 Жыл бұрын

Nice video! Just wondering if I could add type as list or array and define what I want in the array in the description

@samwitteveenai Жыл бұрын

Yes especially in the Pydantic classes you should be able to do that.

@shraey2021 Жыл бұрын

Hi Sam, just came across your channel. Pretty cool stuff. I had a query. I saw your video (maybe couple back) where we convert function calling from open ai into Tools and then call it as an agent. Here we call it more like a chain, am thinking multiple functions like multiple tools and chain calls it. Are both techniques equal or is one way better than the other? Cheers

@MadhavanSureshRobos Жыл бұрын

Looking forward for more projects with open LLMs too. I feel OpenAI is the best no doubt but it's a Supercomputer vs PC fight now and we'd rather have PCs. Just my opinion. Anyway always love your content.

@samwitteveenai Жыл бұрын

don't worry I haven't given up of OpenSource LLMs. I will make some more vids soon.

@rkenne1391 Жыл бұрын

Thank you so much, insightful + notebook. Awesome, quick question, how do you combine it with few shots / icl ? FewshotPromptTemplate ?

@samwitteveenai Жыл бұрын

Can just add it into the prompt template or yes look at that prompt template.

@youssefsalah5265 10 ай бұрын

how to add examples to the prompet

@kentl658 6 ай бұрын

Is it possible to do tagging and extraction at the same time? In the context of event planning, I want the event details (tagging), as well as a list of vendor services and requirements (extraction). I have difficulty forming the class/schema

@dare2dream148 Жыл бұрын

Thanks agani for sharing Sam! I've got two qns. 1. Is these optimizations why GPT-4 seems to be becoming worse in some other tasks over time? Does it imply OpenAI is focusing more on API usage than Chat usage going forward? 2. On applying few-shot ICL to these functions. What are some of ways/ideas to implement it?

@RyanScottForReal Жыл бұрын

Hmm I guess since I'm needing to do both extraction and tagging I need to run 2 steps - is there a way to do both in a single shot?

@samwitteveenai Жыл бұрын

yes just put it all in one pydantic class.

@JOHNSMITH-ve3rq Жыл бұрын

Bro your mic seems to max out at a certain frequency or something? 13:26 or so the word “extract” gets flattened or cut off. It seems to be a sound setting.

@samwitteveenai Жыл бұрын

Its because I recording in a room with a lot of reverb and then I run it through a denoiser to remove the reverb. Certainly not ideal and open to any suggested solutions you have.

@gramothy_taylor Жыл бұрын

Have you tried a microphone isolation shield? Cheaper than doing a whole room of acoustics, and should help a lot with that.

@Xaddre Жыл бұрын

I created a terminal using it where it will return the command necessary to do whatever the user asks in plain English it’s pretty prototype like but I actually use it a lot when I forget a terminal command for something.

@HessenBougueffaeutamene Жыл бұрын

hi Sam can we use LLM for detecting images fake or real ? thank you

@yurijmikhassiak7342 Жыл бұрын

Thanks. Do you think GPT4 whould do better?

@samwitteveenai Жыл бұрын

Yes totally. GPT-4 does better pretty much on most things especially reasoning stuff.

@MrOldz67 Жыл бұрын

Hey Sam Thanks as always for your great video that's a big work you're doing for the community I am curious to get your thoughts about a potential usage with document summarization and QA. I am currently building a solution like this that will allow you to query your documents and ask them questions or ask the chatbot for summarization or even generation of a copy using a document template etc I was looking to build that with unstructured data and using a faiss or vectordb solution to build a database is there any benefit to use tagging and extraction or can that even be a solution? Thanks in advance for your answer as always

@samwitteveenai Жыл бұрын

Yes often you will extract meta data and then do the search with both vectors and meta data.

@Arjay2186 Жыл бұрын

Thanks. Awesome as usual. Is there a way to combine this with Retrieval QA for loooong documents and changing information to be extracted each time?

@samwitteveenai Жыл бұрын

Probably not simply if you want to change the schema each time. there is a new Retrieval QA using functions too. I may make a video on that.

@Arjay2186 Жыл бұрын

@@samwitteveenai Big thanks. That would be great. Need to find time to play around with Langchain again. Whole thing is changing way too fast.

@dangerous235 Жыл бұрын

thank you for your great video. just a question: why it's runnable while we don't have a local function named information_extraction?

@samwitteveenai Жыл бұрын

good question. Its because we don't have anything in the outparser that tries to run a a function, like we did in the other ones.

@dangerous235 Жыл бұрын

could you explain how could we check (print) outparser, for example, in case of the agent with stock price tool in your previous video? I want to better understand the differences between 2 cases (run local function and these)

@dangerous235 Жыл бұрын

I realize with these cases, instead of require function to be executed, the parameters themselves are our expected output, correct?

@andy111007 Жыл бұрын

Hi Sam, thanks for the amazing video. Any plans for. any follow ups. This is an interesting concept underutilized

@csharpner Жыл бұрын

Is this available for locally run models yet?

@samwitteveenai Жыл бұрын

Not yet especially not how they structure the input etc like this. But there might be some things coming soon.

@dtkincaid Жыл бұрын

I must be missing something. The use cases you're showing here were already possible using output parsers. What's new here? I've been doing these things for a while now.

@samwitteveenai Жыл бұрын

This is far more stable than output parsers because the model is fine-tuned to use these functions.

@vinsi90184 Жыл бұрын

Hey Sam, follow and love your videos. I am trying to create a bot where I need to extract an entity and pass on to some APIs depending on the question. For example, what's the weather in my area? should respond to which area you reside and if I pass a name that matches in the list, then I can call the API and return the answer. So, problem 1 is to ask user recursively to get complete question and in that process also solve the problem 2 which is to extract the "entity" but matching to what I have in my docs. So, if the person response my home, the bot should ask but which city is your home and if I give a city name that does not exist the bot says it doesn't have the information. I have to do it over multiple types of entities and not just one, say location, date / date range etc. What should be the best way to approach this problem?

@samwitteveenai Жыл бұрын

This would be more the prompt and serious amount of customizing the prompt. Also you could look at filling a set of slots, so when the slot is still empty the program then asks a direct question (Which city do you live in") and then filter for that slot. Don't think of the OpenAI Functions as being static, you can call with different functions based on what you get back.

@ArchitJain-j4u Жыл бұрын

I was working on this and trying on some of my use cases. This feels amazing. The problem I want to solve now is to run these chains on large amount of data (documents). Is there any solution/hack that you found using langchain?

@samwitteveenai Жыл бұрын

Actually just a for loop is fine and monitor for API calls that don't work etc

@TomMathews Жыл бұрын

Hi Sam. First off, your content is simply amazing. Detailed and very informative. I have been motivated and experimenting with LangChain quite a lot, especially since I started watching your content. Can you possibly also create a video on the latest HugginFace Transformer Agents?

@samwitteveenai Жыл бұрын

I made a video a while back, have they added anything new?

@TomMathews Жыл бұрын

@@samwitteveenai My bad. I had been going through your LangChain playlist and somehow missed that video. Thanks again.

@mattizzle81 Жыл бұрын

I found the functions a really cool idea but not ready for prime time. It breaks too much still. ChatGPT sometimes ignores them altogether and claims it can't do anything, or even worse it "pretends" to use the function. It calls it but the API doesn't actually pick it up as a function call.