Using LangChain Output Parsers to get what you want out of LLMs

43,438 views

Sam Witteveen

1 day ago

Comments: 94
@magicofjafo · 1 year ago
Dude, I was having exactly these output parser problems just last night. This is exactly what I needed. Thanks.
@pankymathur · 1 year ago
Thanks a lot Sam, I really like the way you went deep into explaining all the different types of parsers with examples. This is definitely one of the most top-notch videos you've released, keep it up 😊
@toddnedd2138 · 1 year ago
Thank you. There are pros and cons with LangChain. It is a powerful framework, but sometimes it is (imho) a little too verbose when it comes to prompt templates. This adds up if you make a lot of requests to the model and costs unnecessary tokens (in production-ready applications). Therefore I use my own hand-written prompts; the crucial thing is finishing the prompt with the final instruction. Here is an example:
---
The result type 2 should be provided in the following JSON data structure:
{
  "name": "name of the task",
  "number": number of the task as integer, // integer value
  "tool": "name of the tool to solve the task", // omitted
  "isfinal": true if this is the last task // boolean value
}
Respond only with the output in the exact format specified, with no explanation or conversation.
---
So far, this has always worked reliably (over 500 calls). I found this in a lot of papers, so the credit for this goes to some other intelligent guys. From my experience, the names of the fields and also the order of the fields can make a difference.
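Below is a minimal sketch of this hand-rolled approach, assuming the pre-1.0 OpenAI Python client; the model choice, helper name, and simplified schema are illustrative, not from the comment:

import json
import openai  # assumes the pre-1.0 openai package (openai.ChatCompletion)

FORMAT_INSTRUCTION = (
    'The result should be provided in the following JSON data structure:\n'
    '{\n'
    '  "name": "name of the task",\n'
    '  "number": 1,\n'
    '  "tool": "name of the tool to solve the task",\n'
    '  "isfinal": false\n'
    '}\n'
    'Respond only with the output in the exact format specified, '
    'with no explanation or conversation.'
)

def run_task_prompt(task_description: str) -> dict:
    # Finish the prompt with the final format instruction, as described above
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": task_description + "\n\n" + FORMAT_INSTRUCTION}],
        temperature=0,
    )
    return json.loads(response["choices"][0]["message"]["content"])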
@MichaelNagyTeodoro · 1 year ago
I did the same thing, it's better to do the parsing outside of langchain methods.
@thatryanp · 1 year ago
From what I see of LangChain examples, it seems that someone with development experience would be better served by some basic utility functions rather than by taking on LangChain's design assumptions.
@toddnedd2138 · 1 year ago
@thatryanp @MichaelNagyTeodoro There could be some disadvantages to writing your own solutions when it comes to updating to a newer underlying model. Maybe not critical today, but one day it might be a topic. My guess is the LangChain community will be fast to provide updates.
@nahiyanalamgir7056 · 1 month ago
Isn't there any way to write a custom LangChain generator that fits your needs better?
@kuhajeyangunaratnam8652 · 8 months ago
Thanks a lot mate. This is as invaluable as it gets: a code walkthrough with all the explanations. Not to mention the code itself is well documented.
@rashidquamar · 7 months ago
Thanks Sam, I was struggling with the output parser and you helped right on time.
@MariuszWoloszyn · 1 year ago
I'm used to doing input and output in YAML. It's more human-readable than JSON, hence it works better with LLMs. No missing colons or stuff like that.
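A quick sketch of the YAML variant, assuming PyYAML is installed; call_llm is a hypothetical stand-in for whatever model call you use, and the prompt wording is illustrative:

import yaml

def get_items(call_llm) -> list:
    # call_llm: hypothetical helper taking a prompt string, returning the model's text
    raw = call_llm(
        "List three product names with a one-line pitch each.\n"
        "Respond only in YAML, as a list of mappings with keys 'name' and 'pitch'."
    )
    # YAML tolerates unquoted strings, so a missing quote is less fatal than in JSON
    return yaml.safe_load(raw)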
@jolieriskin4446 · 1 year ago
Another variation I've been using is to have a separate JSON repair method. I usually use a similar technique of showing the example JSON, and I immediately call my validation routine afterwards. If there is an error, I send the JSON error and the line number it's on in a separate call and retry up to 3x to repair the output. The nice thing is you can use a lot fewer tokens on the repair call, and potentially call a more specific or faster model that is tailored towards just fixing JSON (rather than wasting an expensive call to GPT-4, etc.).
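A sketch of that validate-then-repair loop, with call_llm again a hypothetical helper and the retry cap mirroring the 3x above:

import json

def parse_with_repair(raw: str, call_llm, max_attempts: int = 3) -> dict:
    # Validate first; on failure, send the parse error and its line number back for repair
    for _ in range(max_attempts):
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            raw = call_llm(
                f"Fix this invalid JSON. Parser error: {err.msg} on line {err.lineno}.\n"
                "Return only the corrected JSON.\n\n" + raw
            )
    raise ValueError("could not repair JSON after retries")

The repair call can go to a cheaper or faster model than the one that produced the original output.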
@bingolio · 1 year ago
Sam you are my unsung Hero of AI. THANKS!
@jasonlosser8141 · 1 year ago
Great video. I’m using core python elements to parse right now, but I’ll incorporate output parsers in my next rebuild.
@vijaybudhewar7014 · 1 year ago
That is something new; I did not know this. As always, you did your job at its best.
@siyuanhuang9221 · 3 months ago
How do you automate the output fixing and retry on error? The example seems quite manual.
@oz4549 · 1 year ago
I have an agent which goes through a list of tasks. I want the output structure to be different depending on the question asked. Maybe in one instance I just want to return JSON, but in another instance I want to return Markdown. I tried to do this with prompts but it is not consistent. Is it possible to do this?
@easyaistudio · 1 year ago
the problem with trying to do the formatting in the same prompt that does the reasoning is that it impacts the result
@samwitteveenai · 1 year ago
You can get the model to give its reasoning first as part of the output. Ideally you want the reasoning instructions earlier than the output instructions.
@cloudprofessor · 9 months ago
How can we use output parsers with RetrievalQA?
@jdallain · 1 year ago
Thank you very much! This is super helpful and something I’ve struggled with
@anindyab · 1 year ago
Thanks for this, Sam. Your videos on LangChain have been incredibly informative and helpful. Here's a request: can you please do a video on creating LangChain agents with open-source/local LLMs? The agents seem to require a specific kind of output from the LLMs, and I think that could be a nice follow-up to this video. In my brief experience, open-source LLMs are not easy to work with when it comes to creating agents. Your take on this would be very helpful.
@samwitteveenai · 1 year ago
The big challenge is that most of the released open-source models can't return the right format. I have a new OpenAI one coming and will try to convert that to open source to show people.
@anindyab · 1 year ago
@samwitteveenai This is great news. Thank you!
@RedCloudServices · 1 year ago
There is no LangChain plugin in the ChatGPT plugin store. Did they remove it?
@elikyals · 1 year ago
Can output parsers be used with the CSV agent?
@tomaszzielonka9808 · 1 year ago
How does giving a specific role (in this example, a master branding consultant) improve (or impact, in general) the outcome of a prompt? LLMs make predictions based on sequences of words, and I'm trying to connect role-playing with the model's output.
@redthunder6183 · 1 year ago
How does the output parser actually parse the output that it gets back, though? Is it just regular code, or is it something more? For example, what if the model forgets the second " to end a string?
@vikrantkhedkar6451 · 6 months ago
This is really important stuff.
@RobotechII · 1 year ago
Wonderful content! I'm sending it to my team
@ugk4321 · 1 year ago
Super content...explained well. Thank you
@RobvanHaaren · 9 months ago
Sam, I love your videos, I'm a huge fan. The only feedback I have to make your channel better is to fix your typos, both in your template strings (not a big deal, since the LLM will understand regardless) and in your video titles (e.g. "Ouput" at 4:57); they may affect your credibility. All the best and keep up the good work!
@samwitteveenai · 9 months ago
Thanks, and sorry about that. I have tended to record these on the fly and put them out. I have someone editing now who will hopefully catch them as well.
@____2080_____ · 1 year ago
Awesome and thank you for teaching
@ElNinjaZeros · 1 year ago
When I try to apply this parsing with models called via LangChain, sometimes it works and sometimes it doesn't. Same with LangChain's Pydantic parser.
@imaneb4073 · 7 months ago
Hello, thank you so much for such valuable and creative content; it helps us a lot. I have a question: I am using the Pydantic output parser on structured PDF documents to generate a dataset (where I will select only specific fields). I used OpenAI as the LLM, but the problem I face is that I am working with a folder of 100 PDFs, so the code suddenly gets interrupted due to OpenAI's daily request rate limit. How can I handle this? Is there a trick, or another alternative?
@alihosseini592 · 1 year ago
As you also mentioned in the video, CommaSeparatedListOutputParser does not really work well (for example, there was a dot at the end of the LLM's response). Is there any other way to get the model to output only a list?
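One tolerant workaround is to post-process the raw string yourself; a minimal sketch:

def parse_comma_list(raw: str) -> list[str]:
    # Strip whitespace and a trailing period before splitting on commas
    return [item.strip() for item in raw.strip().rstrip(".").split(",")]

print(parse_comma_list("vanilla, chocolate, strawberry."))
# -> ['vanilla', 'chocolate', 'strawberry']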
@giraymordor · 1 year ago
Hello Sam, I have a question: I aim to send a sizable text to the OpenAI API and subsequently ask it to return a few select sections from the text I've dispatched. The text I intend to send is approximately 15k tokens, but the token limit for gpt-3.5-turbo is merely 4k. How might I circumvent this limitation and send this text to OpenAI using the API? This is not for the purpose of summarization, as there are ample examples of that on YouTube. My goal is to send a substantial amount of text to OpenAI within the same context, and for the model to retain what I previously sent. Following this, I would like it to return a few parts from the original text, preserving the integrity of the context throughout these operations. Thank you in advance for your guidance!
@Aroma_of_a_Roamer · 1 year ago
Love your content, Sam. I was wondering, have you ever got classification/data extraction working with an open-source LLM such as Llama 2? Would love to see a video on this if you have. Thanks, keep up the great work.
@samwitteveenai · 1 year ago
I have been working on this with mixed results; hopefully I can show something with Llama 2.
@Aroma_of_a_Roamer · 1 year ago
@samwitteveenai You are an absolute champion. I think almost all app development is done exclusively with ChatGPT, since it is a) superior to open-source LLMs and b) app and library developers such as LangChain have geared their development towards it, using their own prompt templates. Each LLM has its own way and nuance as to how to format the prompt in order to make it work correctly.
@user-wr4yl7tx3w · 1 year ago
By chance, does LangChain have an implementation of Tree of Thoughts?
@samwitteveenai · 1 year ago
Not yet, but I have been playing around with it. I want to make sure it works for things beyond just what's in the paper before I make a video.
@TomanswerAi · 1 year ago
Great explanation thank you!
@Darkhellwings · 1 year ago
Thanks for the explanations. What I still miss from this tutorial (and some others of yours) is how to customize LangChain's API to go beyond what is provided at the moment. For instance, a simple question raised immediately after watching this would be: how do you implement a custom output parser, for a custom format that is not JSON or lists? Is it possible to make something for tables? Thanks anyway, that was still great!
@samwitteveenai · 1 year ago
Almost all customization is done at the prompt level. If you are doing something for a table, you would first want to think through what the LLM should return as a string. A CSV? How would it represent a table, etc.? Then work on the prompt that gets that, and lastly think about an output parser. You raise an interesting issue; maybe I should make a video walking through how I test the prompts first and get that worked out. If you have a good use case, please let me know. One issue I have is I can't show any projects I work on for clients, etc.
@BrianRhea · 1 year ago
Thanks Sam! Would using an Output Parser in combination with Kor make sense? Is that worth a video on its own?
@samwitteveenai · 1 year ago
At the moment all of this is changing with the OpenAI functions (if you haven't seen them, I have a few vids about this). Currently LangChain also seems to be rethinking this, so I will revisit some of these. One issue is going to be whether we end up with 2 very different ecosystems, i.e. OpenAI vs everything else. I am testing some of the new things in commercial projects, so let's see how they go and then I will make some new vids.
@askcoachmarty · 1 year ago
Great vids, Sam! So, is this awesome Pydantic output parser available for Node? I'm finding shaky info in the JS docs. I'm currently using the StructuredOutputParser, but I'm creating some agents that I want to output Markdown. Is it best in JavaScript to just post-process and convert to Markdown? Any pointers or thoughts would be greatly appreciated!
@samwitteveenai · 1 year ago
Pydantic is a Python thing, so maybe not in the JS version, but my guess is they will have something like this soon. Technically you could just make it yourself, as it is all just setting a prompt. I have a new vid coming out in an hour which shows another way to do the same thing.
@askcoachmarty · 1 year ago
@samwitteveenai Cool, I'll look for that video!
@xiam19 · 1 year ago
Can you do a video on ReWOO (Reasoning WithOut Observation)?
@samwitteveenai · 1 year ago
Yeah it looks pretty cool. I will take a proper look.
@MrOldz67 · 1 year ago
Hey Sam, thanks again for all these useful videos. I was wondering, would it be possible to use the same output parser to get a JSON file that we could later use as a dataset to train our own language model? If yes, would it be possible to bypass OpenAI in this process and maybe use another LLM, from a privacy perspective? Thanks a lot.
@samwitteveenai · 1 year ago
Yes, absolutely you can use it to make datasets. Lots of people are doing this. It will work with other LLMs, but most of the open-source ones won't have good outputs, so they often fail, etc.
@MrOldz67 · 1 year ago
@samwitteveenai Thanks for the answer, I will try to find a way to do that. But in the meantime, if you would like to make a video, I'll be really interested :) Thanks in advance.
@ZapCrafter · 1 year ago
With regard to the Pydantic output parser, when it gets badly formatted output, do you get that as your prompt result, or does the parser feed the error back to itself to correct it until it has well-formatted output to return to the user?
@samwitteveenai · 1 year ago
It will give an error and you can set that to trigger an auto retry etc.
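A sketch of that wiring, assuming the classic langchain package layout (these imports moved in later releases); the Joke schema and malformed string are illustrative:

from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
from pydantic import BaseModel, Field

class Joke(BaseModel):
    setup: str = Field(description="the setup of the joke")
    punchline: str = Field(description="the punchline")

base_parser = PydanticOutputParser(pydantic_object=Joke)
# On a parse error, OutputFixingParser re-prompts the LLM with the bad output
# plus the format instructions and tries again
fixing_parser = OutputFixingParser.from_llm(parser=base_parser, llm=ChatOpenAI())

bad_output = "{'setup': 'Why did the chicken cross the road?'"  # malformed example
joke = fixing_parser.parse(bad_output)  # returns a Joke instance if the fix succeeds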
@bleo4485 · 1 year ago
Hi Sam, thanks for the video. You should set up a Patreon or something. Your videos have helped a lot. Thanks and keep up the good work!
@samwitteveenai · 1 year ago
Thanks for the kind words.
@blackpixels9841 · 1 year ago
Thanks Sam! Is it just me, or do you also feel that the API is slower to return JSON 'code' than plaintext? I'm getting upwards of 30 seconds per API call to parse a PDF table into 250 tokens of JSON.
@samwitteveenai · 1 year ago
Interesting. I haven't noticed that; it shouldn't be any slower.
@ohmkaark · 1 year ago
Thanks a lot for the great explanation!!
@ivanlee7450 · 1 year ago
Is it possible to use another LLM for the output parser?
@samwitteveenai · 1 year ago
Yeah, certainly; then it becomes a bit like an RCI chain, which I made a video about.
@ivanlee7450 · 1 year ago
How about a Hugging Face model?
@pranavmarla · 1 year ago
I've been playing around with LangChain for a couple of days and this is really helpful! Output parsers would be great while dealing with tools that need to interpret the response. I hope this gets integrated into SimpleSequentialChains too, because currently SimpleSequentialChains only accept prompt templates which have a single input.
@shivamkumar-qp1jm · 1 year ago
Can we extract code from the response?
@samwitteveenai · 1 year ago
Yes, take a look at the PAL chain; it does this kind of thing.
@sethhavens1574 · 1 year ago
I've noticed that using GPT-3.5 Turbo recently there are quite often issues with the model being overloaded. Using LangChain (at least I assume that is where this comes from), the chain will usually retry the LLM query. Is there a way to control the number of retries and the interval between them? And thanks for the awesome content, super useful stuff! 👍
@samwitteveenai · 1 year ago
I think there is a PR submitted to control the number of retries, but I don't think it is there yet.
@sethhavens1574 · 1 year ago
@samwitteveenai Cool, thanks for the feedback dude.
@tubingphd · 1 year ago
Thank you Sam
@MichaelScharf · 1 year ago
Is this not eating up a lot of tokens, especially in the Pydantic case?
@samwitteveenai · 1 year ago
Yes, it does eat up some more tokens, but the Pydantic model really allows you to use the outputs in an API, etc., much more easily. Regarding price, it all depends on how much you value the interaction. I see some customers are happy to pay a dollar-plus for each conversation, which is a lot of tokens. Usually that is a lot cheaper than having a real human involved.
@galkim1 · 1 year ago
This is great, thanks
@picklenickil · 1 year ago
This is what you call... more than a party trick.
@orhandag540 · 9 months ago
But what if we want to do that with an open-source LLM (Hugging Face)?
@samwitteveenai · 8 months ago
You can certainly do the same with something like a Mistral fine-tune, etc.
@orhandag540 · 8 months ago
@samwitteveenai But somehow the prompt template of Mistral is not compatible with LangChain models; I was trying to build exactly this with Mistral.
@NoidoDev · 1 year ago
I don't get it, maybe I missed something or don't know some important element. Why is the language model supposed to do the parsing as some form of formatting? Why isn't this just done in code with the response from the model?
@samwitteveenai · 1 year ago
Getting the model to output it like that is much easier than trying to write regex expressions for every possible case the model might output.
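For reference, the basic pattern the video covers, sketched with the classic langchain imports (moved in later releases); the subject string is illustrative:

from langchain.llms import OpenAI
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0)
parser = CommaSeparatedListOutputParser()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
raw = llm(prompt.format(subject="ice cream flavors"))  # plain string from the model
items = parser.parse(raw)  # -> ['vanilla', 'chocolate', ...]

The format instructions steer the model toward parseable output, and the parser handles the string-to-list conversion, so no per-case regex is needed.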
@bryan-9742 · 1 month ago
I think you should understand that ResponseSchema and the StructuredOutputParser ARE VERY different from the Pydantic response model, unless you're creating a schema within a custom tool. Said differently, ResponseSchema and the StructuredOutputParser aid the LLM in understanding a given user query, whereas the Pydantic parsing in your example actually occurs AFTER the LLM has made its decision. 😊
@ashvathnarayananns6320 · 1 year ago
Can you post these videos using open-source LLMs rather than OpenAI APIs? Thank you.
@samwitteveenai · 1 year ago
I have posted quite a few videos that use open-source models. One challenge is that up until recently the OSS models weren't good enough to do a lot of the tasks.
@ashvathnarayananns6320 · 1 year ago
@samwitteveenai Okay, and thanks a lot for your reply!
@gitmaxd · 1 year ago
I disagree! This is one of the more sexy parts! It's the hocus-pocus of "Prompt Engineering". Great video!
@mytechnotalent · 1 year ago
New to LangChain, Sam, and I appreciate this video. Really looking for how to tune this properly with open-source Hugging Face models rather than OpenAI's paid API.
@clray123 · 1 year ago
Honestly, the more I watch about LangChain, the less value I see in using it vs. just coding your own interactions with the model. It seems to do trivial things at a very high level of text processing while obscuring what it does, and you still have to learn its API and be limited by it.
@MadhavanSureshRobos · 1 year ago
Practically speaking, isn't Guidance so much easier and better to use? For practical purposes these don't seem to add much value.
@samwitteveenai · 1 year ago
I am planning to do a video on Guidance and Guardrails as well.
@MadhavanSureshRobos · 1 year ago
That'll be wonderful!
@jawadmansoor6064 · 1 year ago
What parser or other method do you use in chains? For example:

memory = ConversationBufferMemory(memory_key="chat_history")
tools = load_tools(["google-search", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, handle_parsing_errors=_handle_error, memory=memory, verbose=True)

I am getting output-parsing errors:

Thought: Could not parse LLM output: `Do I need to use a calculator or google search for this conversation? Yes, it's about Leo DiCaprio girlfriend current age raised 0.43 power.`
Action: google_search
Observation: Could not parse LLM output: `Do I need to use a ca
Thought: Could not parse LLM output: ``Do you want me to look up more information about Leo DiCaprio girlfriend's current age raised 0.43 power?``
Action: google_search
Observation: Could not parse LLM output: `Could not parse LLM o
Thought: Could not parse LLM output: ``Is there anything else you would like me to do for today?``
AI: Thank you!
> Finished chain.
'Thank you!'
@hqcart1 · 1 year ago
I've managed to get GPT-3.5 to return JSON for 100k prompts, and it always returned JSON. It took me a few hours to get the right prompt though!