Use LLMs To Extract Data From Text (Expert Mode)

Views: 60,643

Comments

@abdoualgerian5396 1 year ago

Finally, a video that I can enjoy without background noise. Thanks a lot, and please continue without background music.

@ac_cobra8540 1 year ago

Interesting, I'm going to give this a go. I've experimented with pydantic for parsing LLM output into JSON, so this is super relevant right now. Thanks Greg, great explainer as always.

@DataIndependent 1 year ago

Glad it helped!

@nattapongthanngam7216 8 months ago

Thank you, Greg, for this informative video on using LLMs to extract data from text! I found it particularly valuable for its potential application in skill/information extraction from resumes/CVs submitted to large companies. I also noticed a minor error in the original code:
"""
output = chain.predict_and_parse(text="...")['data']
printOutput(output)
"""
Updated code:
"""
output = chain.run(text="...")['data']
print(output)
"""

@pradeepthiyyagura8677 1 year ago

Greg, great video as always! I achieved the same results by including the desired output in JSON format along with the initial prompt itself, without using the Kor library.

@lucasamadsen 1 year ago

But you have to put all of your JSON text in the prompt, right?
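
A minimal sketch of the prompt-only approach discussed in this thread: the desired JSON shape goes into the prompt itself, and the reply is decoded with the standard `json` module instead of Kor. The schema, the `build_prompt` and `parse_reply` helpers, and the `fake_llm` stand-in are all hypothetical names made up for illustration; a real setup would swap `fake_llm` for an actual model call.

```python
import json

# Hypothetical schema for illustration only (not taken from the video).
SCHEMA = {"name": "string", "age": "integer"}


def build_prompt(text: str, schema: dict) -> str:
    """Embed the desired JSON shape directly in the prompt."""
    return (
        "Extract the fields below from the text and reply with JSON only.\n"
        f"Schema: {json.dumps(schema)}\n"
        f"Text: {text}"
    )


def parse_reply(reply: str) -> dict:
    """Decode the JSON object in a reply, ignoring any chatter around it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start : end + 1])


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return 'Sure! {"name": "Alex", "age": 30}'


prompt = build_prompt("Alex is 30 years old.", SCHEMA)
result = parse_reply(fake_llm(prompt))
print(result)
```

As the reply above points out, the schema text is sent with every request in this approach, so it costs prompt tokens each time.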

@jakobkristensen2390 1 year ago

Thanks, this was super useful! I would love to get some insight into the feedback you got from those 80 companies.

@DataIndependent 1 year ago

Most people either wanted the data for investment or sales use cases

@jakobkristensen2390 1 year ago

@DataIndependent I am developing a few small tools for a recruitment bureau. I am interested because what you mentioned seemed relevant.

@oru65 1 year ago

In the 3rd cell of the Kor Hello World example, the call 'output = chain.predict_and_parse(text=(text))["data"]' must be replaced with 'output = chain.run(text=(text))["data"]' because 'predict_and_parse' has been deprecated.

@DataIndependent 1 year ago

Yikes - thanks for the catch. I would also recommend looking at function calling from openai in case you want to see a different approach
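
For notebooks that need to run against both older and newer installs, the method swap described above can be wrapped in a small helper. This is only a sketch: `extract_data`, `FakeChain`, and `LegacyChain` are hypothetical names, and the fake classes stand in for a real Kor extraction chain so that nothing here calls an API. The only assumption carried over from the comments is that the chain exposes `run` or `predict_and_parse` and returns a dict with a "data" key.

```python
def extract_data(chain, text):
    """Call whichever extraction method this chain object exposes."""
    for method_name in ("run", "predict_and_parse"):
        method = getattr(chain, method_name, None)
        if callable(method):
            # Both methods are assumed to return a dict with a "data" key.
            return method(text=text)["data"]
    raise AttributeError("chain has neither run() nor predict_and_parse()")


class FakeChain:
    """Hypothetical stand-in for a newer chain that only has run()."""

    def run(self, text):
        return {"data": {"person": [{"first_name": text.split()[0]}]}}


class LegacyChain:
    """Hypothetical stand-in for an older chain with predict_and_parse()."""

    def predict_and_parse(self, text):
        return {"data": {"person": [{"first_name": text.split()[0]}]}}


print(extract_data(FakeChain(), "Alex likes apples"))
print(extract_data(LegacyChain(), "Bobby likes pears"))
```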

@DelaLange 1 year ago

Your channel is gold! Thanks a lot for all those tutorials.

@caiyu538 1 year ago

Great lectures. Thank you for sharing them with us for free. Thumbs up.

@DataIndependent 1 year ago

Thank you! I've also been exploring function calling more for extracting information.

@steveadams617 10 months ago

Great introduction. Perfect pacing. I’m going to do some further research to see if I can figure out a way to use Kor with a local language model, since I deal with confidential patient data in a healthcare setting.

@furkankasap806 8 months ago

I wonder the same thing, some letters for the Turkish language are problematic

@JustDoIt-pl2sl 3 months ago

I'm trying to do it, but it's not working. The model (using Kor) is acting very stupid.

@tomwalczak4992 1 year ago

Thanks Greg, this is very relevant, will give Kor a try!

@AB51002 1 year ago

I really liked your video "The Data Learning Journey (Part 1)", and am hoping you will post Part 3 soon.

@rolexalexander7513 1 year ago

Thanks Greg, this was really helpful!

@SudhakarVyas 6 months ago

Hey Greg, thanks for this video! Since there is a limit on using an OpenAI API key without paying, how can the above implementation be carried out with other open-source LLMs?

@fabsync 6 months ago

Fantastic tutorial! It would be great to see another tutorial using "transformers" instead of OpenAI, with Chroma or any local database, and how you would save the extracted information. Does Kor tokenize that information, etc.?

@densonsmith2 1 year ago

Where is the "sign up" you mentioned? This seems very interesting for many applications.

@DataIndependent 1 year ago

Whoops! I'll put it in the description, this was it www.openingattributes.com/

@densonsmith2 1 year ago

@DataIndependent I am very impressed, as were all my work friends.

@constandinosk.3251 3 months ago

Does anyone know how to do this with an LLM model loaded from transformers?

@mahroushkagaurav3601 1 year ago

very insightful - thank you

@DataIndependent 1 year ago

Awesome! I need to add another level to this which is openai function calling

@ChatGPT-ef6sr 1 year ago

Come on, why did you steal my idea 😅. I was literally thinking about how to scrape a YouTube channel's data using LLMs. I was looking for the info. You came right on time!

@adumont 1 year ago

There's a video from James Briggs, IIRC, that does Q&A against a knowledge base of YouTube channels' video transcripts. Not sure if it was an available dataset or if he extracted them from YouTube. Hope that helps.

@ChatGPT-ef6sr 1 year ago

@adumont Oh thanks. I will look it up.

@asiddiqi123 1 year ago

Why is everyone making this? 😂

@Ideariver 8 months ago

This was awesome content.

@adumont 1 year ago

That's really interesting. Would it be easy (maybe using LangChain) to define required attributes or elements in the schema, so that if the LLM can't extract them, it would start a Q&A with the user to ask for the missing elements and attributes until the required fields are complete? That would be awesome for launching follow-up actions, for example.
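
One way to picture the loop asked about above, as a plain-Python sketch rather than a LangChain or Kor feature: check the extracted dict against a list of required fields, then ask the user for whatever is missing. `REQUIRED_FIELDS`, `missing_fields`, `follow_up_questions`, `fill_missing`, and the `ask` callback are all hypothetical names introduced for illustration.

```python
# Hypothetical required fields for illustration.
REQUIRED_FIELDS = ["name", "email", "start_date"]


def missing_fields(extracted: dict, required: list) -> list:
    """Return required fields that are absent or empty in the extraction."""
    return [f for f in required if extracted.get(f) in (None, "", [])]


def follow_up_questions(extracted: dict, required: list) -> list:
    """Turn each missing field into a question for the user."""
    return [
        f"Could you tell me your {field.replace('_', ' ')}?"
        for field in missing_fields(extracted, required)
    ]


def fill_missing(extracted: dict, required: list, ask) -> dict:
    """Ask for each missing field via ask(field) and return a completed copy."""
    completed = dict(extracted)
    for field in missing_fields(extracted, required):
        completed[field] = ask(field)
    return completed


# Pretend the extraction step only found a name.
extracted = {"name": "Alex", "email": ""}
questions = follow_up_questions(extracted, REQUIRED_FIELDS)
print(questions)

# In a real app, ask() would prompt the user; here it returns canned answers.
answers = {"email": "alex@example.com", "start_date": "2024-01-15"}
completed = fill_missing(extracted, REQUIRED_FIELDS, ask=answers.get)
print(completed)
```

Once `missing_fields(completed, REQUIRED_FIELDS)` comes back empty, the follow-up action can be triggered.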

@pooja1124 1 year ago

Can we extract important content from a research paper, like some text from the abstract and some from the results or ablation table? Can you make a video about how to customize that text extraction and send it to Google Sheets?

@yellowboat8773 1 year ago

Newbie here, I don’t understand why you would need to use the library for this task? Couldn’t you just include in your llm prompt to specify the exact output and formatting you need? Cheers!😊

@aflous 1 year ago

Basically, this abstracts away all the extra work needed for formatting and text extraction and lets you focus on your business logic.

@JustDoIt-pl2sl 3 months ago

I'm having some problems running this with Ollama local models (I tried llama 3.1 and nuextract) and it's not working... The output has a lot of repetitive info.

@JustDoIt-pl2sl 2 months ago

After close inspection, it seems like the local LLMs don't understand the (somewhat complex) prompt generated by Kor.

@vanamonde_8809 1 year ago

Hello, how do I connect LangChain to local chatbots by their localhost names instead of to ChatGPT?

@SteveSolun 1 year ago

Hey Greg, at 7:54, what is the "many = True" attribute in the Text class? Can you please explain it in a bit more detail?

@AditiTambi-y8g 1 year ago

How can I extract the data from an API output as JSON?

@catyung1094 1 year ago

Is that few-shot NER? 🤔

@dchip95 1 year ago

Yeah, the LLMs are pretty good at it now.

@pocker91 1 year ago

Hi Greg, thank you for the great video! How would you go about extracting "tags" or predefined values and not free-form strings? Especially if the values number in the thousands and are too many to just feed into the prompt (token optimization etc.). Any ideas? Thank you!

@DataIndependent 1 year ago

Hmm, good question. Check out this tutorial and code. In cell 15 I have a schema for tags that may be helpful: github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Topic%20Modeling%20With%20Language%20Models.ipynb kzbin.info/www/bejne/pnbOqYWHe7N0qZY

@rajpdus 1 year ago

I think we'll have more such prompt-based tooling available sooner or later. Any other specific tools you are experimenting with?

@mvasanth5200 1 year ago

Can anyone help me with this error [initial_value must be str or None, not dict] while executing chain.predict_and_parse?

@SundarBalamurugan 1 year ago

Same

@vamsiraghu3258 9 months ago

I tried `chain.run()` and it worked:
output = chain.run(text=(text))["data"]
printOutput(output)

@ahmadzaimhilmi 1 year ago

This is precisely what I need for my project, but like you said, the cost can spiral out of control. Have you tried it with GPT-3.5? If so, how unreliable was it?

@muhammadowaissiddiqui2443 1 year ago

Can I use it to extract events from text using Hugging Face or any other open-source LLM?

@DataIndependent 1 year ago

Yes, just swap out your model of choice when you make your LLM

@wiktorm9858 1 year ago

Is there an existing tool that cuts low-signal text?

@DataIndependent 1 year ago

What kind of low signal text?

@wiktorm9858 1 year ago

@DataIndependent This is the term that you used for (probably) "filler words": words that do not carry much meaning.

@eduardomoscatelli 1 year ago

Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate any number of texts, without spending tokens on the schema in every prompt and without fine-tuning the model?

@programwithpradhan 1 year ago

Can you please tell me how I can do this if I want to give word embeddings or a vector DB instead of text?

@DataIndependent 1 year ago

What do you mean? Could you explain more?

@programwithpradhan 1 year ago

@DataIndependent Thank you for your reply :) I am working on a problem where I extract text from websites like Amazon and McDonald's using web scraping and give that raw text to OpenAI so that it can extract products or food items and their prices, ratings, discounts, etc. The first problem is that I can't give all the text to OpenAI at once because of the token limit. Is there a way to give the text in chunks? The second thing is improving model performance: instead of giving raw text to OpenAI, I want to give embedding vectors of that text using OpenAI embeddings. In my previous approach I used RetrievalQA and the character text splitter in LangChain to solve this, but how can I do that with the approach from this video? Please give me a solution. Thank you for your time ☺️

@programwithpradhan 1 year ago

I saw your videos on token limits and embeddings, but I want to combine these two ideas and run a query with the Kor library so that I can get the output in a structured format.
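
The chunking half of this question can be sketched in plain Python: split the scraped text into chunks that stay under a size limit, run the extraction on each chunk, and merge the results. Everything below is hypothetical and for illustration only: `chunk_lines` is a simple stand-in for a text splitter, and `fake_extract` uses a regex in place of a real Kor or LLM call so the sketch runs offline. It does not cover the embeddings half of the question.

```python
import re


def chunk_lines(text: str, max_chars: int = 1000) -> list:
    """Pack whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars is kept as its own chunk.
    """
    chunks, current = [], ""
    for line in text.splitlines():
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def fake_extract(chunk: str) -> list:
    """Stand-in for a real extraction call: finds 'item $price' pairs."""
    return [
        {"item": name, "price": int(price)}
        for name, price in re.findall(r"(\w+) \$(\d+)", chunk)
    ]


def extract_all(text: str, extract, max_chars: int = 1000) -> list:
    """Run extract() on every chunk and merge results, dropping duplicates."""
    seen, merged = set(), []
    for chunk in chunk_lines(text, max_chars):
        for item in extract(chunk):
            key = tuple(sorted(item.items()))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


menu = "burger $5\nfries $3\nshake $4\nburger $5"
chunks = chunk_lines(menu, max_chars=20)
items = extract_all(menu, fake_extract, max_chars=20)
print(chunks)
print(items)
```

Splitting on line boundaries avoids cutting an item in half, which a fixed character window can do.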

@TonyHoangPodcast 1 year ago

Is there a way to read an entire PDF with Langchain and Kor?

@DataIndependent 1 year ago

Oh ya, big time, use a PDF loader and you’re good to go. In my “question a book” video I read a pdf this way

@TonyHoangPodcast 1 year ago

@DataIndependent Thanks, watching that video right now.

@TonyHoangPodcast 1 year ago

@DataIndependent After watching that video: do I need to use a vector database, or can I just use the PDF loader and pipe that directly into Kor?

@Teathebest0 1 year ago

Hi may I know if it is working with LinkedIn?

@DataIndependent 1 year ago

Totally - you just need to access their data somehow

@mysticaltech 1 year ago

Hey Greg, you sure this doesn't work well with GPT-3.5?

@manujkumarjoshi9342 1 year ago

Wow!! it's magic

@davidmichaelcomfort 1 year ago

This looks like a really interesting approach. @DataIndependent, any ideas on the best approach for using tabular data (whether from a pandas dataframe, a PySpark dataframe, or a SQL table) in conjunction with LLMs? What about combining tabular data with text documents?

@TonyHoangPodcast 1 year ago

use the pandas agent

@Ryan-yj4sd 1 year ago

awesome

@thorthumb0031 1 year ago

pip install kor? his document doesn't specify...

@DataIndependent 1 year ago

Yes! I don't run through the dependencies because it's different for everyone. Especially with sub packages.

@dprggrmr 1 year ago

damn, thats cool

@Grahfx 1 year ago

This is the wrong approach, IMHO. You have to use the output as text and not as an object. If you parse it into an object, you lose the ability to stream the output, which is a main feature of these LLMs. If you want to structure your text, you'll have to go with Markdown. Not to mention that the translation into an object is never deterministic due to the nature of LLMs, so you could get something unusable for your front end.

@ko-Daegu 1 year ago

Wait, which point exactly are you talking about? You got me a bit confused here.