Use LLMs To Extract Data From Text (Expert Mode)

59,292 views

Greg Kamradt

1 day ago

Comments: 81
@abdoualgerian5396 1 year ago
Finally, a video that I can enjoy without background noise. Thanks a lot, and please continue without background music.
@nattapongthanngam7216 7 months ago
Thank you, Greg, for this informative video on using LLMs to extract data from text! I found it particularly valuable for its potential application in skill/information extraction from resumes/CVs submitted to large companies. I also noticed a minor error in the original code:
"""
output = chain.predict_and_parse(text="...")['data']
printOutput(output)
"""
Updated code:
"""
output = chain.run(text="...")['data']
print(output)
"""
@ac_cobra8540 1 year ago
Interesting, I'm going to give this a go. I've experimented with pydantic for parsing LLM output into JSON, so this is super relevant right now. Thanks Greg, great explainer as always.
@DataIndependent 1 year ago
Glad it helped!
@pradeepthiyyagura8677 1 year ago
Greg, great video as always! I achieved the same results by including the desired output in JSON format along with the initial prompt itself, without using the Kor library.
@lucasamadsen 1 year ago
But you have to include all your JSON file text in the prompt, right?
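The prompt-only approach discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the video: `build_prompt` and `parse_reply` are hypothetical helper names, the call to the model itself is left out, and the sketch assumes the model replies with a single JSON object, possibly wrapped in extra prose.

```python
import json


def build_prompt(text: str, example: dict) -> str:
    """Build a prompt that asks for JSON shaped like `example` (hypothetical helper)."""
    return (
        "Extract the requested fields from the text below.\n"
        "Respond with JSON only, in the same shape as this example:\n"
        f"{json.dumps(example, indent=2)}\n\n"
        f"Text:\n{text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse the reply, tolerating prose before or after the JSON object."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])
```

The trade-off raised in the reply still applies: the JSON example is sent with every request, so it counts toward the token budget each time.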
@steveadams617 9 months ago
Great introduction. Perfect pacing. I'm going to do some further research to see if I can figure out a way to use Kor with a local language model, since I deal with confidential patient data in a healthcare setting.
@furkankasap806 7 months ago
I wonder the same thing; some letters in the Turkish language are problematic.
@JustDoIt-pl2sl 1 month ago
I'm trying to do it, but it's not working; the model (using Kor) is acting very stupid.
@jakobkristensen2390 1 year ago
Thanks, this was super useful! I would love to get some insight into the feedback you got from those 80 companies.
@DataIndependent 1 year ago
Most people wanted the data for either investment or sales use cases.
@jakobkristensen2390 1 year ago
@DataIndependent I am developing a few small tools for a recruitment bureau. I am interested since what you mentioned seemed relevant.
@AB51002 1 year ago
I really liked your video "The Data Learning Journey (Part 1)", and am hoping you will post Part 3 soon.
@DelaLange 1 year ago
Your channel is gold! Thanks a lot for all those tutorials.
@tomwalczak4992 1 year ago
Thanks Greg, this is very relevant, will give Kor a try!
@fabsync 5 months ago
Fantastic tutorial! It would be great to see another tutorial using "transformers" instead of OpenAI, with Chroma or any local database... And how would you save the extracted information? Does Kor tokenize that information, etc.?
@adumont 1 year ago
That's really interesting. Would it be easy (maybe using LangChain) to define required attributes or elements in the schema, so that if the LLM can't extract them, it starts a Q&A with the user to ask for the missing elements and attributes until the required fields are complete? That would be awesome for launching subsequent actions, for example.
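One way the required-fields idea above might look, independent of LangChain or Kor, is to check the extracted record against a list of required field names and turn each gap into a follow-up question. This is only a sketch: `missing_fields` and `follow_up_questions` are hypothetical names, and the record is assumed to be a plain dict produced by whatever extraction step is in use.

```python
def missing_fields(record: dict, required: list[str]) -> list[str]:
    """Return the required field names that are absent or empty in `record`."""
    return [name for name in required if record.get(name) in (None, "", [])]


def follow_up_questions(record: dict, required: list[str]) -> list[str]:
    """Turn each missing field into a question to put to the user."""
    return [
        f"Could you tell me the {name.replace('_', ' ')}?"
        for name in missing_fields(record, required)
    ]
```

A conversation loop would ask these questions, merge the answers into the record, and repeat until `missing_fields` returns an empty list.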
@caiyu538 1 year ago
Great lectures. Thank you for sharing them with us for free. Thumbs up.
@DataIndependent 1 year ago
Thank you! I'm also exploring function calling more for extracting information.
@rolexalexander7513 1 year ago
Thanks Greg, this was really helpful!
@oru65 1 year ago
In the 3rd cell of the Kor Hello World example, the call 'output = chain.predict_and_parse(text=(text))["data"]' must be replaced with 'output = chain.run(text=(text))["data"]' because 'predict_and_parse' has been deprecated.
@DataIndependent 1 year ago
Yikes - thanks for the catch. I would also recommend looking at function calling from OpenAI in case you want to see a different approach.
@SudhakarVyas 5 months ago
Hey Greg, thanks for this video! Since there is a limit on using an OpenAI API key without paying, how can the above implementation be carried out with other open-source LLMs?
@ChatGPT-ef6sr 1 year ago
Come on, why did you steal my idea 😅. I was literally thinking about how to scrape a YouTube channel's data using LLMs. I was looking for the info. You came right on time!
@adumont 1 year ago
There's a video from James Briggs that, iirc, does Q&A against a knowledge base of YouTube channels' video transcripts. Not sure if it was an available dataset or if he extracted them from YouTube. Hope that helps.
@ChatGPT-ef6sr 1 year ago
@adumont Oh thanks. I will look it up.
@asiddiqi123 1 year ago
Why is everyone making this? 😂
@yellowboat8773 1 year ago
Newbie here. I don't understand why you would need to use the library for this task. Couldn't you just specify the exact output and formatting you need in your LLM prompt? Cheers! 😊
@aflous 1 year ago
Basically, this abstracts away all the extra work needed for formatting and text extraction and lets you focus on your business logic.
@mahroushkagaurav3601 1 year ago
very insightful - thank you
@DataIndependent 1 year ago
Awesome! I need to add another level to this, which is OpenAI function calling.
@eduardomoscatelli 1 year ago
Incredible. The million-dollar question 😊: how do you "teach" ChatGPT the schema just once and then validate an unlimited number of texts, without spending tokens on putting the schema in the prompt and without having to fine-tune the model?
@Ideariver 6 months ago
This was awesome content.
@ahmadzaimhilmi 1 year ago
This is precisely what I need for my project, but like you said, the cost can spiral out of control. Have you tried it with GPT-3.5? If so, how unreliable was it?
@rajpdus 1 year ago
I think we'll have more such prompt-based tooling available sooner or later. Any other specific tools you are experimenting with?
@SteveSolun 1 year ago
Hey Greg, at 7:54 - what is the "many = True" attribute in the Text class? Can you please explain it in a bit more detail?
@densonsmith2 1 year ago
Where is the "sign up" you mentioned? This seems very interesting for many applications.
@DataIndependent 1 year ago
Whoops! I'll put it in the description. This was it: www.openingattributes.com/
@densonsmith2 1 year ago
@DataIndependent I am very impressed, as were all my work friends.
@JustDoIt-pl2sl 1 month ago
I'm having some problems running this with Ollama local models (I tried llama 3.1 and nuextract) and it's not working... The output has a lot of repetitive info.
@JustDoIt-pl2sl 1 month ago
After close inspection, it seems the local LLMs don't understand the (somewhat complex) prompt generated by Kor.
@pooja1124 1 year ago
Can we extract important content from a research paper? For example, some text from the abstract and some from the results or an ablation table. Could you make a video about how to customize that text extraction and send it to Google Sheets?
@vanamonde_8809 1 year ago
Hello, how do I connect LangChain not to ChatGPT but to local chatbots by their localhost names?
@pocker91 1 year ago
Hi Greg, thank you for the great video! How would you go about extracting "tags" or predefined values and not string texts? Especially if the number of values is in the thousands and there are too many to just feed into the prompt (token optimization etc.). Any ideas? Thank you!
@DataIndependent 1 year ago
Hmm, good question. Check out this tutorial and code. In cell 15 I have a schema for tags that may be helpful: github.com/gkamradt/langchain-tutorials/blob/main/data_generation/Topic%20Modeling%20With%20Language%20Models.ipynb kzbin.info/www/bejne/pnbOqYWHe7N0qZY
@davidmichaelcomfort 1 year ago
This looks like a really interesting approach. @DataIndependent any ideas on the best approach for using tabular data (whether from a pandas dataframe, PySpark dataframe or SQL data table) in conjunction with LLMs? What about combining tabular data with text documents?
@TonyHoangPodcast 1 year ago
Use the pandas agent.
@constandinosk.3251 2 months ago
Does anyone know how to do this with an LLM model loaded from transformers?
@manujkumarjoshi9342 1 year ago
Wow!! It's magic.
@AditiTambi-y8g 1 year ago
How can I extract the data from an API output as JSON?
@catyung1094 1 year ago
Is that few-shot NER? 🤔
@dchip95 1 year ago
Yeah, the LLMs are pretty good at it now.
@muhammadowaissiddiqui2443 1 year ago
Can I use it to extract events from text using Hugging Face or any other open-source LLM?
@DataIndependent 1 year ago
Yes, just swap in your model of choice when you make your LLM.
@Ryan-yj4sd 1 year ago
Awesome
@mysticaltech 1 year ago
Hey Greg, you sure this doesn't work well with GPT-3.5?
@programwithpradhan 1 year ago
Can you please tell me how I can do that if I want to give word embeddings or a vector DB instead of text?
@DataIndependent 1 year ago
What do you mean? Could you explain more?
@programwithpradhan 1 year ago
@DataIndependent Thank you for your reply :) I am working on a problem where I extract text from websites like Amazon and McDonald's using web scraping and give that raw text to OpenAI so that it can extract products or food items and their price, ratings, discount, etc. The problem is that I can't give all the text to OpenAI at once because of the token limit. So is there a way to give the text in chunks? The second thing is improving model performance: instead of giving raw text to OpenAI, I want to give embedding vectors of that text using OpenAI embeddings. In my previous approach I used RetrievalQA and the character text splitter in LangChain to solve this, but how can I do that with the approach you showed in this video? Please give me a solution. Thank you for your time ☺️
@programwithpradhan 1 year ago
I saw your videos on token limits and embeddings, but I want to combine these two ideas and run a query with the help of the Kor library so that I can get the output in a structured format.
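The chunking question in this thread could be approached along these lines. This is a minimal sketch rather than LangChain's text splitter: it splits on character counts as a rough stand-in for tokens (an assumption), and `extract` is a hypothetical callable standing in for whatever extraction chain is run on each chunk.

```python
from typing import Callable


def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split `text` into overlapping chunks of at most `max_chars` characters."""
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def extract_all(text: str, extract: Callable[[str], list], **chunk_kwargs) -> list:
    """Run `extract` on every chunk and concatenate the per-chunk results."""
    results = []
    for chunk in chunk_text(text, **chunk_kwargs):
        results.extend(extract(chunk))
    return results
```

Because chunks overlap, the same item can be extracted twice, so the combined results may need de-duplication afterwards.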
@mvasanth5200 1 year ago
Can anyone help me with this error, [initial_value must be str or None, not dict], which I get while executing chain.predict_and_parse?
@SundarBalamurugan 1 year ago
Same
@vamsiraghu3258 7 months ago
I tried `chain.run()` and it worked:
output = chain.run(text=(text))["data"]
printOutput(output)
@TonyHoangPodcast 1 year ago
Is there a way to read an entire PDF with LangChain and Kor?
@DataIndependent 1 year ago
Oh ya, big time. Use a PDF loader and you're good to go. In my "question a book" video I read a PDF this way.
@TonyHoangPodcast 1 year ago
@DataIndependent Thanks, watching that video right now.
@TonyHoangPodcast 1 year ago
@DataIndependent After watching that video: do I need to use a vector database, or can I just use the PDF loader and pipe that directly into Kor?
@dprggrmr 1 year ago
Damn, that's cool.
@wiktorm9858 1 year ago
Is there an existing tool that cuts low-signal text?
@DataIndependent 1 year ago
What kind of low-signal text?
@wiktorm9858 1 year ago
@DataIndependent This is the term that you used for (probably) "filler words": words that do not carry much meaning.
@Teathebest0 1 year ago
Hi, may I know if it works with LinkedIn?
@DataIndependent 1 year ago
Totally - you just need to access their data somehow
@thorthumb0031 1 year ago
pip install kor? His document doesn't specify...
@DataIndependent 1 year ago
Yes! I don't run through the dependencies because it's different for everyone. Especially with sub packages.
@Grahfx 1 year ago
This is the wrong approach, imho. You have to use the output as text and not as an object. If you turn it into an object, you lose the ability to stream the output, which is a main feature of these LLMs. If you want to structure your text, you'll have to go with Markdown. Not to mention that the translation into an object is never deterministic, due to the nature of LLMs, and you could get something unusable for your front end.
@ko-Daegu 1 year ago
Wait, which point exactly are you talking about? You got me a bit confused here.
@rolenle8794 1 year ago
you painted!
@greendsnow 1 year ago
It's just too expensive to offer a viable product with OpenAI. Ada-002 is $0.0004 per 1K tokens...
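To put the per-token price quoted above in perspective, the arithmetic can be sketched as below. The default price is simply the $0.0004 per 1K tokens figure from the comment, not a current rate, and `embedding_cost` is a hypothetical helper name.

```python
def embedding_cost(num_tokens: int, price_per_1k: float = 0.0004) -> float:
    """Estimate the cost in dollars of embedding `num_tokens` tokens.

    The default price is the figure quoted in the comment above;
    actual pricing may differ.
    """
    return num_tokens / 1000 * price_per_1k
```

At that rate, one million tokens comes to roughly $0.40, so whether the total is prohibitive depends mostly on volume and on any completion-model calls made on top of the embeddings.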