LangChain & GPT 4 For Data Analysis: The Pandas Dataframe Agent

55,946 views

Rabbitmetrics

Days ago

In this video, we are going to explore the Pandas DataFrame agent to try to understand what the future of data analysis holds.
We will use the LangChain wrapper around GPT-4 to analyze and extract insights from data in a pandas DataFrame with thousands of rows.
▬▬▬▬▬▬ V I D E O C H A P T E R S & T I M E S T A M P S ▬▬▬▬▬▬
0:00 Introduction and overview
0:52 Loading the Python libraries and data
2:02 First task for the agent: total revenue
2:22 Second task for the agent: calculate AOV (average order value)
2:40 Third task: calculate repeat order rate
4:10 Fourth task: RFM (recency, frequency, monetary) segmentation
4:42 Perspectives for data analysis
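The four tasks in the chapter list are standard e-commerce metrics. The video's notebook and dataset are not published, so the following is only a minimal sketch of what the agent is being asked to compute. It assumes a hypothetical order table with customer_id, order_id, order_date and revenue fields and is written in plain Python so it runs without pandas. The repeat order rate uses one common definition (share of customers with more than one order), and the rank-based RFM scores are one simple convention among several; neither is necessarily what the agent produced in the video.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: (customer_id, order_id, order_date, revenue).
# Field names and values are illustrative assumptions, not the video's data.
ORDERS = [
    ("c1", "o1", date(2023, 1, 5), 120.0),
    ("c1", "o2", date(2023, 3, 2), 20.0),
    ("c2", "o3", date(2023, 2, 10), 200.0),
    ("c3", "o4", date(2023, 1, 20), 40.0),
    ("c3", "o5", date(2023, 2, 25), 60.0),
    ("c3", "o6", date(2023, 3, 15), 160.0),
]


def total_revenue(orders):
    return sum(revenue for _, _, _, revenue in orders)


def average_order_value(orders):
    # AOV = total revenue / number of distinct orders.
    n_orders = len({order_id for _, order_id, _, _ in orders})
    return total_revenue(orders) / n_orders


def repeat_order_rate(orders):
    # Share of customers who placed more than one distinct order.
    orders_by_customer = defaultdict(set)
    for customer, order_id, _, _ in orders:
        orders_by_customer[customer].add(order_id)
    repeaters = sum(1 for ids in orders_by_customer.values() if len(ids) > 1)
    return repeaters / len(orders_by_customer)


def rfm_table(orders, as_of):
    # Per customer: recency (days since last order), frequency (distinct orders),
    # monetary (total spend).
    last_order, order_ids, spend = {}, defaultdict(set), defaultdict(float)
    for customer, order_id, order_date, revenue in orders:
        last_order[customer] = max(last_order.get(customer, order_date), order_date)
        order_ids[customer].add(order_id)
        spend[customer] += revenue
    return {
        c: ((as_of - last_order[c]).days, len(order_ids[c]), spend[c])
        for c in last_order
    }


def rfm_scores(table, buckets=3):
    # Rank-based score from 1 to `buckets` per dimension (higher is better).
    # Recency is inverted: fewer days since the last order scores higher.
    # Ties keep input order, which a real segmentation would handle explicitly.
    n = len(table)
    scores = {c: [0, 0, 0] for c in table}
    for dim, descending in ((0, True), (1, False), (2, False)):
        ranked = sorted(table, key=lambda c: table[c][dim], reverse=descending)
        for rank, customer in enumerate(ranked):
            scores[customer][dim] = 1 + rank * buckets // n
    return {c: tuple(s) for c, s in scores.items()}


if __name__ == "__main__":
    print(total_revenue(ORDERS))        # 600.0
    print(average_order_value(ORDERS))  # 100.0
    print(repeat_order_rate(ORDERS))    # 0.6666666666666666
    table = rfm_table(ORDERS, as_of=date(2023, 3, 31))
    print(table)  # {'c1': (29, 2, 140.0), 'c2': (49, 1, 200.0), 'c3': (16, 3, 260.0)}
    print(rfm_scores(table))  # {'c1': (2, 2, 1), 'c2': (1, 1, 2), 'c3': (3, 3, 3)}
```

In pandas, the first three metrics reduce to expressions along the lines of df["revenue"].sum() and df.groupby("customer_id")["order_id"].nunique(), with the column names again being assumptions.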

Comments: 100
@TheVersionController 1 year ago
I just started a master's in data analytics (I'm actually a teacher tho). I'm so glad I found this channel. So effing interesting. Seems like a hell of a time to get into this space.
@vineetbabhouria6504 1 year ago
This is what I was searching for. Keep it up. Very informative, no bullshit.
@rabbitmetrics 1 year ago
Great to hear! Thanks for watching
@bibhutibaibhavbora8770 1 year ago
Found a gem channel, will learn so many new things now.
@avidlearner8117 1 year ago
Fantastic stuff!!! Can be applied to so many things. Thanks for enlightening us with such fantastic content. It's a technology growing at lightning speed, and there's not a lot of information on the subject. What I'd like to see is proper fine-tuning via conversation history that gets saved and referenced in a vector database separate from the document analysis. Reminds me of the early web! Everything was still to be done.
@AdrienSales 1 year ago
Very well explained. Very compact tutorial. Keep going!
@rabbitmetrics 1 year ago
Appreciate the support! Thanks for watching
@johnpoc6594 1 year ago
That is crazy good, thanks for the video. New sub here!
@helllton 1 year ago
Great video.
@thedonflo 1 year ago
I came across your channel and it is exactly what I have been searching for. Keep up the great work. Small request: can we get a similar video, but for PDFs?
@cristian15154 1 year ago
That was great!
@Mrlemar1 1 year ago
Very interesting. Does giving it a specific file to analyze solve the hallucination problem?
@ramp2011 1 year ago
Great video. Will this also work with the GPT-3.5 API, or does it need GPT-4? Thanks
@TheMagmarunning 1 year ago
Insane!
@joseluisbeltramone599 9 months ago
Thank you for the excellent video. Doing analytics on a dataframe of my own, with 3,000 columns, I ran into the token limit of the model I used (ChatGPT 3.5). Is there any way to overcome it?
@usoppgostoso 1 year ago
I believe the output parser error is related to the format of the output that it's attempting to parse. Unless you have set up the proper tools to handle some specific formats (like graphs), it might fail.
@Mactuarchitect 1 year ago
If the DataFrame is too long for the ChatGPT UI prompt, does that mean that by using LangChain you can bypass this limit?
@Maisonier 1 year ago
Amazing! Now that OpenAI has included this in their Code Interpreter, is there any way to use the Pandas DataFrame agent with a local model, like StableVicuna, RedPajama or MPT-7B? Thank you. Liked and subscribed.
@DeepakSingh-ji3zo 1 year ago
Does LangChain send this entire CSV file to OpenAI?
@davidmichaelcomfort 1 year ago
Looks interesting. One question I have is whether there will be substantial costs for using OpenAI's models on large datasets.
@rabbitmetrics 1 year ago
I'd err on the side of caution when using a service with this pricing model. This wasn't a problem here, but using OpenAI embeddings can get pricey if you're processing large amounts of text data.
@tonymusk 1 year ago
Great video, really informative! I have a question regarding the dataframe: does OpenAI have access to the data? If a company wants to use this kind of process on its own data, can OpenAI see that data? And does this process comply with GDPR?
@memesofproduction27 1 year ago
A random anecdote: to move yourself up the waitlist for access to Bing Chat (GPT-4), you should set Microsoft as your default for everything, starting with your browser, then Microsoft Wallpapers, then the app on your phone, etc. What would a pesky GDPR regulation do once the AI has root access to all machines because it's gatekept otherwise?
@armaanchawdhary9427 11 months ago
Same question. But what if all the data is stored on Azure cloud? In a way, Microsoft has access to all our data.
@ronakdinesh 1 year ago
Curious: does it also give graphs if you ask it?
@bwilliams060 1 year ago
These are really excellent videos, thank you. It's just a shame you are not sharing the workbooks. It really helps to learn when you can process and adjust the code as you go!
@rajupresingu2805 1 year ago
Great video. Can you make one that uses an open-source LLM instead of GPT-4 for handling larger pandas datasets with hundreds of thousands of records, as in actual production scenarios for orders? Thanks!
@4p4k 1 year ago
So does LangChain use GPT to write a SQL query, query the database, and then output the result? That's pretty impressive.
@bharadwazsripada5843 10 months ago
Hi, in this approach, is the data being shared with OpenAI? My understanding is that we are using a pretrained model and creating an agent for the environment.
@rafaeldelrey9239 1 year ago
It is an interesting concept and I hope it improves with time. Currently, it just doesn't work for many examples: lots of parsing errors, long chains of retries, plain wrong answers.
@HoGSwain 1 year ago
Nice work. However, for newbies like me, PLEASE EXPLAIN HOW YOU GOT TO THAT .ENV FILE SECTION WHERE YOU INPUTTED YOUR API KEY.
@sodasundae9009 1 year ago
Does this order data get sent to ChatGPT? Is there any way to keep it local? Vicuna?
@johnwallis1626 1 year ago
Has anyone tried getting the agent to create graphs in, say, matplotlib? I'm getting an 'OutputParserException: Could not parse LLM output' error. I can do it using exec on Python code generated with a normal chat completion, but not this way. Good vid tho.
@dimitriosmolfetas4711 1 year ago
Hey, your video is very informative and a great tutorial. I have a question: if I use Visual Studio, will the code work inside VS as it does in Jupyter? Or should I write the commands in the CLI, since I have Python on PATH? I'm very new to coding and Python; I hope my question makes sense. Anyway, thank you for the great videos!!
@devinwalker9202 1 year ago
VS Code supports Jupyter, so you can run the notebook directly in VS Code. I do it all the time.
@dimitriosmolfetas4711 1 year ago
@devinwalker9202 Thanks so much dude, I literally found out about that an hour ago and then I saw your comment. I wish you the best, thank you.
@pulkitkp 1 year ago
Can we give multiple dataframes as input?
@ShaharDS 1 year ago
Hey man, thanks for your video! I'm getting an error saying AuthenticationError: Output is truncated. Do you know how to fix it?
@method341 1 year ago
Will your dataset be uploaded to OpenAI if you do this? If so, how do I keep my dataset private?
@kentml6856 1 year ago
Great stuff, have you been successful with using sklearn with this methodology?
@Ramipineappl3 1 year ago
How can I do this with nested JSON instead of CSV?
@surajkhan5834 1 year ago
How can we save the df to Pinecone and query it?
@theguildedcage 1 year ago
Does Pinecone or any other service store and have access to your data? This would be important to know for enterprise applications.
@rabbitmetrics 1 year ago
Yes, they have access to the embedding vectors and the metadata about each embedding.
@ambrosionguema9200 1 year ago
Do we have problems with the token limit?
@StephenRayner 1 year ago
Have you found anything similar using SQL yet?
@kingmouli 3 months ago
There was one catch while using GPT-4: if we pass multiple dataframes, it only considers the header in the prompt and treats those as the rows, in all dataframes. Could you please do a video on how to pass multiple dataframes to the GPT-4 Pandas DataFrame agent?
@rabbitmetrics 1 month ago
I'm exploring different ways to work with Pandas efficiently at the moment, will make a video about this at some point
@youwang9156 1 year ago
Just wondering: why can you use gpt-4 as the model name?
@Fordtruck4sale 1 year ago
Thanks so much! Do you have a GitHub or Colab link for the file?
@rabbitmetrics 1 year ago
You're welcome! Don't have a repo yet, but will post a link.
@vilmorevilladolid527 1 year ago
Hello! Would love that!
@pauldriessens715 7 months ago
What are the advantages of using this method over using OpenAI's advanced analytics plugin?
@rabbitmetrics 6 months ago
Currently not much. Today I would look into using AutoGen for automating data analysis with OpenAI.
@screweddevelopment12 1 year ago
I personally feel like ChatGPT is not the best AI tool for data analysis work. Writing documentation for code and then having Copilot write the actual code goes like a million mph, and you don't pay per token.
@rabbitmetrics 1 year ago
I agree Copilot is superior right now, but things are moving fast.
@Jesse-rm4xo 1 year ago
Isn't Copilot powered by OpenAI Codex?
@urvog 1 year ago
We need to consider the application of these tools by analysts who may not have programming skills. This is where their usefulness truly shines.
@samueltallman7317 1 year ago
ChatGPT ≠ GPT-4. If you studied, you'd understand this.
@samueltallman7317 1 year ago
@rabbitmetrics I'm a little disappointed you couldn't point out here how ChatGPT is a demo implementation of GPT-4 and not the same as the OpenAI APIs for it, where you set your own temperature.
@user-yg6fr6jy3d 10 months ago
I have a table with hundreds of rows and 20 columns. I even created a smaller table with only the first 5 rows for testing, and I still get this annoying error: InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 12432 tokens. Please reduce the length of the messages. It's impossible for me to work with any CSV file like this. What can I do?
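The numbers in that error count everything sent in one request: the agent's instructions, the preview of the DataFrame's first rows that LangChain's pandas agent puts into its prompt, and the agent's intermediate steps. So even five rows can blow the limit if some cells hold long text. As a rough sketch, assuming about 4 characters per token (a rule of thumb, not an exact count), the hypothetical helpers below estimate the size of a text preview and show how truncating long cells or dropping columns shrinks it before the data reaches the agent.

```python
# Rough rule of thumb (assumption): about 4 characters per token for English text.
# For exact counts, use OpenAI's tiktoken tokenizer instead.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty text counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def preview_text(rows, columns, max_cell_chars=None):
    # Render a header plus one tab-separated line per row, similar in spirit to df.head().
    def cell(value):
        text = str(value)
        if max_cell_chars is not None and len(text) > max_cell_chars:
            text = text[:max_cell_chars] + "..."
        return text

    lines = ["\t".join(columns)]
    lines += ["\t".join(cell(row[col]) for col in columns) for row in rows]
    return "\n".join(lines)


def fits_budget(rows, columns, budget_tokens, max_cell_chars=None):
    # True if the (optionally truncated) preview is estimated to fit the token budget.
    return estimate_tokens(preview_text(rows, columns, max_cell_chars)) <= budget_tokens


if __name__ == "__main__":
    # Hypothetical table: 5 rows whose "review" cells are 2,000 characters long.
    rows = [{"id": i, "review": "x" * 2000} for i in range(5)]
    cols = ["id", "review"]
    print(estimate_tokens(preview_text(rows, cols)))                     # 2506
    print(estimate_tokens(preview_text(rows, cols, max_cell_chars=50)))  # 73
    print(fits_budget(rows, cols, budget_tokens=1000))                   # False
```

In pandas, the same idea amounts to handing the agent a narrower or trimmed frame, for example selecting only the columns a question needs before creating the agent.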
@geekyprogrammer4831 1 year ago
Can you please post the dataset?
@yookoT 1 year ago
Could you please tell me how much the GPT-4 API cost for this task? I have only used 3.5 before and heard that GPT-4 is much more expensive.
@Fordtruck4sale 1 year ago
It's like 30x more expensive than the gpt-3.5-turbo model... curious how many tokens these requests soak up!
@eddyvu8109 1 year ago
$20/month for ChatGPT Plus
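For a rough sense of scale: the cost of one API call is prompt tokens times the prompt price plus completion tokens times the completion price, with prices quoted per 1,000 tokens, and an agent makes one such call per reasoning step. The sketch below assumes the list prices at GPT-4's launch in March 2023 ($0.03 per 1K prompt tokens and $0.06 per 1K completion tokens for the 8K model, versus $0.002 per 1K for gpt-3.5-turbo); treat them as assumptions and check OpenAI's current pricing page. Under those prices the "30x" figure holds for completion tokens, while prompt tokens are 15x, so prompt-heavy agent calls land closer to 15x overall. The per-step token counts are made up. Note also that a ChatGPT Plus subscription covers the ChatGPT app, not API usage.

```python
# Assumed per-1K-token list prices in USD (GPT-4 8K and gpt-3.5-turbo at
# GPT-4's launch in March 2023). Check current pricing before relying on them.
PRICES = {
    "gpt-4": (0.03, 0.06),  # (prompt, completion)
    "gpt-3.5-turbo": (0.002, 0.002),
}

# Hypothetical agent run: (prompt_tokens, completion_tokens) for each LLM call.
# Prompts grow each step because earlier thoughts and observations are resent.
STEPS = [(1500, 100), (1800, 80), (2100, 60)]


def call_cost(prompt_tokens, completion_tokens, model):
    prompt_price, completion_price = PRICES[model]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000


def run_cost(steps, model):
    # An agent answer costs the sum of all the calls it made.
    return sum(call_cost(p, c, model) for p, c in steps)


if __name__ == "__main__":
    gpt4 = run_cost(STEPS, "gpt-4")
    gpt35 = run_cost(STEPS, "gpt-3.5-turbo")
    print(f"gpt-4: ${gpt4:.4f}, gpt-3.5-turbo: ${gpt35:.4f}, ratio {gpt4 / gpt35:.1f}x")
```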
@yatinahuja802 11 months ago
Can we use this to answer a set of questions if we have customer reviews instead of sales data? For example, asking any question about a product, or asking for a summary of thousands of reviews.
@rabbitmetrics 10 months ago
Indeed, have a look at this video kzbin.info/www/bejne/i4CZamyJr9VraK8
@noktuz 1 year ago
Where do I get the file with the code?
@ramp2011 1 year ago
Link to the notebook? Thanks
@doords 11 months ago
Can you display the results in HTML tags?
@SusobhanDas 1 year ago
Does it work with any models other than OpenAI's?
@rabbitmetrics 1 year ago
Yes. LangChain provides wrappers around various models; see python.langchain.com/en/latest/modules/models/llms/integrations.html
@FREELEARNING 1 year ago
Thanks for the video. Based on my understanding, OpenAI's GPT is able to do the task solely based on the file name and informative column names, because, as you might know, these models are constrained by the context length, so they aren't able to parse the whole file and really analyse the data. In my opinion, we aren't yet doing anything magical here. We can get most of the results using only some basic pandas functions like df.describe() or df["Column"].value_counts(). What do you think of this?
@startlingbird 1 year ago
I think you can combine his video with this one: kzbin.info/www/bejne/bIioYWx_ncmhb68. That way you can get around the plugin waitlist problem.
@paaabl0. 1 year ago
Thing is, these are still very basic queries that any human can quickly write pandas code for. With complex queries it gets lost. Moreover, both GPT-3 and GPT-4 are prone to basic math mistakes. But of course the overall direction is pretty awesome; I'd love an agent that reliably writes a bunch of pandas and SQL boilerplate code for me on a daily basis.
@rabbitmetrics 1 year ago
Agree, but I expect LLMs to improve to the point where they write accurate queries consistently.
@MogulSuccess 1 year ago
How does an organization share proprietary data with OpenAI and have the LLM do work? We need middleware that obfuscates the data through some kind of normalization, so that OpenAI can't reverse-engineer the context over time or take secret and top-secret data. Otherwise none of this is scalable.
@RutvikPatel2611 1 year ago
It's not. Your best bet would be to run a local LLM such as Alpaca or something similar. Even then, I don't think this is the best approach. Maybe it could take the column names and data types (+ metadata) and spit out a formula, with the operation performed on the local machine rather than by OpenAI, for both data security and answer integrity. Also, what if the file is extremely large, like a Parquet file that even GPT-4 can't process? In that case something like Spark could do the transformation or calculation for us. It'd be a great product tbh.
@RutvikPatel2611 1 year ago
And yes, pricing for data operations on OpenAI's servers is definitely not sustainable.
@mattforsythe5037 1 year ago
A company called Palantir does this.
@MogulSuccess 1 year ago
@mattforsythe5037 Palantir created data security middleware to communicate with external LLMs using NLP APIs?? Whoa!
@marcomaiocchi5808 1 year ago
Great video. But no one is going to work with this workflow.
@sanesanyo 1 year ago
There is a new package called PandasAI which does effectively the same thing, but in fewer lines of code. Under the hood, it is probably doing the same thing.
@rabbitmetrics 1 year ago
Nice, thanks. Will check it out.
@rolandheinze7182 1 year ago
@rabbitmetrics This dude is right. In my opinion this seems to be better than what LangChain currently offers through pandas_dataframe_agent. The behavior of the pandas dataframe agent is very inconsistent, especially when Action: print(python_repl_ast(...)) is called (I often get "is not a valid tool"). I imagine both are doing the same thing, with recursive calls to refine the dataframe operations being called and passed to the Python REPL. I am going to investigate the PandasAI documentation, as it seems to be much more straightforward and tractable for a non-contributor.
@robbieturtle6218 1 year ago
LangChain charged me $7 in API calls in 30 minutes of testing because I forgot to specify a stop string :(
@xubruce 1 year ago
I got "False" at the very beginning.
@rabbitmetrics 1 year ago
Check if the keys are loaded using os.getenv('API_KEY').
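The "False" reported above is most likely the return value of python-dotenv's load_dotenv() (an inference, since the setup cell isn't shown in this thread): it returns False when it loads no variables, which usually means the .env file isn't where the notebook is running. The .env file itself is just a text file next to the notebook with a line such as OPENAI_API_KEY=... . The stdlib sketch below mimics that loading step so the check can be reproduced without extra packages; load_env_file is a hypothetical helper and skips the quoting and `export` handling that python-dotenv does.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Minimal stand-in for python-dotenv's load_dotenv() (hypothetical helper).

    Reads KEY=VALUE lines, skipping blanks and '#' comments, into os.environ.
    Returns True if at least one variable was found, False otherwise
    (including when the file does not exist).
    """
    env_path = Path(path)
    if not env_path.is_file():
        return False
    found = False
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        if not key:
            continue
        found = True
        # Like load_dotenv(), don't clobber variables that are already set.
        if override or key not in os.environ:
            os.environ[key] = value
    return found


if __name__ == "__main__":
    print("variables loaded:", load_env_file())
    # The openai client reads OPENAI_API_KEY by default; never print the key itself.
    print("key present:", os.getenv("OPENAI_API_KEY") is not None)
```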
@rajivraghu9857 1 year ago
Excemm
@prasanthkumar7328 7 months ago
I see an error in the last step: Must provide an 'engine' or 'deployment_id' parameter to create a
@ericbroun4657 1 year ago
@vijaysurya6696 1 year ago
ImportError: cannot import name 'create_pandas_dataframe_agent' from 'langchain.agents'
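That ImportError usually means a newer LangChain release is installed: the pandas agent was moved out of langchain.agents into the separate langchain_experimental package (pip install langchain-experimental). A version-tolerant import might look like the sketch below. Depending on the version, the experimental package may also require opting in to code execution by passing allow_dangerous_code=True when creating the agent, so check its documentation for the release you have.

```python
# Try the newer home of the pandas agent first, then the older location.
try:
    from langchain_experimental.agents import create_pandas_dataframe_agent
except ImportError:
    try:
        # Older LangChain releases exposed the agent directly here.
        from langchain.agents import create_pandas_dataframe_agent
    except ImportError:
        # Neither location is importable in this environment.
        create_pandas_dataframe_agent = None

if create_pandas_dataframe_agent is None:
    print("Pandas agent not found; try: pip install langchain-experimental")
```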