Table Question-Answering with TAPAS in Python

11,584 views

James Briggs

Table question-answering (QA) is like asking Excel a natural language question and getting an intelligent, human-like response. We can ask something like "what is the total GDP across both China and Indonesia?" and Google's TAPAS (the machine learning model) will look at the table, find the two cells needed to answer the question, sum them, and return the result.
We learn how to apply TAPAS for table question answering using Hugging Face transformers and Python.
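As a rough illustration of that step, here is a minimal sketch using the Hugging Face `table-question-answering` pipeline. The checkpoint name `google/tapas-base-finetuned-wtq`, the helper names `build_table` and `ask`, and the table values are assumptions made for this example; they are not taken from the video's notebook.

```python
# Minimal sketch of table QA with TAPAS through Hugging Face transformers.
# Assumptions: the checkpoint below and the placeholder table values.


def build_table():
    """Return a small table as a dict of columns.

    TAPAS expects every cell to be a string, so numbers are converted here.
    """
    columns = {
        "Country": ["China", "Indonesia", "Japan"],
        "GDP": [10.0, 2.0, 5.0],  # placeholder values, not real GDP figures
    }
    return {name: [str(cell) for cell in cells] for name, cells in columns.items()}


def ask(question, table):
    """Run one question against one table and return the pipeline's result dict."""
    # Imported lazily so the helper above can be used without these packages.
    import pandas as pd
    from transformers import pipeline

    qa = pipeline(
        "table-question-answering",
        model="google/tapas-base-finetuned-wtq",  # assumed checkpoint
    )
    return qa(table=pd.DataFrame(table), query=question)
```

Calling `ask("what is the total GDP across both China and Indonesia?", build_table())` downloads the checkpoint on first use and returns a dict with the selected cells and, for this kind of checkpoint, the predicted aggregator.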
We take this further by pairing a Pinecone vector database with a Microsoft MPNet table retriever model. With this, we can ask a question, search through a million, 10 million, or even a billion tables, retrieve the most relevant ones, and then answer the specific question with Google's TAPAS.
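The retrieval idea can be sketched without any external service. In the sketch below, a toy bag-of-words embedding stands in for the MPNet model and an in-memory list stands in for Pinecone, so it only shows the shape of the pipeline (flatten each table to text, embed it, rank by cosine similarity). All function names are hypothetical.

```python
import math
from collections import Counter


def table_to_text(table):
    """Flatten a dict-of-columns table into one string: header first, then rows."""
    header = " ".join(table.keys())
    n_rows = len(next(iter(table.values()), []))
    rows = [" ".join(str(table[col][i]) for col in table) for i in range(n_rows)]
    return "\n".join([header] + rows)


def toy_embed(text):
    """Stand-in embedding: lowercase word counts (NOT the MPNet retriever)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, tables, embed=toy_embed, top_k=1):
    """Return the indices of the top_k tables most similar to the query."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(table_to_text(table))), index)
        for index, table in enumerate(tables)
    ]
    scored.sort(reverse=True)
    return [index for _, index in scored[:top_k]]
```

In a real setup, `embed` would be replaced by a dense sentence-embedding model and the list scan by a vector database query; the table returned by `retrieve` is then what gets passed to TAPAS.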
🌲 Pinecone example:
github.com/pinecone-io/exampl...
🤖 70% Discount on the NLP With Transformers in Python course:
bit.ly/3DFvvY5
🎉 Subscribe for Article and Video Updates!
/ subscribe
/ membership
👾 Discord:
/ discord
00:00 Intro
01:04 Table QA process
03:38 Getting the code
04:08 Colab GPU and prerequisites
04:33 Dataset download and preprocessing
06:10 Table QA retrieval pipeline
11:29 First test, can it retrieve tables?
12:55 TAPAS model for table QA
15:04 Asking more table QA questions
17:37 Asking advanced aggregation questions to TAPAS
19:38 Final thoughts

Comments: 44
@li-pingho1441 9 months ago
thank you so much. If it weren't for your videos, my grad school life would've been toast
@henryl7421 1 year ago
Thank you! Just gold gems since day 1
@alexandercross4142 1 year ago
This is the best content on youtube now 👍
@henryl7421 1 year ago
Just curious, can you ask language models to do more complex calculations? Like find regressions and variable selections?
@etarhunisuhaib2031 1 year ago
Hello James, thank you for this video. Do you think it will work if we have tables with only numbers, like an accounting or financial report? If so, should we specify the column names or something similar to help the model find answers, since there is no text?
@mjacfardk 1 year ago
As always, you are a great tutor and brother. God bless you, and thank you for your time and help 🙏
@jamesbriggs 1 year ago
Any time, thank you!
@alexaskills3447 1 year ago
Can you post the link to your jupyter notebook?
@jimmynguyen3386 1 year ago
Great video again! Say we have a ton of Excel/csv files and we want to extract all the sensitive information. Would this be a good solution?
@dipankarnandi7708 8 months ago
I have a quick query. I have my own retriever or table extractor, since I am working on PDF files of papers, and it outputs CSV files. Can I then use TAPAS directly to generate the response I need? Or do I still need a retriever like the MPNet model?
@123vow5 1 year ago
Thanks James.
@temiwale88 1 year ago
James - you keep putting out bangers bro. Thank you!!
@jamesbriggs 1 year ago
thanks man 🙏
@temiwale88 1 year ago
@jamesbriggs thank you! Are you on LinkedIn? Also, any new courses or books coming out? Take my money!
@jamesbriggs 1 year ago
@temiwale88 yeah, I'm here: www.linkedin.com/in/jamescalam. I'm working on an image/multi-modal ebook at the moment, and it's actually completely free too ;) www.pinecone.io/learn/image-search/
@user-bl6sc4td8w 11 months ago
Hi James, great video. I have several tables with 20,000+ records each which I'd like to ask questions to. Is this design suitable if you have a small number of large tables?
@blueaquilae 1 year ago
Quite surprising there is no explanation on the new format 'bodegas'!
@divinusnobilite 1 year ago
Hey James! I love this video. Could I take the answer at the end and leverage GPT3 API with a preface for it to read the function and question? The goal would be to make the answer come out more naturally for our business users.
@jamesbriggs 1 year ago
you might be able to feed it into gpt3 directly, but I haven't tested this
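One untested way to approach this is to resolve the TAPAS output to a single value and wrap it in a prompt for a text-generation model. The sketch below assumes the result dict has the `cells` and `aggregator` keys returned by the Hugging Face table QA pipeline for aggregation checkpoints; the prompt wording and the function names are hypothetical, and the call to the generation model itself is left out.

```python
def resolve_answer(result):
    """Apply the predicted aggregator to the selected cells where possible."""
    cells = result.get("cells", [])
    aggregator = result.get("aggregator", "NONE")
    if aggregator == "COUNT":
        return str(len(cells))
    if aggregator in ("SUM", "AVERAGE"):
        try:
            # Strip thousands separators before parsing the cell text.
            values = [float(cell.replace(",", "")) for cell in cells]
        except ValueError:
            # Cells are not numeric, so fall back to listing them.
            return ", ".join(cells)
        if not values:
            return ""
        total = sum(values)
        value = total if aggregator == "SUM" else total / len(values)
        return f"{value:g}"
    # "NONE" (or an unknown aggregator): return the selected cells as-is.
    return ", ".join(cells)


def build_prompt(question, result):
    """Build a prompt asking a generative model to phrase the answer naturally."""
    return (
        "Rewrite the answer as one natural sentence.\n"
        f"Question: {question}\n"
        f"Answer: {resolve_answer(result)}\n"
        "Sentence:"
    )
```

The string from `build_prompt` would then be sent to whichever generation model is available; how well that works in practice is not something this thread confirms.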
@aravindarjun4814 7 months ago
James, can you do the same with ChromaDB, since it's open source, unlike Pinecone?
@chrismaley6676 1 year ago
Hi James, Thanks for sharing. I have been interested in the table Q&A use case and using it for bank statements and utility bills. Can you recommend a table-to-text model that I could use to extend this example? Cheers! Chris
@jamesbriggs 1 year ago
Hi Chris, I'd actually recommend trying with TAPAS, if you can format the bank statements and bills as a typical table I think it should work
@chrismaley6676 1 year ago
I'm sorry, I forgot the most crucial part of my question. Once I get the answer from TAPAS, I want to generate text to answer the question. Do you have any recommendations for NLG models that could help with this task?
@djdjdjdjdjay 11 months ago
So, how do we use this for our own proprietary tables in a database? I mean, how do you create vector embeddings of tables in a vector database? Thank you in advance.
@SACHINPATIL-fp8uu 1 year ago
Great video! I was wondering, what if we have a single large table and run queries on that? Do we have to further fine-tune the TAPAS model? Please make a video on that...
@jamesbriggs 1 year ago
No need, you can just use the single TAPAS reader step :)
@manpham5635 6 months ago
@jamesbriggs I am using a large CSV file of about 1,000 rows, and it throws an error: "IndexError: index out of range in self". It would be great if you could suggest an appropriate solution.
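A likely cause, though not confirmed in this thread, is that the whole table does not fit within TAPAS's input limits. One workaround to try is splitting the table into smaller row chunks that each repeat the header, then querying (or indexing) each chunk separately. The sketch below assumes a dict-of-columns table; `chunk_table` and the default `rows_per_chunk=20` are hypothetical and would need tuning to the table's width.

```python
def chunk_table(table, rows_per_chunk=20):
    """Split a dict-of-columns table into smaller tables with the same columns.

    rows_per_chunk is an assumed default; pick a value small enough that each
    chunk fits in the model's input after tokenization.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    n_rows = len(next(iter(table.values()), []))
    return [
        {col: cells[start:start + rows_per_chunk] for col, cells in table.items()}
        for start in range(0, n_rows, rows_per_chunk)
    ]
```

Note that aggregation questions (sums, averages) spanning several chunks would need extra logic to combine the per-chunk results.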
@HazemAzim 1 year ago
I tried OpenAI's text-davinci-003 to parse the natural language query, using the entire Excel table as context, and it gave much better results than TAPAS, which is bound by a 512-token length limit.
@vineetshivhare8185 1 year ago
Could you give some more context?
@ramprajapati3195 3 months ago
Can we use TAPAS as the retriever model as well?
@madanleo 4 months ago
Can this answer from relational tables (multiple related tables), instead of individual tables one at a time?
@aravindarjun4814 7 months ago
Also, if I upload a CSV, it could be read into a dataframe, embedded with a sentence transformer, and stored in ChromaDB. We could then retrieve the data matching our query from ChromaDB using cosine similarity and use TAPAS to answer the query. Can you please upload a video for this use case?
@venkatesanr9455 1 year ago
Thanks for the informative video. I am working on PDFs with tables, where we apply table detection using image processing followed by table QA. 1. Is there another way to take the tables inside PDFs and get them in .csv format, so that table QA can be used afterwards? 2. Can you suggest some inputs?
@jamesbriggs 1 year ago
hey, you should be able to extract tables from PDF with libraries like PyPDF or PyTesseract, or if you're willing to pay I'd definitely recommend ABBYY FineReader OCR. From there you should be able to reformat the tables to CSV and then follow the same process we did here. For (2), I think you mean: can you adjust the tables based on what table QA is saying? In that case, yes, I'm sure there's a way; it would just require some additional logic on top of what is already there
@venkatesanr9455 1 year ago
@jamesbriggs Thanks for the replies. I have tried the pypdf and Camelot libraries for table extraction from the PDF, but I feel the table rows are not detected properly. So I am using image processing for table row detection, followed by Tesseract OCR. I believe this is the only way. If you know of others, kindly suggest them.
@williamlee6466 1 year ago
Love for you to share the notebook
@jamesbriggs 1 year ago
Hi you can find it here github.com/pinecone-io/examples/blob/master/search/question-answering/table-qa.ipynb :)
@sanatan_yogi_org 1 year ago
What is an alternative to Pinecone?
@jamesbriggs 1 year ago
faiss or weaviate, but they require more engineering effort and don't offer the 5M vector free plan that Pinecone does
@sanatan_yogi_org 1 year ago
@jamesbriggs Thank you. I'm looking for on-premise database software that I can install.
@jaisingh1292 1 year ago
@sanatan_yogi_org Did you find any good alternative to Pinecone that can work as an on-premise database solution?
@usercurious 1 year ago
Nice, but you failed to state that you are affiliated with Pinecone.
@nikhilgjog 1 year ago
very cool video!
@jamesbriggs 1 year ago
thanks!