How to Build Custom Q&A Transformer Models in Python

23,374 views

James Briggs

1 day ago

In this video, we will learn how to take a pre-trained transformer model and train it for question answering. We will be using the HuggingFace transformers library with the PyTorch implementation of models in Python.
Transformers are one of the biggest developments in Natural Language Processing (NLP), and learning how to use them properly is practically a data science superpower - they're genuinely amazing, I promise!
I hope you enjoy the video :)
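As a taste of what the video covers, the key preprocessing step - converting SQuAD-style character-level answer spans into token-level start/end positions for training - can be sketched like this. The function name and the hard-coded offset list are illustrative only; in practice the offsets would come from a HuggingFace fast tokenizer called with `return_offsets_mapping=True`:

```python
# Convert a character-level answer span into token-level start/end
# positions, given each token's (char_start, char_end) offsets.
# A fast tokenizer produces these offsets; here we hard-code a toy list.

def char_to_token_positions(offsets, answer_start, answer_end):
    """Return (start_token, end_token) indices covering the answer span."""
    start_token = end_token = None
    for i, (tok_start, tok_end) in enumerate(offsets):
        if tok_start == tok_end == 0:
            continue  # special tokens like [CLS]/[SEP] map to (0, 0)
        if start_token is None and tok_end > answer_start:
            start_token = i  # first token that reaches into the answer
        if tok_start < answer_end:
            end_token = i  # last token that starts before the answer ends
    return start_token, end_token

# Example: context "Paris is nice", answer "Paris" at chars 0-5,
# with one leading special token.
offsets = [(0, 0), (0, 5), (6, 8), (9, 13), (0, 0)]
print(char_to_token_positions(offsets, 0, 5))  # (1, 1)
```

These start/end token indices become the labels that the question-answering head is trained against.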
🤖 70% Discount on the NLP With Transformers in Python course:
bit.ly/3DFvvY5
Medium article:
towardsdatasci...
(Free link):
towardsdatasci...
Code:
gist.github.co...
Photo in thumbnail by Lorenzo Herrera on Unsplash
unsplash.com/@...

Comments: 49
@ren-san7589 · 2 years ago
What should I do with a BERT-large model? I'm new to programming and not sure what to do, because the model I want to use doesn't have a maximum length.
@yinnungandylau1497 · 1 year ago
Dear James, is there any way with today's technology that I can OCR a book and feed it into the machine for MRC, to make a Q&A system that can answer questions about the context of the book?
@drejaquez · 1 year ago
Very possible - you would just need a folder with each page of the book scanned clearly so the OCR can extract the text. Then you can use an NLP model to analyze the text from there, I'd assume, as I'm trying to do my own NLP tasks as well.
@yinnungandylau1497 · 1 year ago
@@drejaquez Thanks a lot, James. I just purchased your course on Udemy to get a brief understanding of the foundations of this field. Do you have any recommendations on how I can achieve the target I mentioned above? Or any of your videos you'd recommend I go through?
@viktorciroski · 1 year ago
Hey, I'm trying to pass in new context and questions, i.e. ["To day is the 10th of Feb"] ["What is the date?"], and I'm getting a tensor output of the encoded text. My question is: does anyone know how to decode/detokenize the torch model's output?
@malikrumi1206 · 11 months ago
How is this different from doing a semantic search, where the model searches for embeddings that match the question, wherever they may be, and thus there's no need to do this answer training? (~30:00). Thanks.
@acsport5728 · 9 months ago
Sir, I did not understand one thing from the whole topic on this SQuAD test set: do we train the dataset ourselves, or do we import the paragraphs and it trains on the data by itself? (5:58) Please 🥺❤️
@ducle7970 · 2 years ago
Hi James! Thank you for this video. But I wonder what we should do if: 1. the dataset contains very long contexts or answers; 2. there exists a question that has more than one answer, with the answers belonging to different contexts? Thanks a lot!
@dato007 · 1 year ago
Any solutions here?
@osamabuzdar8190 · 1 year ago
@James Briggs hey, can you tell me what your test accuracy was?
@cosmogyral2370 · 2 years ago
I get an error when trying to train: "Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForQuestionAnswering: ['vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias', 'vocab_transform.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight']" etc. How can I fix this?
@aanchalgupta2577 · 1 year ago
Hi James, nice explanation, but how can we get the prediction in the correct format?
@ax5344 · 2 years ago
Why didn't you use the HuggingFace Trainer?
@Teng_XD · 3 years ago
How can I run it in TensorFlow? I don't know how to define the loss in TF.
@jantuitman · 1 year ago
I think the plausible answers are not supposed to be used. They are adversarial answers to questions that are actually impossible to answer based on the context.
@jamesbriggs · 1 year ago
Yes, that's right - I think I had misunderstood when I made this video.
@goelnikhils · 1 year ago
Amazing video on the Q&A task.
@tlpunisher42 · 3 years ago
outputs['start_logits'] is throwing an error - can you explain why?
@jamesbriggs · 3 years ago
Sent you the notebook link, hopefully that will help :)
@leomiao5959 · 2 years ago
This video is fantastic, I learned a lot!! Thank you so much!!! 😁😁
@bhavyakrishnabalasubramani8300 · 3 years ago
Could we use a PDF file instead of a context?
@jamesbriggs · 3 years ago
Not a full PDF file - for PDFs you need to read them into plain text using something like PDFMiner or PyPDF, then split into paragraphs (or every 1,000 characters, etc.), and each of your paragraphs becomes a context.
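The splitting step James describes can be sketched in a few lines of plain Python. The function name and chunk size are illustrative; the text extraction itself is assumed done beforehand (e.g. with pypdf's `PdfReader(...).pages[i].extract_text()`):

```python
# Split plain text (e.g. extracted from a PDF with pypdf) into context
# chunks of roughly max_chars characters, breaking on paragraph
# boundaries so each chunk stays coherent.

def split_into_contexts(text, max_chars=1000):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    contexts, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            contexts.append(current)  # chunk is full, start a new one
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        contexts.append(current)
    return contexts

sample = "First paragraph.\n\nSecond paragraph.\n\n" + "x" * 1200
for i, ctx in enumerate(split_into_contexts(sample)):
    print(i, len(ctx))
```

Note that a single paragraph longer than `max_chars` is kept whole here; you'd need an extra character-level split if that matters for your model's maximum sequence length.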
@garynico9872 · 3 years ago
Can I replace the distilBERT model with XLM-R? Or do we need a different configuration?
@jamesbriggs · 3 years ago
Give it a go, although I'd imagine you'll need to change a few things - I'm not sure what exactly, as I haven't used XLM-R before.
@garynico9872 · 3 years ago
@@jamesbriggs I have another question: in the video you used the EM score as the accuracy metric - how can I add the F1 score to that?
@TiMbuilding · 3 years ago
Maybe you need to calculate the distance between start_pred and start_true for each element? And the larger the distance, the lower the accuracy?
@jamesbriggs · 3 years ago
Sounds like a good approach - I haven't seen it done, but I don't see why it wouldn't work. After fine-tuning, if you're interested in more flexible accuracy metrics, ROUGE scores are pretty good - although you can't use these during training :)
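For the F1 question above, a minimal sketch of the SQuAD-style token-overlap F1 looks like this (simplified: the official SQuAD evaluation script additionally strips punctuation and articles before comparing):

```python
# SQuAD-style F1: precision/recall over the whitespace tokens shared
# between a predicted answer string and the true answer string.
from collections import Counter

def f1_score(prediction, truth):
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    if not pred_tokens or not true_tokens:
        # if either is empty, F1 is 1 only when both are empty
        return float(pred_tokens == true_tokens)
    common = Counter(pred_tokens) & Counter(true_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the eiffel tower", "eiffel tower"))  # 0.8
```

Unlike exact match (EM), this gives partial credit when the predicted span overlaps the true answer without matching it exactly.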
@harryayce11 · 3 years ago
Maybe out of context, but is that a custom theme for JupyterLab? I haven't seen it before.
@jamesbriggs · 3 years ago
It looks like the built-in dark mode for Jupyter to me (I think you access it in 'Settings'), but I haven't used Jupyter for a while so maybe I'm wrong - but I do also use custom themes, mostly from here: github.com/arbennett/jupyterlab-themes
@harryayce11 · 3 years ago
@@jamesbriggs Thanks for the detailed answer!
@lsbcip4603 · 1 year ago
Hi James! Thank you for this video. How can we extract the predicted text with our predicted start and end indices?
@lsbcip4603 · 1 year ago
How do we convert the tensor values into actual indices for our data?
@jamesbriggs · 1 year ago
Hey, you can take the argmax values - the maximum probability values represent the tokens that the model is predicting as the start/end.
@lsbcip4603 · 1 year ago
@@jamesbriggs Can we apply the output values directly to the contexts and extract the answer? Or should we apply token_to_char to the final output and decode the input_ids before we get our answer in text form?
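The argmax step discussed in this thread can be sketched without any framework. Plain lists stand in for the model's output tensors and the tokenizer here; with transformers you would instead call `outputs.start_logits.argmax()` / `outputs.end_logits.argmax()` and then `tokenizer.decode(...)` on the selected slice of `input_ids`:

```python
# Turn start/end logits into an answer string: take the argmax of
# each logit vector, then join the tokens between the two positions.

def extract_answer(start_logits, end_logits, tokens):
    start = max(range(len(start_logits)), key=start_logits.__getitem__)
    end = max(range(len(end_logits)), key=end_logits.__getitem__)
    if end < start:  # degenerate prediction; fall back to the start token
        end = start
    return " ".join(tokens[start:end + 1])

tokens = ["[CLS]", "paris", "is", "the", "capital", "[SEP]"]
start_logits = [0.1, 0.2, 0.1, 0.1, 3.0, 0.1]
end_logits = [0.1, 0.1, 0.1, 0.1, 3.5, 0.1]
print(extract_answer(start_logits, end_logits, tokens))  # capital
```

With a real tokenizer, decoding `input_ids[start:end + 1]` is usually cleaner than joining token strings, since it also merges subword pieces back into whole words.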
@tlpunisher42 · 3 years ago
PLEASE SHARE THE NOTEBOOK
@jamesbriggs · 3 years ago
hey Soumik, notebook is here: gist.github.com/jamescalam/55daf50c8da9eb3a7c18de058bc139a3
@LMAOgrass · 3 years ago
Hi James, thanks for your video! I was wondering how we would train a transformer model if we do not have the context for the question - for example, if we only have a dataset of questions and answers? Thanks!
@jamesbriggs · 3 years ago
I think this would be more in the generative Q&A model space, using things like GPT - I haven't worked with these yet, so I can't really say for certain!
@temiwale88 · 2 years ago
@@jamesbriggs In the absence of your awesome knowledge around generative Q&A models, can we just take all the answers, treat them as contexts, and then perform more of a semantic search over these answers once we have a question? (Long sentence - sorry!)
@sarahkamraoui4744 · 1 year ago
Did anyone find a way to do it?
@dogkansarac4889 · 3 years ago
Why don't we use "token type ids" (also known as segment ids)? As far as I know, since there are two sequences - one for the question and one for the context - BERT needs to distinguish them through segment ids.
@jamesbriggs · 3 years ago
Really good that you noticed that - for BERT, yes, we need them, but we're using distilBERT, which actually doesn't use token_type_ids, so we didn't need them for this model. Check out the 'tips' section here: huggingface.co/transformers/model_doc/distilbert.html#overview
@dogkansarac4889 · 3 years ago
@@jamesbriggs That has been really helpful. I did not know that distilBERT does not require token_type_ids. Thank you so much!
@jamesbriggs · 3 years ago
@@dogkansarac4889 more than welcome! :)
@ManiacMinecraftfreak · 3 years ago
@@jamesbriggs First off, thanks for the great video! As you mentioned above, for distilBERT we don't need token_type_ids. My question would be: how do I need to change the code if I want to use RoBERTa? I have tried it with your code, but it doesn't work :/
@jamesbriggs · 3 years ago
@@ManiacMinecraftfreak Which RoBERTa model did you use? I'd imagine it should work by changing the model loading code to 'model = RobertaForQuestionAnswering.from_pretrained('roberta-base')' and the tokenizer initialization to 'tokenizer = RobertaTokenizer.from_pretrained('roberta-base')' - is this what you did?
@drejaquez · 1 year ago
Great informational video - try making it more appealing to listen to, though; you sound sleepy the entire video.