Training BERT #3 - Next Sentence Prediction (NSP)

11,760 views

James Briggs


Next sentence prediction (NSP) is one-half of the training process behind the BERT model (the other being masked-language modeling - MLM).
Where MLM teaches BERT to understand relationships between words - NSP teaches BERT to understand relationships between sentences.
In the original BERT paper, it was found that without NSP, BERT performed worse on every single metric - so it's important.
Now, when we use a pre-trained BERT model, training with NSP and MLM has already been done, so why do we need to know about it?
Well, we can actually further pre-train these pre-trained BERT models so that they better understand the language used in our specific use-cases. To do that, we can use both MLM and NSP.
So, in this video, we'll go into depth on what NSP is, how it works, and how we can implement it in code.
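For a quick preview of the code side, here is a minimal NSP inference sketch using the Hugging Face transformers library (a sketch only - it assumes transformers and torch are installed, and 'bert-base-uncased' is just an example checkpoint):

```python
# Minimal NSP inference sketch with Hugging Face transformers.
# Assumes `transformers` and `torch` are installed; 'bert-base-uncased'
# is used purely as an example checkpoint.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

sentence_a = "The weather was terrible this morning."
sentence_b = "So we decided to stay indoors."

# Tokenizing the pair builds [CLS] A [SEP] B [SEP] and sets token_type_ids
# so BERT can tell sentence A apart from sentence B.
inputs = tokenizer(sentence_a, sentence_b, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits   # shape (1, 2)

# Index 0 scores "IsNext" (B follows A), index 1 scores "NotNext".
print("IsNext" if logits.argmax(dim=-1).item() == 0 else "NotNext")
```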
Training with NSP:
• Training BERT #4 - Train With Next Sentence Prediction (NSP)
🤖 70% Discount on the NLP With Transformers in Python course:
bit.ly/3DFvvY5
📙 Medium article:
towardsdatasci...
🎉 Sign-up For New Articles Every Week on Medium!
/ membership
📖 If membership is too expensive - here's a free link:
towardsdatasci...
🕹️ Free AI-Powered Code Refactoring with Sourcery:
sourcery.ai/?u...

Comments: 16
@sabrinabani7973 1 year ago
ImportError: cannot import name 'bert Tokenizer' from 'transformers' (c:\user\mic\appdata\roaming\python39\site-packages\transformers\__init__.py). How do we solve this problem please?
@jamesbriggs 1 year ago
try changing it to BertTokenizer
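For reference, a minimal sketch of the working import (assuming a standard transformers install; 'bert-base-uncased' is just an example checkpoint):

```python
# The class name is BertTokenizer - one word, capital B and T.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```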
@varshikjain1862 4 months ago
Can you tell me how we can implement text prediction using BERT from our CSV file, like the one Gmail uses in Smart Compose?
@유성현-i5l 2 years ago
In the case of NSP, why do we always use the CLS token? The other tokens go through self-attention too, so they also encode relevance to the other tokens, don't they?
@jamesbriggs 2 years ago
The CLS (classification) token is optimised to be processed by a linear layer that then outputs whether the two sentences are related or not - whereas the other tokens are optimised to encode the meaning of their respective tokens
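Roughly what that looks like under the hood - an illustrative sketch only, since the real NSP head inside BertForNextSentencePrediction also runs the CLS vector through a pooling layer before the linear classifier:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')

inputs = tokenizer("Sentence A.", "Sentence B.", return_tensors='pt')
hidden = bert(**inputs).last_hidden_state    # (batch, seq_len, 768)

cls_vector = hidden[:, 0]                    # the [CLS] position only
nsp_head = torch.nn.Linear(768, 2)           # untrained toy head: IsNext vs NotNext
logits = nsp_head(cls_vector)
```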
@vijayendrasdm 3 years ago
Thank you for the video. Are we using fine-tuning / pretraining interchangeably in this video series? Are they the same? Are they different?
@jamesbriggs 3 years ago
I use the terms interchangeably here, but technically everything we do is pretraining. Pretraining refers to training the core of a transformer model (whether from scratch or already trained), and fine-tuning is the training of a model for a specific task, like Q&A, classification, etc.
@eliafiore2094 1 year ago
Hi James, thanks a lot for these precious videos. One question: I am following your examples to train a model to work with Biblical Hebrew. Since Biblical Hebrew is written right to left, are there any particular settings I need to add? Another question: can I first train the model with MLM and then with NSP, or must both approaches be trained at the same time?
@jamesbriggs 1 year ago
Hey Elia, I think the only difference would occur during the tokenizer training. I don't remember if there is anything specific that needs to be set, but I did a video on building a tokenizer for Dhivehi (another right-to-left language), so that might help: kzbin.info/www/bejne/o5uuooNpoLermLM On MLM and then NSP - ideally you should do them together, but if that isn't possible you can do them separately
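If it helps, a minimal sketch of training both objectives in one pass, assuming the BertForPreTraining class from transformers (it bundles the MLM and NSP heads); the labels below are toy placeholders, not a real masking pipeline:

```python
import torch
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForPreTraining.from_pretrained('bert-base-uncased')

inputs = tokenizer("First sentence.", "A candidate next sentence.",
                   return_tensors='pt')

outputs = model(
    **inputs,
    labels=inputs.input_ids.clone(),            # toy MLM labels - in practice you
                                                # mask ~15% of tokens and set the
                                                # rest to -100 so they are ignored
    next_sentence_label=torch.LongTensor([0]),  # 0 = IsNext, 1 = NotNext
)

loss = outputs.loss   # sum of the MLM loss and the NSP loss
loss.backward()
```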
@eliafiore2094 1 year ago
@@jamesbriggs Thank you very much James, it's good you are there. Can I also ask whether you know anything about XAI? I am doing all this in connection with a PhD at the Vrije Universiteit Amsterdam on lexical semantics and Biblical Hebrew. I need to develop a BERT model and apply XAI to it, to try to generate raw data by applying counterfactual explanations and then try to identify patterns.
@kiddyboy1540 3 years ago
Just out of curiosity: BERT is implementable using TensorFlow too, right? Not just PyTorch?
@jamesbriggs 3 years ago
Yes it is, switch out 'BertForNextSentencePrediction' for 'TFBertForNextSentencePrediction', then you would train via model.fit etc - also you wouldn't use the DataLoader :)
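A hedged sketch of what that might look like on the TensorFlow side (assuming transformers is installed with TF support; 'bert-base-uncased' is just an example checkpoint):

```python
from transformers import BertTokenizer, TFBertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForNextSentencePrediction.from_pretrained('bert-base-uncased')

inputs = tokenizer("Sentence A.", "Sentence B.", return_tensors='tf')
logits = model(inputs).logits   # shape (1, 2): IsNext vs NotNext scores

# For training you would compile the model and call model.fit on a
# tf.data.Dataset rather than looping over a PyTorch DataLoader.
```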
@cechgyt8336 3 years ago
Thank you for the video :D I'm wondering, can we get the probability that sentence B follows sentence A?
@jamesbriggs 3 years ago
Yes - it comes from the logits output, you can take a softmax to convert them to probabilities :)
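For example, a minimal sketch (assuming transformers and torch; 'bert-base-uncased' is a stand-in checkpoint):

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

inputs = tokenizer("It was raining hard.", "So the match was cancelled.",
                   return_tensors='pt')
logits = model(**inputs).logits            # shape (1, 2)

probs = torch.softmax(logits, dim=-1)      # softmax over (IsNext, NotNext)
print(f"P(B follows A) = {probs[0, 0].item():.3f}")   # index 0 = IsNext
```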
@karinapanama7361 3 years ago
Hello, thank you very much for the video, it was very well explained. I have a question: NSP consists of giving BERT two sentences - could the next sentence be predicted given only the first sentence?
@jamesbriggs 3 years ago
You would want to use a language generation model for this - which BERT is not so good at. GPT transformers tend to be used for generation; I covered it at a high level here kzbin.info/www/bejne/j6e5gpqsdt9smrs