Your videos and articles are a breeze to follow, James! They've truly made my learning journey smoother and more enjoyable. Thanks for all the hard work!
@exxzxxe 2 years ago
Excellent, James!
@poojaagri3776 a year ago
Hello, this video is very helpful, but I am getting an error when passing the inputs as a double argument to the model. It says it got an unexpected argument "label". Can you please tell me what I am doing wrong?
@BENTOL 7 months ago
Great explanation, James! I want to ask: what are the parameters of the feed-forward neural network? What is the size of its weight vector/matrix so that the output can be a probability distribution over ~30,000 classes?
@PRUTHVIRAJRGEEB a year ago
Hey James! Thanks a lot for the clear explanation of how MLM works for BERT. I have a question though - since we're using only the 'encoder' part of the transformer during MLM to encode the sentence, how does the 'decoder' of BERT get trained?
@piyushkumar-wg8cv a year ago
How do we decide the mask value?
@hieunguyen8952 3 years ago
Really intuitive and easy to understand. Thank you very much, bro!
@jamesbriggs 3 years ago
welcome!
@johnsonwalker545 3 years ago
Very nice video, thanks. We are also waiting for a solution video with TensorFlow.
@sxmirzaei 2 years ago
Thanks! Great video. Can you make a video on MLM using T5-1? It would be very helpful - I couldn't find much on that.
@basharmohammad5353 3 years ago
Very friendly and intuitive introduction. Many thanks for this nice video.
@jamesbriggs 3 years ago
More than welcome, thanks!
@The845548 3 years ago
Thank you James for this video. You've explained everything so well.
@jamesbriggs 3 years ago
Great to hear! Thanks for watching :)
@artursradionovs9543 2 years ago
Thank you for the video! What's the best way to get on track with deep learning? Any hints? Thank you!
@a_programmer2754 2 years ago
Very helpful bruh, thank you!
@johngrabner 2 years ago
Good video. I have a question: the tokenizer results in shorter sequences than raw characters, and the probability distribution over tokens is more even than the distribution over characters. My question is: how important is tokenization to BERT's performance?
@jamesbriggs 2 years ago
Thanks! The model must be able to represent relationships between tokens and embed some meaning into each token. If we make 1 character == 1 token, that leaves us with (in English) just 26 tokens, and the model must encode the "meaning" of language into those 26 tokens, so it is limited. If we use sub-word tokens as BERT does, we have 30K+ tokens to spread that "meaning of language" across - I hope that makes sense!
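To make that concrete, here is a minimal sketch (assuming the `bert-base-uncased` checkpoint and the HuggingFace transformers library) that prints the vocabulary size and compares the token sequence length against the raw character length:

```python
# Minimal sketch: sub-word vocabulary size vs. character-level alternative,
# assuming the `bert-base-uncased` checkpoint from HuggingFace transformers.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Masked language modeling teaches BERT the meaning of language."

tokens = tokenizer.tokenize(text)

print(len(tokenizer.vocab))      # ~30K vocabulary entries to spread "meaning" across
print(tokens)                    # sub-word pieces for the sentence
print(len(tokens), len(text))    # sequence length in tokens vs. in raw characters
```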
@johngrabner 2 years ago
@@jamesbriggs I get that 30K tokens means higher-level semantics. BERT still must learn the relationships between these tokens to perform. So what is the degradation of BERT at 30K tokens vs 20K tokens vs 10K tokens ... down to 26 tokens? I can't find any mention of this.
@charmz973 3 years ago
Thank YOUUU very much - the missing piece in my NLP journey.
@jamesbriggs 3 years ago
Haha awesome, happy it helps!
@testingemailstestingemails4245 3 years ago
Thanks for this wonderful explanation, sir. I want to build my own voice dataset to train a medical-terms model for automatic speech recognition. Please help me - I don't know how to start. What should the structure of the dataset be?
@soumyasarkar4100 3 years ago
Hi... If we mask tokens after tokenization of the text sequence, wouldn't that lead to masking subwords instead of actual words? Any thoughts on the consequences of this?
@jamesbriggs 3 years ago
Yep, that's as intended, because BERT learns the relationships between words and subwords. So BERT learns that the word 'live' (or 'liv', '##e') is a different tense but the same meaning as the word 'living' (or 'liv', '##ing'). In a sense the way we understand words can be viewed as 'subword' too: I can read 'liv' and associate it with the action 'to live', then read the suffix '-ing' and understand the action 'to live in the present' - hope that makes sense! In more practical terms it also reduces the vocab size: rather than having the words ['live', 'living', 'lived', 'be', 'being', 'give', 'giving'] we have ['liv', 'be', 'giv', '-ing', '-ed'].
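For anyone curious how a real vocabulary splits words, here is a short illustrative snippet (again assuming `bert-base-uncased`; the splits in the reply above are illustrative, and the exact pieces depend on the checkpoint's vocabulary, so common words may stay whole while rarer ones break into '##' continuation pieces):

```python
# Illustration only: inspect how a pretrained WordPiece vocabulary splits words.
# Exact output depends on the checkpoint's vocabulary.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

for word in ["living", "lived", "tokenization", "unmaskable"]:
    print(word, "->", tokenizer.tokenize(word))
```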
@soumyasarkar4100 3 years ago
@@jamesbriggs Thanks for the clarification!
@boriswithrazor6992 3 years ago
Hello! How do I use BERT to predict the word hidden behind [MASK]?
@jamesbriggs 3 years ago
You can use the HuggingFace 'fill-mask' pipeline: huggingface.co/transformers/main_classes/pipelines.html#fillmaskpipeline
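A minimal usage sketch of that pipeline (assuming `bert-base-uncased`; any masked-language-model checkpoint works):

```python
# Predict the token behind [MASK] with the HuggingFace 'fill-mask' pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Returns the top candidate tokens with their scores.
for prediction in fill("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```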
@divyanshukatiyar8886 3 years ago
Thanks a lot for this masterpiece. I do have something unusual going on: it tells me that BertTokenizer is not callable when it should be. I checked and realised that the __call__ method was introduced in transformers v3.0.0, so I updated the package, but it still throws the same error. Any help here?
@jamesbriggs 3 years ago
After updating you should be able to call it - I'd double-check that your code is actually using the correct version.
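A quick way to check which version the interpreter is actually importing (a small sketch, not specific to the code in this thread):

```python
# Sanity check: confirm the environment is using a transformers version
# (>= 3.0.0) whose tokenizers support being called directly via __call__.
import transformers
from transformers import BertTokenizer

print(transformers.__version__)  # should be >= 3.0.0

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer("hello world"))  # works on >= 3.0.0; older versions need tokenizer.encode(...)
```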
@AnandP2812 3 years ago
Hi, I followed the Hugging Face tutorial for MLM, but it does not seem to work with emojis - any idea on how to do this? For example, I have a dataset containing tweets, with each tweet containing one emoji - and I want to use MLM to predict the emoji for a tweet. Thanks.
@jamesbriggs 3 years ago
Hi Anand, I haven't used BERT with emojis, but it should be similar to training a new model from scratch. Huggingface has a good tutorial here: huggingface.co/blog/how-to-train - that should help. In particular, the tutorial uses byte-level encodings, which should work well with emojis. I'm working on a video covering training BERT from scratch, hopefully that will help too :) Hope you manage to figure it out!
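A rough sketch of the byte-level tokenizer step from that tutorial, pointed at a hypothetical tweets.txt corpus (the file name, vocabulary size, and output directory here are placeholder assumptions, not taken from the tutorial). Byte-level BPE can represent emojis because every byte sequence is covered:

```python
# Train a byte-level BPE tokenizer on your own tweet corpus (one tweet per line).
import os
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["tweets.txt"],          # assumption: your own dataset file
    vocab_size=30_000,             # assumption: pick a size that suits your corpus
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

os.makedirs("tweet_tokenizer", exist_ok=True)
tokenizer.save_model("tweet_tokenizer")  # writes vocab.json and merges.txt
```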
@AnandP2812 3 years ago
@@jamesbriggs Hi James - thanks for the reply. I will take a look at that tutorial - will it work with my own dataset? Also, keep up the great content!
@jamesbriggs 3 years ago
@@AnandP2812 I believe so, yes - I haven't worked through it myself yet, but I see no reason why not. And I will do, thanks!
@henkhbit5748 3 years ago
Really interesting stuff. But what about if you want to use BERT in a different language? All the videos I saw were based on English. A video on creating a BERT model from scratch in a different language, with some simple corpus of text, would be nice. It would also be helpful if you could explain in a side note what you have to do to transform your English example into another language...
@jamesbriggs 3 years ago
hey Henk, yes I've had a lot of questions on this, will be releasing something on it soon
@henkhbit5748 3 years ago
@@jamesbriggs Thanks, looking forward to it 👍
@rmtariq 3 years ago
@james Really impressive explanation... Is it possible, based on this MLM approach, to also build a text classification model?
@jamesbriggs 3 years ago
Yes, MLM is used to train the 'core' BERT models; things like text classification, Q&A, etc. are part of the additional 'heads' (extra layers) added to the end of the transformer model. So you'd train with MLM, then follow that up by training on some text classification task. This video will take you through the training for classification: kzbin.info/www/bejne/ppvXn555fKqfmac
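A hedged sketch of that two-stage idea - load the MLM-trained weights into a model with a classification head, then fine-tune it on labelled data (the local checkpoint path below is hypothetical):

```python
# Stage 2 of the idea above: reuse the MLM-trained 'core' for classification.
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "./bert-mlm-finetuned",  # hypothetical path to your MLM checkpoint; or "bert-base-uncased"
    num_labels=2,            # the classification head on top is freshly initialized
)
# ...then fine-tune on (text, label) pairs as in the classification video.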
@tomcruise794 3 years ago
Thanks for the informative video, enjoyed it. When will you upload the video on training the model through MLM?
@jamesbriggs 3 years ago
Here you go: kzbin.info/www/bejne/iGfLlKuDgrSlhqc :)
@tomcruise794 3 years ago
Another great piece. I also have a doubt: while calculating the weights in the encoder (attention layer), what will the initial value of the masked token be? There should be some numerical value with which to calculate the probability and find a loss value.
@jamesbriggs 3 years ago
@@tomcruise794 Each token has a vector representation in each encoder; for BERT-base this is a vector containing 768 values (and there are 512 of them in each encoder - one per token position). The final vector is passed to a feed-forward NN which outputs another vector containing ~30K values (the number of tokens in BERT's vocabulary), and we then apply softmax to this. The loss can then be calculated as the difference between this softmax probability distribution (our prediction) and a one-hot encoded vector of the real token. That's pretty long, sorry! Does it make sense?
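A toy PyTorch sketch of that projection, softmax, and loss step (shapes are illustrative, the token id is made up, and this is not BERT's exact prediction head):

```python
# Toy version of the MLM head: 768-d token vector -> logits over ~30K vocab
# -> softmax distribution, with cross-entropy against the true token id.
import torch
import torch.nn as nn

hidden_size, vocab_size = 768, 30522
mlm_head = nn.Linear(hidden_size, vocab_size)

token_vector = torch.randn(1, hidden_size)    # final encoder output at one [MASK] position
logits = mlm_head(token_vector)               # shape [1, 30522]
probs = torch.softmax(logits, dim=-1)         # predicted distribution over the vocabulary

true_token_id = torch.tensor([2158])          # hypothetical id of the real (unmasked) token
loss = nn.functional.cross_entropy(logits, true_token_id)
print(probs.shape, loss.item())
```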
@tomcruise794 3 years ago
@james Thanks for the detailed explanation. But my question is: if a word is replaced by the mask token, what will its initial value/vector representation be? Will it be zero? Because there should be some initial value for the masked token in order to calculate the probability.
@jamesbriggs 3 years ago
@@tomcruise794 I'm not sure I fully understand! Maybe you are referring to the initial vector in BERT's embedding array, where the mask token (103) would be replaced by a specific vector which is then fed into the first encoder block? In that case the initial vector representation wouldn't be zero - it would look like any other word's vector (as far as I'm aware), and before BERT was pretrained these values would have been initialized with random values (before being optimized to create more representative initial vectors).
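A small way to inspect that learned [MASK] vector yourself (a sketch assuming `bert-base-uncased` and PyTorch):

```python
# Look up the embedding that the [MASK] token id maps to - it is a learned,
# non-zero vector just like any other token's embedding.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

mask_id = tokenizer.mask_token_id                 # 103 for bert-base-uncased
embeddings = model.get_input_embeddings()         # nn.Embedding(vocab_size, 768)
mask_vector = embeddings(torch.tensor([mask_id]))

print(mask_vector.shape)       # torch.Size([1, 768])
print(mask_vector.abs().sum()) # non-zero: learned during pretraining
```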
@trekkiepanda1704 3 years ago
This is very helpful! Thanks!
@vladsirbu9538 2 years ago
neat explanations! thanks!
@technicalanu6919 3 years ago
Hello. How are you, brother?
@mohamadrezabidgoli8102 3 years ago
Thanks, man. Please also visualize the encoded vectors in a video.
@jamesbriggs 3 years ago
Happy you enjoyed it - I find it helps to visualize these things.
@orewa9591 3 years ago
Thank you
@rog0079 3 years ago
So this series is basically about how to pre-train BERT for any language or text from scratch, right?
@jamesbriggs 3 years ago
The way I've used it so far is for fine-tuning - you can use the same methods as for pretraining to fine-tune BERT on more specific language (improving performance on specific use-cases). But it's pretty open-ended, and I'm planning to do some videos on training from scratch in a different language. I'll upload a series intro soon too :)
@rog0079 3 years ago
@@jamesbriggs Yes, training from scratch on a new language would be so muchhh helpful!!! I'll be waiting for those videos :D Thanks a lot, your channel is a gem!