L19.5.2.3 BERT: Bidirectional Encoder Representations from Transformers

7,590 views

Sebastian Raschka

Slides: sebastianraschka.com/pdf/lect...
0:00 Introduction
0:29 BERT (Bidirectional Encoder Representations from Transformers)
1:44 BERT Inputs
4:18 BERT Pre-Training Task #1
9:47 BERT Pre-Training & Downstream Tasks
11:51 Transformer Training Approach
12:16 BERT Pre-Training & Fine-Tuning Approach
13:49 BERT vs GPT-v1 Performance
14:59 BERT Pre-Training & Feature-based Training
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.5.2.4 GPT-v2: Lang...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 5
@lily-qs7cr 2 years ago
Thanks for all of your transformer videos :)
@SebastianRaschka 2 years ago
You are welcome! Glad to hear you like them!
@billykotsos4642 1 year ago
Quick question… the output of BERT is several word embeddings, right? So does that mean they need to be concatenated/added/averaged to create the feature that is passed to the MLP classifier? What is the most standard method these days?
@payam-bagheri 1 year ago
I have been trying to understand how a sentence embedding (one embedding vector for the whole sentence) is generated by BERT. What I've understood so far is as follows:
- The [CLS] token that is added to the beginning of the sentence (assuming you give the model a sentence and want the embedding vector as output) evolves through the network (BERT) in the same way as any other token, and the resulting hidden state of the [CLS] token is used as a representative embedding for the whole sentence.
- Or, the max/mean along each dimension of the per-token embedding vectors is used as the sentence embedding.
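The [CLS] variant is the same idea as the classification sketch above; the mean-pooling variant can also be written in a few lines. A rough sketch, again assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the video does not prescribe either), with padding tokens masked out so they do not dilute the average:

```python
# Sketch only: sentence embeddings via mean pooling over BERT's token vectors,
# assuming Hugging Face transformers and the bert-base-uncased checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["BERT is a bidirectional encoder.", "Mean pooling averages the token vectors."]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state             # (batch, seq_len, 768)

# Mean pooling over real tokens only: exclude padding via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
sentence_embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, 768)
```

In practice, libraries such as sentence-transformers ship models fine-tuned so that this kind of mean pooling yields useful sentence embeddings out of the box.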
@peregudovoleg 2 months ago
17:55: two years ago, 300M parameters were "quite large". Look at us now: Llama 3 has 405B parameters, more than a 1,000x increase. What about the next two years...