L19.5.2.3 BERT: Bidirectional Encoder Representations from Transformers

7,590 views

Sebastian Raschka

Slides: sebastianraschka.com/pdf/lect...
0:00 Introduction
0:29 BERT (Bidirectional Encoder Representations from Transformers)
1:44 BERT Inputs
4:18 BERT Pre-Training Task #1 (masked language modeling; see the sketch after the timestamps)
9:47 BERT Pre-Training & Downstream Tasks
11:51 Transformer Training Approach
12:16 BERT Pre-Training & Fine-Tuning Approach
13:49 BERT vs GPT-v1 Performance
14:59 BERT Pre-Training & Feature-based Training
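
The pre-training task listed at 4:18 is BERT's masked language modeling objective: some input tokens are replaced with a [MASK] token, and the encoder is trained to predict them from both left and right context. A minimal sketch of what this looks like at inference time, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (neither is shown in the video):

from transformers import pipeline

# Fill-mask pipeline: BERT predicts the token hidden behind [MASK]
# using context from both sides (the "bidirectional" part).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']:>10}  p={prediction['score']:.3f}")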
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.5.2.4 GPT-v2: Lang...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 5
@lily-qs7cr · 2 years ago
Thanks for all of your transformer videos :)
@SebastianRaschka · 2 years ago
you are welcome! glad to hear you like them!
@billykotsos4642 · a year ago
Quick question… the output of BERT is several word embeddings, right? So does that mean they need to be concatenated/added/averaged to create the feature vector that is passed to the MLP classifier? What is the most standard method these days?
@payam-bagheri · a year ago
I have been trying to understand how a sentence embedding (one embedding vector for the whole sentence) is generated by BERT. What I've understood so far:
- The [CLS] token added to the beginning of the sentence (assuming you give the model a sentence and want an embedding vector as output) passes through the network (BERT) just like any other token, and its final hidden state is used as a representative embedding for the whole sentence.
- Or, the max/mean along each dimension of the token embeddings is used as the sentence embedding.
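
Both strategies described in the comment above can be sketched in a few lines. Here is a minimal, illustrative example (not from the video) that assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; it extracts the [CLS]-token embedding and a masked mean-pooled embedding, either of which could serve as the feature vector for an MLP classifier, as asked in the question above.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed setup (not from the video): a vanilla pre-trained BERT encoder.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["BERT produces one embedding per token.",
             "We pool them into a single sentence vector."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, 768)

# Strategy 1: take the final hidden state of the [CLS] token (position 0).
cls_embedding = hidden[:, 0, :]                      # (batch, 768)

# Strategy 2: mean-pool the token embeddings, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float() # (batch, seq_len, 1)
mean_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, 768)

print(cls_embedding.shape, mean_embedding.shape)     # both torch.Size([2, 768])

As a rough answer to the "most standard method these days" question: the [CLS] vector of a plain pre-trained BERT is usually not a strong sentence representation on its own; mean pooling, typically combined with fine-tuning on a sentence-level objective as in Sentence-BERT, is the more common default.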
@peregudovoleg · 2 months ago
At 17:55: two years ago, 300 million parameters were "quite large". Look at us now: 405-billion-parameter Llama 3, over 1000x growth. What about the next 2 years...