how the tokenizer for gpt-4 (tiktoken) works and why it can't reverse strings

2,341 views

Chris Hay

Chris breaks down the ChatGPT (GPT-4) tokenizer and shows why large language models such as GPT, Llama 2 and Mistral struggle to reverse words. He looks at how words, programming languages, other natural languages and even Morse code are tokenized, and shows how tokenizers tend to be biased towards English and programming languages.
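
The reversal problem can be illustrated with a minimal sketch. The vocabulary below is a hypothetical toy, not the real GPT-4 vocabulary, and the greedy longest-match rule stands in for tiktoken's byte-pair merges, so this is only an approximation of the idea: a common word maps to a single token, while its reversal falls apart into pieces the model rarely sees together.

```python
# Toy vocabulary (hypothetical, not the real GPT-4 vocabulary):
# a few whole words and fragments, plus single characters as a fallback.
VOCAB = ["hello", "world", "hel", "lo", "wor", "ld"] + list("abcdefghijklmnopqrstuvwxyz ")


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over a fixed vocabulary.

    This is a simplification: tiktoken applies learned byte-pair merges,
    not a longest-match scan.
    """
    vocab_set = set(vocab)
    max_len = max(len(token) for token in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab_set:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


forward = tokenize("hello")
backward = tokenize("hello"[::-1])
print(forward)   # ['hello']
print(backward)  # ['o', 'l', 'l', 'e', 'h']
```

The model sees `hello` as one opaque unit, so it has no direct view of the letters it would need to emit in reverse order.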

Comments: 6
@feniyuli 6 months ago
It is very helpful to understand how the tokenization works. Thanks! Do you think data that we encode using tiktoken will be sent to the AI?
@chrishayuk 5 months ago
definitely not, it's all local
@ilyanemihin6029 8 months ago
Thanks, very interesting information
@chrishayuk 7 months ago
glad it was useful
@ernestuz 6 months ago
The funny thing is that the more complete the vocabulary, the less pressure on the upper layers, so it's cheaper not only because of fewer tokens but also in processing. I wonder if somebody has prepared a semi-handcrafted tokenizer where, let's say, the first 30K tokens come from a dictionary and the rest are generated.
@chrishayuk 5 months ago
exactly. tbh, i wouldn't be surprised if someone goes that direction
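
The semi-handcrafted idea from this thread could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name `build_hybrid_vocab`, the tiny word list, and the use of frequent character bigrams as the "generated" part (a real tokenizer would more likely learn byte-pair merges). The only point is the ordering: dictionary entries take the lowest ids, and the remaining slots are filled from corpus statistics.

```python
from collections import Counter


def build_hybrid_vocab(dictionary_words, corpus, vocab_size):
    """Hypothetical helper: dictionary words first, data-driven tokens after.

    Assumes vocab_size is large enough to hold the dictionary plus every
    distinct character in the corpus; otherwise some text is not encodable.
    """
    vocab = []
    seen = set()

    def add(token):
        if token not in seen and len(vocab) < vocab_size:
            seen.add(token)
            vocab.append(token)

    # 1. Handcrafted part: dictionary words get the lowest ids.
    for word in dictionary_words:
        add(word)

    # 2. Single characters, so any text from the corpus stays encodable.
    for ch in sorted(set(corpus)):
        add(ch)

    # 3. Generated part: the most frequent character bigrams fill what is left.
    pairs = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    for pair, _count in pairs.most_common():
        add(pair)

    return {token: idx for idx, token in enumerate(vocab)}


dictionary = ["the", "token"]
corpus = "the tokenizer tokenizes the text"
vocab = build_hybrid_vocab(dictionary, corpus, vocab_size=20)
print(vocab["the"], vocab["token"], len(vocab))  # 0 1 20
```

With a 30K-word dictionary the same ordering would reserve the first 30K ids for whole words, leaving the generated entries to cover names, code, and other languages.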