Expanding the UTF-8 Character Set to Infinity

4,042 views

Mashpoe

5 years ago


Comments: 16
@ybungalobill · 2 years ago
The proposed scheme breaks another genius property of UTF-8: it's self-synchronizing. You can always determine whether a byte is the beginning of a character just by looking at it. This is crucial not only for iterating back and forth through the string, but also for being able to search for substrings using a simple strstr. You can fix your scheme by putting those ones into the x's of 10xxxxxx bytes, e.g.: 11111111 10111111 10111111 10111111 10110xxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx ...
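The self-synchronizing property described above can be sketched in a few lines of Python (the function names are hypothetical, not from the video). Because every UTF-8 continuation byte has the form 10xxxxxx and no lead byte does, code dropped at an arbitrary byte offset can step back to the start of the current character, walk the string backwards, and rely on a plain byte search for a valid UTF-8 needle only matching on a character boundary:

```python
def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes always match the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def char_start(buf: bytes, i: int) -> int:
    """Step back from index i to the lead byte of the character containing it."""
    while i > 0 and is_continuation(buf[i]):
        i -= 1
    return i


def char_starts_backward(buf: bytes) -> list:
    """Walk a UTF-8 buffer from the end, collecting each character's start index."""
    starts = []
    i = len(buf)
    while i > 0:
        i = char_start(buf, i - 1)  # land on the lead byte of the previous character
        starts.append(i)
    return starts


text = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 'a', 'é', '€', '😀': 1-, 2-, 3- and 4-byte characters
print(char_start(text, 5))          # 3: byte 5 is inside '€', whose lead byte is at index 3
print(char_starts_backward(text))   # [6, 3, 1, 0]
print(text.find("\U0001F600".encode("utf-8")))  # 6: a byte-level search lands on a boundary
```

The search itself is still an ordinary byte comparison; the encoding's bit patterns are what make it safe, which is why strstr keeps working on UTF-8 text.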
@MatheusAugustoGames · 3 years ago
OK, I just want to point out the genius that was the creation of UTF-8. Old computers, if they found a byte with all 8 bits set to 0, would interpret the string as finished. UTF-8's bit pattern guarantees that will never happen accidentally.
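The property this comment points to is easy to check. In C, a 0x00 (NUL) byte terminates a string, and in UTF-8 every byte of a 2-, 3- or 4-byte sequence has its high bit set, so the only encoding that contains 0x00 is U+0000 itself. A minimal hand-written encoder (a sketch following the RFC 3629 ranges; `utf8_encode` and `multibyte_is_nul_free` are hypothetical names) makes the bit patterns visible:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (RFC 3629: U+0000..U+10FFFF)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:                                   # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                                # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                # 11110xxx then three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def multibyte_is_nul_free() -> bool:
    """Check every code point >= U+0080: all of its bytes are >= 0x80, hence never 0x00."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates have no UTF-8 encoding
        if min(utf8_encode(cp)) < 0x80:
            return False
    return True


print(utf8_encode(0x20AC).hex())  # e282ac ('€')
print(multibyte_is_nul_free())    # True
```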
@lelouchvibritannia69yearsa78 · 2 years ago
The beginning of a Legendary Game Developer's journey!
@lelouchvibritannia69yearsa78 · 2 years ago
Ayo, I got a heart from the creator! Les gooooooooo
@sarahdehart1027 · 5 years ago
Lol! That ending was epic! Loved it!
@halftwins · 2 years ago
I see a couple of problems with this, mainly that there's no way to tell whether a character has just started with a byte or is preceded by 11111111. Maybe there's something I'm not noticing, but it seems like for it to really last forever, some kind of ending sequence would be needed(?) Anyway, the video was great, and early congrats on 1k!
@Magnogen · 2 years ago
That's a good point, I was half expecting him to say that if the byte started with 0, then _that_ would be the terminating byte. Something like *1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx* would then be the corresponding utf-infinity code, and ascii would be the base case of just 0xxxxxxx. Backwards compatibility and all. But hey, that's just a thought. I'm not sure how feasible it'd be in practice, as I don't tend to work with memory allocation, but I'd like to know how well it'd work/if it'd work at all.
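The scheme Magnogen describes, where a set high bit means "more bytes follow" and a clear high bit ends the value, can be sketched as below. The sketch assumes the most significant 7-bit group comes first, as in the comment's layout; that is the same arrangement as the variable-length quantities in Standard MIDI Files (LEB128 is the least-significant-first relative). `vlq_encode` and `vlq_decode` are hypothetical names:

```python
def vlq_encode(n: int) -> bytes:
    """1xxxxxxx ... 1xxxxxxx 0xxxxxxx, most significant 7-bit group first."""
    if n < 0:
        raise ValueError("only non-negative values are supported")
    out = [n & 0x7F]                      # final byte: high bit clear
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))     # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(out))


def vlq_decode(buf: bytes, pos: int = 0) -> tuple:
    """Decode one value at pos; return (value, index just past it)."""
    n = 0
    while True:
        b = buf[pos]                      # IndexError if the value is truncated
        pos += 1
        n = (n << 7) | (b & 0x7F)
        if b < 0x80:                      # high bit clear: this was the last byte
            return n, pos


print(vlq_encode(0x41))           # b'A': ASCII is the one-byte base case
print(vlq_encode(0x20AC).hex())   # c12c
print(vlq_encode(0x80).hex())     # 8100
```

The catch, and a reason UTF-8 does not work this way, is that the final byte of a multi-byte value falls in 0x00-0x7F: U+20AC ends in 0x2C (','), and 0x80 encodes as 81 00, which a NUL-terminated C string would cut short. Character boundaries are still recoverable (every value ends at a byte below 0x80), but a byte-oriented ASCII scanner can no longer trust what it sees.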
@BGBTech · 2 years ago
@Magnogen That scheme is actually used for encoding numbers in some file formats. One other scheme I had used in some of my formats is: 0xxxxxxx (0-127), 10xxxxxx-xxxxxxxx (0-16K), 110xxxxx-xxxxxxxx-xxxxxxxx (0-2M), ... A lot depends on what properties one wants. There are also various ways these schemes can be extended for signed numbers, to encode variable-length floating-point values, ... OTOH, while UTF-8 doesn't have the most efficient representation, it does allow resynchronizing, and in a few odd cases non-standard encodings are possible (for example, I had used "transposed UTF-8" values in string tables so as to encode string length prefixes), noting that it is possible to unambiguously differentiate between normally coded and transposed encodings (and in some cases, it might be preferable to have some way to encode an explicit string length, without needing to count characters until the NUL byte).
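BGBTech's prefix scheme puts the whole length in the first byte, like UTF-8, but lets every following byte carry a full 8 payload bits. A minimal sketch of the three forms the comment lists follows; the big-endian bit order and the `prefix_encode`/`prefix_decode` names are assumptions, and the longer forms hidden behind the comment's "..." are deliberately not filled in:

```python
def prefix_encode(n: int) -> bytes:
    """Shortest form: 0xxxxxxx | 10xxxxxx xxxxxxxx | 110xxxxx xxxxxxxx xxxxxxxx."""
    if n < 0:
        raise ValueError("only non-negative values are supported")
    if n < 1 << 7:                                    # 7 payload bits: 0..127
        return bytes([n])
    if n < 1 << 14:                                   # 14 payload bits: 0..16383
        return bytes([0x80 | (n >> 8), n & 0xFF])
    if n < 1 << 21:                                   # 21 payload bits: 0..2097151
        return bytes([0xC0 | (n >> 16), (n >> 8) & 0xFF, n & 0xFF])
    raise ValueError("longer forms are not sketched here")


def prefix_decode(buf: bytes, pos: int = 0) -> tuple:
    """Decode one value at pos; return (value, index just past it)."""
    b = buf[pos]
    if b < 0x80:                                      # 0xxxxxxx
        return b, pos + 1
    if b < 0xC0:                                      # 10xxxxxx xxxxxxxx
        return ((b & 0x3F) << 8) | buf[pos + 1], pos + 2
    if b < 0xE0:                                      # 110xxxxx xxxxxxxx xxxxxxxx
        return ((b & 0x1F) << 16) | (buf[pos + 1] << 8) | buf[pos + 2], pos + 3
    raise ValueError("prefix not covered by this sketch")


print(prefix_encode(300).hex())    # 812c
print(prefix_encode(16384).hex())  # c04000
```

Two bytes hold 14 bits here versus 11 in UTF-8, and three hold 21 versus 16, which is the efficiency gap the comment alludes to. The trade-off is the one it also names: trailing bytes can take any value, including ones that look like lead bytes or ASCII, so a reader that starts mid-stream cannot resynchronize.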
@luca__3044 · 2 years ago
Can't wait to express my feelings in a 420-bit alien language!
@PC_YouTube_Channel · 2 years ago
lmao amazing ending. your channel really gives off some Tom 7 vibes.
@sullivanbarnett6904 · 5 years ago
Thank you, Jacob!
@TimJSwan · 2 years ago
lol 256 bits enough? more than all the Planck lengths in the universe represented...
@bored_person · 2 years ago
Patents expire after 20 years.
@robloxxer593 · 2 years ago
Wait, why tf are they adding four entire 1s? Two characters already had 4 combinations, and wouldn't you know when it ends from the bits that told you how long it is? What's the point of the bits in the front of the byte?
@decare696 · 2 years ago
It's so that a byte in the middle of some character can't be mistaken for a valid ASCII byte by old or bad/lazy software.
@robloxxer593 · 2 years ago
@decare696 stupid lazy old software
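decare696's answer can be made concrete. The lead byte does announce how long the character is, but the 10 prefix keeps every continuation byte at 0x80 or above, so software that only looks for ASCII bytes (a '/' path separator, say) never finds a false match inside a multi-byte character. A small sketch, with hypothetical helper names:

```python
def seq_len(lead: int) -> int:
    """Sequence length announced by the high bits of a UTF-8 lead byte."""
    if lead < 0x80:
        return 1                    # 0xxxxxxx
    if (lead & 0xE0) == 0xC0:
        return 2                    # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3                    # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4                    # 11110xxx
    raise ValueError(f"0x{lead:02x} cannot start a character")  # e.g. 10xxxxxx


def count_chars(buf: bytes) -> int:
    """Count characters by hopping from lead byte to lead byte."""
    i = n = 0
    while i < len(buf):
        i += seq_len(buf[i])
        n += 1
    return n


def ascii_hits(buf: bytes, ch: str) -> list:
    """What a 'lazy' ASCII-only scanner does: compare every byte to one ASCII char."""
    return [i for i, b in enumerate(buf) if b == ord(ch)]


path = "donn\u00e9es/\u20ac/\u65e5\u672c".encode("utf-8")  # "données/€/日本"
print(count_chars(path))      # 12
print(ascii_hits(path, "/"))  # [8, 12]: only the real separators
```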