Plain Text - Dylan Beattie - NDC Copenhagen 2022

161,362 views

NDC Conferences


Software is complicated. Machine learning, microservice architectures, message queues... every few months there's another revolutionary idea to consider, another framework to learn. And underneath so many of these amazing ideas and abstractions is text.
When you work in software, you spend your life working with text. Some of those text files are source code, some are configuration files, some of them are documentation. Editors, revision control systems, programming languages - everything from C# and HTML to Git and VS Code is based on the idea of "plain text files". But... what if I told you there's no such thing? When we say something is a "plain text file", we're relying on a huge number of assumptions - about operating systems, editors, file formats, language, culture, history... and, most of the time, that's OK. But when it goes wrong, "plain text" can lead to some of the weirdest bugs you've ever seen... why is there Chinese in the event logs? Why is the city of Aarhus in the wrong place? And why does Magnus Mårtensson always have trouble getting into the USA? Join Dylan Beattie for a fascinating look into the hidden world of text files - from the history of mechanical teletypes to encodings, collations and code pages. We'll look at some memorable bugs, some golden rules for working with plain text - and we'll even find out the story behind the mysterious phrase "pike matchbox" and what it has to do with driving in Belarus.
Check out more of our featured speakers and talks at
www.ndcconfere...
ndccopenhagen....

Comments: 270
@jonnilazzerini9085 2 years ago
I was a little bit skeptical: how can anyone give a one-hour talk speaking just about 'plain text'? But I have to admit: it was simply AMAZING! Well done!!!
@tharfagreinir 1 year ago
Dylan Beattie can make pretty much anything interesting. I think he likes to challenge himself that way.
@hansbaeker9769 11 months ago
Same here. I was expecting to go to something else within a minute or two, but stayed for the whole thing.
@crax83 6 months ago
@tharfagreinir his Art of Code talk is one of my all-time favorite talks. This one is also way up there in the top 5 or so.
@f.d.3289 10 months ago
23:30 That is the most beautiful thing about human beings that I've heard in a long, long while. God bless that postman who really cared for his job and even was smart enough to figure out that problem. This will make me happy the rest of the day :D
@jandorniak6473 1 year ago
Since Dylan does read comments, here's one of my favorite examples, in Polish: "Zrób mi łaskę" means do me a favor. Most of the characters can be turned to their ASCII lookalikes without any issue whatsoever. Except one. "Zrób mi laskę" is asking for a specific sexual act. Just turning ł into l changes the entire meaning of the whole sentence.
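The ł case is also awkward technically. A minimal Python sketch, assuming the common (and lossy) trick of decomposing text and dropping combining marks; `strip_marks` is a hypothetical helper name, not something from the talk:

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks (a lossy ASCII-folding trick)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

folded = strip_marks("Zrób mi łaskę")
print(folded)            # ó and ę lose their marks, ł is left untouched
print(folded.isascii())  # False: ł has no Unicode decomposition
```

Because ł is a letter in its own right rather than "l plus a mark", any ASCII folding of it needs an explicit, language-aware mapping, which is exactly where the meaning can change.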
@nsulikow 2 years ago
This is one of the best presentations I've seen in a long time. Amazing content!
@merthyr1831 1 year ago
This ascii issue is also a cause of cultural tension in (Republic of) Ireland and (Northern) Ireland, where birth registrations at some hospitals are refused or incorrectly assigned when a child's parents opt to use a Gaelic name, which often includes a bunch of non-ASCII chars. Hospital software is usually pretty archaic and predates a lot of the elegance of UTF. Also. Amazing talk. Funny and interesting the whole way through. Dylan Beattie is a legend!
@szymonbaranowski8184 1 year ago
it's like Slavic names in Germany
@malcolmhutchison 1 year ago
One of my favourite sorting rules is that for Scottish surnames "Mac" and "Mc", both with and without following space, are considered the same letter that comes after L but before M
@EvincarOfAutumn 11 months ago
There’s a similar quirk with English genealogical documents, such as old church birth registers and ships’ passenger lists. They’ll often use abbreviations of common personal names (and even some surnames) to save space, and when these are sorted (whether in the text itself, or later on by a computer), it may be according to what the abbreviation stands for, not the letters themselves. So you have to just know, for example, that “Hy.” might appear before “Herb.”, because “Henry” comes before “Herbert”. Moreover, some of the abbreviations are based on a Latin and/or Greek transliteration of the name, such as “Iabus” = “Iacobus” = “Jacob” or “Xpr” = “Christopher”.
@paulwesley3862 10 months ago
@EvincarOfAutumn interesting! Just wondering why Jacob was abbreviated with another 5 letter word? 🤔
@altreusplays 10 months ago
I’ve also noticed it’s a free-for-all on whether the word “the” is ignored when sorting lists of names. Steam doesn’t ignore it, for example, and I think Google Play Music used to but YouTube Music doesn’t. But to me, it’s correct to ignore it and incorrect not to!
@EvincarOfAutumn 10 months ago
@paulwesley3862 In that case, the person’s name in everyday life would’ve been Jacob, but if the church records are (partially) in Latin, it’s the Latin form that’s abbreviated. I think just “Iab.” is attested as well, though I’m not sure.
@chascuk 1 year ago
The 7-bit encoding for SMS messages in GSM is the same as ASCII for most characters, but many of the control characters have been replaced by text characters that were missing from ASCII. In particular it does not have NUL; 0 encodes the '@' character. So, as one of my colleagues at Ericsson found out the hard way, you cannot use C NUL-terminated strings to process SMS messages.
@UliTroyo 1 year ago
Interesting!
@flammungous3068 10 months ago
This video also explained to me why an SMS gets converted to MMS if you just put in a few emojis. Because the emojis take so many bytes.
@Architector_4 9 months ago
wait, what about ASCII 0x40? Isn't that an @?
@chascuk 9 months ago
@Architector_4 In GSM 7-bit encoding 0x40 is the inverted exclamation mark, one of the characters missing from ASCII. No idea why they didn't use 0 for this and keep @ where it was.
@Architector_4 9 months ago
@chascuk ...huh. That's fun, thank you lol
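The NUL problem described in this thread can be sketched in a few lines of Python. This is an illustration under stated assumptions: the table below is only a tiny, partial excerpt of the GSM 7-bit default alphabet (just the two code points mentioned above, with everything else treated as ASCII), and `gsm_decode` and `c_strlen` are hypothetical helper names.

```python
# Partial GSM 7-bit table: only the entries needed for this example.
# Letters, digits and '.' share their ASCII values; 0x00 and 0x40 do not.
GSM_OVERRIDES = {0x00: "@", 0x40: "¡"}

def gsm_decode(septets: bytes) -> str:
    """Decode unpacked GSM septets using the partial table above."""
    return "".join(GSM_OVERRIDES.get(b, chr(b)) for b in septets)

def c_strlen(buf: bytes) -> int:
    """Length as a C NUL-terminated string routine would see it."""
    end = buf.find(0)
    return len(buf) if end == -1 else end

message = bytes([0x61, 0x00, 0x62, 0x2E, 0x63])  # "a@b.c" as GSM septets
decoded = gsm_decode(message)
seen_by_c = c_strlen(message)
print(decoded)    # the full address survives when the length is tracked separately
print(seen_by_c)  # a NUL-terminated view stops right before the '@'
```

Carrying an explicit length alongside the buffer, rather than relying on a terminator, avoids the truncation.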
@NicholasShanks 2 years ago
At the risk of being one of those YouTube comments shown in your next talk, the diacritic you discuss at 29:18 is a diaresis, not an umlaut. They look the same and are encoded with the same codepoints, but are pronounced differently. An umlaut changes the quality of the vowel, and can appear on lone vowels in any language that uses them. A diaresis tells readers that the second of two vowels is not to be read as a diphthong, but a separate vowel. That's why English has one on, for example, naïve (nigh-eve, not knave). Coöperation is co + op not co͞op.
@jkollin4875683F 2 years ago
Something Nordic readers of Tolkien would do well to be aware of -- I'm referring to Eärendil etc.
@EricChipko 1 year ago
Well done. I am not sufficiently educated to know if you are right, but the criticism is concise and I recognize the words if not what they mean.
@stevecarter8810 1 year ago
Saved me posting the same, but having to look up all the terms to double check myself. Thanks!
@TonyCoyle 1 year ago
and that specific diaresis is called a trema in almost every other language that uses it...
@Shack263 1 year ago
Also, the umlaut is used in German and was derived from roundabout there (idk the history too well) whereas the diaresis or trema evolved independently and is notably used in French to mark vowels that may usually be silent, but should be pronounced. This is similar to its use in coöperation, to basically say that the second o is pronounced distinctly. The two symbols were developed independently.
@notthedroidsyourelookingfo4026 2 years ago
Recently, a student of mine opened a text file and it was all Chinese gibberish. I remembered your talk and switched the encoding from UTF-8 to UTF-16 or vice versa, and there was a readable file again :)
@FlameRat_YehLon 1 year ago
Meanwhile in areas where people actually use Chinese, well, time to try all the encodings.
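The "Chinese gibberish" effect is easy to reproduce. A minimal Python sketch, assuming a file that holds plain ASCII text but is opened as little-endian UTF-16 (the word `eventlog` is just an arbitrary sample):

```python
ascii_bytes = "eventlog".encode("utf-8")   # 8 bytes of plain ASCII
misread = ascii_bytes.decode("utf-16-le")  # every 2 bytes become one code unit

print(len(misread))  # half as many "characters" as bytes
# Two lowercase ASCII letters always land in the CJK Unified Ideographs block:
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread)
print(all_cjk)

# Reversing the wrong decode recovers the original text.
recovered = misread.encode("utf-16-le").decode("utf-8")
print(recovered)
```

The bytes were never damaged, only misinterpreted, which is why switching the encoding in the editor brings the text back.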
@HasanSIM14 2 years ago
Watching this for the second time (I watched the video referenced several times in this talk). Absolutely brilliant and I learned a lot
@drullo 1 year ago
Absolutely one of the best presentations that I've seen and it was a total shock. I watched this because I'm a geek and I like Dylan Beattie. I never expected it to be this awesome!
@NicolasChanCSY 2 years ago
44:14 Glad that my comment in the previous talk video was found helpful :)
@JeremyAndersonBoise 2 years ago
The youtube comment near the beginning of this updated version of his previous presentation illustrates the point of the talk powerfully. Dylan is always amazing, but this talk from him is perhaps uniquely important to everyone in the field! From 1st year associates to the most seasoned senior architect, plain text is always less than plain.
@fabioluizalvaresosti7115 1 year ago
Plain text but the 'l' is silent
@jeberle1 1 year ago
Very good talk. Regarding ASCII and punchcards, it's unlikely they would ever meet in the first place. You do course correct a bit w/r/t the DEL character, but punch cards were originally in 6-bit BCDIC (binary-coded decimal interchange code). This was extended to 8-bits to become "Extended" BCDIC, or EBCDIC. The layout of the character set aligned w/ the rows of the punchcard, such that all alphabetic chars were x1 - x9, so in late variants 'A' is 0x11 and 'Z' is 0x39. To get 3 rows of 9 columns to line up, there's a "/" at the start of the last row, 0x31. Interestingly, ASCII was created by Bob Bemer at IBM to solve interop problems between the BCDICs. However, IBM was in so deep w/ their card-based (E)BCDIC, they couldn't use it in any of their operating systems. Note also, EBCDIC is still very much in use. Finally, Multics did not influence Unix, except to serve as a counter-example of design principles.
@edgeeffect 1 year ago
I've always wondered how come EBCDIC was "extended", thanks for that.
@braveatnight 2 years ago
Yay I love this guy, I binged all his talks like a month ago
@JeremyAndersonBoise 2 years ago
You have impeccable taste, bravo!
@f.d.3289 10 months ago
I have been a software developer for 20 years and it's only in the last 5 years that I began to realize the actual complexities of good old plain text. Once I realized how complex this issue actually is, I began to wonder why many of the systems I had worked on even WORKED. It's not something they talk about at university or anywhere, so it was nice to see this gets so many views. I haven't watched it yet but I'm sure it will open many people's eyes.
@ayle1312 1 year ago
30:00 ij is a dutch letter, not a typesetter's ligature! It's in the extra block at 19:50 left of Ö. Most fonts don't support it and ASCII led to it being written as 2 letters (i and j) because it was the only non-ascii letter in dutch, but all dutch typewriters before PCs were popularized had a dedicated key for it. Fonts that turn it into a ligature often run into problems with words like minijack, Beijing and bijoux. It used to have the same problem as å, with some people turning it into a Y (most famously Cruijff) until it got standardized as I+J.
@heinzk023 1 year ago
In the days of 7-bit ASCII, there were lots of workarounds in non-English-speaking countries. For example, in order to be able to print umlauts, printers had special character sets that had umlauts where normally the characters {, [, ], }, \ and | were, because nobody needed them when writing a letter. However, if a C or C++ programmer used such a printer, his code would look quite funny. In part that's the reason why some languages have special replacements for these characters, called digraphs and trigraphs. This all sounds like multiple layers of duct tape put on top of one another, but it kind of worked.
@MrIkariaman 1 year ago
Also, for future talks you may find the "Greeklish" system interesting: en.wikipedia.org/wiki/Greeklish Basically before Greek language was fully supported, Greek people interacting with electronics came up with mappings between ASCII and Greek. These mappings were unofficial and there are several variations. Even after UTF-8 was implemented and got more and more adoption, lots of young people still utilized Greeklish in SMSs to send messages to each other because you'd get charged by the number of bytes you used (in groups of bytes) and not by the actual number of characters used. This is also an issue in a lot of fields that have a byte limit instead of a character limit. On a parallel note... If you do a bit of time travel, and go to Greek villages in Anatolia during the time of the Ottoman Empire, you'll find the Greek alphabet being used to write Turkish text: en.wikipedia.org/wiki/Karamanli_Turkish
@deus_ex_machina_ 1 year ago
That sounds similar to what many Arabic speakers use, numbers in place of characters.
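The byte-versus-character point behind Greeklish can be made concrete. A small Python sketch, using UTF-8 and UTF-16 purely as illustrations (the sample greeting and its Latin transliteration are my own choice, and this says nothing about how any particular operator billed messages):

```python
import unicodedata

greek = unicodedata.normalize("NFC", "Καλημέρα")  # "good morning"
greeklish = "Kalimera"                             # one possible Latin spelling

sizes = {
    text: (len(text), len(text.encode("utf-8")), len(text.encode("utf-16-be")))
    for text in (greek, greeklish)
}
for text, (chars, utf8_bytes, utf16_bytes) in sizes.items():
    print(text, chars, utf8_bytes, utf16_bytes)
```

Both strings have the same number of characters, but the Greek one needs twice as many UTF-8 bytes, so a limit or a price expressed in bytes treats the two scripts very differently.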
@qm3ster 1 year ago
Nothing wrong with writing JavaScript in Ukrainian: 1. It runs fine. 2. In production build, the minifier will take it all out and replace it with single-character ascii names. 3. Source maps will work fine.
@filker0 1 year ago
I spent a fair part of my career designing and implementing serial terminals and emulators of the same. For terminals from DEC starting with the VT100 (and other "ANSI" terminals), there was something called "code extension", along with character set designators, graphic sets, and shifts (both locking and single) that were used to mix text from multiple character sets on one screen/page using either 7 or 8 bits per character. This was fine on terminals and printers that had the same character sets available, but caused a lot of grief when a device receiving the text didn't support all of the character sets used. Also, very few editors at the time could handle storing such text. It was a mess, but at least it was better than what it replaced, which was National Replacement Character Sets (NRCS), where it was 7-bit ASCII with the glyphs for some of the code points replaced. There was no way to tell which NRCS had been selected when the file was created, even with a hex editor.
@henrikholst7490 1 year ago
Fantastic content. This should be general knowledge for everyone who works in IT and development.
@vincentvega7908 1 year ago
The reason why you get smiley faces when DOS crashes is not because there is something trying to generate the stop character. The reason is that often it starts executing random garbage or tries to print a message that became random garbage due to memory corruption. In a piece of program data the values 1 and 2 would be quite common if you have some counters that did not fit into your registers, and maybe they encode some common x86 instruction as well. The string terminator in the common OS interface for printing strings was the dollar sign rather than nul on DOS operating system. The dollar sign is much less common than nul and smiley faces in random garbage so you will likely get some smiley faces printed. Note also that 'plain text' is just a binary format (or more precisely a family of binary formats with ASCII, EBCDIC, various code pages, JIS, BIG5, GB 18030, UCS-2, UTF-7, UTF-8, big endian and little endian UTF-16/UTF-32,...) for which there happens to be a lot of editors and viewers. In the end it's all binary bits. One specific property that 'plain text' has over many other binary formats is that it has very little structure and can still be of some use when some bits are flipped or bytes missing as opposed to, say a compressed JPEG image with the caveat that the multibyte encodings are much more fragile.
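The dollar-terminated printing described above can be simulated in Python. This is a toy model, not DOS itself: `print_dollar_string` is a hypothetical stand-in for the OS print-string service, and the glyph table covers only bytes 1 and 2 as they are rendered in code page 437.

```python
# How code page 437 displays bytes 0x01 and 0x02 on screen.
CP437_LOW_GLYPHS = {0x01: "\u263A", 0x02: "\u263B"}  # white and black smiley

def print_dollar_string(memory: bytes, start: int) -> str:
    """Collect characters from `start` until a '$' terminator is reached."""
    out = []
    for b in memory[start:]:
        if b == ord("$"):
            break
        out.append(CP437_LOW_GLYPHS.get(b, chr(b)))
    return "".join(out)

# Pretend a corrupted pointer lands three bytes before the real message.
memory = bytes([0x01, 0x02, 0x01]) + b"Hi$ignored"
garbage = print_dollar_string(memory, 0)
intended = print_dollar_string(memory, 3)
print(garbage)
print(intended)
```

With a wrong start address, small counter values in front of the message show up as smiley faces, and printing only stops when a `$` happens to appear.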
@rojokongen 1 month ago
Loved the talk. Well done, Dylan! 👌
@sauliustb 1 year ago
this is an amazing talk. i already knew some of this, but it still is nice to get a reminder on this stuff :)
@feisty-trog-12345 1 year ago
43:35 Generally a very solid talk, but the section about UTF-16 was kinda inaccurate. UTF-16 is not actually a fixed-length encoding and you cannot get the number of bytes just from the number of contained characters (e.g. Emoji need two UTF-16 code units forming a surrogate pair). The actual reason that so many of these 90s systems use UTF-16 is that this was the time of the fixed-size 16 bit UCS-2 encoding ("65k characters ought to be enough for everyone"), which was later expanded to become UTF-16 when they ran out of code points. Instead, the range of code points U+D800 to U+DFFF was permanently snapped out of existence, so that UTF-16 could use them to encode higher code points as multi-word sequences. This is also the reason why not every String in C#, Java, or JS is Unicode; these languages allow you to have unpaired surrogates which are not valid UTF-16 (they are not scalar values). See the "History" section of UTF-16 on Wikipedia. And this entire paragraph was even without going into that dreaded word "character". If you take character to mean code point, then doubling the number of characters to get the number of bytes is almost correct (so long as you don't care about anything outside the BMP, aka basically all instant messaging, social media, ...). But as we've seen one "character" can be made of many many code points and each of those code points can be multiple code units. And whether a sequence of code points is displayed as one "character" or several depends on the display technology you're ultimately using (wtf is an extended grapheme cluster?). In fact, the Unicode standard doesn't define what a character is. So, ultimately, there is no actual correspondence between the number of "characters" in a string and the number of UTF-16 code units, the concept of a character varies from use to use, and UTF-16 falls short of even the most charitable interpretation of "character = code point".
Additionally, the reason that UTF-8 stops at four bytes is actually because Unicode is a 21-bit scheme. Unicode has made guarantees that it will only ever go up to U+10FFFF and this, again, stems from the fact that they weren't able to squeeze more bits out of UCS-2. In summary, UTF-16 is a weird legacy encoding resulting from expanding UCS-2 to a set of code points it was never meant for. In doing so, UTF-16 has lost a key property of UCS-2 (being a fixed-length encoding for scalars), while only displaying the lack of this property for (until recently) uncommon inputs. It now has both the disadvantages of UTF-8 (variable length) and UTF-32 (wasted space, ASCII incompatibility) while introducing additional drawbacks (byte order confusion, false belief in being fixed-size). Unicode has had to insert multiple hacks just to keep this mess going. UTF-16 is Unicode's original sin. Every emoji broken by a Java developer using "char", every "Bush hid the facts" censored by IsTextUnicode, and every broken API call from mishandling wchar_t is a punishment from the tech gods themselves. In our hubris we believed that there were fewer than 2^16 characters, so now we must suffer forevermore.
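The surrogate-pair mechanism mentioned in this comment is short enough to work through by hand. A minimal Python sketch, where `to_surrogates` is a hypothetical helper name and the rocket emoji is just a convenient code point outside the BMP:

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1F680)
print(hex(high), hex(low))

rocket = "\U0001F680"
utf16_hex = rocket.encode("utf-16-be").hex()  # what Python's own encoder produces
utf8_len = len(rocket.encode("utf-8"))
print(utf16_hex, utf8_len)
```

Two 10-bit halves give 2^20 extra code points on top of the BMP, which is where the U+10FFFF ceiling comes from.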
@serpent77 2 years ago
Having recently delved into utf8, unicode, etc, I knew a lot of this, but learned a few new things as well, either way it was thoroughly interesting. Well done!
@JonathanPlasse 5 months ago
Thank you for this wonderful talk 🙏
@zuao76 8 months ago
Now this was incredibly funny, entertaining, intelligent and interesting. I was not expecting this. Incredibly done. We need more talks like this in IT, and not so serious and boring. Well done :)
@DerekCroxtonWestphalia 11 months ago
Good talk, I did a lot of research on this about 20 years ago but I always forget. BTW, the two dots in English are a diaresis, not umlaut.
@microcolonel 1 year ago
UTF-8 is rarely slower to process than UTF-16, and because UTF-16 only has the BMP in a single code unit, you can't rely on that for counting codepoints anyway; furthermore, rarely do you want to count codepoints, you generally want to count graphemes.
@tappy8741 1 year ago
UTF16 generally sucks and was the bane of my existence for many years, thanks for nothing windows as usual.
@Karreth 9 months ago
UTF-16 is actually just another hack to fix UCS-2, which is the fixed 16-bit Universal Coded Character Set. It was intended to contain all the codepoints until we discovered that 16 bits were actually too few bits to contain the set. It really is hacks and partial backwards compatibility all the way down. Windows extended their API to work with wide characters to support UCS-2 before UTF-16 or UTF-8 was a thing, and when UCS-2 died they were kinda screwed and couldn't update their design. So that's how we ended up here.
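The gap between bytes, code units, code points and what a reader sees as one symbol can be shown with a flag emoji. A short Python sketch, assuming the Danish flag as the example; note that Python's standard library has no grapheme-cluster segmentation, so the "one visible symbol" count is stated in a comment rather than computed:

```python
# The Danish flag is two regional-indicator code points: D (U+1F1E9) + K (U+1F1F0).
flag = "\U0001F1E9\U0001F1F0"

code_points = len(flag)                           # Python str counts code points
utf16_units = len(flag.encode("utf-16-le")) // 2  # each one needs a surrogate pair
utf8_bytes = len(flag.encode("utf-8"))            # four bytes per code point

# A renderer with flag support shows this as ONE symbol (one grapheme cluster).
print(code_points, utf16_units, utf8_bytes)
```

None of the three numbers matches what a user would call the length, which is why counting graphemes needs a dedicated segmentation library rather than a length call.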
@bujin1977 1 year ago
Late to the party, but I enjoyed that. So much so that I started watching at about 1am thinking I'd just catch the intro before I went to sleep to determine if it's something I want to keep watching, and ended up watching over half of it before finally deciding I was too tired. Also I learned something new that will solve an issue with one of my applications, so that was a bonus!
@CRBarchager 2 years ago
At first glance the headline of this video/presentation seems dull but it ended up being extremely interesting! - Very good video and very informative!
@fedormalyshkin 2 years ago
It's the funniest IT conference talk I've seen in years!
@Kitulous 4 months ago
that was a very interesting watch, thank you!
@jalexanderdatkins 11 months ago
28:36 Æ is totally a letter in English. It's called the letter æsc, which sounds like "ash", because it represents the tree ash. And for completeness I should also mention the letter œthel, which sounds like Ethel, the personal name. They appear in obviously english words like encyclopædia, manœuvre and Cat7 UTP Æthernet cable. … Not to mention archæologist. I may have cheated a little bit with one of mine, but why doesn't that count?
@theelmonk 10 months ago
Laughed at Cat7 UTP Æthernet cable. And realised it's perfectly correct.
@jalexanderdatkins 10 months ago
It’s obviously an English word, right? And everyone knows that’s a valid spelling for it. The cheaty one is manœuvre, because that’s a French word. But I don’t get why he doesn’t count archæologist? Maybe it’s in the same way as because Latin only has the letter K in one word, it’s not considered part of the Roman alphabet. And to be fair, Æsh and Œthel don’t come up very often. Œstrogen is another one, but that’s basically a Latin word. I don’t know any non-borrowed words containing œ that are still in modern English. Unlike æther.
@dgsagoskis1851 1 year ago
I love them YT commentators. World would be a much more imperfect place without them. Btw i thought i knew a lot about plaintext, but turns out i knew something about plaintext. Thank you!
@etmax1 1 year ago
Well that was another exceptional video from the master. I found that extremely enjoyable and informative. Unsurprisingly I didn't know a lot of the histrionics
@Rx7man 1 year ago
2:57 My favourite part of this is your youtube suggested videos are all ones I've watched!
@dmurvihill 1 year ago
I couldn't imagine working at an airline, where I know for sure that names will be scrutinized in every detail, and deciding "eh, I'll just strip diacritics off of everything." Having scanned passports before, there are very well-publicised and clear standards for how to transliterate any Unicode character into that strip at the bottom.
@theelmonk 10 months ago
You're probably not American or English, then, where diacritics are uncommon and used only by foreigners. Yes, if you think about it that's a bit parochial but that shows the difference between programmers working for commercial companies with a certain market and the people who write standards like the one that allowed all those different forms in an email address.
@colinmaharaj 10 months ago
Lovely talk, like going down memory lane. Spent a lot of time dealing with this. From writing xmodem and ymodem, to parsing csv files, converting bin to text, and back.
@AshtonSnapp 1 year ago
Rewatching this talk proved very useful today. Currently dealing with the lexer for my programming language project failing unit tests on the Windows runner for GitHub actions. Wanna guess why? I’ll give you a hint: newline tokens report their span to be exactly one character later than expected.
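One plausible reading of that hint is a CRLF line ending: if the test file reaches the Windows runner with `\r\n` instead of `\n`, every offset after the first newline shifts by one. A minimal Python sketch of that effect (the sample source text and the `normalize_newlines` helper are assumptions for illustration, not the commenter's actual lexer):

```python
unix_source = "let x\nlet y"
windows_source = "let x\r\nlet y"

unix_offset = unix_source.index("let y")
windows_offset = windows_source.index("let y")
print(unix_offset, windows_offset)  # the second statement starts one character later

def normalize_newlines(source: str) -> str:
    """Fold CRLF and lone CR into LF before lexing."""
    return source.replace("\r\n", "\n").replace("\r", "\n")

normalized_offset = normalize_newlines(windows_source).index("let y")
print(normalized_offset)
```

A lexer can either normalize like this up front or treat `\r\n` as a single newline token with a two-character span; which is right depends on whether reported spans must map back to the original bytes.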
@SiriusXification 2 years ago
You know, featuring the YouTube comments in the talk only emboldens us.
@user-oc3mi2ct6t 1 year ago
Small comment from a Dane. Aarhus is at the start of the alphabet when spelled with a double a, at least according to every convention I have seen in use here in Denmark. Even though aa and å represent the same letter, we still keep their alphabetical order distinct, which means that Aabenraa comes first in an alphabetically sorted list of city names in Denmark.
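The two conventions in play here, sorting "aa" as two letters a versus sorting it as å at the end of the alphabet, can be compared with a hand-rolled sort key. This Python sketch is an illustration only: `danish_key` and its `aa_as_aring` flag are hypothetical, the alphabet string ignores many real collation details, and a production system would rely on a proper locale-aware collation library instead.

```python
import unicodedata

ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # æ, ø, å come after z

def danish_key(name: str, aa_as_aring: bool = False) -> list[int]:
    """Sort key using the alphabet above; optionally treat 'aa' as 'å'."""
    text = unicodedata.normalize("NFC", name).lower()
    if aa_as_aring:
        text = text.replace("aa", "å")
    # Anything outside the alphabet sorts after every known letter.
    return [ALPHABET.index(ch) if ch in ALPHABET else len(ALPHABET) for ch in text]

cities = ["Aarhus", "Odense", "Ålborg", "Aabenraa", "Ribe"]
as_written = sorted(cities, key=danish_key)
as_aring = sorted(cities, key=lambda n: danish_key(n, aa_as_aring=True))
print(as_written)
print(as_aring)
```

The same five names end up in two different orders, so "sorted alphabetically" is only well defined once the collation rule has been chosen.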
@deus_ex_machina_ 1 year ago
This popped up at the right time; while messing around with Notepad++ I looked up the purpose of carriage return, line feed, and tricks like *bolding,* underlining, and -strikethrough- with typewriters and teletext. I've since come across resources like Typography for Lawyers that, apart from being an excellent reference for general formatting, advocate the end of shortcuts picked up from typewriters and a return to form for good typefaces and typesetting.
@f.d.3289 10 months ago
Great lecture -- super fun and informative, thanks! And now I'd love to see a follow-up that touches upon those lovely grey areas of A) finding out the encoding of a given "plain" text file, and B) UTF-16 surrogate characters. Especially the latter is quite important, because I'd guess that 95% of all applications using UTF-16 are broken, in the sense of not being able to deal with any text that contains Unicode codepoints which cannot be encoded in the 16-bit units of UTF-16.
@sportundwein 2 years ago
Amazing content - mega cool Präsentation 🈶
@JeremyAndersonBoise 2 years ago
I see what you did there.
@edgeeffect 1 year ago
@JeremyAndersonBoise I was going to comment "I see what you did there".... but then I saw what YOU did THERE.... so couldn't.
@jkollin4875683F 2 years ago
On alphabetical ordering in Finnish... back when I was in school in the 1990s, I was taught that V and W actually are considered equal in Finnish. So going through a list of Finnish surnames, Valli, Waris, Virtanen, Wirtanen (tiebreaker here, I suppose) would be in correct order. But having googled this a bit more, this is apparently nowadays (since 2000) somehow dependent on context -- mixed with foreign words and names such as Vanderbilt and Wolf, it's OK to sort them all V first, then W. So I don't know if even printed dictionaries use this sorting today. I don't think this peculiarity is even well-known, IIRC this surprised many of my Finnish coworkers.
@cameron7374 1 year ago
So, do computers ever deal with this or do they just sort V first, then W?
@jkollin4875683F 1 year ago
@cameron7374 Never noticed a system that would (probably in part because W is in Finnish only in names (outside of possibly loanwords), and even there it is very rare). But after a quick googling, apparently at least in 2006 PostgreSQL allowed for this at least in Swedish.
@pepijnkrijnsen4 1 year ago
36:09 I see this a lot in the large German company I work for, specifically this example of having to select a country from a dropdown list. The countries' English names are displayed, but ordered as if they're German names.
@GuildOfCalamity 1 year ago
Great presentation! I code systems that use control codes all the time for work; they are still widely used and accepted (receipt printers, barcode scanners, serial comms, etc).
@heinzk023 1 year ago
When I was working with ASCII terminals, I liked to use BEL to sound the squeaky buzzer of the terminal.
@SerrinTheElf 1 year ago
That postal worker deserved a raise lol.
@nikneumann1752 9 months ago
I thought it was boring, but surprise! I watched it to the end. 😁
@BenjaminAster 1 year ago
Mistake in 50:23: the rocket emoji is U+1F680, not U+1F680D
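This correction can be checked against Python's bundled Unicode database. A small sketch; the variable names are only for illustration:

```python
import unicodedata

rocket_name = unicodedata.name(chr(0x1F680))
print(rocket_name)

# Unicode stops at U+10FFFF, so a six-digit value like 0x1F680D cannot be a code point.
too_large = 0x1F680D > 0x10FFFF
try:
    chr(0x1F680D)
    rejected = False
except ValueError:
    rejected = True
print(too_large, rejected)
```

The extra trailing digit pushes the value past the end of the Unicode code space, so it is not merely the wrong character but not a character at all.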
@BradenBest 1 year ago
I'm famous. I vaguely remember the train of thought I had with that WWIII joke. That you posted a meme on twitter that was so funny that it prevented WWIII, and with you erased from existence by time travel shenanigans, that meme never gets posted and thus WWIII happens. I know I can get long winded especially when I talk about technical stuff, which is probably why I put that joke in there at the end. It's like a reward for sitting down and reading all that stuff about base64 and how vim fucks up binary encoding. Also, how dare you say the End Of Transmission character, Ctrl-D, is unimportant. How else would I log out of my Linux terminal in one keystroke?
@TooLazyToFail 10 months ago
This was a really fun talk, and very well-delivered.
@maximvoloshin7602 2 years ago
You should never underestimate things labeled “simple” or “plane” )) Thanks, Dylan! Appreciate so much everything you’re doing for the community.
@NeatNit 1 year ago
I have never underestimated a plane. Be it a machine that can carry me to the sky, or an infinite flat set of points in 3D space, or a tool used to smooth wooden surfaces, they are always quite intimidating.
@maximvoloshin7602 11 months ago
@NeatNit 🤣🤣 You got the point!
@nneddenn6207 11 months ago
Dylan, thanks for the talk! It was really interesting to hear all these historical details and understand more about how Unicode works. And my gratitude for your support of Ukraine! Glory to Ukraine!
@emmafountain2059 11 months ago
God I have homework but now I have an irresistible urge to research Unicode cause this was fascinating. It's amazing how clever some of their solutions are
@rustkitty 9 months ago
53:42 According to Apple, Dylan was in Denmark. According to Microsoft, he supports Donkey Kong. Both very respectable!
@akirachisaka9997 1 year ago
I really wish Dylan would talk about Han Unification. Like, it's just such a cursed aspect of Unicode. I really wish more people knew about it.
@gbeziuk 10 months ago
I guess there's not much hope for doing a cameo in the next version of the presentation, but I'll try anyway. Using Cyrillic, or any other local writing system in JavaScript is probably a bad idea in any production code, for sure, and it's universally frowned upon for a reason. Universality, you know - if you write science in Medieval Europe, use Latin, don't be a dick. But, there's a "but"! Teaching programming to newbies with no STEM background whatsoever, who also don't happen to be fluent in English (you can imagine), I suddenly found allowing them to use the words of their native language as names in their source code very, very useful. Separation of concerns and cognitive load reduction, I guess. As a bonus, there's a clear distinction between library entities and the locally introduced ones, which is also a good thing for the newbies. In fact, the role of English in international software development is a huge topic with a ton of practical consequences. Some Chinese have already stopped giving shit on this "you must write everything in English" thing, and it's not gonna stop there. I LOVE FiraCode, BTW!
@pyropunk51 10 months ago
Good talk. I was a bit disappointed that you did not even touch on the whole EBCDIC vs ASCII situation.
@imranhussain8700 2 years ago
This guy is a true gem 💎.
@NonTwinBrothers 9 months ago
I forgot about the ending. I've always known this as the Kohuept talk :D
@jensGC 1 year ago
kzbin.info/www/bejne/nZWYpn1tg9GprNE The Danish letters "æ" and "ø" are much older than the spelling reform in 1948. The only new letter that was introduced in that reform was "å". It is correct that the reform did make Danish orthography more distinct from German - but the main reason for this is that the reform removed the capitalization of nouns.
@Carewolf 1 year ago
Emoji existed in the West long before iPhones did. They came to us with instant messaging platforms: ICQ, MSN Messenger, even Facebook.
@junestorm 11 months ago
Brilliant lecture!! They didn't teach this in the 1980s when I studied computer science. ☝🙃
@daniilboiko 11 months ago
The best one I watched last year! Special thanks for supporting Ukraine! Pike matchbox!!!
@KangoV 11 months ago
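Java has been moving away from UTF-16 too: since Java 9, strings containing only Latin-1 characters are stored one byte per character, and since Java 18 UTF-8 is the default charset, although the String API still works in UTF-16 code units. An hour on plain text? I would not have believed it until I watched it. Just awesome.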
@hfranke07 1 year ago
Awesome job..... blown away
@MeriaDuck 11 months ago
That Russian postal service anecdote is just so wholesome.
@stevecarter8810 1 year ago
Omg that was god level summarising at the end
@Fetrovsky 1 year ago
I remember running echo ^G in DOS as a teen.
@acobster 11 months ago
I've read the SO post, but I never knew there was a name for Zalgo text! Fantastic talk.
@Proppeti 3 months ago
Amazing, informative and pretty entertaining!! 😮😅
@bommel88 1 year ago
As somebody from Aachen, I appreciate the choice of examples :D
@wagyourtai1 1 year ago
I love watching different versions of the same talk... :)
@theelmonk 10 months ago
Is there another version where it carries on past the intriguing statement "and this is where the version for YouTube ends"?
@Jayderzomb 10 months ago
this was beautifully interesting, thanks!
@bluenuttefly8813 1 year ago
They sang Odoia at the Billy Joel concert, which is a Georgian folk song!!! It is listed as Odoya at the beginning of the album shown... What the heck. I did not know about this. Cool!
@richardtwyning 10 months ago
Brilliant 👍
@secondengineer9814 1 year ago
It was interesting to see the origins of Dwarf Fortress UI!
@CRBarchager 2 years ago
11:42 Did anyone else have to try this while watching the video? It works!
@theburner4522 1 year ago
I wanted to try it out, but which key does he mean by "echo"?
@SebastianSchleussner 11 months ago
@theburner4522 Not a key. You open a shell (e.g. "cmd.exe") and literally type "echo" followed by a space, then press Ctrl and G together, then Enter.
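The same bell can be rung from a script instead of the keyboard. This is only a minimal sketch: the constant `BEL` and the helper `ring_bell` are names made up for illustration, and whether anything is audible depends on the terminal's bell settings. It relies on the fact that Ctrl-G yields ASCII code 7 because the Ctrl key keeps only the low five bits of the letter's code.

```ruby
# "G" is 0x47; masking with 0x1F keeps the low five bits, leaving 0x07 (BEL).
# BEL and ring_bell are illustrative names, not part of any library.
BEL = ("G".ord & 0x1F).chr

# Write the bell character to the given stream (the terminal by default).
def ring_bell(io = $stdout)
  io.print(BEL)
  io.flush
end

ring_bell
```

The same masking explains the other control characters mentioned in the talk, for example Ctrl-D giving code 4 (end of transmission).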
@pengain4 1 year ago
Brilliant speaker and very exciting talk. ❤ Thank you!
@DavidLindes 4 months ago
Dunno if Dylan will see this, and I'm guessing it's already been corrected, but just in case not... at 51:35, what's said is mostly correct, and what comes up on the screen has a few to many U+E0065's to go along with what's said, and is missing a U+E007F that seems to be necessary. So, this works, for me: ruby -e 'puts "\u{1f3f4}\u{e0067}\u{e0062}\u{e0065}\u{e006e}\u{e0067}\u{e007f}"' ... though, if I paste that result here, it doesn't seem to?? 🏴󠁧󠁢󠁥󠁮󠁧󠁿 - I think all the characters are there, it just didn't render as combined. ?!? (Maybe it will after I post this? We'll see! Edit: nope 😭. It did require me to change double-dash to actual emdash, to avoid s̶t̶r̶i̶k̶e̶t̶h̶r̶o̶u̶g̶h̶ 😂.)
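The code points in that one-liner follow a pattern that can be generated rather than typed by hand. A minimal sketch, assuming the flag is built as the comment describes (a black flag, one tag character per letter of the region code, then a cancel tag); the method name `subdivision_flag` and the constants are hypothetical names chosen for this example.

```ruby
# Illustrative constants; the values are the code points quoted in the comment above.
BLACK_FLAG = 0x1F3F4   # base emoji that the tags modify
TAG_BASE   = 0xE0000   # tag characters sit at TAG_BASE + the ASCII code
CANCEL_TAG = 0xE007F   # terminates the tag sequence

# Build the tag sequence for a lowercase region code such as "gbeng".
def subdivision_flag(code)
  tags = code.downcase.bytes.map { |b| TAG_BASE + b }
  ([BLACK_FLAG] + tags + [CANCEL_TAG]).pack("U*")
end

england = subdivision_flag("gbeng")
puts england.unpack("U*").map { |cp| format("U+%04X", cp) }.join(" ")
# prints: U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F
```

Whether the result shows up as a single flag is up to the font and platform, which matches the commenter's experience of the characters being present but not rendered as one glyph.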
@pawelhepnar1608 1 year ago
Absolutely brilliant great speech
@Carewolf 1 year ago
Only one letter was added to the Danish alphabet in 1948. We already had æ and ø. Only å is a Swedish letter
@Carewolf 1 year ago
Ironically it was Sweden using the German typewriters and alphabet. Hence they forgot the old Scandinavian letters æ and ø and replaced them with the German ä and ö.
@SebastianSchleussner 11 months ago
@Carewolf Typewriters? Ä and Ö, also in Sweden, go back centuries before typewriters. Just different routes taken. To illustrate with "AE": make a ligature of it, or put the e above and simplify? Anti-Danish sentiments of certain kings, who went out of their way to make Swedish sound/look different from rival Danish, may have contributed to the development.
@yugoprowers 1 year ago
Pike Matchbox is going to be one of those things, like when someone said Parachuting Buffaloes for lead on the periodic table: I'll never forget it because it is such a weird thing.
@Mokkatomic 8 months ago
"your recording sounds great! What mic do you use?" "Rødgrød med fløde"
@chernyshovandrew 2 years ago
Great talk! Thank you.
@warwickleahyssw4163 1 year ago
Awesome video Calum
@Wyrd1975 9 months ago
BTE (Best Talk Ever) ! 👏
@sharkie115 10 months ago
11:37 End of transmission (Ctrl-D) also still exists. It's the way to exit a Linux console session.
@fieryscorpion 2 years ago
Wow! That was a pretty interesting and fun talk!
@lazykbys 11 months ago
Just to add a bit more pedantry, ASCII is not in alphabetical order, since lowercase a comes after uppercase Z. I didn't realize this until I started typing a post to complain about how Windows 10 (unlike Windows 7) sorts Japanese hiragana and katakana, then noticed something similar happened with the English alphabet. Odd how things don't seem strange when you're used to them. :)
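The effect is easy to see by sorting a mixed-case list. A small sketch with a made-up word list: in ASCII every uppercase letter (65 to 90) has a lower code than every lowercase letter (97 to 122), so a plain byte-wise sort puts "Zebra" ahead of "apple", while sorting on a case-folded key gives the order a person would expect.

```ruby
# Hypothetical sample data for illustration.
words = %w[banana Apple cherry Zebra apple]

# Default String comparison is byte-wise, so all capitals come first.
byte_order = words.sort
# => ["Apple", "Zebra", "apple", "banana", "cherry"]

# Compare on a lowercased key, using the original word as a tie-breaker.
folded_order = words.sort_by { |w| [w.downcase, w] }
# => ["Apple", "apple", "banana", "cherry", "Zebra"]

puts "Z is #{'Z'.ord}, a is #{'a'.ord}"
# prints: Z is 90, a is 97
```

Case folding only handles the English alphabet here; language-aware ordering, like the Danish and Japanese cases raised in the talk and in this thread, needs proper collation rules.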
@kevinfleischer2049 1 year ago
Great talk. I was wondering, what would hide behind that title, and I was not disappointed.
@thevikas5743 1 year ago
I was ready to waste my time on boring plain text. But somehow that Moscow postman made me go WOW!!!
@edgeeffect 2 years ago
The Wheatstone bridge was invented by Samuel Hunter Christie and improved and popularised by Sir Charles Wheatstone.
@tolgagorgun7816 1 year ago
54:10 I literally burst into laughter at Windows' statement "🏳‍🌈🏴‍☠🏁 Gay pirates are winning!" Hilarious, mate. Amazing :D
@jmkok 1 year ago
A fantastic talk about letters. However, you use a font with an incorrect letter "g" in "Mange tak!" (58:42). Is this an accident or an Easter egg?
@awelotta 11 months ago
Good eye! The g should be single-story, or double-story with the bottom "reversed". Interesting. Maybe it's supposed to be a single-story g with a very loopy tail, especially since the a's are single-story and it's slanted, i.e. it's cursive?
@chfr 1 year ago
I like how the part about UTF-8 turns into Tom Scott's emoji keyboard video (no sarcasm, I really do).