From NUL to DEL: Why 7 Bit ASCII Is Actually Really Clever

  Views: 27,179

Dylan Beattie

Comments: 383
@ke9tv 3 months ago
Once upon a time, I worked in the office next door to Bob Bemer, the editor of the first ASCII standard. Which, by the way, also specified EBCDIC. IBM was the only manufacturer that embraced EBCDIC rather than ASCII because EBCDIC was more punched-card friendly, and IBM virtually owned the market for 80-column card equipment. Single newline came from the B programming language. Multics used . X-ON and X-OFF are misidentified in your table. They're DC1 and DC3 respectively. ETB was the standard 'file mark' that separated multiple files on a magnetic tape. EM 'end medium' was a mark that meant, 'this file spans multiple reels, time to switch to the next reel.' NAK - negative acknowledgment - is the ^U that you use to cancel the stuff you're typing at the command line.
@allangibson8494 3 months ago
IBM literally invented the punched data card via the Hollerith Company in 1889… Punched cards controlling machines, however, date from 1798.
@lbgstzockt8493 3 months ago
@@allangibson8494 It's crazy that modern IDEs still have markings at 80 characters. That technology was so far ahead of its time.
@allangibson8494 3 months ago
@@lbgstzockt8493 80 Characters was what was determined by the U.S. census as being adequate to store the population information as a line item…
@thezipcreator 3 months ago
I didn't even know ^U existed, I've always just done ^C (and shells are smart so they know to catch that and not just terminate)
@TheEvertw 3 months ago
@@thezipcreator Shells terminate with ^D, not ^C. ^D is End of Transmission (i.e. connection). ^C is passed on from the shell to the program that is running at the time.
@jasonclark1149 3 months ago
My browser decided to buffer at the perfect moment, in the Morse Code section. "The code for A is , but if you leave a gap, ..." and then it just started spinning. It was a VERY long gap 😂
@fritzp9916 3 months ago
Great video. Though I think what deserves to be mentioned is backspace. On paper terminals, you can't delete a character you've already written, so all that backspace did was go back one space, allowing you to print over the previous character. This was useful for making text bold - as you mentioned when discussing carriage return - but also for creating combined characters. Want to type "café"? Just type "cafe", hit backspace, and type an apostrophe. The fonts used for paper terminals were carefully designed to make this look good. Likewise, o with " on top was a "good enough" approximation of ö. Some ASCII characters were included specifically for this reason: the tilde, the grave accent (backtick), the caret. But most importantly, the underscore. The only reason why it was included was to underline words to highlight them.
@billwall267 3 months ago
Important context!
@gcewing 3 months ago
There was a phase of my life when I was using a 5-bit teleprinter as an I/O device for my homebrew 8-bit system. It unfortunately didn't have any backspace ability, which was very annoying when I wanted to print zeroes with slashes through them. I ended up doing a CR and then going over the whole line again to fill in the slashes.
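The overstrike tricks in this thread boil down to the byte sequences a paper terminal would receive. Here is a minimal Python sketch of them; the helper names `overstrike` and `underline` are invented for illustration, and a video terminal will normally show only the last character written to each cell rather than the combination.

```python
BS = "\x08"  # backspace: move the print head one column to the left
CR = "\r"    # carriage return: move the print head back to column 0


def overstrike(base: str, mark: str) -> str:
    """Print `base`, step back one column, then print `mark` on top of it."""
    return base + BS + mark


def underline(text: str) -> str:
    """Print the text, return the carriage without a line feed,
    then print underscores over the same columns."""
    return text + CR + "_" * len(text)


cafe = "caf" + overstrike("e", "'")   # an e with an apostrophe struck over it
slashed_zero = overstrike("0", "/")   # the slashed zero, if backspace is available

print(repr(cafe))
print(repr(underline("ASCII")))
```

Without a working backspace, the CR approach from the comment above is the fallback: reprint the whole line with spaces everywhere except the columns that need a second strike.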
@Huntracony 3 months ago
I always wondered how terminal progress bars and such worked! This also explains why these often kinda break when there's an error or warning during the progress bar. Thanks, this was entertaining _and_ useful.
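For anyone who wants to try it, the carriage-return trick behind those progress bars can be sketched in a few lines of Python. This is only an illustration: `render_bar` and `run` are made-up names, and real progress-bar libraries do considerably more (terminal width detection, rate estimates and so on).

```python
import sys
import time


def render_bar(done: int, total: int, width: int = 30) -> str:
    """Build one frame of the bar. The leading CR moves the cursor back to
    column 0 so this frame is drawn over the previous one."""
    filled = width * done // total
    return "\r[" + "#" * filled + "." * (width - filled) + f"] {done}/{total}"


def run(total: int = 20, delay: float = 0.05) -> None:
    for done in range(total + 1):
        sys.stdout.write(render_bar(done, total))  # no "\n": stay on this line
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")  # only now move on to the next line


if __name__ == "__main__":
    run()
```

Because the cursor is parked on the bar's line between frames, any warning printed by something else lands in the middle of that line, which is why the display tends to break when an error shows up mid-run.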
@exciting-burp 3 months ago
On modern systems, since roughly the '80s (support was added in Windows 10, which previously used a *very* different method), it's all done using VT100 and its successors. Here you'll find ways to encode things like "move to row X column Y", and "set color to red" (there are hundreds of commands). The trade-off is that commands are no longer single bytes. This was used for the first digital displays, especially for dumb terminals.
@declanmoore 3 months ago
Windows has even added support for these newer control codes to their console host. Before that, and what still works, is to send commands to the console host driver (condrv.sys) via device IO controls.
@backpackvacuum9520 3 months ago
As an American I will proudly ignore all further episodes as I now have everything I need. /s 😂
@_f355 3 months ago
so you don't need that emoji over there, right? :)
@nickwallette6201 3 months ago
It's kind of funny, but that decision became a self-fulfilled prophecy. Because it wasn't a given that you would have consistently-mapped upper ASCII characters to represent even the most common international letters, it got to be fairly commonplace to see letters with accent marks dropped back to their un-accented variants. Granted, I'm a native English speaker, and so 26 letters ought to be enough for anybody. ;-) But, it didn't seem to have much of an effect on the intelligibility of words that used those letters. I recall seeing discussions on this where, specifically, Spanish-language and German speakers shrugged it off as, "eh... we knew what it meant." And, again, as a native English speaker, I have rarely considered the word "jalapeno" spelled with anything other than a plain 'n', and yet I recognize it easily enough in either form. On a related note, I got a crash-course in the peculiarities of various languages when I started writing a driver for FAT filesystems. Plain FAT (as in, pre-LFN) is case-insensitive, and meant to only consider letters in the low-ASCII range A to Z. All lowercase alpha chars are converted to uppercase with that toggling of bit 5. But, when LFN support was added, well... now we're dealing with Unicode characters in UTF-16 form, and ... _technically_ ... we should be case-folding everything (I think?) to uppercase to store the 8-dot-3 compatibility entry, and when searching for or comparing filenames in either 8.3 or LFN form. I say "I think" because the official FAT LFN spec is a bit quiet on what to do about chars with ordinals above 127, probably for the most obvious reason: It's kind of a pain to handle those. You have languages that case-fold differently depending on context, and when you're converting from Unicode to local code-pages, that character might not even exist. 
While (IIRC) the common US English DOS code page has upper- and lower-case variants of all the accented characters >127, not all of the code pages do, making it impossible to represent properly uppercased versions of any given filename entered with lowercase chars. And, of course, if you change code pages (by either changing the local code page, or moving a file to a system with a different code page), the filename might "change" completely, resulting in lowercase chars in the filename, potentially making it inaccessible by normal means, or causing match collisions with files that get created with uppercasing applied. I think some (or maybe most?) implementations just continue case-folding lower-ASCII chars, and letting everything else slide. All of this because the original implementations were designed in a relatively simple (cultural) language with straightforward rules, and when -- or if -- the thought occurred to anyone about how to handle other languages, they just shrugged and thought, "eh... that's a problem for future developers."
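The "fold the low-ASCII letters and let everything else slide" approach can be sketched as follows. This is a hypothetical helper, not anything taken from the FAT specification or a real driver: it clears bit 5 (0x20) only for bytes in the a-z range and passes every other byte through untouched.

```python
def fat_upper(name: bytes) -> bytes:
    """Uppercase only ASCII a-z by clearing bit 5; leave all other bytes alone."""
    return bytes(b & 0xDF if 0x61 <= b <= 0x7A else b for b in name)


print(fat_upper(b"readme.txt"))
# Byte 0x82 is e-acute in code page 437; it is above 127, so it is not folded.
print(fat_upper(b"caf\x82.txt"))
```

The range check matters: clearing bit 5 blindly would also turn `{` into `[` and mangle bytes above 127, which is exactly the code-page trouble described above.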
@VoyVivika 3 months ago
Clicked on this video only to discover it's by the guy who made the Rockstar programming language, lmao wasn't expecting that. Loved the video btw!
@timseguine2 3 months ago
It took me longer than I care to admit to figure out that ASCII is also BCD with extra "tag" nibbles in between. You can read numbers off easily in a hex dump just by ignoring the extra 3s everywhere. Well and if you get used to it, you can also read letters pretty easily from the hex dump but that feels more like using one of those old cereal box decoder rings.
@ShenLong991 3 months ago
The thing with the numbers is even prettier if you look closely at the bits and keep hexadecimal in mind. 0x30 through 0x39 are the digits '0' to '9'. So if you are doing embedded programming and plan your decimals accordingly, you can see what each one is without having to dissect the bits.
@JoQeZzZ 3 months ago
This is cool, although it's an inevitable side effect of the "& 0b1111" thing. In order to get a string to int using only an AND, the digits would have to be LSB-aligned, and because there are 10 digits they need 4 bits.
@timseguine2 3 months ago
This is a side effect of it being backwards compatible with BCD. If you wanted to you could actually do arithmetic directly in string form because of that.
@gcewing 3 months ago
And I still have burned into my brain that there are 7 character codes between '9' and 'A', from all the hexadecimal to binary conversion routines I wrote in machine language. I've sometimes fancied that if I were to design an improved character encoding I would make the first 36 codes be all the digits followed by all the uppercase letters. It just makes sense. To a programmer, at least.
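Both observations in this thread, the low nibble of a digit being its value and the gap of 7 codes between '9' and 'A', fit in a short Python sketch. The function names are invented for the example, and neither function validates its input; they assume a single valid digit character.

```python
def digit_value(ch: str) -> int:
    """'0'..'9' are 0x30..0x39, so masking off the high nibble leaves the value."""
    return ord(ch) & 0x0F


def hex_value(ch: str) -> int:
    """Convert one hex digit the way the old machine-language routines did."""
    code = ord(ch.upper())
    if code > ord("9"):
        code -= 7  # skip the 7 codes (0x3A..0x40) that sit between '9' and 'A'
    return code - ord("0")


print([digit_value(c) for c in "1975"])
print([hex_value(c) for c in "7fA0"])
```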
@peterlinddk 3 months ago
A lot of the "skipped" codes, like ACK, NAK, and SYN, were used in a lot of early communication protocols, like XMODEM and the like. And for some reason I don't understand, DC1 and DC3 were used for XON and XOFF that I think we all remember from the old modem days. I don't know why SO and SI are called X-On and X-Off in some ASCII tables ... maybe some other protocols used those? Ah, the days of RS-232 ASCII-based protocols!
@timseguine2 3 months ago
STX (start text) and ETX (end text) are used sometimes for framing purposes.
@ke9tv 3 months ago
DC1 and DC3 turned on and off the paper tape reader on Teletypes. DC2 and DC4 turned on and off the paper tape punch. When you were sending a paper tape down the line, if you were threatening to overrun a buffer, the other end would send DC3 to say, 'hold on there, tiger', and DC1 again when it was ready to slurp down more. ^S and ^Q still work that way on most Unix terminal emulators.
@darrennew8211 3 months ago
Shift in and shift out controlled what you might think of as the typeface.
@edgeeffect 3 months ago
In your chart, you've got X-On and X-Off as 14, 15 (SO, SI, "Control-N" / "Control-O") but, in any system I remember, XON and XOFF are DC1/DC3, 17/19, "Control-Q" / "Control-S" ... and that takes me back to writing printer handshaking diagnostics for the repair centre at work and saying, "Oh that's why some of the old 8-bit machines had control-S to pause scrolling". My old manual typewriter didn't even have an exclamation mark... because you could make one out of single-quote, backspace, full stop.
@fburton8 3 months ago
Control-O was commonly used to throw away the rest of the current terminal output.
@karlfimm 3 months ago
I still remember hitting my first EBCDIC files (about 1985) and being amazed that the A-Z characters were scattered around at what looks like random.
@timseguine2 3 months ago
If I remember correctly, EBCDIC was designed to be backward compatible with IBM's punchcard systems, which were still relevant at the time. I think there were considerations for efficient electromechanical sorting and also for not producing too many consecutive holes in the card which could clog the reader or the hole punch machine. Back when it was invented, IBM was almost superstitious about punchcards because they were a huge reason for their financial success, and continuing financial success. In hindsight they don't seem so important of course.
@peterholzer4481 3 months ago
@@timseguine2 Right. The punchcards didn't use a binary encoding of the digits 0-9. Instead they had 1 row for each digit. So it made sense to use only the digits 0-9 in the lower nibble for the letters, too. There is a picture of a punchcard in the Wikipedia article about EBCDIC. It looks quite neat and not random at all.
@darrennew8211 3 months ago
They're lined up properly if you ignore the right holes on the punch card, rather than ignoring the right bits in a byte.
@pjl22222 3 months ago
EBCDIC was just a newer, fancier version of BCDIC, binary coded decimal interchange code, which itself was more a group of similar but different encodings. BCDIC was a 6 bit encoding where the numbers 0-9 were encoded as the values 0-9 and everything else was distributed basically randomly. The letters (uppercase only) were divided into three groups which were backwards, S-Z was encoded with smaller numbers than J-R which were smaller than A-I. EBCDIC is an 8-bit encoding (although many code points were left undefined) which didn't fix the noncontiguous problem but it did fix the order of the letter groups.
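The non-contiguous letter blocks are easy to see with Python's built-in `cp037` codec, taken here as one representative EBCDIC code page (an assumption; other EBCDIC variants differ in punctuation but share this letter layout). The helper `contiguous_runs` is a made-up name that splits the alphabet wherever consecutive letters do not have consecutive codes.

```python
import string


def contiguous_runs(letters: str, encoding: str) -> list[tuple[str, str, int, int]]:
    """Return (first_letter, last_letter, first_code, last_code) for each run
    of letters whose encoded values are consecutive."""
    codes = list(letters.encode(encoding))
    runs = []
    start = 0
    for i in range(1, len(codes) + 1):
        if i == len(codes) or codes[i] != codes[i - 1] + 1:
            runs.append((letters[start], letters[i - 1], codes[start], codes[i - 1]))
            start = i
    return runs


for first, last, lo, hi in contiguous_runs(string.ascii_uppercase, "cp037"):
    print(f"{first}-{last}: 0x{lo:02X}-0x{hi:02X}")
```

In each of the three runs the low nibble only takes values from 1 to 9, which matches the punched-card explanation given earlier in the thread: one card row per digit, plus a zone punch selecting the group.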
@jasmijnwellner6226 3 months ago
ASCII 27 (ESC, generally written in source code as \x1b, \033 or \e) is still used a lot for terminal applications for more complex than can do, including changing the colour of the text or background!
@ke9tv 3 months ago
There was a whole ANSI standard that came later for what the various escape codes were supposed to do. (Nobody implemented the whole thing, and no two vendors implemented the same parts.)
@jovetj 3 months ago
Don't forget *^[* 😉 ESC is a pretty important character. Not as important as 0x0A or 0x0D, though.
@BradHouser 3 months ago
VT100 and later ANSI escape sequences made BBS pages colorful and graphical (boxes, symbols, etc.). DEC added ReGIS graphics to the escape sequences, and graphic primitives could be drawn on the screen, enabling interactive graphics terminals, all using 7-bit ASCII.
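As a concrete illustration of the escape sequences discussed in this thread, here is a small Python sketch that builds two of the most common ones: cursor positioning and colour via "select graphic rendition". The helper names `move_to` and `colored` are invented; the sequences themselves (`ESC [ row ; col H` and `ESC [ n m`) are the widely supported ANSI/VT100 forms, though whether a given terminal honours them is not guaranteed.

```python
ESC = "\x1b"     # ASCII 27, also written \033 or ^[
CSI = ESC + "["  # Control Sequence Introducer: starts most multi-byte commands


def move_to(row: int, col: int) -> str:
    """Cursor position: rows and columns are 1-based."""
    return f"{CSI}{row};{col}H"


def colored(text: str, sgr: int) -> str:
    """Wrap text in an SGR code (31 is red foreground) and reset with 0 afterwards."""
    return f"{CSI}{sgr}m{text}{CSI}0m"


print(colored("red text", 31))
print(repr(move_to(5, 10)))
```

Every byte in these sequences is plain 7-bit ASCII, which is the trade-off mentioned above: the commands are no longer single bytes, but they travel over any channel that can carry ASCII.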
@Vennotius 3 months ago
I enjoyed this one very much. I still remember discovering an ASCII table in one of my father's handbooks when I was a kid. This video took me back.
@ShadowKestrel 3 months ago
^D isn't dead and gone. in most CLI/TUI contexts it's a semi-standard way to close out into the parent shell, and still works well in cases where ^C is taken (e.g. in the python shell, where it will raise KeyboardInterrupt).
@RoamingAdhocrat 3 months ago
Incidentally, if you're using the Python shell more than very, very occasionally... install ptpython, an infinitely nicer Python shell.
@pidgeonpidgeon 3 months ago
Except on Windows, where it's usually Ctrl-Z.
@darrennew8211 3 months ago
What ^D does is it sends any buffered data, including whatever you've already typed, but not the ^D. If you type it with nothing buffered, then it sends zero bytes. And Unix treats a read of zero length as an end-of-file. ^Z is the character that CP/M decided to put inline to mark the end of a text file, because all files in CP/M were a multiple of 128 characters long. You never saw a file that was like 74 bytes long, so if you had a 74-byte text string in a file, you tacked ^Z on as byte 75.
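The CP/M convention described here can be sketched in Python. This is an illustration under assumptions, not a description of any particular CP/M tool: it fills the unused tail of the last 128-byte record with ^Z (0x1A) and, when reading, treats the first ^Z as the end of the text. The names `cpm_store` and `cpm_load` are made up.

```python
RECORD = 128       # CP/M files are whole numbers of 128-byte records
CTRL_Z = b"\x1a"   # ^Z, ASCII SUB, used as the in-band end-of-text marker


def cpm_store(text: bytes) -> bytes:
    """Pad the text with ^Z up to the next multiple of 128 bytes."""
    return text + CTRL_Z * (-len(text) % RECORD)


def cpm_load(data: bytes) -> bytes:
    """Everything before the first ^Z is the text; the rest is padding."""
    return data.split(CTRL_Z, 1)[0]


stored = cpm_store(b"x" * 74)
print(len(stored), stored[74] == 0x1A)
print(cpm_load(cpm_store(b"hello")))
```

The cost of an in-band marker is that the text itself can never contain a ^Z byte, which is part of why the convention only ever suited text files.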
@RoamingAdhocrat 3 months ago
@@darrennew8211 I don't know why KZbin sent me a notification about your comment but I'm glad it did.
@nickwallette6201 3 months ago
Huh. While I knew the conventions of the above, I did not know the reasons why. This has been educational.
@kevinmcnamee6006 3 months ago
Excellent video. It certainly brought back memories. My first job as a programmer (1975) was working on code that allowed IBM mainframes to communicate with ASCII terminals. This involved translating ASCII to EBCDIC and of course worrying about how all the control characters worked, like CR, LF, TAB, NULL, etc. On the old Teletype 33 terminals you even had to worry about how long it would take for the carriage to return to the left margin after printing a long line, and insert enough NULLs to allow it time to happen before the next printable character arrived. We referred to them as dumb-ASCII terminals. One thing that made things more tricky was that the guy who wrote the specs for the communications controller on the IBM mainframe got the bit order reversed, so the low order bit from the IBM system was sent as the high order bit on the wire. Another difference was sort order. In ASCII, digits sort first, followed by upper case letters, and then lower case. In EBCDIC, lower case letters sort first, followed by upper case, and then digits.
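The sort-order difference can be checked directly by sorting the same characters by their encoded byte values. A minimal Python sketch, assuming the built-in `cp037` codec is a fair stand-in for the EBCDIC in question:

```python
chars = ["a", "A", "0"]

# Sort by the byte each character encodes to, in each character set.
ascii_order = sorted(chars, key=lambda c: c.encode("ascii"))
ebcdic_order = sorted(chars, key=lambda c: c.encode("cp037"))

print("ASCII: ", ascii_order)
print("EBCDIC:", ebcdic_order)
```

In ASCII the three groups start at 0x30 (digits), 0x41 (upper case) and 0x61 (lower case); in cp037 they start at 0x81 (lower case), 0xC1 (upper case) and 0xF0 (digits), so a byte-wise sort puts the groups in exactly the opposite order.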
@nuk1964 3 months ago
One of the frustrations that I remember from the early 1980s was the occasional mangling of data when going between the EBCDIC and ASCII worlds. Alphabetic characters and digits were OK, as was most of the punctuation. Mangled were things such as horizontal tabs, circumflex, backslash, curly braces and square brackets (apparently some versions of EBCDIC had these, and some did not, and in those that did they sometimes appeared in different locations). E-mail and general text would generally pass through OK (or if it was "mangled" in translation, it was still understandable). What was not so great was when you tried to transfer some source code in languages like C or Pascal. Learned quickly to NOT use TAB characters for indentation (due to the inconsistent translation -- sometimes it translated directly into a single space, other times it would get "expanded" to a sequence of spaces but inconsistently -- if you're lucky it expanded to the right number of spaces to preserve the indentation, but more often than not, it didn't). This helped to preserve the indentation of code -- allowing for easier recovery when the curly braces would get lost (and you had a better chance to guess correctly the location of those missing curly braces). The loss of curly braces, square brackets and backslashes would render C source code unusable -- but a "somewhat obscure" feature of trigraphs became quite useful in this case. Downside is they make your code *really ugly*. For Pascal code, found some of the alternates used in Pascal/VS on the IBM useful -- such as the "(." and ".)" aliases for the square brackets, and "->" alias for the caret.
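To show what the trigraph workaround looks like in practice, here is a small Python sketch that rewrites C source using the nine standard C trigraphs. The function name `to_trigraphs` is invented, and the sketch makes no attempt to handle source that already contains `??` sequences; it only illustrates the substitution.

```python
# The nine trigraphs defined by the older C standards.
TRIGRAPHS = {
    "#": "??=",
    "\\": "??/",
    "^": "??'",
    "[": "??(",
    "]": "??)",
    "|": "??!",
    "{": "??<",
    "}": "??>",
    "~": "??-",
}
_TABLE = str.maketrans(TRIGRAPHS)


def to_trigraphs(source: str) -> str:
    """Replace each character that tended to get lost in EBCDIC translation
    with its three-character trigraph."""
    return source.translate(_TABLE)


print(to_trigraphs("int a[2] = {1, 2};"))
```

The result uses only characters that survived the translation tables, at the price of the ugliness the comment mentions.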
@nuk1964 3 months ago
My first encounter with double-byte character set was on the Control Data mainframe -- where a double-byte system was used to get beyond the limitation of 6-bit bytes. It was also on the Control Data systems that I'd finally understood why Pascal had used eoln() function (rather than looking at the character value and checking for carriage return or linefeed) -- end-of-line was a very specific pattern (iirc it was something like a word-aligned sequence of contiguous zero-bytes -- where there were 10 6-bit bytes in a 10-byte word).
@pitan9445 3 months ago
First time viewing your channel - this was excellent. Before HTML was a thing, I worked for an organisation selling structured news (sports results &c). We used record separators (RS, ascii 30) and file separators (FS ascii 28) to split up our rows and fields. It took me a long time to realise we were redefining the acronyms.
@ke9tv 3 months ago
RS was right to separate records. The fields should have been separated with US, unit separator. GS and FS were higher level.
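The intended use of the two lowest-level separators can be sketched in Python: US (31) between fields, RS (30) between records. The `dump` and `load` helpers are made-up names, and the sketch assumes the data itself never contains either control character, which is the whole appeal of the scheme.

```python
US = "\x1f"  # unit separator (ASCII 31): between fields
RS = "\x1e"  # record separator (ASCII 30): between records


def dump(rows: list[list[str]]) -> str:
    """Join fields with US and records with RS; no quoting or escaping needed."""
    return RS.join(US.join(fields) for fields in rows)


def load(blob: str) -> list[list[str]]:
    """Split back into records and fields."""
    return [record.split(US) for record in blob.split(RS)] if blob else []


rows = [["Arsenal, FC", "2"], ['He said "hi"', "line one\nline two"]]
print(load(dump(rows)) == rows)
```

Commas, quotes and even newlines pass through unchanged, because none of them are delimiters here. GS (29) and FS (28) would sit above these two for grouping records and separating whole files.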
@mhzellers 3 months ago
If you have ever punched a Hollerith card, EBCDIC makes a certain amount of sense.
@LordPhobos6502 3 months ago
Looking forward to next week's video! Reading ascii codes in decimal hurts my poor lil brain though, I was taught early on in hexadecimal, and it always made more sense to me that way :)
@Lord-Sméagol 3 months ago
I learned BASIC at school using an ASR-33 TeleType dialling in to an HP 2000F, saving my programs to paper tape. Sometimes, classmates would want to know which program was on their paper tape that they forgot to write the name on. This was easy enough if the terminal wasn't being used, but I could read the holes and tell them :)
@davidh.4944 3 months ago
I've always liked how caret notation makes clever use of the ascii scheme. If you ever hit backspace in a terminal and see ^H^H^H or cat -A a text file written in windows notepad and see a bunch of ^Ms (or see the programmers use them in comments here), it's because the display has taken the non-printing character, flipped one bit, and is presenting it as its corresponding alphabetic block character. So NUL (00000000) becomes ^@ (01000000), TAB (00001001) becomes ^I (01001001), etc. It also works in reverse to enter these characters, as the Control-C bit in the video explained. Very clever.
@billwall267 3 months ago
Excellent. Thanks.
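The caret-notation trick from this thread is a one-bit operation in each direction, and can be sketched in Python as follows. The function names are invented; `caret` flips bit 6 (0x40) to display a control character, and `control` keeps only the low five bits, which is how Ctrl plus a letter produces the control code.

```python
def caret(ch: str) -> str:
    """Show a control character (or DEL) as ^X by flipping bit 6."""
    code = ord(ch)
    if code < 0x20 or code == 0x7F:
        return "^" + chr(code ^ 0x40)
    return ch  # printable characters are shown as themselves


def control(letter: str) -> str:
    """Ctrl+letter: keep the low 5 bits of the letter's code."""
    return chr(ord(letter) & 0x1F)


print(caret("\x00"), caret("\t"), caret("\r"), caret("\x08"), caret("\x7f"))
print(control("C") == "\x03", control("d") == "\x04")
```

DEL (0x7F) fits the same rule: flipping bit 6 gives 0x3F, the question mark, hence the familiar `^?`.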
@lennartbenschop656 3 months ago
They even did take care to support foreign western languages to some degree. ASCII includes the grave accent `, circumflex accent ^ and tilde ~ and you could backspace and print it over a letter (on a real teletype, not on a video screen). The single-quote/apostrophe character 0x27 ' did triple duty as an acute accent and in some old fonts it looks like a mirror image of the grave accent. The double quote character " could be used as umlaut/diaeresis in a pinch. The double-quote and single-quote characters were also common on typewriters and these did not have separate opening and closing quotes. The underscore character was meant to be overprinted on other text as well, just doing a CR without LF.
@greggoog7559 3 months ago
You can do it on a video screen too. It's called "Compose" and you just press the Compose key (whichever key you've assigned for that purpose) and then for example 'a' and '^'.
@lennartbenschop656 3 months ago
@@greggoog7559 That has nothing to do with ASCII as such. Compose combinations are substituted with codepoints for accented letters (formerly in your favorite 8-bit code page, today in Unicode). I was talking about old printers that only had 7-bit ASCII and could print a letter, then backspace then the accent.
@ReneKnuvers74rk 3 months ago
13:14 I’m pretty sure it wasn't the creators of ASCII who threw all the hyphens and quotes on a couple of piles - it was the teletype makers, who from around 1900 to the 1960s had no 1, only an i without a dot, a separate dot that doubled as a single quote, and no separate characters for o and 0. That meant that ASCII adding back these additional characters would force mechanical changes to the devices that were supposed to use the new standard. Since computers need a distinction between a letter and a number, the 1/i and 0/O issue had to be solved, but the start and end quotes have no functional meaning in a computer.
@darrennew8211 3 months ago
Not just teletype. That was pretty common on typewriters too.
@dragonfly-7 1 month ago
That was awesome! Quite an excellent wrap-up of lots of things I have been learning over the past 50 years or so. Thanks a lot!!! When I was porting software to an Amdahl machine back in 1993, I was driven crazy trying to test the s/w (BTW: the compile of pure C code went through without a glitch). I made lots of attempts at entering the license key. After launching the debugger it turned out that a character was missing. Finally the system admin asked which characters were in the license key. It turned out that that was the right question: the '#' (a.k.a. hash or pound) was used as a 'DEL'/delete character to 'X' out unwanted input. Typewriter style software at its best ...
@dj196301 3 months ago
Subscribed! No dumb-ass stock footage, no tangent shots, just an entertaining and informative chap talking about cool stuff. Looking forward to "Why UTF8 is Actually Very Clever"--unless you've done it and I just haven't seen it. Thank you.
@DylanBeattie 3 months ago
@@dj196301 thank you! UTF-8 is coming in a few weeks. Got some other stuff to talk about first :)
@lostcarpark 3 months ago
You skipped over 16-31 very fast. I think the Escape character at least deserves a mention! You mention Morse code, but there were several other digital codes that predate even computers. Baudot was developed in France in the 1870s for telegraph machines as a 5-bit digital code. The early consoles used a piano-like keyboard, and required operators to press keys together to make chords, so the code was designed to be easier for operators, with more common letters in single bit positions, and even the numbers weren't continuous. This was later adapted into Murray code, in the early 20th century, with the development of teletype terminals and teleprinters that let operators use a QWERTY style keyboard. As they were mechanical, the code was designed to minimise wear on the machinery. Finally, fully electronic machines started appearing in the 1930s, leading to the development of ITA2 (which at least put the numbers back in a contiguous block). Having been developed for one purpose and evolved and tweaked for others, the code was quite messy, so we can probably be grateful that the designers of ASCII decided to go with a clean sheet design. There probably is a universe in which they decided to take Baudot/ITA2 and extend it into a 7 bit code. ASCII effectively has four 5-bit "pages". I could imagine taking the "letter" and "figure" modes of ITA2 as two of those pages, then adding lower-case and control codes as the other two. Then, your video would be explaining why the ASCII code letters weren't in alphabetical order.
@AdrianDerBitschubser 3 months ago
11:50 The Rest contains one really important character: The ESC, or Escape-Character. It is used with ANSI Escape Codes to generate all the wonderful color and other formatting in terminals even to this day. Maybe that is worth a video.
@BradHouser 2 months ago
Fun Fact: Some of us remember the key-strokes Ctrl-S and Ctrl-Q. They are the ASCII codes to stop and resume display output. They use the codes for Device Control 3 (ASCII 13 Hex) and Device Control 1 (11 Hex) to tell the sending device to stop and resume sending data.
@McDuffington 3 months ago
One of my favorite subjects! Looking forward to the follow up parts!
@amarqueze 3 months ago
Very nice video. I've worked with computers since the 80s, and never thought about ASCII. Now I know how a Python progress bar is built, and other clever ideas. Well done Dylan!
@TimSavage-drummer 3 months ago
EOT (Ctrl+D) is still used in Unix/Linux to end a terminal session. I also find it odd that 28-31 aren't used more, they are perfect for use in CSV(like) files to avoid needing to do escaping etc.
@lupinzar 3 months ago
The utility of CSV is that you can edit it in pretty much any text editor in a pinch and it still remains (fairly) human readable. Once you introduce control codes that won't be visible at all in some editors and require special settings in others, you might as well develop a binary format that is more efficient. That said, if you can't influence the design of a data format and need an extra set of delimiters they are useful, but probably not best practice.
@darrennew8211 3 months ago
Control D doesn't end a terminal session. It flushes the keyboard buffer without adding anything to it. If you're at the start of a line, then you flush zero bytes. A read from a file of zero bytes indicates an end of file in Unix. So the terminal reads zero bytes, thinks its input is closed, and exits. Write a program that sits in a loop reading the stdin and writing what it gets without any buffering. Then type "ABC" and hit ^D, and you'll see instead of exiting it just prints ABC.
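The experiment suggested here can be sketched in Python with unbuffered `os.read` and `os.write` on raw file descriptors. The function name `echo_until_eof` is made up; the only rule it implements is the Unix convention described above, that a read returning zero bytes means end of file.

```python
import os


def echo_until_eof(fd_in: int, fd_out: int) -> int:
    """Copy whatever each read() returns to fd_out, stopping only when a
    read returns zero bytes. Returns the number of non-empty reads."""
    reads = 0
    while True:
        chunk = os.read(fd_in, 1024)  # returns as soon as any data is delivered
        if not chunk:                 # zero bytes: treated as end-of-file
            break
        os.write(fd_out, chunk)
        reads += 1
    return reads


# To try it on a Unix terminal, call: echo_until_eof(0, 1)
# Typing ABC then ^D should echo ABC and keep going; ^D on an empty line
# delivers zero bytes and ends the loop.
```

The same function can be exercised without a terminal by feeding it a pipe: closing the write end makes the next read return zero bytes, which is the same signal ^D produces at the start of a line.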
@niczoom 3 months ago
Great video and very well explained! The point about why certain commands are still in use today and their origins was very interesting. I learned something new. Thanks for sharing!
@protheu5 6 days ago
This was a really captivating video, well presented, interesting stuff.
@MattJoyce01 3 months ago
Some of this I knew, but I didn't realise the deliberate design elements. Good job.
@bishaladhikari9499 3 months ago
Loved every second of it
@OhhCrapGuy 3 months ago
I've actually used 0x1F instead of commas when I needed to save something with the sheer simplicity of a CSV file while not having to figure out the logic of how to handle data with commas or quotes in them. Works great. You know, since that's what it's for, haha
@Dominik-K 3 months ago
Thanks a bunch for this video. I've known most of these things already, but in my programming career knowing those fundamental bit layouts and tricks has been so valuable for writing efficient and understandable code.
@rabidbigdog 3 months ago
Good lawd, this was awesome. Kinda hilarious how everyone else tried to ensure IBM was out there in the wind.
@KX36 3 months ago
I got distracted at 3:50 and reimplemented morse code as a canonical Huffman code. By hand, in Excel, for fun. 😅 Each character is 3-9 bits long but it's a binary prefix code so no need for gaps in transmission.
@ib9rt 3 months ago
When I was first introduced to computers in 1977, I used an ASR-33 Teletype complete with paper tape punch/reader. The ASR-33 only had uppercase letters, so it was with a sense of wonder I discovered that some more advanced terminals could also do lowercase! And everyone wrote the obligatory program that scanned through codes 0 to 127 and printed them out to see what they would do. Sending a string of ^G characters to an ASR-33 produced a sound never equaled by later devices, especially since they never seemed to insert a gap between the beeps.
@andythebritton 3 months ago
This seems to be an abridged version (or possibly the first episode) of Dylan's 'No such thing as plain text' talk, which is well worth a watch.
@DragoniteSpam 3 months ago
Lol I didn't expect that little shower thought to turn into a whole video, good fun!
@bread8070 3 months ago
One more thing, following on from how upper and lower case are separated by a single bit: look at the number keys on a keyboard and the symbols on them. Starting from 1 you’ll notice the codes for the numbers and symbols are also separated by a single bit. It goes a bit wrong about half way along, but on old keyboards (pre IBM PC) this usually works for the whole set. Now look at the keys for the non alphabetic symbols in those two alphabetic ‘blocks’. You’ll find the symbol in the low case block is on the same key as the equivalent symbol in the upper case block. Thus, the symbols and numbers on most keys differ only by a single bit. Why? Because taking a keyboard scan code and converting it to ASCII requires a bunch of code and a look up table. Old computers were very slow and had very little memory. So old keyboards generated ASCII codes in hardware, to be returned to the processor. Arranging the keys so the symbols on them were one bit apart made the hardware much simpler. To be fair, it’s probably fair to say that the ASCII codes were derived from existing typewriter layouts. So it’s actually the ASCII code ordering being chosen to match the keyboard layout rather than the layout being designed to match the ASCII. But that just makes the ASCII design even smarter. (And I suspect the same is true for teletypes and the symbol pairings on the hammers - which were probably inherited from typewriters anyway).
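The single-bit pairing described here can be checked in Python. On the old "bit-paired" layouts, a digit and its shifted symbol differ only in bit 4 (0x10), and the symbols sharing a key in the two alphabetic blocks differ only in bit 5 (0x20), the same bit that separates upper and lower case. The function names are invented for the example.

```python
def shifted_digit(ch: str) -> str:
    """Digit key with Shift on a bit-paired keyboard: flip bit 4."""
    return chr(ord(ch) ^ 0x10)


def other_case_block(ch: str) -> str:
    """Move between the upper-case and lower-case blocks: flip bit 5."""
    return chr(ord(ch) ^ 0x20)


print("".join(shifted_digit(d) for d in "123456789"))
print("".join(other_case_block(c) for c in "[\\]aZ"))
```

A modern PC keyboard agrees with this for 1, 3, 4 and 5 and then diverges (2 carries `@` rather than `"`, for instance), which is the "goes a bit wrong about half way along" part.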
@ke9tv 3 months ago
There was also a design for conversion between EBCDIC and ASCII that required only a handful of transistors. The two standards were developed together. (IBM 026 and 029 card code preceded EBCDIC.)
@RaceriEmil 3 months ago
Thanks. That was very informative and insightful. I like your delivery and the small jabs/jokes you put in. I am looking forward to your next video!
@agranero6 3 months ago
It is mostly forgotten that we have SOH, STX, ETX, EOT, ENQ, ACK, SYN, ETB, FS, GS, RS, US, and particularly EM: end of medium. This was primarily designed for data transmission like Baudot and not for use on the computers themselves (like memory and files), as the very name states: "for Information Interchange". It is interesting to analyze those systems by their purpose (a teleology if you want): Morse made the most used characters shorter (he went to a printing press and looked at the size of the type cases, the most common were bigger, yes this is why we call them uppercase and lowercase); Baudot was firstly designed to minimize the wear in the mechanical parts of the telegraph (not the modern Baudot); and in ASCII, well, we see hints of a protocol attached to a machine, such as those mentioned and DC1, DC2, DC3 and DC4. I always wonder if it was used this way or if that part of the standard was simply ignored. Yeah, a teleprinter used many of them, but certainly not FS, GS, RS and US; they are used for sending files, not only inside of files. You do not need FS inside a file (except maybe a file like a TAR) but you need it on a data stream that has several files, like a paper tape, a magnetic tape or something like that.
@foo0815
@foo0815 3 ай бұрын
Thanks for the DEL story!
@ChannelSho
@ChannelSho 3 ай бұрын
Another neat thing about the way the digits are organized in ASCII is if you convert it to hex, you just look at the lower half and you'll get the number. Also I like how the alphabet characters start with bit 0 as 1, because it makes more sense to use that A = 1 rather than A = 0.
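Both observations in this comment come down to masking off the high bits. A small sketch, with function names invented for the example:

```python
def digit_value(ch: str) -> int:
    """Low four bits of an ASCII digit are the digit's numeric value."""
    return ord(ch) & 0x0F


def letter_index(ch: str) -> int:
    """Low five bits of an ASCII letter give its position, with A = 1."""
    return ord(ch) & 0x1F


print(hex(ord("7")), digit_value("7"))    # 0x37 and 7
print(letter_index("A"), letter_index("a"), letter_index("Z"))

# The same five-bit mask is what the Ctrl key traditionally applies:
# Ctrl-D is 4 (EOT), Ctrl-G is 7 (BEL), Ctrl-[ is 27 (ESC).
print(letter_index("D"), letter_index("G"), letter_index("["))
```

Because upper and lower case share the same low five bits, the letter index is the same for `A` and `a`.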
@lennartbenschop656
@lennartbenschop656 3 ай бұрын
Between Morse code and ASCII there was also ITA2 (sometimes incorrectly called Baudot code), a five-bit code for mechanical teletypes. It used control codes (letters and figures shifts) to switch between letters and digits/punctuations. ASCII still has SO/SI control codes to make it possible to temporarily switch to a different character set. ITA2 has a Null character, CR and LF and even Bell and "Who are you" (similar to the ENQ control code in ASCII).
@OranCollins
@OranCollins 3 ай бұрын
I've always loved your talk on ascii. Love seeing more stuff from your brain! keep it commin!
@BradHouser
@BradHouser 3 ай бұрын
My first programming was over a dialup teletype at 110 Baud or 10 characters per second. I was in high school in the '70s and dial up time share systems running BASIC cost $6.00 per hour, so connect time was precious. You wrote your program offline on paper, then entered it on the teletype, punching it on tape as you typed, and if you made a mistake, the DEL key was like digital White out. Of course, it did not speed up data transmission. Once you had it all typed onto paper tape, you dialed the number with a Touch-Tone keypad, logged in and then played the paper tape back to upload your program. Then you ran it, you could also renumber, and list it back and re-punch it for later. When I told my mom I needed money to learn BASIC programming, she asked what I did on the computer. I told her games. I love her: she didn't complain. I became an Electrical Engineer/Computer Science guy.
@BradHouser
@BradHouser 3 ай бұрын
One of my friend's dad had a 300 baud terminal/printer, and we used to dialup GE's free modem line and just print out stuff in order to watch it work.
@OrigamiMarie
@OrigamiMarie 3 ай бұрын
Ctrl-d is still used a little with Bash. If you want to quit a user session fast (and can't be bothered with "exit"), ctrl-d will end it.
@edgeeffect
@edgeeffect 3 ай бұрын
Reminds me of the MCP in Tron with his "End of Line". Ctrl-d can be used anywhere you want to end a file like `cat - >my_file.txt` - type a line, type another line, ctrl-d
@rogerramjet8395
@rogerramjet8395 3 ай бұрын
And CTRL-L to "clear" the screen. (Maps to "Form Feed" … which shifted the paper to the start of the next - blank - page).
@pidgeonpidgeon
@pidgeonpidgeon 3 ай бұрын
Ctrl D is used a lot on Linux in general. Anytime you use a pipe it takes one process's stdout and connects it to another's stdin, and the convention to say that the stdout is empty is to send Ctrl D
@0LoneTech
@0LoneTech 3 ай бұрын
@@pidgeonpidgeon No, ctrl-d for end of transmission is in the terminal (tty) layer. Between processes end of file is indicated by closing the connection, see shutdown or close system calls. The terminal in cooked mode also permits using ctrl-d to input an unterminated line without ending the file, similar to fflush, or actually transmitting EOT with ctrl-v ctrl-d. More details in e.g. stty(1); try "stty -a".
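The point about pipes can be seen without a terminal at all. This is a minimal sketch using Python's `os.pipe`; the variable names are just for illustration. No byte with value 4 is ever written: the reader sees end of file because the write end is closed and the next read returns zero bytes.

```python
import os

read_fd, write_fd = os.pipe()

os.write(write_fd, b"hello\n")
os.close(write_fd)  # closing the write end is what signals end of file

first = os.read(read_fd, 1024)   # the data that was written
second = os.read(read_fd, 1024)  # zero-byte read: this is EOF on a pipe
os.close(read_fd)

print(first, second)
```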
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
It's more than bash. *nix uses ^D to mean EOF. Any program reading from STDIN getting an EOF would exit as it can no longer read any input; eg:
$ cat > hello
World
^D
$ cat hello
World
$
Thus, when you put an EOF (as the first character) to bash, it gets an EOF and exits, as do sh, csh, tsh, etc.
@xdcountry
@xdcountry 3 ай бұрын
That was great. Excellent tour through the origins. Just incredible.
@JamieBainbridge
@JamieBainbridge 3 ай бұрын
Ctrl+d is still commonly used on Linux. It's the way to logout of a shell, and also the way to get out of a REPL like Python.
@cfhay
@cfhay 3 ай бұрын
EOT (End of Transmission) is Ctrl+D and can still be used today. Ctrl+D in Linux (and other similar systems) will flush the current buffer. If this buffer is empty, it will result in a zero-byte read. A zero-byte read means end of file/end of input in most contexts. For example, using it at a shell prompt will cause the shell to exit with exit code zero. If that was a login shell, it causes a logout. I use it every day. Also ESC is widely used to decorate Linux console output (colors etc).
@orterves
@orterves 3 ай бұрын
Good video, nice refresher of a topic I haven't really thought about directly since university - except for bloody Windows crlf when working with cross platform code
@clasqm
@clasqm 3 ай бұрын
ASCII 27 still maps to the Escape key.
@briansepolen4917
@briansepolen4917 3 ай бұрын
One great thing about these blocks described is that one can see that like using Ctrl-C for ASCII 3 (ETX), one can also use Ctrl-[ (ESC) instead of lifting hands off the home row for Escape. Great for increasing TUI speed and efficiency.
@nurmr
@nurmr 3 ай бұрын
Yep, ESC is essential for CSI (and SGR in particular), so without it there would be no ANSI terminal colors!
@Colaholiker
@Colaholiker 3 ай бұрын
I don't normally comment on the clothing style of KZbin creators - but that t-shirt rocks. 🤣
@__christopher__
@__christopher__ 3 ай бұрын
Control character 4 (EOT), that is, Ctrl-D, still lives on in terminal emulators of Unix-derived systems like Linux as the end-of-file character (although technically it's the flush-input-buffer character, but returning an empty input is interpreted as end of file on Unix-derived systems, therefore it effectively acts as end-of-file for terminal input and also is commonly referred to as such; the difference can be seen if you try to use it on a non-empty line).
@enterrr
@enterrr 3 ай бұрын
Correction to the last frame: ASCII has 128 characters, not 127 😏
@darrennew8211
@darrennew8211 3 ай бұрын
I bet you could argue that DEL is not a character. :-) I saw that too, and then thought about it.
@enterrr
@enterrr 3 ай бұрын
@@darrennew8211 more likely he does not "feel" NUL (\0) is a character in earnest. But gut feelings (or C ASCIIZ hangups) are irrelevant - ASCII is defined as 128 7-bit characters
@darrennew8211
@darrennew8211 3 ай бұрын
@@enterrr Granted that NUL on paper tape is arguably less of a character than DEL is. :-)
@enterrr
@enterrr 3 ай бұрын
@@darrennew8211 that's like calling 0 less of a number than 1, hehe
@darrennew8211
@darrennew8211 3 ай бұрын
@@enterrr Not really. I mean, unless you want to say the tape comes pre-filled with NUL characters, right? :-)
@darrennew8211
@darrennew8211 3 ай бұрын
Fun facts: The ASCII underscore character was originally a left-pointing arrow, which is why Smalltalk (from around 1976) uses "_" as the assignment operator, and why Pascal (designed to work with EBCDIC also) uses ":=" instead, to look like an arrow as close as you can get on punched cards. EBCDIC has the same sort of bitwise feature for letters that the upper/lower trick in ASCII uses, except it's designed for punched cards. So with a card 11 columns high, the letters are in "contiguous" numbers if you ignore the proper holes on the card rather than ignoring the proper bits in the byte.
@__christopher__
@__christopher__ 3 ай бұрын
Actually, Pascal had several digraphs to be used when certain characters were not available. For example, Pascal comments were written in curly braces {like this}, but in case curly braces were not available, you could also use parentheses with asterisks (*like this*). Now the only character used by Pascal that was not available in ASCII was the left arrow, whose digraph replacement was :=, which is why that one became commonly known as the Pascal assignment operator.
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
​@@__christopher__ Were '" not available? "" for assignment, eg: RA -> VARLOC means the contents of the A register are stored in the location pointed to by VARLOC (effectively a variable).
@__christopher__
@__christopher__ 3 ай бұрын
@@cigmorfil4101 that's already the less-equal operator.
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
@@__christopher__ Interesting how all the BASICs I've used overload the '=' operator to mean both "assign" and "compare equal" - the meaning based on context. How about "<-"? (That looks more like an arrow than ":=".)
@__christopher__
@__christopher__ 3 ай бұрын
@@cigmorfil4101 that is already a less-than followed by a unary minus operator. Also, := was already in use in mathematics for definitions, so it fits quite well. Note also that a proper assignment statement in BASIC was LET var = value A lot of BASIC interpreters (in particular Microsoft's) allowed omitting the LET though.
@aaronbredon2948
@aaronbredon2948 3 ай бұрын
ASCII being 7 bit covered most of the generic characters including accented characters via overprinting. If the inventors had wanted to include all possible characters across the world, they would have needed at least 2 bytes per character to be able to handle Chinese and Japanese ideographs. Leaving the remaining 128 values of a byte unspecified allowed different countries to add country specific characters. In the IBM PC world these were implemented as “code pages”, and were a bit of a problem when talking between countries. Unicode eventually resolved this communication problem, but it requires 32 bits or 4 bytes to encode the over 140,000 characters, and there are visually identical Unicode characters that are logically different, which makes it easier for scammers to fake internet addresses. And something as large as Unicode wasn’t practical in the early days of computing, when every single byte saved was significant. EBCDIC had the advantage that numbers were readable on hex crash dump printouts, but numbers and letters shared the same character codes (C1 represented either A or positive signed 1, depending on what the data type was.)
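The code page problem and the byte cost of Unicode are both easy to see with Python's built-in codecs. This is only a sketch: the byte 0xE9 and the sample characters are arbitrary picks for illustration. In UTF-8 specifically, the cost per character varies from one to four bytes, and plain ASCII text stays at one byte each.

```python
# One byte above 127, read under two different 8-bit code pages.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")  # e with acute accent
as_cp437 = raw.decode("cp437")     # a Greek letter on the original IBM PC
print(repr(as_latin1), repr(as_cp437))

# UTF-8 length in bytes for a few sample characters.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)
```

The same stored byte turning into two different characters is the "bit of a problem when talking between countries" the comment mentions.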
@gcasar
@gcasar 3 ай бұрын
so happy i got this as a suggested vid
@dimitrioskalfakis
@dimitrioskalfakis 3 ай бұрын
useful and well presented.
@sheridenboord7853
@sheridenboord7853 3 ай бұрын
Great talk thanks. I always suspected DEL because of how it sat in the ASCII table didn't look right. A control character all by itself as if it was an after thought. So a program reading from a stream would just ignore DEL characters.
@aDifferentJT
@aDifferentJT 3 ай бұрын
Ctrl-D in the terminal is great, it will exit most REPLs or shells
@philipoakley5498
@philipoakley5498 3 ай бұрын
I remember doing port-a-punch cards in EBCDIC for my first computer programmes at grammar school! 10-6-8 everyone (or was it 11-6-8;-).
@andrewjameswelch
@andrewjameswelch 3 ай бұрын
Great vid, thanks. A follow up vid could be a similar explainer about how utf-8 uses multiple bytes and what happens when that is read using a single byte encoding.
@williamlyerly3114
@williamlyerly3114 3 ай бұрын
As one who lived and died on TTY33/35 devices this was very interesting. Programmed in SLEUTH (Univac assembler) later BAL. Lived in ASCII land.
@threee1298
@threee1298 3 ай бұрын
New to the channel, this is wonderful
@flamewingsonic
@flamewingsonic 2 ай бұрын
You missed one very important use of characters in the control block: character 27 (ESC) is used by terminal emulators as part of the "control sequence introducer" ("CSI") to do things such as changing foreground/background color, setting bold/italics/underline, etc. Although this is more prevalent in the UNIX world, even DOS (and the Windows command prompt) had a device driver (ANSI.SYS) supporting these ANSI escape codes.
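For anyone who wants to see the CSI mechanism in action, here is a minimal sketch. The helper name `sgr` is made up for this example; the sequence it builds (ESC, `[`, numeric parameters separated by semicolons, then `m`) is the standard "select graphic rendition" form, and whether the colors actually show depends on the terminal it is printed to.

```python
ESC = "\x1b"     # ASCII 27
CSI = ESC + "["  # control sequence introducer


def sgr(*codes: int) -> str:
    """Build a 'select graphic rendition' sequence such as ESC[1;31m."""
    return CSI + ";".join(str(code) for code in codes) + "m"


RESET = sgr(0)

# 1 = bold, 31 = red foreground, 0 = back to normal.
print(sgr(1, 31) + "bold red" + RESET + " plain again")
```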
@unvergebeneid
@unvergebeneid 3 ай бұрын
Thanks for bringing the GIF/JIF debate to EBCDIC ;D
@DylanBeattie
@DylanBeattie 3 ай бұрын
The first c in EBCDIC is pronounced like the c in "Pacific Ocean" - what's the problem? 🤣
@unvergebeneid
@unvergebeneid 3 ай бұрын
@@DylanBeattie ;p luckily I have yet to see someone argue that those 256-color images are pronounced "SHIF" :D
@kevskevs
@kevskevs 3 ай бұрын
Praised be the Algorithm ... it happens VERY rarely that I want to upvote a video and notice that I have already done so. Guess I'll have to subscribe ...
@KX36
@KX36 3 ай бұрын
The Device Control characters are still very important for configuring barcode scanners. How do you change the settings of a barcode scanner about e.g. whether or not to insert a or or nothing after scanning a barcode, you send combinations of device control characters followed by alphanumerics. Exact combinations are device specific. Also, we just last year migrated away from a 1980s unix program (still a very popular program) that uses a database of literal ascii strings, each field separated by the Record Separator character.
@u9vata
@u9vata 3 ай бұрын
The ESCAPE ascii character is often used in various APIs like old BIOS interrupts for reading the keyboard you can grab "scan" codes or ascii codes. Most people who wrote games go for scan codes and many other software too, but even though there are no ascii returned properly for arrow keys for example, the escape key generates the ESC character properly in the bios - just example.
@Tweekism86
@Tweekism86 3 ай бұрын
7:12 Speak for yourself! I still use Control-D, to close terminals and exit SSH sessions, quit python or node.js and the like. Edit: Love the video btw, can't wait for the next one :)
@SirusStarTV
@SirusStarTV 3 ай бұрын
On Windows, the Python REPL only accepts ^Z, and the Enter key needs to be pressed for it to work
@Tweekism86
@Tweekism86 3 ай бұрын
Dammit Windows, this is why we can't have nice things!
@0LoneTech
@0LoneTech 3 ай бұрын
@@Tweekism86 In this case you can blame CP/M, in particular where file length in bytes was not recorded.
@DrCoomerHvH
@DrCoomerHvH 3 ай бұрын
I like how you've recycled some of the points from your talks into their own little videos, especially when the video topics are directly interactive with the community or fans.
@zeitgenosse
@zeitgenosse 3 ай бұрын
I'm very much looking forward to the next episode (kohuept and èÁÒÉ ðÏÔÅÒ).
@KhalilEstell
@KhalilEstell 3 ай бұрын
Amazing video, loved it.
@rollinwithunclepete824
@rollinwithunclepete824 3 ай бұрын
Very interesting! Thank you
@probablypablito
@probablypablito 3 ай бұрын
Incredible video!
@chri-k
@chri-k 3 ай бұрын
I can't believe you just called ^[ and ^D unimportant
@BobFrTube
@BobFrTube 3 ай бұрын
The extra bit also provided parity. CR and LF were separate because going to the next line on a teletype took two character times. Multics chose LF as the NL because CR could be considered as not doing anything. _ was originally a left arrow.
@acasualviewer5861
@acasualviewer5861 3 ай бұрын
In old Apple ][ word processors you'd enter control characters to teach the word processor how to work with your new printer (instead of drivers). Also they were used for modems. We had to type in weird characters to get the modem transmitting.
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
Apple ][ basic also used the VT52 arrow key ESC sequences to move the cursor around the screen ready for copying - you listed a line and then had to ESC-D-ESC-D etc to get to the start of what you wanted to copy, use the -> key to copy, and use lots of ESCs to skip over blank characters - the Apple ][ was aggressive in printing out blank spaces and word wrapping. To fix the excessive spaces we set the right hand edge of the window to be one less character than it needed before it word wrapped (indenting from line number) and so it would not put in the excessive end and start of line spaces. The next thing I did was to write a character output trap which ignored spaces outside quotes and showed control characters in reverse (particularly for the DOS ^D prefix character, but it also showed up any other CTRL codes) to make it easier when editing such lines.
@jovetj
@jovetj 3 ай бұрын
Excellent video!
@gwaptiva
@gwaptiva 3 ай бұрын
Thanks; now I know how that blasted vertical tab got into that text field that then didn't serialize to XML CDATA, but in fact errored out completely
@Squossifrage
@Squossifrage 2 ай бұрын
4:51 While eight-bit bytes were already common when work on ASCII began in the early 1960s, they did not become ubiquitous until the mid-to-late 1970s.
@notthedroidsyourelookingfo4026
@notthedroidsyourelookingfo4026 3 ай бұрын
8:18: Dylan taking a stance on tabs vs. spaces 😂
@SteveOnTheInterweb
@SteveOnTheInterweb 3 ай бұрын
More about the ASCII graveyard, please! For instance, RS Record Separator, now used in application/json-seq format to separate JSON objects, e.g. in a streaming event log that will never finish. Lots of goodies in the graveyard...
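The json-seq framing mentioned here is simple enough to sketch in a few lines: each JSON text is preceded by RS (0x1E) and followed by a line feed. The function names below are invented for the example, and the decoder is deliberately naive (it does not try to recover from truncated records the way a robust parser would).

```python
import json

RS = "\x1e"  # ASCII record separator
LF = "\n"


def encode_json_seq(records) -> str:
    """Frame each record as RS + JSON text + LF."""
    return "".join(RS + json.dumps(record) + LF for record in records)


def decode_json_seq(text: str) -> list:
    """Split on RS and parse every non-empty chunk as JSON."""
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


events = [{"event": "start"}, {"event": "tick", "n": 1}]
stream = encode_json_seq(events)
print(repr(stream))
print(decode_json_seq(stream))
```

Because RS cannot appear unescaped inside a JSON text, a reader can always find the start of the next record, even in a log that never ends.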
@pepe6666
@pepe6666 3 ай бұрын
awesome content. subscribbbed. also props for the def leppard shirt
@mag-icus
@mag-icus 3 ай бұрын
Also, ctrl-d is still used to mark end of streams on unix. So it is not just ctrl-d and ctrl-g that has survived until this day.
@luserdroog
@luserdroog 3 ай бұрын
I like this, but what about the earlier threads like Jacquard Looms? There's some fascinating stuff in the first APL books (I forget if it's in A Programming Language or Automatic Data Processing) about how to design encodings for punch cards with various numbers of holes.
@chrisd561
@chrisd561 3 ай бұрын
Great video!
@dfs-comedy
@dfs-comedy 3 ай бұрын
Ctrl-D is still "End-of-File" in UNIX tty land.
@darrennew8211
@darrennew8211 3 ай бұрын
Technically not. It's "send the buffered output without sending the ctrl-D". If there's no buffered output, the program gets a read length of zero, which is EOT. But if you type something first and hit control D, it just sends what you typed.
@AutomatedChaos
@AutomatedChaos 3 ай бұрын
While working in IT for more than 2 decades now, it surprises me that developers try to invent character separated values (csv) for columnar data again and again while there are literally 4 ASCII characters reserved to handle these cases. But no, let's use the comma, semicolon, tab (\t), pipe, tilde or even the |~| combination as separator with all problems that can occur like escaping, quoting and in-field newlines.
@DylanBeattie
@DylanBeattie 3 ай бұрын
10 years working tech support made me realise that if regular folks can't read it on their screens and type it on their keyboards, they're not gonna use it... and, honestly, I think they're right. We wanna bring back ASCII field and record separators, we should be putting them on keyboards.
@jovetj
@jovetj 3 ай бұрын
Yep. Control characters are generally un-keyable and non-displayable. Not very practical for most people.
@ABaumstumpf
@ABaumstumpf 3 ай бұрын
@@jovetj "Control characters are generally un-keyable and non-displayable. " No, they were simple control-character - literal keys on the keyboard, and they are very much displayable as even MSWord can show them.
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
​​​ MSWord might, but Notepad doesn't (other than as an undecipherable character as to which control code it actually is - try looking at a PDF using notepad) - CSV being a plain text format, Notepad, a plain text editor, would be _the_ tool for the job, not MSWord.
@ABaumstumpf
@ABaumstumpf 3 ай бұрын
@@cigmorfil4101 "Notepad, a plain text editor, would be the tool for the job, not MSWord." notepad is just a scratch TEXT-editor and NOT for working with csv. I mentioned word cause most programs do display them correctly. And notepad is just far far off from being the correct program for anything. for CSV you would use a program that either can actually deal with ASCII (so not notepad) or better - a program designed for handling tabular data.
@herbie_the_hillbillie_goat
@herbie_the_hillbillie_goat 3 ай бұрын
Love the DL inspired shirt.
@mag-icus
@mag-icus 3 ай бұрын
You missed the cleverness behind codes 33-41. These punctuation marks come in the same order as they do on the number keys on an (American) keyboard; this means that similar to how lower case letters were converted to upper case by resetting a single bit (toggled by the shift key), the same was actually true of pressing shift + a number key.
@TheJamesM
@TheJamesM 3 ай бұрын
Speaking of CP/M’s legacy in DOS and eventually Windows, a pet theory of mine is that we have it to thank for popular awareness of the term “backslash”, use of the theoretically redundant (but clarifying) term “forward slash” (that’s just a slash), and confusion between the two. I’m pretty sure it all derives from the lack of directory structure in early versions of CP/M, meaning the slash was available as the character to introduce command parameters, unlike in Unix-like systems. Once a directory structure _was_ introduced, they resorted to the closest available alternative - its mirror image. The backslash is an odd character, really: it has no linguistic meaning, and as best I can tell it was included on typewriters for the purpose of constructing line graphics. It seems a little odd that it should have been included in such a limited character set; I suppose it was seen as important at the time, and now we’re left with the legacy of decisions made according to pragmatic concerns that ceased to be relevant decades ago.
@mjkaelbling
@mjkaelbling 3 ай бұрын
Backslash was introduced to write AND and OR operators as /\ and \/
@TheJamesM
@TheJamesM 3 ай бұрын
@@mjkaelbling While it was used for that by the 1960s, I'm not sure there's evidence for that being the reason it was first introduced sometime in the 1930s; maybe so, but I wouldn't have thought that formal logic would have been a priority for typewriters at that point?
@jovetj
@jovetj 3 ай бұрын
The backslash was picked because it was hardly used and was easy to parse around.
@0LoneTech
@0LoneTech 3 ай бұрын
Fun fact: DOS supports either slash as path separator. When DOS 2 introduced subdirectories, it also added an option to change the switch character (CharOper call), but that was later abandoned demonstrating Microsoft's insistence on backwards compatibility is selective.
@mjkaelbling
@mjkaelbling 3 ай бұрын
@@TheJamesM I should have written "Backslash was added to ASCII by the X 3.2 standards committee to support the ALGOL boolean operators /\ and \/" See the paper by R.W. Bemer cited in the Wikipedia article on Backslash. Obviously it was introduced in the 1930s for other reasons.
@rascta
@rascta 3 ай бұрын
Sadly lost and not mentioned here, the FS, GS, RS, and US characters (28-31). Meant to serve as distinct bytes that wouldn't be part of text data, and therefore could easily be used to delineate it. But alas instead we just totally forgot they existed and therefore ended up with formats like CSV, which gave double meaning to commas, newlines, quotes, etc. With special escaping rules and incompatibilities between systems. And we've spent generations figuring out how to handle that properly and handle all the edge cases. Just because we didn't have and didn't bother to come up with a few symbols to represent those 4 characters. Some of those other low code points were perfect for networking, sending a single byte to communicate something that now we need an entire packet to communicate the same thing.
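To make the contrast with CSV concrete, here is a small sketch of a table stored with US between fields and RS between records. The function names `dump` and `load` are made up for this example, and it assumes the data itself never contains the US or RS characters, which is the whole point of reserving them.

```python
US = "\x1f"  # unit separator: between fields
RS = "\x1e"  # record separator: between records


def dump(rows) -> str:
    """Join fields with US and records with RS; no quoting or escaping."""
    return RS.join(US.join(fields) for fields in rows)


def load(text: str) -> list:
    """Inverse of dump for text that contains at least one record."""
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["Smith, Jane", 'said "hi"'],
    ["line one\nline two", "a,b;c|d"],
]
encoded = dump(rows)
print(load(encoded) == rows)
```

Commas, quotes, pipes and embedded newlines all pass through untouched, with no escaping rules to get wrong. The catch, as the replies point out, is that nobody can type or see these characters easily.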
@darrennew8211
@darrennew8211 3 ай бұрын
The number of self-taught computer programmers who reinvent the wheel because they were never taught what already works always astounds me.
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
Interestingly Pick uses characters 252-254 as markers in dynamic arrays (and filed items) between the "elements":
FE - 254 - Attribute mark
FD - 253 - Value mark
FC - 252 - Sub Value mark
The whole dynamic array is a string with the elements separated by the marks. If an element is required that doesn't exist, Pick adds enough of the relevant marks to create it when setting the value of the "element" or returns a null. This means you get to access things like:
Data = ''
Data<1> = 'attr 1'
Data<2,1,3> = 'at 2, v1, sv 3'
Data<2,3> = 'at 2, v 3'
Data<4,2> = 'at 4, v2'
CopyData = Data
Element2 = Data<2>
The strings Data and CopyData contain:
attr 1[am][sm][sm]at 2, v 1, sv 3[vm][vm]at 2, v 3[am][am][vm]at 4, v 2
And Element2 contains
[sm][sm]at 2, v 1, sv 3[vm][vm]at 2, v 3
Where [am] is char(254), [vm] is char(253) and [sm] is char(252)
Pick is a multi-value DBMS OS with all fields of variable length and type (though as the whole is stored as a string they're effectively all strings which are converted to the relevant type at time of use).
@cigmorfil4101
@cigmorfil4101 3 ай бұрын
The use of CSV is to _avoid_ non-printing control characters (other than a line break) so that the data is easily edited as plain text by a plain text editor. A plain text editor generally only understands line breaks; how control characters are displayed depends upon their programming: some may display as ^c, some may display a '?' regardless of the character, some may let the display driver decide what to do (hence the smiley face, musical notes, etc, that the original IBM PCs displayed for control characters). As there was no consensus how to handle control codes, CSVs avoided them and stuck to plain text, using commas (hence the name: _Comma_ Separated Values) requiring some sort of escape for commas - enclose a field with commas within quotes - and a mechanism to handle the quoting character within fields.
@darrennew8211
@darrennew8211 3 ай бұрын
@@cigmorfil4101 I always found this argument bizarre. ASCII was invented well before any "plain text editor" was, so saying "we changed this because plain text editors couldn't handle ASCII" sounds like working around the problems in tools rather than just fixing the tools. There was also an image format called NetPBM which was great, and one of the options was to represent all the bytes with decimal digits. Like, you could read it with BASIC even. Red would literally be "255 0 0" with nothing other than ASCII digits and spaces.
@darrennew8211
@darrennew8211 3 ай бұрын
@@cigmorfil4101 Wow. It has been *ages* since I heard anyone else who ever used Pick. :-) Blast from the past there.
@timothynewton5231
@timothynewton5231 3 ай бұрын
I'd love to hear more about the characters in line 16 through 31 and their uses and if there are still any uses for them today.
@pedro1492
@pedro1492 3 ай бұрын
the best newline character sequence is linefeed, then carriage return, because it is backwards compatible with mechanical typewriters
@CTSFanSam
@CTSFanSam 3 ай бұрын
Not on a real ASR33 teletype. PDP-11's used CR, LF, NUL. It took time to move the head from the far right to the far left. If you printed "blablabla....", CR, LF, "new stuff etc", the first letter of the next line would print as the head was returning to position one. So, Get the CR out first, then the line feed and a Null so the head could finish returning to position 0.
@HenryLoenwind
@HenryLoenwind 3 ай бұрын
@@CTSFanSam You missed the joke. On a mechanical typewriter, the lever you grab first transports the paper, and then, at the end of its travel, you pull the carriage. Just like the handle of a car door, where the travel of the handle opens the door lock and then you pull the whole door with the same handle at its end point.
@0LoneTech
@0LoneTech 3 ай бұрын
@@HenryLoenwind The levers I've used all pushed the carriage first (the literal carriage return), then feed a line once the carriage stops. I suppose your order may occur if the carriage is hard to move.
@darrennew8211
@darrennew8211 3 ай бұрын
@@0LoneTech The levers weren't even on a standard side of the typewriter when ASCII was being developed. :-)
@HenryLoenwind
@HenryLoenwind 3 ай бұрын
@@0LoneTech Are you sure? I've never used a typewriter where the lever was that stiff. Those carriages are heavy, and you have no leverage. It also makes it a bit awkward to use as you now have no way of LF without first CR and instead have to turn the knobs counting clicks to match the line spacing. The other way around, you can easily CR without LF by pushing the carriage at any other point you can touch it.
@BradHouser
@BradHouser 3 ай бұрын
The eighth bit was often used for parity checking.
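As a rough illustration of how the spare bit can carry parity, here is a sketch that uses even parity, which is an assumption made for this example (odd parity was also common, and the comment does not say which). The function names are invented.

```python
def add_even_parity(code: int) -> int:
    """Set bit 7 so that the whole byte has an even number of 1 bits."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit ASCII code")
    parity = bin(code).count("1") % 2
    return code | (parity << 7)


def check_even_parity(byte: int) -> bool:
    """True if the 8-bit value has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))  # 'C' has three 1 bits, so bit 7 gets set
print(hex(sent), check_even_parity(sent))

corrupted = sent ^ 0x01           # flip one bit "in transit"
print(hex(corrupted), check_even_parity(corrupted))
```

A single flipped bit is detected, though not located or corrected; two flipped bits would go unnoticed.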
@dlwiii3
@dlwiii3 3 ай бұрын
I still work with DB2 databases which use EBCDIC encoding!