I feel like every CS channel inevitably takes a sideways turn into linguistics whenever they spend a certain critical amount of time on string handling, and I'm 100% here for it.
@FlashheadX · 3 months ago
5:30 Actually, even in Flemish both letters of IJ are capitalized, even though they are considered two separate letters.
@geekrichieuk · 3 months ago
I'm here, trying to find Sudoku man to help me sleep, and a random guy starts teaching me Dutch by way of Scrabble and Unicode. Random guy has a great presentation. Random guy just earned a subscriber. Go you!
@mischa7406 · 3 months ago
I'm Dutch, and I think this is not a single letter. It's not a separate letter in our alphabet, so it doesn't count. Yes, if it's at the start of a word, both letters need capitalization. But that doesn't apply to the other common Dutch digraphs like 'ei' (which is called 'short ij' and pronounced the same as ij), 'eu', 'ui', 'oe', 'au', or 'ou'. So really there's no rhyme or reason to it. Oh, btw, your pronunciation is spot on, bravo!
@globalincident694 · 3 months ago
It does behave differently to other Dutch digraphs - its pronunciation varies a lot more with context. And in some cases it gets replaced with Y.
@Robin_Goodfellow · 3 months ago
Ah, so IJsselmeer has ten letters then
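Whether IJsselmeer has ten letters or nine depends on whether ij counts as one letter. A minimal TypeScript sketch that counts both ways, assuming the naive rule that every adjacent i + j is the digraph (which misfires on loanwords such as Fiji); `countDutchLetters` is a hypothetical name:

```typescript
// Count a Dutch word two ways: every typed character, and letters with "ij"
// treated as one. Assumption: any adjacent i + j is the digraph.
function countDutchLetters(word: string): { typed: number; asDigraph: number } {
  // NFKC folds the ligature code points U+0132/U+0133 into plain "IJ"/"ij" first.
  const chars = Array.from(word.normalize("NFKC"));
  let asDigraph = 0;
  for (let i = 0; i < chars.length; i++) {
    const next = i + 1 < chars.length ? chars[i + 1] : "";
    if ((chars[i] + next).toLowerCase() === "ij") {
      i++; // consume the j as part of the same letter
    }
    asDigraph++;
  }
  return { typed: chars.length, asDigraph };
}

console.log(countDutchLetters("IJsselmeer")); // typed: 10, asDigraph: 9
```

The same function gives 6 typed characters but 5 letters for a name like Merijn.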
@Excalibaard · 3 months ago
The difference between capitalizing IJ vs. Eu or Ui is that J is a consonant instead of a vowel, so the context is (more) important, I guess? IMO we should just use 'y': it's basically a J with a little I sticking out, and it already maps to the right sound in the alphabet. Maybe terrible handwriting is how we got here in the first place?
@mischa7406 · 3 months ago
@@Excalibaard J being a consonant and therefore an exception sounds like a plausible explanation (and it's what I thought as well), but I can't find facts to back it up. I think the handwriting actually points to a more likely explanation: in handwriting, 'ij' is usually written in one stroke and resembles 'y' quite a lot (ij and y were also pretty much used interchangeably). So if you then capitalized it, you'd capitalize both letters.
Edit: the reason I thought the consonant theory was correct is that there is no confusion when you capitalize both letters. It makes it very clear that ij belongs together and therefore needs to be pronounced as the digraph. If I read capital i, lowercase j, I'd probably think they were to be pronounced as individual letters. The other digraphs don't have this problem (with numerous exceptions, of course).
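The capitalization rule discussed here (a word-initial ij becomes IJ, while ei, ui, oe and the rest only get their first letter uppercased) can be sketched as follows. `capitalizeDutch` is a hypothetical helper; it assumes any leading i + j is the digraph and does not attempt full sentence title-casing:

```typescript
// Uppercase the first letter of a Dutch word, treating a leading "ij" as a
// single letter so both halves are uppercased (ijsselmeer -> IJsselmeer).
// Other digraphs such as "ei", "ui" or "oe" only get their first character
// uppercased. Assumption: any word-initial i + j is the digraph.
function capitalizeDutch(word: string): string {
  if (word.length === 0) return word;
  if (word.slice(0, 2).toLowerCase() === "ij") {
    return "IJ" + word.slice(2);
  }
  // Also covers the ligature: U+0133 (ĳ) uppercases to U+0132 (Ĳ).
  return word.charAt(0).toUpperCase() + word.slice(1);
}

console.log(capitalizeDutch("ijsselmeer")); // IJsselmeer
console.log(capitalizeDutch("eiland"));     // Eiland
```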
@MeriaDuck · 3 months ago
Very true, we're so used to computers that to us ij is simply i and j. My first name, Merijn, is six letters to type. However... when you order things alphabetically, where should words starting with IJ end up? Like ijsvogel, ijsvrij, etc. There's a presentation by Dylan where this features, with the Danish example 'Aarhus', the city.
@WitherBossEntity · 3 months ago
You are absolutely correct that letters are interesting. Fun (or not so fun if you have to work with it) fact: JavaScript strings don't have to be valid Unicode; they can contain unpaired surrogates. And the reason it allows surrogates in the first place is that when JS was created, UTF-8, the only correct character encoding, was very new, so UCS-2 was the default choice, and it was later replaced by the essentially backwards-compatible UTF-16. Internally, V8 and SpiderMonkey actually use a 1-byte encoding for strings where possible. The same thing, from the timeframe to the implementation in HotSpot, also happened with Java.
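To make "unpaired surrogate" concrete: JavaScript strings are sequences of 16-bit code units, and nothing stops a string from holding half of a surrogate pair, for example after slicing through an emoji. A minimal sketch that scans for such code units; `findLoneSurrogates` is a hypothetical name, and newer engines also offer `String.prototype.isWellFormed()` for a plain yes/no answer:

```typescript
// Return the indices of UTF-16 code units that are surrogates without a partner.
// A high surrogate (0xD800..0xDBFF) must be followed by a low surrogate
// (0xDC00..0xDFFF); a low surrogate must be preceded by a high one.
function findLoneSurrogates(s: string): number[] {
  const lone: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip over the low half
      } else {
        lone.push(i); // high surrogate with nothing valid after it
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      lone.push(i); // low surrogate with no high surrogate before it
    }
  }
  return lone;
}

// "ice " followed by U+1F9CA (ice cube), stored as the pair D83E DDCA
const paired = "ice \uD83E\uDDCA";
// slice() works on code units, so dropping the last unit cuts the pair in half
const broken = paired.slice(0, -1);

console.log(findLoneSurrogates(paired)); // no lone surrogates
console.log(findLoneSurrogates(broken)); // index 4, the dangling high surrogate
```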
@Kenionatus · 3 months ago
What the heck is an unpaired surrogate?
@iljitsch · 3 months ago
In the early days of Unicode the assumption was that 16 bits was enough. As we internet routing people know, any time you think that, you're proven wrong. (Hi, IPv6!) By that time there was already a lot of code out there that used 16-bit characters, and there was a lot of pushback against going to 31 (well, 32) bits for space-wasting reasons. So they came up with the surrogate pair encoding for another 16 "planes" of 65,536 code points each, so now there are 17 × 65,536 = 1,114,112 possible Unicode code points. That was between 1996 and 2000. And still today, JavaScript string length and indexing work in 16-bit code units, as in UCS-2, so it sees an emoji encoded as a surrogate pair as two separate characters.
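The surrogate-pair arithmetic works like this: subtract 0x10000 from a supplementary code point to get a 20-bit offset, then put the top 10 bits into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00. A short sketch, with `toSurrogatePair` as a hypothetical name; it also shows that `length` counts code units while the string iterator counts code points:

```typescript
// Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;    // 20-bit value, 0x00000..0xFFFFF
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

// Plane 0 (the BMP) plus 16 supplementary planes, 65,536 code points each
const totalCodePoints = 17 * 0x10000; // 1,114,112

const grinning = "\uD83D\uDE00";                // U+1F600, a single code point
const units = grinning.length;                  // counts UTF-16 code units
const codePoints = Array.from(grinning).length; // the string iterator walks code points
console.log(units, codePoints); // 2 1

const [hi, lo] = toSurrogatePair(0x1f600);
console.log(hi.toString(16), lo.toString(16)); // d83d de00
```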
@MeriaDuck · 3 months ago
About Baarle-Nassau/Baarle-Hertog: during some of the COVID lockdowns, travel between Belgium and the Netherlands was actually restricted, and that created some interesting situations there. That's when people realized where those bloody borders actually were. There's a similar situation across the German border, where a street is named in Dutch (Kerkstraat) on one side and in German (Kirchstrasse) on the other.
@EdwinMartin · 3 months ago
Fun facts:
- In the Netherlands, children learn the alphabet as ...w x ij z.
- Adults still pronounce it that way.
- In Dutch phone books, the IJ is next to the Y.
- They call the y the "Greek ij".
- They never use the Unicode version, but always write i and j.
@tvonwieseseen2774 · 3 months ago
Well remarked about the ordering! SQL uses collation to sort diacritics correctly for a given language, but I doubt whether a separate i and j will count as a ligature that can be used in sorting. The (still official) name for "Y" that I was taught at elementary school is "i-grec", which itself comes from French and refers to the Greek letter ypsilon/upsilon. 🙂 In modern standard Dutch, "ei" and "ij" both have the pronunciation Dylan uses. So "leiden" (to lead) and "lijden" (to suffer) sound identical. In such a case people may ask "You mean with a short ei or a long ij?". But long ago "ei" sounded more like the _a_ and "ij" more like the _y_ in _lazy_. In some dialects you can still hear the difference.
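A phone-book-style ordering, with IJ filed next to Y, needs a custom sort key, because a plain code-unit sort puts "IJsselmeer" before "Ieper". The sketch below is hypothetical (`phoneBookKey` and `sortPhoneBook` are illustrative names, not a library API): it simply treats ij as y, which interleaves ij-words and y-words rather than keeping them as two adjacent groups, and it makes no claim about what any SQL or ICU collation actually does.

```typescript
// Hypothetical phone-book sort key: fold the ĳ ligature to "ij" (NFKC),
// lowercase, then file every "ij" as if it were "y".
function phoneBookKey(word: string): string {
  return word.normalize("NFKC").toLowerCase().replace(/ij/g, "y");
}

// Sort a copy of the list by that key, comparing keys by code unit.
function sortPhoneBook(words: string[]): string[] {
  return words.slice().sort((a, b) => {
    const ka = phoneBookKey(a);
    const kb = phoneBookKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

const places = ["IJsselmeer", "Ieper", "Yerseke", "Zwolle", "Jutphaas"];
console.log(places.slice().sort()); // code-unit order: IJsselmeer comes first
console.log(sortPhoneBook(places)); // Ieper, Jutphaas, Yerseke, IJsselmeer, Zwolle
```

Lowercasing inside the key also makes the comparison case-insensitive; a real collation would add tie-breaking rules for words whose keys are equal.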
@belg4mit · 3 months ago
Yes, when in Belgium I often saw this written as an umlauted y, which adds another wrinkle to this.
@Wolfeur · 3 months ago
One of the uses of the ligature character is also to keep both glyphs side by side in vertical text.
@bujin1977 · 3 months ago
Ah, digraphs. That's where most English people trip up when they come to Wales! 😆😆 We have many that are typed on a keyboard as two separate standard English letters but correspond to a single Welsh letter. Unlike the Dutch with their "IJ", though, we don't capitalise both letters. "Dd" is one example, which corresponds to the English "Th" as in "the" or "that" (we also have "Th", and that one is always pronounced as in "think" or "thing"). We also have "Ff", which is like the English "F" (a single "F" in Welsh is pronounced as "V"). There's "Rh" and "Ph", which are like "R" and "P" but with a little more breath to emphasise the letter, and the "R" is always rolled, another thing that is fun for English people to try as there's no real equivalent. There's "Ch", which is like the "ch" at the end of "Loch" (the Scottish lake). And, of course, everyone's favourite, "Ll", which is incredibly hard to explain in text, but we have so much fun listening to people trying to pronounce it.
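A greedy left-to-right tokenizer is enough to split words into letters using the digraphs listed in this comment. It is a sketch under stated assumptions: the list contains only the seven digraphs mentioned (Welsh also has ng, among others), `welshLetters` is a hypothetical name, and greedy matching may mis-split words where two characters that happen to form a digraph belong to different parts of a compound.

```typescript
// Digraphs named in the comment; deliberately incomplete (no "ng", for example).
const WELSH_DIGRAPHS = ["ch", "dd", "ff", "ll", "ph", "rh", "th"];

// Greedy left-to-right split: take a two-character digraph when one starts
// here, otherwise a single character.
function welshLetters(word: string): string[] {
  const letters: string[] = [];
  let i = 0;
  while (i < word.length) {
    const pair = word.slice(i, i + 2);
    if (pair.length === 2 && WELSH_DIGRAPHS.indexOf(pair.toLowerCase()) !== -1) {
      letters.push(pair);
      i += 2;
    } else {
      letters.push(word.charAt(i));
      i += 1;
    }
  }
  return letters;
}

console.log(welshLetters("Llanelli")); // Ll, a, n, e, ll, i: six letters, eight characters
```

Note that the capitalized name keeps "Ll" rather than "LL", which is exactly the contrast with Dutch IJ drawn above.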
@TheJamesM · 2 months ago
My uninformed English understanding of Ll is that it's kind of Ch (as in "loch") that slides into an L. Is that at all close to accurate? Regarding Ph, is it that it's more breathy, or is it that it's breathy (aspirated) at all? If it's the latter, it relates to one of my favourite little linguistics facts: English actually has entirely consistent rules on aspiration, but they're completely implicit and learned intuitively by native speakers, and unless trained to do so we generally don't consciously notice the difference. The classic example is "pin" vs "spin": native English speakers will aspirate the former, but not the latter. It's all determined by the context the letter is found in. But that's why there's seemingly extraneous H's in words like Thailand: in Thai, the aspirated and unaspirated "T" sounds are different letters, so it's a meaningful distinction. I guess it's the same in Welsh! That would be slightly less confusing if we'd hung on to Þ and Ð. And that would also have the added benefit of distinguishing between the sounds in "cloth" and "clothe". Not that that actually causes any trouble, but it would be so much more _logical!_ (Trying to make language or orthography logical is, of course, a fool's errand.)
@bujin1977 · 2 months ago
@@TheJamesM The "Ll" sound is one of those things that is very difficult to describe through text. There are videos that show how to pronounce it, but the best explanation for the sound I can give is if you make the shape to say the letter L with your tongue against the top teeth and then blow.
@TheJamesM · 2 months ago
@@bujin1977 Oh that's helpful - I can see how that makes a sound something like what I've heard from Welsh speakers. Thanks!
@samswift-glasman8482 · 3 months ago
One of my mates moved to Amsterdam, and on visits one of our favourite things to do was to try to play Scrabble in English with the Dutch set, and also to try to play Dutch Trivial Pursuit.
@chaosflaws · 3 months ago
To be fair, the border around Baarle-Hertog and Baarle-Nassau is not representative of the boundary line in general, which is pretty well-behaved.
@janberentsen9890 · 3 months ago
'ij' (in most Dutch accents) is pronounced almost exactly like the English 'I', but without the little j-ish sound at the end (in most English accents). Also, I'm Dutch, and I and most people I know always treat it as just another combination of two letters, of which there are many others. If I see them written together, like in a puzzle, it looks more like some old leftover than like actual, modern Dutch. My guess as to why they are (or, as is more often the case nowadays, were) sometimes written together is that I and J are right next to each other in the alphabet, and some people got confused a long while ago. In Dutch there are many combinations of two vowels that make slightly different sounds: 'eu', 'ui', 'au', 'ei' (which makes exactly the same sound as 'ij'), and 'ij', to name a few. Maybe because of their alphabetical closeness, someone got confused by that specific combination, and that confusion spread. But that's just my personal hunch.
@Huntracony · 3 months ago
I disagree, the English 'I' is more rounded. It's kinda close, but the 'korte a' for example (as in 'achter') is closer to the English 'I' than 'ij' is.
@janberentsen9890 · 3 months ago
I suspect we're thinking of different English accents, where it might indeed be different like that.
@EdwinMartin · 3 months ago
We write i and j, but treat it as one letter in every other way, like capitalizing IJssel, etcetera. It's really different from ui and oe, etcetera.
@Phyzzius · 3 months ago
To make matters worse, the letter 'Y' is sometimes called 'IJ' in Dutch and used as such in puzzles. If I recall correctly the official rules specifically state that this is *not* valid in Scrabble.
@DylanBeattie · 3 months ago
...and old-timers will put y (or sometimes ÿ) instead of ij in a single box in a crossword puzzle, right? Ah, humans. ❤
@HermanDuyker · 3 months ago
@@DylanBeattie I'm Dutch. When I was a kid, I was taught to write my last name with an "ij", but without the dots. When typed, that became the "y" (see my YT username). My earliest passport had that as well, I think written or typed. Then time moved on, things got put in a computer, and at some point this got changed to "ij" (two letters). I still tended to personally use and type the "y", including one time on an airplane ticket from Amsterdam Schiphol Airport to London Heathrow and back. From NL to the UK: no problem. Everyone understands that IJ = Y, or it just got missed. I arrive in the UK, have my vacation, and try to head back. "Sir, your ticket and your passport do not agree"... It took me a lot of talking to convince the airport official that this was common in NL, and in the end I was allowed to fly. Nowadays, I'm very VERY careful when filling in any airline reservations...
@cearnicus · 1 month ago
@@HermanDuyker I have an "ij" in my last name as well. When I got my NS card (many, many years ago), it was spelled with a "y". I didn't even notice until it was time to extend the subscription; I got a new card with an "ij" ... and then later I got _another_ card with a "y" and somehow I was now paying for two subscriptions. It took some doing to get one of them canceled. And, of course, they canceled the wrong one and I still have the "y" one to this day.
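Mismatches like the ticket/passport and NS card stories above come from comparing spellings character for character. A hypothetical matching key that folds ij, the ĳ ligature, y and ÿ together might look like this; `foldDutchIj` and `namesProbablyMatch` are illustrative names, and the fold is deliberately lossy, so it is only suitable for flagging probable matches, never for rewriting the stored name.

```typescript
// Hypothetical matching key: fold the ĳ ligature, "ij", "y" and "ÿ" into "y".
// Lossy on purpose: use it to flag probable matches, never to rewrite a name.
function foldDutchIj(name: string): string {
  return name
    .normalize("NFKC")        // ĳ -> ij; y + U+0308 is composed into ÿ (U+00FF)
    .toLowerCase()            // also maps Ÿ (U+0178) to ÿ
    .replace(/\u00ff/g, "y")
    .replace(/ij/g, "y");
}

function namesProbablyMatch(a: string, b: string): boolean {
  return foldDutchIj(a) === foldDutchIj(b);
}

console.log(namesProbablyMatch("Van Dijk", "VAN DYK")); // true
console.log(namesProbablyMatch("Van Dijk", "Van Dam")); // false
```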
@logiciananimal · 3 months ago
I remember seeing a book titled _Aegypt_ as a child, where the "Ae" was a digraph, all capitalized. I found that wrong, though since the book was in English, I guess there is a possible confounding error. As for Scrabble, I once played (in Canada, where I am from) what the kids I was with called "Canadian Scrabble", where the French tiles and the English tiles are mixed and you play with both at once, scoring the tile as depicted, so Q becomes pretty weak if you draw a French one, but W becomes important. One is allowed to use English tiles in French words, and conversely, of course, and the game stops when half the tiles are played (instead of all; the board gets too cluttered otherwise). It makes for some weird strategy, that's for sure.
@JivanPal · 3 months ago
Regarding "AEgypt", that would be a simplistic rendering of capital "ash", Æ, which appears in many Latin words that have retained their spellings in British English, such as "aether" and "encyclopaedia". In the card game Magic: The Gathering, despite it being an American game and "aether" being spelled "ether" in American English (as in "Ethernet"), the word "aether" was historically always rendered/printed as "æther/Æther", as in the card name "Ætherling". This appeared relatively rarely and was included with the rationale of enhancing the fantasy flavour of the game. However, one game expansion focused heavily on the concept of aether, and it was decided that the spelling "aether" should be adopted henceforth instead, to prevent the æ/Æ symbol from appearing everywhere and potentially putting off or confusing too many players.
@Huntracony · 3 months ago
I think 'ij' as a separate character is being killed by the standard Dutch keyboard layout. It does not have a separate 'ij', which means most people who grew up typing on keyboards will see it as two characters. There are typewriters with a separate 'ij', I guess because those are generally monospaced? idk.
@feicodeboer · 3 months ago
As far as I'm concerned, the standard Dutch keyboard layout is indeed US International with dead keys.
@gubigm · 3 months ago
Have you ever heard of a funny language like Hungarian? We have 42 letters in the alphabet, yet we still use Latin characters (with some funny accents...). So we have letters like cs, dz, dzs, ly, ny, sz, ty, zs. Yes, those are separate letters in the alphabet.
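Hungarian adds a wrinkle the Dutch and Welsh cases lack: dz and dzs are both letters, so a tokenizer has to try the longest candidate first. A minimal sketch under stated assumptions: the multigraph list is the one in this comment plus gy (also a Hungarian digraph), `hungarianLetters` is a hypothetical name, and doubled long consonants such as ssz are not handled.

```typescript
// Multigraphs ordered longest first, so "dzs" is tried before "dz".
// Assumption: the comment's list plus "gy"; doubled forms like "ssz" are ignored.
const HU_MULTIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"];

// Longest-match split of a Hungarian word into alphabet letters.
function hungarianLetters(word: string): string[] {
  const letters: string[] = [];
  let i = 0;
  while (i < word.length) {
    let match = word.charAt(i); // fallback: a single character
    for (const g of HU_MULTIGRAPHS) {
      if (word.slice(i, i + g.length).toLowerCase() === g) {
        match = word.slice(i, i + g.length); // keep the original casing
        break;
      }
    }
    letters.push(match);
    i += match.length;
  }
  return letters;
}

console.log(hungarianLetters("dzsungel")); // dzs, u, n, g, e, l
console.log(hungarianLetters("Szeged"));   // Sz, e, g, e, d
```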
@Zainjerr · 3 months ago
🙌
@Stoney_Eagle · 3 months ago
Your thumbnail confused me a bit: why am I reading Dutch on one of your videos... Am I seeing this right? 😂 It can also mean ice-free or without ice.
@stevenjlovelace · 3 months ago
What if you used "ÿ" instead?
@passantNL · 3 months ago
In the seventies and eighties, that was what many people did on cheap non-localized typewriters: just type y, then backspace and add " so it looked like ÿ, which is how ij looked when handwritten.
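In Unicode terms, that typewriter overstrike is y followed by U+0308 COMBINING DIAERESIS, and normalization form NFC composes the pair into the single precomposed code point U+00FF (ÿ). A quick check using the standard `String.prototype.normalize`:

```typescript
// Overstrike equivalent: "y" followed by U+0308 COMBINING DIAERESIS.
const overstruck = "y\u0308";
// NFC composes the pair into the precomposed letter U+00FF (ÿ).
const composed = overstruck.normalize("NFC");

console.log(overstruck.length, composed.length);        // 2 1
console.log(composed === "\u00ff");                      // true
// NFD decomposes it back into base letter + combining mark.
console.log(composed.normalize("NFD") === overstruck);   // true
```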
@pooroldnostradamus · 3 months ago
ijsvrij with mij little ij
@sauliustb · 3 months ago
You pronounce ijsvrij pretty well, but ij as a letter (digraph) just doesn't sound right...
@iljitsch · 3 months ago
The "ijsvrij" pronunciation isn't bad, but not exactly perfect... The trouble is that if you didn't grow up hearing certain sounds, it's really hard to pick up the subtleties later. Like the difference between "bat" and "bet" for us Dutch people.

I first saw a computer when I was twelve, so I learned to write in the analog days. What they taught us was an ij that looks like a ü but with the descender from a g. And we'd write the y the same way, except without the ¨. (Although y doesn't exist in Dutch; we only see it in loanwords.) Even in block letters you'd write ij like that. But then I asked myself: how was this in print in the pre-computer days? And all the books I checked, all the way back to 1952, use the separated i and j form.

Note that the ij and IJ ligatures are only in Unicode for backward compatibility, and their use is not recommended. Also, many fonts don't implement them, and in all the fonts I checked that weren't fixed-width, a plain i and j looks identical to the ligature character, i.e., ij and ĳ. Font kerning is now such that the j descender goes under the i, so the two letters sit close enough to look normal without the font even having a ligature. However, in fixed-width fonts i j looks bad, and that's probably why typewriters used to have a separate ij key.
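The "compatibility only" status shows up in normalization: U+0132/U+0133 (Ĳ/ĳ) carry compatibility decompositions to plain IJ/ij, so NFKC folds them away while NFC leaves them untouched. Æ/æ, by contrast, have no decomposition at all and survive even NFKC. A small check with the standard `String.prototype.normalize`:

```typescript
// Ligature ĳ/Ĳ (U+0133/U+0132) vs. the letter æ (U+00E6).
const samples = ["\u0133s", "\u0132sselmeer", "\u00e6ther"];

for (const s of samples) {
  // NFC keeps all three as-is; NFKC rewrites only the ij ligatures.
  console.log(s, s.normalize("NFC") === s, s.normalize("NFKC"));
}
```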