Fun fact about 'ß' (U+00DF): it actually has a capital form ('ẞ', U+1E9E), which was added to Unicode in 2008 and "adopted as an option in standard German in 2017" (according to Wikipedia)
@elimik31 2 years ago
There are no words starting with ß, but when writing a word in all-caps for stylistic purposes, a normal ß seems a bit out-of-place and there I could imagine the uppercase version fitting better. I think this might therefore be especially useful in design, marketing etc., where you use all-caps often for stylistic purposes. I agree it's not needed in everyday writing.
@BRLN1 2 years ago
@@elimik31 absolutely right and this is the exact reason and purpose the capital-ß was introduced for
@nibblrrr7124 1 year ago
WER denkt sich denn bitte SO einen SCHEIẞ aus? :^) (German: "WHO on earth comes up with THIS kind of CRAP?")
@visintel 1 year ago
Great video! 4:11 : shouldn't you continue instead of pass in handling the exception? code still works but it will reuse c from previous iteration on exception and you may end up with duplicates.
@anthonywritescode 1 year ago
it would only matter if there were a casefold character directly next to a non-character -- fortunately there are none
@jesp1999 2 years ago
In situations where only ascii characters are expected, does casefold have a performance drawback over just using upper or lower?
@anthonywritescode 2 years ago
it might, but probably not a meaningful one -- profile it and find out!
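One way to follow that advice is a quick `timeit` comparison. This is only a sketch: the sample string, the helper name `time_method`, and the repetition count are arbitrary assumptions, and the numbers will vary by interpreter and machine.

```python
import timeit

# Hypothetical ASCII-only sample; swap in data that resembles your own.
SAMPLE = "The Quick Brown Fox Jumps Over The Lazy Dog" * 10


def time_method(method_name: str, number: int = 100_000) -> float:
    """Return total seconds for `number` calls of SAMPLE.<method_name>()."""
    method = getattr(SAMPLE, method_name)
    return timeit.timeit(method, number=number)


if __name__ == "__main__":
    for name in ("lower", "upper", "casefold"):
        print(f"{name:>8}: {time_method(name):.4f}s")
```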
@dimboump 2 years ago
Ypogegramméni is the small ι (i) character below the main letter which was used to alter the pronunciation in Ancient Greek. This - arguably funny - word literally translates to "under-signed" :)
@sadhlife 2 years ago
So the examples that your script generates are the cases where s.lower() and s.casefold() return different things. I tried modifying the script to go the other way: cases where upper() doesn't change anything but casefold() would change the sort order. I found nothing. It looks like sorting would work fine, even with Unicode, if I used upper() instead of casefold(). I find that very weird.
@anthonywritescode 2 years ago
these characters have a different `.casefold().upper()`:
```
130: İ: İ İ
3f4: ϴ: ϴ Θ
1e9e: ẞ: ẞ SS
2126: Ω: Ω Ω
212a: K: K K
212b: Å: Å Å
```
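A list like the one in this reply can be reproduced with a short scan over every code point. This is a sketch and not the script from the video: the function name is made up for illustration, and the exact set of characters found depends on the Unicode version bundled with your Python.

```python
import sys


def upper_vs_casefold_upper() -> list[tuple[int, str, str, str]]:
    """Collect characters whose upper() differs from casefold().upper()."""
    rows = []
    for codepoint in range(sys.maxunicode + 1):
        char = chr(codepoint)
        direct = char.upper()
        via_casefold = char.casefold().upper()
        if direct != via_casefold:
            rows.append((codepoint, char, direct, via_casefold))
    return rows


if __name__ == "__main__":
    # Columns: code point (hex), character, upper(), casefold().upper()
    for codepoint, char, direct, via_casefold in upper_vs_casefold_upper():
        print(f"{codepoint:x}: {char}: {direct} {via_casefold}")
```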
@NikitaKaramov 2 years ago
lst = ["PREUẞEN", "Preuße", "preußisch"] (translation: Prussia, Prussian (nationality), Prussian (adjective)) This list will get you three different outcomes when sorted by default, by upper(), and by casefold(). Add something with "SS" in it, and sorting by lower() is now different, too!
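The claim is easy to check directly. In the sketch below, the variable names and the extra entry "Preussen" (standing in for "something with SS") are assumptions added for illustration.

```python
words = ["PREUẞEN", "Preuße", "preußisch"]

by_default = sorted(words)
by_upper = sorted(words, key=str.upper)
by_casefold = sorted(words, key=str.casefold)

print(by_default)   # ['PREUẞEN', 'Preuße', 'preußisch']
print(by_upper)     # ['Preuße', 'preußisch', 'PREUẞEN']
print(by_casefold)  # ['Preuße', 'PREUẞEN', 'preußisch']

# "Preussen" is an assumed extra entry, not part of the original list.
extended = words + ["Preussen"]
by_lower_ext = sorted(extended, key=str.lower)
by_casefold_ext = sorted(extended, key=str.casefold)

print(by_lower_ext)     # ['Preussen', 'Preuße', 'PREUẞEN', 'preußisch']
print(by_casefold_ext)  # ['Preuße', 'PREUẞEN', 'Preussen', 'preußisch']
```

With only the three original words, lower() and casefold() happen to agree; they diverge once a plain "ss" spelling is in the list, because lower() keeps "ß" as a single character that sorts after "s".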
@Naeddyr 1 year ago
The behavior of these functions has a nasty trap: they decompose characters into separate Unicode code points even when they don't have to (like the Turkish capital dotted İ, which is decomposed into i + combining dot when just i would suffice). If you (like me) assume that len(text) is the same as len(text.lower()) (or casefold(), too), you can get some pretty annoying bugs...
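A small sketch of the length trap described here; the variable names are chosen for illustration only.

```python
dotted_capital_i = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
sharp_s = "\u00df"           # ß, LATIN SMALL LETTER SHARP S

lowered = dotted_capital_i.lower()
print(len(dotted_capital_i), len(lowered))   # 1 2
print([f"U+{ord(c):04X}" for c in lowered])  # ['U+0069', 'U+0307']

# Upper-casing and case-folding can also grow a string.
print(len(sharp_s), len(sharp_s.upper()), len(sharp_s.casefold()))  # 1 2 2
```

Any code that maps indexes from the transformed string back onto the original needs to account for this.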
@codeshowbr 2 years ago
I like your editor theme, good idea for videos and classes, I will try to replicate on neovim
@alfawal 2 years ago
I've tried it before and was confused by some cases, like Turkish letters, where "İ" changes to a nonsense thing while ("ö", "ç") stay the same. The Arabic letters also stay the same, like ("أ" ,"ب" ,"ت"). I don't recommend tinkering with the Arabic letters by copy-pasting them, because the combination of RTL and Latin/symbols will drive you crazy :)
@a-rye 1 year ago
That's quite the corner case and blind spot! The only reason I know that German character is from swear words my dad taught me in German...😅 Either way, good to know and thank you!
@samuelgunter 1 year ago
well i was expecting an "ohhh you stinky python developer >:( `.lower()` and `.upper()` create an entirely new string, wasting memory"
@pkoch 2 years ago
typo on the description: dicsord
@anthonywritescode 2 years ago
gottem :D
@pkoch 2 years ago
@@anthonywritescode 🤦‍♀️😂
@pastenml 2 years ago
I was doing case-insensitive wrong! Thank you sir
@amir.hessam 2 years ago
Do you have any plans to cover design patterns and best practices, especially for DS/ML cases, where the most common issue is "coupling"? I've studied this stuff a lot in general, but I want to see how you tackle those issues. I always learn from you. Thanks
@anthonywritescode 2 years ago
I don't do AI / ML / DS, sorry
@jhuyt- 2 years ago
does this relate at all to internationalization or does it only compare on code-point value?
@anthonywritescode 2 years ago
strings compare on codepoint, but casefold is based on the class of character
@vytah 1 year ago
For actually sorting strings for human consumption, a better way is to use PyICU's Collator class, which takes into account different sorting rules across different languages.
@user-lk7cl8vd2q 2 years ago
Is casefold() better for case-insensitive comparisons as well, then? str_a.lower() == str_b.lower() vs. str_a.casefold() == str_b.casefold()
@anthonywritescode 2 years ago
yes.... that is the entire point of this video
@user-lk7cl8vd2q 2 years ago
@@anthonywritescode Good stuff, thank you!
@99.99.9 2 years ago
@@anthonywritescode the point of your video was about sorting, not checking equality of 2 strings. (Note: ﬀ is LATIN SMALL LIGATURE FF, U+FB00, not ff.) 'ﬀ'.lower() == 'FF'.lower() is False, while 'ﬀ'.casefold() == 'FF'.casefold() is True.
@anthonywritescode 2 years ago
how do you think sorting works?
@sucker10001 2 years ago
@@anthonywritescode sounds like the title of your next video ;-)
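The equality comparison discussed in this thread can be sketched as follows. The helper name `equal_ignoring_case` is hypothetical, and the ligature is written as an escape so it cannot be confused with two plain "f" characters.

```python
def equal_ignoring_case(left: str, right: str) -> bool:
    """Compare two strings case-insensitively using casefold()."""
    return left.casefold() == right.casefold()


ligature = "\ufb00"  # ﬀ, LATIN SMALL LIGATURE FF

print(ligature.lower() == "FF".lower())          # False
print(equal_ignoring_case(ligature, "FF"))       # True
print(equal_ignoring_case("Straße", "STRASSE"))  # True
```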
@DavidDellsperger 2 years ago
THE THEME LIVES ON!
@mrswats 2 years ago
Oooooo didn't know about casefold! Is the script you showed anywhere I could find it?
@anthonywritescode 2 years ago
eventually it'll be in github.com/anthonywritescode/explains