you're probably doing case-insensitive wrong (intermediate) anthony explains

10,299 views

days ago

Comments: 37

@RedstoneLP2 2 years ago

Fun fact about 'ß' (U+00DF): it actually has a capital form ('ẞ', U+1E9E), which was added to Unicode in 2008 and "adopted as an option in standard German in 2017" (according to Wikipedia)
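A quick check of how Python's built-in string methods treat the two characters. This is a sketch; the results come from the Unicode data bundled with CPython 3, which applies the full (multi-character) case mappings:

```python
# Compare how str methods map lowercase 'ß' (U+00DF) and capital 'ẞ' (U+1E9E).
sharp_s = "\u00df"          # ß
capital_sharp_s = "\u1e9e"  # ẞ

for c in (sharp_s, capital_sharp_s):
    print(
        f"U+{ord(c):04X} {c!r}: "
        f"upper={c.upper()!r} lower={c.lower()!r} casefold={c.casefold()!r}"
    )
```

Note that `'ß'.upper()` gives `'SS'`, not `'ẞ'`, so the capital form only shows up in text if someone typed it explicitly; both forms casefold to `'ss'`.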

@elimik31 2 years ago

There are no words starting with ß, but when writing a word in all-caps for stylistic purposes, a normal ß looks a bit out of place, and there I could imagine the uppercase version fitting better. So I think it might be especially useful in design, marketing, etc., where all-caps is used a lot. I agree it's not needed in everyday writing.

@BRLN1 2 years ago

@@elimik31 absolutely right, and that is exactly the reason the capital ß was introduced

@nibblrrr7124 1 year ago

WER denkt sich denn bitte SO einen SCHEIẞ aus? :^) (German for roughly "WHO on earth comes up with CRAP like THIS?")

@visintel 1 year ago

Great video! At 4:11: shouldn't you `continue` instead of `pass` when handling the exception? The code still works, but on an exception it reuses `c` from the previous iteration, so you may end up with duplicates.
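A minimal, hypothetical reconstruction of the pattern being described (not the script from the video; `casefold_differs` and `skip_on_error` are made-up names). If the `try` body fails and the handler only does `pass`, the rest of the loop body still runs with `c` bound to the previous iteration's value:

```python
def casefold_differs(codepoints, skip_on_error=True):
    """Collect characters whose lower() and casefold() differ.

    chr() raises ValueError for integers outside range(0x110000). With
    skip_on_error=False the handler falls through, so the stale `c` from
    the previous iteration is checked (and collected) a second time.
    """
    found = []
    c = None
    for i in codepoints:
        try:
            c = chr(i)
        except ValueError:
            if skip_on_error:
                continue  # skip the rest of this iteration
            # otherwise fall through: `c` still holds the previous character
        if c is not None and c.lower() != c.casefold():
            found.append(c)
    return found


# 0xDF is 'ß'; 0x110000 is one past the last valid code point.
print(casefold_differs([0xDF, 0x110000], skip_on_error=True))
print(casefold_differs([0xDF, 0x110000], skip_on_error=False))
```

The duplicate only appears when a failing value directly follows a character whose `lower()` and `casefold()` differ, which matches the reply below.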

@anthonywritescode 1 year ago

it would only matter if there were a casefold character directly next to a non-character -- fortunately there are none

@jesp1999 2 years ago

In situations where only ASCII characters are expected, does casefold have a performance drawback compared to just using upper or lower?

@anthonywritescode 2 years ago

it might, but probably not a meaningful one -- profile it and find out!
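One way to "profile it and find out" with the standard library's `timeit`. This is a sketch: the sample `text` and the `bench` helper are hypothetical, and timings depend on your machine, Python version, and real data:

```python
import timeit

# Hypothetical ASCII-only input; substitute your real data before drawing conclusions.
text = "The Quick Brown Fox Jumps Over The Lazy Dog " * 10


def bench(method_name, number=100_000):
    """Return total seconds for `number` calls of str.<method_name>(text)."""
    method = getattr(str, method_name)
    return timeit.timeit(lambda: method(text), number=number)


for name in ("lower", "upper", "casefold"):
    print(f"{name:>8}: {bench(name):.3f}s")
```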

@dimboump 2 years ago

Ypogegramméni is the small ι (i) character below the main letter which was used to alter the pronunciation in Ancient Greek. This - arguably funny - word literally translates to "under-signed" :)
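The ypogegrammeni is also one of the places where case mapping changes string length. A small sketch using U+1FB3 (alpha with ypogegrammeni) and the standalone combining mark U+0345, based on the Unicode data bundled with CPython 3:

```python
import unicodedata

# U+1FB3: small alpha with the iota subscript (ypogegrammeni) underneath.
alpha_ypo = "\u1fb3"

print(unicodedata.name(alpha_ypo))
print(repr(alpha_ypo.lower()), repr(alpha_ypo.upper()), repr(alpha_ypo.casefold()))

# The combining mark on its own (U+0345) folds to a plain small iota.
print(unicodedata.name("\u0345"), repr("\u0345".casefold()))
```

`upper()` and `casefold()` both turn the subscript into a full iota (`'ΑΙ'` and `'αι'`), while `lower()` leaves the character alone.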

@sadhlife 2 years ago

So the examples your script generates are the cases where s.lower() and s.casefold() return different things. I tried modifying the script to go the other way: cases where upper() doesn't change anything, but casefold() would change the sort order. And I found nothing. To me it looks like using upper() instead of casefold() sorts fine, even with Unicode. I find that very weird.

@anthonywritescode 2 years ago

these characters have a different `.casefold().upper()`:

```
130: İ: İ İ
3f4: ϴ: ϴ Θ
1e9e: ẞ: ẞ SS
2126: Ω: Ω Ω
212a: K: K K
212b: Å: Å Å
```
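A sketch of how a list like this could be generated (not necessarily the script used for the reply). It scans every code point and keeps those whose `upper()` differs from `casefold().upper()`, printing the same "hex: char: upper casefold-upper" layout; several rows look identical on screen because the two sides are visually similar but distinct code points (for example OHM SIGN vs GREEK CAPITAL LETTER OMEGA):

```python
import sys


def upper_vs_casefold_upper():
    """Return (codepoint, char) pairs where c.upper() != c.casefold().upper()."""
    rows = []
    for i in range(sys.maxunicode + 1):
        c = chr(i)
        if c.upper() != c.casefold().upper():
            rows.append((i, c))
    return rows


for i, c in upper_vs_casefold_upper():
    print(f"{i:x}: {c}: {c.upper()} {c.casefold().upper()}")
```

The exact set can change with the Unicode version bundled with your Python.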

@NikitaKaramov 2 years ago

`lst = ["PREUẞEN", "Preuße", "preußisch"]` (translation: Prussia, Prussian (nationality), Prussian (adjective)). This list gives three different outcomes when sorted by default, by upper(), and by casefold(). Add something with "SS" in it, and sorting by lower() is different, too!
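The claim can be checked directly. A sketch; `"PREUSSEN"` is just one possible "SS" spelling to add:

```python
lst = ["PREUẞEN", "Preuße", "preußisch"]

orders = {
    "default": sorted(lst),
    "upper": sorted(lst, key=str.upper),
    "lower": sorted(lst, key=str.lower),
    "casefold": sorted(lst, key=str.casefold),
}
for name, result in orders.items():
    print(f"{name:>8}: {result}")

# With an "SS" spelling added, lower() and casefold() disagree as well:
# lower() keeps 'ß', which sorts after 's', while casefold() turns it into "ss".
lst_ss = lst + ["PREUSSEN"]
print(sorted(lst_ss, key=str.lower))
print(sorted(lst_ss, key=str.casefold))
```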

@Naeddyr 1 year ago

These functions have a nasty trap: they decompose characters into separate Unicode code points even when they don't have to (like the Turkish capital dotted İ, which is decomposed into... i + combining dot, when just i would suffice). If you (like me) assume that len(text) is the same as len(text.lower()) (or len(text.casefold())), you can get some pretty annoying bugs...
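A small sketch of the length changes being described, using escapes so the exact code points are unambiguous:

```python
# Case mappings can change the number of code points in a string.
examples = ["\u0130", "\u00df", "\ufb00"]  # İ, ß, ﬀ (ligature)

for s in examples:
    low, fold = s.lower(), s.casefold()
    print(f"{s!r} len={len(s)} | lower={low!r} len={len(low)} | casefold={fold!r} len={len(fold)}")
```

So any code that maps indices in `text.lower()` or `text.casefold()` back onto `text` needs to account for this rather than assume the positions line up.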

@codeshowbr 2 years ago

I like your editor theme; it's a good idea for videos and classes. I'll try to replicate it in Neovim.

@alfawal 2 years ago

I've tried it before and was confused by some cases, like Turkish letters: "İ" changes into a nonsense thing, while "ö" and "ç" stay the same. Arabic letters also stay the same, like "أ", "ب", "ت". I don't recommend tinkering with Arabic letters by copy-pasting them, because the mix of RTL text and Latin letters/symbols will drive you crazy :)
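A sketch of what is going on, using escapes to avoid the copy-paste and RTL issues: Arabic letters are uncased, so there is nothing to fold; "ö" and "ç" are already lowercase; and capital dotted İ folds to "i" followed by COMBINING DOT ABOVE, which is the "nonsense thing":

```python
import unicodedata

# Arabic alef-with-hamza, beh, teh; Turkish ö, ç; and capital dotted İ.
samples = ["\u0623", "\u0628", "\u062a", "\u00f6", "\u00e7", "\u0130"]

for s in samples:
    folded = s.casefold()
    names = [unicodedata.name(ch) for ch in folded]
    print(repr(s), "->", repr(folded), names)
```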

@a-rye 1 year ago

That's quite the corner case and blind spot! The only reason I know that German character is from swear words my dad taught me in German...😅 Either way, good to know and thank you!

@samuelgunter 1 year ago

well i was expecting an "ohhh you stinky python developer >:( `.lower()` and `.upper()` create an entirely new string, wasting memory"

@pkoch 2 years ago

typo on the description: dicsord

@anthonywritescode 2 years ago

gottem :D

@pkoch 2 years ago

@@anthonywritescode 🤦‍♀😂

@pastenml 2 years ago

I was doing case-insensitive wrong! Thank you sir

@amir.hessam 2 years ago

Do you have any plans to cover design patterns and best practices, especially for DS/ML cases, where the most common issue is "coupling"? I've studied this stuff a lot in general, but I want to see how you tackle those issues. I always learn from you. Thanks!

@anthonywritescode 2 years ago

I don't do AI / ML / DS, sorry

@jhuyt- 2 years ago

does this relate at all to internationalization or does it only compare on code-point value?

@anthonywritescode 2 years ago

strings compare on codepoint, but casefold is based on the class of character
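A minimal illustration of the difference: plain string comparison is by code point, while `key=str.casefold` compares the folded keys instead (the word list is made up):

```python
words = ["banana", "Apple", "apple", "Banana"]

# Default ordering compares code points, so every uppercase ASCII letter
# (A-Z = 0x41-0x5A) sorts before every lowercase one (a-z = 0x61-0x7A).
print(sorted(words))

# With key=str.casefold the folded keys are compared instead; items with
# equal keys keep their input order because sorted() is stable.
print(sorted(words, key=str.casefold))
```

Neither of these is locale-aware collation; they differ only in which strings get compared code point by code point.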

@vytah 1 year ago

For actually sorting strings for human consumption, a better way is to use PyICU's Collator class, which takes into account the different sorting rules of different languages.

@user-lk7cl8vd2q 2 years ago

Is casefold() better for case-insensitive comparisons as well, then? `str_a.lower() == str_b.lower()` vs. `str_a.casefold() == str_b.casefold()`

@anthonywritescode 2 years ago

yes.... that is the entire point of this video

@user-lk7cl8vd2q 2 years ago

@@anthonywritescode Good stuff, thank you!

@99.99.9 2 years ago

@@anthonywritescode the point of your video was sorting, not checking whether two strings are equal. (Note: ﬀ is LATIN SMALL LIGATURE FF, not "ff".) `'ﬀ'.lower() == 'FF'.lower()` is False, but `'ﬀ'.casefold() == 'FF'.casefold()` is True.
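A sketch of case-insensitive equality built on `casefold()` (the helper names are hypothetical). The second helper goes a step further and follows the Unicode Standard's "canonical caseless match" recipe, NFD(casefold(NFD(x))), because `casefold()` by itself does not normalize, so a precomposed "é" and "E" plus a combining accent still compare unequal:

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Case-insensitive equality using full case folding."""
    return a.casefold() == b.casefold()


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Also treat precomposed and decomposed accents as equal."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


print("\ufb00".lower() == "FF".lower())               # ligature ﬀ vs "FF"
print(caseless_equal("\ufb00", "FF"))
print(caseless_equal("Straße", "STRASSE"))
print(caseless_equal("\u00e9", "E\u0301"))            # é vs E + combining acute
print(canonical_caseless_equal("\u00e9", "E\u0301"))
```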

@anthonywritescode 2 years ago

how do you think sorting works?

@sucker10001 2 years ago

@@anthonywritescode sound like the title of your next video ;-)