you're probably doing case-insensitive wrong (intermediate) anthony explains

10,299 views

anthonywritescode

Comments: 37
@RedstoneLP2 2 years ago
Fun fact about 'ß' (U+00DF): it actually has a capital form ('ẞ', U+1E9E), which was added to Unicode in 2008 and "adopted as an option in standard German in 2017" (according to Wikipedia)
@elimik31 2 years ago
There are no words starting with ß, but when writing a word in all-caps for stylistic purposes, a normal ß seems a bit out-of-place and there I could imagine the uppercase version fitting better. I think this might therefore be especially useful in design, marketing etc., where you use all-caps often for stylistic purposes. I agree it's not needed in everyday writing.
@BRLN1 2 years ago
@elimik31 absolutely right, and this is the exact reason the capital ß was introduced
@nibblrrr7124 1 year ago
WER denkt sich denn bitte SO einen SCHEIẞ aus? :^) (German: "WHO on earth comes up with CRAP like this?")
@visintel 1 year ago
Great video! At 4:11, shouldn't you `continue` instead of `pass` when handling the exception? The code still works, but on an exception it reuses `c` from the previous iteration, so you may end up with duplicates.
@anthonywritescode 1 year ago
it would only matter if there were a casefold character directly next to a non-character -- fortunately there are none
@jesp1999 2 years ago
In situations where only ASCII characters are expected, does casefold have a performance drawback over just using upper or lower?
@anthonywritescode 2 years ago
it might, but probably not a meaningful one -- profile it and find out!
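One way to act on the "profile it" advice is a quick `timeit` comparison. This is only a minimal sketch: the sample string and the call count are made up for illustration, and absolute numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical ASCII-only sample; substitute data that resembles your own.
sample = "The Quick Brown Fox Jumps Over The Lazy Dog" * 10


def time_method(name, number=100_000):
    """Return total seconds for `number` calls of the str method `name` on sample."""
    method = getattr(sample, name)  # bound method, callable with no arguments
    return timeit.timeit(method, number=number)


# Time each of the three candidate methods on the same input.
timings = {name: time_method(name) for name in ("lower", "upper", "casefold")}

if __name__ == "__main__":
    for name, seconds in timings.items():
        print(f"{name:>8}: {seconds:.4f}s")
```

Measuring on strings that look like your real input matters more than the result for this toy sample.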
@dimboump 2 years ago
Ypogegramméni is the small ι (i) character below the main letter which was used to alter the pronunciation in Ancient Greek. This - arguably funny - word literally translates to "under-signed" :)
@sadhlife 2 years ago
So the examples your script generates are cases where s.lower() and s.casefold() return different things. I tried modifying the script to go the other way: cases where upper() doesn't change anything but casefold() would change the sort order. And I found nothing. It looks to me like sorting works fine with upper() instead of casefold(), even for Unicode strings. I find that very weird.
@anthonywritescode 2 years ago
these characters have a different `.casefold().upper()`:
```
130: İ: İ İ
3f4: ϴ: ϴ Θ
1e9e: ẞ: ẞ SS
2126: Ω: Ω Ω
212a: K: K K
212b: Å: Å Å
```
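A list like the one above can be reproduced by scanning every code point and comparing `c.upper()` with `c.casefold().upper()`. The sketch below is a guess at that kind of check, not the script from the video; the function name is hypothetical, and the exact output may vary with the Unicode version bundled with your Python.

```python
import sys


def casefold_upper_mismatches():
    """Yield (codepoint, char, upper, folded_upper) where the two forms differ."""
    for codepoint in range(sys.maxunicode + 1):
        c = chr(codepoint)
        upper = c.upper()
        folded_upper = c.casefold().upper()
        if upper != folded_upper:
            yield codepoint, c, upper, folded_upper


# Scanning the full code point range takes a moment but only runs once.
mismatches = list(casefold_upper_mismatches())

if __name__ == "__main__":
    for codepoint, c, upper, folded_upper in mismatches:
        print(f"{codepoint:x}: {c}: {upper} {folded_upper}")
```

For these characters, sorting by `upper()` and sorting by `casefold()` compare different keys, so the two orderings can disagree.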
@NikitaKaramov 2 years ago
lst = ["PREUẞEN", "Preuße", "preußisch"] (translation: Prussia, Prussian (nationality), Prussian (adjective)) This list will get you three different outcomes when sorted by default, by upper(), and by casefold(). Add something with "SS" in it, and sorting by lower() is now different, too!
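The three orderings can be checked directly. In this sketch the list comes from the comment above; the extra entry "PREUSSEN" is an assumed example of "something with SS in it".

```python
lst = ["PREUẞEN", "Preuße", "preußisch"]

# Default sort compares raw code points, so uppercase ASCII sorts first.
by_default = sorted(lst)              # ['PREUẞEN', 'Preuße', 'preußisch']

# upper() turns ß into "SS" but leaves capital ẞ (U+1E9E) unchanged.
by_upper = sorted(lst, key=str.upper)        # ['Preuße', 'preußisch', 'PREUẞEN']

# casefold() maps both ß and ẞ to "ss".
by_casefold = sorted(lst, key=str.casefold)  # ['Preuße', 'PREUẞEN', 'preußisch']

# Assumed extra entry containing a literal "SS".
with_ss = lst + ["PREUSSEN"]

# lower() keeps ß as ß, so "preussen" sorts ahead of every ß spelling.
by_lower_ss = sorted(with_ss, key=str.lower)
by_casefold_ss = sorted(with_ss, key=str.casefold)

if __name__ == "__main__":
    for label, result in [
        ("default", by_default),
        ("upper", by_upper),
        ("casefold", by_casefold),
        ("lower + SS", by_lower_ss),
        ("casefold + SS", by_casefold_ss),
    ]:
        print(f"{label:>14}: {result}")
```

Without the extra entry, `lower()` and `casefold()` happen to agree on this list; the "SS" entry is what separates them.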
@Naeddyr 1 year ago
The behavior of these functions has a nasty trap: they decompose characters into separate Unicode code points even when they don't have to (like the Turkish capital dotted İ, which is decomposed into i + combining dot, when just i would suffice). If you (like me) assume that len(text) is the same as len(text.lower()) (or casefold() too), you can get some pretty annoying bugs...
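The length change is easy to see in a short session. A minimal sketch, with sample words chosen only for illustration:

```python
import unicodedata

text = "\u0130stanbul"  # starts with U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = text.lower()

# Lowercasing U+0130 yields two code points: "i" followed by a combining dot.
pieces = [unicodedata.name(ch) for ch in lowered[:2]]

street = "Straße"
street_upper = street.upper()        # ß becomes "SS"
street_folded = street.casefold()    # ß becomes "ss"

if __name__ == "__main__":
    print(len(text), len(lowered))   # 8 9
    print(pieces)
    print(len(street), len(street_upper), len(street_folded))  # 6 7 7
```

Any code that computes an index in the original string and applies it to the converted string can be thrown off by this.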
@codeshowbr 2 years ago
I like your editor theme, good idea for videos and classes, I will try to replicate on neovim
@alfawal 2 years ago
I tried it before and was confused by some cases, like the Turkish letter "İ", which changes to a nonsense thing, while ("ö", "ç") stay the same. The Arabic letters also stay the same, like ("أ" ,"ب" ,"ت"). I don't recommend tinkering with the Arabic letters by copy-pasting them, because the combination of RTL and Latin/symbols will drive you crazy :)
@a-rye 1 year ago
That's quite the corner case and blind spot! The only reason I know that German character is from swear words my dad taught me in German...😅 Either way, good to know and thank you!
@samuelgunter 1 year ago
well i was expecting an "ohhh you stinky python developer >:( `.lower()` and `.upper()` create an entirely new string, wasting memory"
@pkoch 2 years ago
typo on the description: dicsord
@anthonywritescode 2 years ago
gottem :D
@pkoch 2 years ago
@anthonywritescode 🤦‍♀😂
@pastenml 2 years ago
I was doing case-insensitive wrong! Thank you sir
@amir.hessam 2 years ago
Do you have any plans to cover design patterns and best practices, especially for DS/ML cases where the most common issue is "coupling"? I've studied this stuff a lot, but I'd like to see how you tackle those issues. I always learn from you. Thanks!
@anthonywritescode 2 years ago
I don't do AI / ML / DS, sorry
@jhuyt- 2 years ago
does this relate at all to internationalization or does it only compare on code-point value?
@anthonywritescode 2 years ago
strings compare on codepoint, but casefold is based on the class of character
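To make the code point comparison concrete: uppercase ASCII letters have lower code points than lowercase ones, so a plain sort puts them first. A small sketch with made-up words:

```python
words = ["apple", "Zebra", "banana"]

# Plain comparison uses code points: ord("Z") is 90, ord("a") is 97.
plain = sorted(words)                       # ['Zebra', 'apple', 'banana']

# Sorting on the casefolded key ignores case differences.
folded = sorted(words, key=str.casefold)    # ['apple', 'banana', 'Zebra']

if __name__ == "__main__":
    print(ord("Z"), ord("a"))
    print(plain)
    print(folded)
```

This is still a code point comparison of the folded keys, not a locale-aware ordering.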
@vytah 1 year ago
For actual sorting strings for human consumption, a better way is to use PyICU's Collator class, which takes into account different sorting rules across different languages.
@user-lk7cl8vd2q 2 years ago
Is casefold() better for case-insensitive comparisons as well, then? str_a.lower() == str_b.lower() vs. str_a.casefold() == str_b.casefold()
@anthonywritescode 2 years ago
yes.... that is the entire point of this video
@user-lk7cl8vd2q 2 years ago
@anthonywritescode Good stuff, thank you!
@99.99.9 2 years ago
@anthonywritescode the point of your video was about sorting, not checking equality of 2 strings. (Note: ﬀ is LATIN SMALL LIGATURE FF, not ff.) 'ﬀ'.lower() == 'FF'.lower() is False; 'ﬀ'.casefold() == 'FF'.casefold() is True.
@anthonywritescode 2 years ago
how do you think sorting works?
@sucker10001 2 years ago
@anthonywritescode sounds like the title of your next video ;-)
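The equality case from this thread, written out as a runnable sketch. The helper name `equal_ignoring_case` is hypothetical, and the ligature is spelled as an escape so it survives copy-pasting.

```python
LIGATURE_FF = "\ufb00"  # U+FB00 LATIN SMALL LIGATURE FF


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings using their casefolded forms."""
    return a.casefold() == b.casefold()


# lower() leaves the ligature alone, so the comparison fails.
lower_matches = LIGATURE_FF.lower() == "FF".lower()          # False

# casefold() expands the ligature to "ff", so the comparison succeeds.
casefold_matches = equal_ignoring_case(LIGATURE_FF, "FF")    # True

if __name__ == "__main__":
    print(lower_matches, casefold_matches)
    print(equal_ignoring_case("Straße", "STRASSE"))
```

Sorting with `key=str.casefold` compares these same folded keys, which is why the equality and sorting cases behave alike.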
@DavidDellsperger 2 years ago
THE THEME LIVES ON!
@mrswats 2 years ago
Oooooo didn't know about casefold! Is the script you showed anywhere I could find it?
@anthonywritescode 2 years ago
eventually it'll be in github.com/anthonywritescode/explains
@mrswats 2 years ago
@anthonywritescode beautiful, thank you!
@uusserrrreesssuuu 2 years ago
wow. thnx.