4:07 It was already mentioned here, but that ß is a so-called "sharp S", basically a mix between an S and a Z in German. It is often written as a double S, like in the example in the video. The beta symbol is longer at the bottom and not opened up as much.
@jon-partlee-sayneАй бұрын
Good you mentioned it! A disaster like that can't be taken lightly.
@GoobsterGooberGooАй бұрын
@@jon-partlee-sayne literally unwatchable, germans will never recover from this
@ericbarlow6772Ай бұрын
Yeah, and it used to exist in English until the 19th century. FYI, I also learned that German letter as an ess-zett (sz) as well as a sharp s.
@toraxmaluАй бұрын
And transforming ß=>ss / ẞ=>SS is totally correct. So what is he crying about here?! Also ä=ae, ö=oe and ü=ue... Mostly: emojis have to be usable in UTF-8 filenames. No ifs or buts.
@sasjadevriesАй бұрын
The β does look similar to the ß 🤷♂. A disc that can't handle ß is komplete Scheiβe 😂😂.
@HuskyNETАй бұрын
As a software developer for 25 years, I’m regularly using typographic characters and actually even emojis in some of my file names, and you can’t stop me. I paid for the whole Unicode, I’m gonna use the whole Unicode.
@stupidburpАй бұрын
🔥
@ZeawiАй бұрын
You paid for it...?
@shantilkhadatkar1195Ай бұрын
@@ZeawiYour pfp is cool
@__MerchantАй бұрын
I spotted a psychopath.
@jadespriteАй бұрын
Y'all pay for your unicode? I got mine for free.
@szaszm_Ай бұрын
You should absolutely use emojis in a file name, if you're a developer, to test that your software handles them correctly.
@aqua-beryАй бұрын
No better time to test than on your personal machine
@szaszm_Ай бұрын
@@aqua-bery yes
@aelsi2Ай бұрын
@@aqua-bery By adding emojis to the name of your home folder
@AmirHosseinHonardustАй бұрын
Actually I was thinking of ensuring that my application crashes with an explicit message, when those files are supposed to be shared with others.
@OMGcluelessАй бұрын
@@AmirHosseinHonardust I think that ship has sailed. Most OSes, IDEs and the like have decided to support them, so you shouldn’t try to force people not to use them. “.🔥” is the default file extension for the Mojo programming language, as one example. For better or worse, emojis in file names are here to stay.
@isaacbarahonahidalgo9427Ай бұрын
Can't stop me, using emojis on my fstab
@Terra101Ай бұрын
xd
@PandacierАй бұрын
wait no what the actual fu-
@darukutsuАй бұрын
/home/¯\_(ツ)_/¯
@siz1700Ай бұрын
💞❤️🩹😏
@TheKevinGDXАй бұрын
xd
@jaakkohintsala2597Ай бұрын
I have never wanted to rename my home folder to an emoji of a house more than now
@unitrader403Ай бұрын
but which House do you want? the blue one or the yellow one? :D
@MegalomaniakaalАй бұрын
@@unitrader403 yes
@inertia_daggerАй бұрын
@@unitrader403🏚️ this one
@_nishantk_Ай бұрын
@@unitrader403 the blue one
@aurelia_the_jellyАй бұрын
Use Starship and you can have that without all the hassle of an emoji filename ;) I have my home path substituted by the icon too.
@dragonwisardАй бұрын
According to POSIX, any character is acceptable in a Unix filename, except for forward slash (path separator) and the null byte (string terminator).
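The rule is small enough to write down as a check. A minimal Python sketch, assuming names are handled as raw bytes; `is_valid_component` is a hypothetical helper, and it deliberately ignores length limits and the reserved `.` and `..` entries:

```python
def is_valid_component(name: bytes) -> bool:
    """Return True if `name` contains no byte that Unix forbids in a filename."""
    if not name:
        return False  # an empty component is not a usable name
    # Only the path separator and the string terminator are excluded.
    return b"/" not in name and b"\x00" not in name


print(is_valid_component("🔥.txt".encode("utf-8")))  # emoji are just bytes here
print(is_valid_component(b"two\nlines"))             # a newline is allowed
print(is_valid_component(b"a/b"))                    # a slash is not
```

Everything else, including control characters and byte sequences that are not valid UTF-8, passes.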
@lhplАй бұрын
Right. The only proper advice to programmers is to handle this correctly, as in "just assume names are some bytes".
@gdclemoАй бұрын
You can even have filenames which correspond to invalid UTF-8 encodings, such as [byte 255] which is not valid UTF-8.
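For anyone wondering how a program can carry such a name around without losing it: a short Python sketch, assuming the name arrives as bytes. The `surrogateescape` error handler is the mechanism Python itself uses for filenames on POSIX systems; the sample name is made up.

```python
raw = b"report-\xff.log"  # 0xFF never appears in valid UTF-8

# A strict decode fails, which is what trips up naive tools.
try:
    raw.decode("utf-8")
    strictly_valid = True
except UnicodeDecodeError:
    strictly_valid = False

# surrogateescape turns each undecodable byte into a lone surrogate,
# so the original bytes can be recovered exactly later on.
text = raw.decode("utf-8", errors="surrogateescape")
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(strictly_valid, round_tripped == raw)
```

The decoded string is not printable everywhere (a lone surrogate cannot be encoded strictly), so it is only safe for passing back to file APIs, not for logging as-is.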
@dragonwisardАй бұрын
Using control characters in filenames is also fun, and can break a lot of scripts and utilities. From the perspective of a C programmer, I can see how this might have made sense initially, but once the Bourne shell came along and people started using line-oriented utilities, this straight up breaks things. A filename can contain a newline or carriage return or bell or tab or any other byte. In Bash, the best practice is to delimit names with the null byte (for example, find -print0 piped into a loop using IFS= read -r -d ''), but I can tell you with confidence that it's rarely done (correctly) in production scripts, and even a lot of C utilities make false assumptions about which characters are valid. Great way to catch vendors with their pants down. So simple and effective it should be as well known as buffer overflows and race conditions. With a little creativity, you can absolutely use this for privilege escalations.
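To make the line-oriented failure concrete, here is a small Python sketch, assuming a list of names is passed between programs as one byte stream. The names and the helper `split_nul_delimited` are illustrative only.

```python
from typing import List


def split_nul_delimited(stream: bytes) -> List[bytes]:
    """Split a NUL-delimited name list (like `find -print0` output) into names."""
    return [part for part in stream.split(b"\x00") if part]


names = [b"plain.txt", b"two\nlines.txt", b" leading space"]

# Newline-delimited: the embedded newline silently creates a bogus extra entry.
newline_stream = b"\n".join(names) + b"\n"
naive = newline_stream.splitlines()

# NUL-delimited: NUL cannot occur inside a name, so the split is unambiguous.
nul_stream = b"\x00".join(names) + b"\x00"
robust = split_nul_delimited(nul_stream)

print(len(names), len(naive), len(robust))
```

The naive split reports four names for three files, and two of them do not exist, which is exactly the kind of confusion a hostile filename can exploit.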
@animowany111Ай бұрын
@@dragonwisard I actually have an extremely cursed file with invalid UTF8 in my home folder. It also starts with a space and includes a newline. It's a crash report log from a severely memory corrupted program. I love seeing what breaks and what doesn't break when it encounters that file.
@kebien6020Ай бұрын
My reason: The anime is called "Fate/kaleid liner Prisma☆Illya". So that is what I'll use as my folder name, and you can't stop me.
@RedSntDKАй бұрын
Speaking of anime, that's the one time my little brother messaged me wanting a solution. He had, ahem, acquired an anime with such a long title that the file his download program produced was too long for regular Windows Explorer to handle. Good thing that copy utilities like ultracopier and fastcopy exist. Since I moved to Linux in January I haven't really needed to use a copy utility, but I do have ultracopier installed just in case. Shame it doesn't integrate into Dolphin.
@temari2860Ай бұрын
When I was first learning Godot about 2.5 years ago by making a mobile game, I got stuck for like 2 days trying to make a testing build for my game. The error messages were completely useless, docs, forums, google search didn't help either. Finally on the third day I decided to make a new empty project, copy the changes from my game and make a build after every change to see when it fails. This new project succeeded in building at every step from an empty project to a full exact copy of the one that doesn't build. Then I copied the only thing that remained different: project's directory name. The build failed. It was a colon in the project directory name.
@RadikaRulesАй бұрын
That is actually painful
@aqua-beryАй бұрын
Oh my god... If that's a problem, Godot definitely should've complained to you when you were making the project first...
@temari2860Ай бұрын
@@aqua-bery It wasn't Godot's problem directly to be fair. If you build for Android in Godot it uses the Gradle build tool, which was the failing link in this case.
@RudxainАй бұрын
I remember being so frustrated trying to run an MKSH script on Android (not Termux, it was Llamalab Automate) and the problem was that my editor used a non UTF-8 encoding and MS-DOS line endings
@bryanpediniАй бұрын
sorry, but after having seen enough projects use it and how it fails in spectacular ways, I can only say: fuck gradle, with all my ❤️
@CEOofGameDevАй бұрын
4:08 >eszett Brodie:"the beta symbol" Germans in shambles.
@RalzoneАй бұрын
I mean... Not a lot of Germans who only speak dutch
@unitrader403Ай бұрын
SCHEIẞE
@kreuner11Ай бұрын
@@Ralzone a beta symbol is a visually different symbol
@MateuLeGrillepainАй бұрын
Based and not austerity-pilled /j
@alpacamale2909Ай бұрын
I'm not a German so I always called it 'the fat B'
@lyranemАй бұрын
Software handling Unicode incorrectly is a software problem, not a user problem. Saying “you shouldn’t do that” is bad practice.
@JonBraseАй бұрын
Software choking on emojis is a software problem. Users using emojis in filenames is a user problem. Two separate problems, but they interact.
@danielrhouckАй бұрын
Software should absolutely handle it. Especially because things that fail on emoji often fail on anything outside the BMP and I donʼt want to tell anyone using non-BMP languages that sorry, they canʼt name files something reasonable in their language. That said, it is still worth telling people that they should be *aware* that something will more likely break if they do stuff thatʼs farther outside the test conditions. As an analogy, if I break into your house and steal your stuff, thatʼs obviously my fault and I should go to jail. You did not “have it coming” or anything like that. But if I was only able to do that because you didnʼt lock your door or only used a MasterLock lock, then you should have been warned that this is less safe.
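A quick way to see why emoji and non-BMP scripts fail together: both need two UTF-16 code units per character. A minimal Python sketch; the helper names are made up for illustration.

```python
def is_outside_bmp(ch: str) -> bool:
    """True if the code point lies above U+FFFF, i.e. outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


samples = ["ß", "🔥", "\U00020BB7"]  # sharp s, fire emoji, a CJK Extension B ideograph
for ch in samples:
    print(f"U+{ord(ch):04X} outside BMP: {is_outside_bmp(ch)}, UTF-16 units: {utf16_units(ch)}")
```

Code that assumes one 16-bit unit per character handles ß fine and mangles the other two in the same way.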
@JonBraseАй бұрын
@@danielrhouck Indeed it should. But users using emojis in filenames is problematic in ways that have nothing to do with the ability of software to handle it.
@AmirHosseinHonardustАй бұрын
That is a fair point. But also, using Unicode characters that are not easily typeable on your keyboard, or that require third-party keyboards, when that file should be used by others, is an asshole move. Much more than using spaces in the name. So if it is on your own computer, sure, go ahead. But if you are dealing with software that handles shared files, I would want to gate-keep the hell out of these assholisms. If the character is fairly easy to type on some keyboard layouts, sure.
@lhplАй бұрын
@@JonBrase Users naming their files however they like is not a problem. If you think it is, then _you_ have a problem. If you develop software for others, that is based on your views, then you _are_ a problem.
@deefdragonАй бұрын
You should be able to use emoji in files, not for emoji specifically, but because extended unicode should be supported. limiting to just a-zA-Z0-9_-. and space etc. is very anglo-centric, and I think other languages should be able to use files in their native language. Emoji are simply a subset of utf8, and so should be as valid as nearly any other unicode character.
@yuvalneАй бұрын
yup
@angeldude101Ай бұрын
The problem really isn't emoji, but rather inconsistent case-folding. Filesystems need file names to be normalised in the same way every time. Case-folding however is ambiguous and can change depending on not just the version, but also the region, which is a _terrible_ thing for compatibility. The solution is to not have filesystem-level case folding and to leave case-insensitive behavior to be handled in userspace. Generally if a piece of software requires a filesystem to be case-insensitive, then either the programmers are idiots, or the software is so old it probably predates unicode, and so casefolding for all of unicode is complete overkill.
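As a rough illustration of how folding merges names, here is a Python sketch. It uses `str.casefold`, which applies Unicode full case folding from whatever Unicode version the interpreter ships; that is an assumption for the demo, not a description of what any particular filesystem does. `find_fold_collisions` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_fold_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names by their case-folded form and keep only groups with clashes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


names = ["Straße.txt", "STRASSE.TXT", "strasse.txt", "notes.md"]
collisions = find_fold_collisions(names)
print(collisions)
```

Three distinct byte strings end up as one name, and a different folding table (or a different version of the same table) could group them differently.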
@crusaderanimation6967Ай бұрын
Dziękuje
@bleack8701Ай бұрын
Exactly
@jonathanbuzzard1376Ай бұрын
Users have always done idiotic things with filenames. Let's start with Mac users starting file names with a space to change the sorting order. Then again we have Mac users using \ to put dates in the file name rather than ISO 8601. Of course, notably, \ is illegal in SMB. Then of course there is putting newline characters in file names. I mean, that really takes some effort. Every couple of months I have to take a look at the failed files in the backup and start emailing users to fix their file names if they want their stuff backed up.
@alarii2582Ай бұрын
I have a bunch of archived KZbin videos with emoji in their filenames
@remixedcatАй бұрын
Same. Specially lofi mixes
@ZadigАй бұрын
The lower-case beta you mentioned as an example isn't a beta. It's an eszett, a character used only in German that's (more or less, I'm simplifying) interchangeable with a double s.
@somenameidk5278Ай бұрын
ß vs β
@DeronJАй бұрын
❤
@kuhluhOGАй бұрын
Since the last spelling reform, it isn't anymore. Double s and ß have had different purposes since then. That doesn't mean it isn't still sometimes done when the computer system you interact with is outdated (or shit, if it's new).
@vincentschult1725Ай бұрын
@@kuhluhOG Afaik double s is a valid replacement for ß if you cannot type it. In Swiss High German it is even the rule that everywhere German High German would use an ß, a double s is used. In German High German, iirc, the only difference between a double s and ß is that the preceding vowel is not shortened when using ß, while both produce (ignoring exceptions ofc) a sharp s sound.
@kuhluhOGАй бұрын
@@vincentschult1725 the ß has one additional thing in German High German: the preceding vowel is lengthened (similar to a silent h)
@SophiaGlencairnАй бұрын
The Fire emoji is a valid file extension for mojo files.
@linusbrendelАй бұрын
Was about to comment about Mojo
@GSBarlevАй бұрын
Another reason to dislike Mojo, lol (srsly, fam, just use *numba)*
@johnpenner5182Ай бұрын
mojo is amazing! it allows you to write fast, portable, GPU-agnostic code. 🔥
@theredtechengineer1480Ай бұрын
You can't stop me. I use emojis and accent characters in file names.
@stupidburpАй бұрын
🎩
@unknowntotherestoftheworldАй бұрын
when you name a file with a character that normalizes into a slash and the file now points to a different file in a different directory
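Such characters do exist. A small Python sketch using the fullwidth solidus (U+FF0F), which compatibility normalization maps to an ordinary slash. Whether any given filesystem or tool applies NFKC to names is not claimed here; the sample name is made up.

```python
import unicodedata

FULLWIDTH_SOLIDUS = "\uff0f"  # looks like a slash, but is a separate code point

name = "reports" + FULLWIDTH_SOLIDUS + "2024"

# Canonical normalization leaves the character alone...
nfc = unicodedata.normalize("NFC", name)
# ...while compatibility normalization replaces it with a real path separator.
nfkc = unicodedata.normalize("NFKC", name)

print("/" in nfc, "/" in nfkc)
```

Any code that normalizes with a compatibility form after validating the name would turn one path component into two.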
@goasererАй бұрын
Emoji? I don't even put whitespace in my filenames. Staying DOS compatible, just in case...
@EmayeahАй бұрын
relatable, I hate how Windows has program\ files...
@seto007Ай бұрын
Not even just for compatibility with older software and file systems; it just makes specifying the path in a terminal significantly more convenient
@GanerrrАй бұрын
bro ill use like half of unicode but never anything that forces bash to use quotes lol
@jello3456543Ай бұрын
8.3 for ever
@athomashoweАй бұрын
Honestly lack of white space wouldn't be the end of the world but the char limit is brutal
@vsmash2Ай бұрын
4:07 My brother in Arch, that's an Eszett, also known as "sharp S". German thing.
@JamesR624Ай бұрын
Yeah... if someone doesn't know what something is and just makes something up so they SEEM knowledgeable, I lose respect for them and instantly stop watching, since they've shown they can't be trusted to actually be knowledgeable.
@mme725Ай бұрын
@@JamesR624 I don't think he was pretending to know, or that he saw it and consciously thought he had to make something up. I think it was a simple case of him mistaking it for the lowercase beta.
@myhandleiswhatАй бұрын
@@JamesR624 my brother in youtube comments, it isn't that big of a deal. This take literally takes "someone was wrong on the internet" to another level.
@akam9919Ай бұрын
@@myhandleiswhat my little brother in youtube comments...you're not wrong
@whohan779Ай бұрын
What do you expect? He literally said “Firstly, most people don't really have an emoji selector on their system” while those _most people_ (probably well >70%) literally are on Windows 10/11 where you can just use 🪟+[ . ] or KDE where the shortcut may be different by default but a selector is still a core component that can be searched for. Normally I like this channel but this vid is an L.
@hoi-polloi1863Ай бұрын
Emojis are just Unicode characters, so they're fair game for filenames. If nothing else, having an emoji in the name will keep most of the Linux kids out of your files, because they don't know how to type 'em. Side-note... way, *way* back we kids would "protect" our Apple II files by having the filenames have the bell character (ctrl-G?) in them. You couldn't see the bell character, so when you listed the directory you'd see the filenames, hear the bell chime, and you couldn't access the file unless you knew where in the string the bell was.
@TakeApartLabАй бұрын
That's what tab completion is for. ❤ If I need to deal with a weird char, I'll just type what I can and let bash type the rest of it for me.
@benkato_Ай бұрын
I do have some cronjobs and scripts that will yt-dlp some unarchived youtube streams that I can't watch when they are live... So sometimes it happens that some files may have emojis and japanese characters in it... But it always worked in VLC and never paid too much attention to it xD Sometimes I remove special characters so I can work with them in bash, but that's about it xD Edit: Oh nyo, the ß my beloved got confused for a beta character ;-;
@danielrhouckАй бұрын
Ugh, yt-dlp is especially bad here because it does name mangling into what it thinks is safer characters but those can sometimes cause their own problems, and itʼs still impossible to turn it off. EDIT: And now after ages that’s finally fixed!
@UlvicanKahyaАй бұрын
Oh boy... This brings back all the horrible memories I have about the infamous "Turkish i" problem. I can't even count how many programs out there just crash if you use a Turkish locale, because case folding of a single character is handled differently in Turkish.
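For readers who have not met it: Turkish has a dotted and a dotless i, each with its own capital, so the usual I/i pairing is wrong there. A minimal Python sketch; Python's built-in case mapping is locale-independent, and `lower_turkish` is a hypothetical helper written only for this comparison.

```python
def lower_turkish(text: str) -> str:
    """Lowercase with the Turkish rule applied first: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


default_lower = "TITLE".lower()        # locale-independent mapping
turkish_lower = lower_turkish("TITLE")  # what a Turkish reader expects
dotted_capital = "İ".lower()           # default mapping yields i + combining dot above

print(default_lower, turkish_lower, len(dotted_capital))
```

Software that compares identifiers after a locale-sensitive lowercase sees two different strings for the same input, which is where the crashes tend to come from.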
@PandacierАй бұрын
4:07 I believe this is the german Eszett and not "beta"
@markus321xyzАй бұрын
Yes, this is a ß. In German (used in Austrian German and Germany German) this is called "scharfes S" (sharp S) or Eszett. If somebody writes sz it has the same meaning. Some systems/programs/databases don't like this letter very much 😅
@voyager-tc9dzАй бұрын
and it has no upper case
@PathinXАй бұрын
@voyager-tc9dz that is not true, there is indeed a capital ẞ. Here is the lowercase for comparison: ß
@TeaMaster420Ай бұрын
The weird B!!! Like literally, why is this even needed?
@VeetrillАй бұрын
@@PathinX Capital Eszett got added to Unicode not so long ago, and more like an afterthought. Before then this letter has been considered as exclusively lowercase, and every time someone wanted to write a German word in caps, they needed to replace ß with SS.
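The old "replace with SS" rule is still what default case mapping does. A short Python sketch showing the asymmetry, using only built-in string methods:

```python
small = "\u00df"    # ß, LATIN SMALL LETTER SHARP S
capital = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

upper_of_small = small.upper()      # default uppercase is still two letters
lower_of_capital = capital.lower()  # the capital form lowercases to ß
round_trip = "Straße".upper().lower()

print(upper_of_small, lower_of_capital, round_trip)
```

Uppercasing and then lowercasing does not return the original word, which is one reason case-insensitive name matching is harder than it looks.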
@nezu_ccАй бұрын
I know Unicode is hard, but that's not a beta.
@pi_ist_tollАй бұрын
Do anything you can think of. You'll learn from anything dumb. Just DON'T DO IT IN PRODUCTION!
@hopelessdecoyАй бұрын
Do it all in prod got it!
@pi_ist_tollАй бұрын
@@hopelessdecoy Actually, learning from prod mistakes is even more effective because you know you actually broke something and never want to repeat that.
@__Brandon__Ай бұрын
If you don't test prod you don't know that prod is working
@blarghblarghАй бұрын
Go ahead and test in prod. Our project will be happy to accept your users
@salazar1554Ай бұрын
I kind-of feel adopting Unicode file names makes sense for one and done language compatibility. Emojis are an inevitable result of doing that. Why implement the languages separately? Plus I imagine some systems exist that name files based on the first word or few characters of the file, and those would benefit from Unicode. I personally don't have a use case for emoji file names, but if you are doing multilingual file-names anyway it makes sense to just do Unicode (even if file managers and terminals give up on a unified font style for uncommon Unicode characters, just grab any old svg of the character from a Unicode server or local sqlite database or something).
@2kadrenojunkieАй бұрын
i love linux. using emojis in filenames is an absolutely horrible idea, but you can do it and nothing is in place to tell you no. go ahead, label all your folders with emoji.
@TakeApartLabАй бұрын
Yea, windows has some WILD path stuff, its black magic. linux is boring in comparison lol (in a good way).
@MattiasA-t5l20 күн бұрын
The only rule there is: Always put a at the end of the filename, if not possible put it at the beginning (you may include additional s).
@tlhInganАй бұрын
The problem is not emoji. The problem was exposed by an emoji. We can debate having an emoji in a filename (and there probably will be instances where someone may feel it is appropriate as a filename), but the key point is Unicode. Just because it affects emoji today, doesn't mean tomorrow there won't be another codepoint which exhibits the same issue, except it's the equivalent of say, "the" or "a" or other common word in some language and you've just horribly corrupted it. And Linus is right - you should not be casefolding in a filesystem - because you're reducing namespace and that can cause hash collisions. If you want to case fold a filename, that should be done in userspace where the program can figure out how to uniquely identify each item, or to do things like a case-insensitive search. Also, Unicode case folding rules may change - it's an ever-evolving standard
@radswfiihqАй бұрын
1:09 on windows: Win+. on Mac: the globe key on Plasma: Super+. (Can be changed in settings)
@kuhluhOGАй бұрын
also, the newest revision of the german keyboard layout has an interesting key combination it's supposed to open an emoji picker, and if not available, type 😀
@tresfАй бұрын
Yes, the person who says "most people don't have a way to type emojis on their computer" is someone who hasn't seen the default repurposing of the Fn key on all modern Macs.
@theairaccumulator7144Ай бұрын
I use win11 but win+. doesn't work for me no matter what I do.
@MarcinKralkaАй бұрын
I never even considered putting an emoji in filename.
@chaos.cornerАй бұрын
It happens easily when using tools like yt-dlp. It can break things pretty well like some files become invisible over samba (or maybe the vifs implementation in vlc, I forget). Fortunately it can be told to use a restricted character set which makes things ugly but at least they work.
@ababcb3005Ай бұрын
It's not just emojis that cause issues, I once ran into a program where spaces in the path caused it to stop working. The issue was fortunately fixed (very recently as a matter of fact), but it definitely got me thinking about keeping my folder names as "variable-like" as possible moving forward.
@lhplАй бұрын
Developers who are so incompetent that their code can't handle all legal file names should not be fǔcking allowed anywhere near a computer.
@coyo_tАй бұрын
its amazing to me that cmake (or was it make. or both.) to this day still shits itself on paths with spaces like we live in the software stone age still literally told me "paths cant have spaces in them" when i tried rebuilding like mf >:[
@lhplАй бұрын
@coyo_t so the c in cmake stands for crap, I guess...
@Lampe2020Ай бұрын
Interestingly the JDownloader2-installer script always put "/JDownloader 2" at the end of the specified path to install it in and then complained about there being a space in the chosen installation path. And I couldn't ever find the string "Jdownloader 2" in the script…
@ziv132Ай бұрын
A use case that I saw that's acceptable is in Obsidian it names the markdown files based on the title, if you have a framework for organising your notes that you use an emoji for different types then you end up with emojis in your filenames
@bloody_albatrossАй бұрын
ß is not a lower case beta, it's a lower case German Eszett/sharp S. ẞ is its upper case variant, but that is a recent development. Before, SS was used as the upper case variant.
@gunnarguАй бұрын
the file name rule on linux is what, no nulls? no slash? that's it? entire paragraphs, even newlines... fine...
@__Brandon__Ай бұрын
255 character max on most filesystems
@SaHaRaSquadАй бұрын
@@__Brandon__ 255 bytes, not characters. Those just happen to be the same size if you only use ASCII symbols.
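The difference is easy to measure. A minimal Python sketch, assuming UTF-8 names and a 255-byte limit; the constant and helper names are made up, and the actual limit depends on the filesystem.

```python
NAME_MAX_BYTES = 255  # assumed limit for this sketch; check your filesystem


def utf8_len(name: str) -> int:
    """Length of `name` in bytes once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits(name: str, limit: int = NAME_MAX_BYTES) -> bool:
    """True if the encoded name stays within the byte limit."""
    return utf8_len(name) <= limit


for label, name in [("ascii", "a" * 255), ("umlauts", "ü" * 128), ("emoji", "🔥" * 64)]:
    print(label, len(name), "chars,", utf8_len(name), "bytes, fits:", fits(name))
```

A name made only of emoji runs out of room after 63 characters, since each one takes four bytes.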
@fomxgorlАй бұрын
ive not thought of this before. have considered using emojis in my passwords as a potential usecase that i need to watch out for as i develop my software. very useful with a password manager. will have to test emojis in my software for filenames now
@0xDEADBEEFАй бұрын
2 Brodie Robertson: Actually this patch can be removed easily; then people who have those lovely file names MUST run fsck to automatically rename invalid file names to correct ones.
@IAmPattycakesАй бұрын
I have archives of various streams, videos, etc. with the title of those as the file names, with timestamping at the start. Those titles sometimes have emoji and im not gonna change the title because sometimes the emoji are important contextually.
@szirspАй бұрын
10:55 Date ranges wouldn't even solve the problem. It's not about when the file was created, but about which kernel version the file was created with. And I don't think that information is stored in the file system...
@ThatTrueCJ201Ай бұрын
As someone who uses Japanese, downloads Japanese files and programs, I concur that unicode outside the ASCII specification is a pain.
@stupidburpАй бұрын
It was so bad that Japan created several other text encoding alternatives before unicode standards were approved and continued to support them because unicode was never fully supported universally. They even made their own OS with government support just because the language support was so bad in Windows earlier on.
@SeralyneYTАй бұрын
4:25 - That's not the Beta symbol. β is Beta. ß is a double S. β != ß
@quinten01Ай бұрын
Eh. Looks the same to me
@SeralyneYTАй бұрын
@@quinten01 Yeah so do I (uppercase i) and l (lowercase L). That doesn't mean they are the same.
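If in doubt, the Unicode database settles it. A small Python sketch using the standard `unicodedata` module; `describe` is a hypothetical helper.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("ß", "β", "ẞ"):
    print(ch, describe(ch))
```

The two lookalikes live in different blocks: one is a Latin letter, the other a Greek one.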
@tutacatАй бұрын
Thank you bug submitters for triaging all the patches, or using the search function properly
@terranbyte2619Ай бұрын
I wasn't gonna use emoji in file names before, but after watching this... I might as well start doing it because it's cool.
@jasper265Ай бұрын
😎 It's basic, I know. I pretty much use it for anything involving things being cool, whether that's sarcastic, mostly neutral or enthusiastically positive. So it's very ambiguous in a way that I like. The time based approach wouldn't even work. You'd need to know when kernel versions were installed on the system. And even then, you could boot into different installs that mount the same filesystem but have different kernel versions. The best I can come up with is to try one way of case folding and if the file doesn't exist, try the other way. That's very messy though and can impact performance in certain cases.
@jello3456543Ай бұрын
It would be somewhat ugly and time consuming, but a third way would be forcing the case folding file systems to perform an fsck that fixes the file names where casefold gives a different result. Of course, that would make kernel downgrades impossible...
@CloudCuckooKingАй бұрын
Pedant here, Japanese has hiragana, katakana, kanji, romaji, man'yougana and "variant kana", which I won't say in Japanese proper because it contains as a substring something KZbin will probably filter. so it really makes things even more of a trainwreck, though in terms of computer transcription, variant kana is the only real problem and I don't know if any of them are even encoded in Unicode.
@TrabberShirАй бұрын
yes: if your OS allows Unicode characters in file names and your app handles files, you need a test which loads every Unicode edge case you can find. That test necessarily includes at least 7 files with emoji in the name. Sadly, very few projects actually have that test. And I cannot really blame them because when it fails, you are usually dealing with a bug in your OS or a bug in a library that is used by a library that is used by a library that you use. But not always.
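A test like that can start very small. A hedged Python sketch, assuming a UTF-8 capable filesystem and a writable temp directory; the sample names and helper names are invented for illustration, and a real suite would cover many more edge cases.

```python
import os
import tempfile
import unicodedata
from typing import Iterable, Set

SAMPLE_NAMES = ["🔥.txt", "Prisma☆Illya", "Straße.txt", "한글.txt", " leading space"]


def round_trip_names(names: Iterable[str]) -> Set[str]:
    """Create one file per name in a temp directory and return what the OS lists back."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
                handle.write("ok")
        return set(os.listdir(tmp))


def nfc(names: Iterable[str]) -> Set[str]:
    """Normalize names to NFC so filesystems that store decomposed forms still compare equal."""
    return {unicodedata.normalize("NFC", name) for name in names}


listed = round_trip_names(SAMPLE_NAMES)
print(nfc(listed) == nfc(SAMPLE_NAMES))
```

The comparison normalizes both sides on purpose: some filesystems hand back a differently normalized form of the name than the one that was written.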
@gokhanersumer2273Ай бұрын
Huh. I dont even use accented characters in my native language when naming. Old habits.
@mactan_scАй бұрын
ran into a program that couldnt handle unicode copyright symbol in device driver names, that was an interesting one to get around
@czos9239Ай бұрын
You certainly see emojis when sailing the high seas. At least that's what a friend told me.
@gdclemoАй бұрын
File names? I'll name my kids with emojis and you can't stop me.
@linuxguy1199Ай бұрын
Fun fact, you can also put newlines, spaces, backspace characters, and even ANSI escape codes in your filenames. If you're really skilled you can make files with null characters in the name, but you'll need some hand-rolled assembly and a filesystem that supports it.
@FengLengshunАй бұрын
3:24 I think if you want even more pedantic-er, you can double that by counting Full-width and Half-width as separate system. There is also jpn_vertical which I think is separate enough as it warrants its own tesseract file.
@examancerАй бұрын
A compatible solution is possible by doing a fallback approach: use the new hashing algorithm, and if the hash is not found try again with the old hashing. No date logic required and would let us actually fix this bug while giving support for old files for now. Down the road, after file systems have migration strategies in place for long enough, the fallback could be removed
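The fallback idea can be sketched in a few lines of Python. Everything here is a toy stand-in: the "directory index" is a dict, `str.lower` plays the old folding and `str.casefold` the new one. None of this reflects the kernel's actual data structures or folding tables.

```python
from typing import Callable, Dict, Optional

Fold = Callable[[str], str]


def lookup_with_fallback(index: Dict[str, str], name: str,
                         new_fold: Fold, old_fold: Fold) -> Optional[str]:
    """Look a name up under the new folding first, then retry with the old one."""
    entry = index.get(new_fold(name))
    if entry is not None:
        return entry
    return index.get(old_fold(name))


old_fold: Fold = str.lower      # toy stand-in for the previous behaviour
new_fold: Fold = str.casefold   # toy stand-in for the corrected behaviour

index = {old_fold("Straße.txt"): "entry-1"}  # created before the change
index[new_fold("Grüße.txt")] = "entry-2"     # created after the change

print(lookup_with_fallback(index, "STRAßE.TXT", new_fold, old_fold))
print(lookup_with_fallback(index, "GRÜSSE.TXT", new_fold, old_fold))
print(lookup_with_fallback(index, "STRASSE.TXT", new_fold, old_fold))
```

Old entries stay reachable under the spellings the old folding accepted, at the price of a second lookup on every miss; spellings that only the new folding would match still fail for old entries until they are re-indexed.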
@nisonaticАй бұрын
Databases have been dealing with this for ages. The specific collation is defined as part of the table schema, and if you want to fix a buggy collation, you add a new version and keep the old one around forever. If a user wants the new collation, they have to rebuild those tables. That's most likely how you'd have to do it in a filesystem: the filesystem would know its casefolding version, and you'd need to run a userland utility to rehash everything, as well as figure out how to rename files if any collisions were detected.
@anon_y_mousseАй бұрын
That's like asking if you should use an 'i' or 'B' or 'k' in a filename. The question itself is pointless and really you should use whatever letters or emojis you want. The only character I really question that is a valid character is that of the newline, and truthfully, it makes sense to have it so you can name your files with a Haiku.
@jan_haraldАй бұрын
yes you should use emojis in filenames so other people can't actually open them >;P extra points if you also use zero-width spaces to create several files that look to have the same name, visually >;P
@lhplАй бұрын
Just use NFC and NFD in the same directory. Strictly speaking, two names can - literally (not binary) - _be exactly_ the same.
@wagyourtai1Ай бұрын
I wish linux had the windows alt+numpad thing for typing special chars.
@muellerhansАй бұрын
You can only access the source code of my git project if you can deal with its emoji name.
@darkwinter7395Ай бұрын
The correct answer is to fix the case folding, and build a file system upgrade tool that re-hashes any filesystems that need it when you install the new kernel.
@angeldude101Ай бұрын
The only use for case folding in filesystems that I'm aware of is compatibility with older software designed for case-insensitive filesystems. I highly doubt any software that relies on case insensitivity even knows that non-ASCII characters exist, and as such I'd argue that filesystem-level case folding should only ever apply to ASCII letters, and even then, it's expressly for compatibility. If you want a case-insensitive interface for the user, then that's on the developers of the user-space application being used to perform case-insensitive matching.
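An ASCII-only fold is tiny and never changes between Unicode versions, which is the appeal. A minimal Python sketch of the idea; `ascii_fold` is a hypothetical helper, not an existing filesystem option.

```python
def ascii_fold(name: str) -> str:
    """Lowercase only A-Z and leave every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in name
    )


for name in ("README.TXT", "Straße.txt", "İstanbul.txt", "🔥.MOJO"):
    print(name, "->", ascii_fold(name))
```

Names that differ only in non-ASCII case, such as ẞ versus ß, stay distinct under this rule, so the table of 26 letters is the whole specification.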
@PanduPoluanАй бұрын
For the casefolding change, I think the right choice is to keep "the right method" in the kernel, but then write a low-level tool to fix those files using the older (wrong) method.
@chaos.cornerАй бұрын
I'm not a huge fan of emojis as characters, but characters they are and so they should be handled appropriately. What this indicates to me is that a boneheaded decision was likely made somewhere else.
@sinomАй бұрын
Case folding in the filesystem is something Windows does and it's extremely annoying. E.g. I was trying to compile a C++ program and it included both a folder name and a library where the only difference was capitalization (so something like Include vs include). On the Linux system I tried to compile this on, it worked with no issues, but when trying to compile it on Windows it kept mixing up what to include where and it was just a huge mess.
@mirailuvАй бұрын
this reminds me of that thing where someone put an emoji in their bank account nickname and the entire system just crashed
@LokiCDKАй бұрын
I am a fan of having a wider character set available for use in file name, only because it could potentially interact weirdly with attempts to read that drive by third parties. For instance by use of autopsy or various file system scalpels. For that same reason, why is there case folding in a file system? Case sensitivity in file naming is a long-held standard. Finally, if you were going to change the way something fundamental like that operates. That is something you do during a full major version number update.
@salazar1554Ай бұрын
Custom kernel where syscall functions contain emojis
@SaHaRaSquadАй бұрын
An OS built on Emojicode.
@hubertnnnАй бұрын
It's usually a bad idea to use anything other than lowercase English letters and digits (and maybe "_" (underscore)) for filenames. While most applications can handle other characters in filenames, many don't, and some handle them differently than others. This can cause unexpected behaviors and weird problems, ranging from sorting (is ą before or after a?) up to file corruption.
@chaos.cornerАй бұрын
It's generally better to harden code to handle it though because you're going to run across it sooner or later. Not taking care to handle input properly is how SQL injection attacks are born. But yeah, for quicky get-the-job-done scripts, it's easier to keep it simple. Just be careful of scope creep.
@kuhluhOGАй бұрын
And then you get people who aren't native English speakers using your system. Some may not even be able to understand English, and even if they do, they obviously (and understandably) want to use their native language. How would you behave as a user if it weren't English that became THE IT language but, let's say, Japanese, and the letters commonly used for it were Japanese characters? I would guess you would use English letters nonetheless.
@lhplАй бұрын
No it's not. What _is_ a bad idea is using bad software that can't handle all legal file names.
@kuhluhOGАй бұрын
@@lhpl yep, on Linux there are only two bytes which aren't allowed in filenames: 0x00 (the null byte) and 0x2F (the / used as path separator)
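That rule is small enough to write down. A minimal sketch, assuming we only care about a single path component and ignoring length limits; `is_valid_linux_component` is a hypothetical name, not a real API:

```python
def is_valid_linux_component(name: bytes) -> bool:
    """Check one path component against the two forbidden bytes.

    Hypothetical helper: it ignores length limits and any extra rules a
    particular filesystem might add on top.
    """
    if name in (b"", b".", b".."):
        # Empty names and the reserved dot entries can't be used as new names.
        return False
    # 0x00 terminates C strings, 0x2F ('/') separates path components.
    return b"\x00" not in name and b"/" not in name


print(is_valid_linux_component("🏠".encode("utf-8")))  # emoji bytes are fine
print(is_valid_linux_component(b"\xff\xfe"))           # so is invalid UTF-8
print(is_valid_linux_component(b"a/b"))                # contains the separator
```

Note that the check works on bytes, not on characters, which is how the kernel sees names too.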
@SaHaRaSquadАй бұрын
@@lhpl You're not wrong, but being right doesn't fix broken software.
@makramcАй бұрын
As a German, I use ß in my filenames all the time. Also ss is actually kinda the right way to do it.
@mirzahadzic8666Ай бұрын
So, people watching this video will now start putting emojis in filenames, and keyboards will be enlarged with stupid images. Thanks Brodie!
@TakeApartLabАй бұрын
The comments having the idea of having /home be a home emoji is the funniest shit ever. I would die if I ssh'ed into a computer and saw that.
@pikachulovesketchup666Ай бұрын
Developers should stop assuming that a file name (or any input from the user, or any string displayed to the user) is a collection of characters from the Latin alphabet. Text is not something rendered left to right and top to bottom, character by character. One keypress doesn't mean a single character.
@slycordinator25 күн бұрын
One difference between Macs vs Linux and Windows for filenames is that HFS+ (and HFS, I assume) uses a different Unicode normalization form. On Linux and Windows, names usually end up in NFC, where all the code points are stored in a fully composed form. On HFS+, they're in a variant of NFD, where some code point ranges are in composed form like in NFC and many others are stored in decomposed form. I didn't know about this and was confused when I downloaded a file a Korean guy emailed me. The name looked like normal Korean in the browser, but when I opened it on my PC, the name was weird. Similarly, if you were on an HFS+ Mac and used Python to create a file named "한글.txt", the filename in the file system will be automatically re-normalized. So, if you were to try and open the file again, it would likely fail, because the string will be NFC and its bytes won't match the file name.
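The composed vs decomposed difference can be seen without a Mac. A small sketch using Python's `unicodedata`, assuming plain NFD as a stand-in for the HFS+ variant described above; `same_name` is a hypothetical helper:

```python
import unicodedata

name = "한글.txt"
nfc = unicodedata.normalize("NFC", name)  # precomposed syllables
nfd = unicodedata.normalize("NFD", name)  # each syllable split into jamo

print(len(nfc), len(nfd))                          # different code point counts
print(nfc == nfd)                                  # looks the same, compares unequal
print(nfc.encode("utf-8") == nfd.encode("utf-8"))  # the bytes differ as well


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normalization form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(nfc, nfd))
```

Normalizing both sides before comparing is the usual way around the mismatch when a name comes back from a directory listing in a different form than the one you typed.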
@PredatoryQQmberАй бұрын
That's what names are for. If I wanted to use some minimalistic dumbified system then I would have used inode numbers directly.
@MaramowiczАй бұрын
Why not just check for... both? First for the new one, and if the filesystem doesn't find it there, check the old one, and if it is in the old one then just automatically update it to the new one.
@nullplan01Ай бұрын
6:53 I know these symptoms. The directory entry exists, but the kernel cannot find the file when given the name. You can also get the same problem when mounting a FAT partition with files that have names containing '/'.
@rogo7330Ай бұрын
Emoji must be a great thing to use when naming files in viruses. Everybody ignores that you can put ANY octet, not even valid UTF-8, into a filename. Always remember the ctrl-v key and how to put any number from 0 to 255 into the terminal, boys. Or use latest bash and $'...' strings (which are on the way to be included into the POSIX shell, along with a '-o pipefail').
@johannes7856Ай бұрын
Mojo file extension goes BRRR 😂
@JamaicaWhiteManАй бұрын
I don't know how I'd even do that - my keyboard doesn't have emojis.
@eDoc2020Ай бұрын
Copy and paste, various IMEs, or just download a file with an emoji in the original name. Social media users often use emoji in titles, downloading such media will often copy it as (part of) the filename.
@ws_stelzi79Ай бұрын
Just to put my 2 cents to the 'ß' case folding: especially in Switzerland it is very common to case fold it to 'sz' instead of (the more frowned upon) 'ss'! The 'ß' is basically the very best example in the German language when "just" case folding can fail because of some unknown nuances.
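For comparison, this is what Python's built-in case operations do with ß. It only shows Unicode's default mappings, not any regional convention, and `case_variants` is a made-up helper name:

```python
def case_variants(text: str) -> dict:
    """Collect the results of Python's three case operations for one string."""
    return {
        "lower": text.lower(),
        "upper": text.upper(),
        "casefold": text.casefold(),
    }


for sample in ["ß", "ẞ", "Straße"]:
    print(repr(sample), case_variants(sample))
```

Lowercasing leaves ß alone while case folding turns it into ss, so two tools that pick different operations will disagree on whether two names match.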
@moocatmeowАй бұрын
i have found that an emoji in a file or folder name will usually break almost everything it comes into contact with
@howqso2885Ай бұрын
Use it. Linus is wise to really enforce his idea of longevity in this enormous project, where so many distros rely on code that was built decades ago; obviously moving a single brick might mess up a lot. But here, that's an evolution rather than just breaking things for refactoring or for providing a random solution. Break any program and distro that relied on the old way.
@tutacatАй бұрын
Yes but if you change the new casefold function, that breaks the casefold function for _all previous kernel versions_
@PatternerАй бұрын
2025 windows edition of this: "don't use spaces in file names"
@georgeorwell4752Ай бұрын
Google Drive allows slashes in their file names. I absolutely love this idea and I think linux should adopt it.
@p0358Ай бұрын
Yes you should use emojis in filenames, it's Unicode, Linux supports Unicode, go use it. This is Windows behavior to be afraid of file names, where inserting a space or any special character will break everything. Don't tell them we have colons in the filenames too.
@NobodyInPersonАй бұрын
Exactly. This bug is literally a problem with casefolding. Why would one ever want a case-insensitive filesystem, I don't get it. I have happily been using spaces, arrows and emojis in folders and filenames. Emojis are a very good way to 'tag' files in a super condensed way.
@mrab4222Ай бұрын
You can have spaces in Windows filenames. Can you have slashes in filenames? No, but not in Linux either.
@MasterHigureАй бұрын
A previous workplace had all project files on OneDrive. That OneDrive folder had an æ in its name, in addition to spaces and a dash. It was plain horrible.
@mudi2000aАй бұрын
The problem is a case folding file system. Linux or Unix in general was not designed with this in mind.
@abit_grayАй бұрын
Yes, you should be able to use emoji and all other Unicode characters (or composite characters). If you say you support Unicode, you cannot go "but not that part". There should be only a few characters (like '/') that are not valid in filenames.
@c128stuffАй бұрын
This problem, in a more general sense, has existed since forever in any system which keeps persistent data, when wanting to make a change to how that data is formatted or located. And the solution is known, but has its own problems. You need to detect what is old, and migrate it to new before using it.
The problems of that solution?
- it will have some performance impact, tho if you thought about the need for this, that can be minimal
- you run the risk of ending up with multiple 'generations' of a format, or in this case way of calculating a hash, and that means the number of conversions you need to support will grow with time.
Those however can be addressed by not doing those conversions 'on the fly', but by scanning the persistent storage (filesystem) for all instances which need migration, and doing this 'offline'. That of course comes with downtime... you gain some, you lose some.
Keeping a clearly wrong implementation because migrating is too hard... yes, it can be a valid choice, it can also come back to bite you at any random time in the future when 'clearly wrong' turns into 'real problem'.
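A toy model of that trade-off, assuming a directory index keyed by a hash of the folded name. Everything here is hypothetical (`old_fold`, `new_fold`, `lookup`, `migrate_offline`, and the dict-based index); it is not how any real filesystem stores entries.

```python
import hashlib
from typing import Callable, Dict, Optional


def old_fold(name: str) -> str:
    # Hypothetical "old" rule: plain lowercasing, so ß stays ß.
    return name.lower()


def new_fold(name: str) -> str:
    # Hypothetical "new" rule: full case folding, so ß becomes ss.
    return name.casefold()


def key_for(name: str, fold: Callable[[str], str]) -> str:
    return hashlib.sha256(fold(name).encode("utf-8")).hexdigest()


def lookup(index: Dict[str, str], name: str) -> Optional[str]:
    """On-the-fly variant: try the new key, fall back to the old one and migrate."""
    new_key = key_for(name, new_fold)
    if new_key in index:
        return index[new_key]
    old_key = key_for(name, old_fold)
    if old_key in index:
        index[new_key] = index.pop(old_key)  # lazy migration of this one entry
        return index[new_key]
    return None


def migrate_offline(index: Dict[str, str]) -> Dict[str, str]:
    """Offline variant: rebuild every key from the stored names in one pass."""
    return {key_for(stored, new_fold): stored for stored in index.values()}


legacy = {key_for("Straße.txt", old_fold): "Straße.txt"}
print(lookup(dict(legacy), "STRASSE.TXT"))           # fallback misses this spelling
print(lookup(migrate_offline(legacy), "STRASSE.TXT"))  # found after a full rescan
```

In this toy the fallback only finds names that the old rule already matched, while the offline pass recomputes every key from the stored name, which is the performance versus downtime trade described above.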
@tlpthxАй бұрын
11:02 deciding based on the creation date would be stupid, to be frank. Whether the patch was installed on the date a file was created cannot be decided based on the merge/revert dates; those are only weakly correlated.
@coryschwartz1570Ай бұрын
While working at a company that allowed self-managed (Linux) devices on the network, a couple of friends and I decided to change our hostnames to have Unicode characters. All hell broke loose. DNS worked fine, but it broke the network IDS for hours, and our IT department told me it somehow also caused routing problems because of their network auth system.
@ottolehikoinen6193Ай бұрын
Whitespace filename characters, emojis, normal foreign characters, the things people do. Basic Latin character set with preferably date in the yymmdd format in the beginning is the only acceptable archivist filename.
@actually_peanutsАй бұрын
@enemixiusАй бұрын
Never considered emojis in filenames. Emojis in SSIDs are pretty fun though.
@MaxusRАй бұрын
They just had to release a simple utility that will find any old incompatible files and rename them. No, they'll just revert the changes and say 'this feature is bad and should never be used'.
@lainalienАй бұрын
ive used emoji in urls which is cool lookin until u get it in the url bar and theyre converted to numbers rip
@catgirlQueerАй бұрын
punycode!
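For the curious, both conversions can be tried from Python's standard library. This is a rough sketch: the `xn--` label is built by hand from the `punycode` codec, whereas real IDNA processing adds mapping and validation steps that may reject emoji entirely.

```python
from urllib.parse import quote, unquote

emoji = "🔥"

# In a URL path, non-ASCII characters are percent-encoded as UTF-8 bytes.
percent = quote(emoji)
print(percent)  # %F0%9F%94%A5

# In a host name, each label is Punycode-encoded and prefixed with "xn--".
# Built manually here as an illustration, skipping the IDNA checks.
label = "xn--" + emoji.encode("punycode").decode("ascii")
print(label)

# Both forms are reversible.
print(unquote(percent) == emoji)
print(label[4:].encode("ascii").decode("punycode") == emoji)
```

So the "numbers" in the URL bar are either hex bytes (in the path) or the Punycode form (in the domain).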
@5h4ndtАй бұрын
non-printable characters are also quite nice to use. Like the bell. Makes your PC beep every time you do ls in that particular folder.
@DanielNerdАй бұрын
the fact that people found out this is an issue is worse than the issue itself
@fredericjaquet3729Ай бұрын
EDIT: I saw after having written this that the case was already mentioned in the comments. Sorry. 4:10 Well, it's not the "beta" character, it's "Eszett" or "scharfes S": a German character intended to replace the double 's' in some words (Unicode U+00DF). In the "beta" character (Unicode U+03B2 as an example), the tail should go lower than the bottom line of the character 😉 Eszett: ß Beta: β Merry Christmas Brodie!
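The two characters are easy to tell apart programmatically, even when the font makes them look alike. A quick check with Python's `unicodedata`:

```python
import unicodedata

for ch in ["ß", "β"]:
    # Print the code point in U+XXXX form followed by its official Unicode name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

This prints U+00DF LATIN SMALL LETTER SHARP S for the Eszett and U+03B2 GREEK SMALL LETTER BETA for the beta.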
@stupidburpАй бұрын
Mojo language uses emoji file extensions by default, just because they can. No worries, boring old plain text extensions are also an option. If the OS struggles with long standing unicode, that seems like an OS problem.
@interru_ioАй бұрын
Usually every byte except NUL and ASCII '/' is valid in Linux filenames on most filesystems. Even invalid UTF-8. But most software will choke on bytes that aren't valid UTF-8.
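One way software can avoid choking is to carry the odd bytes through instead of rejecting them. A minimal sketch using the `surrogateescape` error handler (the same one `os.fsdecode` relies on for filenames on POSIX systems); the byte string is an invented example name:

```python
raw = b"report-\xff.txt"  # 0xFF never appears in valid UTF-8

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False  # a strict decoder refuses the name outright

# surrogateescape smuggles each undecodable byte through as a lone surrogate,
# so the original bytes can be recovered exactly on the way back out.
text = raw.decode("utf-8", errors="surrogateescape")
round_trip = text.encode("utf-8", errors="surrogateescape")

print(strict_ok, repr(text), round_trip == raw)
```

The decoded string still can't be printed or written as strict UTF-8, but it can be passed back to the filesystem unchanged.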