Htree | The Secret Savior of EXT3

  Рет қаралды 1,960

Maple Circuit

Maple Circuit

Күн бұрын

Пікірлер: 29
@itguydave2164
@itguydave2164 Ай бұрын
Love this type of content, but don't burn out trying to release too frequently - it's about quality, not quantity! On another note, aren't hashes a computational representation of a data chunk, which by definition is going to be unpredictable? If so, ordering it would be a messy process that could result in a dramatic loss of performance if contents were spread across multiple blocks? Or perhaps I'm missing something obvious in my current sleep deprivation...
@Maple-Circuit
@Maple-Circuit Ай бұрын
Hashes can be for a big chuck of data but also for a small amount of data like FS logic or even passwords. Here, the idea is just to limit the size needed in order to store what could be called the value of the node inside of the HTree... I don't know if I'm clear, so sorry if that isn't helping (; I agree that the success of this channel is clearly due to quality, and that won't change! But if I want it to grow, I must not stop!!
@keeblebrox
@keeblebrox Ай бұрын
Phenomenal coverage of the topic! I'm familiar with data strucures and algorithms but not the specifics of filesystem architectures. You covered enough of EXT2 for me to understand the problem and then the solution as well.
@berndeckenfels
@berndeckenfels Ай бұрын
I would be curious about having f a video on balancing threes, including how to decide when to use a new index block based on the inserted hash values and how to avoid root block full
@Maple-Circuit
@Maple-Circuit Ай бұрын
There can be another level of index block to further expand the tree, I'll probably make a vid about the step by step process one day!
@BleuSquid
@BleuSquid Ай бұрын
Filename hashing itself wasn't a new concept. I've seen many instances (and created a couple myself) where the application uses a directory structure where the files are stored based on their hash. e.g. The MD5 hash of "banana" is "72b302bf297a228a75730123efef7c41" so the file would be saved as /7/2/b/banana One thing I don't think you mentioned was the reason to use a hash instead of the filename itself: Using a hash algorithm allows for a more uniform distribution of the entries. Suppose I were using the filename, and an application that prefixed every file name with "ban" - it would effectively invalidate the usability of those first three characters for locating the file. If I were to use that instead of the MD5 hash in my example above, the file would be saved as /b/a/n/banana, along with all the other (potentially thousands of) files that start with "ban"
@Maple-Circuit
@Maple-Circuit Ай бұрын
True! The distribution is the #1 reason for why they settled on the algorithm that they used. The quantity of detail that goes into a simple FS feature is insane!
@phillipsusi1791
@phillipsusi1791 29 күн бұрын
The down side to HTREE is that the order that file names are returned from readdir() is basically random compared to the inode numbers of the files. So something like tar that just reads a file name and calls stat() on that file name and the reads that file's data is rather slow because every time it calls stat() the disk has to seek forward or backwards to read the next inode that is nowhere near the previous inode. To put that another way, if you know exactly which one of 10,000 files you want, HTREE Is fast to find it, but if you are trying to iterate over all files in the directory, it slows things down.
@Maple-Circuit
@Maple-Circuit 29 күн бұрын
True!
@adymode
@adymode 26 күн бұрын
The entropy provided by the hash is desired as it causes the tree to remain roughly balanced as it grows. Each node holds by default up to 31 hash links, this can begin as links to leaf nodes relating to actual directory entries, but as the directory gains more entries the links can become treated as hash ranges and intermediate nodes can be created freely, sort of guided by the hashes randomness the tree can grow in a roughly balanced fashion without concern that some branch will get exceedingly deep.
@phillipsusi1791
@phillipsusi1791 26 күн бұрын
@adymode yes... I'm aware.
@adymode
@adymode 26 күн бұрын
@@phillipsusi1791 I thought it bared mentioning since, the issue is not that we get directory contents in random order with ext3s file search acceleration, it is that we can no longer easily get the contents in the disk order which ext2 without its acceleration used to supply. Any order that our search accelerator uses will scramble the disk order, since it is the disk position we are seeking from filenames. The hash is sort of crypto ordering the tree by filename. It has to encode filename or it cant locate by filename. readdir() may be expected to return a full list of directory contents faster with ext3 that ext2. It may return file stats faster too. Sorting ext3s filelist by inode may be a faster way to get the disk ordered list than ext2s non-shuffled method ?
@adymode
@adymode 25 күн бұрын
I wonder if the order of ext3's readdir does slow access enough to make a patch worthwhile to sort by inode by default.
@mahagr78
@mahagr78 Ай бұрын
Has anyone found EXT3 driver for microcontrollers? I would love to read EXT3/4 partitions in SD cards...
@jzlazy05
@jzlazy05 Ай бұрын
Maybe tou could implament/find an ext2 driver which would be simpler? In the video he said that ext3 must be backwards and forwards compatible. Could be worth a shot!
@Maple-Circuit
@Maple-Circuit Ай бұрын
Yep, you can mount ext3 with ext2 driver, now GL on finding the code for it XD
@mahagr78
@mahagr78 Ай бұрын
@@Maple-Circuit Well, the one I found is super old EXT2 driver, which is why I asked.
@jzlazy05
@jzlazy05 Ай бұрын
Really awesome video as always! What is your process for making videos? I've been thinking of making videos about implamenting drivers for microcontroller devices (ie rtc's, sensors etc) from scratch, and I really don't know where to even start video-wise. How did you start, and what inspires you when you to keep making them?
@Maple-Circuit
@Maple-Circuit Ай бұрын
There are a couple of thing that I think everyone should do if they want to make technical content: Read a lot. When I make a video, I scrape so much info that I could talk for 4x the time about the subject. Better understanding=better explaining It doesn't to be perfect, just starting and doing it again and again will make you get better. Now the process for me is just noting things down in my notes to make sur I don't forget things, record the audio with audacity and edit with DaVinci resolve. Notes in the description if you want to see what it looks like (;
@jzlazy05
@jzlazy05 Ай бұрын
@@Maple-Circuit Thanks for the reply! That was very helpful :))
@SFDestiny
@SFDestiny Ай бұрын
"bi narry" The '-ary' is the obvious suffix leaving bin- as the root. (The 'n' is added when bi- is appended with word-forming element that begins with a vowel.) Thus, it's `bin-ary or even `bi-nary, but I cannot imagine how to get to bi-`nar-y 😆
@kitlith
@kitlith Ай бұрын
Since I didn't spot the backticks at first, and others might not know what they represent: They're stress markings. They're saying that the stress belongs on the first syllable. i.e. BInary, as opposed to biNARy
@SFDestiny
@SFDestiny 29 күн бұрын
@@kitlith yeah. I considered BI-nary. Good to know you find it reads better. ˈbīˌnerē is how Google presents the pronunciation from my region of NA, with ˈbīnərē as apparently more common... In the main I choose to comment to suggest that etymological stems tend to receive the stress of a word, and thus in a veiled way to share that I find the thought that *nar- is the word root delightful! 🤭😊
@AK-vx4dy
@AK-vx4dy Ай бұрын
I still don't know why they called HTress ;)
@kitlith
@kitlith Ай бұрын
From "A Directory Structure for Ext2" by Daniel Phillips: Thus inspired, I went on to discard the BTree approach, and invented a new type of indexing structure whose characteristics lie somewhere between those of a tree and a hash table. Since I was unable to find anything similar mentioned in the literature I took the liberty of giving this structure the name HTree, a hash-keyed uniform-depth index tree. (So my best guess is that the H stands for Hash)
@AK-vx4dy
@AK-vx4dy 29 күн бұрын
@@kitlith Thank you!
@MykolaTheVaultDweller
@MykolaTheVaultDweller Ай бұрын
second
@hampus23
@hampus23 Ай бұрын
Awesome 👏
EXT4 | How does it work?
35:05
Maple Circuit
Рет қаралды 8 М.
EXT3 | How does it work?
21:10
Maple Circuit
Рет қаралды 2,8 М.
Air Sigma Girl #sigma
0:32
Jin and Hattie
Рет қаралды 45 МЛН
VFS | How your system knows where files are
15:15
Maple Circuit
Рет қаралды 9 М.
Can Ai manage domains?👀
4:22
Curious Bitcoin
Рет қаралды 5
I Created The World's Biggest Sudoku (with Code)
10:45
Green Code
Рет қаралды 81 М.
I made Tetris in C, this is what I learned
15:15
Austin Larsen
Рет қаралды 31 М.
AI Is Not Designed for You
8:29
No Boilerplate
Рет қаралды 367 М.
Linux Kernel 6.13 | you won a guitar pedal?
50:13
Maple Circuit
Рет қаралды 6 М.
I built a FLAP ENGINE (New Rotary Design)
18:58
Integza
Рет қаралды 1,8 МЛН
EXT2 | How does it work?
26:48
Maple Circuit
Рет қаралды 8 М.