The ORDER BY Algorithm Is Harder Than You Think

  Рет қаралды 59,572

Tony Saro

Tony Saro

Күн бұрын

In this video I describe in detail how my implementation of the K-Way External Merge Sort algorithm works. K-Way External Merge Sort is an algorithm used to sort large datasets that don't fit in main memory (usually RAM). Therefore, this algorithm is used by databases like Postgres to process ORDER BY queries when tables don't fit in memory. The algorithm consists of a series of "passes" through one or multiple files and a number of in-memory buffers used to load and process different chunks of a file in each pass. The end result is a file that contains all the requested rows sorted by the keys given in the ORDER BY clause.
🌐 LINKS
Algorithm Implementation:
github.com/antoniosarosi/mkdb...
✉️ CONTACT INFO
Business Email: business@antoniosarosi.io
Contact Email: sarosiantonio@gmail.com
Twitter: / antoniosarosi
Instagram: / antoniosarosi
LinkedIn: / antoniosarosi
🎵 MUSIC
• [Chillstep] Broken Ele...
• Ptr. - Genesis
• Digital Road
• Juno
📖 CHAPTERS
00:00 Introduction
00:22 The Memory Problem
01:32 Database Tables & Sorting
03:18 K-Way Data Structures
04:38 Algorithm Execution (Pass 0)
06:17 Pass 1
09:22 Pass 2
11:07 I/O Complexity
11:58 Variable Length Data
13:16 Final Thoughts
🏷️ HASHTAGS
#programming
#computerscience
#algorithm

Пікірлер: 195
@paulstelian97
@paulstelian97 Ай бұрын
Since you do "limit 100", can't the query planner be smarter and do some sort of limited sorting that drops off entries automatically? Like an array that holds 100 entries, and every entry is just inserted into there, but the excess goes off and is discarded automatically. I actually wonder if you can't do this for most reasonable queries too!
@tony_saro
@tony_saro Ай бұрын
Maybe it can, I didn't implement it myself though.
@treelibrarian7618
@treelibrarian7618 Ай бұрын
I also had the same thought - for any limit N result that is easy enough fit in ram you could do this with no extra disk space required and only one pass of the data. take first N rows, sort them and then load more data from the file in big enough chunks (10MB?). check each row against the last in the sorted result table. if the checked row is before the last in the result table, keep it in an unsorted pile that just tracks the last in the pile, and then move the last-pointer in the sorted result table to the previous entry. continue till the last in the unsorted pile is after the current last in the sorted pile, then merge and sort the two together, keeping the first N again. repeat till whole database has been scanned.
@tony_saro
@tony_saro Ай бұрын
@@treelibrarian7618 Yeah the algorithm seems pretty obvious, but you'd have to figure out if the limit actually fits in memory and the problem is you're dealing with variable length data (not all rows are the same size). It's not as easy as it seems, but it doesn't change much anyway, you still need an external sort algorithm, I don't even know why I added the limit in the example.
@paulstelian97
@paulstelian97 Ай бұрын
@@tony_saro Even variable length has a maximum length, so you can conservatively estimate the maximum for that.
@tony_saro
@tony_saro Ай бұрын
@@paulstelian97 You can estimate a max size using the table schema, as long as it doesn't have TEXT or BLOB fields it's pretty reasonable. But anyway, I'm just so dumb that I didn't even think about optimizing LIMIT queries when I wrote the DB, I actually didn't even implement LIMIT, I only put it in the video example because usually you won't SELECT * FROM a giant table, so I thought it'd be more "realistic", but still, as mentioned, that doesn't change anything about the video, when applicable you still need an external sort algorithm. I might pin this comment if more people have doubts about this.
@eduardoandrescastilloperer4810
@eduardoandrescastilloperer4810 Ай бұрын
Something so simple ended up being so complex and you didn’t even talk about distributed DBs. Top notch
@ArkenGAMES
@ArkenGAMES 24 күн бұрын
Yeah how would that look like?
@eduardoandrescastilloperer4810
@eduardoandrescastilloperer4810 23 күн бұрын
@@ArkenGAMES imagine this but on like 10 or 20 servers and with backups and caching
@nirshadnijam2291
@nirshadnijam2291 Ай бұрын
This is top tier. The people who invented these algorithms. Insane. We are taking technology for granted. You explain this really well. For a person who doesn’t have a formal computer science background, this video helps me understands how the tools I use day to day work under the hood
@tony_saro
@tony_saro Ай бұрын
I have a formal Computer Science background and I still find databases very hard to understand 😂. There's a lot of research that went into them over the past 5 decades. People who came up with these algorithms are definitely insane.
@ra2enjoyer708
@ra2enjoyer708 Ай бұрын
@@tony_saro Well CS is kinda orthogonal to the database design. While RMDBs are based on math more-or-less, you won't be able to explain why Mongodb, a No-SQL database, supports join operations, while an egregiously RDBMS Postgresql supports egregiously unrelational formats like JSON and XML, with math theory alone. And SQL databases weren't THAT math-based to begin with, back when PostgreSQL started there was no such thing as "SQL compliant", since every RDB had its own incompatible flavour of SQL. Considering it was also the time when the applications were written expecting to talk to the DB directly, the hatred for SQL and the need for ORMs at the time makes sense, especially since there was no XML crutch to use as a go-to serializible format for messaging.
@TreeLuvBurdpu
@TreeLuvBurdpu Ай бұрын
I'm watching this just from the perspective of someone wanting to sort tables that might become very large. This explains why it's so important to create indexes on any columns you might want to sort.
@tony_saro
@tony_saro Ай бұрын
True, BTree indexes will skip this algorithm altogether.
@editxswajal
@editxswajal Ай бұрын
This is what for I pay my internet bills. Btw great work and explained well
@tony_saro
@tony_saro Ай бұрын
Thank you
@ciberman
@ciberman Ай бұрын
You opened a door of new kinds of algorithms for me. I never realized that there is a huge world of algorithms and data structures to work with data that don't fit in memory. Amazing!
@alfredovr9111
@alfredovr9111 Ай бұрын
Que sorpresa encontrarme con videos tuyos de nuevo! Finalmente todos los años que llevo aprendiendo inglés valieron la pena 😂. Mucho animo con este canal me encantan tus vídeos y aprendo mucho de ellos
@Andre-kh3fl
@Andre-kh3fl Ай бұрын
You are a legend. I've just watched your video coding a database and I'm very impressed. I'd really like to see you coding a compiler/interpreter. I know it takes a while, but you have awesome didactics. Keep going man, your channel is top tier!
@tony_saro
@tony_saro Ай бұрын
I will at some point, now I'll focus on smaller projects because the database drained all my energy 😂
@neiliusflavius
@neiliusflavius 28 күн бұрын
I remember my dad describing this algorithm to me - except he was doing it on an old mainframe where each of the temporary files were on tapes that needed manually changed between passes!
@cryptopatrick
@cryptopatrick Ай бұрын
Damn! This guy is next level - insane quality. Bravo!
@Israel220500
@Israel220500 Ай бұрын
Really nice video man. Merge sort was my favorite sorting algorithm when I was beginning my journey into computing. Nice to see that what databases use is a variation of it (with a lot more complications hehe). Saludos de Brazil.
@levipalait683
@levipalait683 Ай бұрын
Very good explanation and amazing animations! Please keep up the content!
@ra2enjoyer708
@ra2enjoyer708 Ай бұрын
This video is pretty good at picturing file operations as something that just works and not a clusterfuck at all.
@tony_saro
@tony_saro Ай бұрын
It just works but it's not easy to implement at all 😂
@vinicius707
@vinicius707 Ай бұрын
This is some great content, man! I really appreciate the effort that you put into it and for making it available for free on KZbin. Thanks!
@tony_saro
@tony_saro Ай бұрын
Thanks man
@m-alghobary
@m-alghobary Ай бұрын
The explanation is so clear, I didn't have to put any effort to get the idea 👏👏👏 I subscribed, dropped a like and I hope you continue producing great content 🙏🏻
@tony_saro
@tony_saro Ай бұрын
I will, thanks for the sub and like 👍
@elraito
@elraito Ай бұрын
This is actually valuable lessons presented in awesome way. Man i just hope you blow up because we need way more of this this type of content.
@tony_saro
@tony_saro Ай бұрын
Working on it 📈
@sunkittsui7957
@sunkittsui7957 Ай бұрын
Very intuitive and concise explanation!
@gu1581
@gu1581 Ай бұрын
At my work, we have a project where in an application several queries are run, the results are mapped into memory via 'Object Relational Mapping' and then stuff like joins, limiting and SORTING is done in the application with the RAM of the query results. I pointed out the performance difference it would make but the fact that it's not even possible to do this for bigger tables real adds some fuel to the fire :D Great video, shows why leaving as much processing as possible to the DBMS is a good idea
@tony_saro
@tony_saro Ай бұрын
If you're selecting only a few megabytes of data that's probably fine, but otherwise the database will do a much better job, it's designed and optimized for that.
@gu1581
@gu1581 Ай бұрын
@@TapetBart We used microservices and chose to have one database per service. Turnrd out that joining across multiple databases is very cumbersome especially if they have different credentials. So we simply did the joins with ORM in the application... and by "we" I mean my team before I joined the project. Later I suggested we used one schema per service eliminating all of that. Don't even know why we used microservices in the first place - we don't have to scale out the app for higher throughput because only a few dozen users use it at once
@ashwdq
@ashwdq Ай бұрын
Que buen videoo!! Muchisima suerte y ánimo con este canal! Tienes muchísimo potencial para crecer. No te rindas y muchísimo ánimo❤❤
@ViniciusVieira13
@ViniciusVieira13 Ай бұрын
Great presentation and explanation on this topic. I've happily subscribed
Ай бұрын
Pretty good explanation! I just remember your first videos in the spanish channel and it's just amazing to see how you are progressing as engineer!
@carylandholt
@carylandholt Ай бұрын
Outstanding job! So well presented and extremely clear.
@Glovali
@Glovali Ай бұрын
Very well explained. I look forward to more videos!
@tony_saro
@tony_saro Ай бұрын
Working on it 👨‍💻
@dearozero3631
@dearozero3631 24 күн бұрын
One of the coolest low level CS videos I've seen in a long time. The animation quality is amazing.
@loocheenah
@loocheenah Ай бұрын
This channel is lit, man, instant sub 🎉
@BrunoAlmeidaSilveira
@BrunoAlmeidaSilveira 28 күн бұрын
The animations are cool, and your explanation is right and on point. Really enjoyed your content 👏
@urbaniv
@urbaniv Ай бұрын
Awesome. Great work
@sidreddy7030
@sidreddy7030 24 күн бұрын
This is so beautiful. Loved it
@adokana_
@adokana_ Ай бұрын
Buen vídeo, me alegro de volver a verte
@lampham7874
@lampham7874 Ай бұрын
I've heard this kind of algorithm since university but too lazy to dig to it, now thank you for helping me got its idea 💪
@Mariomm_marti
@Mariomm_marti Ай бұрын
Hey! I think what you did in your Spanish channel was much less niche and this content definitely looks much much more polished and informative. Congratulations!
@s8x.
@s8x. 21 күн бұрын
this video is top quality. such valuable information here explained so well
@amj864
@amj864 Ай бұрын
Just subed, love the video hopefully, you do many many more and gain a yuuugggee fan base
@eric-seastrand
@eric-seastrand Ай бұрын
Fascinating! Subscribed.
@josueC235
@josueC235 Ай бұрын
No hay cosa más hermosa que ver tu contenido en inglés tambien 🥹🫶 Amo enserió tu contenido brou ❤
@66ussa77
@66ussa77 Ай бұрын
Excelente como siempre 💯💪
@alexanderzikal7244
@alexanderzikal7244 Ай бұрын
Thank You again! A really interesting problem with different sizes.
@consciousmi4842
@consciousmi4842 Ай бұрын
Wow, this was very well explained. Thanks for posting. Can we have more videos plz. Thank you again
@tony_saro
@tony_saro Ай бұрын
Sure, I'm working on a video similar to this one focused on algorithms and then I'll start working on another #mkown episode
@robertchang-me
@robertchang-me Ай бұрын
Really good content, like it
@mayfly0
@mayfly0 Ай бұрын
wow such a clear explanation 👍👍
@facundopuerto4415
@facundopuerto4415 Ай бұрын
Muy buenos tus videos. Las animaciones hacen que sea mucho más fácil de entender. Saludos!
@_CJ_
@_CJ_ 25 күн бұрын
So for this you would use things like RAM drive or buy Optane and it become much easier with modern SSD I would guess. I never thought that database use files and hard disk to sort stuff. Thank you for this lesson! Very cool animations, easy to follow and I like that you really implemented it and not just read some documentation! 💛Huge respect. It may be just basic and naive implementation but that is the best place to start with such complex thing. Looking forward to next video! :) Cheers
@ChrisHalden007
@ChrisHalden007 26 күн бұрын
Great video. Thanks
@sebamarin12
@sebamarin12 Ай бұрын
You have explained it in such a way that I have understood it even though I am an HTML programmer.
@borislavrangelov4770
@borislavrangelov4770 Ай бұрын
Regarding dealing with variable-length, during pass-0, when outputing a sorted page of tuples, offset from the start of the file and page size can be recorded in an in-memory 'page file' which can be just an ArrayList (java background here, use something which can grow and has 0-n lookup speed). Later stages can use that 'page file' list as a way to lookup the position of the page in the file for the cursor to use. Their outputs will generate a new 'page file'. Now, if we want to parallelise this algorithm, since we have the offset and size, we can let each worker thread find the location of whichever page range its going to be working with using that 'page file'. The output will always be starting from the minimum offset and be the total size of the range and the 'page file' since its an Array List each thread can insert in the appropriate index the new page values.
@tony_saro
@tony_saro Ай бұрын
That's more or less what I'm doing but you can't store it in memory only, let's say you need to sort 1TB and pages are 4KB, you'd need to store the offsets of approximately 250 million pages, using 4 bytes per offset would already require 1GB of memory. So in case that happens you need to store what you call "page file" on disk as well. Another very important detail is that the number of pages will change in each different run, just like I showed the example of producing 3 pages from 2 pages only, maybe in the next pass you produce 1 page using 3 pages. It's still paralelizable, but you have to store the offsets for each thread during each run. And this is just a basic algorithm I came up with, if you take a look at the Postgres version it's much more complex than what I've explained here 😂.
@borislavrangelov4770
@borislavrangelov4770 Ай бұрын
​ @tony_saro Will take a look. Also I didn't consider such large queries where GBs or even TBs of data are involved. The videos are great. Got me thinking on a lot of stuff around db engine design. Looking forward to more videos 😁
@AqoCyrale
@AqoCyrale Ай бұрын
glad you went in-depth about this topic. will you do more like this with other sub-topics? or do you have a new big project you're working on that you will release in the future when done?
@tony_saro
@tony_saro Ай бұрын
Both, next video will be similar to this one and then I'll move to another project
@AqoCyrale
@AqoCyrale Ай бұрын
@@tony_saro awesome, looking forward to it. thanks a lot!
@genins21
@genins21 Ай бұрын
Amazing explanation about the sorting, but I'm actually not sure you need to sort the whole table any time someone asks for top N values... It would make much more sense to select the top 100,and then just sort that 100
@tony_saro
@tony_saro Ай бұрын
Check the pinned comment
@brayancortes7333
@brayancortes7333 Ай бұрын
Gran explicación, sigue asi, ademas me ayudas a mejorar mi ingles, un saludo desde Colombia
@user-ju7co9md1q
@user-ju7co9md1q Ай бұрын
El rey a vuelto, ahora sí hay un buen motivo para aprender inglés.
@starc0w
@starc0w 23 күн бұрын
Very good!
@zxuiji
@zxuiji 6 күн бұрын
during pass 1 I woulda filled a file with the highest and lowest ID of each page and used that during phase 2 to sort the results only once since it's going to be a lot easier to sort the ID recorded in memory then it is to sort the file contents. Once the IDs are sorted by lowest ID 1st it's only a matter of looping through them to read the relevant pages. If a page noted by the ID list is missing an ID in a range it's only a matter of looking at the next page in the sorted ID list. If you're lucky the list of IDs will fit in memory, if it doesn't then you're just applying the original method you mentioned to the ID list.
@yunlin-us8so
@yunlin-us8so 22 күн бұрын
great explain and introduce
@tech-nomade
@tech-nomade 3 күн бұрын
Rust, Databases, Algorithms, Neovim & Linux - What do you need more in a video? :D
@victormadu1635
@victormadu1635 Ай бұрын
Please keep up the good job
@StevenHokins
@StevenHokins 20 күн бұрын
Very cool ❤
@macgyverswissarmykni
@macgyverswissarmykni Ай бұрын
Nice seeing someone else writing Rust code
@LokendraSingh-42
@LokendraSingh-42 Ай бұрын
This video is complex as is, and now I am wondering what will happen when we have to implement Isolation. I guess, it'd simply be reading required pages during initial run, and then we can work with that dataset and sort it Here's one more, implementing postgres statement timeout
@user-yi6sb8qo1j
@user-yi6sb8qo1j Ай бұрын
This reminds me the bottom up approach of merge sort, tricky, would this K-way done in such approach actually? Subcribed and liked, hope this channel always continue. Here are DSA topics beyond textbooks but also explained well, specialized for DB . I remember "The Arts of Programming" mentioned external sort too, from random places, I bookmarked some other external sorting algo: Poly-phase Mergesort, cascade-merge, oscillating sort, I know I don't know them
@tony_saro
@tony_saro Ай бұрын
It's similar to bottom up you can represent the passes as a tree
@alejandroklever
@alejandroklever Ай бұрын
Another great video. I would really like to see the code of this.
@tony_saro
@tony_saro Ай бұрын
It's on GitHub, linked in the description.
@ChrinovicMukanya
@ChrinovicMukanya Ай бұрын
I found my new role model
@edwolt
@edwolt 26 күн бұрын
The page size can be the size the disk read at once (if you try to read a byte, disk will read more than a byte), or the size OS read at once (if I am not mistake, OS also read more to optimize IO)
@tony_saro
@tony_saro 26 күн бұрын
It's at least equal to the file system page size. Otherwise it's a multiple of the FS page size.
@Desam1000
@Desam1000 16 күн бұрын
One of the basic ideas of databases is to provide indexing structures (trees) to improve any search or ordering process.
@tony_saro
@tony_saro 16 күн бұрын
That only works when sorting by indexed columns.
@angelffg
@angelffg Ай бұрын
Great!
@nazarshvets7501
@nazarshvets7501 24 күн бұрын
Great explanation, great video 🔥 One thing I missed tho, how do we choose K?
@tony_saro
@tony_saro 24 күн бұрын
Depends on how much memory you have and how big the page size ends up being. I didn't implement any algorithm that analyzes the current memory consumption to determine K, I just added a default constant value of 4 😂.
@SurfingOnTheGuitar
@SurfingOnTheGuitar Ай бұрын
Great video, so for every order by query, a separate file with the sorted results will be produced? Or does it actually sorts the records in place (in the db files) and keep them sorted for later queries ?
@tony_saro
@tony_saro Ай бұрын
Separate file unless the results fit in memory
@retenim28
@retenim28 21 күн бұрын
impressive animation and knowledge. i don't think there's something similar on youtube about Database, congrats. May you suggest some resources (especially books) about DB? I don't want to be an expert, but i want only have a basic-medium knowledge about what's happening under the hood when I interact with DB. Right now, I am following a course by Andy Pavlo here on KZbin (i don't think it's possibile to paste the link, but it's well known) which looks amazing , but some written resources are appreciated too
@tony_saro
@tony_saro 21 күн бұрын
I did not read any books. The resources I used are linked in the full database video and also in the GitHub repository.
@atackhelikopter4303
@atackhelikopter4303 26 күн бұрын
if you have a limit on the number of results, so you have to show the first k rows sorted in a particular way (so basically the first k rows in an ordered data set), it's better if you made k pages of size(maximum size of a row) and just select the first k rows to be in pages and keep going thru the list till you found the first k this way your algorithm has a complexity of O(n * logk), assuming that you can hold in memory the k pages, and you would use a priority_queue to handle the data, than you can sort it in O(k logk) but that is smaller than O(n logk), whilst that order by algorithm has O(n * logn * log base k (n)) where n is the number of rows in the database also you can get the rows a-b that would be in a data base if it were sorted, you can do the same thing, except you take the k as being b, with the same algorithm and limitation
@tony_saro
@tony_saro 26 күн бұрын
We've already discussed the "LIMIT" thing in the pinned comment. Add your comment there if you want to contribute further to that conversation.
@sebastianfranco1507
@sebastianfranco1507 25 күн бұрын
You could make a kind of hash for the variable data that keeps the order that way you only need to compare the actual variable size data if the hash is equal, you just have to not multiple the hash by a prime number
@tony_saro
@tony_saro 25 күн бұрын
You know any algorithm that preserves order and doesn't cause collisions? When two hashes are equal how do you know if the original variable data, for example strings, were equal or they just produced the same hash value?
20 күн бұрын
You are very good at explaining this complex topic! I love the animations, may I ask what tools are you using to do them? Btw, I would be happily pay for a Rust course from you on Udemy :)
@tony_saro
@tony_saro 20 күн бұрын
I use Adobe Premiere mostly, some other tools like diagrams.net as well
20 күн бұрын
@@tony_saro thanks man, I had no idea that Premiere can be used for animations 😅
@tony_saro
@tony_saro 20 күн бұрын
There's something called "Essential Graphics" within Premiere that you can use, that's how I make my videos.
@cdavidd
@cdavidd 28 күн бұрын
eso es contenido del bueno, eres el mejor crack, si se te ocurre hacer estos mismos videos en español , asi sean de paga, por aqui tiene un cliente! 😎
@tony_saro
@tony_saro 28 күн бұрын
Estos videos son gratis, los hago en inglés porque tienen más público que en español.
@cdavidd
@cdavidd 26 күн бұрын
@@tony_saro esque aun no domino bien en ingles maestro 😅
@valcubeto
@valcubeto Ай бұрын
Perfectamente explicado, y eso que no se me da muy bien el inglés
@LarryGarfieldCrell
@LarryGarfieldCrell 10 күн бұрын
What's your process for making the graphics? They're fantastic. What tools do you use?
@tony_saro
@tony_saro 10 күн бұрын
Adobe Premiere
@poutineausyropderable7108
@poutineausyropderable7108 23 күн бұрын
I feel like if you are just sorting hy something, you can just argsort said thing and then shuffle following that. It skips the problem of changing size.
@tony_saro
@tony_saro 23 күн бұрын
Sorting on disk isn't as easy, you have to avoid random IO.
@SlackwareNVM
@SlackwareNVM 22 күн бұрын
Would it make sense to not keep track of the whole tuple but just the id and the columns used to perform the ordering? That would decrease the size of the data that needs to be transferred and make things potentially faster, but at the cost of having to do another lookup using this new "index" to satisfy the original select statement.
@tony_saro
@tony_saro 22 күн бұрын
That's the problem, if you sort just the row IDs when you do the final lookup using the new "index" you'll have to do a bunch or random IO. Random is slow, you have to do Sequential IO.
@SlackwareNVM
@SlackwareNVM 22 күн бұрын
@@tony_saro Yeah, I've no idea how they compare in speed. Since you know the tuple size from the schema (at least a minimum), and you know how many rows you're sorting, perhaps some cut off point can be calculated (given some approximate constant for random IO cost) where you can switch between implementations.
@SlackwareNVM
@SlackwareNVM 22 күн бұрын
@@tony_saro I did afterwards read the thread in the pinned comment and saw you guys already discussed the topic. Thank you for the easy to follow explanations and for doing some additional passes of the algorithm to make it extra clear.
@Ashley-sd5xn
@Ashley-sd5xn 22 күн бұрын
Did you happen to look into how any open source databases solve the problem with mixed page sizes? I wonder if they did it the same way as you or not.
@tony_saro
@tony_saro 22 күн бұрын
Looked into Postgres but I didn't fully figured out how it works. Their implementation is much more complex than mine obviously. I also looked into SQLite 3 and they use PMA (Packed Memory Array) which a data structure basically designed to deal with this problem, but it seemed to be more complicated to implement than external sorting.
@umikaliprivate
@umikaliprivate 19 күн бұрын
What if you went through the table stored on the disk, and saved the top 100 values, so if you saw a bigger number (you could save the name as a number) than the smallest number in the 100 size array, and then you would move that entry into the array kinda like insertion sort (with a binary search, cuz what you have is already sorted). Then after going through the entire database stored on the disk, you have the top 100.
@tony_saro
@tony_saro 19 күн бұрын
We discussed it in the pinned comment, just forget about the "LIMIT 100".
@cat-.-
@cat-.- 24 күн бұрын
What if I’m ORDERing BY a col that is b-tree indexed? I assume the db wouldn’t need to sort, because it’s already sorted?
@tony_saro
@tony_saro 24 күн бұрын
If that's the only column you sort by then yes, it's already sorted.
@vicsteiner
@vicsteiner 20 күн бұрын
The curious bit is that order by is not even part of the relational data model. Relations are sets of tuples and tuples have no order. Not to deny the need for sorting them.
@tony_saro
@tony_saro 20 күн бұрын
Right, it's just a sorting algorithm.
@user-ur4ev7vl6c
@user-ur4ev7vl6c Ай бұрын
Hey pal! Thank you very much! I wanna to create my own dbms(cloud, embedded and so on) too! If I would have a some result than can I send a repo github under you commentary?)
@tony_saro
@tony_saro Ай бұрын
I think KZbin will flag it as spam if you add a link to your commentary
@sunkittsui7957
@sunkittsui7957 Ай бұрын
Are there any other external sorting algorithms like this?
@tony_saro
@tony_saro Ай бұрын
Yes, SQLite 3 uses something called PMA (Packed Memory Array) to sort on disk. I don't know how it works, didn't research enough.
@cn-ml
@cn-ml 23 күн бұрын
How does this work in combination with pagination queries like ORDER BY ... LIMIT ... OFFSET? Is there any way to optimize this without having to order the entire table every time?
@tony_saro
@tony_saro 23 күн бұрын
We've talked about the LIMIT in the pinned comment, check it out. The OFFSET I'm not sure you can optimize that, it's similar to a limit so maybe you can.
@JonBrase
@JonBrase 29 күн бұрын
Why do databases tend to do their own swapping to temporary files for tables that don't fit in RAM rather than just doing everything in-memory (with a cache-friendly algorithm) and letting the OS's paging facilities handle swapping to disk. Process memory can already be much larger than RAM (with appropriate swap space configured).
@tony_saro
@tony_saro 29 күн бұрын
Because the replacement algorithm is determined by the OS in that case. Databases don't have control over that, and the OS doesn't know anything about databases so they just roll their own optimized algorithms.
@kesocialuwu
@kesocialuwu 27 күн бұрын
No sabia que tenias un canal en inglés 😮
@tony_saro
@tony_saro 27 күн бұрын
Lo sabes ahora jaja
@ProxyDev
@ProxyDev Ай бұрын
Amazing work
@jaime7295
@jaime7295 Ай бұрын
Una pena que en la comunidad hispana no haya interés para este tipo de temas :( La mayoría de gente solo se centra en el desarrollo web, cuando no conocen el fascinante mundo del bajo nivel
@tony_saro
@tony_saro Ай бұрын
Justo, por eso me he cambiado a la inglesa porque esto es lo que me interesa a mí.
@zeusjean
@zeusjean Ай бұрын
@@tony_saro
@danields_04
@danields_04 Ай бұрын
Mucho ánimo con este nuevo canal.
@NathanaelNewton
@NathanaelNewton 28 күн бұрын
I feel like this could take a while 😂
@victorramos3110
@victorramos3110 Ай бұрын
Cómo te volviste tan fluido en inglés? Llevo años con el inglés y no soy ni la mitad de fluido...
@tony_saro
@tony_saro Ай бұрын
Escuchando cómo hablan los nativos y practicando la pronunciación. Una cosa es el inglés que te enseñan en la escuela o academias y otra es el inglés que se habla en la calle 😂.
@iajaydandge
@iajaydandge 28 күн бұрын
Just wanted to thank you for this. In near future, if you have time will you implement redis from scratch with master replica command propagation.
@tony_saro
@tony_saro 28 күн бұрын
I don't know, I won't touch databases any time soon after this project.
@iajaydandge
@iajaydandge 28 күн бұрын
@@tony_saro Well it Iooks like I need to wait
@MrAtomUniverse
@MrAtomUniverse 17 күн бұрын
Are you using after effects?
@tony_saro
@tony_saro 17 күн бұрын
No
@edwinroman30
@edwinroman30 Ай бұрын
Ey there is man named Antonio who does programming videos too, is he your twince?
@tony_saro
@tony_saro Ай бұрын
Nah man he's an AI
@edwinroman30
@edwinroman30 Ай бұрын
Either way, new sub!
@MrC0MPUT3R
@MrC0MPUT3R Ай бұрын
Sir, do you have a license for those guns?
@tony_saro
@tony_saro Ай бұрын
They might be illegal ☠️
@diegoparra8859
@diegoparra8859 Ай бұрын
Volveras al canal de Antonio Sarosi?
@tony_saro
@tony_saro Ай бұрын
Ahora mismo no
@XMaverick20
@XMaverick20 27 күн бұрын
Virtual memory?
@user-ey5xw2nx9s
@user-ey5xw2nx9s 27 күн бұрын
Nah, because too dependent on OS
@julionavarro2600
@julionavarro2600 Ай бұрын
El doble de Sarosi?
@tony_saro
@tony_saro Ай бұрын
Sarosi es una IA
@user-fj9hf4bu9f
@user-fj9hf4bu9f 25 күн бұрын
did you really need to have built a DB to realize you can't assume some arbitrary piece of data may not necessarily fit into memory (ram) ?
@tony_saro
@tony_saro 25 күн бұрын
Yes, I have a negative IQ.
@rayilisto
@rayilisto Ай бұрын
Nos olvidaste, toñito:c
@tony_saro
@tony_saro Ай бұрын
Ya he hablado sobre el tema en Instagram y Twitter.
@baranacikgoz
@baranacikgoz Ай бұрын
I felt myself a way worse software engineer after I saw you ...
@tony_saro
@tony_saro Ай бұрын
This is leaning more towards pure Computer Science than software engineering.
@user-ey5xw2nx9s
@user-ey5xw2nx9s 27 күн бұрын
What so you mean?
@lazarus6983
@lazarus6983 25 күн бұрын
I think you didnt really explain something well because im confused...... You say you need to implement k way merge sort with paging because you cant assume the tuple set fits in memory But based on your explanation, the buffer seems to double each time as the pages are merged, presumably until the # of pages inside the buffer = the number of records / page size.... . Meaning in the kast pass, you will have the final output buffer size (in memory) = the entire result set? Now toure back at the original problem you had at the beginning of the video. That is, for the bulk of the algorithm you are space efficient but in the last oass, the final output buffer will have the entire resultset in memory before writing it to the final output file ..... How do you get around this issue? Are you writing to the disk as you are merging basically?
@tony_saro
@tony_saro 25 күн бұрын
Look at 04:43. Pages you see at the bottom are on disk and the ones you see at the top are in memory. I think it's pretty clear what's in memory and what isn't. You only load as many pages as input buffers you have, that's it, last pass included. The memory buffers never change their size, and in the last pass you can see with the animations that it's still loading 2 pages at a time.
@adriankal
@adriankal Ай бұрын
Wouldn't that be unnecessary if you had index on name which could be already sorted on insert and just take first 100 elements from it? This algorithm was invented when hdds were expensive. Today 1gb is cheaper than one minute of your time.
@tony_saro
@tony_saro Ай бұрын
If there's a BTree index and the query sorts by that index only then yes. Otherwise, if there are 100 queries sorting 1GB at the same time you still need an external sort algorithm, and if there's one query sorting 5TB you still need the same algorithm. All databases have some variant implemented.
@ioannesbracciano4343
@ioannesbracciano4343 27 күн бұрын
Why do you have a Greek accent?
@tony_saro
@tony_saro 27 күн бұрын
It's not a Greek accent, I'm not Greek and I don't speak Greek
Writing My Own Database From Scratch
42:00
Tony Saro
Рет қаралды 192 М.
A teacher captured the cutest moment at the nursery #shorts
00:33
Fabiosa Stories
Рет қаралды 46 МЛН
Gym belt !! 😂😂  @kauermtt
00:10
Tibo InShape
Рет қаралды 17 МЛН
Amazing weight loss transformation !! 😱😱
00:24
Tibo InShape
Рет қаралды 61 МЛН
Smart Sigma Kid #funny #sigma #comedy
00:40
CRAZY GREAPA
Рет қаралды 9 МЛН
Solving one of PostgreSQL's biggest weaknesses.
17:12
Dreams of Code
Рет қаралды 185 М.
10 Nooby Mistakes Devs Often Make In Python
24:31
Indently
Рет қаралды 56 М.
Write a database from scratch | Episode 2, Indexing
7:34
Code With Sep
Рет қаралды 511
WHY IS THE HEAP SO SLOW?
17:53
Core Dumped
Рет қаралды 212 М.
HOW TRANSISTORS REMEMBER DATA
16:58
Core Dumped
Рет қаралды 228 М.
But, what is Virtual Memory?
20:11
Tech With Nikola
Рет қаралды 244 М.
Quicksort algorithm
20:39
mycodeschool
Рет қаралды 1,8 МЛН
A Natural Language AI (LLM) SQL Database - Could this work?
8:52
Naming Things in Code
7:25
CodeAesthetic
Рет қаралды 2 МЛН
Mastering Dynamic Programming - How to solve any interview problem (Part 1)
19:41
iPhone 15 Pro в реальной жизни
24:07
HUDAKOV
Рет қаралды 471 М.
Копия iPhone с WildBerries
1:00
Wylsacom
Рет қаралды 8 МЛН