This is good , i'm on udemy business, do keep more courses open to business plan users! thank you! great videos you have there
@bishalkandel9976 Жыл бұрын
Let's make it bigger.
@barebears289 Жыл бұрын
That's what she said😂😂
@warrenarnoldmusic19 күн бұрын
That caught me off guard 9:40 😂😂😂 damn all indians are unhinged
@AffyisAffy Жыл бұрын
The whole time I was wonder why not just insert on a new index rather than pointing to a UUID, that way it's always ordered. But then you said MySQL defaults to this at the end of the video. That's critical for the 'why' of this video
@tirthwork9288 Жыл бұрын
He said that in the start of the video as well
@berndeckenfels Жыл бұрын
You can use a sequence but keeping them has its own scalability problem, plus the pk won’t be stable
@nalbertcerqueira60792 ай бұрын
He talked about this at 4:45. He explains that MySQL uses the primary key index as the table itself, this implementation is called clustered index. In the case of MySQL you can have only one clustered index per table, because the index is the table itself. The other indexes you create are "common indexes" which stores only the index id and a pointer to the actual data, and this pointer is actually the primary key.
@berndeckenfels Жыл бұрын
Yes when you age them out, they might leave gaps in the index tree but also reuse the gaps regularly. They might not “nowhere near each other” but always in between. Sp with larger pages the splits are less likely. (Of course IOT is to a good usecase)
@ahmedalaaeldin7362 Жыл бұрын
Actually, I was thinking about that question a couple of days ago Thank you :)
@RealEvangelizer Жыл бұрын
Ive used UUID generated outside the database and stored as primary key varchar within the db. Its worked a lot better for over 100 million devices that I was working with. Primary goal was to use it as hashtables when use other systems like redis/dynamo etc. For lookups, with a bit off magic in UUID generation which is not being generated by DB, you can apply Bloom filters as well. I've been out of engineering field for a decade+ but these sort of videos are always much needed to discuss the fundamentals of how things work.
@ddanielsandberg Жыл бұрын
MySQL 8 uses UUID v1 (kind of compatible with UUID v2). By using the UUID_TO_BIN()/BIN_TO_UUID() function with the optional second argument 'swap_flag' it will reformat the UUID before converting it to a BINARY(16). This will make them sequential (since UUID v1/v2 is based on the timestamp).
@someoneanonymous6707 Жыл бұрын
How costly is that operation tho?
@cedricbrisson7240 Жыл бұрын
uuidv1 is not secure tho
@joesb Жыл бұрын
@@cedricbrisson7240 "What security and why should it apply for this case?" uuidv1 has some characteristic that make it possible for someone who want to *intentionally* clash/guess your id, they might be able to. If YOUR OWN service is creating this id, why would you intentionally do it? Don't just parrot that something is insecure. Knows if it applies to your use case and attack vector as well.
@cedricbrisson7240 Жыл бұрын
Jesus chill lol It's simply good practice to avoid using vulnerable services/tools when safer alternatives can be used. As someone on a CSEC team I'd rather pre-emptively protect myself than wait for someone to find a use case for a specific flaw :) But sure, I'm just "parroting". You're the kind that praises himself as a software engineer thinking their above "programmers" right?
@joesb Жыл бұрын
@@cedricbrisson7240 do you think “integer” is secure? Do you avoid using integer? It’s nonsense statement. Uuid v1 is insecure for a certain use case. Using it as an id you generate yourself is not one of them.
@OzoneGrif Жыл бұрын
The worst is using UUID with SQL Server; since the PK is Clustered; performance becomes awful very quickly. I use Postgresql and never had performance issues with UUIDs.
@ShinigamiZone Жыл бұрын
Can you tell me a bit more about this? What means clustered in that case for SQL Server? I just structured a whole DB with PK UUIDv4 in SQL Server but it’s not shipped to prod yet 👀
@ShinigamiZone Жыл бұрын
At least the prod system is not high-performance required I expect around 2000 writes per day and 1_000_000 reads per day And let’s say 10_000_000 rows in total with around 10 tables that are read mainly, 5 more tables for not so much read tables and 2 views (maybe need to be materialized views)
@OzoneGrif Жыл бұрын
@@ShinigamiZone Clustered means physically ordered on the storage. Random UUID used as a Clustered PK (the default on SQL Server) means your records will be fragmented everywhere on your storage, leading to awful performance.
@tempaccount8256 Жыл бұрын
Never thought, Hussein will come with a pun in the video "that's what she said"
@Maman-Setrum3 ай бұрын
me usually used index "order by timestamp", thats why UUID (in this case, just for unique identity) not a problem.
@gam1l Жыл бұрын
This one cracked me up 09:59 😂 Great work as usual 👏 Please consider making a video on Pinecone and vector databases 🙏
@siya.abc123 Жыл бұрын
He made it bigger.
@vasiliynet3425 Жыл бұрын
Awesome! Thanks. Dialectic of randomness and order. Very beautiful.
@redpillsatori3020 Жыл бұрын
Pro tip: watch at 1.5x speed. You're welcome
@sad_man_no_talent2 ай бұрын
thx
@simon3121 Жыл бұрын
Missing the point. A UUID allows an application to create the ID rather having a two step approach of asking the DB for an ID first.
@AnOmegastick Жыл бұрын
Applications can create a ULID.
@saggitt Жыл бұрын
If your IDs are public and you make them sequential, be aware that third parties will know the number of entities created over a time period.
@warrenarnoldmusic19 күн бұрын
Enumeration attack, you can prehash to protect against that
@mikestaub Жыл бұрын
Please do a video on vector databases.
@Notion615 Жыл бұрын
set page fill to 71% and periodically rebuild your index at low fragmentation percent and problem solved for indexes on uuid4
@durgeshchoudhary Жыл бұрын
can you please point to documentation for these? would be helpful.
@ehfoss Жыл бұрын
What shower of a db requires that you rebuild it all the time haha
@Notion615 Жыл бұрын
@@ehfoss set up a service that runs every so often that rebuild your index programatically during periods of low load.
@Notion615 Жыл бұрын
@@durgeshchoudhary To my fellow neckbeards who wish to know the black arts. kzbin.info/www/bejne/oKmQd6iEpZmZhaM
@alegon2007 Жыл бұрын
What about distributed NO SQL DBs (document, key-value, etc) where the recommendation is to use partition keys with high cardinality and avoid sequential values because that will create “hot” partitions for inserts at least? Maybe its a different use case but it would be an interesting topic too
Жыл бұрын
where did you read that? Could you share the article?
@mishikookropiridze Жыл бұрын
Different scenario.
@ehfoss Жыл бұрын
There are 3 new uuids in rfc draft. ulid is equivalent to uuidv7. uuidv8 let's you define your own data. And uuidv6 has a machine key followed by a timestamp akin to ulid which can help with the situation you're describing.
@mishikookropiridze Жыл бұрын
Also possible problem is that ULID doesn't have RFC, so implementation can differ.
@avinashboppudi Жыл бұрын
@alegon2007, that is true for NO SQL DB's as it does not use B+ tree datastructure to store data. If I take DynamoDB as an example, It relies on something like partition key to hash and figure out the partition. This makes sure your single partition is not going to be overloaded with requests. While for sort key, it's recommended to keep it sequential if possible as internally in the shard it must have been using a B+ tree to sort on the sort keys
@consistentlearners Жыл бұрын
Happy Teachers Day Husseinn
@usamaabubakar3061 Жыл бұрын
Will this concept be applied to columns that are Varchar and indexed? Because strings are also random like username, email, url slugs, etc.. If so, what is the workaround for storing string values as indexed?
@shmmh Жыл бұрын
His videos are so interesting but so slow. I can easily watch it at 2x speed.
@kavyadevadiga244624 күн бұрын
What factors led Shopify to favor ULID over UUIDv7? What specific advantages does ULID provide that make it a better fit compared to UUIDv7, especially considering that sorting could also be achieved with UUIDv7?
@anuragbhagsain9724 Жыл бұрын
lowkey "That's what she said" reference at 9:57 😂
@RZhuAmpere Жыл бұрын
10:00 would have never expected that joke from you 😂
@SwapnilSoni3 ай бұрын
BRUHHH XDDD
@vmaxxk Жыл бұрын
I've seen DBs with millions of records across various tables perform just fine with UUIDs leveraged everywhere. Either the performance difference is negligible or it only affects extremely large datasets.
@willi1978 Жыл бұрын
i think so too. usually data has no correlation between each other for example customers in a table. for timeseries data uid v7 can be used
@IvanRandomDude Жыл бұрын
Databases are much more powerful than we think.
@DirkSteinkamp Жыл бұрын
Sorry, but having a B+ tree with a page size of 2 is very close to degenerate the B+ tree to a binary tree -- and binary trees either need to be rebalanced regularly, or might degenerate into a linear list. Rebalancing is an expensive operation, especially if it involved a secondary storage like HDD / SSD. And that's what B+ trees are optimized for: to minimize HDD access, leveraging having some leeway for inserts and deletes without "reshuffling", and generally keep the tree depth low. This requires the page size to be considerably larger than two, and be harmonized with the block size on the disk. Besides that, different DBMS might offer different variants of indexes, e.g. hash indexes, that are not implemented as trees, but as hash tables. Depending on the use case this might be the index type to choose, and your argument becomes void. I personally don't like UUIDs very much in many use cases, but for other reasons: first is they are not easily graspable for humans, they don't tell a story, don't have some inherent use case information in their structure, ... yet they are very useful when randomness is required, because the systems involved only synchronize their data eventually. But besides that: too much cognitive load ... Anyhow: I don't buy your argument, sorry.
@amrojjeh Жыл бұрын
Hash table would still suffer from performance as it would have to load from disk if inserts aren't stored close to each other
@energy-tunes4 ай бұрын
it was just for an example dumbass we all know what b trees do
@malibudancer8472 Жыл бұрын
This is very narrowly scoped on dbs that don't optimize for distributed cases...
@SmartK8 Жыл бұрын
You never had to distribute databases into multiple locations, that insert independently and then access global data. I use UUIDs for all the data tables. You have 100 databases with millions of IDs. Just insert independently whenever you want, the worst thing that can happen is a rare collision hypothetically (never happened to me yet). They will still be OK with UUIDs. Now imagine this with auto-increment integer. Insanity. How would you keep track of this auto-increment ID and ensure it's unique across all databases, other then synchronizing them. Auto-increment integers as a key are for script kiddie databases, when you have one local database. I still wouldn't use integers even then. Because you never know, whether you'll need distributed database later.
@ConsuelaPlaysRS Жыл бұрын
"Auto-increment integers as a key are for script kiddie databases" I'm willing to bet the house that most tables in most databases powering most applications use auto-incrementing PKs lol
@SmartK8 Жыл бұрын
@@ConsuelaPlaysRS That doesn't automatically mean there's a good reason to have such keys. Also most databases are probably local file databases. Everyone can do whatever they want. I'm just saying after some gathered experience, you realize that little gain in being sequential is worth sacrificing for a more robust and secure database.
@shubhambravo Жыл бұрын
I work for a major b2c org and we use auto increment IDs to store order info 😅
@Ryan-gf1sz Жыл бұрын
@@ConsuelaPlaysRS While I agree that's true and it's convenient in many case,it just won't work in large distributed system.
@omniphage9391 Жыл бұрын
for the longest time you wouldnt distribute your databases if you could avoid it, i know that sql sharding exists but that is something you would have to build your persistence layer around. With the rise of nosql databases this has changed, although that in turn also has implications for design. Its engineering. The perfect solution is the one that works.
Жыл бұрын
does it mean that, generally, indexing on a random value column is hurt?
@fullstackdotnet Жыл бұрын
it is, but dont start avoiding it and enforcing new guidelines yet, wait till you reach shopify level scale where you can benefit from that sort of performance improvement. its all about ROI
@tmbarral664 Жыл бұрын
Hum…Hussein, I’m a kind of surprised here…. To a point I maybe doubtful…. Here you’re showing the ordering of the whole rows ( as stating item as key value) when we’re talking about indexes (indices). Indexes are ordered (by nature ;)) and are pointing to the value, no matter where the value is. But here, you’re saying the primary index, the one for the primary key, is not behaving as a normal index…. If that’s the case, that’s a major (I should use uppercase here) problem. That means that we shouldn’t be allowed, in any case(uppercase again) to use anything but autoincrement number ! See, that’s so big it makes me doubt.
@berndeckenfels Жыл бұрын
For clustered index, also called IOT the value is in the index - however even if the value is not there you need to shuffle them
@ankurdutta3277 Жыл бұрын
I learned this the hard way. In mongodb I keep getting 'document not found' even though it does.
@AbhishekSingh-pu6dg Жыл бұрын
The silent joke at 10:00 That's what she said. Haha, classic.
@therealtuyen Жыл бұрын
I just randomly stumbled upon your video, and I've always had this itch about UUID performance. Your video really boosted my confidence in what I was thinking. Thanks a lot! Oh, and quick question: Do databases like MariaDB, MySQL, or PostgreSQL automatically play nice with ULIDs? And how do they know to sort 'em out for indexing?
@nirnullz Жыл бұрын
Postgresql, mysql 8++, mariadb 10.7++ support uuid internally. Postgresql have best implementation since it’s saved as 128bit binary but can be accessed as string without custom function on client. MariaDB/MySql client need to call certain function if you want to query a binary uuid field.
@therealtuyen Жыл бұрын
@@nirnullz I asked about ULID, bro ?
@PabloGnesutta Жыл бұрын
What about NOSql databases like mongo db?
@Iandavidandrino Жыл бұрын
I was just thinking about this 😵
@utkarshsaxena8034 Жыл бұрын
How about when we go ahead and distribute the data. Won't this make the last shard hot all the time?
@xD-saleem Жыл бұрын
thanks so much its make sense now
@jameswestbrook5709 Жыл бұрын
how about ULID thought?
@mardix Жыл бұрын
ULID leaks timestamp. For sensitive data, you may not want to expose this info.
@julianpurse956 Жыл бұрын
What about database performance of using a uuid stored in varchar column vs new db versions which support a native uuid type
@jacob_90s Жыл бұрын
Doesn't change the problem, and in fact you'll probably get even worse performance since you'll have a less compact data type.
@alfrendosilalahi6652 Жыл бұрын
Any one can explain about what is page in that explanation? i don't really understand
@muratkhan1453 Жыл бұрын
it is like a file that DB put indexes in. Lets say you want to get a record from DB with id number 80. if indexing in pages were not used it would have to scan entire DB to find where a record with id=80 saved. This wont be an eficient way of getting data. this is actualy how file system work. In DB it uses indexes that points which sector that record are saved. And pages are basicly where those indexesare saved and its pointers are where actual data/record are saved. So DB first goes to pages and finds index. then looks for the exact record from where index is pointing. Long story short pages are phonebook for DB...
@jwbonnett Жыл бұрын
UULD is not 100% accurate, the issue is the datetime ticks, not every generation, generates a tick. So you can still have this shuffling, though it won't be to the same extent.
@michawhite76137 ай бұрын
If you need an index on usernames, won't you have the same problem?
@MrAtomUniverse Жыл бұрын
For postgresql, why would anyone use UUID , SERIAL is the default method for incremental ID right?
@dz5483 Жыл бұрын
Perhaps you don't want to reveal the IDs of your records such that it becomes easy to query your records incrementally. This could be a security risk in API available publicly
@gamemoves2415 Жыл бұрын
@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.
@gamemoves2415 Жыл бұрын
@@dz5483I'm thinking when you get the record from the DB on the front end rather you show the UUID.
@sasonbraha4599 Жыл бұрын
I remember reading about an incident in GithHub involving auto-incremented counters: "For example, in one incident at GitHub [13], an out-of-date MySQL follower was promoted to leader. The database used an autoincrementing counter to assign primary keys to new rows, but because the new leader’s counter lagged behind the old leader’s, it reused some primary keys that were previously assigned by the old leader. These primary keys were also used in a Redis store, so the reuse of primary keys resulted in inconsistency between MySQL and Redis, which caused some private data to be disclosed to the wrong users." Kleppmann, Martin. Designing Data-Intensive Applications (p. 250). O'Reilly Media. Kindle Edition.
@MrAtomUniverse Жыл бұрын
@@dz5483 in this case, people use slug which is unique , otherwise it's like youtube watch?v=qRv5g5MEF2k , let me know i anyone has a better method.
@usun_development3947 Жыл бұрын
So what are we supposed to use as a primary key identifier in MySQL, if we use distributed database setup?
@ferrysuhandri Жыл бұрын
using UUID for distributed database is fine. the video is about using identifier in general case. using distributed database is a special case.
@VTOLfreak Жыл бұрын
And now you get locking contention on that last page. Fix one problem, create another.
@AnkurVermachords Жыл бұрын
I am confused at the statement that indexes are b+ tree and it gets indexed and it gets balanced but if gets balanced isnt the order gets messed up...i am a bit confused 😐 🤔
@Aspartame12 Жыл бұрын
The tree gets balanced by shifting which node is treated as root , balanced binary search tree concept.
@AnkurVermachords Жыл бұрын
@@Aspartame12 but if change the root isnt the root becomes the first node then where is the order still same confuision
@Aspartame12 Жыл бұрын
@@AnkurVermachords order in a balanced BST is never there, it's for search only , in a balanced bst search takes less time compared to unbalanced that's the whole premise, it also supports less than and greater than queries easily.
@AnkurVermachords Жыл бұрын
@@Aspartame12he mentioned the ds to be ordered at 3.40 .. aaaaahhhhhhh i am confused now, had the idea of tree now doesnt have the idea of plant as well😂
@Aspartame12 Жыл бұрын
@@AnkurVermachords Okay, so a BST is ordered in the sense that there is a order in which the nodes are arranged left child is smaller right child is bigger and so on, if you balance the tree your root value changes , that means you do not have ready access to smallest value or largest value, but the middle value ( approx ) , so while searching for something you start from middle value ( root ) and go left or right to find it. You can also write a simple recursive/stack based code to print contents of a BST in ascending sorted order or descending sorted order.
@CaimAstraea Жыл бұрын
They never made sense to me. There's no force powerful enough in the universe to convience me I need them over autoincremented integers.
@CaleMcCollough Жыл бұрын
Great info. I need a tutorial on how to do this with PlanetScale. I ordered your Udemy class. You should add it to that.
@bernard-ng Жыл бұрын
I messed up 😂😂😂 gotta talk to my CTO as soon as possible
@mitchdigitalnetwork Жыл бұрын
😂😂😂😂 Seems to be a little bit small, so let's make it bigger............ that's what she said 9:56
@ДаниялбекДаниялов-у3ч Жыл бұрын
so what about storing uuid in hashmap index?
Жыл бұрын
if a B-tree index on UUID is hurt, will the hash index on UUID be better?
@ferrysuhandri Жыл бұрын
hash index for secondary index is possible, but for primary key not all systems can set it with hash. also, uuid is large, we should only use it as primary key when we really need.
@amrojjeh Жыл бұрын
My guess is that it'd be better but would still suffer because inserts would be stored far apart, and so you'd still have the caching issue
@michawhite76137 ай бұрын
Updates are slower on hash indices in the worst case. When a bucket becomes full, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID
@hipertracker Жыл бұрын
Using numbers as the example is misleading. UUID are not ordered by design so you cannot sort them. I do not understand why the index has to be sorted? What about hash indexes?
@berndeckenfels Жыл бұрын
You can sort them by comparing their raw byte or hex string representations, both is used depending on your db type
@hipertracker Жыл бұрын
@@berndeckenfels yes, but it doesn't make sense, you have no control over the sort order.
@berndeckenfels Жыл бұрын
@@hipertracker for an index which is supposed to find a specific UUID the sort order does not matter as long as it can be used to find it.
@michawhite76137 ай бұрын
Hash indices are even slower to update. Instead of updating a few nodes, you now have to update every single bucket, since ID % (number of buckets) is now resulting in a different value for every single ID.
@banterfc2073 Жыл бұрын
Why not use auto incremented IDs instead
@michawhite76137 ай бұрын
My understanding is that's not a problem in-of-itself. But when you have another vulnerability, this will make it way worse. For example, say I found a way to see a user's phone number. I do this for user 1, then user 2, etc. I can run the attack on every single user rather quickly. If the IDs are random, then I have to guess and check user IDs, and I won't be able to affect as many people.
@experimentalhypothesis1137 Жыл бұрын
does this apply also for uuid5?
@philheathslegalteam Жыл бұрын
Yes. UUIDv6 is the implementation which is proposed to fix this
@BrianWoodruff-Jr Жыл бұрын
I get the sense you hate indexed random UUIDs in databases.
@yadneshkhode3091 Жыл бұрын
👍
@adarshkr532 Жыл бұрын
did he just make a that's what she said joke at 10:00 😂
@shubhambravo Жыл бұрын
Use escalidraw
@IAjayMukhiya Жыл бұрын
the speed of the video is 1.25x
@RaduCruceru Жыл бұрын
All talk without showing numbers. You can theorize on how the index performs all you want, but you should take the time to run some performance tests before-hand and present the data. Otherwise this video is just basic guessing and a waste of time.
@nbme-answers Жыл бұрын
I appreciate the conceptual overview
@AtomicCodeX10 ай бұрын
What you expect? Watch him calculate how long it takes to conclude which method they use? You do realise he’s talking about how it works, a performance test won’t help with that. If you don’t have time to learn and just need to know which is faster, then just do that. Instead of shutting on other peoples work.
@warrenarnoldmusicАй бұрын
You can analyse algorithms analytically not just statistically. You can request for a follow up video instead of discrediting the hypothesis and the author
@ArunGordon Жыл бұрын
That's what she said 🤭
@Cthulch Жыл бұрын
Do you need to have your index ordered? A linked list? TERRIBLE. A tree? gosh! I always imagined random uuids would be utilizing hash table indexes, which is completely useless to sort. Using random numbers with a sorted index seems like a pretty dumb mistake. I feel like when you say "index", you don't even consider that there's more than one index type.
@michawhite76137 ай бұрын
Hash indices are really fast for select queries, but they make the updates even worse. If a page gets full, now instead of just updating a few nodes, you have to update every single bucket, since the ID % (number of buckets) will result in a different value now.
@Cthulch7 ай бұрын
@@michawhite7613 If it was a student who wrote the hashing lib then yes. In SQL, however, it's so well optimized that the page overflow happens extremely rarely and it's not as dramatic as it may seem. Plus in vast majority of cases it's totally worth it to have inserts longer if it means having O(1) for reading. And the complexity of the inserts is also linear or at least log(n), so insignificant.
@andrewbrown846311 ай бұрын
There is a big difference between a database that understands a uuid type and is storing it as a number compared with a uuid stored as a string (varchar). Ultimately though using either a int64/bigint or uuid is the right way to create a primary key for a table. Never should a user entered value be a primary key. Otherwise the rest of this video is mostly garbage or irrelevant in the scheme of things when it comes to performance in a database.
@kraigochieng6395 Жыл бұрын
10:02
@solitary200 Жыл бұрын
ULID > UUID
@silicalnz Жыл бұрын
you've never worked with distributed databases 😂
@roy20021 Жыл бұрын
can you elaborate?
@voicevy3210 Жыл бұрын
@@roy20021you need uuid to handle data migrations, data should be independent of where it's located
@malibudancer8472 Жыл бұрын
@@roy20021 he is correct.
@roy20021 Жыл бұрын
@@malibudancer8472 ok, can you elaborate a bit more?
@malibudancer8472 Жыл бұрын
@@roy20021 I don't really have time to spend time on a stranger on youtube, and you can't really post links on youtube without getting your content deleted. So I can only guide you what to google for. Read about the partitioner of Cassandra and the hash function used. Search for partitioning hot spots and educate yourself about it. Also all relevant databases optimize their indexes for truly random UUIDs and allow tuples to be indexed. That is all I can do for you, if I could I would have posted you direct links instead. Thank youtube for being a P**e of s***.
@DavisTibbz Жыл бұрын
What about turning the uuid to binary(16) ?
@mmm-ie5ws11 ай бұрын
why do you have to make this a 20 min video???
@ttst3 Жыл бұрын
10:01 we said it at the same time 😂
@jamesh4129 Жыл бұрын
First!
@JohnCDSMB Жыл бұрын
Congratulations
@gradientO Жыл бұрын
Incredible achievement
@karasira2696 Жыл бұрын
Wow
@franciscos.2301 Жыл бұрын
@@gradientObrought to you by the Honda Gang
@opticalmouse2 Жыл бұрын
You're like my college professor who lives in a fantasy world.
@rollinOnCode Жыл бұрын
how do you detect and troubleshoot bad page splits?