The Problem With UUIDs… And How To Solve It

  Рет қаралды 15,346

Milan Jovanović

Milan Jovanović

Күн бұрын

Пікірлер: 114
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Want to master Clean Architecture? Go here: bit.ly/3PupkOJ Want to unlock Modular Monoliths? Go here: bit.ly/3SXlzSt
@iteospace
@iteospace 5 ай бұрын
Milan, How about "long" Snowflake Id next?
@P1n3apqlExpr3ss
@P1n3apqlExpr3ss 5 ай бұрын
I'd highly recommend Jeff Moden's Guid fragmentation talk that dives deeper into this topic looking at page splits, fill factors, and gradually evolving data. It offers a perspective that is much different to the high count one-off insert/read demo shown here, and should be closer to the reality of many production systems
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Link it here?
@P1n3apqlExpr3ss
@P1n3apqlExpr3ss 5 ай бұрын
@@MilanJovanovicTech kzbin.info/www/bejne/qKe9qICEf92kjNEsi=TJ3KhHMFwows0C_4
@EddieDemon
@EddieDemon 4 ай бұрын
@@MilanJovanovicTech kzbin.info/www/bejne/qKe9qICEf92kjNE
@dotnetfanboy
@dotnetfanboy 4 ай бұрын
@@MilanJovanovicTech kzbin.info/www/bejne/p5e0pYxnbMeDh5Y
@goldenwraft5362
@goldenwraft5362 4 ай бұрын
I believe it is this one kzbin.info/www/bejne/p5e0pYxnbMeDh5Ysi=Ls6A110rSMKR5Qbn @MilanJovanovicTech
@vasiovasio
@vasiovasio 2 ай бұрын
Great video, Milan! The fragmentation difference is Insane! Поздрави от България! 🇧🇬❤️🇷🇸
@MilanJovanovicTech
@MilanJovanovicTech 2 ай бұрын
Пуно хвала! :)
@WolfspiritMagic
@WolfspiritMagic 5 ай бұрын
While everything in the video is true, there is one thing that somehow nobody mentions when talking about these time based ids: Security. In my opinion there are cases where exposing the creation Date/Time of a database record can lead to security issues. For example with these kind of IDs users will always be able to figure out if they have been created before or after another user if they get their IDs. That's sometimes the reason why you use GUIDs instead of auto incrementing numbers in the first place.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Makes sense, but a timestamp leaks less information than an INT/LONG id.
@Whojoo
@Whojoo 5 ай бұрын
Asume I know too little about security. Why would it matter if I as a user know if my account is created before or after an other user?
@WolfspiritMagic
@WolfspiritMagic 5 ай бұрын
@@Whojoo Before/After might not a big issue but it's still data that you might not want to leak. For example if you have a game and User A invites User B to your game, might even spend some money for User B to give them a starting credit and then figures out based on the ID that User B was already member of that game before User A.
@WolfspiritMagic
@WolfspiritMagic 5 ай бұрын
@@MilanJovanovicTech Actually I'm not sure about that. I think it depends and a timestamp might in some cases be even worse. Lets say you have a database of patients for a doctor treating a specific illness. Knowing the timestamp could give people insight into when that person actually got that illness. Or lets say you're looking for another job and join a job site for that. Your current employer finds you on that website. With an ID in itself of int/long (and no other IDs to compare) they won't have much informations (unless the website provides that to them) and you could be member of that size for years. With an ID based on a timestamp they will be able find out that you just recently started looking for a new job. Also most of randomness is based on time. Having the exact creation time in milliseconds with the entry might in some cases help an attacker to limit the amount of bruteforce required. That's for example how they recently hacked a crypto wallet.
@JohnathonSmith03
@JohnathonSmith03 5 ай бұрын
@@WolfspiritMagicstill don’t see how making it harder for users to lie and deceive each other would be a security issue… not to mention it isn’t even reliable at all short scale anyway otherwise time based keys around be guid anyway. So you get a small amount of ordering info which is can think of very few circumstances being an actual security issue. Just makes it harder to lie about sequencing.
5 ай бұрын
This video is really great, and there are so many bad videos on this topic. The only thing I don't think you mentioned (or I might have missed it), is that SQL Server saves and order a UNIQUEIDENTIFIER in a different byte order than C# (and some other databases). That means even with UUID 7, if we use this the UNIQUEIDENTIFIER datatype, there will be fragmentation issues. Storing as string or binary will not have the same type off issues though.
@RobertMcLaws
@RobertMcLaws 5 ай бұрын
That only happens when you have a clustered UNIQUEIDENTIFIER. Create the ID column as NONCLUSTERED and create a CLUSTERED INDEX on the DateCreated instead, and your database will have almost the exact same performance as INT IDs.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
"there are so many bad videos on this topic" - I'm not sure why some videos focused on the how fast the ID creation time is. 🤷‍♂️
@dcernach
@dcernach 5 ай бұрын
Never thought about it. Our DBAs complain a lot about our using GUID as a PK. If I understood correctly, you meant to create a table with an ID as a UNIQUEIDENTIFIER and add a column like CreatedOnUtc. Then, create a CLUSTERED INDEX using both columns, or create a clustered index on CreatedOnUtc and a non-clustered index on ID?
@daveanderson8348
@daveanderson8348 5 ай бұрын
@@RobertMcLaws How about using that ID column as a Foreign Key in other tables? What is the impact of that?
@daveanderson8348
@daveanderson8348 5 ай бұрын
@@dcernach The GUID type column as PK should be non-clustered index. Otherwise the fragmentation of GUID would have negative performance impact when your table has at least millions of rows.
@robadobdob
@robadobdob 5 ай бұрын
I think using UUIDs in databases is one of those topics which probably doesn't have a measurable impact to most of us at the scales we work at.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Probably true
@AdonikamVirgo
@AdonikamVirgo 4 ай бұрын
We've had issues in a 1mil+ row events table; Jeff Moden's talk is verbose, but boils down to not using the defaults - 90% fill factor for random ids causes expensive page splits, clustered PK is a waste and sequential inserts are not the enemy UNTIL you update and grow the row in some way. kzbin.info/www/bejne/qKe9qICEf92kjNE
@iteospace
@iteospace Ай бұрын
guid 10x slower
@robadobdob
@robadobdob Ай бұрын
@ it might be but unless your customers are ringing you up complaining about slow database queries, I wouldn’t be losing sleep over it.
@ucretsiztakipci6612
@ucretsiztakipci6612 5 ай бұрын
Adding Postgress to this benchmark would be interesting. Wher is the repo?
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Was planning to, but didn't have time to add a Postgres benchmark
@ahmettarksahin3777
@ahmettarksahin3777 5 ай бұрын
When we use ULID or UUID as the primary key in the database, should we check whether ULID or UUID has been created before when adding data to the database?
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
No need, they're random enough that you're unlikely to see a collision. And even then, you'd get an exception for duplicate key values.
@christianrazvan
@christianrazvan 2 ай бұрын
But the sql server knows about the different type of uuids? What if I want to sort in the database query and not in code
@MilanJovanovicTech
@MilanJovanovicTech 2 ай бұрын
I didn't test out that part. I think SQL Server does do something strange to UUIDs, and Postgres doesn't
@SKotekar
@SKotekar Ай бұрын
Using an auto incrementing number as the clustering key instead of datetime might be slightly faster
@MilanJovanovicTech
@MilanJovanovicTech Ай бұрын
There is an example of that
@marikselazemaj3428
@marikselazemaj3428 5 ай бұрын
10:45 What are those icons before class names?
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
ReSharper detects they're entities in an EF DbContext
@jodainemoore8300
@jodainemoore8300 5 ай бұрын
Great Videos, Please can you do one on Composition and Or with Strategy Pattern, best struggling with these for months.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Allright
@FranciscoCastellanodePablo
@FranciscoCastellanodePablo 5 ай бұрын
Be carefully with int, it can only support database generation. I have faced problems when your application (not planned in case) began to support external applications that has there own database like mobile applications that need to work with no connection.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
There's also the HiLo strategy
@FranciscoCastellanodePablo
@FranciscoCastellanodePablo 5 ай бұрын
@@MilanJovanovicTech good point, never the less in my experience its better to go with UUID, this Hilo and others strategies that could exists could be needed if UUID creates other issues like performance as mention in the video. But for the majority of the cases I presume UUID it is the better option.
@RomanTurovskyy
@RomanTurovskyy 5 ай бұрын
If using GUIDs introduces less then 2x slowdown I do not see this as an issue. 10x performance penalty would have been a problem, but 2x can be easily covered by the modern hardware for the benefits GUIDs provide.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
True. Most problems in engineering are amplified by scale.
@thedacian123
@thedacian123 4 ай бұрын
Nice,but about database sequences?
@MilanJovanovicTech
@MilanJovanovicTech 4 ай бұрын
Option 1?
@AhmedKhaled-g8t
@AhmedKhaled-g8t 5 ай бұрын
what is the best order to study clean arc
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
There's a million sources, watch them all. But the source of truth is Uncle Bob's book.
@enricoroselino7557
@enricoroselino7557 Ай бұрын
aaaaand V7 still causing fragmentation because big / little endian thing in mssql. But it should be same or similar performance as ULID isnt it?
@MilanJovanovicTech
@MilanJovanovicTech Ай бұрын
SQL Server 😅
@theeusftn
@theeusftn 5 ай бұрын
But why I need to use timestamp in my uuid? i dont got it
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Clustered index
@theeusftn
@theeusftn 5 ай бұрын
@@MilanJovanovicTech interesting
@DavidSmith-ef4eh
@DavidSmith-ef4eh 5 ай бұрын
do they slow down queries and db in general? an eleven length unsigned int seems much more preferable than a 32-length char entry lol. I assume the index size and look up speed are much slower on the latter, on huge databases.
@DavidSmith-ef4eh
@DavidSmith-ef4eh 5 ай бұрын
although, on the other hand, you don't need a created_at column anymore. I can see msyelf using them, for things that require unique ids accross a system
@Rob_III
@Rob_III 5 ай бұрын
You shouldn't store a ULID (or (G/U)UID for that matter) as (var)char but as (G/U)UID. And, yes, a ULID can be stored as a (G/U)UID.
@RobertMcLaws
@RobertMcLaws 5 ай бұрын
Using the NONCLUSTERED UniqueIdentifer + Clustered DateCreated is nearly as fast as integers and has been tested on huge databases with negligible performance impact.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
They have an impact on table/index size, for sure
@casperhansen826
@casperhansen826 5 ай бұрын
It's 16 bits and the main benefit is that it can be generated on the client and written to the database. The problem with GUID and UUID is that the order of them is more or less random so writing it to the database as a clustered index will cause some heavy manipulation to make it work. ULID should solve that problem, still generating unique values in almost sorted order making it possible to just place the new record at the end of the table
@kenwarner
@kenwarner 5 ай бұрын
It's probably just me but having the SSMS toolbars on two rows instead of one stuck out like a sore thumb. I always combine to a single row whenever I have to install it. 😬
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Never bothered me 🤷‍♂️
@vijayarajan-bt5fk
@vijayarajan-bt5fk 5 ай бұрын
15.26 where to get this benchmark...? Shell script;
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
I share the source code on Patreon
@casperhansen826
@casperhansen826 5 ай бұрын
The problem with ULID it is not compatible with UUID/GUID so it will require a lot of changes to make it work in an existing system If you let the SQL server database create the guid it will be generated in sorted order and work just as well
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
SQL Server isn't the only database people commonly use, so we should look at this from a broader perspective. I'll upload a Postgres benchmark soon!
5 ай бұрын
If you use NEWID and not NEWSEQUENTIALID, then you should have the same problem if MSSQL generates it. NEWSEQUENTIALID is closer to what UUIDv7 does, but compatible with the UNIQUEIDENTIFIER type without getting to much fragmentation. ULID is 128 bits, same as UUID so not sure what you mean with "not compatible", there is always the little-big-endian issues when working with UUID/GUID/ULID, so if your working on a bit-level, you can get confused really fast... For the text representation, they are not presented in the same way. But if you store it as CHARs and not BINARY or UNIQUEIDENTIFIER, then you have about doubled the size already... if that's important for your usecase.
@pavolslamka821
@pavolslamka821 9 сағат бұрын
This performance test is not very informative in my opinion, because your tables are empty! I think the point should be that option Table2 should be SLOWER than option Table5, because the entire table data is sorted by random data since it is clustered by random GUID (and you shuffle all the table data on every insert), whereas in Table5 the data is clustered by increasing DateTime, although it has additional NON-CLUSTERED index on Guid (which is shuffled a lot, but is just one GUID so not too large). But because your tables have no additional data, it does not matter whether the index is clustered or not. Of course I could be wrong. If so, I would appreciate a correction ;)
@MilanJovanovicTech
@MilanJovanovicTech 9 сағат бұрын
@@pavolslamka821 What do you mean by empty? Can't recall the exact setup, but there are at least 1M rows per table
@sunzhang-d9v
@sunzhang-d9v 5 ай бұрын
SnowFlake ldGenerator
@DavidSmith-ef4eh
@DavidSmith-ef4eh 5 ай бұрын
lol, thats an unfortunate name. does it contain the pronoun of the id as well?
@Rob_III
@Rob_III 5 ай бұрын
@@DavidSmith-ef4eh Snowflake predates woke stuff by at least a decade; Twitter introduced it in 2010.
@DavidSmith-ef4eh
@DavidSmith-ef4eh 5 ай бұрын
@@Rob_III yeah, read upon it. seems to be used across all the major socials, apparently. must be fastest I guess.
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
Cool, will see if I can include it in some future discussions
@sunzhang-d9v
@sunzhang-d9v 5 ай бұрын
@@MilanJovanovicTech what? feel ,long long ago
@AravindhKumar007
@AravindhKumar007 Ай бұрын
You should have used snowflake id and called it a day 😅
@MilanJovanovicTech
@MilanJovanovicTech Ай бұрын
Not quite
@sunzhang-d9v
@sunzhang-d9v 5 ай бұрын
kubernetes ,Waiting
@MilanJovanovicTech
@MilanJovanovicTech 5 ай бұрын
🤔
Don’t Use UUIDs/GUIDs in Databases. Use this Instead
10:36
Nick Chapsas
Рет қаралды 50 М.
I Built a Neural Network in C# From Scratch. Here’s What I Learned…
18:12
Jaidarman TOP / Жоғары лига-2023 / Жекпе-жек 1-ТУР / 1-топ
1:30:54
OCCUPIED #shortssprintbrasil
0:37
Natan por Aí
Рет қаралды 131 МЛН
Microservices with Databases can be challenging...
20:52
Software Developer Diaries
Рет қаралды 116 М.
The Problem With UUIDs
6:36
Josh tried coding
Рет қаралды 78 М.
Mastering Distributed Tracing with Serilog and Seq in .NET
16:13
Milan Jovanović
Рет қаралды 12 М.
I Asked Microsoft Software Engineers How To Get Hired
10:34
Namanh Kapur
Рет қаралды 390 М.
The effect of Random UUID on database performance
18:51
Hussein Nasser
Рет қаралды 72 М.
Clean Input Validation With FluentValidation in .NET
19:56
Milan Jovanović
Рет қаралды 12 М.
Why is everyone LYING?
7:56
NeetCodeIO
Рет қаралды 374 М.
Generating Unique Identifiers in Your Programs (GUIDs/UUIDs)
21:36
Jacob Sorber
Рет қаралды 27 М.
Implementing the Transactional Outbox Pattern from Scratch
24:27
Milan Jovanović
Рет қаралды 12 М.
UUID vs INT: What’s Better For Your Primary Key?
9:40
Database Star
Рет қаралды 58 М.
Jaidarman TOP / Жоғары лига-2023 / Жекпе-жек 1-ТУР / 1-топ
1:30:54