Understanding SQL Auto-Incrementing Identity in 10 Minutes or Less

Рет қаралды 7,027

IAmTimCorey

Күн бұрын

Пікірлер: 39

@andergarcia1115 Жыл бұрын

Thank you, Master. Nothing better than to begin the day learning with you.

@IAmTimCorey Жыл бұрын

You are welcome.

@stuartharrod4007 11 ай бұрын

Excellent explanation Tim. I know there are opinions about not having a column specifically for this (mainly due to security risks relating to the German tank problem), but seems to the most sensible and useful solution to me, as a real data column, e.g. serial number is not always available and cannot be guaranteed to be unique.

@Norman_Fleming Жыл бұрын

Identity columns makes so many things so much easier. One thing you can do with the Seed value is start your tables at different chunks (10k, 20k, 30k) so that while in initial development if you typo your joins, they do not find a hit on the wrong table that you fat-fingered your join against.

@IAmTimCorey Жыл бұрын

Thanks for sharing!

@RiderInHell Жыл бұрын

Thanks, Tim. I work with SQL daily and I've wondered that myself but hadn't yet searched for why did it skip numbers for no apparent sometimes. So, this is why! You learn something everyday! 😄

@IAmTimCorey Жыл бұрын

I am glad it was helpful.

@torrvic1156 Жыл бұрын

Thank you so much for your learning Tim! You told about the most important things regards to Identity (in MySql they logically called it Auto-Implement and I don’t know why Microsoft called it Identity) in a short and sweet manner. I watched many other KZbinrs telling about C# and you are by far the best one of them. It is always like you know exactly what people want to understand.

@IAmTimCorey Жыл бұрын

I am glad it was helpful.

@dcernach Жыл бұрын

Auto-increment columns can be beneficial, but they pose significant challenges in distributed systems or databases. Additionally, when dealing with offline applications, the maintenance of these types of apps becomes difficult with auto-increment columns. In such scenarios, I prefer the utilization of Sequential GUIDs. Despite their larger storage requirements, they can effectively prevent various issues as your system scales or begins to integrate with microservices. Opting for a sequence-agnostic ID seems to be a wiser choice, as it allows different subsystems to represent parts of your data in diverse ways. So, I always go with GUIDs. Anyway, great explanation about auto increment columns!

@IAmTimCorey Жыл бұрын

Few things here. First, Sequential GUIDs negate some of the benefits of GUIDs (they can be guessed, which can be a security issue that is solved with GUIDs). Second, they still have a performance issue compared to an auto-incrementing int when it comes to storage in SQL (index fragmentation is a big issue). Third, the type of structure you are proposing is one that is overkill for the typical application. However, with all of that being said, Sequential GUIDs can be the right choice in that scenario.

@dcernach Жыл бұрын

@@IAmTimCorey Agreed! :)

@realdreg Жыл бұрын

thank you master 2

@IAmTimCorey Жыл бұрын

You are welcome.

@adrgri Жыл бұрын

If the SQL Server stores the records on the disk by the primary key column then why (rarely) does a record with a bigger ID get inserted before a record with a smaller ID? Why doesn't the SELECT guarantee that the records will be retrieved by their PK if you don't ORDER BY anything?

@IAmTimCorey Жыл бұрын

There are a couple of reasons why a record with a larger ID might get inserted before a smaller one in the case of auto-incrementing IDs. First, you can override the auto-increment and put your own value in place. Second, if the SQL Server crashes unexpectedly, it will skip 1,000 ID values (by default) so that it can process any entries in the temp storage. As for why select doesn't guarantee an order without an order by, this is true of all relational databases. It could return the order by PK, but it is not guaranteed. SQL determines how best to retrieve the records for you. It may return the records in parallel, which would mean that it is getting records from more than one spot at a time. If you need a guaranteed order, use order by.

@adrgri Жыл бұрын

@@IAmTimCorey I should have mentioned that those IDs were not separated by a 1000, when I did a SELECT on that table then rows would every time display something like this: [...], 10, 11, 13, 12, so that would dismiss the idea of the server crashing nor was I inserting the ID manually since the identity insert was disabled on that server. However your second point, the one regarding SELECT statement sounds like it would probably be the explanation of this phenomenon but would it matter if there were only like 13 records in that table? Unfortunately I don't have any screenshots of those situations to share with you :( Has similar thing ever happen to you?

@Reellron Жыл бұрын

A Guid is 32 characters of numbers and letters, it's 4 times larger than a regular integer, which slows down queries when you have to join on that key.

@mannetjie3704 Жыл бұрын

also, worse when it's randomly generated and part of a clustered index, as inserts/deletes become slower and slower over time because of the reordering.

@phillismable6303 Жыл бұрын

Oh GUID insert,s I'm having to deal with those from another dev, index's are fragged to hell an hour after I rebuild them.

@IAmTimCorey Жыл бұрын

Yep, they definitely have downsides. The upside of being able to generate the GUID on the client is nice, but it comes with a lot of trade-offs.

@dcernach Жыл бұрын

In fact, GUIDs are 16-byte numbers, only four times larger than 'int' and twice the size of 'longs.' This size increase isn't significant when you consider the exceptional uniqueness that GUIDs offer universally. Even in an huge system ecosystem, encountering a duplicate value is highly unlikely, ensuring that unique keys remain genuinely unique.

@tongyinwang215 Жыл бұрын

I also like identity column and I did in most of my projects 😅. I had some pain experience where the team like primary key but not create foreign key to join tables and enforce update cascade. Because the foreign key is a pain as well when we try to import test data, must follow the table insert sequence to prevent foreign key constraint error. Identity column is simple and not need to worry about all these. My previous company's consultant prefer to use GUID as identity because easier for data migration and replication. It has downside on performance and readability. But good hardware and small system can overcome this. We not always build large records system like banking and social media.

@IAmTimCorey Жыл бұрын

Yep, there are trade-offs with every approach.

@m5s10 Жыл бұрын

Thanks for the video! Shouldn't you switch to guid in case you have more than billions of records? Doesn't guid have more possible values than int?

@IAmTimCorey Жыл бұрын

No, switching from int to GUID because of the numeric limitations is not a reason. Even if you are planning on storing more than 2 billion records in one table, you can use a bigint to store up to 9,223,372,036,854,775,807 record (no, I didn't make that number up - that's from the documentation). But here's the thing - practically no one is storing one billion records, let alone 2.1 billion records. That would be reserved for the largest of businesses. And if you are, there is no way you are going to do so with a database built by a person just learning SQL. You are going to need a team of individuals who are highly skilled to handle that many records. Think about how large of a database that is. I know companies that have millions of records that are struggling to back up their database because the backup takes too much time and the backup size is large enough to exceed their storage limits. Now multiply that by a thousand times. Here's a real-world example: Stack Overflow uses SQL Server. It is one of the most popular Q&A sites on the Internet. In the past year, they had 3.1 million questions asked and 13.5 million comments. At that rate, if we say that the comments are all in one table, it would take 155 years to hit the 2.1 billion records needed to exceed the regular int in SQL (documentation here: stackexchange.com/about ). There is a careful balance to be hit when designing systems. You need to think ahead, but also not too far. Often, people try to develop their systems to handle Google-level traffic patterns when that absolutely isn't going to happen. The problem is that this attention to performance actually has the opposite effect. Trying to prematurely optimize a system can actually slow it down. That means your "future-proofing" (which isn't, actually) is actually doing more harm than good. Coming back to our Stack Overflow example, they serve 1.3 billion page views per month, yet they run their entire system off of four SQL Servers running in two clusters (documentation: stackexchange.com/performance ).

@torrvic1156 Жыл бұрын

@@IAmTimCoreyawesome examples! This really clarifies things.

@waynehawkins654 Жыл бұрын

Thanks Tim, or should I also call you Master, as you are my most watch videos on learning. I know what you showed here, nothing new and only use GUID now in any designed. But I always wondered what SQL would do when it did hit the limit of this value. Is it game over and crash - no more new data entries. Do we have time bomb in the database world.

@IAmTimCorey Жыл бұрын

If we use a regular int in SQL, we have 2.1 billion values (technically twice that many if we make it unsigned). If your table gets that big, the numbering will be the least of your concerns. You are going to have a MASSIVE backup problem and storage problem. For instance, Stack Overflow had 13.5 million comments just in the past year. If that was a consistent number every year, it would take them 155 years to exceed a regular int. The number of applications that grow beyond that size are rather few. But in the case that you indeed did go beyond that number, yes, I believe it would crash. It might roll over to the negative, but I don't believe so. However, if you were afraid of that happening, you could change to a big int. The maximum value of a big int is 9,223,372,036,854,775,807 (signed, so double that if you do unsigned). And no, I didn't make that number up.

@i.eduard4098 Жыл бұрын

Had my first day! Tons of meetings... I am not used to work in a big corporation, never had like 4 meetings a day.... I am a little scared of the tasks... I never did customer support, I have an insecurity I fail everything lol. Having a serious imposter feeling lol.

@IAmTimCorey Жыл бұрын

Take notes. Don't rely on your memory. That will help you progress faster. You will be tempted to just remember things, but that will cause problems. Tackle one problem at a time and do your best. Admit when you do not know something rather than trying to pretend otherwise. You can do this.

@johnnyblue4799 Жыл бұрын

SQL only stores the data in the order of the PK if the PK is clustered. Otherwise it'll store it in the order it's inserted, or in the order of a CLUSTERED index, if that exists.

@IAmTimCorey Жыл бұрын

This is mostly true. "SQL only stores...if the PK is clustered" - It is actually the opposite. SQL only stores the data in a different order if the PK is nonclustered. By default, the primary key is clustered. You have to explicitly turn it off in order to store the data in a different order. At that point, you are changing the defaults and you should know why you are doing what you are doing and what the consequences are.

@johnnyblue4799 Жыл бұрын

@@IAmTimCorey Yes, the PK is clustered by default. I was just nitpicking to the affirmation that the data is stored in the order of the PK. It's stored in the ordered of the clustered index, which, in the vast majority of cases, it is the PK. So while mostly true, it was technically incorrect. The point of my objection is that a beginner might pick up the info and not realize it's not always true.

@shoobidyboop8634 Жыл бұрын

The last and most important thing to learn is, never let the db generate your IDs. Always generate your own IDs (semi-random bigint, not increment-by-one). It's very easy to do, no downside, huge upside.

@IAmTimCorey Жыл бұрын

I think maybe you missed a few points from the video, specifically about performance on insert.

@shoobidyboop8634 Жыл бұрын

@@IAmTimCorey I'd have to see some benchmarks showing the diff, to see if that outweighs the many benefits of generating own IDs. And I'm not referring to GUIDs, but to bigints where 32 MSBs are datetime-since-epoch-in-seconds, and LSBs are random at start of second, then increment thereafter (like how mongo does it, but far fewer bits), so inserts are almost always end-of-list.