No video

To Surrogate Key or Not...

  Рет қаралды 15,787

Guy in a Cube

Guy in a Cube

Күн бұрын

We've talked about using a surrogate key in your data warehouse whether that's Azure Synapse Analytics or something else. Patrick looks at why you should consider this even if you aren't using a slowly changing dimension.
📢 Become a member: guyinacu.be/me...
*******************
Want to take your Power BI skills to the next level? We have training courses available to help you with your journey.
🎓 Guy in a Cube courses: guyinacu.be/co...
*******************
LET'S CONNECT!
*******************
-- / guyinacube
-- / awsaxton
-- / patrickdba
-- / guyinacube
-- / guyinacube
-- guyinacube.com
**Gear**
🛠 Check out my Tools page - guyinacube.com...
#AzureSynapse #DataWarehouse #GuyInACube

Пікірлер: 27
@ChrisWagnerDatagod
@ChrisWagnerDatagod Жыл бұрын
Surrogate keys are THE most critical piece for good model management. Management of keys pushes you to ensure the proper Schema. Do it! Do it!
@jeffshieldsdev9026
@jeffshieldsdev9026 Жыл бұрын
Implementing surrogate keys (at least in the Power BI model) usually just adds management overhead and increases consumption and refresh time. If your source has unique primary keys, why not use them?
@ChrisWagnerDatagod
@ChrisWagnerDatagod Жыл бұрын
@@jeffshieldsdev9026 Surrogate keys should typically be added upstream in the datawarehouse or the data mart giving you a star schema to source your reports from. If you do NOT have a star schema you are sourcing from, then you should be using surrogate keys to manage dims and facts to prepare for when your source changes, or you add sources to your model.
@eta2001carl
@eta2001carl Жыл бұрын
@@ChrisWagnerDatagod how about you need to allow to user to add like a budget fact table to the model (Copositive model) then they will need to know the SK value of an Item, if the Item Dimension has NK then they don't, just add it and it works.
@ChrisWagnerDatagod
@ChrisWagnerDatagod Жыл бұрын
@@eta2001carl You still want the SK for the core model and then use the NK to the DIm that will have the NK to form the composite model. Do not join the NK to the Fact table.
@eziola
@eziola Жыл бұрын
Incredible! Keep these SQL theory vids coming. This is high-level stuff. Thanks Patrick for continuing to be there for us enterprise level SQL folks. YOU DA MAN
@jeffshieldsdev9026
@jeffshieldsdev9026 Жыл бұрын
On a related note, I've seen users replace string-based natural keys with integer-based surrogate keys in their Power BI data models, assuming the smaller integer data type would be more performant. However, most of the time, the VertiPaq engine will hash these integers, making them strings anyways, so all the user is really doing is adding complexity to their transformations and compute to their refreshes.
@mcnater
@mcnater Жыл бұрын
Do we know this is always the case? Or how are you seeing this in your work? Curious as we tend to make the strings integers but I've seen this noted elsewhere and always wondered if it's true.
@jeffshieldsdev9026
@jeffshieldsdev9026 Жыл бұрын
@@mcnater I don't think this is always the case--but VertiPaq is going to choose HASH or VALUE encoding by its own evaluation. You can try applying encoding hints using Tabular Editor, but it didn't work for me when I tried. I'm able to see the encoding type by connecting to the dataset from DAX Studio and clicking View Metrics on the Advanced Tab.
@mcnater
@mcnater Жыл бұрын
@@jeffshieldsdev9026 got it thanks
@JohnBrown-hx8bq
@JohnBrown-hx8bq Жыл бұрын
Important to understand when to use them, and also when not to. Sometimes its very easy to overcomplicate your models with an insistence on 'incremental' SKs, even when they are not needed or even costly (e.g. in columnar datasets). I prefer the Super Natural Key / Durable Key model, or even better, having natural dimensional fields actually in the facts for the columnar data compression to work with.
@pauloneufneufneuf
@pauloneufneufneuf Жыл бұрын
Nice simple explanation of when & why to use surrogate keys. Particularly the why - too often surrogate keys are presented as being good design always, even when they aren't needed and the natural key will work fine. Side note: One habit that doesn't seem to go away, but has no value vs using the natural DATE is creating a YYYYMMDD integer surrogate key for a date (aka Calendar) table. Bad: The INT has more bytes than DATE - it will cause more IO. So it will be slower, even if only marginally. Bad: It has no integrity enforcement as a valid date. Bad: It is awkward to manipulate as a date - e.g. using date functions. Dates as 4 byte INT has not had value since the introduction of SQL Server 2008, with the 3 byte DATE type.
@baklava2tummy
@baklava2tummy 10 ай бұрын
Thanks, this was interesting to read. We hold all our dates as integers and it’s a pain when trying to manipulate back to date. I didn’t think about the IO being impacted by it.
@eta2001carl
@eta2001carl Жыл бұрын
How about you need to allow to user to add like a budget fact table to the model (Copositive model) then they will need to know the SK value of an Item, if the Item Dimension has NK then they don't, just add it and it works.
@Milhouse77BS
@Milhouse77BS Жыл бұрын
Not my first priority. But you hit on the best reason to use it, if your IDs like CustomerID may change or be augmented in future.
@ChrisWagnerDatagod
@ChrisWagnerDatagod Жыл бұрын
It should be. ;)
@pp_ca_2000
@pp_ca_2000 3 ай бұрын
short and sweet. Thank you 🙂
@Vikasptl07
@Vikasptl07 Жыл бұрын
During regular load then dimensions needs to be loaded first then facts by look-up on dimensions for keys
@AmritaOSullivan
@AmritaOSullivan Жыл бұрын
Brilliant explanation!!!! Thank you
@optimalchoice270
@optimalchoice270 Жыл бұрын
If the data you are processing has or could have more than one source, and you want this representation, the information you are providing, to be used by any other process, then an independent (surrogate) key is needed to represent the composite information. If you were to represent a multi-source dataset with the key from one of its sources, you would be changing the meaning of the source keyfield and adding a processing dependency. I would argue that every dataset should have an independent keyset because change is inevitable in real-world processes. Certainly, for repositories that are intended to be long-lived data warehouses, independent (surrogate) keys should be the default for the reasons illustrated here, among others.
@DanielRodriguesPT
@DanielRodriguesPT Жыл бұрын
Good vídeo, I'm going to share this with my team! Also the new source system could have overlapping IDs (even thou it is varchar it can contain ints)
@fb-gu2er
@fb-gu2er 7 ай бұрын
I don’t think customer id is a natural key. Natural keys are derived from the columns. If the id is chosen arbitrarily, then it’s synthetic
@benshaw2644
@benshaw2644 2 ай бұрын
Amazing video x
@umeshchanchlani7373
@umeshchanchlani7373 Жыл бұрын
Nice video. But i have question. What if customerId was a varchar do we still need to have surrogate key?can you please clarify?
@maherkebaier5954
@maherkebaier5954 Жыл бұрын
Maybe i'm missing some thing, once there is a new line in the customer dimension table how can the fact table get the new CustomerSK?
@wilsonman8661
@wilsonman8661 Жыл бұрын
@@federicozambelli9570 Frankly, I'm on a mission to figure this out myself at the moment so take this with a grain of salt: One solution I've found is to join the fact table back to the dimension table in your data warehouse and bring in only the surrogate key from the dimension table. Seems clunky to me but also I'm definitely still figuring things out! :D
@clairerovic
@clairerovic Жыл бұрын
Love this, thanks 👍
Why Surrogate Keys are used in Data Warehouse
7:03
aroundBI
Рет қаралды 139 М.
Handling MULTIPLE fact tables in Power BI
9:02
Guy in a Cube
Рет қаралды 307 М.
小丑把天使丢游泳池里#short #angel #clown
00:15
Super Beauty team
Рет қаралды 47 МЛН
Matching Picture Challenge with Alfredo Larin's family! 👍
00:37
BigSchool
Рет қаралды 44 МЛН
Embedding with Power BI - What's the difference?
10:46
Guy in a Cube
Рет қаралды 120 М.
Twitter Wars! Natural vs Surrogate Keys
6:30
SQL and Database explained!
Рет қаралды 1,6 М.
Using Microsoft Fabric Workspaces (Power BI and more!)
9:51
Guy in a Cube
Рет қаралды 36 М.
What the French Toast is a Slowly Changing Dimension???
7:33
Guy in a Cube
Рет қаралды 6 М.
Create custom keys for your Power BI relationships
9:44
Guy in a Cube
Рет қаралды 112 М.
Topic 02, Part 06 - Primary Keys and Surrogate Keys
9:34
Dr. Daniel Soper
Рет қаралды 4,5 М.
Working with Slowly Changing Dimensions in Power BI
13:31
Guy in a Cube
Рет қаралды 53 М.