How Would You Model This Data? (Example)

  Рет қаралды 4,737

Kahan Data Solutions

Kahan Data Solutions

Күн бұрын

Пікірлер: 14
@KahanDataSolutions
@KahanDataSolutions 3 ай бұрын
Want to build a reliable, modern data architecture without the mess? Here’s a free checklist to help you → bit.ly/kds-checklist
@fury00713
@fury00713 3 ай бұрын
In our company, when we have fields with low cardinality (values less than 30), we’ve created a common codemap table for those types of fields. We then reference that codemap table to a dimension table instead of creating a separate dim_type table for each type.
@Milhouse77BS
@Milhouse77BS 3 ай бұрын
Aka Junk Dimension
@andrewh6951
@andrewh6951 3 ай бұрын
Great video and a fun wee problem to solve that comes up a lot! In my current role's DWH we have a lot of dimensions upon dimensions. From what I can tell in the problem statement, product type is related to product. In a STAR schema, I'd probably keep product type + related fields in the product dim table. If we had to split it out, I'd make a Snowflake style Dim table. In application, I'd go 3NF, for analytics I'd either go snowflake or preserve Star by keeping the product type information in the product table to reduce join clauses. The product ordered is a facet of your transaction, so product id column makes sense in the fact. The product type, however, is a facet of a product, not a transaction. The problem with having a dim table for product type that joins onto the fact table (and not the product dim table) is that the fact table now has to store information about how product and product type are related. If the relationship changes (i.e. a product changes type) this now has to be accounted for in the fact table, where the relationship between those two entities should be ideally managed by the connections between dimensions (that ideally have some historicization element or SCD2 in your pipeline). Adding dimensions to the star should be reserved for entities that are unrelated to each other, this is a case where two dimensions are linked by a grouping concept; the fact table should not have any say in the relationship between them.
@KerneelsRoos
@KerneelsRoos 2 ай бұрын
I agree with this and the video's statements and opinion here. What I'd like to add is to have an efficient code for product type in the dim product, as well as a textual description so that the product_type_code could efficiently be used in large queries, and can always be described more by a simple lookup table (or just the textual description in the dim product itself, which would make it live alongside the historic trail of products). However, in the event that the product type starts to be applicable to additional facts, or if another dimension appears: let's say dim packaging which also depends on the very same product type, then it might be time to create a dim product type in my opinion.
@Milhouse77BS
@Milhouse77BS 3 ай бұрын
Well said. You laid out the “it depends” well.
@stoianandreimircea1509
@stoianandreimircea1509 3 ай бұрын
Thank you from the bottom of my heart are so well done that I believe they should be presenting at the university level or I don’t know. I am just mind blown how well you express some of the tanks and the things that we grapple with every single day. Thank you so much for your videos, they mean a lot. I am now searching if you have a paid course but I’ll tell my manager to pay for.
@KahanDataSolutions
@KahanDataSolutions 3 ай бұрын
Really appreciate the kind words & support! I encourage you to come check out The Modern Data Community (www.ModernDataCommunity.com)
@adam67bree
@adam67bree 3 ай бұрын
Great video as always! Just joined the Modern Data Community :)
@KahanDataSolutions
@KahanDataSolutions 3 ай бұрын
Awesome! See you inside!
@darshakkumarranpariya867
@darshakkumarranpariya867 3 ай бұрын
Any books or sources you recommend to learn about data modeling?? Thank you so much for this type of content 🙏 Really appreciated!!
@pseudounknow5559
@pseudounknow5559 3 ай бұрын
I really appreciate your videos 💯
@InstantBI
@InstantBI 3 ай бұрын
Just FYI Michael. I have published my BI4ALL Base models. YT is censoring how to get them but I am sure you will work it out...
@raram5866
@raram5866 3 ай бұрын
In your proposed solution the product id would not be unique anymore. How would you go about that?
Data Modeling in the Modern Data Stack
10:14
Kahan Data Solutions
Рет қаралды 114 М.
What is Data Mesh?
4:28
Kahan Data Solutions
Рет қаралды 8 М.
The Best Band 😅 #toshleh #viralshort
00:11
Toshleh
Рет қаралды 22 МЛН
小丑女COCO的审判。#天使 #小丑 #超人不会飞
00:53
超人不会飞
Рет қаралды 16 МЛН
BAYGUYSTAN | 1 СЕРИЯ | bayGUYS
36:55
bayGUYS
Рет қаралды 1,9 МЛН
You Don't Need to Learn Every Data Tool & Skill
6:46
Kahan Data Solutions
Рет қаралды 2 М.
How I'd Learn Data Analytics in 2025 | 3 Month Plan
11:42
Rohan Adus
Рет қаралды 439 М.
The Missing Piece in Many Data Pipelines
9:55
Kahan Data Solutions
Рет қаралды 6 М.
4 Reasons Why You Should STILL Learn SQL
4:23
Kahan Data Solutions
Рет қаралды 1,2 М.
How to Create a Data Modeling Pipeline (3 Layer Approach)
9:41
Kahan Data Solutions
Рет қаралды 7 М.
Top 9 BEST Data Analyst Certifications (2025)
15:56
Learn with Lukas
Рет қаралды 6 М.
What is Data Pipeline? | Why Is It So Popular?
5:25
ByteByteGo
Рет қаралды 234 М.
Local GraphRAG with LLaMa 3.1 - LangChain, Ollama & Neo4j
15:01
Coding Crash Courses
Рет қаралды 37 М.
The Best Band 😅 #toshleh #viralshort
00:11
Toshleh
Рет қаралды 22 МЛН