ETL Architecture In-Depth - Advanced Dimensional Modelling

Рет қаралды 21,093

Күн бұрын

Пікірлер: 14

@DebabrataPatnaik 8 жыл бұрын

Ross ... Its a pleasure watching your presentation on Dimensional Modeling.(I watched all 3 continuously) I believe going through the 3rd Edition of toolkit, though Ralph strongly opposes urges for normalization and believes in dimensional approach for modeling. In my opinion SCD4/SCD5 are definitely an alternate definition of Normalization to tackle Monster Dimensions. Similarly Junk Dimensions/Out-trigger are concept that is indeed supportive to Normalization. I feel both Bill Inmon & Ralph Kimball's approach are widely used. I am fascinated by Dimensional Modeling, I had been involved in projects with Normalization approach and would love to make foundation more solid in Dimensional Modeling. Thanks for all the insightful video tutorials, enjoyed and relished every part of the session. Thanks for providing it to the larger audience via KZbin.

@neelred10 11 ай бұрын

Great presentation. For Ragged hierarchy example of salesperson and managers would be highly relatable. Like sales is attributed to a salesperson and many salesperson fall into a specific manger and then senior manager..

@usmanrahat5254 4 жыл бұрын

Really nice set of presentations. Learned alot, thanks.

@Stelios.Posantzis 3 жыл бұрын

Awesome intro to DW design.

@fishsauce7497 6 ай бұрын

Doesn't Type 5 dimension directing towards Snowflake design of dimension?

@satyak1337 6 жыл бұрын

super helpful and really nice presentation.

@loudravetortoise 2 жыл бұрын

Lead and lag functions can do the same thing for the ragged hierarchy

@videet Жыл бұрын

they can, but adds a complexity interms of querying the data.

@jimtonykiller 8 жыл бұрын

Hi Ross in the recursive relationship, we can filter also on the basis where parent key = child key?

@boganpies 8 жыл бұрын

Do you mean, "can we *safely* filter on ...?" And by *safely*, I mean without risk of picking up duplicate rows in the fact. Yes, you can, but it doesn't make much sense. The purpose of using the ragged hierarchy bridge (in my example, the CUSTOMER_BRIDGE) was to find facts that belong to children (or grandchildren, etc) of the chosen customer(s).By filtering CHILD_CUSTOMER_KEY = PARENT_CUSTOMER_KEY, we get the subset of the bridge that includes only level-0 relations (ie. level-2 = self, children and grandghildren; level-1 = self and child; level-0 = self only).The LEVELS_FROM_PARENT attribute provides this same functionality (LEVELS_FROM_PARENT = 0) in a slightly more intuitive fashion, and it also extends to deeper levels (1, 2, etc).But even THAT is not the best solution. If you're not interested in picking up facts for children and grandchildren etc, then don't use the bridge table at all - just join the fact to the CUSTOMER dimension.

@ParasDoshiBlog 4 жыл бұрын

Great series, thank you!

@jimtonykiller 8 жыл бұрын

Hi Ross just thinking, in your treatment-diagnosis example, during ETL load can't I group the rows to have coma separated diagnosis, which then simplifies the architecture and there is no need of any bridge table. So you have patient joined to treatment fact on patient key and with diagnosis appearing in treatment fact as comma separated values? We also eliminate the exercise of "constructing" possible "groups of diagnosis", to be more close to real world. Also with this architecture then we have no need for weighting factor.... your thoughts please.

@boganpies 8 жыл бұрын

Hi AT. Thanks for the question. Yes, you can indeed have a column containing repeating elements--either comma separated, XML, or some other well-formed scheme for storing such data. But you've only solved one part of the problem - storage. What about querying? After all, that's why we're here.For your comma-separated example, when you want to search for cases with diagnoses including FRACTURE - TIBIA, you can't query on DIAGNOSIS = 'FRACTURE - TIBIA', you need to include wildcards. e.g. DIAGNOSIS LIKE '%FRACTURE - TIBIA%'.This opens up a new world of problems, like how do we treat strings that are a subset of other valid strings. For example, if our medical records system used particularly archaic language, we might record malaria as AGUE, but then DIAGNOSIS LIKE '%AGUE%' might well serve up results of PLAGUE, which would be undesirable.Bridge tables resolve these problems with predictable and intuitive results.