Data Mart Design - Populate Dimensions and Fact Tables

  Рет қаралды 69,697

Clinton Daniel

Clinton Daniel

Күн бұрын

This demonstrates the ETL process involved in populating the Crime Data Mart Dimensions and Fact Tables.

Пікірлер: 56
@poptrt
@poptrt 3 жыл бұрын
Thank you so much, I looked for just this kind of walkthrough, tutorial on this subject; you spoke so clearly and directly and stayed on topic that it was easy to follow. Thanks again, great job, I like the usage of staging tables I use that alot as well!!
@jahangirkabir9271
@jahangirkabir9271 5 жыл бұрын
Great video! thank you. Please continue uploading.
@jesflo4
@jesflo4 3 жыл бұрын
Very clear and straightfoward, thank you for sharing your knowledge!
@philippr.3946
@philippr.3946 3 жыл бұрын
Great Job on this video I am trained at SSMS, SSIS and SSAS seperately but putting it all together to get a DW from a source was a missing link for me. Keep up the great work :-)
@arturocontreras963
@arturocontreras963 Жыл бұрын
Thank you for all the information shared, I’ve just finished the course project. Thanks!!
@anhngo9039
@anhngo9039 5 жыл бұрын
this is spot on. thank you so much!
@ronaldmudenyo6281
@ronaldmudenyo6281 2 жыл бұрын
Best video ever. You just helped me complete my project. Thanks for the video
@abrammakgalemele3491
@abrammakgalemele3491 5 жыл бұрын
Tx Clinton. You are super! Spot on like Clint Eastwood's shooting!
@pedrobarrocas8673
@pedrobarrocas8673 2 жыл бұрын
You are the man, thank you so much! You are saving Panda Jardins
@godfreynyirenda470
@godfreynyirenda470 5 жыл бұрын
Thanks Clint, very well explained. Do you have follow up videos on multi dimensional analysis?
@tincanpete
@tincanpete 2 жыл бұрын
That was really helpful, thank you!
@Jackie.1001
@Jackie.1001 3 жыл бұрын
Best tutorial of data marts in youtube
@roya75
@roya75 5 жыл бұрын
thank you this was very helpful
@sandipbaidya2839
@sandipbaidya2839 2 жыл бұрын
I really enjoyed watching this video ::)
@asasas617
@asasas617 3 жыл бұрын
Thanks for the detailed video
@ouaildjaidja1447
@ouaildjaidja1447 2 жыл бұрын
thank you so much !
@timovskyt8520
@timovskyt8520 5 жыл бұрын
Thank you, helped a lot, very grateful, any videos on updating the fact table through querying in siss. Eg example if you were to add to the datawarehouse a new dataset and had to insert new fields into the fact table and then update that in visual studio. Is this difficult?
@loinguyen9331
@loinguyen9331 4 ай бұрын
thank you so much
@chihhowbong8460
@chihhowbong8460 4 жыл бұрын
Very good.
@JanUnitra
@JanUnitra Жыл бұрын
is it a good way of doing warehousing ? I always hear that the staging should be raw, no calculation, no changing etc. but you added surrogate keys ?
@ajeyarjun8718
@ajeyarjun8718 9 ай бұрын
This is Great
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402
@mohamedkhalilmadhkour9416
@mohamedkhalilmadhkour9416 3 жыл бұрын
thanks a lot
@tanat2247
@tanat2247 4 жыл бұрын
thanks you,
@datafreak911
@datafreak911 3 жыл бұрын
Thanks for this nice video, would you do another one for how to do this project from beginning
@MasterCamus
@MasterCamus 3 жыл бұрын
Great video. What’s the video 1?
@chamathhasthuka
@chamathhasthuka 3 жыл бұрын
Thank you Sir. Could you please update more videos. And one suggestion please create playlists
@jitenderreddy8605
@jitenderreddy8605 3 жыл бұрын
Hey Daniel !! Its a nice video can you please share the Database and queries to practice.
@joyo2122
@joyo2122 2 жыл бұрын
you normally dont update staging tables or merge any id
@ericsos101
@ericsos101 2 жыл бұрын
Such a wonderful project approach on data mart.Can you please share the dataset because i would love to follow through on hand on ...it will be much appreciated./.Thx again in advance
@vayunandu
@vayunandu 3 жыл бұрын
Neat!
@learnenglishwithcumaali2883
@learnenglishwithcumaali2883 3 ай бұрын
Why would you keep all the values in a single FactTable? Is not a FactTable suppose to keep only agg values in order to limit the storage and the cots of DWH?
@keerthihulagur55
@keerthihulagur55 4 жыл бұрын
Nice
@sandeeppai25
@sandeeppai25 3 жыл бұрын
Hi, thanks for the tutorials. Can we include business logic or calculation in ETL while loading data from staging to fact ?
@clintondaniel5285
@clintondaniel5285 3 жыл бұрын
Yes. For example, if you had a specific business formula, such as an interest rate calculation, you can calculate this value as a form of pre-processing during the ETL. If the pre-processing calculations are really complicated you can load the data in a staging table, then execute calculations using SQL or a scripting language (such as C# or Python). In other words, if there is a lot of computing involved, you do not want to perform this in one single ETL process. The calculated data can then be saved to another "secondary" staging table temporarily. Then, you can move the data from the "secondary" staging table to the final FACT table. I do this type of work a lot in staging tables. This would add some separation of pre-processed computing vs. the normal ETL processes.
@sandeeppai25
@sandeeppai25 3 жыл бұрын
@@clintondaniel5285 Thank you for quick response helped in clarifying doubt.
@HumairDawood
@HumairDawood 4 жыл бұрын
Quick questsions, we made Dimensions in previous videos and made tables to layout our data correctly and all those tables were empty. My question is how did you import the CSV file for Staging table in the beginning.
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
Take a look here: kzbin.info/www/bejne/jofQpGOJdpJpjtU
@HumairDawood
@HumairDawood 4 жыл бұрын
@@clintondaniel5285Thank you Clinton. Currently doing my Coursework from your videos and it took me almost 200 hours of videos lol. I still need to do OLAP CUBE and run some Queries such as; Show the top 5 Facilities and the total weight in TotalTonnes for each material in each of the four quarters e.g. April-June, July-September, October- December and January-March. ii. Rank the top 10 Materials and the TotalTonnes of waste processed by each of these for the 2016-17 year and for each of the two half-years April- September and October-March. For the second half year period show the number of places each Authority has moved (up/down the ranking) since the first half year period. iii. For each Authority show the WasteStreamType, TotalTonnes reported for the year, order the output in descending order of TotalTonnes. iv. Develop a chart that shows the monthly TotalTonnes processed for each OutputProcessType. Your answer should allow selection of the outputProcessType and the Authority in which the waste originated. Till Now I have loaded my tables USING your videos but still way to go. Ty so much for your videos
@josuearana8756
@josuearana8756 3 жыл бұрын
Hi, thanks for you vídeo si very good and I appreciate your time here. I would like to see how build without etl tool but you show all those queries and also Understood very well thanks
@muhammadtanveerislam5998
@muhammadtanveerislam5998 4 жыл бұрын
Can you give the data source link?
@Ryanspjut
@Ryanspjut 4 жыл бұрын
Question -- I noticed that in your fact table you have each crime category (MurderAndNonEgligentManslaughter, ForcibleRape, Robbery, AggravatedAssault, etc.) each in their own column. Would you agree that this data set would be better if it were stored unpivoted? So that there was a new column called "CrimeCategory" and that column was populated with the values of MurderAndNonEgligentManslaughter, ForcibleRape, Robbery, AggravatedAssault, etc., along with another column called "NumberOfCrimes"? This way a user could ask a question like "how many total crimes were committed?" and the system could aggregate a single column (NumberOfCrimes). I understand that this data set is just being used for illustration, but maybe you can touch on when to convert columns to a single dimension and when to leave them as individual columns. Thanks.
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
Typically, in my experience, you would want to store the measurable FACTs as separate columns because they can have a different context. Yes, they are all crimes. But, someone may want to establish their own context on each column and would require the fields to be separated. If the context of a data analyst is to Pivot the columns so that they all fall under "CrimeCategory" and "NumberOfCrimes", then this could be easily done in a reporting tool such as Excel, Tableau, or PowerBI. But, there really is no right or wrong answer to your question. I typically architect my data marts to account for maximum context and allow for a user to define their own contextual rules.
@Ryanspjut
@Ryanspjut 4 жыл бұрын
@@clintondaniel5285 Great info!
@sakshamjinsi6731
@sakshamjinsi6731 3 жыл бұрын
Hey, the relationship between state and city and all of these facts and dimensional tables, can these be considered part of a star schema design?
@clintondaniel5285
@clintondaniel5285 3 жыл бұрын
Yes, absolutely.
@clintondaniel5285
@clintondaniel5285 3 жыл бұрын
Also, depending on whether you build a hierarchy between state and city you can also consider a snow flake design
@mayankchaubey5917
@mayankchaubey5917 4 жыл бұрын
For newly create Data this looks good. Can you let us know how it will work for updated records like Type1 and Type2 Attributes
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
Good question. This is a common operation sometimes called an "UPSERT". Essentially, you check if data associated with an ID has changed. If it has, you execute an UPDATE. If the ID does not exist, then you INSERT. But, that's easy for me to just say vs. demonstrate. I plan on doing a video that addresses this scenario sometime in the future.
@brianaelia
@brianaelia 3 жыл бұрын
@@clintondaniel5285 is this more like a merge command in sql?
@clintondaniel5285
@clintondaniel5285 3 жыл бұрын
@@brianaelia Not really, there is a high cost to the MERGE command depending on the number of records. This is more of a methodology to track and update records.
@quinnhogan545
@quinnhogan545 11 ай бұрын
Hello, great video. I know it's been a while, but can you perhaps speak on how you would take your data from a staging db and insert it into your fact table after all your dimensions are built, but this time you do not have any IDs for the dimensions in your staging table?
@clintondaniel5285
@clintondaniel5285 11 ай бұрын
I will be posting an updated version of the data mart design process using SQLite later this year. There are a lot of examples where the data mart design is useful for localized analysis/data science projects rather than enterprise scalable or service oriented deployment. Instead of using a Microsoft tool, I will be showing how all of the data engineering (ETL) is done using Python and SQLite. I'll see if I can include a part of the FACT table example where the ID is not available in a staging table.
@astralvolt6309
@astralvolt6309 Ай бұрын
Can you share me your previous videos
@ericsos101
@ericsos101 2 жыл бұрын
Hi do you have the dataset?
@rohitnath5545
@rohitnath5545 10 ай бұрын
amazing snowflake schema video. no BS
Data Mart Design - ETL Process to Populate Staging Table
25:42
Clinton Daniel
Рет қаралды 39 М.
Data Mart Design - Table Construction
15:04
Clinton Daniel
Рет қаралды 16 М.
KINDNESS ALWAYS COME BACK
00:59
dednahype
Рет қаралды 151 МЛН
How Many Balloons Does It Take To Fly?
00:18
MrBeast
Рет қаралды 135 МЛН
Survival skills: A great idea with duct tape #survival #lifehacks #camping
00:27
LOVE LETTER - POPPY PLAYTIME CHAPTER 3 | GH'S ANIMATION
00:15
Dimensional Modeling
53:54
Bryan Cafferky
Рет қаралды 165 М.
OSI Model with Practical Demonstration
23:14
Clinton Daniel
Рет қаралды 1,2 М.
129 How do you load a fact table
32:14
Learn SSIS
Рет қаралды 10 М.
Dimensional Modeling - Declaring Dimensions
55:32
Pragmatic Works
Рет қаралды 24 М.
Python Flask development with SQLite
46:23
Clinton Daniel
Рет қаралды 19 М.
SSIS - Loading Dimensions Tables
57:02
Pragmatic Works
Рет қаралды 33 М.
Visual Studio 2019 integration with Git and GitHub demo
11:16
Clinton Daniel
Рет қаралды 22 М.
SSIS Design Patterns for Loading a Data Warehouse
1:01:14
Pragmatic Works
Рет қаралды 41 М.
Database vs Data Warehouse vs Data Lake | What is the Difference?
5:22
Alex The Analyst
Рет қаралды 744 М.
devil Pk jesus 😱❤️😈#devil #jesus #shorts
0:15
Jesus Vs thank Devil
Рет қаралды 23 МЛН
Ужин для Леонида Каневского!
1:00
dacooker_
Рет қаралды 10 МЛН
Getting kids hooked on motorcycles not drugs 👏
0:18
Twinstunts
Рет қаралды 5 МЛН
В семье появился подросток!
0:15
Victoria Portfolio
Рет қаралды 3,9 МЛН