Data Mart Design - ETL Process to Populate Staging Table

  Рет қаралды 40,923

Clinton Daniel

Clinton Daniel

Күн бұрын

Пікірлер: 20
@1review5229
@1review5229 4 жыл бұрын
Hi Clinton, this is really helpful. one suggestion: it would be nice to add Video # number to follow one after another video.
@bridgetmukuhi4810
@bridgetmukuhi4810 4 жыл бұрын
Hello Clinton. Your lectures are very helpful for my lectures they are very easy to follow especially in this corona situation. Thank you
@manishaggarwal7600
@manishaggarwal7600 4 жыл бұрын
Your videos are awesome Clinton. I have seen LinkedIn Learning videos also but you have done better. And yes please number them then the whole tutorial will be great!! Once again thanks for sharing your knowledge.
@asmegetachew2528
@asmegetachew2528 11 ай бұрын
Thank you Clinton!
@ASDAvlogs
@ASDAvlogs 3 жыл бұрын
Why are we performing conversion while loading into staging?
@ilikedigitallogic5059
@ilikedigitallogic5059 4 жыл бұрын
Thanks for your transfer knowledge👍
@durgaprasadvallabhineni2322
@durgaprasadvallabhineni2322 4 жыл бұрын
Very good explanation
@clintondaniel5285
@clintondaniel5285 5 жыл бұрын
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402
@ericsos101
@ericsos101 3 жыл бұрын
Can you please share the dataset you used?.. Thanks
@alphar85
@alphar85 4 жыл бұрын
Hey Daniel, i am looking to watch all your videos on sql ETL but am trying to figure out which one comes first 😉. Thank you for making those videos
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
If you look at the videos included in my channel, identify the video names that start with "Data Mart Design: ...". Then, you will see a sequence in the names.
@alphar85
@alphar85 4 жыл бұрын
@@clintondaniel5285 I am going through all of them. Can i ask you please ? does the Snowflake and Star Schema come before we extract the data from the source to the destination ? at what stage, do we need to design either snowflake or star schema ?
@clintondaniel5285
@clintondaniel5285 4 жыл бұрын
@@alphar85 Yes, you should be designing your data mart with a schema before you begin the ETL process.
@alphar85
@alphar85 4 жыл бұрын
@@clintondaniel5285 Thank you Daniel for helping me. can i also ask you? how would you consolidate data sources (CVS, excel, files) within your organisation into one spource of truth? would you ask your organisation to switch to Access, for instance, to enter data in order to avoid redundancy ? if os, what about the old files in excel sheets ?
@maheshbabu4084
@maheshbabu4084 5 жыл бұрын
how can i proceed if the number of data source files are more than one?
@clintondaniel5285
@clintondaniel5285 5 жыл бұрын
I've added the source code, source data, and documentation for creating the Crime Data Mart on GitHub here: github.com/cedanie2/ISM-4402. I'll see if I have time to work an example with multiple source files. Basically, you need to append the existing Staging table or you have to create additional staging tables. Then, your ETL package will have additional connection managers that point to the additional data source files. These connection managers will be used in separate Data Flows within SSIS to move the data to the additional staging tables or UPDATE an existing staging table.
@lazykitten4356
@lazykitten4356 5 жыл бұрын
@@clintondaniel5285 Thank you so much. This is very helpful. As for the additional staging tables (for multiple sources of data at multiple times), do we delete them after we uploaded the data to the data mart? Another question: if we append the staging table, instead of creating more staging tables, how to upload the appended data to the data mart?
@clintondaniel5285
@clintondaniel5285 5 жыл бұрын
@@lazykitten4356 Usually, in a production environment the staging tables are not useful after your data mart has been loaded with the most recent or initial data. I usually recommend two additional columns in each dimension that record a timestamp in one column and a status of the update in another column. For instance, status = I if the record was INSERT or status = U if the record was UPDATE. In terms of the appended data, take a look at various methods of "UPSERT" (INSERT vs. UPDATE example - using the MERGE command). Since the staging table would be empty (TRUNCATE) from the previous LOAD, then you have new data in the staging table. Then, your SSIS package should run an "UPSERT" (example - MERGE command) on the existing dimension. If there is a new dimension record, then execute an INSERT. If the record already exists with no change, then ignore. If the record exists but with updated changes, then UPDATE. If any record is UPDATE or INSERT, also UPDATE the "status" column and the "timestamp" column in your dimensions. In terms of FACT tables, different opinions from different developers. You can also take the "UPSERT" approach, but you can also just add a new record. If you just add an additional record to your FACT table and you have an updated FACT with 2 or more rows, then you will need to develop additional SQL logic on your reporting side that either MERGEs the 2 records or shows the timeline where the FACTs have changes. It all depends on your use case and requirements. If historical change is important in the FACTs and you don't want multiple records, then just create a separate FACT table that allows all changes. So for instance the normal FACT table may only have 1000 rows while the "historical" FACT table may have > 1000 rows.
@lazykitten4356
@lazykitten4356 5 жыл бұрын
@@clintondaniel5285 Thank you so much. You at least give me ideas of what to look for :) I really appreciate it.
@lazykitten4356
@lazykitten4356 5 жыл бұрын
@@clintondaniel5285 Another question, where did you learned all these techniques about staging tables and DW? Is there a book that you recommend? Thank you very much
Data Mart Design - Populate Dimensions and Fact Tables
29:30
Clinton Daniel
Рет қаралды 71 М.
Data Mart Design - Problems with ETL and Staging
21:44
Clinton Daniel
Рет қаралды 6 М.
Какой я клей? | CLEX #shorts
0:59
CLEX
Рет қаралды 1,9 МЛН
ETL Process using SSIS [Real Time Practical Example] Step By Step
59:54
Create an ETL package with SSIS! // step-by-step
13:11
Kahan Data Solutions
Рет қаралды 170 М.
Querying 100 Billion Rows using SQL, 7 TB in a single table
9:07
Arpit Agrawal (Elastiq.AI)
Рет қаралды 55 М.
Unraveling the Layers of Data Warehouse: A Staging Layer Demonstration
17:26
Which Database Model to Choose?
24:38
High-Performance Programming
Рет қаралды 67 М.
05 Load only new files in SSIS
13:44
Learn SSIS
Рет қаралды 34 М.
A simple example of ETL (Extract, transform, and load)
11:42
Isaac Vaghefi
Рет қаралды 34 М.
Automate the SSIS Package Execution in SQL Server | Schedule SSIS Package
14:09
Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF
28:34
Decomplexify
Рет қаралды 2,2 МЛН
Data Mart Design - Table Construction
15:04
Clinton Daniel
Рет қаралды 17 М.