This is down to earth . I do appreciate your level of understanding and passing it on to me as a beginner
@SeattleDataGuy5 жыл бұрын
Thank you for your kind words Abdul! Glad I could pass on some knowledge.
@yaqubhassan515 жыл бұрын
By far one of the best i have seen on data mapping and stages of ETL...Thanks Ben! Keep them coming!!
@SeattleDataGuy5 жыл бұрын
Thank you! I am trying to find time between work and consulting!
@passais5 жыл бұрын
Is a Part 2 coming? I could not find it. Very good vid!
@zhangleo91923 жыл бұрын
I can’t find it either
@elysel94243 жыл бұрын
kzbin.info/www/bejne/aKKwZIiijZ2titE&ab_channel=SeattleDataGuySeattleDataGuy i think this
@jpank113 жыл бұрын
You’re a natural teacher. Great vid!
@SeattleDataGuy3 жыл бұрын
You're too kind! Thank you!
@ablack04 жыл бұрын
Awesome explanation. Thank you. I have lots of questions about this topic. How do you handle additions of new data? Do the raw and stage databases get cleared out each time new data is added to the data warehouse? Also how do you handle changes in ETL logic over time? How would you handle a situation where a portion of historical data needs to be reloaded into the data warehouse possibly using new ETL logic? Should it always be possible to recreate the entire data warehouse from the flat files? If so how do you ensure that the current state of the data warehouse is the same as it would be if it were blasted away and recreated from scratch? Are there any strategies for version control of a data warehouse?
@higiniofuentes25513 жыл бұрын
Very interesting video, with a lot of ideas. Because of the title, the content of the video is bit different, I thought it was more focused in presenting the "players" to construct an ETL. The idea of raw data seems to come more from the operational database than from the flat files (csv, etc), and after the cleaning/staging/mapping comes the Stage DB and then the Data warehouse. I can't wait to see part 2 !
@AIMagician9962 жыл бұрын
where is part 2?
@09soleil2 жыл бұрын
No part 2 ? What about the logging and error tracking video? Really nice video, thank you very much
@KoldFiyure5 жыл бұрын
Thank you so much. I needed this for an interview I have coming up. I need some formal concepts for what I am doing at a current job where things aren't exactly referred to in these ways.
@SeattleDataGuy5 жыл бұрын
Let me know if you have any other specific questions or concepts you would like covered!
@Thiago2806902 жыл бұрын
Great explanation!
@SeattleDataGuy2 жыл бұрын
Thank you! Man this video is old..i need to make a new one
@Georgehwp2 жыл бұрын
Would be great if you could make this more specific with something like DBT or prefect/dagster/airflow involved.
@weouthere69022 жыл бұрын
Wish you made more content like this
@peekguyy3194 Жыл бұрын
thanks for this
@SeattleDataGuy Жыл бұрын
Glad you liked it!
@00EagleEye003 жыл бұрын
Hi there. Got a question on raw data (flatfiles). These doesn't have identities or keys so you formulate a candidate key by combining some columns (product, location, target_year). Here's the question, if there are some columns to correct and it belongs to the combination of candidate keys, how can the data be corrected or updated? What approach is to be made ?
@SeattleDataGuy3 жыл бұрын
I am sorry for leaving this question so long. I really wasn't sure the best way to respond. If you're still curious, do you mind rephrasing the question?
@00EagleEye003 жыл бұрын
@@SeattleDataGuy in the absence of a unique/id keys for flatfiles, there are suggestions to use composite keys or combine columns to be used as a candidate keys (sample product, area, product_year columns resulting to : apple-washington-1999) as identity. Question is what if one of the candidate columns have nulls or empty value (e.g. no area value resulting to : apple-1999 as a key) on the first ingestion then a later update came with fill-up value (let say area is indiana that would result to apple-indiana-1999 as a key). What is the best approach to update this kind of scenario? Would that result to a data loss since the first combination keys are incomplete then when an update came and it was completed, record with incomplete combination keys should be removed/cleaned?
@hakank.5603 жыл бұрын
As an financial auditor I want to extract data from our clients database and then manipulate it to have auditiformation. Is learning SQL language the best thing to do? Like to hear form you.
@SeattleDataGuy3 жыл бұрын
That's interesting. Is there any other way to extract the data? Learning SQL is a big lift and might not be a good time trade off. If you need to do analytics on the data then i would say yes. But if you're just auditing, it might not provide the same benefit.
@lambdakicks4 жыл бұрын
What about removing nulls and malformed entries -- would you recommend doing that prior to Staging or afterwards?
@lambdakicks4 жыл бұрын
Excellent video by the way, thank you!
@SeattleDataGuy4 жыл бұрын
Personally, I prefer loading raw data as is. Regardless if there are errors. Why? Because how do you know where the errors came from? If you add a bunch of logic into your raw load, then that logic could mess up your data and you might not know it. However, if you have a problem in your raw data and all you are really doing is loading, then you know the problem is in the data itself. This isn't a rule, more of a guidelines. It depends how messy your data is. If 90% of your data is messy and needs cleaning, then consider doing it prior, but if only 5% is, then it might be better to do later.
@kshitijpathak86462 жыл бұрын
Great Video! I have one clarifying question. Is there a need to create csv, xml etc files from operational DB and then load the data into Raw DB. Will it not be easier and efficient to simply load the data from operational DB to Raw DB without creating any files in between?
@SeattleDataGuy2 жыл бұрын
This can depend. Some people like to do this for observability sake.
@michailo872 жыл бұрын
Did you ever done some data quality monitoriing system and send notifications for loading issues.? It's help to track issues on every level of loading.
@JanUnitra2 жыл бұрын
Where slowly changing dimension process should be done, in staging or DW ?
@mohamedarif23033 жыл бұрын
Very nice but could you provide more videos on ETL pls thanks