Database Normalization for Beginners | How to Normalize Data w/ Power Query (full tutorial!)

117,943 views

Maven Analytics

1 day ago

Comments
@paulmarixreyes5595 1 month ago
Thank you so much! This is very helpful and it never dawned on me that we can normalize data using Power Query.
@LJ-jd5yy 1 month ago
This is the best video I’ve seen on normalization using Power Query!
@kyithatun2277 10 days ago
Thanks a lot sir. The video and dataset saved me.
@JunaidKhan-wg7cj 1 month ago
This was my first time watching and it was really helpful. Thanks and keep teaching :)
@mehranbarahouei2346 2 months ago
Great tutorial ❤❤❤❤❤❤
@dp70737 2 months ago
😲 Wait, you can profile the entire dataset? 😅😁 Game changer.
@dennisfidha 24 days ago
Great tutorial. Greetings from Mombasa 👋
@MrAhmedHaider 1 month ago
Absolutely brilliant 👍
@omondif9 2 months ago
Thanks Chris, I love the way you always break down the concepts into simple, bite-sized bits.
@Chris-at-Maven 2 months ago
Thanks for the feedback!
@sambameissa 2 months ago
Thank you, I really love how clearly you guys explain concepts. Keep up the good work.
@Chris-at-Maven 2 months ago
Glad you enjoyed this one!
@awesh1986 2 months ago
Loved the video. Too good.
@Chris-at-Maven 2 months ago
Glad you enjoyed this one!
@mixtape8476 20 days ago
Thanks, it helped me a lot in understanding the data model!
@NitinKunigal 2 months ago
Hey Chris, I really liked the way you explained the concept of Normalization! However, I need some clarification. From my understanding, it's better to reference a query than to duplicate it when creating dim tables. Here's the logic: a referenced query establishes a parent-child dependency with the master query (the transaction table). Whenever the transaction table is updated - e.g., a new customer is added - the child query (used for the dim tables) gets updated accordingly. A duplicated query, on the other hand, is independent and doesn't have any relationship with the master query (transaction table). Please share your insights! Cheers!
@txreal2 2 months ago
Makes sense. Thanks 👍
@Chris-at-Maven 2 months ago
Great point! Referencing the existing query (vs. duplicating) would be the better approach if the source data will be changing over time. Thanks for the comment!
@Benry1984 2 months ago
I think it's fine in this instance since they all reference the exact same source - i.e. if the transaction workbook is updated / superseded, then all queries would also reflect the changes. If the changed transaction data were saved in a new, separate workbook (or the filename changed), then referencing is definitely the option to go with - otherwise all of the other queries would point to the old table, or to one that doesn't exist.
@NitinKunigal 2 months ago
@@Benry1984 No, the duplicated queries won't get updated at all, because there's no reference to the master query (the relationship doesn't exist). Unless you're absolutely sure that the source data won't be updated, it's good practice to reference a query rather than use the duplicate option. Regardless, it's always better to use the reference option to create lookup tables.
@McKaySavage 2 months ago
Additionally, duplicating a query duplicates all of its calculation steps up to that point. So, to use parallel language, you should be removing the redundancies and normalizing your queries with exactly the same mindset as your datasets :)
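For anyone who wants to see what this thread is describing in Power Query's M code, here is a minimal sketch of a dimension query built by referencing the master query instead of duplicating it. The query name Transactions and the customer column names are placeholders, not necessarily the ones used in the video.

    let
        // Referencing the master query by name keeps this child query in sync with it;
        // a duplicated query would instead repeat the master's Source step and everything after it.
        Source = Transactions,
        CustomerColumns = Table.SelectColumns(Source, {"CustomerID", "CustomerName", "CustomerEmail"}),
        CustomersDim = Table.Distinct(CustomerColumns, {"CustomerID"})
    in
        CustomersDim

Loaded as its own query, this becomes the Customers dimension, and any rows added to the master query flow through on refresh.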
@rajeshmajumdar4999 2 months ago
Hi Chris, thank you very much. ⭐⭐⭐⭐⭐⭐ Loved the way you explained it. God bless you.
@Pablo19720531 22 days ago
Thank you very much, your video has been very useful to me. 🤩
@andreag6126 1 month ago
Thanks for sharing!
@balrajvirdee1087 2 months ago
Thank you, great explanation 👍🏽
@Chris-at-Maven 2 months ago
Thank you, glad you found it helpful!
@SMCGPRA 2 months ago
Holy grail channel for learning data analysis
@KumarSrinivasAvvaru 2 months ago
Great work, thanks for sharing
@Chris-at-Maven 2 months ago
Glad you enjoyed this one!
@SothearithKONGMrMuyKhmer 2 months ago
Thank you so much! That’s really helpful! 🥰
@Chris-at-Maven 2 months ago
Glad to hear it!
@SothearithKONGMrMuyKhmer 2 months ago
@@Chris-at-Maven Thank you! 🙏
@7revor844 25 days ago
Thanks, I learned from your video.
@artiguf 1 month ago
Amazing video. And yes, well explained. Thanks.
@eniolaadekoya5623 2 months ago
Great lesson. I would like to try this in Microsoft Fabric over the weekend.
@officesuperhero4651 2 months ago
This is a great tutorial, thanks. I use a lot of Power BI. I guess my biggest caveat would be that these relationships are one-to-many, so slicers in Power BI won't work unless you use a measure with a slicer as a workaround.
@CodeWithMuhammadAhsan 1 month ago
Beautifully explained.
@udayteja6595 2 months ago
Thank you very much...
@peterrender4716 2 months ago
Thanks Chris. Very helpful. This is by far the best explanation of 'normalisation' that I have seen. One question: I understood that many-to-many connections were not to be used in Power BI, but I note that the Order Line Items table has a many-to-one connection to the Orders table and a many-to-one connection 'chain' with the Products, Subcategories and Categories tables. Can you explain how filters work through this structure please? I understood that the dimension tables are used for filtering the fact tables in the star schema, but I am struggling to understand how the snowflake structure works, particularly with the addition of the one-to-many connection between the Orders and Order Line Items tables. Thanks, Peter
@DarcyWhyte 2 months ago
In Power BI you would not have a connection between Orders and Line Items - just between those fact tables and the dimensions. Further, you probably wouldn't have completed enough normalization to create a snowflake; just keep it as a star...
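To illustrate the "keep it as a star" idea, here is a rough M sketch that folds the order-level fields back into the line-item query so you end up with a single fact table plus dimensions. The query names OrderLineItems and Orders and the column names are assumptions based on the model discussed here, not the video's exact steps.

    let
        Source = OrderLineItems,
        // Bring the order-level fields back onto each line item so Orders is no longer a separate fact table
        Joined = Table.NestedJoin(Source, {"OrderNumber"}, Orders, {"OrderNumber"}, "OrderDetail", JoinKind.LeftOuter),
        FactTable = Table.ExpandTableColumn(Joined, "OrderDetail", {"OrderDate", "DeliveryDate", "CustomerID", "StoreID"})
    in
        FactTable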
@abdielgutierrez881 1 month ago
Thank you very much. I am a beginner with Power BI and I appreciate this super video that helps me improve my data management. Could you post a video on how to keep this file updated after performing the normalization?
@benosborne413 2 months ago
Great lesson. What if I want to update the info on a daily, weekly or monthly basis?
@Chris-at-Maven 2 months ago
In this case we're working with a local CSV file for simplicity, but in reality you'd likely have a database connection that would pull in new information when the query is refreshed. If you don't have a DB, you could also connect to shared locations like SharePoint or OneDrive, or point to a local folder where you can add new files and automatically append them via Power Query.
@benosborne413 2 months ago
@@Chris-at-Maven Do you have a full video on Power Query?
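As a rough sketch of the folder approach described above, an M query along these lines picks up every CSV dropped into a folder and appends them on refresh. The folder path is a placeholder, and it assumes each file carries its own header row.

    let
        Source = Folder.Files("C:\Data\Transactions"),
        CsvFiles = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
        // Parse each file and promote its header row before combining
        Parsed = Table.AddColumn(CsvFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
        Combined = Table.Combine(Parsed[Data])
    in
        Combined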
@ThịnhNguyễnPhúc-x7c 1 month ago
Your video is great and helpful, but I have a question I hope you can answer: could you tell me the biggest difference between a single flat table and a star schema when building a dashboard on a big dataset?
@Benry1984 2 months ago
Great video. It really summarises the various concepts well. I'm interested in your thoughts on using this single-source method to normalise the data versus just using the master data tables (e.g. getting the customer, product, category etc. tables separately and bringing them in). I like to see what is not in the transactional data to spot things not being used (e.g. which cost centers are not being used at all), and it means outputs will have a more consistent look. I figure it will mostly come down to the practical problem of actually being able to get access to updated master tables rather than any profound arguments.
@ExcelWithChris 2 months ago
Great video explaining the concepts, but I do agree that there is no need to start off with everything in one file and then use Power Query to split it. It still means that when you add data, you are going to do it in the original file (which will get very large). It just makes sense to have all those dimensions in different files from the start, import them into Power Query and then create relationships. If the original file with everything inside is going to be your source, then you are not gaining anything by splitting it into different tables.
@TrustRobert 24 days ago
Thanks for the good job! Quick question, please: in the context of the table being worked on, what does "line item" mean? Also, I struggled to understand how someone can just look at the table and say that Delivery Date, Store ID and Order Date depend on the Order Number only, so let's move them to a new table. What about Product ID or even Quantity - how was it determined that those don't depend on the Order Number alone, unlike the ones you selected?
@contactpaulo 1 month ago
Great!!! Thanks.
@pierre-yves_david 2 months ago
Thanks, Chris, a good reminder of the different steps to normalise a database. Your video raises several questions (you have already answered the one about replacing duplicates with references). Is there a good reason to spend CPU sorting the dimension tables (i.e., a form of investment to accelerate future table joins)? And how can we identify functional (partial) dependencies between columns when we have no idea what data the database contains?
@eniolaadekoya5623 2 months ago
First, having gone through the tutorial, my question is: instead of having an Order Line Items table and an Orders table, can I just have a single transactional table so that a star schema is achieved? Second, if I have a date-based target table, how can I link it to the model so that I can compare metrics against target values? Thank you Chris
@Chris-at-Maven 2 months ago
Yeah, that's part of the point we're making at the end of the video. While technically you CAN continue to normalize your models, sticking with a simpler star schema will often be the best bet in terms of balancing storage and query optimization. You can approach a date table in a similar way - by referencing the transaction table, deduplicating the transaction date field to get the full range of dates (assuming it's contiguous, no gaps), then adding date columns like day name, start of week, year, etc. That becomes your Calendar dimension table, which you can add to your star schema just like the others. Hope that helps!
@eniolaadekoya5623 2 months ago
@@Chris-at-Maven Thank you, got that.
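For anyone wanting to try the Calendar table described in that reply, here is a minimal M sketch. It generates a contiguous date range from the transaction history rather than relying on the transaction dates having no gaps; Transactions and OrderDate are assumed names, and OrderDate is assumed to already be typed as a date.

    let
        Source = Transactions,
        StartDate = List.Min(Source[OrderDate]),
        EndDate = List.Max(Source[OrderDate]),
        // Build a gap-free list of dates covering the whole transaction history
        DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate) + 1, #duration(1, 0, 0, 0)),
        Calendar = Table.TransformColumnTypes(Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}), {{"Date", type date}}),
        AddYear = Table.AddColumn(Calendar, "Year", each Date.Year([Date]), Int64.Type),
        AddDayName = Table.AddColumn(AddYear, "Day Name", each Date.DayOfWeekName([Date]), type text),
        AddWeekStart = Table.AddColumn(AddDayName, "Start of Week", each Date.StartOfWeek([Date]), type date)
    in
        AddWeekStart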
@Loves-f3y 2 months ago
Question - at 8:00, when you chose Home > Close & Load To… and then used 'Only Create Connection', you said you don't want to dump all the rows and data points into a worksheet and that you want to create a data model instead, checking [x] Add this data to the Data Model. I guess I'm confused, because right then the Queries & Connections pane shows all the records being loaded. Have you gone over that in more depth in another video?
@Jonay1990 11 days ago
I get the concept, but I need to see it in action when applied to live data that continuously updates. The original 'Transactions' query, I guess, needs to have its columns adjusted to match the final version - and what about new products, stores and customers? How are they updated in the model without just re-exporting the original 'Transactions' sheet in the first place?
@JJ_TheGreat 2 months ago
Question: When you remove the duplicate IDs, how do you know that you are removing the proper rows (in general)? Thanks.
@muhammadomer1654 2 months ago
Thanks, Chris, for another great, informative tutorial. But can you please tell us how you identified, in the process of reaching 2NF, that fields like order date, delivery date, CustomerID and StoreID depend on OrderNumber, while fields like Quantity and ProductID depend on both OrderNumber and LineItem?
@kossiviaglee2997 1 month ago
I really enjoyed your video. I needed to know this. I give myself database projects in MS Access, but I don't always know how many levels of normalisation to go through. Sometimes I end up totally confused, especially when it's time to build my queries. That's why I appreciated the end of your video. So please, do you have a video on how to succeed at building queries no matter what level of normalization you used? Thanks again for this video.
@丁恩妃-h9d 2 months ago
Thanks for the tutorial. Is there a chance that Excel's Data Model will refuse to work because the amount of data is too large - say, several thousand rows and many columns?
@ericgusmao5164 1 month ago
Great video, but I'd like to point out that we might also need a price column on the Order Line Items table, because the price of a product may vary over time.
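One hedged way to handle that point during normalization is to keep the transaction-time price on the fact table rather than relying only on a list price in the product dimension. A minimal sketch, assuming the flat source has a UnitPrice column (the query and column names are placeholders):

    let
        Source = Transactions,
        // UnitPrice stays with the line item, so historical prices survive even if the product's list price changes later
        OrderLineItems = Table.SelectColumns(Source, {"OrderNumber", "LineItem", "ProductID", "Quantity", "UnitPrice"})
    in
        OrderLineItems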
@MrSlurrpy 2 months ago
For Power BI, Microsoft's own documentation suggests using a star schema - does that mean the data should stop at the 2NF level? Is this a correct understanding?
@mjbine 1 month ago
When refreshing the data, will the dimension tables keep their uniqueness, or do you have to remove duplicates with each refresh?
@jefffisher1377 2 months ago
Great video - I learnt a bunch of things. One question for clarity, if I may: I have a database file with weekly data that 'grows' every week, so the history data snowballs. I need to keep the history, so if I remove duplicates on Store ID as my main key, will this remove history?
@Chris-at-Maven 2 months ago
Tough to say without the full context, but generally speaking you'll want to make sure that your dimension tables contain EVERY possible primary key that might appear in your data tables.
@jefffisher1377 2 months ago
@@Chris-at-Maven Thanks Chris - more context: I receive weekly data collected by our field staff per outlet (each outlet has a unique numeric code), by date within that week. This data is extracted from the back end every Monday and appended (more rows at the bottom of the table). I now have the original raw daily data, which has Month, QTD and YTD components added via Excel formulas. If I keep the main file (as you have in this video) and duplicate it three times, then in one of the copies I could get rid of duplicates, and when I want Monthly, QTD and YTD figures I could reference back to the original master file?
@MoonInfinity31323 16 days ago
Bottom Line: Proficient. It is effective how you slice and dice this visual for better understanding. Onwards
@md.shamsuzzaman4156 2 months ago
Great. Thanks
@yonisupersaiyanyoni3693 20 days ago
Hi, if you have two fact tables, how can you use both of them in the same visual?
@davidpinkney7301 17 days ago
Greetings from NZ... where did you get the t-shirt from?
@KeoniPKim 2 months ago
So when you get a new "Transactions" file, would you just refresh the data?
@potatocraftd6629 2 months ago
Yes
@Chris-at-Maven 2 months ago
It depends. In this case we're working with a local CSV file for simplicity, but in reality you'd likely have a database connection that would pull in new information when the query is refreshed. If you don't have a DB, you could also connect to shared locations like SharePoint or OneDrive, or point to a local folder where you can add new files and automatically append them via Power Query. If the file name and path didn't change, then yes, you could update the source file and refresh the query to see the new data.
@mahmoudmohsen8105 2 months ago
Lovely!
@karimmasarweh9844 18 days ago
Thank you for this great session. My two cents here... I've taken a course in the past about Power BI data modeling by RADACAD, and it should be the same for Power Pivot. Funnily enough, the instruction there was "Say NO to snowflake schemas", i.e. Power BI/Power Pivot prefer star schemas even if there is a bit of redundancy. In the course, the Product, ProductCategory and ProductSubcategory tables (each of them a dimension table) were intentionally merged into a single dimension table containing all three. This increases redundancy, but since we are dealing with dimension tables, the number of rows shouldn't be very high, and thus neither will the redundancy. From what I understand, 3NF is crucial in OLTP (Online Transactional Processing) models, which are used in applications like web apps. But for BI, an OLAP (Online Analytical Processing) model is required, and in these models full normalization matters less. What's important is to have a performant model.
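For reference, flattening the product snowflake into a single Product dimension, as that course suggests, can be done in Power Query with a couple of merges. A sketch under the assumption that Products, Subcategories and Categories exist as separate queries with the column names shown (all placeholders):

    let
        Source = Products,
        WithSubcategory = Table.NestedJoin(Source, {"SubcategoryID"}, Subcategories, {"SubcategoryID"}, "Sub", JoinKind.LeftOuter),
        SubExpanded = Table.ExpandTableColumn(WithSubcategory, "Sub", {"SubcategoryName", "CategoryID"}),
        WithCategory = Table.NestedJoin(SubExpanded, {"CategoryID"}, Categories, {"CategoryID"}, "Cat", JoinKind.LeftOuter),
        // One denormalized Product dimension: product, subcategory and category attributes in a single table
        ProductDim = Table.ExpandTableColumn(WithCategory, "Cat", {"CategoryName"})
    in
        ProductDim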
@bhavishyaaggarwal3523 1 month ago
Hey, I downloaded the Excel file from the link in the description and then, without any normalization, created a pivot table using the info originally given in the sheet! It gave the same results as you were getting, so what was the need for all that normalization?
@seanys 1 month ago
I understand that this is a beginner's tutorial, but at 4:40, as soon as you remove duplicates, you're potentially throwing away current data. Transactions take place over time, and customers' names and addresses change over time. The least you should do is sort the initial table by time to ensure only the oldest duplicate Client IDs are removed. A better approach would involve retaining historical customer data.
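One way to act on this in Power Query, sketched under the assumption that the transactions carry an OrderDate and the usual customer columns (all names are placeholders): sort newest-first and buffer the result before removing duplicates, so the surviving row per CustomerID is the most recent one. The Table.Buffer call is the commonly used trick to pin the sort order so Table.Distinct keeps the first row it sees.

    let
        Source = Transactions,
        Customers = Table.SelectColumns(Source, {"CustomerID", "CustomerName", "CustomerAddress", "OrderDate"}),
        // Buffer pins the sort so Distinct keeps the first (newest) row for each CustomerID
        Sorted = Table.Buffer(Table.Sort(Customers, {{"OrderDate", Order.Descending}})),
        Latest = Table.Distinct(Sorted, {"CustomerID"}),
        CustomersDim = Table.RemoveColumns(Latest, {"OrderDate"})
    in
        CustomersDim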
@Loves-f3y 2 months ago
Nice. Subscribed. What type of data model is best for a model-driven Power App?
@95andresillo 2 months ago
Hi Chris, I love your videos. I was looking at following the business intelligence roadmap that you have on your page - is it possible to follow that same roadmap with your Udemy courses? If so, which would they be?
@DookyButter 2 days ago
I would love to see the opposite: data model de-normalization for Power BI star schemas.
@abidurrahmanabir7745 2 months ago
Salaam brother... This is a masterclass video for anyone wanting to understand the basics of analyzing a full dataset... ❤️❤️😱😊
@orlandofury 1 month ago
I'd use Choose Columns instead of Remove Columns.
@christabelabraham8402 2 months ago
Hey Chris, Power Pivot is not in my Excel - what can I do?
@DataThomas 2 months ago
What if my initial table does not have all the IDs (transaction, order, date, etc.) already in it?
@ExcelWithChris 1 month ago
Maybe a stupid question. All of these are based on the original data source with all the columns in it. What is the use of splitting it if you are still going to add new data to the original source to update all these queries? Are we assuming the source data is a one-off? Or am I missing something here?
@yonisupersaiyanyoni3693 20 days ago
Most of the time the original data is separate, not all in the same table, and then you build an ETL process that organises the data into facts and dims.
@GoDFazel2 6 days ago
Get Data - you get one result, and I get 20!
@orthodox_gentleman 1 month ago
Dang brother you look GOOD! Male perfection…
@NirRobinson 2 months ago
Why do you prefer duplicating over referencing?
@Chris-at-Maven 2 months ago
Ah I just reread your comment and get what you're asking. I used the duplicate approach here since I'm assuming the data won't change, but in reality I think referencing would be a smarter approach to keep everything in sync if the data changes - thanks for the comment!
@pierre-yves_david 2 months ago
@@Chris-at-Maven There is an additional benefit (and possibly two): 1) if the source location changes, you have only one query to correct => maintainability; 2) it is probably more efficient to load the Excel worksheet or CSV file once and create the dimension tables from that single source => performance.
@NirRobinson 2 months ago
@@pierre-yves_david 2 - fully agree; 1 - via the connection options you can also change it once 😇
@christabelabraham8402 2 months ago
Never knew life could be this simple 😂
@sunnygawande5283 2 months ago
So if I do calculations on denormalized data, will it give an error?
@Chris-at-Maven 2 months ago
No, it's not that there's anything WRONG about denormalized tables, it's just that they contain more information than you technically need. Normalization is essentially just about reorganizing your data to minimize redundancy.
@ExcelWithChris 24 days ago
Can you guys please come back to me (posted a week ago): Maybe a stupid question. All of these are based on the original data source with all the columns in it. What is the use of splitting it if you are still going to add new data to the original source to update all these queries? Are we assuming the source data is a one-off? Or am I missing something here?
@DJ-pn9te 7 days ago
Normalization is nice, but at the end of the day computers are so powerful that it has lost its edge. It's a good video nonetheless.
@202lbs 2 months ago
being wiped out due to overstaying in mid, not pushing lanes
@abd5184 1 month ago
Remove background music
@ahmeddafa-allah3603 2 months ago
🎉🎉🎉🎉🎉🎉🎉
@MaxOnTheRun 2 months ago
Day ❤
@Milhouse77BS 2 months ago
Good example of reaching 3NF, but the final result is not a good dimensional model (aka star schema) for Power BI. Never join fact tables together.
@Chris-at-Maven 2 months ago
That's the point we make at the end of the video, when we talk about how much normalization is appropriate. 99% of the time, simplifying the model to a star schema is the right move even if your tables aren't fully normalized.
@Milhouse77BS 2 months ago
@@Chris-at-Maven Understood. Another point is that the fact table in a star schema is usually already in 3NF - the star schema fact table is not denormalized, but reorganized, as opposed to the header-detail design. In this case, at 9:36, with the dimensions in the Transactions table and TransactionID, OrderID and LineID as "degenerate dimensions", the Transactions table is in 3NF. There can be more than one 3NF design. The artificial Transaction/Order/Line IDs don't contribute much to the dimensionality of the model. If you have Customer and Time in the fact table, you could probably drop those IDs, except for the benefit of tying back to source systems.
@hoang16191 2 months ago
Is that your normal voice, or do you intentionally lower your tone?
@mrbartuss1 2 months ago
Why are all the examples on this channel so simple, when in the real world everything is messed up...
@Chris-at-Maven 2 months ago
Fair point! The intention here is to help people build a really clear intuition for what data normalization means, and how to apply normalization techniques to create star and snowflake schemas. In later demos we'll start layering in more complexity and messier datasets, but that would be counterproductive unless you've built that foundational understanding first.
@mrbartuss1 2 months ago
@@Chris-at-Maven I totally agree. However, you've already made plenty of videos for total beginners. More complex content would be really appreciated.
@vishalmane3139 2 months ago
half information
@Chris-at-Maven 2 months ago
🤔
@vishalmane3139 1 month ago
@@Chris-at-Maven How do you make dim and fact tables, and how do you identify the PK and FK?
@remek5758 2 months ago
Actually, isn't normalization meant for transactional databases (kinda like a snowflake schema), while denormalization simplifies queries for analytics purposes (star schema)? Maybe I'm just confused. But I don't know anyone who prefers a snowflake over a star model for analytics (I know that sometimes you have to, but that's not ideal).