Lakehouse data validation with Great Expectations in Microsoft Fabric

3,681 views

Learn Microsoft Fabric with Will


A day ago

FREE 40-minute Fabric fundamentals course: www.skool.com/microsoft-fabri...
End-to-end project playlist: • Playlist
GitHub code for the notebooks: github.com/LearnMicrosoftFabric
Great Expectations docs: docs.greatexpectations.io/docs/
When users spot errors in your data or dashboards, they lose trust immediately and it can be very hard to regain that trust.
So in this video, we look at what is (in my biased opinion) THE most important part of any data analysis, business intelligence or data science workflow: data validation.
We learn how to implement Great Expectations, an industry-standard Python library for data testing and validation.
The video contains two parts, with a separate notebook for each part:
Part 1: Initial setup and configuration of Great Expectations within Microsoft Fabric
Part 2: Notebook to run validation on new datasets (for example, when loading and validating data between the bronze and silver layers of a medallion architecture).
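The bronze-to-silver validation step in Part 2 boils down to: run a set of checks over the new batch, summarise them into one overall success flag, and only promote the data when that flag is true. A minimal, stdlib-only sketch of that idea follows. It is not the Great Expectations API: `expect_column_not_null`, `expect_column_between` and `validate` are hypothetical stand-ins, and rows are assumed to be plain dicts rather than a Spark or pandas DataFrame.

```python
from functools import partial
from typing import Any, Callable

Row = dict[str, Any]
Result = dict[str, Any]


def expect_column_not_null(rows: list[Row], column: str) -> Result:
    """Hypothetical stand-in for a not-null expectation (not the GX API)."""
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"{column} not null", "success": not unexpected, "unexpected_rows": unexpected}


def expect_column_between(rows: list[Row], column: str, low: float, high: float) -> Result:
    """Hypothetical range check; nulls are skipped and left to the not-null check."""
    unexpected = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {"check": f"{column} in [{low}, {high}]", "success": not unexpected, "unexpected_rows": unexpected}


def validate(rows: list[Row], checks: list[Callable[[list[Row]], Result]]) -> Result:
    """Run every check and summarise as an overall success flag plus a pass percentage."""
    results = [check(rows) for check in checks]
    passed = sum(1 for r in results if r["success"])
    return {
        "success": passed == len(results),
        "success_percent": 100.0 * passed / len(results) if results else 100.0,
        "results": results,
    }


# A small "bronze" batch with one out-of-range amount and one null key.
bronze_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -5.0},
    {"id": None, "amount": 3.0},
]
checks = [
    partial(expect_column_not_null, column="id"),
    partial(expect_column_between, column="amount", low=0, high=1000),
]
report = validate(bronze_rows, checks)

# Only promote the batch to silver when every check passed.
promote_to_silver = report["success"] is True
```

In the notebooks themselves, GX supplies the real expectations, checkpoint and result object; the sketch only shows the gate logic that sits around them.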
-BROWSE MY OTHER FABRIC PLAYLISTS-
DATA ENGINEERING • Data engineering (Micr...
END-TO-END FABRIC PROJECT • Playlist
INTRO TO MICROSOFT FABRIC • Intro to Microsoft Fabric
DATA FACTORY • Data Factory (Microsof...
#microsoftfabric #lakehouse #datavalidation #greatexpectations
-TIMELINE-
0:00 Coming up
0:29 Why this is so important
4:43 End-to-end project recap
5:37 Plan for this video
6:11 Intro to Great Expectations
8:00 NOTEBOOK 1 START: Installing Great Expectations
9:48 Setting up the GX Data Context
12:47 Adding data sources/assets to the Context
15:07 Defining our tests (Expectations)
17:46 Defining and running a checkpoint
19:50 Initial look at results
21:37 IMPORTANT! Copying configuration to Lakehouse Files
24:20 NOTEBOOK 2 START
24:57 Re-initialize data context from Files
26:03 Feeding in fresh data and running the validation
29:01 Handling the results
34:08 Wrapping up
-LINKEDIN-
Not following the LinkedIn page yet? Here's the link: / learnmicrosoftfabric
-ABOUT WILL-
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and help you build your career and build meaningful things in Fabric.
-SUBSCRIBE-
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.
youtube.com/@learnmicrosoftfa...

Comments: 29
@oskarlindberg4869 4 months ago
Such good content!!! I can really relate to moving away from PBI to data engineering because of validation needs.
@LearnMicrosoftFabric 4 months ago
Glad you enjoyed! I'll be doing a lot more content on data quality in Fabric soon
@evanb639 4 months ago
Very helpful demo, thanks Will!
@LearnMicrosoftFabric 4 months ago
No problem, thanks for watching! I’ve got a lot more content about data quality and validation coming soon on the channel 💪🙌🏽
@VagnerKogikoskiJunior 7 months ago
Congrats on your channel, Will. Very good stuff! Keep going.
@LearnMicrosoftFabric 7 months ago
Thanks mate, lots more to come shortly
@stevefox7469 3 months ago
Excellent demo. This seems to open up opportunities to build release processes for Power BI, for example checking for local date tables, referential integrity issues, bidirectional relationships, etc.
@LearnMicrosoftFabric 3 months ago
Thanks for watching, you're right, lots of opportunities for Power BI semantic model validation. I've gone into more detail about that in my latest video: kzbin.info/www/bejne/rXLEqnZjf56Hqbc
@I_Love_Coding12 3 months ago
Well explained, Will, thanks!
@LearnMicrosoftFabric 3 months ago
Thanks! This one might also be of interest to you: kzbin.info/www/bejne/rXLEqnZjf56Hqbc
@harvey2242 2 days ago
What if I have multiple data sources/tables to validate? Do I need to set up the data sources, data assets, batch requests, validator and YAML files individually for each one, or do I use some sort of loop to handle this?
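On the multiple-tables question: rather than repeating the setup per table, one option is to drive validation from a single config dict and loop over it. Below is a minimal stdlib sketch of that loop shape. The table names, the check factories `not_null` and `unique`, and `validate_all` are all hypothetical, and how each loop iteration maps onto GX objects (data assets, expectation suites, batch requests) is an assumption that depends on your GX version.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], bool]


def not_null(column: str) -> Check:
    """Hypothetical check factory: passes if no row has a null in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def unique(column: str) -> Check:
    """Hypothetical check factory: passes if `column` has no duplicate values."""
    def check(rows: list[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


# Hypothetical per-table config: table name -> list of (label, check).
VALIDATION_CONFIG: dict[str, list[tuple[str, Check]]] = {
    "customers": [
        ("customer_id not null", not_null("customer_id")),
        ("customer_id unique", unique("customer_id")),
    ],
    "orders": [("order_id not null", not_null("order_id"))],
}


def validate_all(
    tables: dict[str, list[Row]],
    config: dict[str, list[tuple[str, Check]]],
) -> dict[str, dict[str, bool]]:
    """Apply each table's checks in one loop instead of repeating the setup per table."""
    results: dict[str, dict[str, bool]] = {}
    for table_name, table_checks in config.items():
        # Raises KeyError if a configured table was not loaded: fail loudly.
        rows = tables[table_name]
        results[table_name] = {label: check(rows) for label, check in table_checks}
    return results


loaded_tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 1}],  # duplicate id
    "orders": [{"order_id": 10}, {"order_id": 11}],
}
results = validate_all(loaded_tables, VALIDATION_CONFIG)
failed_tables = [name for name, outcome in results.items() if not all(outcome.values())]
```

Adding a table then means adding one config entry rather than another copy of the setup code.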
@MrPhillard 2 months ago
This is a great video series and has been helping me a lot. I am trying to implement a Fabric architecture from scratch and am running into issues, but your videos really help. I have one question about the video above: I was working through the example and, after creating the YAML file and saving the expectation results to the context, I do not have a 'great_expectations' folder in my root directory. Therefore, I cannot copy the results over to the lakehouse for viewing. Do you have any pointers on where else this could have gone, or any possible issues? All steps were successful and the expectations passed; I just can't find this file. I did have to create an environment to add the library manager, as that is a new feature. I have even checked the 'env' folder I can see, but it is empty. Please help if you can! Thanks again for all the knowledge!
@LearnMicrosoftFabric 2 months ago
Hey, thanks for watching! The directory in which GX saves the context has changed since I recorded the video. Please look at another comment below; someone mentioned this.
@muftkuseng5924 3 months ago
This video was great! Just a quick question: at the end you are transferring the data to the silver lakehouse, and your check is whether it has been validated. But shouldn't the check be about the results of the validation? Something like "if validation is 100% successful, then do this"? The current if statement just checks whether there is an object, or am I wrong?
@LearnMicrosoftFabric 3 months ago
Hey, the ‘success’ property in the validation results object is a Boolean, so GX already does that check for us. In general, though, it’s safer to write ‘if success == True’ (being explicit). ‘if success:’ is a bit lazy and could introduce errors.
@chennachennu9215 25 days ago
I am using Fabric with a Pro license and I don't see an option for Library Management in the workspace settings. Should this be enabled somewhere by the admin? Is there an alternative way to install Great Expectations once and reuse it in all the notebooks we will have?
@LearnMicrosoftFabric 25 days ago
Yes, they removed Library Management. Now you will need to create an environment and install GX in that environment. See here: learn.microsoft.com/en-us/fabric/data-engineering/create-and-use-environment
@LearnMicrosoftFabric 10 months ago
😳 Went deep on data validation in this video, because I think it's SUCH an important topic, and I haven't seen anyone else tackle it yet in Fabric. All the code is in this repo here: github.com/LearnMicrosoftFabric/KZbin/tree/main/data_validation_examples Are you going to try to implement Great Expectations or another data validation/ testing framework in Fabric? Please share your experiences!!
@edyau4667 10 months ago
Love this focus on data quality! Sidebar: is GX the one you recommend above all others? There are a few other frameworks, like Soda and DBT, that offer more of a SQL flavour.
@LearnMicrosoftFabric 10 months ago
@edyau4667 Hey, thanks for the message! I think if you're more comfortable with SQL, yeah, DBT is great (plus they already have a Fabric integration built); if Python is your thing, then Great Expectations is a good one to focus on.
9 months ago
@LearnMicrosoftFabric Have you tried Pydantic or Pandera and done some comparison tests?
@wimsetm 8 months ago
Hi, first of all, great tutorial; this is very applicable to what I am trying to achieve. However, in following your code I am hitting a bug in Fabric. The final part of the setup, where you save the expectations into the Files area via !cp -r great_expectations/ /lakehouse/default/Files, gives an error of cp: cannot stat 'great_expectations': no such file or directory. Any ideas why this is? The code I am running is from your notebooks, so I am wondering if Fabric has been updated?
@oskarlindberg4869 4 months ago
@LearnMicrosoftFabric It would be interesting to see a video on dbt in Fabric!
@harvey2242 2 days ago
When running the line "!cp -r great_expectations /lakehouse/default/Files/", it throws this error: "cp: cannot stat 'great_expectations': No such file or directory". I'm using a Fabric lakehouse as well. Any idea how to debug this?
@harvey2242 2 days ago
Never mind, I have resolved the issue. I used "import great_expectations as gx", hence the correct line should be "!cp -r gx /lakehouse/default/Files/".
@LearnMicrosoftFabric 1 day ago
Yes, the later version of great expectations has changed the folder name, as you've seen 👍
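As the reply notes, newer GX versions write the project folder as `gx` rather than `great_expectations`; the name comes from the installed GX version, not from the `import great_expectations as gx` alias. Instead of hard-coding either name in `!cp -r`, a small stdlib sketch can look for whichever folder exists and copy it into the lakehouse Files area. The candidate folder names and the helpers `find_gx_project_dir` and `copy_gx_project` are hypothetical; `/lakehouse/default/Files` is the path used in the comments above.

```python
import shutil
from pathlib import Path

# Assumed candidate folder names: newer GX versions ("gx") first,
# then the older layout shown in the video ("great_expectations").
CANDIDATE_DIRS = ("gx", "great_expectations")


def find_gx_project_dir(root: Path) -> Path:
    """Return the first GX project folder found under `root` (hypothetical helper)."""
    for name in CANDIDATE_DIRS:
        candidate = root / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"No GX project folder ({', '.join(CANDIDATE_DIRS)}) under {root.resolve()}"
    )


def copy_gx_project(root: Path, destination: Path) -> Path:
    """Copy the GX project folder into `destination`, overwriting files that already exist."""
    source = find_gx_project_dir(root)
    target = destination / source.name
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


# Example call in a Fabric notebook (paths taken from the comments above):
# copy_gx_project(Path.cwd(), Path("/lakehouse/default/Files"))
```

Raising `FileNotFoundError` with both candidate names gives a clearer hint than `cp: cannot stat` when neither folder exists.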