What Is A Data Catalog And Why Do People Use Them?

17,799 views

Seattle Data Guy

1 year ago

Special Thanks To Atlan For Partnering With Me On This Video. Learn more about them here: bit.ly/3VMCCXV
What is a data catalog?
iData was Facebook’s data discoverability tool. It provided a lot of functionality that I have started to miss, including the baseline functions you would expect: the ability to find tables, trace lineage, and track down the owners of those tables.
But there were also other beneficial features like cost tracking, data quality assessments, and table certification. All of these features made it easy for a new data engineer to quickly orient themselves as they started on new projects.
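To make those baseline functions concrete, here is a minimal sketch of what a catalog stores and answers. It is not how iData or any real product works; the `TableEntry` and `Catalog` names, the fields, and the sample tables are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Metadata a catalog might hold for one table (hypothetical schema)."""
    name: str
    owner: str
    description: str = ""
    upstream: list[str] = field(default_factory=list)  # tables this one is built from
    certified: bool = False


class Catalog:
    """Toy in-memory catalog: find tables, trace lineage, look up owners."""

    def __init__(self) -> None:
        self._tables: dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return names of tables whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            t.name for t in self._tables.values()
            if kw in t.name.lower() or kw in t.description.lower()
        )

    def owner_of(self, name: str) -> str:
        return self._tables[name].owner

    def lineage(self, name: str) -> list[str]:
        """Return every upstream table, direct or transitive, sorted by name."""
        seen: set[str] = set()
        stack = list(self._tables[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._tables:
                stack.extend(self._tables[current].upstream)
        return sorted(seen)


# Sample metadata (invented for this example).
catalog = Catalog()
catalog.register(TableEntry("raw_events", owner="ingest-team", description="Raw click events"))
catalog.register(TableEntry("sessions", owner="alice", description="Sessionized events", upstream=["raw_events"]))
catalog.register(TableEntry("daily_active_users", owner="bob", description="DAU rollup", upstream=["sessions"], certified=True))
```

With this in place, `catalog.search("events")` finds candidate tables, `catalog.lineage("daily_active_users")` walks back to the raw source, and `catalog.owner_of("sessions")` tells you who to ask. Real catalogs add the features mentioned above (cost, quality, certification workflows) on top of this kind of metadata.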
My Favorite iData Feature
My favorite feature was being able to see how other users were using the data at the query level. This provided a lot more context than commented fields alone. ERDs and data lineage are great, but seeing exactly how other users queried the data made it easy to understand (and those users were great people to ping if you had questions).
It was easy to quickly understand how the data was already being used. This provided several benefits, including:
Reducing the duplication of work
Providing context on how data could join together (even across multiple data sources)
Showing you who to ask about the data. The owner is a great place to start, but owners sometimes move away from datasets over time
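The query-level view described above can be approximated from a query log. The sketch below is only an illustration under stated assumptions: the `QUERY_LOG` data is invented, and the regex-based table extraction is deliberately naive (it ignores subqueries, CTEs, and quoted identifiers), so a real tool would use a proper SQL parser.

```python
import re
from collections import Counter

# Hypothetical query log: (user, SQL text) pairs.
QUERY_LOG = [
    ("alice", "SELECT * FROM sessions s JOIN users u ON s.user_id = u.id"),
    ("bob", "SELECT count(*) FROM sessions"),
    ("alice", "SELECT * FROM sessions s JOIN orders o ON s.user_id = o.user_id"),
    ("carol", "SELECT * FROM orders"),
]

# Naive pattern: capture the identifier that follows FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql: str) -> set[str]:
    """Return the lower-cased table names referenced in one query."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def usage_summary(table: str, log):
    """Count who queries `table` and which other tables appear alongside it."""
    users: Counter = Counter()
    joined: Counter = Counter()
    for user, sql in log:
        tables = tables_in(sql)
        if table in tables:
            users[user] += 1
            joined.update(tables - {table})
    return users, joined


users, joined = usage_summary("sessions", QUERY_LOG)
```

Here `users` ranks the people who actually query the table (useful when the listed owner has moved on), and `joined` hints at which tables it is commonly combined with.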
Upon leaving the company formerly known as Facebook, I felt like I kept stumbling on a new data catalog or discoverability tool every week. At this point, I am sure I have come across at least 3-5 dozen data discovery tools, all of which add their own flair to helping teams manage their metadata.
If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer In 2022
• Top Courses To Become ...
What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
• What Is The Modern Dat...
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/s...
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: / @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Comments: 24
@dumisaralane 1 year ago
Awesome video - thanks. I have started our organisation's data catalog. We are using Microsoft Purview. One thing I have already realised is that it takes time to document your enterprise data in a data catalog. You have to be patient and perhaps take it one business domain at a time, depending on your organisation's size. Happy cataloging everyone!
@SeattleDataGuy 1 year ago
Yeah, modern ones try to automate the process but someone always has to put in the metadata
@advaitchabukswar4163 5 months ago
Really great videos. Learning a lot.
@SeattleDataGuy 5 months ago
Glad you found it helpful!
@juliustuckayo8973 1 year ago
Another great nugget Ben.
@SeattleDataGuy 1 year ago
Glad you liked it!
@ArtemioP 1 year ago
What are your thoughts on Open Metadata? It's an interesting one to me because of the recent automatic Spark Lineage (Spline) integration.
@ToToDarKDu59 1 year ago
Interesting video Ben, thanks! I'm curious what your vision is for handling change management with the business side of the company (subject for another video?)
@SeattleDataGuy 1 year ago
Great suggestion, I should do that video too!
@Neferfifi21 1 year ago
Hey Ben, thanks for this video. I was wondering if you have any good data management book recommendations?
@SeattleDataGuy 1 year ago
DAMA's DMBOK isn't a bad place to start
@lucashoww 1 year ago
LOVE THY DATA!
@SeattleDataGuy 1 year ago
amen
@rafaaferid1789 2 months ago
I have a question 🙋 Would it be helpful to implement a data catalog for application data (not analytics data)?
@kopiking352 1 year ago
Is iData open source or proprietary to Facebook? If it is not open source, is there an open-source data catalog you would recommend?
@SeattleDataGuy 1 year ago
It is not open source. Most people will use DataHub for the open-source side of data catalogs
@christinahumtsoe1262 8 months ago
What about Microsoft Purview?
@Buhlebendalo_Mavika 1 year ago
If I could see an actual catalogue, it would help. Can anybody help?
@picious 1 year ago
Is MS Purview a data catalog tool?
@JLRocco43 1 year ago
Purview works very similarly to Informatica EDC, where it "scans" locations and provides data lineage in the end, so it's in that realm
@dumisaralane 1 year ago
Yes. We use it in our organisation.
@SeattleDataGuy 1 year ago
Yeah some people do use it for DC
@Dave-nz5jf 10 months ago
Ugggg nothing is more impotent than a data catalog. Data engineers hate it because it's not needed for replication / DE transformation, and business hates it because it puts governance around what they're trying to do. And it's in the nature of analysts to hate all kinds of governance / enforcement. Yuck.