Core Databricks: Understand the Hive Metastore

Views: 16,885

Bryan Cafferky

A core part of the Databricks ecosystem is the Hive Metastore, which enables Spark SQL. But how does Hive work, and how do you use it? How does Hive relate to the new Unity Catalog? Join me as I answer these questions and more.
Support Me on Patreon Community and Watch this Video without Ads!
www.patreon.co...
Link to slides, data, and code (Databricks Notebook in dbc format):
github.com/bca...

Comments: 41
@haseebjehangir3249 1 year ago
Finally, a well-explained video on the Databricks Hive Metastore. Thanks, Bryan.
@andrewpotts9948 4 months ago
That's the right level of detail that I needed. Well explained. Thank you.
@BryanCafferky 4 months ago
You're welcome!
@soumyavema6515 1 year ago
Pretty clear, and very much needed before exploring Unity Catalog. Waiting for the next one.
@daminimohite3400 2 months ago
Super clear explanation. Loved the analogy used in the beginning.
@BryanCafferky 2 months ago
Thank you!
@JLRocco43 1 year ago
I was just pondering doing a deep dive into this today and reading a lot of docs, and then you put out the video 😂 Awesome work, Bryan!
@YiminWei-z6w 3 months ago
Great explanation. Thanks!
@kvin007 1 year ago
Love the direct and clear content! Keep it going!
@martalopezjurado 1 year ago
I love this video! Thanks a lot. Waiting for the Unity Catalog video!
@BryanCafferky 1 year ago
YW.
@devigugan 1 month ago
Excellent narrative ❤❤❤
@mehulkhare8278 7 months ago
Thanks for making it simple to understand.
@BryanCafferky 7 months ago
You're welcome! Glad it helped.
@danhai7276 1 year ago
Great video. Waiting for the next one on Unity Catalog. 🙌
@BryanCafferky 1 year ago
Yeah. There's a lot to Unity Catalog. I'm also covering the Databricks AI Assistant, which is very cool.
@joshuawagner5350 4 months ago
Exceptional explanation. Thank you.
@BryanCafferky 4 months ago
Glad it was helpful.
@sujitunim 1 year ago
Thanks, Bryan, for this amazing session.
@BryanCafferky 1 year ago
YW
@renegade_of_funk 1 year ago
You’re doing the Lord’s work. 👌
@nargesrokni6348 1 year ago
Very good explanation. Thank you very much, man.
@BryanCafferky 1 year ago
YW
@rabeMa 9 months ago
Deadly clear, awesome 👌👌👌💯💯💯
@naveenagrawal_nice 2 months ago
Loved it
@etianemarcelino5706 1 year ago
Great content, like always.
@ngneerin 11 months ago
This gave me a really good idea of it.
@CaponordRevHappy 9 months ago
Superb! Thank you.
@BryanCafferky 9 months ago
You're welcome!
@GhernieM 3 months ago
Hey Bryan, do you plan to create something about Unity Catalog?
@pal3201 10 months ago
Can you tell us when you are releasing your take on Unity Catalog? Looking forward to it.
@BryanCafferky 10 months ago
So many things to cover these days. Hopefully soon. Thanks!
@benjaminwootton 1 year ago
Good video. Though I understand the Hive Metastore, it confuses me why everything in data has a dependency on it. For instance, Iceberg seems to need it for everything even though it’s supposed to be a self-describing table format.
@BryanCafferky 1 year ago
Technically, you don't need the Hive Metastore to read Delta tables, but it provides a lookup from the table name to where the table is physically stored. Without it, you need to provide the full path to the storage location. It also stores schemas for files that don't have built-in schemas, like CSV and text files.
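The lookup role described in the reply above can be sketched with a toy model in plain Python. This is only an illustration, not the Hive or Databricks API: `ToyMetastore`, `TableEntry`, `register`, `resolve`, the `sales_csv` table, and the `/mnt/lake/...` paths are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    location: str                               # where the data physically lives
    schema: dict = field(default_factory=dict)  # column name -> type (needed for CSV/text)


class ToyMetastore:
    """Toy stand-in for a metastore: maps table names to locations and schemas."""

    def __init__(self):
        self._tables = {}

    def register(self, name, location, schema=None):
        self._tables[name] = TableEntry(location, schema or {})

    def resolve(self, name):
        # With a catalog entry, the short table name is enough to find the data.
        return self._tables[name]


metastore = ToyMetastore()

# A self-describing format only needs the name -> path mapping (hypothetical path).
metastore.register("dimgeography", "/mnt/lake/delta/dimgeography")

# A CSV file carries no schema, so the catalog entry has to hold one.
metastore.register("sales_csv", "/mnt/lake/raw/sales.csv",
                   {"id": "int", "amount": "double"})

print(metastore.resolve("dimgeography").location)
print(metastore.resolve("sales_csv").schema)
```

Without the catalog entry, a caller would have to know and pass `/mnt/lake/delta/dimgeography` itself, which is the "full path" case mentioned in the reply.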
@awadelrahman 2 months ago
Thanks A LOT! One question: at 17:05, did you mean "Delta files" instead of "Delta tables" when you said "Delta tables are rather interesting ..."?
@BryanCafferky 2 months ago
I just meant that a Delta file is really a Delta table that has not been cataloged in the Hive Metastore or Unity Catalog. Just by pointing to the Delta file path, you can use it as a table.
@ravinarang6865 6 months ago
Very good.
@Kete-Dude 2 months ago
I'm a bit confused about unmanaged versus managed tables. In the step `create delta table that stored in hive`, the type of dimgeography is Managed, yet it can still be dropped without getting rid of the physical files, like an Unmanaged (External) table. So what is the difference?
@BryanCafferky 2 months ago
Yes. It is confusing. Think of a managed table as being like a SQL Server table, if that helps. SQL Server tables are created and dropped, along with all their data, via a DROP TABLE statement. Spark supports similar functionality for Managed tables, in which the table schema and underlying data are created at the same time. This is to mimic SQL database functionality. Unmanaged tables are for when you already have an external file and you create a schema defining the column names and types that describe the table, so Spark can let you run SQL queries against it. Since the file pre-exists and is maintained separately from the Hive Metastore or Unity Catalog, you don't want the physical file deleted when you issue a SQL DROP TABLE statement. Bottom line: if you want the table to be treated just like an RDBMS would treat it, i.e. catalog entry and physical data both handled via SQL, you want Managed. If you want to use SQL queries against a pre-existing data file, you want to define it as Unmanaged. Make sense?
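The drop behavior described in the reply above can be sketched in plain Python. This is a toy model under stated assumptions, not Spark or Hive code: `ToyCatalog`, `create_managed`, `create_external`, `drop_table`, and the table and directory names are hypothetical, and a temporary directory stands in for real storage.

```python
import os
import shutil
import tempfile


class ToyCatalog:
    """Toy catalog: remembers each table's path and whether it owns the data."""

    def __init__(self):
        self._tables = {}  # name -> (path, managed flag)

    def create_managed(self, name, warehouse_dir):
        # Managed: the catalog creates the storage along with the entry.
        path = os.path.join(warehouse_dir, name)
        os.makedirs(path)
        self._tables[name] = (path, True)
        return path

    def create_external(self, name, existing_path):
        # Unmanaged/external: the data already exists and is only registered.
        self._tables[name] = (existing_path, False)

    def drop_table(self, name):
        path, managed = self._tables.pop(name)
        if managed:
            shutil.rmtree(path)  # managed drop removes the data too
        # external drop removes only the catalog entry


root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
os.makedirs(warehouse)
external_dir = os.path.join(root, "landing", "geo")
os.makedirs(external_dir)

catalog = ToyCatalog()
managed_path = catalog.create_managed("dim_managed", warehouse)
catalog.create_external("dim_external", external_dir)

catalog.drop_table("dim_managed")
catalog.drop_table("dim_external")

managed_exists = os.path.exists(managed_path)
external_exists = os.path.exists(external_dir)
print(managed_exists, external_exists)  # prints: False True

shutil.rmtree(root)  # clean up the temporary directory
```

In this sketch the only difference between the two table kinds is who owns the files, which is the distinction the reply draws.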
@jbab9618 7 months ago
Hi @BryanCafferky, if the metadata of a CSV file changes, does the Hive Metastore update its metadata automatically? If not, are there steps we can take to refresh the metadata?
@BryanCafferky 7 months ago
A Hive table definition over a CSV file is read-only, and to get the metadata reloaded, I believe you would need to drop and re-create the table.
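The drop-and-re-create suggestion in the reply above can be sketched with a toy model in plain Python. It assumes, as the reply does, that the catalog keeps the column list captured when the table was created. It does not call Hive or Spark; `ToyCatalog`, `read_header`, `write_csv`, and the `people.csv` file are hypothetical.

```python
import csv
import os
import tempfile


def read_header(csv_path):
    # Use the first row of the file as the column list.
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def write_csv(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


class ToyCatalog:
    """Toy catalog that snapshots a CSV file's columns at creation time."""

    def __init__(self):
        self._schemas = {}

    def create_table(self, name, csv_path):
        self._schemas[name] = read_header(csv_path)

    def drop_table(self, name):
        del self._schemas[name]

    def columns(self, name):
        return self._schemas[name]


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "people.csv")

write_csv(path, [["id", "name"], [1, "Ann"]])
catalog = ToyCatalog()
catalog.create_table("people", path)

# The file gains a column, but the catalog still holds the old snapshot.
write_csv(path, [["id", "name", "city"], [1, "Ann", "Oslo"]])
stale = catalog.columns("people")

# Dropping and re-creating the table picks up the new columns.
catalog.drop_table("people")
catalog.create_table("people", path)
fresh = catalog.columns("people")

print(stale, fresh)  # prints: ['id', 'name'] ['id', 'name', 'city']
```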