3.2 - Externalize Hive Meta-store with Cloud SQL | Apache Spark on Dataproc | Google Cloud Series

6,154 views

Sushil Kumar

A day ago

In this video, we'll see how we can externalize the Hive metastore by using a Cloud SQL-based MySQL instance in GCP. We will achieve complete compute-storage isolation by moving both metadata and data out of the cluster.
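As a rough sketch of what this setup enables (the bucket, app, and table names below are placeholders, not from the video), a PySpark job on such a cluster can keep its table data in GCS while the table metadata lives in the externalized metastore:

```python
from pyspark.sql import SparkSession

# Hive support makes Spark talk to whichever metastore the cluster's
# hive-site.xml points at (here, the Cloud SQL-backed one), so no extra
# connection code is needed in the job itself.
spark = (
    SparkSession.builder
    .appName("external-metastore-demo")
    # Placeholder bucket: table data lands in GCS instead of cluster-local HDFS.
    .config("spark.sql.warehouse.dir", "gs://your-bucket/hive-warehouse")
    .enableHiveSupport()
    .getOrCreate()
)

# Table metadata goes to the external metastore and the files go to the GCS
# warehouse directory, so both survive deletion of the Dataproc cluster.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").saveAsTable("demo_table")
```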
This video is part of the course Apache Spark on Dataproc. You can find all the videos for this course in the following playlist.
• Apache Spark on Datapr...
I regularly blog and post on my other social media channels too, so make sure to follow me there as well.
Medium : / sushil_kumar
Linkedin : / sushilkumar93
Github : github.com/kay...

Comments: 12
@shefalishrivastava8950
@shefalishrivastava8950 5 months ago
These videos are literally SAVING my life, thank you!
@javedmohammad6404
@javedmohammad6404 3 years ago
Great job Sushil
@balakrishnanramachandran8828
@balakrishnanramachandran8828 2 years ago
great, to point the video.
@balakrishnanramachandran8828
@balakrishnanramachandran8828 2 years ago
* to the point
@BriteRoy
@BriteRoy A year ago
Question: what will be the use case of such a scenario where we are separating compute and storage and later deleting the Dataproc cluster? The example you showed here is just a one-time job. What about daily batch processing where files arrive in GCS every day and a PySpark job is required to compute? How will cluster availability be managed in such a case?
@limeraghu579
@limeraghu579 2 years ago
This is again great, can you show how to access that data using the Spark SQL shell?
@SunnyG9
@SunnyG9 A year ago
It errors out - "Service Hive server2 is not enabled" ... "localhost port 3306 (tcp) failed: Connection refused" ... the Dataproc cluster is created but I'm not able to connect to the Hive metastore ...
@DivyanandSharma-l8d
@DivyanandSharma-l8d A year ago
Hi Sushil, I'm getting this error while creating the shared Dataproc cluster: "There was no instance found at projects/projectid/instances/hive-mysql or you are not authorized to access it." I have enabled the Cloud SQL Admin API.
@rutujapabale5080
@rutujapabale5080 A year ago
Same pinch
@rutujapabale5080
@rutujapabale5080 A year ago
Could you help me if you managed to solve it?
@loke261989
@loke261989 2 years ago
You created a MySQL instance for metadata; by default, will the data also be stored in Cloud Storage instead of HDFS? Maybe because of your warehouse.dir property, I guess. Please clarify.
@kaysush
@kaysush 2 years ago
Hey, no. For every Hive database you create, you'll have to set the LOCATION property to `gs://your-bucket/` to do that. Otherwise, the default will still be HDFS.
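A minimal sketch of what that could look like from PySpark (the bucket and database names are placeholders, and the session is assumed to already have Hive support enabled against the external metastore):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Point the database at a GCS path; tables created in it inherit that location,
# so their files end up in Cloud Storage rather than cluster-local HDFS.
spark.sql("CREATE DATABASE IF NOT EXISTS sales_db LOCATION 'gs://your-bucket/sales_db'")
spark.sql("CREATE TABLE IF NOT EXISTS sales_db.orders (id INT, amount DOUBLE)")
```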