6.7 Decide Number Of Buckets in Hive and spark | Partition and Bucketing

  11,984 views

Data Savvy

Comments: 26
@gauravmathur56 6 years ago
Awesome information... thank you, Harjeet, for the great insights.
@DataSavvy 6 years ago
Thanks Gaurav
@diptiranjannayak5168 6 years ago
How can we restrict the bucket size to the block size dynamically? If I specify 4 buckets, what happens if one bucket gradually grows to 1 GB or more? How do I keep the table optimized?
@DataSavvy 6 years ago
You have to check your historical data and analyse what number makes sense for you... If you expect the same volume of data in the future, just extrapolate from those numbers.
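The extrapolation described in this reply can be sketched roughly as follows. This is only an illustration, not the method from the video: the function names, the linear-growth assumption, and the 128 MB target bucket size (taken from the block size mentioned elsewhere in this thread) are all assumptions.

```python
import math
from typing import Sequence

MIB = 1024 ** 2
GIB = 1024 ** 3


def extrapolate_size(history_bytes: Sequence[int]) -> int:
    """Project the next table size from past sizes (hypothetical helper).

    Assumes roughly linear growth: last size plus the average increment.
    """
    if len(history_bytes) < 2:
        return history_bytes[-1]
    increments = [b - a for a, b in zip(history_bytes, history_bytes[1:])]
    return int(history_bytes[-1] + sum(increments) / len(increments))


def suggest_bucket_count(table_bytes: int, target_bucket_bytes: int = 128 * MIB) -> int:
    """Number of buckets so that each bucket is about target_bucket_bytes.

    The 128 MiB default is an assumption, not a Hive or Spark setting.
    """
    return max(1, math.ceil(table_bytes / target_bucket_bytes))


# Hypothetical history: the table was 40, 44 and 48 GiB at the last three loads.
history = [40 * GIB, 44 * GIB, 48 * GIB]
projected = extrapolate_size(history)
buckets = suggest_bucket_count(projected)
print(projected // GIB, "GiB projected,", buckets, "buckets")
```

If the projected size keeps growing, the same calculation can be repeated periodically to check whether the chosen bucket count still gives buckets close to the target size.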
@r.kishorekumar1388 2 years ago
Is it possible to alter a bucketed table to change the number of buckets?
@jimitshah7636 2 years ago
No. 1) Create another table with the new bucket count, 2) insert the data from the old table, 3) drop the old table, 4) rename the new table. You may also need to stop the jobs running on this table until you have completed this process.
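The four steps in this reply could be scripted along these lines. This is a minimal sketch that only builds the HiveQL statements as strings; the table name, column list, bucketing column, and the `_rebucketed` suffix are hypothetical, and the statements would still need to be run through whatever Hive client you use.

```python
from typing import List


def rebucket_statements(table: str, columns_ddl: str, bucket_col: str,
                        new_buckets: int) -> List[str]:
    """Build the HiveQL statements for rebuilding a table with a new bucket count.

    All names passed in are illustrative; nothing here connects to Hive.
    """
    tmp = f"{table}_rebucketed"  # hypothetical temporary table name
    return [
        # 1) Create another table with the changed bucket number
        f"CREATE TABLE {tmp} ({columns_ddl}) "
        f"CLUSTERED BY ({bucket_col}) INTO {new_buckets} BUCKETS",
        # 2) Insert data from the old table
        f"INSERT OVERWRITE TABLE {tmp} SELECT * FROM {table}",
        # 3) Drop the old table
        f"DROP TABLE {table}",
        # 4) Rename the new table to the old name
        f"ALTER TABLE {tmp} RENAME TO {table}",
    ]


statements = rebucket_statements("orders", "order_id BIGINT, amount DOUBLE",
                                 "order_id", 8)
for stmt in statements:
    print(stmt + ";")
```

Dropping the old table only after the insert has been verified keeps the original data available if the copy fails.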
@simplecooking1323 5 years ago
How can we decide whether to use partitioning or bucketing?
@soumyakantarath5078 6 years ago
Thank you so much 😊
@routhmahesh9525 4 years ago
Can we create buckets on top of partitioning? Can you please explain this?
@rahulsamyal6159 3 years ago
I got confused: the block size is 128 MB and our memory can be 4 GB. In this case, should a bucket be 128 MB or 4 GB?
@projjalchakraborty1806 6 years ago
Can you please explain what HCatalog is and what it is used for?
@DataSavvy 6 years ago
Sure, I will create a video on this.
@rajareddy47444 6 years ago
Please explain Oozie: how to schedule jobs, and the workflow concepts. Thank you.
@DataSavvy 6 years ago
Oozie is no longer considered a good scheduler... It had a lot of limitations... Most companies are moving away from it...
@rajareddy47444 6 years ago
Data Savvy, okay. Can you please explain any scheduler that most companies use, so that we can talk about it and explain it? Thank you.
@akashputti 5 years ago
@DataSavvy but questions about this are asked in interviews.
@naresh5273 6 years ago
Thank you
@DataSavvy 6 years ago
Please subscribe and share it
@lokeshmvs 6 years ago
I have a question: 1) Buckets are created by writing CLUSTERED BY. How can we implicitly give the number of buckets?
@DataSavvy 6 years ago
Excuse me, I could not understand your question... We give the number of buckets while creating the table. Are you asking how to find out the number of buckets automatically?
@vipulx1 5 years ago
Thank you :)
@divendughati6114 4 years ago
Can you please explain how we can optimize if the number of buckets gets far too large, around 1 million?
@DataSavvy 4 years ago
Hi Divendu, can you please elaborate on this case? Is bucketing creating a lot of small files here, and what is your use case? If bucketing is creating small files, then you should decrease the number of buckets when creating the table.
@AnkitaMishra-di9ub 4 years ago
Can you please explain how to decide the number of partitions?
@DataSavvy 4 years ago
In Hive, one partition is created per unique value of the partition column... In Spark, it depends on the number of blocks in the file.
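As a rough model of this answer, the two counts can be estimated like this. The helper names and the 128 MB block size are assumptions for illustration; the actual number of Spark input partitions also depends on configuration and on whether the file format is splittable.

```python
import math
from typing import Iterable

MIB = 1024 ** 2
GIB = 1024 ** 3


def hive_partition_count(partition_values: Iterable[str]) -> int:
    """Hive: one partition per distinct value of the partition column."""
    return len(set(partition_values))


def spark_input_partitions(file_bytes: int, block_bytes: int = 128 * MIB) -> int:
    """Spark (approximation): about one input partition per block of the file."""
    return max(1, math.ceil(file_bytes / block_bytes))


# Hypothetical data: a country column used as the Hive partition key.
countries = ["IN", "US", "IN", "UK"]
print(hive_partition_count(countries))    # distinct values: IN, US, UK
print(spark_input_partitions(1 * GIB))    # 1 GiB file split into 128 MiB blocks
```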
@gauravpathak7017 5 years ago
Harjeet, what is the default number of buckets and partitions?