How Do you Size Your Azure Databricks Clusters? Cluster Sizing Advice & Guidance in Azure Databricks

Views: 21,741

Advancing Analytics

Comments: 10

@RaghavC20 3 years ago

Thanks for making a short and useful video.

@diogodallorto1 4 years ago

Really good class! Congratulations and thank you! You could make a class about the Catalyst optimizer in Spark. Nobody explains it on YouTube!

@muritech 3 years ago

Great video! In your opinion is it best to have one High Concurrency cluster shared among a few analysts (heavy Pandas users) or one small machine per user? I'm worried that even with a High Concurrency setup, I might end up only sharing the Driver capacity among the Data Analysts.

@AdvancingAnalytics 3 years ago

With a cluster-per-user you end up paying way more, as you're more likely to have under-utilised clusters and you're paying for a driver each time. Having one HC cluster means it has more power for any sudden spikes of heavy usage, can fit concurrent queries together to fully utilise the cluster, and only has the single driver. So from a cost perspective, definitely shared. One note on your users: make sure they're using Koalas over Pandas where possible to ensure they're getting the best scalability out of Spark! Simon
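
The driver part of that cost argument can be put into rough numbers. The sketch below is only an illustration: the analyst count, the node rates, and the `hourly_cost` helper are all made up and are not real Azure Databricks prices. It also assumes both setups run the same total number of workers, so the only difference is how many drivers are paid for.

```python
# Hypothetical numbers for illustration only, not real Azure Databricks pricing.
def hourly_cost(drivers, workers, driver_rate, worker_rate):
    """Total hourly cost for a given number of driver and worker nodes."""
    return drivers * driver_rate + workers * worker_rate

analysts = 4        # assumed number of users
driver_rate = 1.0   # assumed cost units per driver node per hour
worker_rate = 1.0   # assumed cost units per worker node per hour

# One small cluster per user: every analyst pays for their own driver and one worker.
per_user = hourly_cost(analysts, analysts, driver_rate, worker_rate)

# One shared cluster: a single driver with the same total number of workers.
shared = hourly_cost(1, analysts, driver_rate, worker_rate)

print(f"cluster per user: {per_user}, shared cluster: {shared}")
```

With these assumed figures the shared cluster pays for one driver instead of four. The under-utilisation point in the reply would widen the gap further, but it is not modelled here.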

@joyo2122 1 year ago

Can you do a follow-up on this video? Many things have changed by now.

@ikernarbaiza2138 1 year ago

How does the pricing of the clusters work? Or where could I find that information?

@NasimaKhatun-jb7qo 1 year ago

I see Databricks is good for large datasets, but what about processing data that is only a few KB? How does it behave in such a scenario?

@AdvancingAnalytics 1 year ago

It'll work, but there's always a small overhead for parallelism. So you'll find it slower than a traditional database for working with very small data, just because of that! Otherwise, it works fine. We often have very small datasets being processed alongside some huge ones!

@Sangeethsasidharanak 3 years ago

6.13, size of driver: could you please explain how the largest dataset returned matters when determining the driver size? Unless we call collect(), the executors will write to the destination, right?