Synapse Espresso: Partitioning

Views: 3,837

Azure Synapse Analytics

Comments: 15

@paulhernandezgermany 2 years ago

Great video, I was not aware of the filepath function.

@avikbasu8943 2 years ago

It would be very helpful if you had a session on Delta table partitions, fetching data through the serverless SQL pool, and creating external tables over partitioned Delta tables.

@stijnwynants7307 2 years ago

Hi Avik! We will definitely put it on the list! We will be covering other parts of the Synapse stack in the near future!

@avikbasu8943 2 years ago

@stijnwynants7307 Thank you! I'm looking forward to an episode on external tables over partitioned Delta tables. 👍

@godhasowjanyakandrakota4642 1 year ago

Can partitioning be done only on range values? Is it possible to partition a table so that all rows where the partition column equals a given value go into one partition? Example: market_id = {1,2,3,4,5}, with data for market_id = 1 in one partition, data for market_id = 2 in another partition, and so on. Is this scenario possible with Synapse tables?
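
One way to reason about this question: with range partitioning over an integer column, placing a boundary at every distinct value gives each value its own partition. The sketch below models that mapping in plain Python. It assumes SQL Server style RANGE RIGHT semantics (a boundary value belongs to the partition on its right); the function name and the boundary list are illustrative and not part of any Synapse API.

```python
from bisect import bisect_right


def partition_index(value, boundaries):
    """Return the 0-based partition a value lands in under RANGE RIGHT semantics.

    Assumption: `boundaries` is sorted ascending and each boundary value
    belongs to the partition on its right.
    """
    return bisect_right(boundaries, value)


# Hypothetical boundaries: one per distinct market_id from the question.
boundaries = [1, 2, 3, 4, 5]

# Map a few market_id values (including out-of-range ones) to partitions.
assignment = {market_id: partition_index(market_id, boundaries) for market_id in range(0, 7)}
```

Under these assumptions, market_id values 1 through 5 each land in a separate partition, while any value above the last boundary shares the last partition with 5.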

@germanareta7267 2 years ago

Hi, I am interested in the Spark way of generating partitioned tables. Great video.

@stijnwynants7307 2 years ago

Hi! We are planning a video on this as well! Stay tuned!

@AzureSynapse 2 years ago

You can find a simple example of using Spark to partition data in the data lake in the second episode of our Azure Synapse + Power BI Datamart series: kzbin.info/www/bejne/jn7CZpaobd2SeLs. Around 5:00, Pawel demonstrates a piece of PySpark code that writes a partitioned DataFrame to the lake.

@germanareta7267 2 years ago

Thanks.

@KnowsomeLife 1 month ago

I thought partitioning was a logical division of data, but according to this video, since the files of each partition sit in a different folder, it seems like it is actually a physical partitioning of the data. Am I correct?
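
To make the folder layout concrete, here is a minimal sketch in plain Python that writes rows into one folder per partition value, using `column=value` folder names. It is a stand-in for what a partitioned write to the lake produces, not the code from the video; the helper name, file name, and sample rows are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, root, key):
    """Write rows into one folder per distinct value of `key` (key=value naming)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    written = []
    for value, group in groups.items():
        folder = Path(root) / f"{key}={value}"
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "part-0000.csv"
        # Simplification: the partition column is kept inside the file here.
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


# Hypothetical sample data.
rows = [
    {"year": 2021, "amount": 10},
    {"year": 2021, "amount": 20},
    {"year": 2022, "amount": 30},
]
root = tempfile.mkdtemp()
files = write_partitioned(rows, root, "year")
folders = sorted(p.parent.name for p in files)
```

Each distinct `year` ends up as its own folder with its own file, so in this layout the partition is a physical separation on storage rather than only a logical one.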

@rezcan 2 years ago

Is Spark the only way to generate partitioned tables? Can I use SQL and ADF to generate partitioned tables?

@stijnwynants7307 2 years ago

Hi Reza, you will need to initialize the tables in Spark; you can use any of the languages available in Synapse notebooks. If you are more used to SQL, you can take a look at Spark SQL. If you want to know how to initialize them, check out this video with Patrick: kzbin.info/www/bejne/aZbJamtrlpWJm8k

@BernardoRomeroC 2 years ago

I see the advantages of partitioning by date, when your queries are date-related. But is it also more efficient if I run queries on other attributes?

@stijnwynants7307 2 years ago

Hi Bernardo, partitioning performs data elimination on the partitioning column, so on date if your key is the date. Queries that do not filter on the date key will not benefit from the partitioning. You could add another partitioning layer below it, or you could use OPTIMIZE with Z-Order to structure those files so that queries on other attributes benefit as well. (We will be doing a video on Z-Order soon.)
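
The data elimination described in this reply can be sketched in a few lines of plain Python. It assumes folders named `column=value` and an equality filter; the function name, folder names, and the `customer` column are hypothetical, and a real engine does this inside its query planner.

```python
def folders_to_scan(partition_folders, predicate_column, predicate_value):
    """Return the folders a reader must open for an equality filter.

    Assumption: folders are named 'column=value'. If the filter column is the
    partition column, only the matching folder is kept; otherwise nothing can
    be skipped and every folder is scanned.
    """
    selected = []
    for folder in partition_folders:
        column, _, value = folder.partition("=")
        if column != predicate_column or value == str(predicate_value):
            selected.append(folder)
    return selected


# Hypothetical date-partitioned layout.
folders = ["date=2022-01-01", "date=2022-01-02", "date=2022-01-03"]

# Filter on the partition column: only one folder is read.
by_date = folders_to_scan(folders, "date", "2022-01-02")

# Filter on another attribute: every folder still has to be read.
by_customer = folders_to_scan(folders, "customer", "42")
```

In this sketch the date filter narrows the scan to a single folder, while the filter on a non-partition attribute leaves all three folders in play, which is the gap that a second partitioning layer or Z-Order is meant to address.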