Azure Synapse Analytics - Parquet, Partitions & PowerBI with SQL On Demand!

12,488 views

Advancing Analytics

Comments: 35
4 years ago
From a non-native English speaker, I'd love for you to get a decent microphone to improve your audio. Other than that, thanks for letting the world know how great Synapse is.
@AdvancingAnalytics 4 years ago
Thanks Gerardo, we are working on it. Thanks for watching and for the feedback.
@platanin2003 3 years ago
You can also turn on CC in Spanish and voilà! You can watch whatever you want on KZbin, translated for you.
@jugnu1234 4 years ago
Very nice video Simon. Once SQL on-demand starts supporting the Delta format, it will be much easier to expose merged/enriched data directly from the data lake (via Hive/DirectQuery) instead of loading it into Azure SQL DW first (in most cases). What do you think?
@AdvancingAnalytics 4 years ago
Absolutely - once Delta support is in, we can do scalable processing in spark, land it properly in Delta tables, then query directly from SQL On Demand without having to move the data anywhere. There's a cost balance to work out, but it certainly opens up a lot of potential solutions that minimise data movement. Also - SQL OD on top of Delta tables that are ingesting a real-time stream, worth investigating when enabled! Simon
@leoafurlongiv 4 years ago
@AdvancingAnalytics it seems like the vanilla Spark pools are noticeably slower than Databricks. What have you seen so far?
@AdvancingAnalytics 4 years ago
Hey Leo - apologies, missed this one originally! So I've not done any deep performance comparisons (feels a little unfair given it's still early in the preview for Synapse!) but yeah, I've generally found that Synapse pools are quicker to spin up than Databricks, but they seem to take a little longer to execute. We don't have quite the same diagnostic tools to dig into it, but I'll make a note that a like-for-like performance showdown could be interesting! Simon
@SonPham-zy2zp 1 year ago
Hi, is it possible to query the taxi view you created in SQL on demand from a Spark notebook? It does not seem to work for me. Do you have any idea why?
@zycbrasil2618 3 years ago
Hi Simon. How do I retrieve (include) the column pickupMonth when reading from partitions? Is there an option other than option("basePath", path)?
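For reference, a minimal PySpark sketch of the two usual ways to keep a partition column such as pickupMonth in the result. It assumes the data is laid out in Hive-style folders (`pickupMonth=<value>`) under a root path; the storage account, container and folder names are hypothetical, and `spark` is the session a notebook normally provides.

```python
def partition_path(base_path, column, value):
    """Build the Hive-style folder path for a single partition value."""
    return f"{base_path.rstrip('/')}/{column}={value}"


def read_partition_keep_column(spark, base_path, column, value):
    """Read one partition folder; basePath keeps the partition column in the schema."""
    return (
        spark.read.option("basePath", base_path)
        .parquet(partition_path(base_path, column, value))
    )


def read_root_then_filter(spark, base_path, column, value):
    """Read the root folder so Spark discovers the partition column, then filter on it."""
    df = spark.read.parquet(base_path)
    return df.where(df[column] == value)


# Hypothetical lake location, not taken from the video.
base = "abfss://data@mydatalake.dfs.core.windows.net/taxi/"
march_folder = partition_path(base, "pickupMonth", 3)
print(march_folder)

# In a notebook where `spark` is already defined:
# df = read_root_then_filter(spark, base, "pickupMonth", 3)
```

Reading from the root and filtering avoids the basePath option altogether, and the filter on the partition column lets Spark skip the folders that do not match.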
@josephcarrier4900 2 years ago
Can you save the results from SQL scripts in a lake, or do you have to import them onto a local device?
@AdvancingAnalytics 2 years ago
You'd need to write it to a table to save the query results. You can do that as a CREATE EXTERNAL TABLE AS SELECT command. Docs here: docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-cetas
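As a rough sketch of the CREATE EXTERNAL TABLE AS SELECT approach, the Python helper below only assembles the T-SQL text; it does not connect to Synapse. The table name, folder, data source `LakeDataSource`, file format `ParquetFormat` and the `dbo.taxi` view are all hypothetical placeholders, and the external data source and file format are assumed to exist already in the database.

```python
def build_cetas(table, location, data_source, file_format, select_sql):
    """Assemble a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement as text.

    Values are interpolated as-is, so only pass trusted identifiers.
    """
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n"
        ")\n"
        "AS\n"
        f"{select_sql};"
    )


# All names below are hypothetical placeholders, not taken from the video.
statement = build_cetas(
    table="dbo.taxi_summary",
    location="curated/taxi_summary/",
    data_source="LakeDataSource",
    file_format="ParquetFormat",
    select_sql="SELECT pickupMonth, COUNT(*) AS trips FROM dbo.taxi GROUP BY pickupMonth",
)
print(statement)
```

The printed statement can be pasted into a SQL script in the workspace; the query results are then written to the lake folder given in LOCATION.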
@falgunoza111 4 years ago
Another great video Simon. Do any of your videos explain Delta tables in detail? I am struggling to find good material on it.
@AdvancingAnalytics 4 years ago
Hey! Some of the earlier Synapse videos focus on comparing Delta between Synapse and Databricks, this covers some of the major functionality (merging, vacuum, optimise etc). I don't think I have a pure Delta overview knocking around - if that would be useful, I can add it to the list! Simon
@falgunoza111 4 years ago
@AdvancingAnalytics It'll be extremely helpful. I can't work out the best way to keep the data up to date. If we get a file with value 1 and tomorrow a new file with updated value 2, do we keep the overwritten file, or will the Delta table keep both records, and so on?
@vinayrana4664 3 years ago
Will it cost me money if I add Power BI to my Synapse workspace?
@vap72a25 3 years ago
Another great video. Now I want to see if dedicated pools work the same on Delta.
@axelvulsteke1444 4 years ago
Hi Simon! Your videos are really interesting! I've been using Databricks and Delta for a couple of years now, but Delta tables really need to be readable by SQL on-demand. Do you have any idea when this will be possible?
@AdvancingAnalytics 4 years ago
Hey, thanks for watching! I agree, it's still a big missing gap that delta is not readable by SQL On-Demand. I'm hoping it's a feature they manage to implement before Synapse Workspace goes Generally Available, but I don't have any timescales I can share on that front!
@axelvulsteke1444 4 years ago
@AdvancingAnalytics Thanks for your fast answer. I assume that MS is also aware of this missing gap! Keep up the good work.
@AdvancingAnalytics 4 years ago
@axelvulsteke1444 Yep, they're definitely aware!
@courtneyh1533 3 years ago
Hi Simon, thank you for creating these videos. Thanks to your content I have been able to confidently use Azure Synapse Analytics. I was wondering if you knew of a way to interface with an Azure Synapse Analytics Spark cluster and its databases/tables through SQL Server Management Studio?
@AdvancingAnalytics 3 years ago
Hey Chaed! The only way to do it currently is to use Serverless SQL as an intermediary. If the tables you create in Spark are parquet, they will be visible (via a metadata replica) to the Serverless SQL side which can be queried from Management Studio. If it's other types of hive table (Delta, Avro etc) then that won't work unfortunately! Simon
@NealAmin 2 years ago
Superbly useful. Thank you
@mzhukovs 3 years ago
Great stuff
@jordanfox470 4 years ago
Is there a way to write a pandas data frame back to the data lake?
@AdvancingAnalytics 4 years ago
Yep, you can use spark.createDataFrame(pandasdf) to take a pandas dataframe "pandasdf" and convert it to a spark dataframe, which you can then write out as usual. If you're dealing with huge dataframes, this might be fairly inefficient, so you'd want to switch and use Koalas (spark-friendly pandas) or just dataframes directly! Simon
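A minimal sketch of that conversion, assuming a notebook where `spark` is already defined and the lake is an ADLS Gen2 account. The helper names, container, account and folder are hypothetical; Parquet with overwrite mode is just one reasonable choice of output.

```python
def lake_path(container, account, folder):
    """Build an abfss:// URI for a folder in an ADLS Gen2 account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder.strip('/')}"


def write_pandas_to_lake(spark, pandas_df, target_path):
    """Convert a pandas DataFrame to a Spark DataFrame and write it out as Parquet."""
    spark_df = spark.createDataFrame(pandas_df)  # pandas -> Spark
    spark_df.write.mode("overwrite").parquet(target_path)
    return spark_df


# Hypothetical container, account and folder names.
target = lake_path("curated", "mydatalake", "/api_results/")
print(target)

# In a notebook where `spark` and a pandas DataFrame `pandasdf` exist:
# write_pandas_to_lake(spark, pandasdf, target)
```

The pandas DataFrame lives on the driver, which is why this route suits modest data sizes rather than very large frames.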
@jordanfox470 4 years ago
@AdvancingAnalytics I'm fetching data from a REST API and using json_normalize, and I wasn't getting the same results when I tried to use Spark's explode.
@johndavies4758 2 years ago
Great video Simon, keep them coming. For some reason there is a lot of echo on the video, as if you don't have enough soft furnishings or sound-deadening material in your studio.
@YT-yt-yt-3 2 years ago
Good content on your channel. However, the audio quality is poor in most of your videos, which makes them hard to follow, perhaps especially for a non-native English audience.
@AdvancingAnalytics 2 years ago
All the vids from the first 4-5 months are pretty bad sound quality. If you check the latest vids, they should be a lot better & the subtitles a lot more accurate!