Microsoft Fabric: Incremental ETL for Warehouse using Dataflow Gen 2, SQL Procedure, Data Pipeline

Рет қаралды 7,971

Learn Microsoft Fabric, Power BI, SQL Amit Chandak

Күн бұрын

Пікірлер: 45

@zhiyingwang1234 Ай бұрын

Hi Amit, you may not know me, but I have encountered you so many times in Power BI Community. I started to learn Fabric recently and suddenly you appear everywhere when I search for Fabric tutorials. What a happy reunion !

@AmitChandak Ай бұрын

Hi there! Thanks for the kind words! 😊 I'm glad to hear we've crossed paths in the Power BI Community and now with Fabric too! Welcome to the Fabric journey - happy to be part of your learning experience!

@MonkXD Жыл бұрын

My friend, i would like to thanks for all the effort and time that you've been dedicating for this series. Thanks!

@AmitChandak Жыл бұрын

Thanks 🙏 Hope you will like the full series Mastering Microsoft Fabric 35+ Videos: kzbin.info/www/bejne/pl7ZYXxriJKsmNU

@arthurvanrijn68 Жыл бұрын

Thank you Amit, very nice done. Fabric is the future 😎

@AmitChandak Жыл бұрын

Thank you for your kind words! I completely agree with you. Microsoft Fabric is an incredible technology that holds immense potential for the future. Its innovative features and capabilities make it a game-changer in the industry. I'm excited to see how it will continue to evolve and shape the future of computing. Kudos to the Microsoft team for their outstanding work! Hope you will like the full series kzbin.info/www/bejne/pl7ZYXxriJKsmNU

@shafa7668 Жыл бұрын

Fantastic. Exactly the vedio I needed. I highly appreciate you Amit for doing this.

@AmitChandak Жыл бұрын

I'm glad to hear that the video was exactly what you needed! Microsoft Fabric is truly an impressive technology, and I'm happy that the video provided you with the information you were looking for. Thank you for sharing your positive feedback! One more is coming soon - Incremental for Warehouse using only data pipeline

@Shreekanthsharma-t6x 6 ай бұрын

Hi Amit, Good Morning!, I have to convert the existing SQL server stored procedure into fabric environment, In my stored procedures there are CURSOR commands but fabric doesn't support CURSOR commands, in this case how do I proceed, is there any alternative.

@AmitChandak 6 ай бұрын

I have tried something very similar, TSQL loops learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-loops Search for Azure Synapse solution without using cursor

@ashi145 5 ай бұрын

Hi Amit - in this demo, you basically pulling all data from source and then doing lookup with help of fact table to identify which needs modification. can't you use proc from source which has data parameter to passed in so it only return records after last data sync?

@AmitChandak 5 ай бұрын

There are many ways you can do this. In some cases, I have even used a SQL statement to get the max date using a SQL Server connection (for Lakehouse and Warehouse-).

@kranthikumar-el5lz 8 ай бұрын

Thanks so much. Please make more videos on Dataflow gen2

@AmitChandak 8 ай бұрын

Sure. Hope you will like more video in the series Mastering Microsoft Fabric 40+ Videos: kzbin.info/www/bejne/pl7ZYXxriJKsmNU

@trivediasit Жыл бұрын

Thanks for the detailed explanation here. What would happen in case there is no incremental data? For example no new sales orders over the weekend for namesake. It'll be great if you can share the process to handle that.

@AmitChandak Жыл бұрын

If there is no data, then add an additional condition , update only when date is not null or use coalesce. If the case is if there no new data coming in.

@sukumarm5926 7 ай бұрын

Hi , Is it possible to use where clause in Copy data activity from ADF source as Microsoft Fabric , SQL Analytics Endpoint?

@AmitChandak 7 ай бұрын

Please check the Incremental load video, I have given an example to use where the clause Microsoft Fabric: Incremental ETL for Warehouse using Data Pipeline, SQL Procedure: kzbin.info/www/bejne/p6SyepmKr892ibM Microsoft Fabric: Incremental ETL for Warehouse using Dataflow Gen 2, SQL Procedure, Data Pipeline: kzbin.info/www/bejne/o6Gpg6GQd72mrLM #microsoftfabric- How to use Append data option of Data Flow Gen 2 and Data Pipeline| Incremental kzbin.info/www/bejne/e2bEqYtsltp1hJY Microsoft Fabric: How to append only Incremental data using Data Pipeline in Lakehouse: kzbin.info/www/bejne/rqvLnpSHo6dlnK8

@sanishthomas2858 7 ай бұрын

Good. But PostRSQL is on Premise right so how we have used the self hosted IR or Data ways to access it?

@AmitChandak 7 ай бұрын

You can use on-premise gateway Microsoft Fabric: Load local SQL server data in Lakehouse using on-premise Gateway| Dataflow Gen2 - kzbin.info/www/bejne/pXapXp1-i9KpmtE Now Integrate On-Premise Data with Microsoft Fabric Using Data Pipelines: Gateway Update- kzbin.info/www/bejne/qYfIgamQhM-npdE

@adilmajeed8439 Жыл бұрын

Thanks again for such a nice blog, is it possible for you to tag the numbers on the videos so that it will be easy to follow instead of looking at the publishibg date. Thanks for understanding.

@AmitChandak Жыл бұрын

Please follow the series link. I will see if, I can rename and add number kzbin.info/www/bejne/pl7ZYXxriJKsmNU

@ADhuidv 6 ай бұрын

Sir, How can we build the JDBC/Pyodbc connection between Fabric Data warehouse and Fabric Notebook. I have been finding it since a long time, but un-successful

@AmitChandak 6 ай бұрын

I have not tried on notebook, but I have tried from local Python to Warehouse. Check if that code can help Microsoft Fabric: Load local SQL server data in Warehouse using Python, Pandas, and sqlalchemy - kzbin.info/www/bejne/hmHSXpRjbdd5fpI

@ADhuidv 6 ай бұрын

❤❤ It will definitely solve my scenario

@ADhuidv 6 ай бұрын

Thank you so much sir 😊😊

@JoanSCruz 10 ай бұрын

Hi Amit, sorry if it looks as I'm not doing the proper effort but I quite don't get how you can use DML in the SQL endpoint, whenever I try to use insert, update, delete, etc, I get an error that those commands are not supported, I don't get what I'm doing wrong. On the other hand I really appreciate all your material.

@AmitChandak 10 ай бұрын

Hi Joan, Hope you are using that on the Warehouse SQL endpoint, Not the Lakehouse. On Lakehouse you can run using Spark SQL - Managing Microsoft Fabric Lakehouse using Spark SQL: Data Manipulation Insert, Update, Delete, Alter- kzbin.info/www/bejne/hqbRfJWwac52e8k

@JoanSCruz 9 ай бұрын

@@AmitChandak you're so kind, yes I kept watching your videos and found out the issue was exactly that, I was using lakehouse. Thanks, your videos are excellent

@ilseespinobarros2418 10 ай бұрын

Hi Amit, do you know if it is possible to connect directly to a Pipeline and use the upsert option for an SQL server, I have tried it but I can not connect to the Server. I have connected succesfully using Data Flows, but Pipeline gives the following error: "The server was not found or was not accessible." This would make it so much easier for incremental ETL in a SQL Server but I think the option is still not available in Fabric.

@AmitChandak 10 ай бұрын

Hi Ilse, It seems like you're encountering issues with connecting directly to an on-premise SQL Server using Pipeline and encountering the error "The server was not found or was not accessible." If you're using an on-premise SQL Server, it requires a gateway to establish a connection. Currently, Dataflow Gen 2 is the only option supporting on-premise sources. Unfortunately, the option to directly connect via Pipeline might not be available in Fabric at the moment. Thanks, Amit

@Kiran-eb3rm 10 ай бұрын

Hi Amit, One question what if we update some records will they also come, or only newly added records will come. Thanks!

@AmitChandak 9 ай бұрын

Depending on the type of incremental load you run, if you use the update date as the basis, it will bring in both new and previously updated data, which you will need to handle in the ETL process.

@lakshmipriyakrishnan2727 Жыл бұрын

I am trying to call a dataflow and execute it from a pipeline, but I get a 'User Configuration issue' error. Any idea how to fix this?

@AmitChandak Жыл бұрын

I am able to call dataflow gen2 from pipeline. Tested few mins back. Please provide more details. to reproduce.

@huongbui4376 Жыл бұрын

I'm wondering, your PosgrestSQL database is on local. So how you can connect Fabric without using a gate way?

@AmitChandak Жыл бұрын

No, It is on cloud, I have used an IP.

@huongbui4376 Жыл бұрын

@@AmitChandak I tried again by using PosgresSQL on cloud and successfully ran each step: Dataflow and Stored procedure. However, when I created a pipeline with these 2 steps like your video. I got fail when ran Dataflow in this pipeline. Could you please explain me the reasons? The details of the error: Error code: 20302 Failure type: User configuration issue Details: Microsoft.DataTransfer.Execution.Core.ExecutionException: Refresh Dataflow failed with status: NotFound message: Not Found

@GudhawalaLadka Жыл бұрын

Can this process handle millions of data. I have a table which has 13 million rows which keeps of increasing every month. I am not able to use incremental refresh on Dataflows because of the large data. I am being hopeful if it can work out

@AmitChandak Жыл бұрын

I successfully managed to load data of up to 4 million records without incremental load. Let me to share a method to achieve this. Create a table that has two dates: "Start Date" and "End Date." Customize the date range to correspond with either a year or a month, depending on your specific requirements. After each loading cycle, once the data processing concludes, adjust the "Start Date" and "End Date" to accommodate the subsequent year or month, as dictated by your business needs. This can be achieved by performing an update within a procedure. Execute this process multiple times or create a loop within a data pipeline to ensure a seamless and efficient data loading operation.

@zhiyingwang1234 Ай бұрын

The tutorial uses sql commands in data warehouse to create/alter tables, I can only load data into lakehouse via copy data. Dataflow Gen2 won't work due to firewall setup. How can I implement incremental loading in data lakehouse? I understand one can't change data in sql endpoint of a lakehouse.

@AmitChandak Ай бұрын

@zhiyingwang1234 , You can use Spark SQL, or PySpark. In spark sql try merge spark.sql(""" MERGE INTO target_table AS target USING source_table AS source ON target.id = source.id WHEN MATCHED THEN UPDATE SET target.value = source.value WHEN NOT MATCHED THEN INSERT (id, value) VALUES (source.id, source.value) """)

@mohammedmanna3488 Жыл бұрын

Thank you Mr.Amit, Great job sir May be aske you a question .MS Fabric , Power bi and Ms Sql server in my laptop The data base which I worked with including more than 4000 tables Actually it is for Jdedwards Enterprisepoweri it is called JDEData and my server name called for example Localhost since last 2 weeks,i tried to connect to my local host but no way icant connect Notic sir i connect to power query by just giving the server name and database name and simple select statement for the table and the fileds which ineed by the way i never work with power bi service before can you please guide me to connect to the my local host throw what ever data flow G 2 or pipeline or what ever in Fabric lakehouse or any thing I am very sorry my comment is very tall but Itried hundreds time but i could not doing and also iam not network experience i again iam sory and thank you

@AmitChandak Жыл бұрын

To connect with Local, first you need install on-premise gateway and in dataflow you have use that gateway, while creating connection. Last time when I tried the flow on Fabric, it was giving some issue. And reported the same. I will check and get back on this.

@AmitChandak Жыл бұрын

I am able to complete flow using local SQL server and latest on-premise gateway -kzbin.info/www/bejne/pXapXp1-i9KpmtE