In part one of the Data Lake in a Day series, I showed you how to set up the infrastructure for the whole workshop with a simple one-click deployment using an ARM template. If you've not already done so, go and run through that video first at • Data Lake in a Day 1 -... .
In part two of the series we'll go through Lab 1, which is designed to show you how easy it is to ingest data from a database into the data lake. Here, we connect to a SQL Server inside the corporate network using an integration runtime as a proxy. We'll then copy three tables using a tumbling window trigger to take roughly the last year of data, landing one CSV file per day ready for processing on the lake. This demonstrates an ELT process, where the load step brings raw data onto the lake ready to be processed by a massively parallel cluster solution such as Hadoop, Databricks, or Azure Synapse.
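To make the tumbling window idea concrete, here's a minimal Python sketch (not part of the lab materials; the table names and container path are made up for illustration) of how a daily tumbling window slices a date range into non-overlapping windows, with one CSV path per table per day, similar to what the trigger passes to the copy activity:

```python
from datetime import date, timedelta

def daily_windows(start, end):
    """Yield (window_start, window_end) pairs, one per day,
    the way a daily tumbling window trigger fires."""
    current = start
    while current < end:
        yield current, current + timedelta(days=1)
        current += timedelta(days=1)

# Hypothetical table names and container path, for illustration only.
tables = ["SalesOrderHeader", "SalesOrderDetail", "Customer"]

def csv_paths(start, end):
    """Build one CSV path per table per day,
    e.g. raw/Customer/2020-01-02.csv"""
    return [
        f"raw/{table}/{window_start.isoformat()}.csv"
        for window_start, window_end in daily_windows(start, end)
        for table in tables
    ]

paths = csv_paths(date(2020, 1, 1), date(2020, 1, 4))
print(len(paths))  # 3 days x 3 tables = 9 files
```

The key property of tumbling windows, as opposed to a plain schedule trigger, is that each window has a fixed start and end, windows never overlap, and missed windows can be backfilled, which is why they suit this "one file per day over the last year" pattern.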
0:00 - Introduction to the lab
2:34 - Setting up the containers in a storage account
3:12 - Install and configure the Integration Runtime
7:01 - Set up linked services in Azure Data Factory (ADF)
8:45 - Configure CSV datasets
11:17 - Configure SQL Server datasets
12:37 - Set up the copy activity and pipeline
19:01 - Set up the Tumbling Window trigger
21:08 - Results
22:49 - Recap on the architecture
23:34 - Wrap up
You can find the lab content for this video at github.com/davedoesdemos/Data...
You can find the whole one-day workshop at github.com/davedoesdemos/Data... including all lab materials, data, and instructions.
If you're new to data lakes please ask any questions you have below. Also please comment if you found this workshop series useful or if you'd like to see more of this kind of content.
For all of my other demos, go to davedoesdemos.com or go straight to the GitHub page at github.com/davedoesdemos/Demo.... Also please subscribe to the channel to make sure the latest demos show up in your playlist!