In this session I describe the most common file types found in data lakes, along with some considerations for folder structure, file structure and file naming on the lake.
In part one of the Data Lake in a Day series I showed you how to set up the infrastructure for the whole workshop with a simple one-click deployment using an ARM template. If you've not already done so, go and run through that video at • Data Lake in a Day 1 -...
In part two of the series I took you through Lab 1, which showed how easy it is to ingest data from a database into the data lake. See that video at • Data Lake in a Day 2 -...
Part three introduced you to what a data lake is and what it's for, and can be found at • Data Lake in a day - S...
0:00 - Introduction to the session
1:24 - Common data lake file formats - plain text (CSV, TSV)
2:51 - Common data lake file formats - structured text (JSON, XML)
4:17 - Common data lake file formats - Parquet
5:42 - Common data lake file formats - Avro
6:56 - Common data lake file formats - unstructured files
7:52 - Data lake structure
13:11 - Folder structure
16:10 - Storage Accounts
16:56 - File Naming
18:39 - Wrap up
You can find the whole one day workshop at github.com/davedoesdemos/Data... including all lab materials, data and instructions.
If you're new to data lakes, please ask any questions you have below. Please also comment if you found this workshop series useful or if you'd like to see more of this kind of content.
For all of my other demos, go to davedoesdemos.com or go straight to the GitHub page at github.com/davedoesdemos/Demo.... Also please subscribe to the channel to make sure the latest demos show up in your playlist!