Amazon Redshift for Beginners (Full Course)

22,783 views

ETL-SQL

1 year ago

Free SQL Pattern Training: etlsql.kartra.com/page/sps-fr...
Course Transcript:
If you are an absolute beginner, this course will give you a good overview of Amazon Redshift.
The goal is that after taking this course you should be comfortable talking about Redshift. You should be able to participate in group discussions at your workplace and understand solutions involving Amazon Redshift.
We will start with the fundamentals:
Data Warehouse
MPP System
Columnar
Then we will see how these fundamentals apply to Amazon Redshift. We will see how parallelism is built into the core architecture of Redshift.
Amazon Redshift is a data warehouse offering from AWS (Amazon Web Services).
So what is a data warehouse?
A data warehouse is a system that lets users complete three main tasks:
Gather data from various sources
Transform data and apply business logic to it
Enable the business to make decisions by supporting reports and visualisations
Massively Parallel Processing (MPP) systems are built on the principle of divide and conquer. The main node divides a task into multiple smaller, similar tasks and hands them to delegates. Once the delegates complete their tasks, they return their results to the main node.
Summary:
Divide the work into smaller, similar tasks
Individual teams work in isolation to complete their tasks
The "main node" collates the results back into one output
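The three steps in this summary can be sketched in a few lines of Python. This is only an illustration of divide and conquer, not how Redshift is implemented: the function names (split_into_chunks, worker_sum, parallel_sum) are made up for this example, and threads stand in for the delegates.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, num_workers):
    """Main node: divide the work into smaller, similar tasks."""
    chunk_size = max(1, -(-len(values) // num_workers))  # ceiling division
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def worker_sum(chunk):
    """Delegate: works only on its own chunk, without seeing the others."""
    return sum(chunk)


def parallel_sum(values, num_workers=4):
    chunks = split_into_chunks(values, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker_sum, chunks))
    # Main node: collate the partial results into one output.
    return sum(partial_results)


print(parallel_sum(list(range(1, 101))))  # 5050
```

Each delegate does the same kind of work (a sum) on a different piece of the data, which is what "smaller, similar tasks" means here.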
Columnar databases store data in blocks differently from traditional row-based databases. The values of a column are stored in the same or adjacent storage blocks. This speeds up retrieval because only the blocks that hold the required columns are scanned, not all the blocks.
Summary:
Values of a column are stored in the same or adjacent blocks
Efficient read when few columns are required
Better compression at column level
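To make the "only the required blocks are scanned" point concrete, here is a toy comparison in Python. Everything in it is an assumption for illustration: the sample table, the tiny block capacity of 4 values, and the helper names. Real storage engines are far more involved.

```python
BLOCK_CAPACITY = 4  # values per block; tiny on purpose (hypothetical)
COLUMNS = ("id", "name", "age")

rows = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
    (4, "dave", 37),
]


def to_blocks(values, capacity=BLOCK_CAPACITY):
    """Cut a flat sequence of values into fixed-size blocks."""
    return [values[i:i + capacity] for i in range(0, len(values), capacity)]


def row_layout(rows):
    # Row-based: all values of one row sit next to each other.
    flat = [(col, value) for row in rows for col, value in zip(COLUMNS, row)]
    return to_blocks(flat)


def column_layout(rows):
    # Columnar: all values of one column sit next to each other.
    flat = [(col, row[idx]) for idx, col in enumerate(COLUMNS) for row in rows]
    return to_blocks(flat)


def blocks_scanned(blocks, wanted_column):
    """Count the blocks that hold at least one value of the wanted column."""
    return sum(1 for block in blocks if any(col == wanted_column for col, _ in block))


print(blocks_scanned(row_layout(rows), "age"))     # 3
print(blocks_scanned(column_layout(rows), "age"))  # 1
```

Reading just the age column touches every block in the row layout, but only one block in the columnar layout.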
In this lesson, we will see how Amazon Redshift works as a data warehouse.
Gather data from various sources:
Export to S3 and run the COPY command
Use a JDBC connection to the source and load data into a table
Use a Redshift datashare to bring in data from another Redshift cluster
Use other services (Glue/Lambda/EMR) to process and load data into Redshift
Use a Lake Formation table as an external table in Redshift
Apply business transformations:
Run SQL on data in the tables
Connect other AWS services like Glue/EMR to process data
Connect ETL tools to process data
Enable the business to make decisions:
Unload data into an S3 bucket for downstream applications
Connect QuickSight and other reporting tools for visualisation
Share data via a datashare with other Redshift clusters
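As a rough idea of what the COPY and UNLOAD steps look like, the Python sketch below only builds the SQL text; it does not connect to a cluster. The helper names, table name, bucket and IAM role ARN are all hypothetical placeholders, and the options shown (CSV with a header row for COPY, Parquet for UNLOAD) are just one possible choice.

```python
def build_copy_statement(table, s3_path, iam_role_arn):
    """Return a COPY statement that loads CSV files from S3 into a table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


def build_unload_statement(query, s3_prefix, iam_role_arn):
    """Return an UNLOAD statement that writes a query result to S3 as Parquet."""
    # UNLOAD takes the query as a quoted string, so single quotes are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Placeholder values, not real resources.
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

print(build_copy_statement("sales", "s3://example-bucket/sales/", ROLE))
print(build_unload_statement(
    "SELECT * FROM sales WHERE sale_date >= '2024-01-01'",
    "s3://example-bucket/exports/sales_",
    ROLE,
))
```

Plain string formatting is fine for a demonstration like this, but it should not be used with untrusted input.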
The Amazon Redshift architecture consists of two types of nodes:
Leader Node
Compute Node
*There is a third type, the Spectrum node, which I will not cover in this beginner course.
The end user submits a request to the leader node. There is one and only one leader node in an Amazon Redshift cluster. The leader node breaks the task into smaller, similar tasks, which are passed to the compute nodes for processing.
The compute nodes have their own memory and storage to complete their tasks. Each compute node is divided into slices, which are like "mini-computers" that actually process the data. Each compute node has at least one slice; the exact number depends on the node type in the Redshift cluster.
Once a task is complete, the compute nodes send their results back to the leader node, which collates the results from the different compute nodes. It then passes the output to the end user.
Amazon Redshift is a columnar database, so for data analytics it is typically faster than many traditional row-oriented RDBMSs.
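The flow described above (leader node, compute nodes, slices, collated result) can be modelled with a small Python simulation. It is a sketch under stated assumptions: the class names, the sample (region, amount) rows and the round-robin placement of rows are invented for illustration, and the slices run one after another here rather than in parallel.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Slice:
    """A "mini-computer" that holds and processes its own portion of the rows."""
    rows: list = field(default_factory=list)

    def partial_count(self):
        # Each slice aggregates only the rows it holds.
        return Counter(region for region, _amount in self.rows)


@dataclass
class ComputeNode:
    slices: list


class LeaderNode:
    def __init__(self, num_nodes, slices_per_node):
        self.nodes = [
            ComputeNode([Slice() for _ in range(slices_per_node)])
            for _ in range(num_nodes)
        ]
        self.all_slices = [s for node in self.nodes for s in node.slices]

    def load(self, rows):
        # Assumption for the sketch: rows are spread round-robin across slices.
        for i, row in enumerate(rows):
            self.all_slices[i % len(self.all_slices)].rows.append(row)

    def count_by_region(self):
        # Leader: ask every slice for a partial result, then collate them.
        result = Counter()
        for node in self.nodes:
            for node_slice in node.slices:
                result.update(node_slice.partial_count())
        return dict(result)


leader = LeaderNode(num_nodes=2, slices_per_node=2)
leader.load([("EU", 10), ("US", 20), ("EU", 5), ("APAC", 7), ("US", 1)])
print(leader.count_by_region())
```

The end user only talks to the leader; the slices never see each other's rows, and the final answer is the merge of their partial counts.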
Stores data in columnar format
Redshift storage blocks are 1 MB in size
Multiple encoding algorithms are available, such as AZ64, LZO, ZSTD and more.
We now know that Amazon Redshift is a columnar database. However, there is a standard mechanism that determines how table data is stored in the database.
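To get a feel for the 1 MB block size, here is a back-of-the-envelope calculation in Python. It assumes a fixed-width 4-byte integer column and ignores encoding (compression) and any per-block overhead, so real numbers will differ; the function names are made up for this example.

```python
import math

BLOCK_SIZE_BYTES = 1024 * 1024  # 1 MB, the block size stated above


def values_per_block(bytes_per_value):
    """How many fixed-width values fit in one uncompressed block."""
    return BLOCK_SIZE_BYTES // bytes_per_value


def blocks_for_column(num_rows, bytes_per_value):
    """How many blocks one column needs for the given number of rows."""
    return math.ceil(num_rows / values_per_block(bytes_per_value))


print(values_per_block(4))               # 262144
print(blocks_for_column(10_000_000, 4))  # 39
```

Under these assumptions, a query that needs only this one column out of 10 million rows reads about 39 blocks, no matter how many other columns the table has. Encoding the column would reduce that further.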

Comments: 49
@ETLSQL 6 months ago
Did you like this video? What else do you want to learn about AWS? Drop a comment below.
@lakshmivaishnavi7547 1 month ago
Why is no 'distkey' mentioned for other distribution styles like all and even?
@ETLSQL 1 month ago
There are four options: KEY, ALL, EVEN and AUTO. A distkey applies only to the first option. The other three distribution styles do not need a key for data distribution.
@arokiarajan1230 1 year ago
Superb... Good pitch
@GouravSharma-us9yq 7 months ago
I like the way you put the lesson: simple, easy and clear to understand. Thanks, GS
@ETLSQL 7 months ago
Glad you liked it 👍
@verosirvi 7 months ago
I like the distribution style of the content in this video and the way you chose to present it
@ETLSQL 7 months ago
Glad you liked it. 👍
@Atlas-ck9vm 1 year ago
Very clear and concise introduction to AWS Redshift.
@ETLSQL 1 year ago
Glad you liked it
@sriharisrinivasan1307 1 year ago
Enrolled in the course. Looking forward to great and informative content as always.
@ETLSQL 1 year ago
Hope you liked it
@sreechivukula8115 8 months ago
Best 30 min I have spent in recent days! Add a next video with more details.
@ETLSQL 8 months ago
Thanks for leaving a comment. Is there any specific topic you would like me to cover next?
@user-mv1dm3jj7c 8 months ago
Hats off to you sir, keep making content like this. Clear explanation
@ETLSQL 8 months ago
Glad you liked it. I remember this video took the most time I have invested in any video to date. Do you have any recommendations for the next set of videos?
@aykhan.g 6 months ago
Thanks for the awesome video!
@ETLSQL 6 months ago
Woo hoo. Thanks for the comment. 👍
@LoveisHell85 10 months ago
Very clear tutorial. Thank you
@ETLSQL 10 months ago
Glad you liked it. 👍
@kolawolegabriel6558 1 month ago
The most excellent video I have ever watched on AWS Redshift. This is the best one that explains Redshift in detail
@ETLSQL 1 month ago
Glad you liked it ❤️
@gloirebeya5127 7 months ago
Thanks for your video
@ETLSQL 7 months ago
Glad you liked it 👍
@andriys5772 3 months ago
Thank you!
@ETLSQL 3 months ago
You're welcome!
@adityaf17 8 months ago
Best one for Redshift!
@ETLSQL 8 months ago
Glad you liked it. 👍
@AllAboutDataTechnology 6 months ago
Good video, clear explanation of this topic
@ETLSQL 6 months ago
Glad you liked it 👍
@ETLSQL 10 months ago
If you like this video, please drop a comment to share your reaction. ❤
@SurajPatil14 1 month ago
Great 👍
@ETLSQL 1 month ago
Thanks
@aniketbahalkar223 5 months ago
Thanks for such a nice video, please create a complete course on this.
@ETLSQL 5 months ago
Hey @aniketbahalkar223 Can you suggest a few topics that I should cover in the course?
@adityaf17 8 months ago
A hands-on tutorial on Redshift would be the best one.
@ETLSQL 8 months ago
Noted. I do plan to work on that one in the coming weeks.
@ETLSQL 6 months ago
Hey @adityaf17 I am working on a hands-on tutorial, however Redshift is not free and incurs cost. Do you think people will be ready to shell out some coins for the hands-on tutorials? Or would you prefer a video of me doing the actual work while you just watch?
@Piyushjoshi6767 3 months ago
Nicely explained
@ETLSQL 3 months ago
Glad you liked it. Can you suggest any relevant topic that I can cover next?
@krishnasingh-tf9jw 6 months ago
Thanks for the insightful tutorial. My only question is about distribution style KEY vs EVEN: will choosing a key column distribute the rows and retrieve them much faster than the EVEN distribution style, since EVEN distributes rows evenly?
@ETLSQL 6 months ago
Yes, you are right. If you have a distkey and you use it in the query, it will return rows faster than the even distribution style. You should be a little careful when picking the distkey column. Ideally it should be one with unique values that is used in your queries. Good luck.
@krishnasingh-tf9jw 6 months ago
@@ETLSQL Thanks for the clarification
@prabhuthiyagarajan7437 3 months ago
Can the same node slice hold two different column values if they have the same datatype?
@ETLSQL 3 months ago
Yes, it can. But remember that data is distributed using the distkey only.
@maheshkumar.s2513 6 months ago
Please provide a video on Azure Data Factory like this with at least one example
@ETLSQL 6 months ago
Hey Mahesh, I am not planning to cover Azure as of now. I will focus on general concepts and AWS. Hope you find a suitable tutorial soon.
@heetshah5923 7 months ago
Are there any videos on Power BI + Amazon Redshift?
@ETLSQL 7 months ago
Not sure about any video with Power BI; generally teams prefer to use QuickSight with Redshift though.