Intro to Amazon EMR - Big Data Tutorial using Spark

18,233 views

jayzern

Days ago

Edit:
Make sure you encrypt your Spark script when you upload it to S3 (timestamp: 13:42).
There's a small typo in line 41 of the code; it should be "add_argument".
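The first edit note asks you to encrypt the Spark script when you upload it to S3. If you upload from Python rather than the S3 console, one way to request server-side encryption is through the `ExtraArgs` parameter of boto3's `upload_file`. This is a minimal sketch, assuming SSE-S3 (`AES256`) by default and SSE-KMS when a key ID is passed; the helper names, file, bucket, and key are hypothetical.

```python
"""Upload a Spark script to S3 with server-side encryption (sketch)."""
from typing import Dict, Optional


def encryption_extra_args(kms_key_id: Optional[str] = None) -> Dict[str, str]:
    """Build the ExtraArgs dict that boto3's upload_file accepts for SSE.

    Hypothetical helper: with no KMS key we ask for SSE-S3 (AES256),
    otherwise for SSE-KMS using the given key ID.
    """
    if kms_key_id is None:
        return {"ServerSideEncryption": "AES256"}
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


def upload_script(local_path: str, bucket: str, key: str,
                  kms_key_id: Optional[str] = None) -> None:
    """Upload local_path to s3://bucket/key and ask S3 to encrypt it at rest."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key,
                   ExtraArgs=encryption_extra_args(kms_key_id))
```

For example, `upload_script("job.py", "my-emr-bucket", "scripts/job.py")` (placeholder names) uploads the script with SSE-S3 requested.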
Intro
Today we're going to talk about a popular tool in data engineering. Amazon EMR is an industry-leading big data platform. It's a mature service, launched back in 2009, and it builds heavily on the Apache Hadoop ecosystem. EMR is used for processing terabytes of data and for training machine learning models. In this tutorial, we'll dive deep into EMR's architecture, do a live demo of how to trigger jobs using Steps, and demonstrate how to use Spark to extract and process data from Amazon S3. Hope you enjoy this one!
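The intro mentions using Spark to pull data out of S3, and the edit note refers to `add_argument` in the script. The video's actual script isn't reproduced here, so this is only a minimal sketch of what such a PySpark job could look like, assuming CSV input with a header row and Parquet output; the argument names `--data_source` and `--output_uri` and the app name are hypothetical.

```python
"""Minimal PySpark job skeleton for an EMR step (sketch)."""
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the S3 input/output locations passed to the step.

    The flag names are hypothetical; adapt them to your own script.
    """
    parser = argparse.ArgumentParser(description="Process CSV data from S3")
    # The method is parser.add_argument (the typo mentioned in the edit note).
    parser.add_argument("--data_source", required=True,
                        help="S3 URI of the input CSV, e.g. s3://bucket/input/")
    parser.add_argument("--output_uri", required=True,
                        help="S3 URI where Parquet output is written")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    # Imported lazily so parse_args can be exercised without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("emr-s3-demo").getOrCreate()
    try:
        # On EMR, s3:// paths are resolved through EMRFS.
        df = spark.read.option("header", "true").csv(args.data_source)
        df.write.mode("overwrite").parquet(args.output_uri)
    finally:
        spark.stop()

# When submitted to the cluster, the script would end with a call to main().
```

Submitted as a step, the script would receive those flags after its S3 path, e.g. `spark-submit s3://my-bucket/scripts/job.py --data_source s3://my-bucket/input/ --output_uri s3://my-bucket/output/` (bucket and paths hypothetical). The final `main()` call is left as a comment so the sketch can be loaded without starting Spark.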
Timestamps ⏰
0:00 Intro
1:16 Overview of Amazon EMR
5:10 Create filesystem, VPC, and configure EMR cluster
9:04 Writing our Spark script
13:42 3 ways to trigger Steps in EMR
18:32 SSH into the YARN ResourceManager
19:50 Enable EMR managed auto-scaling
20:57 Summary
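The description doesn't list the three methods shown at 13:42, so what follows is just one programmatic option, not necessarily one from the video: a minimal sketch that uses boto3's EMR client to add a `spark-submit` step through `command-runner.jar`. The helper names, cluster ID, region default, step name, and S3 paths are hypothetical.

```python
"""Submit a Spark script as an EMR step with boto3 (sketch)."""
from typing import Any, Dict, List, Optional


def spark_step(name: str, script_uri: str,
               script_args: Optional[List[str]] = None,
               action_on_failure: str = "CONTINUE") -> Dict[str, Any]:
    """Build one step definition that runs spark-submit via command-runner.jar."""
    args = ["spark-submit", "--deploy-mode", "cluster", script_uri]
    args.extend(script_args or [])  # arguments forwarded to the script itself
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def submit_step(cluster_id: str, step: Dict[str, Any],
                region: str = "us-east-1") -> str:
    """Add the step to a running cluster and return its step ID."""
    import boto3  # lazy import: spark_step above has no AWS dependency

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

For example, `submit_step("j-EXAMPLE", spark_step("demo-step", "s3://my-bucket/scripts/job.py"))` (placeholder ID and path) would queue the step. `ActionOnFailure="CONTINUE"` keeps the cluster running if the step fails, which is convenient while debugging.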
Notes from video 📝
bittersweet-mall-f00.notion.s...
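For the managed auto-scaling segment (19:50): if you'd rather set the policy from code than in the console, boto3 exposes `put_managed_scaling_policy`. This is a minimal sketch, assuming instance-group units and a hypothetical cluster ID; the helper names and the bounds check are illustrative additions, not something shown in the video.

```python
"""Attach an EMR managed scaling policy with boto3 (sketch)."""
from typing import Any, Dict

VALID_UNIT_TYPES = ("Instances", "InstanceFleetUnits", "VCPU")


def managed_scaling_policy(min_units: int, max_units: int,
                           unit_type: str = "Instances") -> Dict[str, Any]:
    """Build the ManagedScalingPolicy payload with basic sanity checks."""
    if unit_type not in VALID_UNIT_TYPES:
        raise ValueError(f"unsupported unit type: {unit_type}")
    if not 0 < min_units <= max_units:
        raise ValueError("require 0 < min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": unit_type,
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def enable_managed_scaling(cluster_id: str, min_units: int,
                           max_units: int) -> None:
    """Apply the policy so EMR can resize the cluster between the limits."""
    import boto3  # lazy import: the payload builder needs no AWS access

    emr = boto3.client("emr")
    emr.put_managed_scaling_policy(
        ClusterId=cluster_id,
        ManagedScalingPolicy=managed_scaling_policy(min_units, max_units),
    )
```

For example, `enable_managed_scaling("j-EXAMPLE", 2, 10)` (placeholder ID) would let EMR scale between 2 and 10 instances.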
Who am I? 🙋🏻‍♂️
I'm Jay, and I love making videos about travel, self-help, and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about; I was moving to so many places and navigating a career in tech. Today, I've learned a lot and want to share my perspective through filmmaking.
Socials 📱
/ jayzern
/ jayzern
Sub Count: 4,539

Comments: 60
@harishchitluri3137 2 days ago
Absolutely enjoyed watching the entire video. I felt this video is gonna be a great start to understanding EMR. Thanks for making it, Jay
@jovelynobias5422 3 months ago
I hope you create more videos about AWS services. Loved the way you explain things, perfect for beginners.
@miguelhermar 22 days ago
We need more videos Jaaay 🙏🏻💪🏻 You're awesome dude!
@Munk-tt6tz a month ago
So sad your channel doesn't have more tutorials like this :( thank you so much!
@user-eo3ji8nb2k 4 months ago
This is an outstanding tutorial. Thank you for making this!
@yutao1982 6 months ago
Very clear! Thank you for sharing this excellent tutorial!
@sunnyzhong2905 9 months ago
great tutorial! can’t wait to see more
@vineethdas4160 23 days ago
awesome explanation, simple, subtle and to the point!
@DarshilParmar 2 months ago
Great work mate, very crisp!
@jayzern 2 months ago
Thanks man!! Love ur content
@lucashoww 4 months ago
gnarly stuff man! great content.
@vmmismagic a month ago
Hey, thank you so much!!.. you really explain very well!
@prabhathkota107 29 days ago
Very well explained, kudos
@carloshenriquekaphos8814 7 months ago
Go ahead bro....CONGRATS TUTO
@goumze 2 months ago
Great article! Thanks for sharing..
@datexland 5 months ago
Thanks for sharing man 👌
@elenciclopedista6426 9 months ago
Great!! Thank u so much!
@jasonyuen105 a month ago
nice job, great tutorial
@isaaclee3714 3 months ago
This is so goood :). Please keep making these kinds of videos! Hello from Seattle
@jayzern 2 months ago
Thanks Isaac from Seattle! Appreciate your support
@thanhchien1602 8 months ago
Your video is very interesting! Hope you release many new videos :)
@pottamvivek 2 months ago
Great job
@user-wy6fd2kw8y 9 months ago
Impressive and informative video, good job, keep doing tutorials plss :) Would be very interesting to see a video about Spark and Snowflake on your channel!
@pradeepnim3689 7 months ago
Thanks .. Good stuff
@sisami2109 9 months ago
thanks for the video
@hassanlaqrabti4036 9 months ago
More tutorials 🙏
@StartDataLate 4 months ago
this is crazy ❤❤❤ wish i had seen this earlier! is this what the whole Amazon product looks like in an actual workflow? and also could you maybe make another one showing the Azure system? pleaaase
@tatenda_mk 6 months ago
Great tutorials! thanks for the heads-up! do you have a git repo or more Notion notes? would like some guidance
@martinghiena5270 2 months ago
You killed it. Loved it! Extremely useful
@jayzern 2 months ago
Thank you man! Hope to create more
@NhungNguyen-wh7uf 7 months ago
Could you share more about projects for data engineer beginners? I started learning to be a DE recently and I hope to learn about some personal projects that would help me enhance my skills. Thank you so much for sharing, and I'm waiting for your next video :> Have a good day
@errrbrrr3821 9 months ago
great video! can you also make one for AWS Glue? Thank you!
@tatenda_mk 6 months ago
when writing the Spark script, does it ever change, or does the skeleton layout remain the same? i truly appreciate this and i cannot wait for more
@bishop9168 2 months ago
Fantastic tutorial indeed! I did as instructed and I got two failures when deploying the 'add step' part of the EMR cluster stage; any insights would be appreciated.
@giovannimaia9652 12 days ago
Please post more videos
@_its_ck 9 months ago
More videos on Streaming, Airflow and Spark
@mandata143 6 months ago
is this free to use or do i need licensed software in order to use it? this is quite interesting.
@jazzypants4047 5 months ago
I am wondering: if I only needed to do PySpark, is EMR the best tool, or is it overkill, and would serverless Glue be good enough with a lot less to manage and fewer configurations to worry about? Is it possible to get better performance with all the options in EMR?
@jazzypants4047 5 months ago
And thank you for this video - I’m studying for AWS certification and it was helpful to see your demonstration
@shivaramthallapally369 8 months ago
Where did you learn that coding part 😢
@carloshenriquekaphos8814 7 months ago
Don't stop
@jovelynobias5422 3 months ago
Isn't using an EMR notebook one of the ways to trigger an EMR job?
@jayzern 3 months ago
Yes it is! Wanted to keep things simple in the video so didn't include it
@syedmehdi5125 8 months ago
I have done a master of science in biotech, 38 years of age, want to switch to data science... how should I do it??? Please reply.....
@CK30585 8 months ago
Do projects and add them to your resume. Try Upwork and do some projects as a freelancer. Keep applying
@Ved3sten 6 months ago
Don’t
@syedmehdi5125 6 months ago
@Ved3sten why? Please reply...
@Ved3sten 6 months ago
@syedmehdi5125 because most companies want senior data analysts or graduate students when it comes to data science. You’ll waste more money chasing a data science job than you’ll make
@koliux1 3 months ago
Yeah, good in EMR AWS but an absolute rookie in videography and equipment. Use manual focus since you are stationary.... your autofocus keeps looking for something. And change the light set-up
@jayzern 3 months ago
Fair point 👍 will work on lighting and camera setup more next time
@DivakarJ-gk6op 9 months ago
nice try but it's not working
@jayzern 9 months ago
Let me know how I can help
@DivakarJ-gk6op 9 months ago
I can add a step for the Spark application @jayzern
@jayzern 9 months ago
Check 1. whether the Spark script is encrypted when you upload it to S3, and 2. for any typos (line 41 should be "add_argument")
@DivakarJ-gk6op 9 months ago
I tried, but it's not working for me @jayzern
@jayzern 9 months ago
Send me a DM on Instagram @jayzern or LinkedIn, happy to pair up
@christinachen9669 3 months ago
Love the way you demonstrate! So clear and easy to understand! Thanks for sharing @jayzern
@chulada03 8 months ago
thanks so much