Johnny Chivers

Brought to you by Johnny Chivers, this channel provides a platform to gain the skills of an AWS data engineer through lessons and vlogs.

I have spent the last decade immersed in the world of big data, working as a consultant for some of the globe's biggest companies.

My journey into the world of data was not the most conventional. I started my career working as a performance analyst in professional sport at the top levels of both rugby and football. I then transitioned into a career in data and computing. This journey culminated in the study of a Master's degree in Software Development, alongside many professional certifications in AWS and MS SQL Server.

AWS, GCP and MS SQL Server have become my areas of expertise over the years. I have had the privilege of traveling the world to help companies develop innovative solutions to their business problems, so they can derive maximum market value.

I am available for both physical and virtual consulting, as well as tech talks.

AWS Certified Data Engineer - Associate (DEA-C01) [Full Course In 285min]

4:44:31

4 months ago

The Top AWS Services A Data Engineer Should Know In 2024

7:09

9 months ago

Amazon Bedrock on AWS [AWS TUTORIAL IN 10MINS]

9:50

1 year ago

My Top 5 Tips For Passing The AWS Certified Data Analytics - Specialty Exam (DAS-C01)

4:27

1 year ago

What is Amazon DataZone? [AWS TUTORIAL in 12MINS]

12:28

1 year ago

AWS Glue Crawler [AWS Console 2023 Full Demo]

8:31

1 year ago

What Table Format Should I Choose For My Data Lake? Hudi | Iceberg | Delta Lake

6:30

1 year ago

Run Spark Jobs On Amazon Athena [FULL TUTORIAL IN 12MINS]

12:14

1 year ago

Build Your Own Search Using Amazon OpenSearch Service [FULL COURSE in 15MIN]

14:15

1 year ago

Apache Iceberg on AWS with S3 and Athena [FULL COURSE IN 30MIN]

28:04

2 years ago

SQL For AWS Athena [FULL COURSE IN 40mins]

40:07

2 years ago

PySpark For AWS Glue Tutorial [FULL COURSE in 100min]

1:36:49

2 years ago

AWS EMR Serverless - What is it? [FULL TUTORIAL in 25mins]

23:35

2 years ago

Build An AWS Streaming Fraud Detection App [Full Tutorial using MSK and Kinesis]

28:13

2 years ago

AWS EMR Tutorial [FULL COURSE in 60mins]

1:01:06

2 years ago

AWS Kinesis Tutorial for Beginners [FULL COURSE in 65 mins]

1:03:26

2 years ago

AWS Glue Tutorial for Beginners [FULL COURSE in 45 mins]

41:30

2 years ago

AWS MySQL Aurora Vs RDS - What one should I chose?

7:02

2 years ago

Top 5 Trends For Data Engineering In 2022

7:33

3 years ago

AWS EMR vs AWS SageMaker - What One Should I use?

3:51

3 years ago

AWS Glue ETL Vs EMR - Which one should I use?

8:05

3 years ago

Realtime Streaming With AWS Glue Studio

9:40

3 years ago

AWS Glue Studio - Lets Get Hands On!

32:53

3 years ago

Using AWS Aurora For Full Text Search - Complete Tutorial

18:50

3 years ago

AWS Postgres Aurora Vs RDS - What one should I chose?

4:50

3 years ago

My Top 5 Linux Commands On AWS For Data Engineering - Using Cloud9!

11:47

3 years ago

What Do Cloud Data Engineers Do In AWS?

6:51

3 years ago

AWS Data Engineering Tutorial for Beginners [FULL COURSE in 90 mins]

1:31:29

3 years ago

How I Architected A Start Up WebApp Using AWS Amplify

14:55

3 years ago

Comments

@eduardoamfm 4 hours ago

awesome video tks!

@JinnahClaus 3 days ago

nice and fast , like me when neighbour julie husband is not at home

@林家明-g8l 5 days ago

Appreciate the updated tutorial with the new AWS Glue UI. The steps are detailed and easy to follow.

@LeandroGessner 5 days ago

At the company where I work we use Hudi, and I can say it's a pain in the ass to get that shit working properly.

@solitonbyjob 6 days ago

I thought XML, JSON and NoSQL are semi-structured, not unstructured? 15:06 Best course btw!

@anuragsingh-ed4gk 9 days ago

37:36 Hey Johnny, I have followed the demo to the exact point, however I got a permission error (...AWSGlueServiceRole/GlueJobRunnerSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3). I looked into the role and permissions and it's all there. It's very strange. Can you help me out here?

@JohnnyChivers 9 days ago

It’s 100% permissions related. It’s saying your role doesn’t have permissions to access the S3 bucket. Check that your IAM role was created correctly with the S3 bucket name you declared in the CloudFormation script, and that the name in the IAM policy matches your S3 bucket name. The other issue could be that you haven’t named/selected the S3 locations correctly in the drop-downs when configuring the ETL settings.

@anuragsingh-ed4gk 9 days ago

@JohnnyChivers thanks for your prompt reply… let me revisit the config and see if I missed it somewhere… since so far crawlers and all were working perfectly fine.. it probably has something to do with the ETL job config.

@JohnnyChivers 9 days ago

The script location, temp location etc. for the ETL job will need to be set to the bucket you created, so make sure that’s the case. There are default locations Glue sets these to, and the IAM role won’t have the permissions to create/write to them.
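The two checks described in this thread can be sketched in Python. This is a minimal illustration, not the tutorial's actual policy: the bucket names, the job path labels, and the exact list of S3 actions are assumptions made for the example.

```python
import json


def glue_s3_policy(bucket_name: str) -> dict:
    """Build an IAM policy document scoped to a single S3 bucket.

    The action list is an assumption for illustration; the real role may
    need more or fewer actions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access: note the trailing /* on the resource.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Bucket-level access: the resource is the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


def paths_outside_bucket(bucket_name: str, job_paths: dict) -> dict:
    """Return the job paths that do not live in the given bucket."""
    prefix = f"s3://{bucket_name}/"
    return {name: path for name, path in job_paths.items()
            if not path.startswith(prefix)}


if __name__ == "__main__":
    # Hypothetical bucket name, used only to show the generated document.
    print(json.dumps(glue_s3_policy("my-demo-bucket"), indent=2))
```

If `paths_outside_bucket` returns anything for the job's script or temp locations, those paths point at a bucket that a policy like the one above does not cover, which would produce the `s3:PutObject` error described here.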

@anuragsingh-ed4gk 9 days ago

It worked. While setting up the visual ETL job, a default AWSGlueServiceRole was selected. I was able to fix it after changing it to your pre-defined role. Thank you again for your prompt response. This tutorial is THE BEST across YouTube and I would request you to keep creating similar videos for other integrated services like EMR, Kinesis streaming using SQS, SNS, EventBridge, AWS Lambda, etc.

@ZawmyoHtet-lg7jn 14 days ago

Thank you very much, Sir. This is really helpful.

@JohnnyChivers 18 days ago

Folks, before someone else says it - there is a typo with the Redshift slide 1:55:11. It should say OLAP for Online Analytical Processing. No way for me to fix it at this late stage. Mistakes happen with this length of video. The explanation and the use of Online Analytical Processing is correct - it's just a typo on my part.

@raulgatto6326 18 days ago

1:55:20 Redshift is for OLAP not OLTP, it's also wrong in the downloadable pdf

@sagarahuja4386 18 days ago

it's I.A.M, not Iaaamm

@JohnnyChivers 18 days ago

@sagarahuja4386 it’s both. People use them interchangeably and I encounter it every day.

@monsieurdelaperouse9756 19 days ago

I have a question about the glue role, if you please: we created it but did not add any authorizations to it, and yet the crawler started: how come?

@JohnnyChivers 17 days ago

If you, as the user, have permissions to execute the crawler then it will start to spin up. Once the crawler has booted up it will start to execute the code to run the crawl. At this point it will check its permissions on the particular data source. If it does not have permission to access the data source it will fail.

22 days ago

Sensational

@roryhunterevans2862 25 days ago

@johnnychivers Good video, but often you are not interacting with this layer directly. In Databricks, for example, I can write Delta tables using SQL-syntax ETL to the same effect. I get that this is a native review, but few will be interfacing directly.

@JohnnyChivers 17 days ago

Hi Rory, it really depends on the organisation. I have experience working with many businesses that do operate this layer themselves, and do not use the likes of Databricks. These businesses are running TBs of data through these self-built data lakes on a daily basis. On the flip side, there are other businesses which do use the likes of Databricks. If that is what you are more familiar with, it is still good to have an understanding of how the storage layer works under the hood.

@u.s.6909 25 days ago

I watched the whole video and I still do not get how this works, so damn complicated and went completely over my head. How did you figure this out in the first place? Maybe I need to understand what ETL is, so disappointing.

@n8wong 26 days ago

Nice video. Can you do a tutorial on setting up an AWS Glue connection with RDS on a VPC with security groups?

@jimmyjuju 27 days ago

Excellent beginner's guide - thank you Johnny. Much appreciated!

@MrZH6 1 month ago

It might also be a good idea to mention that it is important to make sure that only the latest version of the data according to the "processed_timestamp" column is loaded when further transformation has to be done, otherwise data duplication occurs. As far as I know this is the default behavior of Visual ETL and overwriting data is not possible. Which I find very awkward.
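One way to read the point above: if every job run appends a full copy of the data stamped with a new `processed_timestamp`, a downstream step should keep only the rows from the newest run. Below is a minimal sketch in plain Python, assuming rows are dictionaries and timestamps are ISO 8601 strings (so string comparison matches chronological order). Every column name other than `processed_timestamp` is hypothetical.

```python
def latest_batch(rows, ts_column="processed_timestamp"):
    """Keep only the rows written by the most recent job run.

    Assumes each run stamps all of its rows with the same timestamp and
    that timestamps compare correctly as plain values.
    """
    if not rows:
        return []
    newest = max(row[ts_column] for row in rows)
    return [row for row in rows if row[ts_column] == newest]


if __name__ == "__main__":
    # Two runs of the same job over the same two source records.
    rows = [
        {"customer_id": 1, "processed_timestamp": "2024-10-01T08:00:00"},
        {"customer_id": 2, "processed_timestamp": "2024-10-01T08:00:00"},
        {"customer_id": 1, "processed_timestamp": "2024-10-02T08:00:00"},
        {"customer_id": 2, "processed_timestamp": "2024-10-02T08:00:00"},
    ]
    for row in latest_batch(rows):
        print(row)
```

The same filter can be written as a SQL `WHERE` clause against the maximum timestamp; the Python version is only meant to show the logic.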

@AndyShirey-f9v 1 month ago

This is a fantastic tutorial, completely got me up and running with OpenSearch in AWS. Viewer beware: the "t3.small.search" instance option didn't show up for me during setup, so I selected "r7g.medium.search" (the minimal option available). Just over a week later and I got an $88 bill from AWS, with only 4 rows of test data in the index. This video is a year old and I realize a lot can change in that time, but if $90/week is table stakes for tinkering with OpenSearch and AWS then this tutorial should probably be updated to reflect that "free tier" no longer applies.

@AndyShirey-f9v 1 month ago

I was able to locate the "t3" instance types: they only appear under the "Instance Type" menu when "General Purpose" is selected under "Instance Family." Looks like "Instance Family" was added in the past year since this video was posted. Unfortunately the naming patterns for these Instance Types make the Instance Type listings per Instance Family indistinguishable to the untrained eye, so it takes a bit of clicking around to even realize the contents of those menus are affected by one another.

@mdafazal12 1 month ago

Excellent explanation of all the AWS Glue features. Thank you very much

@Dan-tk1fb 1 month ago

This seems really really slow. Painfully slow. 6.6 seconds for 5 MB. ~15 minutes per GB. Is this the real performance?? Currently with Athena we can scan tens of GB of data in a couple of seconds.

@fran993 1 month ago

Can I store the name of the uploaded file in a new column of the output?

@theinstigatorr 1 month ago

I’m puzzled by the results at the end. So far, following your examples, I have two rows of data in the tables from the SQL query. With the final query I have 38 rows, which is way more than 2 but way less than the hundreds you have. Why isn’t it just 2 like the source data in the prior ingest phase?

@theinstigatorr 1 month ago

I was struggling with this video and it was breaking in many places. First, the IAM roles were insufficient for me to do what was shown in the video. I needed to create another IAM role for the administrator named in the top right of the screen to update the schema or query Athena. Second, I cannot update the schema to change partition column names. There seems to be a bug or breaking feature introduced between the video upload date and me trying this tutorial.

@theinstigatorr 1 month ago

I could not get the Athena query to run on the real time ingest database but it does run on the prior batch database? I think I got the same error both times but did something with permissions to get it working just for batch database. That doesn’t seem to fix my problems with real time ingest

@ytmelancholy 1 month ago

thanks a lot for this!!! keep up the good ;)

@MarieWilkerson-h1o 1 month ago

Passed!!.. Thank you Johnny. It's a great course. Initially I thought these 5 hours would not be enough, but they are. I just referred to this course and some AWS whitepapers, and in the end did a handful of practice tests from Skillcertpro. Around 80% of the questions were the same as in these tests. Also, during the last 2 days I went through the exam notes given by them. That's it. Passed with a score of 934. It took me around 15 days to prepare and pass the exam.

@venkatreddy-px8fm 1 month ago

Hi @MarieWilkerson-h1o, thanks for sharing the information. Please share the resources you used, and also let us know whether you had already done AWS Cloud Practitioner, or whether we can do this AWS Data Engineer exam directly after watching this course. Also, please share any tips for preparing and the system you used, as I am just starting.

@ytmelancholy 1 month ago

yes pls!

@jay_wright_thats_right 1 month ago

@venkatreddy-px8fm what does cloud practitioner have to do with data engineering

@Cardsjotas 1 month ago

Thanks, this is great!.. I just subscribed to see more (:

@wah866sky7 1 month ago

Thanks so much, Johnny

@shelleycurrie764 1 month ago

really useful walk through thanks Johnny

@sureshdarla5540 1 month ago

Loved it..Thank you bro

@LaurenArmstrong-f7g 1 month ago

Thank you for adding this updated video! It's very helpful

@HamdyTawfeek-l8p 1 month ago

Thanks Buddy

@elnicko6 1 month ago

why'd you leave AWS?

@alexd627 1 month ago

awesome video and accent!

@josephtoscano2099 1 month ago

Great job providing an update to your earlier video! Your timing was just perfect.

@manasisingh294 2 months ago

you speak my favorite accent that's such a plus for me. Thank you so much for the quality content! :)

@wah866sky7 2 months ago

Hi Johnny, if I am a new user of AWS and I want to create a new repository with CodeCommit, how can I still create one (or use an alternate method for creating a new repository)? Do you have any tutorial introducing AWS CI/CD for new users? Thanks

@JohnnyChivers 2 months ago

@wah866sky7 AWS have actually announced an end of life for AWS CodeCommit and are not allowing any new customers to onboard. They recommend using a third-party solution such as GitHub, GitLab, Bitbucket etc.

@emmanuelharel 2 months ago

The column names of my CSV file are not picked up. Really annoying! I think I figured it out: when the first column name is an empty "", parsing the column names fails silently and maps the column names to col0, col1 ... in the database. Not noice.
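A quick local check for the situation described above is to look for blank names in the header row before uploading the file. This is a minimal sketch using only Python's standard `csv` module; the function name and sample data are hypothetical, and it does not reproduce how the crawler itself classifies headers.

```python
import csv
import io


def blank_header_positions(csv_text: str) -> list:
    """Return the zero-based positions of empty column names in the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [i for i, name in enumerate(header) if not name.strip()]


if __name__ == "__main__":
    # Hypothetical sample: the first column name is an empty quoted string.
    sample = '"",name,age\n0,alice,30\n1,bob,25\n'
    print(blank_header_positions(sample))
```

Any position reported by the function is a column that needs a real name (for example `id`) before the file is crawled.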

@MohammedKareemullah-b6g 2 months ago

what is the content of timestamp column ?

@theinstigatorr 2 months ago

This account does not have access to the Cloud9 service

@BisayangIlokano2.0 2 months ago

I hope you update this for 2024

@EJBB17 2 months ago

Cool Johnny, thanks for this video.

@IMranZain-x1r 2 months ago

Amazing style of teaching

@santoshwaghmare4081 2 months ago

Hi Johnny, Such a well explained video!!! Please keep creating such end to end hands on lab on AWS Glue. Your videos are great help for beginners like me. I’ll give it a try. Thank U!!

@TheSuperJCN 2 months ago

Great to see an update to your old content; an update with Glue's new console has been a long time coming. It's also great to see more regular updates, your videos are extremely useful

@santoshwaghmare4081 2 months ago

I've watched a few of your videos on AWS Glue, and they were incredibly helpful! Your updated content is fantastic-please keep sharing more AWS insights!❤

@RezaGhasemzadeh-l9f 2 months ago

I just watched your old video. While it was great, I was hoping that you would update it. What a fortune to see that you just made it an hour ago!

@gauravparasar4391 2 months ago

Can we not also create the folders in the S3 bucket through the CloudFormation .yaml file instead of creating them manually?

@JohnnyChivers 2 months ago

@gauravparasar4391 not directly using CloudFormation. You can create a Lambda function inside the stack which contains code that creates the folders, but that adds complexity in this case. IaC providers like Terraform do allow you to create folders.
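For context, S3 has no real folders: the console displays a zero-byte object whose key ends in `/` as a folder. The code a Lambda function like the one mentioned above would run can be sketched as below. This is an assumption-laden illustration, not code from the tutorial: the bucket and folder names are hypothetical, and the client is passed in so that the sketch does not depend on AWS credentials. With boto3, the client would come from `boto3.client("s3")`, whose `put_object` accepts `Bucket` and `Key`.

```python
def folder_keys(prefixes):
    """Normalise folder names into S3 keys that end with a single slash."""
    keys = []
    for prefix in prefixes:
        cleaned = prefix.strip("/")
        if cleaned:  # skip empty names such as "" or "/"
            keys.append(cleaned + "/")
    return keys


def create_folders(s3_client, bucket, prefixes):
    """Create one zero-byte placeholder object per folder key.

    `s3_client` is any object exposing put_object(Bucket=..., Key=...),
    for example a boto3 S3 client.
    """
    keys = folder_keys(prefixes)
    for key in keys:
        s3_client.put_object(Bucket=bucket, Key=key)
    return keys
```

Wiring this into a CloudFormation stack additionally needs a custom resource and a response back to CloudFormation, which is the added complexity the reply refers to.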

@JohnnyChivers 2 months ago

Hi Folks - The much requested update to this video with the new AWS Console UI for AWS Glue is now available on the channel with a new GitHub repo containing everything you need to follow along. kzbin.info/www/bejne/kKethJSfpLWMr9E.

@marcosoliveira8731 2 months ago

Love the explanations and the examples. +1