nice and fast , like me when neighbour julie husband is not at home
@林家明-g8l 5 days ago
Appreciate the updated tutorial with the new AWS Glue UI. The steps are detailed and easy to follow.
@LeandroGessner 5 days ago
At the company where I work we use Hudi, and I can say it's a pain in the ass to get that shit working properly.
@solitonbyjob 6 days ago
I thought XML, JSON and NoSQL are semi-structured, not unstructured? 15:06 Best course btw!
@anuragsingh-ed4gk 9 days ago
37:36 Hey Johnny, I followed the demo exactly, but I got a permission error (...AWSGlueServiceRole/GlueJobRunnerSession is not authorized to perform: s3:PutObject on resource: "arn:aws:s3). I looked into the role and the permissions are all there. It's very strange. Can you help me out here?
@JohnnyChivers 9 days ago
It's 100% permissions related. It's saying your role doesn't have permission to access the S3 bucket. Check that your IAM role was created correctly with the S3 bucket name you declared in the CloudFormation script, and that the name in the IAM policy matches your S3 bucket name. The other issue could be that you haven't named/selected the S3 locations correctly in the drop-downs when configuring the ETL settings.
@anuragsingh-ed4gk 9 days ago
Thanks for your prompt reply… let me revisit the config and see if I missed something… so far the crawlers and everything else were working perfectly fine, so it probably has something to do with the ETL job config.
@JohnnyChivers 9 days ago
The script location, temp location etc. for the ETL job need to be set to the bucket you created, so make sure that's the case. Glue sets these to default locations, and the IAM role won't have permission to create or write to them.
@anuragsingh-ed4gk 9 days ago
It worked. While setting up the visual ETL job, a default AWSGlueServiceRole was selected. I was able to fix it by changing it to your pre-defined role. Thank you again for your prompt response. This tutorial is THE BEST on YouTube, and I would ask you to keep creating similar videos for other integrated services like EMR, Kinesis streaming using SQS, SNS, EventBridge, AWS Lambda, etc.
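For anyone hitting the same s3:PutObject error, here is a rough offline sanity check of a policy document against the bucket the job writes to. This is only a sketch: it is a simplified matcher, not how IAM actually evaluates policies (it ignores Deny statements, conditions, NotAction and so on), and the bucket name `my-glue-tutorial-bucket` and the sample policy are made up for illustration.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single string or a list for Action/Resource/Statement."""
    return value if isinstance(value, list) else [value]


def allows(policy, action, resource):
    """Return True if any Allow statement matches the action and resource.

    Simplification: Deny statements and Condition blocks are not considered.
    """
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # Action names are case-insensitive; resource ARNs are matched as written.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
        resource_ok = any(fnmatchcase(resource, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


# Hypothetical policy, scoped to one tutorial bucket.
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-glue-tutorial-bucket/*",
        }
    ],
}

in_bucket = allows(SAMPLE_POLICY, "s3:PutObject",
                   "arn:aws:s3:::my-glue-tutorial-bucket/temp/run-1")
other_bucket = allows(SAMPLE_POLICY, "s3:PutObject",
                      "arn:aws:s3:::some-default-glue-bucket/scripts/job.py")
print(in_bucket, other_bucket)
```

A write to a script or temp location outside the named bucket fails the check, which is the same shape of problem described in this thread: either the job points at a location the role does not cover, or the job runs under a different role altogether.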
@ZawmyoHtet-lg7jn 14 days ago
Thank you very much, Sir. This is really helpful.
@JohnnyChivers 18 days ago
Folks, before someone else says it - there is a typo with the Redshift slide 1:55:11. It should say OLAP for Online Analytical Processing. No way for me to fix it at this late stage. Mistakes happen with this length of video. The explanation and the use of Online Analytical Processing is correct - it's just a typo on my part.
@raulgatto6326 18 days ago
1:55:20 Redshift is for OLAP not OLTP, it's also wrong in the downloadable pdf
@sagarahuja4386 18 days ago
It's I.A.M., not Iaaamm
@JohnnyChivers 18 days ago
@@sagarahuja4386 it’s both. People use them interchangeably and I encounter it every day.
@monsieurdelaperouse9756 19 days ago
I have a question about the Glue role, if you please: we created it but did not add any permissions to it, and yet the crawler started. How come?
@JohnnyChivers 17 days ago
If you, as the user, have permission to execute the crawler, then it will start to spin up. Once the crawler has booted up, it will start to execute the code to run the crawl. At this point it will check its permissions on the particular data source. If it does not have permission to access the data source, it will fail.
22 days ago
Sensational
@roryhunterevans2862 25 days ago
@johnnychivers Good video, but often you are not interacting with this layer directly. In Databricks, for example, I can write Delta tables using SQL syntax ETL to the same effect. I get that this is a native review, but few will be interfacing with it directly.
@JohnnyChivers 17 days ago
Hi Rory, It really depends on the organisation. I have experience working with many businesses that do operate this layer themselves, and do not use the likes of Databricks. These businesses are running TBs of data through these self built data lakes on a daily basis. On the flip, there are other businesses which do use the likes of Databricks. If this is where you are more familiar it is still good to have an understanding of how the storage layer works under the hood.
@u.s.6909 25 days ago
I watched the whole video and I still do not get how this works, so damn complicated and went completely over my head. How did you figure this out in the first place? Maybe I need to understand what ETL is, so disappointing.
@n8wong 26 days ago
Nice video. Can you do a tutorial on setting up an AWS Glue connection with RDS on a VPC with security groups?
@jimmyjuju 27 days ago
Excellent beginner's guide - thank you Johnny. Much appreciated!
@MrZH6 1 month ago
It might also be a good idea to mention that it is important to make sure that only the latest version of the data according to the "processed_timestamp" column is loaded when further transformation has to be done, otherwise data duplication occurs. As far as I know this is the default behavior of Visual ETL and overwriting data is not possible. Which I find very awkward.
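A minimal sketch of that "latest version only" filter, in plain Python rather than a Glue job. It assumes each run stamps every row with the same `processed_timestamp` in a sortable format such as ISO 8601; the sample rows and the `id`/`amount` columns are invented for illustration.

```python
def latest_batch(rows, ts_col="processed_timestamp"):
    """Keep only the rows written by the most recent run.

    Assumes ts_col holds values that sort chronologically
    (for example ISO 8601 strings or datetime objects).
    """
    if not rows:
        return []
    newest = max(row[ts_col] for row in rows)
    return [row for row in rows if row[ts_col] == newest]


# Two runs appended to the same output location: without filtering,
# a downstream step would see every id twice.
rows = [
    {"id": 1, "amount": 10, "processed_timestamp": "2024-05-01T10:00:00"},
    {"id": 2, "amount": 20, "processed_timestamp": "2024-05-01T10:00:00"},
    {"id": 1, "amount": 10, "processed_timestamp": "2024-05-02T10:00:00"},
    {"id": 2, "amount": 25, "processed_timestamp": "2024-05-02T10:00:00"},
]

current = latest_batch(rows)
print(current)
```

The same idea can be expressed in SQL as a filter on the maximum `processed_timestamp`; the point is only that downstream transformations should read one run's output, not the accumulated appends.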
@AndyShirey-f9v 1 month ago
This is a fantastic tutorial, completely got me up and running with OpenSearch in AWS. Viewer beware: the "t3.small.search" instance option didn't show up for me during setup, so I selected "r7g.medium.search" (the minimal option available). Just over a week later and I got an $88 bill from AWS, with only 4 rows of test data in the index. This video is a year old and I realize a lot can change in that time, but if $90/week is table stakes for tinkering with OpenSearch and AWS then this tutorial should probably be updated to reflect that "free tier" no longer applies.
@AndyShirey-f9v 1 month ago
I was able to locate the "t3" instance types: they only appear under the "Instance Type" menu when "General Purpose" is selected under "Instance Family." Looks like "Instance Family" was added in the past year since this video was posted. Unfortunately the naming patterns for these Instance Types make the Instance Type listings per Instance Family indistinguishable to the untrained eye, so it takes a bit of clicking around to even realize the contents of those menus are affected by one another.
@mdafazal12 1 month ago
Excellent explanation of all the AWS Glue features. Thank you very much.
@Dan-tk1fb 1 month ago
This seems really, really slow. Painfully slow. 6.6 seconds for 5 MB. ~15 minutes per GB. Is this the real performance?? Currently with Athena we can scan tens of GB of data in a couple of seconds.
@fran993 1 month ago
Can I store the name of the uploaded file in a new column of the output?
@theinstigatorr 1 month ago
I’m puzzled by the results at the end. So far, following your examples, I have two rows of data in the tables from the SQL query. With the final query I have 38 rows, which is way more than 2 but way less than the hundreds you have. Why isn’t it just 2, like the source data in the prior ingest phase?
@theinstigatorr 1 month ago
I was struggling with this video and it was breaking in many places. First, the IAM roles were insufficient for me to do what was shown in the video. I needed to create another IAM role for the administrator named in the top right of the screen to update the schema or query Athena. Second, I cannot update the schema to change partition column names. There seems to be a bug or breaking change introduced between the video upload date and me trying this tutorial.
@theinstigatorr 1 month ago
I could not get the Athena query to run on the real time ingest database but it does run on the prior batch database? I think I got the same error both times but did something with permissions to get it working just for batch database. That doesn’t seem to fix my problems with real time ingest
@ytmelancholy 1 month ago
thanks a lot for this!!! keep up the good ;)
@MarieWilkerson-h1o 1 month ago
Passed!! Thank you, Johnny. It's a great course. Initially I thought 5 hours would not be enough, but it is. I just used this course and some AWS whitepapers, and at the end did a handful of practice tests from Skillcertpro. Around 80% of the questions were the same as in these tests. Also, during the last 2 days I went through the exam notes given by them. That's it. Passed with a score of 934. It took me around 15 days to prepare and pass the exam.
@venkatreddy-px8fm 1 month ago
Hi @MarieWilkerson-h1o, thanks for sharing the information. Please share the resources you used, and also let us know whether you had already done AWS Cloud Practitioner, or whether we can take this AWS Data Engineer exam directly after watching this course. Please also share any preparation tips and the system you used, as I am just starting.
@ytmelancholy 1 month ago
yes pls!
@jay_wright_thats_right 1 month ago
@@venkatreddy-px8fm what does cloud practitioner have to do with data engineering
@Cardsjotas 1 month ago
Thanks, this is great! I just subscribed to see more (:
@wah866sky7 1 month ago
Thanks so much, Johnny
@shelleycurrie764 1 month ago
really useful walk through thanks Johnny
@sureshdarla5540 1 month ago
Loved it. Thank you bro
@LaurenArmstrong-f7g 1 month ago
Thank you for adding this updated video! It's very helpful
@HamdyTawfeek-l8p 1 month ago
Thanks Buddy
@elnicko6 1 month ago
why'd you leave AWS?
@alexd627 1 month ago
awesome video and accent!
@josephtoscano2099 1 month ago
Great job providing an update to your earlier video! Your timing was just perfect.
@manasisingh294 2 months ago
you speak my favorite accent that's such a plus for me. Thank you so much for the quality content! :)
@wah866sky7 2 months ago
Hi Johnny, I am a new AWS user and I want to create a new repository with CodeCommit. Can we still create one, or is there an alternative method for creating a new repository? Do you have any tutorial introducing AWS CI/CD for new users? Thanks
@JohnnyChivers 2 months ago
@@wah866sky7 AWS have actually announced an end of life for AWS CodeCommit and are not allowing any new customers to onboard. They recommend using a third-party solution such as GitHub, GitLab, Bitbucket etc.
@emmanuelharel 2 months ago
The column names of my CSV file are not picked up. Really annoying! I think I figured it out: when the first column name is an empty "", parsing the column names fails silently and the columns are mapped to col0, col1 ... in the database. Not noice.
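If that diagnosis is right, a quick local check of the header row before uploading would catch it. The snippet below is a sketch using only the standard library; `blank_header_positions` is a made-up helper name, and it only inspects the first row, it says nothing about how the crawler itself parses the file.

```python
import csv
import io


def blank_header_positions(csv_text):
    """Return the indexes of header cells that are empty or whitespace-only."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [i for i, name in enumerate(header) if not name.strip()]


# An index column exported without a name produces an empty first header cell.
bad = '"",name,age\n0,alice,31\n1,bob,27\n'
good = 'id,name,age\n0,alice,31\n1,bob,27\n'

print(blank_header_positions(bad))
print(blank_header_positions(good))
```

Giving the empty column an explicit name (or dropping it) before the file lands in S3 is the obvious follow-up if the check reports any positions.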
@MohammedKareemullah-b6g 2 months ago
What is the content of the timestamp column?
@theinstigatorr 2 months ago
This account does not have access to the Cloud9 service
@BisayangIlokano2.0 2 months ago
I hope you update this for 2024
@EJBB17 2 months ago
Cool Johnny, thanks for this video.
@IMranZain-x1r 2 months ago
Amazing style of teaching
@santoshwaghmare4081 2 months ago
Hi Johnny, Such a well explained video!!! Please keep creating such end to end hands on lab on AWS Glue. Your videos are great help for beginners like me. I’ll give it a try. Thank U!!
@TheSuperJCN 2 months ago
Great to see an update to your old content, an update with Glue's new console has been a long time coming. It's also great to see more common updates, your videos are extremely useful
@santoshwaghmare4081 2 months ago
I've watched a few of your videos on AWS Glue, and they were incredibly helpful! Your updated content is fantastic, please keep sharing more AWS insights!❤
@RezaGhasemzadeh-l9f 2 months ago
I just watched your old video. While it was great, I was hoping that you would update it. What a fortune to see that you just made it an hour ago!
@gauravparasar4391 2 months ago
Can we not create the folders in the S3 bucket through the CloudFormation .yaml file as well, instead of creating them manually?
@JohnnyChivers 2 months ago
@@gauravparasar4391 Not directly using CloudFormation. You can create a Lambda function inside the stack containing code that creates the folders, but that adds complexity in this case. IaC providers like Terraform do allow you to create folders.
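As a rough idea of what the folder-creating code inside such a Lambda might look like: S3 has no real folders, so the usual trick is to write zero-byte objects whose keys end in "/". The sketch below takes the S3 client as a parameter (in a real function it would be `boto3.client("s3")`); the helper name, bucket name and prefixes are placeholders, and the CloudFormation custom-resource wiring around it is not shown.

```python
def create_folder_markers(s3_client, bucket, prefixes):
    """Create empty 'folder' marker objects and return the keys written.

    s3_client is expected to expose put_object(Bucket=..., Key=...),
    as the boto3 S3 client does.
    """
    keys = []
    for prefix in prefixes:
        key = prefix.strip("/") + "/"  # normalise to exactly one trailing slash
        s3_client.put_object(Bucket=bucket, Key=key)
        keys.append(key)
    return keys


class RecordingClient:
    """Stand-in for the boto3 client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
written = create_folder_markers(
    client, "my-glue-tutorial-bucket", ["raw", "/processed/", "scripts/temp"]
)
print(written)
```

Whether this is worth the extra moving parts compared with creating a handful of folders by hand is the trade-off mentioned in the reply above.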
@JohnnyChivers 2 months ago
Hi Folks - The much requested update to this video with the new AWS Console UI for AWS Glue is now available on the channel with a new GitHub repo containing everything you need to follow along. kzbin.info/www/bejne/kKethJSfpLWMr9E.