AWS Glue Job Import Libraries Explained (And Why We Need Them)

Views: 18,273

1 day ago

This video explains the six import statements in a boilerplate AWS Glue script to help data engineers understand what they do and why we need them.
#aws #awsglue #pyspark
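
For reference, the script that AWS Glue typically generates for a new Spark job opens with the six imports below. This is a sketch of that commonly seen template, not a transcript of the video. The `try`/`except` guard, the `IN_GLUE_SPARK_RUNTIME` flag, and the `main` function are assumptions added here so the file also loads on a machine where `awsglue` and `pyspark` are not installed.

```python
import sys  # 1: gives access to the job arguments in sys.argv

try:
    # 2-6: these resolve inside a Glue Spark job; elsewhere they may be missing.
    from awsglue.transforms import *  # noqa: F401,F403  (Glue's built-in transforms)
    from awsglue.utils import getResolvedOptions  # parses job arguments
    from pyspark.context import SparkContext  # entry point to Spark
    from awsglue.context import GlueContext  # wraps the SparkContext for Glue
    from awsglue.job import Job  # job lifecycle: init() and commit()
    IN_GLUE_SPARK_RUNTIME = True
except ImportError:
    # Assumption for this sketch: fall back quietly outside the Glue Spark runtime.
    IN_GLUE_SPARK_RUNTIME = False


def main(argv):
    """Set up the contexts the way the boilerplate usually does."""
    args = getResolvedOptions(argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session  # plain SparkSession, if needed
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)
    # ... ETL logic would go here ...
    job.commit()


if IN_GLUE_SPARK_RUNTIME and __name__ == "__main__":
    main(sys.argv)
```

Inside a Glue Spark job the guard is unnecessary and the imports can sit at the top of the file as usual.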

Comments: 30

@mohammedgt8102 2 years ago

Perfect and straight to the point. I got in 5 min what I couldn't get in an hour.

@DataEngUncomplicated 2 years ago

Thanks Mohammad, that's the style of videos I go for on my channel. I try to make my videos as short and concise as possible.

@BeABetterDev 2 years ago

Short and sweet. Thanks.

@DataEngUncomplicated 2 years ago

I learned from the best 😉

@sukulmahadik0303 2 years ago

Cool explanation. I had never paid attention to these boilerplate statements.

@mickyman753 6 months ago

Just found your channel. Can we have a complete playlist, a kind of course, or a one-shot video or videos? You explain in depth, and I find your videos better than the other tutorials on YouTube.

@DataEngUncomplicated 6 months ago

Thanks! Check out my playlists; I have one for each AWS service I have made videos for. It sounds like that's what you are looking for.

@danielchicaiza7698 9 months ago

Liked, subscribed and commented! Thank you very much for your help! Greetings from Colombia!

@DataEngUncomplicated 9 months ago

Gracias, amigo!

@nikhilgupta110 2 years ago

Loved this video. Just a question: isn't import * a bad coding practice? If you have already created a video on the practical implementation of those 24 classes, please share the link; if not, I request you to make a video on that. "Took the one less traveled by, And that has made all the difference."

@DataEngUncomplicated 2 years ago

Hi Nikhil! Thanks for the comment and feedback! Honestly, I wasn't sure if people would find this video interesting or not... These are the boilerplate statements that AWS Glue provides when you create a job script from scratch. I guess you can even remove or modify some of the statements if you want to keep the script more focused or don't need them. I don't have any videos on the 24 classes yet, but I'm happy to hear that you think there is value in creating videos on these... I will add it to my video backlog.

@Scott-s7f 3 months ago

nice video! what's the point of using jobs in notebooks since bookmarks aren't supported there? is there another benefit?

@DataEngUncomplicated 3 months ago

Thanks, the notebook was just a way for me to talk through the content. I would say the benefit of using a notebook is a better development experience, as you get feedback after every function you run instead of having to trigger the entire job.

@Scott-s7f 3 months ago

@@DataEngUncomplicated oh thanks but I meant what is the use of the Job import and doing job init and commit in a notebook since bookmarks aren't supported?

@sanchitgarg5275 1 year ago

Nice video! I am struggling to find a way to set the script location path in the Jupyter notebook. I can see there is no magic command to do that, and AWS does not allow any manual changes under the "Job details" tab. Can you help me if there is any way?

@abdullahkheruwala9910 9 months ago

I have files in an S3 bucket whose type is gz. Each gz file consists of JSON records (each line is a record in JSON format). How can I read such a file using a Glue dynamic frame?

@DataEngUncomplicated 8 months ago

If you run a Data Catalog crawler on this folder, it should add the dataset to the Glue catalog. You can then read it into a dynamic frame, and write from one, in AWS Glue. Check out my other videos where I walk through how to do this with other formats.
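
As an alternative to the crawler route in the reply above (this is not from the reply itself), a dynamic frame can also be created straight from an S3 path with `create_dynamic_frame.from_options` and `format="json"`. The sketch below is minimal and hedged: the helper name `read_gzipped_json_lines` and the example path are hypothetical, and it relies on Glue recognising gzip from the standard `.gz` file extension, which is my understanding of the S3 connection behaviour rather than something verified here.

```python
def read_gzipped_json_lines(glue_context, s3_path):
    """Read newline-delimited JSON files under s3_path into a DynamicFrame.

    glue_context is expected to be an existing awsglue.context.GlueContext.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format="json",
    )


# Inside a Glue Spark job, after glueContext has been created:
# dyf = read_gzipped_json_lines(glueContext, "s3://my-bucket/my-prefix/")  # hypothetical path
# dyf.printSchema()
```

If the files do not carry a standard extension, the S3 connection options also accept a `compression` setting, which is worth checking in the Glue documentation before relying on it.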

@AbhishekChauhan-kv7ds 8 months ago

I'm new to AWS and I'm working on a project, but I'm unable to do it. I'm getting: Unresolved reference 'awsglue'. Can you help me with this?

@DataEngUncomplicated 8 months ago

Where are you developing your glue job?

@saksheegoel2654 1 year ago

Can we not create functions (def fn()) in streaming Glue jobs?

@DataEngUncomplicated 1 year ago

Hi Sakshee, I haven't worked with streaming jobs yet, but I don't see why we wouldn't be able to create functions in streaming Glue jobs.

@MuhammadImran-lr5tn 1 year ago

Hello sir, I am getting "no module named awsglue.context" when I write the above imports in an AWS Glue Python shell job. Can you please help? Thank you.

@DataEngUncomplicated 1 year ago

Hi Muhammad, the Python shell doesn't come with PySpark. You need to create a job that uses the Spark script type instead of Python shell.

@MuhammadImran-lr5tn 1 year ago

@@DataEngUncomplicated Thank you for your reply. Can you please elaborate, step by step, on what I should do in order to use the awsglue.context library in an AWS Glue Python shell job?

@DataEngUncomplicated 1 year ago

What exactly are you trying to do in your script? If you need to use Spark, then you shouldn't be configuring a Python shell script. Select the PySpark script option instead.

@MuhammadImran-lr5tn 1 year ago

@@DataEngUncomplicated Thank you so much for your quick reply. Thanks to your guidance, I now understand what I was doing wrong. The only point I want clarification on: is the awsglue library something that is used in a PySpark context, related to PySpark and not to a simple Python shell? Am I right?

@DataEngUncomplicated 1 year ago

@@MuhammadImran-lr5tn You're welcome! Yes, that's my understanding. You don't need that library for creating a python shell job.
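
A quick way to see which half of the problem discussed in this thread applies in a given environment is to check whether `pyspark` and `awsglue` can be found at all. The sketch below uses only the standard library; the function names `module_available` and `glue_spark_imports_status` are hypothetical helpers invented for this illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def glue_spark_imports_status():
    """Report whether the two packages the boilerplate imports rely on are present."""
    return {name: module_available(name) for name in ("pyspark", "awsglue")}


if __name__ == "__main__":
    for name, found in glue_spark_imports_status().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If `pyspark` is reported missing, the imports that depend on Spark (such as `awsglue.context`) cannot work in that environment, which matches the advice above to pick the Spark script type.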

@AmritAgarwal07 2 years ago

Can we update the data in a database using Glue jobs?

@DataEngUncomplicated 2 years ago

I think you are asking whether we can update data in a database with AWS Glue? Yes, absolutely. It's one of the main use cases.

@Fight3211 1 year ago

Hi, I have a question about the interaction between creating a "normal" Spark session and Glue. I needed to import a JAR and I got it working with:

spark = SparkSession.builder \
    .appName("my-app") \
    .config('spark.jars.packages', 'graphframes:graphframes-0.8.2-spark3.2-s_2.12') \
    .getOrCreate()

I commented out:

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

So the two things I'm missing out on are dynamic frames and saving job state. How do I modify the original arguments so that I can bring GlueContext back in? Thank you
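
One way the two setups in this question might be combined is to build the configured `SparkSession` first and then hand its `SparkContext` to `GlueContext`, which brings dynamic frames and the `Job` object back. This is a sketch under assumptions, not a confirmed answer from the thread: the helper names `build_glue_context_with_packages` and `start_job` are hypothetical, the package coordinates are passed in as-is, and whether `spark.jars.packages` takes effect depends on the Glue environment and on no Spark context having been started earlier.

```python
def build_glue_context_with_packages(app_name, packages):
    """Create a configured SparkSession, then wrap its SparkContext in a GlueContext."""
    # Imports are local so this file still loads where pyspark/awsglue are absent.
    from pyspark.sql import SparkSession
    from awsglue.context import GlueContext

    spark = (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", packages)
        .getOrCreate()
    )
    # GlueContext takes a SparkContext; reuse the one behind the session.
    glue_context = GlueContext(spark.sparkContext)
    return glue_context, glue_context.spark_session


def start_job(glue_context, args):
    """Initialise the Glue Job object so job state can be saved with job.commit()."""
    from awsglue.job import Job

    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)
    return job
```

With the returned `glue_context`, calls such as `create_dynamic_frame` are available again, and calling `commit()` on the object returned by `start_job` at the end of the script is what records job state.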