This video explains the six import statements in a boilerplate AWS Glue script to help data engineers understand why we need them and what they do. #aws #awsglue #pyspark
Comments: 30
@mohammedgt8102 2 years ago
Perfect and straight to the point. I got in 5 min what I couldn't get in an hour.
@DataEngUncomplicated 2 years ago
Thanks Mohammad, that's the style of videos I go for on my channel. I try to make my videos as short and concise as possible.
@BeABetterDev 2 years ago
Short and sweet. Thanks.
@DataEngUncomplicated 2 years ago
I learned from the best 😉
@sukulmahadik0303 2 years ago
Cool explanation. I had never paid attention to these boilerplate statements.
@mickyman753 6 months ago
Just found your channel. Can we have a complete playlist, a kind of course, or a one-shot video or videos? You explain in depth, and I find your videos better than the other tutorials on YouTube.
@DataEngUncomplicated 6 months ago
Thanks! Check out my playlists; I have one for each AWS service I have made videos for. It sounds like that's what you are looking for.
@danielchicaiza7698 9 months ago
Liked, subscribed and commented! Thank you very much for your help! Greetings from Colombia!
@DataEngUncomplicated 9 months ago
Thank you, my friend!
@nikhilgupta110 2 years ago
Loved this video. Just a question: isn't import * a bad coding practice? If you have already created a video on the practical implementation of those 24 classes, please share the link; if not, I request you to make a video on that. "Took the one less traveled by, And that has made all the difference".
@DataEngUncomplicated 2 years ago
Hi Nikhil! Thanks for the comment and feedback! Honestly, I wasn't sure if people would find this video interesting or not. These are the boilerplate statements that AWS Glue provides when you create a job from scratch. I guess you can even remove or modify some of the statements if you want to keep the script more focused or don't need them. I don't have videos on the 24 classes yet, but I'm happy to hear that you think there is value in creating them. I will add it to my video backlog.
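On the `import *` question, a minimal sketch of how to see what a wildcard import would bind, so the names a script actually uses can be imported explicitly instead. `wildcard_names` is a hypothetical helper, not part of any Glue library, and the standard-library `json` module stands in for `awsglue.transforms` so the snippet runs anywhere.

```python
import importlib


def wildcard_names(module_name):
    """Return the names that `from <module_name> import *` would bind.

    Python uses the module's __all__ list when it is defined; otherwise it
    binds every top-level name that does not start with an underscore.
    """
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return sorted(exported)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Stand-in module so this runs without Glue installed.
print(wildcard_names("json"))
```

Inside a Glue Spark environment, calling `wildcard_names("awsglue.transforms")` should list the transform classes the boilerplate pulls in; the wildcard line could then be swapped for an explicit import of only the ones the job needs.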
@Scott-s7f 3 months ago
nice video! what's the point of using jobs in notebooks since bookmarks aren't supported there? is there another benefit?
@DataEngUncomplicated 3 months ago
Thanks, the notebook was just a way for me to talk through the content. I would say the benefit of using a notebook is a better development experience, as you get feedback after every function you run instead of having to trigger the entire job.
@Scott-s7f 3 months ago
@@DataEngUncomplicated Oh, thanks, but I meant: what is the use of the Job import and calling job init and commit in a notebook, since bookmarks aren't supported?
@sanchitgarg5275 1 year ago
Nice video! I am struggling to find a way to set the script location path in the Jupyter notebook. I can see there is no magic command to do that, and AWS does not allow making any changes manually under the "Job details" tab. Can you tell me if there is any way?
@abdullahkheruwala9910 9 months ago
I have files in an S3 bucket whose type is gz. Each gz file consists of JSON records (each line is a record in JSON format). How can I read such a file using a Glue dynamic frame?
@DataEngUncomplicated 8 months ago
If you run the Data Catalog crawler on this folder, it should add the dataset to the Glue catalog. You can then read from and write to a dynamic frame in AWS Glue. Check out my other videos where I walk through how to do this with other formats.
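A rough sketch of the two read paths, assuming a `GlueContext` has already been created in a Glue Spark job. `read_from_catalog` and `read_gzipped_json_lines` are hypothetical helper names, and the database, table, and bucket values are placeholders. The second helper reads straight from S3 without a crawler, which is a different approach from the one suggested in the reply. Glue is documented to infer compression from a standard extension such as `.gz`; treat that as an assumption to check against your own files.

```python
def read_from_catalog(glue_context, database, table_name):
    """Read a crawled dataset from the Glue Data Catalog into a DynamicFrame."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def read_gzipped_json_lines(glue_context, s3_path):
    """Read JSON records directly from S3 into a DynamicFrame, skipping the crawler.

    Assumption: the files end in .gz, so no explicit compression option is set.
    If they do not, connection_options also accepts "compressionType": "gzip".
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path]},
        format="json",
    )


# Usage inside a Glue Spark job (placeholder names):
#   dyf = read_from_catalog(glueContext, "my_database", "my_table")
#   dyf = read_gzipped_json_lines(glueContext, "s3://my-bucket/my-folder/")
```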
@AbhishekChauhan-kv7ds 8 months ago
I'm new to AWS and I'm working on a project, but I'm unable to get it working. I'm getting "Unresolved reference 'awsglue'". Can you help me with this?
@DataEngUncomplicated 8 months ago
Where are you developing your glue job?
@saksheegoel2654 1 year ago
Can we not create functions (def fn()) in streaming Glue jobs?
@DataEngUncomplicated 1 year ago
Hi Sakshee, I haven't worked with streaming jobs yet, but I don't see why we wouldn't be able to create functions in streaming Glue jobs.
@MuhammadImran-lr5tn 1 year ago
Hello sir, I am getting "no module named awsglue.context" when I write the above imports in an AWS Glue Python shell job. Can you please help? Thank you.
@DataEngUncomplicated 1 year ago
Hi Muhammad, the Python shell doesn't come with PySpark. You need to create a job that uses a Spark script instead of a Python shell.
@MuhammadImran-lr5tn 1 year ago
@@DataEngUncomplicated Thank you for your reply. Can you please explain, step by step, what I should do in order to use the awsglue.context library in an AWS Glue Python shell job?
@DataEngUncomplicated 1 year ago
What exactly are you trying to do in your script? If you need to use Spark, then you shouldn't be configuring a Python shell script. Select the PySpark script option instead.
@MuhammadImran-lr5tn 1 year ago
@@DataEngUncomplicated Thank you so much for your quick reply. Thanks to your guidance, I now understand what I was doing wrong. The only point I'd like clarified: is the awsglue library used in the PySpark context, and unrelated to a plain Python shell? Am I right?
@DataEngUncomplicated 1 year ago
@@MuhammadImran-lr5tn You're welcome! Yes, that's my understanding. You don't need that library for creating a Python shell job.
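For errors like "no module named awsglue.context" or "Unresolved reference 'awsglue'", one quick check is whether the current environment can see the libraries at all. The sketch below uses only the standard library; `missing_glue_modules` is a hypothetical helper name, and the printed advice simply restates the replies above rather than official guidance.

```python
import importlib.util

# Top-level packages that the boilerplate Glue imports rely on.
REQUIRED = ("awsglue", "pyspark")


def missing_glue_modules(required=REQUIRED):
    """Return the required top-level modules that cannot be found here."""
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = missing_glue_modules()
if missing:
    print(f"Not importable in this environment: {', '.join(missing)}")
    print("Per the replies above, run the script as a Spark (PySpark) Glue job.")
else:
    print("awsglue and pyspark are importable; the boilerplate imports should resolve.")
```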
@AmritAgarwal07 2 years ago
Can we update the data in a database using Glue jobs?
@DataEngUncomplicated 2 years ago
I think you are asking whether we can update data in a database with AWS Glue? Yes, absolutely. It's one of the main use cases.
@Fight3211 1 year ago
Hi, I have a question about the interaction between creating a "normal" Spark session and Glue. I needed to import a JAR, and I got it working with:
spark = SparkSession.builder \
    .appName("my-app") \
    .config('spark.jars.packages', 'graphframes:graphframes-0.8.2-spark3.2-s_2.12') \
    .getOrCreate()
I commented out:
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
So the two things I'm missing out on are dynamic frames and saving job state. How do I modify the original arguments so that I can bring GlueContext back in? Thank you.
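One possible way to bring GlueContext back, sketched under assumptions rather than confirmed by anyone in this thread: keep the custom `SparkSession` and wrap its underlying `SparkContext` in a `GlueContext`, then initialise a `Job` from that. `attach_glue` is a hypothetical helper name. The awsglue imports sit inside the function because they only resolve in a Glue Spark environment. Whether `spark.jars.packages` is honoured on a managed Glue job is not something this sketch verifies.

```python
def attach_glue(spark, job_name, args):
    """Wrap an existing SparkSession in a GlueContext and initialise a Job.

    spark    -- a SparkSession built however you like (e.g. with extra config)
    job_name -- the Glue job name, normally args["JOB_NAME"]
    args     -- the resolved job arguments dict
    """
    # Imported lazily: these modules exist only where the Glue libraries are installed.
    from awsglue.context import GlueContext
    from awsglue.job import Job

    glue_context = GlueContext(spark.sparkContext)  # reuse the session's SparkContext
    job = Job(glue_context)
    job.init(job_name, args)
    return glue_context, job


# Usage inside a Glue Spark job, after building `spark` as in the comment above:
#   args = getResolvedOptions(sys.argv, ["JOB_NAME"])
#   glueContext, job = attach_glue(spark, args["JOB_NAME"], args)
#   ... work with dynamic frames via glueContext ...
#   job.commit()
```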