Develop AWS Glue Jobs Locally Using PyCharm and Docker on Windows - step by step

Views: 9,680

DataEng Uncomplicated

1 day ago

Comments: 82
@GitHubertP 1 year ago
You just saved me a few bucks that I was spending on Glue during some experiments and learning! Good to have that kind of content on YouTube and the possibility to support you :)
@DataEngUncomplicated 1 year ago
Wow, thanks for the direct support through buying me a coffee. I really appreciate the support, Hubert! I'm happy that you were able to save on compute costs!
@kandikondakarthik1432 1 year ago
Wow, simply amazing video. Very well explained and detailed information. Please keep doing the great work!
@DataEngUncomplicated 1 year ago
Thanks my friend 😉
@bartoszturkowyd3608 1 year ago
Oh, such great timing for such a great tutorial! Thank you very much! ❤
@DataEngUncomplicated 1 year ago
Thanks for your kind words! I'm glad it was helpful! I recommend this way to develop Glue jobs.
@herleyshaori 10 months ago
This video helps me.
@prabhathkota107 7 months ago
Very helpful. Thanks
@DataEngUncomplicated 7 months ago
You're welcome!
@dougkfarrell 6 months ago
This is fantastic! I'm new to AWS Glue and was really struggling to get traction developing an ETL script. Being able to develop locally, I don't really care about the costs, but the ability to debug, get feedback, and just the turnaround time to try things is amazing. Again, thanks. I'd like to ask you more questions, how can I do that?
@DataEngUncomplicated 6 months ago
Thanks, feel free to post your questions here. I or someone else might be able to help you out!
@dougkfarrell 6 months ago
@DataEngUncomplicated Thanks! I'm using Glue ETL to read two different CSV files into DynamicFrames, normalize them, and union them together. I need to write some SQL against an existing RDS MySQL database to query records and figure out whether I need to update or insert data. Is there a good (as in fast) way to iterate over the normalized, unioned DynamicFrame and read from and write to an RDS MySQL database? Thanks in advance for any help!
@wanyingli246 2 months ago
I just set up my PyCharm based on your tutorials and it works great!!! By the way, I need to import some external Python libraries into the local environment and unfortunately I couldn't figure that out. Could you make a video on how to import an external library in the local environment?
@ahm_mask5161 1 year ago
Loved the video. Would have loved it more if it was in VS Code. Also, if you could make an ETL tutorial using Glue locally, that would be awesome.
@DataEngUncomplicated 1 year ago
Thanks, I will make a video with VS Code since there seems to be some demand for this! Yup, I'm also working on some tutorials using Glue locally in my next couple of videos.
@Patrick-ig3cn 1 year ago
Amazing tutorial! You explained the whole process extremely clearly. Quick question, do you know if this is also possible to set up in VS Code?
@DataEngUncomplicated 1 year ago
Thanks Patrick! Great, I'm glad it made sense. Yes! It is also possible to set up in VS Code! I don't use VS Code but I could make a video if enough people think it would be useful.
@Patrick-ig3cn 1 year ago
Thanks for the reply, if there is demand for it I'd be extremely grateful! Otherwise thanks so much again for the tutorial, it's extremely enlightening on the whole process!
@DataEngUncomplicated 1 year ago
@Patrick-ig3cn You're welcome! I highly recommend developing Glue jobs locally. I have another video coming out tomorrow briefly explaining the benefits.
@DataEngUncomplicated 1 year ago
I have just uploaded the video for setting it up with VS Code. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU
@Fight3211 1 year ago
Would love a similar tutorial for VS Code :)
@DataEngUncomplicated 1 year ago
You're the second person that has requested this! Do you think more folks use VS Code? I am considering making a video soon.
@ahm_mask5161 1 year ago
I was literally thinking the same thing
@waleayeni 1 year ago
yes please
@鼠鼠-l8x 1 year ago
please >
@DataEngUncomplicated 1 year ago
I have just uploaded the video configuring AWS Glue with VS Code and Docker. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU
@maximilianrausch5193 1 year ago
Amazing video
@DataEngUncomplicated 1 year ago
Thanks!
@maximilianrausch5193 10 months ago
Are there any instructions on how to set up Docker Desktop and WSL, if that is also required for this?
@mackfarshi8289 11 months ago
Thank you so much for this video. Very well explained and helpful. I was wondering if there is a way to also resolve the "SparkContext" error in the import, or a link to a video where you explain it. Really appreciate it.
@abhishekgarg6301 1 year ago
Great tutorial, will you be creating the same with Visual Studio Code?
@DataEngUncomplicated 1 year ago
I will be as soon as I come back from vacation!
@DataEngUncomplicated 1 year ago
I have just uploaded the video configuring AWS Glue with VS Code and Docker. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU
@giorgosstamatakis7144 1 year ago
Great video. I would like to ask if anyone else has experienced the following issue. When I add the glue-libs repo as a new content root, PyCharm stops recognising the pyspark imports as valid. Moreover, the window visible at 6:10 (showing the available Python packages) is empty. Any ideas on what could have gone wrong?
@maximilianrausch5193 9 months ago
I am having the same issue (no packages shown as available). Any ideas how to fix it?
@DataEngUncomplicated 9 months ago
Thanks, is your Docker container running? That's the first thing I would check, to make sure it's not a problem finding the Docker container on your machine.
@maximilianrausch5193 9 months ago
@DataEngUncomplicated I updated PyCharm to the newest version and it resolved the issue.
@guyfridman4426 1 year ago
Thank you, any chance of doing the same tutorial on Mac?
@DataEngUncomplicated 1 year ago
I do not, I'm sorry!
@jfkpami4194 2 months ago
Can we do this with the PyCharm Community edition?
@DataEngUncomplicated 2 months ago
No, unfortunately you cannot, because it requires Docker, which is only supported in the Professional edition.
@kkos 1 year ago
Great video! Everything works, however we cannot use the Docker API you're using in the tutorial. I've tried to connect to the Docker daemon using SSH. I can run the Glue job, but cannot run the debugger. I get ConnectionRefusedError: [Errno 111] Connection refused. Did you manage to make the debugger work for Docker over SSH?
@yashsrivastava14 10 months ago
Can you also show how to set up default credentials in the Docker container?
@DataEngUncomplicated 10 months ago
Hi, yes I cover this in the video. You have to set the credential path in the Docker image.
@aabbassp 1 year ago
Thanks for the video! Amazing. Can you deploy this to AWS somehow automatically or do you need to do it manually?
@DataEngUncomplicated 1 year ago
Yes! You can deploy this automatically in many ways: using Terraform, CDK, or a CloudFormation template.
@ThiagoDias-d8y 1 year ago
Great video! Really nice! I am struggling to find out how to set "--additional-python-modules". Anyone else? I can't find anything related to it for local runs :(
@PuzzlerCraft 1 year ago
Hello! I don't have the Professional version of PyCharm. Is there any way you can explain how to configure this using VS Code or the free version of PyCharm?
@DataEngUncomplicated 1 year ago
Hey! Sorry, you need the Professional version of PyCharm for it to work. I plan on making a tutorial for VS Code soon.
@DataEngUncomplicated 1 year ago
I have just uploaded the video configuring AWS Glue with VS Code and Docker. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU
@nguyentonggiang1994 1 year ago
Nice video. You've got a thumbs up from me. However, I ran into trouble when installing extra Python libraries in the Glue container. Could you please guide me on how to install an external Python library in this Glue container? Thanks a lot.
@DataEngUncomplicated 1 year ago
Thanks, that's a good question, I will have to get back to you. I'm sure you can do it by going into the Docker container and installing them directly in there, but I wonder if there is an easier way to do this.
@BrendanDale-v4z 1 year ago
Nice video. I want to know how to access tables of the Glue catalog that belongs to the related AWS account. If I run spark.sql('show databases') in the script, will all databases of the online catalog be shown?
@DataEngUncomplicated 1 year ago
It should, as long as your profile has permission to see the databases.
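A minimal sketch of that check, assuming `spark` is the SparkSession created in the job script and that the container was started with a profile allowed to read the Glue Data Catalog. The helper name `list_catalog_databases` is hypothetical, and whether the online catalog is actually visible from a local container depends on how the session is configured.

```python
def list_catalog_databases(spark):
    """Return the database names visible to the given Spark session."""
    rows = spark.sql("SHOW DATABASES").collect()
    # The result column is named differently across Spark versions,
    # so read it by position instead of by name.
    return [row[0] for row in rows]


# Usage inside a Glue script (needs the Glue libraries, so shown as comments):
# from pyspark.context import SparkContext
# from awsglue.context import GlueContext
# spark = GlueContext(SparkContext.getOrCreate()).spark_session
# print(list_catalog_databases(spark))
```

If the list only contains `default`, the session is probably reading a local metastore rather than the account's catalog.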
@ahkamnaseek2850 1 year ago
Did you try to install additional Python packages in the image? It's not allowed from the IDE.
@DataEngUncomplicated 1 year ago
No I did not. You probably need to go into the Docker image and install it that way rather than through the UI. Have you tried that?
@ahkamnaseek2850 1 year ago
@DataEngUncomplicated We can't log into the image directly, right? What I did was run the default image as a container, log in to it, install the library, and build the container as a new image. Then from PyCharm, I pointed to it. Now the library is visible from PyCharm, but the import statement fails when running the code. I don't know why 😌
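One way to narrow down a library that is visible in PyCharm but fails to import at run time, sketched under the assumption that the run configuration may be using a different interpreter or image than the one the library was installed into. The helper name `describe_module` is hypothetical. Run it from the same run configuration that fails and compare the reported interpreter path with the one inside the rebuilt image.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by the interpreter running this script."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken spec.
        spec = None
    return {
        "module": name,
        "interpreter": sys.executable,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Replace "json" with the library that fails to import.
    print(describe_module("json"))
```

If `found` is false here but the package shows up in the IDE's package list, the script is most likely running in a container built from a different image than the one that was modified.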
@prabhathkota107 7 months ago
The Docker option is not available in the PyCharm Community edition, I guess.
@DataEngUncomplicated 7 months ago
Yes, that's correct, unfortunately.
@ahkamnaseek2850 1 year ago
Hi, can you please tell me your exact PyCharm version? Docker is not working correctly with the new PyCharm version. I tried with 2023.1.4 and it worked.
@DataEngUncomplicated 1 year ago
Sure! My version is 2023.1. Strange, hopefully they fix the issue in the latest version. I wonder if anyone else is experiencing the same issue you have encountered?
@errrbrrr3821 1 year ago
Please make one for VS Code as well.
@DataEngUncomplicated 1 year ago
I have just uploaded the video configuring AWS Glue with VS Code and Docker. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU
@LearningNewThings0407 10 months ago
Should the sample data used in the script be present in my personal AWS account's S3 bucket?
@DataEngUncomplicated 10 months ago
For testing locally, or for deployment on the AWS Glue service? For testing locally you do not need it in your personal AWS account's S3 bucket if you are running Docker locally.
@LearningNewThings0407 10 months ago
@DataEngUncomplicated I am trying to test Glue locally. I have Docker running locally. I am not sure about the "Update Docker Container Settings" step. Why do we need to provide AWS credentials, and why are IAM permissions required specifically for this testing? My understanding is that these credentials and permissions are used to connect to and use services on AWS, but since we are running it locally, do we still need to provide AWS credentials? Also, if I don't have an AWS account set up yet, does that mean I cannot run AWS Glue locally either?
@DataEngUncomplicated 10 months ago
Good question. If you need to connect to data in an S3 bucket for testing, then you need to pass in credentials. If not, then you don't need to pass in any profile and can skip this step. It's not a requirement.
@LearningNewThings0407 10 months ago
@DataEngUncomplicated Thank you so much for confirming this. So is the data file "memberships.json" used in this example located in the Docker image running locally? In the code the path points to an S3 location. Please let me know if this assumption is correct.
@DataEngUncomplicated 10 months ago
No, the memberships.json file is coming from S3, and I needed an IAM role that had permission to access that S3 bucket, which is why I had to pass the credential file into the Docker image. The data is moved from S3 into the Docker container when I run the code. Hopefully this helps clarify things.
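As a sketch of what passing the credential file into the container amounts to outside the IDE: the script below only builds a `docker run` argument list that mounts the host's `~/.aws` directory read-only and selects a profile. The image tag, the `/home/glue_user/.aws` mount point, the profile name, and the helper name `glue_docker_args` are assumptions based on the public `amazon/aws-glue-libs` images, not values taken from the video, so adjust them to your setup.

```python
from pathlib import Path


def glue_docker_args(profile=None, aws_dir=None,
                     image="amazon/aws-glue-libs:glue_libs_4.0.0_image_01"):
    """Build `docker run` arguments for a local Glue container.

    With profile=None no credentials are mounted, which is enough when the
    job only reads data that is already local.
    """
    args = ["docker", "run", "-it", "--rm"]
    if profile is not None:
        aws_dir = Path(aws_dir) if aws_dir else Path.home() / ".aws"
        # Mount the host credentials read-only at the assumed container path
        # and tell the AWS SDK which profile to use.
        args += ["-v", f"{aws_dir}:/home/glue_user/.aws:ro",
                 "-e", f"AWS_PROFILE={profile}"]
    args.append(image)
    return args


if __name__ == "__main__":
    print(" ".join(glue_docker_args(profile="default")))
```

Append the command to run inside the container (for example `pyspark`) after the image name. Calling the helper without a profile matches the case described above where no S3 access is needed.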
@andresm9051 20 days ago
Great video, but when setting up a Python run configuration, there is no AWS Connection option. I am using PyCharm 2024. Any comments?
@DataEngUncomplicated 20 days ago
Are you using the Professional edition?
@andresm9051 19 days ago
@ Hello, I am using the Professional edition.
@ricardoroa5874 1 year ago
I don't have the AWS Connection window. Do I need to install something additional in PyCharm?
@ricardoroa5874 1 year ago
I just installed the AWS CLI and it works! Thanks, great tutorial!
@DataEngUncomplicated 1 year ago
Hi Ricardo, sorry, I must have missed that prerequisite. Thanks for flagging this for others! I'm glad you got it working! It's going to make development much better.
@gouravroy4573 6 months ago
@DataEngUncomplicated I am not getting the AWS Connection window even after installing the AWS CLI. I am using the PyCharm Professional edition.
@brunoniello2019 1 year ago
I use VS Code :(
@DataEngUncomplicated 1 year ago
I'll make a video setting it up with VS Code.
@DataEngUncomplicated 1 year ago
I have just uploaded the video configuring AWS Glue with VS Code and Docker. Thanks for the suggestion! kzbin.info/www/bejne/lZDNXoavpNWJeNU