Why Data Engineers Should Develop AWS Glue Jobs Locally

8,748 views

DataEng Uncomplicated

Days ago

Comments: 20
@julianromero3359 10 months ago
Awesome and valuable information. Great option to develop locally.
@DataEngUncomplicated 10 months ago
Thanks Julian!
@ColdBlkPenguin a year ago
Great video! Thank you for making this - my only feedback would be that it feels like you are reading a script to me (which I am sure you probably are). The information you are providing is great, but the delivery can feel a bit "lecturer reading off the powerpoint slides"-y. The video would also feel more engaging if you were "making eye contact" with the camera. Keep up the good work
@DataEngUncomplicated a year ago
Thanks for the valuable feedback. I don't do too many talking-head videos; it's definitely something I could improve on!
@wilsonwaigant4827 a year ago
Nice video! I'm currently working on a project, but I was worried about the cost of working on AWS. Now I have a question: if I start working locally, where could I store the data that I generate in the process? And how and when should I migrate the whole work to AWS? Thank you!
@DataEngUncomplicated a year ago
Thanks Wilson. If you configure your AWS credentials, you can store any data you generate along the way in Amazon S3. You should migrate your process to AWS once you are done developing and ready to run your job on the actual data. I'm assuming your data is large, which is why you might want to use PySpark and a larger cluster to process it all. The best way to migrate it to AWS is with infrastructure as code, such as CDK or Terraform. I am going to make a video on how to do this with Terraform soon.
@wilsonwaigant4827 a year ago
@DataEngUncomplicated thank you! I'm waiting for your video to learn more about it.
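One way to picture the local-first workflow discussed in this thread is a small helper that decides where a job writes its output. This is only a sketch under assumptions: the function name `output_uri`, the `env` values, the local directory, and the bucket name are all hypothetical and do not come from the video. Writing to an `s3://` location from a local machine still requires configured AWS credentials, as the reply above notes.

```python
def output_uri(dataset, env="local", bucket="my-example-bucket"):
    """Return the location a job should write to.

    All names here are hypothetical placeholders:
    - env="local" keeps output on the developer's machine
    - any other value targets the given S3 bucket
    """
    if env == "local":
        # Local scratch directory, used while developing and testing.
        return f"file:///tmp/glue-dev/{dataset}/"
    # S3 location, used once the job is ready to run against real data.
    return f"s3://{bucket}/{dataset}/"


if __name__ == "__main__":
    print(output_uri("orders"))
    print(output_uri("orders", env="aws"))
```

Keeping the destination behind one function means the job script itself does not change when the work moves from a laptop to AWS.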
@Preetham-f2n a year ago
Hi, can you make a video on data migration from on-premises to the AWS cloud, covering the end-to-end process and the tools used?
@AdamAdam-oq4fy a year ago
Well, my way of building Glue jobs:
- use a Glue notebook/Zeppelin to build all the logic
- use VS Code/PyCharm to wrap things up into classes/modules/methods, with all the VS Code extensions
- use CDK to deploy the Glue job, using the scripts created above and linking to the correct folder structure when deploying
- once deployed, I should have my Glue job ready on the console
- run, test, or modify when needed, but I encourage making changes through code
@joseeeeeeeeeeeeeeee1 a year ago
Do you have a tutorial?
@mickyman753 9 months ago
My team does the same. I think if you have an established CI/CD setup, then this is the only way to add new Glue jobs.
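For readers wondering what the deploy-through-code step in this thread ends up declaring: CDK synthesizes a CloudFormation template, and a Glue job appears in it as an `AWS::Glue::Job` resource. The sketch below builds that resource as a plain Python dictionary. It is not CDK code and not the commenter's actual setup; the function name `glue_job_resource`, the job name, role ARN, script location, and the Glue version chosen are illustrative assumptions.

```python
import json


def glue_job_resource(job_name, role_arn, script_s3_uri):
    """Build a CloudFormation resource describing a PySpark Glue job.

    The arguments are placeholders; a real deployment supplies its own values.
    """
    return {
        "Type": "AWS::Glue::Job",
        "Properties": {
            "Name": job_name,
            "Role": role_arn,
            "GlueVersion": "4.0",  # assumed version, adjust as needed
            "Command": {
                "Name": "glueetl",  # Spark ETL job type
                "PythonVersion": "3",
                "ScriptLocation": script_s3_uri,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    resource = glue_job_resource(
        "orders-etl",
        "arn:aws:iam::123456789012:role/example-glue-role",
        "s3://my-example-bucket/scripts/orders_etl.py",
    )
    print(json.dumps(resource, indent=2))
```

Because the script location is just a property of the resource, the script developed locally can be uploaded and referenced without touching the console.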
@AlDamara-x8j a year ago
Great informative video. Thanks for sharing. By the way, do you also have a tutorial showing how to work with interactive sessions with jupyter lab/notebooks (anaconda)?
@andrzejkozielec139 a year ago
great video!
@DataEngUncomplicated a year ago
Thanks!
@sjvr1628 a year ago
Keep doing more 😊
@DataEngUncomplicated a year ago
Thanks! I will 😊 I have a lot of video ideas in the pipeline
@harshadk4264 10 months ago
How do we orchestrate these AWS Glue jobs? Do we create the Python code for EventBridge, Lambda, and Step Functions?
@DataEngUncomplicated 9 months ago
You have many options for orchestrating Glue jobs. Glue has an orchestration section where you can orchestrate your Glue jobs. You can also orchestrate them in Airflow if your company is already using it. If your jobs are more complex and require triggering other AWS services along the way, it would probably be a good idea to leverage Step Functions.
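As a rough illustration of the Step Functions option mentioned above, the sketch below generates an Amazon States Language definition that runs Glue jobs one after another. It relies on the `arn:aws:states:::glue:startJobRun.sync` service integration, which waits for each job run to finish before moving on. The function name `build_glue_pipeline` and the job names are hypothetical, and the sketch assumes each job name is unique, since state names are derived from them.

```python
import json


def build_glue_pipeline(job_names):
    """Return a state machine definition that runs the given Glue jobs in order."""
    if not job_names:
        raise ValueError("at least one Glue job name is required")

    states = {}
    for index, job_name in enumerate(job_names):
        state = {
            "Type": "Task",
            # ".sync" makes Step Functions wait until the Glue job run completes.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job_name},
        }
        if index + 1 < len(job_names):
            # Chain to the state for the next job in the list.
            state["Next"] = f"Run-{job_names[index + 1]}"
        else:
            # The last job ends the execution.
            state["End"] = True
        states[f"Run-{job_name}"] = state

    return {
        "Comment": "Run Glue jobs sequentially",
        "StartAt": f"Run-{job_names[0]}",
        "States": states,
    }


if __name__ == "__main__":
    # Hypothetical job names for illustration only.
    definition = build_glue_pipeline(["extract-orders", "transform-orders"])
    print(json.dumps(definition, indent=2))
```

The resulting JSON could be passed to whatever tool creates the state machine; other AWS services can be added as extra task states between the Glue steps.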
@externalbiconsultant2054 7 months ago
Wondering if watching costs is really a data engineer's activity?
@DataEngUncomplicated 7 months ago
Yes, cost optimization is part of every role when working in a cloud environment. If you work for a large, well-funded organization that isn't coming down on costs, you might not feel it as much as a startup that freaks out over an extra $100 in cloud costs.