Awesome and valuable information. Great option to develop locally.
@DataEngUncomplicated 10 months ago
Thanks Julian!
@ColdBlkPenguin a year ago
Great video! Thank you for making this - my only feedback would be that it feels like you are reading a script to me (which I am sure you probably are). The information you are providing is great, but the delivery can feel a bit "lecturer reading off the powerpoint slides"-y. The video would also feel more engaging if you were "making eye contact" with the camera. Keep up the good work
@DataEngUncomplicated a year ago
Thanks for the valuable feedback, I don't do too many talking head videos, it's definitely something I could improve on!
@wilsonwaigant4827 a year ago
Nice video! I'm currently working on a project, but I was worried about the cost of working on AWS. Now I have a question: if I start working locally, where could I store the data that I generate in the process? And how and when should I migrate the whole work to AWS? Thank you!
@DataEngUncomplicated a year ago
Thanks Wilson! If you configure your AWS credentials, you can store the data you generate in the process in AWS S3 if you need to. You should migrate your process to AWS when you are done developing and ready to run your job on the actual data. I'm assuming your data is large, and that's why you might want to use PySpark and a larger cluster to process it all. The best way to migrate it to AWS is by using infrastructure as code like CDK or Terraform. I am going to make a video on how to do this with Terraform soon.
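As a rough illustration of that infrastructure-as-code approach, here is a minimal Terraform sketch that uploads a locally developed script to S3 and points a Glue job at it. All names here (the bucket, the key, the job name, the role) are hypothetical placeholders, and the IAM role is assumed to be defined elsewhere:

```hcl
# Upload the script developed locally, then register a Glue job that runs it.
resource "aws_s3_object" "glue_script" {
  bucket = "my-glue-scripts-bucket"      # hypothetical bucket name
  key    = "scripts/my_job.py"
  source = "src/my_job.py"               # path to the locally developed script
}

resource "aws_glue_job" "my_job" {
  name     = "my-job"
  role_arn = aws_iam_role.glue_role.arn  # IAM role assumed to exist elsewhere

  command {
    script_location = "s3://${aws_s3_object.glue_script.bucket}/${aws_s3_object.glue_script.key}"
    python_version  = "3"
  }

  glue_version      = "4.0"
  number_of_workers = 2
  worker_type       = "G.1X"
}
```

With this in place, rerunning `terraform apply` after each local change re-uploads the script, so the deployed job always matches what was tested locally.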
@wilsonwaigant4827 a year ago
@@DataEngUncomplicated Thank you! I'm waiting for your video to learn more about it.
@Preetham-f2n a year ago
Hi, can you make a video on data migration from on-premises to the AWS cloud, covering the end-to-end process and the tools used?
@AdamAdam-oq4fy a year ago
Well, my way of building Glue jobs:
- using a Glue notebook/Zeppelin to build all the logic
- using VS Code/PyCharm to wrap things up into classes/modules/methods, with all the extensions of VS Code
- using CDK to deploy the Glue job, using the scripts created above and linking to the correct folder structure when deploying
- once deployed, I should have my Glue job ready on the console
- run/test/modify when needed, but I encourage doing the changes through code
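A rough sketch of the CDK step in a workflow like this, assuming `aws-cdk-lib` is installed and that the script was already uploaded to S3; every name here (stack, job, bucket, path) is a hypothetical placeholder, not anything from the video:

```python
from aws_cdk import App, Stack, aws_glue as glue, aws_iam as iam

class GlueJobStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Execution role the Glue service assumes when running the job.
        role = iam.Role(
            self, "GlueJobRole",
            assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
        )

        # L1 construct mirroring the CloudFormation AWS::Glue::Job resource.
        glue.CfnJob(
            self, "MyJob",
            name="my-job",
            role=role.role_arn,
            command=glue.CfnJob.CommandProperty(
                name="glueetl",
                python_version="3",
                script_location="s3://my-glue-scripts-bucket/scripts/my_job.py",
            ),
            glue_version="4.0",
            worker_type="G.1X",
            number_of_workers=2,
        )

app = App()
GlueJobStack(app, "GlueJobStack")
app.synth()
```

Running `cdk deploy` on this would create (or update) the job, which then shows up in the Glue console ready to run, matching the "changes through code" workflow described above.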
@joseeeeeeeeeeeeeeee1 a year ago
Do you have a tutorial?
@mickyman753 9 months ago
My team also does the same. I think if you have an established CI/CD setup, this is the only way to add new Glue jobs.
@AlDamara-x8j a year ago
Great, informative video. Thanks for sharing. By the way, do you also have a tutorial showing how to work with interactive sessions in JupyterLab/notebooks (Anaconda)?
@andrzejkozielec139 a year ago
great video!
@DataEngUncomplicated a year ago
Thanks!
@sjvr1628 a year ago
Keep doing more 😊
@DataEngUncomplicated a year ago
Thanks! I will 😊 I have a lot of video ideas in the pipeline
@harshadk4264 10 months ago
How do we orchestrate these AWS Glue jobs? Do we create the Python code for EventBridge, Lambda, and Step Functions?
@DataEngUncomplicated 9 months ago
You have many options for orchestrating Glue jobs. Glue has its own orchestration section where you can orchestrate your Glue jobs. You can also orchestrate them in Airflow if your company is already using it. If your jobs are more complex and require triggering other AWS services along the way, it would probably be a good idea to leverage Step Functions.
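If the team prefers kicking jobs off from code (say, from a Lambda behind an EventBridge rule), a hedged boto3 sketch might look like this. The job name and the `--run_date` argument are hypothetical placeholders, and the AWS call only fires when the script is run directly:

```python
def build_job_arguments(run_date: str) -> dict:
    """Glue job arguments are passed as a dict of '--key' strings."""
    return {"--run_date": run_date}

def start_glue_job(job_name: str, run_date: str) -> str:
    """Start a Glue job run and return its run ID."""
    import boto3  # deferred so the pure helper above works without AWS installed
    client = boto3.client("glue")
    response = client.start_job_run(
        JobName=job_name,
        Arguments=build_job_arguments(run_date),
    )
    return response["JobRunId"]

if __name__ == "__main__":
    # Hypothetical job name; replace with your own and configure credentials first.
    print(start_glue_job("my-job", "2024-01-01"))
```

For the Step Functions route, the equivalent state would use the `arn:aws:states:::glue:startJobRun.sync` service integration, which waits for the job to finish before moving to the next state.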
@externalbiconsultant2054 7 months ago
Wondering if watching costs is really a data engineer's activity?
@DataEngUncomplicated 7 months ago
Yes, cost optimization is part of every role when working in a cloud environment. If you work for a large, well-funded organization that isn't coming down on costs, you might not feel it as much as at a startup that freaks out over an extra $100 in cloud costs.