AWS Tutorials - Methods of Building AWS Glue ETL Pipeline

9,094 views

AWS Tutorials

1 day ago

Comments: 36
@santospcs2011 2 years ago
Thank you for the pipeline video, very insightful.
@AWSTutorialsOnline 2 years ago
Glad it was helpful!
@hirendra83 3 years ago
Excellent Tutorial. Thanks
@AWSTutorialsOnline 3 years ago
You are welcome!
@rollinOnCode 2 years ago
this is super good and helpful. thank you
@AWSTutorialsOnline 2 years ago
You're very welcome!
@nttazitt1300 3 years ago
Very helpful tutorial, thanks.
@AWSTutorialsOnline 3 years ago
Glad it was helpful!
@nrodriguezgal148 1 year ago
Excellent video and explanation.
@AWSTutorialsOnline 1 year ago
Glad it was helpful!
@hsz7338 3 years ago
As always, thank you for the video. The breakdown comparison is incredibly intuitive. I am curious about your view on which approach is best for handling pipeline replay (i.e. handling pipeline failure) and the CI/CD process (i.e. pipeline as code)?
@AWSTutorialsOnline 3 years ago
CI/CD support is available for all approaches. Replay is better in the event-driven approach because you can rerun just part of the pipeline based on the error raised during the event.
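To illustrate the replay point, here is a minimal sketch, assuming each Glue job emits the EventBridge "Glue Job State Change" event and a Lambda-style handler decides what to rerun. The `plan_replay` name, the `RETRYABLE_STATES` set, and the idea of replaying on timeouts are assumptions for illustration, not something shown in the video.

```python
from typing import Optional

# Assumption: job states that should trigger a replay of that step.
RETRYABLE_STATES = {"FAILED", "TIMEOUT"}


def plan_replay(event: dict) -> Optional[str]:
    """Return the name of the Glue job to rerun, or None if no replay is needed."""
    # Ignore events that are not Glue job state changes.
    if event.get("detail-type") != "Glue Job State Change":
        return None
    detail = event.get("detail", {})
    # Only failed or timed-out runs are candidates for replay.
    if detail.get("state") not in RETRYABLE_STATES:
        return None
    return detail.get("jobName")
```

Because the decision is made per event, only the step that failed is rerun; steps that already succeeded are left alone.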
@andresmerchan6418 1 year ago
Hello! Which of the three methods is the most cost-effective?
@AWSTutorialsOnline 1 year ago
Event based.
@DavidChoqueluqueRoman 1 year ago
Hello, good video. Does anyone know when to use Glue workflows and when to use Step Functions?
@AWSTutorialsOnline 1 year ago
Use a Glue workflow when you want to orchestrate Glue jobs and crawlers only. Use Step Functions when you want to orchestrate Glue jobs and crawlers plus other services as well.
@adityag020 3 years ago
Insightful tutorial. Can you make a practical video on an event-based pipeline that uses DynamoDB to store metadata and configuration, with a retry mechanism in case it fails?
@AWSTutorialsOnline 3 years ago
sure - coming soon :)
@timmyzheng6049 2 years ago
Thank you for the pipeline video, very insightful. Quick question: to avoid hardcoding, can I also use DynamoDB to store environment parameters like S3 paths, file names, and business date for my ETL pipeline, say one built with Step Functions? And what do you think is the best industry practice for storing parameters for an AWS ETL pipeline?
@AWSTutorialsOnline 2 years ago
If you want to decouple the configuration, you should keep it centralized in a service like DynamoDB or Parameter Store. DynamoDB is especially good if you are going for a multi-account deployment and still want to keep the configuration centralized.
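A minimal sketch of the centralized-configuration idea, assuming a DynamoDB table keyed on a `pipeline_name` attribute. The table layout, the attribute names, and the `load_pipeline_config` helper are hypothetical. The `table` argument is expected to behave like a boto3 DynamoDB `Table` resource, whose `get_item` response contains an `Item` key only when the record exists.

```python
def load_pipeline_config(table, pipeline_name: str) -> dict:
    """Fetch one pipeline's settings from a DynamoDB-style table object."""
    response = table.get_item(Key={"pipeline_name": pipeline_name})
    item = response.get("Item")
    if item is None:
        raise KeyError(f"No configuration stored for pipeline {pipeline_name!r}")
    # Drop the key attribute so only the settings remain.
    return {name: value for name, value in item.items() if name != "pipeline_name"}
```

In a real job the table would typically come from `boto3.resource("dynamodb").Table(...)`, with the table name itself supplied as a job parameter, so S3 paths, file names, and business dates stay out of the script.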
@timmyzheng6049 2 years ago
@AWSTutorialsOnline Thank you for the response. After doing some research, it seems that to pass parameters to a Glue job, I have to use Lambda with boto3 in the step function. Since Lambda can also call the Glue job through the Python Glue API, does that mean there is no need to add the Glue job to the step function separately?
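A minimal sketch of the Lambda route described here, assuming the boto3 Glue client's `start_job_run` call, which takes job parameters in `Arguments` with `--`-prefixed keys. The helper names and the example parameter names are hypothetical.

```python
def to_glue_arguments(params: dict) -> dict:
    """Convert plain parameters into Glue job arguments ('--name': 'value')."""
    return {f"--{name}": str(value) for name, value in params.items()}


def start_glue_job(glue_client, job_name: str, params: dict) -> str:
    """Start a Glue job run with the given parameters and return its run ID."""
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments=to_glue_arguments(params),
    )
    return response["JobRunId"]
```

Inside a Lambda function, `glue_client` would be `boto3.client("glue")`, and the Glue script can read the values with `getResolvedOptions` from `awsglue.utils`. The Step Functions Glue `startJobRun` integration also accepts an `Arguments` field, so Lambda is one option rather than a requirement.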
@alokanand851 2 years ago
Hi all, we are using AWS Glue + PySpark to perform ETL into a destination RDS PostgreSQL DB. The destination tables have primary and foreign key columns with the UUID data type, and we are failing to populate these UUID columns. How can we achieve this? Please suggest.
@AWSTutorialsOnline 2 years ago
I am not sure what error you are getting. The ETL job has to respect table-level column constraints. As long as it does, there should not be a problem.
@mangeshshinde2844 3 years ago
Nice tutorial. Can you make a practical tutorial on the event-based pipeline?
@AWSTutorialsOnline 3 years ago
Yes, sure. I am getting multiple requests for that. I will do it.
@pachappagarimohanvamsi4641 3 years ago
Could you please make a practical, workshop-style session on these approaches?
@AWSTutorialsOnline 3 years ago
Sure, will do. The Glue workflow lab is already available at aws-dojo.com/workshoplists/workshoplist29/
@ryany420 2 years ago
Awesome tutorial! I have a question, if you don't mind: how should we deal with upserts/deletes in the landing/clean/curated zones? I know Databricks has a similar architecture with bronze/silver/gold, but it comes with Delta Lake. If our destination is Redshift, should we move data into Redshift (RDBMS) at an earlier stage, such as before the curated zone? I also sent you an email; hope you can help answer. Thanks heaps.
@radhasowjanya6872 3 years ago
Hello Sir. I follow all your videos. They are very useful in my project. Thank you very much. I have a quick question: is it possible to add multiple SQL statements in one AWS Glue Studio job? If yes, can you help me with it? (Use case: I want to truncate the target table (Snowflake) before loading.)
@AWSTutorialsOnline 3 years ago
You can chain multiple SQL Transforms one after another to run multiple SQL statements in sequence.
@YEO19901 3 years ago
Wonderful.
@AWSTutorialsOnline 3 years ago
Thanks
@SurendraUddagiri 3 years ago
Can we run a machine learning algorithm in a Glue job using code?
@AWSTutorialsOnline 3 years ago
Yes, you can, but it is not recommended. You should use a Glue job for feature engineering, not for training the model. Model training should be done in SageMaker.
@ankursinhaa2466 1 year ago
I love you