Use airflow to orchestrate a parallel processing ETL pipeline on AWS EC2 | Data Engineering Project

12,212 views

tuplespectra

1 day ago

Comments: 41
@zuesbenz 8 months ago
You did a fine job. I plan to watch the whole series.
@tuplespectra 7 months ago
Thanks so much.
@go27sia 6 months ago
Thank you very much for creating this project. I followed all 3 videos from this series and learnt a lot. Thank you!
@tuplespectra 5 months ago
Great to hear! You're welcome.
@gyungyoonpark 9 months ago
Thank you again for this true masterpiece of a tutorial!!! Following the video and practicing is truly a joy of learning! P.S. Guys, if you change "houston" to another city, the SQL "JOIN" part will not work at all, so make sure to just use "houston".
@viishhnudatta5124 5 months ago
Excellent tutorial.
@tuplespectra 5 months ago
Thanks for your comment.
@bethuelthipe-moukangwe7786 1 year ago
Thank you very much, your video lesson helped me build my first data pipeline.
@tuplespectra 1 year ago
Awesome. I'm glad you found it valuable and were able to build your first data pipeline.
@yixiangzhang2834 1 year ago
Good stuff. The initial cost of building a DE channel is high, but it's worth it. Keep up the good work.
@tuplespectra 1 year ago
Thank you!
@AjaySinghTomar05 11 months ago
Phenomenal work, super informative and clear explanations. Keep it up!
@tuplespectra 11 months ago
Thanks a lot!
@fatimaezzahrasoubari5928 1 year ago
Thank you so much for this help, I really appreciate it. Please keep working, and don't forget subtitles; they are so helpful for me.
@tuplespectra 1 year ago
Thanks for the comment and feedback.
@vaibhavverma1340 1 year ago
Part 1 is worth watching. I learnt a lot and am looking forward to completing part 2. Thank you so much, please keep doing what you are doing :)
@tuplespectra 1 year ago
Thanks so much for the comment. I'm glad you found the videos valuable and learnt a lot.
@facuoppi 1 year ago
Man, you are the best, thx for this 🙌🏻
@tuplespectra 1 year ago
Thank you!
@TvsCar30 1 year ago
So cool!
@tuplespectra 1 year ago
Thanks so much!
@atharvbajare7398 7 months ago
Will it work with instance type t2.small? It's cheaper than the medium one. I followed your last video and my Airflow ran smoothly on t2.small. I need to start working on this project, so I'm asking whether it will run smoothly for this project or whether I have to use the medium version. Please reply as soon as possible, and thanks a lot for making such great videos 🙏❣️
@tuplespectra 7 months ago
Airflow works better on medium than on t2.small; it has frozen a couple of times for me on t2.small. If you are thinking about cost, you can give t2.small a try and see how it works for you.
@femiotitolaiye1531 1 year ago
Hello, nice work sir. This is highly resourceful. My question is about creating the tables: what about cases where the columns from the API change constantly? Do we have to keep changing our code, which is not good engineering practice?
@tuplespectra 1 year ago
One way to handle this is to have your code check the tables in your database; if there are any columns in your incoming data that are not already in your table, alter the table and add those columns.
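The reply above describes comparing the incoming columns with the table and adding whatever is missing. A minimal sketch of that idea, assuming Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server; the helpers `existing_columns` and `add_missing_columns` and the `weather` table are hypothetical, not from the tutorial. On Postgres the existing column names would come from `information_schema.columns` rather than `PRAGMA table_info`.

```python
import sqlite3


def existing_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1] for row in rows}


def add_missing_columns(conn, table, incoming_columns, col_type="TEXT"):
    """Add every column present in the incoming data but absent from the table.

    Returns the added column names in the order they appeared in the input.
    """
    current = existing_columns(conn, table)
    added = []
    for col in incoming_columns:
        if col not in current:
            # Identifiers cannot be bound as parameters, so quote (and escape) them.
            quoted = '"' + col.replace('"', '""') + '"'
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {quoted} {col_type}")
            added.append(col)
            current.add(col)
    conn.commit()
    return added


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (city TEXT, temp REAL)")
    incoming = {"city": "houston", "temp": 30.1, "humidity": 70}
    print(add_missing_columns(conn, "weather", list(incoming)))
```

Adding each new column as nullable TEXT is a conservative choice: existing rows simply get NULL and types can be tightened later. Automatically widening a schema can also hide upstream bugs, so logging what was added may be worthwhile.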
@user-zb5jl6tj1v 1 year ago
Hi, may I ask why, in your previous video, we were required to expose the AWS credentials (using a session token) to access S3 and load the final results into the bucket, whereas we do not need to do so in this video?
@gyungyoonpark 9 months ago
I have a question regarding the CSV file. Why do you create a CSV file in the first place? Isn't it better to just upload "df_data" to Postgres? For example, say we run this DAG every day; there will be too many CSV files in the folder. So why create the CSV file at all?
@tuplespectra 9 months ago
You are correct, you may not need to produce CSV files. Your architecture depends on your requirements; what I have taught is for educational purposes.
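As the reply says, the intermediate CSV is optional. A minimal sketch of loading rows directly, again assuming sqlite3 as a stand-in for Postgres; `load_records` and the `weather` table are hypothetical. If `df_data` is a pandas DataFrame, `df_data.to_dict("records")` produces the list of dicts this function expects. With psycopg2 the placeholder would be `%s` instead of `?`, and `psycopg2.extras.execute_values` is a faster option for large batches.

```python
import sqlite3
from typing import Iterable, Mapping


def load_records(conn, table: str, columns: list[str], records: Iterable[Mapping]) -> int:
    """Insert records straight into a table, with no intermediate CSV file.

    Missing keys in a record are inserted as NULL. Returns the number of rows inserted.
    """
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)  # sqlite3 uses ? placeholders
    rows = [tuple(rec.get(c) for c in columns) for rec in records]
    conn.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```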
@josephostrow4876 9 months ago
My DAG is failing at 'tsk_uploadS3_to_postgres' with the error 'HTTP 301. No response body.' Any ideas? I uploaded the same .csv from your GitHub to my S3 bucket and followed all the steps for importing S3 data into RDS PostgreSQL.
@josephostrow4876 9 months ago
OK, super simple fix, but I'll leave it here in case anyone runs into this: I just had to make sure the region specified in the SQL for this task matches the S3 bucket's region (in my case, changing us-east-1 to us-east-2). By the way, thanks for an amazing tutorial! I've been learning a lot.
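One way to avoid this mismatch is to build the import statement from parameters so the region lives in one place. This sketch assumes the task uses `aws_s3.table_import_from_s3` together with `aws_commons.create_s3_uri` from the RDS for PostgreSQL `aws_s3` extension (the AWS-documented route for importing S3 data into RDS PostgreSQL); the helper `build_s3_import_sql` and the example bucket, key, and table names are hypothetical.

```python
def _sql_literal(value: str) -> str:
    # Escape single quotes so the value is a valid SQL string literal.
    return "'" + value.replace("'", "''") + "'"


def build_s3_import_sql(
    table: str,
    bucket: str,
    key: str,
    region: str,
    columns: str = "",
    options: str = "(format csv, header true)",
) -> str:
    """Build an aws_s3.table_import_from_s3 call.

    `region` must be the region the bucket actually lives in; an empty
    `columns` string means "all columns of the table".
    """
    lit = _sql_literal
    return (
        "SELECT aws_s3.table_import_from_s3("
        f"{lit(table)}, {lit(columns)}, {lit(options)}, "
        f"aws_commons.create_s3_uri({lit(bucket)}, {lit(key)}, {lit(region)}));"
    )


if __name__ == "__main__":
    print(build_s3_import_sql("weather", "my-bucket", "data.csv", "us-east-2"))
```

To look up a bucket's region programmatically, boto3's `s3.get_bucket_location(Bucket=...)` returns a `LocationConstraint`, which is `None` for buckets in us-east-1.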
@atharvbajare7398 7 months ago
I am getting an error while connecting with psql; the postgres=> prompt does not start.
@latabharti8175 11 months ago
Failed to connect; an SSH error is showing.
@vasudevreddy3527 10 months ago
Hi, I am getting an error while importing airflow.providers.postgres:
from airflow.providers.postgres.operators.postgres import PostgresOperator
ModuleNotFoundError: No module named 'airflow.providers.postgres'
I followed the same installation approach but still get the error. I've checked all the possibilities; can you give me the solution?
@vasudevreddy3527 10 months ago
After debugging, I found out that PostgresOperator is deprecated, so we should use SQLExecuteQueryOperator and pass the conn_id as postgres_conn 🙂
@tuplespectra 10 months ago
Thanks for your comment and the knowledge sharing.
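For anyone hitting the same import error, here is a hedged sketch of a task using `SQLExecuteQueryOperator`, which lives in the `apache-airflow-providers-common-sql` package. It assumes Airflow 2.4 or later (for the `schedule` argument) and a Postgres connection registered under the id `postgres_conn`, as in the comment above; the DAG id, task id, and table are hypothetical. Note that `No module named 'airflow.providers.postgres'` usually means the `apache-airflow-providers-postgres` package is not installed in the environment Airflow runs in, and a connection of type Postgres still relies on that package even when the operator comes from common-sql.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id="postgres_sql_example",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,  # run only when triggered manually
    catchup=False,
) as dag:
    create_table = SQLExecuteQueryOperator(
        task_id="tsk_create_table",  # hypothetical task id
        conn_id="postgres_conn",  # connection id named in the comment above
        sql="""
            CREATE TABLE IF NOT EXISTS weather_demo (
                city TEXT,
                temp NUMERIC
            );
        """,
    )
```

This is a DAG-file fragment: it only does something useful when placed in Airflow's dags folder with the providers installed.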
@JuanCruz-nu4mg 1 year ago
I'm stuck at 1:25:49. My CSV will not upload to Postgres; I am getting an 'extra data after last expected column' error in the log. I've even copy-pasted your code and tried saving the Excel file several ways, and still nothing.
@tuplespectra 1 year ago
I'm guessing your CSV has an extra column of data. Did you use the CSV from my GitHub? Also ensure the file is saved as .csv.
@JuanCruz-nu4mg 1 year ago
@tuplespectra I figured it out. I had a coding error on my first run of the Postgres table creation and didn't have everything loaded in correctly. Once I deleted the table from Postgres with DROP TABLE and reran it, it worked!
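PostgreSQL's COPY reports "extra data after last expected column" when a row has more fields than the target column list. A quick pre-upload check with the standard `csv` module can point at the offending rows; `find_bad_rows` is a hypothetical helper, and the expected column count must match your table definition. It numbers CSV records (header = 1), which match physical line numbers unless a quoted field spans several lines.

```python
import csv
import io


def find_bad_rows(csv_text: str, expected_columns: int) -> list[tuple[int, int]]:
    """Return (record_number, field_count) for every row with the wrong field count.

    Record numbers are 1-based and the header row is record 1.
    """
    bad = []
    reader = csv.reader(io.StringIO(csv_text))
    for record_no, row in enumerate(reader, start=1):
        if len(row) != expected_columns:
            bad.append((record_no, len(row)))
    return bad


if __name__ == "__main__":
    sample = "city,temp\nhouston,30\nhouston,31,extra\n"
    print(find_bad_rows(sample, 2))
```

For a file on disk, read it with `open(path, newline="")` and pass `f.read()`, or adapt the function to take the file object directly.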
@atharvbajare7398 7 months ago
Hello sir, please help me. I'm getting billed for using RDS and it has gone up to $75. I'm a student and I don't understand how to stop these bills. Yesterday I terminated all EC2 instances and RDS instances, but I was still billed $62 for RDS. The day before yesterday it was $20, but at that time I hadn't stopped RDS. Yesterday at 7 in the evening I deleted RDS and stopped all services related to it, yet this morning I see a bill of $75. Please help me stop getting these bills. I'm a student and have to ask my parents for money; INR 6000 is a big amount for me, and I hope this bill doesn't exceed it. Please reply as soon as possible. I'm begging for help here; please help me find a way to stop these bills.
@priyamtamrakar2738 1 month ago
Go to Billing and Cost Management in your AWS account and check exactly which service you are being billed for: compute, IP addresses, ENIs, storage volumes, data transfer, a NAT gateway, or something else. Check whether your RDS instance is in the free tier and doesn't have a read/write replica in another Availability Zone. Check whether the RDS instance's storage is still retained. Check whether the RDS instance is in a public subnet and has an Elastic IP. Check every option you selected while spinning up the RDS instance; there will be a catch somewhere. To get the bill waived, you should try contacting AWS customer support.
@atharvbajare7398 1 month ago
@priyamtamrakar2738 Yeah, after posting this comment I found that my RDS instance was still on. Anyone working on this project: take care to stop your RDS instance.
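Following up on this thread, a small check can list RDS instances that are not yet stopped. The sketch below only filters a dictionary of the shape returned by boto3's `rds.describe_db_instances()`; `running_db_instances` is a hypothetical helper. Two caveats: a stopped RDS instance still incurs storage charges, and AWS automatically restarts a stopped instance after seven days, so deleting instances you no longer need (and any snapshots you keep) is generally what stops the charges. The boto3 client is per region, so check every region you used.

```python
def running_db_instances(response: dict) -> list[str]:
    """Return identifiers of RDS instances whose status is anything but 'stopped'.

    `response` has the shape returned by boto3's rds.describe_db_instances().
    """
    return [
        db["DBInstanceIdentifier"]
        for db in response.get("DBInstances", [])
        if db.get("DBInstanceStatus") != "stopped"
    ]


# Example usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   rds = boto3.client("rds", region_name="us-east-1")
#   print(running_db_instances(rds.describe_db_instances()))
```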