Hello, great video and great channel! I learned a lot about Kafka from your content and have started working with it. One question: how does this DAG handle multiple requests from multiple files arriving in S3? And if it handles them poorly, what is the best way to orchestrate it? Thanks for your attention.
@KnowledgeAmplifier1 1 year ago
Thank you for watching the video and for your question, Imagine Zat! I'm glad to hear that you find the content helpful.

Regarding your question about how the DAG handles multiple requests from multiple files arriving in S3, the answer is the _SUCCESS file. The _SUCCESS file is a marker file that Apache Spark generates automatically when it finishes writing data to a destination such as a file system (e.g., HDFS, S3). Its purpose is to indicate the successful completion of a write operation. So instead of triggering the Lambda code for every file write, trigger the Lambda only when the _SUCCESS file is written to S3. That way the Lambda runs only once, and this ensures the Airflow DAG runs only after the source system has written the complete dataset. 😊 For details, you can refer to this video: kzbin.info/www/bejne/paXQaIGYotusaac

I hope this clarifies your question. If you have any further inquiries, please feel free to ask.
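To make the idea concrete, here is a minimal sketch of a Lambda handler that ignores ordinary part files and triggers an Airflow DAG run only when the Spark _SUCCESS marker lands in S3, using Airflow's stable REST API (POST /api/v1/dags/{dag_id}/dagRuns). The Airflow URL, credentials, and DAG id below are hypothetical placeholders and not from the video.

```python
import json
import base64
import urllib.request

# Hypothetical placeholders -- replace with your Airflow webserver URL, credentials, and DAG id.
AIRFLOW_DAGRUNS_URL = "http://your-airflow-host:8080/api/v1/dags/s3_ingest_dag/dagRuns"
AIRFLOW_USER = "airflow"
AIRFLOW_PASSWORD = "airflow"


def lambda_handler(event, context):
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]

        # Skip ordinary part files; act only when the Spark _SUCCESS marker is written.
        if not key.endswith("_SUCCESS"):
            continue

        # Trigger exactly one DAG run for the completed write, passing the key as DAG run conf.
        payload = json.dumps({"conf": {"s3_key": key}}).encode("utf-8")
        request = urllib.request.Request(AIRFLOW_DAGRUNS_URL, data=payload, method="POST")
        request.add_header("Content-Type", "application/json")
        token = base64.b64encode(f"{AIRFLOW_USER}:{AIRFLOW_PASSWORD}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")

        with urllib.request.urlopen(request) as response:
            return {"statusCode": response.status, "body": response.read().decode()}

    return {"statusCode": 200, "body": "No _SUCCESS file in this event; nothing to trigger."}
```

Alternatively, you can put the filtering in the S3 event notification itself by configuring a suffix filter of "_SUCCESS", so the Lambda is only invoked for the marker object and the handler does not need the key check at all.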
@SanjayKumar-d4t2q 9 months ago
Let's say Lambda triggered multiple REST API requests to Airflow. How will Airflow handle that scenario? Will it create multiple DAG runs and execute them concurrently, or will there be one DAG run with multiple concurrent tasks, one for each request?