How do you integrate with LDAP in this version? I couldn't find it 🥲
@kenamia9136 4 days ago
I prefer the first method. Easier to read.
@eduardoamfm 5 days ago
Awesome class, thanks!
@ashimov1970 5 days ago
Excellent content, terrible accent
@marc2223 5 days ago
It’s the 🥐 accent, the more you hear it the more you love it ❤
@ashimov1970 5 days ago
@marc2223 Too hard to understand him
@RemiAdeleye 6 days ago
This was a great tutorial - easy to conceptualize and a great example which was easy to follow. It's a great feeling to look at your data team's airflow github repo and actually understand what the DAGs are doing when you didn't know how airflow worked an hour ago!
@rahul-x3y6l 7 days ago
Hi, I am trying to run dbt transforms in Airflow. My table raw_invoices is already created in the US location, but I still get the error below in Airflow: Error in model dim_customer (models/transform/dim_customer.sql): 404 Not found: Table airflow-project-446808:retail.raw_invoices was not found in location US. Can you suggest a fix here?
@tina_rahi07 7 days ago
Thank you so much for this amazing video! It was very helpful and well explained. However, I encountered the following error when running my code: airflow.exceptions.AirflowException: Cannot execute: spark-submit --master spark://spark-master:7077 --conf spark.master=spark://spark-master:7077 --num-executors 2 --executor-memory 2g --driver-memory 2g --name arrow-spark --verbose --deploy-mode client /usr/local/airflow/include/scripts/read.py /usr/local/airflow/include/1.csv. Error code is: 1. The only difference between my code and yours is the inclusion of dropna. Could you please help me figure out what might be causing this issue? Thank you in advance!
@darshansolanki-u6q 8 days ago
Backfill can't happen when the DAG is not scheduled? In that case the airflow dags trigger command works, but it triggers the entire DAG rather than specific tasks.
@premsaikarampudi3944 10 days ago
@MarcLamberti: Can you help me fix it? The Docker image I am using is 12.6.0-python-3.10.
Broken DAG: [/usr/local/airflow/dags/retail.py] Traceback (most recent call last):
  File "/usr/local/airflow/dags/retail.py", line 12, in <module>
    from include.dbt.cosmos_config import DBT_PROJECT_CONFIG, DBT_CONFIG
  File "/usr/local/airflow/include/dbt/cosmos_config.py", line 1, in <module>
    from cosmos.config import ProfileConfig, ProjectConfig
ModuleNotFoundError: No module named 'cosmos'
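That ModuleNotFoundError usually means the astronomer-cosmos package is missing from the image; the usual fix is adding astronomer-cosmos to requirements.txt and rebuilding (astro dev restart). For reference, a minimal sketch of what include/dbt/cosmos_config.py tends to look like in this setup; the profile name, target, and paths are illustrative assumptions, not necessarily the video's exact values:

```python
# include/dbt/cosmos_config.py (sketch)
# Requires astronomer-cosmos in requirements.txt; its absence is the
# usual cause of "ModuleNotFoundError: No module named 'cosmos'".
from cosmos.config import ProfileConfig, ProjectConfig

DBT_CONFIG = ProfileConfig(
    profile_name="retail",  # illustrative profile name
    target_name="dev",      # illustrative target
    profiles_yml_filepath="/usr/local/airflow/include/dbt/profiles.yml",
)

DBT_PROJECT_CONFIG = ProjectConfig(
    dbt_project_path="/usr/local/airflow/include/dbt/",
)
```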
@premsaikarampudi3944 10 days ago
If you could re-run the project with the latest Docker image and Python 3.10, I guess most of everyone's questions would be addressed.
@premsaikarampudi3944 11 days ago
Can't Astro just incorporate a utf-8 encoding parameter as part of its load_file method?
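If your version of the Astro SDK's load_file doesn't expose an encoding option, one workaround is to normalize the file to UTF-8 before loading it. A minimal sketch using pandas; the file path and the assumed source encoding are illustrative:

```python
import pandas as pd

def reencode_to_utf8(path: str, source_encoding: str = "latin-1") -> None:
    """Rewrite a CSV in place as UTF-8 so downstream loaders can read it."""
    df = pd.read_csv(path, encoding=source_encoding)
    df.to_csv(path, index=False, encoding="utf-8")

# Illustrative path; run this before the load_file task reads the file.
reencode_to_utf8("/usr/local/airflow/include/dataset/online_retail.csv")
```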
@PhuocJoshDang 11 days ago
Can I please ask why we didn't just use dbt for the quality checks? What are the advantages of Soda over dbt?
@oreallyseven 11 days ago
Thanks for the great video! By the way, is there any possibility to push and pull to Git from Airflow? Could you please share any useful resources to go through? Thanks!
@oreallyseven 11 days ago
This video is very informative! Is there any way to pull from and push to a Git repo from Airflow? If a resource on this already exists, could you please share the link? That would be very helpful, thanks!
@dalicodes 12 days ago
Done it, thank you Marc. How can I query the table using Athena? I tried cataloging the data using Glue, but crawlers can only detect general-purpose buckets.
@afzalandthedreams 13 days ago
Hi Marc! I noticed you used the BashOperator in the example pipeline @13:16. Could you please explain why you chose BashOperator over PythonOperator in this case? Also, how do you typically decide which operator to use when designing your DAGs? Thank you again for sharing your knowledge... I really appreciate your efforts!
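A rough rule of thumb: BashOperator fits when the work already exists as a shell command or script, while PythonOperator (or the @task decorator) fits when the logic lives in Python and you want to pass values between tasks via XComs. A minimal sketch of the same step written both ways; the DAG id, task ids, and script path are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def _extract():
    # Python-native logic: easy to unit test, and the return value
    # is pushed to XCom automatically.
    return {"rows": 42}

with DAG("operator_comparison", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False):
    # Shell-oriented work: wrapping an existing script or CLI call.
    extract_bash = BashOperator(
        task_id="extract_bash",
        bash_command="python /usr/local/airflow/include/scripts/extract.py",
    )

    # Python-oriented work: keep the logic in a callable.
    extract_python = PythonOperator(
        task_id="extract_python",
        python_callable=_extract,
    )
```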
@azriabdrahim7040 13 days ago
Thanks Marc for the great video 🎉🎉. Sorry, I am a beginner; just out of curiosity, how much does AWS charge you for this project?
@gaurav54001 14 days ago
Hi Marc, after selecting the Python interpreter, my autocomplete still doesn't work. I have tried switching between all the interpreters available on my machine. Any suggestions?
@eduardofarias87 15 days ago
I imagined this solution in a different way. I thought it would just be a matter of putting or updating a file in the dataset directory, without the need for a "producer" DAG. I have an external system, not a DAG, that unloads the data into a directory. For me, the TriggerDagRunOperator already fulfills exactly the same role as Datasets. Help me understand if that's not the case. Thanks for the content, Marc!
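For context, Datasets are a contract between DAGs, not a file watcher: the consumer runs when a producer task that declares the Dataset in its outlets succeeds, not when the file itself changes on disk. That is why an external system writing the file won't trigger anything by itself, and why TriggerDagRunOperator can feel equivalent when only two DAGs are involved. A minimal producer/consumer sketch; the URI and task bodies are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.datasets import Dataset
from airflow.operators.python import PythonOperator

my_file = Dataset("/tmp/my_file.txt")

with DAG("producer", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False):
    # The outlets declaration is what emits the dataset event on success;
    # Airflow never inspects the file itself.
    PythonOperator(
        task_id="update_file",
        python_callable=lambda: open("/tmp/my_file.txt", "a").write("data\n"),
        outlets=[my_file],
    )

with DAG("consumer", start_date=datetime(2024, 1, 1),
         schedule=[my_file], catchup=False):
    PythonOperator(
        task_id="read_file",
        python_callable=lambda: print(open("/tmp/my_file.txt").read()),
    )
```

Unlike TriggerDagRunOperator, the producer doesn't need to know which DAGs consume the dataset, so a single event can fan out to many consumers.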
@XiaomingZhou-t5f 15 days ago
Could you please share how to integrate Airflow v2 with a custom OAuth2 provider (one not in the supported list from the Airflow docs)?
@hussainshaik368 22 days ago
Hey Marc, I'm a beginner and loved your content. It would be really helpful if you could number the videos in this playlist to make it easier to follow them in the right order.
@MarcLamberti 16 days ago
Good idea
@gtonizuka9990 26 days ago
Thanks for the video! Is Soda essential, and how much does it cost? I couldn't find the price on their website.
@hicks_dwaynes 27 days ago
Really, thanks for the lesson :)
@palodans1217 a month ago
Good job.
@BuiTung-nw5xw a month ago
Sir, when I inspect the network, the Airflow containers and the other containers are on different networks, even though I followed the instructions. Could you please post a video to help fix this?
@MarcLamberti a month ago
Look at the Notion page in the description
@BuiTung-nw5xw a month ago
@MarcLamberti Although I followed the instructions on the Notion page, they still use different networks
@jacopomalatesta1522 22 days ago
@BuiTung-nw5xw Have you A) explicitly defined the network in the Docker Compose file and B) attached that network to each service?
@zaminhassnain7570 a month ago
Great, this worked fine locally. But when I tried to deploy the same code to a cloud.astronomer instance using the Celery executor, the DAG failed with this exception: airflow.exceptions.AirflowException: Cannot execute: spark-submit --master spark://spark-master:7077 --name arrow-spark --verbose --deploy-mode client ./include/scripts/read.py. Error code is: 1
@hicks_dwaynes a month ago
Thanks for the lesson. I have a problem: TypeError: _choose_best_model() missing 1 required positional argument: 'ti'. Can you help? :) The returned values are 8, 9, and 6.
@MarcLamberti a month ago
Put ti=None in the parameters
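That is, give the callable a ti keyword argument with a default so Airflow can inject the task instance. A minimal sketch, assuming the task ids from the tutorial's DAG:

```python
def _choose_best_model(ti=None):
    # Airflow injects the TaskInstance as the "ti" keyword argument;
    # the None default keeps the signature valid outside of a run.
    accuracies = ti.xcom_pull(task_ids=[
        "training_model_A",
        "training_model_B",
        "training_model_C",
    ])
    # Branch on the best reported accuracy (e.g. the 8, 9, and 6 above).
    if max(accuracies) > 8:
        return "accurate"
    return "inaccurate"
```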
@hicks_dwaynes 28 days ago
@MarcLamberti Thanks, I changed Airflow 1.0 to 2.0 and had success :)
@xavierangaleu a month ago
I am trying to upload the CSV file to a Google Cloud bucket but am getting a TIMEOUT ERROR. I tried uploading another, lighter file and it worked. Please, can someone help me? @MarcLamberti
@NamrataAnandkumarHurukadli a month ago
Hey Marc, I have deferrable tasks in my DAG, and when I use the backfill command, the tasks don't go into the deferred state; they're all just marked as successful... Does the backfill command not work with deferrable operators?
@LoveDarkChocolate a month ago
I would like to ask you about a DAG being scheduled on a Dataset or Dataset Alias. I assume such a DAG will only run if those datasets are updated by another DAG, not by any event that modifies those files outside of Airflow?
@MarcLamberti a month ago
Correct
@tomasoon a month ago
Is it possible to run PySpark jobs from Azure Synapse (for example, a notebook)?
@MarcLamberti a month ago
Didn't try it yet
@forgottenvy a month ago
How do I get the compose file with the latest images? Is the doc updated regularly, or should I grab the latest images manually from Docker Hub and customize my compose file?
@MarcLamberti a month ago
Use the doc, yes :)
@prakashraj4264 a month ago
I have one doubt, sir: what's the point of installing Airflow, is there any specific reason for it? And in what kind of scenario can we use it?
@Clement-r7t a month ago
Great tutorial, thanks Marc!
@ashutosh4912 a month ago
Now waiting for the video where we deploy multi-node Spark workers and Airflow with Kubernetes in production 🎉
@MrBestshorty a month ago
Hey, great video! Could you show us how to set this up using Kubernetes and Helm?
@MarcLamberti a month ago
Tell me more :)
@RushikeshMule-p4b a month ago
If you're using Apprise, is it possible to tag people on Teams?
@MarcLamberti a month ago
I think so
@ashutosh4912 a month ago
They are on the same network; I verified it using the inspect command, and I'm able to ping spark-master from the Airflow webserver as well.
@MarcLamberti a month ago
here we go :)
@ashutosh4912 a month ago
Getting this error: pyspark.errors.exceptions.base.PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.
@Ferdi-g1q a month ago
Hey Marc, great video! It really does seem easy to play around with Spark and Airflow in a dev environment with your explanation, but I have 3 questions:
(1) At the end of the video, when you present the TaskFlow API way of triggering a PySpark job, you import pandas as a top-level import, which is not really recommended if I am not wrong (it should be imported within the task's function)? I get that you need to do so since your function returns a pandas DataFrame though :).
(2) Same context as before: your function returns a pandas DataFrame, so it will be returned as an XCom, right? I remember XComs used to be for passing small amounts of data between tasks, but I think this limitation is gone given you can set an XCom backend to a cloud object store (GCS, S3, ABFS). But do we really want to do that, given that Spark is designed to deal with large amounts of data (although I totally get that it can make sense for small amounts, as showcased in your example)? To rephrase my question: wouldn't it be better to code your PySpark job so that the result is written to cloud object storage instead of passing it around through XCom?
(3) In a higher environment (prod), would you still deploy Spark as a Kubernetes application (Docker in your case, but it is a container), or would it be better to leverage the fact that PySpark applications can use Kubernetes as a cluster manager?
Thanks again for the great video :)
@MarcLamberti a month ago
Lots of great remarks here :)
1) It depends. Some imports naturally take time (like numpy). My recommendation is to make a local import if you use it in only one task; otherwise you can make it top-level IF it's not a heavy import (like numpy again). You can verify that with the DAG's parsing time.
2) Absolutely. Here the data is very small, so it doesn't matter. But if you have large data, then it's better to offload that work to Spark. That being said, you should always know what you're doing. It's OK to pass/process data through Airflow tasks as long as you know the limitations of your architecture/resources.
3) In prod, I would deploy Spark as a Kubernetes application and use Spark Connect as explained here: airflow.apache.org/docs/apache-airflow-providers-apache-spark/stable/decorators/pyspark.html#spark-connect <3
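To illustrate point 1, a quick sketch of keeping a heavy import local to a task; pandas stands in for any slow-to-import dependency here. The local import only runs when the task executes, so it adds nothing to the scheduler's DAG-parsing time:

```python
from airflow.decorators import task

@task
def summarize(path: str) -> int:
    # Local import: only paid when the task runs, not on every
    # DAG file parse by the scheduler.
    import pandas as pd

    df = pd.read_csv(path)
    return len(df)
```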
@sseeer-r5d a month ago
Nice. Adding YARN + Tez and HDFS would make it a great e2e project.
@fixxer47 a month ago
I don't have the memberOf attribute in the user's internal attributes. How did you configure your LDAP so it creates the memberOf attribute?
@JohnGunvaldson a month ago
I think this is going to work great. I have a hundred or so PHP files to schedule and call from Airflow via the SSHOperator, and the calls differ only in the name, and sometimes the location, of the PHP file; all other tasks are the same (this solves a legacy issue left over from years past). I can control scheduling in batches and arrange for any specifics with properties...
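When the calls differ only in the script path, generating the tasks in a loop keeps the DAG compact. A minimal sketch, assuming the SSH provider is installed and an Airflow connection named legacy_php_host exists; the DAG id and file paths are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.ssh.operators.ssh import SSHOperator

# Illustrative list; in practice this could be read from a config file.
PHP_SCRIPTS = [
    "/var/www/jobs/report_a.php",
    "/var/www/jobs/report_b.php",
    "/var/www/legacy/cleanup.php",
]

with DAG("legacy_php_jobs", start_date=datetime(2024, 1, 1),
         schedule="@daily", catchup=False):
    for script in PHP_SCRIPTS:
        name = script.rsplit("/", 1)[-1].removesuffix(".php")
        SSHOperator(
            task_id=f"run_{name}",
            ssh_conn_id="legacy_php_host",  # assumed connection id
            command=f"php {script}",
        )
```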
@KarlKloppenborg 2 months ago
Loving your energy!
@ruksanabegum1895 2 months ago
You are simply "The Best" <3
@starlord9109 2 months ago
Hey Marc, now please tell me how to stop the repetition of tasks, since no one explained it in the comments 😂
@abhipoornisnsadvocate5158 2 months ago
Hi Marc, the Glue job finishes in 5 minutes, but it still shows as running in my local Astro Airflow instance.
@JyotiMalik-f5u 2 months ago
I am unable to restart the Airflow instance after 29:30. Can anyone please help?
@mohd_fawad 2 months ago
Same here. @MarcLamberti would you be able to help us?
@sewingwithcope 2 months ago
Wow this was such a great tutorial! Very easy to understand and can’t wait to try it myself!