Build An Airflow Data Pipeline To Download Podcasts [Beginner Data Engineer Tutorial]

34,113 views

Dataquest

A day ago

Comments: 44
@manyes7577 2 years ago
I think you’re the best data science lecturer so far. Keep going, and thanks for your hard work.
@Dataquestio 2 years ago
Thanks :) -Vik
@devanshsharma5159 1 year ago
Beautiful explanation and a great project to get me started! Many thanks, Vik! One thing to add from my experience: I installed Airflow on my M1 Mac and it started fine, but I couldn't run any of the tasks from the video (not even get_episodes). To solve that, I created an EC2 instance, and with some tweaks everything ran :D
@mahmoodhossainfarsim6292 1 year ago
It was very useful. Thank you. It would be really helpful if you covered Apache Hadoop, Spark, MLflow, Flink, Flume, Pig, Hive, etc. Thank you.
@dataprofessor_ 2 years ago
Can you make more advanced Apache Airflow tutorials too?
@HieuLe-tw7qm 1 year ago
Thank you very much for this amazing tutorial :D
@Maheshwaripremierleague 6 months ago
If you are facing an issue at the database-creation step, where your DAG keeps running and never completes, put this line after the package imports: os.environ['NO_PROXY'] = '*'. It will then work for sure.
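A minimal sketch of where that line could go, assuming the DAG lives in a single Python file and that the hang is related to proxy lookups (the thread does not confirm the cause). The variable has to be set before any task code makes a network request, so the top of the file is the safe place.

```python
import os

# Assumption: this runs when the DAG file is imported, before any task
# function issues a network request.
os.environ["NO_PROXY"] = "*"

# Quick check that the variable is visible to code running in this process.
print(os.environ["NO_PROXY"])
```

Setting it inside the DAG file keeps the workaround with the pipeline instead of relying on every shell that starts Airflow having the variable exported.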
@demohub 1 year ago
This video was a great resource. Thanks for the tutelage and your take on it.
@kiish8571 2 years ago
This is very educational, thanks a lot. I was wondering if you will be making a video on the automatic transcriptions?
@Dataquestio 2 years ago
Yes, I will be doing a webinar for this tomorrow, and the video should be live next week on YouTube. -Vik
@lalumutawalli9497 2 years ago
Thank you for your tutorials. Could you let me know which Airflow version you use in the tutorial, so I can practice with it?
@lightman2130 2 years ago
What an amazing tutorial! Thanks a lot.
@Funkykeyzman 2 years ago
Debug tip #1: If you run into the error "conn_id isn't defined", use the Airflow browser interface to create the connection instead. Select Admin --> Connections --> +
Debug tip #2: If your Airflow runs fail, try logging out of the Airflow UI and restarting the Airflow server by pressing Ctrl + C and then running airflow standalone.
@parikshitchavan2211 1 year ago
Hello Vikas, thanks for such a great tutorial; you made everything smooth like butter. Just one question: whenever we make a new DAG, will we have to add docker-compose-CeleryExecutor, docker-compose-LocalExecutor, and Config for that particular DAG?
@rohitpandey9920 1 year ago
I am stuck at 14:50, where you try to run the task in Airflow. You simply switched from the PyCharm terminal to the git master terminal without any explanation. I am unable to connect to SQLite from the PyCharm terminal, and I couldn't establish a connection with Airflow either. Please guide me through this.
@diasfetti8393 1 year ago
👍👍👍 Excellent tutorial. Thanks a lot.
@lolujames7668 2 years ago
nice one @Vik
@vish949 1 year ago
Whenever I run airflow standalone (or even airflow webserver), I get a ModuleNotFoundError for pwd. I'm running it on Windows. How do I solve this?
@OBGynKenobi 1 year ago
So where is the dependency chain where you set the actual task flow? I would have expected something like task1 >> task2, etc. at the bottom of the DAG.
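One likely answer, assuming the DAG in the video is written with Airflow's TaskFlow decorators (the @task style), which this thread does not confirm: in that style the order is implied by passing one task's return value into the next task call, so no explicit task1 >> task2 line is needed. The toy below is not Airflow code. It is a stdlib-only stand-in for the decorator that records an edge whenever one task's output is handed to another. The task name load_episodes is a hypothetical placeholder.

```python
from typing import Callable, List, Tuple

# (upstream, downstream) pairs recorded while the tasks are wired together.
edges: List[Tuple[str, str]] = []


class TaskOutput:
    """Placeholder for a task's future result; it only remembers its producer."""

    def __init__(self, producer: str):
        self.producer = producer


def task(func: Callable) -> Callable:
    """Toy decorator: calling the task wires it up instead of running its body."""

    def wire(*args) -> TaskOutput:
        for arg in args:
            if isinstance(arg, TaskOutput):
                # Receiving another task's output implies a dependency on it.
                edges.append((arg.producer, func.__name__))
        return TaskOutput(func.__name__)

    return wire


@task
def get_episodes(): ...


@task
def load_episodes(episodes): ...  # hypothetical task name


@task
def download_episodes(episodes): ...


podcast_episodes = get_episodes()
load_episodes(podcast_episodes)
download_episodes(podcast_episodes)
print(edges)
```

Both downstream tasks end up depending on get_episodes purely because they were called with its output, which is the same information a line such as get_episodes >> load_episodes would state explicitly.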
@NishanthVarma-m2q 1 year ago
At 33:48, how did we get the message 'Done loading. Loaded a total of 0 rows'? We haven't used this text anywhere in our code. Is this the work of hook.insert_rows?
@investandcyclecheap4890 2 years ago
Really liked this tutorial. The download_episodes task is freezing on me. The task is "running", but it appears to be getting held up and has not actually downloaded the first episode for some reason.
@Dataquestio 2 years ago
That's strange. Can you access the podcast site in your browser? It may be blocking you for some reason. It's also possible that the Airflow task executor isn't running properly.
@sungwonbyun5683 1 year ago
@Dataquestio I ran into the same issue, except that it happens on the very first task, "get_episodes": nothing happens and it eventually times out. I tested the script in a Python console and it returned the list of episodes just fine.
@sungwonbyun5683 1 year ago
Fix for me was to start the airflow server as the root user; "sudo airflow standalone"
@bryancapulong147 1 year ago
My download_episodes task succeeds but I cannot see the mp3 files
@yousufm.n2515 1 year ago
When I change the 'dags_folder' path, everything breaks in Airflow. What could be the reason?
@thangnguyen3786 1 year ago
Hi everyone. I have configured Airflow with Docker in a folder that contains the Docker YAML file. Now I want to use Airflow in another folder. How can I do that without the Docker YAML file? Must I configure it again in that folder?
@СлаваУкраине-ь2т 1 year ago
Hello. I did everything as shown, but it fails and no logs are visible.
@dolamuoludare4383 1 year ago
Please help: when I write my DAG in VS Code, it doesn't show up in the web UI, and I keep getting this DAGNOTFOUND error.
@youssefelfhayel7078 1 year ago
Add these lines to your airflow.cfg config file:
min_file_process_interval = 0
dag_dir_list_interval = 30
The DAGs will then be updated automatically. P.S.: Be sure that your DAG is in the dags folder.
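A hedged sketch of applying those two settings programmatically, assuming Airflow 2.x, where both keys sit in the [scheduler] section of airflow.cfg. The helper name set_dag_refresh_options is made up for illustration, and it works on the config text rather than on a real file so that nothing on disk is touched.

```python
import configparser
import io


def set_dag_refresh_options(cfg_text: str) -> str:
    """Return cfg_text with the two DAG refresh settings applied."""
    # interpolation=None keeps any literal '%' characters in the file intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    # Assumption: both keys belong to the [scheduler] section (Airflow 2.x).
    if not parser.has_section("scheduler"):
        parser.add_section("scheduler")
    parser.set("scheduler", "min_file_process_interval", "0")
    parser.set("scheduler", "dag_dir_list_interval", "30")

    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical starting config, for demonstration only.
example = "[core]\ndags_folder = /tmp/dags\n"
print(set_dag_refresh_options(example))
```

Note that configparser drops comments when it writes the config back out, so for a real airflow.cfg it may be simpler to edit the two lines by hand and restart the scheduler.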
@NishanthVarma-m2q 1 year ago
I need help. For the final task, it reported 'no such file or directory' for audio_path, so I used 'os.makedirs(audio_path, exist_ok=True)'. The DAG then succeeded, but I couldn't find any files in my episodes folder. Please help.
@Maheshwaripremierleague 6 months ago
import os
import requests

def download_episodes(episodes):
    for episode in episodes:
        filename = f"{episode['link'].split('/')[-1]}.mp3"
        audio_path = os.path.join("episodes", filename)
        # Create the target folder if it does not exist yet.
        if not os.path.exists("episodes"):
            os.makedirs("episodes")
        # Skip episodes that were already downloaded.
        if not os.path.exists(audio_path):
            print(f"Downloading {filename}")
            audio = requests.get(episode["enclosure"]["@url"])
            with open(audio_path, "wb+") as f:
                f.write(audio.content)
@Maheshwaripremierleague 6 months ago
It will create the episodes folder if it does not already exist.
@Maheshwaripremierleague 6 months ago
It happens because your Airflow may not be pointing at the folder you expect, so it creates the folder wherever it is pointing instead. You can then search for the folder to find out where it is.
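Rather than searching the disk afterwards, a small sketch like the following can print where a relative folder name would land. It assumes the task writes to a relative path such as "episodes", as in the snippet earlier in this thread. The helper name resolve_audio_dir is made up for illustration.

```python
from pathlib import Path


def resolve_audio_dir(folder: str = "episodes") -> Path:
    """Return the absolute path a relative folder name points to right now."""
    return (Path.cwd() / folder).resolve()


# Relative paths are resolved against the process's working directory,
# which for an Airflow task may differ from the folder holding the DAG file.
print(f"Working directory: {Path.cwd()}")
print(f"Episodes would be written to: {resolve_audio_dir()}")
```

Calling this from inside the task and reading the task log shows the actual download location. Switching the task to an absolute path removes the ambiguity altogether.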
@DayOneCricket 10 months ago
Your first bit on setting the environment variable didn't work.
@omarhossam285 1 year ago
How did you change your terminal to show git:(master)?
@Dataquestio 1 year ago
I use a shell called zsh. There is a plugin for zsh that can show you your git branch.
@omarhossam285 1 year ago
@Dataquestio thx man
@aminatlawal21 2 years ago
How did he get the web page metadata in XML?
@HJesse88 2 years ago
Look at the link in the video, type that link into a Chromium browser, and it should appear. Voilà.
@rohitpandey9920 1 year ago
@HJesse88 It didn't appear for me.
@parkuuu 2 years ago
Awesome tutorial! Just had some confusion on the transform and loading function, particularly this code: stored = hook.get_pandas_df('SELECT * FROM episodes;') I thought you were querying from the episode list that was returned from the extract function, but suddenly realized that it was also the same as the Table name in the database lol.
@Dataquestio 2 years ago
Hi Park - that code is selecting from the sqlite3 database that we create. It's making sure the podcast hasn't been stored in the database yet (if it has, we don't need to store it again).
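The check described above can be sketched with the standard-library sqlite3 module. This is an illustration under stated assumptions, not the tutorial's code: it swaps the Airflow hook and pandas query for plain sqlite3, and the table columns (link, title), the function name load_new_episodes, and the example.com links are hypothetical.

```python
import sqlite3


def load_new_episodes(conn: sqlite3.Connection, episodes: list) -> int:
    """Insert only episodes whose link is not stored yet; return how many."""
    # Read what the episodes table already holds.
    stored_links = {row[0] for row in conn.execute("SELECT link FROM episodes;")}
    # Keep only feed entries that are not in the table.
    new_rows = [
        (ep["link"], ep["title"])
        for ep in episodes
        if ep["link"] not in stored_links
    ]
    conn.executemany("INSERT INTO episodes (link, title) VALUES (?, ?);", new_rows)
    conn.commit()
    return len(new_rows)


# In-memory database with a hypothetical two-column schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (link TEXT PRIMARY KEY, title TEXT);")

feed = [
    {"link": "https://example.com/ep-1", "title": "Episode 1"},
    {"link": "https://example.com/ep-2", "title": "Episode 2"},
]
first_run = load_new_episodes(conn, feed)
second_run = load_new_episodes(conn, feed)
print(first_run, second_run)
```

Running the load twice with the same feed inserts rows only the first time, which is why a rerun of the pipeline can legitimately report that zero rows were loaded.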