In this session we kick off the Data Streaming series with PySpark and Kafka. We covered PySpark basics and built a batch processing data pipeline in the previous session, link in the description below. Today we will set up Apache Kafka, Debezium, and Postgres for data streaming. Once the setup is complete, we will develop a Kafka producer and consumer to test the data stream. Set up your environment now and get ready for data streaming!
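The producer and consumer from the video can be sketched with the kafka-python library. This is a minimal sketch, not the exact code from the session: the broker address (localhost:9092), topic name (test-topic), and sample record are assumptions for illustration.

```python
# Minimal Kafka producer/consumer sketch using kafka-python.
# Assumptions (not from the video): broker at localhost:9092, topic "test-topic".
import json


def serialize(record: dict) -> bytes:
    """JSON-encode a record as the Kafka message value."""
    return json.dumps(record).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Decode a Kafka message value back into a dict."""
    return json.loads(payload.decode("utf-8"))


def run_producer():
    # Imported here so the helpers above work without a broker installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=serialize,
    )
    producer.send("test-topic", {"id": 1, "message": "hello"})
    producer.flush()  # block until the message is actually sent


def run_consumer():
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "test-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # read the topic from the beginning
        value_deserializer=deserialize,
    )
    for msg in consumer:
        print(msg.value)
```

Run the consumer in one terminal and the producer in another; the consumer should print each record the producer sends.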
Link to PostgreSQL install video: • How to install Postgre...
Link to Docker video: • Why you should learn d...
Link to ETL Pipeline with PySpark video: • How to Build ETL Pipel...
Initialize replication permissions
Add the following lines to the end of the pg_hba.conf PostgreSQL configuration file. These lines configure client authentication for database replication.
############ REPLICATION ##############
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
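In addition to the pg_hba.conf entries above, Debezium's Postgres connector reads changes through logical decoding, which must be enabled in postgresql.conf (a server restart is required). A minimal fragment, with slot and sender counts chosen here only as illustrative values, might look like:

```
# postgresql.conf — Debezium captures changes via logical decoding
wal_level = logical          # default is 'replica'; 'logical' enables decoding
max_wal_senders = 4          # allow replication connections
max_replication_slots = 4    # Debezium creates a replication slot per connector
```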
#apachekafka #DataStreaming #etl
💻 GitHub: github.com/hnawaz007/pythonda...
💥Subscribe to our channel:
/ haqnawaz
📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣
🔗 GitHub: github.com/hnawaz007
📸 Instagram: / bi_insights_inc
📝 LinkedIn: / haq-nawaz
🔗 / hnawaz100
-----------------------------------------
Topics covered in this video:
0:00 - Introduction: Apache Kafka, Debezium and requirements
1:36 - Postgres Logical Replication
2:43 - Docker Setup Overview
3:57 - Postgres Setup
6:27 - Kafka and Debezium Docker Setup
10:27 - Kafka Manager UI and Topic Creation
12:14 - Python Kafka Producer and Consumer
15:03 - Kafka Console Interaction