Set up Debezium, Apache Kafka and Postgres for real time Data Streaming | Real Time ETL | ETL

23,465 views

BI Insights Inc

1 day ago

In this video we kick off the Data Streaming series with PySpark and Kafka. We covered PySpark basics and developed a batch processing data pipeline in the previous session (link in the description below). Today we will set up Apache Kafka, Debezium and Postgres for data streaming. Once the setup is complete, we develop a Kafka producer and consumer to test the data streaming. Set up your environment now and get ready for data streaming!
Link to PostgreSQL install video: • How to install Postgre...
Link to Docker video: • Why you should learn d...
Link to ETL Pipeline with PySpark video: • How to Build ETL Pipel...
Initialize replication permissions
Add the following lines to the end of the pg_hba.conf PostgreSQL configuration file. These lines configure the client authentication for the database replication.
############ REPLICATION ##############
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
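As a minimal sketch of the step above, the following Python helper appends the three replication rules to a pg_hba.conf file only if they are not already present. The function name `add_replication_rules` is hypothetical and not from the video; the file location differs per installation (running `SHOW hba_file;` in psql reports it).

```python
from pathlib import Path

# The three replication rules quoted in the description above.
REPLICATION_RULES = [
    "local replication postgres trust",
    "host replication postgres 127.0.0.1/32 trust",
    "host replication postgres ::1/128 trust",
]


def add_replication_rules(pg_hba_path):
    """Append any missing replication rule; return the rules that were added."""
    path = Path(pg_hba_path)
    text = path.read_text()
    # Compare whitespace-normalised lines so a second run adds nothing twice.
    present = {" ".join(line.split()) for line in text.splitlines()}
    missing = [rule for rule in REPLICATION_RULES if rule not in present]
    if missing:
        # Make sure the new block starts on its own line.
        prefix = "" if (not text or text.endswith("\n")) else "\n"
        block = "############ REPLICATION ##############\n" + "\n".join(missing) + "\n"
        path.write_text(text + prefix + block)
    return missing
```

After editing the file, PostgreSQL has to re-read it, for example with `SELECT pg_reload_conf();` or a service restart.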
#apachekafka #DataStreaming #etl
💻 GitHub: github.com/hnawaz007/pythonda...
🔗 GitHub: github.com/hnawaz007
Topics covered in this video:
0:00 - Introduction to Apache Kafka, Debezium and requirements
1:36 - Postgres Logical Replication
2:43 - Docker Setup Overview
3:57 - Postgres Setup
6:27 - Kafka and Debezium Docker Setup
10:27 - Kafka Manager UI and Topic Creation
12:14 - Python Kafka Producer and Consumer
15:03 - Kafka Console Interaction
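
The producer and consumer test mentioned above could look roughly like the sketch below. It is not the code from the video (that lives in the GitHub repository linked above); it assumes the `kafka-python` package, a broker reachable at `localhost:9092`, and a topic named `test-topic`, all of which are placeholders.

```python
import json


def serialize(record):
    """Encode a dict as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def deserialize(payload):
    """Decode UTF-8 JSON bytes received from Kafka back into a dict."""
    return json.loads(payload.decode("utf-8"))


def produce(records, topic="test-topic", servers="localhost:9092"):
    """Send each record to the topic (topic and broker address are assumptions)."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=servers, value_serializer=serialize)
    for record in records:
        producer.send(topic, value=record)
    producer.flush()  # block until buffered messages are delivered
    producer.close()


def consume(topic="test-topic", servers="localhost:9092", timeout_ms=5000):
    """Read messages from the start of the topic until it has been idle for timeout_ms."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        auto_offset_reset="earliest",    # start from the oldest message
        consumer_timeout_ms=timeout_ms,  # stop iterating when idle this long
        value_deserializer=deserialize,
    )
    try:
        return [message.value for message in consumer]
    finally:
        consumer.close()
```

Calling `produce([{"id": 1}])` and then `consume()` against a running broker should return the records that were sent, which is enough to confirm the pipeline is wired up.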

Comments: 37
@BiInsightsInc 5 months ago
Link to the series: kzbin.info/aero/PLaz3Ms051BAkwR7d9voHsflTRmumfkGVW
@bralabala 1 year ago
this could not have come at a better time, thank you!!!
@rezahamzeh3736 1 year ago
Great content! Looking forward to the next one
@sunsas4275 1 year ago
Hello, this is a really great video, thank you for sharing. By the way, I have a question: in my current production project we have 3 different Postgres instances hosting different databases, so I would probably need 3 Debeziums, one for each database. Would it be fine to have 3 Debeziums, 1 Kafka, 1 Zookeeper and 1 schema registry running within a docker-compose on 1 virtual machine? (Let's say the VM has 4 CPUs and 8 GB of RAM.)
@BiInsightsInc 1 year ago
You can have a single Debezium container connecting to and reading from three different databases. Simply set up a different connector for each one. However, for a production environment I would create a proper cluster for fault tolerance, so I would advise three Kafka instances in case of failures. If one instance fails, another can cover for it until the first comes back online.
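A rough sketch of the "one Debezium, several connectors" idea: each database gets its own connector definition, and each definition is posted to the Kafka Connect REST API that Debezium runs on. The connector names, hostnames, credentials and the `localhost:8083` address are placeholders rather than values from the video, and `topic.prefix` is the Debezium 2.x property name (1.x releases use `database.server.name` instead).

```python
import json
import urllib.request


def postgres_connector(name, host, dbname, user="postgres", password="postgres", port=5432):
    """Build one Debezium Postgres connector definition (all values are placeholders)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": name,  # Debezium 2.x; 1.x uses database.server.name
            "slot.name": name.replace("-", "_") + "_slot",  # one slot per connector
        },
    }


def register(connector, connect_url="http://localhost:8083"):
    """POST a connector definition to Kafka Connect and return the HTTP status."""
    request = urllib.request.Request(
        connect_url + "/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Three hypothetical source databases served by a single Debezium container.
DATABASES = [
    {"name": "sales-db", "host": "pg-sales", "dbname": "sales"},
    {"name": "billing-db", "host": "pg-billing", "dbname": "billing"},
    {"name": "users-db", "host": "pg-users", "dbname": "users"},
]
CONNECTORS = [postgres_connector(**db) for db in DATABASES]
```

Running `register(c)` for each entry in `CONNECTORS` would create the three connectors; giving each one a distinct name, topic prefix and replication slot keeps their change streams separate.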
@fakrifarid4434 1 year ago
When will the next video be uploaded, sir? I'm waiting. By the way, thank you for sharing, nice video!
@BiInsightsInc 1 year ago
Thanks Fakri. Next video in this series is underway. Will be up sometime next week.
@kshitijbansal3672 1 year ago
Hi Sir, I need to load data from an on-prem SQL Server to S3, then perform ETL on the S3 data using AWS Glue, load the transformed data into a new S3 location, and finally load it from S3 into Redshift. From your videos I am able to do the first step of loading from SQL Server to S3, but how do I do the next steps? Please help.
@BiInsightsInc 1 year ago
Please watch the whole AWS series and you will find the solution to the next step, moving data from S3 to Redshift. AWS series: kzbin.info/aero/PLaz3Ms051BAnncfOUYQOum1MRca_Cwy2h
@JoshDingus 4 months ago
Can these be run commercially? It seems like a lot of the Confluent products had commercial exceptions without a fee.
@BiInsightsInc 4 months ago
Yes, you can run them commercially. However, they do have some exceptions under the "Excluded Purpose" clause, so I'd contact Confluent about the exact exceptions. Here is a similar question on their site: forum.confluent.io/t/what-if-i-use-confluent-kafka-community-version-commercially/9960
@cvarak3 1 month ago
Hi, would you suggest this method to extract data from an active Postgres table that has ~5 billion rows? If not, do you have any videos on the method you would suggest to extract from Postgres to S3? Thanks! (I tried with Airbyte but it keeps failing.)
@BiInsightsInc 1 month ago
Hi, if you have a Kafka cluster running then you can stream data from Postgres to Kafka. A cluster can handle large datasets. You can stand up your own or use Confluent Cloud. Once this setup is in place, configure an S3 sink connector. I have covered that in the following video: kzbin.info/www/bejne/oJDHdoimi56medE
@gidaltilopes9677 6 months ago
This is a great video, but I don't understand where the config for the integration between Debezium and Postgres is. I can only see the config for Debezium > Kafka. I can't see Postgres > Debezium > Kafka.
@BiInsightsInc 6 months ago
Thanks. Debezium interacts with Postgres via a connector. The connector configs are covered in detail in part 2 of this series. Here is the link: kzbin.info/www/bejne/rpmco4mJprN7g6s&t Here is the link to the whole playlist: kzbin.info/aero/PLaz3Ms051BAkwR7d9voHsflTRmumfkGVW
@rahmadiyanmuhammad9208 7 months ago
Is there a Docker image of kafka-manager for the arm64 architecture?
@BiInsightsInc 7 months ago
Not that I am aware of. You can raise an issue in their GitHub repo and they may add support for it. Here is their official repo: github.com/yahoo/CMAK
@udaynj 3 months ago
You didn't mention updating the pg_hba.conf file in the PostgreSQL data directory. That needs to be done to enable Debezium to work with Postgres.
@BiInsightsInc 3 months ago
I believe you can add the following lines to the end of the pg_hba.conf PostgreSQL configuration file to initialize replication permissions. These lines configure the client authentication for database replication.
############ REPLICATION ##############
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
@thalibarrifqi 3 months ago
The setting didn't change after I restarted the PostgreSQL server; it is still replica. I am running this in a Docker container.
@BiInsightsInc 3 months ago
Here is a comment from a community member for Docker users: if you're using PostgreSQL in a Docker container, run ALTER SYSTEM SET wal_level = logical; to change the wal_level to logical.
@udaynj 4 months ago
There are so many moving parts and components that I wonder how stable this would be in a production environment. You have PostgreSQL replication, Debezium, Zookeeper, and Kafka with other components used by Zookeeper. How stable and reliable is this if you were to use it in a production environment?
@BiInsightsInc 4 months ago
Most large or complex systems have many components that work together to deliver certain functionality. In any case, many companies use Kafka in production. In newer versions Zookeeper is no longer required, so you will have three components: the database, Debezium and Kafka. The most challenging component to manage is Kafka, but you can opt for a managed instance.
@banhkha3917 9 months ago
Please help me. I have already changed the Postgres wal_level to logical and restarted the services, but it had no effect.
@banhkha3917 9 months ago
After the restart, when I ran the query again I got this message: "The application has lost the database connection: ⁃ If the connection was idle it may have been forcibly disconnected. ⁃ The application server or database server may have been restarted. ⁃ The user session may have timed out. Do you want to continue and establish a new session?" I pressed Continue, and my wal_level is still replica. wal_level in postgresql.conf is logical, but that is not the value actually in effect.
@BiInsightsInc 9 months ago
@banhkha3917 You can set this in the postgresql.conf file. Open the file, change the following setting, then restart the Postgres service.
wal_level = logical # minimal, replica, or logical
Also try to connect to this Postgres database from outside this project to make sure you are able to connect. Here are a few steps to remedy Postgres connection issues. In the Postgres installation directory, locate and open postgresql.conf and add this line:
listen_addresses = '*'
Then open the file named pg_hba.conf and add this line:
host all all 0.0.0.0/0 md5
Now restart your PostgreSQL server and try again.
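When postgresql.conf says logical but `SHOW wal_level;` still reports replica, one thing worth checking is which value the configuration files actually resolve to. PostgreSQL uses the last uncommented occurrence of a setting, and postgresql.auto.conf (written by ALTER SYSTEM) is read after postgresql.conf, so it overrides it. The sketch below applies those two rules to the text of both files; the function names are hypothetical, and it ignores `include` directives and command-line overrides.

```python
import re

# Matches an uncommented "wal_level = value" or "wal_level value" line.
_SETTING = re.compile(r"^\s*wal_level\s*(?:=|\s)\s*'?(\w+)'?", re.IGNORECASE)


def wal_level_in(conf_text):
    """Return the last uncommented wal_level in one config file's text, or None."""
    value = None
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            value = match.group(1).lower()  # later lines override earlier ones
    return value


def configured_wal_level(postgresql_conf, auto_conf=""):
    """Resolve wal_level across both files; postgresql.auto.conf wins if it sets one."""
    # "replica" is the server default when neither file sets the parameter.
    return wal_level_in(auto_conf) or wal_level_in(postgresql_conf) or "replica"
```

If this resolves to logical but the running server still shows replica, the server may be reading a different file than the one edited (`SHOW config_file;` reports the path in use) or may not have fully restarted; wal_level only changes on a restart, not on a reload.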
@DataEngineeringToolbox 10 months ago
Hi, Does this solution work for SQL Server?
@BiInsightsInc 10 months ago
Yes, this works with SQL Server and most relational databases. Here is an example of SQL Server: kzbin.info/www/bejne/nYHZqKmheLuGpLs
@DataEngineeringToolbox 10 months ago
@BiInsightsInc Thanks. I have another question: if I want to load data from SQL Server into a Kafka topic, do I need Debezium?
@MahadiHasanshimul 1 year ago
Would you please point me to a reference I can use to implement this in an Ubuntu/Linux environment?
@BiInsightsInc 1 year ago
Docker is available on Linux. Since this setup is Docker based, it will be similar there. Hope this helps.
@MahadiHasanshimul 1 year ago
@BiInsightsInc Would you please tell me how to configure the Postgres DB on Linux for Debezium Kafka Connect? I tried on Ubuntu as per your video, but it is not working: when I send the /connect request to Debezium, the connection is refused.
@BiInsightsInc 1 year ago
@MahadiHasanshimul You can try to connect to the Postgres DB with a client like PgAdmin to make sure you are able to reach the database engine. Then try it from Python with the user/password combination to make sure the database is accessible to that user. In addition, check the Debezium container's logs. They tell you why a connector is failing. This will help you troubleshoot the issue at hand.
@MahadiHasanshimul 1 year ago
@BiInsightsInc I have checked using DataGrip and a Spring Boot application, and the database is accessible, but when I POST the JSON config to /connectors, the Debezium connector cannot connect to the database.
@BiInsightsInc 11 months ago
@MahadiHasanshimul Try to connect to this Postgres database from outside this project and make sure you are able to connect. Here are a few steps to remedy Postgres connection issues. In the Postgres installation directory, locate and open postgresql.conf and add this line:
listen_addresses = '*'
Then open the file named pg_hba.conf and add this line:
host all all 0.0.0.0/0 md5
Now restart your PostgreSQL server and try again.
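A quick way to narrow down a "connection refused" error like the one in this thread is to test plain TCP reachability of the Postgres port from the same place the connector runs. The sketch below uses only the standard library; the helper name `can_reach` and the host and port in the usage note are placeholders. One possible cause, not confirmed in the thread, is that `localhost` inside the Debezium container refers to the container itself rather than to the machine running Postgres.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames.
        return False
```

For example, `can_reach("localhost", 5432)` run on the host checks that Postgres is listening at all, while the same call run inside the Debezium container, with the hostname taken from the connector config, checks the path the connector actually uses.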
@bralabala 1 year ago
If you're using PostgreSQL in a Docker container, run ALTER SYSTEM SET wal_level = logical; to change the wal_level to logical.
@banhkha3917 9 months ago
Can you help me?