In this session we kick off the Data Streaming series with PySpark and Kafka. We covered PySpark basics and built a batch processing data pipeline in the previous session, link in the description below. Today we will set up Apache Kafka, Debezium, and Postgres for data streaming. Once the setup is complete, we will develop a Kafka producer and consumer to test the data stream. Set up your environment now and get ready for data streaming!
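The producer and consumer from the video can be sketched with the kafka-python library. This is a minimal sketch, not the exact code from the session: the broker address (localhost:9092), topic name (test-topic), and sample record are assumptions for illustration.

```python
# Minimal Kafka producer/consumer sketch using kafka-python.
# Assumptions (not from the video): broker at localhost:9092, topic "test-topic".
import json


def serialize(record: dict) -> bytes:
    """JSON-encode a record as the Kafka message value."""
    return json.dumps(record).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Decode a Kafka message value back into a dict."""
    return json.loads(payload.decode("utf-8"))


def run_producer():
    # Imported here so the helpers above work without a broker installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=serialize,
    )
    producer.send("test-topic", {"id": 1, "message": "hello"})
    producer.flush()  # block until the message is actually sent


def run_consumer():
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "test-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # read the topic from the beginning
        value_deserializer=deserialize,
    )
    for msg in consumer:
        print(msg.value)
```

Run the consumer in one terminal and the producer in another; the consumer should print each record the producer sends.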
Link to PostgreSQL install video: • How to install Postgre...
Link to Docker video: • Why you should learn d...
Link to ETL Pipeline with PySpark video: • How to Build ETL Pipel...
Initialize replication permissions
Add the following lines to the end of the pg_hba.conf PostgreSQL configuration file. These lines configure client authentication for database replication.
############ REPLICATION ##############
local replication postgres trust
host replication postgres 127.0.0.1/32 trust
host replication postgres ::1/128 trust
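In addition to the pg_hba.conf entries above, Debezium's Postgres connector reads changes through logical decoding, which must be enabled in postgresql.conf (a server restart is required). A minimal fragment, with slot and sender counts chosen here only as illustrative values, might look like:

```
# postgresql.conf — Debezium captures changes via logical decoding
wal_level = logical          # default is 'replica'; 'logical' enables decoding
max_wal_senders = 4          # allow replication connections
max_replication_slots = 4    # Debezium creates a replication slot per connector
```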
#apachekafka #DataStreaming #etl
💻 GitHub: github.com/hnawaz007/pythonda...
💥Subscribe to our channel:
/ haqnawaz
📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣
🔗 GitHub: github.com/hnawaz007
📸 Instagram: / bi_insights_inc
📝 LinkedIn: / haq-nawaz
🔗 / hnawaz100
-----------------------------------------
Topics covered in this video:
0:00 - Introduction: Apache Kafka, Debezium and requirements
1:36 - Postgres Logical Replication
2:43 - Docker Setup Overview
3:57 - Postgres Setup
6:27 - Kafka and Debezium Docker Setup
10:27 - Kafka Manager UI and Topic Creation
12:14 - Python Kafka Producer and Consumer
15:03 - Kafka Console Interaction