Practical Change Data Streaming Use Cases with Apache Kafka & Debezium

  14,096 views

InfoQ

A day ago

Comments: 6
@filipebanzoli902 4 years ago
Amazing presentation!! Thanks so much for sharing this!
@kashyapsp4656 2 years ago
Amazing. Really informative and to the point.
@davidpeng8431 3 years ago
very informative, great presentation
@sudhanwadindorkar3815 3 years ago
Great presentation and overview of Debezium!! The opening use case of using CDC for dual writes is more of an anti-pattern. If you are already using Kafka (since you are using Debezium), it would be better to emit Kafka events that are modelled after your domain (as described in the outbox pattern section) instead of Debezium events, which are essentially just CDC logs. The advantage there is that you are building a good event-driven architecture, which will pay off in the long term.

A potential downside is scalability, as each database is handled by only one Debezium connector. If you are generating events in the outbox at a rate higher than what a single Debezium connector can process, you cannot really scale to achieve higher throughput. That's just the theory though :-)

Debezium is an excellent choice for ETL implementations - you can pair it with Kafka Streams and you get real-time ETL.
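To make the outbox idea from this comment concrete, here is a minimal sketch in Java, assuming a PostgreSQL database accessed via JDBC. The `orders` table, the JSON payloads, and the `OrderOutboxWriter` class are made up for illustration; the `outbox` column names follow the defaults commonly used with Debezium's outbox event router, but verify them against your Debezium version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

// Hypothetical service: persists the business change and a domain-modelled
// event in one transaction, so Debezium can publish the event from the
// outbox table instead of emitting raw CDC row images.
public class OrderOutboxWriter {

    public void recordOrder(Connection conn, String orderId, String orderJson,
                            String eventJson) throws Exception {
        conn.setAutoCommit(false);
        try (PreparedStatement order = conn.prepareStatement(
                 "INSERT INTO orders (id, payload) VALUES (?, ?::jsonb)");
             PreparedStatement outbox = conn.prepareStatement(
                 "INSERT INTO outbox (id, aggregatetype, aggregateid, type, payload) "
               + "VALUES (?, ?, ?, ?, ?::jsonb)")) {

            order.setString(1, orderId);
            order.setString(2, orderJson);
            order.executeUpdate();

            outbox.setObject(1, UUID.randomUUID());
            outbox.setString(2, "Order");         // used for topic routing
            outbox.setString(3, orderId);         // becomes the Kafka message key
            outbox.setString(4, "OrderCreated");  // domain event type
            outbox.setString(5, eventJson);
            outbox.executeUpdate();

            conn.commit();
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```

Because both inserts commit atomically, consumers only ever see events for changes that actually happened, which is the property that makes this preferable to dual writes.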
@LusidDreaming 3 years ago
Since Debezium tails transaction logs (for databases that support replication via logs), I think it would be difficult to overload Debezium, since it reacts as soon as transactions are committed. However, if you did run into that issue, you could actually remove Debezium and use a simple polling mechanism to publish to Kafka, which would act almost as a (heavy-handed and inefficient) back-pressure mechanism, since the extra reads would slow down your write throughput. This is just a theory; I have no benchmarks showing how frequently you'd have to poll to see this occur. You make a great point about scalability though, as each microservice would need its own Debezium cluster (assuming they all have dedicated outbox tables).
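A rough sketch of the polling fallback described above, assuming a PostgreSQL outbox table with a monotonically increasing `seq` column; the `OutboxPoller` class, topic name, connection details, and poll interval are illustrative placeholders, and a real implementation would persist `lastSeq` instead of keeping it in memory.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Polls the outbox table instead of tailing the transaction log.
// Heavier on the database than log-based CDC, but trivially simple.
public class OutboxPoller {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        long lastSeq = 0L; // in production, persist this offset somewhere durable

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/app", "app", "secret")) {

            while (true) {
                try (PreparedStatement ps = conn.prepareStatement(
                         "SELECT seq, aggregateid, payload FROM outbox "
                       + "WHERE seq > ? ORDER BY seq LIMIT 500")) {
                    ps.setLong(1, lastSeq);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            producer.send(new ProducerRecord<>(
                                "order-events", rs.getString("aggregateid"),
                                rs.getString("payload")));
                            lastSeq = rs.getLong("seq");
                        }
                    }
                }
                producer.flush();
                Thread.sleep(1_000); // poll interval; the extra reads are the cost
            }
        }
    }
}
```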
@ankitgomkale7508 2 years ago
36:47 I think running two CDC pipelines is not required to achieve high availability for connectors. You could run the connector in distributed mode, which lets you have multiple workers in a cluster; only one streams the events at a time while the others stand by. If the worker streaming the events goes down, another node in the cluster kicks in and resumes from the last LSN (Log Sequence Number). This topology obviates the need for the duplicator.
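For context, this is a minimal sketch of registering a Debezium connector with a Kafka Connect cluster that was started in distributed mode, using the standard Connect REST API (`POST /connectors`). The host names, credentials, connector name, and `topic.prefix` value are placeholders, and the exact configuration keys depend on your Debezium version (older releases use `database.server.name` instead of `topic.prefix`).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Registers a Debezium connector with a distributed-mode Kafka Connect cluster.
// The connector runs as a single task; if the worker executing it dies, another
// worker in the group takes over and resumes from the last committed offset/LSN.
public class RegisterConnector {

    public static void main(String[] args) throws Exception {
        String body = """
            {
              "name": "inventory-connector",
              "config": {
                "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                "tasks.max": "1",
                "database.hostname": "postgres",
                "database.port": "5432",
                "database.user": "debezium",
                "database.password": "secret",
                "database.dbname": "inventory",
                "topic.prefix": "inventory"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://connect:8083/connectors")) // any worker in the cluster
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```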