Martin Kleppmann | Kafka Summit London 2019 Keynote | Is Kafka a Database?

33,605 views

Confluent

A day ago

Comments

@demokraken 4 years ago

Martin's book is a marvel, highly recommended for people interested in the design of distributed applications.

@harshitsinghai1395 4 years ago

His book is my first tech book ever. I'm proud to have chosen his book as my first. Totally worth it.

@kevinhock1041 5 years ago

Really awesome talk, his book is great too

@xinyuanliu1959 3 years ago

Trying to make some notes here. By relying on the ordering of messages in a Kafka topic partition, we achieve serializable execution of these transactions, because the stream processor for each individual partition is just a single-threaded, sequential process. We get scalability by processing many partitions in parallel, with messages routed by a partition key. A database transaction is broken down into a multi-stage streaming pipeline in Kafka. We can get better consistency than many real databases.
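A minimal sketch of the idea in the notes above, simulated in plain Python rather than with a real Kafka client. The username-claim workload, the partition count, and the names `partition_for` and `claim_usernames` are all assumptions made for illustration; `crc32` is only a stand-in for whatever hash a real partitioner uses.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (crc32 as a stand-in)."""
    return crc32(key.encode("utf-8")) % num_partitions


def claim_usernames(requests):
    """Process username claims one partition at a time.

    requests: list of (request_id, username) tuples in arrival order.
    Returns {request_id: True if the claim won, False if the name was taken}.
    """
    # Append every request to the log of the partition that owns its key,
    # so all requests for the same username land in the same ordered log.
    logs = defaultdict(list)
    for request_id, username in requests:
        logs[partition_for(username)].append((request_id, username))

    results = {}
    # Each partition is consumed by one sequential loop. Partitions share no
    # state, so in a real deployment these loops could run in parallel.
    for log in logs.values():
        taken = set()  # state local to this partition
        for request_id, username in log:
            if username in taken:
                results[request_id] = False
            else:
                taken.add(username)
                results[request_id] = True
    return results
```

Because every request for a given key is handled by the same sequential loop, the check and the update never interleave with a competing request for that key, which is where the serializable behaviour comes from in this sketch.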

@hl5768 2 years ago

It's like using Lua in Redis.

@fb-gu2er 5 months ago

Durability is loosely defined. A durable record doesn’t disappear after you read it at some point

@applerr22 2 years ago

To keep account balances non-negative, the suggested model is not enough, as nothing stops the credit event from being processed even if the debit fails. This can be addressed with an additional check for the same event ID before applying the credit event, but that requires a DB. Another way is to generate the credit event only after the debit succeeds, but that has its own trade-offs.
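A rough sketch of the second option from the comment above (emit the credit event only after the debit succeeds), combined with a check on the event ID when the credit is applied. It is an in-memory simulation, not Kafka code; `debit_stage`, `credit_stage`, and the event fields are hypothetical names, and the `seen_ids` set stands in for the store the comment says would be needed.

```python
def debit_stage(transfers, balances):
    """Apply debits in log order; emit a credit event only when a debit succeeds.

    transfers: list of dicts with "id", "source", "dest", "amount".
    balances: dict of account -> balance, mutated in place.
    Returns the list of credit events for the next stage.
    """
    credit_events = []
    for t in transfers:
        if balances.get(t["source"], 0) >= t["amount"]:
            balances[t["source"]] -= t["amount"]
            credit_events.append(
                {"id": t["id"], "dest": t["dest"], "amount": t["amount"]}
            )
        # Otherwise the transfer is rejected and no credit event is emitted.
    return credit_events


def credit_stage(credit_events, balances, seen_ids):
    """Apply credits, skipping any event ID that was already applied.

    seen_ids: set of applied event IDs (assumed to be durable in a real system).
    Returns the IDs applied during this call.
    """
    applied = []
    for event in credit_events:
        if event["id"] in seen_ids:
            continue  # duplicate delivery, ignore
        seen_ids.add(event["id"])
        balances[event["dest"]] = balances.get(event["dest"], 0) + event["amount"]
        applied.append(event["id"])
    return applied
```

The trade-off the comment hints at shows up here too: between the two stages the money has left the source account but has not yet arrived at the destination, so a reader in that window sees a temporarily lower total.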

@sandeepkumarverma8754 4 years ago

In the case of isolation, what if one consumer picks up the message to create user 'Jane', and then Kafka rebalances and delivers the same message to another consumer? Now both consumers are trying to create user 'Jane' in some database, and again we have the problem of two 'Jane' users being created.
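One common way to make the redelivery described above harmless is to let the database reject the second insert, so that processing the same message twice has no extra effect. The sketch below assumes a uniqueness constraint on the username and uses an in-memory SQLite database purely for illustration; `make_db` and `create_user` are hypothetical helpers, and nothing here is claimed to be what the talk proposes.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a uniqueness constraint on username."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY)")
    return conn


def create_user(conn, username):
    """Insert the user; return True if created, False if it already existed.

    Safe to call again when the same message is redelivered, because the
    primary key rejects the duplicate row.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, two consumers handling the same "create Jane" message end up with a single row: whichever insert arrives second fails the constraint and is reported as a no-op.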

@metaocloudstudio2221 2 years ago

The whole talk makes sense, but a while ago I heard the opposite, that "Kafka is not a database". So I am confused: why not use Kafka as a SoT (source of truth)?

@el_chivo99 6 months ago

ok i’ve actually asked myself this very question

@Kingslyt 4 years ago

Great talk. I like the idea illustrated here and am not a fan of XA, but I wanted to point out a factual inaccuracy in this talk. It is not true that the read committed isolation level allows the scenario described at 16:28, which is a dirty read (not read uncommitted as a level, and not a phantom read), that is, reading what has not been committed yet. Even if one takes the stretched definition of atomicity in this talk together with the read committed isolation level, there won't be a scenario with relational databases in which you would see account1 debited and account2 not credited.

@asn90436 4 years ago

I think what he described is write skew, not dirty reads.

@gstraylz 4 years ago

It's not about dirty reads. Suppose you are selecting both accounts and you've read one before the commit and the second after. Read committed does allow that, although both Oracle and Postgres give somewhat stronger guarantees at the default level (each statement reads from a consistent snapshot), so for the provided example a single query summing the accounts won't see an inconsistent total in those databases.
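The interleaving described in the reply above can be spelled out with a toy simulation. This is not a real database: `committed` is just a dict holding committed values, the starting balances of 100 are made up, and `read_twice_during_transfer` is a hypothetical name. It models a reader that issues two separate reads, each seeing only committed data, with a transfer committing in between.

```python
def read_twice_during_transfer(amount=50):
    """Return (total seen by the reader, true total after the transfer).

    The reader never observes uncommitted data, yet its two reads straddle
    the commit of a transfer from account1 to account2.
    """
    committed = {"account1": 100, "account2": 100}

    # First read: happens before the transfer commits.
    first = committed["account1"]

    # The transfer commits atomically: both rows change together.
    committed["account1"] -= amount
    committed["account2"] += amount

    # Second read: happens after the commit.
    second = committed["account2"]

    return first + second, sum(committed.values())
```

With the default arguments the reader adds the old value of account1 (100) to the new value of account2 (150) and gets 250, while the real total stays at 200. Neither read was dirty; the inconsistency comes only from the two reads using different points in time.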

@Rusebor 3 years ago

It is a HUGE mistake by Martin, which makes the whole talk not great at all. His example should have shown that Kafka and a relational database were the same thing, but it showed the opposite. Unfortunately he did not show what would happen if account 12345 had a zero balance. I assume that in that case we would have to emit an event to credit account 54321. But we couldn't do this. We separated the original message (transaction) into two independent events. In his example we should have emitted the credit event for 54321 _only after_ we debited 12345 successfully. But even in that case it is not possible to do it in one step. We can't write to the database and Kafka in the same transaction. Kafka is needless here.

@sumitstir 3 years ago

@Rusebor Yeah, what we really want in this situation is a transactional database with a CDC-based approach to update the cache and search index.

@sumitstir 3 years ago

@gstraylz It's not the same. In that case, if the user refreshes the balance of the first account, he is guaranteed to see the updated value, while the same is not true with the Kafka approach: the two events might be published to different partitions, and because of lag there is no guarantee about when events in different partitions are processed.

@MechanicalEI 5 years ago

So... Kafka is a database?

@Ayoub-adventures 6 months ago

Actually, he didn't map the concept of durability onto Kafka, which for me is what Kafka is missing to be a database. The conclusion of the talk is that the hard-to-implement ACID guarantees of traditional databases are made easy using Kafka. But that's not a new idea, since most NoSQL databases use a commit log to achieve that.

@iavasilev 4 years ago

Link to the article from the presentation: queue.acm.org/detail.cfm?id=3321612

@sumitstir 3 years ago

How do the scalability gains from a partitioned message bus compare with directly partitioning a transactional database like MySQL? Given that we need to support the required write throughput regardless of whether Kafka sits in between, what exact advantage is Kafka providing here?

@rishabhgpt3 3 years ago

Distributed transactions!!

@rajsaraogi 5 years ago

How about using change data capture to listen to changes in our primary database, and then using those changes to update the other systems, like search indexes or caching DBs?

@rajsaraogi 5 years ago

@thebeckettgroup Yes, so then which way should we go: a log-based architecture or change data capture?

@HassanDibani 4 years ago

@rajsaraogi CDC is essentially reading the database's log.

@Rbcksqheclfy 3 years ago

Dear Confluent, what do you want to achieve here compared to the previous naive example? How can that be compared to a proper distributed transaction? kzbin.info/www/bejne/eKaoZ32shqqSebs In this example, let's imagine that appending an event to Kafka succeeded and the index and cache updates were applied, but the database update was not. The dead event just did not get applied to the database, so the data integrity between the index/cache and the database is corrupted. The advantage of this approach is having an event log; I don't see anything about proper distributed transactions and atomicity for systems that are not eventually consistent. Please explain.