What is the Event Sourcing Pattern? | Designing Event-Driven Microservices

  11,841 views

Confluent

A day ago

► LEARN MORE: cnfl.io/microservices-101-mod...
Event Sourcing is a pattern of storing an object's state as a series of events. Each time the object is updated, a new event is written to an append-only log. When the object is loaded from the database, the events are replayed in order, reapplying the necessary changes.
Check out the Designing Event-Driven Microservices course on Confluent Developer for more details: cnfl.io/microservices-101-mod...
The benefit of this approach is that it stores a full history of the object. This can be valuable for debugging, auditing, building new models, and a variety of other situations. It is also a technique that can be used to solve the dual-write problem when working with event-driven architectures.
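To make the pattern concrete, here is a minimal sketch of an event-sourced object in Python. The account and event names are purely illustrative and not from the video: state is never overwritten in place; commands append events to a log, and the current state can always be rebuilt by replaying that log.

```python
from dataclasses import dataclass, field

# Hypothetical events for an event-sourced bank account.
@dataclass
class Deposited:
    amount: int

@dataclass
class Withdrawn:
    amount: int

@dataclass
class Account:
    events: list = field(default_factory=list)  # append-only event log
    balance: int = 0                             # state derived from the events

    def apply(self, event):
        # Reapply a single event to the in-memory state.
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount

    def record(self, event):
        # Updates never mutate state directly; they append an event and apply it.
        self.events.append(event)
        self.apply(event)

    @classmethod
    def replay(cls, events):
        # Rebuild the object by replaying its full history in order.
        account = cls()
        for event in events:
            account.apply(event)
        account.events = list(events)
        return account

account = Account()
account.record(Deposited(100))
account.record(Withdrawn(30))
rebuilt = Account.replay(account.events)
assert rebuilt.balance == 70  # same state, reconstructed purely from events
```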
RELATED RESOURCES
► What is the Dual Write Problem?: • What is the Dual Write...
► Microservices course playlist: bit.ly/designing-event-driven...
► Event Sourcing: cnfl.io/4breRfT
► Event Sourcing and Event Storage with Apache Kafka®: cnfl.io/3w0n627
► Microservices: An Introduction: cnfl.io/3ZMt3up
► Event-Driven Microservices Architecture: cnfl.io/48FSYbj
► Migrate from Monoliths to Event-Driven Microservices: cnfl.io/3tsqlhu
► Get Started on Confluent Developer: cnfl.io/48FnKRB
CHAPTERS
00:00 - Intro
00:22 - How are objects stored in traditional databases?
01:01 - What is an audit log, and why is it useful?
01:35 - Do audit logs create duplicated data?
01:57 - Should the audit log be used as the source of truth?
02:23 - What is event sourcing?
03:27 - What are some advantages of event sourcing?
04:28 - Does event sourcing solve the dual-write problem?
05:02 - What are some disadvantages of event sourcing?
05:20 - Closing
--
ABOUT CONFLUENT
Confluent is pioneering a fundamentally new category of data infrastructure focused on data in motion. Confluent’s cloud-native offering is the foundational platform for data in motion - designed to be the intelligent connective tissue enabling real-time data, from multiple sources, to constantly stream across the organization. With Confluent, organizations can meet the new business imperative of delivering rich, digital front-end customer experiences and transitioning to sophisticated, real-time, software-driven backend operations. To learn more, please visit www.confluent.io.
#microservices #apachekafka #kafka #confluent

Comments: 43
@ConfluentDevXTeam 4 months ago
Wade here. I am a big fan of event sourcing. I love that it gives you a lot of options that aren't available in a normal system. I worked on a platform once where users could set certain rules about how their data flowed in the system. We stored these rules in an event-sourced fashion. It was great that if a user came to us with a question about why their data suddenly started flowing differently, we could go to the rules and show them not just what they were today, but also what they were yesterday, or the day before. That allowed us to explain why things suddenly changed. We could show them the full evolution of the rules.
@event-sourcing 4 months ago
Now that is an excellent use case for event sourcing! - Erik
@charlesopuoro5295 4 months ago
This was clearly explained. Thank you very much.
@ConfluentDevXTeam 4 months ago
Wade here. I'm glad you found it valuable. Stay tuned for more.
@abhishekgowlikar 4 months ago
Nice explanation of the Event Sourcing pattern in a simple video.
@ConfluentDevXTeam 4 months ago
Wade here. Glad you enjoyed the video. Event Sourcing is a pretty complex topic, but I tried to boil it down into a relatively short primer.
@AndrewReeman_RemD 4 months ago
Great video and clear explanation
@ConfluentDevXTeam 4 months ago
Wade here. I'm glad you enjoyed the video.
@oingomoingo7411 A month ago
This was explained very nicely.
@ConfluentDevXTeam A month ago
Wade here. I'm glad you enjoyed it.
@alexgodwin5806 4 months ago
Very informative.
@ConfluentDevXTeam 4 months ago
Wade here. Glad you enjoyed it. Keep watching for future videos.
@YO3ART A month ago
A video on progress tracking problems would be great. Let's say event A triggers various events in different microservices. How can a microservice determine when all events that were produced in response to event A are fully processed? For example, if event A is a file upload, which triggers virus scanning, format conversion, and metadata extraction, how can we track when all related processes are complete? Some of those processes are optional or depend on context, and the total amount of processing to be done can be unpredictable. Additionally, there may be no sequential order for these processes. I also find progress hard to track if you can't use historic data and don't know the total amount of work that needs to be done. Is coupling and increasing complexity inevitable in such cases? Let's say one microservice consumes a FileProcessingQueued event which triggers a not easily predictable amount of other events, and another microservice expects a FileProcessingFinished event, while some other microservice may expect progress reports even before FileProcessingFinished is produced. This problem (or anti-pattern) deserves its own name and strategies for dealing with it.
@ConfluentDevXTeam A month ago
Wade here. What you are talking about seems to be more about event-driven architecture than event-sourcing. However, I'll provide some information anyway. In general, I would say that emitting an event should be treated as fire and forget. You send the event and you don't worry about what happens to it downstream. Part of the goal of event-driven systems is to ensure that the various services are decoupled from each other. The producer of the events shouldn't even know that the consumers exist, much less whether or not they have done the work. Ideally, it wouldn't matter. Now, that's not always possible. So in cases where you are expecting some kind of result from the consumers, usually it would be communicated via more events. So when the downstream finishes processing, it would emit another event. The upstream can listen for the event. So in your example, when the virus scanning, format conversion, and metadata extraction all finish, each emits a separate event. Some service can then listen for those events and correlate them together (Correlation Ids can help here).
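A minimal sketch of the correlation idea described above, assuming a hypothetical set of step-completion events that all carry the correlation ID of the original upload. The step names, event shape, and the fixed expected-step set are illustrative assumptions, not something defined in the video.

```python
from collections import defaultdict

# Steps we expect to finish for each correlation id (hypothetical; in practice
# this set could itself be derived from earlier events or context).
EXPECTED_STEPS = {"virus_scan", "format_conversion", "metadata_extraction"}

completed = defaultdict(set)  # correlation_id -> steps that have reported completion

def handle_step_completed(event):
    """Consume a step-completed event and announce FileProcessingFinished
    once every expected step for that correlation id has reported in."""
    cid = event["correlation_id"]
    completed[cid].add(event["step"])
    if completed[cid] >= EXPECTED_STEPS:
        # All downstream consumers have finished; emit the aggregate event.
        print(f"FileProcessingFinished correlation_id={cid}")

# Completion events can arrive in any order, from any service.
handle_step_completed({"correlation_id": "file-42", "step": "metadata_extraction"})
handle_step_completed({"correlation_id": "file-42", "step": "virus_scan"})
handle_step_completed({"correlation_id": "file-42", "step": "format_conversion"})
```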
@YO3ART A month ago
@@ConfluentDevXTeam I believe many devs may find it valuable. After more learning, I can see it may be related to choreography and orchestration. Having progress reporting or completion events in choreography seems impossible, much like trying to predict Conway's Game of Life.
@ConfluentDevXTeam A month ago
@@YO3ART Wade here. I'd suggest taking a look at the Saga Pattern or Process Managers. While not directly related, they are techniques that are often used to coordinate multiple complex steps and could be adapted to work with events.
@hakongrtte8326 4 months ago
Thanks for the video, and I really enjoy your Microservices 101 series, Wade! Quick question: what type of system would you typically store the event log in? Before this video, my assumption would have been systems like Apache Kafka, but your drawings in this video suggest that there is a separate storage system for the event log?
@ConfluentDevXTeam 4 months ago
Wade here. Kafka is certainly an option for storing the logs. If you go that route, the secondary step of emitting the events isn't necessary. Having said that, it's not always the best option. The problem with Kafka for this type of use is that it doesn't really have query capabilities. That means you can't put all of the events for a single microservice into one topic because you won't be able to query for a specific Id. You could do a topic per Id, but that could result in many topics, depending on the cardinality of your keys. In addition, for debugging purposes, it can be useful to be able to do other queries on the data. For that reason, a database is often used. What kind of database is up for debate. There are databases specifically designed for this, but it's common to see relational DBs, as well as NoSQL DBs, Document stores, etc.
@mehrdadk.6816 3 months ago
It's really great, but it comes with a big cost: the complexity of implementing checkpointing. Because our audit logs grow over time, there needs to be a stateful mechanism to remember the last event read; otherwise, each time an application restarts it has to read all events at once. Using an established event-sourcing framework is better than implementing it ourselves.
@ConfluentDevXTeam 3 months ago
Wade here. You don't necessarily have to read all events when the application restarts. You have to read them all when the individual domain objects are brought into memory. That's an important distinction because individual objects might be small and their event logs might be short. In that case, reading them isn't really a big deal. So even though your entire application may have millions of events, you aren't reading all of those at once. You might be reading 5, 10, or even 100 events at a time which could be quite fast. Of course, this depends a lot on your domain. Some domains may require you to read thousands, or even millions of events to recreate an object. In those cases, snapshots are a common tool for optimizing the speed. With a snapshot, you periodically capture the state of the object and store it alongside your events. Then, when you need to recreate the object, you start with the most recent snapshot and only replay the events that happen after. An important distinction is that snapshots are an optimization. You should be able to delete all snapshots and still recreate the object. It will just take longer. It's important to recognize this because otherwise you might be tempted to delete events older than the snapshot which destroys a lot of the value of event-sourcing. Or you might find yourself doing snapshots after every event which basically means you are no longer doing event sourcing, again reducing the value. Most mature event-sourcing frameworks will include some kind of snapshot mechanism. I would definitely recommend starting with a decent event-sourcing framework if it's available. It will save you time rather than rolling your own.
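A rough sketch of the snapshot optimization described above, using in-memory stand-ins for the event store and snapshot store; the table layout and helper names are invented for illustration. The key property is that the snapshot only shortens the replay: deleting it and replaying every event yields the same state.

```python
# In-memory stand-ins for the event store and snapshot store (hypothetical).
event_log = {"account-1": [("Deposited", 100), ("Withdrawn", 30), ("Deposited", 5)]}
snapshots = {"account-1": (70, 2)}  # (state after the first 2 events, event count covered)

def apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "Deposited" else balance - amount

def load(account_id):
    # Start from the most recent snapshot (or the initial state if none exists)...
    balance, version = snapshots.get(account_id, (0, 0))
    # ...and replay only the events recorded after that snapshot.
    for event in event_log[account_id][version:]:
        balance = apply(balance, event)
    return balance

assert load("account-1") == 75  # snapshot (70) plus the one newer event (+5)
```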
@mehrdadk.6816 3 months ago
@@ConfluentDevXTeam Hi Wade, thanks for your valuable response.
@mehrdadk.6816 3 months ago
When an application restarts, it brings all relevant individual domain objects into memory one by one. For an application that has dozens of domains, separating the queries per domain may not be valuable, because eventually all of the events need to be queried. Using streaming or paging to query events is fine, but remember that those old events still have to be processed, and at the same time new events will be generated as a result. Using CQRS to scale this is viable; however, considering that the events are stored in one table, scaling a database might be easy but scaling a table is not, so we have only separated reads from writes while adding a new challenge: inconsistency issues. Implementing an event-streaming tool is not the business of many companies. My experience working with event streaming showed that using a dedicated event-streaming platform is beneficial, but it can also be blocking if our core business relies on it. All of these challenges come from the database being our source of truth. I'd like to discuss the possibility of using Kafka as our event store. Kafka stores events, we can replay events, we can scale easily, we can monitor events, and we get full data-compatibility features out of the box. It would be great if you made a video about this in the future.
@ConfluentDevXTeam 3 months ago
@@mehrdadk.6816 It's always possible to design an application that is either a poor fit for event sourcing or just makes it difficult. If your application has to load a bunch of objects on startup, then yes, it will have to read a bunch of events on startup. But, that's not a requirement of event sourcing. That's a requirement of your application. And it would probably be a requirement even if you didn't use event sourcing. Interestingly, if you look at some of the content from Greg Young, he would suggest that if you use Event Sourcing, you must use CQRS. While I wouldn't go that far, I would say that for the vast majority of cases, that's true. Event Sourcing is good at a lot of things, but queries that access multiple domain objects is not one of them. That's where you need CQRS. A dedicated event streaming platform can definitely be "blocking" if you rely on it. But that's true of any technology you rely on. A database is just as likely (if not more likely) to be a blocking part of your system. However, used properly, these tools bring benefits that make it worthwhile. As far as using Kafka as an event store goes, it is possible. It's not necessarily what Kafka was designed for. It is designed for event streaming. Its job is to take events from one place and get them to another. That's what it is really good at. People have used it as the single source of truth in event-sourced systems, but that does take additional work, and I wouldn't necessarily recommend it.
@mehrdadk.6816 3 months ago
@@ConfluentDevXTeam I partially agree with your points. Unfortunately, I've witnessed instances where developers have tampered with events in the database after a domain field was updated, for example. While developers can edit event logs in a database, they cannot do so in Kafka. Also, I've noticed that since databases predate Kafka, people with expertise in DDD tend to stick to the database as a familiar tool they trust. Over the past decade Kafka has grown beyond being a simple event-streaming platform: transactions are supported, compacted topics exist, locking is supported; there was a time when we could say Kafka was not enough. Some argue that Kafka can't guarantee message ordering, which is crucial for global event ordering; however, with careful design and configuration, Kafka can still achieve a high degree of event ordering and consistency. Also, my point when I said the database is a single point of failure is that it can't scale as well as Kafka does. Even using CQRS introduces more eventual consistency into our system. We need to implement a busy-pull mechanism for the projectors, and this puts a lot of pressure on the database; remember, the initial idea behind CQRS was to lift pressure off the database. So, IMHO, while we can use a database to implement the nitty-gritty of event sourcing, we can also evaluate Kafka, which already provides these features.
@s_konik 4 months ago
Awesome video, thanks for doing this. Question: what strategy would you use to share the event log with other microservices? It seems like you have to copy everything from one service's database to another microservice's database. Kafka has a retention period, which is not suitable for event sourcing, I guess? This question follows me every time I hear about event sourcing. How can we sync event log records among microservices in a correct way?
@ConfluentDevXTeam 4 months ago
Wade here. Typically, you would use your database for permanent event storage, and then push the events to Kafka to share them with other services. You can set Kafka's retention period to be something crazy high (say many years) at which point you don't need to worry about it. Alternatively, since the events are always available in the original database, you can use a short retention in Kafka. If you build a new service that needs all of the historical events, you can push them to kind of a "catch up" topic for the new service to consume before it goes back to the standard topic. Long story short, original database for storage, Kafka for sharing the events.
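A hedged sketch of the "database for storage, Kafka for sharing" arrangement described above: a small relay reads events that have not yet been published from the event store and produces them to Kafka, so the single write to the event log remains the source of truth. The table name, topic name, and schema are invented, error handling is omitted, and it assumes a local broker plus the confluent-kafka Python client.

```python
import json
import sqlite3

from confluent_kafka import Producer  # pip install confluent-kafka

db = sqlite3.connect("events.db")
db.execute("""CREATE TABLE IF NOT EXISTS customer_events (
    id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)""")

producer = Producer({"bootstrap.servers": "localhost:9092"})

def relay_unpublished():
    # Read events that have not yet been shared with other services...
    rows = db.execute(
        "SELECT id, payload FROM customer_events WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in rows:
        # ...produce each one to Kafka (at-least-once delivery)...
        producer.produce("customer-events", key=str(event_id), value=payload)
    producer.flush()
    # ...and only then mark them as published in the source-of-truth store.
    for event_id, _ in rows:
        db.execute("UPDATE customer_events SET published = 1 WHERE id = ?", (event_id,))
    db.commit()

db.execute("INSERT INTO customer_events (payload) VALUES (?)",
           (json.dumps({"type": "CustomerCreated", "name": "Ada"}),))
db.commit()
relay_unpublished()
```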
@bibhavlamichhane2538 3 months ago
I'm still trying to understand the database architecture for this. Will there be a single 'EventLog' table to store events for all aggregates, with the other tables only used for projection purposes?
@ConfluentDevXTeam 3 months ago
Wade here. Typically, you'd have a separate event log for each type of Aggregate. So you might have an event log for "Customers" (eg. "customer_events"), one for "Orders", etc. Then, you would use other tables for projections as you mentioned. Of course, this can depend a lot on the database you are using. You may not even store your projections in the same database. You might use NoSQL for your Events and SQL for your projections, or some other interesting combination.
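A rough sketch of that layout using SQLite DDL from Python, purely for illustration (as noted above, the events and projections may well live in different databases): one append-only event table per aggregate type, plus a separate query-optimized projection table. Table and column names are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# One append-only event log per aggregate type (names are illustrative).
db.execute("""CREATE TABLE customer_events (
    customer_id TEXT NOT NULL,
    sequence    INTEGER NOT NULL,   -- position within this customer's event stream
    event_type  TEXT NOT NULL,
    payload     TEXT NOT NULL,
    PRIMARY KEY (customer_id, sequence))""")

db.execute("""CREATE TABLE order_events (
    order_id   TEXT NOT NULL,
    sequence   INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    PRIMARY KEY (order_id, sequence))""")

# Projections are plain tables, rebuilt from the events, used only for queries.
db.execute("""CREATE TABLE customer_summary (
    customer_id TEXT PRIMARY KEY,
    name        TEXT,
    order_count INTEGER DEFAULT 0)""")
```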
@trejohnson7677 3 months ago
Thank god for Temporal
@ConfluentDevXTeam 3 months ago
Wade here. Tell me more. What is it that you like about Temporal?
@Lucky-brook-1 4 months ago
What is sourcing event management?
@ConfluentDevXTeam 4 months ago
Wade here. I am afraid I don't understand your question. Can you clarify?
@alecmkov9124 3 months ago
Having a traditional database with two hundred tables, 400 GB in size, running complex joins and relying on fast retrieval of current state... I just cannot imagine how we could query a virtually endless event log to compute the current state... What is the bridge to go from a relational database to this sexy event log?
@ConfluentDevXTeam 3 months ago
Wade here. Assuming your numbers are correct, the first thing to recognize is that you have small data. If you can fit your entire database on a micro SD card, then storing events is likely not going to cause you any issues. Greg Young essentially says that unless your data storage outpaces Moore's Law, you can basically store events forever. I.E. Data storage tends to get cheaper over time and you just need to ensure you don't outpace that. And related to that is to consider how often these events arrive. If you are receiving millions of events per second, you might have a problem. If you are receiving tens of events per second, you probably don't. And of course, a well-designed event will take up less space than a poorly designed one. Avoid bloating them with too much data, and use compact serialization formats to keep the size small. Event Sourcing may not be the solution for every use case. Parts of your system might benefit greatly from Event Sourcing, others might not. If it's a critical part of your business that helps differentiate you from the competition, Event Sourcing might be a good option. If it's a relatively unimportant system that has little impact on your business, then it might not be worth it. The idea is to use Event Sourcing strategically. Apply it in places where it matters, and skip it where it doesn't. Don't just assume that every database in every microservice needs to be Event Sourced.
@gerardklijs9277 4 months ago
Wrong, on many levels.
@MikeTypes 4 months ago
How come?
@gerardklijs9277 4 months ago
Mostly because in event sourcing, you validate beforehand, based on the current events, to ensure the business rules apply. So you can never get into an inconsistent state.
@ConfluentDevXTeam 4 months ago
@@gerardklijs9277 Wade here. The video never suggests that you can't validate your business rules in Event Sourcing or where you might do that. It mentions two possible inconsistencies. Both of them are in situations where you specifically aren't using Event Sourcing. The first is if you store both traditional DB entries and an Audit Log. There is no event sourcing happening at this point. This can lead to drift between the two. The second is when you encounter the dual-write problem if you try to store state and emit events in the same process. The point of the video is to suggest that event sourcing is the solution to these inconsistencies, not the cause.
@gerardklijs9277 4 months ago
@@ConfluentDevXTeam so the video is not about event sourcing?
@aflous 4 months ago
@@gerardklijs9277 You're not getting the point here.