Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)

Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or Frenemies?

Рет қаралды 89,251

Күн бұрын

Пікірлер: 76

@santiagopouget3473 4 жыл бұрын

Dear Kai, my congratulation about this material, i was a really good base for non technical people like. Good and great job (accurate, realistic and technical based). Thank you

@shravanparepally3551 4 жыл бұрын

If I'm clearing my interview at big 4, consider my big thank you already :-) this presentation is an excellent summary

@abobakrnasr9814 4 жыл бұрын

Excellant video Kai, very informative and you explained it in a very nice way. The presentation just put the right context for me to understand the new era technologies. Thank you so much and waiting for more videos

@sukumard 2 жыл бұрын

Thanks for this Kai, very very useful to explain to a 175 year old company why they need to change their ways of working to enter the new Digital way of doing things.

@sitaluk21 3 жыл бұрын

This is an excellent presentation, putting this whole thing like a story. Amazing articulation 👏 thank you very much 😊

@elodiechaumet-doucet1045 2 жыл бұрын

Super! Got a middleware past and need to understand the concept of Events integration and how it could be frenemy with existing middleware.

@philosopher46 4 жыл бұрын

So long but worth watching it, thank you ka

@JohnTube2K 3 жыл бұрын

Nice video. This is what I come across as an EA at my job, there is already a sunk cost in legacy technology and it’s a struggle to get business to update their technology unless there is a true business or technical need.

@ralfik14 4 жыл бұрын

Very good movie. Synthetic knowledge provided in easy way. I recommend to watch it for everybody interested in modern ways of data processing.

@JoaquinPonte 5 жыл бұрын

This video is amazing, you presented everything in a clear way, congratulations

@welbsantos 3 жыл бұрын

Excellent video !!! It was very helpful!

@tomw0815 4 жыл бұрын

Nice explanation with a good example of the Host-unloading. But this is "only" the technical side. I assume that the most complex part will be the business definiton of "what is an event", "what data is needed for other systems to process an event", "how do you ensure a certain order of event processing through systems that listen to the event stream - first system A, then System C, then B"? It's not done putting some database changes in the event stream. No one on the other side can do anything without more process context of that changes.

@kai-waehner 4 жыл бұрын

It really depends on the use case. In some cases, it is "that simple". But in other cases, in some scenarios, you don't integrate with the IBM DB2 on the mainframe or the SAP Hana directly (for the reasons you described), but integrate with a high-level API, for instance, SAP's business APIs like BAPI or iDoc.

@ammarhassan4571 4 жыл бұрын

Indeed very informative, especially the conclusion the right tool for right job, because this is where Architects and companies can't decide well and try to do every job in same tooling and ends up in having very high maintenance code/projects, and sometimes integration layer become bottle neck issues and stop the business from achieving agility.

@augustohdzalbin 4 жыл бұрын

Good explanation. Thanks, Augusto

@ashuetrx 4 жыл бұрын

What is the rate that the traditional MQs can handle & Apache Kafka ? I understand it will also depend on the consumer rate but lets say take an average consumer . Rate of Kafka answered @ 18:08

@kai-waehner 4 жыл бұрын

The short answer is that traditional MQ can handle around 100-1000 messages per second per broker, Kafka factor 100 (= 10000-100000 and more) per broker. Some more modern MQ systems like RabbitMQ scale better, but still provide various limitations regarding scalability, reliability, etc. compared to Kafka. The throughput is similar. Kafka can process Gigabytes per second in a cluster. MQ systems were not built for scale. Hence, each MQ deployment is typically its own. Thus, you would have to deploy 100s of MQ systems to get the scale of one Kafka cluster.

@srik006 4 жыл бұрын

Excellent video.. One question. So you recommend kafka only for read ? what about write ? can we write into KAfka and which intrun updates the mainframe ?

@kai-waehner 4 жыл бұрын

@Srikant: No. Read and offloading is just the simplest part of mainframe integration. Please check out the following material (blog, slides, video) about bi-directional integration between Kafka and Mainframe: www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/

@srik006 4 жыл бұрын

Kai Wähner thanks .

@srinub523 5 жыл бұрын

Very clear explanation. Thank you.

@marsimark 4 жыл бұрын

A/B testing for middleware services is a new idea to me

@TheGeoDaddy 4 жыл бұрын

My basic question is - what is in Kafka - that you can’t “code your own” if you already have MQ implemented on the IBM mainframe and on Websphere across a distributed platform? Yes, Kafka is bullet tested and provided a higher level user interface/functionality. But there’s something to be said about keeping knowledge of something so - application dependent - in-house...

@kai-waehner 4 жыл бұрын

As always in software business, it depends, there is no short answer to this question. Of course, you can always code your own stuff. Against IBM MQ, #1 is scalability - i.e. if you need to process high volumes of data. That cannot be done with IBM MQ. However, also all the additional stuff like data processing and data integration in Kafka help "out-of-the-box" in a reliable and scalable way. You can code your own solution (or combine different products from the IBM Websphere portfolio). That is more complex, probably more expensive, and does not scale well (for many use cases). To be clear, there is also some use cases which only IBM MQ can handle. For instance, if you need transactional integration with IMS on the mainframe, then Kafka is probably not the right choice, but more complementary. I see many customers integrating IBM MQ and Kafka (e.g. via Confluent's IBM MQ Connector) to leverage the best of both worlds.

@TheGeoDaddy 4 жыл бұрын

Thank You (Viele Dank?) Asked the question - BEFORE - getting thru the entire video and that was pretty much the Caveat at the end...I was specifically thinking about the last scenario where the Bank still uses IMS as its main data repository that runs the critical batch every night and has all the “answers” by the AM. The bank uses Kafka but more for fringe applications and experimentations... but the “meat & potatoes” is millions of transactions coming in during the - in bitstrings that requires Assembler code to pre-process before we can even use COBOL to update IMS... and that has years of trial and error and fixes that would all have to be experienced again... going to any other system... because NO ONE really knows ALL it does... it just works... and we can ETL, MQ and Kafka the results. 😏

@kai-waehner 4 жыл бұрын

@@TheGeoDaddy Also check out this blog (including slides and video) for more details about mainframe integration (it even shows a 3rd party tool that can do transactional end-to-end integration between Mainframe and Kafka): www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/

@TheGeoDaddy 4 жыл бұрын

Thanx!

@jksharma7 3 жыл бұрын

Thank you for a wonderful knowledge Sir

@ComisarioLobo 5 жыл бұрын

Nice video Kai, how does Kafka compare to Apache Pulsar and Nats? Also when Kafka is gonna be fully cloud native?

@kaiwaehner5702 5 жыл бұрын

Thanks for the feedback, Santiago. My thoughts about your question are opinionated, of course, as I work for Confluent. Pulsar has a pretty similar approach to Kafka with some pros and cons. I think the main difference is that Kafka is battled-tested and adopted all over the world in almost any big company. Pulsar is used in a few projects, but you really need to find a good reason not to use Kafka. This Twitter post from December 2018 is a nice story around this discussion: blog.twitter.com/engineering/en_us/topics/insights/2018/twitters-kafka-adoption-story.html I don't know NATS well. But I think the key difference is that it is a messaging platform while Kafka is an event streaming platform (including messaging, storage and processing). Therefore, Kafka is used for much more than just messaging today. For instance, almost every microservice architecture is built using Kafka because it decouples the microservices well with its feature set like storage, event log and really decoupled producers / consumers. We are not far away from Kafka being cloud-native. Stay tuned and follow the announcements and KIPs (Kafka Improvement Proposals) of next few Kafka releases.

@KarstenHeymann 4 жыл бұрын

Just a very small note from a fellow german: "Event" is pronounced "Iwänt" with an emphasis on the "ä".

@krystianfeigenbaum238 4 жыл бұрын

another fellow German here - the problem is where he stresses. -> [i'wänt] (not ['i:wänt])

@raydickenson6511 4 жыл бұрын

@@krystianfeigenbaum238 You guys are focusing on a slightly-off pronunciation of "event" but saying nothing about the very-wrong pronunciation of "paradigm" (the "g" is silent). I really appreciate the video, Kai!

@sunildevan 5 жыл бұрын

Really a good one. What program languages it has been built using. Trying to understand what skills would one need to extend or customize?

@theoquasi 5 жыл бұрын

sunil dev COBOL

@deonvanniekerk871 4 жыл бұрын

Great video, really explained it well. Appreciate your contribution.

@JohnTube2K 3 жыл бұрын

subscribed!!

@1m1r0z 3 жыл бұрын

Thank you Kai

@mitenmehta79 5 жыл бұрын

whole agenda was described but not really clear even till end how existing middleware can be replaced easily with kafka without any loss of features. also its pull only so if existing clients are using push based then how to replace easily ?

@kai-waehner 5 жыл бұрын

This is just a high level talk. I did not cover a specific replacement in detail. It depends on how much of the existing middleware you want or need to replace. In general, Push vs. Pull has important differences and the Kafka API provides technical details to handle things as you expect it. But on a high level you just replace the JMS Push-based consumer with the Kafka Pull-based consumer and still receive all messages. For example, you could replace a JMS-based broker with Kafka completely and just use the Confluent JMS Client instead of the existing one (docs.confluent.io/current/clients/kafka-jms-client/index.html). It implements JMS with Kafka under the hood. This way you don't even need to change the client implementation (but note that it has some limitations in feature support). Another option is to keep the existing producers running to the JMS broker but just consume from a Kafka consumer instead with the general Kafka Consumer API.

@gopalakrishnans2003 4 жыл бұрын

Thank you. I think Workday Studio also uses Kafka on Confluence. I'm a beginner here. Very interesting presentation. How can I learn more / start from zero on Apache Kafka.??

@doctari1061 3 жыл бұрын

Very nice. Thanks

@ayyapusettykiran7005 5 жыл бұрын

Hello Kai Wahner, I just want to know that do Kafka handles the SCD type 1, Type 2 which we handle in ETL, other than this we will use ETL to do lot more, where we implement many Data Warehousing Concepts. Can we achieve all those through Kafka?

@kai-waehner 5 жыл бұрын

Well, it depends on what exactly you want to do. Kafka is not a replacement for a traditional data warehouse. However, you can easily built a real time streaming infrastructure, which can also handle and store state and structures for further analysis. Slowly changing dimension (SCD) type 2 (add new row) is the default of Kafka's log. Type 1 (overwrite) is provided with the Kafka feature "compacted topics". Type 3 (new attribute) is provided with Confluent Schema Registry and Apache Avro, which is built in. Thus, many features you are looking for, are built into Kafka's core, but you still need to doublecheck if it is the right tool for your use case.

@onewizzard 4 жыл бұрын

What happens when you need production support due to a bug?

@kai-waehner 4 жыл бұрын

Various vendors support Apache Kafka, including Confluent, IBM, TIBCO, Cloudera, and others. The quality of support and expertise differs significantly. Also, most vendors don't support the full Kafka solution but exclude features like Kafka Streams or Exactly-Once-Semantics. Confluent is the leading Kafka vendor due to its huge commitment and focus on this Apache project.

@onewizzard 4 жыл бұрын

@@kai-waehner we use Spring Framework, I'd be interested in that tutorial

@normanfung7124 4 жыл бұрын

34'15" Kafka not best for real time communication with latency micro seconds range..? I haven't used Kafka but like to hear more on this.

@kai-waehner 4 жыл бұрын

It is always important to define the term and requirement „real-time“. Kafka can process events end-to-end (from producer via broker to consumer) in 10+ milliseconds (even millions of events per second). Hence, it you need faster processing (e. g. trading on the stock market), other (proprietory) products have to be used.

@Pifagorass 3 жыл бұрын

Real time doesn't mean low latency. For low latency solutions can look at

@Pifagorass 3 жыл бұрын

@@kai-waehner www.quora.com/What-are-some-alternatives-to-Apache-Kafka/answer/Pranas-Baliuka?ch=10&oid=279463270&share=1452513f&srid=hM0m&target_type=answer

@Pifagorass 3 жыл бұрын

Active - passive is legacy. Read about Island architecture before throwing such phrases as marketing pitch ...

@Pifagorass 3 жыл бұрын

'Kafka as cashing layer' - hmmmm? I'll not donwote, but next time consider giving marketing slides for review by technical team ;)

@vadymmishchenko67 4 жыл бұрын

I don't understand how is for example Tibco EMS, or Tibco RV(no broker) or ActiveMQ has less of event base nature than a Kafka ? where is the difference ?

@kai-waehner 4 жыл бұрын

Both are event-based. There is no difference regarding this paradigm. However, Kafka is not just a message queue (where messages are deleted when consumed). Instead, Kafka also persists the data after consumption. This way one consumer can consume in real-time while another one at a later point in time (batch, request-response, etc). This way, you get real decoupling between producers and consumers. Other differences to TIBCO and ActiveMQ are much higher scalability, rolling upgrades, etc. Also check out this article for more details:: www.confluent.io/blog/apache-kafka-vs-enterprise-service-bus-esb-friends-enemies-or-frenemies/

@dmitriishapkin8578 4 жыл бұрын

Hi, could you explain why having one integration team instead of keeping specialists in development teams has become an anti pattern? Thank you

@kai-waehner 4 жыл бұрын

In most bigger organizations (or even projects), it creates a single bottle neck (both people and technology). For smaller projects, one single integration team is fine, of course.

@visasimbu 5 жыл бұрын

Good info.

@timbeil 4 жыл бұрын

Do you know Smalltalk(-80)? Events and Messages are allready out there all the time, long ago.

@kai-waehner 4 жыл бұрын

Did you get watch the video? I am also saying that messaging exists for 20+ years. And if you compare the content of this talk to Smalltalk, then you compare not apples and oranges, but apples and chocolate.

@maxmag76 5 жыл бұрын

Really Nice explanation! Thank you

@sujukrish 4 жыл бұрын

very nice

@p0rti100 4 жыл бұрын

Nice slides... Small typo that made me laugh: "Eat your own dog GOOD"!

@Calphool222 3 жыл бұрын

"paradigm" is pronounced "peh-ruh-daim" not "para-dig-um"

@Pifagorass 3 жыл бұрын

Read Kafka topic again and again for each training epoch ... It wasn't good example at all ☺️

@jianchiwei5379 3 жыл бұрын

Event: pronounce it stressing the second syllable; so is the word development

@danielkrajnik8865 4 жыл бұрын

ooh its event that doesnt sound new at all

@Pifagorass 3 жыл бұрын

95% correct, but for the rest I was tempted to thumbs down. E.g. 'legacy active passive' or 'machine learning training reading Kafka over and over again', or let's use Kafka as 'cashing laeyer' for directly queries by web clients, or one can easily migrate from Confluent platform to Apache Kafka open source ...

@kai-waehner 3 жыл бұрын

There are always different perspectives. Each of the points you mention is worth its own discussion or presentation. And terms like "easily" can be interpreted differently. Hence, thanks for your comment. I will try to be more clear on these topics in future talks.

@Pifagorass 3 жыл бұрын

@@kai-waehner yes, consider expanding on such statements. E.g. how Kafka can contribute in ML or why one would like RAFT for leader election. It's never easy to escape vendor lock-in and should not be part of genuen marketing and explanation of benefits having commercial solution would be not deceving direction.

@kai-waehner 3 жыл бұрын

@@Pifagorass For ML, I have plenty of blog posts and videos. Just google for "Kafka Machine Learning". Here is a talk that also covers model training from Kafka topics in more detail: kzbin.info/www/bejne/i5iapIKDjLqIl80 For the vendor lock-in discussion, it is not just marketing. I agree that it is not for free (meaning it takes some efforts), but it is much easier to migrate from Confluent Platform to Apache Kafka than e.g. from Oracle to MySQL or IBM MQ to RabbitMQ as the heart of CP is AK, i.e. the same code and infrastructure. But I agree that it is important to point out that the migration is not just a button click :-)

@Pifagorass 3 жыл бұрын

@@kai-waehner more focused video makes sense not in for 'consume from Kafka over and over again' can be intercepted as replaying topic for each training epoch. Thanks 👍 for the more specific video.