ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka

275,776 views

InfoQ

7 years ago

InfoQ Dev Summit Boston, a two-day conference of actionable advice from senior software developers hosted by InfoQ, will take place on June 24-25, 2024 in Boston, Massachusetts.
Deep-dive into 20+ talks from senior software developers over 2 days with parallel breakout sessions. Clarify your immediate dev priorities and get practical advice to make development decisions easier and less risky.
Register now: bit.ly/47tNEWv
----------------------------------------------------------------------------------------------------------------
Neha Narkhede talks about LinkedIn's experience moving from batch-oriented ETL to real-time streams using Apache Kafka, and how the design and implementation of Kafka were driven by the goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn and supporting thousands of engineers.
Download the slides & audio at InfoQ: bit.ly/2ldN6P0
This presentation was recorded at QCon San Francisco 2016. The next QCon is in London, March 5-7, 2018. Check out the tracks and speakers: bit.ly/2hxsoN1
For more awesome presentations on innovator and early adopter topics check out InfoQ’s selection of talks from conferences worldwide: bit.ly/2lRQCll
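The description frames Kafka as a real-time platform for event data. One rough way to picture the core idea is an append-only log that several consumers read independently, each tracking its own position (offset). The sketch below is a toy, in-memory model for illustration only: `EventLog`, `append`, `poll` and the consumer names are hypothetical, and it neither uses nor imitates the real Kafka client API (there are no partitions, replication, or persistence).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EventLog:
    """Toy append-only log: records are only ever appended, never modified or removed."""
    records: list[Any] = field(default_factory=list)
    # Each named consumer remembers the offset of the next record it will read.
    offsets: dict[str, int] = field(default_factory=dict)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> list[Any]:
        """Return up to max_records unread records for this consumer and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for event in ["page_view:alice", "click:bob", "page_view:carol"]:
    log.append(event)

# Two independent readers: one keeps up in near real time, one lags like a batch job.
print(log.poll("search-indexer"))                # all three events
print(log.poll("hadoop-loader", max_records=1))  # only the first event
print(log.poll("hadoop-loader"))                 # the remaining two events
```

Because the log retains every record, a fast real-time reader and a slow batch-style reader can share one source without coordinating with each other; that is roughly the property that lets one pipeline serve both kinds of consumer.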

Comments: 105
@cenai61983 5 years ago
Very good introduction to streaming ETL architecture and Kafka. Misleading title. Streaming ETL is just another way of implementing ETL. Traditional batch-oriented ETL doesn't have to be totally replaced by Streaming ETL.
@IntrepidClown 5 years ago
Introduction to Kafka really starts at 17:36.
@tonybernoulli7859 5 years ago
Comments like this are helping this world become a better place
@mwandulu 4 years ago
At a speed of 1.25 too
@ch012 4 years ago
@mwandulu You can go to 1.5 too with very little difference. :)
@bernardlowe5433 4 years ago
For me the whole talk was pretty good. See no reason to skip.
@oluwoleoyekanmi6052 3 years ago
No reason to skip. The preamble puts things into context.
@niranchanadevirajmohan3232 3 years ago
This was a well thought out presentation, sharing a brief introduction of existing systems and their limitations, then transitioning to the need for Kafka, the way it is designed, and how the limitations are addressed by Kafka. Good one.
@smyk1975 7 years ago
Great architecture overview of Kafka Streams. Convinced me to look deeper into the Streams API and capabilities.
@filipedelbel 6 years ago
Very clarifying explanation about Kafka, helped me a lot to understand the concept.
@MrNau007 5 years ago
2 Observations: 1) History of ETL - missed the entire evolution of the data warehouse from MIS systems. 2) Example of old and new “T”: you applied “remove PII fields” at the streaming platform. Who will identify which common transformations would have to be applied at the streaming platform? One benefit is: one higher level of abstraction.
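The comment above asks who decides which common transformations (such as "remove PII fields") belong in the streaming platform. That is an organizational question no code can answer, but the mechanics of applying such a transformation once, as a stateless step over a stream of events, can be sketched. Assumptions: events are plain Python dicts, and `PII_FIELDS` is a hypothetical, centrally owned list of field names; this is not the Kafka Streams API.

```python
from typing import Any, Iterable, Iterator

# Hypothetical, centrally maintained set of field names treated as PII.
PII_FIELDS = frozenset({"email", "phone", "ip_address"})


def remove_pii(events: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield a copy of each event with the PII fields dropped; input events are not mutated."""
    for event in events:
        yield {key: value for key, value in event.items() if key not in PII_FIELDS}


raw = [
    {"user_id": 42, "email": "a@example.com", "action": "login"},
    {"user_id": 7, "ip_address": "10.0.0.1", "action": "view", "page": "/jobs"},
]
for clean in remove_pii(raw):
    print(clean)
```

Applying the step once, upstream, means every downstream consumer sees only the scrubbed events; the trade-off is that whoever owns `PII_FIELDS` now owns a shared contract that all teams depend on.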
@renatoalencar4451 5 years ago
So many angry comments. It's just an attractive title, not an actual PhD thesis.
@gcbzzzz 5 years ago
"event and batch have tradeoffs. now ignore the trade offs and try to use streams for everything" :/
@msambare 6 years ago
Great presentation and talk. Now I want to explore streaming platforms in detail.
@Ravi86055 6 years ago
Great content... useful information
@sharathchandra5314 5 years ago
Nice Presentation... I would like to know what vendors of ETL tools like Informatica, DataStage, etc. have to say about their products in light of this briefing, because these two are quite busy coming up with new versions.
@abobakrnasr9814 3 years ago
Wonderful talk... thank you so much Neha for the presentation.
@chakrapanireddy1358 7 years ago
Really helpful.. Nice explanation..
@audreymciver4863 5 years ago
all Principles should be implemented in any streaming data to be in compliance at all time.
@im2crazyin 1 year ago
Very informative, precise and to-the-point introductory talk on data streams. It gives enough information that one knows why and when to look for streaming solutions, and one also knows what specific areas to dig into once they decide to go for such a solution.
@flynntsang 6 years ago
This is an intelligent and articulate overview of how Kafka in particular manages increasing volume, velocity and variety of "big data" using real-time streams. It may not resonate with everyone; not everyone needs this. Excellent for those getting started with streaming data and transitioning away from messaging queues or redundant ETL processes.
@susmitdey9172 3 years ago
ETL and EAI probably addresses different problems compared to streaming, practically according to me streaming is more of using capabilities of the platform to integrate rather than using a tool to do ETL or Real-Time it addresses the data transfer logic so we can avoid tools, correct me if I'm wrong.
@chandraprakashmatam672 5 years ago
Though it is not paradigm shift, the approach given here eventually modern EDW with real time streams.
@dx4816 4 years ago
The "messy" diagram can simply be redrawn to match the Kafka-based diagram. Lots of good information, but the real differentiator is not the integration patterns. Anyway, Kafka is a great product.
@manojsembekar5703 5 years ago
great thanks for information..
@jocalvo 5 years ago
ETLs are not dead, they just transformed. The KEY is not Apache Kafka, the key is DATA ARCHITECTURE, otherwise it will add more mess.
@davidk7212 11 months ago
Not all data is big data, and all data will never be all big data. There will always be a huge place for standard ETL.
@ericpham6192 4 years ago
Share distributed processing by using a percentage of idling resources in a cloud-sharing processing network
@MrLyonliang 5 years ago
Thanks a lot for explaining clearly about: what happened yesterday, what's the pain point, what's the new requirement, and HOW.
@babylon_bob 5 years ago
I think instead of saying ETL is dead, just say I have no clue. I've never in my life recreated two streams to process the same data into different destinations (12:59). I'd do exactly the same as at (13:29) but with ETL tools.
@prashanthtalla 5 years ago
I agree. If step 2 is same for both the destinations, why will you repeat for Cassandra. You'll just add that destination also to the load logic of the existing ETLs. I wish the speaker gave better example where we may end up doing this and how streaming could have helped. I believe streaming is advantageous from cost perspective (ETL tools are super expensive) and for real time very large volumes, they cannot scale. I'm also not sure if streaming really solves this problem - I've yet to work on streaming technologies.
@sanchitkumar9862 5 years ago
Absolutely True, No one is dumb enough to run the computation twice when we have the option of adding the data to multiple destinations.
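The thread above argues that a well-designed batch ETL job would simply add a second destination to its load step rather than repeat the transformation. In a log-based design the equivalent is: run the transformation once, publish its output to a derived topic, and let each destination load from that topic on its own. The sketch uses a toy in-memory stand-in for topics and consumer offsets; `produce`, `consume`, `run_transform`, and the topic and loader names are all hypothetical and are not Kafka APIs.

```python
from collections import defaultdict
from typing import Callable

# Toy topics: topic name -> list of records (an in-memory stand-in, not a Kafka client).
topics: dict[str, list[dict]] = defaultdict(list)
# Read position per (consumer group, topic).
offsets: dict[tuple[str, str], int] = defaultdict(int)


def produce(topic: str, record: dict) -> None:
    topics[topic].append(record)


def consume(group: str, topic: str) -> list[dict]:
    """Return the records this group has not yet seen on the topic and advance its offset."""
    start = offsets[(group, topic)]
    new = topics[topic][start:]
    offsets[(group, topic)] = start + len(new)
    return new


def run_transform(group: str, src: str, dst: str, fn: Callable[[dict], dict]) -> None:
    """Apply fn once per unread source record and publish each result to dst."""
    for record in consume(group, src):
        produce(dst, fn(record))


produce("orders.raw", {"id": 1, "amount_cents": 1250})
produce("orders.raw", {"id": 2, "amount_cents": 300})

# The "T" runs once and writes to a derived topic...
run_transform("enricher", "orders.raw", "orders.clean",
              lambda r: {**r, "amount": r["amount_cents"] / 100})

# ...and each destination loads from that topic independently.
warehouse = consume("warehouse-loader", "orders.clean")
cassandra = consume("cassandra-loader", "orders.clean")
print(len(warehouse), len(cassandra), len(topics["orders.clean"]))  # 2 2 2
```

Batch tools can also fan out to several targets, as the commenters point out. The difference this toy model highlights is that a destination added later (say, a new analytics consumer) can start reading the derived topic from the beginning without changing or re-running the transform job.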
@navalsaini 5 years ago
A very well structured talk. Thanks for it. :-)
@ericpham6192 4 years ago
Can parallel processing in bandwidth fill multiple packages help in big data and distributed database and buffering work in hand in hand help in streaming. Also 3d volume fill data storage and extracting data format
@sreeRocksRocks 6 years ago
Great video and gave overall idea on what Kafka is and how to play with it in real use cases. Excellent and kudos!
@michalmefli 7 years ago
Great talk.
@kevinshoang 3 years ago
April 2021, Batch is still more popular than stream.
@piggybox 4 months ago
6 years later, ETL is still alive
@IliaTernovich 6 years ago
31:56 link to video please. Unfortunately can't hear names clearly
@anant3104 5 years ago
Great and it is very helpful, thank you
@arunasjunevicius533 6 years ago
Really? Data integration and application integration are not the same. ETL and EAI solve two totally unrelated problems. And how can one say that MQ does not scale when, if one wants to scale, one can choose DDS or whatever different messaging technology?
@xfactor740501 4 years ago
Great presentation. She makes it look simple... Does anyone know the program used to create the presentation? I like the look, as though it was drawn "free-hand"... very sharp
@tansudasli 5 years ago
data integration and service integration layer are handled by different products on the market. that's the main problem. and it is good to see them in a convergent approach. that's why Kafka is on the spot. this convergence brings organizational effectiveness to enterprise. because you can now combine BI's ETL team and Middleware team, so you can get holistic integration capabilities which will also creates advantage point for transformation. on the other hand, scalability is a relative concept. in an enterprise, EAI or ESB is scalable. ETL is batch oriented but it is feasible for an enterprise's near realtime concerns.
@RicardoMontee 4 years ago
7:30 "ETL (Extract Transform Load) and EAI (Enterprise Application Integration) are outdated"
@anuragakella 4 years ago
Awesome presentation skills and clear explanation of ETL changing from batch to real-time
@gauravaithmia 3 years ago
First 15 minutes are more like a pitch deck dumbed down for a VC.
@veerun3104 6 years ago
ETL is not only meant for data integration.. what about business intelligence and analytics apps..
@8Trails50 4 years ago
I think they are saying ETL in the form of ingestion of data INTO some tool. Not Spark or Hadoop jobs. In that case you could just subscribe to Kafka.
@md.mottakinchowdhury7898 2 years ago
Misguiding. Why would batch processing be dead if it is just enough to do batch processing of your data?
@jersute 6 years ago
the T in ETL has nothing to do with scrubbing ('data cleaning') or normalization. if you're using ETL to scrub you're already too late in the pipeline and using a hammer as a screwdriver when you want a paintbrush. it's gibberish. ETL is for data snapshots to move between environments where you want only a subset of the data but it is transactionally stable. ETL is how you leave the house. Kafka is the road you drive on to deliver the payload from said house. different topics. Kafka should be viewed as a simd replacement for amqp/zmq or as she has presented it a comparison vs elk for log processing as a limited use case. the streams discussion should be compared with apache storm for analytical capability or a distributed replacement for memcached performance counters. local state is a poor way of saying cache locality and migration. this talk is all over the place. no mention of the problem of dealing with subaggregation and priority dependency issues inherent in kafka/storm without explicit payload tagging or reentrant use of the architecture in general as befits any simd speedup discussion. if you are familiar with the concepts of noshared architectures for data presentation and want a messaging solution with the same principles then kafka may interest you. do not expect magic.
@allanhouston22 4 years ago
Kafka is not an ETL replacement, it is a streaming/message broker. ETL is a platform that offers adapters for receiving and writing data from/to multiple source/destination types (files, DBs, queue systems), it's a centralized mapper tool (say XMLCVS), and it supports various integration patterns (best practices). So ETL can typically be used to read/write from/to Kafka while it is performing mapping so that the destination system understands what the source system is trying to send, in real-time. EAI systems, another platform type she mentioned, are particularly written for event/real-time purposes, so they are a more suitable platform for this type of work, as they support transactional behavior and unified monitoring of what is flowing through them, in addition to adapters and centralized mapping. How this woman managed to compare oranges and apples without receiving more down votes is beyond me.
@void0818 6 years ago
E --(k)-- T --(k)-- L: this is where Kafka fits in ETL. ETL will never be dead, but Kafka is a good stream to use in ETL processes.
@blobbyflobby6752 3 years ago
ETL is dead. Long live ETL!
@zisispontikas2038 5 years ago
36:50 come on. You just took the stream processing Java app and the dashboard app and put them into one application. So the database is inside Kafka and the job processing and dashboard are merged. There should have been 2 boxes, not 1.
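The comment above describes a design in which the stream-processing application and the dashboard are merged, with the aggregate kept as local state inside the processing app. A minimal sketch of that idea, assuming a plain `Counter` as the local store and page-view events shaped like `{"page": ...}`; `PageViewCounter` and its methods are hypothetical names. It deliberately omits what makes local state safe in a real system (for example, Kafka Streams backs its state stores with changelog topics so that state can be restored after a failure).

```python
from collections import Counter
from typing import Iterable


class PageViewCounter:
    """Toy stream processor that keeps its aggregate in a local, in-process store."""

    def __init__(self) -> None:
        self.store: Counter[str] = Counter()  # local state: page -> view count

    def process(self, events: Iterable[dict]) -> None:
        # Update the running count for each event as it arrives.
        for event in events:
            self.store[event["page"]] += 1

    def query(self, page: str) -> int:
        # The "dashboard" reads the local store directly instead of an external database.
        return self.store[page]

    def top(self, n: int) -> list[tuple[str, int]]:
        return self.store.most_common(n)


counter = PageViewCounter()
counter.process([{"page": "/jobs"}, {"page": "/feed"}, {"page": "/jobs"}])
counter.process([{"page": "/jobs"}])
print(counter.query("/jobs"), counter.query("/profile"))  # 3 0
print(counter.top(1))  # [('/jobs', 3)]
```

Serving queries from the processor's own state avoids a round trip to a separate database, at the cost of coupling the dashboard's availability to the processor's; that coupling is essentially what the "2 boxes, not 1" objection is about.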
@MM-zd8sx 4 years ago
Very helpful video! Quality content and great presentation. Stellar job
@audreymciver4863 5 years ago
im only using this to identify any hackers uploading anything of any kind. number one it was without my permission. hacking is a federal offense.and it violates my privacy rights.
@allmhuran 5 years ago
ETL is outdated? That's news to any company that has no need to process terabytes of data in real time. This is the problem with keynotes from super giant companies. They only speak from the perspective of a super giant company. The overwhelming majority of enterprises do not have scale problems in this category, but people from such companies walk out of the keynotes thinking "yeah, this is what we should do!". No, you probably shouldn't.
@pajeetsingh 1 year ago
Just use dma.
@pajeetsingh 1 year ago
By Mark 5:00 you'd figure out all the shenanigans regarding streams, data integration and why these corporate tech lords created Kafka. Good presentation.
@wennwenn1422 5 years ago
this butthurted all ETL folks..
@sanchitkumar9862 5 years ago
It's a very harsh statement to say ETL is dead. No, ETL is not dead.
@dantepraxedis 4 years ago
Catchy title
@podunkman2709 4 years ago
You confuse two loosely connected areas. Kafka is NOT the successor to ETL. ETL is a completely different group of products with a completely different application. Kafka may be the next generation of ESB. In addition, you must know that in the vast majority of companies around the world their "event-driven architecture" is MS Excel. Why do companies like Google or Facebook have their power? Because they are really unique. Meanwhile most companies do things like 20 years ago. For them ETL is a miracle. They do not need any Kafka. It's beyond their perception.
@saurabh3614 6 years ago
this is not at all comparable, Both meant for different purpose. I doubt if she has ever looked at the DWH code and design .And bet you if you show me one single implementation which include complete fact table design to solve customer business problem
@Ranjan316 5 years ago
Saurabh u are right, if u look at her work history she worked for just 1 company ( linkedin) and took kafka out as a new company, she is trying to just make money out of that......she has no idea why facts and dimensions are needed, you add any stream someone needs to transform them into data which data analysts or data scientists can use,
@chrisl.9750 3 years ago
ETL is not dead and if you want to be taken seriously in the world of data, I recommend you drop this suggestion...
@vikramachandranselvakumar6316 5 years ago
The speaker has no inkling of what ETL is or what a data warehouse is and how they are architected, designed, developed, provisioned and sustained. Apache Kafka is a great open source tool for integrating streaming data into your data lake and is not a paradigm that will replace the technology-agnostic paradigm named ETL. I have used Spark SQL to accomplish/realize an ETL-based solution. Again, Spark SQL is a tool and not a paradigm.
@chandanjha3205 4 years ago
It was a nice presentation but majority of data generated by user actions are still stored in databases(SQL,Oracle) and thus ETL tools like SSIS are still needed to read them and send processed data to destinations. Some data could be in flatfiles but not too often seen these days unless we are gathering from multiple public sources. Whenever I try to read into the minds of speakers in youtube presentations to see why they are using Kafka or Spark, all they give is an example of 'word count' which is sad. Take an example of Spark, sure it can do distributional computing but so can a lot of other tools too if you have an array of cheap servers.
@dataguygamer 6 years ago
Trolling title... I'm not sure if the speaker would approve of this title. It opens her idea for ridicule
@robinsoncarter3432 6 years ago
hello you use chroma key in this video?
@nareshgb1 6 years ago
elsewhere: kzbin.info/www/bejne/anTOg5itorehiMU
@MelvinStudios 4 years ago
Do you even know what "dead" means? ETL is used in many companies. Therefore ETL is not dead. Floppy disk is dead.
@attilaviniczai7215 3 years ago
I love how americans can make acronyms out of the most important words in a title and just assume everyone knows what they abbreviate. It always amazes me how they try to get thoughts across an audience with a bunch of these 3 letter, context specific, magic words flying around.
@Yi5Zhou 3 years ago
you don't have to use this kind of name to attract viewers
@VoxNerdula 5 years ago
I vant to try her curry
@debashishroy3485 6 years ago
I think you are HR rather Technical ...from you scrap it is clear that you don't know both hadoop and ETL
@NothingMatress 6 years ago
Think again.
@Ranjan316 5 years ago
She is clueless i am shocked she is even allowed to talk at a summit
@IA-xh5ly 6 years ago
From what I’ve heard from this lady I’m making assumption she has a very little experience in ETL development (manual validation for example), she just follows the modern fashion.
@Ranjan316 5 years ago
Igor Andriychuk yup, lets see how much longer silicon valley supports such scam artists in the name of VC funding....
@KC-zn4gt 6 years ago
It's a shame someone knows on just one tiny topic thinks she knows how it works and applies for all. On the final diagram there an icon of a DWH, I wonder how she explains how that DWH is getting populated without ETL. Oh...she probably thinks that is readymade available for her to stream from. lol.
@onlyitj 6 years ago
You would subscribe to multiple topics and use the Streams API to process those messages, which can potentially do the job.
@darshansangodkar6173 6 years ago
I wish this presentation was given by some techie guy.
@tinameh 4 years ago
Darshan Sangodkar really? I’d like to actually hear your tech talk some day. Do pick a deeply technical topic please. And an original one while you’re at it. If you struggle with that though, drop me a word. Happy to share some tips.
@atulavhad1661 4 years ago
@tinameh I guess many are not aware that she was among the people who built Kafka. I have seen her other talks and found them enlightening, and she also built a unicorn startup.
@debashishroy3485 6 years ago
bullshit ...I don't know which platform give these people to open their mouth even they don't have clear knowledge this shows the quality of Indian IT managers and Leaders
@Ranjan316 5 years ago
Completely agree , kafka is nice technology but this person doesn’t seem to have any idea about enterprise architecture or problems ETl tries to solve.....
@20cmusic 5 years ago
2018. ETL is still alive. I really hate this kind of marketer style shitty title.
@b4bhanu 5 years ago
click bait title... kafka is great but this talk is a disaster
@rajeshn5829 5 years ago
U r relly pretty
@nguyen4so9 7 years ago
Crap talks. ETL is a concept that is always there.
@TheEnfernuz 6 years ago
She doesn't deny it in the talk actually. She says that the batching ETL is dead / outdated, and now the streaming ETL is a way to go. Though I agree that part of the title is a bit misleading.
@temaz3334 6 years ago
Shitty comment. U dont even understand what she is talking about.
@flipper71100 6 years ago
People always have a tendency to resist change, as a result, they don't listen carefully
@jcrshankar 6 years ago
she mentioned etl tools not concept
@Ranjan316 5 years ago
Shankar K the title says ETL is dead, she is dumb as a rock....
@ShivaKumar-ps1vh 3 years ago
Not worth......
@jianhuang7993 6 years ago
This talk is a disaster
@danpal6737 5 years ago
Rubbish material, holy cow letter from india
@MalleusDei275 1 year ago
Lol, A silica nigre.... 😉
@MalleusDei275 1 year ago
Your mum should have advice to you for dont play with the Hammer... Yes, Tyrannosaurus burgers were greeeeeeat.
@msftora3 3 months ago
BullSsssssssst