Building Realtime Data Pipelines with Kafka Connect & Spark Streaming by Ewen Cheslack-Postava

  Рет қаралды 12,854

Spark Summit

Spark Summit

Күн бұрын

Spark Streaming makes it easy to build scalable, robust stream processing applications - but only once you’ve made your data accessible to the framework. If your data is already in one of Spark Streaming’s well-supported message queuing systems, this is easy. If not, an ad hoc solution to import data may work for a single application, but trying to scale that approach to complex data pipelines integrating dozens of data sources and sinks with multi-stage processing quickly breaks down. Spark Streaming solves the realtime data processing problem, but to build large scale data pipeline we need to combine it with another tool that addresses data integration challenges.
The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier. This talk will first describe some data pipeline anti-patterns we have observed and motivate the need for a tool designed specifically to bridge the gap between other data systems and stream processing frameworks. We will introduce Kafka Connect, starting with basic usage, its data model, and how a variety of systems can map to this model. Next, we’ll explain how building a tool specifically designed around Kafka allows for stronger guarantees, better scalability, and simpler operationalization compared to other general purpose data copying tools. Finally, we’ll describe how combining Kafka Connect and Spark Streaming, and the resulting separation of concerns, allows you to manage the complexity of building, maintaining, and monitoring large scale data pipelines.

Пікірлер
Jaidarman TOP / Жоғары лига-2023 / Жекпе-жек 1-ТУР / 1-топ
1:30:54
Caleb Pressley Shows TSA How It’s Done
0:28
Barstool Sports
Рет қаралды 60 МЛН
Andro, ELMAN, TONI, MONA - Зари (Official Audio)
2:53
RAAVA MUSIC
Рет қаралды 8 МЛН
"Apache Kafka and the Next 700 Stream Processing Systems" by Jay Kreps
28:56
Strange Loop Conference
Рет қаралды 44 М.
Real-time vehicle telemetry analysis with Kafka Streams
11:03
Alex Woolford
Рет қаралды 19 М.
Get Rid of Traditional ETL, Move to Spark! (Bas Geerdink)
32:18
Spark Summit
Рет қаралды 96 М.
Jaidarman TOP / Жоғары лига-2023 / Жекпе-жек 1-ТУР / 1-топ
1:30:54