Building Real Time BI Systems with Kafka, Spark & Kudu: Spark Summit East talk by Ruhollah Farchtchi

  Рет қаралды 13,915

Spark Summit

Spark Summit

Күн бұрын

One of the key challenges in working with real-time and streaming data is that the data format for capturing data is not necessarily the optimal format for ad hoc analytic queries. For example, Avro is a convenient and popular serialization service that is great for initially bringing data into HDFS. Avro has native integration with Flume and other tools that make it a good choice for landing data in Hadoop. But columnar file formats, such as Parquet and ORC, are much better optimized for ad hoc queries that aggregate over large number of similar rows.

Пікірлер
Who is More Stupid? #tiktok #sigmagirl #funny
0:27
CRAZY GREAPA
Рет қаралды 10 МЛН
Жездуха 41-серия
36:26
Million Show
Рет қаралды 5 МЛН
SLIDE #shortssprintbrasil
0:31
Natan por Aí
Рет қаралды 49 МЛН
Think Fast, Talk Smart: Communication Techniques
58:20
Stanford Graduate School of Business
Рет қаралды 44 МЛН
3. Apache Kafka Fundamentals | Apache Kafka Fundamentals
24:14
Confluent
Рет қаралды 502 М.
Lecture 3 | Loss Functions and Optimization
1:14:40
Stanford University School of Engineering
Рет қаралды 905 М.
Apache Spark Meet Up at Spark Summit East 2017
1:35:47
Spark Summit
Рет қаралды 4,5 М.
What is Apache Kafka®?
11:42
Confluent
Рет қаралды 382 М.
But what is a neural network? | Deep learning chapter 1
18:40
3Blue1Brown
Рет қаралды 18 МЛН
Who is More Stupid? #tiktok #sigmagirl #funny
0:27
CRAZY GREAPA
Рет қаралды 10 МЛН