The Icehouse Revolution | Starburst Icehouse Architecture


Starburst


Apache Iceberg has rocketed to the forefront of the big data industry in recent years. When it is combined with Trino, the open source engine that powers both Starburst Galaxy and Starburst Enterprise, the Icehouse architecture is born.
Sections:
00:00 - Why Apache Iceberg is winning the table format race
00:30 - Icehouse architecture is the new open data lakehouse
00:43 - Data lakehouses replace Hive data lakes
02:27 - Icehouse data architecture over Delta Lake and Hudi
04:02 - Why Iceberg defines Icehouse architecture
04:17 - Apache Iceberg and Trino
What is a data Icehouse architecture?
At its heart, an Icehouse architecture is two revolutions rolled into one. It uses the Apache Iceberg table format with the Trino query engine, and adds in a few other components like data ingestion, data governance, data management, and automatic capacity management. If you want to read more about Icehouse architecture, Starburst CEO Justin Borgman sets it out in the Icehouse Manifesto: www.starburst.io/blog/icehous...
You can think of an Icehouse architecture as a roadmap for achieving an open data lakehouse. This is a big shift, and is truly a revolution in data strategy.
This revolution comes in two parts.
The first revolution pushes against Apache Hive, the dominant data lake technology still in use today. If you run production data lakes, there's a good chance they use Hive, a legacy of its early position in Hadoop clusters. But as data velocity has increased, Hive's inability to easily update files has become a major drawback. The Icehouse architecture pushes against this, disrupting Hive's dominance and presenting a data lakehouse alternative along with the other data lakehouse table formats.
The second revolution pushes against Delta Lake and Hudi, the other data lakehouse table formats. Although all data lakehouse table formats collect more metadata than Hive and capture changes in the state of the dataset, only Apache Iceberg fully realizes the potential of an open data lakehouse. That potential amounts to combining the flexibility of a cloud object storage data lake with the performance of a data warehouse. This data warehouse-like experience runs on inexpensive cloud object storage from AWS, Azure, or GCP. It offers major cost savings for organizations that adopt it and embraces data openness.
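To make "capture changes in the state of the dataset" concrete, here is a toy Python sketch of the general idea behind lakehouse table formats: data files are never edited in place, and every change commits a new snapshot listing the files that make up the table. The class names, method names, and file names are hypothetical and chosen for illustration only; this is not Iceberg's actual metadata layout or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One committed table state: an immutable tuple of data file names."""
    snapshot_id: int
    data_files: tuple


@dataclass
class ToyTable:
    """Hypothetical stand-in for a lakehouse table; not the Iceberg spec."""
    files: dict = field(default_factory=dict)      # file name -> rows, never modified in place
    snapshots: list = field(default_factory=list)  # commit history, oldest first

    def _commit(self, data_files):
        # Snapshot ids are simply 1, 2, 3, ... in commit order.
        snap = Snapshot(len(self.snapshots) + 1, tuple(data_files))
        self.snapshots.append(snap)
        return snap

    def append(self, name, rows):
        """Add a new data file and commit a snapshot that includes it."""
        self.files[name] = list(rows)
        current = self.snapshots[-1].data_files if self.snapshots else ()
        return self._commit(current + (name,))

    def rewrite(self, old_name, new_name, rows):
        """Update rows by writing a new file and committing a snapshot that swaps it in."""
        self.files[new_name] = list(rows)
        current = self.snapshots[-1].data_files
        return self._commit(tuple(new_name if f == old_name else f for f in current))

    def scan(self, snapshot_id=None):
        """Read rows as of a given snapshot (the latest one by default)."""
        snap = self.snapshots[-1] if snapshot_id is None else self.snapshots[snapshot_id - 1]
        return [row for f in snap.data_files for row in self.files[f]]


table = ToyTable()
table.append("f1.parquet", [("alice", 10)])
table.append("f2.parquet", [("bob", 20)])
table.rewrite("f2.parquet", "f3.parquet", [("bob", 25)])

latest = table.scan()                           # reflects the update
as_of_second_commit = table.scan(snapshot_id=2)  # still sees the old value
```

Because the old file and the old snapshot are kept, an update is just one more commit, and earlier states of the table remain readable.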
#Data #DataAnalytics #DataEngineering #Trino #ApacheIceberg #Iceberg #Icehouse #IcehouseArchitecture #DataIngestion #Hive #ApacheHive #DataLake #DataLakehouse #DataWarehouse #OpenDataLakehouse #DataStrategy #DataRevolution #Starburst #StarburstGalaxy #DataManagement #DataGovernance
