Building a Multimodal Data Lakehouse with the Daft Distributed Python Dataframe

Databricks

Modern data workloads come in all shapes and sizes: numbers, strings, JSON, images, whole PDF textbooks and more. To process this data we still rely on utilities such as ffmpeg for videos, jq for JSON and PyTorch for tensors. However, these tools were not built for large-scale ETL, so we often end up building bespoke data pipelines that orchestrate data movement and custom tooling. If only downloading images, resizing them and running vision models were as simple as extracting a substring in Spark SQL!

Daft (www.getdaft.io) is a next-generation distributed query engine built on Python and Rust. It provides a familiar dataframe interface for easy and performant processing of multimodal data at scale. Join us as we demonstrate how to build a multimodal data lakehouse using Daft on your existing infrastructure (S3, Delta Lake, Databricks and Spark).
Talk By: Jay Chia, Co-Founder, Eventual Computing
Here’s more to explore:
Big Book of Data Engineering: 2nd Edition: dbricks.co/3Xp...
The Data Team's Guide to the Databricks Lakehouse Platform: dbricks.co/46n...
Connect with us: Website: databricks.com
