DuckDB versus Pandas: What's Left on the Table?

  Рет қаралды 6,665

Phillip in the Cloud

Phillip in the Cloud

Күн бұрын

In this video I'll show some performance comparisons of duckdb and pandas on basic data cleaning task. We'll see that DuckDB comes close to saturating compute resulting in shorter time-to-insight than pandas.
Link to array functions: • Custom Array Functions...

Пікірлер: 22
@xaviernogueira
@xaviernogueira 9 ай бұрын
Great video. Subbed! Curious what you use for your terminal to get the nicr colors?
@cpcloud
@cpcloud 9 ай бұрын
I'm using Alacritty and starship!
@holandacaua643
@holandacaua643 Жыл бұрын
Does this approach also work using the 'mssql' backend ?
@cpcloud
@cpcloud Жыл бұрын
You can certainly use MSSQL with ibis, but the examples are designed to work with duckdb because it's easy to set up and get started with.
@holandacaua643
@holandacaua643 Жыл бұрын
@@cpcloud thanks
@kefahissa
@kefahissa 7 ай бұрын
Nice. How did you manage to configure ipython to produce tables in that pretty way?
@cpcloud
@cpcloud 3 ай бұрын
We're using a library called rich for this!
@aakoss
@aakoss 8 ай бұрын
How does it compare to Polars?
@cpcloud
@cpcloud 8 ай бұрын
Good question! Perhaps a Polars comparison is in order!
@cpcloud
@cpcloud 7 ай бұрын
Stay tuned, a blog post about this is on the way!
@aakoss
@aakoss 7 ай бұрын
​@@cpcloudawesome, looking forward to it
@bFix
@bFix 9 ай бұрын
2.8 times slower on execution, but you shouldn't forget load times. and memory usage (but you looked at that already)
@cpcloud
@cpcloud 9 ай бұрын
What load times are you referring to?
@bFix
@bFix 9 ай бұрын
df = t.execute() where pandas loaded the db into ram. or am I misunderstanding, what pandas/ibis did there? around 14:00
@cpcloud
@cpcloud 8 ай бұрын
Pandas loads the entire dataset into memory. DuckDB operates in a streaming fashion. If you have enough RAM, it might behave similarly with respect to memory as pandas. If you're RAM-limited, then it will work, whereas pandas will simply fail.
@bFix
@bFix 8 ай бұрын
well yes, but loading it into ram takes additional time. so pandas is even slower Sure if you execute many operations in pandas the initial overhead for loading data in memory gets less (per operation), but not every script does that many operations.
@walterdiaz2003
@walterdiaz2003 Жыл бұрын
reminds me of slick for scala.
@cpcloud
@cpcloud Жыл бұрын
Nice, hadn't heard of that. Thanks!
@walterdiaz2003
@walterdiaz2003 Жыл бұрын
@@cpcloud I forgot to say, excellent video though. Subscribed!
Accelerating Python Data Analysis with DuckDB
35:05
Michigan Python
Рет қаралды 371
Why should you care about DuckDB? ft. Mihai Bojin
14:35
MotherDuck
Рет қаралды 8 М.
Pleased the disabled person! #shorts
00:43
Dimon Markov
Рет қаралды 27 МЛН
ПРОВЕРИЛ АРБУЗЫ #shorts
00:34
Паша Осадчий
Рет қаралды 7 МЛН
Comparing duckdb and duckplyr to tibbles, data.tables, and data.frames (CC279)
41:14
25 Nooby Pandas Coding Mistakes You Should NEVER make.
11:30
Rob Mulla
Рет қаралды 264 М.
This INCREDIBLE trick will speed up your data processes.
12:54
Rob Mulla
Рет қаралды 260 М.
DuckDB-Wasm: Fast Analytical Processing for the Web
6:01
Andre Kohn
Рет қаралды 3,1 М.
DuckDB: Supercharging Your Data Crunching  by Richard Wesley
30:45
Ibis and DuckDB: Bring blazing fast analytics to SQLite
17:10
Phillip in the Cloud
Рет қаралды 1 М.
Why use DuckDB in your data pipelines ft. Niels Claeys
22:26
MotherDuck
Рет қаралды 17 М.
DuckDB vs Pandas vs Polars For Python devs
12:05
MotherDuck
Рет қаралды 15 М.
Что делать если в телефон попала вода?
0:17
Лена Тропоцел
Рет қаралды 3 МЛН
Xiaomi SU-7 Max 2024 - Самый быстрый мобильник
32:11
Клубный сервис
Рет қаралды 523 М.
8 Товаров с Алиэкспресс, о которых ты мог и не знать!
49:47
РасПаковка ДваПаковка
Рет қаралды 163 М.
iPhone 15 Pro в реальной жизни
24:07
HUDAKOV
Рет қаралды 468 М.