DuckDB vs Pandas vs Polars For Python devs

  Views: 17,958

MotherDuck

1 day ago

Comments: 20
@Shawn-cr8ep 1 year ago
DuckDB is the most underused and underrated Python library. I started using it a couple of weeks ago and I'm blown away by the efficiency increase over Pandas. Plus, SQL is easier, and it forces you to think in vectorized operations rather than being tempted by Pandas' built-in loop methods, which are super slow.
@porlando12 1 year ago
I appreciate the nods to the R community going on in here. Great video!
@matej6418 10 months ago
all 5 of them.
@MrRubix94 1 year ago
Well, I had just started to learn Polars, but your video and another one comparing DuckDB and Polars are making me doubt my choice… DuckDB seems MUCH faster. Besides, SQL knowledge can be leveraged for everything. Why would one use pandas or Polars over DuckDB? Am I missing something?
@mehdio 1 year ago
I understand the doubt :) Apart from features, there is the debate about the DataFrame vs SQL approach. While both Polars and DuckDB support DataFrames and SQL, DuckDB is primarily designed to interface through SQL. So if you're a SQL lover, DuckDB is a no-brainer. Polars also has a SQL interface, but it's pretty recent.
@MrRubix94 1 year ago
@mehdio Hmm, I’m not really a SQL lover, I just want to use what works best as a data scientist. Manipulating a DataFrame is really convenient when exploring data. Maybe DuckDB + Polars? But I like simplicity, I would rather use one tool only. Choices, choices…
@incremental_failure 1 year ago
Same here. Just finished a rewrite from Pandas to Polars and it's already out of date. Although I'll likely be using Polars for the in-memory stuff and DuckDB for out-of-memory persistent data. The differences in speed are not gigantic if you consider the bigger picture, and Polars development is very active; it is getting faster with every minor version.
@armeyavaidya3464 1 year ago
Polars is best for continuous operations on columns. Also, it doesn't support indices, so you can't do index-based lookups (i at some point and j at some point).
@incremental_failure 1 year ago
@armeyavaidya3464 Indexes can be simulated, using a column as an index.
@Emotekofficial 11 months ago
How about DuckDB and SQLAlchemy? Do they shake hands? Can I do ORM like this?
@motherduckdb 11 months ago
Yep, here are MotherDuck's instructions for it: motherduck.com/docs/integrations/sqlalchemy (it also works with vanilla OSS DuckDB, with the driver linked from there).
@kpyoutuber4671 7 months ago
Thank you for this valuable content! Can you also explain Parquet datasets? I used to create partitioned Parquet datasets using Pandas and Polars, but I want to know how to read data from such partitioned datasets directly into a Polars LazyFrame (not into pandas, as the data is larger than memory) to do some analytics.

    import polars as pl
    import pyarrow.parquet as pq

    # Read data written to parquet dataset
    pq_df = pq.read_table(r"C:\Users\test_pl", schema=pd_df_schema)
    pl_df = pl.from_pandas(pq_df.to_pandas()).lazy()

Is there a better way to do this?
@motherduckdb 6 months ago
As per the Polars documentation (docs.pola.rs/py-polars/html/reference/api/polars.scan_pyarrow_dataset.html#polars.scan_pyarrow_dataset), you can use scan_pyarrow_dataset() to read from partitioned datasets.
@user-fv1576 2 months ago
Is DuckDB a query language, a real DB like SQLite, or both?
@motherduckdb 2 months ago
It's a real DB like SQLite! But it innovates a lot around SQL, read more here: duckdb.org/2022/05/04/friendlier-sql.html
@HitAndMissLab 2 months ago
What about DuckDB vs Dask?
@allthingsdata 5 months ago
I guess I'm stating the obvious, but for anyone who doesn't use SQL for data operations, DuckDB is second class. And I surely do not like to use SQL for transformations and such.
@tmb8807 3 days ago
I agree. DuckDB seems great for what it is, but I find method chaining and the expression syntax of Polars much less cognitively demanding than SQL. But then, I don't have a ton of experience with SQL, so I'm not used to thinking in the way it requires.
@JOHNSMITH-ve3rq 1 year ago
SQLite is faster yo
@shogun8-9 1 year ago
Not for analysis. SQLite is OLTP, not OLAP.