Using the {arrow} and {duckdb} packages to wrangle medical datasets that are Larger than RAM

Views: 7,923

R Consortium

1 day ago

Comments: 18
@tomfenn4 2 years ago
Really useful presentation, and timely for me. Personally I find data.table statements are greatly improved with just a little whitespace.
@prashanthb6521 4 days ago
Thanks a lot sir for this introductory video about Arrow.
@musicspinner 1 year ago
Masterful deployment of the "Kobayashi Maru" reference. 🖖
@carvalhoribeiro 3 months ago
Very good presentation. Thanks for sharing this
@tdawry 8 months ago
A neat question to answer. I'm using the duckplyr library and it's nice to not have to think about anything. It does make a strong argument for having a fast hard drive (an SSD is an order of magnitude faster than a traditional HDD, an M.2 is an order of magnitude faster than that, and modern NVMe drives are even faster).
@winspyre 3 months ago
Wow. Perfectly narrated.
@higgi13425 2 years ago
For further learning, here are the links from the next-to-last slide:
Arrow cheatsheet: raw.githubusercontent.com/rstudio/cheatsheets/master/arrow.pdf
Video intro: kzbin.info/www/bejne/hWWVfYijf7-DrpI
Full workshop from useR!: arrow-user2022.netlify.app
DuckDB website: duckdb.org
DuckDB R package: cran.r-project.org/web/packages/duckdb/index.html
data.table website: rdatatable.gitlab.io/data.table
dtplyr (a data.table translator): dtplyr.tidyverse.org
@tmuffly1 9 months ago
This talk blew my mind. Thank you very much!
@matthewson8917 1 year ago
Perfectly summarizes my big data journey. Really good!
@VictorOrdu 2 years ago
Wow, thank you for this illuminating presentation.
@gueyenono 2 years ago
Great presentation.
@torbjornstorli2880 1 year ago
Loved your presentation. Well done Sir!😊
@JohnoScott 2 years ago
Great talk. Concise and to the point.
@porlando12 1 year ago
Excellent presentation!
@multitaskprueba1 8 months ago
You are a genius! Fantastic video! Thanks!
@ZachRenwickData 1 year ago
great video and interesting analysis use case!
@arunabhbarua1924 6 months ago
How about just using duckdb and SQL?
@higgi13425 21 days ago
Certainly reasonable for the SQL-fluent. The {duckplyr} package (which came out after this talk) has made it easier for the dplyr-fluent to use DuckDB.