I love it! There's at least one more audience-suggested video in the queue 🤓
@eric13hill 2 years ago
I really enjoyed this type of content, Pat. Seeing different ways to do the same thing opens my mind to other options.
@Riffomonas 2 years ago
Thanks! These have been fun to make
@wilsonsouza3582 a year ago
This video is really important. Thanks for making it! But I would like to see a comparison with the data.table library too.
@Riffomonas a year ago
Thanks for the suggestion - I don't usually use data.table, but I'll keep this in mind if I need to use it at any point 🤓
@johneagle4384 2 years ago
Ah... there is something new to learn. Thank you for showing me something I was totally oblivious to.
@Riffomonas 2 years ago
Thanks for watching John 🤓
@bassamsaleh8034 2 years ago
Amazing video! I was actually surprised by the benchmarks. I thought the base pipe would be faster than magrittr, but that's not true (sorry, magrittr!), so I'll keep using the magrittr pipe. I learned R recently, and the reason I love it is the tidyverse. If someone forced me to use base R, I'd switch to Python immediately hahaha.
@Riffomonas 2 years ago
Thanks for watching! I think if we did a different set of operations with different data it's possible we'd get different results. Context matters a lot with benchmarks.
@k1llyah 2 years ago
The benchmarks in the video are a bit of a red herring: for this operation the bottleneck is probably the cor.test function, which does far more than what is needed. It would make more sense to optimize that part, as pipes are almost never the bottleneck of a program. That said, the base R pipe is not a function; it is resolved by the parser and simply does less than magrittr, so it should always be faster in a fair comparison. You can try, for example, `1 |> f(y = _)` and `1 %>% f(y = .)`. Note that on the micro and nano time scale it can be surprising that the first expression performs slower simply because it is the first expression in the benchmark; try switching them around a few times. For fair benchmarks, make sure that 1) the operations are minimal, 2) the number of iterations is similar and sufficient, and 3) you run them in a fresh R session.
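The fair-comparison advice above can be sketched with the bench package. This is a minimal sketch, not the video's actual benchmark: it assumes the bench and magrittr packages are installed, and `f` is a hypothetical stand-in function so the pipe itself dominates the timing.

```r
# Compare the parser-level base pipe (|>) to the magrittr function pipe (%>%)
# on a trivial call. `f` is a made-up function used only for illustration.
library(bench)
library(magrittr)

f <- function(x, y) x + y

bench::mark(
  base_pipe     = 1 |> f(y = 2),    # rewritten by the parser to f(1, y = 2)
  magrittr_pipe = 1 %>% f(y = 2),   # dispatched through the magrittr function
  iterations = 10000,
  check = TRUE                      # both must return the same value (3)
)
```

Because `|>` is pure syntax rewritten at parse time while `%>%` is a function call, any measured difference is in the nanosecond range and, as the comment notes, sensitive to evaluation order and session state.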
@Riffomonas 2 years ago
Thanks. The context of the benchmarking was in reference to the pipes. I agree that the pipes aren't the problem. Even if they're slower, they significantly improve readability.
@SammanMahmoud 2 years ago
Can you please make a video about big data and parallel computing? Thank you for your videos, Dr. Pat.
@Riffomonas 2 years ago
Thanks for watching! Check out my previous episodes using the furrr package
@julianrozenberg2036 2 years ago
Hi Pat! Thank you for making your sessions so entertaining and useful. Well, I was trying to build a data frame of 66,000 rows and 2,500 columns from text files and encountered two problems: using base R, memory overflows kept shutting down my R session (I have only 6 GB available), and it was relatively slow. Eventually I managed to do it with the tidyverse map function. This is a typical task, and it would be great if you were interested in making a Code Club session about it. Thanks!
@musicspinner 2 years ago
Hi Julian, I think you could really benefit from using the "arrow" library here. Some videos to introduce you to `library(arrow)`: Danielle Navarro: kzbin.info/www/bejne/hWWVfYijf7-DrpI Neal Richardson: kzbin.info/www/bejne/sH-nXoqgZ72DrMU Helped me manage a similar (larger-than-memory data) problem. Good luck. 👌🏽
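The arrow workflow suggested above can be sketched roughly as follows. This is a hedged sketch, not Julian's actual pipeline: the directory path `"text_files/"` and the column names `sample_id` and `value` are hypothetical, and it assumes the input files are delimited text that arrow can read as CSV.

```r
# arrow scans files lazily, so only the rows/columns you collect()
# are ever loaded into memory -- useful for larger-than-memory data.
library(arrow)
library(dplyr)

# Hypothetical directory of delimited text files
ds <- open_dataset("text_files/", format = "csv")

result <- ds |>
  select(sample_id, value) |>   # hypothetical column names
  filter(!is.na(value)) |>      # predicates are pushed down to the scan
  collect()                     # materialize only the filtered subset
```

The key design point is that the dplyr verbs build a query plan against the on-disk dataset, and nothing is read into RAM until `collect()`.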
@Riffomonas 2 years ago
Hi Julian - thanks for watching! R really struggles with wide data frames. I'd encourage you to check out the fread function from the data.table package. I did a video with it about 200 episodes ago ;) kzbin.info/www/bejne/ZnWlhGyejJ5kiqM
@chenxiao315 2 years ago
I think the fact that dplyr::filter is slower than base::subset in your example is because of fixed overhead. If the dataset were much larger, filter should be faster than subset.
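The fixed-overhead hypothesis above is easy to test by benchmarking at two data sizes. A minimal sketch, assuming the bench and dplyr packages are installed (the data frame and sizes are made up for illustration):

```r
# Compare base::subset and dplyr::filter on small vs. large data to see
# whether dplyr's per-call overhead is amortized as the data grows.
library(bench)
library(dplyr)

make_df <- function(n) data.frame(x = runif(n), g = sample(letters, n, TRUE))

small <- make_df(1e3)
large <- make_df(1e7)

# check = FALSE because filter() drops row names while subset() keeps them
bench::mark(
  subset_small = subset(small, x > 0.5),
  filter_small = filter(small, x > 0.5),
  check = FALSE
)

bench::mark(
  subset_large = subset(large, x > 0.5),
  filter_large = filter(large, x > 0.5),
  check = FALSE
)
```

If the hypothesis holds, the gap should narrow (or reverse) on the large data frame, where per-row work dominates the fixed setup cost.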
@Riffomonas 2 years ago
Thanks for watching! I think context is very important, and you could get different results with different data or functions. Regardless, for most of us it's fast no matter what!
@bassamsaleh8034 2 years ago
I'm about to learn the arrow package, but it scares me a bit. I'm wondering if you have used it before. I noticed that many people mentioned it in the comments.
@Riffomonas 2 years ago
I haven't done anything with arrow yet, but it looks pretty straightforward for super long data frames. You might also consider checking out the vroom package and the fread function from data.table.
@s.m.habiburrahaman2443 2 years ago
Stay in tune with what you want to learn; just because it's hard now doesn't mean it's impossible. It's all about mental mindset.
@rayflyers 2 years ago
Your most popular episode, you say? I await my royalties check.
@Riffomonas 2 years ago
Hah! Thanks for always watching 🤓
@xballspitzer3927 2 years ago
MUCH!!!
@matthewson8917 20 days ago
It was surprising that the base pipe was generally slower than the magrittr pipe.
@Riffomonas 20 days ago
Thanks for watching! More experimenting with both suggests that it really depends on the context. Any difference is really minimal.
@VenSensei 2 years ago
If you want speed and efficiency, you can use the collapse and data.table packages.
@Riffomonas 2 years ago
Yep. I’ve used data.table in an earlier episode and I use it a lot with very wide data frames. This episode was really about the pipes. When I want efficiency I use C++ 😂
@VenSensei 2 years ago
@@Riffomonas It's funny you mention that, because that's how I found your channel; "Writing C++ code in R...".
@haraldurkarlsson1147 2 years ago
Pat, This is interesting, but there is a trade-off in memory allocation. The dplyr/magrittr combo seems to use the least in your case. This might become an issue when dealing with large datasets, e.g. maps such as rasters. Any thoughts?
@Riffomonas 2 years ago
Hmmm, I forgot to look at the memory performance 😂 I think I would worry about memory only if it were limiting, and then I'd try different options. I'm usually more time-constrained than memory-constrained.
@haraldurkarlsson1147 2 years ago
Pat, I agree; I think this is only an issue with truly large datasets, and even with these huge weather datasets one might barely see a difference. I agree that readability of code is much more important than shaving off a few seconds.
@broderickeleazar 2 years ago
send you the link of it
@musicspinner 2 years ago
Have you used `library(arrow)` yet? Been working with increasingly large datasets... laptop went on strike. Started using arrow. Laptop started cooperating with me again.
@Riffomonas 2 years ago
Hey, if it works, that's all that matters, right? :) You might also consider checking out the vroom and data.table packages. The latter is great for working with really wide data frames.
@JOHNSMITH-ve3rq a year ago
Now do data.table! 😂😂😂
@Riffomonas a year ago
HAH! We'll see 🤓
@haraldurkarlsson1147 2 years ago
Hi Pat, Could you do a presentation on the echarts4r package? I have tested it a little bit, and it seems like a marriage made in heaven between ggplot2 and plotly. It is the best and easiest-to-use interactive package that I have encountered in my brief life with R. Another neat new package is Quarto, the successor to R Markdown. H
@Riffomonas 2 years ago
Thanks. I'll likely do something with Quarto. Unfortunately, most people aren't very interested in interactive plots.
@haraldurkarlsson1147 2 years ago
@@Riffomonas Pat, You can count me among those who think that, in general, dynamic/interactive plots are more fluff than substance. However, the echarts4r package is quite good at generating time series plots such as those you have generated for your climate change series. H
@danielvaulot5105 2 years ago
@@Riffomonas Actually, this is not completely true. I think interactive plots are quite useful for microbiological data, where you can zoom in on certain parts of the graph, for example to get into the detail of a given group, or for climatological data, where you can zoom in on a period. From a brief look, that is what echarts4r seems to offer. PS: This pipe comparison is very interesting and, I guess, could be critical when you develop Shiny applications, for example, to prevent users from becoming impatient...