Using dplyr's group_by function with and without summarize (CC233)

Views: 4,993

Riffomonas Project

Days ago

Comments: 33

@szco9814 2 years ago

Your video is a gem!

@Riffomonas 2 years ago

Thanks!🤓

@timmytesla9655 2 years ago

Wonderful video as usual. Thumbs up.

@Riffomonas 2 years ago

Thanks Timmy!

@djangoworldwide7925 2 years ago

Beautiful analysis work.

@Riffomonas 2 years ago

Thanks! I’m glad people are enjoying it

@dasrotrad 2 years ago

Pat, you have tutorials for all levels, which is fabulous. You are so prolific that, unfortunately, I can't keep up with all you produce. You are amazing. Somehow I missed "Riffomonas." Where does that word come from?

@Riffomonas 2 years ago

Hah! It comes from the idea of riffing in music, but riffing on other people's code. My hope is that people can see how I riff on my own code and do the same for their own purposes. The “omonas” is a common ending for bacteria names.

@PhilippusCesena 2 years ago

Excellent job!

@Riffomonas 2 years ago

Thanks!

@caseyj1144 2 years ago

I like to save versions of my data as I clean it as .RDS files so I can see what I did and reproduce it easily later. For people asking about organization: I usually set up my projects with /background_info, /in_data, /out_data, and /code as separate directories. I don't think there's anything special about those directories except that the layout is organized enough and general enough to be consistent, so I can easily program and reuse paths across projects. I originally organized things this way when reading about reproducible research and data sharing in neuroscience and psychology, so you might want to see whether a group has suggested a layout for your field that you can work within (if you want to share data). If it's a big project, I also have a README with dependencies/version info and an RProj with a source.R that auto-opens and runs everything. Thanks, Dr. Schloss! Learning a lot here :)

@Riffomonas 2 years ago

Awesome! My only caution against Rds files is that they limit you to R and they aren’t text files. I prefer to work with csv/tsv files as much as possible
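Editor's sketch of the trade-off discussed above, using base R only. The data frame `cleaned` and the temporary file names are made up for illustration; the point is that the CSV is plain text that any tool can open, while the .rds file is a binary format that R reads back exactly.

```r
# Hypothetical cleaned data, standing in for a real intermediate result
cleaned <- data.frame(year = c(2020, 2021), total = c(4.5, 10))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Plain-text output: readable in any editor, spreadsheet, or language
write.csv(cleaned, csv_file, row.names = FALSE)

# R-specific binary output: preserves R types exactly, but only R reads it
saveRDS(cleaned, rds_file)

csv_lines <- readLines(csv_file)  # the header and rows as ordinary text
from_csv  <- read.csv(csv_file)   # column types are re-guessed on import
from_rds  <- readRDS(rds_file)    # comes back as the same R object
```

One thing to keep in mind with the text route is that column types are inferred again on every read, so factors, dates, and similar types may need to be restored after import.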

@haraldurkarlsson1147 2 years ago

I did not see the same trend you see in your data, namely a gradual increase. The curve for my local station (near the southern tip of Lake Michigan) is essentially flat. What we may be looking at is the moderating effect of the lake. I do see the cold October of 1925 (the relative deviation is -7.3), but I am missing measurements for 1917. Interesting stuff, and also a good lesson in how to deal with NAs. Please bring more stuff like this with broad appeal and data that is easily and freely obtained. Thanks.

@Riffomonas 2 years ago

Cool results and insights! 🤓

@haraldurkarlsson1147 7 months ago

Pat, I think I have pointed this out before, but the fpp3 package will do most of the things that you are doing with much simpler code. fpp3 was, after all, designed for time series. I still love your code gymnastics and have watched some of your videos multiple times; each time I learn something new. Thanks!

@haraldurkarlsson1147 7 months ago

Pat, when you replaced the empty spaces with zero, was that a form of imputation? Basically replacing missing values?

@r.hainez2131 2 years ago

Could you please explain how you manage your .R files (workflow-wise)? And why is setwd() not your favorite?

@Riffomonas 2 years ago

Using paths in R and why you shouldn't be using setwd (CC179) kzbin.info/www/bejne/iaXUdYyggpuIgtE

@sven9r 2 years ago

@R.Hainez Look for the here package; it's very useful.
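Editor's sketch of what the here package suggestion looks like in practice, assuming the package is installed. The directory and file names (`out_data`, `example.csv`) are hypothetical, borrowed from the project layout described earlier in the thread. `here()` builds a path starting from the project root it detects, so a script does not need `setwd()`.

```r
library(here)

# Build a path relative to the detected project root instead of
# changing the working directory. The names below are hypothetical.
csv_path <- here("out_data", "example.csv")

# The path can then be passed to any reader or writer, for example:
# results <- read.csv(csv_path)
```

Because the path is resolved from the project root, the same script should run unchanged whether it is launched from the project directory or from a subdirectory.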

@szco9814 2 years ago

Hello Boss! Could you please elaborate on why you drop the groups after you group_by and summarise? It was confusing when you said that group_by and summarize will remove the rightmost grouping. I did not see any change after you dropped the groups: the tibble is 1558 x 3, exactly the same size as the tibble without dropping the groups. Thank you, sir!

@Riffomonas 2 years ago

Thanks for watching and for your question! It doesn’t change the size of the tibble only the grouping or structure of the tibble. I remove the groupings because they can mess with downstream processes. If I did another mutate with the data still grouped there could be unintended results
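Editor's sketch of the point made in this reply, using made-up data (the `weather` tibble and its columns are not from the video). After `summarize()`, only the rightmost grouping variable is removed, so the result is still grouped by `year`. The tibble has the same dimensions either way, but a later `mutate()` gives different numbers depending on whether the grouping is still in place.

```r
library(dplyr)

# Hypothetical toy data: precipitation observations by year and month
weather <- tibble(
  year  = c(2020, 2020, 2020, 2021, 2021, 2021),
  month = c(1, 1, 2, 1, 2, 2),
  prcp  = c(1, 3, 5, 2, 4, 6)
)

# summarize() peels off the last grouping variable (month);
# .groups = "drop_last" makes that default behavior explicit
monthly <- weather %>%
  group_by(year, month) %>%
  summarize(total = sum(prcp), .groups = "drop_last")

remaining_groups <- group_vars(monthly)  # still grouped by year

# Same mutate(), different results: sum(total) is computed within each
# year while grouped, but across the whole table once ungrouped
grouped   <- monthly %>% mutate(share = total / sum(total))
ungrouped <- monthly %>% ungroup() %>% mutate(share = total / sum(total))
```

Here `grouped$share` sums to 1 within each year, while `ungrouped$share` sums to 1 over the whole table, even though both tibbles have four rows and four columns.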

@shadyamigo 2 years ago

Thank you for another great video. Quick question: what does the ‘group’ argument do in the ggplot aesthetics, given that you also have color set to year? Thank you.

@Riffomonas 2 years ago

The group aesthetic here links all the data from the same year together. You could use color=year, but then every year would be a different color. Instead I used color=is_this_year to get the two-colored figure.

@shadyamigo 2 years ago

@Riffomonas thank you

@shadyamigo 2 years ago

I must have missed the last few minutes when I posted the earlier question. I meant that before you added the is_this_year column, you already had group = year and color = year. So, to rephrase my question: at that point in the tutorial, is the group parameter doing anything in addition to the color parameter, given that both are set to year?

@Riffomonas 2 years ago

Right - in this case they do the same thing. I tend to use group for line plots even if it’s redundant with color just to be safe
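Editor's sketch of where the group aesthetic stops being redundant, using made-up data (the `temps` data frame is hypothetical; `is_this_year` follows the column name mentioned in the thread). Once color is mapped to a two-level variable such as `is_this_year`, ggplot2 would by default draw one line per color level, so `group = year` is what keeps each year as its own line.

```r
library(ggplot2)

# Hypothetical toy data: three years of monthly temperatures
temps <- data.frame(
  year  = rep(c(2020, 2021, 2022), each = 3),
  month = rep(1:3, times = 3),
  temp  = c(1, 3, 6, 2, 4, 7, 3, 5, 9)
)
temps$is_this_year <- temps$year == 2022

# With group = year, every year gets its own line; color only
# distinguishes the current year from the rest
p_grouped <- ggplot(temps, aes(x = month, y = temp,
                               group = year, color = is_this_year)) +
  geom_line()

# Without group, the default grouping comes from the discrete
# variables in the plot, here just is_this_year
p_default <- ggplot(temps, aes(x = month, y = temp,
                               color = is_this_year)) +
  geom_line()

n_lines_grouped <- length(unique(layer_data(p_grouped)$group))
n_lines_default <- length(unique(layer_data(p_default)$group))
```

In this toy case the grouped plot has one line per year, while the default plot merges 2020 and 2021 into a single zig-zagging line.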

@dmalarekable 2 years ago

I still can't wrap my head around why you normalize the temps between the '50s and '80s. Shouldn't you normalize across all the years?

@Riffomonas 2 years ago

Here’s a FAQ describing the idea of the temperature anomaly and why nasa does it this way… data.giss.nasa.gov/gistemp/faq/
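Editor's sketch of the baseline idea, with invented numbers (the `temps` tibble is hypothetical and this is not necessarily the exact calculation from the video). An anomaly is expressed relative to the average over a fixed reference period, here 1951 to 1980, rather than relative to the average of all years in the data.

```r
library(dplyr)

# Hypothetical toy data: one temperature value per year
temps <- tibble(
  year = c(1940, 1951, 1965, 1980, 2020),
  temp = c(9.0, 10.0, 11.0, 12.0, 13.5)
)

# Mean temperature over the fixed reference period only
baseline <- temps %>%
  filter(year %in% 1951:1980) %>%
  summarize(mean_temp = mean(temp)) %>%
  pull(mean_temp)

# Anomaly = difference from the reference-period mean
anomalies <- temps %>%
  mutate(anomaly = temp - baseline)
```

With a fixed reference period, the anomaly for a given year does not shift every time a new year of data is appended, which would happen if the baseline were the mean of all years.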

@PeperazziTube 2 years ago

Great stuff. You can sometimes make your life easier by using the %in% operator, e.g. normalized_range = year %in% 1951:1980 also gives you a TRUE/FALSE indicator with more concise code. The nice thing about the %in% operator is that it works on many data types (bools, integers, reals, chars) in both lists and vectors.
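Editor's sketch comparing the two spellings in base R, assuming the longer form is a pair of comparisons joined with `&` (the vector `year` is made up). For whole-number years the results match; they differ for missing values.

```r
# Hypothetical vector of whole-number years
year <- c(1949, 1951, 1965, 1980, 1981)

# Range test written as two comparisons
with_comparisons <- year >= 1951 & year <= 1980

# The same test written with %in%
with_in <- year %in% 1951:1980

same_result <- identical(with_comparisons, with_in)

# The two forms treat missing values differently:
# the comparison propagates NA, while %in% returns FALSE
na_comparison <- NA_real_ >= 1951 & NA_real_ <= 1980
na_in         <- NA_real_ %in% 1951:1980
```

Note that `%in%` tests exact membership, so a fractional value such as 1951.5 would pass the comparison form but not the `%in%` form.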

@Riffomonas 2 years ago

Thanks! It’s all a matter of what I remember when I’m under the spotlight of recording 😂