Using readxl and dplyr to format messy data to see change in poverty with R (CC335)

Views: 1,133

Days ago

Comments: 16

@muhammedhadedy4570 12 days ago

Oh Allah. This video alone is worth tons of paid courses. I really don't know how to thank you. I appreciate your work, my dear professor. Greetings from Egypt. ❤❤❤❤

@Riffomonas 12 days ago

Fantastic! Glad it was useful 🤓

@borinsroy8992 12 days ago

At 16:44 use mutate(across(-name, as.numeric))

@ahmed007Jaber 11 days ago

Nicely executed, Pat. If I were to do it, I would approach it differently: I would use a regex to extract the \\d{4} as the year, then fill down the NAs, then skip the first couple of rows, then mutate(across(colname:colname, as.numeric)), then rename.

@ahmed007Jaber 11 days ago

The last step would be to promote the first row to headers after skipping the top rows. An interesting question would be how you would approach annotating the peaks and troughs in the line: dynamic annotation, so that it updates whenever the data changes.
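
The regex-and-fill approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tibble `raw`, its column names (`x1` to `x3`), and its values are hypothetical stand-ins for a sheet read with readxl, and the "skip the top rows" and "promote the first row to headers" steps are left out because they depend on the actual file layout.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical messy sheet: a banner row per year, followed by state rows,
# with every column read in as text. Names and values are made up.
raw <- tibble(
  x1 = c("2021 (footnote 1)", "Alabama", "Alaska", "2020", "Alabama", "Alaska"),
  x2 = c(NA, "4961", "720", NA, "4921", "731"),
  x3 = c(NA, "787", "82", NA, "744", "96")
)

tidy <- raw %>%
  # Banner rows have no values in x2; pull the four-digit year from those rows only
  mutate(year = if_else(is.na(x2), str_extract(x1, "\\d{4}"), NA_character_)) %>%
  # Carry each year down to the state rows beneath its banner
  fill(year, .direction = "down") %>%
  # Drop the banner rows now that the year lives in its own column
  filter(!is.na(x2)) %>%
  # Convert the text columns to numbers
  mutate(across(x2:x3, as.numeric), year = as.numeric(year)) %>%
  # Give the columns descriptive (assumed) names
  rename(state = x1, population = x2, in_poverty = x3)
```

Limiting `str_extract()` to the banner rows is a design choice in this sketch, so that a state row that happened to contain four digits would not be mistaken for a year.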

@tedhermann3424 15 days ago

Great video! I think what you wanted for converting all your columns to numeric was the across function, e.g., mutate(across(total:percent, as.numeric)). You can use it with summarize as well. Also, FYI, the code in your linked blog post looks to be from your gapminder episode! Any thoughts on your next series? The targets package or tidymodels could be interesting.

@Riffomonas 15 days ago

Thanks for the across tip! I'll keep tidymodels in mind for the future
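
For readers following the across() suggestions in this thread, here is a minimal sketch. The tibble `poverty` and its values are hypothetical; only the column names `name`, `total`, and `percent` come from the comments, and `number` is an assumed extra column.

```r
library(dplyr)

# Hypothetical data with numeric values stored as text, as often
# happens when a messy spreadsheet is read in
poverty <- tibble(
  name = c("Alabama", "Alaska"),
  total = c("4961", "720"),
  number = c("787", "82"),
  percent = c("15.9", "11.4")
)

# Convert a contiguous range of columns
by_range <- poverty %>% mutate(across(total:percent, as.numeric))

# Equivalent here: convert every column except name
by_exclusion <- poverty %>% mutate(across(-name, as.numeric))

# across() also works inside summarize()
means <- by_range %>% summarize(across(total:percent, mean))
```

The range form is explicit about which columns change, while the exclusion form keeps working if more numeric columns are added later.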

@PeperazziTube 15 days ago

One small point of pedantic nitpicking: taking the average poverty rate of all states is not the average national poverty rate, as the population of states varies by 2 orders of magnitude. The original data has the population data by state/year, so a national average could be calculated with data %>% summarize(pct_national = sum(in_poverty) / sum(population), .by = year)

@Riffomonas 15 days ago

You're of course correct - thanks for catching this! When I used code like yours it doesn't appear that the line moves meaningfully from what I had in the video. Well done 🤓
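
The difference between the two averages discussed above can be illustrated with a small sketch. The data below are invented and deliberately lopsided (one large and one small hypothetical state) so that the gap is visible; the column names follow the comment, the factor of 100 is added here to express the result as a percentage, and the `.by` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical state-level counts; values are made up for illustration
data <- tibble(
  year = c(2020, 2020, 2021, 2021),
  state = c("Big", "Small", "Big", "Small"),
  population = c(1000, 10, 1000, 10),
  in_poverty = c(100, 5, 120, 4)
)

national <- data %>%
  summarize(
    # Population-weighted national rate: total in poverty over total population
    pct_national = 100 * sum(in_poverty) / sum(population),
    # Unweighted mean of the state rates, which treats every state equally
    pct_state_mean = mean(100 * in_poverty / population),
    .by = year
  )
```

With real state data the two lines may be much closer than in this toy example, as the reply above notes.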

@PhilippusCesena 14 days ago

Thanks for the very useful video. Unfortunately, we often find ourselves having to deal with datasets that have been collected in a rather unorganized manner.

@Riffomonas 14 days ago

There used to be a hashtag .... #otherpeoplesdata that cataloged some of the more humorous challenges🤓

@fabianhellmold9331 15 days ago

Another great video. Your plots have helped me a lot with the visualizations for my master's thesis. When using lineend = "round", I noticed that the keys in the legend change strangely. Any tips on how to fix this?

@Riffomonas 15 days ago

Thanks! Hmmm, I'm not seeing that. If I do the following, it looks OK:
library(tidyverse)
library(gapminder)
gapminder %>%
  filter(country %in% c("India", "Afghanistan")) %>%
  ggplot(aes(x = year, y = lifeExp, color = country)) +
  geom_line(lineend = "round", linewidth = 2)

@fabianhellmold9331 14 days ago

@Riffomonas In my example, I work simultaneously with geom_line and geom_segment, which each have different color groupings. lineend = "round" draws lines in the keys, which then extend to the left and right. To stay with your code:
library(tidyverse)
library(gapminder)
gapminder %>%
  filter(country %in% c("India", "Pakistan")) %>%
  ggplot(aes(x = year, y = lifeExp, color = country)) +
  geom_segment(aes(y = gdpPercap/10, xend = year, yend = 0, color = factor(gdpPercap > mean(gdpPercap))), linewidth = 4.8, alpha = 1) +
  geom_line(lineend = "round", linewidth = 2)

@Riffomonas 14 days ago

@fabianhellmold9331 Hey - I'm not seeing a difference whether lineend = "round" is set or not. It looks like the four values of country have two different line widths. If you want to simplify the legend to have only one linewidth, you could add this to the end of your code:
+ scale_color_discrete(guide = guide_legend(override.aes = list(linewidth = 1)))

@fabianhellmold9331 14 days ago

@Riffomonas Thanks a lot! That actually improved my legend :)