Generating a ROC curve with ggplot2 in R: Balancing the specificity and sensitivity of ASVs (CC058)

6,063 views

Riffomonas Project

Days ago

Comments: 17

@PaoloItalyanca 10 days ago

Your videos are utterly invaluable!

@Riffomonas 6 days ago

wonderful!

@cdeanj 3 years ago

Hi Dr. Schloss, thanks for the cool video. I am a little confused about the problem you're trying to solve though. What I like about ASVs is that they are just error-corrected amplicons, with no clustering by sequence similarity, which is reflective of reality. When ASVs are clustered in the way you are proposing, as has been the standard practice over the last several years, you risk creating something (i.e., an OTU) that is not reflective of something in nature. However, bacteria do contain multiple copies of 16S, with varying degrees of heterogeneity, so clustering in the way you are proposing may help obtain more accurate estimates of richness and diversity, but that assumes the entire community was sequenced. These were just some thoughts I had while watching your video. Great video, thanks for sharing :)!

@Riffomonas 3 years ago

Hi Christopher - thanks for your comment! I'll grant you that ASVs represent an error-corrected sequence. But should that be the unit of inference in microbial ecology studies? Should we be splitting one genome of E. coli into 5 bins? That doesn't make sense to me. We don't have a bacterial species concept, or at least we don't have one that can be captured by 16S rRNA gene sequences. Short of a massive improvement in our databases, the best we can do are OTUs. OTUs, with the right definition, will make E. coli one bin rather than 5. To me, that makes a lot more sense as a unit of inference than a part of a genome. So then the question becomes, what threshold should we use to define an OTU? And here we are :) I don't follow your comment about needing to assume the entire community was sequenced. We can make relative comparisons of richness, diversity, etc. using OTUs (or ASVs if you'd like). Hope this helps a bit and thanks again for your comments.

@cdeanj 3 years ago

@@Riffomonas Ignore the comment about sequencing depth, rereading it, I am not sure where I was going with that. Agreed, I think the microbial ecology community needs to continue a discussion around this because there are certainly pros and cons to each approach. With regards to your E. coli example: that's assuming you knew that E. coli was in your sample. Depending on the hypervariable region you sequenced (or combination of), those amplicon(s) may not have enough information content available to make that call, as might be the case for two species (whatever we take that to mean) within the same genus sharing a similar V4 region, for example. In this case, we can both agree that splitting them into separate bins (assuming they are not 100% identical) would be ideal. To be honest, though, I am unfamiliar with how distant some of these regions are among closely related species, so maybe I'm talking about a problem that doesn't exist all that much in reality. Enjoying the debate. I realize that my comments aren't things you haven't thought about yourself, but making them helps advance my knowledge about this fun topic.

@Norainjoe 3 years ago

This is like trying to steal home, when I'm just trying to put the bat on the ball

@Riffomonas 3 years ago

I think Ichiro will be a first-ballot Hall of Famer. Last I checked Barry is still waiting for a call. Right?! You got this 😂

@FreeDataScientist 2 years ago

Hello Professor Riffomonas, you asked why R doesn't open to your root directory. You can set the working directory from the sessions tab or with setwd("C:/PATH NAME"), and I believe it will always open there thereafter. Blessings

@Riffomonas 2 years ago

Hi John-Eric - the best way to get things to start in the correct working directory is to create an RStudio project. Then when you double-click on the project file, RStudio will open directly in that directory. Using setwd is widely frowned upon because it often isn't reproducible across computers. Also, try to use relative paths from the project root directory rather than absolute paths from the _computer's_ root.
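
A minimal sketch of the relative-path approach described in the reply above, using base R only. The project layout, directory name, and file name are hypothetical and are not taken from the video or its repository; the sketch assumes R was started from the project root, as happens when an RStudio project file is opened.

```r
# Hypothetical layout: an RStudio project file sits in the project root and
# data files live under data/. Opening the project file starts R with the
# working directory set to the project root.

# Fragile: an absolute path that exists on only one computer (hypothetical)
# setwd("C:/Users/someone/Documents/my_project")

# More portable: build the path relative to the project root
data_path <- file.path("data", "example.tsv")  # hypothetical file name

if (file.exists(data_path)) {
  # Read a tab-separated file and show its first rows
  example <- read.delim(data_path)
  print(head(example))
} else {
  # Report where R looked, which helps diagnose a wrong working directory
  message("Could not find ", data_path, " relative to ", getwd())
}
```

Because the path is relative, the same script should run unchanged on another computer once the project folder is copied there.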

@adrenalinerush2009 1 year ago

Why is the data missing from the GitHub repository? I checked all the folders. :/

@Riffomonas 5 months ago

You can find the data and the repository at the time of the video at github.com/SchlossLab/Schloss_rrnAnalysis_mSphere_2021/tree/63c560670c128fade5eeef717c8c8a9e8ff081e2

@forpirate4695 2 years ago

Hello, thanks a lot for the video. But I have a feeling that you look very much like Ryan Reynolds (Deadpool).

@Riffomonas 2 years ago

Ha! I think someone else has said that too. Nope, we aren't related and, surprisingly, I've never seen Deadpool.

@louiseweschler1746 5 months ago

Dear Dr. Schloss, Please tell me how to access the data set for this video. I must apologize for this question ~~~ I think I am missing something obvious and wasting your time.

@Riffomonas 5 months ago

You can see what the repository looked like including the data files at this link... github.com/SchlossLab/Schloss_rrnAnalysis_mSphere_2021/tree/63c560670c128fade5eeef717c8c8a9e8ff081e2

@PaoloItalyanca 14 days ago

@@Riffomonas this was so helpful. Thank you so much!