I'm curious to hear your thoughts about treating microbiome data as compositional data (and the centred log-ratio transformation). Analysing with a compositional data approach does away with the need to rarefy at all. However, it means that Bray-Curtis and Jensen-Shannon distances can't be used and we need to use Aitchison distance instead.
@Riffomonas 2 years ago
Thanks for asking! I don’t think clr gets around the need to rarefy data… I should test that 🤓 A problem with clr and Jensen-Shannon is that you have to add 1 to everything to avoid dividing by zero or taking the log of zero. That pseudocount has been shown to cause problems in data interpretation.
@meliora1985 2 years ago
@@Riffomonas I would LOVE it if you could test that!
@Riffomonas 2 years ago
I just ran it without rarefaction using zCompositions to impute the zeroes and it looks pretty bad. The distances go up as the difference in the number of sequences goes up. Do you know of a vignette where someone has used clr with microbiome data? I want to make sure I’m being fair.
@meliora1985 2 years ago
@@Riffomonas Hi Pat, did you test Bray-Curtis or Aitchison distance? I'm no expert, just curious. I believe that Bray-Curtis and Jensen-Shannon divergence are incompatible with compositional data analysis. The distance measure for compositional data is Aitchison, which is just the Euclidean distance of clr-transformed data. I'm curious whether the reads-distance relationship is bad when using unrarefied clr-transformed data with Euclidean distance. Not a vignette, but lots of info from Gloor. "Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134." Apologies if you've already seen this. The title has the same vibe as "Waste not, want not: why rarefying microbiome data is inadmissible"
@Riffomonas 2 years ago
I used Aitchison with the clr data.
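For readers following the exchange above, here is a minimal sketch of the distance being discussed: Aitchison distance computed as the Euclidean distance between clr-transformed samples. It is not the code used in the video; the function names, the example counts, and the choice of a simple pseudocount (rather than zCompositions-style imputation) are assumptions made for illustration.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's counts.

    A pseudocount is added so zero counts do not hit log(0);
    with pseudocount=0.0 any zero count raises ValueError.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

def aitchison_distance(a, b, pseudocount=1.0):
    """Euclidean distance between two clr-transformed samples."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Hypothetical samples: same proportions, 10x difference in sequencing depth.
shallow = [10, 30, 60]
deep = [100, 300, 600]

d_no_pseudo = aitchison_distance(shallow, deep, pseudocount=0.0)
d_pseudo = aitchison_distance(shallow, deep, pseudocount=1.0)
print(d_no_pseudo, d_pseudo)
```

Without a pseudocount the two samples are effectively zero distance apart, because clr ignores total depth. Adding 1 to every count makes the distance small but nonzero, since the pseudocount is proportionally larger in the shallow sample.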
@gimanibe 2 years ago
Very interesting, Pat. Thank you for taking the time to make these videos. I use a different approach: I include the sequencing depth in all my models to account for the depth variation across samples. What do you think? Maybe you can discuss it in the next video. Best, Gian.
@Riffomonas 2 years ago
Thanks for watching! I'm not really sure how one would implement that for beta-diversity metrics. For alpha diversity, maybe, but it still seems like it would introduce more questions than it is worth
@gimanibe 2 years ago
@@Riffomonas Thanks much. I was thinking about adonis() and glm() or other modeling approaches one can use for alpha- and beta-diversity.
@Clayke1 2 years ago
Hi Pat, fantastic to see how you perform magic in R with all the piping and tidyverse tools. Not working on community samples myself, I was wondering whether similar issues (and solutions) would also apply to count experiments that are a bit closer to my field. In essence, RNASeq or CRISPRiSeq could both be affected by the sequencing depth (or the difference therein) between the samples one wants to compare. I guess the differences will be much smaller than in your case (max 10x difference in number of reads), which might be important.
@Riffomonas 2 years ago
I know there are normalization procedures that work fairly well for transcript data. That’s a little out of my lane. I think the main difference is that microbiome data is far more patchy/sparse than what is seen with transcript data and that causes problems
@benjaminandresleytoncarcam56 2 years ago
Hi, What do you think of the compositional approach?
@Riffomonas 2 years ago
Keep watching the series 🤓 It performs pretty horribly relative to rarefaction
@sven9r 2 years ago
Your PhDs must be the wisest. Amazing what one can learn from you! Do you give ecology lectures at UM?
@Riffomonas 2 years ago
You're too kind! I used to teach a microbial ecology course, but really enjoy teaching R a lot more. That's what I've been teaching the last few years
@sven9r 2 years ago
@@Riffomonas Usually we use the GUniFrac package to rarefy the data to the smallest sum of reads across the samples. SRS does the same, right? I'm still kinda confused by your smallest_group var. Is it the number of reads or the number of OTUs?
@Riffomonas 2 years ago
Smallest_group is the number of sequences in the smallest sample. SRS normalizes the samples to have the same number of sequences
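To make the meaning of smallest_group concrete, here is a rough sketch of a single rarefying draw: find the number of sequences in the smallest sample, then subsample every sample to that depth without replacement. This is an illustration only, not the video's R code and not how GUniFrac or SRS are implemented; the sample names, counts, and the rarefy() helper are made up for the example.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` sequences without replacement from one sample."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of sequences in the sample")
    # Expand counts into one label per sequence, e.g. [2, 1] -> [0, 0, 1].
    pool = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(pool, depth)
    out = [0] * len(counts)
    for taxon in picked:
        out[taxon] += 1
    return out

# Hypothetical OTU counts per sample (rows = samples, columns = OTUs).
samples = {"A": [50, 30, 20], "B": [400, 100, 0], "C": [5, 5, 990]}

# smallest_group: the number of sequences in the smallest sample.
smallest_group = min(sum(c) for c in samples.values())

rng = random.Random(1)  # fixed seed so the draw is repeatable
rarefied = {name: rarefy(c, smallest_group, rng) for name, c in samples.items()}
print(smallest_group, rarefied)
```

Here smallest_group counts sequences (reads), not OTUs: sample A has 100 sequences, so every sample is cut down to 100 sequences, and A itself is left unchanged.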
@janarigonato 2 years ago
hello Pat!!! I was wondering if you have published this data!
@Riffomonas 2 years ago
Sorry - I’m working on it but got distracted over the past month. Hopefully it will be up on bioRxiv in the next month. Thanks for your interest!