Is normalization an acceptable alternative to rarefaction? Nope. (CC190)

4,191 views

Days ago

Comments: 23

@meliora1985 2 years ago

I'm curious to hear your thoughts about treating microbiome data as compositional data (and the centred log-ratio transformation). Analysing with a compositional data approach does away with the need to rarefy at all. However, it means that Bray-Curtis and Jensen-Shannon distances can't be used and we need to use Aitchison distance instead.

@Riffomonas 2 years ago

Thanks for asking! I don’t think clr gets around the need to rarefy data… I should test that 🤓 A problem with clr and Jensen-Shannon is that you have to add 1 to everything to avoid dividing by zero or taking the log of zero. That pseudocount has been shown to cause problems in data interpretation.

@meliora1985 2 years ago

@@Riffomonas I would LOVE it if you could test that!

@Riffomonas 2 years ago

I just ran it without rarefaction, using zCompositions to impute the zeroes, and it looks pretty bad. The distances go up as the difference in the number of sequences goes up. Do you know of a vignette where someone has used clr with microbiome data? I want to make sure I’m being fair.

@meliora1985 2 years ago

@@Riffomonas Hi Pat, did you test Bray-Curtis or Aitchison distance? I'm no expert, just curious. I believe that Bray-Curtis and Jensen-Shannon divergence are incompatible with compositional data analysis. The distance measure for compositional data is Aitchison, which is just the Euclidean distance of clr-transformed data. I'm curious whether the reads-distance relationship is bad using unrarefied clr-transformed data with Euclidean distance. Not a vignette, but lots of info from Gloor. "Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol. 2017 Nov 15;8:2224. doi: 10.3389/fmicb.2017.02224. PMID: 29187837; PMCID: PMC5695134." Apologies if you've already seen this. The title has the same vibe as "Waste not, want not: why rarefying microbiome data is inadmissible"

@Riffomonas 2 years ago

I used Aitchison with the clr data
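For readers following this exchange, here is a minimal sketch of the Aitchison distance as described above: the Euclidean distance between clr-transformed count vectors. It is written in plain Python rather than R, the counts are made up for illustration, and zeros are handled with a simple pseudocount of 1 (the approach mentioned earlier in the thread), not with the zCompositions imputation Pat actually ran. The function names `clr` and `aitchison` are hypothetical helpers, not from any package.

```python
import math

def clr(counts, pseudocount=0.0):
    """Centred log-ratio: log of each value minus the mean of the logs."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison(a, b, pseudocount=0.0):
    """Euclidean distance between the clr-transformed vectors."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Hypothetical samples with identical proportions and a 10x depth difference.
# With no zeros, clr is scale invariant, so the distance is essentially zero.
shallow = [60, 30, 10]
deep = [600, 300, 100]
d_no_zero = aitchison(shallow, deep)

# Same idea, but one taxon is absent, so a pseudocount of 1 is added to avoid
# log(0). The pseudocount is a larger fraction of the shallow sample, so the
# two samples no longer transform to the same point.
shallow_z = [90, 10, 0]
deep_z = [900, 100, 0]
d_zero = aitchison(shallow_z, deep_z, pseudocount=1)

print(d_no_zero, d_zero)
```

In this toy case the distance stays near zero only while every taxon is observed in both samples; once a zero has to be replaced, the distance grows with the depth difference even though the underlying proportions match. That is one way to read the depth dependence reported above, though it does not reproduce the actual analysis.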

@gimanibe 2 years ago

Very interesting, Pat. Thank you for your time making these videos. I use a different approach: including the sequencing depth in all my models to account for the depth variation across samples. What do you think? Maybe you can discuss this in the next video. Best, Gian.

@Riffomonas 2 years ago

Thanks for watching! I'm not really sure how one would implement that for beta-diversity metrics. For alpha diversity, maybe, but it still seems like it would introduce more questions than it is worth

@gimanibe 2 years ago

@@Riffomonas Thanks much. I was thinking about adonis() and glm() or other modeling approaches one can use for alpha- and beta-diversity.

@Clayke1 2 years ago

Hi Pat, fantastic to see how you perform magic in R with all the piping and tidyverse tools. Not working on community samples myself, I was wondering whether similar issues (and solutions) would also apply to count experiments that are a bit closer to my field. In essence, RNASeq or CRISPRiSeq could both be affected by the sequencing depth (or the difference therein) between the samples one wants to compare. I guess the differences will be much smaller than in your case (max 10x difference in number of reads), which might be important.

@Riffomonas 2 years ago

I know there are normalization procedures that work fairly well for transcript data. That’s a little out of my lane. I think the main difference is that microbiome data is far more patchy/sparse than what is seen with transcript data and that causes problems

@benjaminandresleytoncarcam56 2 years ago

Hi, What do you think of the compositional approach?

@Riffomonas 2 years ago

Keep watching the series 🤓 It performs pretty horribly relative to rarefaction

@sven9r 2 years ago

Your PhDs must be the wisest. Amazing what one can learn from you! Do you give ecology lectures at UM?

@Riffomonas 2 years ago

You're too kind! I used to teach a microbial ecology course, but really enjoy teaching R a lot more. That's what I've been teaching the last few years

@sven9r 2 years ago

@@Riffomonas Usually we use the GUniFrac package to rarefy the data to the smallest sum of reads across the samples. SRS does the same, right? I'm still kinda confused by your smallest_group var. Is it the number of reads or the number of OTUs?

@Riffomonas 2 years ago

Smallest_group is the number of sequences in the smallest sample. SRS normalizes the samples to have the same number of sequences
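To make the meaning of smallest_group concrete, here is a rough sketch of a single rarefaction draw, assuming made-up OTU counts and plain Python rather than the R code from the video. The `rarefy` helper and the sample names are hypothetical, and this is not the GUniFrac or SRS implementation.

```python
import random

def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a vector of OTU counts."""
    # Expand the counts into one entry per read, labelled by OTU index.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = rng.sample(pool, depth)
    out = [0] * len(counts)
    for otu in picked:
        out[otu] += 1
    return out

# Hypothetical OTU counts for two samples with very different depths.
samples = {
    "A": [50, 30, 20, 0],       # 100 reads
    "B": [400, 350, 200, 50],   # 1000 reads
}

# smallest_group: the number of sequences (reads) in the smallest sample,
# not the number of OTUs.
smallest_group = min(sum(counts) for counts in samples.values())

rng = random.Random(19760620)  # fixed seed so the draw is repeatable
rarefied = {name: rarefy(counts, smallest_group, rng)
            for name, counts in samples.items()}

for name, counts in rarefied.items():
    print(name, counts, sum(counts))
```

Every rarefied sample ends up with exactly smallest_group reads, and the smallest sample is returned unchanged because all of its reads are drawn. Rarefaction as usually practised repeats the draw many times and averages the resulting metric; this sketch shows one draw only.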