Si is a user on wandering.shop.
Si @silicon

I really want to have a deep conversation with someone about PCA but a) few around me find this a thrilling topic and b) I'm still in the process of fully understanding the mathematical underpinnings.

This is really a / thing. I see lots of people using PCA on gene expression data. But fundamentally, don't the features input into PCA NEED to be uncorrelated?

This has always confused me, since expression data is NEVER uncorrelated. But it's everywhere!

I might be wrong about the uncorrelated thing too?

@silicon Yeah, I don't think it's correct that they need to be uncorrelated. If anything it's the other way around: PCA takes correlated features and produces components that are uncorrelated with each other. That said, there is definitely some abuse of PCA around, but I am not sure how much it matters when the purpose is purely visualization or a handwavy "look, it forms clusters". See also t-SNE, which has zero interpretability, on purpose.
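
For anyone who wants to check the "correlated inputs are fine" point numerically, here is a minimal sketch in plain Python (no dependencies). The two "genes" are simulated, and the names `gene_a`, `gene_b`, and `pca_2d` are made up for illustration. It uses the closed-form rotation for two features; in practice you would reach for numpy or scikit-learn.

```python
import math
import random

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation built from the sample covariance."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def pca_2d(xs, ys):
    """PCA for two features: center, then rotate onto the eigenvectors
    of the 2x2 covariance matrix (closed-form rotation angle)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    sxx, syy, sxy = covariance(cx, cx), covariance(cy, cy), covariance(cx, cy)
    # Angle of the first principal axis relative to the first feature axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * x + s * y for x, y in zip(cx, cy)]
    pc2 = [-s * x + c * y for x, y in zip(cx, cy)]
    return pc1, pc2

# Simulated, deliberately correlated "expression" values (assumption: toy data).
rng = random.Random(0)
gene_a = [rng.gauss(0, 1) for _ in range(500)]
gene_b = [0.8 * a + 0.3 * rng.gauss(0, 1) for a in gene_a]

pc1, pc2 = pca_2d(gene_a, gene_b)

input_corr = correlation(gene_a, gene_b)  # strongly correlated inputs
score_corr = correlation(pc1, pc2)        # essentially zero, up to rounding
var_pc1 = covariance(pc1, pc1)            # most of the variance lands here
var_pc2 = covariance(pc2, pc2)

print(f"input correlation: {input_corr:.3f}")
print(f"PC score correlation: {score_corr:.2e}")
print(f"variance on PC1 vs PC2: {var_pc1:.3f} vs {var_pc2:.3f}")
```

The inputs are highly correlated, yet the PC scores come out uncorrelated, with nearly all of the variance on the first component. That is the case PCA is designed for, not a violation of an assumption.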

@eskay8 Thanks! Yeah I can't 100% remember who told me/where I read they couldn't be correlated but I definitely need to look it up properly.

I was wondering because in single cell I've seen people give quite a lot of importance to the loadings in the 1st PC (as a way of finding highly variable genes).
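
A rough sketch of what "reading the PC1 loadings" amounts to, assuming unscaled toy data with three made-up genes (`gene_quiet`, `gene_moderate`, `gene_variable` are hypothetical names, and this is not any particular single-cell package's method). It finds the first principal axis by power iteration on the covariance matrix, in plain Python:

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a cells-by-genes table (list of rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def first_pc_loadings(cov, iterations=500):
    """Leading eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy data (assumption): 300 "cells", three independent genes with
# very different spreads.
rng = random.Random(1)
gene_names = ["gene_quiet", "gene_moderate", "gene_variable"]
gene_sds = [0.2, 1.0, 3.0]
cells = [[rng.gauss(0, sd) for sd in gene_sds] for _ in range(300)]

cov = covariance_matrix(cells)
loadings = first_pc_loadings(cov)

# Rank genes by the absolute size of their PC1 loading.
ranked = sorted(zip(gene_names, loadings), key=lambda t: abs(t[1]), reverse=True)
for name, loading in ranked:
    print(f"{name}: {loading:+.3f}")
```

On unscaled data like this, the gene with the largest variance dominates PC1. If each gene is first scaled to unit variance, the loadings no longer track raw variance, so how much weight the loadings deserve depends on the preprocessing.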

In general there are so many wildly different methods in single cell RNAseq and it's hard to know which are valid, especially since I'm pretty new to it. See also: scRNAseq normalization!

@silicon Yeah I know some people doing single cell stuff and it seems like a bear to analyze.