
Nice visualization! This gives me an opportunity to go off on a random tangent about PCA:

The post considers PCA from a visualization perspective, but exactly the same technique can also be viewed as a method for reducing the number of dimensions in the original dataset. [1] Now, one of the interesting questions in a dimensionality reduction task is: how do you pick a good number of dimensions (principal components) to keep, in a principled way, instead of just computing the next component and the next and the one after that until you get bored? (Stopping early works for visualizations, where you often want only the first two or three components anyway, but suppose we want more from the data than plots.)

I recently learned that there's a fascinating way to do this, presented in Bishop's paper [2] from 1999. In short: the question can be answered by recasting PCA as a Bayesian latent variable model with a hierarchical prior. (Yes, it is a bit of a mouthful to say. Yes, it is fairly mathematical, unlike the visualization.)

[1] https://en.wikipedia.org/wiki/Dimensionality_reduction

[2] https://www.microsoft.com/en-us/research/publication/bayesia...



I've always used Jackson's broken-stick method for determining the number of dimensions to retain.

http://onlinelibrary.wiley.com/doi/10.2307/1939574/abstract
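For reference, the broken-stick rule is simple to implement: keep component k while its share of the total variance exceeds the expected share of the k-th longest piece of a stick broken at random, b_k = (1/p) * sum_{i=k..p} 1/i. A minimal numpy sketch (the eigenvalues in the example are made up for illustration):

```python
import numpy as np

def broken_stick_k(eigenvalues):
    """Number of leading components whose share of variance exceeds
    the broken-stick expectation b_k = (1/p) * sum_{i=k..p} 1/i."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    p = lam.size
    shares = lam / lam.sum()
    # expected share of the k-th longest piece of a randomly broken stick
    bs = np.array([np.sum(1.0 / np.arange(k, p + 1)) / p
                   for k in range(1, p + 1)])
    keep = 0
    for observed, expected in zip(shares, bs):
        if observed > expected:
            keep += 1
        else:
            break
    return keep

broken_stick_k([10, 5, 1, 0.5, 0.5])  # keeps the first two components
```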


Frank Harrell demonstrates a way to use PCA to reduce dimensionality in his Regression Modeling Strategies. His course notes (PDF) are a good reference point for multiple regression strategies.

http://biostat.mc.vanderbilt.edu/wiki/Main/RmS


You can look at it in terms of reconstruction error.
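A sketch of what that looks like in practice, on random stand-in data: project onto the first k components (here via the truncated SVD of the centered data, which is equivalent to keeping k PCs) and measure the squared error of the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))   # stand-in data; any matrix works
Xc = X - X.mean(axis=0)         # PCA assumes centered data

# rank-k reconstruction via the truncated SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

def recon_error(k):
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.linalg.norm(Xc - Xk) ** 2   # squared Frobenius error

errors = [recon_error(k) for k in range(7)]
# errors decreases monotonically and reaches ~0 at full rank (k = 6),
# which is why "lowest reconstruction error" alone never tells you to stop
```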


Yeah, PCA gives you the eigenvalues of the PCs in descending order of variance explained, so summing and normalizing them tells you that, say, the first 3 PCs explain 93% of the variance in the data.
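For concreteness, that recipe is a couple of numpy lines (the eigenvalues below are made up for illustration):

```python
import numpy as np

# hypothetical eigenvalues (PC variances), already in descending order
eigvals = np.array([5.0, 3.0, 1.5, 0.3, 0.2])
explained = eigvals / eigvals.sum()   # fraction of variance per PC
cumulative = np.cumsum(explained)    # -> [0.5, 0.8, 0.95, 0.98, 1.0]
# here the first 3 PCs explain 95% of the variance
```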


But the question is when exactly to stop, because the reconstruction error will always keep getting lower. The Bayesian solution, IIUC, is something like "stop when the information required to store the new PCs exceeds the information gained from the reduced reconstruction error."

The only issue with this is that with tons of data there is less uncertainty in the principal components, so it will recommend keeping as many as possible, even if they only decrease the reconstruction error a tiny bit.


This is the standard approach when looking for reduced-order models in fluid mechanics.

Variations on this: i) how 'faithfully' does it represent the data, e.g. how many modes (components) are needed to resolve a particular metric, or the entire system, to a given accuracy; ii) what is the cut-off component number at which the signal is of the order of the measurement uncertainty.


Heh, I just happened to have MATLAB on my other screen performing a PCA of a CFD simulation. I got bored of watching the progress bar and decided to browse HN a bit. I guess I cannot escape...



