> For something like optimal diets for everyone, I imagine it's more of an issue...

mattkrause · on June 24, 2019

Yup. For example, look at this paper from Google: https://www.nature.com/articles/s41746-018-0029-1

The whole thing is about how to build these fancy networks, and it created a fair bit of buzz. Table S1 in the supplement, however, shows that it's rather pointless.

The deep model has an AUC (95% CI) of [0.94, 0.96] for inpatient mortality. The "full feature-enhanced" logistic regression baseline has an AUC of [0.92, 0.95]. Same pattern for 30-day readmission. Length of stay is the only outcome where the intervals don't overlap, and even there the deep model only just squeaks by: [0.86, 0.87] for the deep model vs. [0.84, 0.85] for the baseline.
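For anyone who wants to check the overlap claim mechanically, here's a minimal sketch using only the two sets of intervals quoted above (no readmission numbers were given, so they're left out). The dictionary layout and the `intervals_overlap` helper are my own illustration, not anything from the paper. Keep in mind that comparing 95% CIs this way is a rough eyeball check, not a formal test: non-overlapping intervals do suggest a real difference, but overlapping ones don't strictly rule one out.

```python
# AUC 95% confidence intervals as quoted from Table S1 of the supplement.
# Only the outcomes with numbers given in the comment are included.
results = {
    "inpatient mortality": {"deep": (0.94, 0.96), "logistic": (0.92, 0.95)},
    "length of stay": {"deep": (0.86, 0.87), "logistic": (0.84, 0.85)},
}


def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


for outcome, ci in results.items():
    overlap = intervals_overlap(ci["deep"], ci["logistic"])
    print(f"{outcome}: {'overlap' if overlap else 'no overlap'}")
```

Running it prints "overlap" for inpatient mortality and "no overlap" for length of stay, matching the reading of the table above.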

ImaCake · on June 24, 2019

Yup. As an outsider looking in on bioinformatics, I've found that a lot of the datasets I have played with have some really glaring problems. They often have small sample sizes or serious bias. A naive ML approach on a lot of sequencing data will probably be undermined by biases introduced by the sequencing machine and by the researcher who processed the biological materials.