> For something like optimal diets for everyone, I imagine it's more of an issue with incomplete understanding of diet, microbiome, genetics, epigenetics, etc. Once that's known, I'd venture to say you don't need AI.
Even if the data exists, that doesn't mean the AI folks would use it. In my research (a particular subfield of fluid dynamics), the machine learning/AI papers and talks always seem to use incomplete or even bad data, as if the sophistication of their algorithm makes up for that. (I don't think they actually believe that. I think they just have bad habits.) I've published pretty good linear regressions on a much larger data compilation and received much less attention, despite the fact that my linear regressions are probably more accurate than the ML models...
The whole thing is about how to build these fancy networks, and it created a fair bit of buzz. Table S1 in the supplement, however, shows that it's rather pointless.
The deep model has an AUC (95% CI) of [0.94, 0.96] for in-patient mortality. The "full feature-enhanced" logistic regression baseline has an AUC of [0.92, 0.95]. Same pattern for 30-day readmission. Length of stay is the only one where the intervals don't overlap, and it just squeaks that out: [0.86, 0.87] for the deep model vs. [0.84, 0.85] for the baseline.
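For anyone who wants to check the overlap claim mechanically, here is a minimal sketch using only the intervals quoted above. The function name `intervals_overlap` and the `reported` dictionary are my own illustrative choices, not anything from the paper or its supplement.

```python
def intervals_overlap(a, b):
    """Return True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% CIs for AUC as quoted in this comment: (deep model, logistic regression baseline).
# Only the two tasks with explicit numbers above are included.
reported = {
    "in-patient mortality": ((0.94, 0.96), (0.92, 0.95)),
    "length of stay": ((0.86, 0.87), (0.84, 0.85)),
}

for task, (deep, baseline) in reported.items():
    verdict = "overlap" if intervals_overlap(deep, baseline) else "do not overlap"
    print(f"{task}: deep {deep} vs. baseline {baseline} -> {verdict}")
```

Keep in mind that eyeballing CI overlap is only a rough heuristic, not a formal test of the difference between two AUCs, but it is enough to show how thin the margin is here.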
Yup. As an outsider looking in on bioinformatics, a lot of the datasets I have played with have some really glaring problems. They often have small amounts of data or some serious bias. A naive ML approach on a lot of sequencing data will probably be undermined by biases from the sequencing machine and from the researcher who processed the biological materials.