How Artificial Intelligence Is Changing Science

throwawaywego · on June 24, 2019

The article states positive impacts on science, but there are also negative impacts on science. For instance, the hype of AI has caused a brain-drain on related fields (such as cognitive science or applied mathematics). AI research itself suffers from companies buying up the academic talent. And researchers slap AI (which is usually deep learning) on a decade-old problem, without any care for complexity/benchmarks, implementation/usage, and proper validation methods, just to get published or receive funding.

tgb · on June 24, 2019

It also makes for a lot of terrible talks. In my field (bioinformatics) it's frustratingly common for a PI to give a keynote which boils down to "some grad students made DeepX and got an AUC of 0.85 on this problem and some others made ML-Y and got an AUC of 0.78 on this other problem and a postdoc did this other ML thing." There are no details or insight, basically just a sales pitch for their software package. Nothing in the talk can be transferred to other topics. The only time the talk is useful is if you happen to need to solve problem X and they've got a tool to solve it for you. You couldn't give a talk about a statistics model without explaining the model, but it seems to be totally OK to give talks about ML projects without saying anything more than that it's a deep net or random forest.

AlexCoventry · on June 25, 2019

That kind of bullshit has been happening in bioinformatics from the beginning, though. You can't really put it on deep learning. It's a feasible way to get publications and grants, because no one is ever held to account for it.

I also wish people would stop using AUC, and start using a measure reflecting realistically useful specificities. I don't care if you have 99% sensitivity at 90% specificity.
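As a rough illustration of the gap between the two numbers, here is a minimal Python sketch that computes both the AUC and the best sensitivity achievable at a required specificity. The scores and labels are invented toy values and the function names are my own, not taken from any library.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low and return (fpr, tpr) points."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_at_specificity(points, min_specificity):
    """Best sensitivity among operating points meeting the specificity floor."""
    max_fpr = 1.0 - min_specificity
    return max(tpr for fpr, tpr in points if fpr <= max_fpr + 1e-12)


# Toy data (an assumption for illustration): 4 positives, 4 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print("AUC:", auc(points))
print("sensitivity at 99% specificity:", sensitivity_at_specificity(points, 0.99))
print("sensitivity at 75% specificity:", sensitivity_at_specificity(points, 0.75))
```

On this toy data the AUC is 0.75, but the sensitivity at 99% specificity is only 0.25, which is the kind of number a headline AUC hides.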

monocasa · on June 24, 2019

I'm really afraid that ML is mainly just going to become automated p-hacking, and bring about a dark age to much of science. In a publish or perish world, how can you compete with someone with enough budget to set a bunch of models looking for any specious correlations in data sets and publishing what comes out the other end? Like we'll still have great breakthroughs from the top of the field, but a lot of grunt work style studies are going to lose their ability to be trusted.
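A minimal sketch of what that looks like in practice, assuming nothing but pure noise: generate one random "outcome" and a thousand random "features", then report whichever feature correlates best. Sample sizes, the seed, and the variable names are arbitrary choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)      # fixed seed so the run is repeatable
n_samples = 20              # small study
n_features = 1000           # many candidate predictors, all pure noise

outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
correlations = []
for _ in range(n_features):
    feature = [rng.gauss(0, 1) for _ in range(n_samples)]
    correlations.append(pearson_r(feature, outcome))

# |r| > 0.444 is roughly the two-sided p < 0.05 cutoff for n = 20.
CRITICAL_R = 0.444
false_hits = sum(1 for r in correlations if abs(r) > CRITICAL_R)
best = max(abs(r) for r in correlations)

print("noise features passing p < 0.05:", false_hits)
print("strongest |r| found in pure noise:", round(best, 3))
```

Roughly 5% of the noise features should clear the threshold by chance alone, so anyone who reports only the winners without correcting for the search has a publishable-looking result with nothing behind it.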

rramadass · on June 27, 2019

>ML is mainly just going to become automated p-hacking, and bring about a dark age to much of science

It is already happening. A lot of people who could contribute to actual science are moving into the AI/ML field for the money, and the industry/media hype is reinforcing this. Everything is "Deep${NONSENSE}" nowadays whether it is relevant or not. As a beginner, when I started to learn NNs, I couldn't get past my initial hurdle of how to validate the results on actual real-world data. What statistical metrics do I use to "know" that the black box is working correctly? What are the assumptions and limitations that I need to be aware of to understand and have faith in the output? Most people don't seem to know or care; it is "magic" to them. In a world awash with data, reckless application of NN models to any and every problem is only going to drown us in spurious results and muddy all scientific endeavours.

analog31 · on June 25, 2019

Contributing to this problem is the fact that professors are grimly aware of needing to confer some marketable skills on their students. And ML is perceived as being a meal ticket right now.

khawkins · on June 24, 2019

I love how many paper titles nowadays follow the pattern: "Deep-<topic>: <Actual title of the paper>". And often they aren't doing anything "deeper" than a fully-connected multilayer neural network--a machine learning algorithm that is competitive with SVMs and has been around for well over a decade.

moultano · on June 24, 2019

That's true, but there's a lot of value in waking people up to the idea that ML works, even if what they're doing has worked for a long time.

There are a lot of situations where before people would have assumed their best option is to carefully tweak a custom statistical model, whereas now they're just happy to throw a black box at it and see what happens. This is as much a cultural change as a technological change, and it's good that it is finally happening. That's what a "paradigm shift" is after all.

mattkrause · on June 25, 2019

But why is "throwing a black box at it" good?

The goal of research is usually to rip those boxes open to figure out what's inside and how it works. Moving away from that towards opaque predictions doesn't make a lot of sense to me, especially when the predictions aren't even that much better. Plus, a lot of this work seems weirdly disconnected from what the rest of the field knows to be (im)plausible.

Obviously, black boxes can be useful tools. DeepLabCut is incredibly helpful and will save a lot of grad students a lot of tedium, and that probably wouldn't happen if it involved a lot of tuning. Predictions can also be very useful--frankly, we'll take anything we can get for most neuropsych conditions--but mechanisms and targets for intervention are so much more useful. I know there is some work on this, but it's drowned out by the 0.99AUC!!1!! (in a small, cherrypicked group) stuff.

hadsed · on June 25, 2019

Black boxes are better than nothing. The ultimate black box is the universe, where experimental scientists can fiddle to try and understand. They give a great starting point to make progress, if that's what you want, and if not (like in some commercial applications) you have something that works (ish).

gubbrora · on June 24, 2019

I think something is lost when doing this. I'd bet the researcher who first builds a model and then reaches for ml will outperform the researcher who goes straight for ml.

Building a custom model will help with feature selection. It will provide a baseline to compare the ml model to which can help debug problem points of the ml model. And finally it serves as a sanity check that you aren't leaving a lot of performance on the table.

sansnomme · on June 25, 2019

More money is always good for researchers. At the end of the day, being paid more is the free market doing resource allocation when basic research in other fields isn't being appropriately subsidized.

mr_overalls · on June 25, 2019

In classical economic theory, one pre-condition for efficient market allocation of resources is accurate information providing a basis for rational levels of investment.

If AI is subject to crazes, with investors as a whole drastically overestimating its potential, then it's certainly possible to over-allocate capital (human and otherwise) to it in the hopes of a payoff.

Consider the Dutch tulip mania of the 1630s. Imagine if it had lasted a bit longer, long enough for promising scientists and scholars of every type to be trained solely to optimize the growth of tulips.

This allocation of capital would provide a benefit to tulip investors for as long as the craze lasted, but would prove to be a detriment to society once the craze ended.

https://en.wikipedia.org/wiki/Tulip_mania

sansnomme · on June 25, 2019

To be fair, the connectionist variety of ML is extremely compatible with the majority of the hard sciences (Linear Alg, Calculus, not really much CS/discrete math if you think about it). A physics/rigorous CogSci background prepares you just as well as CS for most of the interesting AI stuff. The fact that AI and CS in general have such a low barrier to entry is something to be celebrated. Be glad that our domain does not suffer from the gatekeeping in Medicine and Law.

mav3rick · on June 25, 2019

Lol what about CS then? So many STEM students go for CS rather than pure sciences. You can't not have new fields because other fields may suffer.

robertAngst · on June 25, 2019

Companies don't care what it is called. People are hired to do jobs.

'AI' is just math + programming. Don't overthink it.

emiliobumachar · on June 24, 2019

Here's my crackpot idea, in case anyone out there is willing and qualified to put in the hard work:

Start with a detailed model of the solar system. Make a million copies of it. In each copy, insert a planet in a random orbit, with random mass. Measure the orbits of everything, perturbed by the new planet.

Feed the measurements of everything, except the new planet, to an A.I., and have it estimate the position of the new planet. Give it feedback on how accurate it was. Repeat a million times. It should learn to pinpoint ninth planets in solar systems like ours, from the perturbations on orbits of known bodies.

Then feed it the real measurement history from the real Solar System. It should output the location of Planet Nine.

https://en.wikipedia.org/wiki/Planet_Nine

antognini · on June 25, 2019

As someone who used to work on the dynamics of few body systems this idea actually isn't too crazy. But there isn't really any need for AI (neural networks wouldn't be necessary). Things like deep learning are really useful when you don't have a good model for the underlying reality (coming up with a principled model of what cats look like is really hard), but in the case of planetary dynamics the physics is well understood and can be modeled explicitly.

But the underlying idea of searching for statistical perturbations to known orbits to find new objects is a good one! In fact, Mike Brown and Konstantin Batygin did just this a few years back. They argued that perturbations to the orbits of objects in the Kuiper belt suggested that there is a planet with about the mass of Neptune somewhere out there:

https://ui.adsabs.harvard.edu/abs/2016AJ....151...22B/abstra...

This object hasn't been found yet, but it could still be out there!

nitwit005 · on June 25, 2019

Here's a good talk from around the same date that I watched previously: https://www.youtube.com/watch?v=CMCwezegPNg

balabaster · on June 25, 2019

Wasn’t this concept the basis of the Netflix show Salvation where they found the asteroid plummeting on a collision course with earth?

starpilot · on June 24, 2019

Orbital equations are straightforward and deterministic, so I am wondering why an AI would be needed for this. You could solve explicitly.

opportune · on June 24, 2019

Well, an unperturbed orbit is straightforward and deterministic. But three+ body problems (which I believe the original comment is describing) do not have closed form solutions and are generally simulated: https://en.wikipedia.org/wiki/Three-body_problem.

The reason planet nine is suspected to exist is due to the commonalities in the orbits of trans-neptunian objects. That is, there appears to be a large gravitational influence on TNOs that causes the distribution of their orbits to exhibit irregularities that don't make sense with only two factors influencing their orbits.
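Since no closed form is available, such systems are stepped forward numerically. Here is a minimal sketch using a leapfrog (velocity Verlet) integrator in 2D. The units, masses, and starting conditions are arbitrary toy assumptions (a heavy "star" and two light "planets"), not real Solar System values.

```python
import math

G = 1.0  # gravitational constant in arbitrary toy units (assumption)


def accelerations(pos, masses):
    """Newtonian gravitational acceleration on each body from all the others."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r_cubed = (dx * dx + dy * dy) ** 1.5
            acc[i][0] += G * masses[j] * dx / r_cubed
            acc[i][1] += G * masses[j] * dy / r_cubed
    return acc


def leapfrog(pos, vel, masses, dt, steps):
    """Advance the system with kick-drift-kick steps; inputs are not modified."""
    pos = [p[:] for p in pos]
    vel = [v[:] for v in vel]
    acc = accelerations(pos, masses)
    for _ in range(steps):
        for i in range(len(pos)):
            vel[i][0] += 0.5 * dt * acc[i][0]   # half kick
            vel[i][1] += 0.5 * dt * acc[i][1]
            pos[i][0] += dt * vel[i][0]         # drift
            pos[i][1] += dt * vel[i][1]
        acc = accelerations(pos, masses)
        for i in range(len(pos)):
            vel[i][0] += 0.5 * dt * acc[i][0]   # second half kick
            vel[i][1] += 0.5 * dt * acc[i][1]
    return pos, vel


def total_energy(pos, vel, masses):
    """Kinetic plus pairwise potential energy, used as a sanity check."""
    kinetic = sum(0.5 * m * (v[0] ** 2 + v[1] ** 2) for m, v in zip(masses, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            potential -= G * masses[i] * masses[j] / math.hypot(dx, dy)
    return kinetic + potential


# Toy system: one star and two planets on roughly circular orbits.
masses = [1.0, 1e-3, 1e-3]
start_pos = [[0.0, 0.0], [1.0, 0.0], [-2.0, 0.0]]
start_vel = [[0.0, 0.0], [0.0, 1.0], [0.0, -1.0 / math.sqrt(2.0)]]

energy_before = total_energy(start_pos, start_vel, masses)
end_pos, end_vel = leapfrog(start_pos, start_vel, masses, dt=0.001, steps=2000)
energy_after = total_energy(end_pos, end_vel, masses)
relative_drift = abs((energy_after - energy_before) / energy_before)

print("inner planet position:", end_pos[1])
print("relative energy drift:", relative_drift)
```

Checking that total energy barely drifts is a cheap way to see whether the step size is small enough; a search for an unseen perturber would run many such simulations with different hypothetical planets added.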

analog31 · on June 25, 2019

Indeed, and it's worth generalizing that the vast majority of contemporary problems do not yield themselves to closed form solutions. We are constantly finding numerical solutions.

If you went to college when I did, first of all, you're eligible to join the AARP, and second, the problems that you studied were overwhelmingly solved in closed form. This was true in math as well, both in college and at the K-12 level.

Now, are ML algorithms worth adding to our tool belt of numerical methods? Oh, probably. When the fad is over, some useful applications will remain. We already use regression a lot, and that's a primitive form of ML.

jackpirate · on June 24, 2019

Just to clarify, the GP is describing what is essentially the n-body problem in physics [1]. There is in general no closed form solution for n-body problems, so numeric solutions are generally required.

[1] https://en.wikipedia.org/wiki/N-body_problem

alexmlamb2 · on June 25, 2019

It's kind of an inverse problem. Once you put the planets in position you can (reasonably easily?) simulate their orbits with some noise. The idea is that the neural network would try to quickly approximate which planet configurations could produce those stable orbits.

crimsonalucard · on June 24, 2019

Also one perturbed configuration actually has multiple solutions.

deepnotderp · on June 25, 2019

I think he's talking about many body problems.

pinouchon · on June 25, 2019

Sounds like a good fit for probabilistic programming, much like Stuart Russell did here https://www.youtube.com/watch?v=GYQrNfSmQ0M&feature=youtu.be.... His method could find locations of nuclear tests better than the existing UN system at the time by a lot. His model uses Bayesian inference in a stochastic/symbolic/simplified/statistical model of physics (how shockwaves propagate on the surface of the earth).

You could do the same here: assuming the relevant laws (Kepler? Newton's law of gravity?) and a prior distribution on the location/mass of your 9th planet, given what we observe for the other planets, what's the posterior distribution on the mass/location of the 9th planet?

The statistical model is likely to be small (the Russell statistical model for nukes fits on one slide). The issue is how to do inference efficiently. Fortunately, probabilistic systems have come a long way and can do these kinds of inferences.

ChrisFoster · on June 25, 2019

In principle this is actually quite a reasonable idea and is a common pattern of many types of physical measurement. Often we have a detailed and accurate physical model of the forward dynamics of a system, given some system parameters, but we can't measure these parameters directly. Instead, we measure some data in a "sensor domain" and we'd like to map it back to the physical parameters.

This setup is known as an "inverse problem" and is often ill conditioned / singular or very complex, therefore requiring some regularization in the form of prior knowledge. Treating the inverse problem as a regression problem (given these observations in sensor domain, predict the state of the system) with a neural network as the regressor is one way of attacking these problems and is becoming very successful in some areas, for example MRI reconstructions, eg https://www.biorxiv.org/content/10.1101/278036v1. In this case you are adding the regularization / priors by constructing the training data with a physical model.

I think this kind of approach is interesting because it scales to input and output spaces with high dimensionality. However, it's not exactly clear to me what kind of estimate such a regressor provides (is it kind of like doing maximum likelihood?)

From a more standard statistical point of view, you'd like to estimate the full probability distribution over system parameters. In this case, the orbital elements and mass of the unknown bodies. Because this inference problem has relatively low dimensionality (I think?) you might do better to treat it as a problem of Bayesian inference and sample it using MCMC. Then you'd have a rigorous way to understand the uncertainty of the estimates and also to attack the problem of "unknown number of bodies" in a systematic way.
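To give a feel for the Bayesian route, here is a minimal Metropolis sampler on a deliberately tiny stand-in problem: infer the mass of one unseen body from noisy measurements of the acceleration it causes at a known distance. Everything here (the one-parameter forward model, the flat prior on positive mass, the noise level, the seed) is an assumption for illustration and nowhere near a real orbital-elements model.

```python
import math
import random

G = 1.0            # toy units (assumption)
DISTANCE = 2.0     # assumed known distance to the hidden body
NOISE_SD = 0.1     # assumed known measurement noise
TRUE_MASS = 5.0    # ground truth used only to simulate the data

rng = random.Random(1)
signal = G * TRUE_MASS / DISTANCE ** 2
observations = [signal + rng.gauss(0, NOISE_SD) for _ in range(50)]


def log_posterior(mass):
    """Gaussian log-likelihood plus a flat (improper) prior on mass > 0."""
    if mass <= 0:
        return -math.inf
    predicted = G * mass / DISTANCE ** 2
    squared_error = sum((y - predicted) ** 2 for y in observations)
    return -0.5 * squared_error / NOISE_SD ** 2


def metropolis(log_post, start, step_sd, n_steps, rng):
    """Random-walk Metropolis sampler for a single real parameter."""
    samples = []
    current = start
    current_lp = log_post(current)
    for _ in range(n_steps):
        proposal = current + rng.gauss(0, step_sd)
        proposal_lp = log_post(proposal)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis(log_posterior, start=1.0, step_sd=0.1, n_steps=20000, rng=rng)
kept = samples[2000:]  # discard burn-in
post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in kept) / len(kept))

print("posterior mean mass:", round(post_mean, 3))
print("posterior sd:", round(post_sd, 3))
```

The point is the output shape: a spread of plausible masses rather than a single prediction, which is the uncertainty estimate a plain regressor does not give you directly.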

TuringNYC · on June 24, 2019

So...the 8 body problem? https://en.m.wikipedia.org/wiki/Three-body_problem

tomrod · on June 25, 2019

Often called the N body problem, but the issue occurs when N>=3, so calling it "the 3 body problem" usually sufficiently identifies it in the literature.

Great question, and I am happy to have my 1 in 10k time today! (Nice being on the informing end for once).

https://xkcd.com/1053/

terminalhealth · on June 25, 2019

This seems related: https://www.researchgate.net/publication/11955388_Solving_N_...

c3534l · on June 25, 2019

It sounds like you've invented a crappy version of MCTS[0].

[0] https://en.wikipedia.org/wiki/Monte_Carlo_tree_search

stanfordkid · on June 25, 2019

MCTS estimates a score for each action by stochastically sampling the future states. It has nothing to do with a problem such as this with continuous inputs and outputs.

hadsed · on June 25, 2019

The negative comments here are disappointing. ML is a fantastic tool for science, where it can propose a model that works as a starting point for getting to a model that works AND that you can understand.

This is quite common in physics, for example, where people are happy to build elaborate experiments just to poke at the universe in weird ways. An ML algorithm is a theorist's particle accelerator where they can treat it as something to be explored to gain insight.

The reason people are pissed about this is that we're doing this breadth-first, because the incentives make it that way. People are right to be concerned if we never get back to deeper analyses, but I'm not at all concerned.

At some point the low hanging fruit will be gone and every scientific community will be better off having these new results. As we get better at probing the black box, and we will because there's a lot of value behind doing so, we will start to shift back to the deeper questions.

cheez · on June 25, 2019

It can't really propose a model that is understandable by most human means but I agree that it can find new relations that we can explore.

ngcc_hk · on June 24, 2019

Very odd and, may I say, wrong article. The basic point that AI provides a breakthrough is OK. But I would not call Newton's laws a simulation.

The real first development is verbal, non-mathematical theory, not simulation. In fact this kind of theory comes first: Aristotle etc. said ... the sun must be revolving about the earth. Some basic maths (geometry and algebra).

Observation is the second approach, and a breakthrough: Kepler, and later Galilei watching Jupiter's moons.

Then maths as a tool, not just simple verbal theory: calculus, non-Euclidean geometry, wave mechanics, ... to the point that these days physics is nothing but maths. The sad thing about social science is that it is only data. Only economics has some maths; the rest is still data and verbal theory.

Then the computer provides data analysis tools as well as simulation and visualisation. This is more of an aid.

Then data as a tool. And the new breakthrough is AI helping to suggest models.

Theory, mathematical model, data observation (to disprove, to hint, to pose questions and to generate theory based on patterns)

then various tools to assist the above, including AI.

yters · on June 24, 2019

Ever more algorithms and models and data, ever less understanding and scientific theories.

Soon, instead of "theory of gravity" we'll have "generative DNN of science papers and grant writing" that no one will understand, but can generate papers that pass peer review and earn grants and pull in all the monies, effectively monopolizing and halting all government funded scientific progress.

Meanwhile, actual science will continue on in the amateur ranks, from which the true breakthroughs have always come.

thrwayxyz · on June 24, 2019

[[citation needed]]

What has been an amateur science breakthrough in the last century or two which didn't have at its base some billions of dollars of government funding?

AnimalMuppet · on June 25, 2019

The Special Theory of Relativity, from some moonlighting patent clerk.

nradclif · on June 25, 2019

That was over a century ago (although within the “century or two” limit specified). But it’s not at all characteristic of the majority of scientific discoveries made in the last century, which were largely made by professional scientists and grad students on their way to becoming professionals. The list of Nobel Prizes in various sciences over the past century I think demonstrates this.

cheez · on June 25, 2019

Maybe government money has squeezed out innovation a la Einstein.

teekert · on June 25, 2019

In my field of science the only thing that changed is that people started calling algorithms AI. For marketing purposes.

EGreg · on June 24, 2019

Can we use this to make an app and figure out the optimal diets for everyone? Or a GAN for generating the funniest jokes?

jacquesm · on June 24, 2019

> Or a GAN for generating the funniest jokes?

You really want to be careful with that, Monty Python made an excellent documentary about the weaponization of such high grades of humor and the results, to put it mildly, weren't funny.

EGreg · on June 24, 2019

Sorry what? Can you offer us a link?

D-Coder · on June 25, 2019

More than you want to know: https://en.wikipedia.org/wiki/The_Funniest_Joke_in_the_World

Just the right amount: https://www.youtube.com/watch?v=_yo9WHrTvks

stcredzero · on June 24, 2019

> Monty Python made an excellent documentary about the weaponization of such high grades of humor and the results, to put it mildly, weren't funny.

It was a skit, and it was funny. Not their best work.

gwern · on June 25, 2019

Speaking of diet: https://www.nytimes.com/2019/06/10/health/nutrition-diet-gen... https://www.nytimes.com/2019/05/08/science/precision-medicin... https://www.gwern.net/docs/longevity/2019-rose.pdf

Diet response is so genetically confounded that I think it's going to be a while before you can make any sort of confident prediction, and just plain hard to predict even when you're using identical twins. Probably more leverage in figuring out how to make continuous glucose monitors more feasible to measure individual response directly.

wpasc · on June 24, 2019

For something like optimal diets for everyone, I imagine it's more of an issue with incomplete understanding of diet, microbiome, genetics, epigenetics, etc. Once that's known, I'd venture to say you don't need AI.

But a GAN where the discriminator is determining if a joke is made by an AI or a human might be pretty cool :)

btrettel · on June 24, 2019

> For something like optimal diets for everyone, I imagine it's more of an issue with incomplete understanding of diet, microbiome, genetics, epigenetics, etc. Once that's known, I'd venture to say you don't need AI.

Even if the data exists, that doesn't mean the AI folks would use it. In my research (a particular subfield of fluid dynamics), the machine learning/AI papers/talks always seem to have incomplete or even bad data, as if how advanced their algorithm is makes up for that. (I don't think they actually believe that. I think they just have bad habits.) I've published pretty good linear regressions of a much larger data compilation and received much less attention, despite the fact that my linear regressions are probably more accurate than the ML models...

mattkrause · on June 24, 2019

Yup. For example, look at this paper from Google: https://www.nature.com/articles/s41746-018-0029-1

The whole thing is about how to build these fancy networks, and it created a fair bit of buzz. Table S1, in the supplement, however, shows that it's rather pointless.

The deep model has an AUC (95% CI) of [0.94, 0.96] for in-patient mortality. The "full feature-enhanced" logistic regression baseline has an AUC of [0.92, 0.95]. Same pattern for 30-day readmission. Length of stay is the only one that's not overlapping, and it just squeaks that out: [0.86, 0.87] for the deep model vs. [0.84, 0.85] for the baseline.
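The comparison above amounts to checking whether the reported intervals overlap. A tiny sketch using only the numbers quoted in this comment; keep in mind that overlapping confidence intervals are a rough heuristic and not a formal significance test.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (deep model CI, logistic regression baseline CI), as quoted above.
reported_auc = {
    "in-patient mortality": ((0.94, 0.96), (0.92, 0.95)),
    "length of stay": ((0.86, 0.87), (0.84, 0.85)),
}

overlap_by_task = {
    task: intervals_overlap(deep, baseline)
    for task, (deep, baseline) in reported_auc.items()
}

for task, overlaps in overlap_by_task.items():
    print(task, "-> intervals overlap" if overlaps else "-> intervals separate")
```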

ImaCake · on June 24, 2019

Yup. As an outsider looking in on bioinformatics, a lot of the datasets I have played with have some real glaring problems. They often have small amounts of data or some serious bias. A naive ML approach on a lot of sequencing data will probably be undermined by the bias of the sequencing machine and the researcher who processed the biological materials.

robertAngst · on June 25, 2019

I want to do linear algebra on this data

https://efficiencyiseverything.com/food-nutrition-per-dollar...

EDIT: Direct link to the data https://efficiencyiseverything.com/data/Nutrition%20Per%20Do...

jl2718 · on June 25, 2019

George Dantzig invented Simplex for this problem.

jl2718 · on June 25, 2019

The optimal diet problem was the first problem ever solved in the field of operations research. Get ready for some navy beans.
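For anyone who hasn't seen it, the diet problem is a linear program: minimize cost subject to minimum nutrient requirements. Here is a minimal two-food sketch solved by brute-force vertex enumeration (not the simplex method, which is what you would use for realistic sizes). The foods, costs, and nutrient numbers are invented for illustration.

```python
from itertools import combinations

# Hypothetical per-serving data, invented for illustration only.
COST = (2.0, 3.0)  # cost of one serving of (beans, rice)

# Each constraint (a, b, minimum) means a*beans + b*rice >= minimum.
CONSTRAINTS = [
    (3.0, 1.0, 9.0),  # protein requirement
    (1.0, 2.0, 8.0),  # calorie requirement
    (1.0, 0.0, 0.0),  # beans >= 0
    (0.0, 1.0, 0.0),  # rice >= 0
]


def intersection(c1, c2):
    """Point where two constraint boundaries meet, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return (x, y)


def is_feasible(point):
    """True if the point satisfies every constraint (with a small tolerance)."""
    x, y = point
    return all(a * x + b * y >= minimum - 1e-9 for a, b, minimum in CONSTRAINTS)


def diet_cost(point):
    return COST[0] * point[0] + COST[1] * point[1]


# With positive costs the optimum sits at a vertex of the feasible region,
# so checking every pairwise boundary intersection is enough here.
vertices = []
for c1, c2 in combinations(CONSTRAINTS, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        vertices.append(point)

best = min(vertices, key=diet_cost)
print("servings (beans, rice):", best)
print("cost:", diet_cost(best))
```

With these made-up numbers the cheapest feasible diet is 2 servings of beans and 3 of rice at a cost of 13.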

readhn · on June 25, 2019

AI is only as good as the programmer who created it. We are really in the stone age when it comes to AI.