Companies spend a lot of money on AI because they have a lot of money and don't know what else to do with it. They lack creativity and an appetite for riskier, more creative ideas. Pursuing those ideas is what universities must do instead of trying to ape companies. The human brain doesn't use a billion dollars in compute power, figure out what it is doing.
Sort of by definition, it can never be too costly to be creative. Only too timid. And too unimaginative.
+1 on calling this BS, even though I think it is only partly BS:
While it is true that training very large language models is very expensive, pre-trained models plus transfer learning allow interesting NLP work on a budget. For many types of deep learning, a single computer with a fast, large-memory GPU is enough.
It is easy to underappreciate the importance of having a lot of human time to think, be creative, and try things out. I admit that new model-architecture research is helped by AutoML tools like AdaNet, and that being able to run many experiments in parallel becomes important.
Teams that make breakthroughs can provide lots of human time, in addition to compute resources.
There is another cost besides compute that favors companies: being able to pay very large salaries for top tier researchers, much more than what universities can pay.
To me the end goal of what I have been working on since the 1980s is flexible general AI, and I don’t think we will get there with deep learning as it is now. I am in my 60s and I hope to see much more progress in my lifetime, but I expect we will need to catch several more “waves” of new technology like DL before we get there.
> The human brain doesn't use a billion dollars in compute power, figure out what it is doing.
This may not be true, if we’re talking about computers reaching general intelligence parity with the human brain.
Latest estimates place the computational capacity of the human brain somewhere between 10^15 and 10^28 FLOPS[1]. The world's fastest supercomputer[2] reaches a peak of 2 × 10^17 FLOPS, and it cost $325 million[3].
Realistically reaching 10^28 FLOPS today is simply not possible: projecting linearly from the above, the dollar cost would be $16 quintillion (1.625 × 10^19 dollars).
So, when it comes to trying to replicate human intelligence in today’s machines, we can only hope the 10^15 FLOPS estimates are more accurate than the 10^28 FLOPS ones — but until we do replicate human level general intelligence, it’s very difficult to prove which projection will be correct (an error bar spanning 13 orders of magnitude is not a very precise estimate).
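As a sanity check, the linear projection above is easy to reproduce in a few lines. The FLOPS figures and the $325M Summit price are the ones quoted in this thread, and the linear price/performance scaling is of course a very rough assumption:

```python
# Linear cost projection from Summit's price/performance to brain-scale FLOPS.
# Figures are from the thread above; linear scaling is a rough assumption.

summit_flops = 2e17   # quoted peak FLOPS of the fastest supercomputer
summit_cost = 325e6   # its quoted price in dollars

def linear_cost(target_flops: float) -> float:
    """Dollar cost to reach target_flops, scaling linearly from Summit."""
    return summit_cost * (target_flops / summit_flops)

# Optimistic brain estimate: already cheaper than Summit.
print(f"10^15 FLOPS: ${linear_cost(1e15):,.0f}")       # $1,625,000
# Pessimistic estimate: ~1.6e19 dollars, i.e. $16 quintillion.
print(f"10^28 FLOPS: ${linear_cost(1e28):.3e}")        # $1.625e+19
```

Note how lopsided the error bar is: at the low end of the estimate range, brain-parity hardware is already within a university grant, while at the high end it exceeds world GDP by orders of magnitude.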
P.S. Of course, if Moore’s law continues for a few more decades, even 10^28 FLOPS will be commonplace and cheap. Personally, I am very excited for such a future, because then achieving AGI will not be contingent on having millions or billions of dollars. Rather, it will depend on a few creative/innovative leaps in algorithm design — which could come from anyone, anywhere.
The dirty secret, though, is that AI isn't doing anything nearly comparable to what a whole human brain is doing. It's performing the functions of perhaps a small subset of the brain's neural networks, or maybe the equivalent of what a small rodent's brain is capable of: obstacle avoidance, route planning, categorizing objects in vision. Even language-related functions are more of a mapping than an understanding. I think the point still stands that we're making very inefficient use of the hardware we have. Universities need to be smarter about this and figure out how such a limited network of squishy cells does all the things it does. That's the whole point of concentrating smart people in an environment where they're given the freedom to pursue ideas without worrying about whether or not they generate a profit: you learn the 'how' and the 'why' rather than just the 'what makes money'.
Most AI is doing overly complicated versions of plain old decision trees or just pattern matching.
Or in the worst cases they are presenting one thing, and really just relying on hundreds or thousands of people in Bangalore to pore through the data sets and tag and categorize.
"Dirty secret" is a weird term for something practitioners and researchers are trying to tell anyone who will listen. It's a dirty secret on the marketing side.
Of course, and we’re already seeing incredible results, even from size-compressed deep neural networks running on custom acceleration hardware embedded now in most major smartphones.
I was simply responding to the parent post’s false claim (“The human brain doesn’t use a billion dollars in compute power, figure out what it is doing.”), in isolation from the rest of the post (which I generally agree with).
The higher numbers there (e.g. 10²⁸) are irrelevant. That estimate credits a single neuron with 10¹⁶ operations per second; that is, as much computation happening within one neuron as in all the signaling between neurons across the whole brain!
Bostrom's estimate of 10¹⁷ is much, much more reasonable.
Note that this is still a number biased in favour of the brain, since for the brain you are measuring each internal operation in an almost fixed-function circuit, and for Summit you are measuring freeform semantic operations that result from billions of internal transitions. A similar fixed-function measure of a single large modern CPU gives about 10¹⁷ ops/s as well; the major difference is that a single large modern CPU is running a much smaller amount of hardware many times faster, and uses binary rather than analogue operations.
While I agree that 10^17 seems like a more accurate number, don’t forget that each neuron contains ~10^5 synapses, which all process at timescales of tens of microseconds. This gives you an additional factor of 10^9.
The 10¹⁷ includes that factor. 10¹⁰ neurons by 10⁵ synapses by 10² Hz. It seems unlikely to me that the meaningful temporal resolution is going to be 1000x that of the firing rate, but if you want to add a factor of 10 or so I wouldn't object.
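The back-of-envelope multiplication behind the 10¹⁷ figure is simple enough to check directly. All three inputs are the order-of-magnitude values quoted in this thread, not precise measurements:

```python
# Rough FLOPS estimate for the brain: neurons * synapses/neuron * firing rate.
# All three figures are order-of-magnitude values from the thread above.
neurons = 1e10       # ~10^10 neurons in the brain
synapses_per = 1e5   # ~10^5 synapses per neuron
rate_hz = 1e2        # ~100 Hz effective firing rate

ops_per_second = neurons * synapses_per * rate_hz
print(f"{ops_per_second:.0e}")  # 1e+17
```

Adding the proposed extra factor of 10 for sub-firing-rate temporal resolution only moves this to 10¹⁸, still far below the 10²⁸ upper bound.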
More than interaction, it's a living organisation in the service of life.
Damasio has a very interesting take on how emotions, consciousness, etc. emerge from the way the brain and body together process information from the "external" world.
I also think this is kinda ridiculous. If anything I feel like the big consumer tech companies are disadvantaged because they need to deploy something that can scale to a billion people. They can only spend pennies per customer because the margins are so low. Sure they will have a research team for marketing purposes but when it comes to deployments they aren't doing anything too fancy.
Companies with higher profit margins per customer are doing much more novel work, from what I have seen.
All this is to say, I don't see universities getting shut out anytime soon. The necessary compute to contribute is pretty cheap and most universities either have a free cluster for students or are operating with large grants to pay for compute (or both).
I don't think you have the right numbers in mind talking about the compute you need for AI. The prices are getting lower and lower of course, but you still need tons of money to train the kind of networks that make the news.
I don't think you have the right numbers in mind talking about the networks used in academic work. The majority of networks used in publications are good old references like VGG.
Example? I have yet to see something actually deployed by one of the big tech companies that could not be trained by students on a university cluster. I also think you underestimate grant funding. I worked at a state school a couple years back in a research lab that had over a million dollars in grant funds specifically for equipment and outside compute (not for salaries or new hires) and this is not at all abnormal.
State of the art CV models (image, not video) can cost 3-figure dollars per training run.
State of the art language models can cost 5-figure dollars per training run.
There are a lot of variables in play here, so your mileage will definitely vary (how much data, how long you are willing to wait, whether you really need to train from scratch, etc.), and these should only be considered very rough ballpark numbers. However, those are real numbers for SotA models on gold-standard benchmark datasets using cost-optimized cloud ML training resources.
At 5-figures per training run, the list of people who can be innovators in the LM research space is very small (fine-tuning on top of a SotA LM is a different, more affordable matter).
Sure but 3 figure and 5 figure runs certainly do not eliminate universities (see my above comment). Not to mention as I have said, most good universities will have clusters capable of training these that they maintain on premise drastically reducing that cost (and in a worst case just take longer to train).
It really does. You've got to remember that a good SotA paper takes hundreds of training runs, at least.
I can't go into detail about budgets, but suffice to say if you think $1M is a university compute budget that lets you be a competitive research team on the cutting edge, you are __severely__ underestimating the amount of compute that leading corporate researchers are using. Orders of magnitude off.
On-prem is good for a bit until you're 18 months into your 3 year purchase cycle and you're on K80s while the major research leaders are running V100s and TPUs and you can't even fit the SotA model in your GPUs' memories any more.
Longer to train can mean weeks or even months for one experiment - that iteration speed makes it so hard to stay on the cutting edge.
And this is before considering things like neural architecture search and internet scale image/video/speech datasets where costs skyrocket.
The boundary between corporate research and academia is incredibly porous and a big part of that is the cost of research (compute, but also things like data labelling and staffing ML talent).
Your goalposts moved a few figures. Furthermore, $1 million+ was not a university compute budget; that was money for a single lab on campus (at a general state school, no less) on a specific project.
You still have yet to provide any concrete sources to back up your claims. We're talking about contributing to research here. If multi-million dollar training jobs are what it takes to be at the cutting edge you should be able to provide ample sources of that claim.
- "Some of the models are so big that even in MILA we can’t run them because we don’t have the infrastructure for that. Only a few companies can run these very big models they’re talking about" [1]. NOTE: MILA is a very good AI research center and, while I don't know too much about him, that person being quoted has great credentials so I would generally trust them.
- "the current version of OpenAI Five has consumed 800 petaflop/s-days" [2].
- Check out the Green AI paper. They have good numbers on the amount of compute used to train a model, and you can translate that into dollar figures.
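For a sense of scale, the "petaflop/s-day" unit from the OpenAI Five figure above is easy to unpack into total operations (the 800 is OpenAI's own quoted number; the conversion is just unit arithmetic):

```python
# Convert OpenAI Five's quoted compute budget into total floating-point ops.
# A petaflop/s-day = 10^15 FLOP/s sustained for one day (86,400 seconds).
SECONDS_PER_DAY = 86_400
PFS_DAY_IN_FLOP = 1e15 * SECONDS_PER_DAY  # ~8.64e19 FLOPs per unit

total_flop = 800 * PFS_DAY_IN_FLOP        # OpenAI Five's quoted total
print(f"{total_flop:.2e}")                # 6.91e+22
```

That is roughly seven billion seconds (two centuries) of sustained compute on a 10^13 FLOPS consumer GPU, which is why a university cluster cannot simply "take longer" on projects at this scale.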
I'm not an expert in on-prem ML costs, but I know many of the world's best on-prem ML users use the cloud to handle the variability of their workloads so I don't think on-prem is a magic bullet cost wise.
$1M annually per project (vs per lab) isn't bad at all. It's also way out of whack with what I saw when I was doing AI research in academia, but that was pre deep learning revolution, so what do I know.
Re: the moving goalposts - the distinction is between the cost of a training run and the cost of a paper-worth research result. Due to inherent variability, architecture search, hyperparameter search, and possibly data-cleaning work, the total cost is a couple orders of magnitude more than the cost of a single training run (the multiplier will vary a lot by project and lab).
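To make the training-run vs. paper distinction concrete, a rough multiplication using the ballpark figures already mentioned in this thread (5-figure runs, "hundreds of training runs"); both inputs are illustrative, not measured budgets:

```python
# Rough cost of a paper-worth of SotA results: per-run cost times run count.
# Both figures are ballpark values from the thread, not measured budgets.
cost_per_run = 10_000    # "5-figure dollars" per language-model training run
runs_per_paper = 300     # "hundreds of training runs, at least"

paper_cost = cost_per_run * runs_per_paper
print(f"${paper_cost:,}")  # $3,000,000
```

So even taking the low end of both estimates, a single competitive paper lands in the millions, a couple orders of magnitude above one run.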
I understand why you don't trust what I'm saying. I wish I could give hard numbers, but I'm limited in what I can say publicly so this is the best I can do.
What company would pay for this kind of research without expecting some sort of profitability from it? I imagine Facebook is trying to figure out how the brain works, but I hope they don't get very far.
If you are the kind of person that might be able to figure out what the brain is doing, chances are a company is going to pay you better and provide more resources to you than just about any university.
Depends on the resources. Amazon undoubtedly has much more compute, but they’re not exactly known for their wet-lab facilities, and this question undoubtedly needs both.
That said, I think you are right that academia’s structure may need to change. Right now, we’re locked into a model where projects mostly need to be doable by a handful of researchers (almost entirely trainees) in a few years. Other than these time-limited positions, there’s not a lot of room for skilled individual contributors, which seems goofy when tackling such a hard problem.
Boston Dynamics' work seems very cool, but I don't see a big market for it until AI makes more progress and the price of such hardware drops significantly. Even then it seems like a novelty.
>The human brain doesn't use a billion dollars in compute power, figure out what it is doing
Phahahaha, this gave me a good laugh :) To find out how your brain works, guess what you are going to use: the brain itself. It's like trying to cut a knife with itself, or trying to use a scale to weigh itself.
I don't know if we can use anything at all for this. What's more, I assume that engineering and math are not the right approaches, in the same way they are not the right approaches to things like humor, poetry, or design.
My conclusion after reading many papers in the Natural Language Processing field (which is now all about machine learning) is that, generally, company papers focus on tweaking pipelines until they have increased the accuracy score. Once they have done so, they quickly publish the result and leave the analysis of it to others. (BERT[1] is a prime example of this.) However, I do not agree that companies lack creativity. If you look at all the wide-ranging research currently being undertaken at the big tech companies, you will be amazed by its scope. (I found this out by coming up with 'new' ideas during my thesis, only to discover that some researcher at a big tech company was already working on them.)
[1] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).