
> may not be a fair comparison to newer hardware like the A100

In fairness, most of those entries use 4-8 V100s versus OP's single GPU. While the A100 is more powerful, I think the "on a single GPU" framing is valuable in itself



I addressed this in a reply to the parent comment too, sorry that I topic-leaked! I'm cruising through my personal ML sabbatical on savings, so I'm somewhat money-incentivized to be as thrifty as possible. Hence, as noted before, right now I'm just at $50 a month!

I'm hoping this research is valuable to people in other areas, too. The concepts about order-of-operations, information flow, scaling, information-efficiency-at-high-throughput, etc., I think are applicable anywhere, given the right context. Though I have a sneaking suspicion that many of these laws (like the traditional scaling laws) only start to matter and become more relevant as the ideally efficient architecture families are slowly approached through iterative optimization.


How much does training this cost?

The top on DAWNBench a few years ago was $0.02, but that was a single V100 and their best time was 45s on 8*V100. No idea how much the 10s (top time) cost to run, but it was also 8*V100.


I think it's maybe something like 13.8 'credits' an hour on Colab, and you get 500 credits for $50 straight up, or $50 a month (I'm truly a sucker for simple flat pricing schemes with a natural cap on them; it's good for the overzealous network trainer's/developer's wallet! :D). So that's roughly $1.38 per hour for a basically-guaranteed A100 (not bad at all! And the H100 is coming soon, I'd assume! :D)

If training takes ~9.91-9.96 seconds, and we ignore everything else in the process (assuming we have some kind of strange Elvish magical computers that don't require any spinup of any sort)... then that's (9.91 to 9.96)/60/60 * 1.38 = $0.0037988-$0.0038180 per run, or .37988-.38180 cents per run. The full setup, including install from clone, data download, and network init, I'd estimate is lower bounded at maybe 1.2-1.3 cents per run with a good internet connection (but I'm not entirely sure about that! D:). The upper bound for a reasonably fast machine I think would be no more than 2 cents, clean start to finish, for a single training run. Multiples of that for best-of-N runs (maybe not the safest idea), or better yet, simple ensembling of the EMA'd models, could be upper bounded at likely no more than ~4 cents or so for 5 models, if I'm doing my math correctly.
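For anyone who wants to sanity-check the arithmetic above, here's a quick back-of-the-envelope sketch. The inputs (13.8 credits/hour, 500 credits for $50, 9.91-9.96 second runs) are the figures from this thread, not official Colab pricing:

```python
# Back-of-the-envelope cost check for the numbers in this thread.
# Assumed inputs (from the comment, not official pricing):
credits_per_hour = 13.8
dollars_per_credit = 50 / 500          # 500 credits for $50 -> $0.10/credit
hourly_rate = credits_per_hour * dollars_per_credit  # ~$1.38/hour

# Pure-compute cost for the quoted training times, ignoring spinup.
for seconds in (9.91, 9.96):
    dollars = seconds / 3600 * hourly_rate
    print(f"{seconds}s run: ${dollars:.7f} (~{dollars * 100:.5f} cents)")
```

Which lands right in the $0.0037988-$0.0038180 per-run range quoted above.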

That said, the pure per-run compute cost likely stays in that .37988-.38180 cent range in this case.

What's weird is that that figure does seem a bit steep considering it's 8 V100s for 45 seconds, and those were... pretty pricey at the time, I think? So maybe something is horribly wrong with my math! D:

Hope that helps, great question and many thanks for asking it; happy to answer any follow-up questions you might have. This is a very interesting line of inquiry, and I haven't yet spent enough time developing it! :D


It's a standard benchmark on DAWNBench; you should submit! Seems like you'd easily top the leaderboard in multiple categories.



