I've found Amazon GPU instances to be really expensive (even spot prices have been high recently), especially if you need them for longer deep learning runs. The other issue is that the additional layers of virtualization add bandwidth overhead.
I'd like to see something in the cloud that's bare-metal, with full access to GPUs (maybe a good idea to start one). For scaling to a very large number of GPUs you'd need InfiniBand, but at some point there is going to be a bandwidth tradeoff.
It would be interesting if someone could run some benchmarks of these instances versus a physical server.
I did the analysis about 9 months ago for my team when the 980 Tis came out, and the AWS pricing was expensive (we built a server that paid for itself in two weeks compared to a g2.8xlarge). This is largely because the 980 Ti is a ridiculously good deal on price/performance.
The bigger problem we ran into is that AWS instances use a lot of small GPUs, which don't scale well with many deep neural network tools (e.g. Theano). It was never really a viable option for us.
Somebody should make a startup that allows people to sell access to their computers by the minute. Like spot instances in the cloud ... in people's basements. The true sharing economy.
Why not just move forward to realizing some sort of distributed/decentralized internet?
Something like a combination of Freenet, TOR, BOINC, blockchain etc. technologies, using the current "legacy" internet as a backbone, where anyone can voluntarily offer their computing and storage resources to the network at varying levels of participation.
Say you could offer your laptop as a simple discovery/directory node to simply help others connect and find stuff, and your desktop as either a static-content serving node or as a computation node that can host distributed applications, like SETI@home or web apps like Facebook.
Maybe even reward cryptocurrency to those who offer the most resources.
There is Gridcoin which is a cryptocurrency that rewards for BOINC projects.
I wouldn't be surprised if certain services became more centralized, offloading computing to the cloud. Consumer-grade devices would become thin clients (like back in the day). NVIDIA has hinted in this direction; I remember something about GaaS (Gaming as a Service).
Various companies have tried this. What you end up with is a fleet that's 99% compromised machines, and law enforcement at your door every other day asking where the checks are being mailed.
What if you limited people to writing in a domain-specific language: one that ran distributed on this infrastructure? How would that make it different from Folding@home, for example?
"“Monthly Uptime Percentage” is calculated by subtracting from 100% the percentage of minutes during the month in which Amazon EC2 or Amazon EBS, as applicable, was in the state of “Region Unavailable.” Monthly Uptime Percentage measurements exclude downtime resulting directly or indirectly from any Amazon EC2 SLA Exclusion (defined below)."
That's a regional outage, they provide no SLA for individual instances:
Amazon EC2 SLA Exclusions... (v) that result from failures of individual instances or volumes not attributable to Region Unavailability
Presumably if you're using a cloud hosted in people's basements, if someone's basement server dies, you'd just pick one from someone else's basement, so this model could provide better availability than AWS.
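As a back-of-the-envelope sketch of that claim: if each basement server is up independently with some probability p (the 95% figure below is an assumed illustrative number, not a measurement), a pool with failover can beat any single box by a wide margin.

```python
# Back-of-the-envelope availability of a "basement cloud" with failover.
# Assumes each server is up independently with probability p_single
# (hypothetical numbers), and a job succeeds if at least one of n
# candidate servers is up.

def pool_availability(p_single: float, n: int) -> float:
    """Probability that at least one of n independent servers is up."""
    return 1.0 - (1.0 - p_single) ** n

# A flaky 95%-uptime basement box is far worse than AWS on its own,
# but a pool of five such boxes is up 1 - 0.05**5 of the time,
# i.e. better than "five nines".
for n in (1, 2, 5):
    print(n, pool_availability(0.95, n))
```

The independence assumption is the weak point, of course: a regional power or ISP outage takes out many basements at once.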
That's right, you can install on your own GPU servers. Infiniband+RDMA transport is also supported which typically doubles the number of GPUs you can scale to.
We're adding support for other clouds, particularly ones with higher-end GPUs so feedback like this is good to know.
That's some really cool tech.
It seems like it's Linux only. Is there Windows support planned?
That would solve the problem of wanting to run code on the GPU within a Linux VM while the host is Windows.
That's great! Reading the documentation it seems there is no support for multiple clients and multiple GPUs (Many-To-Many), is there anything planned on that side?
Nice. The doc at https://bitfusionio.readme.io/docs/bitfusion-boost is a bit misleading about the possible configurations. Maybe add one configuration with multiple Boost Clients (CPU) and many Boost Servers (GPU).
At first I thought this was the same problem as automatically breaking up apps to run on multiple CPUs. That problem has been heavily researched without success.
Is it the fact that GPU code already runs in parallel streams that makes this possible?
Yes, your app would have to support multiple GPUs. What's done here is remoting CUDA/OpenCL/etc. calls so that remote GPUs can be accessed from a single instance. When performing device/platform enumeration, all GPUs appear to be directly connected to a single instance -- hence no change to the application required.
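To make the remoting idea concrete, here is a toy sketch of the client-side interception described above: a shim stands in for the intercepted driver enumeration call (think cuDeviceGetCount / clGetDeviceIDs) and merges devices reported by remote servers, so the application sees one flat list of "local" GPUs. All class and host names here are hypothetical; this is not Bitfusion's actual implementation.

```python
# Toy model of GPU API remoting: intercept device enumeration and
# splice remote GPUs into the list the application sees.

class RemoteGpuServer:
    """Stand-in for a Boost server exporting its GPUs over the network."""
    def __init__(self, host: str, gpus: list):
        self.host = host
        self.gpus = gpus

    def enumerate(self) -> list:
        # In a real system this would be an RPC over the network.
        return ["%s:%s" % (self.host, g) for g in self.gpus]

class GpuShim:
    """Client-side shim standing in for the intercepted driver API."""
    def __init__(self, local_gpus, servers):
        self.local_gpus = local_gpus
        self.servers = servers

    def device_list(self) -> list:
        devices = list(self.local_gpus)
        for server in self.servers:
            devices += server.enumerate()
        return devices

    def device_count(self) -> int:
        # What the app's cuDeviceGetCount-equivalent would return.
        return len(self.device_list())

shim = GpuShim(local_gpus=["gpu0"],
               servers=[RemoteGpuServer("10.0.0.5", ["gpu0", "gpu1"])])
assert shim.device_count() == 3  # app sees 3 GPUs, no code changes
```

Because the merge happens below the application, any program that already loops over the enumerated devices picks up the remote GPUs for free, which matches the "no change to the application" claim.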
Sounds like Plan9's concept of "CPU server mounts" has been reborn as "GPU server mounts." Could actually get traction this time, given that existing multi-GPU programs will Just Work.
I can't wait for a company to provide OpenCL/CUDA MFLOPS as a service instead of giving you VMs as a whole, so one could just attach a remote engine to any smallish controller VM.
What you suggest is technically possible by installing our Boost software on any GPU machine, and then accessing that machine from any clients running our Boost software as well. The client does not need to have a GPU. This configuration is supported in AWS today, where for example you can connect one or more t2.large instances to a g2.8xlarge. All that would have to be done is some metering on the GPU machine to implement the service you suggest :)
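The metering piece could be as simple as billing per minute of GPU attachment. A hypothetical sketch (the Meter API and the rate are invented for illustration, not part of Boost):

```python
# Hypothetical per-minute metering for a GPU-as-a-service setup:
# charge a client for the wall-clock time its Boost session holds
# a remote GPU, rounded up to whole minutes like spot billing.
import math

class Meter:
    def __init__(self, rate_per_minute: float):
        self.rate = rate_per_minute
        self.sessions = {}   # client_id -> attach timestamp (seconds)
        self.charges = {}    # client_id -> accumulated cost

    def attach(self, client_id: str, now: float) -> None:
        self.sessions[client_id] = now

    def detach(self, client_id: str, now: float) -> float:
        start = self.sessions.pop(client_id)
        minutes = math.ceil((now - start) / 60)  # round up partial minutes
        cost = minutes * self.rate
        self.charges[client_id] = self.charges.get(client_id, 0.0) + cost
        return cost

meter = Meter(rate_per_minute=0.05)
meter.attach("t2-client-1", now=0.0)
cost = meter.detach("t2-client-1", now=150.0)  # 2.5 min -> billed as 3 min
assert abs(cost - 0.15) < 1e-9
```

A real deployment would also need authentication and per-GPU (rather than per-machine) accounting, but the shape is the same.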
We are not limiting our software to AWS, so you can build this kind of service on any kind of cluster by installing our software directly from https://boost.bitfusion.io - I say cluster, because we have played with the idea of thin devices accessing remote GPU instances in the cloud, but over public networks the network performance was a limiting factor.
Also, GPUs have relatively little bandwidth toward main memory, and most CUDA patterns involve loading, computing, and retrieving, which works about as well remotely as it does locally. Unlike programs that access memory at random times all over their address space, GPU memory is neatly organized (e.g. into texture areas), and the programming paradigm already entails moving data in and out of the device as few times as possible.
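The load/compute/retrieve pattern is exactly why remoting is tolerable: the transfer happens once per batch and is amortized over the kernel's compute time. A rough model with assumed numbers (not measurements of any real system):

```python
# Why batched GPU workloads tolerate remoting: the one-time transfer
# per batch is amortized over compute. Link speed, batch size, and
# kernel time below are illustrative assumptions.

def remote_efficiency(bytes_moved: float, link_gbps: float,
                      compute_seconds: float) -> float:
    """Fraction of native throughput kept when transfers cross the network,
    assuming no overlap of transfer and compute (worst case)."""
    transfer_s = bytes_moved * 8 / (link_gbps * 1e9)
    return compute_seconds / (compute_seconds + transfer_s)

# A 256 MB batch over a 10 Gb link takes ~0.2 s; if the kernel then
# runs for 2 s, you keep roughly 90% of native throughput.
eff = remote_efficiency(256e6, 10, 2.0)
print(round(eff, 3))
```

Real remoting stacks can do better than this worst case by pipelining transfers with compute, which is presumably where the InfiniBand+RDMA transport helps.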
Yes! Whenever you spin up one of our AMIs, there is a README that will guide you through a couple of simple examples. We are about to publish performance results on the monster machines in a few days, so watch out for it. Scaling depends on the compute density of the GPU workload, but in general we've seen pretty good results with 1) Deep learning (caffe) scaling to 16 GPUs (near native scaling with local GPUs, especially deep nets), 2) Raytracing of photo-realistic and complex scenes - near linear scaling with increasing GPUs, and 3) Physical modeling and simulation does very well too.
We've only done cursory evaluation of NAMD scaling. We saw a 7X improvement going from a non-GPU system to remote GPUs located in a different datacenter over shared 10g. We're not sure if that was with a representative dataset (MD is not our skill set), so if you can help us with a case study we'd be excited to work with you. Please do contact me.
Not yet, but it is on our roadmap. We have had several customers inquire about it. Drop us a note on our site and I will ping you when it becomes available.
I'm talking about time-sharing. It doesn't matter if it's smaller instances sharing a single GPU instance or many instances sharing many GPU instances. Essentially N:M sharing (with some scheduling).
Since the GPU client is now abstracted from the GPU devices by placing the GPUs across the network, it seems like time-sharing should be the next logical step.
Got it, this is actually already supported. At the very end of the blog post there is a link to create a custom configuration. You can create any N:M configuration, that is any number of clients to servers and therefore the level of performance scaling or GPU pooling.
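To illustrate the N:M idea, here is a toy round-robin scheduler sharing M GPU servers among N clients. This is purely illustrative; the actual Boost scheduling policy isn't described here.

```python
# Toy N:M pooling with time-sharing: N clients are spread across M GPU
# servers round-robin; each server then serves its queue in time slices.
from itertools import cycle

def schedule(clients: list, gpus: list) -> dict:
    """Assign each client to the next GPU in round-robin order."""
    assignment = {g: [] for g in gpus}
    for client, gpu in zip(clients, cycle(gpus)):
        assignment[gpu].append(client)
    return assignment

# 5 clients sharing 2 GPU servers.
plan = schedule(["c0", "c1", "c2", "c3", "c4"], ["gpuA", "gpuB"])
assert plan == {"gpuA": ["c0", "c2", "c4"], "gpuB": ["c1", "c3"]}
```

A production pool would schedule on load rather than arrival order, but the data structure is the same: a many-to-many mapping from clients to GPU servers.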
It goes nicely with the "supercomputing to the masses" mission. Especially when the alternative is buying lots of machines and installing all the required software manually.