I've found Amazon GPU instances to be really expensive (even spot prices have been high recently), especially if you need them for longer deep learning runs. The other issue is that the additional layers of virtualization add bandwidth overhead.
I'd like to see something in the cloud that's bare-metal, with full access to GPUs (maybe a good idea to start one). For scaling to a very large number of GPUs you'd need InfiniBand, but at some point there is going to be a bandwidth tradeoff.
It would be interesting if someone could run some benchmarks of these instances versus a physical server.
I did the analysis about 9 months ago for my team when the 980 Tis came out, and the AWS pricing was expensive (we built a server that paid for itself in two weeks compared to a g2.8xlarge). This is largely because the 980 Ti is a ridiculously good deal on price/performance.
The bigger problem we ran into is that AWS instances use a lot of small GPUs, which don't scale well with many deep neural network tools (e.g. Theano). It was never really a viable option for us.
Somebody should make a startup that allows people to sell access to their computers by the minute. Like spot instances in the cloud ... in people's basements. The true sharing economy.
Why not just move forward to realizing some sort of distributed/decentralized internet?
Something like a combination of Freenet, TOR, BOINC, blockchain etc. technologies, using the current "legacy" internet as a backbone, where anyone can voluntarily offer their computing and storage resources to the network at varying levels of participation.
Say you could offer your laptop as a simple discovery/directory node to simply help others connect and find stuff, and your desktop as either a static-content serving node or as a computation node that can host distributed applications, like SETI@home or web apps like Facebook.
Maybe even reward cryptocurrency to those who offer the most resources.
There is Gridcoin which is a cryptocurrency that rewards for BOINC projects.
I wouldn't be surprised if certain services became more centralized, offloading computing to the cloud. Consumer-grade devices would become thin clients (like back in the day). NVIDIA has hinted in this direction; I remember something about GaaS (Gaming as a Service).
Various companies have tried this. What you end up with is a fleet that's 99% compromised machines, and law enforcement at your door every other day asking where the checks are being mailed.
What if you limited people to writing in a domain-specific language: one that ran distributed on this infrastructure? How would that make it different from Folding@home, for example?
"“Monthly Uptime Percentage” is calculated by subtracting from 100% the percentage of minutes during the month in which Amazon EC2 or Amazon EBS, as applicable, was in the state of “Region Unavailable.” Monthly Uptime Percentage measurements exclude downtime resulting directly or indirectly from any Amazon EC2 SLA Exclusion (defined below)."
That's a regional outage, they provide no SLA for individual instances:
Amazon EC2 SLA Exclusions... (v) that result from failures of individual instances or volumes not attributable to Region Unavailability
Presumably if you're using a cloud hosted in people's basements, if someone's basement server dies, you'd just pick one from someone else's basement, so this model could provide better availability than AWS.
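As a back-of-the-envelope sketch of that claim: if each basement server is up independently with some probability p (the 95% figure below is an assumed illustrative number, not a measurement), a pool with failover can beat any single box by a wide margin.

```python
# Back-of-the-envelope availability of a "basement cloud" with failover.
# Assumes each server is up independently with probability p_single
# (hypothetical numbers), and a job succeeds if at least one of n
# candidate servers is up.

def pool_availability(p_single: float, n: int) -> float:
    """Probability that at least one of n independent servers is up."""
    return 1.0 - (1.0 - p_single) ** n

# A flaky 95%-uptime basement box is far worse than AWS on its own,
# but a pool of five such boxes is up 1 - 0.05**5 of the time,
# i.e. better than "five nines".
for n in (1, 2, 5):
    print(n, pool_availability(0.95, n))
```

The independence assumption is the weak point, of course: a regional power or ISP outage takes out many basements at once.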
That's right, you can install on your own GPU servers. Infiniband+RDMA transport is also supported which typically doubles the number of GPUs you can scale to.
We're adding support for other clouds, particularly ones with higher-end GPUs so feedback like this is good to know.
That's some really cool tech.
It seems like it's Linux only. Is there Windows support planned?
That would solve the problem of wanting to run code on the GPU within a Linux VM while the host is Windows.
That's great! Reading the documentation it seems there is no support for multiple clients and multiple GPUs (Many-To-Many), is there anything planned on that side?
Nice. The doc at https://bitfusionio.readme.io/docs/bitfusion-boost is a bit misleading about the possible configurations. Maybe add one configuration with multiple Boost Clients (CPU) and many Boost Servers (GPU).
At first I thought this was the same problem as automatically breaking up apps to run on multiple CPUs. That problem has been heavily researched without success.
Is it the fact that GPU code already runs in parallel streams that makes this possible?
Yes, your app would have to support multiple GPUs. What's done here is remoting CUDA/OpenCL/etc. calls so that remote GPUs can be accessed from a single instance. When performing device/platform enumeration, all GPUs appear to be directly connected to a single instance -- hence no change to the application required.
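To make the remoting idea concrete, here is a toy sketch of the client-side interception described above: a shim stands in for the intercepted driver enumeration call (think cuDeviceGetCount / clGetDeviceIDs) and merges devices reported by remote servers, so the application sees one flat list of "local" GPUs. All class and host names here are hypothetical; this is not Bitfusion's actual implementation.

```python
# Toy model of GPU API remoting: intercept device enumeration and
# splice remote GPUs into the list the application sees.

class RemoteGpuServer:
    """Stand-in for a Boost server exporting its GPUs over the network."""
    def __init__(self, host: str, gpus: list):
        self.host = host
        self.gpus = gpus

    def enumerate(self) -> list:
        # In a real system this would be an RPC over the network.
        return ["%s:%s" % (self.host, g) for g in self.gpus]

class GpuShim:
    """Client-side shim standing in for the intercepted driver API."""
    def __init__(self, local_gpus, servers):
        self.local_gpus = local_gpus
        self.servers = servers

    def device_list(self) -> list:
        devices = list(self.local_gpus)
        for server in self.servers:
            devices += server.enumerate()
        return devices

    def device_count(self) -> int:
        # What the app's cuDeviceGetCount-equivalent would return.
        return len(self.device_list())

shim = GpuShim(local_gpus=["gpu0"],
               servers=[RemoteGpuServer("10.0.0.5", ["gpu0", "gpu1"])])
assert shim.device_count() == 3  # app sees 3 GPUs, no code changes
```

Because the merge happens below the application, any program that already loops over the enumerated devices picks up the remote GPUs for free, which matches the "no change to the application" claim.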
Sounds like Plan9's concept of "CPU server mounts" has been reborn as "GPU server mounts." Could actually get traction this time, given that existing multi-GPU programs will Just Work.
I can't wait for a company to provide OpenCL/CUDA MFLOPS as a service instead of giving you VMs as a whole, so one could just attach a remote engine to any smallish controller VM.
What you suggest is technically possible by installing our Boost software on any GPU machine, and then accessing that machine from any clients running our Boost software as well. The client does not need to have a GPU. This configuration is supported in AWS today, where for example you can connect one or more t2.large instances to a g2.8xlarge. All that would have to be done is some metering on the GPU machine to implement the service you suggest :)
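The metering piece could be as simple as billing per minute of GPU attachment. A hypothetical sketch (the Meter API and the rate are invented for illustration, not part of Boost):

```python
# Hypothetical per-minute metering for a GPU-as-a-service setup:
# charge a client for the wall-clock time its Boost session holds
# a remote GPU, rounded up to whole minutes like spot billing.
import math

class Meter:
    def __init__(self, rate_per_minute: float):
        self.rate = rate_per_minute
        self.sessions = {}   # client_id -> attach timestamp (seconds)
        self.charges = {}    # client_id -> accumulated cost

    def attach(self, client_id: str, now: float) -> None:
        self.sessions[client_id] = now

    def detach(self, client_id: str, now: float) -> float:
        start = self.sessions.pop(client_id)
        minutes = math.ceil((now - start) / 60)  # round up partial minutes
        cost = minutes * self.rate
        self.charges[client_id] = self.charges.get(client_id, 0.0) + cost
        return cost

meter = Meter(rate_per_minute=0.05)
meter.attach("t2-client-1", now=0.0)
cost = meter.detach("t2-client-1", now=150.0)  # 2.5 min -> billed as 3 min
assert abs(cost - 0.15) < 1e-9
```

A real deployment would also need authentication and per-GPU (rather than per-machine) accounting, but the shape is the same.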
We are not limiting our software to AWS, so you can build this kind of service on any kind of cluster by installing our software directly from https://boost.bitfusion.io - I say cluster, because we have played with the idea of thin devices accessing remote GPU instances in the cloud, but over public networks the network performance was a limiting factor.
Also, GPUs have relatively little bandwidth toward main memory, and most CUDA patterns involve loading, computing, and retrieving, which works about as well remotely as it does locally. Unlike programs that access memory at random times all over their address space, GPU memory is neatly organized (e.g. into texture areas), and the programming paradigm already entails moving data in and out of the device as few times as possible.
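The load/compute/retrieve pattern is exactly why remoting is tolerable: the transfer happens once per batch and is amortized over the kernel's compute time. A rough model with assumed numbers (not measurements of any real system):

```python
# Why batched GPU workloads tolerate remoting: the one-time transfer
# per batch is amortized over compute. Link speed, batch size, and
# kernel time below are illustrative assumptions.

def remote_efficiency(bytes_moved: float, link_gbps: float,
                      compute_seconds: float) -> float:
    """Fraction of native throughput kept when transfers cross the network,
    assuming no overlap of transfer and compute (worst case)."""
    transfer_s = bytes_moved * 8 / (link_gbps * 1e9)
    return compute_seconds / (compute_seconds + transfer_s)

# A 256 MB batch over a 10 Gb link takes ~0.2 s; if the kernel then
# runs for 2 s, you keep roughly 90% of native throughput.
eff = remote_efficiency(256e6, 10, 2.0)
print(round(eff, 3))
```

Real remoting stacks can do better than this worst case by pipelining transfers with compute, which is presumably where the InfiniBand+RDMA transport helps.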
Yes! Whenever you spin up one of our AMIs, there is a README that will guide you through a couple of simple examples. We are about to publish performance results on the monster machines in a few days, so watch out for it. Scaling depends on the compute density of the GPU workload, but in general we've seen pretty good results with 1) Deep learning (caffe) scaling to 16 GPUs (near native scaling with local GPUs, especially deep nets), 2) Raytracing of photo-realistic and complex scenes - near linear scaling with increasing GPUs, and 3) Physical modeling and simulation does very well too.
We've only done cursory evaluation of NAMD scaling. We saw a 7X improvement going from a non-GPU system to remote GPUs located in a different datacenter over shared 10g. We're not sure if that was with a representative dataset (MD is not our skill set), so if you can help us with a case study we'd be excited to work with you. Please do contact me.
Not yet, but it is on our roadmap. We have had several customers inquire about it. Drop us a note on our site and I will ping you when it becomes available.
I'm talking about time-sharing. It doesn't matter if it's smaller instances sharing a single GPU instance or many instances sharing many GPU instances. Essentially N:M sharing (with some scheduling).
Since the GPU client is now abstracted from the GPU devices by placing the GPUs across the network, it seems like time-sharing should be the next logical step.
Got it, this is actually already supported. At the very end of the blog post there is a link to create a custom configuration. You can create any N:M configuration, that is any number of clients to servers and therefore the level of performance scaling or GPU pooling.
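To illustrate the N:M idea, here is a toy round-robin scheduler sharing M GPU servers among N clients. This is purely illustrative; the actual Boost scheduling policy isn't described here.

```python
# Toy N:M pooling with time-sharing: N clients are spread across M GPU
# servers round-robin; each server then serves its queue in time slices.
from itertools import cycle

def schedule(clients: list, gpus: list) -> dict:
    """Assign each client to the next GPU in round-robin order."""
    assignment = {g: [] for g in gpus}
    for client, gpu in zip(clients, cycle(gpus)):
        assignment[gpu].append(client)
    return assignment

# 5 clients sharing 2 GPU servers.
plan = schedule(["c0", "c1", "c2", "c3", "c4"], ["gpuA", "gpuB"])
assert plan == {"gpuA": ["c0", "c2", "c4"], "gpuB": ["c1", "c3"]}
```

A production pool would schedule on load rather than arrival order, but the data structure is the same: a many-to-many mapping from clients to GPU servers.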
It goes nicely with the "supercomputing to the masses" mission. Especially when the alternative is buying lots of machines and installing all the required software manually.