I've had to grab my throwaway a/c just because I've worked on this. I've used this framework in a commercial setting, in finance, where it is gaining popularity. I've had to work through the bugs and been through several upgrade iterations.
When it works, it works. When it fails, you're in for a world of pain. A lot is going on behind the scenes: the data structures and classes need special annotations to carefully translate the C# structures into something that CUDA understands. You have to pay special attention to your object hierarchy, and you have to be aware of which C# keywords are supported and which are not. The fields in your class may be misaligned by a few bytes if the wrong annotation is used.
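To make the misalignment point concrete, here is a toy illustration (pure Python, not Hybridizer's actual API) of the kind of layout shift the wrong annotation causes. CUDA expects naturally aligned fields, so a double following a single byte normally gets padding before it; a "packed" layout drops that padding, and every subsequent field offset shifts.

```python
import struct

# Native alignment: padding is inserted so the double starts on its natural
# boundary (this is the layout the device-side struct expects).
aligned = struct.calcsize("@bd")

# Standard/packed sizing: no padding, 1 + 8 = 9 bytes. If the host marshals
# the struct this way while the kernel assumes the aligned layout, every
# field after the first is read from the wrong offset.
packed = struct.calcsize("=bd")

print(aligned, packed)  # e.g. 16 vs 9 on x86-64
```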
Don't get me started on CUDA pointers - you're not shielded from those. My experience with this has put me off CUDA frameworks in general.
You're better off learning CUDA or hiring a CUDA dev than investing heavily in stacks like this.
I was going to say something similar. Usually such abstractions are leaky. I don't know the target market for this product: the ones who know CUDA will just use C++; the ones who know C++ won't use this; and the ones who don't know CUDA probably won't use this either, because they will need to learn CUDA first anyway.
(Same story with those C#-to-JavaScript compilers.)
Well, C# is a more familiar language for quants and quant devs than CUDA.
Very few quants know CUDA. My brother happens to be one of them, but he does most of his development now in Python/pandas. He gets someone else to do the heavy lifting of targeting the HPC platform of choice.
Now, the motivation behind such a layer or framework is to provide greater productivity and remove the step of pairing quants with CUDA devs when productionising the code.
These are all valid reasons to adopt such frameworks. But in practice, the cost of upgrades, strange pointer bugs and delays to production releases don't make it worth it. You also still need the CUDA dev for the situations where the framework fails. If a framework does become available that removes these pain points, I'd adopt it in a heartbeat.
This may vary in industries other than finance.
Are there structural problems with this approach or is it mainly because of immaturity? Usually version 3 of most software is mature enough to judge it by its merits.
Your example doesn’t account for substitutes, though. Before commercial flights, civilians couldn’t fly. Before Hybridizer, programmers could still use CUDA.
The point being that it takes time to get the right solution for a problem, and we shouldn't be stuck with C and C++ for GPGPU forever just because no one has yet found the ideal solution for multi-language GPGPU programming.
Just yesterday I saw a documentary about the Gotthard Tunnel, also considered impossible to achieve, with high human costs and risk of insolvency, until they finally managed it.
John Harrison took almost his entire life to create the first useful marine chronometer, amid disbelief and struggles to secure proper funding.
Or if you want to bring it closer to home, very few people believed JavaScript would ever become fast or even leave the browser.
Does this solve some of the core problems with GPU programming, such as the difficulty of writing reusable code without significant performance overhead (via nested parallelism or fusion), or the need to follow some potentially awkward rules for performance reasons (struct-of-arrays layouts and coalesced memory access patterns)?
I mean, it is surely nice to easily launch a bunch of threads within a single-source program, but there are already plenty of C++ libraries that let you do this, and that has not really made efficient GPU programming accessible to the layman.
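For anyone unfamiliar with the struct-of-arrays rule mentioned above, here is a sketch of my own (framework-agnostic, just byte arithmetic) of why it matters: what counts is which byte addresses consecutive GPU threads touch.

```python
FLOAT = 4  # bytes per float32
N = 4      # four consecutive threads, each reading the x field of one point

# Array-of-structs: point i is stored as (x, y), so its x field sits at
# offset i * 8. Consecutive threads read strided, scattered addresses.
aos_x_offsets = [i * 2 * FLOAT for i in range(N)]

# Struct-of-arrays: all x values are stored together, x[i] at offset i * 4.
# Consecutive threads read consecutive addresses, so the hardware can
# coalesce the reads into one memory transaction.
soa_x_offsets = [i * FLOAT for i in range(N)]

print(aos_x_offsets)  # strided:    [0, 8, 16, 24]
print(soa_x_offsets)  # contiguous: [0, 4, 8, 12]
```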
For "easy GPU programming", look at TensorFlow or any of these "runs on CPU and GPU" libraries. For anything else there's no magic: you need to write kernel code and take care of memory layout.
What would the use case for games be? We already have plenty of graphics libraries which abstract the GPU from the developer (kinda sorta). This is for number crunching.
This raises a question that I'm still wondering about. Matrix operations are useful for high-performance computing because a lot of useful operations can be transformed (with varying effort) into matrix operations. Hence, with access to a relatively small library of high-performance implementations of these operations, one can obtain good computational performance. This is certainly practical.
However, is the matrix formulation the best one, if we have access to a programming language (like CUDA) where we have more flexibility in how we perform computation? For example, while we might be able to express k-means clustering as a matrix operation, we might express it more efficiently by programming it directly.
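The k-means example can be made concrete. Below is a toy, pure-Python sketch (my own, not from any library discussed here) of the assignment step written both ways: directly as a per-point loop, and reduced to linear algebra via the identity ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which is the form a GEMM-backed library would exploit.

```python
def assign_direct(points, centroids):
    """Direct formulation: per-point distance loop, then argmin."""
    out = []
    for p in points:
        d = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
        out.append(d.index(min(d)))
    return out

def assign_matrix(points, centroids):
    """Matrix formulation: the bulk of the work is one points @ centroids^T
    product (here hand-rolled; on a GPU this would be a single GEMM call)."""
    G = [[sum(pi * ci for pi, ci in zip(p, c)) for c in centroids]
         for p in points]
    cn = [sum(ci * ci for ci in c) for c in centroids]  # ||c||^2 per centroid
    out = []
    for i in range(len(points)):
        # ||p||^2 is constant across centroids, so it drops out of the argmin
        d = [cn[j] - 2 * G[i][j] for j in range(len(centroids))]
        out.append(d.index(min(d)))
    return out

pts = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (4.9, 5.2)]
cs = [(0.0, 0.0), (5.0, 5.0)]
print(assign_direct(pts, cs))  # [0, 1, 0, 1]
print(assign_matrix(pts, cs))  # [0, 1, 0, 1] - same assignments
```

Both formulations do the same asymptotic work here; the matrix version's appeal is that the dominant term is a single dense product that existing tuned kernels already handle well, whereas a direct CUDA implementation could, for instance, fuse the distance and argmin steps to avoid materialising the full distance matrix.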
I think this will be an even more meaningful discussion if you can identify an overhead or a redundant computation in the reduction of k-means to matrix and linear-algebraic operations. BTW, specific APIs are a different matter; they can and do entail overheads unless compiler optimisation and runtime JIT can remove them.
If there aren't any redundant operations, matrices give a good abstraction that doesn't leak much. Libraries take care of making the best use of cache.
If you have to build castles out of individual grains of sand and burnt clay, it will limit how many of them you can build. That's why building blocks are useful. Where the building blocks don't quite fit, there is always sand, clay and mortar to fill the gaps.
For anything that just uses "standard" matrices/vectors I'd wager the best bet is to just use a C# wrapper around a native library such as MKL or something.
This seems like the choice when you want to have custom kernels run on your large arrays.
My background is in HPC in finance.