I've had to grab my throwaway a/c just because I've worked on this. I've used this framework in a commercial setting, in finance, where it is gaining popularity. I've had to work through the bugs and been through several upgrade iterations.
When it works, it works. When it fails, you're in for a world of pain. A lot is going on behind the scenes: the data structures and classes need special annotations to carefully translate the C# structures into something that CUDA understands. You have to pay special attention to your object hierarchy, and you have to be aware of which C# keywords are supported and which are not. The fields in your class may be misaligned by a few bytes if the wrong annotation is used.
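To make the misalignment point concrete, here is a toy illustration (pure Python, not Hybridizer's actual API) of the kind of layout shift the wrong annotation causes. CUDA expects naturally aligned fields, so a double following a single byte normally gets padding before it; a "packed" layout drops that padding, and every subsequent field offset shifts.

```python
import struct

# Native alignment: padding is inserted so the double starts on its natural
# boundary (this is the layout the device-side struct expects).
aligned = struct.calcsize("@bd")

# Standard/packed sizing: no padding, 1 + 8 = 9 bytes. If the host marshals
# the struct this way while the kernel assumes the aligned layout, every
# field after the first is read from the wrong offset.
packed = struct.calcsize("=bd")

print(aligned, packed)  # e.g. 16 vs 9 on x86-64
```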
Don't get me started on CUDA pointers - you're not shielded from those. My experience with this has put me off CUDA frameworks in general.
You're better off learning CUDA or hiring a CUDA dev than investing heavily in stacks like this.
I was going to say something similar. Usually such abstractions are leaky. I don't know the target market for this product: the ones who know CUDA will just use C++; the ones who know C++ won't use this; and the ones who don't know CUDA probably won't use this either, because they will need to learn CUDA first anyway.
(Same story with those C#-to-JavaScript compilers.)
Well, C# is a more familiar language for quants and quant devs than CUDA.
Very few quants know CUDA. My brother happens to be one of them, but he does most of his development now in Python/pandas. He gets someone else to do the heavy lifting of targeting the HPC platform of choice.
Now, the motivation behind such a layer or framework is to provide greater productivity and remove the step of pairing quants with CUDA devs when productionising the code.
These are all valid reasons to adopt such frameworks. But in practice, the cost of upgrades, strange pointer bugs and delays to production releases don't make it worth it. You also still need the CUDA dev for the situations where the framework fails. If a framework does become available that removes these pain points, I'd adopt it in a heartbeat.
This may vary in industries other than finance.
Are there structural problems with this approach or is it mainly because of immaturity? Usually version 3 of most software is mature enough to judge it by its merits.
Your example doesn’t account for substitutes, though. Before commercial flights, civilians couldn’t fly. Before Hybridizer, programmers could still use CUDA.
The point being that it takes time to get the right solution for a problem, and we shouldn't be stuck with C and C++ for GPGPU forever just because no one has yet found the ideal solution for multi-language GPGPU programming.
Just yesterday I saw a documentary about the Gotthard Tunnel, also considered impossible to achieve, with high human costs and risk of insolvency, until they finally managed it.
John Harrison took almost his entire life to create the first useful marine chronometer, amid disbelief and struggles to secure proper funding.
Or if you want to bring it closer to home, very few people believed JavaScript would ever become fast or even leave the browser.
Does this solve some of the core problems with GPU programming, such as the difficulty of writing reusable code without significant performance overhead (via nested parallelism or fusion), or the need to follow some potentially awkward rules for performance reasons (struct-of-arrays layouts and coalesced memory access patterns)?
I mean, it is surely nice to easily launch a bunch of threads within a single-source program, but there are already plenty of C++ libraries that let you do this, and that has not really made efficient GPU programming accessible to the layman.
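For anyone unfamiliar with the struct-of-arrays rule mentioned above, here is a sketch of my own (framework-agnostic, just byte arithmetic) of why it matters: what counts is which byte addresses consecutive GPU threads touch.

```python
FLOAT = 4  # bytes per float32
N = 4      # four consecutive threads, each reading the x field of one point

# Array-of-structs: point i is stored as (x, y), so its x field sits at
# offset i * 8. Consecutive threads read strided, scattered addresses.
aos_x_offsets = [i * 2 * FLOAT for i in range(N)]

# Struct-of-arrays: all x values are stored together, x[i] at offset i * 4.
# Consecutive threads read consecutive addresses, so the hardware can
# coalesce the reads into one memory transaction.
soa_x_offsets = [i * FLOAT for i in range(N)]

print(aos_x_offsets)  # strided:    [0, 8, 16, 24]
print(soa_x_offsets)  # contiguous: [0, 4, 8, 12]
```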
For "easy GPU programming", look at TensorFlow or any of these "runs on CPU and GPU" libraries. For anything else there's no magic: you need to write kernel code and take care of memory layout.
What would the use case for games be? We already have plenty of graphics libraries which abstract the GPU from the developer (kinda sorta). This is for number crunching.
This raises a question that I'm still wondering about. Matrix operations are useful for high-performance computing because a lot of useful operations can be transformed (with varying effort) into matrix operations. Hence, with access to a relatively small library of high-performance implementations of these operations, one can obtain good computational performance. This is certainly practical.
However, is the matrix formulation the best one, if we have access to a programming language (like CUDA) where we have more flexibility in how we perform computation? For example, while we might be able to express k-means clustering as a matrix operation, we might express it more efficiently by programming it directly.
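The k-means example can be made concrete. Below is a toy, pure-Python sketch (my own, not from any library discussed here) of the assignment step written both ways: directly as a per-point loop, and reduced to linear algebra via the identity ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which is the form a GEMM-backed library would exploit.

```python
def assign_direct(points, centroids):
    """Direct formulation: per-point distance loop, then argmin."""
    out = []
    for p in points:
        d = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
        out.append(d.index(min(d)))
    return out

def assign_matrix(points, centroids):
    """Matrix formulation: the bulk of the work is one points @ centroids^T
    product (here hand-rolled; on a GPU this would be a single GEMM call)."""
    G = [[sum(pi * ci for pi, ci in zip(p, c)) for c in centroids]
         for p in points]
    cn = [sum(ci * ci for ci in c) for c in centroids]  # ||c||^2 per centroid
    out = []
    for i in range(len(points)):
        # ||p||^2 is constant across centroids, so it drops out of the argmin
        d = [cn[j] - 2 * G[i][j] for j in range(len(centroids))]
        out.append(d.index(min(d)))
    return out

pts = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (4.9, 5.2)]
cs = [(0.0, 0.0), (5.0, 5.0)]
print(assign_direct(pts, cs))  # [0, 1, 0, 1]
print(assign_matrix(pts, cs))  # [0, 1, 0, 1] - same assignments
```

Both formulations do the same asymptotic work here; the matrix version's appeal is that the dominant term is a single dense product that existing tuned kernels already handle well, whereas a direct CUDA implementation could, for instance, fuse the distance and argmin steps to avoid materialising the full distance matrix.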
I think this will be an even more meaningful discussion if you can identify an overhead or a redundant computation in the reduction of k-means to matrix and linear-algebraic operations. BTW, specific APIs are a different matter; they can and do entail overheads unless compiler optimisation and runtime JIT can remove them.
If there aren't any redundant operations, matrices give a good abstraction that doesn't leak much. Libraries take care of making the best use of cache.
If you have to build castles out of individual grains of sand and burnt clay, it will limit how many of them you can build. That's why building blocks are useful. Where the building blocks don't quite fit, there is always sand, clay and mortar to fill the gaps.
For anything that just uses "standard" matrices/vectors I'd wager the best bet is to just use a C# wrapper around a native library such as MKL or something.
This seems like the choice when you want to have custom kernels run on your large arrays.
My background is in HPC in finance.