
In my view, this is the strongest evidence that Matlab, SciPy, R, etc. haven't found the right abstraction level for numerical computing. The high-level language is supposed to be the abstraction, yet in these systems you continually need to break through that abstraction and code in C for performance and scale. That's not a very good abstraction. This problem is precisely what Travis Oliphant and his team are tackling with Numba and Blaze, but it remains to be seen if they can produce a better abstraction.

If you're willing to try another language altogether, Julia [http://julialang.org] is a general purpose language with enough performance and expressiveness to be an effective abstraction layer for numerical programming – you never have to dip into C for speed, scale or control. In developing the language, we haven't allowed ourselves to resort to C – instead, we've worked at making Julia itself fast enough to implement things like I/O, Dicts, Strings, BitArrays (packed 8 bits-per-byte boolean arrays), etc. – all in pure Julia code while getting C-like performance.



Indeed. I have pretty much stopped engaging with the standard dialog, repeated ad infinitum, that goes along the lines of "code the bottleneck in C" and "the GIL is a non-issue, just use parallel processes".

For some workloads, the latter is actually good advice, but for my typical use case it does not help. These would be tight-ish loops wrapped around a fork-join. Shared memory handling can be quite clunky in numpy, and if you want to do message passing, the overheads bleed off any advantage that parallelism ought to have given you. I don't mind the message passing abstraction, just that the overhead for doing it in python/numpy is too much. About the former, one major motivation to use numpy et al. was to not use C with its explicit indexing over arrays. It's both verbose and error prone.
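A rough sketch of the two costs described above, using only the Python standard library. It assumes `multiprocessing.shared_memory` (Python 3.8+, so newer than this discussion), and the helper names `message_passing_bytes` and `shared_memory_roundtrip` are made up for illustration. To keep it short, the second handle is attached in the same process where a real fork-join would attach from a worker.

```python
import pickle
from array import array
from multiprocessing import shared_memory


def message_passing_bytes(values):
    """Size of the pickled payload that would be copied to a worker."""
    return len(pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL))


def shared_memory_roundtrip(values):
    """Write doubles into a shared block, read them back via a second handle."""
    data = array("d", values)
    nbytes = len(data) * data.itemsize
    owner = shared_memory.SharedMemory(create=True, size=nbytes)
    try:
        # The block may be rounded up to a page size, so slice explicitly.
        owner.buf[:nbytes] = data.tobytes()
        # A worker would attach by name; here we attach in the same process.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            view = peer.buf[:nbytes].cast("d")
            result = view.tolist()
            view.release()  # views must be released before close()
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()
    return result
```

The message passing path pays for a serialized copy on every send, while the shared block is written once and then only its name needs to travel. The manual slicing, releasing, closing and unlinking is arguably the clunkiness being complained about.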

It is never pleasant to drop into a different language, though it is much, much better than it could be, thanks to Swig, Cython, and Weave. Contrary to common wisdom, I prefer Weave because of its much more succinct syntax. In Cython I am back to writing C again, but with a different syntax. This is not a criticism of Cython; it's an excellent tool, and it is much, much more pleasant to parallelize from Cython than from Numpy/Python.

Julia looks pretty good. I have one suggestion: The best way to get speed out of Julia is not to write vectorized expressions but to write out explicit loops. That's a little unfortunate because, though vectorization constructs evolved out of the necessity to avoid loops (which were slow in the older languages), they did have an excellent byproduct: succinct code. Ideally I would like to retain that.
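To make that trade-off concrete, here is a small sketch in plain Python (not Julia, and not numpy). It assumes that one reason explicit loops can win is that a vectorized expression such as `a * b + c` materializes an intermediate array for each operation, whereas a loop does everything in one pass. The function names are hypothetical, and list comprehensions stand in for whole-array operations.

```python
def vectorized_style(a, b, c):
    """Mimic a vectorized `a * b + c`: each operation builds a full array."""
    product = [x * y for x, y in zip(a, b)]     # temporary holding a * b
    return [p + z for p, z in zip(product, c)]  # second pass over the data


def fused_loop(a, b, c):
    """Explicit loop: a single pass and no intermediate array."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i] + c[i]
    return out
```

Both produce the same result. The first reads like the math, the second is what you end up writing by hand when the temporaries start to hurt.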


Julia is a wonderful and elegantly designed language. Fast and a great type system. Intuitive. For several days, I was very excited about the prospect of moving my research to Julia.

And then I discovered that it too has no reasonable shared memory parallelism story, just the same manual distribution of arrays plus multiprocessing that exists in Python.

I will speculate that Julia's authors have the same attitude as many in the Python community -- namely, that there are small jobs, which can be run in one process, and large jobs, which need to be massively parallelized, and nothing in between. But in reality there are many scientific tasks that are medium-sized, for which an OpenMP-style solution is the best fit. Tasks which might take days can be reduced to hours. With new developments like Xeon Phi, that ratio might further improve.

Also many problems require a lot of heterogeneous shared state, and it is tedious to manually distribute each element in this shared state. Finally there are many problems, such as natural language processing, that are only partially numerical. For these problems, distributing arrays is only part of the solution.


I totally agree as such. I certainly don't think Python has a particularly good solution, just the best current practical solution. I mean, it's pretty much an accident of history that Python became a popular language for numerics, and it's certainly not what it was designed for.

I'm following Numba and Blaze with interest and honestly consider Julia the most exciting new language out there. But until they reach a point where they are usable for me, I'll keep using Python and the incredibly powerful, if slightly kludgy, solutions it offers.



