
Hi boothead, just curious what you do when you run into the inevitable performance issues that can't be ameliorated with libraries like NumPy?

Is Cython or PyPy useful, or do you use Python more for prototyping, and rewrite some portions in C?



Python is definitely preferred on the research side rather than the production side, generally. Some organisations do use it on the infrastructure side too, but these guys are algorithmic traders, not HFTs, so latency isn't that big a deal.

WRT the problem of deploying prototyping/research code, there are the following unsolved problems that I'm aware of:

* Going from matrix operations over the whole timeseries (taking care to avoid problems with your algo looking ahead) for speed in a research setting, to deploying to an environment that streams updates to the timeseries one at a time. I think this is an area where haskell has the potential to excel, given its strong guarantees on structure.

* Concurrency. The options in python all suck to some degree - especially if you have to interact with C libraries or extensions. I don't think it will ever make sense to build your real-time market data feed in python. Again here haskell has an advantage.
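To make the first point concrete, here's a minimal Python sketch (a simple moving average is an assumed stand-in for a real signal): the research version runs one vectorised pass over the whole series, while the production version has to be restructured to consume one update at a time and still produce the same numbers.

```python
import numpy as np

def batch_sma(prices, window):
    """Research-style: a moving average over the whole timeseries
    in one vectorised pass."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows fully inside the series, so
    # no output value is computed from data ahead of its window
    return np.convolve(prices, kernel, mode="valid")

class StreamingSMA:
    """Production-style: consume one price update at a time."""
    def __init__(self, window):
        self.window = window
        self.buf = []

    def update(self, price):
        self.buf.append(price)
        if len(self.buf) < self.window:
            return None  # not enough data yet
        if len(self.buf) > self.window:
            self.buf.pop(0)  # slide the window forward
        return sum(self.buf) / self.window

prices = [100.0, 101.0, 99.5, 102.0, 103.5, 101.5]
batch = batch_sma(np.array(prices), window=3)

sma = StreamingSMA(window=3)
stream = [v for p in prices if (v := sma.update(p)) is not None]
```

The two versions agree numerically, but nothing in the batch code tells you how to derive the streaming one - that translation is the manual, error-prone step being described above.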

However the python ecosystem seems to be almost perfect for researchers:

* Excellent and flexible data slurping/munging/transforming.

* numpy, scipy, pandas, theano, scikits... 'nuff said

* ipython

* Cross platform

HTH


> Going from matrix operations over the whole timeseries ... to deploying to an environment that streams updates to the timeseries one at a time. I think this is an area where haskell has the potential to excel, given its strong guarantees on structure.

exactly my thought. Algo guys get stuck in matrix land because that's where their tools take them. Whereas this came out in R last week: http://cran.r-project.org/web/packages/stream/index.html

python is maybe a better toolset than R, but R's problem domain seems to have broadened in the last few months anyway.

My interest is in monitoring thousands of algorithms in real time directly within the messaging environment, and before the data hits a database. That type of concurrency is where haskell can muscle up and do the job.
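A rough stdlib-only sketch of that monitoring idea (the message format and the running-PnL stat are made up for illustration): worker threads tap a shared queue of per-algorithm updates and keep state in memory, before anything touches a database. In CPython the GIL caps the real parallelism here, which is exactly the gap the comment argues haskell's runtime fills.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical message stream: (algo_id, pnl_update) tuples tapped
# off the messaging layer before the data reaches a database.
messages = queue.Queue()

running_pnl = defaultdict(float)
lock = threading.Lock()
DONE = object()  # sentinel telling a worker to shut down

def monitor():
    """Consume updates and maintain per-algorithm running state."""
    while True:
        msg = messages.get()
        if msg is DONE:
            break
        algo_id, pnl = msg
        with lock:
            running_pnl[algo_id] += pnl

workers = [threading.Thread(target=monitor) for _ in range(4)]
for w in workers:
    w.start()

# Simulate 1000 updates spread over 10 algorithms
for i in range(1000):
    messages.put((f"algo-{i % 10}", 0.5))
for _ in workers:
    messages.put(DONE)  # one sentinel per worker
for w in workers:
    w.join()
```

Scaling this to thousands of algorithms with real message rates is where Python threads run out of road and lightweight Haskell threads (or processes over zeromq) start to look attractive.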


Some of the work that Continuum have done on blaze [1] looks to be tackling the problem of streaming in python too. In fact some of the ideas in this library come directly from haskell, and one of the main developers is Stephen Diehl, who posted the very popular "What I Wish I Knew When Learning Haskell" slides recently.

Looks like a match made in heaven :-)

[1] http://blaze.pydata.org/


Stephen here, yes boothead is right. Using a combination of Haskell and Python you can make a really powerful trading system. Python has a lot of user-facing algorithm tools and Haskell has the robustness and parallelism for the backend that Python doesn't.

If you're interested in advice on how to bridge the two worlds let me know, there's a lot of upcoming technology ( LLVM, Blaze, pipes, zeromq, cloud-haskell ) that could be very useful.


It does feel like I'd be swimming against the tide heading down the R path. Thanks for the offer :)


I haven't worked on anything as high-performance as this (though I've been doing lots of number crunching in Python recently), but there are a couple of great libraries you can use before you jump into Cython. Check out Numba [1] and numexpr [2].

[1] https://github.com/numba/numba [2] https://code.google.com/p/numexpr/
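For a flavour of what numexpr buys you, here's a small sketch (the arithmetic expression is arbitrary): numexpr compiles the whole string expression and evaluates it blockwise across threads, avoiding the temporary arrays plain NumPy allocates for each intermediate step. The import is guarded in case numexpr isn't installed.

```python
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Plain NumPy evaluates this in several passes, allocating a
# temporary array for each intermediate result (2*a, 3*b, ...).
result_np = 2.0 * a + 3.0 * b

# numexpr compiles the expression string and evaluates it in one
# blocked, multithreaded pass - often a big win on large arrays.
try:
    import numexpr as ne
    result_ne = ne.evaluate("2.0 * a + 3.0 * b")
    assert np.allclose(result_np, result_ne)
except ImportError:
    pass  # numexpr not installed; the NumPy result stands alone
```

Numba takes the complementary approach: you keep writing the explicit Python loop and its `@jit` decorator compiles it with LLVM, which suits kernels that don't reduce to a single array expression.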



