Hacker News

If you're using Python for performance-critical applications, simple use of the built-in ctypes module (which uses libffi under the hood) can make a world of difference. There's a modest performance overhead, but with proper use of ctypes you're basically getting near-C performance.

Coincidentally, one of the usual examples given in the ctypes howto is a Point structure, just like in this post. It's simple:

  from ctypes import *

  class Point(Structure):
      _fields_ = [("x", c_int), ("y", c_int)]
Then you can use the Point class the same way you'd use a regular Python class:

  p = Point(3, 4)
  assert p.x == 3
  assert p.y == 4
Really, taking a half-day to learn how to use ctypes effectively can make a world of difference to your performance-critical Python code, when you need to stop and think about data structures. Actually, if you already know C, it's less than a half-day to learn... just an hour or so to read the basic documentation:

http://docs.python.org/2/library/ctypes.html

If you plan to write an entire application using ctypes, it'd be worth looking at Cython, which is another incredible project. But for just one or two data structures in a program, ctypes is perfect.

And best of all, ctypes is included in any regular Python distribution -- no need to install Cython or any additional software. Just run your Python script like you normally do.

EDIT: I was just demonstrating how to get an easy speed-up using ctypes, which is included in the Python standard library. To be clear, you would usually use this type of data structure in ctypes along with a function from a C shared library.
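For instance, a minimal sketch of that pattern (assuming a Unix-like system where the C math library can be located), declaring the prototype so ctypes marshals the values correctly:

```python
import ctypes
import ctypes.util

# Load the C math library (assumes a Unix-like system with libm)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the prototype so ctypes converts to/from C doubles correctly
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(16.0))  # 4.0
```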

Furthermore, if you're serious about optimizations, and you can permit additional dependencies, you should absolutely look at Cython and/or NumPy, both of which are much faster than ctypes, although they do bring along additional complexity. Other commenters are also pointing out cffi, which I've never used but which also bears consideration.



Using ctypes here is slower in CPython 2.7.5 than a simple class. Summing 10000 points takes 1.81ms using a simple class vs 2.22ms using the code above. ctypes is designed for compatibility not speed.

Using slots is 0.06ms faster while namedtuple is slower.

A bigger speedup in CPython can be achieved by just using the builtin sum function twice to avoid the for loop.
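A rough sketch of that approach (my own illustration, not the gist's code):

```python
class Point(object):
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(i, i + 1) for i in range(10000)]

# Explicit for loop: one pass, but all the loop overhead lives in Python
def loop_sum(pts):
    sx = sy = 0
    for p in pts:
        sx += p.x
        sy += p.y
    return sx, sy

# Two calls to the builtin sum: the iteration itself happens in C
def builtin_sum(pts):
    return sum(p.x for p in pts), sum(p.y for p in pts)

assert loop_sum(points) == builtin_sum(points)
```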

Here's a gist: https://gist.github.com/absherwin/8976587


Please expand on your assertion. How are you storing your "two ints"? Two int tuple? Dict? List? I don't even have enough information to respond.

I do agree that ctypes is primarily meant for compatibility, but it can be used for speed optimizations in certain cases, particularly when there is a speed penalty for non-contiguous storage of data structures (like a list of dicts, or a list of tuples).

Edit: Thanks for the additional information. I totally agree that Python built-ins are often faster, as in this case. However, there are times when no built-in is available, and you have to use a custom function (usually from C). This gets back to the compatibility question, and again, ctypes is primarily meant for compatibility.


Apologies for the confusion. The comparison was against a new-style class as described in the original post. All of the types have two ints. The code is now linked so you can inspect and respond.

Edit to respond to edit: Built-ins and slots are even faster but the key point is that the ctypes-based class is actually slower than simply using a class with ordinary Python variables.

The reason ctypes is slower is that addition isn't defined on c_ints which requires casting them back to Python ints for each operation. This can be avoided by using ctypes to call a C function to perform addition.
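You can see the casting cost directly (a small illustration of the point above):

```python
from ctypes import c_int

a, b = c_int(3), c_int(4)

# a + b would raise TypeError: c_int does not define addition.
# Each arithmetic operation has to pull the value back out as a
# Python int via the .value attribute:
total = a.value + b.value
print(total)  # 7
```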


Response to edit: you may be right; I don't have time to do comprehensive benchmarking now. I never use ctypes on its own, but rather always with an external library. I assumed that ctypes Structures in a ctypes array would be faster because they're stored contiguously, but it's possible that's incorrect. I'll have to come back to this later when I have time.

I stand by my assertion that ctypes along with an external C library is a great way to do Python speed-ups. It's very simple to do, see here:

http://www.jiaaro.com/python-performance-the-easyish-way/

This is the kind of optimization I usually use ctypes for, or for interfacing with a third-party shared library.


The ctypes structures are stored contiguously with a C-compatible memory layout, but every time an element is accessed, a new Python object is constructed to hold that return value. It's like it defines getattr to call int(<C value>). That's why they end up slower than regular classes - every access needs to create a new object.
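A quick way to observe this (in CPython, using a value outside the small-int cache so the identity difference is visible):

```python
from ctypes import Structure, c_int

class Point(Structure):
    _fields_ = [("x", c_int), ("y", c_int)]

p = Point(1000000, 4)
a = p.x
b = p.x
# Equal values, but two distinct Python int objects: each attribute
# access converts the underlying C value into a fresh Python object.
print(a == b, a is b)
```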

ctypes + C code can be quite efficient, but you have to write the entire fast-path in C, not flip-flop between C and Python. It's best when you have a certain operation that needs to be fast (say, serving your most common path on a webserver, or running a complex matrix operation in Numpy), and then a bunch of additional features that are only called once in a while.


You might also want to look at cffi – it's not in stdlib but the interface is pretty nice for this kind of thing:

http://cffi.readthedocs.org/

In the specific example above, however, I'd try to reduce that overhead by going even more C-style and allocating an array or two so you avoid the overhead of individual instances and instead have one Python object for many elements.


>In the specific example above, however, I'd try to reduce that overhead by going even more C-style and allocating an array or two so you avoid the overhead of individual instances and instead have one Python object for many elements.

How exactly would you avoid the overhead of individual instances using a ctypes structure? Sure, you can allocate an array very easily, using Point * 10 and then putting your instances in the array. In fact, you'd have to do this if you wanted a collection of these structs. I'm pretty sure that this array is stored contiguously, but you still have the overhead of the Structure base class. How are you proposing to avoid this in ctypes? Or were you referring to cffi?
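(For reference, the Point * 10 allocation I mean looks like this, and sizeof confirms the contiguous layout:)

```python
import ctypes
from ctypes import Structure, c_int

class Point(Structure):
    _fields_ = [("x", c_int), ("y", c_int)]

# A contiguous C array of 10 Point structs, allocated as one block
PointArray = Point * 10
arr = PointArray(*[Point(i, 2 * i) for i in range(10)])

# The array occupies exactly 10x the size of one struct: no
# per-element Python object overhead in the storage itself
print(ctypes.sizeof(arr) == 10 * ctypes.sizeof(Point))  # True
```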

If you really want to be serious about large, performance-intensive arrays in Python, it's time to pull out numpy. It actually would fit well here.

Re: cffi, I do love LuaJIT but I'm not a huge fan of its C interface. I've not used cffi, but it seems to be inspired by, or at least to share a lot with, LuaJIT's C FFI. Which means you basically have to paste all the C header files into a string in your Lua or Python code to use the corresponding shared library.

The beautiful thing about ctypes is that you don't have to declare all the function prototypes and type declarations when you use a shared library. You simply make sure to feed the functions the properly typed variables, and store the returned value properly, and it just works.
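For example (assuming a Unix-like system), calling into libc without declaring anything up front:

```python
import ctypes
import ctypes.util

# Assumes a Unix-like system where libc can be located
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# No prototype declared: ctypes converts the Python int argument to a
# C int, and the return type defaults to c_int
print(libc.abs(-5))  # 5
```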

As a result, I find ctypes a lot easier to use for simple optimizations. YMMV.


> The beautiful thing about ctypes is that you don't have to declare all the function prototypes and type declarations when you use a shared library. You simply make sure to feed the functions the properly typed variables, and store the returned value properly, and it just works.

It just works... until the argument that you passed as int but is actually size_t makes the program crash on 64-bit. Or you have to use an API which includes a giant structure which you then must transcribe manually into Python. Or you need to use a system library located at different paths on different platforms.

In my opinion, ctypes isn't the worst, but a sane C FFI should allow you to specify some #includes and -ls and automatically extract the proper declarations and linkage. cffi only partially satisfies this, since you have to add declarations manually and then have cffi "verify" them using a C compiler, but it's still better than ctypes.


> How exactly would you avoid the overhead of individual instances using a ctypes structure?

This optimization only works if you have an upper bound for the total number of items, but if so, I'd at least try something like declaring point_x and point_y arrays so you'd have two Python objects accessed by indexes rather than many Point references. This technique can be particularly useful if you don't access all of the elements at the same time: you can get more efficient memory access patterns because you can read a subset of the elements in nice linear passes.
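A sketch of that parallel-array idea (the names point_x/point_y and the fixed upper bound N are just illustrative):

```python
from ctypes import c_int

N = 10000  # assumed upper bound on the number of points

# Structure-of-arrays: two contiguous int buffers, two Python objects
# in total, instead of one Python wrapper per Point
point_x = (c_int * N)()  # zero-initialized
point_y = (c_int * N)()

point_x[0], point_y[0] = 3, 4
print(point_x[0], point_y[0])  # 3 4
```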

> If you really want to be serious about large, performance-intensive arrays in Python, it's time to pull out numpy. It actually would fit well here.

Very strong agreement - numpy is a treasure.

> As a result, I find ctypes a lot easier to use for simple optimizations. YMMV.

Ditto - I've found cffi to be faster in some cases but it's definitely more overhead to use.


Ah ok, so you're proposing a nested array, not an array of structure pointers. I thought you were saying there was some way to use ctypes structures and yet avoid the overhead of the class instances.

And yes, I agree that if you're going to create this type of data structure, then it's worth the trouble to pull out numpy if you can.


This would be one of the approaches to use if you wanted to squeeze the last pieces of performance out of this class. I would recommend using Cython for this though.

The problem with both of these approaches is that if you use them too often, you suddenly realise you are not really coding in Python anymore.

Also, correct me if I'm wrong, but overuse of tricks like these is one of the key reasons why we cannot have nice things like PyPy for all modules.


I actually agree with everything you write here. If I could re-write the post to include these points, I would.

I do think PyPy is very near to having full ctypes support, if they don't already.


This is like saying that if you call C programs from shell scripts, or run node programs that make use of C libraries, that you are just writing C.


I am a python programmer who has yet to dive into ctypes, and I am curious if you have ever played with cffi[1], and if so, thoughts? The project seems to be aimed at the same use cases that you would use ctypes for.

1: http://cffi.readthedocs.org/en/release-0.8/


cffi's API is similar to LuaJIT's, so if you're familiar and comfortable with that, then cffi might be a good option.

If you're looking at creating your whole application around C extensions for Python, as is the case with some Cython apps, then cffi might be a good alternative. I've never used it, so I can't make any empirical observations about cffi (unlike Cython, which I've used -- it's awesome).

The great thing about all these newish ways to interface with C code in Python is that we no longer have to write Python extensions in C. I like C, but writing Python extensions in C was very painful and tedious for me. So all these options are a good thing.

Otherwise, I think ctypes is a more convenient option (it's always available, no extra installations required). Also, the ctypes API is much more Pythonic, in my opinion.


Thanks for the thoughts!


I really like cffi for interacting with external C libraries. See for example my post on talking to OpenSSL using cffi:

http://unchainyourbrain.com/using-openssl-from-python-with-p...


If you're using Python for performance-critical work, would you not prototype it in Python and then rewrite it in C, C++, or even Fortran if it's HPC?


Would you call me a heretic for writing it all in Haskell then rewriting hot spots in C[1] if performance isn't good enough?

1. http://book.realworldhaskell.org/read/interfacing-with-c-the...


No. I don't know how Haskell fans feel about it, but I don't see anything wrong with it if it works and lets you write more good code than you would writing everything in C. C isn't that scary.


C isn't that sca... Segmentation fault


I've never understood why people always seem to immediately bring up segfaults as scary C behavior. Is it really that different than Python dying with "AttributeError: 'NoneType' object has no attribute 'foo'", or a NullPointerException in Java? Sure, in the latter two you get a stack trace, but you can rig up C to give you that too if you want.

What's actually potentially scary in C is code that doesn't segfault, but just continues running along with silently-corrupted memory. (And that's something that doesn't have an analogous failure mode in most other languages, as far as I know.)


Yes it's actually really different because the error is non-specific (unhelpful) and can be caused by a completely unrelated operation, so debugging is harder.


  $ gdb myprog myprog.core
  (gdb) backtrace full
  ...[stack frames with program line numbers printed here]...

Edit: formatting


Indeed, but that backtrace can be completely unrelated to what caused the memory corruption, so it's definitely more difficult than with managed languages, which was the point I made.


And the AttributeError can be completely unrelated to what set a variable to something of the wrong type, so it's definitely not dissimilar to C's memory corruption.

The kind of errors that scare me are silently writing past the end of an array on the stack, which will silently introduce wrong values that will probably not crash your program.


The AttributeError is in a different category, because the interpreter protects you from doing unsafe things, and you get a clue as to what is wrong because you get the name of an attribute and the object which was supposed to have it. Segmentation faults can be much more difficult. For example the other day I got a segmentation fault deep in the CPython code; there was no bug there, and moving up the stack trace also did not reveal a bug either. It turned out that I had messed up a reference count _in a completely different (Cython compiled) module_, and I only found out after poring over several sections of code repeatedly.

The reason that this can happen in C is actually exactly what you bring up, silently corrupting memory. It's just that with the segmentation fault the silence is broken abruptly.


That is what Valgrind / LLVM memory sanitizer are for


We can't know how much data was destroyed between the truncation of the message and the actual segfault message...


Segfaults are just the most user-visible consequence of lack of memory safety.


You can't have everything at once



