And the corollary: Why do researchers never respect the PEP8 when they write pyt...

hogu · on Nov 3, 2010

I wasn't aware of pep8 when I started; most science people arrive at Python by a different path. What I mean is, for a long time I knew much more about numpy than about Python itself.

There are some things in pep8 that are bad for science: the spaces around operators, and also the 80 chars to a line... Scientific expressions are often long and complicated. Yes, you can write them while adhering to pep8, but it's kind of a PITA.

ogrisel · on Nov 3, 2010

The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway. In such a case temporary variables with meaningful names would both help respect the 80 chars constraint and make the expression easier to understand by the reader.
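A minimal sketch of the temporary-variable idea. The formula (a normal probability density) and every name in it are made up for illustration and are not taken from any project discussed here:

```python
import math


def normal_pdf_one_line(x, mu, sigma):
    # Everything in one expression: correct, but the line runs well past 80 columns.
    return (1.0 / (sigma * math.sqrt(2.0 * math.pi))) * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_pdf(x, mu, sigma):
    # Same computation, split into named intermediate values.
    normalization = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    squared_deviation = (x - mu) ** 2
    exponent = -squared_deviation / (2.0 * sigma ** 2)
    return normalization * math.exp(exponent)
```

Both functions compute the same value; the second simply gives each piece of the formula a name a reader can check against the paper.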

Furthermore having 80 chars is great to have vertically split editors with the code on one panel and the tests or the documentation on the other panel.

Goladus · on Nov 3, 2010

> The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway

I disagree. There are many cases, especially after 2-3 levels of indentation, where 80 characters is an unreasonably narrow space. I don't have a strong preference for reading code within 80 characters. And I'd much rather comments use 80 characters plus indentation rather than worry about whether I've got my screens vertically split.

> Furthermore having 80 chars is great to have vertically split editors with the code on one panel and the tests or the documentation on the other panel.

Having lines here or there that go beyond 80 characters doesn't completely prevent you from doing this, and having an entire statement on a single line of code makes line-based tools like grep or kill-line more effective.

Limiting to 80 characters is a good idea, but it's easy to see that there are a few significant tradeoffs, and that someone trying to get something done is not going to want to bother.

follower · on Nov 4, 2010

> > The 80 chars limit has a justification: expressions that don't fit on 80 chars (or two lines using parens) are not readable anyway

> I disagree. There are many cases, especially after 2-3 levels of indentation, where 80 characters is an unreasonably narrow space.

Some people would say that after 2-3 levels of indentation you should be looking at refactoring your code. Probably to pull something into a separate function/method.

Goladus · on Nov 4, 2010

I don't think many people would say 2-3 levels is the threshold for refactoring in Python. 3 levels is 1 class, 1 def, and 1 other control structure. Then you have 68 characters left.
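The arithmetic can be sketched as follows, assuming 4-space indentation and the 80-column limit used in this thread (the helper name `columns_left` is hypothetical):

```python
INDENT_WIDTH = 4   # assumed indentation step (what PEP 8 recommends)
LINE_LIMIT = 80    # the limit discussed here; PEP 8 itself says 79


def columns_left(levels, indent_width=INDENT_WIDTH, line_limit=LINE_LIMIT):
    """Return how many columns remain for code after `levels` of indentation."""
    return line_limit - levels * indent_width


# One row per nesting depth: class body = 1, method body = 2, and so on.
for levels in range(5):
    print(levels, columns_left(levels))
```

With these assumptions, `columns_left(3)` gives the 68 columns mentioned above.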

hogu · on Nov 3, 2010

I'm not saying there aren't reasons to have short lines.

But in science, we have longer and more complex expressions in general.

Our priorities are different.

Avshalom · on Nov 3, 2010

Because researchers have never heard of pep8, and in general don't give a shit about domain specific politics unless it's their domain.

BrandonM · on Nov 3, 2010

From PEP 8:

> The preferred place to break around a binary operator is after the operator, not before it.

I'd be interested in hearing the justification for this rule. I think that leading a continuation line with the binary operator makes it super-clear that it is a continuation line. What is the benefit of the preferred style? Compare:

  if (the_result_of_this_function(on_this_arg) == 10
      and this_overly_descriptive_boolean):
      do_stuff()

  if (the_result_of_this_function(on_this_arg) == 10 and
      this_overly_descriptive_boolean):
      do_stuff()

To me, the first one is quite clearly a continuation line (no statement can start with "and"). The second requires closer inspection.

zephyrfalcon · on Nov 3, 2010

I would write:

  if the_result_of_this_function(on_this_arg) == 10 \
  and this_overly_descriptive_boolean:
      do_stuff()

Indenting the second line of the if statement would, at first glance, indicate that it's part of the block instead. Then again, it depends. If it was the header of a def statement, I would follow the PEP, e.g.

  def __init__(self, width, height,
                     color='black', emphasis=None, highlight=0):

On a side note, I once did the "Art & Logic challenge" [http://www.artlogic.com/] and they use guidelines that apply to several languages, e.g. you would use the same formatting style for C++ and for Python, if at all possible. Much of it flies in the face of PEP 8.

ogrisel · on Nov 3, 2010

I agree on this specific case but consistency with conventions shared across projects is more important.

But I don't think anybody will complain if you use either of them, whereas 160-character expressions with no spacing around operators and funkyCamelCasing all over the code are just show-stoppers when I want to contribute a patch to a project.

njharman · on Nov 3, 2010

PEP8 is wrong on several counts. It even understands this: the first section (after the introduction) is "A Foolish Consistency is the Hobgoblin of Little Minds", which is about the spirit of pep8 (readability and consistency) and explains some situations when you should violate pep8.

samd · on Nov 3, 2010

I don't think most researchers ever expect anybody to read their code. Woe to the graduate student who years later actually needs to use the code.

ogrisel · on Nov 3, 2010

That must change. Science must be reproducible. Other researchers should be able to dive into each other's code quickly to understand the impact of implementation details.

Avshalom · on Nov 3, 2010

Well, yes and no: if they're doing their job right, they describe the method in such a way that you don't need their code to reproduce their results.

Code should not be Documentation.

Further, nobody trusts anybody's code anyway unless it's just a couple of trivial calls to a pre-vetted software package like IRAF, AIPS (to name some astronomy-related ones), or LAPACK. So generally they don't want your code. The exception is grad students trying to apply your old work to new data because they aren't in a position to be trusted with completely original research yet.

Yes, it'd be nice if everyone had great readable code and handed over the 2 terabyte data sets that it needs without batting an eye. But in practice code quality is pretty low on the ladder of "things that get in the way of collaboration".

njharman · on Nov 3, 2010

> Code should not be Documentation.

Code is for humans to read; that it compiles/interprets to a program is a side effect. Otherwise we'd all be passing around binaries (or byte encoded files) with our thick stacks of documentation.

mfukar · on Nov 3, 2010

What's your take on the multitude of software that you buy together with all the README files, Word or PDF documents describing how to use the software, what it does, and all that jazz? Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?

njharman · on Nov 9, 2010

> multitude of software that you buy

Why do you assume I buy any software?

The claim "code is for humans to read" does not logically lead to the claim "code is the only thing for humans to read". There are different kinds of humans: programmers, maintainers, end-users, and idiots are some. You're a member of the latter.

xiongchiamiov · on Nov 3, 2010

> Do we (humans) get to view all that code and see what Microsoft Office Word 2007 can do for us?

Well, we should be able to, but no, we can't, precisely because we don't get code - we get binaries.

mfukar · on Nov 4, 2010

The hypothetical universe in which 'code' is interchangeable with 'natural language' does not concern me because, as explained, we don't live in it.

Or maybe not just yet.

leot · on Nov 3, 2010

Python, unlike any other language I've dealt with, lends itself very nicely to producing stuff that's reusable and easy to understand. I chalk this up to

* lack of elitism in documentation (e.g. there are always plenty of examples)

* lack of elitism in conventions for code use: everything "just works", generally without any boilerplate

* installing libraries is a snap, and the whole module organization system is intuitive and elegant

* assumption that anything that's not a script is a library

* documentation conventions (doctests, e.g., are a nice stepping stone to good documentation _and_ code testing)

* the "there's only one way to do it" attitude

* large standard library
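As a small illustration of the doctest point, here is a sketch using the standard library's `doctest` module. The `mean` function is a made-up example, not code from any project mentioned here:

```python
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The examples below are both documentation and tests:

    >>> mean([1, 2, 3, 4])
    2.5
    >>> mean([10])
    10.0
    """
    return sum(values) / float(len(values))


if __name__ == "__main__":
    # Runs every example embedded in the docstrings of this module
    # and reports any whose actual output differs from the documented one.
    doctest.testmod()
```

Running the file as a script checks the examples in the docstring, so the documentation cannot silently drift away from what the code does.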

On the other topic: you are describing the way research works "today", which is actually pretty poorly (why, e.g., does all data need to be surrounded by so many words of introduction and discussion? why can't I just add something to someone else's work like I can add to an open source project?). This model of research will change, at one point or another, to resemble the much more efficient, effective, and fun, open source project model.

ogrisel · on Nov 3, 2010

If your system is complicated enough (which can be the case for complex machine learning or NLP algorithms), an 8-page paper (a common limit for many conferences) cannot describe all the implementation details, yet those details might be very important for reproducing the results.

Hence code should be published, well documented and readable.

Avshalom · on Nov 3, 2010

Fair enough. When I think of scientific uses of python I think astronomy, atmospheric physics, finite element analysis and linear systems using existing techniques...

Existing techniques in general, really. Fields where the interest is the data and the implications of the data. In fields like ML and NLP, where the algorithm/technique is the thing of interest, then yeah, sure, the code is important.

ogrisel · on Nov 3, 2010

I agree publishing datasets is very very important too. Often more important than code.

killedbydeath · on Nov 3, 2010

I have worked on projects that used slightly different coding styles and I did not find it getting in the way too much: you just match the style of the code you are working with. I am surprised there are people who will not contribute to a project because of this.

uriel · on Nov 4, 2010

This is one of the many reasons I love Go: gofmt takes care of almost all the silly style issues, and there is no need to learn any style guide. Just run your code (or anyone's code) through gofmt, and you are done.