Python Style Guide

Revision as of 19:50, 14 November 2006

Note: this document is still being discussed, and is not authoritative for the OLPC project. Also, no code has been updated to fit this style guide. No code should be updated until this is finalized.

Introduction

This document gives coding conventions for the Python code in the One Laptop Per Child project.

This document was adapted from Guido's original Python Style Guide essay, with some additions from Barry's style guide. This guide was then modified from PEP 8 by Ian for One Laptop Per Child, to cover additional issues that are present in that environment and to make some of the language stronger.


A Foolish Consistency is the Hobgoblin of Little Minds

One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 says, "Readability counts".

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is most important.

But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

Two good reasons to break a particular rule:

  1. When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
  2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).


A Note On Consistency

When you are interfacing with another library and providing a Python wrapping for its functions, you should always adopt the naming style of that library.

If a library is maintained or authored outside of the OLPC project, you should respect the style guidelines of that library when making edits or additions.

If you are changing the style of a piece of code, this should be done all at once and no other changes should be made at the same time. Whitespace changes in particular should be done separately from even naming changes.


Code lay-out

Indentation

Use 4 spaces per indentation level. Do not use tabs.

Maximum Line Length

Limit all lines to a maximum of 79 characters.

There are still many devices around that are limited to 80 character lines; plus, limiting windows to 80 characters makes it possible to have several windows side-by-side. The default wrapping on such devices looks ugly. Therefore, please limit all lines to a maximum of 79 characters. For flowing long blocks of text (docstrings or comments), limiting the length to 72 characters is recommended.

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better. Make sure to indent the continued line appropriately. Some examples:

   class Rectangle(Blob):
   
       def __init__(self, width, height,
                    color='black', emphasis=None, highlight=0):
           if width == 0 and height == 0 and \
              color == 'red' and emphasis == 'strong' or \
              highlight > 100:
               raise ValueError("sorry, you lose")
           if width == 0 and height == 0 and (color == 'red' or
                                              emphasis is None):
               raise ValueError("I don't think so")
           Blob.__init__(self, width, height,
                         color, emphasis, highlight)

Assert statements in particular tend to go over the line boundaries; so generally asserts should look like this:

   assert value is not None, (
       "value should not be None")

Blank Lines

Vertical whitespace (blank lines) is not that important to readability. For the most part this can be left to the developer's discretion. As a general guideline:

  • Separate top-level function and class definitions with two blank lines.
  • Method definitions inside a class are separated by a single blank line.
  • Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).
  • Use blank lines in functions, sparingly, to indicate logical sections.

Encodings (PEP 263)

[note: this diverges from PEP 8]

Python source is assumed to be ASCII. You cannot include non-ASCII unicode in a Python file without an encoding declaration, which looks like:

   # coding: UTF8

Only UTF8 should be used if you are not using ASCII. Latin-1 is *allowed* in Python without a declaration, but this is deprecated in 2.4 (where it signals a warning) and illegal in 2.5. Do not use Latin-1.

As a special case a file with the UTF8 signature '\xef\xbb\xbf' at the beginning of the file will be detected as a UTF8 file. [Q: how stable is the signature? That is, will all our files write this signature and keep it in the file? If it disappears, it means the file suddenly becomes ASCII and if any non-ASCII has slipped into the file it will become unimportable]

Note that you cannot use unicode in any identifiers in Python; the encoding only applies to Unicode strings like u"a string". In all cases you can use backslash escapes for non-ASCII characters, and this is generally preferred. An exception would be a case when long strings of text (that are not English) are present in the module. Generally such text should be in localization files.
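As a small illustration of the backslash-escape approach, the sketch below spells a non-ASCII character with an escape so that the source file itself stays plain ASCII. The variable names are made up for the example, and it assumes an interpreter that accepts the u"..." prefix (Python 2, or Python 3.3 and later).

```python
# The source stays pure ASCII; the non-ASCII character is spelled
# with a backslash escape inside a unicode string literal.
cafe = u"caf\u00e9"

# Encoding to UTF8 is an explicit step, done only at the boundary
# where bytes are actually needed.
cafe_utf8 = cafe.encode("utf-8")
```

The unicode string holds four characters, while its UTF8 encoding takes five bytes because the accented letter needs two.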

Files produced by OLPC should generally be plain ASCII. UTF8 is allowed because children who may later modify the source are unlikely to adhere to strict ASCII text. They still must adhere to strict ASCII identifiers; ideally an editor (such as vim) would make it clear that non-ASCII identifiers are syntactically invalid in Python.


Imports

Imports should usually be on separate lines, e.g.:

   Yes: import os
        import sys
   
   No:  import sys, os

It's okay to say this though:

   from subprocess import Popen, PIPE

[note: this is a soft requirement]

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Imports should be grouped in the following order:

  1. standard library imports
  2. related third party imports
  3. local application/library specific imports

You should put a blank line between each group of imports. [note: I don't care about the blank line, and consider the ordering to be only a suggestion]

Put any relevant __all__ specification after the imports.

Relative imports for intra-package imports are highly discouraged.

Always use the absolute package path for all imports. Unless and until we settle on Python 2.5, we cannot use PEP 328, and so cannot do explicit relative imports.

"from x import *" is generally discouraged.

You should only import this way from packages that are intended to be used like this (the packages generally define __all__).

You should never use "import *" more than once in a file. If you use it more than once then there is no way to know (without leaving the file) exactly where a name comes from. So long as "import *" is used just once, one can assume when no other source can be found for a name that it must come from this import.

When importing a class from a class-containing module, it's usually okay to spell this:

   from myclass import MyClass
   from foo.bar.yourclass import YourClass

If this spelling causes local name clashes, then spell them

   import myclass
   import foo.bar.yourclass

and use "myclass.MyClass" and "foo.bar.yourclass.YourClass"

[note: I hate "import foo.bar.yourclass" and prefer just "from foo.bar import yourclass" or "from foo.bar.yourclass import YourClass"; this note should probably be changed.]

In summary

A file should generally look like this:

   # -*- coding: UTF8 -*-  (if used at all)
   """
   docstring: may also be a unicode or 'raw' string
   If you are using doctest then a raw string is recommended
   (prefix the string with an r)
   [are unicode strings generally preferred for docstrings?
   that would give a prefix of u or ur]
   """
   from __future__ ...
   import stdlib modules
   import external modules
   import internal modules
   __all__ = [...]   # If you use __all__
   constants...
   functions and classes...

__init__.py Files

__init__.py files should generally contain no substantive code. Instead they should import from other modules. Importing from other modules is done so that a package can provide a front-facing set of objects and functions it exports, without exposing each of the internal modules in the package. Note however that this causes the submodules to be eagerly imported; if this is likely to cause unnecessary overhead then the import in __init__.py should be reconsidered.

Whitespace in Expressions and Statements

Pet Peeves

Avoid extraneous whitespace in the following situations:

Immediately inside parentheses, brackets or braces.

   Yes: spam(ham[1], {eggs: 2})
   No:  spam( ham[ 1 ], { eggs: 2 } )

Immediately before a comma, semicolon, or colon:

   Yes: if x == 4: print x, y; x, y = y, x
   No:  if x == 4 : print x , y ; x , y = y , x

[note: if you do not put a space after a comma, it is harder to visually distinguish . from ,; e.g., foo(a,b) and foo(a.b). Please use spaces after commas!]

Immediately before the open parenthesis that starts the argument list of a function call:

   Yes: spam(1)
   No:  spam (1)

Immediately before the open parenthesis that starts an indexing or slicing:

   Yes: dict['key'] = list[index]
   No:  dict ['key'] = list [index]

More than one space around an assignment (or other) operator to align it with another.

   Yes:
   
       x = 1
       y = 2
       long_variable = 3
   
   No:
   
       x             = 1
       y             = 2
       long_variable = 3

[note: I'm soft on this one, though less soft on the others]


Other Recommendations

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

Use spaces around arithmetic operators:

   Yes:
   
       i = i + 1
       submitted += 1
       x = x * 2 - 1
       hypot2 = x * x + y * y
       c = (a + b) * (a - b)
   
   No:
   
       i=i+1
       submitted +=1
       x = x*2 - 1
       hypot2 = x*x + y*y
       c = (a+b) * (a-b)

Don't use spaces around the '=' sign when used to indicate a keyword argument or a default parameter value.

   Yes:
   
       def complex(real, imag=0.0):
           return magic(r=real, i=imag)
   
   No:
   
       def complex(real, imag = 0.0):
           return magic(r = real, i = imag)

[note: this is really helpful to make the code more readable; please use this convention. Keyword arguments aren't assignments, and this makes that visually clear.]

Compound statements (multiple statements on the same line) are strongly discouraged.

   Yes:
   
       if foo == 'blah':
           do_blah_thing()
       do_one()
       do_two()
       do_three()
   
   Rather not:
   
       if foo == 'blah': do_blah_thing()
       do_one(); do_two(); do_three()

Don't be lazy, just hit enter!

if/else expressions and list comprehensions should not be deeply nested.

   [this needs some examples]
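One possible illustration, offered as a sketch rather than an agreed OLPC example. The data and the helper name flip_even are invented, and the if/else expression requires Python 2.5 or later.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Hard to read: two loops, a filter and an if/else expression are
# all packed into a single list comprehension.
dense = [(x if x % 2 else -x) for row in matrix for x in row if x > 2]


# Easier to follow: the same logic, unrolled.
def flip_even(x):
    if x % 2:
        return x
    return -x


clear = []
for row in matrix:
    for x in row:
        if x > 2:
            clear.append(flip_even(x))
```

Both versions build the same list; the second is longer but each step can be read, commented and debugged on its own.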


Comments

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should go before the thing they are commenting on, like:

   # match will be the regex match object:
   match = None

Or sometimes inside an if statement or other control structure:

   if match is None:
       # None of our attempts to match worked
       raise ValueError("Nothing matched!")

Comments should be complete sentences. If a comment is a phrase or sentence, its first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

If a comment is short, the period at the end can be omitted. Block comments generally consist of one or more paragraphs built out of complete sentences, and each sentence should end in a period.

You should use two spaces after a sentence-ending period.

When writing English, Strunk and White apply.

Python coders from non-English speaking countries: please write your comments in English, unless you are 120% sure that the code will never be read by people who don't speak your language. [I'm not sure exactly how this applies to OLPC... except probably more strongly if anything. But less strongly for code created by external entities.]


Block Comments

Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.
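A short sketch of what this looks like in practice. The function border_width and its behaviour are hypothetical and exist only to give the comment something to describe.

```python
def border_width(scale):
    # The border is drawn inside the widget's allocation, so the
    # drawable area shrinks by twice the border width.
    #
    # The width grows with the display scale, but it is never
    # allowed to drop below one pixel.
    return max(1, int(2 * scale))
```

The comment is indented to the level of the code it describes, and the line holding a single # separates its two paragraphs.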

Inline Comments

Use inline comments sparingly.

An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.

Inline comments are unnecessary and in fact distracting if they state the obvious. Don't do this:

   x = x + 1                 # Increment x

But sometimes, this is useful:

   x = x + 1                 # Compensate for border

Generally comments on separate lines are easier to edit:

   # Compensate for border:
   x = x + 1

Documentation Strings

Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the "def" line.

PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, and preferably preceded by a blank line, e.g.:

   """Return a foobang
   
   Optional plotz says to frobnicate the bizbaz first.
   
   """

For one liner docstrings, it's okay to keep the closing """ on the same line.

Avoid using ''' for docstrings.
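Putting these docstring rules together, a class might look like the sketch below. Frobnicator and its methods are hypothetical; the example just reuses the foobang docstring from above inside a class with one public and one non-public method.

```python
class Frobnicator(object):
    """Frobnicate bizbaz values."""

    def foobang(self, plotz=False):
        """Return a foobang

        Optional plotz says to frobnicate the bizbaz first.

        """
        bizbaz = 'bizbaz'
        if plotz:
            bizbaz = self._frobnicate(bizbaz)
        return 'foobang:' + bizbaz

    def _frobnicate(self, value):
        # Return value reversed.  This method is non-public, so a
        # comment after the "def" line stands in for a docstring.
        return value[::-1]
```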


Naming Conventions

Descriptive: Naming Styles

There are a lot of different naming styles. It helps to be able to recognize what naming style is being used, independently of what it is used for.

The following naming styles are commonly distinguished:

  • b (single lowercase letter)
  • B (single uppercase letter)
  • lowercase
  • lower_case_with_underscores
  • UPPERCASE
  • UPPER_CASE_WITH_UNDERSCORES
  • CapitalizedWords (or CapWords, or CamelCase -- so named because of the bumpy look of its letters). This is also sometimes known as StudlyCaps. (Note: When using abbreviations in CapWords, capitalize all the letters of the abbreviation. Thus HTTPServerError is better than HttpServerError.)
  • mixedCase (differs from CapitalizedWords by initial lowercase character!)
  • Capitalized_Words_With_Underscores (ugly!)

There's also the style of using a short unique prefix to group related names together. This is not used much in Python, but it is mentioned for completeness. For example, the os.stat() function returns a tuple whose items traditionally have names like st_mode, st_size, st_mtime and so on. (This is done to emphasize the correspondence with the fields of the POSIX system call struct, which helps programmers familiar with that.)

The X11 library uses a leading X for all its public functions. In Python, this style is generally deemed unnecessary because attribute and method names are prefixed with an object, and function names are prefixed with a module name.

In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

_single_leading_underscore:

Weak "internal use" indicator. E.g. "from M import *" does not import objects whose name starts with an underscore.

single_trailing_underscore_:

Used by convention to avoid conflicts with a Python keyword, e.g.

 Tkinter.Toplevel(master, class_='ClassName')

__double_leading_underscore:

When naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).

__double_leading_and_trailing_underscore__:

"magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Prescriptive: Naming Conventions

Names to Avoid

Never use the characters `l' (lowercase letter el), `O' (uppercase letter oh), or `I' (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use `l', use `L' instead.

Do not abbreviate names by removing vowels. Instead truncate the name.

   Yes:
   
     func
     decl
    
   No:
   
     fnctn
     dcln [note: these aren't very good examples, because they are just
       *too* ugly to be plausible...]

Module Names

Modules should have short, lowercase names, without underscores.

This naming convention distinguishes modules from both functions and classes. This is important; consider this example from Zope 2:

 from DateTime.DateTime import DateTime

In Zope 2 the DateTime package contained a DateTime module with a DateTime class. As a result when you see "DateTime" in the source you can't be sure if it's referring to the package, module, or class. If the module had been named datetime it would be obvious when you were referring to the module and when you were referring to the class. Similar confusion can exist with functions, which is the motivation for leaving underscores out of module names (but using them in function names).

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

Like modules, Python packages should have short, all-lowercase names, without underscores.

Class Names

Almost without exception, class names use the CapWords convention. Classes for internal use have a leading underscore in addition.

Exception Names

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix "Error" on your exception names (if the exception actually is an error). [note: I find the Error suffix to often be redundant, but maybe it is best to use]

Global Variable Names

(Let's hope that these variables are meant for use inside one module only.) The conventions are about the same as those for functions.

Modules that are designed for use via "from M import *" should use the __all__ mechanism to prevent exporting globals, or use the older convention of prefixing such globals with an underscore (which you might want to do to indicate these globals are "module non-public").

Many modules are not really intended to be used with "from M import *" and will export many unintended objects (like other modules). Generally you should not use "import *" unless a module is intended to be used like that, and the presence of __all__ is a good indication if a module is intended to be used that way.
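The difference __all__ makes can be seen in a small experiment. The sketch below builds two throwaway modules in memory and reports which names "import *" would bind from each; the helper star_import_names and the module names are invented for this demonstration, and real code would of course keep modules in files.

```python
import sys
import types


def star_import_names(module_name, source):
    """Return the names "from module_name import *" would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    try:
        namespace = {}
        exec("from %s import *" % module_name, namespace)
    finally:
        del sys.modules[module_name]
    return sorted(name for name in namespace if name != "__builtins__")


BODY = (
    "import os\n"
    "\n"
    "def area(width, height):\n"
    "    return width * height\n"
    "\n"
    "def _scale(value):\n"
    "    return value * 2\n"
)

# Without __all__ the imported os module leaks out as well.
without_all = star_import_names("demo_plain", BODY)

# With __all__ only the listed names are exported.
with_all = star_import_names("demo_all", "__all__ = ['area']\n" + BODY)
```

In both cases the underscore-prefixed _scale stays behind, but only the module that defines __all__ avoids re-exporting os.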

Function Names

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py).

Function and method arguments

Always use 'self' for the first argument to instance methods.

Always use 'cls' for the first argument to class methods.

Always use 'metacls' for the first argument to metaclass methods. These are technically class methods of the metaclass, but if you don't distinguish metaclasses from classes you will confuse readers terribly.

If a function argument's name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus "print_" is better than "prnt". (Perhaps better is to avoid such clashes by using a synonym.)
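A minimal sketch showing the three first-argument names side by side. The classes RegisteringMeta, PointBase and Point are hypothetical; the metaclass is called directly so the example does not depend on version-specific class syntax.

```python
class RegisteringMeta(type):

    def __new__(metacls, name, bases, namespace):
        # metacls is the metaclass; cls is the class it creates.
        cls = type.__new__(metacls, name, bases, namespace)
        cls.registry_name = name.lower()
        return cls


class PointBase(object):

    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def origin(cls):
        # cls is whichever class the method was called on.
        return cls(0, 0)


# Create a class named Point whose metaclass is RegisteringMeta.
Point = RegisteringMeta('Point', (PointBase,), {})
```

Because origin() uses cls rather than naming PointBase, calling Point.origin() returns a Point instance.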

Method Names and Instance Variables

Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.

Use one leading underscore only for non-public methods and instance variables.

Do *not* use two leading underscores. Python mangles these names with the class name: if class Foo has an attribute named __a, it cannot be accessed by Foo.__a. (An insistent user could still gain access by calling Foo._Foo__a.) If you have some reason to want to avoid name clashes in subclasses, you should use *explicit* name mangling by using an explicit prefix in front of your attributes or functions, like Foo._Foo_a.
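The mangling behaviour is easy to check for yourself. In this sketch, which reuses the Foo example from the paragraph above, the attribute _Foo_b and the method get_a are added purely for illustration.

```python
class Foo(object):
    __a = 1       # Mangled by Python to _Foo__a
    _Foo_b = 2    # Explicit prefix; stored exactly as written

    def get_a(self):
        # Inside the class body, __a is rewritten to _Foo__a.
        return self.__a


mangled_names = sorted(
    name for name in vars(Foo) if name.startswith('_Foo'))
```

Here mangled_names contains both _Foo__a and _Foo_b, but only the second one appears in the source under the name it is actually stored as.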

Designing for inheritance

[note: this is rather complex; generally I think designing for inheritance should be avoided except in specific cases where it provides real benefits. In many cases first class functions and other techniques are easier to understand and manage than subclassing.]

Always decide whether a class's methods and instance variables (collectively: "attributes") should be public or non-public. If in doubt, choose non-public; it's easier to make it public later than to make a public attribute non-public.

Public attributes are those that you expect unrelated clients of your class to use, with your commitment to avoid backward incompatible changes. Non-public attributes are those that are not intended to be used by third parties; you make no guarantees that non-public attributes won't change or even be removed.

We don't use the term "private" here, since no attribute is really private in Python (without a generally unnecessary amount of work).

Another category of attributes are those that are part of the "subclass API" (often called "protected" in other languages). Some classes are designed to be inherited from, either to extend or modify aspects of the class's behavior. When designing such a class, take care to make explicit decisions about which attributes are public, which are part of the subclass API, and which are truly only to be used by your base class.

With this in mind, here are the Pythonic guidelines:

  • Public attributes should have no leading underscores.
  • If your public attribute name collides with a reserved keyword, append a single trailing underscore to your attribute name. This is preferable to an abbreviation or corrupted spelling. (However, notwithstanding this rule, 'cls' is the preferred spelling for any variable or argument which is known to be a class, especially the first argument to a class method.) Note 1: See the argument name recommendation above for class methods.
  • For simple public data attributes, it is best to expose just the attribute name, without complicated accessor/mutator methods. Keep in mind that Python provides an easy path to future enhancement, should you find that a simple data attribute needs to grow functional behavior. In that case, use properties to hide functional implementation behind simple data attribute access syntax. Note 1: Properties only work on new-style classes. Note 2: Try to keep the functional behavior side-effect free, although side-effects such as caching are generally fine. Note 3: Avoid using properties for computationally expensive operations; the attribute notation makes the caller believe that access is (relatively) cheap.
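As a sketch of the last guideline, suppose a hypothetical Rectangle class once stored area as a plain attribute set in __init__. It can later compute the value on demand through a property, and callers that read rect.area do not have to change.

```python
class Rectangle(object):

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # Cheap and side-effect free, so attribute syntax is not
        # misleading for the caller.
        return self.width * self.height
```

Note that Rectangle derives from object; as mentioned above, properties only work on new-style classes.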

Programming Recommendations

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Pyrex, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a+=b or a=a+b. Those statements run more slowly in Jython. In performance sensitive parts of the library, the .join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
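A minimal sketch of the .join() form, using a hypothetical build_report function: the pieces are collected in a list and joined once at the end, instead of growing a string with += inside the loop.

```python
def build_report(lines):
    """Return lines as one numbered, newline-separated string."""
    parts = []
    for number, line in enumerate(lines):
        parts.append("%d: %s" % (number + 1, line))
    # One join at the end instead of repeated concatenation.
    return "\n".join(parts)
```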

[note: I think we can be softer about this, as we need to target more than just CPython but the performance characteristics of our particular software and hardware stack.]

Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.

Note that 'is' and 'is not' compare the identity of an object. == can be overridden and does more complex comparisons, and so there is a small performance penalty. There is, and only ever will be, one None.

Also, beware of writing "if x" when you really mean "if x is not None" -- e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
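The pitfall is easiest to see with a container argument. Both functions below are hypothetical; they differ only in how they test for the default.

```python
def add_item_buggy(item, bucket=None):
    # Wrong: an empty list passed by the caller is also false here,
    # so it is silently replaced by a new list.
    if not bucket:
        bucket = []
    bucket.append(item)
    return bucket


def add_item(item, bucket=None):
    # Right: only the "argument not given" case creates a new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

When a caller passes in its own empty list, add_item_buggy leaves that list untouched and appends to a fresh one, while add_item appends to the caller's list as intended.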

Use class-based exceptions.

String exceptions in new code are strongly discouraged, as they will eventually (in Python 2.5) be deprecated and then (in Python 3000 or perhaps sooner) removed.

Modules or packages should define their own domain-specific base exception class, which should be subclassed from the built-in Exception class. Always include a class docstring. E.g.:

   class MessageError(Exception):
       """Base class for errors in the email package."""

Class naming conventions apply here, although you should add the suffix "Error" to your exception classes, if the exception is an error. Non-error exceptions need no special suffix.

[note: this is a strict requirement in OLPC]

When raising an exception, use "raise ValueError('message')" instead of the older form "raise ValueError, 'message'".

The paren-using form is preferred because when the exception arguments are long or include string formatting, you don't need to use line continuation characters thanks to the containing parentheses. The older form will be removed in Python 3000.

[note: also a strict requirement for OLPC]
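The exception rules above might combine as in the following sketch. The package, the exception classes and find_bundle are all hypothetical names chosen for the example.

```python
class ActivityError(Exception):
    """Base class for errors in a hypothetical activity package."""


class BundleNotFoundError(ActivityError):
    """Raised when a named bundle cannot be located."""


def find_bundle(name, known_bundles):
    """Return the entry for name from the known_bundles dictionary."""
    if name not in known_bundles:
        # The parentheses let the long message wrap across lines
        # without any backslash continuations.
        raise BundleNotFoundError(
            "no bundle named %r (known bundles: %s)"
            % (name, ", ".join(sorted(known_bundles))))
    return known_bundles[name]
```

Callers that want to handle any failure from the package can catch ActivityError; callers that care about this specific case can catch BundleNotFoundError.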

Use string methods instead of the string module.

String methods are always much faster and share the same API with unicode strings. Override this rule if backward compatibility with Pythons older than 2.0 is required.

[note: we can be strict here. string.Template is an exception, which is the only reason the string module should be used at all.]

Use .startswith() and .endswith() instead of string slicing to check for prefixes or suffixes.

startswith() and endswith() are cleaner and less error prone. For example:

   Yes: if foo.startswith('bar'):
   
   No:  if foo[:3] == 'bar':

The exception is if your code must work with Python 1.5.2 (but let's hope not!). [note: clearly we don't]

Object type comparisons should always use isinstance() instead of comparing types directly.

   Yes: if isinstance(obj, int):
   
   No:  if type(obj) is type(1):

When checking if an object is a string, keep in mind that it might be a unicode string too! In Python 2.3, str and unicode have a common base class, basestring, so you can do:

   if isinstance(obj, basestring):

In Python 2.2, the types module has the StringTypes type defined for that purpose, e.g.:

   from types import StringTypes
   if isinstance(obj, StringTypes):

In Python 2.0 and 2.1, you should do:

   from types import StringType, UnicodeType
   if isinstance(obj, StringType) or \
      isinstance(obj, UnicodeType):

[note: obviously we can just use basestring; though we need to be careful about distinguishing str and unicode. It is valid and perhaps preferred for us to be careful in distinguishing these two values. assert isinstance(value, unicode) is probably an assert we should use liberally]

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

   Yes: if not seq:
        if seq:
   
   No:  if len(seq):
        if not len(seq):

Don't write string literals that rely on significant trailing whitespace.

Such trailing whitespace is visually indistinguishable and some editors (or more recently, reindent.py) will trim it.

[note: this only applies to multi-line/triple-quoted strings]

Don't compare boolean values to True or False using == or 'is':

   Yes:   if greeting:
   
   No:    if greeting == True:
   
   Worse: if greeting is True:

Strings and Unicode

Generally there are three types of strings:

  1. 8-bit strings ("str") that contain binary data
  2. Unicode strings that contain textual data
  3. Encoded strings, represented as 8-bit strs, that contain textual data

The third form can cause problems. Python is encoding agnostic; the only encoding it does automatically is ASCII. When using ASCII text, an encoded string and a unicode string look very similar; they compare as equal, they hash to the same value, and str() and unicode() will convert cleanly between the two. Once non-ASCII text is introduced this all breaks.

We should avoid encoded strings when possible. When we expect to receive unicode strings, it is acceptable and even encouraged to do "assert isinstance(value, unicode)".
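A rough sketch of this approach: decode encoded text once, at the point where it enters the program, and assert on the type everywhere else. The names text_type and shout are invented for the example. Two assumptions are made so that it runs on newer interpreters as well: type(u"") stands in for the unicode type (it is unicode on Python 2 and str on Python 3), and the b"..." prefix requires Python 2.6 or later.

```python
# type(u"") is the unicode type on Python 2 and str on Python 3.
text_type = type(u"")


def shout(value):
    """Return the text value in upper case with a "!" appended."""
    assert isinstance(value, text_type), (
        "value should be a unicode string")
    return value.upper() + u"!"


# Encoded text (here UTF8 bytes) is decoded once, on the way in.
encoded = b"caf\xc3\xa9"
text = encoded.decode("utf-8")
```

The encoded form is five bytes long while the decoded text is four characters, which is exactly the kind of mismatch that goes unnoticed as long as only ASCII is used.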


Internationalization and Localization

All user-visible strings should be translatable. You do this like:

 from gettext import gettext as _
 import getpass
 
 print _("Hello %(name)s!") % {'name': getpass.getuser()}

Note that string substitutions should be done after the translation via _(). Also, named values should be used. You may find string.Template preferable to %-based substitution; you can use it like:

 import getpass
 import string
 from gettext import gettext as _
 print string.Template(_("Hello $name!")).substitute(name=getpass.getuser())

[Q: are there translation domains? How does activity translation work? Are we going to monkeypatch gettext.gettext to make it work like we want for activities?]

[Q: what about dates and other localized values?]

[Q: Should we prefer string.Template over % substitution?]

Testing

[We should have some conventions, like where tests go, and some preferences around test runners and formats]


Documentation

[We should have conventions here too, like the documentation format and the location of the files. The format should probably match the format for docstrings as well.]


File Layout

[The file layout for Python packages is pretty clear. However, where should other files go? For example, should images go beside Python code? And other declarations, like XML and the activity info file]


Distribution

[Something about distutils, setup.py, setuptools, etc. Should we have a single author, an author list? Should the author email point to a support address or developer discussion list?]


Deprecations and Warnings

[This should discuss how and when you should do deprecations, and how you should raise warnings.]


Copyright

This document has been placed in the public domain.