Friday, February 18, 2011

Counting heteroatoms in Python: try/except vs. if

I found this answer on the Blue Obelisk eXchange interesting and, judging from the comments, somewhat controversial. It claims that in Python a try/except wrapped around an assert can be faster than a plain if statement. My intuition was that try/except would be much slower than a simple if statement, but I wanted to know for sure. So I opened my editor, wrote some code, and measured which one actually performs better.

I used the chemkit library's Atom::isHeteroatom() method to identify the heteroatoms and Python's timeit module for the timing.

The code reads the 753 molecules from the MMFF validation suite's test file MMFF94_hypervalent.mol2 and then uses the timeit.Timer class to measure the execution time of counting the total number of heteroatoms.

Here is the code:
import timeit
import chemkit

# Counts the number of heteroatoms in the molecule
# using an if statement
def heteroatom_count_if(molecule):
    count = 0

    for atom in molecule.atoms():
        if atom.isHeteroatom():
            count += 1

    return count

# Counts the number of heteroatoms in the molecule
# using a try/except statement
def heteroatom_count_try(molecule):
    count = 0

    for atom in molecule.atoms():
        try:
            assert(atom.isHeteroatom())
            count += 1
        except AssertionError:
            continue

    return count

# Counts the total number of heteroatoms in the list
# of molecules using the given heteroatom_function
def count(molecules, heteroatom_function):
    total = 0

    # iterate over the list that was passed in, not the global file object
    for molecule in molecules:
        total += heteroatom_function(molecule)

    return total

if __name__ == '__main__':
    # read molecules from file
    file = chemkit.ChemicalFile("MMFF94_hypervalent.mol2")
    if not file.read():
        print 'Error reading file: ' + file.errorString()
        exit()

    # list of molecules
    molecules = file.molecules()

    # measure heteroatom_count_if
    t = timeit.Timer("count(molecules, heteroatom_count_if)", 
                     "from __main__ import count, molecules, heteroatom_count_if")

    print 'heteroatom_count_if time: ' + str(t.timeit(500))

    # measure heteroatom_count_try
    t = timeit.Timer("count(molecules, heteroatom_count_try)", 
                     "from __main__ import count, molecules, heteroatom_count_try")

    print 'heteroatom_count_try time: ' + str(t.timeit(500))
And here are the results:
$ python heteroatoms.py
heteroatom_count_if time: 8.65832996368
heteroatom_count_try time: 17.5534369946
So we see that the if statement version is roughly twice as fast as the try/except version. That was my intuition, but it's good to be sure.
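
For anyone without chemkit installed, the same comparison can be sketched with plain Python. This is only a rough stand-in, not the benchmark above: the list of booleans, the names count_if and count_try, and the iteration counts are all assumptions made for illustration, and it is written for Python 3. The relative cost of the try/except version depends on how often the assert fails, so the ratio it prints may differ from the one measured on the MMFF molecules.

```python
import timeit

# Stand-in data (an assumption): True marks a "heteroatom".
# Every other flag is False, so the assert fails half of the time.
flags = [i % 2 == 0 for i in range(10000)]

# Counts the True flags using an if statement
def count_if(flags):
    count = 0
    for flag in flags:
        if flag:
            count += 1
    return count

# Counts the True flags using try/except around an assert
def count_try(flags):
    count = 0
    for flag in flags:
        try:
            assert flag
            count += 1
        except AssertionError:
            continue
    return count

if __name__ == '__main__':
    # time each version over the same data
    for func in (count_if, count_try):
        elapsed = timeit.timeit(lambda: func(flags), number=200)
        print('%s time: %f' % (func.__name__, elapsed))
```

Both functions return the same count on the same input; only the time they take differs.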

3 comments:

  1. Did you add your stats to that BOx answer?

  2. @Egon: Ya, I left a comment on matteo's first answer.

  3. As another note: one should never use the assert statement in Python for anything other than runtime program integrity checks, because the -O option (which compiles Python modules with optimization) discards assert statements!

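
The point made in the third comment can be checked without restarting the interpreter. Here is a minimal sketch, assuming Python 3, where the built-in compile() accepts an optimize argument: optimize=0 keeps assert statements and optimize=1 strips them, as the -O option does. The names SOURCE, count_true and load_count_true are made up for this illustration.

```python
# Source of a small counting function that relies on assert,
# in the same style as the try/except version in the post.
SOURCE = '''
def count_true(flags):
    count = 0
    for flag in flags:
        try:
            assert flag
            count += 1
        except AssertionError:
            continue
    return count
'''

def load_count_true(optimize):
    # optimize=0 keeps assert statements; optimize=1 removes them (like -O)
    namespace = {}
    exec(compile(SOURCE, '<demo>', 'exec', optimize=optimize), namespace)
    return namespace['count_true']

flags = [True, False, True, False]
normal = load_count_true(0)(flags)
optimized = load_count_true(1)(flags)
print('normal: %d, optimized: %d' % (normal, optimized))
```

With the asserts stripped, nothing ever raises, so every element is counted: the optimized version returns 4 where the normal one returns 2. The try/except approach is therefore not just slower in the measurement above, it also gives wrong counts under -O.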