Monday, December 20, 2010

Python: don't use sys.exit() inside signal handlers

It is common to want to exit the program on handling a kill signal. But you should probably not use the standard sys.exit() function for this. Instead use the os._exit() function.

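The reason is that Python implements sys.exit() by raising a SystemExit exception. When it is called from a signal handler, that exception is raised in the stack frame that was executing when the interpreter received the kill signal. If the signal arrives while the body of a try/except block is executing, and the except clause catches everything (a bare except:, for instance), the exception is swallowed there and the program carries on, which is probably not what you intended.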
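To make this concrete, here is a minimal sketch (not the original script). The names kill_handler and interrupted_work are made up for illustration, and signal.raise_signal(), which needs Python 3.8 or later, stands in for a kill sent from outside the process.

```python
import signal
import sys


def kill_handler(signum, frame):
    # sys.exit() does not stop the process by itself: it raises SystemExit
    # in whatever code was running when the signal arrived.
    sys.exit(0)


def interrupted_work():
    """Return the name of the exception seen by the bare except, if any."""
    try:
        # Stand-in for an external "kill" arriving in the middle of the work.
        signal.raise_signal(signal.SIGTERM)
        return "no exception"
    except:  # a bare except also catches SystemExit
        return sys.exc_info()[0].__name__


# Install the handler, run the demo, then restore the previous handler.
previous = signal.signal(signal.SIGTERM, kill_handler)
try:
    outcome = interrupted_work()
finally:
    signal.signal(signal.SIGTERM, previous)

print(outcome)
```

The bare except receives the SystemExit, so the script reports the exception name and keeps running instead of exiting.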

This happened to me on an automated script last night, and since I wasn't aware of this feature of sys.exit(), it puzzled me a bit. The logs showed that the script was stopping, but then it kept continuing from the point where the kill interrupted it.

Here is the relevant part of the log:

running update table set somedate="2010-12-20 10:34:27" where id=4329
running update table set somedate="2010-12-20 10:34:27" where id=4330
Stopping as requested..
commiting mysql buffers
stopped
failed: update table set somedate="2010-12-20 10:34:27" where id=4330
running update table set somedate="2010-12-20 10:34:27" where id=4346
failed: update table set somedate="2010-12-20 10:34:27" where id=4346


Notice how the script just carried on from the point of interruption, and how every statement fails after the attempted stop. The failures happen because the cleanup done in the signal handler had already closed the db connection.

Here is the stack trace at the point where the signal was received. (I could get this by sending another "kill", because the try/except logic was particularly long and the script was still stuck in it. You might not be so lucky!)

Traceback (most recent call last):
  File "/path/to/script.py", line 165, in <module>
    exec_retry(cursor,mysql,1)
  File "/path/to/script.py", line 72, in exec_retry
    time.sleep(secs)
  File "/path/to/script.py", line 59, in kill_handler
    conn.commit()
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')


This is the code where the signal was received, specifically inside the time.sleep() call:

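import time

def exec_retry(cursor, cmd, secs):
    retries = 0
    while retries < 2:
        try:
            return cursor.execute(cmd)
        except:  # bare except: catches everything raised in the try body, SystemExit included
            retries += 1
            time.sleep(secs)
    # both attempts failed
    print("failed: %s" % cmd)
    return 0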
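For comparison, here is a hedged sketch of the same situation with os._exit() in the handler. It is not the original script: the child program, its exit status 3 and its messages are invented for the demonstration. It runs in a subprocess because os._exit() would otherwise end the interpreter running the example. It assumes Python 3.8 or later for signal.raise_signal().

```python
import subprocess
import sys
import textwrap

# Hypothetical child program: a handler that calls os._exit(), and a bare
# except wrapped around the code that gets interrupted.
CHILD = textwrap.dedent('''
    import os
    import signal

    def kill_handler(signum, frame):
        # Do the cleanup here, and flush explicitly: os._exit() skips
        # atexit handlers and does not flush stdio buffers.
        print("stopping", flush=True)
        os._exit(3)  # ends the process right away, no exception is raised

    signal.signal(signal.SIGTERM, kill_handler)
    try:
        signal.raise_signal(signal.SIGTERM)  # stand-in for an external kill
    except:
        pass  # nothing to swallow this time
    print("still running")  # not reached
''')

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
)
print(result.returncode, result.stdout.strip())
```

Since os._exit() raises nothing, the bare except never gets a chance to run and the child stops for real. If you control the try/except yourself, another option is to catch Exception instead of using a bare except, because SystemExit is not a subclass of Exception.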

