Wednesday, December 29, 2010

bash one liner : find process start time

Sometimes you want to find the start time of a process. You might want to check the start time of a particular process running on many Linux servers. Rather than logging into each machine, we can use password-less ssh to do this from one machine. But first, we need to craft the bash one liner to do this on one particular machine.

Say you want to find the start time of a process called 'foobar'. This is what you could do:

ps auxww | grep foobar | grep -v '/bin/sh' | grep -v grep | tr -s '\t' ' ' | cut -f 9 -d ' '

Notice I use "grep -v" to eliminate certain processes that are not relevant. I omit the process that starts foobar ("/bin/sh"). I also omit the "grep" command we are using from the output. If your process is not started explicitly by the shell, you need not do the former, but filtering out "grep" is always useful.

The interesting parts are at the end of the command line. We are looking for the 9th column, which holds the start time (START) of the process. However, "ps" pads its columns with a variable number of spaces, and "cut" treats every single space as a field separator, so each run of whitespace has to be collapsed into one space first. Here I have chosen the "tr" command for that: "tr -s '\t' ' '" converts any tabs to spaces and then squeezes repeated spaces down to a single one.
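The squeeze-then-cut step can be sketched in Python to see why it matters. This is only an illustration: the sample line below is made up in the "ps aux" layout, and the function names are hypothetical.

```python
# Made-up sample line in "ps aux" layout; real output will differ.
SAMPLE = "alice     4242  0.0  0.1  12345  6789 ?        Ss   Dec28   0:03 /usr/local/bin/foobar"


def start_field(ps_line):
    """Return the 9th whitespace-separated field (START in the ps aux layout)."""
    # split() with no argument treats any run of spaces or tabs as one
    # separator, which is the job "tr -s" does in the shell pipeline.
    return ps_line.split()[8]


def unsqueezed_field(ps_line):
    """Mimic cut -f 9 -d ' ' on raw output: every single space is a separator."""
    return ps_line.split(" ")[8]


if __name__ == "__main__":
    print(repr(start_field(SAMPLE)))
    print(repr(unsqueezed_field(SAMPLE)))
```

Without the squeeze, the padding spaces produce empty fields, so the 9th field is no longer the start time.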

Now that we have this handy command, suppose you are tasked with checking the process start time across a dozen or more machines. It is simple enough to wrap the command in a one-liner bash loop:

for m in host1 host2 host3 host4 ; do echo $m; ssh $m "ps auxww | grep foobar | grep -v '/bin/sh' | grep -v grep | tr -s '\t' ' ' | cut -f 9 -d ' '"  ; done

Monday, December 20, 2010

python : don't use sys.exit() inside signal handlers

It is common to want to exit the program when it receives a kill signal. But you should probably not use the standard sys.exit() function for this. Instead, use the os._exit() function.

The reason is that Python implements sys.exit() by raising a SystemExit exception in the stack frame that was executing when the interpreter received the kill signal. If the signal arrives inside a try/except block that catches it (a bare "except:" does), control is handed back to that block, and this is probably not what you intended.
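A minimal sketch of this behaviour, assuming Python 3 on a Unix-like system. The handler and function names are hypothetical, and the script sends SIGTERM to itself rather than waiting for an external kill.

```python
import os
import signal
import sys
import time


def kill_handler(signum, frame):
    # sys.exit() does not stop the process directly; it raises SystemExit
    # in whichever frame was running when the signal arrived.
    sys.exit(1)


def interrupted_work():
    """Send SIGTERM to ourselves while inside a try block with a bare except."""
    previous = signal.signal(signal.SIGTERM, kill_handler)
    try:
        try:
            os.kill(os.getpid(), signal.SIGTERM)
            time.sleep(0.1)  # the handler runs here at the latest
            return "not interrupted"
        except:  # a bare except catches SystemExit too
            return "SystemExit swallowed, still running"
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGTERM, previous)


if __name__ == "__main__":
    print(interrupted_work())
```

The process is still alive after the handler called sys.exit(), because the bare except absorbed the exception.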

This happened to me on an automated script last night, and since I wasn't aware of this feature of sys.exit(), it puzzled me a bit. The logs showed that the script was stopping, but then it kept continuing from the point where the kill interrupted it.

Here is the relevant part of the log:

running update table set somedate="2010-12-20 10:34:27" where id=4329
running update table set somedate="2010-12-20 10:34:27" where id=4330
Stopping as requested..
commiting mysql buffers
stopped
failed: update table set somedate="2010-12-20 10:34:27" where id=4330
running update table set somedate="2010-12-20 10:34:27" where id=4346
failed: update table set somedate="2010-12-20 10:34:27" where id=4346


Notice how the script just carried on from the point of interruption, and how everything fails after the failed stop. The failures happen because the cleanup done in the signal handler had already closed the db connection.

Here is the stack trace at the point where the signal was received (I could get this by sending another "kill", because the try/except logic was particularly long and the script was still stuck there; you might not be so lucky!):

Traceback (most recent call last):
  File "/path/to/script.py", line 165, in <module>
    exec_retry(cursor,mysql,1)
  File "/path/to/script.py", line 72, in exec_retry
    time.sleep(secs)
  File "/path/to/script.py", line 59, in kill_handler
    conn.commit()
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')


This is the point where the signal was received, specifically within the time.sleep() call:

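def exec_retry(cursor, cmd, secs):
    # Try the statement, sleeping secs between attempts.
    retries = 0
    while retries < 2:
        try:
            return cursor.execute(cmd)
        except:
            # A bare except also catches SystemExit, so a sys.exit() raised
            # by a signal handler can be absorbed by a block like this one.
            retries += 1
            time.sleep(secs)
    print("failed: %s" % cmd)
    return 0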
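For contrast, here is a sketch of the os._exit() advice from the top of this post, again assuming Python 3 on a Unix-like system. Because os._exit() would end the current process, the demo runs in a child interpreter. The exit status 3 and all names are arbitrary choices for illustration.

```python
import subprocess
import sys

# Source of a small child script, run in a separate interpreter.
CHILD = r'''
import os, signal, time

def kill_handler(signum, frame):
    # os._exit() ends the process immediately. No exception is raised,
    # so no try/except block gets a chance to intercept it.
    os._exit(3)

signal.signal(signal.SIGTERM, kill_handler)
try:
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(0.1)
except:
    pass
print("still running")  # never reached
'''


def run_child():
    """Run the child script and return its exit status and stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(run_child())
```

Keep in mind that os._exit() skips finally blocks and atexit handlers and does not flush stdio buffers, so any cleanup (such as committing to the database) has to happen in the handler before the call.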


Friday, December 10, 2010

use perl BEGIN / END blocks for summations

Perl one-liners are very useful for data manipulation. The "-ne" mode in Perl runs the given command over each line of stdin. However, if you want to do a summation and only print the final tally, you can make use of the BEGIN / END blocks in Perl. Initialize the counter in the BEGIN block, and print the sum in the END block.

Say there is a file of numbers called "nums", with one number per line, and we want to sum the numbers:

cat nums | perl -ne 'BEGIN{$s=0;} chomp; $s+=$_; END {print "$s\n"}'