Tuesday, February 26, 2013

Escaping strings in bash

Here is a good way to escape arbitrary strings in bash: backslash-escape every character that is not alphanumeric.
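
The original post linked to the technique rather than spelling it out, but judging from the generated script below, it backslash-escapes every non-alphanumeric character. A minimal sketch of that approach (the `escape` helper name is my own):

```shell
# Sketch: backslash-escape every non-alphanumeric character, which is
# what the generated redis-cli lines below do.
escape() {
    printf '%s\n' "$1" | sed -e 's/[^A-Za-z0-9]/\\&/g'
}

escape 'http://zx6r.com/zx6r/19061-normal-tempature-07-zx6r.html'
# prints: http\:\/\/zx6r\.com\/zx6r\/19061\-normal\-tempature\-07\-zx6r\.html
```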

Using this, I generated a script to push a number of URLs onto a Redis queue:

redis-cli LPUSH our_queue http\:\/\/zuvypyzulogu\.wordpress\.com\/2013\/02\/20\/bist\-du\-bei\-mir\-music\-download 
redis-cli LPUSH our_queue http\:\/\/zuvypyzulogu\.wordpress\.com\/2013\/02\/20\/top\-10\-free\-music\-download\-sites 
redis-cli LPUSH our_queue http\:\/\/zwingliusredivivus\.wordpress\.com\/2012\/02\/09\/where\-marc\-sees\-cause\-to\-lament\-i\-see\-reason\-to\-rejoice 
redis-cli LPUSH our_queue http\:\/\/zwingliusredivivus\.wordpress\.com\/2013\/02\/26\/a\-word\-to\-the\-emergents 
redis-cli LPUSH our_queue http\:\/\/zwischen\-uns\.forumactif\.com\/post 
redis-cli LPUSH our_queue http\:\/\/zx6r\.com\/zx6r\/19061\-normal\-tempature\-07\-zx6r\.html 
redis-cli LPUSH our_queue http\:\/\/zx6r\.com\/zx6r\/23572\-09\-zx6\-hid\-kit\.html 
redis-cli LPUSH our_queue http\:\/\/zx6r\.com\/zx6r\/9351\-06\-636\-build\-start\.html 
redis-cli LPUSH our_queue http\:\/\/zyngadeutschland\.wordpress\.com\/2013\/02\/26\/farmville\-2\-neue\-limitierte\-auflage\-kelten 
redis-cli LPUSH our_queue http\:\/\/zzmtokg\.wordpress\.com\/2011\/07\/09\/best\-price\-stok\-fyr\-torch\-for\-less  

Monday, February 18, 2013

Perl : Use Text::CSV instead of split for parsing CSV lines

When you have lines of comma-delimited fields, it is tempting to use Perl's native split to parse them:

 perl -ne '@x=split(",");' file.csv  
But split cannot correctly parse a line like the one in the example below, where a quoted field itself contains commas.

In that line, everything starting from "http://" and including the closing brace is a single quoted field. But within this field we find the "," separator, which means that when our Perl split does its work, the field is broken apart at every embedded comma and its fragments land in the following positions of @x.
However, the Text::CSV Perl module is smart enough to honor the CSV quoting rules: a field enclosed in double quotes may contain the delimiter. Text::CSV recognizes the opening quote and scans ahead to the matching closing quote, passing over any delimiters in between.

Here is an example:

perl -ne 'BEGIN { use Text::CSV; $csv=Text::CSV->new(); } chomp; $csv->parse($_); @x=$csv->fields(); for $x (@x) {print "$x ## "}; print "\n";' /tmp/x
2013-02-15 ## 478944 ##  http://cdn.springboard.gozammer.com/mediaplayer/springboard/mediaplayer.swf?config={"externalConfiguration":"http://www.springeagle.com/superconfig/sgv014.js","playlist":"http://cms.springeagle.com/xml_feeds_advanced/index/683/rss3/668219/0/0/5/"} ## 1 ## 0 ##  ## 

Wednesday, February 06, 2013

Tomcat : The Init loop

Tomcat can be configured so that servlets initialize at server startup, ready and primed for subsequent GET requests. But if the init() method fails for whatever reason, Tomcat will call init() again upon a subsequent GET request.
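
For reference, eager initialization is requested per servlet in web.xml via load-on-startup (the servlet name and class below are placeholders):

```xml
<servlet>
  <servlet-name>exampleServlet</servlet-name>
  <servlet-class>com.example.ExampleServlet</servlet-class>
  <!-- any non-negative value asks the container to call init() at
       startup; servlets with lower values are initialized first -->
  <load-on-startup>1</load-on-startup>
</servlet>
```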

It is not clear why the Tomcat developers chose to call init() again on a GET request. They may have wanted to coax the servlet into initializing on a second attempt even if it failed at startup, on the theory that the first failure was due to a race condition. Another reason I've seen posted is that this way Tomcat can return the full error from the init() call, stack trace and all, to the browser; the reasoning is that an error immediately visible in the browser is easier for a developer to fix than one that must be dug out of catalina.out.

Well, neither of these explanations can be accepted without some misgivings. First, if the design goal of the servlet container was to reduce the chance of a race condition affecting the serving of requests, the init() call could simply be repeated a configured number of times at startup. Secondly, Tomcat could have been designed to store any error from init() locally and return it on the next GET request, without making another call to init().

Our complaint about this design is not merely pedantic. If the servlet's init() routine does in fact have a race condition, this design, rather than resolving the race, can under certain conditions throw Tomcat into a never-ending series of calls to init().

In fact, this very thing happened recently on our production servers.

First, an extremely rare race hit a portion of the init() code in the servlet, causing init() to fail. Tomcat then dutifully called init() again, but this time a component that had been created before the previous init() failure, a singleton object, threw an exception because it already existed. Since init() failed again, for a different reason, the next GET made Tomcat call init() again, and again init() failed. This threw the server into an endless init() loop.

The problem is that the part of the code where we first encountered the race now never gets hit. The server is up without being properly initialized, and init() can never succeed, since the singleton object will always throw its exception. So none of the GET requests are served.

We can think of other unintended consequences of this design, too. What if the servlet was accumulating some sort of in-memory list from the database at startup? Calling init() more than once would grow that list and could lead to bugs.

And what if the first failure corrupted some data structure, or even some data on permanent storage? Even if the second init() succeeded, the server might be corrupt at that point.
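
One defensive pattern is to make init()'s one-time work idempotent, so that a retried init() reuses whatever was already built instead of failing on it. A hypothetical sketch (class and method names are my own, and the servlet API is stubbed out to keep it self-contained):

```java
// Hypothetical sketch: idempotent one-time setup that survives
// Tomcat's call-init()-again-on-the-next-GET behavior.
public class InitGuardDemo {
    // stands in for the singleton component our real init() created
    private static Object singleton;

    static synchronized void initOnce() {
        if (singleton == null) {   // reuse, don't re-create
            singleton = new Object();
        }
    }

    static Object instance() {
        return singleton;
    }

    public static void main(String[] args) {
        initOnce();
        Object first = instance();
        initOnce();                // a retried init() is now harmless
        System.out.println(first == instance());  // true: same instance reused
    }
}
```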

JDO : Beware when closing PreparedStatement Objects

Once you are done with a PreparedStatement, it is good practice to close it. But pain awaits you at unexpected moments if you close a statement one too many times.

In standard code using JDO, it is not uncommon to find ResultSet and PreparedStatement objects that never get closed. Things work for a while, until memory issues force developers to clean these up. This is what happened on our production systems recently. Unfortunately, we were somewhat overzealous and at one point closed a statement twice. It ran in Q/A without a problem, so it slipped past the gatekeepers into production.

And there it crashed and burned, not in the close statement as one would imagine, but in a later use of the statement (a setLong call), with the following stack trace:
Processor.processFile: problem 
processing data from filename: /path/to/something.csv on:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after statement closed.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
        at com.mysql.jdbc.Util.getInstance(Util.java:384)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:929)
        at com.mysql.jdbc.StatementImpl.checkClosed(StatementImpl.java:406)
        at com.mysql.jdbc.ServerPreparedStatement.checkClosed(ServerPreparedStatement.java:546)
        at com.mysql.jdbc.ServerPreparedStatement.setLong(ServerPreparedStatement.java:2037)
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at com.solarmetric.jdbc.PoolConnection$PoolPreparedStatement.setLong(PoolConnection.java:448)
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at com.solarmetric.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.setLong(LoggingCo
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at com.solarmetric.jdbc.DelegatingPreparedStatement.setLong(DelegatingPreparedStatement.java:397)
        at ourcode.getData(Manager.java:5435)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at ourcode.JDOProxy.invoke(JDOProxy.java:198)
        at $Proxy5.getData(Unknown Source)
        at ourcode.Processor.writeData(Processor.java:3185)
This is due to a bug in the MySQL Connector/J driver: if a statement is closed twice and we then create another PreparedStatement with the same SQL string, the driver retrieves the already-closed PreparedStatement object from an internal cache. Any attempt to use that statement then results in an exception. Here are the details of the mysql bug.

Here is a bit of code showing the double close. If this function is called twice, it will throw the exception on the second call:

    private static void badsql(PersistenceManager pm) {
        Connection conn = null;
        ResultSet results = null;
        PreparedStatement stmt = null;

        try {
            conn = QUtil.getConn(pm);

            String[] urls = new String[] {"blingbling.com", "singsing.com", "soso.com"};

            stmt = conn.prepareStatement("select numhits from facttable where name = ?");

            for (String url : urls) {
                stmt.setString(1, url);
                results = stmt.executeQuery();
                if (results.next()) {
                    int numTerms = results.getInt(1);
                    System.out.println(url + "=>" + numTerms);
                }
            }

            // first close
            stmt.close();

        } catch (Exception e) {
            e.printStackTrace();

        } finally {
            // Here is the second close
            try {
                if (stmt != null) stmt.close();
            } catch (Exception e) {
                // ignored
            }
        }
    }
In production, the problematic function was called once per large batch of records. Since there were not enough records in the Q/A environment, it was called just once, and thus we never hit the bug in Q/A.
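
One way to eliminate this whole class of double-close bugs is Java 7's try-with-resources, which closes each resource exactly once, automatically, so there is no close call left to duplicate in a finally block. A sketch using a stub resource so it runs without a database (class names are my own):

```java
// Sketch: try-with-resources closes each resource exactly once,
// removing the double-close hazard by construction.
public class CloseOnceDemo {
    static int closeCount = 0;

    // stands in for a PreparedStatement
    static class StubStatement implements AutoCloseable {
        @Override
        public void close() {
            closeCount++;
        }
    }

    public static void main(String[] args) {
        try (StubStatement stmt = new StubStatement()) {
            // use stmt here; no explicit stmt.close() anywhere
        }
        System.out.println(closeCount);  // 1
    }
}
```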