Friday, May 23, 2014

Decoding HTML pages with Content-Encoding : deflate

Not all web servers implement the zlib format the same way when they return data with Content-Encoding set to deflate. Some servers send the zlib header specified in RFC 1950, while others send the raw compressed data alone.

Java's Inflater class can be used to deal with both cases, but first we must check for the header. The header is the first two bytes of the data, and the check is simple :

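    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    static boolean isZlibHeader(byte[] bytes, int offset, int size) {
        if (size < 2) return false;
        // Java bytes are signed: mask with 0xFF to compare them as unsigned values
        int byte1 = bytes[offset] & 0xFF;
        int byte2 = bytes[offset + 1] & 0xFF;
        return byte1 == 0x78 && (byte2 == 0x01 || byte2 == 0x9c || byte2 == 0xDA);
    }

    private void inflateToFile(byte[] encBytes, int offset, int size, BufferedOutputStream f) throws IOException {
        // nowrap = true makes the Inflater expect raw deflate data, so skip the 2-byte zlib header if present
        boolean hasHeader = isZlibHeader(encBytes, offset, size);
        Inflater inflator = new Inflater(true);
        inflator.setInput(encBytes, hasHeader ? offset + 2 : offset, hasHeader ? size - 2 : size);
        byte[] buf = new byte[4096];
        int nbytes = 0;
        try {
            do {
                nbytes = inflator.inflate(buf);
                if (nbytes > 0) {
                    f.write(buf, 0, nbytes);
                }
            } while (nbytes > 0);
        } catch (DataFormatException e) {
            //handle error
        } finally {
            inflator.end();
        }
    }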
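
For comparison, here is a minimal, self-contained sketch of the same idea that lets the Inflater deal with the zlib wrapper itself instead of skipping two bytes by hand. The names (DeflateBodyDecoder, hasZlibHeader, inflate, deflate) are made up for illustration. The header test is my own generalisation, not the check used above: it accepts any RFC 1950 header (compression method 8, first two bytes divisible by 31 when read as a big-endian number) rather than only the three common values. It is still a heuristic, since a raw deflate stream could in principle start with bytes that pass it.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateBodyDecoder {

    // Heuristic RFC 1950 check: compression method 8 (deflate) in the low nibble
    // of byte 0, and bytes 0-1 read as a big-endian number divisible by 31.
    static boolean hasZlibHeader(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // Inflates a body that is either zlib-wrapped or raw deflate.
    static byte[] inflate(byte[] data) throws DataFormatException {
        boolean raw = !hasZlibHeader(data);
        // The Inflater Javadoc asks for one extra dummy input byte in nowrap mode.
        byte[] input = raw ? Arrays.copyOf(data, data.length + 1) : data;
        Inflater inflater = new Inflater(raw);
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()) {
                    // No output and not finished: input ran out or a preset dictionary is needed.
                    throw new DataFormatException("truncated or unsupported deflate stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Helper for trying the decoder out: compresses with (nowrap = false) or
    // without (nowrap = true) the zlib wrapper.
    static byte[] deflate(byte[] data, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Compressing the same bytes with deflate(data, false) and deflate(data, true) and passing each result to inflate should give back the original data in both cases.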

An example URL that had to be processed this way : Here is the Wireshark capture, showing the Content-Encoding set to deflate as well as the zlib header of the de-chunked body (the first 2 bytes, "78 9c") in the bottom pane of the display:

Tuesday, May 20, 2014

Mapping sockets of a process to the remote end point

Recently, one of our long-running processes started exhibiting a high number of open file handles. We were leaking handles somewhere. The first step is to figure out which handles are open, which is easy on Linux with /proc. Just plug in the PID of your process:

ls -ltr /proc/21657/fd

This spits out all the open file handles for the process with PID 21657. Here is an example of an open socket:

lrwx------ 1 user user 64 May 20 13:20 649 -> socket:[2336308491]

This alone doesn't tell us much. Our application uses sockets for many reasons. There are connections to mysql, memcache and mongodb. There are sockets listening and responding to requests. There are connections made to web servers.
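
If you would rather collect the socket inodes from code than read the ls output by eye, something along these lines should work on Linux. It is only a sketch: SocketInodes, socketInode and openSockets are hypothetical names, and it assumes the fd links look exactly like the socket:[inode] form shown above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SocketInodes {

    private static final Pattern SOCKET = Pattern.compile("socket:\\[(\\d+)\\]");

    // Returns the inode when the link target has the form socket:[inode].
    static Optional<String> socketInode(String linkTarget) {
        Matcher m = SOCKET.matcher(linkTarget);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Maps fd number -> socket inode for every socket in /proc/<pid>/fd.
    static Map<String, String> openSockets(String pid) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    socketInode(target).ifPresent(inode -> result.put(fd.getFileName().toString(), inode));
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        }
        return result;
    }
}
```

Calling openSockets("21657") would then give a map from fd number to inode, ready to be matched against the inode column of /proc/net/tcp and friends.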

To get an idea of the two end points of the socket, we need to look at /proc/net/tcp (as well as tcp6, udp, udp6) :

user@host ~$ cat /proc/net/tcp6 | grep 2336308491
 129: 0000000000000000FFFF00004D29650A:9030 0000000000000000FFFF00005881754A:01BB 08 00000000:00000001 00:00000000 00000000   237        0 2336308491 1 ffff81061cf60740 371 40 0 4 -1

This is a connection to an SSL port on the remote end: 0x01BB = 443.
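
The addresses can be decoded the same way as the port. Below is a rough sketch of a decoder for one such line. ProcNetTcpEntry, decodeEndpoint and describe are names invented for this example. It assumes a little-endian host (x86), where each 32-bit word of the address is printed with its bytes reversed, and it takes the state names from the kernel's TCP state numbering.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ProcNetTcpEntry {

    // Index = value of the "st" column (kernel TCP state numbering); index 0 is unused.
    private static final String[] STATES = {
        "UNKNOWN", "ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
        "TIME_WAIT", "CLOSE", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING"
    };

    // Decodes one "HEXADDR:HEXPORT" field from /proc/net/tcp or /proc/net/tcp6.
    static String decodeEndpoint(String field) {
        String[] parts = field.split(":");
        String hex = parts[0];
        int port = Integer.parseInt(parts[1], 16);
        byte[] addr = new byte[hex.length() / 2];
        // Assumption: little-endian host, so reverse the 4 bytes inside each 32-bit word.
        for (int word = 0; word < addr.length / 4; word++) {
            for (int i = 0; i < 4; i++) {
                int pos = word * 8 + (3 - i) * 2;
                addr[word * 4 + i] = (byte) Integer.parseInt(hex.substring(pos, pos + 2), 16);
            }
        }
        try {
            // InetAddress turns IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) into plain IPv4.
            return InetAddress.getByAddress(addr).getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("unexpected address length in " + field, e);
        }
    }

    // Summarises a full line: local endpoint, remote endpoint, state and inode.
    static String describe(String line) {
        String[] f = line.trim().split("\\s+");
        int st = Integer.parseInt(f[3], 16);
        String state = st < STATES.length ? STATES[st] : "UNKNOWN";
        return decodeEndpoint(f[1]) + " -> " + decodeEndpoint(f[2]) + " " + state + " inode=" + f[9];
    }
}
```

For the line above, describe should report 10.101.41.77:36912 -> 74.117.129.88:443 in state CLOSE_WAIT (st = 08), which means the remote end has closed the connection while our process still holds the socket.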