Friday, February 27, 2015

Titan : using native hadoop libraries on MacOSX

Once you have built the native Hadoop libraries on MacOSX, you need to add this bit of code to bin/ so that it can find them:
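if [ -e "${HADOOP_PREFIX}/lib/native/libhadoop.dylib" ]; then
   LIB_PATH="${HADOOP_PREFIX}/lib/native"   # assumed: directory holding libhadoop.dylib
   if [ -n "${LD_LIBRARY_PATH:-}" ]; then
       LD_LIBRARY_PATH="${LIB_PATH}:${LD_LIBRARY_PATH}"     # For Linux
   else LD_LIBRARY_PATH="${LIB_PATH}"; fi
   if [ -n "${DYLD_LIBRARY_PATH:-}" ]; then
       DYLD_LIBRARY_PATH="${LIB_PATH}:${DYLD_LIBRARY_PATH}" # For MacOSX
   else DYLD_LIBRARY_PATH="${LIB_PATH}"; fi
   export LD_LIBRARY_PATH DYLD_LIBRARY_PATH
fi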

The only oddity here is that the script uses "set -u" at the top, which makes bash exit with an "unbound variable" error whenever an unset variable is expanded. So you have to write the variables you are testing as ${VAR:-}, which expands to an empty string when VAR is unset. You can see that in the lines that test LD_LIBRARY_PATH and DYLD_LIBRARY_PATH.

Saturday, December 20, 2014

deleting an iterator in accumulo

I learnt the hard way that setting an iterator in the accumulo shell attaches it to the table permanently. To make matters worse, I set this iterator on the metadata table, which made everything fail.

Removing the iterator was tricky. First I had to find out what name Accumulo had given the iterator, since I had specified only the Java class and no name.

Here was the command I used:

setiter -class org.apache.accumulo.core.iterators.FirstEntryInRowIterator -p 99 -scan

Here is how I found what the iterators for accumulo.metadata table were called:

config -t accumulo.metadata -f iterator

SCOPE      | NAME                                                  | VALUE
table      | table.iterator.majc.bulkLoadFilter .................. | 20,org.apache.accumulo.server.iterators.MetadataBulkLoadFilter
table      | table.iterator.majc.vers ............................ | 10,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.majc.vers.opt.maxVersions ............ | 1
table      | table.iterator.minc.vers ............................ | 10,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.minc.vers.opt.maxVersions ............ | 1
table      | table.iterator.scan.firstEntry ...................... | 99,org.apache.accumulo.core.iterators.FirstEntryInRowIterator
table      | table.iterator.scan.firstEntry.opt.scansBeforeSeek .. | 10
table      | table.iterator.scan.vers ............................ | 10,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.scan.vers.opt.maxVersions ............ | 1
The iterator I added seemed to be named "table.iterator.scan.firstEntry", so I tried to delete that:
root@work accumulo.metadata> deleteiter -n table.iterator.scan.firstEntry -t accumulo.metadata

2014-12-20 15:13:42,854 [shell.Shell] WARN : no iterators found that match your criteria
You have to specify just the last part of the property name, together with the scope it was set in (-scan here):
root@work accumulo.metadata> deleteiter -scan -n firstEntry -t accumulo.metadata
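The naming convention in the config output above is what tripped me up. Here is a minimal Java sketch of how those property keys break down, assuming the table.iterator.<scope>.<name>[.opt.<option>] layout seen in that listing; the class and method names are made up for illustration and are not part of the Accumulo API.

```java
public class IteratorProperty {
    static final String PREFIX = "table.iterator.";

    // Splits a key such as "table.iterator.scan.firstEntry.opt.scansBeforeSeek"
    // into {scope, name, option}; option is null for the iterator's own entry.
    public static String[] parse(String key) {
        if (!key.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an iterator property: " + key);
        }
        // limit 4 keeps any dots inside an option name in the last element
        String[] parts = key.substring(PREFIX.length()).split("\\.", 4);
        if (parts.length == 2) {
            return new String[] {parts[0], parts[1], null};
        }
        if (parts.length == 4 && parts[2].equals("opt")) {
            return new String[] {parts[0], parts[1], parts[3]};
        }
        throw new IllegalArgumentException("unexpected iterator property: " + key);
    }

    public static void main(String[] args) {
        String[] p = parse("table.iterator.scan.firstEntry");
        // deleteiter wants only the name, plus the scope the iterator was attached to
        System.out.println("deleteiter -" + p[0] + " -n " + p[1] + " -t accumulo.metadata");
    }
}
```

Run on the key from the listing, main prints the same deleteiter command that finally worked.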

Wednesday, November 19, 2014

grep many files while printing file name

A useful trick I found:

find . -exec grep -n hello /dev/null {} \;

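When grep is given more than one file, it prefixes each match with the file name (and -n adds the line number). Since find runs grep on one file at a time, we pass the handy /dev/null, which never matches anything, as an extra file to force that prefix.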


Friday, May 23, 2014

Decoding HTML pages with Content-Encoding : deflate

Not all web servers implement the deflate content coding the same way. HTTP defines Content-Encoding: deflate as the zlib format of RFC 1950 wrapped around deflate data, and some servers do return that zlib header, but others return the raw compressed data alone.

Java's Inflater class can deal with both cases, but first we must check for the header. The header is the first two bytes, so the check is simple:

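    // needs: java.io.BufferedOutputStream, java.io.IOException,
    //        java.util.zip.Inflater, java.util.zip.DataFormatException
    static boolean isZlibHeader(byte[] bytes, int offset, int size) {
        if (size < 2) return false;
        // Java bytes are signed: mask with 0xFF to get the unsigned value before comparing
        int byte1 = bytes[offset] & 0xFF;
        int byte2 = bytes[offset + 1] & 0xFF;
        return byte1 == 0x78 && (byte2 == 0x01 || byte2 == 0x9C || byte2 == 0xDA);
    }

    private void inflateToFile(byte[] encBytes, int offset, int size, BufferedOutputStream f) throws IOException {
        // nowrap = true expects raw deflate data, so skip the 2-byte zlib header if present
        Inflater inflator = new Inflater(true);
        boolean hasHeader = isZlibHeader(encBytes, offset, size);
        inflator.setInput(encBytes, hasHeader ? offset + 2 : offset, hasHeader ? size - 2 : size);
        byte[] buf = new byte[4096];
        int nbytes;
        try {
            do {
                nbytes = inflator.inflate(buf);
                if (nbytes > 0) {
                    f.write(buf, 0, nbytes);
                }
            } while (nbytes > 0);
        } catch (DataFormatException e) {
            throw new IOException("invalid deflate data", e);
        } finally {
            inflator.end();   // release the native zlib memory
        }
    }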

An example URL that had to be processed this way: Here is the Wireshark capture, showing Content-Encoding set to deflate and, in the bottom pane, the de-chunked body, whose first two bytes "78 9c" are the zlib header:
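Instead of stripping the header by hand, another option is to let Inflater parse the zlib wrapper itself and use nowrap mode only for raw streams. Below is a self-contained sketch of that variant. It uses the general RFC 1950 header test (compression method 8, window size field at most 7, and CMF * 256 + FLG divisible by 31) rather than a list of known second bytes, and it compresses a sample string both ways with Deflater to stand in for the two kinds of server. The class and method names are made up, and a raw stream whose first two bytes happen to pass the test would in principle be misdetected, so treat it as a sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateBody {
    // RFC 1950 header: CM (low nibble of CMF) = 8, CINFO (high nibble) <= 7,
    // and the 16-bit value CMF * 256 + FLG must be a multiple of 31
    public static boolean isZlibHeader(byte[] b) {
        if (b.length < 2) return false;
        int cmf = b[0] & 0xFF;
        int flg = b[1] & 0xFF;
        return (cmf & 0x0F) == 8 && (cmf >> 4) <= 7 && ((cmf << 8) | flg) % 31 == 0;
    }

    // nowrap = false lets Inflater consume the zlib header and check the Adler-32 trailer;
    // nowrap = true handles servers that send raw deflate data
    public static byte[] inflateBody(byte[] body) throws DataFormatException {
        Inflater inflater = new Inflater(!isZlibHeader(body));
        try {
            inflater.setInput(body);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported deflate stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Compresses with (raw = false) or without (raw = true) the zlib wrapper
    public static byte[] deflate(byte[] data, boolean raw) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, raw);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        for (boolean raw : new boolean[] {false, true}) {
            byte[] body = deflate(html, raw);
            System.out.printf("raw=%b header=%b -> %s%n", raw, isZlibHeader(body),
                    new String(inflateBody(body), StandardCharsets.UTF_8));
        }
    }
}
```

The design choice here is to pick the Inflater mode from the header instead of adjusting offsets, which keeps the Adler-32 check for servers that do follow the spec.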

Tuesday, May 20, 2014

Mapping sockets of a process to the remote end point

Recently, one of our long-running processes started showing a high number of open file handles; we were leaking them somewhere. The first step is to figure out which handles are open, which is easy on Linux with /proc. Just plug in the PID of your process:

ls -ltr /proc/21657/fd

This spits out all the open file handles for the process with PID 21657. Here is an example of an open socket:

lrwx------ 1 user user 64 May 20 13:20 649 -> socket:[2336308491]

This alone doesn't tell us much. Our application uses sockets for many reasons: there are connections to MySQL, memcache and MongoDB, sockets listening for and responding to requests, and connections made to web servers.

To get an idea of the two end points of the socket, we need to look at /proc/net/tcp (as well as tcp6, udp, udp6) :

user@host ~$ cat /proc/net/tcp6 | grep 2336308491
 129: 0000000000000000FFFF00004D29650A:9030 0000000000000000FFFF00005881754A:01BB 08 00000000:00000001 00:00000000 00000000   237        0 2336308491 1 ffff81061cf60740 371 40 0 4 -1

This is a connection to an SSL port on the remote end; the fields are hexadecimal, and 0x01BB = 443.
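The address fields can be decoded by hand, but a small helper makes it repeatable. Here is a minimal Java sketch, assuming a little-endian (x86) host, where the kernel prints each 32-bit word of the address in host byte order; the class and method names are made up. Applied to the line above, it gives a local end of 10.101.41.77:36912 and a remote end of 74.117.129.88:443, both IPv4-mapped. The st column, 08, is TCP_CLOSE_WAIT: the remote end has closed the connection but our process has not closed its socket, a common signature of a socket leak.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ProcNetTcp {
    // Decodes an "ADDRESS:PORT" field from /proc/net/tcp (8 hex digits) or
    // /proc/net/tcp6 (32 hex digits). Assumes a little-endian host, so the
    // bytes inside each 32-bit word are reversed to get network order.
    public static String decodeEndpoint(String field) throws UnknownHostException {
        int colon = field.indexOf(':');
        String hexAddr = field.substring(0, colon);
        int port = Integer.parseInt(field.substring(colon + 1), 16);

        byte[] addr = new byte[hexAddr.length() / 2];
        for (int word = 0; word < addr.length / 4; word++) {
            for (int j = 0; j < 4; j++) {
                int hexPos = word * 8 + (3 - j) * 2;   // reverse byte order within the word
                addr[word * 4 + j] = (byte) Integer.parseInt(hexAddr.substring(hexPos, hexPos + 2), 16);
            }
        }
        // getByAddress does no DNS lookup; an IPv4-mapped IPv6 address comes back as an Inet4Address
        return InetAddress.getByAddress(addr).getHostAddress() + ":" + port;
    }

    public static void main(String[] args) throws UnknownHostException {
        String line = " 129: 0000000000000000FFFF00004D29650A:9030 0000000000000000FFFF00005881754A:01BB 08";
        String[] cols = line.trim().split("\\s+");
        System.out.println("local  " + decodeEndpoint(cols[1]));
        System.out.println("remote " + decodeEndpoint(cols[2]));
        System.out.println("state  " + cols[3]);
    }
}
```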

Tuesday, January 14, 2014

telnet relocation error: symbol krb5int_labeled_fopen

Ever had telnet not work on a machine? It happened to me recently on CentOS 5.8, with this error message:

telnet: error: relocation error: symbol krb5int_labeled_fopen, version krb5support_0_MIT not defined in file with link time reference (fatal)

Googling hinted at a conflict in the shared library providing Kerberos authentication. So I ran telnet with LD_DEBUG=all, which makes the dynamic loader log every library and symbol lookup:

LD_DEBUG=all telnet host port

which showed me the problem:

29655:    symbol=krb5int_labeled_fopen;  lookup in file=/usr/local/greenplum-db/lib/ [0]
29655:    telnet: error: relocation error: symbol krb5int_labeled_fopen, version krb5support_0_MIT not defined in file with link time reference (fatal)

So an installation of Greenplum had placed its own version of the Kerberos library ahead of the system one in the library search path used by the Linux loader, and Greenplum's version did not export the symbol telnet needed.

This can all be verified quickly with nm:

login@host ~$ nm -D /usr/local/greenplum-db/lib/ | grep krb5int_labeled_fopen

login@host ~$ nm -D /usr/lib64/ | grep krb5int_labeled_fopen
00000033aea040b0 T krb5int_labeled_fopen

The Greenplum installation used LD_LIBRARY_PATH to give its libraries preferential status, so putting /usr/lib64 in front of it was enough for telnet to find the right library.

login@host ~$ export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH

login@host ~$ telnet host port
Trying host...
Connected to host (ip).
Escape character is '^]'.

Friday, November 22, 2013

A case of Occam's razor

I wanted to write about a seemingly bizarre issue with a web page fetch that ultimately turned out to be yet another validation of Occam's razor, which says, simply, that the simplest explanation for a problem is usually the right one.

Some background: I do statistical calculations over a large number of web pages, and this has the side effect of highlighting pages that deviate from the norm. So I end up going through many web pages that stand out from the pack at first glance.

The fetcher I use talks HTTP directly, and deals leniently with the web servers out there that don't always implement HTTP according to spec. On this particular occasion, one web site : responded to the fetcher with content that was nowhere close to what the browser retrieved.

Let me post here what the HTML looked like:

<html lang="en">
    <title>PHP Application - AWS Elastic Beanstalk</title>
    <link href="" rel="stylesheet" type="text/css"></link>
    <link href="" rel="icon" type="image/ico"></link>
    <link href="" rel="shortcut icon" type="image/ico"></link>
    <!--[if IE]><script src=""></script><![endif]-->
    <link href="/styles.css" rel="stylesheet" type="text/css"></link>
    <section class="congratulations">
Your AWS Elastic Beanstalk <em>PHP</em> application is now running on your own dedicated environment in the AWS&nbsp;Cloud<br />

        You are running PHP version 5.4.20<br />


    <section class="instructions">
<h2>What's Next?</h2>
<li><a href="">AWS Elastic Beanstalk overview</a></li>
<li><a href="">Deploying AWS Elastic Beanstalk Applications in PHP Using Eb and Git</a></li>
<li><a href="">Using Amazon RDS with PHP</a>
<li><a href="">Customizing the Software on EC2 Instances</a></li>
<li><a href="">Customizing Environment Resources</a></li>
<h2>AWS SDK for PHP</h2>
<li><a href="">AWS SDK for PHP home</a></li>
<li><a href="">PHP developer center</a></li>
<li><a href="">AWS SDK for PHP on GitHub</a></li>

    <!--[if lt IE 9]><script src=""></script><![endif]-->

This is nowhere close to the HTML retrieved by the browser. You can try it. The web page is about hair products.

In my experience, some web servers return different content depending on the HTTP headers and the originating IP. Sometimes the server has identified an IP as a bot and decided to return an error response or an outright wrong page.

So I tested the theory of the IP by running the fetcher from a different network, with a different outgoing IP. This time, the correct page was retrieved. Then I used curl to retrieve the page from the same network that had given me the incorrect page. To my surprise, curl retrieved the correct page. curl got the correct page from both networks.

This was quite puzzling. I thought that perhaps the web server was doing some sophisticated fingerprinting: having identified the User-Agent, and maybe other headers the fetcher was sending, it had decided to send a wrong page.

So, using Wireshark, I captured all the HTTP headers sent by the fetcher, and another team member ran curl with those same headers:

curl -H 'User-Agent: rtw' -H 'Host:' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-us,en;q=0.5'  -H 'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' -H 'Keep-Alive: 115' -H 'Connection: keep-alive' -H 'Accept-Encoding: gzip,deflate'

I was positive that curl would then fail. But of course it still returned the correct page. So my theory of sophisticated fingerprinting was wrong, or maybe the fingerprinting was even more sophisticated than I thought. I was stumped.

Then I realized that I had overlooked a crucial piece of data in this whole operation: the IP the fetcher used to get the page. The first thing the fetcher does is resolve the host name to an IP, and since DNS queries can be expensive and we do lots of them, the IP is retrieved from a memcached instance if it is available. An IP may be cached for a number of hours. From the fetcher logs, I could see the IP it was using:

DNS resolved from cache -> /

But as dig showed, that was the incorrect IP :

>>$ dig
; <<>> DiG 9.3.6-P1-RedHat-9.3.6-20.P1.el5 <<>>
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28108
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 4, ADDITIONAL: 4

;    IN    A

;; ANSWER SECTION:
300 IN    CNAME
60 IN    A
60 IN    A

;; AUTHORITY SECTION:
1703 IN    NS
1703 IN    NS
1703 IN    NS
1703 IN    NS

;; ADDITIONAL SECTION:
92612    IN    A
92612    IN    A
92612    IN    A
92510    IN    A

;; Query time: 11 msec
;; WHEN: Fri Nov 22 12:40:20 2013
;; MSG SIZE  rcvd: 345

All that remained was to validate this far simpler hypothesis. That was trivial: all I had to do was remove the domain->IP mapping from memcached.

>>$ telnet localhost 11211
Connected to localhost.localdomain (
Escape character is '^]'.
VALUE 4096 4
Connection closed by foreign host.
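The telnet session above talks memcached's text protocol, where deleting a key is a single "delete <key>" line, answered by DELETED if the key existed or NOT_FOUND otherwise. Here is a minimal Java sketch of the same deletion; the class name is made up, and any key you pass is your own, since the fetcher's actual key format is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MemcachedDelete {
    // Text protocol keys are at most 250 bytes and may not contain spaces or control characters
    public static String deleteCommand(String key) {
        if (key.isEmpty() || key.getBytes(StandardCharsets.UTF_8).length > 250) {
            throw new IllegalArgumentException("bad key length: " + key);
        }
        for (char c : key.toCharArray()) {
            if (c <= ' ' || c == 0x7F) {
                throw new IllegalArgumentException("bad character in key: " + key);
            }
        }
        return "delete " + key + "\r\n";
    }

    // The server answers "DELETED" if the key existed and "NOT_FOUND" otherwise
    public static boolean wasDeleted(String reply) {
        return reply != null && reply.trim().equals("DELETED");
    }

    public static boolean delete(String host, int port, String key) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.write(deleteCommand(key));
            out.flush();
            return wasDeleted(in.readLine());
        }
    }
}
```

For example, MemcachedDelete.delete("localhost", 11211, "example.com") would return true only if a key named example.com was present.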

This time, the fetcher logs showed that indeed, it was picking the correct IP. And of course it fetched the correct page with all the hair product details.

DNS resolved -> /
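The dig output also suggests why the cached entry went stale: the A records carry a TTL of 60 seconds, while our cache could hold a mapping for hours. Below is a minimal sketch of a cache that expires each entry at the smaller of the record's TTL and its own cap. The class is hypothetical (ours lived in memcached, where the same idea amounts to passing that smaller value as the exptime on set), and the current time is passed in so the behaviour is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

public class TtlCappedDnsCache {
    private static final class Entry {
        final String ip;
        final long expiresAtMillis;

        Entry(String ip, long expiresAtMillis) {
            this.ip = ip;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxCacheMillis;

    public TtlCappedDnsCache(long maxCacheMillis) {
        this.maxCacheMillis = maxCacheMillis;
    }

    // Stores a resolution that expires at whichever comes first: the record's TTL or our cap
    public void put(String host, String ip, long ttlSeconds, long nowMillis) {
        long lifetime = Math.min(ttlSeconds * 1000L, maxCacheMillis);
        entries.put(host, new Entry(ip, nowMillis + lifetime));
    }

    // Returns the cached IP, or null if absent or expired (the caller should resolve again)
    public String get(String host, long nowMillis) {
        Entry e = entries.get(host);
        if (e == null) {
            return null;
        }
        if (nowMillis >= e.expiresAtMillis) {
            entries.remove(host);
            return null;
        }
        return e.ip;
    }
}
```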

So once again, I was reminded of Occam's razor and of how important it is to:

1. Remember all the assumptions we make about how a certain software system works.
2. Validate all the assumptions, starting with the simplest first.

 Happy debugging the Net!