Friday, August 14, 2009

remove bing links from content sites



Lately, I have seen numerous "bing" links appearing on certain content sites I frequent. A case in point is http://articles.moneycentral.msn.com.

This is somewhat insidious: unless I happen to notice that it is a search link, I follow it, expecting it to take me to some good content.

So I wrote a Greasemonkey script to disable those links. Here it is:

// ==UserScript==
// @name test
// @namespace http://userscripts.org/thushara/
// @include http://articles.moneycentral.msn.com/*
// ==/UserScript==
var allLinks, thisLink;
// Take a static snapshot of every anchor element that has an href attribute.
allLinks = document.evaluate(
    '//a[@href]',
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
for (var i = 0; i < allLinks.snapshotLength; i++) {
    thisLink = allLinks.snapshotItem(i);
    // Disable links that point at Bing search by removing their href.
    if (thisLink.href.substring(0, 19) == "http://www.bing.com") {
        thisLink.removeAttribute("href");
    }
}


I'm not too familiar with XPath, but I believe it is possible to put "http://www.bing.com" inside the XPath query itself, so that there is no need to iterate through all the links looking for the search links. I couldn't get the syntax right, though. Please post a comment if you find a way.

Thursday, July 30, 2009

groovy, mysql and case sensitivity

Groovy seems to have a somewhat hard-to-grok policy on case sensitivity as it pertains to MySQL columns. To illustrate, for a table apiaccess with a column Domain, this fails as of Groovy version 1.6.3:

query = "select domain from apiaccess where apiaccessid=3052";
row = sql.firstRow(query);
dom = row.Domain;


The reason is that Domain in row.Domain does not match the case of domain in the filter string.

This works:

query = "select domain from apiaccess where apiaccessid=3052";
row = sql.firstRow(query);
dom = row.domain;


However, on an earlier version (perhaps release candidate RC1 of 1.6), the first block of code worked. There, Groovy expected the property name to match the actual MySQL column name rather than the filter string.

On both versions, the filter string does not need to match the case of the MySQL column names.

Tuesday, June 30, 2009

mv is not atomic in Mac OS

You shouldn't rely on `mv` being atomic on the regular file system under Mac OS. I had a script that regularly updated a file read by a different script, so I had it write a temporary file and then `mv` that file to the permanent location. While this works on Linux, it doesn't work on Mac OS.

To demonstrate, open two terminal windows on your Mac and in one type this:

while true; do echo this better be a whole sentence > x1.txt; mv x1.txt x.txt; done


In the other, run this script (saved here as x.sh):

#!/bin/bash
while true
do
  F=`cat x.txt`
  echo "$F"
  if [ "$F" = "this better be a whole sentence" ]
  then
    echo ok
  else
    echo bad
    exit 1
  fi
done


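Notice the output: after a few good reads, cat finds no x.txt at all, so the reader gets an empty string and bails out: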

mpire@brwdbs02:~$ ./x.sh
this better be a whole sentence
ok
this better be a whole sentence
ok
ok
this better be a whole sentence
ok
this better be a whole sentence
ok
cat: x.txt: No such file or directory

bad
[~]


bad
mpire@brwdbs02:~$
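If you want to reproduce this without juggling two windows, here is a rough single-script sketch of the same experiment. The function name mv_race_check and the iteration count are my own choices (hypothetical, not part of the setup above). It runs the writer loop in the background, reads x.txt a fixed number of times, and counts reads where the file was missing separately from reads where it was present but incomplete, since the failure above was a missing file rather than a half-written one.

```shell
#!/bin/bash
# Hypothetical helper: runs the writer loop from the post in the background
# and reads x.txt ITERATIONS times, counting failed or incomplete reads.
mv_race_check() {
  local iterations=${1:-2000}
  local expected="this better be a whole sentence"
  local dir writer content i
  local missing=0 partial=0

  dir=$(mktemp -d)                     # private scratch directory
  echo "$expected" > "$dir/x.txt"      # seed so the first read has a file

  # Writer: same pattern as the post, write a temp file, then mv it into
  # place. It stops on its own once the "stop" file appears.
  ( while [ ! -e "$dir/stop" ]; do
      echo "$expected" > "$dir/x1.txt"
      mv "$dir/x1.txt" "$dir/x.txt"
    done ) &
  writer=$!

  # Reader: classify each read of x.txt.
  for ((i = 0; i < iterations; i++)); do
    if ! content=$(cat "$dir/x.txt" 2>/dev/null); then
      missing=$((missing + 1))         # x.txt was briefly absent
    elif [ "$content" != "$expected" ]; then
      partial=$((partial + 1))         # x.txt existed but was incomplete
    fi
  done

  touch "$dir/stop"                    # ask the writer loop to finish
  wait "$writer"                       # let its last mv complete
  rm -rf "$dir"
  echo "missing=$missing partial=$partial"
}
```

Run it as `mv_race_check 5000` on each OS; a nonzero missing count means a reader can land in the window where x.txt does not exist.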

Performance Improvement in org.apache.hadoop.io.Text class


I wrote earlier about a performance improvement I made to Hadoop. After discussing it with Hadoop developers, notably Chris Douglas, the change was made in the core org.apache.hadoop.io.Text class instead. This has the additional benefit of improving a core text-handling class used throughout Hadoop, and it avoids the extra memory footprint of an additional OutputStream instance.

This improvement will be available in Hadoop 0.21.0.

Note the difference in YourKit profiling data with the new Text class:

Thursday, June 25, 2009

running Hadoop tests


Install JDK 1.5 and Apache Forrest, then run this command:

ant -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home/ -Dforrest.home=/Users/thushara/apache-forrest-0.8 -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar

Monday, June 22, 2009

bash: single line for loop

Run a command on multiple files (here, directories) at once:

[~/hadoop-src] for f in *; do echo $f; done
common
hdfs
mapreduce
[~/hadoop-src] for f in *; do svn up $f/trunk; done
At revision 787534.

Fetching external item into 'hdfs/trunk/src/test/bin'
External at revision 787534.

At revision 787534.

Fetching external item into 'mapreduce/trunk/src/test/bin'
External at revision 787534.

At revision 787534.
[~/hadoop-src]
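The loop above leaves $f unquoted, which is fine for names like common and hdfs but would split a directory name containing spaces. Here is a slightly more defensive sketch (for_each_dir is a made-up name, and echo stands in for svn up so it runs anywhere): it quotes the variable and uses the */ glob so only directories are visited.

```shell
#!/bin/bash
# Visit every subdirectory of the current directory and run a command in it.
# The trailing slash in */ makes the glob match directories only.
for_each_dir() {
  local d
  for d in */; do
    d=${d%/}                   # strip the trailing slash added by the glob
    [ -d "$d" ] || continue    # no match: bash leaves the literal "*/" pattern
    echo "updating $d/trunk"   # stand-in for: svn up "$d/trunk"
  done
}
```

For example, in a directory holding common, my project and a stray notes.txt, for_each_dir visits only the two directories and keeps "my project" as a single name.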

Friday, June 19, 2009

date -d is different from Linux to Mac


Familiar with this?

date -d '1 hour ago'

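Well, it works on the Linux command line, but no such luck on the Mac: the BSD date that ships with Mac OS doesn't accept a relative date after -d; instead it adjusts the current date with -v.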

Here is the code you need if your script has to work on both OSes:

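# uname -s prints the kernel name: "Darwin" on Mac OS, "Linux" on Linux
OS=$(uname -s)
if [[ $OS == Darwin ]]
then
  # BSD date: -v-1H moves the current time back one hour
  TODAY=$(date -v-1H +"%Y-%m-%d.%H")
else
  # GNU date: -d parses a relative date string
  TODAY=$(date -d '1 hour ago' +"%Y-%m-%d.%H")
fi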
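One variation is to do the arithmetic in epoch seconds, so the offset can be any number of hours. This is only a sketch: hours_ago_stamp is a hypothetical name, and it relies on BSD date accepting -r SECONDS and GNU date accepting -d @SECONDS to format a given epoch time.

```shell
#!/bin/bash
# Hypothetical helper: print the time N hours ago (default 1) as YYYY-MM-DD.HH.
# Subtracts in epoch seconds, then formats with whichever flag the local date
# understands: -r on Mac OS (BSD date), -d @SECONDS on Linux (GNU date).
hours_ago_stamp() {
  local hours=${1:-1}
  local ts
  ts=$(( $(date +%s) - hours * 3600 ))
  if [[ $(uname -s) == Darwin ]]; then
    date -r "$ts" +"%Y-%m-%d.%H"
  else
    date -d "@$ts" +"%Y-%m-%d.%H"
  fi
}
```

For example, `STAMP=$(hours_ago_stamp 3)` gives the three-hours-ago stamp in the same format as TODAY above.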