An optimization for String.split was introduced in JDK 1.7, but if your split character happens to have special meaning in a regular expression (e.g. ^ or |), the optimization will not apply.
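To make the distinction concrete, here is a minimal sketch of splitting on a plain character versus a character that is a regex metacharacter. The class name SplitDelimiterDemo and the helper splitLiteral are made up for illustration; Pattern.quote is the standard JDK call for having a string treated literally. The sketch only shows how the delimiter has to be written, not which internal code path the JDK takes for it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDelimiterDemo {
    // Hypothetical helper: quotes the delimiter so that regex
    // metacharacters such as | or ^ are matched literally.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // A comma has no special meaning in a regex, so it can be passed as-is.
        System.out.println(Arrays.toString("a,b,c".split(",")));          // [a, b, c]

        // A pipe means alternation in a regex, so it has to be escaped...
        System.out.println(Arrays.toString("a|b|c".split("\\|")));        // [a, b, c]

        // ...or quoted, which is what the helper does.
        System.out.println(Arrays.toString(splitLiteral("a^b^c", "^")));  // [a, b, c]
    }
}
```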
I used org.apache.commons.lang.StringUtils.split to gain a roughly 3X advantage over the split call used in our servers.
Here is the performance test:
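import org.apache.commons.lang.StringUtils;

public class TSplit {
    public static void main(String[] args) {
        // Require a mode argument: "jdk" uses String.split, anything else uses StringUtils.split.
        if (args.length == 0) {
            System.err.println("TSplit jdk|nojdk");
            System.exit(-1);
        }
        String var = "here|is|a|string|that|must|be|split";
        if (args[0].equals("jdk")) {
            // JDK path: the pipe is a regex metacharacter, so it has to be escaped.
            for (int i = 0; i < 10000000; i++) {
                String[] splits = var.split("\\|");
            }
        } else {
            // Commons Lang path: the separator is a plain char, no regex involved.
            for (int i = 0; i < 10000000; i++) {
                String[] splits = StringUtils.split(var, '|');
            }
        }
    }
}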
The results from the test:
[~/] time java -cp `echo /path/to/jars/*.jar|tr ' ' :` TSplit jdk
real 0m16.027s
user 0m16.245s
sys 0m0.412s
[~/] time java -cp `echo /path/to/jars/*.jar|tr ' ' :` TSplit nojdk
real 0m5.354s
user 0m5.395s
sys 0m0.304s
[~/]
As this post shows, users who ran into these problems before 1.7 have sometimes gone as far as precompiling the single split character into a regular expression. Unfortunately, this means that if and when they upgrade to 1.7, the optimization that Sun added will have no effect.
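For reference, the precompiling workaround looks roughly like the sketch below, next to a regex-free alternative that needs no extra jar. The class name PrecompiledSplit and both helper methods are hypothetical, and neither was part of the timing test above, so no performance numbers are claimed for them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // The pre-1.7 workaround: compile the delimiter regex once and reuse it.
    // This always goes through the regex engine, whatever the JDK version.
    private static final Pattern PIPE = Pattern.compile("\\|");

    static String[] splitWithPattern(String input) {
        return PIPE.split(input);
    }

    // Hypothetical regex-free helper that scans with indexOf.
    // Note: it keeps empty tokens, unlike String.split (which drops
    // trailing ones) and StringUtils.split (which merges adjacent separators).
    static String[] splitOnChar(String input, char delimiter) {
        List<String> parts = new ArrayList<String>();
        int start = 0;
        int end;
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            parts.add(input.substring(start, end));
            start = end + 1;
        }
        parts.add(input.substring(start));
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String var = "here|is|a|string|that|must|be|split";
        System.out.println(Arrays.toString(splitWithPattern(var)));
        System.out.println(Arrays.toString(splitOnChar(var, '|')));
    }
}
```

For the test string both calls return the same eight tokens; they only differ on inputs with empty fields, such as "a||b".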