Thursday, October 30, 2008

Java string replace is slow

Java String.replace uses regular expressions and can be slow, especially for large strings requiring many replacements.

If you don't need regular expression replacement, you can use this function I wrote that performs better.

static public String replaceString(String orig, String search, String replace) {
  int cap = replace.length() > search.length() ? orig.length() + (replace.length() - search.length())*20 : orig.length();
  StringBuilder out = new StringBuilder(cap);

  int prev = 0;
  CharSequence okPart;
  for (int i=orig.indexOf(search); i!=-1; i=orig.indexOf(search, prev)) {
    okPart = orig.subSequence(prev, i);
    out.append(okPart).append(replace);
    prev = i + search.length();
  }
  if (prev < orig.length()) {
    okPart = orig.subSequence(prev, orig.length());
    out.append(okPart);
  }
  return out.toString();
}

Here is the performance data with Java String.replace:



Here is the performance data with the new function:



Interpreting the data:

With String.replace, time inside doFlowAction() was broken down 71/29% for enQueueWBImpression/getCPCWidgetCache. The hottest function inside getCPCWidgetCache was String.replace that accounted for 93% of the time.

With the new replacement function,
With String.replace, time inside doFlowAction() was broken down 89/11% for enQueueWBImpression/getCPCWidgetCache. This reduction of time spent inside getCPCWidgetCache (a 62% gain) is due to the replacement of the String.replace with the new string function.

3 comments:

Anonymous said...

What a load of baloney! String.replace(...) does NOT use regex.

I stopped reading your article after that very first sentence.

Anonymous said...

Well, well, would you like to comment on uncle_alice's or prometheuzz replies to your spam message on http://forums.sun.com/thread.jspa?messageID=10486754 ?

Probably not.

Anonymous said...

Just wanted to mention that i came across this after trying to find out why replace as so slow. I thought replace() used regex as well, since if you look at the source for the replace() in the Java SDK it says:

"return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(this).replaceAll(Matcher.quoteReplacement(replacement.toString()));"

But anyway, i followed the forum link to the Sun forums, and played with the code "prometheuzz" posted. And indeed, for very long strings, the inbuilt java replace() is quicker, but for text less than 400 characters, a standard loop is quicker. My test results were like this:

TESTING: 10
replaceString(...) took 294 ms.
String.replace(...) took 1457 ms.
TESTING: 50
replaceString(...) took 251 ms.
String.replace(...) took 1330 ms.
TESTING: 100
replaceString(...) took 475 ms.
String.replace(...) took 1419 ms.
TESTING: 200
replaceString(...) took 899 ms.
String.replace(...) took 2444 ms.
TESTING: 300
replaceString(...) took 1893 ms.
String.replace(...) took 1773 ms.
TESTING: 400
replaceString(...) took 1984 ms.
String.replace(...) took 1817 ms.
TESTING: 500
replaceString(...) took 2452 ms.
String.replace(...) took 1998 ms.

For me, using a method like yours saved me quite a few seconds while parsing a file, since the text was less than 400 characters long. I ended up putting a conditional in that checks if the test is less than 400, use a method like yours. Thanks!