
The Java Specialists' Newsletter
Issue 068, 2003-04-21. Category: Performance


Appending Strings

by Dr. Heinz M. Kabutz

Welcome to the 68th edition of The Java(tm) Specialists' Newsletter, sent to 6400 Java Specialists in 95 countries.

Since our last newsletter, we have had two famous Java authors join the ranks of subscribers. It gives me great pleasure to welcome Mark Grand and Bill Venners to our list of subscribers. Mark is famous for his three volumes of Java Design Patterns books. You will notice that I quote Mark in the brochure of my Design Patterns course. Bill is famous for his book Inside The Java Virtual Machine. Bill also does a lot of work training with Bruce Eckel.

Our last newsletter on BASIC Java produced gasps of disbelief. Some readers told me that they now wanted to unsubscribe, which of course I supported 100%. Others enjoyed it with me. It was meant in humour, as the warnings at the beginning of the newsletter clearly indicated.


Appending Strings

The first code that I look for when I am asked to find out why some code is slow is concatenation of Strings. When we concatenate Strings with += a whole lot of objects are constructed.

Before we can look at an example, we need to define a Timer class that we will use for measuring performance:

/**
 * Class used to measure the time that a task takes to execute.
 * The method "time" prints out how long it took and returns
 * the time. 
 */
public class Timer {
  /**
   * This method runs the Runnable and measures how long it takes
   * @param r is the Runnable for the task that we want to measure
   * @return the time it took to execute this task
   */
  public static long time(Runnable r) {
    long time = -System.currentTimeMillis();
    r.run();
    time += System.currentTimeMillis();
    System.out.println("Took " + time + "ms");
    return time;
  }
}

In the test case, we have three tasks that we want to measure. The first is a simple += String append, which turns out to be extremely slow. The second creates a StringBuffer and calls the append method of StringBuffer. The third method creates the StringBuffer with the correct size and then appends to that. After I have presented the code, I will explain what happens and why.

public class StringAppendDiff {
  public static void main(String[] args) {
    System.out.println("String += 10000 additions");
    Timer.time(new Runnable() {
      public void run() {
        String s = "";
        for(int i = 0; i < 10000; i++) {
          s += i;
        }
        // we have to use "s" in some way, otherwise a clever
        // compiler would optimise it away.  Not that I have
        // any such compiler, but just in case ;-)
        System.out.println("Length = " + s.length());
      }
    });

    System.out.println(
        "StringBuffer 300 * 10000 additions initial size wrong");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer();
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });

    System.out.println(
        "StringBuffer 300 * 10000 additions initial size right");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer(19888890);
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });
  }
}

This program does use quite a bit of memory, so you should set the maximum heap size to be quite large, for example 256MB. You can do that with the -Xmx256m flag. When we run this program, we get the following output:

String += 10000 additions
Length = 38890
Took 2203ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2254ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1562ms

You can observe that using StringBuffer directly is about 300 times faster than using +=: it performs 300 times as many appends in roughly the same time. Another observation that we can make is that if we set the initial size to be correct, it only takes 1562ms instead of 2254ms. This is because of the way that java.lang.StringBuffer works. When you create a new StringBuffer, it creates a char[] of size 16. When you append and there is no space left in the char[], a new array of roughly double the size is allocated and the old contents are copied into it. This means that if you size it correctly up front, you reduce the number of char[]s that are constructed and copied.
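To see the resizing for yourself, here is a minimal sketch that counts how often the capacity of a StringBuffer changes while we append to it. The class GrowthCounter and its method countGrowths are my own hypothetical helpers, not part of the JDK; only StringBuffer.capacity() and append() are real. The exact growth policy is an implementation detail, so the number of resizes you see may differ between JDK versions.

```java
public class GrowthCounter {
  /**
   * Appends the numbers 0..n-1 to sb and counts how many times
   * the capacity reported by the StringBuffer changed.
   * (Hypothetical helper, for illustration only.)
   */
  public static int countGrowths(StringBuffer sb, int n) {
    int growths = 0;
    int capacity = sb.capacity();
    for (int i = 0; i < n; i++) {
      sb.append(i);
      if (sb.capacity() != capacity) {
        // the internal char[] was replaced by a bigger one
        growths++;
        capacity = sb.capacity();
      }
    }
    return growths;
  }

  public static void main(String[] args) {
    System.out.println("Default size: "
        + countGrowths(new StringBuffer(), 10000) + " resizes");
    // 38890 is the final length for 10000 appends, as seen above
    System.out.println("Presized:     "
        + countGrowths(new StringBuffer(38890), 10000) + " resizes");
  }
}
```

The presized buffer should report no resizes at all, since it never runs out of space.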

The time that the += String append takes depends on the compiler that you use to compile the code. I discovered this accidentally during my Java course last week and, much to my embarrassment, I did not know why this was. If you compile it from within Eclipse, you get the result above, and if you compile it with Sun's javac, you get the output below. Eclipse does not use jikes; it has its own internal compiler, which is part of its Java development tools.

String += 10000 additions
Length = 38890
Took 7912ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2634ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1822ms

Why the difference between compilers?

This took some head-scratching, resulting in my fingers being full of wood splinters. I started by writing a class that did the basic String append with +=.

public class BasicStringAppend {
  public BasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s += i;
    }
  }
}

When in doubt about what the compiler does, disassemble the classes. Even when I disassembled them, it took a while before I figured out what the difference was and why it was important. The two listings differ at offsets 16 to 20. You can disassemble a class with the tool javap that is in the bin directory of your Java installation. Use the -c parameter:

javap -c BasicStringAppend

Compiled with Eclipse:

Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
    public BasicStringAppend();
}

Method BasicStringAppend()
   0 aload_0
   1 invokespecial #9 <Method java.lang.Object()>
   4 ldc #11 <String "">
   6 astore_1
   7 iconst_0
   8 istore_2
   9 goto 34
  12 new #13 <Class java.lang.StringBuffer>
  15 dup
  16 aload_1
  17 invokestatic #19 <Method java.lang.String valueOf(java.lang.Object)>
  20 invokespecial #22 <Method java.lang.StringBuffer(java.lang.String)>
  23 iload_2
  24 invokevirtual #26 <Method java.lang.StringBuffer append(int)>
  27 invokevirtual #30 <Method java.lang.String toString()>
  30 astore_1
  31 iinc 2 1
  34 iload_2
  35 bipush 100
  37 if_icmplt 12
  40 return

Compiled with Sun's javac:

Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
    public BasicStringAppend();
}

Method BasicStringAppend()
   0 aload_0
   1 invokespecial #1 <Method java.lang.Object()>
   4 ldc #2 <String "">
   6 astore_1
   7 iconst_0
   8 istore_2
   9 goto 34
  12 new #3 <Class java.lang.StringBuffer>
  15 dup
  16 invokespecial #4 <Method java.lang.StringBuffer()>
  19 aload_1
  20 invokevirtual #5 <Method java.lang.StringBuffer append(java.lang.String)>
  23 iload_2
  24 invokevirtual #6 <Method java.lang.StringBuffer append(int)>
  27 invokevirtual #7 <Method java.lang.String toString()>
  30 astore_1
  31 iinc 2 1
  34 iload_2
  35 bipush 100
  37 if_icmplt 12
  40 return

Instead of explaining what every line does (which I hope should not be necessary on a Java Specialists' Newsletter) I present the equivalent Java code for both IBM's Eclipse and Sun. The difference, which corresponds to the differing bytecode, is in the line that assigns to s inside the loop:

public class IbmBasicStringAppend {
  public IbmBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer(String.valueOf(s)).append(i).toString();
    }
  }
}

public class SunBasicStringAppend {
  public SunBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer().append(s).append(i).toString();
    } 
  }
}

It does not actually matter which compiler is better; both are terrible, since each pass through the loop constructs a new StringBuffer and a new String. The answer is to avoid += with Strings wherever possible.
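As a rough sketch of what avoiding += looks like, the loop from the first test can be written with a single StringBuffer that is created before the loop and converted to a String once at the end. The class name BufferedAppend and the method build are hypothetical names that I made up for this illustration.

```java
public class BufferedAppend {
  /**
   * Concatenates the numbers 0..n-1 using one StringBuffer,
   * instead of creating a new one on every iteration.
   */
  public static String build(int n) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < n; i++) {
      sb.append(i);
    }
    // convert once, then let the StringBuffer go out of scope
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println("Length = " + build(10000).length());
  }
}
```

The resulting String has the same length as the one produced by the += version, 38890 characters for 10000 appends, but only one StringBuffer is constructed along the way.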

Throw the used StringBuffers away!

You should never reuse a StringBuffer object. Construct it, fill it, convert it to a String, and then throw it away.

Why is this? StringBuffer contains a char[] which holds the characters to be used for the String. When you call toString() on the StringBuffer, does it make a copy of the char[]? No, it assumes that you will throw the StringBuffer away and constructs a String with a pointer to the same char[] that is contained inside StringBuffer! If you do change the StringBuffer after creating a String, it makes a copy of the char[] and uses that internally. Do yourself a favour and read the source code of StringBuffer - it is enlightening.

But it gets worse than this. In JDK 1.4.1, Sun changed the way that setLength() works. Before 1.4.1, it was safe to do the following:

 
  ... // StringBuffer sb defined somewhere else
  sb.append(...);
  sb.append(...);
  sb.append(...);
  String s = sb.toString();
  sb.setLength(0);

The code of setLength pre-1.4.1 used to contain the following snippet of code:

if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) {
    if (newLength > 0) {
      copy();
    } else {
      // If newLength is zero, assume the StringBuffer is being
      // stripped for reuse; Make new buffer of default size
      value = new char[16];
      shared = false;
    }
  }
}

It was replaced in the 1.4.1 version with:

if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) copy();
}

Therefore, if you reuse a StringBuffer in JDK 1.4.1, and any one of the Strings created with that StringBuffer is big, all future Strings will be backed by a char[] of that same large size, however short they are. This is not very kind of Sun, since it causes bugs in many libraries. However, my argument is that you should not have reused StringBuffers anyway, since you will have less overhead simply creating a new one than setting the size to zero again.
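A small sketch can make the retained capacity visible. It fills a StringBuffer with a large String, resets it with setLength(0), appends something tiny and then asks for the capacity. The class ReuseCapacity and the method capacityAfterReset are hypothetical names of mine. I am assuming a JDK with the 1.4.1 behaviour shown above; on an older JDK I would expect the reset to drop the buffer back to the default size instead.

```java
public class ReuseCapacity {
  /**
   * Returns the capacity of a StringBuffer after it was used
   * for one large String, reset and reused for a small one.
   * (Hypothetical helper, for illustration only.)
   */
  public static int capacityAfterReset(int bigLength) {
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < bigLength; i++) {
      sb.append('x');
    }
    sb.toString();        // first use: a large String
    sb.setLength(0);      // "reset" the buffer for reuse
    sb.append("tiny");    // second use: a small String
    return sb.capacity();
  }

  public static void main(String[] args) {
    System.out.println("Capacity after reset = "
        + capacityAfterReset(100000));
    System.out.println("Capacity of new one  = "
        + new StringBuffer().capacity());
  }
}
```

If the reused buffer still reports a capacity of 100000 or more while a fresh one reports 16, that is the overhead you carry along with every small String built from the reused buffer.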

This memory leak was pointed out to me by Andrew Shearman during one of my courses, thank you very much! For more information, you can visit Sun's website.

When you read those posts, it becomes apparent that JDOM reuses StringBuffers extensively. It was probably a bit mean to change StringBuffer's setLength() method, although I think that it is not a bug. It is simply highlighting bugs in many libraries.

For those of you that use JDOM, I hope that JDOM will be fixed soon to cater for this change in the JDK. For the rest of us, let us remember to throw away used StringBuffers.

So long...

Heinz



