Saturday, February 9, 2013

Potential perf issues with String.substring()

This post applies to Oracle’s Java 1.6 implementation!

The following is the implementation of the substring method in the String class:

    public String substring(int beginIndex, int endIndex) {
      if (beginIndex < 0) {
          throw new StringIndexOutOfBoundsException(beginIndex);
      }
      if (endIndex > count) {
          throw new StringIndexOutOfBoundsException(endIndex);
      }
      if (beginIndex > endIndex) {
          throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
      }
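      // The returned String shares this string's char array 'value';
      // only the offset and count differ.
      return ((beginIndex == 0) && (endIndex == count)) ? this :
          new String(offset + beginIndex, endIndex - beginIndex, value);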
    }

Now look at the return statement: the new String is created with the same char array (‘value’) as the original string, just with a different offset and count!
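To make the sharing concrete, here is a small model of the 1.6 layout. It is only a sketch: `SliceString` is a hypothetical stand-in for `java.lang.String` (the real class's `offset`/`count`/`value` fields and package-private constructor are not accessible from user code), but its `substring` follows the same checks and the same sharing strategy as the code above.

```java
// Hypothetical model of the JDK 1.6 String layout; NOT the real java.lang.String.
public class SharedSliceDemo {

    // A window (offset, count) onto a possibly shared char[].
    static final class SliceString {
        final char[] value;  // backing array, possibly shared with other instances
        final int offset;    // index of this string's first char within 'value'
        final int count;     // number of chars in this string

        SliceString(int offset, int count, char[] value) {
            this.offset = offset;
            this.count = count;
            this.value = value;
        }

        static SliceString of(String s) {
            return new SliceString(0, s.length(), s.toCharArray());
        }

        // Same bounds checks and sharing strategy as the 1.6 substring above.
        SliceString substring(int beginIndex, int endIndex) {
            if (beginIndex < 0) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            if (endIndex > count) {
                throw new StringIndexOutOfBoundsException(endIndex);
            }
            if (beginIndex > endIndex) {
                throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
            }
            return ((beginIndex == 0) && (endIndex == count)) ? this :
                new SliceString(offset + beginIndex, endIndex - beginIndex, value);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SliceString parent = SliceString.of("hello, world");
        SliceString sub = parent.substring(7, 12);

        System.out.println(sub);                        // world
        System.out.println(sub.value == parent.value);  // true: same array object
        System.out.println(sub.value.length);           // 12: the whole parent array is retained
    }
}
```

The 5-char substring keeps a reference to the full 12-char array, which is exactly what happens at a much larger scale in the scenario below.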
Let’s take a scenario where we have a huge string, say 100MB, and we take a substring containing the last 100 chars of that string.

    String s100MB = <100MB String>; // memory occupied is 100MB
    String substring = s100MB.substring(s100MB.length() - 100); // no additional memory for the chars: it uses the same char array as the parent string
    s100MB = null; // 'substring' still keeps the whole 100MB array alive, whereas it needs only 200 bytes (100 chars at 2 bytes each)

In the real world this may not be a serious issue for short-lived substrings, since the garbage collector does not reclaim memory instantly anyway. But where ‘substring’ hangs around in memory for a very long time, the retained parent array is an unnecessary waste of memory! (Worse, think of the original string being 1GB instead of 100MB!)
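A common workaround for long-lived substrings is to wrap the result in `new String(...)`. In Oracle's JDK 1.6, the `String(String)` constructor copies only the needed range when the argument's backing array is larger than its length, so the copy no longer pins the parent's array. The sketch below assumes that behaviour; `detachedSubstring` is a hypothetical helper name, not a JDK method. On Oracle/OpenJDK 7u6 and later, `substring` already copies its chars, so the extra copy is harmless but unnecessary there.

```java
// Sketch of the copy-on-retain workaround for long-lived substrings on JDK 1.6.
public class TrimmedSubstring {

    // Hypothetical helper: returns a substring that does not share the parent's char[].
    // On JDK 1.6, new String(String) trims the backing array to the string's length;
    // on JDK 7u6+ substring() already copies, so this just makes one more copy.
    static String detachedSubstring(String s, int beginIndex) {
        return new String(s.substring(beginIndex));
    }

    public static void main(String[] args) {
        // Build a large parent string ending in "tail".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            sb.append('x');
        }
        sb.append("tail");
        String big = sb.toString();

        String tail = detachedSubstring(big, big.length() - 4);
        big = null; // the large array is now collectable; 'tail' holds its own small copy

        System.out.println(tail); // tail
    }
}
```

The trade-off is one extra copy of the substring's chars, which is worth paying only when the substring outlives a much larger parent.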

- Sarang


