Oak.... Green.... Java.... BigData: Potential perf issues with String.substring()

Saturday, February 9, 2013

Potential perf issues with String.substring()

This post applies to Oracle's Java 1.6 implementation.

The following is the implementation of the substring method in the String class:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

Now look at the final return statement: unless the whole string is requested, a new String is created that shares the parent's char array ('value') and differs only in its offset and count. No characters are copied!
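To make the sharing concrete, here is a minimal sketch that mimics the offset/count/value layout described above. SharedString and SharedStringDemo are hypothetical names made up for this illustration; this is not the JDK source, only a model of the behaviour, and it assumes nothing beyond java.util.Arrays.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the Java 6 String layout: a char[] plus offset and count. */
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first char this instance uses
    final int count;     // number of chars this instance uses

    SharedString(char[] chars) {
        this(0, chars.length, Arrays.copyOf(chars, chars.length));
    }

    // Models the sharing constructor used by substring: the array is NOT copied.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return (beginIndex == 0 && endIndex == count)
                ? this
                : new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SharedStringDemo {
    public static void main(String[] args) {
        SharedString parent = new SharedString(new char[1000000]);
        SharedString tail = parent.substring(parent.count - 100, parent.count);
        System.out.println(tail.count);                 // 100: chars the substring uses
        System.out.println(tail.value.length);          // 1000000: chars it keeps reachable
        System.out.println(tail.value == parent.value); // true: same array instance
    }
}
```

As long as 'tail' is reachable, so is the whole million-char array, even though only 100 chars of it are ever read.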

Let’s take a scenario where we have a huge string, say 100MB, and we take a substring containing the last 100 chars of that string.

    // About 100MB: 50M chars at 2 bytes each.
    String s100MB = new String(new char[50 * 1024 * 1024]);

    // Uses no additional memory for the chars of the substring, as it shares the parent's char array.
    String substring = s100MB.substring(s100MB.length() - 100);

    // 'substring' still keeps the whole 100MB array alive, although it needs only about 200 bytes (100 chars).
    s100MB = null;

In the real world this may not be a very serious issue when the substring is short-lived, since GC is not instantaneous and the parent string would have lingered for a while anyway. But in cases where ‘substring’ hangs around in memory for a very long time, it pins the entire parent array, which is an unnecessary waste of memory! (Worse, think of the original string being 1GB instead of 100MB!)
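A commonly used workaround on Java 6, which goes a step beyond what is described above, is to wrap the substring in new String(...). As far as I know, the String(String) constructor in that implementation copies only the chars in use when the backing array is larger than the string, so the result no longer references the parent's array. The sketch below assumes that behaviour; SubstringCopy and lastChars are hypothetical names used only for illustration.

```java
final class SubstringCopy {
    /** Returns the last n chars of s as a String intended to own its char storage. */
    static String lastChars(String s, int n) {
        // Assumption: on Java 6, new String(String) trims the backing array
        // down to the chars actually in use, dropping the link to the parent.
        return new String(s.substring(s.length() - n));
    }

    public static void main(String[] args) {
        String big = new String(new char[1000000]).replace('\0', 'x');
        String tail = lastChars(big, 100);
        big = null; // the large array can now be collected
        System.out.println(tail.length());
    }
}
```

The trade-off is one extra copy of the substring's chars (about 200 bytes for 100 chars) in exchange for letting go of the 100MB parent array. On JDKs where substring already copies, the extra new String(...) is just a redundant copy.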

- Sarang
