Skip to content
Advertisement

Fastest way to iterate over all the chars in a String

In Java, what would the fastest way to iterate over all the chars in a String, this:

JavaScript

Or this:

JavaScript

EDIT :

What I’d like to know is if the cost of repeatedly calling the charAt method during a long iteration ends up being either less than or greater than the cost of performing a single call to toCharArray at the beginning and then directly accessing the array during the iteration.

It’d be great if someone could provide a robust benchmark for different string lengths, having in mind JIT warm-up time, JVM start-up time, etc. and not just the difference between two calls to System.currentTimeMillis().

Advertisement

Answer

FIRST UPDATE: Before you try this ever in a production environment (not advised), read this first: http://www.javaspecialists.eu/archive/Issue237.html Starting from Java 9, the solution as described won’t work anymore, because now Java will store strings as byte[] by default.

SECOND UPDATE: As of 2016-10-25, on my AMDx64 8core and source 1.8, there is no difference between using ‘charAt’ and field access. It appears that the jvm is sufficiently optimized to inline and streamline any ‘string.charAt(n)’ calls.

THIRD UPDATE: As of 2020-09-07, on my Ryzen 1950-X 16 core and source 1.14, ‘charAt1’ is 9 times slower than field access and ‘charAt2’ is 4 times slower than field access. Field access is back as the clear winner. Note than the program will need to use byte[] access for Java 9+ version jvms.

It all depends on the length of the String being inspected. If, as the question says, it is for long strings, the fastest way to inspect the string is to use reflection to access the backing char[] of the string.

A fully randomized benchmark with JDK 8 (win32 and win64) on an 64 AMD Phenom II 4 core 955 @ 3.2 GHZ (in both client mode and server mode) with 9 different techniques (see below!) shows that using String.charAt(n) is the fastest for small strings and that using reflection to access the String backing array is almost twice as fast for large strings.

THE EXPERIMENT

  • 9 different optimization techniques are tried.

  • All string contents are randomized

  • The test are done for string sizes in multiples of two starting with 0,1,2,4,8,16 etc.

  • The tests are done 1,000 times per string size

  • The tests are shuffled into random order each time. In other words, the tests are done in random order every time they are done, over 1000 times over.

  • The entire test suite is done forwards, and backwards, to show the effect of JVM warmup on optimization and times.

  • The entire suite is done twice, once in -client mode and the other in -server mode.

CONCLUSIONS

-client mode (32 bit)

For strings 1 to 256 characters in length, calling string.charAt(i) wins with an average processing of 13.4 million to 588 million characters per second.

Also, it is overall 5.5% faster (client) and 13.9% (server) like this:

JavaScript

than like this with a local final length variable:

JavaScript

For long strings, 512 to 256K characters length, using reflection to access the String’s backing array is fastest. This technique is almost twice as fast as String.charAt(i) (178% faster). The average speed over this range was 1.111 billion characters per second.

The Field must be obtained ahead of time and then it can be re-used in the library on different strings. Interestingly, unlike the code above, with Field access, it is 9% faster to have a local final length variable than to use ‘chars.length’ in the loop check. Here is how Field access can be setup as fastest:

JavaScript

Special comments on -server mode

Field access starting winning after 32 character length strings in server mode on a 64 bit Java machine on my AMD 64 machine. That was not seen until 512 characters length in client mode.

Also worth noting I think, when I was running JDK 8 (32 bit build) in server mode, the overall performance was 7% slower for both large and small strings. This was with build 121 Dec 2013 of JDK 8 early release. So, for now, it seems that 32 bit server mode is slower than 32 bit client mode.

That being said … it seems the only server mode that is worth invoking is on a 64 bit machine. Otherwise it actually hampers performance.

For 32 bit build running in -server mode on an AMD64, I can say this:

  1. String.charAt(i) is the clear winner overall. Although between sizes 8 to 512 characters there were winners among ‘new’ ‘reuse’ and ‘field’.
  2. String.charAt(i) is 45% faster in client mode
  3. Field access is twice as fast for large Strings in client mode.

Also worth saying, String.chars() (Stream and the parallel version) are a bust. Way slower than any other way. The Streams API is a rather slow way to perform general string operations.

Wish List

Java String could have predicate accepting optimized methods such as contains(predicate), forEach(consumer), forEachWithIndex(consumer). Thus, without the need for the user to know the length or repeat calls to String methods, these could help parsing libraries beep-beep beep speedup.

Keep dreaming 🙂

Happy Strings!

~SH

The test used the following 9 methods of testing the string for the presence of whitespace:

“charAt1” — CHECK THE STRING CONTENTS THE USUAL WAY:

JavaScript

“charAt2” — SAME AS ABOVE BUT USE String.length() INSTEAD OF MAKING A FINAL LOCAL int FOR THE LENGTh

JavaScript

“stream” — USE THE NEW JAVA-8 String’s IntStream AND PASS IT A PREDICATE TO DO THE CHECKING

JavaScript

“streamPara” — SAME AS ABOVE, BUT OH-LA-LA – GO PARALLEL!!!

JavaScript

“reuse” — REFILL A REUSABLE char[] WITH THE STRINGS CONTENTS

JavaScript

“new1” — OBTAIN A NEW COPY OF THE char[] FROM THE STRING

JavaScript

“new2” — SAME AS ABOVE, BUT USE “FOR-EACH”

JavaScript

“field1” — FANCY!! OBTAIN FIELD FOR ACCESS TO THE STRING’S INTERNAL char[]

JavaScript

“field2” — SAME AS ABOVE, BUT USE “FOR-EACH”

JavaScript

COMPOSITE RESULTS FOR CLIENT -client MODE (forwards and backwards tests combined)

Note: that the -client mode with Java 32 bit and -server mode with Java 64 bit are the same as below on my AMD64 machine.

JavaScript

COMPOSITE RESULTS FOR SERVER -server MODE (forwards and backwards tests combined)

Note: this is the test for Java 32 bit running in server mode on an AMD64. The server mode for Java 64 bit was the same as Java 32 bit in client mode except that Field access starting winning after 32 characters size.

JavaScript

FULL RUNNABLE PROGRAM CODE

(to test on Java 7 and earlier, remove the two streams tests)

JavaScript
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement