GraalVM demos: Performance Examples for Java
The GraalVM compiler achieves excellent performance for modern workloads such as Scala or usage of the Java Streams API. The examples below demonstrate this.
Prerequisites
Running the examples
Let us use a simple example based on the Streams API to demonstrate performance of the GraalVM compiler. This example counts the number of uppercase characters in a body of text. To simulate a large load, the same sentence is processed about 10 million times:
- Save the following code snippet to a file named CountUppercase.java:
// COMPILE-CMD: javac {file}
// RUN-CMD: java -Diterations=2 {file} In 2017 I would like to run ALL languages in one VM.
// RUN-CMD: java -Diterations=2 -XX:-UseJVMCICompiler {file} In 2017 I would like to run ALL languages in one VM.
// BEGIN-SNIPPET
public class CountUppercase {
    static final int ITERATIONS = Math.max(Integer.getInteger("iterations", 1), 1);

    public static void main(String[] args) {
        String sentence = String.join(" ", args);
        for (int iter = 0; iter < ITERATIONS; iter++) {
            if (ITERATIONS != 1) System.out.println("-- iteration " + (iter + 1) + " --");
            long total = 0, start = System.currentTimeMillis(), last = start;
            for (int i = 1; i < 10_000_000; i++) {
                total += sentence.chars().filter(Character::isUpperCase).count();
                if (i % 1_000_000 == 0) {
                    long now = System.currentTimeMillis();
                    System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                    last = now;
                }
            }
            System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
        }
    }
}
// END-SNIPPET
- Compile it and run as follows:
$ javac CountUppercase.java
$ java CountUppercase In 2019 I would like to run ALL languages in one VM.
1 (389 ms)
2 (235 ms)
3 (216 ms)
4 (77 ms)
5 (81 ms)
6 (79 ms)
7 (85 ms)
8 (80 ms)
9 (78 ms)
total: 69999993 (1408 ms)
The warmup time depends on numerous factors like the source code or how many cores a machine has. If the performance profile of CountUppercase on your machine does not match the above, run it for more iterations by adding -Diterations=N just after java for some N greater than 1.
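The following minimal sketch isolates how the -Diterations=N setting reaches the program. It reuses the same expression as CountUppercase, but wraps it in a method so the property is re-read on each call. The class name IterationsPropertySketch is hypothetical and not part of the demo.

```java
// Hypothetical helper class, not part of the GraalVM demos.
public class IterationsPropertySketch {

    // Integer.getInteger reads the system property "iterations" (set with -Diterations=N)
    // and falls back to 1 when it is absent or not a number. Math.max clamps values
    // below 1 to a single iteration, as CountUppercase does.
    static int iterations() {
        return Math.max(Integer.getInteger("iterations", 1), 1);
    }

    public static void main(String[] args) {
        System.out.println("iterations = " + iterations());
    }
}
```

Options placed after the class name are passed to main as ordinary arguments instead of being interpreted by the JVM, which is why -Diterations=N has to come directly after java.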
- Add the -Dgraal.PrintCompilation=true option to see statistics for the compilations:
$ java -Dgraal.PrintCompilation=true CountUppercase In 2019 I would like to run ALL languages in one VM.
This option prints a line after each compilation that shows the method compiled, time taken, bytecodes processed (including inlined methods), size of machine code produced, and amount of memory allocated during compilation.
- Use the -XX:-UseJVMCICompiler option to disable the GraalVM compiler and use the native top tier compiler in the VM to compare performance, as follows:
$ java -XX:-UseJVMCICompiler CountUppercase In 2019 I would like to run ALL languages in one VM.
1 (602 ms)
2 (443 ms)
3 (429 ms)
4 (423 ms)
5 (418 ms)
6 (432 ms)
7 (454 ms)
8 (415 ms)
9 (407 ms)
total: 69999993 (4443 ms)
The preceding example demonstrates the benefits of partial escape analysis (PEA) and advanced inlining, which combine to significantly reduce heap allocation. The results were obtained using GraalVM Enterprise Edition.
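To make the allocation claim more concrete, the sketch below places the stream pipeline from CountUppercase next to a hand-written loop that computes the same count without creating any stream or lambda objects. This is only an illustration of the kind of allocation-free code that inlining and escape analysis aim for. It is not the machine code the GraalVM compiler actually emits, and the class and method names are hypothetical.

```java
// Hypothetical illustration class, not part of the GraalVM demos.
public class UppercaseLoopSketch {

    // The pipeline used in CountUppercase: without optimization, each call
    // creates stream pipeline objects on the heap.
    static long countWithStream(String sentence) {
        return sentence.chars().filter(Character::isUpperCase).count();
    }

    // A hand-written equivalent that visits the same char values
    // and allocates nothing inside the loop.
    static long countWithLoop(String sentence) {
        long count = 0;
        for (int i = 0; i < sentence.length(); i++) {
            if (Character.isUpperCase(sentence.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String sentence = "In 2019 I would like to run ALL languages in one VM.";
        // Both lines print 7 (I, I, A, L, L, V, M), which matches the
        // reported total of 69999993 = 7 * 9,999,999.
        System.out.println(countWithStream(sentence));
        System.out.println(countWithLoop(sentence));
    }
}
```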
The GraalVM Community Edition still has good performance compared to the native top tier compiler as shown below. You can simulate the Community Edition on the Enterprise Edition by adding the option -Dgraal.CompilerConfiguration=community.
Sunflow is an open source rendering engine. The following example is a simplified version of code at the core of the Sunflow engine. It performs calculations to blend various values for a point of light in a rendered scene.
- Save the following code snippet to a file named Blender.java:
// COMPILE-CMD: javac {file}
// RUN-CMD: java {file}
// RUN-CMD: java -XX:-UseJVMCICompiler {file}
// BEGIN-SNIPPET
public class Blender {
    private static class Color {
        double r, g, b;

        private Color(double r, double g, double b) {
            this.r = r;
            this.g = g;
            this.b = b;
        }

        public static Color black() {
            return new Color(0, 0, 0);
        }

        public void add(Color other) {
            r += other.r;
            g += other.g;
            b += other.b;
        }

        public void add(double nr, double ng, double nb) {
            r += nr;
            g += ng;
            b += nb;
        }

        public void multiply(double factor) {
            r *= factor;
            g *= factor;
            b *= factor;
        }
    }

    private static final Color[][][] colors = new Color[100][100][100];

    public static void main(String[] args) {
        for (int j = 0; j < 10; j++) {
            long t = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                initialize(new Color(j / 20, 0, 1));
            }
            long d = System.nanoTime() - t;
            System.out.println(d / 1_000_000 + " ms");
        }
    }

    private static void initialize(Color id) {
        for (int x = 0; x < colors.length; x++) {
            Color[][] plane = colors[x];
            for (int y = 0; y < plane.length; y++) {
                Color[] row = plane[y];
                for (int z = 0; z < row.length; z++) {
                    Color color = new Color(x, y, z);
                    color.add(id);
                    if ((color.r + color.g + color.b) % 42 == 0) {
                        // PEA only allocates a color object here.
                        row[z] = color;
                    } else {
                        // In this branch the color object is not allocated at all.
                    }
                }
            }
        }
    }
}
// END-SNIPPET
- Compile it and run as follows:
$ javac Blender.java
$ java Blender
2477 ms
910 ms
857 ms
815 ms
813 ms
821 ms
819 ms
832 ms
819 ms
839 ms
To check how it behaves with the GraalVM Community Edition compiler configuration, use the following option:
$ java -Dgraal.CompilerConfiguration=community Blender
1127 ms
902 ms
888 ms
858 ms
820 ms
860 ms
855 ms
864 ms
899 ms
899 ms
- Again, use the -XX:-UseJVMCICompiler option to disable the GraalVM compiler and run with HotSpot's standard JIT compiler:
$ java -XX:-UseJVMCICompiler Blender
2214 ms
1666 ms
1667 ms
1438 ms
1436 ms
1458 ms
1452 ms
1528 ms
1557 ms
1474 ms
The improvement compared to not using the GraalVM compiler comes from the partial escape analysis moving the allocation of color in initialize down to the point where it is stored into colors (i.e., the point at which it escapes).
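The effect described above can be approximated by hand. The sketch below shows one row of the initialize loop in two forms: as written in Blender, and with the fields of color kept in local variables so that an object is created only in the branch where it is stored. It is a source-level analogy under the assumption that the compiler's transformation behaves this way. The compiler works on its own intermediate representation, and the class and method names here are hypothetical.

```java
// Hypothetical illustration class, not part of the GraalVM demos.
public class ManualPeaSketch {

    public static final class Color {
        double r, g, b;

        public Color(double r, double g, double b) {
            this.r = r;
            this.g = g;
            this.b = b;
        }
    }

    // As written in Blender: a Color is created for every cell, whether or not it is kept.
    public static int fillNaive(Color[] row, int x, int y, Color id) {
        int stored = 0;
        for (int z = 0; z < row.length; z++) {
            Color color = new Color(x, y, z);
            color.r += id.r;
            color.g += id.g;
            color.b += id.b;
            if ((color.r + color.g + color.b) % 42 == 0) {
                row[z] = color; // the object escapes here
                stored++;
            }
        }
        return stored;
    }

    // Hand-transformed: the fields live in locals, and the allocation happens
    // only at the point where the object would escape into the array.
    public static int fillScalarReplaced(Color[] row, int x, int y, Color id) {
        int stored = 0;
        for (int z = 0; z < row.length; z++) {
            double r = x + id.r;
            double g = y + id.g;
            double b = z + id.b;
            if ((r + g + b) % 42 == 0) {
                row[z] = new Color(r, g, b);
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        Color id = new Color(0, 0, 1);
        // With x = 0 and y = 0, the sum is z + 1, so only z = 41 and z = 83 are stored.
        System.out.println(fillNaive(new Color[100], 0, 0, id));          // prints 2
        System.out.println(fillScalarReplaced(new Color[100], 0, 0, id)); // prints 2
    }
}
```

In this row, the first method creates 100 objects and the second creates 2, while both leave the same values in the array.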