Matthew Gilliard's blog || Better Containerized JVMs in JDK10

TL;DR The JDK team has committed to making Java a good citizen in a world of containers. JDK10 contains several changes to have the JVM and your apps respect container restrictions. JDK10 is due to be released in March 2018.

This post is a counterpoint or followup to Jörg Schad’s recent post Nobody puts Java in a container. I would absolutely recommend reading that for its excellent summary of how container technology affects the JVM today (i.e. JDK9).

I can’t really agree with his title though - lots and lots of people do put JVM workloads in containers (spoiler: I have done so on my last several projects - we even tune for it in Fn Project). As Jörg points out, when JDK10 is released the support will be even better.

There is a significant amount of work going into JDK10 to support containerized JVMs. In this post I’ll show how the next release of the JDK will be container-aware.

I’ll be using Docker to run the latest build of JDK10 (10+39 right now). For the JDK download, head to http://jdk.java.net/10/. These patches landed in 10+34. I am using this Dockerfile. I’ll be running on a bare-metal instance on Oracle Cloud Infrastructure with 72 cores and 256GB of RAM: just the kind of place to worry about how best to run lots of things concurrently 😉

CPU management

A lot of existing code uses Runtime.getRuntime().availableProcessors() to size thread pools.
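A typical version of that pattern looks something like the following minimal sketch. The class name SizedPool is hypothetical and not taken from any particular codebase. It only uses standard java.util.concurrent APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SizedPool {
    // Size the pool from the processor count the JVM reports. On JDK10 inside a
    // container, this value reflects the container's CPU limits.
    static int poolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = poolSize();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final int task = i;
            // Each task just reports which worker thread ran it.
            pool.submit(() -> System.out.println("task " + task + " ran on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

If the JVM over-reports the CPU count, every pool built this way is oversized by the same factor.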

It’s a sensible strategy. Even if your code doesn’t do so directly, you’re pretty likely to be using something that uses a ForkJoinPool under the hood, and ForkJoinPool derives its default parallelism from availableProcessors() too.
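You can see this for yourself with a small sketch. The class and helper names are hypothetical. The expected common-pool value assumes the system property java.util.concurrent.ForkJoinPool.common.parallelism is not set. In that case the common pool uses one less than availableProcessors(), but never fewer than 1.

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolSize {
    // Default common-pool parallelism when the system property
    // java.util.concurrent.ForkJoinPool.common.parallelism is not set:
    // one less than availableProcessors(), but never below 1.
    static int expectedCommonParallelism(int cpus) {
        return Math.max(1, cpus - 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors        = " + cpus);
        // Parallel streams run on the common pool.
        System.out.println("commonPool parallelism     = " + ForkJoinPool.commonPool().getParallelism());
        System.out.println("expected (cpus - 1, min 1) = " + expectedCommonParallelism(cpus));
        // The no-arg constructor uses availableProcessors() directly.
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println("new ForkJoinPool()         = " + pool.getParallelism());
        pool.shutdown();
    }
}
```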

Without any constraints, a containerized process can see and use all the hardware on the host. If that’s not what you want, Docker offers several ways to limit the CPU a container can use, and naturally you would like your JVM app to use an appropriate amount of resources. In experiments on Fn we found cases where creating a large number of containerized JVMs was slower than booting the same number of hypervisor VMs, because of the overhead of too-large thread pools. On top of that, you then have to deal with contention between the huge number of threads.

Let’s see how the value of availableProcessors is affected by the different CPU throttling mechanisms:

Host OS

Running outside of any container, this is the JVM’s view of the host.

$ echo 'Runtime.getRuntime().availableProcessors()' | jdk-10/bin/jshell -q
jshell> Runtime.getRuntime().availableProcessors()
$1 ==> 72

CPU Shares

Specified with --cpu-shares (see the Docker docs).

This rations the CPU according to the proportions you choose, but only when the system is busy. For example you can have three containers with shares of 1024/512/512, but those limits are only applied when necessary. When there is headroom your containerized process can use left-over CPU time.

$ echo 'Runtime.getRuntime().availableProcessors()' | docker run --cpu-shares 36864 -i jdk10
jshell> Runtime.getRuntime().availableProcessors()
$1 ==> 36

Notes:

This is based on the relationship 1 CPU = 1024 ‘shares’. 36864 = 36×1024.
That said, 1024 is the default setting, so specifying 1024 here will cause the JVM to report 72 processors rather than 1. This seems like a gotcha waiting to happen.
CPU shares are hard to interpret for the JVM, or for any containerized process attempting this kind of introspection, because the value you specify is relative to all the other containers on the system.
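The shares arithmetic, including the default-value gotcha, can be sketched like this. This is an illustration in Java, not HotSpot's own (C++) code, and the class and method names are hypothetical. The sketch assumes what the notes above observe: 1 CPU = 1024 shares, results round up, and a value of exactly 1024 is treated as "no limit".

```java
public class SharesToCpus {
    // Docker's default weight. As observed above, the JDK10 JVM treats it as "no limit".
    static final int DEFAULT_SHARES = 1024;

    // Converts a --cpu-shares value to a processor count using 1 CPU = 1024 shares,
    // rounding up and never reporting more CPUs than the host has.
    static int cpusFromShares(int shares, int hostCpus) {
        if (shares <= 0 || shares == DEFAULT_SHARES) {
            return hostCpus; // unset or default: no limit applied
        }
        int cpus = (shares + DEFAULT_SHARES - 1) / DEFAULT_SHARES; // ceiling division
        return Math.min(cpus, hostCpus);
    }

    public static void main(String[] args) {
        System.out.println(cpusFromShares(36864, 72)); // 36
        System.out.println(cpusFromShares(1024, 72));  // 72: the gotcha
        System.out.println(cpusFromShares(512, 72));   // 1
    }
}
```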

CPU Period/Quota

Specified with --cpus, or with --cpu-period and --cpu-quota, where CPUs = quota/period (see the Docker docs).

This uses the Linux Completely Fair Scheduler to limit the container’s CPU usage. This means that the container will have limited CPU time even when the machine is lightly loaded and your workload may be spread across all the CPUs on the host. The example here might give you 50% time on all 72 cores, for example.

$ echo 'Runtime.getRuntime().availableProcessors()' | docker run --cpus 36 -i jdk10
jshell> Runtime.getRuntime().availableProcessors()
$1 ==> 36
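In Docker, --cpus N is shorthand for a CFS period of 100000 microseconds and a quota of N × 100000. The hypothetical helper below turns either form back into a processor count by dividing quota by period and rounding up. It assumes cgroups v1 semantics, where a quota of -1 means unlimited.

```java
public class QuotaToCpus {
    // Docker's default CFS period in microseconds; --cpus N sets quota = N * period.
    static final long DEFAULT_PERIOD_US = 100_000;

    static long quotaForCpus(double cpus) {
        return Math.round(cpus * DEFAULT_PERIOD_US);
    }

    // CPUs = quota / period, rounded up. A non-positive quota (cgroups v1 uses -1)
    // means no limit, so fall back to the host CPU count.
    static int cpusFromQuota(long quotaUs, long periodUs, int hostCpus) {
        if (quotaUs <= 0 || periodUs <= 0) {
            return hostCpus;
        }
        int cpus = (int) ((quotaUs + periodUs - 1) / periodUs); // ceiling division
        return Math.min(cpus, hostCpus);
    }

    public static void main(String[] args) {
        long quota = quotaForCpus(36);
        System.out.println(quota);                                                     // 3600000
        System.out.println(cpusFromQuota(quota, DEFAULT_PERIOD_US, 72));               // 36
        System.out.println(cpusFromQuota(quotaForCpus(1.5), DEFAULT_PERIOD_US, 72));   // 2
    }
}
```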

CPU Sets

Specified with --cpuset-cpus (see the Docker docs).

Unlike the two previous constraints, this pins the containerized process to specific CPUs. Your process may have to share those CPUs, but will obviously not be allowed to use spare capacity on any others.

$ echo 'Runtime.getRuntime().availableProcessors()' | docker run --cpuset-cpus 0-35 -i jdk10
jshell> Runtime.getRuntime().availableProcessors()
$1 ==> 36
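The --cpuset-cpus value is a list of CPU numbers and ranges, such as 0-35 or 0-3,8,10-11. The sketch below counts the CPUs in such a list and is only meant to show the format. It is not necessarily how the JVM counts CPUs, since HotSpot may obtain the count another way, such as from the process's CPU affinity. The names are hypothetical, and the input is assumed to be well formed with no overlapping ranges.

```java
public class CpusetCount {
    // Counts the CPUs in a cpuset list such as "0-35" or "0-3,8,10-11".
    static int countCpus(String cpuset) {
        int count = 0;
        for (String raw : cpuset.split(",")) {
            String part = raw.trim();
            if (part.isEmpty()) {
                continue;
            }
            int dash = part.indexOf('-');
            if (dash < 0) {
                count += 1; // a single CPU, e.g. "8"
            } else {
                int lo = Integer.parseInt(part.substring(0, dash).trim());
                int hi = Integer.parseInt(part.substring(dash + 1).trim());
                count += hi - lo + 1; // an inclusive range, e.g. "0-35"
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countCpus("0-35"));        // 36
        System.out.println(countCpus("0-3,8,10-11")); // 7
    }
}
```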

Combinations

You can mix these options together, too, e.g. 50% of time on cores 0 through 8. Docker documents this well, and its docs link through to the kernel documentation for more info.

JVM calculation

The formula the JVM uses is: min(cpuset-cpus, cpu-shares/1024, cpus) rounded up to the next whole number.
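Here is that formula written out as a small Java function. It mirrors the formula as stated in this post rather than HotSpot's actual source, and the names are hypothetical. Limits that aren't set are passed as infinity. That includes shares left at the default 1024, because of the gotcha described under CPU Shares.

```java
public class ContainerCpuCount {
    static final double UNSET = Double.POSITIVE_INFINITY;

    // min(cpuset CPUs, shares/1024, quota/period), rounded up to a whole number.
    // Because ceil never decreases, rounding up the minimum gives the same answer
    // as taking the minimum of the individually rounded-up limits.
    static int effectiveCpus(int cpusetCpus, double sharesCpus, double quotaCpus) {
        double limit = Math.min(cpusetCpus, Math.min(sharesCpus, quotaCpus));
        return Math.max(1, (int) Math.ceil(limit));
    }

    public static void main(String[] args) {
        System.out.println(effectiveCpus(72, 36864 / 1024.0, UNSET)); // --cpu-shares 36864 -> 36
        System.out.println(effectiveCpus(72, UNSET, 36.0));           // --cpus 36 -> 36
        System.out.println(effectiveCpus(36, UNSET, UNSET));          // --cpuset-cpus 0-35 -> 36
        System.out.println(effectiveCpus(9, UNSET, 4.5));             // cores 0-8 at 50% -> 5
    }
}
```

The last line corresponds to a combination like --cpuset-cpus 0-8 --cpus 4.5: half the time on nine cores works out to 4.5 CPUs, which rounds up to 5.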

Just tell me what to use!

It seems logical to use CPU sets when you have enough cores, as pinning should reduce context switches and improve CPU cache locality. Then again, it may be fine to have more threads if they spend most of their time parked and waiting on an interrupt, in which case CPU period/quota could be appealing. Or you may be content with the constraints of CPU shares and glad to be able to use spare CPU cycles. As usual, the best advice is to do some profiling. Contrary to the warning in the Javadoc, I have not seen any occasion where availableProcessors changes its result over time within a single JVM.

Memory

The amount of memory that the JVM will try to allocate and make available is either set explicitly or chosen for you by a process known as Ergonomics. The docs state that on a ‘server-class’ machine (at least 2 processors and at least 2GB of RAM) the JVM will run in ‘server’ mode and ergonomics will set the max heap size to 1/4 of physical memory. In fact, 64-bit JVMs have no alternative to ‘server’ mode. Additionally, the heap size chosen by ergonomics is limited to around 32GB; if you want more than that, you have to ask for it with -Xmx.
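Here is the 1/4 rule with its cap as a rough sketch. The cap constant is an assumption: the text above only says "around 32GB", and the exact value HotSpot picks is a bit lower. The names are hypothetical. On JDK10 in a container, the memory the JVM "sees" is the container's limit.

```java
public class ErgonomicHeap {
    static final long MB = 1024L * 1024;
    static final long GB = 1024L * MB;
    // ASSUMPTION: approximate ceiling. Ergonomics stays around 32GB so that
    // compressed oops can be used; HotSpot's exact value is slightly lower.
    static final long APPROX_CAP = 32 * GB;

    // Default max heap: a quarter of the memory the JVM can see, capped near 32GB.
    static long defaultMaxHeap(long visibleMemoryBytes) {
        return Math.min(visibleMemoryBytes / 4, APPROX_CAP);
    }

    public static void main(String[] args) {
        System.out.println(defaultMaxHeap(512 * MB) / MB); // 128
        System.out.println(defaultMaxHeap(4 * GB) / MB);   // 1024
        System.out.println(defaultMaxHeap(256 * GB) / GB); // 32 (approximate cap)
    }
}
```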

For example, on my test server (256GB RAM):

$ jdk-10+23/bin/java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
   size_t MaxHeapSize                              = 32178700288                              {product} {ergonomic}

Roughly 32GB is the ceiling for ergonomics because it’s the largest heap that can use Compressed Oops (compressed ordinary object pointers).
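The 32GB figure comes from simple arithmetic. A compressed reference is 32 bits wide, so it can take 2^32 distinct values. Objects are aligned to 8 bytes by default (-XX:ObjectAlignmentInBytes), so each value can stand for an 8-byte slot. The class name below is hypothetical.

```java
public class CompressedOopsLimit {
    // Largest heap addressable with 32-bit compressed references for a given
    // object alignment: 2^32 slots, each alignmentBytes wide.
    static long maxCompressedHeap(long alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        long max = maxCompressedHeap(8);                 // default alignment
        System.out.println(max);                         // 34359738368
        System.out.println(max / (1024L * 1024 * 1024)); // 32
    }
}
```

Raising the alignment to 16 bytes doubles the addressable range to 64GB, at the cost of more padding per object.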

Let’s try with a smaller memory limit:

$ docker run -it -m512M  --entrypoint bash jdk10
root@7378a1f0a2a9:/# /java/jdk-10/bin/java  -XX:+PrintFlagsFinal -version | grep MaxHeapSize
   size_t MaxHeapSize                              = 134217728                              {product} {ergonomic}

This is exactly 128MB (134,217,728 bytes), a quarter of the 512MB limit, as expected.

$ docker run -it -m512M  jdk10
jshell> Runtime.getRuntime().maxMemory()
$1 ==> 129761280

Again, this is close to 128MB. Unsurprisingly, here is what happens if we try to use too much memory:

jshell> new byte[140_000_000]
|  java.lang.OutOfMemoryError thrown: Java heap space
|        at (#1:1)
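To check from inside your own application what the JVM thinks it has, you can print the same values. Run the program under the docker run flags you care about, for example -m512M or --cpus 36. This is a minimal sketch with hypothetical names. The headroom figure is only approximate, because heap usage changes constantly and the garbage collector may free more memory.

```java
public class WhatDoISee {
    static long toMb(long bytes) {
        return bytes / (1024L * 1024);
    }

    // Rough heap headroom: the maximum heap minus what is currently in use.
    static long headroomBytes(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("availableProcessors: " + rt.availableProcessors());
        System.out.println("maxMemory:           " + toMb(rt.maxMemory()) + " MB");
        System.out.println("heap headroom:       " + toMb(headroomBytes(rt)) + " MB");
    }
}
```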

More Ergonomics

Ergonomics also tunes internal JVM values, such as the thread pool sizes used by G1GC. -XX:+PrintFlagsFinal reports these as ConcGCThreads and ParallelGCThreads, as described on the Oracle Technology Network (OTN). The JVM sizes these G1GC thread pools from the same CPU count it reports through availableProcessors.

Host OS

$ jdk-10/bin/java -XX:+PrintFlagsFinal -version | grep -i GCThreads
     uint ConcGCThreads                            = 12                                       {product} {ergonomic}
     uint ParallelGCThreads                        = 48                                       {product} {default}

CPU-limited Container

$ docker run -it --cpus 36  --entrypoint bash jdk10
root@6a94863c54df:/# /java/jdk-10/bin/java -XX:+PrintFlagsFinal -version | grep -i GCThreads
     uint ConcGCThreads                            = 6                                        {product} {ergonomic}
     uint ParallelGCThreads                        = 25                                       {product} {default}
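The numbers above fit a simple heuristic. ParallelGCThreads gets one thread per CPU up to 8, plus 5/8 of a thread for each CPU beyond that. G1's ConcGCThreads is then roughly a quarter of ParallelGCThreads. This sketch is my reading of HotSpot's behaviour on x86 in this JDK era, not code taken from HotSpot, and the names are hypothetical. It reproduces 48/12 for 72 CPUs and 25/6 for 36 CPUs.

```java
public class GcThreadHeuristic {
    // ParallelGCThreads: one thread per CPU up to 8, then 5 threads for every 8 CPUs beyond.
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + ((cpus - 8) * 5) / 8;
    }

    // G1's ConcGCThreads: roughly a quarter of ParallelGCThreads, at least 1.
    static int concGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {72, 36}) {
            int parallel = parallelGcThreads(cpus);
            System.out.println(cpus + " CPUs -> ParallelGCThreads=" + parallel
                    + ", ConcGCThreads=" + concGcThreads(parallel));
        }
    }
}
```

Running it prints "72 CPUs -> ParallelGCThreads=48, ConcGCThreads=12" and "36 CPUs -> ParallelGCThreads=25, ConcGCThreads=6".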

Other values

Several other values are set by the ergonomics process:

HotSpot compilation thread count
GC region sizes
Code cache sizes

To see how these differ, diff the -XX:+PrintFlagsFinal output from the host against that from a container run with --cpus 36 -m 4G.

Summary

In JDK10 it does seem that applying CPU and memory limits to your containerized JVMs will be straightforward. The JVM will correctly detect the container’s CPU and memory limits, tune itself appropriately, and give your application an accurate picture of the available capacity.

Thanks to @msgodf and Bob Vandette for help with this post.