Gradle daemon becomes very slow when the heap is nearly exhausted: it runs full GCs almost back-to-back, and it does not die quickly

Every time Gradle is run on our build, the daemon leaks about 100 MB of memory.

This would be perfectly acceptable if the daemon died quickly once it exhausts the heap. What actually happens, watching -verbose:gc, is that the daemon does 1+ second full GCs nearly back-to-back, and builds using the daemon slow down by 2x or more. Builds that take 10 seconds when the daemon is not in this thrashing mode take over a minute, and the daemon won't die for several minutes. I'd prefer it die and restart much sooner, to avoid repeatedly building very slowly.

In an attempt to configure JVM options so the daemon is killed promptly, I tried various values of the GC overhead limit options, e.g. -XX:+UseGCOverheadLimit -XX:GCTimeLimit=10 -XX:GCHeapFreeLimit=10. (I may have misread the option documentation, and those values may not be useful.)

I tried CMS, the default GC, and G1GC on this version:

java -version
java version "1.7.0_40"
Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

All GCs went into full-GC thrash mode with slow builds instead of dying promptly.

If this is a problem without a current solution, I have some simplistic ideas about code that may help, e.g. a parameter like killIfRemainingFreeMemoryLessThan, and a thread that would kill the daemon if free memory was below that value for N consecutive samples over a configurable sampling interval. (The idea being to ignore low memory if a GC reclaims it.) In my case, I would configure this to die if the heap had less than 100 MB or 200 MB free, for an -Xmx of 2 GB or more.
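For concreteness, here's a rough sketch of the heuristic I have in mind. The class and parameter names are made up for illustration; this isn't Gradle code, just the decision logic with a synthetic demo in main:

```java
// Hypothetical sketch of a killIfRemainingFreeMemoryLessThan heuristic:
// record one free-heap sample per interval, and report "kill" only when
// every one of the last N samples is below the threshold, so that a
// single low reading just before a GC does not trigger a restart.
import java.util.ArrayDeque;
import java.util.Deque;

public class LowMemoryWatchdog {
    private final long minFreeBytes;   // e.g. 100 MB for a 2 GB heap
    private final int requiredSamples; // consecutive low samples before dying
    private final Deque<Long> recent = new ArrayDeque<>();

    public LowMemoryWatchdog(long minFreeBytes, int requiredSamples) {
        this.minFreeBytes = minFreeBytes;
        this.requiredSamples = requiredSamples;
    }

    /** Record one sample; returns true when the daemon should die. */
    public boolean sample(long freeBytes) {
        recent.addLast(freeBytes);
        if (recent.size() > requiredSamples) {
            recent.removeFirst();
        }
        return recent.size() == requiredSamples
                && recent.stream().allMatch(f -> f < minFreeBytes);
    }

    public static void main(String[] args) {
        // In the daemon, a background thread would call sample() once per
        // sampling interval with the free heap relative to -Xmx, e.g.:
        //   long free = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
        // Here we feed synthetic samples to show the behavior.
        long mb = 1024 * 1024;
        LowMemoryWatchdog w = new LowMemoryWatchdog(100 * mb, 3);
        System.out.println(w.sample(50 * mb));  // false: one low reading is ignored
        System.out.println(w.sample(500 * mb)); // false: a GC reclaimed memory
        System.out.println(w.sample(60 * mb));  // false
        System.out.println(w.sample(40 * mb));  // false
        System.out.println(w.sample(30 * mb));  // true: 3 consecutive low samples
    }
}
```

On a true result the daemon would System.exit so the client launches a fresh daemon, rather than thrashing in full GC for minutes.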

If you need more information, I can generate jvisualvm screenshots, provide GC logs, or whatever else will help. Thanks.

Hi Peter,

Daemon hygiene is still on the to-do list. In situations where there is memory pressure due to leaks over successive runs, we want to preemptively restart the daemon.

jvisualvm images would be very useful so we can see what is leaking. As much information as you can give about what your build is doing would also be helpful.

A jvisualvm image is attached.

The Gradle task being executed is "clean". The times to run it, along with the bash commands echoed in the terminal, are:

peter@ubuntu-peter-dep:~/workspace/AMP$ gr clean|tail -1;gr clean|tail -1;gr clean|tail -1;gr clean|tail -1
Total time: 15.387 secs    (daemon just launched, warming up)
Total time: 8.954 secs
Total time: 8.288 secs
Total time: 8.28 secs
peter@ubuntu-peter-dep:~/workspace/AMP$ gr clean|tail -1;gr clean|tail -1;gr clean|tail -1;gr clean|tail -1
Total time: 8.413 secs
Total time: 8.435 secs
Total time: 9.699 secs
Total time: 17.41 secs
peter@ubuntu-peter-dep:~/workspace/AMP$ gr clean|tail -1;gr clean|tail -1;gr clean|tail -1;gr clean|tail -1
Total time: 21.728 secs
Total time: 32.809 secs
Total time: 52.041 secs
^C

control-C'ed after at least 2 minutes; I expect all later Gradle builds to take 5x or 10x more time than usual

Here's what -verbose:gc looks like at the start. Notice the few Full GCs, and that all GCs are fairly fast:

[GC 209792K->23678K(760256K), 0.0204780 secs]
[Full GC 216241K->30991K(760256K), 0.1278270 secs]
[GC 240783K->57535K(760256K), 0.0174880 secs]
[Full GC 77253K->47510K(760256K), 0.1415680 secs]
[GC 257302K->64961K(760256K), 0.0108680 secs]
[Full GC 229049K->67029K(760256K), 0.2083170 secs]
[GC 276821K->81836K(760256K), 0.0117780 secs]
[GC 291628K->92854K(760256K), 0.0197380 secs]
[GC 302646K->105325K(760256K), 0.0212090 secs]
[GC 315117K->117140K(760256K), 0.0211160 secs]
[GC 326932K->129481K(760256K), 0.0205780 secs]
[Full GC 233451K->109287K(760256K), 0.3421910 secs]
[GC 319079K->131139K(760256K), 0.0114770 secs]
[GC 340931K->141331K(760256K), 0.0261110 secs]
[GC 351123K->151201K(760256K), 0.0203350 secs]
[GC 360993K->159764K(760256K), 0.0180320 secs]

Here's what -verbose:gc looks like at the end, essentially an infinite loop:

[Full GC 760255K->759521K(760256K), 1.6580100 secs]
[Full GC 760255K->759537K(760256K), 1.6557030 secs]
[Full GC 760256K->759588K(760256K), 1.9779090 secs]
[Full GC 760255K->759620K(760256K), 1.6870580 secs]
[Full GC 760255K->759289K(760256K), 1.6401940 secs]
[Full GC 760255K->759299K(760256K), 1.6483010 secs]
[Full GC 760255K->759217K(760256K), 1.9631260 secs]
[Full GC 760255K->759175K(760256K), 1.6567050 secs]

In the attached jvisualvm image, I believe the slowness started around 10:33, when the GC CPU activity (blue line, upper-left graph) became close to 12% (1 core of 8 at 100%). (The long 0% CPU stretches were bash waiting for me to issue the next command.)

I believe there are 2 relevant issues here:

[1] The heap is leaking objects because of a code defect, in which case a code change may be able to remove the leak.

[2] The leak is inherent in Groovy and/or other code running in the daemon, and cannot be avoided.

For case [1], if you suggest specific commands or other methods, I can provide a heap dump. If that's useful, please supply the exact commands or tool invocation you'd like me to use to take the dump.

However, if this sort of problem occurs for other Gradle users, or is unavoidable as suggested in [2], then perhaps we could try patching Gradle with a crude heuristic as I suggested in the initial posts. I have the Gradle source, and I could apply a patch and run with a locally built Gradle.

Also, I will run the Gradle source build (i.e. the full build of Gradle itself) and observe whether it has a similar heap-exhaustion problem. I'll report back if I find another open-source build with the same problem.

Thanks for reviewing this.

Also, we are running Gradle 1.8, and I'll test 1.9-rc now.

As a first attempt at a heap analysis, the jvisualvm heap dump is below. It was taken after the Full GC thrashing had started.

If these HashMaps, Objects, and Strings look like Groovy objects, is there a way to ask Groovy to dump the counts and sizes of Groovy objects?

As another data point, when running “gradle clean” on the gradle source itself, the daemon behaves somewhat better:

Permgen was rising, from 50 MB to 320 MB at the end. I was not running any GC options that might reclaim permgen.

The build ran in 5-6 seconds at the start, and only slowed to 10-15 seconds at the end. Note that, unlike in our Gradle build, your "Used heap" blue line stabilized, and the daemon was able to do useful work instead of slowing down 10x as ours does.

The daemon did not die; however, the Gradle client auto-started another daemon.

For comparison with our heap dump, here's the dump when running clean on your source. -Xmx was the same 768m:

From the heap dump comparison, since ours has many objects of generic types, it may help to analyze which data structures are referencing those objects.

Also, regarding the simplistic "kill when there's less than N MB free for longer than S seconds" heuristic: it appears it would also detect the condition when running clean on your build, since from your build's "Used heap" graph, with -Xmx768m, after thrashing starts the low-water mark on the heap is around 480 MB.

Another heuristic that would immediately detect our problem, assuming the JVM exposes this information, is: if P% of the last N GCs were Full GCs, kill the daemon. E.g. P=90% and N=20 would detect this condition after around 30 seconds for our build (at 1.5 sec per full GC).
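A sketch of what that decision logic might look like. The class name and thresholds are just illustrative; the only real API assumed is the standard GarbageCollectorMXBean, which exposes per-collector collection counts that a polling thread could diff to classify each new collection as minor or full:

```java
// Hedged sketch of a "kill if P% of the last N collections were full GCs"
// heuristic. A sliding window of booleans records whether each observed
// collection was a full GC; once the window is full and the full-GC
// fraction reaches the threshold, the daemon should die.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.Deque;

public class FullGcRatioWatchdog {
    private final int windowSize;         // N, e.g. 20 recent collections
    private final double maxFullFraction; // P, e.g. 0.9
    private final Deque<Boolean> window = new ArrayDeque<>();

    public FullGcRatioWatchdog(int windowSize, double maxFullFraction) {
        this.windowSize = windowSize;
        this.maxFullFraction = maxFullFraction;
    }

    /** Record one observed collection; true means the daemon should die. */
    public boolean record(boolean wasFullGc) {
        window.addLast(wasFullGc);
        if (window.size() > windowSize) window.removeFirst();
        if (window.size() < windowSize) return false;
        long fullCount = window.stream().filter(b -> b).count();
        return fullCount >= windowSize * maxFullFraction;
    }

    public static void main(String[] args) {
        FullGcRatioWatchdog w = new FullGcRatioWatchdog(20, 0.9);
        // A healthy mix of minor and full collections never trips it...
        boolean tripped = false;
        for (int i = 0; i < 40; i++) tripped |= w.record(i % 4 == 0);
        System.out.println(tripped); // false
        // ...but back-to-back full GCs, as in the thrashing log above, do.
        for (int i = 0; i < 20 && !tripped; i++) tripped = w.record(true);
        System.out.println(tripped); // true
        // In the daemon, a poller would diff these counts between samples
        // to feed record(); listing the collector beans for illustration:
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount());
        }
    }
}
```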

One more thing: the original daemon didn't die, but the client started another daemon (I think; I didn't verify this), and around 4:00 PM in the Heap graph a GC took the blue line from 700 MB down to 100 MB. However, if this kind of heap reduction can't occur while thrashing is in progress, then I propose it's better for the daemon to die very quickly. Also, the Classes graph dropped from 37,000 classes to 5,000, and Used Permgen dropped from 340 MB to 30 MB.

Hi Peter,

I've linked this post to our (admittedly under-baked) design spec for daemon improvements:

The next step for us would be to expand this spec into an implementation plan. To do this, though, we'd probably need to spike some potential solutions, as we don't really know the best way to resolve this problem (you do offer some interesting options in your posts).

Do you have any interest in working with us on this?

Hi Luke,

I'm able to work on this to some extent, at least enough to run our in-house-only build to reproduce the problem. If you had something more in mind, feel free to suggest it.

Hey Peter,

This is great investigation, thanks a lot! I'm working on removing those memory leaks at the moment. Current Gradle master (or a nightly build, so long as it is 2.3.*) contains a feature that we hope to release soon: it caches the classloaders (and consequently, the classes) between daemon invocations.

Can you run your tests again with Gradle master? To toggle the new feature on, set the internal system property "org.gradle.caching.classloaders=true", for example by passing -Dorg.gradle.caching.classloaders=true on the command line when running the build.

Let me know how it works for you. Thanks!