Tons of Gradle daemons exhausting memory

We have a situation where many developers are sharing a single machine for builds. In general only a few may be doing builds at a given time. I would expect a few Gradle daemons to spawn and be reused on a rolling basis.

What we seem to see is that every build creates a new daemon, until there are 100 of them and Gradle refuses to start any more.

I know there are not 100 developers building simultaneously. So how can I diagnose why there are so many daemons that haven’t been killed or reused?

We are using Gradle 3.1.

Can you explain the setup in more detail? How are the developers using that machine? Can you share the daemon logs?

Without getting too deep: the setup comes from a security requirement to control access to the code. There are 2 Unix VMs shared by ~100 developers, each with 4 cores and 48GB of memory, and no external Internet access. Gradle is installed on a shared network mount. Developers can remote into the machines and use a controlled set of commands for editing and building. Obviously these machines are over-utilized, but that’s a separate discussion.

With this many developers, you’d expect a few may be building at any given time - but certainly not everyone at once. 48GB memory should be enough to sustain a lot of daemons without issues.

However, they have been hitting multiple problems related to out-of-memory issues. One sample error:

Problem in daemon expiration check
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.File.getAbsoluteFile(File.java:573)
        at org.gradle.util.GFileUtils.mkdirs(GFileUtils.java:262)
        at org.gradle.launcher.daemon.registry.DaemonDir.<init>(DaemonDir.java:33)

Another (some personal info redacted):

What went wrong:
Unable to find a usable idle daemon. I have connected to 100 different daemons but I could not use any of them to run the build. BuildActionParameters were DefaultBuildActionParameters{, currentDir=/ws/[USER ID]/[CODE PATH], systemProperties size=58, envVariables size=88, logLevel=LIFECYCLE, useDaemon=true, continuous=false, interactive=true, injectedPluginClasspath=}.

For now most are building without the daemon, but we’d like to figure out what’s breaking here. Unsure if these errors are connected or not, but any suggestions for debugging or tweaking the environment are appreciated.

I should note that I also periodically see out-of-memory issues on my own machine. This is the same codebase, but not a shared machine, so it generally has only 1 daemon running. Today I got into a state where the daemon ran out of memory and apparently got stuck that way - I could not proceed until I manually ran gradle --stop to kill the daemon. Below are the errors I was receiving before I killed the daemon.

Problem in daemon expiration check
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:201)
        at java.lang.String.substring(String.java:1921)
        at java.io.File.getName(File.java:456)
        at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:243)
        at java.io.File.isDirectory(File.java:849)
        at org.gradle.util.GFileUtils.mkdirs(GFileUtils.java:263)
        at org.gradle.cache.internal.DefaultFileLockManager$DefaultFileLock.<init>(DefaultFileLockManager.java:122)
        at org.gradle.cache.internal.DefaultFileLockManager.lock(DefaultFileLockManager.java:81)
        at org.gradle.cache.internal.DefaultFileLockManager.lock(DefaultFileLockManager.java:68)
        at org.gradle.cache.internal.OnDemandFileAccess.readFile(OnDemandFileAccess.java:36)
        at org.gradle.cache.internal.SimpleStateCache.get(SimpleStateCache.java:47)
        at org.gradle.cache.internal.FileIntegrityViolationSuppressingPersistentStateCacheDecorator.get(FileIntegrityViolationSuppressingPersistentStateCacheDecorator.java:31)
        at org.gradle.launcher.daemon.registry.PersistentDaemonRegistry.getAll(PersistentDaemonRegistry.java:69)
        at org.gradle.launcher.daemon.registry.PersistentDaemonRegistry.getDaemonsMatching(PersistentDaemonRegistry.java:111)
        at org.gradle.launcher.daemon.registry.PersistentDaemonRegistry.getIdle(PersistentDaemonRegistry.java:81)
        at org.gradle.launcher.daemon.server.CompatibleDaemonExpirationStrategy.checkExpiration(CompatibleDaemonExpirationStrategy.java:54)
        at org.gradle.launcher.daemon.server.expiry.AllDaemonExpirationStrategy.checkExpiration(AllDaemonExpirationStrategy.java:45)
        at org.gradle.launcher.daemon.server.expiry.AnyDaemonExpirationStrategy.checkExpiration(AnyDaemonExpirationStrategy.java:43)
        at org.gradle.launcher.daemon.server.MasterExpirationStrategy.checkExpiration(MasterExpirationStrategy.java:83)
        at org.gradle.launcher.daemon.server.Daemon$DaemonExpirationPeriodicCheck.run(Daemon.java:270)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


The system is out of resources.
Consult the following stack trace for details.
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.sun.tools.javac.code.Scope$ImportScope.makeEntry(Scope.java:526)
        at com.sun.tools.javac.code.Scope.enter(Scope.java:220)
        at com.sun.tools.javac.code.Scope.enter(Scope.java:202)
        at com.sun.tools.javac.code.Scope$StarImportScope.importAll(Scope.java:549)
        at com.sun.tools.javac.comp.MemberEnter.importAll(MemberEnter.java:165)
        at com.sun.tools.javac.comp.MemberEnter.visitTopLevel(MemberEnter.java:525)
        at com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:518)
        at com.sun.tools.javac.comp.MemberEnter.memberEnter(MemberEnter.java:437)
        at com.sun.tools.javac.comp.MemberEnter.complete(MemberEnter.java:1038)
        at com.sun.tools.javac.code.Symbol.complete(Symbol.java:574)
        at com.sun.tools.javac.code.Symbol$ClassSymbol.complete(Symbol.java:1037)
        at com.sun.tools.javac.comp.Enter.complete(Enter.java:493)
        at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
        at com.sun.tools.javac.main.JavaCompiler.enterTrees(JavaCompiler.java:982)
        at com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:857)
        at com.sun.tools.javac.main.Main.compile(Main.java:523)
        at com.sun.tools.javac.api.JavacTaskImpl.doCall(JavacTaskImpl.java:129)
        at com.sun.tools.javac.api.JavacTaskImpl.call(JavacTaskImpl.java:138)
        at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:46)
        at org.gradle.api.internal.tasks.compile.JdkJavaCompiler.execute(JdkJavaCompiler.java:33)
        at org.gradle.api.internal.tasks.compile.NormalizingJavaCompiler.delegateAndHandleErrors(NormalizingJavaCompiler.java:104)
        at org.gradle.api.internal.tasks.compile.NormalizingJavaCompiler.execute(NormalizingJavaCompiler.java:53)
        at org.gradle.api.internal.tasks.compile.NormalizingJavaCompiler.execute(NormalizingJavaCompiler.java:38)
        at org.gradle.api.internal.tasks.compile.CleaningJavaCompilerSupport.execute(CleaningJavaCompilerSupport.java:35)
        at org.gradle.api.internal.tasks.compile.CleaningJavaCompilerSupport.execute(CleaningJavaCompilerSupport.java:25)
        at org.gradle.api.tasks.compile.JavaCompile.performCompilation(JavaCompile.java:189)
        at org.gradle.api.tasks.compile.JavaCompile.compile(JavaCompile.java:170)
        at org.gradle.api.tasks.compile.JavaCompile.compile(JavaCompile.java:113)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)

That’s interesting, the daemon fails to kill itself because it doesn’t have enough memory for the health check. @eriwen this is something we should probably fix.

@awrichar There is probably a memory leak in your build. Memory dumps and allocation profiling could help you spot which task or tool is responsible for that.

@st_oehme Thanks very much. FWIW we do have a very large build (currently ~120 projects and growing, including a lot of native components and some jars with thousands of classes each) - not sure if this contributes to the frequency with which we see the problem.

Do you have any suggestions on tools or processes to zero in on potential leaks? I assumed the Groovy code would be pretty resilient due to garbage collection. We do have a significant amount of GoogleTest suites - would those be more likely to introduce problems? I’ll do some research and testing, but any hunches you have would be appreciated.

Just to jump in with a few observations:

  1. The “I have connected to 100 different daemons” error is likely the cause of the system memory problems and of all the daemons that you see running. What is likely happening is that some error occurs while the daemon is starting, so the client cannot complete the handshake with it. The client assumes that something went wrong and tries to start and connect to another daemon. Each time, it starts a new daemon, fails to connect to it, and concludes that the daemon is not available. Normally this shouldn’t leave daemons running, but there is likely some failure scenario at play that isn’t being handled. Take a look at the user’s ~/.gradle/daemon directory, where you should see the daemon logs for each failed daemon. These should give an idea of what failures happened while the daemon was starting.
  2. For the GC overhead limit exceeded, it’s likely that you either 1) have the max heap size set much too low or 2) have a fast leak somewhere. The daemon does actively watch its own GC activity and will attempt to stop the daemon if it sees the garbage collector starting to thrash, but if a fast leak occurs that consumes memory very quickly, you can get an OOM before the daemon detects the situation. In other words, the GC monitoring is designed to catch slow leaks over multiple builds, but is vulnerable to very fast leaks. If the error is happening on the very first build, then you may just have the heap set too low (in which case the behavior is probably indistinguishable from a fast memory leak). The first thing to try would be to increase the heap size and see if that’s all it is. If that just pushes the problem out for a few builds, and the OOM still occurs, then it’s likely that there is a leak. In that case, probably the only way to troubleshoot would be to use a profiler or heap analyzer of some sort on the daemon and determine where the memory is being allocated.
  3. If the heap was undersized and the daemon was unable to properly initialize, that could very well result in the behavior described in 1 above, but that’s pure speculation.
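With up to 100 daemon logs per user, checking them by hand gets tedious. Below is a minimal sketch of a scanner that lists the logs mentioning a marker such as OutOfMemoryError. It assumes the logs are named daemon-<pid>.out.log and sit in per-version subdirectories of the daemon directory in the Gradle user home; the class and method names are made up for illustration, and the directory can be passed as an argument if your layout differs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Gradle.
public class DaemonLogScan {

    /** Returns the *.out.log files under daemonDir that contain the marker text. */
    static List<Path> logsContaining(Path daemonDir, String marker) throws IOException {
        List<Path> logs;
        try (Stream<Path> walk = Files.walk(daemonDir)) {
            logs = walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".out.log"))
                       .sorted()
                       .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path log : logs) {
            // ISO-8859-1 accepts any byte sequence, so odd bytes never abort the scan.
            try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
                if (lines.anyMatch(line -> line.contains(marker))) {
                    hits.add(log);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another directory as the first argument.
        Path daemonDir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), ".gradle", "daemon");
        for (Path hit : logsContaining(daemonDir, "OutOfMemoryError")) {
            System.out.println(hit);
        }
    }
}
```

Running it once per affected user shows which daemons died of memory exhaustion and which failed for some other reason during startup, which helps separate points 1 and 3 above.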

I have filed https://github.com/gradle/gradle/issues/948 for the OutOfMemoryError problem encountered.

Thanks for the analysis. We have not tweaked any memory settings, which I think means that we get 1GB of max heap by default. I’ll experiment with 2GB to see if it helps the OOM issues. I’ll also see about getting daemon logs from the “100 daemons” case.

@Gary_Hale how do I set the max heap size correctly when using the daemon? I added some println statements to my build to dump memory usage. I see that setting GRADLE_OPTS=-Xmx2048m has an obvious effect when running a --no-daemon build, but seemingly no effect on daemon builds:

$ GRADLE_OPTS=-Xmx2048m gradle build --no-daemon
JVM memory (used/max): 62.64MB / 1820.50MB

$ gradle --stop
$ GRADLE_OPTS=-Xmx2048m gradle build
JVM memory (used/max): 463.53MB / 910.50MB

Below is my debug code for printing the memory. Let me know if my assumptions are wrong somehow.

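// Report heap usage of the JVM that is executing this build script
Runtime runtime = Runtime.getRuntime()
// Convert bytes to megabytes
float maxMemory = runtime.maxMemory() / 1024 / 1024
// Used heap = currently allocated heap minus the free part of it
float usedMemory = (runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024
printf "JVM memory (used/max): %.2fMB / %.2fMB\n", [usedMemory, maxMemory]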

You can set org.gradle.jvmargs in the gradle.properties file for your project to add arguments onto the daemon’s java arguments. Another setting you can add to org.gradle.jvmargs is -Dorg.gradle.daemon.performance.logging=true. This will cause information about garbage collection and heap usage to be printed at the beginning of each build.
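As a concrete sketch of that suggestion, a gradle.properties file in the project root might look like the fragment below. The 2048m value is only an example taken from the earlier experiment in this thread, not a recommendation; the right size depends on the build.

```properties
# gradle.properties (project root)
# JVM arguments for the daemon process: 2 GB max heap plus the
# daemon performance logging system property mentioned above.
org.gradle.jvmargs=-Xmx2048m -Dorg.gradle.daemon.performance.logging=true
```

This also fits the earlier observation: GRADLE_OPTS changed the reported max heap for the --no-daemon build but not for the daemon build, which suggests the daemon takes its heap settings from org.gradle.jvmargs.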

Thanks, org.gradle.jvmargs seems to work well, so we’ve updated our max heap to 2GB. Initially looks good, but it’ll take a few days to see if the OOM issues really go away.

Here is a daemon log snippet from the machine that was seeing “I have connected to 100 different daemons…”. At some point a build fails with OOM, and then the daemon seems to be looping (I have only included a snippet, but the same log continues repeating on and on for a few minutes). Wanted to share in case this helps zero in on what happened.

daemon-log.zip (3.1 KB)

A high-level question - is it generally supported to run multiple Gradle builds simultaneously on a machine? We have the situation mentioned in this thread (shared developer machines), and we have also experimented with CI jobs that spin up many Docker containers on a single machine to build in parallel under different configurations. It seems to me that if we can tune the max heap size and have enough physical memory to sustain it, this is expected to work - but I want to make sure we’re not crazy :)

Yes, you can run Gradle builds simultaneously. Note that Gradle forks additional JVMs that will consume additional resources, so don’t mentally divide 64 GB into 32 * 2GB Gradle builds. I would recommend against running two builds on the same project dir at the same time as they’ll interfere with each other.
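To make the "don’t just divide" point concrete, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical assumption (how many JVMs a build forks, their heap sizes, and the non-heap overhead per JVM all depend on the build), and the class is made up for illustration.

```java
// Hypothetical estimator, not part of Gradle.
public class BuildMemoryBudget {

    /**
     * Estimates how many builds fit in RAM at once.
     * All sizes are in megabytes; forkedJvmsPerBuild is a count.
     */
    static int maxConcurrentBuilds(long totalMb, long reservedMb,
                                   long daemonHeapMb, int forkedJvmsPerBuild,
                                   long forkedHeapMb, long perJvmOverheadMb) {
        long jvmsPerBuild = 1 + forkedJvmsPerBuild;           // daemon + forked JVMs
        long perBuildMb = daemonHeapMb
                + forkedJvmsPerBuild * forkedHeapMb
                + jvmsPerBuild * perJvmOverheadMb;            // metaspace, stacks, etc.
        long availableMb = Math.max(0, totalMb - reservedMb); // keep some RAM for the OS
        return (int) (availableMb / perBuildMb);
    }

    public static void main(String[] args) {
        // Assumed: 64 GB host, 4 GB reserved, 2 GB daemon heap,
        // two forked JVMs of 512 MB each, 256 MB non-heap overhead per JVM.
        int builds = maxConcurrentBuilds(64 * 1024, 4 * 1024, 2048, 2, 512, 256);
        System.out.println("Estimated concurrent builds: " + builds);
    }
}
```

Under these assumed numbers each build costs 3840 MB rather than 2048 MB, so the estimate comes out at 16 concurrent builds instead of the naive 32. It only covers memory; limits such as the number of native threads need to be checked separately.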

Thanks, that makes sense. Each build is running in a separate Docker container with an isolated project dir and Gradle user home. We see that running 5 concurrent builds with 2GB each on a 32GB machine generally exhausts the native thread pool, so I’m guessing the forked JVMs account for that. We’ll continue to tune it and see what’s possible.