Using Gradle in a computer cluster causes daemons to lock each other out

tudortimi · July 18, 2023, 8:19am

We use Gradle in a computer cluster, where we have to submit jobs to available machines. This means that each time we start gradle, it potentially executes on a different machine. All machines have access to the same network file system, so even though the jobs run remotely, they sort of have the same “feel” as running locally.

The Gradle daemon causes issues, though. Running gradle on a machine will leave behind a daemon on that machine. This daemon will lock parts of .gradle (e.g. journal cache). Trying to execute gradle again but landing on a different machine will cause the second invocation to time out, because it can’t acquire locks.

I’ve seen that Gradle daemons running on the same machine will talk to each other, to request access to locked resources. A second invocation which is incompatible with a running daemon can ask this daemon to relinquish its lock so that the second daemon can execute. The communication is based on process IDs. When running in our multi-machine case, daemons can’t talk to each other, because they don’t know how to reach each other. We currently disable daemons altogether, but this is kind of a drain on performance. I’m guessing Gradle, at least the daemon infrastructure, isn’t really built for such a scenario, but I thought I’d ask whether there is anything we could do to be able to work with daemons.

Vampire · July 21, 2023, 12:35pm

I don’t think it is a good idea to have the GRADLE_USER_HOME on a network drive that is shared / used by multiple machines.
This is most probably the cause of your problems and I don’t think that Gradle supports that or even should support it.
Most software I know of that has some kind of local cache would not be compatible with multiple machines writing to that cache directory.

So to resolve your issue you should most probably make sure that different machines do not use the same shared GRADLE_USER_HOME.

tudortimi · July 21, 2023, 3:20pm

I thought about this as well. I can’t find an easy way of changing the value of GRADLE_USER_HOME to point to a different location based on the machine it’s running on. I’d like something like GRADLE_USER_HOME=/some/location/for/$(hostname)/.gradle, where $(hostname) returns the name of the machine gradle is executed on. When I submit gradle, I don’t know in advance which machine it will land on, and entries in gradle.properties have to be “simple” values.

Vampire · July 21, 2023, 3:26pm

That’s not an entry for gradle.properties.
That’s an environment variable.

tudortimi · July 21, 2023, 3:32pm

I’m referring to gradle.user.home, but the environment variable has the same problem.

The way submission works is like the following:

hostA> bsub gradle <task>
<dispatching to hostB>
<... Gradle executes on hostB>

bsub is the command that’s used to submit the job to the cluster.

When executing the bsub gradle command, it’s not possible to know which host the job will land on, so it’s not possible to set GRADLE_USER_HOME to any concrete value. The host is only known once the job has been dispatched.

Vampire · July 21, 2023, 3:37pm

Well, you should probably not specify it at that level at all, but on the build machine.

If you really need to set it “from the build” dynamically, and assuming the build machine uses the wrapper scripts, modify the wrapper scripts to set the environment variable or pass the system property with a suitably computed value.

tudortimi · July 21, 2023, 3:39pm

There’s only one home directory (i.e. ~) for the user, regardless of which machine they are on, so it’s not possible to solve this like that.

So it’s generally ok to modify the Gradle wrapper? Or do you mean another kind of wrapper script?

Vampire · July 21, 2023, 4:34pm

However you like. You can make wrapper scripts calling the wrapper scripts, or you can modify the existing wrapper scripts, if you do it in a way that does not disturb your local execution. And you have to take care to redo it on wrapper updates of course, for example by adding a doLast action to the wrapper task.
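
One way to read the "wrapper script calling the wrapper script" suggestion is a small launcher that runs on whichever host the job lands on and derives GRADLE_USER_HOME from that host's name before starting the build. The following is a minimal sketch, not a tested cluster setup: the script name gradle-cluster.sh, the GRADLE_HOME_BASE variable, and the per_host_gradle_home helper are made up for illustration, and the base path simply reuses the placeholder from earlier in the thread.

```shell
#!/bin/sh
# gradle-cluster.sh (hypothetical name): pick a per-host Gradle user home,
# then hand over to the regular wrapper in the current directory.

# Assumed base directory; placeholder path taken from the thread.
GRADLE_HOME_BASE="${GRADLE_HOME_BASE:-/some/location/for}"

# Hypothetical helper: build "<base>/<host>/.gradle".
per_host_gradle_home() {
    printf '%s/%s/.gradle\n' "$1" "$2"
}

# uname -n prints the node name of the machine this script runs on,
# i.e. the execution host, not the host the job was submitted from.
GRADLE_USER_HOME="$(per_host_gradle_home "$GRADLE_HOME_BASE" "$(uname -n)")"
export GRADLE_USER_HOME

# Start the build only if a wrapper is present in the current directory.
if [ -x ./gradlew ]; then
    exec ./gradlew "$@"
fi
```

Submission would then presumably look like bsub ./gradle-cluster.sh <task>, so the host name is resolved only after dispatch. The trade-off is that each host keeps its own caches under its own user home.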

tudortimi · July 28, 2023, 4:39pm

I found the following comment in the Gradle wrapper:

https://github.com/gradle/gradle/blob/c8ca699854c0e8ee7f69c1db676bb25bff775de4/subprojects/plugins/src/main/resources/org/gradle/api/internal/plugins/unixStartScript.txt#L243-L247

          # Collect all arguments for the java command;
          #   * \$DEFAULT_JVM_OPTS, \$JAVA_OPTS, and \$${optsEnvironmentVar} can contain fragments of
          #     shell script including quotes and variable substitutions, so put them in
          #     double quotes to make sure that they get re-expanded; and
          #   * put everything else in single quotes, so that it's not re-expanded.

This leads me to believe that something like -Dgradle.user.home=.gradle-${HOSTNAME} inside GRADLE_OPTS should lead to Gradle using different user homes for each machine, because the ${HOSTNAME} would get re-expanded.

This doesn’t work, though. Gradle is using literally .gradle-${HOSTNAME} as a user home. Is the comment in the wrapper wrong or am I misinterpreting it? It would also seem strange to have the wrapper do any kind of shell expansion, because that would lead to different results depending on whether ./gradlew or gradle is used to start it.

Vampire · July 31, 2023, 1:24am

I guess the comment is either wrong or misleading, as there is extra code to escape meaningful characters so that no eval happens there.
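
As a side note on why a literal ${HOSTNAME} can survive: in a POSIX shell, the text produced by expanding a variable is not scanned again for further $ expansions unless something runs eval on it. The sketch below only illustrates that shell behaviour and says nothing about the internals of the wrapper script. The user_home_opt helper is hypothetical, and uname -n stands in for ${HOSTNAME}, which not every shell sets.

```shell
#!/bin/sh
# Single quotes keep ${HOSTNAME} as plain text inside the variable.
literal_opts='-Dgradle.user.home=.gradle-${HOSTNAME}'

# Expanding the variable later yields that text unchanged; the shell does
# not expand the result a second time.
printf 'literal:  %s\n' "$literal_opts"

# Hypothetical helper: build the option with the host name already resolved.
user_home_opt() {
    printf '%s%s\n' '-Dgradle.user.home=.gradle-' "$1"
}

# uname -n is evaluated here, on the machine running this script.
resolved_opts="$(user_home_opt "$(uname -n)")"
printf 'resolved: %s\n' "$resolved_opts"
```

For the cluster case, the resolved form would have to be computed on the execution host, for example inside a script that the submitted job runs, rather than in the shell that submits the job.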

tudortimi · July 31, 2023, 7:03am

I’ve created an issue for this: “Comment in wrapper w.r.t. shell expansion of environment variables is wrong/misleading” (gradle/gradle issue #25958 on GitHub).
