Running integration tests in parallel on a CI server (GRADLE-2795, GRADLE-3200)

Continuing the discussion from Timeout waiting to lock artifact cache:

I have to bring this old issue up again since it’s currently blocking us from using Gradle for our CI build. I can reliably reproduce it on our CI server, and I might be able to spend some time trying to find the root cause. Is there an easy way to find out what the process holding the lock is doing while the others are waiting?


Some more information: I’m currently trying to run 8 sets of integration tests (8 different subprojects in the multi-project build) in parallel on different machines. GRADLE_USER_HOME is on an NFS share that all machines share. Whenever I start those 8 jobs in parallel, 7 of them time out waiting for either .gradle/caches/2.4/plugin-resolution/cache.properties.lock (queryPluginMetadata - read) or .gradle/caches/modules-2/modules-2.lock (resolving the configuration of the common subproject).

Using different GRADLE_USER_HOME directories is not an option, since that would slow down the build considerably.

After a bit more testing, it seems like NFS being slow isn’t the only problem. I just tried to run all integration tests in parallel on a single machine without touching NFS. That fixed the lock timeouts, but I then ran into https://issues.gradle.org/browse/GRADLE-3200.

It seems like Gradle currently has a lot of issues with setups like ours, where multiple Gradle tasks have to run in parallel on one or multiple CI servers. 8 is not a huge number and our project is not insanely large.

As a workaround, for now, I will try to use 1 GRADLE_USER_HOME per Jenkins job, but I really wish there was a better solution.
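
For anyone who wants to try the same workaround, here is a rough sketch of what I mean by one GRADLE_USER_HOME per job. It is only an illustration under a few assumptions: the base directory, the `./gradlew` wrapper path, and the task name in the usage comment are hypothetical, and the job name is expected to come from the `JOB_NAME` environment variable that Jenkins sets for each build. The helper names `gradle_user_home` and `run_gradle` are made up for this example.

```python
import os
import subprocess
from pathlib import Path


def gradle_user_home(base_dir, job_name):
    """Return a per-job Gradle user home directory under base_dir."""
    # Job names can contain '/' (folders) or spaces, so replace anything
    # that is not alphanumeric, '-', '_' or '.' with an underscore.
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in job_name)
    return Path(base_dir) / safe


def run_gradle(tasks, base_dir, job_name, gradle_cmd="./gradlew"):
    """Run Gradle with GRADLE_USER_HOME pointing at the per-job directory."""
    env = dict(os.environ)
    env["GRADLE_USER_HOME"] = str(gradle_user_home(base_dir, job_name))
    # The exit code is returned so the CI job can fail when the build fails.
    return subprocess.run([gradle_cmd, *tasks], env=env, check=False).returncode


# Hypothetical usage inside a Jenkins build step:
#   job = os.environ.get("JOB_NAME", "local")
#   run_gradle([":common:integrationTest"], "/var/cache/gradle-homes", job)
```

The obvious downside is the one mentioned above: each job keeps its own copy of the dependency cache, so nothing downloaded by one job is reused by the others.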