ConnectException: Could not connect to server when using gradlew test together with SLURM

Hello,

we are trying to run gradlew test on our Bright-managed cluster, which uses SLURM for workload management. To start jobs on the cluster, you prefix commands with srun, which allocates a machine in the cluster and then runs the given command on it. However, running srun gradlew test gives us:

org.gradle.messaging.remote.internal.ConnectException: Could not connect to server [75385f21-c996-4774-9a15-fc0e526d2dc9 port:40282, addresses:[/0:0:0:0:0:0:0:1%1, /127.0.0.1]]. Tried addresses: [/0:0:0:0:0:0:0:1%1, /127.0.0.1].
org.gradle.messaging.remote.internal.ConnectException: Could not connect to server [d77b2db4-dcc0-4322-8507-5dbd6ab34f36 port:39718, addresses:[/0:0:0:0:0:0:0:1%1, /127.0.0.1]]. Tried addresses: [/0:0:0:0:0:0:0:1%1, /127.0.0.1].
	at org.gradle.messaging.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:62)
	at org.gradle.messaging.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:62)
	at org.gradle.messaging.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:35)
	at org.gradle.messaging.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:35)
	at org.gradle.process.internal.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:58)
	at org.gradle.process.internal.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:58)
	at org.gradle.process.internal.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:37)
	at org.gradle.process.internal.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:37)
	at org.gradle.process.internal.child.ImplementationClassLoaderWorker.execute(ImplementationClassLoaderWorker.java:87)
	at org.gradle.process.internal.child.ImplementationClassLoaderWorker.execute(ImplementationClassLoaderWorker.java:87)
	at org.gradle.process.internal.child.ImplementationClassLoaderWorker.execute(ImplementationClassLoaderWorker.java:41)
	at org.gradle.process.internal.child.ImplementationClassLoaderWorker.execute(ImplementationClassLoaderWorker.java:41)
	at org.gradle.process.internal.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:49)
	at org.gradle.process.internal.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:49)
	at org.gradle.process.internal.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:33)
	at org.gradle.process.internal.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:33)
	at jarjar.org.gradle.process.internal.launcher.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at jarjar.org.gradle.process.internal.launcher.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at jarjar.org.gradle.process.internal.launcher.GradleWorkerMain.main(GradleWorkerMain.java:74)
	at jarjar.org.gradle.process.internal.launcher.GradleWorkerMain.main(GradleWorkerMain.java:74)
Caused by: java.net.ConnectException: Connection refused
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.connect0(Native Method)
	at sun.nio.ch.Net.connect0(Native Method)
	at sun.nio.ch.Net.connect(Net.java:465)
	at sun.nio.ch.Net.connect(Net.java:465)
	at sun.nio.ch.Net.connect(Net.java:457)
	at sun.nio.ch.Net.connect(Net.java:457)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
	at org.gradle.messaging.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:53)
	at org.gradle.messaging.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:53)
	... 9 more
	... 9 more

Note that we’re using Gradle 2.4, and maxParallelForks is set to Runtime.runtime.availableProcessors(). Everything else is left at its default. Any ideas?

Thanks,
Korbi

Hi,

It’s something particular about your network setup. The test process is not able to communicate back to the build process via the loopback interface. You’ll have to ask your administrators why this isn’t allowed in this environment.
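
One way to check this independently of Gradle is a small loopback test run on a compute node (for example via srun). The following is only a sketch: the class name LoopbackCheck and the 2-second timeout are made up for illustration, and it only shows whether a TCP client can reach a listener on the two addresses from the error message, not how Gradle itself sets up its connections.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical diagnostic, not part of Gradle.
public class LoopbackCheck {

    /** Returns true if a TCP client can connect to a listener bound to the given address. */
    static boolean canConnect(String address) {
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 lets the OS pick a free ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getByName(address), 0));
            try (Socket client = new Socket()) {
                client.connect(
                        new InetSocketAddress(InetAddress.getByName(address), server.getLocalPort()),
                        2000); // connect timeout in milliseconds (arbitrary choice)
                return client.isConnected();
            }
        } catch (IOException e) {
            System.out.println(address + " failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // The two addresses listed in the ConnectException message.
        System.out.println("127.0.0.1 reachable: " + canConnect("127.0.0.1"));
        System.out.println("::1 reachable: " + canConnect("::1"));
    }
}
```

If either address reports a failure when run through srun, that would be something concrete to show your administrators.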

Hi Luke,

thanks for your reply! I talked to our cluster admin and he said the loopback interface is enabled and this should work. The lo interface is also visible when running ifconfig, and pinging 127.0.0.1 works fine. We did some more experiments and it seems to be related to maxParallelForks.
The machines in the cluster have 72 cores, so with our setting Gradle should spin up as many as 72 JVMs to run the tests. If we reduce it to 8, we don’t see those connection errors. With 12 it sometimes works, sometimes not. Does Gradle maybe have issues with a larger number of forks?
Also, we have a network file system on that cluster, so disk access is slower than on conventional machines. Could that be another cause?
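
To narrow this down outside of Gradle, we are thinking of a probe along the following lines. It is a rough sketch under several assumptions: the class name ParallelLoopbackProbe and the 5-second timeout are invented here, it needs Java 8 for the lambdas, and it uses threads inside one JVM, whereas Gradle forks separate worker JVMs, so it only approximates the situation. It opens a configurable number of simultaneous connections to a single loopback listener and counts how many fail.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical diagnostic, not part of Gradle.
public class ParallelLoopbackProbe {

    /** Opens `clients` simultaneous connections to one loopback listener; returns how many failed. */
    static int countFailures(int clients) {
        AtomicInteger failures = new AtomicInteger();
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0));
            int port = server.getLocalPort();

            // Accept connections in the background until the server socket is closed.
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        server.accept().close();
                    }
                } catch (IOException closed) {
                    // Server socket was closed; stop accepting.
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            CountDownLatch start = new CountDownLatch(1);
            CountDownLatch done = new CountDownLatch(clients);
            for (int i = 0; i < clients; i++) {
                new Thread(() -> {
                    try {
                        start.await(); // hold all clients, then release them together
                        try (Socket s = new Socket()) {
                            s.connect(new InetSocketAddress("127.0.0.1", port), 5000);
                        }
                    } catch (Exception e) {
                        failures.incrementAndGet();
                        System.out.println("connect failed: " + e);
                    } finally {
                        done.countDown();
                    }
                }).start();
            }
            start.countDown();
            done.await(); // wait until every client has finished
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        // Default to 72 clients to match the core count of our machines.
        int clients = args.length > 0 ? Integer.parseInt(args[0]) : 72;
        System.out.println(clients + " clients, " + countFailures(clients) + " failed");
    }
}
```

Running it through srun with 8, 12 and 72 clients should show whether the node itself refuses connections at higher counts or whether the problem only appears with Gradle.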