Timeout after 120.0 seconds

We keep getting a 120-second timeout when running Gradle builds on our build server.

 Unable to connect to the child process 'Gradle Test Executor 183'.
error	It is likely that the child process have crashed - please find the stack trace in the build log.
error	This exception might occur when the build machine is extremely loaded.
error	The connection attempt hit a timeout after 120.0 seconds (last known process state: STARTED, running: true).

We have checked all metrics during build time and see nothing getting even close to maxing out our resources. Pinpointing the metrics to the time around the error also doesn’t show anything.

We have even dedicated a VM with an insane amount of cores and RAM to our build agent, but we still get this timeout error build after build.

Has anyone run into a similar issue?

Is there a way to increase the 120-second timeout?

Even if you can increase that timeout, do you really think it would change anything? If it couldn’t connect to the other process within two minutes and the server is bored, what should change after 3 more minutes?

It sounds more like the test worker process crashed, or your tests called System.exit in it, or it was killed, for example by the OOM killer on Linux, …
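
If a crashed or out-of-memory test worker is the suspicion, one way to collect evidence is to have the worker JVMs write crash artifacts to a known place. The following is only a sketch for a Kotlin DSL build script, assuming the tests run on a HotSpot-based JVM; the directory name `test-crash-logs` is made up for illustration.

```kotlin
// build.gradle.kts (sketch; the directory name is an assumption)
tasks.withType<Test>().configureEach {
    // Hypothetical location for crash artifacts, under the project's build directory.
    val crashDir = layout.buildDirectory.dir("test-crash-logs").get().asFile

    doFirst {
        // The JVM does not create missing directories for these files itself.
        crashDir.mkdirs()
    }

    jvmArgs(
        // Write a heap dump if a test worker dies with OutOfMemoryError.
        "-XX:+HeapDumpOnOutOfMemoryError",
        "-XX:HeapDumpPath=${crashDir.absolutePath}",
        // Write the fatal error log (hs_err) here if the JVM itself crashes; %p is the process id.
        "-XX:ErrorFile=${crashDir.absolutePath}/hs_err_pid%p.log",
    )
}
```

If that directory stays empty after a failed build, a JVM crash or heap exhaustion becomes less likely, which would point more toward an external kill or an exit call in the tests.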

I agree that it wouldn’t change much. The idea was more to pinpoint the issue. Is it actually some resource constraint we are missing, or some System.exit we cannot find (and we have looked)?

When running the build locally, it is successful most of the time, and if it fails, it is a different error altogether and not related to resources.

Also, the build machine runs Windows because of some dependent test cases we haven’t had the bandwidth to refactor.

We have the following setting for one of our test suites, which I’ve been playing around with, with no luck.
maxParallelForks = (Runtime.getRuntime().availableProcessors() / 2).takeIf { it > 0 } ?: 1

I have tried giving it less and just commenting it out altogether.
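
For experiments like this, it may be easier to make the fork count overridable from the command line, so the build server can be switched to a single worker without editing the build script. This is a sketch only; `testForks` is a hypothetical property name, not something Gradle defines.

```kotlin
// build.gradle.kts (sketch; "testForks" is a hypothetical project property)
tasks.withType<Test>().configureEach {
    // Same formula as above: half the available cores, but never less than 1.
    val defaultForks = (Runtime.getRuntime().availableProcessors() / 2).coerceAtLeast(1)

    // Use -PtestForks=<n> when given, otherwise fall back to the default.
    maxParallelForks = providers.gradleProperty("testForks")
        .map { it.toInt() }
        .getOrElse(defaultForks)
}
```

Running the build with `-PtestForks=1` would then show whether the timeout still occurs when only one test worker is started at a time.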

Any suggestions on where we should focus?

Thank you

No idea, maybe enable --debug logging to see whether something helpful is logged.
Or run something like Sysinternals “Process Monitor” in parallel to the build to see what is happening with the test processes.
Or watch the Task Manager or Sysinternals “Process Explorer” during those 120 seconds to see what happens with the process and whether it is actually there but just cannot be talked to.
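
Alongside --debug, more verbose test logging might help narrow down what was running when a worker went away. A minimal sketch for a Kotlin DSL build script follows; it only helps if the workers get far enough to start running tests.

```kotlin
// build.gradle.kts (sketch)
import org.gradle.api.tasks.testing.logging.TestExceptionFormat
import org.gradle.api.tasks.testing.logging.TestLogEvent

tasks.withType<Test>().configureEach {
    testLogging {
        // Log every test as it starts and finishes, so the last "started" entries
        // before the timeout show which tests were in flight.
        events(TestLogEvent.STARTED, TestLogEvent.PASSED, TestLogEvent.SKIPPED, TestLogEvent.FAILED)
        // Forward the tests' stdout/stderr to the build log.
        showStandardStreams = true
        // Print full stack traces for failures.
        exceptionFormat = TestExceptionFormat.FULL
    }
}
```

A test that appears as started but never as finished right before the error would be a candidate for a hang or an exit call.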