We have many thousands of unit tests in a project recently migrated from Maven to Gradle. They all pass individually (i.e. when run using forkEvery 1). However, things start to fail if “forkEvery” isn’t set to 1. Setting it to 1 introduces a very severe performance penalty, since every test class then runs in its own freshly started JVM.
We experimented with various values of forkEvery but eventually decided that the effort was pointless, at least for us: with any value greater than 1 (even 2), there is a risk that a “polite” test runs in the same fork as a “naughty” one and fails because of the state it encounters at its start. We’ve observed this play out exactly like that at “forkEvery 2”.
Of course, we know that the best thing to do is to make the tests not leaky. The problem is that there are just too many of them and they are very hard to identify at this stage. I’ve tried fishing them out by severely limiting the max heap size for them and flagging the ones that fail even at “forkEvery 1” and that seemed to help… until it did not. We also don’t know what kinds of resources are leaked and/or how they are (not) cleaned up after test execution as not all failures are due to OutOfMemoryErrors (but all succeed with sufficient memory - say 64M - and forkEvery 1).
At this point we have two questions and some recommendations:
How can we customize our Gradle scripts to fork on failure and retry the failed test(s)? This way forkEvery would no longer be needed as fork would occur automatically when needed and would be almost perfectly optimal. True, this could hide some errors (such as tests that expect unqualified exceptions that now occur because of a previously run leaky test), but it would get us going and we would know what to expect. I would consider this to be a good feature in the base product too but we wouldn’t wait for it, if we don’t have to.
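To make the first question concrete, here is a minimal Java sketch of the behaviour we are after, not something Gradle provides as far as we know. `IsolatedRetry` and the `runInOneFork` hook are hypothetical names: the hook stands for "run these test classes in one fresh JVM and return the names of the ones that failed". Tests are first run together, and only the failures are rerun, each in a fork of its own.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper illustrating "fork on failure and retry"; not a Gradle API.
final class IsolatedRetry {

    /**
     * @param tests        names of the test classes to run
     * @param runInOneFork hypothetical hook: runs the given tests in ONE fresh JVM
     *                     and returns the names of the tests that failed
     * @return the tests that still fail when run alone in their own fork
     */
    static Set<String> run(List<String> tests,
                           Function<List<String>, Set<String>> runInOneFork) {
        // First pass: everything shares a single fork (the cheap path).
        Set<String> failedWhenShared = runInOneFork.apply(tests);

        // Second pass: each failure gets a fork of its own, like forkEvery 1.
        Set<String> genuineFailures = new LinkedHashSet<>();
        for (String test : tests) {
            if (!failedWhenShared.contains(test)) {
                continue; // passed in the shared fork, nothing to retry
            }
            Set<String> failedAlone = runInOneFork.apply(Collections.singletonList(test));
            if (failedAlone.contains(test)) {
                genuineFailures.add(test); // fails even in isolation
            }
        }
        return genuineFailures;
    }
}
```

A test that fails in the shared fork but passes alone is reported as passing, which is exactly the "could hide some errors" trade-off mentioned above. If a leaky test poisons the fork early on, most of the remaining tests may need an isolated rerun, so the saving depends on how rare the leaky tests are.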
Is there any way to monitor the resource use (memory and other) of test workers between tests? This way we could, perhaps, see what is going on there, which would help us identify the causes. Note that we do not know what is going on. It may be memory or, perhaps, a failure to clean up mocking set up by previous tests (we’ve seen class loading issues seemingly not caused by OutOfMemoryErrors).
I tried to see how I can get Java memory utilization before/after tests using RunListeners and/or Gradle TestListeners. I am not sure (yet) how to configure these, or whether they run within the same JVM that the tests run in. Can anyone help point me in the right direction w.r.t. that?
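For reference, this is roughly the kind of measurement meant here: a small Java sketch that uses only the standard `java.lang.management` beans. `WorkerSnapshot` is a hypothetical name. The snapshot has to be taken inside the JVM that runs the tests (for example from a JUnit @Before/@After pair or a RunListener); the assumption here is that a Gradle TestListener is notified in the build process instead and so would not see the worker's heap.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Hypothetical helper: captures a few JVM-wide counters of the current process.
final class WorkerSnapshot {
    final long usedHeapBytes;
    final int liveThreads;
    final int loadedClasses;

    private WorkerSnapshot(long usedHeapBytes, int liveThreads, int loadedClasses) {
        this.usedHeapBytes = usedHeapBytes;
        this.liveThreads = liveThreads;
        this.loadedClasses = loadedClasses;
    }

    /** Reads heap use, live thread count and loaded class count of this JVM. */
    static WorkerSnapshot capture() {
        return new WorkerSnapshot(
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed(),
                ManagementFactory.getThreadMXBean().getThreadCount(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }

    /** Formats how much each counter changed since an earlier snapshot. */
    String deltaSince(WorkerSnapshot before) {
        return String.format(Locale.ROOT,
                "heap %+d KiB, threads %+d, classes %+d",
                (usedHeapBytes - before.usedHeapBytes) / 1024,
                liveThreads - before.liveThreads,
                loadedClasses - before.loadedClasses);
    }
}
```

Calling `WorkerSnapshot.capture()` before a test and `deltaSince(...)` after it gives one line per test to log. Heap figures are noisy unless a garbage collection happens in between, so thread and class counts that keep growing from test to test are probably the more reliable hint of a leak.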
In the interim I added custom runners for each of our tests using @RunWith, as I have no way of setting a global one. This uncovered at least two issues beyond our direct control, both causing tests to not run well with other tests:
and:
The only hope to gain some performance is to find a way to automatically group tests into forks so that we don’t have to forkEvery 1.