Gradle consistently stuck at test task

I maintain the Elasticsearch Groovy client and I have been using Gradle since I took over the project. However, starting with Elasticsearch 2.2.0 (released today), I have been unable to run tests via Gradle.

This is possibly related to the fact that Elasticsearch runs with a Security Manager, but it’s hard to say without hooking Gradle into a debugger.

Naturally, this is concerning, so I am holding the ES Groovy 2.2.0 release until I can resolve this issue.

It’s incredibly easy to reproduce, and I’ve done so on both Mac and Windows with different versions of Gradle 2.x.

$ git clone https://github.com/elastic/elasticsearch-groovy.git
$ cd elasticsearch-groovy
$ gradle :test

Once it makes it to the test step, it simply stops. I ran it through a profiler and the various spawned processes are doing nothing.

What version of Gradle are you using? I tried with Gradle 2.10 but could not get the dependencies to resolve.

> Could not resolve org.apache.lucene:lucene-core:5.5.0-snapshot-1725675.
  Required by:
      org.elasticsearch:elasticsearch-groovy:3.0.0-SNAPSHOT > org.elasticsearch:elasticsearch:3.0.0-SNAPSHOT
   > Could not resolve org.apache.lucene:lucene-core:5.5.0-snapshot-1725675.
      > Could not get resource 'http://repository.codehaus.org/org/apache/lucene/lucene-core/5.5.0-snapshot-1725675/lucene-core-5.5.0-snapshot-1725675.pom'.
         > Could not GET 'http://repository.codehaus.org/org/apache/lucene/lucene-core/5.5.0-snapshot-1725675/lucene-core-5.5.0-snapshot-1725675.pom'.
            > repository.codehaus.org: unknown error
   > Could not resolve org.apache.lucene:lucene-core:5.5.0-snapshot-1725675.
      > Could not get resource 'http://maven.elasticsearch.org/releases/org/apache/lucene/lucene-core/5.5.0-snapshot-1725675/lucene-core-5.5.0-snapshot-1725675.pom'.
         > Could not GET 'http://maven.elasticsearch.org/releases/org/apache/lucene/lucene-core/5.5.0-snapshot-1725675/lucene-core-5.5.0-snapshot-1725675.pom'. Received status code 401 from server: Unauthorized
  

Hi Adrian,

Sorry, I missed a step when I was providing repro instructions. I forgot that I was on an “earlier” branch. My master branch is currently out of date due to unrelated reasons.

$ git clone https://github.com/elastic/elasticsearch-groovy.git
$ cd elasticsearch-groovy
$ git checkout 2.2
$ gradle :test

That should do the trick. Thanks for reminding me that my master build needs to be reinstated into the CI loop though.

If you run gradle :test -i, do you get a JNA-related warning? I get the following and then the build hangs. I suspect this is not a Gradle problem.

16:53:27.021 [DEBUG] [TestEventLogger] org.elasticsearch.groovy.client.AdminClientExtensionsTests STANDARD_OUT
16:53:27.022 [DEBUG] [TestEventLogger] [2016-02-03 16:53:27,018][WARN ][bootstrap ] JNA not found. native methods will be disabled.
16:53:27.023 [DEBUG] [TestEventLogger] java.lang.ClassNotFoundException: com.sun.jna.Native
16:53:27.023 [DEBUG] [TestEventLogger] at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
16:53:27.023 [DEBUG] [TestEventLogger] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
16:53:27.023 [DEBUG] [TestEventLogger] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
16:53:27.024 [DEBUG] [TestEventLogger] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
16:53:27.024 [DEBUG] [TestEventLogger] at java.lang.Class.forName0(Native Method)
16:53:27.024 [DEBUG] [TestEventLogger] at java.lang.Class.forName(Class.java:264)
16:53:27.024 [DEBUG] [TestEventLogger] at org.elasticsearch.bootstrap.Natives.<clinit>(Natives.java:45)
16:53:27.025 [DEBUG] [TestEventLogger] at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:89)
16:53:27.025 [DEBUG] [TestEventLogger] at org.elasticsearch.bootstrap.BootstrapForTesting.<clinit>(BootstrapForTesting.java:85)
16:53:27.025 [DEBUG] [TestEventLogger] at org.elasticsearch.test.ESTestCase.<clinit>(ESTestCase.java:99)
16:53:27.025 [DEBUG] [TestEventLogger] at java.lang.Class.forName0(Native Method)
16:53:27.026 [DEBUG] [TestEventLogger] at java.lang.Class.forName(Class.java:348)
16:53:27.026 [DEBUG] [TestEventLogger] at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:585)

I do, but that’s expected (it’s also present in the 2.1 branch, which is not having trouble building). It’s actually a checked and logged condition in the ES code.

On the bright side, it does suggest that some code is being executed, but I don’t know why it stops.

Ok, it’s definitely SecurityManager related because the tests will execute with the following:

gradle test -i -Dtests.security.manager=false

I’m almost certain the root cause is: https://github.com/elastic/elasticsearch/blob/0839cab6dd0065b95cbca5f197c95dcbcd2b6177/core/src/test/java/org/elasticsearch/bootstrap/BootstrapForTesting.java#L136-L142. Setting those socket permissions could be interfering with the internal protocol used by gradle to communicate with the daemon and, more generally, executors.
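To make that suspicion a bit more concrete, here is a minimal Java sketch of how a port-restricted SocketPermission grant decides which socket operations are allowed. The grant string 127.0.0.1:1024- and the class name SocketPermissionSketch are illustrative assumptions, not the permissions ES actually sets in BootstrapForTesting. The sketch only evaluates the permission model and does not install a SecurityManager.

```java
import java.net.SocketPermission;
import java.security.Permissions;

public class SocketPermissionSketch {

    // Hypothetical grant: connect/accept on loopback, unprivileged ports only.
    static Permissions granted() {
        Permissions perms = new Permissions();
        perms.add(new SocketPermission("127.0.0.1:1024-", "connect,accept,resolve"));
        return perms;
    }

    // True if the hypothetical grant covers the requested socket operation.
    static boolean allowed(String hostAndPort, String actions) {
        return granted().implies(new SocketPermission(hostAndPort, actions));
    }

    public static void main(String[] args) {
        // Inside the granted port range and action set.
        System.out.println(allowed("127.0.0.1:50000", "connect"));
        // Port 80 is outside 1024-65535.
        System.out.println(allowed("127.0.0.1:80", "connect"));
        // "listen" was never granted, regardless of port.
        System.out.println(allowed("127.0.0.1:50000", "listen"));
    }
}
```

If a test worker needs a socket action or port that the installed policy does not cover, the check fails with an AccessControlException. Whether that is what happens to Gradle's worker communication here is still a hypothesis.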

Hmm. You’re definitely right because that change appeared in ES 2.2.

But I’m curious why it has that impact, given that the Gradle subprocesses use reasonable ports. I’m also surprised that it does not error out, which is perhaps the bigger problem.

The ES team has switched to Gradle and they’re not running into this, but they’re using a custom test runner.

I’ve debugged this a bit by adding org.elasticsearch.SecureSM, which is used by org.elasticsearch.bootstrap.BootstrapForTesting, to the codebase. I then added the following overrides, which give more information about what the SecurityManager is doing and which AccessControlExceptions are being thrown. There are quite a lot of those exceptions, and the security model is difficult to understand. Run with: gradle test -i -Dtests.jarhell.check=false

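    // Debug overrides inside SecureSM (needs: import java.security.Permission).
    // Each denied permission is logged to stderr instead of failing the caller.
    @Override
    public void checkPermission(Permission perm) {
        try {
            super.checkPermission(perm);
        } catch (Exception e) {
            System.err.println("---------Failed " + perm + " Exception " + e);
        }
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        try {
            // Pass the context through so the check runs against the supplied context.
            super.checkPermission(perm, context);
        } catch (Exception e) {
            System.err.println("---------Failed " + perm + " For: " + context + " Exception " + e);
        }
    }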

Interestingly, the above code effectively swallows any AccessControlExceptions thrown by the SecurityManager, and things go a lot further, i.e. more of the tests run and fail before the build hangs.

When you cancel the hung build with ctrl-c the following is printed:

Java HotSpot(TM) 64-Bit Server VM warning: Exception java.lang.StackOverflowError occurred dispatching signal SIGINT to handler- the VM may need to be forcibly terminated

At one point I killed a hung build and I got a bunch of error messages for each test suite, something along the lines of "Unhandled Exception: java.lang.StackOverflowError for test suite [SUITE-xxxx]".

I suspect that if you figure out all of the security permission issues and make sure there are no AccessControlExceptions being thrown, the build will not hang.

On the Gradle side, I suspect that there is some issue with cleanly exiting from a test run where a lot of AccessControlExceptions are thrown by the security manager. This may also be specific to Groovy projects. We will need to investigate it a bit further.

Getting closer. I’ve managed to reproduce it consistently with the following test (https://github.com/gradle/gradle/commit/8be958eeb418b4107053d76a55e0dd022288cdbf). It may be specific to both Groovy and groovy.util.GroovyTestCase. Hoping to have a fix soon.

That’s awesome Adrian. Thanks for your effort!

Hey Adrian,

I noticed that the test made it into the build, but I don’t know enough about Gradle’s nightly builds to know if that means that the fix also made it into it. Were you able to find a fix?

Thanks,
Chris

@Chris_Earle we are unable to prioritise working on a fix for this right now. One way this could get fixed is if you or someone on your team investigates the root cause (probably by debugging the test) and subsequently proposes a plan to fix it. We could then work together on that fix.

@Chris_Earle it may also be possible to get around this by using a security manager that grants org.gradle permission to do whatever it likes, possibly by setting the JVM’s access policy appropriately, or by testing with a SecurityManager implementation that is permissive.
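As a rough illustration of the second option, a permissive SecurityManager could look like the Java sketch below. The class name PermissiveSecurityManager is hypothetical, and nothing here is confirmed to work with the ES test framework, which installs its own manager. It also disables every check, so it only makes sense as a diagnostic aid.

```java
import java.security.Permission;

// Hypothetical permissive manager: every permission check succeeds.
@SuppressWarnings({"deprecation", "removal"})
public class PermissiveSecurityManager extends SecurityManager {

    @Override
    public void checkPermission(Permission perm) {
        // Intentionally empty: never throws, so nothing is denied.
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        // Intentionally empty: the context is ignored as well.
    }

    public static void main(String[] args) {
        SecurityManager sm = new PermissiveSecurityManager();
        // Called directly rather than installed, to show that no exception is thrown.
        sm.checkPermission(new RuntimePermission("exitVM"));
        System.out.println("no SecurityException thrown");
    }
}
```

Installing it would be done with System.setSecurityManager(new PermissiveSecurityManager()) early in the test JVM. Note that recent JDKs restrict or no longer support installing a security manager at runtime, so this applies to the Java versions in use in this thread.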