Sudden ClassNotFound exceptions, sometimes related to changed dependencies

This will be a “bug report from hell”, unfortunately.

Sometimes (i.e. this is not reproducible and it’s also quite rare) our Jenkins CI server produces ClassNotFound exceptions after a dependency has been updated.

This happened, for example, while updating org.gebish:geb-spock:0.12.1 to org.gebish:geb-spock:0.12.2. The class geb.spock.GebSpec wasn’t found anymore.

It also happened with hibernate-validator-4.2.0.Final, without the dependency having been changed at all:

error: package javax.validation does not exist
import javax.validation.Valid;

So far, this only happened on our build server.

The switches for Gradle on Jenkins are the following: --refresh-dependencies --no-search-upward --no-daemon --stacktrace.

This state is persistent: once Gradle gets into this strange state, it stays there for future builds. Unfortunately, I have no idea how to reproduce it.

Performing a gradle clean does not fix the problem - but manually deleting ~/.gradle/caches does.

I’m not sure if this already happened with Gradle 2.5 but it definitely happened with both Gradle 2.6 and 2.7-rc-1.

I know that this isn’t the most impressive bug report but I wanted to at least raise a flag and mention it. (A fishy heisencache is giving me the creeps.)

We are using an in-house Nexus repository as our dependency cache. I don’t think this has an impact, since local builds have always been fine so far, but I wanted to mention it anyway because of the general voodoo aspect of this issue.

Thanks for letting us know.

What do you delete from ~/.gradle/caches? Everything?

Could you check if the jar for the “missing”/“bad” dependency is in the cache? If it is, check whether it’s valid (e.g., you can list its contents and the missing class shows up). Could you also check the checksum of the jar and compare it to what Nexus has?
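If it’s easier than poking around by hand, a quick Groovy script along these lines would do both checks (just a sketch; the artifact name is an example, substitute whichever dependency is affected):

import java.security.MessageDigest
import java.util.zip.ZipFile

// Sketch: locate the cached jar, list its entries (the "missing" class should
// show up) and print its SHA-1 for comparison with the checksum in Nexus.
def cache = new File(System.getProperty('user.home'), '.gradle/caches/modules-2/files-2.1')

cache.eachFileRecurse { f ->
    if (f.name == 'geb-spock-0.12.2.jar') {             // example artifact, adjust as needed
        new ZipFile(f).withCloseable { zip ->
            zip.entries().each { println it.name }      // geb/spock/GebSpec.class should be listed
        }
        def sha1 = MessageDigest.getInstance('SHA-1').digest(f.bytes)
        println "${f} -> ${sha1.encodeHex()}"
    }
}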

Is the ~/.gradle directory for Jenkins on a local filesystem or a network mount?

Just an rm -rf ~/.gradle/caches.

Unfortunately, I can’t check the files anymore since the cache is deleted. I’ll definitely check whenever that problem shows up again.

~/.gradle is a symbolic link.

lrwxrwxrwx 1 webserv webdev 25 Jun 24 2014 .gradle -> /opt/work/jenkins/.gradle

But the target is on the same machine, i.e. no network mount.

Hope that helps. A bit.

Alright…
The issue has returned.

This time, the class org.codehaus.plexus.util.IOUtil can’t be found by an init.d script.

That class is located in org.codehaus.plexus:plexus-utils:3.0.15.

Debug output grepped to include plexus-utils: http://pastebin.com/jzh9tjKW

I compared the content and checksum of /opt/work/jenkins/.gradle/caches/modules-2/files-2.1/org.codehaus.plexus/plexus-utils/3.0.15/519b3f1fc84d3387c61a8c8e6235daf89aea4168/plexus-utils-3.0.15.jar to http://repo1.maven.org/maven2/org/codehaus/plexus/plexus-utils/3.0.15/plexus-utils-3.0.15.jar and the files are identical.

The init.d script in question looks like this: http://pastebin.com/PHe3iDyx

I could also send you the complete debug output of the build, but it’s larger than 512 KB, so it doesn’t fit on Pastebin.

Sure, you can use sterling at gradle.com

Thanks for the additional debug output.

So, I think you’re running into a classloader issue. Gradle has a dependency on an older version of the plexus-utils library, which pollutes the classloader for the buildscript. When the webdav wagon tries to use a newer version, you get the CNF errors.
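If it shows up again before the cache gets wiped, you could confirm this by printing where the class actually comes from, e.g. by temporarily dropping something like this (just a diagnostic sketch) into the affected script:

// Report which jar, if any, the plexus-utils class is loaded from in this
// script's classloader.
try {
    def clazz = Class.forName('org.codehaus.plexus.util.IOUtil')
    println "IOUtil loaded from: ${clazz.protectionDomain?.codeSource?.location}"
    println "via classloader:    ${clazz.classLoader}"
} catch (ClassNotFoundException e) {
    println 'IOUtil is not visible from this classloader'
}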

We’ve been chipping away at removing some of these dependencies, but they’re wrapped up in the publishing plugins.

@Adrian_Kelly Anything you could suggest?

So this can’t really be related to the original issue…

What I really don’t get is why this happens sometimes, but not always, on our build server (Linux), yet so far it has never happened on my dev machine (OS X).

I created GRADLE-3555 for this issue.

Another question from my end: do you use different JDKs on the CI server vs. the developer machines?


Thanks!

Both should be running 1.8.0_102. I need to double-check that tomorrow, but I’m 99% sure.
The main difference is that the Jenkins server runs on Linux while the dev machines run Mac OS X. Oh, and the Jenkins server always executes without the daemon, for historical reasons. :wink:

Given that Gradle enables the daemon by default as of 3.0, is there still a compelling reason to disable it on Jenkins? I am wondering whether the issue is exclusively related to running Gradle in non-daemon mode.

It’s still running in no-daemon mode because the daemon used to be experimental. So we used the daemon on the dev machines and kept it disabled on the build server that produces our repo artifacts.

Regarding the build scan… I can’t do that for the in-house projects but could probably do it for my open source ones. Is it OK to just always produce a build scan for those projects, or would that end up being too spammy for you? The main problem is the irregularity of the failure. It’s a Heisenbuild with a Heisenbug.

I think it mostly happens while upgrading the Gradle wrapper version but that isn’t more than a vague feeling…

Tried with and without the daemon; it didn’t change anything. The build failed in both cases.

I added ~/.gradle/init.d/buildScan.gradle and tried to execute the build with scan, but to no avail.
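For completeness, buildScan.gradle is essentially the standard init-script recipe from the build scan docs, roughly this (the plugin version is just an example, and the exact DSL may differ between plugin versions):

initscript {
    repositories {
        maven { url 'https://plugins.gradle.org/m2' }
    }
    dependencies {
        // example version, not necessarily the one actually in use
        classpath 'com.gradle:build-scan-plugin:1.0'
    }
}

rootProject {
    apply plugin: com.gradle.scan.plugin.BuildScanPlugin

    buildScan {
        licenseAgreementUrl = 'https://gradle.com/terms-of-service'
        licenseAgree = 'yes'
    }
}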

The problem manifests itself while evaluating the scripts in ~/.gradle/init.d/, so before the scan actually has a chance to kick in at all. Consequently, no scan data is generated.

I then deleted the contents of ~/.gradle/caches. That fixed the build and allowed the scan to be performed, but the data won’t help us much since everything is fine again.

https://scans.gradle.com/s/nzzou643zp26g

Well, the only thing it did show was that the server is actually running 1.8.0_101 and not _102 as I suspected. :stuck_out_tongue:

The UploadNexus.gradle file that is causing the issue looks like this:

http://pastebin.com/dtka01ns

Please ignore how convoluted the file looks by now.

It’s like that because of endless fiddling related to this bug: changing the file causes it to be re-evaluated, which makes the issue go away temporarily, so the file has been changed again and again and again until it ended up looking like this.
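Stripped of all that fiddling, the relevant part is the usual mavenDeployer-plus-webdav-wagon setup, along these lines (a rough sketch only: the coordinates, version, URL and credentials below are placeholders, not the real values):

import org.gradle.api.plugins.JavaPlugin

allprojects {
    plugins.withType(JavaPlugin) {
        apply plugin: 'maven'

        configurations.maybeCreate('deployerJars')

        dependencies {
            // the webdav wagon pulls in plexus-utils transitively
            deployerJars 'org.apache.maven.wagon:wagon-webdav-jackrabbit:2.10'
        }

        uploadArchives {
            repositories {
                mavenDeployer {
                    configuration = configurations.deployerJars
                    repository(url: 'dav:https://nexus.example.com/content/repositories/releases') {
                        authentication(userName: 'deploy-user', password: 'not-the-real-one')
                    }
                }
            }
        }
    }
}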