Remote Build Cache gotchas


(Kyle Moore) #1

Hi team. Build caching is indeed pretty awesome.

Local caching is super easy to set up; however, it doesn’t really flush out all the build issues I’m seeing when I try out remote caching. I want to share one clarification and ask a question (about something I think might be a bug).

First, let me explain my environment. I’m testing a large build with ~1,400 tasks on two different machines. One is Linux with the en_US locale, where UTF-8 is the default charset. The other is a Windows 10 Japanese PC. Its default charset is “windows-31j” and the file.encoding system property is “MS932”, a.k.a. Shift-JIS; in other words, most definitely not UTF-8. Despite these environmental differences, all sources are UTF-8 with Unix-style line breaks. Finally, local caching is disabled on both machines, but both can push to the nginx Docker container suggested by Gradle, running on a different VM. Of the 1,400 tasks I mentioned, 168 are cacheable compilation tasks which I expect to be loaded from the remote cache.

Ultimately, the ‘classpath’ property hash of the compile spec differs between the two machines and I get cache misses. When I compare the SHAs of each JAR in the classpaths, I see that the locally built JARs representing other subproject dependencies have different hashes. This changes the overall ‘classpath’ hash in the compile spec and I’m forced to recompile, even though the sources that are the inputs to those JARs are identical.

So, the first speedbump I hit is related to CopySpec#filteringCharset (PR) and Windows. I want to repeat it here in hopes it helps someone else out:

  1. CopySpec#filteringCharset defaults to Charset#defaultCharset
  2. The Jar task is an AbstractCopyTask, which implements CopySpec
  3. AbstractCopyTask has two CopySpecInternal references, rootSpec and mainSpec

Even though the Jar task is not cacheable, instances of CopySpec show up as inputs to its cache key. Here’s what I see in my logs:

[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending taskClass to build cache key: org.gradle.api.tasks.bundling.Jar_Decorated
...
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$1.filteringCharset' to build cache key: d746f44d09fb58d2971f341d24d74c35
...
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$2$1.filteringCharset' to build cache key: d746f44d09fb58d2971f341d24d74c35
...
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$2.filteringCharset' to build cache key: d746f44d09fb58d2971f341d24d74c35
...
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$1' to build cache key: 5cd9402cddb4eab80587e698245670a2
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$2' to build cache key: d41d8cd98f00b204e9800998ecf8427e
[DEBUG] [o.g.c.i.t.DefaultTaskOutputCachingBuildCacheKeyBuilder] Appending inputPropertyHash for 'rootSpec$2$1' to build cache key: 2bb8f38c29360e36d2e7e00893dd165f
[INFO] [o.g.a.i.t.e.ResolveBuildCacheKeyExecuter] Cache key for task ':xxx:yyy:jar' is b44ea89ad1e475a2f38b774fba5533b6

It may be inconsequential, but on the Windows machine with Shift-JIS encoding, the filteringCharset hash was different. This may not have affected the bytes of the resulting JAR, but just to make sure I worked around this with a small bit of configuration:

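// Pin the filtering charset on every copy-type task (Jar included) so the
// input hash no longer depends on the platform's default charset.
tasks.withType(AbstractCopyTask) { AbstractCopyTask task ->
  rootSpec.filteringCharset = java.nio.charset.StandardCharsets.UTF_8.name()
  mainSpec.filteringCharset = java.nio.charset.StandardCharsets.UTF_8.name()
}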

This resolved the first problem: the inputs to the Jar task are now identical on Linux and Windows.

But now I have a second issue: the hashes of the resulting JARs do not match, which causes recompilation. When I explode the same JAR on both test machines and compare the folders, the contents are identical. But, I guess as a byproduct of the zip algorithm (?), the JAR hashes are unequal and I have cache misses. Am I doing something wrong or is this an unanticipated use case?

I’m happy to share more detail if necessary. Thanks in advance.

Kyle

(Sterling Greene) #2

Hi @DPUkyle

The jars on the classpath do not need to be byte-for-byte equivalent for the compilation cache keys to be the same, so it may be OK for the SHA/MD5 of a given jar to differ. For compile avoidance, we inspect the jars in a way that ignores file order (while respecting duplicates) and timestamps. There is a known issue with 3.5 that could cause cache misses, but this should only affect compilation tasks with annotation processors.

Have you tried setting file.encoding=utf8 in gradlew on the Windows build to see if that makes a difference?
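
A quick way to check which charset the Gradle JVM is actually running with, before and after changing file.encoding, is to print it from the build script. This is only a minimal sketch; the task name printCharset is made up for illustration and is not part of Gradle:

```groovy
import java.nio.charset.Charset

// Hypothetical helper task: reports the charset settings of the JVM
// that executes the build.
task printCharset {
    doLast {
        println "file.encoding  = ${System.getProperty('file.encoding')}"
        println "defaultCharset = ${Charset.defaultCharset().name()}"
    }
}
```

Running `gradlew printCharset` on both machines should show whether the two builds see the same default charset.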

We use this init script to publish custom values to our dogfooding instance of Gradle Enterprise to diagnose cache misses for particular tasks. The plan is to integrate this into Gradle proper so it’s easier to use out of the box. You should be able to use that init script as a base to get some more info about the individual input hashes that go into producing the build cache key.


(Kyle Moore) #3

Thanks @sterling, you’re right.

I tried to isolate the issue using this simple project, but I was not able to reproduce it there. So now I’m back to the drawing board to figure out why I’m getting cache misses.

At the very least, my use case is validated so that’s good. Thanks again; I’ll be in touch if I find anything worthwhile.


(Sterling Greene) #4

@DPUkyle I was assuming you were talking about JavaCompile tasks. If you’re using something else (like a custom task), you need to be sure that the classpath property is using @Classpath or else the jars will need to be byte-for-byte identical.
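
For a custom task, the annotation would go on the classpath property roughly as in the sketch below. The task class MyCodeGen, its outputDir property, and its placeholder action are hypothetical and exist only to show where @Classpath sits:

```groovy
import org.gradle.api.DefaultTask
import org.gradle.api.file.FileCollection
import org.gradle.api.tasks.CacheableTask
import org.gradle.api.tasks.Classpath
import org.gradle.api.tasks.OutputDirectory
import org.gradle.api.tasks.TaskAction

// Hypothetical custom task, for illustration only.
@CacheableTask
class MyCodeGen extends DefaultTask {
    // With @Classpath the jars are treated as a classpath rather than as
    // plain input files, so (as described in this thread) they do not
    // have to be byte-for-byte identical.
    @Classpath
    FileCollection classpath

    @OutputDirectory
    File outputDir

    @TaskAction
    void generate() {
        // Placeholder action: record the names of the classpath entries.
        new File(outputDir, 'classpath.txt').text = classpath.files*.name.join('\n')
    }
}
```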

Assuming JavaCompile, if compilation isn’t being cached and the classpath is the property that’s changing, that usually points to something that changes the class files (or a bug in our code).


(Jamie Cooper) #5

Having far too much experience with comparing Jars, I can state 100% that 2 jars will NEVER have exactly the same SHA/MD5 values, unless they are identical copies of the same Jar.

This is because the Zip format includes timestamps in the metadata. Even if you just run the ‘jar’ command from the command line twice in a row (unless your jar is so trivial you can get multiple runs in in under a second) the jars will be slightly different because of those timestamps.
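
The effect is easy to see with two in-memory zips that differ only in the entry timestamp. The following Groovy script is a small sketch using only java.util.zip and java.security; the helper names zipWith and md5 are made up for this example:

```groovy
import java.security.MessageDigest
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream

// Build a one-entry zip in memory with an explicit entry timestamp.
byte[] zipWith(long entryTimeMillis) {
    def bytes = new ByteArrayOutputStream()
    def zip = new ZipOutputStream(bytes)
    def entry = new ZipEntry('hello.txt')
    entry.time = entryTimeMillis
    zip.putNextEntry(entry)
    zip.write('same content'.getBytes('UTF-8'))
    zip.closeEntry()
    zip.close()
    bytes.toByteArray()
}

// Hex-encoded MD5 of a byte array.
String md5(byte[] data) {
    MessageDigest.getInstance('MD5').digest(data).encodeHex().toString()
}

def first  = zipWith(1_000_000_000_000L)
def second = zipWith(1_000_000_010_000L)   // same content, ten seconds later

// Identical entry name and bytes, but the archives hash differently.
assert md5(first) != md5(second)
```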

We went so far as to write a custom comparison tool that takes things like that into account when trying to determine whether the contents of two Jars were the same.


(Kyle Moore) #6

Thanks @JamieC, you’re absolutely right.

The @Classpath annotation on the JavaCompile task’s getClasspath() is well documented (table 19.2 here). Even so, my understanding WRT inter-project dependencies has evolved to something akin to this:

For external dependencies, any change in the JAR’s SHA/MD5 will cause a cache miss and subsequent recompilation.
For inter-project dependencies (dependencies on other modules in the same build), any change in the hashes of all the inputs to the JAR, not the JAR itself, will cause a cache miss and subsequent recompilation.

This last part is important to illustrate your point. This is also a key reason, I’m sure, why the Jar task is not cacheable.

Finally, the taskCacheBuildScanUserData.gradle script Sterling pointed out was helpful; if nothing else, it improves the signal-to-noise ratio when looking at caching logging, which you can otherwise only get via the full debug logs.

Thanks again,

Kyle


(Sterling Greene) #7

@Classpath and @CompileClasspath operate the same with external dependencies and project dependencies (directories or jars). For @CompileClasspath, an ABI change needs to be introduced to make the JavaCompile task out-of-date. For @Classpath, it’s a little simpler: classes or resources just need to have a different MD5 hash. We don’t do anything more advanced yet.

WRT creating identical jars, we introduced reproducible archives in 3.4. These settings strip timestamps and sort the contents of jars. They are not enabled by default.
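
As a sketch of how those settings can be switched on for every archive task in a build (assuming Gradle 3.4 or later, where preserveFileTimestamps and reproducibleFileOrder are available on archive tasks):

```groovy
// Apply the reproducible-archive settings to every archive task,
// Jar included.
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false   // do not record per-entry file timestamps
    reproducibleFileOrder = true     // write entries in a stable order
}
```

With both settings applied, jars built from identical inputs are intended to come out the same regardless of when or in what file-system order they were assembled.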