Why are Tar/Jar tasks not cacheable by default?

Hi,

I am testing the local Gradle build cache, and the report shows that in our case only compileJava tasks are cached. I would also expect the Jar and Tar tasks to be cached by default.

Why are Jar and Tar tasks not cached by default? That would save a lot of time.

Thanks

The Jar and Tar tasks are simple copy-like tasks, limited mostly by disk I/O. You don’t save the disk I/O by using the cache, but add additional overhead.

Caching was removed from these tasks as doing the work is typically faster than cache retrieval and overhead, even locally. If you use the remote cache, the difference becomes more pronounced.

A build scan or a profile of the build will show that these tasks execute faster than retrieving similarly sized items from the cache in almost every case. The few that don’t are negligibly slower (normally single-digit milliseconds for something that already takes under 1/10 of a second).


That is not always true.

  1. Imagine that you don’t change the Java sources. The sources will then not be recompiled; the compiled classes are reused from the cache. In that case the Jar task will literally just:

    • repack all the cached files (many IO operations),

    • add ZIP compression on top of it (CPU operations)

      I don’t think that skipping the cache for the resulting jar files is faster in this case.

  2. Another use case is packing almost the same files over and over again as part of the build. Because the build runs on a CI server, the clean task is always used, which destroys the previously created archive:

     task collectFileNames {
         ext {
             collectedSoundFiles = []
         }
    
         ...
         doLast {
             collectedSoundFiles.add(wantedFile)
         }
     }
    
     task createTarball(type: Tar, dependsOn: collectFileNames) {
         from collectFileNames.collectedSoundFiles
         compression Compression.GZIP
         ...
     }
    

    In our case we are packing a lot of small binary files, which takes about 5s of execution time.

  3. Even more obvious cases where your assumptions do not hold arise when somebody creates an “uberjar” by repacking the content of configurations.*, flattening it into a single file, e.g.:

task uberJar(type: Jar) {
    dependsOn project(':a').configurations.runtimeClasspath
    dependsOn project(':b').configurations.runtimeClasspath
    dependsOn project(':c').configurations.runtimeClasspath

    dependsOn project(':d').tasks.jar

    from {
        def projectDependencies = (project(':a').configurations.runtimeClasspath +
                                   project(':b').configurations.runtimeClasspath +
                                   project(':c').configurations.runtimeClasspath +
                                   files(project(':d').tasks.jar) - files("e.jar"))

        projectDependencies.getFiles().collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
    ...
}

This is real code from our code base. In this case the potential improvement from caching the Jar task is huge.

Would it be possible to conditionally allow Jar or Tar tasks to be cached?

By default they should not be, but an experienced Gradle user who knows what they are doing should still have the chance to enable caching when appropriate.

Yes, it is possible to use task.cacheIf { ... } to determine when to enable caching for the Jar task. Beware that some features (like renaming files, or eachFile {}) are not tracked as inputs as of Gradle 4.0, so changing them will not result in a different cache key.

Sure, but your original question seemed to focus on why they were not cached by default, not how you could cache them anyway. I wasn’t recommending that you should not evaluate your own situation. I was only telling you why they aren’t cacheable by default (and in this case, they even were at one time when the cache was completely experimental, but the core team felt strongly enough about this to explicitly remove that while hardening the feature).

Thanks Lorant.

Where does the method come from? It looks like it is not part of the public API. Also, is there any documentation of the mentioned method?

On Gradle 4.0 cacheIf does not work on Tar, Jar or Zip tasks. Am I doing something wrong?

SOLVED => task.outputs.cacheIf()
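
For anyone landing here later, a minimal sketch of what that looks like in a build script. The task name, the 'sounds' directory and the suffix value are made up for illustration; outputs.cacheIf and inputs.property are the actual calls. It assumes the build cache itself is enabled (--build-cache on the command line, or org.gradle.caching=true in gradle.properties). Since rename/eachFile logic was mentioned above as not being tracked as input as of Gradle 4.0, the sketch declares the value used by the rename closure as an explicit input property so that changing it changes the cache key.

```groovy
// build.gradle (sketch only; names and paths below are hypothetical)
apply plugin: 'base'   // gives archive tasks a default destination directory

// Hypothetical value used inside the rename closure further down.
def renameSuffix = '-v2'

task createTarball(type: Tar) {
    from 'sounds'                       // hypothetical input directory
    compression = Compression.GZIP

    // Rename logic is reported above as not tracked as an input in Gradle 4.0,
    // so declare the value it depends on explicitly.
    rename { String name -> name + renameSuffix }
    inputs.property('renameSuffix', renameSuffix)

    // Archive tasks are not cacheable by default; opt in for this task only.
    outputs.cacheIf { true }
}
```

Whether this actually pays off depends on the task; comparing the task's execution time against the cache load time in a build scan or profile report is probably the only reliable way to tell.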