How fine-grained is the build cache?

I’m building Docker images via a custom plugin. To make the process cacheable, the output of DockerBuildImageTask is a tar file (exported via docker image save) that includes content-addressed binary blobs of the image’s filesystem.
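To see how much of that traffic is duplicated, you can compare the blobs inside two exported images. Below is a minimal Python sketch. It assumes the OCI-style layout that recent Docker versions write with `docker image save`, where every blob (layers, but also the image config and manifests) is stored under `blobs/sha256/<digest>`. Older Docker versions use a different layout. The function names are hypothetical.

```python
import hashlib
import tarfile

# Assumption: OCI-style layout as written by recent `docker image save`,
# where every blob lives at blobs/sha256/<hex digest>.
BLOB_PREFIX = "blobs/sha256/"


def _sha256(fileobj) -> str:
    """Hash a file object in 1 MiB chunks so large layers don't fill memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(1 << 20), b""):
        h.update(chunk)
    return h.hexdigest()


def blob_digests(tar_path: str) -> dict[str, int]:
    """Map each blob's SHA-256 digest to its size, checking that the file
    name really is the hash of its content (i.e. it is content-addressed)."""
    digests: dict[str, int] = {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.startswith(BLOB_PREFIX)):
                continue
            expected = member.name[len(BLOB_PREFIX):]
            if _sha256(tar.extractfile(member)) != expected:
                raise ValueError(f"{member.name}: content does not match its digest")
            digests[expected] = member.size
    return digests


def shared_blobs(tar_a: str, tar_b: str) -> dict[str, int]:
    """Blobs present in both archives: bytes that get uploaded and stored
    twice when each whole archive is a single cache entry."""
    a, b = blob_digests(tar_a), blob_digests(tar_b)
    return {digest: size for digest, size in a.items() if digest in b}
```

For example, calling `shared_blobs("service-a.tar", "service-b.tar")` (hypothetical file names) returns the digests and sizes of the blobs both images have in common, typically the base layers.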

These tar files take up a lot of space and bandwidth. However, many of the binary blobs inside them (for example, the base layers shared by the several images produced in one build) reappear across different subprojects. I’m wondering how to cut down on bandwidth and disk usage while still leveraging the build cache.

Is there a way to make Gradle treat the output as a tarTree? Alternatively, if I unpack and repack the tar archives around the Gradle-visible boundary (one task produces a directory output, and a downstream task consumes it and repacks the tar), will Gradle cache the individual files in that output directory? Or will it bundle them up anyway as the outputs of a single task?
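If you try the unpack/repack route, the repack step should at least be reproducible, so that the same directory contents always give a byte-identical tar. Below is a minimal sketch of the two steps in plain Python, outside of Gradle. The function names are hypothetical, and whether `docker load` accepts an archive rebuilt this way is an assumption worth checking.

```python
import os
import tarfile


def unpack(tar_path: str, dest_dir: str) -> None:
    """Expand the image archive into a plain directory (the proposed task output)."""
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and entries escaping dest_dir.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop metadata that varies between machines and runs."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Keep only "executable or not", independent of the local umask.
    info.mode = 0o755 if info.isdir() or info.mode & 0o100 else 0o644
    return info


def repack(src_dir: str, tar_path: str) -> None:
    """Rebuild an archive from src_dir with sorted entries and normalized
    metadata, so identical directory contents yield byte-identical tars."""
    rel_paths = []
    for root, dirs, files in os.walk(src_dir):
        for name in dirs + files:
            rel_paths.append(os.path.relpath(os.path.join(root, name), src_dir))
    with tarfile.open(tar_path, "w") as tar:
        for rel in sorted(rel_paths):
            tar.add(os.path.join(src_dir, rel), arcname=rel,
                    recursive=False, filter=_normalize)
```

Sorting the entries and zeroing timestamps and ownership is what makes the output depend only on file names, contents and the executable bit.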

My concern is not for local disk usage but rather for the disk of a shared HTTP build cache and for bandwidth. CI jobs talk to the HTTP build cache with no local cache of their own, so I’d like to minimise that impact and only pull every cached Docker image layer once.

The cache takes all outputs of a task, packs them into one archive, and stores that archive as the value, with a hash of the task’s input fingerprints as the key.
If the task’s output is itself an archive, that archive is packed into another archive.
That’s also why, for example, Jar is not a cacheable task by default: downloading the cache entry and unpacking it is usually slower than simply building the jar afresh.
So to get the separate layers as separate cache entries, each layer’s files would need to be the output of a dedicated task.
Because of that, I doubt you can get the kind of reuse you have in mind.
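The difference between one entry per image task and one entry per layer task can be illustrated with a toy model in Python. This is not Gradle’s implementation or entry format. The key is simply a hash of a task type plus input fingerprints, and it assumes, like Gradle’s key, that the task’s path is not part of it, so identical tasks in different subprojects map to the same entry. Task types, inputs and data are hypothetical.

```python
import hashlib
import io
import tarfile


def cache_key(task_type: str, inputs: dict[str, bytes]) -> str:
    """Toy key: hash of the task type and a fingerprint of every input.
    Assumption: no task path involved, so equal tasks share a key."""
    h = hashlib.sha256(task_type.encode())
    for name in sorted(inputs):
        h.update(name.encode())
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


def pack_outputs(outputs: dict[str, bytes]) -> bytes:
    """Pack ALL outputs of one task into a single archive: one cache entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(outputs):
            info = tarfile.TarInfo(name)
            info.size = len(outputs[name])
            tar.addfile(info, io.BytesIO(outputs[name]))
    return buf.getvalue()


class ToyCache:
    """Remote cache model: key -> packed entry; a hit stores nothing new."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}

    def store(self, key: str, outputs: dict[str, bytes]) -> None:
        if key not in self.entries:
            self.entries[key] = pack_outputs(outputs)

    def stored_bytes(self) -> int:
        return sum(len(entry) for entry in self.entries.values())


def scenario(base: bytes = b"x" * 100_000) -> tuple[ToyCache, ToyCache]:
    """Two subprojects whose images share one base layer (hypothetical data)."""
    apps = {"a": b"a" * 1_000, "b": b"b" * 1_000}

    # 1) One task per image: its output is a whole image tar (tar inside tar).
    per_image = ToyCache()
    for project, app in apps.items():
        image_tar = pack_outputs({"blobs/base": base, "blobs/app": app})
        per_image.store(cache_key("BuildImage", {"context": project.encode()}),
                        {"image.tar": image_tar})

    # 2) One task per layer: the base layer task has identical inputs in both
    #    subprojects, so both resolve to the same key and a single entry.
    per_layer = ToyCache()
    for project, app in apps.items():
        per_layer.store(cache_key("BuildLayer", {"layer": b"base"}), {"layer.tar": base})
        per_layer.store(cache_key("BuildLayer", {"layer": project.encode()}), {"layer.tar": app})
    return per_image, per_layer
```

In `scenario()`, the per-image cache ends up with two entries that each contain the base layer, while the per-layer cache stores the base layer once, next to two small application-layer entries.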

What a shame. Thanks for the explanation @Vampire.

You could try requesting it as a feature, of course.
Maybe the Gradle folks see the value in it.
Not as a general feature, because if you compile 10,000 files and then cache each one individually in the remote build cache, the network overhead would be awful.
But maybe in a form where a task can control it.
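A back-of-the-envelope Python model of why entry count matters: every cache request pays at least one round trip, while the payload time stays the same however the bytes are split. The latency, bandwidth and sizes below are illustrative assumptions, not measurements of any real cache.

```python
import math


def pull_seconds(n_entries: int, total_bytes: int, rtt_s: float,
                 bandwidth_bytes_per_s: float, parallel: int = 1) -> float:
    """Rough time to download total_bytes split across n_entries cache entries.

    Each request pays one round trip; with `parallel` concurrent requests the
    round trips overlap in waves. Bandwidth is treated as shared, so payload
    time does not depend on how the bytes are split.
    """
    waves = math.ceil(n_entries / parallel)
    return waves * rtt_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical: 10,000 compiled files totalling 50 MB, 20 ms round trip, 100 MB/s.
one_entry = pull_seconds(1, 50_000_000, 0.020, 100_000_000)          # about 0.52 s
per_file = pull_seconds(10_000, 50_000_000, 0.020, 100_000_000)      # about 200.5 s
per_file_8x = pull_seconds(10_000, 50_000_000, 0.020, 100_000_000, parallel=8)  # about 25.5 s

# Hypothetical: 5 large image layers totalling 500 MB.
five_layers = pull_seconds(5, 500_000_000, 0.020, 100_000_000)       # about 5.1 s
```

With many small files the round trips dominate, whereas splitting a few large layers into separate entries costs almost nothing, which is why per-task control would fit the Docker case better than a global setting.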