Ability to short-circuit dependencies with build cache


(Bob Bell) #1

For starters, let me state that this is likely an enhancement request, and I’m not sure how it would be achieved given Gradle’s approach. But I would like to share the idea. And if there’s a better way to share the idea, please let me know.

If build artifacts can be retrieved via the Gradle build cache, it would be interesting to see if there’s a way to “short-circuit” the process and limit execution to a minimal set of tasks.

Consider the following long chain of dependent tasks, where each task depends on the one to its right: A -> B -> C -> D -> E -> … -> X -> Y -> Z

Suppose all tasks are cacheable, and I execute all tasks, but then clean the local outputs. Then I instruct Gradle to run task “A” again. The expectation is that all tasks would have cache hits.

In the current Gradle model, Gradle will pull the output of Z from the cache, figure out that Y's output is also in the cache, pull the output of Y from the cache, then X, then … all the way up to A.

In the end, Gradle will have pulled dozens of outputs from the cache. If all the outputs are small and the cache is local, it’s not such a big deal. But if the outputs are large and/or the cache is over a slower link, then this amounts to downloading and storing a lot of outputs, just to get the one output that we really want. That’s hopefully still better than actually executing all of the tasks, but could there be a more efficient way?
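To make the cost concrete, here is a toy Python model of the behavior described above. It is only a sketch and not Gradle's actual implementation: the `digest` and `build_chain` names are made up for illustration, the cache is a plain dictionary, and a task's cache key is assumed to be a hash of the task name plus its upstream task's output.

```python
import hashlib


def digest(*parts: str) -> str:
    """Hash a sequence of strings into one hex digest (stand-in for a cache key)."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


# Chain A -> B -> ... -> Z: each task depends on the next letter; Z depends on nothing.
TASKS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]


def build_chain(cache: dict) -> tuple:
    """Resolve every task from Z up to A.

    Returns (local outputs, number of outputs pulled from the cache).
    """
    outputs = {}
    downloads = 0
    upstream_output = ""
    for task in reversed(TASKS):
        # Assumption: the key depends on the full upstream output,
        # so that output must be present locally before the key can be computed.
        key = digest(task, upstream_output)
        if key in cache:
            output = cache[key]
            downloads += 1  # cache hit: the whole output is pulled
        else:
            output = f"output-of-{task}({digest(upstream_output)[:8]})"  # "execute" the task
            cache[key] = output
        outputs[task] = output
        upstream_output = output
    return outputs, downloads


cache = {}
first_outputs, first_downloads = build_chain(cache)    # cold cache: everything executes
second_outputs, second_downloads = build_chain(cache)  # after a "clean": everything is a hit
```

In this model the second run executes nothing, but it still pulls all 26 outputs just to end up with the output of A.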

That is, wouldn’t it be cool if Gradle could avoid actually retrieving the outputs of most of the tasks (tasks B-Z), and merely take them into account in the computations of the build cache key, and instead only retrieve the output of the task that we really want?

I recognize that this might not make sense as a default mode of operation, but as something optional, I think it could be great. I have personal use cases in mind, of course. :slight_smile:

As mentioned, I’m not sure how this would be achieved. My gut tells me it would be like some sort of “lazy instantiation”, where only the checksum of the cached output was retrieved, but the artifact itself wasn’t actually retrieved and put into place until it was needed by a non-cached task or by a “targeted” task. But I wouldn’t want to dictate implementation!
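A rough Python sketch of that "lazy" idea follows, purely to illustrate the shape of it. Everything here is hypothetical: `RemoteCache`, `lookup_hash`, `fetch_payload` and `resolve_lazily` are invented names, not Gradle APIs, and the sketch assumes a cache entry stores a checksum of its output alongside the output itself, and that a task's cache key can be computed from the upstream checksum alone.

```python
import hashlib
from dataclasses import dataclass


def digest(*parts: str) -> str:
    """Hash a sequence of strings into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")
    return h.hexdigest()


@dataclass
class CacheEntry:
    output_hash: str  # small: cheap to retrieve
    payload: str      # potentially large: expensive to retrieve


class RemoteCache:
    """Hypothetical cache that can answer 'what is the checksum?' without sending the payload."""

    def __init__(self):
        self._entries = {}
        self.hash_lookups = 0
        self.payload_fetches = 0

    def store(self, key, payload):
        self._entries[key] = CacheEntry(digest(payload), payload)

    def lookup_hash(self, key):
        self.hash_lookups += 1
        entry = self._entries.get(key)
        return entry.output_hash if entry else None

    def fetch_payload(self, key):
        self.payload_fetches += 1
        return self._entries[key].payload


def populate(cache, tasks):
    """Simulate the initial full build: every task executes and stores its output."""
    upstream_hash = ""
    for task in reversed(tasks):
        payload = f"output-of-{task}"
        cache.store(digest(task, upstream_hash), payload)
        upstream_hash = digest(payload)


def resolve_lazily(cache, tasks, target):
    """Walk the chain from the last task up to `target`, fetching only checksums.

    Only the target's payload is retrieved. Returns None on a cache miss; a real
    implementation would then have to materialize inputs and execute the task.
    """
    upstream_hash = ""
    for task in reversed(tasks):
        key = digest(task, upstream_hash)
        output_hash = cache.lookup_hash(key)
        if output_hash is None:
            return None
        if task == target:
            return cache.fetch_payload(key)
        upstream_hash = output_hash
    raise ValueError(f"unknown task: {target}")


tasks = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
cache = RemoteCache()
populate(cache, tasks)
result = resolve_lazily(cache, tasks, "A")
```

Here resolving A costs 26 small checksum lookups but only one payload download. The sketch deliberately leaves out the hard part, which is what happens when a task in the middle of the chain misses the cache and suddenly needs upstream outputs that were never put in place.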

If there’s a more appropriate forum to discuss this idea, please let me know.