Caching for custom plugins

simonbethke · January 8, 2019, 12:54pm

Currently I am implementing a custom plugin for code generation. At the same time, colleagues are implementing plugins for jar signing and icon composition. What all these plugins have in common is that they should make heavy use of caching.

Several questions arise regarding caching. The basic problem is that each of these plugins has to work on a set of input data and generate a set of output data. As I just learned, a single task should be executed only once in a build. Also, I cannot pass a map to Gradle that explains which input artifact produces which output artifact. All in all, it appears as if, for task implementation, all the nice caching features of Gradle don’t help with incremental changes.

I also have trouble understanding how a plugin/task can be reusable if it cannot follow the KISS principle. For example, a straightforward implementation of jar signing would be: one task execution signs one file. However, it is neither possible to run this task in a loop over multiple files, nor possible to cache multiple outputs for multiple inputs.
Also, with a simple input list, a change to one file results in the task being executed for all files again.
It looks as if the only solution Gradle provides here is incremental task inputs, but then I cannot make any use of output caching.

Right now for the code generation I am wondering how the caching works in a multi-project build. Imagine a build of the projects A, B, C and each of the projects executes the CodeGeneratorTask with different input parameters. Now would the execution of CodeGeneratorTask in project B already clear the cached output for its input parameters in project A? If so, it seems impossible to use output caching for tasks that are executed two times in two projects that are part of the same multi-project build.

simonbethke · January 8, 2019, 1:40pm

I think I have just learned a bit more about the caching, thanks to this video: https://www.youtube.com/watch?v=M1MqYE7bB90

I learned that Gradle output caching seems to be not about caching the output itself, but only a hash of it.
So on the one hand I need to leave the stale files in the output location (directory), but on the other hand the cache seems to be kept indefinitely, which I really like.

simonbethke · January 8, 2019, 2:34pm

Just a few tries later this is still not what I expected or am able to use here:

1. Run the task from the command line with templateA as input to generate outputA, which does not exist yet
2. Run the task from the command line with templateB as input to generate outputB, which does not exist yet
3. Run the task from the command line with templateA as input to generate outputA, which now exists

Expected
At run 3 it should see that the output is already up to date and not execute the task

Actual
At run 3 the task is executed again…

simonbethke · January 9, 2019, 8:46am

I think I have a set of very clear questions:

1. How many caches are there?
2. What does the cache contain?
3. When is the cache cleared?

For question 2, I sometimes think it contains complete outputs (even binary); other times I think it contains only key=value pairs of hashed(input)->hashed(output), so the cache can only tell whether stale output is still up to date.

For question 3, it depends on what the cache contains. If it contains whole output data, I think the cache is cleared with EVERY task execution whose input doesn’t match the previous one (that’s what I have observed already). If the cache contains key=value hashes, it could contain an infinite history. This is also what I read somewhere.

Chris_Dore · January 10, 2019, 5:28am

Before Gradle 4.1, you would have seen the behaviour you expected. I don’t know which change caused the difference in behaviour, and I don’t know whether it was intentional. I tested with the following task:

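task cacheTest {
    // Input and output paths are derived from -Ptemplate=<name> at configuration time
    File templateFile = file( project.property( 'template' ) )
    File outputFile = new File( buildDir, "${project.property( 'template' )}.compiled" )

    // Declare the input and the output so Gradle can run its up-to-date check
    inputs.file( templateFile )
    outputs.file( outputFile )

    doLast {
        logger.lifecycle( "Compiling template ${templateFile} into ${outputFile}" )
        outputFile.parentFile.mkdirs()
        // "Compiling" is simply a copy of the template's text
        outputFile.text = templateFile.text
    }
}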

If you didn’t already know, running Gradle with -i will give you more information about why a task is considered not UP-TO-DATE.

The output of my test:

--------------------4.0.2--------------------
$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.001 secs.
Executing task ':cacheTest' (up-to-date check took 0.004 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.011 secs.

$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template2 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Executing task ':cacheTest' (up-to-date check took 0.003 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template2 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.011 secs.

$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Skipping task ':cacheTest' as it is up-to-date (took 0.003 secs).
:cacheTest UP-TO-DATE
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.003 secs.

--------------------4.1--------------------
$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker for ':',5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.003 secs.
Executing task ':cacheTest' (up-to-date check took 0.006 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker for ':',5,main]) completed. Took 0.017 secs.

$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template2 -i
:cacheTest (Thread[Task worker for ':',5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Executing task ':cacheTest' (up-to-date check took 0.002 secs) due to:
  Output property '$1' file C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled has been removed.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template2 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled
:cacheTest (Thread[Task worker for ':',5,main]) completed. Took 0.008 secs.

$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker for ':' Thread 2,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.001 secs.
Executing task ':cacheTest' (up-to-date check took 0.002 secs) due to:
  Output property '$1' file C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled has been removed.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker for ':' Thread 2,5,main]) completed. Took 0.006 secs.

simonbethke · January 10, 2019, 9:32am

Hi @Chris_Dore,
thanks for this explanation! It already helps me to see that I am not entirely misunderstanding this.
Your example summarizes the task I am doing very well.

Do you suggest filing this as a bug report, or is there somebody who might know why this changed?

Stefan_Wolf · January 10, 2019, 1:14pm

Hi Simon,

The change was that we no longer try to match up the previous output directories to the current ones when the task re-executes. A task only has one set of output files and one last execution.
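
The "one last execution" rule can be pictured with a toy model. This is only a sketch under simplifying assumptions, not Gradle's implementation: the class `LastExecutionModel` and its `run` method are hypothetical names, and the snapshot compares file names only, whereas Gradle fingerprints file contents.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Gradle code): each task path remembers only its most recent execution. */
class LastExecutionModel {
    // task path -> "input|output" of the most recent execution; older history is not kept
    private final Map<String, String> lastExecution = new HashMap<>();

    /** Returns "UP-TO-DATE" if this run matches the previous one, otherwise "EXECUTED". */
    String run(String taskPath, String template) {
        String output = "build/" + template + ".compiled";
        String snapshot = template + "|" + output;
        if (snapshot.equals(lastExecution.get(taskPath))) {
            return "UP-TO-DATE";
        }
        // The new record replaces the previous one for this task path.
        lastExecution.put(taskPath, snapshot);
        return "EXECUTED";
    }

    public static void main(String[] args) {
        LastExecutionModel model = new LastExecutionModel();
        String[] templates = {"template1", "template2", "template1", "template1"};
        for (String template : templates) {
            System.out.println(template + " -> " + model.run(":cacheTest", template));
        }
    }
}
```

In this model the third run with template1 executes again, because the record for template1 was replaced by the template2 run; only an immediate repeat is reported as up to date. That matches the 4.1 log above.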

Regarding the questions about caching of task outputs:

incremental build: if the inputs and outputs of the task didn’t change since the last execution, don’t do anything. This is documented here: https://docs.gradle.org/nightly/userguide/more_about_tasks.html#sec:up_to_date_checks
task output caching: if the inputs are the same as those of a task execution that was stored in the build cache, then the task output is re-used from the build cache. This is documented here and here.
incremental tasks: only re-process the inputs that changed. This is documented here
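
To illustrate the second item, here is a toy model of a cache keyed by a hash of the inputs that stores the outputs themselves. It is a sketch, not Gradle's build cache: `BuildCacheModel`, `cacheKey` and `generate` are hypothetical names, and upper-casing the input stands in for real work. With Gradle itself, the task additionally has to be marked cacheable (for example with `@CacheableTask`) and the build cache has to be enabled (for example with `--build-cache`).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy model (not Gradle code): outputs are stored under a key derived from the inputs. */
class BuildCacheModel {
    // cache key -> stored output content (the output itself, not just a hash of it)
    private final Map<String, String> store = new HashMap<>();
    private int executions = 0;

    /** Hashes the task type and the input content into a hex-encoded SHA-256 key. */
    static String cacheKey(String taskType, String inputContent) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(taskType.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0); // separator between the two parts
            digest.update(inputContent.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Returns the output for the given input, doing the work only on a cache miss. */
    String generate(String taskType, String inputContent) {
        String key = cacheKey(taskType, inputContent);
        String cached = store.get(key);
        if (cached != null) {
            return cached; // cache hit: the stored output is re-used
        }
        executions++;
        String output = inputContent.toUpperCase(Locale.ROOT); // stand-in for real work
        store.put(key, output);
        return output;
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        BuildCacheModel cache = new BuildCacheModel();
        cache.generate("CodeGeneratorTask", "templateA");
        cache.generate("CodeGeneratorTask", "templateB");
        cache.generate("CodeGeneratorTask", "templateA"); // served from the cache
        System.out.println("executions: " + cache.executions());
    }
}
```

Unlike the single "last execution" record of the up-to-date check, an entry in this model is not replaced when a different input is processed, so the A, B, A sequence does the work only twice.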

Incremental tasks work in combination with the build cache and incremental build: only the files that actually changed are processed. Currently, incremental build and the build cache only work at the task level. That means you should probably use all three features together for optimized performance.
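
The "only process the files that changed" idea can be sketched by comparing two snapshots of file name to content hash. This is a toy model with hypothetical names (`IncrementalInputsModel`, `changed`, `removed`) and made-up hash strings; a real incremental task gets this information from Gradle rather than computing it itself.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (not Gradle code): derive the work list from two input snapshots. */
class IncrementalInputsModel {
    /** Inputs that were added or modified since the previous snapshot. */
    static Set<String> changed(Map<String, String> previous, Map<String, String> current) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            if (!entry.getValue().equals(previous.get(entry.getKey()))) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Inputs that existed in the previous snapshot but are gone now. */
    static Set<String> removed(Map<String, String> previous, Map<String, String> current) {
        Set<String> result = new TreeSet<>(previous.keySet());
        result.removeAll(current.keySet());
        return result;
    }

    public static void main(String[] args) {
        // File name -> content hash; the hash values are placeholders.
        Map<String, String> previous = Map.of("a.jar", "h1", "b.jar", "h2", "old.jar", "h9");
        Map<String, String> current = Map.of("a.jar", "h1", "b.jar", "h2-modified", "c.jar", "h3");
        System.out.println("re-sign: " + changed(previous, current));
        System.out.println("delete stale output for: " + removed(previous, current));
    }
}
```

For the jar-signing case this means one task instance can take the whole set of jars as input and still sign only the ones that differ from the previous run.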

Cheers,
Stefan

simonbethke · January 11, 2019, 12:19pm

Hi @Stefan_Wolf,

thanks for this explanation. Most of the linked documentation I had already read multiple times; only the one about caching in the network was new to me.

How would I ideally solve this scenario: I have two projects that belong to the same multi-project build but can also be built separately. Each makes use of the same task, which generates some project-specific output from project-specific input.
Running the multi-project build twice would not make use of any of the Gradle cache, because the task input always changes between the sub-projects.

Cheers,
Simon

Stefan_Wolf · January 11, 2019, 1:06pm

Why don’t you add two tasks, one for each project? That is probably what your plugin should do when applied to the corresponding project. Or am I missing something?

simonbethke · January 11, 2019, 1:25pm

The task definition is a plugin implemented in Java. Somehow I have the impression that I don’t understand something fundamental that is totally clear to you and others.

I understand that I have one task implementation in the plugin and call this task once in each project. My impression is that the cache works per task implementation and not per task call or per sub-project.

avonengel · January 11, 2019, 1:54pm

I think you’re confused about what is considered a task. It can either mean the implementation class (e.g. CodeGeneratorTask extends DefaultTask {...} in CodeGeneratorTask.java) or an instance of that task in a specific project (e.g. task codeGenerator(type: CodeGeneratorTask) {...} in build.gradle). I think the important point you’re missing is that Gradle considers the same task defined in both of your subprojects to be different tasks:

:rootProject
- :subProjectA
    - :subProjectA:yourTask
- :subProjectB
    - :subProjectB:yourTask
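
A toy model of why the two instances do not interfere, assuming the execution history is keyed by the full task path. `TaskPathModel`, `taskPath` and `isUpToDate` are hypothetical names for illustration and not Gradle API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Gradle code): history is tracked per task path, not per task class. */
class TaskPathModel {
    // full task path -> input used by that task's most recent execution
    private final Map<String, String> lastInputByTaskPath = new HashMap<>();

    /** Builds a task path such as ":subProjectA:yourTask"; the root project's path is ":". */
    static String taskPath(String projectPath, String taskName) {
        return projectPath.equals(":") ? ":" + taskName : projectPath + ":" + taskName;
    }

    /** Records this run and reports whether the same task path last ran with the same input. */
    boolean isUpToDate(String projectPath, String taskName, String input) {
        String previous = lastInputByTaskPath.put(taskPath(projectPath, taskName), input);
        return input.equals(previous);
    }

    public static void main(String[] args) {
        TaskPathModel build = new TaskPathModel();
        for (int run = 1; run <= 2; run++) {
            System.out.println("run " + run + " A up to date: "
                    + build.isUpToDate(":subProjectA", "yourTask", "inputA"));
            System.out.println("run " + run + " B up to date: "
                    + build.isUpToDate(":subProjectB", "yourTask", "inputB"));
        }
    }
}
```

In the second run both tasks are reported as up to date, even though they share one implementation class and use different inputs, because each path keeps its own record.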

simonbethke · January 25, 2019, 8:39am

@avonengel yea, you were right! Thanks for the clarification. After your post I learned that I could simply create a ‘plugin without plugin’. Just a project with a new Task implementation that does not already apply the task to a project. No need for a plugin- or extension-class.
