Caching for custom plugins


(Simon Bethke) #1

Currently I am in the process of implementing a custom plugin for code generation. At the same time, colleagues are implementing plugins for jar signing and icon composition. What all these plugins have in common is that they should make heavy use of caching.

Now several questions arise regarding caching. The basic problem is that each of these plugins works on a set of input data and generates a set of output data. As I just learned, a single task should be called only once in a build. I also cannot pass Gradle a map that explains which input artifact produces which output artifact. All in all, it appears that, for task implementations, Gradle’s nice caching features don’t help with incremental changes.

I also have trouble understanding how a plugin/task can be reusable if it cannot follow the KISS principle. For example, a straightforward implementation of jar signing would be: one task execution signs one file. However, it is neither possible to run this task in a loop over multiple files, nor possible to cache multiple outputs for multiple inputs.
A simple input list, on the other hand, means that a change to one file causes the task to run again for all files.
It looks as if the only solution Gradle provides here is incremental task inputs, but then I cannot make any use of output caching.

Right now for the code generation I am wondering how the caching works in a multi-project build. Imagine a build of the projects A, B, C and each of the projects executes the CodeGeneratorTask with different input parameters. Now would the execution of CodeGeneratorTask in project B already clear the cached output for its input parameters in project A? If so, it seems impossible to use output caching for tasks that are executed two times in two projects that are part of the same multi-project build.


(Simon Bethke) #2

I think I have just learned a bit more about the caching, thanks to this video: https://www.youtube.com/watch?v=M1MqYE7bB90

I learned that Gradle output caching seems to be not about caching the output itself, but just a hash of it.
So on the one hand I need to leave the stale files in the output location (directory), but on the other hand the cache seems to be kept indefinitely, which I really like.


(Simon Bethke) #3

Just a few tries later this is still not what I expected or am able to use here:

  1. Run the task from the command line with templateA as input to generate outputA, which does not exist yet
  2. Run the task from the command line with templateB as input to generate outputB, which does not exist yet
  3. Run the task from the command line with templateA as input again; outputA already exists

Expected
At run 3 it should see that the output is already up to date and not execute the task

Actual
At run 3 the task is executed again…

:frowning:


(Simon Bethke) #4

I think I have a set of very clear questions:

  1. How many caches are there?
  2. What does the cache contain?
  3. When is the cache cleared?

For question 2: sometimes I think it contains complete outputs (even binary ones); other times I think it contains only key=value pairs of the form hashed(input)->hashed(output), so the cache can only help to tell whether stale output is still up to date.

For question 3: it depends on what the cache contains. If it contains whole output data, I think the cache is cleared on EVERY task execution whose input doesn’t match the previous one (that’s what I have observed already). If the cache contains key=value hashes, it could hold an unbounded history. This is also what I read somewhere.


(Chris Doré) #5

Before Gradle 4.1, you would have seen the behaviour you expected. I don’t know which change caused the difference, and I don’t know whether it was intentional. I tested with the following task:

task cacheTest {
    File templateFile = file( project.property( 'template' ) )
    File outputFile = new File( buildDir, "${project.property( 'template' )}.compiled" )

    inputs.file( templateFile )
    outputs.file( outputFile )

    doLast {
        logger.lifecycle( "Compiling template ${templateFile} into ${outputFile}" )
        outputFile.parentFile.mkdirs()
        outputFile.text = templateFile.text
    }
}

If you didn’t already know, running Gradle with -i will give you more information about why a task is considered not UP-TO-DATE.

The output of my test:

--------------------4.0.2--------------------
$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.001 secs.
Executing task ':cacheTest' (up-to-date check took 0.004 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.011 secs.

$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template2 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Executing task ':cacheTest' (up-to-date check took 0.003 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template2 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.011 secs.

$ ~/bin/gradle-4.0.2/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Skipping task ':cacheTest' as it is up-to-date (took 0.003 secs).
:cacheTest UP-TO-DATE
:cacheTest (Thread[Task worker,5,main]) completed. Took 0.003 secs.

--------------------4.1--------------------
$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker for ':',5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.003 secs.
Executing task ':cacheTest' (up-to-date check took 0.006 secs) due to:
  No history is available.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker for ':',5,main]) completed. Took 0.017 secs.

$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template2 -i
:cacheTest (Thread[Task worker for ':',5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.0 secs.
Executing task ':cacheTest' (up-to-date check took 0.002 secs) due to:
  Output property '$1' file C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled has been removed.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template2 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled
:cacheTest (Thread[Task worker for ':',5,main]) completed. Took 0.008 secs.

$ ~/bin/gradle-4.1/bin/gradle cacheTest -Ptemplate=template1 -i
:cacheTest (Thread[Task worker for ':' Thread 2,5,main]) started.
:cacheTest
Putting task artifact state for task ':cacheTest' into context took 0.001 secs.
Executing task ':cacheTest' (up-to-date check took 0.002 secs) due to:
  Output property '$1' file C:\Users\Chris Dore\Projects\taskOutputCaching\build\template2.compiled has been removed.
Compiling template C:\Users\Chris Dore\Projects\taskOutputCaching\template1 into C:\Users\Chris Dore\Projects\taskOutputCaching\build\template1.compiled
:cacheTest (Thread[Task worker for ':' Thread 2,5,main]) completed. Took 0.006 secs.

(Simon Bethke) #6

Hi @Chris_Dore,
thanks for this explanation! It already helps me to see that I am not entirely misunderstanding this.
Your example summarizes the task I am working on very well.

Would you suggest filing this as a bug report, or is there somebody who might know why this changed?


(Stefan Wolf) #7

Hi Simon,

the change was that we no longer try to match up the previous output directories to the current ones when the task re-executes. A task only has one set of output files and one last execution.
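
One way to work within that model is to give each template its own task, so that each task keeps its own set of outputs and its own last execution. The following is only a sketch based on the cacheTest example earlier in the thread: the template names and the task names compileTemplate1, compileTemplate2 and compileTemplates are assumptions for illustration, and tasks.register assumes Gradle 4.9 or newer.

```groovy
// build.gradle
// Hypothetical list of template files in the project directory.
def templates = ['template1', 'template2']

templates.each { String templateName ->
    // One task per template, e.g. compileTemplate1 and compileTemplate2.
    // Each task has exactly one input file and one output file,
    // so its up-to-date state is tracked separately from the others.
    tasks.register("compile${templateName.capitalize()}") {
        def templateFile = file(templateName)
        def outputFile = new File(buildDir, "${templateName}.compiled")

        inputs.file(templateFile)
        outputs.file(outputFile)

        doLast {
            logger.lifecycle("Compiling template ${templateFile} into ${outputFile}")
            outputFile.parentFile.mkdirs()
            outputFile.text = templateFile.text
        }
    }
}

// Convenience task that depends on all per-template tasks.
tasks.register('compileTemplates') {
    dependsOn templates.collect { "compile${it.capitalize()}" }
}
```

With this layout, running compileTemplate1, then compileTemplate2, then compileTemplate1 again should report the third run as UP-TO-DATE, because the second run belongs to a different task.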

Regarding the questions about caching of task outputs:

  • incremental build: if the inputs and outputs of the task didn’t change since the last execution, don’t do anything. This is documented here: https://docs.gradle.org/nightly/userguide/more_about_tasks.html#sec:up_to_date_checks
  • task output caching: if the inputs are the same as those of a task execution that was stored in the build cache, then re-use the task output from the build cache. This is documented here and here.
  • incremental tasks: only re-process the inputs that changed. This is documented here

Incremental tasks work in combination with the build cache and incremental build: they only process the files that actually changed. Currently, incremental build and the build cache only work on the task level. That means that you should probably use all three features together for optimized performance.
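
A minimal sketch of a task that combines all three features could look like the following. It is not taken from any plugin in this thread: the class name CompileTemplates, the directory names and the copy-as-compile step are assumptions, and it uses the InputChanges API, which only exists in Gradle versions newer than the 4.x releases discussed here.

```groovy
// build.gradle
import org.gradle.api.file.FileType
import org.gradle.work.ChangeType
import org.gradle.work.Incremental
import org.gradle.work.InputChanges

// @CacheableTask opts the task in to task output caching.
@CacheableTask
abstract class CompileTemplates extends DefaultTask {

    // @Incremental lets the task action ask which files changed.
    @Incremental
    @PathSensitive(PathSensitivity.RELATIVE)
    @InputDirectory
    abstract DirectoryProperty getTemplateDir()

    @OutputDirectory
    abstract DirectoryProperty getOutputDir()

    @TaskAction
    void compile(InputChanges changes) {
        changes.getFileChanges(templateDir).each { change ->
            if (change.fileType == FileType.DIRECTORY) return

            // One output file per input file, mirrored by relative path.
            def target = outputDir.file("${change.normalizedPath}.compiled").get().asFile
            if (change.changeType == ChangeType.REMOVED) {
                target.delete()
            } else {
                target.parentFile.mkdirs()
                // Placeholder for the real work (signing, generating, ...).
                target.text = change.file.text
            }
        }
    }
}

tasks.register('compileTemplates', CompileTemplates) {
    // Hypothetical locations; the templates directory must exist.
    templateDir.set(layout.projectDirectory.dir('templates'))
    outputDir.set(layout.buildDirectory.dir('compiled'))
}
```

The declared inputs and outputs give the up-to-date check, the annotation on the class makes the outputs eligible for the build cache (enabled with --build-cache on the command line or org.gradle.caching=true in gradle.properties), and the InputChanges parameter restricts the work to the files that changed.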

Cheers,
Stefan


(Simon Bethke) #8

Hi @Stefan_Wolf,

thanks for this explanation. I had already read most of the linked documentation multiple times. The only part that was new to me is the one about caching over the network.

How would I ideally solve this scenario: I have two projects that belong to the same multi-project build but can also be built separately. Each makes use of the same task, which generates project-specific output from project-specific input.
Running the multi-project build twice would not make use of any of the Gradle caching, because the task input always changes between the sub-projects.

Cheers,
Simon


(Stefan Wolf) #9

Why don’t you add two tasks, one for each project? That is probably what your plugin should do when applied to the corresponding project. Or am I missing something?
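
As a rough sketch of what that could look like, written in a root build.gradle for brevity: the plugin class CodeGeneratorPlugin, the task name generateCode, the directory layout and the copy-style generation step are all assumptions for illustration, and a recent Gradle version is assumed.

```groovy
// Root build.gradle

// Hypothetical task type: copies every template into the output directory.
abstract class CodeGeneratorTask extends DefaultTask {
    @InputDirectory
    @PathSensitive(PathSensitivity.RELATIVE)
    abstract DirectoryProperty getTemplateDir()

    @OutputDirectory
    abstract DirectoryProperty getOutputDir()

    @TaskAction
    void generate() {
        File outRoot = outputDir.get().asFile
        templateDir.asFileTree.each { File template ->
            // Placeholder for the real code generation.
            new File(outRoot, "${template.name}.generated").text = template.text
        }
    }
}

// Hypothetical plugin: registers one task in whichever project applies it.
class CodeGeneratorPlugin implements Plugin<Project> {
    void apply(Project project) {
        project.tasks.register('generateCode', CodeGeneratorTask) { task ->
            // Inputs and outputs are resolved relative to the applying project,
            // so each project gets its own task with its own files.
            // The src/templates directory must exist in each project.
            task.templateDir.set(project.layout.projectDirectory.dir('src/templates'))
            task.outputDir.set(project.layout.buildDirectory.dir('generated'))
        }
    }
}

subprojects {
    apply plugin: CodeGeneratorPlugin
}
```

Applied to two subprojects, this yields two separate tasks (one generateCode per project), and Gradle tracks the inputs, outputs and last execution of each one separately.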


(Simon Bethke) #10

The task definition is a plugin implemented in Java. Somehow I have the impression that I don’t understand something fundamental that is totally clear to you and others :confused:

I understand that I have one task implementation in the plugin and call this task once in each project. My impression is that the cache works per task implementation and not per task call or per sub-project.


(Axel von Engel) #11

I think you’re confused about what is considered a task. It can either mean the implementation class (e.g. CodeGeneratorTask extends DefaultTask {...} in CodeGeneratorTask.java) or an instance of that task in a specific project (e.g. task codeGenerator(type: CodeGeneratorTask) {...} in build.gradle). I think the important point you’re missing is that Gradle considers the same task defined in both of your subprojects to be different tasks:

:rootProject
- :subProjectA
    - :subProjectA:yourTask
- :subProjectB
    - :subProjectB:yourTask