Cache inputs resolved from main input file to speed up configure times


(Staffan Forsell) #1

Hi,
I feel like the problem of caching task inputs that are resolved from another input file, in order to speed up subsequent configure times, would have a standard Gradle solution, but I have not found one.

This problem shows up in many builds, e.g. those using XML entities/XIncludes or header files.
In our case it’s a lot of XML entities, and parsing these XML files at every configuration is expensive and slows down our “nothing to do” time a lot.

I saw “Caching results of @InputFiles method, is it bad?”, but in that example getDependencies() is still always called at configuration time, which in our case is bad enough.

Are there any good examples of how to avoid expensive input calculations on subsequent executions?

My idea for something like this would be:

@InputFiles
Collection<File> xsltInputs = new CacheInputs({
  // Closure that calculates the file inputs
})

CacheInputs would then save the calculated files to ‘/build/tmp/taskName-something’.
During the next execution’s configure time, if this file exists, it would read the inputs from it and check the modification times of these files. If any modification time has changed, it would run the closure to calculate the new inputs and save them.
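
The idea above could be sketched roughly as follows, in plain Java so it can be called from a Groovy build. This is only an illustration under assumptions: the class name CacheInputs is borrowed from the pseudocode above, the tab-separated cache file format is made up, and nothing here is a Gradle API. It also assumes the calculation returns the main input file together with everything it pulls in, so that a change to any of them invalidates the cache.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not a Gradle API): caches a calculated input list on disk. */
final class CacheInputs {
    private final Path cacheFile;
    private final Supplier<List<Path>> calculation;

    CacheInputs(Path cacheFile, Supplier<List<Path>> calculation) {
        this.cacheFile = cacheFile;
        this.calculation = calculation;
    }

    /** Returns the cached inputs if every recorded mod time still matches, else recalculates. */
    List<Path> get() {
        List<Path> cached = readIfFresh();
        if (cached != null) {
            return cached;
        }
        List<Path> fresh = calculation.get();
        write(fresh);
        return fresh;
    }

    // Each cache line has the assumed format "<lastModifiedMillis>\t<path>".
    private List<Path> readIfFresh() {
        if (!Files.isRegularFile(cacheFile)) {
            return null;
        }
        try {
            List<Path> result = new ArrayList<>();
            for (String line : Files.readAllLines(cacheFile)) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    return null; // unreadable entry, treat the cache as stale
                }
                long recorded = Long.parseLong(line.substring(0, tab));
                Path file = Paths.get(line.substring(tab + 1));
                if (!Files.exists(file)
                        || Files.getLastModifiedTime(file).toMillis() != recorded) {
                    return null; // a tracked file changed or disappeared
                }
                result.add(file);
            }
            return result;
        } catch (IOException | NumberFormatException e) {
            return null; // any read problem simply forces a recalculation
        }
    }

    private void write(List<Path> files) {
        try {
            List<String> lines = new ArrayList<>();
            for (Path file : files) {
                lines.add(Files.getLastModifiedTime(file).toMillis() + "\t" + file);
            }
            if (cacheFile.getParent() != null) {
                Files.createDirectories(cacheFile.getParent());
            }
            Files.write(cacheFile, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Modification times are a weaker signal than the content hashes Gradle itself uses for up-to-date checks, so a sketch like this trades some correctness for speed.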

But if there were a way to use Gradle’s own built-in task input caching, that would probably be more effective than creating my own cache in /build.


(uklance) #2

You could remove @InputFiles from xsltInputs and use a custom method instead, e.g.

FileCollection xslts;

@InputFiles 
FileCollection createHashes() {
    // create file(s) under $buildDir based on xslts
}

(Staffan Forsell) #3

Yes, I could, but the idea behind assigning a CacheInputs instance is that this class would contain all the caching logic and be reusable, while the closure supplied to it would define how to extract the dependencies from some srcFiles.
I’m experimenting a little bit and will post a tested prototype when I’m done.


(uklance) #4

You could do it like this:

FileCollection xslts

@InputFiles
FileCollection xsltHashes = new CustomFileHash(this, {
    createHashes(xslts)
})

public class CustomFileHash implements FileCollection {
    private FileCollection fileCollection
    private final Task task
    private final Closure closure

    public CustomFileHash(Task task, Closure closure) {
        this.task = task
        this.closure = closure
    }

    // Resolve the closure only on first use, then reuse the result
    protected FileCollection lazyInitFileCollection() {
        if (fileCollection == null) {
            fileCollection = task.project.files(closure)
        }
        return fileCollection
    }

    @Override
    Set<File> getFiles() {
        return lazyInitFileCollection().getFiles()
    }
    // todo implement other FileCollection methods
}

(Stefan Wolf) #5

Hi @staffanf,

if you don’t use Collection<File> but FileCollection, things become a little bit easier. You can still create your CacheInputs class which takes a closure. But then you would wrap that class into a FileCollection by using project.files(). If the result of CacheInputs implements Callable, then the Callable itself would be evaluated as late as possible.
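
A minimal sketch of the Callable idea, in plain Java. The class name LazyInputs is hypothetical, and remembering the result after the first call is my own assumption rather than something Gradle does for you; it is only safe if the files cannot change between the first call and task execution.

```java
import java.io.File;
import java.util.Collection;
import java.util.concurrent.Callable;

/** Hypothetical wrapper: defers an expensive input calculation until call() is invoked. */
final class LazyInputs implements Callable<Collection<File>> {
    private final Callable<Collection<File>> calculation;
    private Collection<File> result; // remembered after the first call (an assumption)

    LazyInputs(Callable<Collection<File>> calculation) {
        this.calculation = calculation; // nothing is calculated here
    }

    @Override
    public synchronized Collection<File> call() throws Exception {
        if (result == null) {
            result = calculation.call();
        }
        return result;
    }
}
```

An instance like this could then be passed to project.files(...), which accepts a Callable and resolves it only when the file collection is actually queried.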

Another option could be to split off a separate task which calculates the inputs. This task would have the original XML file as an input and would create a file listing all the “real” input files. The current task would then read that file and declare its contents as input files. This way both tasks can use Gradle’s incremental build infrastructure. You could even share the analysis via the build cache, if you manage to keep the information relocatable (i.e. by using relative paths). We did something similar for some time to detect headers for native compilation.
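
The file exchanged between the two tasks could look roughly like this. This is a sketch with made-up names (InputManifest, one relative path per line); the analysing task would call write(...) in its action, and the consuming task would call read(...) when its inputs are resolved.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical manifest format: one path per line, relative to a base directory. */
final class InputManifest {

    /** Writes the inputs relative to baseDir so the manifest does not depend on the checkout location. */
    static void write(Path manifest, Path baseDir, Collection<Path> inputs) throws IOException {
        Path base = baseDir.toAbsolutePath().normalize();
        List<String> lines = new ArrayList<>();
        for (Path input : inputs) {
            Path relative = base.relativize(input.toAbsolutePath().normalize());
            // Use '/' on every platform so the content is identical across machines
            lines.add(relative.toString().replace(File.separatorChar, '/'));
        }
        Collections.sort(lines); // same set of inputs gives the same file content
        if (manifest.getParent() != null) {
            Files.createDirectories(manifest.getParent());
        }
        Files.write(manifest, lines);
    }

    /** Resolves the recorded relative paths against the local baseDir. */
    static List<Path> read(Path manifest, Path baseDir) throws IOException {
        List<Path> result = new ArrayList<>();
        for (String line : Files.readAllLines(manifest)) {
            if (!line.isEmpty()) {
                result.add(baseDir.resolve(line).normalize());
            }
        }
        return result;
    }
}
```

Sorting and forward slashes are there so that two machines analysing the same sources produce byte-identical manifests, which is what would make the output shareable.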

Currently, we use a CalculatedInputFileCollection for the header dependencies. Note that this is still internal API, so I would not suggest you use it in your plugin, but it should give you some ideas how to achieve what you want to achieve.

Cheers,
Stefan


(Staffan Forsell) #6

Hi @Stefan_Wolf Thanks for the reply and good feedback.
I went back and tried to do the simplest possible working xslt task.
My current implementation was incremental and parallel; combining that with tracking calculated inputs made me take a step back and remove the incremental parts for now.

I ended up with something like this for now:

  @InputFiles
  FileCollection calculatedInputs = project.files(new CalculatedInput())

  class CalculatedInput implements Callable<Collection<File>> {

    @Override
    Collection<File> call() throws Exception {
      // Do inputs calculation
    }
    
  }

But I see that this method is called at least once during configuration and then again just before the task executes.
I’m curious whether there is any information about when and why task inputs are evaluated during configuration.

I think I have a case where I want to cache these inputs the first time the method is called, but I can’t be sure that they are correct, since the input file might not exist until just before the task executes. The inputs might be generated by a previous transformation.

But I guess I could save a cache during execution together with the file modification times. If these have not changed, the same inputs could be used again.

CalculatedInputFileCollection would be interesting, but I guess it needs some work before it becomes a public API.

Question:
I guess org.gradle.PersistentCache<> isn’t something that will be public anytime soon? It seems that might be useful in this kinda case… Although it would open up a lot of possibilities for errors.