Cache Understanding

Hi,

I am trying to understand the behaviour of the build-cache in contrast to the (so-called) “project-cache”.

Thanks to the discussion in my previous topic, I managed to enable incremental build and skip fetching the proto files from another repository, which in turn fully skips generating (Java) code from the proto files.

Still, I am a bit confused about the caching behaviour. After cleaning up, i.e., executing ./gradlew clean, I understand the following behaviour:

$ ./gradlew build --console=verbose --no-daemon -Duser.language=en
> Task :protobufDummy UP-TO-DATE
> Task :extractIncludeProto
> Task :extractProto

> Task :fetchProtoFiles
Cloning into '***\multi-project\build\cloned'...

> Task :generateProto
> Task :compileJava
> Task :processProtoResources
> Task :processResources
> Task :classes
> Task :resolveMainClassName
> Task :bootJar
> Task :jar
> Task :assemble
> Task :extractIncludeTestProto
> Task :extractTestProto
> Task :generateTestProto NO-SOURCE
> Task :compileTestJava NO-SOURCE
> Task :processTestProtoResources
> Task :processTestResources NO-SOURCE
> Task :testClasses UP-TO-DATE
> Task :test NO-SOURCE
> Task :check UP-TO-DATE
> Task :build

BUILD SUCCESSFUL in 9s
13 actionable tasks: 13 executed
$ ./gradlew build --console=verbose --no-daemon -Duser.language=en
> Task :protobufDummy UP-TO-DATE
> Task :extractIncludeProto UP-TO-DATE
> Task :extractProto UP-TO-DATE
> Task :fetchProtoFiles SKIPPED
> Task :generateProto UP-TO-DATE
> Task :compileJava UP-TO-DATE
> Task :processProtoResources UP-TO-DATE
> Task :processResources UP-TO-DATE
> Task :classes UP-TO-DATE
> Task :resolveMainClassName UP-TO-DATE
> Task :bootJar UP-TO-DATE
> Task :jar UP-TO-DATE
> Task :assemble UP-TO-DATE
> Task :extractIncludeTestProto UP-TO-DATE
> Task :extractTestProto UP-TO-DATE
> Task :generateTestProto NO-SOURCE
> Task :compileTestJava NO-SOURCE
> Task :processTestProtoResources UP-TO-DATE
> Task :processTestResources NO-SOURCE
> Task :testClasses UP-TO-DATE
> Task :test NO-SOURCE
> Task :check UP-TO-DATE
> Task :build UP-TO-DATE

BUILD SUCCESSFUL in 5s

Thus, after the first run, the local project-cache is built and in the second run, fetchProtoFiles can be SKIPPED due to the onlyIf closure and thus generateProto is UP-TO-DATE.

However, now I try the same with caching.

I do

  • ./gradlew clean
  • ./gradlew build --build-cache -Duser.language=en --console=verbose --no-daemon (which creates the folder build-cache-1 in my Gradle home directory)
  • ./gradlew clean
  • ./gradlew build --build-cache -Duser.language=en --console=verbose --no-daemon

And now I am confused:

$ ./gradlew build --build-cache -Duser.language=en --console=verbose --no-daemon
To honour the JVM settings for this build a single-use Daemon process will be forked. For more on this, please refer to https://docs.gradle.org/9.3.0/userguide/gradle_daemon.html#sec:disabling_the_daemon in the Gradle documentation.
Daemon will be stopped at the end of the build 
> Task :protobufDummy UP-TO-DATE
> Task :extractIncludeProto
> Task :extractProto

> Task :fetchProtoFiles
Cloning into '***\multi-project\build\cloned'...

> Task :generateProto FROM-CACHE
> Task :compileJava FROM-CACHE
> Task :processProtoResources
> Task :processResources
> Task :classes
> Task :resolveMainClassName
> Task :bootJar
> Task :jar
> Task :assemble
> Task :extractIncludeTestProto
> Task :extractTestProto
> Task :generateTestProto NO-SOURCE
> Task :compileTestJava NO-SOURCE
> Task :processTestProtoResources
> Task :processTestResources NO-SOURCE
> Task :testClasses UP-TO-DATE
> Task :test NO-SOURCE
> Task :check UP-TO-DATE
> Task :build

BUILD SUCCESSFUL in 8s

:red_exclamation_mark: Why is fetchProtoFiles not retrieved from cache, i.e., not attributed with FROM-CACHE? Even stranger, the task generateProto, which depends on that, is retrieved from cache?

Thus, due to the local project-cache, this task can be skipped, but with build-cache it is not possible to do so? :thinking:

in contrast to the (so-called) “project-cache”.

There is no “so-called project cache”.
I have no idea what you are talking about.
Where did you get that term from, who calls what like that?

Thus, after the first run, the local project-cache is built and in the second run, fetchProtoFiles can be SKIPPED due to the onlyIf closure and thus generateProto is UP-TO-DATE.

That has nothing to do with a “project cache”.
For each task Gradle knows a fingerprint of the inputs and outputs and if nothing changed since the last execution, the task is up-to-date.
Even if fetchProtoFiles were executed each time, as long as the files it clones, which are used as input for generateProto, have the same content (and no outputs were modified somehow), the generateProto task would still be up-to-date.

Why is fetchProtoFiles not retrieved from cache, i.e., not attributed with FROM-CACHE ?

Why should it be?
That task is not a cacheable task.
Tasks are only cacheable if you explicitly declare them to be.
For many task types, caching would even have a negative effect.
For example, for a copy or sync task it is much faster to just do the copy operation than to use a cached result, as the cached result would need to be unzipped first.
Or for a jar or other zip or archive task, it is much faster to just build the jar instead of unpacking a zip from a zip, which is what the build cache entry would be.
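
As an illustration of what "explicitly declare" can look like, here is a minimal sketch for a build.gradle.kts. The task type GenerateGreeting, its properties, and the output path are hypothetical and not part of the project discussed in this thread. The @CacheableTask annotation is what opts the task type into the build cache; without it, the same task would still take part in up-to-date checks but would never be FROM-CACHE.

```kotlin
import org.gradle.api.DefaultTask
import org.gradle.api.file.RegularFileProperty
import org.gradle.api.provider.Property
import org.gradle.api.tasks.CacheableTask
import org.gradle.api.tasks.Input
import org.gradle.api.tasks.OutputFile
import org.gradle.api.tasks.TaskAction

// Hypothetical task type, for illustration only.
@CacheableTask
abstract class GenerateGreeting : DefaultTask() {

    // Input: part of the fingerprint that decides up-to-date and cache hits.
    @get:Input
    abstract val greeting: Property<String>

    // Output: this file is what gets stored in and restored from the cache.
    @get:OutputFile
    abstract val outputFile: RegularFileProperty

    @TaskAction
    fun generate() {
        outputFile.get().asFile.writeText(greeting.get())
    }
}

tasks.register<GenerateGreeting>("generateGreeting") {
    greeting.set("hello")
    outputFile.set(layout.buildDirectory.file("greeting.txt"))
}
```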

Even stranger, the task generateProto , which depends on that, is retrieved from cache?

Not strange at all.
The generateProto task is a cacheable task.
The files you cloned and use as inputs are the same, so the cache fingerprint is the same, and so the cached result can be used for that task.

For example in this forum post or there at stackoverflow

At this point I don’t get the difference between incremental build and cache or, let’s say, what “fingerprints” go into the cache, or how and why a task is cacheable or not.

We discussed in the Best Practice To Enable Incremental Build that generateProto should depend on the fetchProtoFiles via outputs and not plain dependsOn.
And all the files we are talking about, be they cloned, copied or auto-generated, are located in the build/ directory.
Exactly that folder is purged when doing ./gradlew clean, and thus onlyIf { targetFolder } will lead to execution of the fetchProtoFiles task, because targetFolder does not exist.
The same holds implicitly true for the generateProto task, no? No source files from the targetFolder, no output of generated code, since the whole build/ folder is gone …

But … but … for some reason, I guess that’s the caching mechanism, when building with --build-cache, purging the build/ directory does not affect the generateProto task. Although the source files as well as the generated Java files based on the proto files are gone, everything is gone … is this information stored in the cache?

So, if caching successfully prevents generateProto from running again - which is awesome - then I guess it simply would make sense to also prevent the fetchProtoFiles task from running again, since generateProto, which depends on it, doesn’t need it - no? :thinking:

Can we somehow manually add fetchProtoFiles to the cache?

Generally, I want to make use of Bitbucket’s caching. :slight_smile:

For example in this forum post or there at stackoverflow

Ah, sorry, forgot that <root project>/.gradle is called “project cache dir” or I might have understood what you mean.

But still, speaking of a “project cache” is not really appropriate.

At this point I don’t get the difference between incremental build and cache or let’s say

Incremental build means that a task can be up-to-date.
Up-to-date means that the inputs are the same as when the specific task was last executed and the outputs were not changed since the task last finished.
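
A small sketch of that up-to-date check, using the runtime API in a build.gradle.kts. The task name countProtoLines and both file paths are made up for illustration. Assuming the input file exists, the first run executes the action; a second run with an unchanged input file and an untouched report should be reported as UP-TO-DATE. Deleting the report or editing the input makes the task run again.

```kotlin
// Hypothetical ad-hoc task, for illustration only.
val countProtoLines = tasks.register("countProtoLines") {
    // Assumed paths; adjust to an existing file in your project.
    val source = layout.projectDirectory.file("src/main/proto/example.proto")
    val report = layout.buildDirectory.file("reports/proto-lines.txt")

    // Declared input: its content is fingerprinted after each execution.
    inputs.file(source)
    // Declared output: if it is changed or removed, the task is out-of-date.
    outputs.file(report)

    doLast {
        val lines = source.asFile.readLines().size
        report.get().asFile.writeText("lines=$lines\n")
    }
}
```

Note that this task is not cacheable: after ./gradlew clean it has to execute again, because nothing opted it into the build cache.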

what “fingerprints” go into the cache

The inputs of a task (including the runtime classpath of the task and so on) are fingerprinted and together form the cache key.
Under the cache key, the task output is stored in an archive so that it can be reused for the same inputs in the future.

why a task is cacheable or not

A task is cacheable if it is declared to be cacheable. Using the cache does not make sense for all task types, as the overhead could even slow down the build when it would be faster to just execute the task.

But … but … for some reason, I guess that’s the caching mechanism, when building with --build-cache, purging the build/ directory does not affect the generateProto task

Yes, it does affect the task.
If you had not cleared the build folder, the generateProto task could have been up-to-date.
As you removed the outputs of the task, but it is cacheable and you had already executed the task with the same inputs before, the outputs were taken from the cache, which is not located in build.

The cache is not bound to one worktree.
If you have two work-trees, build in one, then in the other run the same task with the same inputs, the outputs can also be taken from the cache there.
Or if you switch between branch A and B in one work-tree and in both branches you have different inputs for the task, then it can go like this - given no cache entry is there yet for the task:

  • switch to A
  • run task => task is executed, result is cached
  • switch to B
  • run task => task is executed, result is cached
  • switch to A
  • run task => result is taken from cache
  • switch to B
  • run task => result is taken from cache

You can even have CI/CD that writes to a remote cache instance,
and then if you locally run the same task with the same inputs,
the result can be downloaded from the remote cache and reused on your local machine even though you never executed the task locally before.
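
A hedged sketch of such a setup in settings.gradle.kts: the cache URL is a placeholder, and detecting CI through a CI environment variable is an assumption about the pipeline, not something taken from this thread. Only CI pushes results, while developer machines only read from the remote cache.

```kotlin
// settings.gradle.kts
// Assumption: the CI system sets an environment variable named CI.
val isCiServer = System.getenv().containsKey("CI")

buildCache {
    local {
        // The local cache lives in the Gradle user home, not in build/.
        isEnabled = true
    }
    remote<HttpBuildCache> {
        // Placeholder URL for a remote build cache node.
        url = uri("https://cache.example.com/cache/")
        // Only CI writes entries; local builds just download them.
        isPush = isCiServer
    }
}
```

The build cache still has to be enabled for the build, for example with --build-cache as in the commands above.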

Although the source files as well as the generated Java files based on the proto files are gone, everything is gone … is this information stored in the cache?

The source files were not gone.
Not from the view of the generateProto task.
The fetchProtoFiles task was run before, so the files are there again.
And as the files are the same as before, the task result for generateProto could be taken from cache.

So, if caching successfully prevents generateProto from running again - which is awesome - then I guess it simply would make sense to also prevent the fetchProtoFiles task from running again, since generateProto, which depends on it, doesn’t need it - no?

No, Gradle could not guess what fetchProtoFiles does.
The outputs of fetchProtoFiles are the inputs for generateProto.
So to evaluate whether generateProto can be taken from cache, fetchProtoFiles output files have to be present, so it cannot be skipped.

If fetchProtoFiles were a cacheable task, its result could have been taken from cache.
But just as with the up-to-date check, it doesn’t really make sense, as the inputs are remote files that you clone, so you could never know whether the inputs are the same and so you could never cleanly take the outcome from cache.

Can we somehow manually add fetchProtoFiles to the cache?

You can of course make it cacheable, yes.
It just does not make sense, as you cannot cleanly define its inputs.
Just like it also does not make sense to ever have it up-to-date, as you never know what the state of the remote files is.

Hm,
I beg to differ. I updated my sample project to match my real project, so that the git clone is pinned to a git tag.

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    // omitted
    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=1.0.0", // <<== fixed by git-tag
        "--single-branch",
        "https://github.com/patient-developer/foo-server.git",
        layout.buildDirectory.dir("cloned").get().asFile.absolutePath
    )
    // omitted
}

That way, the input proto files are fixed, at least to my understanding. As long as nobody changes the git commit while preserving the git tag, the input is pretty clearly defined to me. :+1:

This approach works pretty well for us. The other repository may advance and further update or extend the proto files (with new git tags), but we are safe - no?

So, does this somehow convince you, that fetchProtoFiles might be cacheable? :wink:

That way, the input proto files are fixed , at least to my understanding. As long as nobody changes the git commit while preserving the git tag, the input is pretty clearly defined to me.

Yes, as long as that does not happen. It could happen, though.
But just as I said with your onlyIf, you have to know your conditions.
The question is whether it makes sense to do a Git clone operation for something that is never expected to change.
You could then just as well check in those files to remove the need to download them if they never change. :man_shrugging:

So yeah, you could probably make the task cacheable if you want,
only having the actual files you are going to use later on as output, so that those files can be taken from the cache.

Well, as so often, it is hard to tell whether the imported proto files change regularly or rarely, it can be weeks - yes, but it can also be months.
Basically, the imported files are extended very often, but it is hard to tell how often this modification affects my project, i.e., how often I have to update the git tag, i.e., how often I have to re-import them.

But yes, in contrast to the imported proto files, our code base is modified very often.

Thus, from my point of view, it would be quite convenient to have the imported files in the cache, and when I am forced to update the git tag, then yes … I have to redo the caching (i.e., probably delete the Bitbucket cache once).

So, I’d say it is worth testing. :slight_smile:

Thus, according to the build-cache#run-time-api it suffices to simply put outputs.cacheIf { true } into my fetchProtoFiles task? :slight_smile:

Thus, from my point of view, it would be quite convenient to have the imported files in the cache, and when I am forced to update the git tag, then yes … I have to redo the caching (i.e., probably delete the Bitbucket cache once).

If you declare the tag name as input and rely on the tags not being modified, there will not be a problem.
It can be taken from cache or be up-to-date as long as the tag is the same and as soon as you configure a different tag, the task will be out-of-date and also not taken from cache, as the inputs changed.

So, I’d say it is worth testing.

Always. :slight_smile:

Thus, according to the build-cache#run-time-api it suffices to simply put outputs.cacheIf { true } into my fetchProtoFiles task? :slight_smile:

Well, yes and no.
When making a task cacheable you should even more make sure that the inputs and outputs are declared correctly.
When the inputs and outputs only control whether the task is up-to-date, you can still manually delete the outputs to make the task rerun.
But even then it is better to have the inputs and outputs correct.
So, remove the onlyIf and declare everything that needs to be an input as an input.
Ask yourself what, if changed, should cause the task to rerun, like the tag name, the server URL, …
And make sure to declare the final outputs of the task properly.
Ask yourself what files you expect to be there when the task is served from cache and not executed.

If you have the inputs and outputs declared properly, the onlyIf is not necessary, as the task will be up-to-date if nothing changed. Then you can also declare the task cacheable with the line you mentioned, so that it can be served from the build cache without the need to manually discard cache entries or similar.
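
Put together, the advice above could look roughly like the following sketch. It is not a tested drop-in replacement for the task shown earlier: the variable names protoRepoUrl, protoTag and clonedDir are illustrative, the parts marked "omitted" in the original task are not reproduced, and the whole approach relies on the assumption that the tag is never moved to a different commit.

```kotlin
// build.gradle.kts: sketch only, variable names are illustrative
val protoRepoUrl = "https://github.com/patient-developer/foo-server.git"
val protoTag = "1.0.0"
val clonedDir = layout.buildDirectory.dir("cloned")

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    // Inputs: anything that should cause a rerun when it changes.
    inputs.property("protoRepoUrl", protoRepoUrl)
    inputs.property("protoTag", protoTag)

    // Output: what must be present afterwards, whether the task was
    // executed or its result was restored from the build cache.
    outputs.dir(clonedDir)

    // Opt in to the build cache (assumes the tag is immutable).
    outputs.cacheIf { true }

    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=$protoTag",
        "--single-branch",
        protoRepoUrl,
        clonedDir.get().asFile.absolutePath
    )
}
```

As written, the whole clone directory including .git is the declared output. Narrowing the output to the proto files that are actually consumed, as suggested earlier in the thread, would keep the cache entries smaller. The sketch also does not handle a target directory that already has content, which git clone refuses to overwrite.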