Best Practice To Enable Incremental Build

Hi,

I have a project which fetches proto-files from another repository in order to be able to generate a gRPC-client.
This behaviour is encapsulated in a custom task, which is available as a sample in my public repository.

tasks.register<Exec>("fetchProtoFiles") {
    description = "Fetching proto files from 'foo-server' to generate gRPC client."
    onlyIf { !file("build/cloned").exists() }
    workingDir(".")
    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=main",
        "--single-branch",
        "https://github.com/patient-developer/foo-server.git",
        "build/cloned/"
    )
    doLast {
        copy {
            from("build/cloned/src/main/proto/")
            into("build/proto/")
        }
    }
}

Due to the onlyIf closure I receive the desired result, that generateProto is UP-TO-DATE since fetchProtoFiles is SKIPPED.

$ ./gradlew build --console=verbose -Duser.language=en
> Task :protobufDummy UP-TO-DATE
> Task :extractIncludeProto UP-TO-DATE
> Task :extractProto UP-TO-DATE
> Task :fetchProtoFiles SKIPPED
> Task :generateProto UP-TO-DATE
> Task :compileJava UP-TO-DATE

However, the Task is now SKIPPED.

In order to be aligned with incremental build, it would be desirable to define inputs and outputs, no? To inspect this approach, I set up the branch inputs-outputs.

I am struggling to define the inputs and outputs of that task. The output directory would be build/proto, no? But when setting the input-directory to build/cloned it fails if the directory is not available, i.e., when doing a first checkout or gradle clean, e.g.,

$ ./gradlew clean build --console=verbose -Duser.language=en

I receive

> Task :clean
> Task :protobufDummy UP-TO-DATE
> Task :extractIncludeProto
> Task :extractProto
> Task :fetchProtoFiles FAILED

[Incubating] Problems report is available at: file:///***/multi-project/build/reports/problems/problems-report.html

FAILURE: Build failed with an exception.

* What went wrong:
A problem was found with the configuration of task ':fetchProtoFiles' (type 'Exec').
  - Type 'org.gradle.api.tasks.Exec' property '$1' specifies directory '***\multi-project\build\cloned' which doesn't exist.
    
    Reason: An input file was expected to be present but it doesn't exist.
    
    Possible solutions:
      1. Make sure the directory exists before the task is called.
      2. Make sure that the task which produces the directory is declared as an input.

However, the Task is now SKIPPED.

In order to be aligned with incremental build,

Having it skipped under some condition I would also count as incremental build (if the skipping is done correctly).
Incremental build means that unnecessary work is avoided.
If a task properly declares its inputs and outputs, Gradle can check whether the inputs or the outputs changed since the last execution. If neither changed, it can assume the task is up-to-date (or, for cacheable tasks, take the result from the build cache).
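As a minimal sketch of what declared inputs and outputs look like in the Kotlin DSL (the task name countProtoLines and the report path are hypothetical, and the sketch assumes that src/main/proto exists in the project):

```kotlin
// build.gradle.kts
// Hypothetical task, only to illustrate declared inputs and outputs.
tasks.register("countProtoLines") {
    val protoSources = layout.projectDirectory.dir("src/main/proto")
    val report = layout.buildDirectory.file("reports/proto-lines.txt")

    // Declared input: Gradle snapshots this directory before the task runs.
    inputs.dir(protoSources)
    // Declared output: Gradle snapshots this file after the task ran.
    outputs.file(report)

    doLast {
        // Count the lines of all .proto files below the input directory.
        val lines = protoSources.asFile.walkTopDown()
            .filter { it.isFile && it.extension == "proto" }
            .sumOf { it.readLines().size }
        val reportFile = report.get().asFile
        reportFile.parentFile.mkdirs()
        reportFile.writeText("proto lines: $lines\n")
    }
}
```

With both sides declared, a second run without changes to the proto files or the report should be reported as UP-TO-DATE rather than executed again.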

it would be desirable to define inputs and outputs, no?

Always, especially outputs, so that you can wire those task outputs to other tasks’ inputs and get the necessary task dependencies implicitly. (Practically any explicit dependsOn where the left-hand side is not a lifecycle task is a code smell and usually a sign that this wiring is not done properly.)

But the inputs you probably cannot declare for your task, as your input consists of remote files that Gradle cannot check up-front.

Whether skipping the task just because “something” is cloned already is correct is up to you.

But when setting the input-directory to build/cloned it fails if the directory is not available, i.e., when doing a first checkout or gradle clean , e.g.,

Even if it did not fail, you would get wrong behavior.
The build/cloned directory is not your input, it is an intermediary result of the first half of your task actions.
The actual input is the files on the internet, and so it cannot really be declared.

If you have other inputs for the task to which you want to wire outputs of other tasks, it might still make sense to declare those inputs. But as your real inputs are remote files, you would then probably declare the task as “untracked”, so that it is never up-to-date due to inputs / outputs but can still take part in proper output / input wiring.

Either way, you would probably still declare the outputs, so that you can wire the task outputs to some other task’s input properly.
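A hedged sketch of such an untracked task, assuming Gradle 7.3 or newer (where Task.doNotTrackState is available); the reason string is made up for illustration:

```kotlin
// build.gradle.kts
val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    // Opt out of up-to-date checks: the real inputs are remote files.
    doNotTrackState("Inputs are remote files that Gradle cannot check up-front.")

    // Still declared, so that consumers can wire against this task's outputs.
    outputs.dir(layout.buildDirectory.dir("proto"))

    // commandLine(...) and the doLast { ... } copy step stay as in the task above.
}
```

Whether the task then runs on every build or is gated by an onlyIf condition is a separate decision.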

You could also split your task into two, one that does the cloning and one that does the copying, but if you never touch / change / update the clone anyway, you would probably not gain much.

Btw. in most cases you do not want to use Copy or copy, but Sync or sync, or you might end up with additional undesired files in the target location.
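To illustrate the difference, here is a small sketch of a standalone Sync task (the task name syncProtoFiles is hypothetical, and the paths are only there to show the behaviour, not as a recommendation to wire things by path):

```kotlin
// build.gradle.kts
// Hypothetical task name; Sync makes the target mirror the source.
tasks.register<Sync>("syncProtoFiles") {
    from(layout.buildDirectory.dir("cloned/src/main/proto"))
    into(layout.buildDirectory.dir("proto"))
    // Unlike Copy, Sync also removes files from build/proto
    // that no longer exist in the source directory.
}
```

A proto file that was deleted upstream would linger in build/proto with Copy, but would be removed with Sync.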

Okay, so this approach is okay’ish. :+1:

So, since generateProto is not a lifecycle task (no?), this approach is code-smell?

tasks {
    generateProto {
        dependsOn("fetchProtoFiles")
    }
}

It would be cleaner to write something like (just a sketch):

tasks {
    generateProto {
        inputs(from("fetchProtoFiles"))
    }
}

generateProto is not a lifecycle task (no?)

No.
Lifecycle tasks are tasks that are about the lifecycle.
They do no work of their own and are only meant to depend on other tasks explicitly.
Examples are assemble or build.
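For comparison, a custom lifecycle task could be sketched like this (the name fetchAllProtoFiles and its group are hypothetical); it has no actions and only aggregates other tasks, which is the one place where an explicit dependsOn is the intended tool:

```kotlin
// build.gradle.kts
// Hypothetical lifecycle task: no doFirst/doLast, no task action, only dependencies.
tasks.register("fetchAllProtoFiles") {
    group = "build setup"
    description = "Fetches proto files from all upstream repositories."
    dependsOn("fetchProtoFiles")
}
```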

So, since generateProto is not a lifecycle task (no?), this approach is code-smell?

Yes.

It would be cleaner to write something like (just a sketch)

Exactly.
Properly define the task outputs and wire the task outputs of fetchProtoFiles to the task inputs of generateProto and you will get the necessary task dependency automatically as long as it is necessary.

Okay, great.

I updated my branch inputs-outputs by

tasks.register<Exec>("fetchProtoFiles") {
    description = "Fetching proto files from 'foo-server' to generate gRPC client."
    val clonedFolder = "build/cloned"; // (1)
    onlyIf { !file(clonedFolder).exists() } // apply (1)
    outputs.dir("build/proto") // (2)
    workingDir(".")
    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=main",
        "--single-branch",
        "https://github.com/patient-developer/foo-server.git",
        clonedFolder // apply (1)
    )
    doLast {
        sync { // (3)
            from("$clonedFolder/src/main/proto") // apply (1)
            into(outputs.files) // apply (2)
        }
    }
}

As you can see

(1) defined the build/cloned folder to re-use it
(2) defined the outputs.dir and use it to copy parts
(3) use sync instead of copy as you said

However, I cannot wrap my head around the syntax to make generateProto depend on the outputs. Probably something like this, but it does not work

tasks {
    generateProto {
        inputs(from("fetchProtoFiles").outputs)
    }
}

And last but not least: Can generateProto depend on multiple inputs? I am asking, since my “real project” fetches proto-files from multiple repositories.

Probably something like this, but it does not work

Yeah, no, I did not think you meant it that literally above. :smiley:
That does not make much sense, of course.

I don’t know your generateProto task.
But it probably has some input property in which you configure which files to generate from.

Assuming this is the one from the protobuf plugin, this should probably do what you want:

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    outputs.dir("build/proto")
}
tasks.generateProto {
    addSourceDirs(files(fetchProtoFiles))
}

Btw. don’t hard-code build, as it is a configurable location. Use layout.buildDirectory.dir("proto") instead, and similarly for clonedFolder.
And workingDir(".") should be redundant: if none is configured explicitly, the project dir should be used automatically.

Ah, sorry. This task is provided by the protobuf gradle plugin, which generates java code from proto-files. Or does our discussion simply not work for generateProto since it is a lifecycle task?

So, when I leave it with dependsOn via

tasks {
    generateProto {
        dependsOn("fetchProtoFiles")
    }
}

then I see the dependency in the task-graph

$ ./gradlew generateProto --task-graph
Task graph printing is an incubating feature.
Tasks graph for: generateProto
\--- :generateProto (com.google.protobuf.gradle.GenerateProtoTask)
     +--- :extractIncludeProto (com.google.protobuf.gradle.ProtobufExtract)
     |    \--- :protobufDummy (org.gradle.api.DefaultTask)
     +--- :extractProto (com.google.protobuf.gradle.ProtobufExtract)
     |    \--- :protobufDummy (*)
     \--- :fetchProtoFiles (org.gradle.api.tasks.Exec)

(*) - details omitted (listed previously)

However, when simply working with source-files via

tasks {
    generateProto {
        addSourceDirs(files("fetchProtoFiles"))
    }
}

I don’t see the fetchProtoFiles dependency in the task-graph

$ ./gradlew generateProto --task-graph
Task graph printing is an incubating feature.
Tasks graph for: generateProto
\--- :generateProto (com.google.protobuf.gradle.GenerateProtoTask)
     +--- :extractIncludeProto (com.google.protobuf.gradle.ProtobufExtract)
     |    \--- :protobufDummy (org.gradle.api.DefaultTask)
     \--- :extractProto (com.google.protobuf.gradle.ProtobufExtract)
          \--- :protobufDummy (*)

(*) - details omitted (listed previously)

Or does our discussion simply not work for generateProto since it is a lifecycle task?

It is not a lifecycle task and I have just shown you how to properly wire it, so I don’t get your question.

I don’t see the fetchProtoFiles dependency in the task-graph

Because you configured a directory named fetchProtoFiles as source dir (you passed the String "fetchProtoFiles" to files(...)), and that is not what I showed you.
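A short sketch of the difference, assuming a task named fetchProtoFiles has already been registered earlier in the build script (the two val names are only for illustration):

```kotlin
// build.gradle.kts
// Assumption: "fetchProtoFiles" was registered before this point.
val fetchProtoFiles = tasks.named("fetchProtoFiles")

// A String is resolved as a path relative to the project directory:
// this is the directory <projectDir>/fetchProtoFiles, with no task dependency.
val asPath = files("fetchProtoFiles")

// A task provider is resolved to the declared outputs of that task and
// carries the dependency on the task that produces them.
val asTaskOutputs = files(fetchProtoFiles)
```

Only the second form lets Gradle infer that generateProto needs fetchProtoFiles to run first.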

Ah, sorry - I missed the detail that you also defined fetchProtoFiles via
val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") { }

Yes, now everything works and for the sake of completeness, the build.gradle.kts looks like

import com.google.protobuf.gradle.id

plugins {
    java
    alias(libs.plugins.google.protobuf)
    alias(libs.plugins.spring.framework)
}

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(25)
    }
}

repositories {
    mavenCentral()
}

dependencies {
    implementation(libs.grpc.all)
    implementation(libs.protobuf.java)
    implementation(libs.spring.boot.starter)
    implementation(libs.spring.boot.starter.web)
    implementation(platform(org.springframework.boot.gradle.plugin.SpringBootPlugin.BOM_COORDINATES))
    implementation(libs.spring.framework.grpc)
}

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    description = "Fetching proto files from 'foo-server' to generate gRPC client."
    val clonedFolder = layout.buildDirectory.dir("cloned"); // (1)
    onlyIf { !file(clonedFolder).exists() } // apply (1)
    outputs.dir(layout.buildDirectory.files("proto")) // (2)
    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=main",
        "--single-branch",
        "https://github.com/patient-developer/foo-server.git",
        clonedFolder // apply (1)
    )
    doLast {
        sync { // (3)
            from("$clonedFolder/src/main/proto") // apply (1)
            into(outputs.files) // apply (2)
        }
    }
}

sourceSets.main {
    proto {
        srcDir(layout.buildDirectory.dir("proto"))
    }
}

tasks {
    generateProto {
        addSourceDirs(files(fetchProtoFiles))
    }
}

protobuf {
    protoc {
        artifact = libs.protobuf.protoc.get().toString()
    }
    plugins {
        id("grpc") {
            artifact = libs.protoc.grpc.java.get().toString()
        }
    }
    generateProtoTasks {
        all().forEach {
            it.plugins {
                id("grpc") {
                    option("@generated=omit")
                }
            }
        }
    }
}

Some more points:

  • I would probably replace onlyIf { !file(clonedFolder).exists() } by onlyIf { !clonedFolder.get().asFile.exists() }
  • clonedFolder is a Provider<Directory> and commandLine uses toString(), so you probably want to use a command line argument provider that resolves it only at execution time or you get a very strange folder at a very strange place, so you want to replace the commandLine(...) call by
    executable("git")
    argumentProviders.add(
        CommandLineArgumentProvider {
            listOf(
                "clone",
                "--depth=1",
                "--branch=main",
                "--single-branch",
                "https://github.com/patient-developer/foo-server.git",
                clonedFolder.get().asFile.absolutePath
            )
        }
    )
    
    and from("$clonedFolder/src/main/proto") by from(clonedFolder.map { it.dir("src/main/proto") }).
  • Never configure paths to wire things together, but again, wire task outputs to inputs, so you probably want to replace
    sourceSets.main {
        proto {
            srcDir(layout.buildDirectory.dir("proto"))
        }
    }
    
    tasks {
        generateProto {
            addSourceDirs(files(fetchProtoFiles))
        }
    }
    
    by
    sourceSets.main {
        proto {
            srcDir(fetchProtoFiles)
        }
    }
    
    then all consumers (that properly use the Gradle API) of the proto source directory set of the main source set automatically get the outputs of the fetchProtoFiles task as source directory and automatically have the necessary task dependency. (Assuming that the generateProto task does exactly that)
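The same wiring should extend to several repositories. A sketch, assuming the fetchProtoFiles val from above is in scope, that the protobuf plugin provides the proto accessor, and with a hypothetical second task fetchBarProtoFiles and output folder proto-bar:

```kotlin
// build.gradle.kts
// Hypothetical second fetch task for another repository.
val fetchBarProtoFiles = tasks.register<Exec>("fetchBarProtoFiles") {
    // Each task gets its own output directory, so the outputs do not overlap.
    outputs.dir(layout.buildDirectory.dir("proto-bar"))
    // git clone and sync configuration analogous to fetchProtoFiles
}

sourceSets.main {
    proto {
        // Every srcDir call adds one more source directory and its producing task.
        srcDir(fetchProtoFiles)
        srcDir(fetchBarProtoFiles)
    }
}
```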

Thanks for the follow-up. :slight_smile:

I also thought that I do not want to emphasize the cloned-folder, since this is simply a helper directory for me. Thus, I came up with the idea to use the proto-folder as condition for the onlyIf closure, like

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    description = "Fetching proto files from 'foo-server' to generate gRPC client."
    val clonedFolder = layout.buildDirectory.dir("cloned")
    val protoFolder = layout.buildDirectory.dir("proto")
    onlyIf { !protoFolder.get().asFile.exists() }
    outputs.dir(protoFolder.get().asFile.absolutePath)
    commandLine(
        "git",
        "clone",
        "--depth=1",
        "--branch=main",
        "--single-branch",
        "https://github.com/patient-developer/foo-server.git",
        clonedFolder.get().asFile.absolutePath
    )
    doLast {
        sync {
            from("${clonedFolder.get().asFile.absolutePath}/src/main/proto")
            into(protoFolder.get().asFile.absolutePath)
        }
    }
}

And later, when the java-code is generated from the proto-files, simply remove the build/cloned folder, no? :thinking:

tasks {
    generateProto {
        addSourceDirs(files(fetchProtoFiles))
        doLast {
            delete(layout.buildDirectory.dir("cloned"))
        }
    }
}

Actually, I tried to use the outputs property to not have to initialize another variable and simply write

outputs.dir(layout.buildDirectory.files("proto"))
onlyIf { !file(outputs.files).exists() }

But then I receive

> Task :fetchProtoFiles FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Could not evaluate onlyIf predicate for task ':fetchProtoFiles'.
> Could not evaluate spec for 'Task satisfies onlyIf spec'.

And last but not least - sorry, I don’t get your suggestion

clonedFolder is a Provider<Directory> and commandLine uses toString(), so you probably want to use a command line argument provider that resolves it only at execution time or you get a very strange folder at a very strange place, so you want to replace the commandLine(...) call by

what strange folder ?

Thus, I came up with the idea to use the proto-folder as condition for the onlyIf closure,

Whatever you think makes sense.
No matter what you choose, it is imho wrong, because you will never get an update if the external files change, unless you manually delete the folder you check there.
To at least mitigate that, you could write an additional file that records when you last cloned, check it in the condition as well so that the repository is re-cloned every X days, and delete the clone folder in a doFirst.
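A rough sketch of that idea, under several assumptions: the marker file name, the 7-day limit and the use of doNotTrackState (Gradle 7.3+) are made up for illustration, so that only the onlyIf condition decides whether the task runs:

```kotlin
// build.gradle.kts
import java.time.Duration
import java.time.Instant

val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    val clonedFolder = layout.buildDirectory.dir("cloned")
    val protoFolder = layout.buildDirectory.dir("proto")
    // Hypothetical marker file that records the last successful clone.
    val marker = layout.buildDirectory.file("fetchProtoFiles.timestamp")
    val maxAge = Duration.ofDays(7) // hypothetical re-clone interval

    doNotTrackState("Remote inputs; the onlyIf condition decides about re-cloning.")
    outputs.dir(protoFolder)

    // Run if there is no marker yet or if the last clone is older than maxAge.
    onlyIf {
        val markerFile = marker.get().asFile
        !markerFile.exists() ||
            Instant.ofEpochMilli(markerFile.lastModified()).plus(maxAge).isBefore(Instant.now())
    }

    // git refuses to clone into a non-empty directory, so remove the old clone first.
    doFirst { delete(clonedFolder) }

    executable("git")
    argumentProviders.add(
        CommandLineArgumentProvider {
            listOf(
                "clone", "--depth=1", "--branch=main", "--single-branch",
                "https://github.com/patient-developer/foo-server.git",
                clonedFolder.get().asFile.absolutePath
            )
        }
    )

    doLast {
        sync {
            from(clonedFolder.map { it.dir("src/main/proto") })
            into(protoFolder)
        }
        // Record the time of this clone for the next onlyIf evaluation.
        marker.get().asFile.writeText(Instant.now().toString())
    }
}
```

Since the marker lives in the build directory, a gradle clean also forces a fresh clone.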
But as I said, you have to know what condition is right for you. If those remote files never change, it is probably unnecessary. :man_shrugging:

outputs.dir(protoFolder.get().asFile.absolutePath)

outputs.dir(protoFolder) should be enough, that method can properly handle providers of directories lazily like most methods that expect something.

And it is not just that it “should be enough”. Doing it like you do now is plainly wrong, because you resolve the provider at configuration time, which is way too early, as its configuration could still change later on.

Within the onlyIf { ... } it is fine to do so, as well as within the doLast { ... }, as both are evaluated at execution time. But at configuration time you should practically never de-lazify a property or other lazy type, as you then introduce the same race conditions as in the bad old afterEvaluate days.

So to stay lazy it would be either outputs.dir(protoFolder.map { it.asFile.absolutePath }) or, as it is totally unnecessary to do it manually, just outputs.dir(protoFolder) as just said.
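A small sketch of the two phases (the task name describeProtoFolder and the relocation of the build directory to out are hypothetical and only there to show why laziness matters):

```kotlin
// build.gradle.kts
val protoFolder = layout.buildDirectory.dir("proto")

tasks.register("describeProtoFolder") { // hypothetical task name
    // Configuration time: hand over the provider itself, do not call get().
    outputs.dir(protoFolder)

    doLast {
        // Execution time: resolving the provider here is fine.
        println("proto folder: ${protoFolder.get().asFile}")
    }
}

// Hypothetical later reconfiguration of the build directory.
// Because protoFolder was never resolved during configuration,
// it still follows this change and ends up below "out".
layout.buildDirectory.set(layout.projectDirectory.dir("out"))
```

Had the task called protoFolder.get() while being configured, it would have captured the old location before the last line took effect.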

clonedFolder.get().asFile.absolutePath

Same here, you prematurely evaluate the provider that could still change, thus I said you need to use the argument provider which is evaluated only when needed at execution time and not already at configuration time.

from("${clonedFolder.get().asFile.absolutePath}/src/main/proto")

You could do that as it is at execution phase, but it is overly long and unnecessary, just do it like I suggested in my last comment, that greatly improves readability and also idiomaticness.

into(protoFolder.get().asFile.absolutePath)

Like above, into(protoFolder) should just be fine and enough.

And later, when the java-code is generated from the proto-files, simply remove the build/cloned folder, no?

That seems quite an unnecessary waste of precious time.
Just leave it lying around, it does not disturb anything.
If you don’t want to have it in build directly, you could clone to the temporaryDir of the Task.
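A hedged sketch of that variant; temporaryDir is the per-task temporary directory that Gradle provides (typically below the build directory), and the sub-folder name cloned is an arbitrary choice:

```kotlin
// build.gradle.kts
val fetchProtoFiles = tasks.register<Exec>("fetchProtoFiles") {
    val protoFolder = layout.buildDirectory.dir("proto")
    outputs.dir(protoFolder)

    executable("git")
    argumentProviders.add(
        CommandLineArgumentProvider {
            listOf(
                "clone", "--depth=1", "--branch=main", "--single-branch",
                "https://github.com/patient-developer/foo-server.git",
                // Resolved at execution time; "cloned" is an arbitrary sub-folder name.
                temporaryDir.resolve("cloned").absolutePath
            )
        }
    )

    doLast {
        sync {
            from(temporaryDir.resolve("cloned/src/main/proto"))
            into(protoFolder)
        }
    }
}
```

The onlyIf condition is left out here; it would have to check something other than build/cloned, for example the proto folder.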

But then I receive

Run with --stacktrace and you will see the full error.
outputs.files is potentially multiple files and folders, so you cannot derive one file from it like that.
Also, better not be stingy with local variables, it is cleaner and more reliable to have the variable.

what strange folder ?

val clonedFolder = layout.buildDirectory.dir("cloned")
println(">>>$clonedFolder<<<")

=>

>>>map(org.gradle.api.file.Directory property(org.gradle.api.file.Directory, fixed(class org.gradle.api.internal.file.DefaultFilePropertyFactory$FixedDirectory, .../showcase/build)) org.gradle.api.internal.file.DefaultFilePropertyFactory$PathToDirectoryTransformer@5348b048)<<<