Automatically build task when input is missing


(Bob Bell) #1

First, a disclaimer: I’m brand new to Gradle, yet trying to replace a complicated build system with Gradle. Nevertheless, I’ve been very impressed with the measured, calculated approach that Gradle development seems to have taken, and the Gradle community as a whole. Therefore, I’m hoping you can help.

I’ve read up about incremental build tasks, the build cache, specifying build inputs and outputs, etc. My understanding is that when I connect outputs of one task to the input of a second task, Gradle should “do the right thing”. I take that to mean that if the input is missing, that Gradle will invoke the task that produces that input.

Instead, what I see is that Gradle is basically treating a missing input as an empty input file. If the file does exist, Gradle’s up-to-date logic appears sound. But what I cannot get (and what I desire) is for the producing task to be run automatically, in order to generate its output, which is the input for the desired task.

I’m working on a more complicated environment (biting off a lot at once!), but I think this small example demonstrates what I mean:
defaultTasks 'foo'

task bar {
    inputs.file "${projectDir}/in.txt"
    outputs.file "${projectDir}/bar.txt"
    doLast {
        def f = file("${projectDir}/bar.txt")
        if (!f.exists()) {
            f.text = "bar"
            println "Created file ${f}"
        }
    }
}

task foo {
    inputs.file "${projectDir}/bar.txt"
    outputs.file "${projectDir}/foo.txt"
    doLast {
        def f = file("${projectDir}/foo.txt")
        if (!f.exists()) {
            f.text = "foo"
            println "Created file ${f}"
        }
    }
}

If I run this in an otherwise empty directory (or if in.txt exists), just the foo task runs, even though bar.txt does not exist. I was hoping – without explicitly declaring the dependency – that when I requested foo to run, Gradle would see that it needs bar.txt, and that bar.txt is provided by the bar task, and therefore would run the bar task first.

Similarly, if I change in.txt and then request to build foo, there is no consideration of the bar task. I would hope in that scenario that Gradle would see that the foo task depends on the bar task via bar.txt, and that since in.txt changed, bar is out of date and should be rebuilt, and if that changes bar.txt, then foo should be rebuilt as well.

Am I missing something that would achieve my desired behavior, without needing to explicitly link all the tasks together (which to me would be redundant)?

Thanks,
Bob


(Bob Bell) #2

Note that if I explicitly make foo dependsOn bar, then both foo and bar get evaluated, and the correct evaluation happens depending on whether each is up to date or not.
But shouldn’t Gradle be able to infer that foo dependsOn bar automatically, based on the connection between the inputs and outputs?
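For reference, the explicit wiring that does work for me is just this:

```groovy
// explicit task dependency: Gradle now runs bar before foo,
// and each task's own up-to-date check still applies
foo.dependsOn bar
```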


(Sterling Greene) #3

Hi @rjbell4

Gradle can do the wiring of tasks together for you, but you need to do it a little differently. The behavior you’re seeing is due to the way you’re specifying the inputs to the foo task. Gradle doesn’t automatically connect tasks on file path alone.

There are a couple of different ways to setup the producer/consumer relationship. If you pass bar as an input to foo, Gradle will treat any output of bar as an input to foo.

Here’s your example rewritten with that in mind:

task bar {
    def outputFile = file("bar.txt")
    
    inputs.file "in.txt"
    outputs.file outputFile
    doLast {
        outputFile.text = "bar"
        println "Created file ${outputFile}"
    }
}

task foo {
    def outputFile = file("foo.txt")

    inputs.files bar
    outputs.file outputFile
    doLast {
        outputFile.text = "foo"
        println "Created file ${outputFile}"
    }
}

One thing to consider with what you’re doing now: these tasks are ad-hoc tasks and do not have a specific type. The runtime inputs.file and outputs.file APIs are a little more cumbersome to use. They also prevent you from giving your inputs/outputs nice names and make it hard to enforce required inputs.

If you wanted to make your tasks cacheable, these ad-hoc tasks wouldn’t be a good fit either. As they are, they’ll re-run any time someone changes the build file, even if their inputs/outputs have not changed.

You can also rewrite this to use a custom task type (based on your example, both tasks do the same thing, so I only created a single task type):

@CacheableTask
class Foobar extends DefaultTask {
    @InputFile
    final RegularFileProperty inputFile = newInputFile()

    @OutputFile
    final RegularFileProperty outputFile = newOutputFile()

    @Input
    final Property<String> message = project.objects.property(String)

    @TaskAction
    void generate() {
        def output = outputFile.get().asFile
        output.text = message.get()
        println "Created file ${output}"
    }
}

task bar(type: Foobar) {
    inputFile = layout.projectDirectory.file("in.txt")
    outputFile = layout.projectDirectory.file("bar.txt")
    message = "bar"
}

task foo(type: Foobar) {
    inputFile = bar.outputFile
    outputFile = layout.projectDirectory.file("foo.txt")
    message = "foo"
}

This can all go into the build.gradle until you’re ready to move Foobar into buildSrc or a separate plugin to share across several builds.

If the direct foo -> bar connection doesn’t make sense in some cases (e.g., maybe foo.inputFile is sometimes an existing file, sometimes generated by a task, sometimes downloaded…), you can also wire this together through a domain object extension:

class FoobarExtension {
    final RegularFileProperty someInput
    FoobarExtension(ProjectLayout layout) {
        someInput = layout.fileProperty()
    }
}

@CacheableTask
class Foobar extends DefaultTask {
    @InputFile
    final RegularFileProperty inputFile = newInputFile()

    @OutputFile
    final RegularFileProperty outputFile = newOutputFile()

    @Input
    final Property<String> message = project.objects.property(String)

    @TaskAction
    void generate() {
        def output = outputFile.get().asFile
        output.text = message.get()
        println "Created file ${output}"
    }
}

class FoobarPlugin implements Plugin<Project> {
    void apply(Project project) {
        project.with {
            def foobar = extensions.create("foobar", FoobarExtension, project.layout)

            tasks.create("foo", Foobar) {
                inputFile = foobar.someInput
                outputFile = layout.projectDirectory.file("foo.txt")
                message = "foo"
            }
        }
    }
}

apply plugin: FoobarPlugin

task bar(type: Foobar) {
    inputFile = layout.projectDirectory.file("in.txt")
    outputFile = layout.projectDirectory.file("bar.txt")
    message = "bar"
}

foobar {
    someInput = bar.outputFile
}

In this example, the FoobarPlugin and related classes can be somewhere else. And someone who uses your plugin can configure foobar.someInput to be the output of another task or an existing file path somewhere without knowing about foo.

Here are some guides on plugin development that cover some of this:

HTH


(Bob Bell) #4

Thanks, @sterling! You clearly invested time in your response, and I appreciate it. I am in fact doing a number of the things you mention – I’m using a custom task with InputFiles and OutputFiles, defining a plugin, etc. That’s why I was saying I was biting off a lot for my first real project! That said, it’s all well documented; it can just be difficult to really keep it all in my head and truly “get it”, due to my lack of experience. I was trying to go a bit simpler with my example, for brevity and clarity. I think it’ll probably take me some time to mull over your response, and then hopefully that’ll lead to some illumination…


(Bob Bell) #5

So in my “real” case, foo may depend on some subset of the output of bar (there are a lot of foos and bars, BTW, and I’m actually treating them as different projects in a multi-project build), and I was hoping to declare that fairly specifically, rather than depending on all of bar, since Gradle seems pretty smart about detecting whether content actually changed or not. The input to foo (AKA output from bar) pretty much won’t exist outside of a build of bar, but I am hoping to be able to pull pre-built artifacts for the bar build when available. This is something I originally thought we were going to have to craft (and then maintain) ourselves, but after doing my research, it appears that we can tie into the Gradle build cache. It’s always great to minimize the amount of custom code we need, so I’m really hoping that part works out!

What I gather from your example is that I can’t connect things just on file paths (you said as much), but that I’ll have to in some way (you provide several examples) say that foo depends “on this file from bar”. So I can’t be entirely independent and say “I depend on bar.txt – however that happens”, but rather at some point (directly or indirectly), I need to more or less say “I’m talking about that output from bar”. It actually seems best to get away from the filename in that case, and refer to a variable/property/whatever in the dependent task.
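In other words, if I’ve understood correctly, the difference comes down to something like this (sketching from your earlier example):

```groovy
// carries no task dependency: Gradle sees only a path,
// not which task produces the file
inputs.file "bar.txt"

// carries the dependency: bar's outputs become foo's inputs,
// and foo automatically runs bar first
inputs.files bar

// or, with custom task classes, refer to the producer's output property
inputFile = bar.outputFile
```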

If I have that correct, thank you for the clarification and help. If I do not, please do correct me. And actually, thank you in either case!


(Sterling Greene) #6

This is mostly right. The one extra thing I would add to this is that you could also do this as publications between projects. I didn’t mention that since your example was in a single project. With publications, foo and bar would be in different projects and not know about each other. You’d connect them through a Configuration and dependencies.
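A rough sketch of what that cross-project wiring could look like — configuration and task names here are made up for illustration:

```groovy
// in bar/build.gradle: expose the generated file through a configuration
configurations {
    stuff
}
task makeBar {
    def outFile = file("$buildDir/bar.txt")
    outputs.file outFile
    doLast { outFile.text = "bar" }
}
artifacts {
    // publishing the file on the configuration, with its producing task attached
    stuff(file("$buildDir/bar.txt")) {
        builtBy makeBar
    }
}

// in foo/build.gradle: depend on bar's configuration, not its file path
configurations {
    stuff
}
dependencies {
    stuff project(path: ':bar', configuration: 'stuff')
}
task foo {
    // resolving the configuration carries the task dependency on :bar:makeBar
    inputs.files configurations.stuff
    outputs.file "$buildDir/foo.txt"
    doLast {
        file("$buildDir/foo.txt").text = "foo"
    }
}
```

With this shape, neither project references the other’s files directly; the configuration is the only contract between them.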

Ah, my examples might not work out so well when you bring multiple projects into the mix. You really want to avoid “reaching” into other projects because then it implies ordering between subprojects. Publications could make sense for this, but if the subprojects are artificial, you might be able to avoid that.

If you don’t mind, it might be worth stepping back a bit and describing what you’re trying to replicate/do. I can try to offer suggestions. From what I gather, you have a few tasks that produce several kinds of outputs. And then you have some other tasks that consume subsets of those outputs (and so on, I assume).


(Bob Bell) #7

Well, we are getting more complex than I originally intended my query to be, but it seems like it might be required, and you seem to be willing to think about my situation in more depth, which I appreciate. I may have to sanitize some of this because it’s “work stuff”, and I don’t want to have to run things past lawyers in order to post some real code, but I think we can make do; it’ll just get verbose without code examples. So here goes…

Publication is indeed part of the plan. I was kind of waiting to add that on at the end (did I mention it’s a lot at once?), but maybe it’s integral to the functionality. I just haven’t quite figured that part out yet, though mostly it’s because I haven’t fully invested in that part yet. Most developer builds would just publish arbitrary artifacts to a local directory, while CI builds would publish to an Artifactory repository, where dependencies could be found for later CI and developer builds, potentially by the Gradle build cache.

The overall situation here is that we have a bunch of repositories (more than a dozen, currently fewer than two dozen) of mostly in-house code that is built separately, but which have dependencies between them, and which all together form a single product. We currently have a bunch of code to help link them together with a custom approach and custom scripting, but that code is … unwieldy, imperfect, and in need of replacement (and I’m being kind). I’m looking to implement much of that replacement via Gradle.

The overall workflow is that a developer would check out one repo, or possibly more than one repo, but not necessarily every repo. We’d implement some logic to determine what revisions of all the repos we’d need, both those present and those not present (all repos are in Git, BTW). The developer would then build the repo they checked out. For the repositories they haven’t checked out, I am hoping to use the Gradle build cache to pull down matching builds from Artifactory as dependencies. (If that doesn’t work, I can implement custom logic, but I’m hoping that the Gradle build cache can help fill this role.) I have a method of calculating revisions of those dependencies without necessarily having source code present that I am optimistic I’ll be able to use with Gradle.

There might be Java parts to this code base, but it’s far from prevalent, much less universal. To my knowledge, Gradle isn’t currently in use at all. Right now, the goal is to use Gradle for the overall “glue” to join together the repositories. The build task would essentially be a “hand-off” to whatever script or build system actually builds each particular repository. I currently have that as something that’s configurable, though I’m hoping that’ll be unnecessary, and it’ll be a wrapper script of the same name and usage in every repository. We might eventually see about moving some repositories to actual Gradle builds, but that would be a longer-term item. Every repository would be buildable by the same Gradle task, which would be the default task. That is, just running gradle gets you a build (or maybe gw (gdub), which I plan to look at once I get things working).

So my current approach is to have a common repository with the Gradle code for every repository that’s always pulled down (we are already using a common repository in this way). I might eventually look at using Composite Builds, but at the moment am limiting my implementation to a Multi-Project Build. Part of the reason for that is (1) Composite Builds are just more stuff that add in extra complexity, which I have enough of already, and (2) I’d like to be able to give the user very helpful messages if for some reason a matching dependency cannot be found, rather than a default error message from a composite build in Gradle. But I may look at Composite Builds in a later phase, after I get something working. (I’m just mentioning it, because so far, every time I’ve left something out, you’ve recommended it! :grinning:)

So in my simple prototype (mocked up repos/projects), every repository just has a very simple settings.gradle file that defines the root project name and then applies a common settings file from the common repo. That common settings file pulls in a common custom plugin (implemented as a Groovy script plugin), and includes the build scripts for every other repo/project that defines our overall product. The common custom plugin automatically defines common tasks, like a common build task, with specific inputs and outputs, to allow for up-to-date checks to work, and eventually also utilization of the build cache. I’m using a custom class that I add as an extension to be able to configure the dependencies (input files) and build artifacts (output files), which the common task then references. Individual projects therefore basically just need to define their inputs and outputs, and the rest is provided by the infrastructure.

That’s a lot, but I don’t know if that’s even enough to really convey the goals and current plans. But you’ve demonstrated a willingness to review the approach and provide feedback, so I’m very open to whatever feedback you can provide, and would be willing to answer any questions that you may have about our goal.

Thanks!


(Bob Bell) #8

I’m kind of thinking that maybe the thing to do is to have each repo/project add to a shared map for the entire product as a whole, where each output is named (all outputs are basically named already in the old system) and associated with a File object. That File object is used as the output from the project that produces it, and can be used (referenced via the name in the map) as the input to any consuming projects.

Would that work? If so, I just need to figure out the smoothest way to do that…
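Very roughly, something like this is what I have in mind (names invented for the sketch):

```groovy
// in the common plugin: one shared map for the whole product
rootProject.ext.productOutputs = [:]

// producer side (in the bar project): register the output under a known name;
// registering the task's outputs.files (a FileCollection) rather than a bare
// File should keep the producing-task information attached
productOutputs['bar.txt'] = tasks.myBuild.outputs.files

// consumer side (in the foo project): look the input up by name
tasks.myBuild.inputs.files rootProject.productOutputs['bar.txt']
```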


(Bob Bell) #9

I continue to work to try to grok this. I think that I might be able to achieve what I want by:

  1. Stating dependencies in a dependencies block using a configuration and the group:name syntax, which does not appear to be exclusive to Java-ish outputs
  2. Using constraints at an overall project level to specify the specific versions that should be expected
  3. Publishing between projects, so that the outputs can be consumed as dependencies in the other projects

Where I am currently struggling (but continuing to work on) is:

  1. What configuration to use. There appear to be no configurations by default. I can add a custom one and add a dependency into it, but that doesn’t appear to do anything by default. So I need to figure out, in that case, how to make Gradle “care” about the dependencies in the configuration that I added.
  2. How to be able to pull dependencies from a repository, but only publish locally. I definitely do not want to require publishing a developer build to an external server just to be able to consume that in the build of another local project. I’m thinking that with more doc reading (over and over again), I’ll pick this up.
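For point 2, what I’m picturing is something along these lines — assuming the maven-publish plugin, with placeholder URLs:

```groovy
// publish developer builds only to a directory inside the build tree,
// never to the shared server
publishing {
    repositories {
        maven {
            name = 'local'
            url = "file://${rootProject.buildDir}/local-repo"
        }
    }
}

// resolve from the local directory first, then fall back to Artifactory
repositories {
    maven { url = "file://${rootProject.buildDir}/local-repo" }
    maven { url = "https://artifactory.example.com/gradle-repo" }
}
```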

I’m experienced in general, but I have zero background in Gradle, Maven, Ivy, and basically Java, so there are a lot of new concepts here that I’m adapting to.


(Sterling Greene) #10

@rjbell4 FYI, so you don’t think I’m ignoring you. I’ve started a reply to your post, but I’m not going to finish it before I have to run off for the evening.

The short reply I have:

  • What you’re doing is really common
  • There are some existing features that will help with this. You mentioned composite builds.
  • There are some new features we’re working on right now that also align with what you’re trying to do with multi-repos and wrapping existing build systems. This is the time-consuming part to describe.

I think you’re on the right track with your latest reply. For your struggles:

  • The names of the configurations don’t really matter. Pick something that makes sense for your domain.
  • You can publish things by adding them as artifacts to a configuration
  • I think you want to avoid the “publish locally” step completely. If you get “publish locally” working, composite builds should “just work” at that point and be much better.

We have some samples in the Gradle distribution, but they probably don’t have all of what you need to prototype this. We have an example of wrapping a CMake build with Gradle: https://github.com/gradle/native-samples/blob/master/samples-dev/src/templates/build-wrapper-plugin/src/main/groovy/org/gradle/samples/plugins/WrappedNativeLibraryPlugin.groovy

But this is more complicated than where you are right now.

I’ll try to reply more later.


(Bob Bell) #11

I appreciate it, @sterling. My overall impression is that there’s a lot of documentation, but it’s generally pushing towards common Java-related cases and such. It’s enough that I know it should be possible, but missing just enough examples that some things have been just out of my reach, so far.

I did start to dummy up some of the things mentioned recently, using some duplicate code just to make things easier, figuring I’ll refactor to the “right” way later, and avoiding the need to figure out some structural issues at the same time. I realized one of my issues is probably related to some lazy evaluation of dependencies. I’m adding dependencies and expecting that to “do something”, when what seems to really be happening is that only when I use those dependencies does the dependency suddenly “count”. Now that I’m realizing that, I think I can explore my options more effectively.

Enjoy your evening. I’ll keep poking, and I look forward to reading your response later.


(Bob Bell) #12

BTW, just putting this out there: one of the reasons I talked about using a map tracking artifacts is because it’s my desire (perhaps misplaced?) to be able to state that “this project depends on that specific artifact”, or perhaps “this project depends on that specific artifact produced by that specific other project” (being ignorant of the producer is probably preferable). Configurations seem aimed at grouping together files or whatever into collections, but I didn’t see how to refer to distinct members of that configuration, once collected. Unless you have a configuration per referenceable artifact, of course, which seems … awkward.


(Bob Bell) #13

I continue to experiment, though I’m not sure I learned much. I tried to go back and create a basic experiment that could be shared. That experiment is available at https://github.com/rjbell4/gradle-experiment

I feel pretty confident that I’m not using Gradle functionality all that completely in this experiment; I’m probably re-inventing several things (like maybe modules?), etc. But more fundamentally, it doesn’t work.

Based on the conversation and previous experimentation, I was under the impression that although file pathnames wouldn’t automatically link dependencies between projects, actually sharing the same object would. Therefore, the construct I have does manage to get the exact File object reported as an output from bar to also be reported as an input to foo. However, when I run the command ./gradlew :foo:myBuild, it does not discover that dependency.

In fact, it checks for the inputs and outputs of foo, but doesn’t seem to even check for the inputs and outputs for bar … something I thought I had working before.

Anyway, I thought I’d share my “progress”, in case @sterling or any one else was available to assist.


(Bob Bell) #14

I found a hint elsewhere, in a reply by @sterling, that perhaps my problem is that I’m passing around Files, when really I need to be passing around something that’s Buildable? From my research, maybe I need a BuildableComponentSpec or a PublishArtifact?

I feel like I might have found a possible reason, but still need a solution…


(Bob Bell) #15

I have to check out for the evening, but I feel like I made a bit of progress. I am now using the artifacts declaration and configurations, but not publishing, all of which seems to align with what @sterling said, so that gives me some hope. However, I’m not yet seeing how to connect those Artifacts as inputs, as that gives me errors.

My work in progress is in the “inprogress” branch of the aforementioned GitHub repository.


(Bob Bell) #16

I did get something working. A key realization for me (if I’m correct!) is that Gradle seems to be using the dependencies reported by the FileCollection to link things together. When I use project.files in another project, I’m potentially dropping that dependency, and Gradle will view the “origin” of the FileCollection as the current project.

So I’m now taking care of that explicitly, using builtBy to maintain the origin task dependency. I’m quite aware that I might be bastardizing something that could be handled more natively in Gradle. I’m also concerned that I’m not set up for build caching yet, because I don’t have versions of anything tied in.
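Concretely, what I’m doing now is roughly this (task names are from my experiment):

```groovy
// wrapping another project's output file in project.files() drops the
// producer information, so I re-attach the producing task with builtBy
def producer = project(':bar').tasks.myBuild
inputs.files files(producer.outputs.files).builtBy(producer)
```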

For those reasons, I’d be delighted to have feedback on what I’ve done. It’s exhibited in the aforementioned repo, and particularly at https://github.com/rjbell4/gradle-experiment/blob/master/common/common.gradle#L22


(Bob Bell) #17

I’m not sure who I’m talking to anymore (I guess @sterling became unavailable?), but I’ll keep going.

I discovered that the Gradle build cache wasn’t working because I was returning a FileCollection (via project.files) from my task outputs method. In order to allow the output to be cacheable, I needed to return a Map … which, it just so happens, is how I was already choosing to identify my outputs for my own purposes. Hmm… it’s like I was on the right track all along!
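For anyone following along, the change was roughly this, in my custom task class (names are illustrative):

```groovy
// before: a plain FileCollection, which the build cache rejected as an output
// @OutputFiles
// FileCollection getArtifacts() { project.files("$project.buildDir/bar.txt") }

// after: a Map of named files, which the build cache accepts
@OutputFiles
Map<String, File> getArtifacts() {
    return [bar: new File(project.buildDir, 'bar.txt')]
}
```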

Anyway, now I can see build caching working – and it’s great!

What I feel like I’m still missing is a good way to refer to inputs, and pick them up. I feel like I’ve really hacked something into the system that might be better represented natively. For example, it feels like modules could be a good fit, but they seem to require the use of repositories, which seems like an undesired dependency.


(Bob Bell) #18

@sterling, I’m re-reading this thread, and understanding much more of it now. In particular, I’m looking up RegularFileProperty, which appears to be a fairly recent addition, and I understand why it was used. It’s basically the same reason I eventually turned to archives, but it seems a much easier fit, so I plan to look at using it.


(Bob Bell) #19

Well, I think I’ve taken this as far as I can. Maybe it’s as good as it can be, although I’m not sure.

Here’s my implementation using RegularFileProperty: https://github.com/rjbell4/gradle-experiment/tree/master
This is the model I plan to use for now.

I also made an attempt to use my own Provider, thinking it would be a neat way to return the Map lookup: https://github.com/rjbell4/gradle-experiment/tree/inprogress
However, that’s not working.

I plan to move forward with something modeled on the former example, though I’d still love to have feedback from @sterling or anyone else.


(Bob Bell) #20

FWIW, I updated the master branch to use properties a little more fully, and I’m somewhat satisfied moving forward with that model.