Best way to extract archives generated by another task

Hello!

As part of our build, we have a Perl script that downloads a bunch of prebuilt plugins from an external mount point. These files are downloaded into build/repo.

Our goal is to walk the file tree of build/repo and extract all the .tar.gz files into what ends up being the root of the distribution/installation directory.

Right now this is what I have:

def pullCommonDir = new File(buildDir, 'repo')
pullCommonDir.mkdirs()
task pullCommon(type:Exec) {
    outputs.file "${pullCommonDir.path}/versions"
    executable = 'perl'
    args = ['pull_common.pl', '-f']
}


distributions {
    main {
        contents {
            pullCommonDir.eachFileRecurse {
                if (it.path.contains("tar.gz")) {
                    from tarTree(it)
                }
            }
        }
    }
}

The mkdirs() call is there because otherwise Gradle complains during the configuration phase that pullCommonDir does not exist (which it doesn’t, because the pullCommon task hasn’t been executed yet).

Two problems I’ve noticed:

  • It looks like the build extracts the archives for each distribution task, which isn’t necessarily what I want to do

  • Edit: This actually isn’t true. It extracts them once to the tmp folder and then copies them in. The distZip, distTar, and installDist tasks were taking a while, so I assumed each one was extracting, but I think the time was just spent copying the files into place.

  • We have two tasks that build different versions of our code for branding reasons. They are implemented by invoking a task of type GradleBuild and passing in properties. When running ./gradlew clean taskName, the final output directory is missing the contents of the archives.

I feel like I may have hacked this together and that’s what’s causing the problem, so what’s the best way to do this?

I’ve thought about extracting the archives once, and then adding that output folder in the distributions copy spec, but I’m not sure that will fix my issue.

Thanks!

Just to clarify, an alternative I explored was to create a task that extracts the archives itself. This is what I came up with:

task extractArchives(dependsOn:pullCommon) {
    doLast {
        copy{
            pullCommonDir.eachFileRecurse {
                if (it.path.contains("tar.gz")) {
                    from tarTree(it)
                }
            }
            into extractedArchives
        }
    }
}

This one works fine in the branding situation I described earlier, but I’m still not sure if this is the best way to do it.

The reason you needed the mkdirs() and the reason you’re not seeing the contents of the tar.gz’s in the final archive are related. The eachFileRecurse loop inside contents gets evaluated at configuration time, before any tasks get to execute, so pullCommonDir will be empty most of the time (for example, right after a clean). In that case, you’ll never add anything to the distribution (i.e., from tarTree(it) is never called).
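To make the configuration/execution distinction concrete, here is a minimal sketch. The task name phaseDemo is hypothetical and only there for illustration; it assumes the same build/repo directory as above.

```groovy
// Hypothetical demo task, not part of the original build.
task phaseDemo {
    // Configuration phase: this line runs while Gradle configures the project,
    // before any task (including pullCommon) has executed.
    println "configuring: repo exists = ${new File(buildDir, 'repo').exists()}"

    doLast {
        // Execution phase: this runs only when phaseDemo itself is executed.
        println "executing: repo exists = ${new File(buildDir, 'repo').exists()}"
    }
}
```

Anything written directly in a contents block, like the eachFileRecurse loop, behaves like the first println.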

What you need to do is delay the evaluation of from until after the build/repo directory is populated. You can do that by using a Closure instead of passing the tarTree directly:

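// Every task that packages the main distribution has to run after pullCommon
[distTar, distZip, installDist].each { it.dependsOn pullCommon }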

distributions {
    main {
        contents {
            from { // closure here
                fileTree(dir: "${buildDir}/repo", include: "**/*.tar.gz").collect { tarTree(it) }
            }
        }
    }
}

The Closure is evaluated just before the task needs to execute. That’s sort of what your alternative solution is doing. The copy {} is only evaluated at execution time, after the directory has been populated.
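If you would rather extract the archives once and point the distribution at the result, a rough sketch could look like the following. It assumes the pullCommon task from your script; the extractedDir location is a made-up name, and I haven't run this against your build.

```groovy
// Hypothetical location for the unpacked files; any directory under buildDir would do.
def extractedDir = new File(buildDir, 'extracted')

task extractArchives(type: Copy, dependsOn: pullCommon) {
    // The closure defers the directory scan until the task is about to run,
    // by which point pullCommon has populated build/repo.
    from {
        fileTree(dir: "${buildDir}/repo", include: '**/*.tar.gz').collect { tarTree(it) }
    }
    into extractedDir
}

distributions {
    main {
        contents {
            // A task used as a copy source stands for its output files,
            // and Gradle infers the task dependency from it.
            from extractArchives
        }
    }
}
```

Using a Copy task instead of copy {} inside doLast also gives Gradle a declared output directory to work with for up-to-date checks.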

Why are you doing it this way (passing properties) vs just having both variants defined in the build script? If the build is very fast, maybe it doesn’t matter, but having to run ‘clean’ and a task with specific properties is a bit of a smell.

Thanks for your help Sterling!

Your first reply answers my question and definitely helps me. There are a few checks other than .tar.gz in the actual code, but I can just use filter(Closure) for those.
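Roughly what I have in mind, as a sketch only. The "-debug" check is a made-up stand-in for our real checks, and the variable names are just for illustration:

```groovy
distributions {
    main {
        contents {
            from {
                def archives = fileTree(dir: "${buildDir}/repo", include: '**/*.tar.gz')
                // Hypothetical extra check: skip archives whose name contains "-debug".
                def wanted = archives.filter { File f -> !f.name.contains('-debug') }
                // The last expression is the closure's result: one tarTree per archive.
                wanted.collect { tarTree(it) }
            }
        }
    }
}
```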

Why are you doing it this way (passing properties) vs just having both variants defined in the build script? If the build is very fast, maybe it doesn’t matter, but having to run ‘clean’ and a task with specific properties is a bit of a smell.

We did this because we’re not sure when we will have to add or remove branding, and a lot of our subprojects require the branding (maybe 6 out of the 12).

The solution that I came up with was to have those subprojects output to a branded directory structure (build/libs//target.jar). That way we don’t have to do a clean before each branded build.

This way, in the future, all someone has to do is add branding information to the brands map, and they don’t have to change any other files.

./gradlew nightly

Works just fine for us.

Keep rocking!