Incremental zip unpacking

I found a few old threads about this, but none that provides a real solution beyond various kludges, all of which seem to break incremental builds.

```groovy
task stuffs(type:Copy) {
    from { zipTree('foo.zip') }
    into 'fooDir/'
}
```

In my case foo.zip comes from an Ivy repo, as a dependency, but it seems to make no difference if it’s just a local file.

If anything in foo.zip is read-only, for whatever reason, I can’t run the task “stuffs” twice without either getting a permission error or cleaning the temporary folder first. That effectively prevents incremental builds, because unzipping again updates the inputs to the next task, and so on down the chain. If the archive is large, unzipping it every time also takes a fair amount of time.

The various workarounds I’ve seen involve deleting the tmp folder, modifying the permissions, or variations on those, but they all require knowledge of zipTree’s internals that I shouldn’t have to deal with as a user.

Whatever the workaround, this seems to cause problems for a lot of people, and it calls for either some rework of the tasks involved or a documentation update that clearly outlines the steps required.

Edit: I’m using Gradle 2.7

By default, copy will maintain the file permissions from the source. If you want different permissions in the destination you can do this:

```groovy
task stuffs(type:Copy) {
    from { zipTree('foo.zip') }
    into 'fooDir/'
    fileMode 0777
    dirMode 0777
}
```

The problem is that fileMode/dirMode only affect the Copy target folder. The zipTree temp folder (build/tmp/expandedArchives/foo/**) still contains the read-only files, and the next run fails on them.

I’d also prefer, in most cases, not to mess with the protection flags, since they work differently on Linux and Windows, for instance, and I don’t want to blindly set or clear the x flag. If there were a way to express u+w, that would be a little better, but it would still go somewhat against the purity of just unpacking an archive.
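For the u+w case specifically, the Copy task’s eachFile hook can adjust each file’s mode relative to the mode it came with from the archive, rather than overwriting it. A sketch, not tested against 2.7 here, that only ORs in the owner-write bit; as noted above, it affects the copies in fooDir/, not the files in the zipTree temp folder:

```groovy
task stuffs(type: Copy) {
    from { zipTree('foo.zip') }
    into 'fooDir/'
    eachFile { details ->
        // Add the owner-write bit (octal 0200, i.e. u+w); every other bit,
        // including any x flags, stays as it was in the archive
        details.mode = details.mode | 0200
    }
}
```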

IMO, the real problem is that, as far as I can tell, there’s no way to make Gradle see the dependency as foo.zip -> fooDir/. It always treats the operation as tmp/foo/ -> fooDir/, with the unpacking of foo.zip into tmp/foo/ being just a side effect of evaluating zipTree(), not part of the actual dependency chain. Please correct me if I’m wrong; I haven’t dug into the code to find out, but my tests agree with this theory. Look at the timestamps of tmp/foo/* after each run, or use a very large zip file, and you’ll see this.

Update: So I’ve gotten one step closer to a solution. It’s not a catch-all yet, but perhaps with some input from you all we can close this one. I should mention that this solution is mostly useful when you have really large zip files. In my case it’s about a gigabyte (don’t ask).

So, in short, this is what I’m doing:

```groovy
import org.apache.commons.codec.digest.DigestUtils

buildscript {
    repositories { mavenCentral() }
    // DigestUtils comes from commons-codec, which must be on the build script classpath
    dependencies { classpath 'commons-codec:commons-codec:1.10' }
}

task unpack { // type: DefaultTask is the default, so it can be omitted
    def zipFile = file('foo.zip')
    def manifestFile = file('manifest.txt')
    def targetDir = file('unpacked')

    // One "md5,path" line per extracted file (including files without an extension),
    // sorted so the order is stable between runs; withInputStream closes each stream
    def computeManifest = {
        fileTree(dir: targetDir).files.collect { f ->
            f.withInputStream { DigestUtils.md5Hex(it) } + ',' + f.absolutePath
        }.sort()
    }

    inputs.file zipFile
    outputs.file manifestFile

    doLast {
        // it may make sense to first delete targetDir here
        ant.unzip(src: zipFile, dest: targetDir)
        manifestFile.withWriter { manifest ->
            computeManifest().each { manifest.println(it) }
        }
    }

    outputs.upToDateWhen {
        if (!manifestFile.exists()) {
            return false
        }
        if (computeManifest() != manifestFile.readLines()) {
            println 'manifest mismatch'
            return false
        }
        return true
    }
}
```

This runs significantly faster than a copy using zipTree, but of course comes with a couple of caveats. It doesn’t work terribly well when you unzip more than one package into the same directory. Capturing the list of files the unzip actually produced, instead of using fileTree, would ensure the manifest only contains the relevant files; the upToDateWhen closure could then ignore any extra files.

It doesn’t properly populate outputs, but in my specific case I don’t need them. However, if the zip file could be parsed during configuration without extracting it, that could be solved too.
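Both caveats (extra files in a shared target directory, and knowing the contents without extracting) could be approached by building the manifest from the archive’s entries instead of from fileTree. A minimal plain-Groovy sketch, assuming a zip (java.util.zip.ZipFile does not read tar) and that hashing each entry straight from the archive is acceptable; zipManifest and isUnpacked are hypothetical helper names whose result could, for example, back an upToDateWhen closure:

```groovy
import java.security.MessageDigest
import java.util.zip.ZipFile

// MD5 of everything remaining in a stream, as lowercase hex (the caller closes the stream)
String md5Hex(InputStream input) {
    def digest = MessageDigest.getInstance('MD5')
    byte[] buffer = new byte[8192]
    int n
    while ((n = input.read(buffer)) != -1) {
        digest.update(buffer, 0, n)
    }
    return digest.digest().encodeHex().toString()
}

// Map of entry name -> MD5, read straight from the archive without extracting it
Map<String, String> zipManifest(File zip) {
    def zf = new ZipFile(zip)
    try {
        return Collections.list(zf.entries())
            .findAll { !it.directory }
            .collectEntries { entry ->
                [(entry.name): zf.getInputStream(entry).withStream { md5Hex(it) }]
            }
    } finally {
        zf.close()
    }
}

// True if every file listed in the archive exists under targetDir with identical content.
// Files in targetDir that the archive doesn't list (e.g. from another package) are ignored.
boolean isUnpacked(File zip, File targetDir) {
    return zipManifest(zip).every { name, hash ->
        def f = new File(targetDir, name)
        f.isFile() && f.withInputStream { md5Hex(it) } == hash
    }
}
```

Because only the archive’s own entries are checked, a second package unpacked into the same directory doesn’t cause a mismatch.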

Finally, this should be put into a reusable task, not be in the script.

All feedback appreciated!

Further research shows that ant.untar drops all my file permissions, so this solution doesn’t work at all in my real case (I came up with the above using test data that didn’t exercise that).

At this point I’m just going to give up and go with the kludge of modifying the write permissions in the temp folder; I need to get some actual work done.
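One way that kludge might look, as a sketch only. It assumes the expanded copy lives under build/tmp/expandedArchives, as described earlier in the thread (an internal Gradle detail that can change between versions), and that the extraction happens when stuffs itself runs, so a task it depends on gets the chance to clear the read-only flags first. The task name makeExtractedFilesWritable is made up:

```groovy
// Hypothetical helper task; relies on Gradle's internal temp layout, not a public API
task makeExtractedFilesWritable {
    doLast {
        [file("$buildDir/tmp/expandedArchives"), file('fooDir')].each { dir ->
            if (dir.isDirectory()) {
                // File.setWritable(true) sets the owner-write permission on Unix
                // and clears the read-only attribute on Windows
                fileTree(dir).each { it.setWritable(true) }
            }
        }
    }
}

task stuffs(type: Copy) {
    dependsOn makeExtractedFilesWritable
    from { zipTree('foo.zip') }
    into 'fooDir/'
}
```

fileTree only visits files, so read-only directories inside the archive would still need separate handling.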

Sorry, I hadn’t realised there was a temp folder involved but looking at the ZipFileTree implementation I can see there is.

I agree with you that this temp folder shouldn’t cause failures for incremental builds. This could be a bit smarter by storing file permissions in a metadata file alongside the temp directory rather than as actual file permissions on the temp files themselves.

Sounds like a worthy JIRA candidate.

It should be noted that zipTree handles permissions well in most other cases, compared to ant.unzip, which just drops them. Allowing zipTree to overwrite its own temp folder, and Copy/Sync to force the copy, would solve that, I believe.

Additionally, if zipTree could be a bit more clever than having to always re-unzip (into tmp) an archive that hasn’t changed, we could see some potential speedup for large archives.

I’ve raised this issue as GRADLE-3348.

We are doing significant work in the area of incremental build improvements right now. This is something we’ll look into addressing as part of that work.

Thanks, I’m very happy to hear this!

There will be improvements in this area in Gradle 2.9. In master there is a commit that snapshots the zip/tar/tar.gz/… file itself in the up-to-date check, instead of extracting the files and snapshotting each individual file.
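Until then, the idea behind that change (hash the archive once instead of extracting it and hashing every file) can be mimicked by hand. A minimal plain-Groovy sketch, not Gradle’s actual implementation; archiveHash, needsExtraction, recordExtraction and the stamp file are hypothetical names:

```groovy
import java.security.MessageDigest

// SHA-256 of the whole archive: same bytes give the same hash, regardless of timestamps
String archiveHash(File archive) {
    def digest = MessageDigest.getInstance('SHA-256')
    archive.eachByte(8192) { byte[] buf, int n -> digest.update(buf, 0, n) }
    return digest.digest().encodeHex().toString()
}

// Re-extract only if the target folder is missing or the archive's hash differs
// from the one recorded in stampFile after the last extraction
boolean needsExtraction(File archive, File stampFile, File targetDir) {
    return !targetDir.isDirectory() || !stampFile.isFile() ||
        stampFile.text.trim() != archiveHash(archive)
}

// Call after a successful extraction
void recordExtraction(File archive, File stampFile) {
    stampFile.text = archiveHash(archive)
}
```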

Since some files are read-only in the presented use-case, it is required to make the files writable before the files are copied in the next incremental build. I’d probably do this in a doLast block of the copy task by using ant.chmod(dir: 'fooDir', perm: 'u+rw') or simply by using fileMode=0644 in the copy task in the first place.

In the meantime, until the optimisations are implemented, could the VFS plugin help you?

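```groovy
// Assuming the VFS plugin is applied and PathToZipFile is defined
task extract {
  // Inputs and outputs must be declared at configuration time, not inside the action
  inputs.file PathToZipFile
  outputs.dir file('fooDir')
  doLast {
    vfs {
      cp "zip:${PathToZipFile}!/internal/path/to/be/unpacked", file('fooDir'),
          recursive: true, overwrite: true
    }
  }
}
```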

Thanks for the additional tips and suggestions.

I’ve actually ended up with a slightly different solution altogether. The target folder can get modified by the build process (and I can’t do anything about that), and I don’t want to re-run the unzip unless the incoming version of the zip changes. Since the zip is pulled in as a dependency, I can track its version using an input property, store that version in a file, and run a project.copy call on the zip file if the version has changed or the output folder is missing.
I first added an input dependency on the resolved dependencies, but since that points to a cache folder, and I don’t want to rebuild just because the same file is re-cached, I decided to use the version as the input instead.
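A rough sketch of that setup, assuming a configuration named fooZip and made-up module coordinates (com.example:foo:1.2.3@zip). It relies on TaskInputs.property accepting a closure that Gradle evaluates to get the actual value, and it hasn’t been tested against 2.7:

```groovy
configurations { fooZip }

dependencies {
    // Hypothetical coordinates; in the thread foo.zip comes from an Ivy repo
    fooZip 'com.example:foo:1.2.3@zip'
}

task unpackFoo {
    def targetDir = file("$buildDir/fooDir")
    def stampFile = file("$buildDir/fooDir.version")
    // Only the resolved version matters, not the path of the cached file
    def fooVersion = {
        configurations.fooZip.resolvedConfiguration.resolvedArtifacts
            .collect { it.moduleVersion.id.version }.sort().join(',')
    }

    inputs.property 'fooVersion', fooVersion
    // Only the stamp file is declared as output, so changes the build
    // makes inside targetDir don't force a re-extraction...
    outputs.file stampFile
    // ...but a missing target folder does
    outputs.upToDateWhen { targetDir.isDirectory() }

    doLast {
        project.delete targetDir
        project.copy {
            from project.zipTree(configurations.fooZip.singleFile)
            into targetDir
        }
        stampFile.text = fooVersion()
    }
}
```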

FYI, this will be fixed in Gradle 2.9.


I assume the fix is part of zipTree no longer checking every file in the zip for a change?

Yes, the fix was a side effect. :) -Lari