This task only has to copy about 75MB to a few subdirectories within the target folder. The “stagingDir” has already been set up with the exact directory structure that I want to copy over, but the target folder (specified by the environment variable MY_TEST_DIR, as other developers will have this configured differently on their machines) is also already populated with a TON of other files - 99.9% of which are completely independent of what I have built in the “stagingDir”.
This copy operation takes literally several minutes on a non-SSD drive (I’m still waiting for it to complete so I can tell you a more precise number - the disk is thrashing away as I write… ah, it finished in about 20 minutes). On a system with an SSD drive it takes 40 seconds. It is only copying 75MB; it should take a few seconds at most. It seems that the fact that the target location already contains a lot of unrelated files slows it down. Is it scanning the entire target folder to build checksums for every existing file, rather than only checking the paths that actually match the files it will be copying?
Copying the same files into the “stagingDir” in the first place took only a second or two. But that directory was either empty or contained only older versions of the same files and nothing more.
If the target directory is empty, it takes about 42 seconds.
Adding:
outputs.upToDateWhen { false }
to the copy task sped it up so that it takes only 55 seconds when the target directory is populated as it normally is.
When the target folder contains only a copy of the files copied by a previous run, it takes < 20 seconds (with or without outputs.upToDateWhen { false }).
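For reference, this is roughly where that line sits (a sketch only - the task name and source variable here are placeholders, not the actual build script):

task copyToInstall(type: Copy) {
    from stagingDir                        // staging tree assembled earlier (placeholder name)
    into System.getenv('MY_TEST_DIR')      // shared install directory
    // Never consider this task up to date, so Gradle skips the
    // "is everything already up to date?" decision before running.
    outputs.upToDateWhen { false }
}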
Clearly Gradle is doing too much work with irrelevant files.
How many files are we talking about? From what I know, after the task has completed, Gradle lists all files in the target directory to find the ones written by the task (by their last-modified date) so that it can hash them.
It shouldn’t take 20 minutes to list the files either. There are a lot, though. The target directory has around 50,000 files, but fewer than 2,000 are copied by the task (1,700 are HTML Javadoc files).
Which OS/FS? Some file systems are known to get unbearably slow when dealing with tens of thousands of files in the same directory. Have you questioned whether you need to go through a staging directory at all? Don’t the files end up in an archive anyway?
The files are not all in the same directory; the destination folder has a tree containing hundreds of sub-folders. It is Windows 7 with NTFS.
The staging directory (the source of this copy, with 2k files) is what we archive, but the destination of this copy is an “install” directory (with 50k files) where the runtime environment is configured for testing. So we do need the staging directory, to assemble the source of the build artifact.
My guess is that listing hundreds of directories and comparing timestamps of tens of thousands of files is what makes this slow, particularly on mechanical drives. As far as I know, it’s currently not possible to avoid this cost by turning off up-to-date checks. However, using ‘project.copy’ instead of a ‘Copy’ task will save this cost at the expense of not having an up-to-date check (perhaps a good tradeoff in this case). Another potential solution is to copy to a less populated directory. I’ve raised GRADLE-2867 to track this.
Thanks. It seems odd that Gradle would need to compare timestamps of the 40000 files in the destination that aren’t copied from the source folder. What would it compare them to? Is it assuming that everything in the destination folder is an output of the task?
Note that copying to a less populated directory is not an option for this step. The rest of the files are part of the “installed” product and need to be there. (This build is creating a “plugin” that will be used by the larger system.)
I will use project.copy in this case as there is clearly no benefit to having an up-to-date check when the check takes longer than the copy.
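For anyone following along, a minimal sketch of that approach (the task name is made up; it assumes the same stagingDir and MY_TEST_DIR as above):

task deployToInstall {
    doLast {
        // project.copy performs a plain copy with no up-to-date check,
        // so Gradle never scans the ~50k unrelated files in the destination.
        copy {
            from stagingDir
            into System.getenv('MY_TEST_DIR')
        }
    }
}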
I don’t believe the difference would be significant. If you are calculating checksums of both source and destination, the only saving you could hope for is when the write speed of the destination is significantly slower than its read speed, because the check involves essentially the same amount of I/O as just doing the copy without one.
The benefit is that if Gradle can mark a task as up-to-date, tasks that use its output as input will not be executed, and this can mean a huge decrease in build time.
Whether a task will be executed isn’t influenced by whether depended-on tasks have been executed, but by whether their outputs have changed. In other words, it doesn’t matter to downstream tasks whether copy is re-executed as long as its outputs stay the same.
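To illustrate that with made-up task names (not from this build): the downstream task declares the copied files as its inputs, so it can be skipped whenever those files are unchanged, regardless of whether the upstream task re-ran.

task copyDocs(type: Copy) {
    from 'build/javadoc'
    into 'build/staging/docs'
}

task packagePlugin(type: Zip) {
    dependsOn copyDocs
    // The copied files are this task's inputs. If copyDocs re-runs but
    // produces identical output, packagePlugin's inputs have not changed
    // and it can still be reported as UP-TO-DATE.
    from copyDocs.outputs.files
    baseName = 'plugin'
    destinationDir = file('build/dist')
}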
I ran into the same problem when using Gradle 1.8 on a Linux server.
task deploy(type: Copy) {
    into "/dest/dir"
    with war
}
There is an “upload” directory under /dest/dir which doesn’t exist in the source and contains a large number of files. Gradle spends too much time walking over those files.
I tried to add exclude like this:
task deploy(type: Copy) {
    into("/dest/dir") {
        exclude "**/upload/**"
    }
    with war
}
But it fails with “No value has been specified for property ‘destinationDir’.”
Why does the standard check care about files in the destination that are not (and never have been) in the source? I.e. in this example the “upload” folder is not a task output.
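As an aside on the destinationDir error above: my understanding of the Copy task API is that calling into(path) { ... } with a closure adds a child copy spec relative to the task’s destinationDir rather than setting destinationDir itself, which is why the task fails to configure. A form that at least configures would be something like the sketch below - though note that exclude patterns filter the files being copied, not pre-existing files in the destination, so this would not stop Gradle from scanning the existing “upload” tree:

task deploy(type: Copy) {
    into "/dest/dir"        // sets destinationDir on the task itself
    with war
    // excludes apply to the copied (source) files, not to files
    // that already exist under the destination
    exclude "**/upload/**"
}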