This task only has to copy about 75MB to a few subdirectories within the target folder. The “stagingDir” has already been set up with the exact directory structure that I want to copy over, but the target folder (specified by the environment variable MY_TEST_DIR, as other developers will have this configured differently on their machines) is also already populated with a TON of other files - 99.9% of which are completely independent of what I have built in the “stagingDir”.
This copy operation takes literally several minutes on a non-SSD drive (I’m still waiting for it to complete so I can tell you a more precise number - the disk is thrashing away as I write… ah, it finished in about 20 minutes). On a system with an SSD drive it takes 40 seconds. It is only copying 75MB; it should take a few seconds at most. It seems that the fact that the target location already contains a lot of unrelated files slows it down. Is it scanning the entire target folder to build checksums for every existing file, rather than only checking the paths that actually match the files it will be copying?
Copying the same files into the “stagingDir” in the first place took only a second or two. But that directory was either empty or contained only older versions of the same files and nothing more.
If the target directory is empty, it takes about 42 seconds.
Adding:
outputs.upToDateWhen { false }
to the copy task sped it up so that it takes only 55 seconds when the target directory is populated as it normally is.
When the target folder contains only a copy of the files copied by a previous run, it takes < 20 seconds (with or without outputs.upToDateWhen { false }).
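For reference, this is roughly where that line sits (a sketch only - the task name and source variable here are placeholders, not the actual build script):

task copyToInstall(type: Copy) {
    from stagingDir                        // staging tree assembled earlier (placeholder name)
    into System.getenv('MY_TEST_DIR')      // shared install directory
    // Never consider this task up to date, so Gradle skips the
    // "is everything already up to date?" decision before running.
    outputs.upToDateWhen { false }
}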
Clearly Gradle is doing too much work with irrelevant files.
How many files are we talking about? From what I know, after the task has completed, Gradle lists all files in the target directory to find the ones written by the task (by their last-modified date) so that it can hash them.
It shouldn’t take 20 minutes to list the files either. There are a lot, though. The target directory has around 50,000 files, but fewer than 2,000 are copied by the task (1,700 are HTML Javadoc files).
Which OS/FS? Some file systems are known to get unbearably slow when dealing with tens of thousands of files in the same directory. Have you questioned whether you need to go through a staging directory at all? Don’t the files end up in an archive anyway?
The files are not all in the same directory; the destination folder has a tree containing hundreds of sub-folders. It is Windows 7 with NTFS.
The staging directory (the source of this copy, with 2k files) is what we archive, but the destination of this copy is an “install” directory (with 50k files) where the runtime environment is configured for testing. So we do need the staging directory, to assemble the source of the build artifact.
My guess is that listing hundreds of directories and comparing timestamps of tens of thousands of files is what makes this slow, particularly on mechanical drives. As far as I know, it’s currently not possible to avoid this cost by turning off up-to-date checks. However, using ‘project.copy’ instead of a ‘Copy’ task will save this cost at the expense of not having an up-to-date check (perhaps a good tradeoff in this case). Another potential solution is to copy to a less populated directory. I’ve raised GRADLE-2867 to track this.
Thanks. It seems odd that Gradle would need to compare timestamps of the 40000 files in the destination that aren’t copied from the source folder. What would it compare them to? Is it assuming that everything in the destination folder is an output of the task?
Note that copying to a less populated directory is not an option for this step. The rest of the files are part of the “installed” product and need to be there. (This build is creating a “plugin” that will be used by the larger system.)
I will use project.copy in this case as there is clearly no benefit to having an up-to-date check when the check takes longer than the copy.
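For anyone following along, a minimal sketch of that approach (the task name is made up; it assumes the same stagingDir and MY_TEST_DIR as above):

task deployToInstall {
    doLast {
        // project.copy performs a plain copy with no up-to-date check,
        // so Gradle never scans the ~50k unrelated files in the destination.
        copy {
            from stagingDir
            into System.getenv('MY_TEST_DIR')
        }
    }
}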
I don’t believe the difference would be significant. If you are calculating checksums of both source and destination, the only saving you could hope for is when the write speed of the destination is significantly slower than its read speed, because the check involves essentially the same amount of I/O as just doing the copy without one.
The benefit is that if Gradle can mark a task as up-to-date, tasks that use its output as input will not be executed, and this can mean a huge decrease in build time.
Whether a task will be executed isn’t influenced by whether depended-on tasks have been executed, but by whether their outputs have changed. In other words, it doesn’t matter to downstream tasks whether copy is re-executed as long as its outputs stay the same.
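To illustrate that with made-up task names (not from this build): the downstream task declares the copied files as its inputs, so it can be skipped whenever those files are unchanged, regardless of whether the upstream task re-ran.

task copyDocs(type: Copy) {
    from 'build/javadoc'
    into 'build/staging/docs'
}

task packagePlugin(type: Zip) {
    dependsOn copyDocs
    // The copied files are this task's inputs. If copyDocs re-runs but
    // produces identical output, packagePlugin's inputs have not changed
    // and it can still be reported as UP-TO-DATE.
    from copyDocs.outputs.files
    baseName = 'plugin'
    destinationDir = file('build/dist')
}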
I ran into the same problem when using Gradle 1.8 on a Linux server.
task deploy(type: Copy) {
    into "/dest/dir"
    with war
}
There is an “upload” directory under /dest/dir which doesn’t exist in the source and contains a large number of files. Gradle spends too much time walking over those files.
I tried to add exclude like this:
task deploy(type: Copy) {
    into("/dest/dir") {
        exclude "**/upload/**"
    }
    with war
}
But it fails with “No value has been specified for property ‘destinationDir’.”
Why does the standard check care about files in the destination that are not (and never have been) in the source? I.e. in this example the “upload” folder is not a task output.
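As an aside on the destinationDir error above: my understanding of the Copy task API is that calling into(path) { ... } with a closure adds a child copy spec relative to the task’s destinationDir rather than setting destinationDir itself, which is why the task fails to configure. A form that at least configures would be something like the sketch below - though note that exclude patterns filter the files being copied, not pre-existing files in the destination, so this would not stop Gradle from scanning the existing “upload” tree:

task deploy(type: Copy) {
    into "/dest/dir"        // sets destinationDir on the task itself
    with war
    // excludes apply to the copied (source) files, not to files
    // that already exist under the destination
    exclude "**/upload/**"
}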