Use an in-memory, write-behind disk cache for build optimisation


(mike mills) #1

There seems to be lots of effort going into adding parallelism to Gradle tasks.

I think parallel builds are a great idea, especially for builds with many loosely coupled modules, but I think most people change only a couple of modules for any one feature. So even if your whole project has 10 modules, changes are likely to occur in only 2 or 3, and the current Gradle module dependency optimizations are already quite effective.

I have always found my builds to be disk I/O bound; using SSD/RAID was always the easiest way to speed up a build.

The main problem in the past with slow builds was that every underlying tool (javac, JUnit, etc.) would go to disk to read and write files. I have often wondered what the speed-up would be if the build tool read the files in once, kept them in memory, and fed the in-memory byte stream of each resulting file to the next tool down the line.

An example would be javac, PMD and Checkstyle, which all read the source files separately: read them in once and provide a reference to each tool. Likewise, Cobertura and JUnit both read the class files.
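To make the "read once, share a reference" idea concrete, here is a minimal sketch. The class `SharedSourceCache` and its methods are hypothetical names made up for illustration; they are not part of Gradle or any of the tools mentioned, and the sketch assumes the tools could be handed a byte array instead of a file path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: each source file is read from disk at most once. */
public class SharedSourceCache {
    private final Map<Path, byte[]> contents = new ConcurrentHashMap<>();
    private final AtomicInteger diskReads = new AtomicInteger();

    /**
     * Returns the file's bytes, loading them from disk only on the first request.
     * Every caller gets the same array, so callers must treat it as read-only.
     */
    public byte[] read(Path file) {
        Path key = file.toAbsolutePath().normalize();
        return contents.computeIfAbsent(key, p -> {
            try {
                diskReads.incrementAndGet();
                return Files.readAllBytes(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Number of times the cache actually went to disk. */
    public int diskReads() {
        return diskReads.get();
    }
}
```

With something like this, the compiler step and each static analysis step would all call `read(path)` on the same cache, and only the first call would touch the disk.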

In the case of Cobertura, which reads .class files and then creates new .class files, in-memory streaming of results would seem to offer considerable speed benefits, considering the intermediate Cobertura files might not actually need to be stored. Unix has used the streaming of one command's output into another to great effect for decades, yet most builds do not seem to take advantage of this concept.
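For javac at least, the JDK's standard `javax.tools` API can already take source from memory and hand back class bytes without writing a file. The sketch below shows that one step only; the class names `InMemoryCompile`, `StringSource` and `BytesOutput` are my own, and whether a tool like Cobertura could consume the resulting byte arrays directly is an assumption I have not checked.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

public class InMemoryCompile {

    /** Source code held in a string instead of a file on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;
        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Class file whose bytes are collected in a buffer rather than written out. */
    static final class BytesOutput extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        BytesOutput(String className) {
            super(URI.create("mem:///" + className.replace('.', '/')
                    + JavaFileObject.Kind.CLASS.extension), JavaFileObject.Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** Compiles one class from a string; returns class name -> class file bytes. */
    public static Map<String, byte[]> compile(String className, String source) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler; run on a JDK");
        }
        Map<String, BytesOutput> outputs = new HashMap<>();
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            // Redirect every class file the compiler wants to write into memory.
            JavaFileManager inMemory = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                        String name, JavaFileObject.Kind kind, FileObject sibling) {
                    BytesOutput out = new BytesOutput(name);
                    outputs.put(name, out);
                    return out;
                }
            };
            boolean ok = compiler.getTask(null, inMemory, null, null, null,
                    List.of(new StringSource(className, source))).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
        }
        Map<String, byte[]> result = new HashMap<>();
        outputs.forEach((name, out) -> result.put(name, out.bytes.toByteArray()));
        return result;
    }
}
```

The returned map could then be passed to the next step (instrumentation, test class loading) and only written to disk if something actually needs the file.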

People have been fixated on using more cores, and yet are not utilizing the vast and cheap memory available in a modern computer (my current PC has 10G free that will probably not be used all day).

So if we could make use of an in-memory disk that uses write-behind to commit only the required files to disk, it seems to me that builds would produce a result quicker and save unnecessary I/O for temporary files.
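Roughly what I have in mind for the write-behind part, as a sketch only. `WriteBehindStore` and its method names are hypothetical, it assumes the tools can be pointed at this store instead of the file system, and it ignores memory limits and error reporting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical store: files live in memory; only committed ones reach the disk. */
public class WriteBehindStore implements AutoCloseable {
    private final Map<Path, byte[]> memory = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** A tool "writes" a file: nothing touches the disk yet. */
    public void put(Path file, byte[] data) {
        memory.put(file, data);
    }

    /** The next tool reads straight from memory; returns null if absent. */
    public byte[] get(Path file) {
        return memory.get(file);
    }

    /** Marks a file as a required output; it is written to disk in the background. */
    public void commit(Path file) {
        byte[] data = memory.get(file);
        if (data == null) {
            throw new IllegalArgumentException("Not in store: " + file);
        }
        // Failures end up in the discarded Future; a real version would report them.
        writer.submit(() -> {
            try {
                if (file.getParent() != null) {
                    Files.createDirectories(file.getParent());
                }
                Files.write(file, data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Waits for pending writes; uncommitted (temporary) entries are just dropped. */
    @Override
    public void close() throws InterruptedException {
        writer.shutdown();
        writer.awaitTermination(1, TimeUnit.MINUTES);
        memory.clear();
    }
}
```

The build would `put` every intermediate result, `commit` only the final artifacts, and `close` the store at the end, so temporary files such as instrumented classes never cost any disk I/O.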


(Luke Daley) #2

It’s an interesting idea, thanks for sharing.

It’s something we’ll keep in mind, but my initial thoughts are that it is not really practical without massive engineering effort. So many tools are simply file system based, which means that we’d end up having to go to the file system at some point. Also, many users don’t have anywhere near that amount of memory free on the system. We have the opposite problem in many cases, where Gradle is using too much memory and we are having to spool to disk to use less.

You might want to consider setting up a RAM drive, and having your Gradle projects live on it. That would give you a nice way to use that extra RAM.