Provide a module-level artifact cache for large-scale projects to prevent unnecessary rebuilds


(Jake Wharton) #1

Consider the case where a shared module differs between the branches for two versions, both of which are being maintained. As features are cherry-picked to and from each branch, the common module is constantly being rebuilt even though it has not changed on either branch. This is because we currently have task-based incremental compilation, which is far too fine-grained for any kind of long-lived cache. The inputs and outputs are cached in the buildDir as transient state.

However, at the module level, we have a much more coarsely grained set of inputs and outputs. The inputs are the combination of all task-based inputs of the leaf nodes and the outputs are the artifacts of the module. These can be shared in a larger, semi-persistent cache.
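To make the idea of module-level inputs concrete, here is a rough sketch in Python, used purely for illustration. It assumes the module's inputs can be reduced to a map of file paths to contents; `module_cache_key` and the sample file names are hypothetical, not anything Gradle provides.

```python
import hashlib


def module_cache_key(input_files: dict[str, bytes]) -> str:
    """Hypothetical module-level key: one digest over every input path and its content."""
    digest = hashlib.sha256()
    # Sort the paths so the key does not depend on the order inputs are listed in.
    for path in sorted(input_files):
        content = input_files[path]
        # Length-prefix each field so neighbouring fields cannot run together.
        for field in (path.encode("utf-8"), content):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


# The same module as it exists on two maintained branches (made-up contents).
branch_a = {"src/Common.java": b"class Common { int v = 1; }"}
branch_b = {"src/Common.java": b"class Common { int v = 2; }"}

key_a = module_cache_key(branch_a)
key_b = module_cache_key(branch_b)
```

Each branch gets its own stable key, so switching back and forth would keep hitting the same two cache entries rather than triggering a rebuild.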

With this, in the aforementioned scenario, each of the two very different versions of the common module is only ever built once. Additionally, this module-level input and output system allows for skipping the execution of entire module task graphs (even when every task in them would only be skipped anyway).

Here’s another scenario: Do a clean build of an entire project. Make some changes and do another build. A few of the untouched modules are quickly skipped and you get incremental compilation of the changed modules. Now undo the changes you made and run a build. Every module is skipped because you have reverted to a state in which every module’s inputs match those of a previous build, and therefore the cached module artifacts are swapped in for an instant completion.
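The build, change, revert sequence can be sketched as a toy simulation. This is Python for illustration only; `ModuleArtifactCache`, its `build` method, and the fake "artifact" strings are hypothetical stand-ins, and a real build would run the module's task graph where the comment says so.

```python
import hashlib


class ModuleArtifactCache:
    """Toy cache mapping a digest of a module's inputs to its built artifact."""

    def __init__(self):
        self._artifacts = {}
        self.real_builds = 0  # how many times the module actually had to be built

    @staticmethod
    def _key(sources):
        digest = hashlib.sha256()
        for path in sorted(sources):
            digest.update(repr((path, sources[path])).encode("utf-8"))
        return digest.hexdigest()

    def build(self, sources):
        key = self._key(sources)
        if key in self._artifacts:
            # Inputs match an earlier build: swap in the cached artifact.
            return self._artifacts[key], "cached"
        # Stand-in for running the module's whole task graph.
        self.real_builds += 1
        artifact = "artifact-" + key[:12]
        self._artifacts[key] = artifact
        return artifact, "built"


cache = ModuleArtifactCache()
original = {"Common.java": "class Common {}"}
changed = {"Common.java": "class Common { void f() {} }"}

first = cache.build(original)   # clean build
second = cache.build(changed)   # build after making changes
third = cache.build(original)   # build after undoing the changes
```

The third call returns the artifact from the first build without doing any work, which is the "instant completion" described above.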

Since we don’t want the cache to grow indefinitely, it should behave in an LRU fashion and evict artifacts which have not been used for some time. Additionally, companies with modules that number in the hundreds would benefit from a centralized remote cache which would be a fallback on local cache miss. This would allow big beefy boxes to build all the things and cache all the things on a less resource-constrained system.
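A minimal sketch of how eviction and the remote fallback could fit together, again in Python for illustration. It makes a few assumptions: the local cache is bounded by entry count rather than by age or size, the remote cache is modelled as a plain dict that build servers populate, and `LruArtifactCache` and the key names are hypothetical.

```python
from collections import OrderedDict


class LruArtifactCache:
    """Bounded local cache that falls back to a remote store on a local miss."""

    def __init__(self, capacity, remote=None):
        self.capacity = capacity
        # Assumption: the remote cache is read-only here and filled by build servers.
        self.remote = remote if remote is not None else {}
        self._local = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        if key in self._local:
            self._local.move_to_end(key)  # mark as most recently used
            return self._local[key]
        if key in self.remote:
            artifact = self.remote[key]
            self.put(key, artifact)  # keep a local copy for next time
            return artifact
        return None  # miss everywhere: the module has to be built

    def put(self, key, artifact):
        self._local[key] = artifact
        self._local.move_to_end(key)
        while len(self._local) > self.capacity:
            self._local.popitem(last=False)  # evict the least recently used entry

    def local_keys(self):
        return list(self._local)


remote = {"key-c": "artifact-c"}
cache = LruArtifactCache(capacity=2, remote=remote)
cache.put("key-a", "artifact-a")
cache.put("key-b", "artifact-b")
cache.get("key-a")            # "key-b" is now the least recently used entry
fetched = cache.get("key-c")  # local miss, served from the remote cache
```

After the remote fetch the local cache holds `key-a` and `key-c`; `key-b` was evicted because it had gone unused the longest. A time-based policy ("not used for N days") would work just as well; entry count is only the simplest thing to show.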

While seemingly only applicable to large projects, I think and hope the entire-module skipping functionality would help medium-sized projects have quicker builds as well.