Feature Request: Gradle 3.5 Build Cache and Amazon S3

(Scott) #1

I was just reading about the Gradle Build Cache planned for Gradle 3.5. I’ve found Gradle’s support of S3 as a Maven wagon EXTREMELY convenient for releasing libraries private to my company. Supporting S3 as a distributed HTTP build cache would be just as convenient. I think it would lower the bar for setup, make the cache a lot easier to configure, and drive adoption of the feature.

(Sterling Greene) #2

Hi @Scott_Pierce,

We designed the interfaces required for adding a new build cache connector to be simple to implement.

One of the major differences between publishing your libraries to a repository and using the build cache is that with a repository you typically have very few artifacts, and we can reliably cache resolved copies. For the build cache, Gradle will produce a cache entry for every cacheable task (for a single Java project, this will be at least 3 tasks) and could look for a cache entry on every build for those cacheable tasks. We’ll also download from the build cache any time we need a cache entry (there isn’t any local caching of remote cache entries yet).

I think you can get by with something simple like S3 for repositories, but for the build cache, it’s useful to have a little more. It’s important to get information about the cache hit rate and other statistics, and S3 won’t be able to provide that on its own (you might be able to work it out from logs/metrics). You’ll also need some sort of eviction policy that doesn’t just drop the oldest entries.

For the Gradle project, the stats for the last 30 days have been:

Stored entries   453372
Evicted entries  253292
Hits            1680475
Misses           529150
Hit Rate           76 %
Data Received 307.84 GB
Data Sent     386.21 GB

The last 24 hours have been:

Stored entries    13062
Evicted entries       0
Hits              61167
Misses            16236
Hit Rate           79 %
Data Received   6.40 GB
Data Sent      20.49 GB
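
As a quick sanity check, the reported hit rates match hits divided by total lookups (hits plus misses). That formula is an assumption about how the server computes the figure, and the helper name `hit_rate` is only illustrative:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


# Figures copied from the two tables above.
last_30_days = hit_rate(1_680_475, 529_150)
last_24_hours = hit_rate(61_167, 16_236)

print(f"30 days:  {last_30_days:.0%}")
print(f"24 hours: {last_24_hours:.0%}")
```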

I also think it’s super important to get some kind of build diagnostics that goes along with the build cache, so you can figure out why you’re getting cache hits/misses for a particular build. We use the build-scan plugin to collect custom values that are useful for this. We then use a dogfooding instance of Gradle Enterprise as a build cache backend. That gives us an eviction policy and some simple statistics.

(Scott) #3

Thanks, you are probably right!

(Ben Manes) #4

@sterling Do you think that you would be able to acquire a cache access trace? That could be a text file with the key’s hash, one line per cache read. We could then simulate it to see how efficient the cache can be and what eviction policy is the most optimal. That could be helpful for optimizing your build cache implementations.
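
A trace in that format (one key hash per line, in request order) could be replayed with a few lines of Python. This is only a sketch: the names `read_trace` and `simulate_lru`, the sample keys, and the choice to bound the cache by entry count rather than by bytes are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Iterable, Iterator


def read_trace(lines: Iterable[str]) -> Iterator[str]:
    """Yield one cache key per non-empty line of a trace."""
    for line in lines:
        key = line.strip()
        if key:
            yield key


def simulate_lru(keys: Iterable[str], capacity: int) -> float:
    """Replay the keys against an LRU cache and return the hit rate."""
    cache = OrderedDict()  # least recently used key first
    hits = total = 0
    for key in keys:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


# Tiny made-up trace; a real one would come from open("trace.txt").
sample = "a\nb\na\nc\nb\nd\na\n".splitlines()
small = simulate_lru(read_trace(sample), capacity=2)
larger = simulate_lru(read_trace(sample), capacity=3)
print(small, larger)
```

Running the same trace at several capacities shows how quickly the hit rate levels off as the cache grows.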

(Sterling Greene) #5

I know we keep track of that in some way.

@luke_daley @mark_vieira Do we keep a complete cache access log? Is this something we could export to have Ben take a look?

(Lóránt Pintér) #6

@Scott_Pierce, do you think this would work for you? https://github.com/myniva/gradle-s3-build-cache

(Scott) #7

Potentially! This looks really cool. Once Gradle 3.5 is out I’ll give it a shot. Thanks!

(Mark Vieira) #8

Yes, this is definitely something we could work up from our logs. I expect it would depend a lot on the project, etc. Right now it’s hard to separate that information from our logs as we use a single cache server for a bunch of different builds.

FWIW we also just use a simple LRU strategy for eviction. For our purposes, our cache size is much larger than what we push in a given day, so we don’t actually evict a lot. For example, with a 50GB cache we’ve had 0 evictions in the last week. I’m sure organizations with a larger build farm would encounter more evictions.
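
For readers unfamiliar with the idea, a size-bounded LRU store can be sketched as follows. This is not the cache server's implementation; the class name `SizeBoundedLru`, its methods, and the byte sizes are hypothetical and chosen only to show why a cache much larger than the daily volume rarely evicts.

```python
from collections import OrderedDict


class SizeBoundedLru:
    """Evicts least recently used entries once a byte budget is exceeded."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.evictions = 0
        self._entries = OrderedDict()  # key -> size, least recent first

    def get(self, key: str) -> bool:
        """Return True on a hit and mark the entry as recently used."""
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)
        return True

    def put(self, key: str, size: int) -> None:
        """Store an entry, evicting old entries until it fits the budget."""
        if key in self._entries:
            self.used_bytes -= self._entries.pop(key)
        self._entries[key] = size
        self.used_bytes += size
        while self.used_bytes > self.max_bytes and self._entries:
            _, evicted_size = self._entries.popitem(last=False)
            self.used_bytes -= evicted_size
            self.evictions += 1


cache = SizeBoundedLru(max_bytes=100)
cache.put("a", 40)
cache.put("b", 40)
cache.get("a")      # "a" is now more recent than "b"
cache.put("c", 40)  # exceeds the budget, so "b" is evicted
print(cache.evictions, cache.used_bytes)
```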

(Luke Daley) #9

@Benjamin_Manes you can find this here: http://ge.tt/3pHzFXj2

That’s about 30 days’ worth of data.

(Ben Manes) #10

Thanks @luke_daley and @mark_vieira!

The trace has 2 million requests, and an unbounded cache tops out at a 92.5% hit rate (the rest are compulsory misses). The number of entries needed is quite small, in the range of hundreds to the low thousands. The optimal policy reaches that maximum at about 2,500 entries, but practically gets there at around 1,250.
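
The ceiling for an unbounded cache follows from compulsory misses: the first request for each key can never be a hit, so the best possible hit rate is one minus the share of distinct keys. A minimal sketch, with the function name `max_hit_rate` and the sample trace invented for illustration:

```python
from typing import Iterable


def max_hit_rate(keys: Iterable[str]) -> float:
    """Hit rate of an unbounded cache: every miss is a first-time request."""
    requests = list(keys)
    if not requests:
        return 0.0
    compulsory_misses = len(set(requests))
    return 1 - compulsory_misses / len(requests)


# Made-up trace: 7 requests over 4 distinct keys.
example = ["a", "b", "a", "c", "b", "d", "a"]
ceiling = max_hit_rate(example)
print(f"{ceiling:.1%}")
```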

LRU is great, with FIFO and Random as strong follow-ups. Frequency has a negative impact, causing LFU to have a very poor hit rate. When the cache size is reduced, e.g. to 500, policies that try to account for both recency and frequency can squeeze out a few extra points. But overall the simple classic policies are nearly perfect because history has little benefit.
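
A comparison like this can be approximated with a small replay loop. The sketch below is not the simulator mentioned in this thread; the function `simulate_policy`, the policy names, and the tiny trace are assumptions, and the LFU variant here is a simple in-cache frequency count.

```python
import random
from collections import Counter, OrderedDict
from typing import Iterable

POLICIES = ("lru", "fifo", "random", "lfu")


def simulate_policy(keys: Iterable[str], capacity: int, policy: str,
                    seed: int = 0) -> float:
    """Replay keys against one eviction policy and return the hit rate."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = random.Random(seed)  # seeded so the Random policy is repeatable
    cache = OrderedDict()      # oldest (or least recently used) key first
    freq = Counter()           # hit counts for keys currently cached
    hits = total = 0
    for key in keys:
        total += 1
        if key in cache:
            hits += 1
            freq[key] += 1
            if policy == "lru":
                cache.move_to_end(key)  # only LRU reorders on a hit
            continue
        if len(cache) >= capacity:
            if policy in ("lru", "fifo"):
                victim = next(iter(cache))
            elif policy == "random":
                victim = rng.choice(list(cache))
            else:  # lfu: evict the least frequently used cached key
                victim = min(cache, key=lambda k: freq[k])
            del cache[victim]
            del freq[victim]
        cache[key] = None
        freq[key] = 1
    return hits / total if total else 0.0


trace = ["a", "b", "a", "c", "b", "d", "a"]
results = {p: simulate_policy(trace, capacity=2, policy=p) for p in POLICIES}
for name, rate in results.items():
    print(f"{name:>6}: {rate:.1%}")
```

A seven-request trace says nothing about real workloads; the point is only that the same loop can rank policies once a real trace is fed in.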

If the local cache is absorbing hits and this trace only shows those from the perspective of the remote requests, then this could explain the results. In that case the unchanged artifacts are local and the up-to-date check quickly skips over them.

For now LRU is an excellent strategy, and until we get more data there is no reason to switch.

This is now part of the simulator so you can periodically check to see if the above stays true.

(Luke Daley) #11

@Benjamin_Manes thanks for doing that.

Note that this is from our internal server, which has a reasonably homogeneous workload. Other usages where there is a wider variety of projects being built may yield different results.

I’ll be sure to run the simulator on other data sets that I can get.

(Mark Vieira) #12

This is awesome feedback @Benjamin_Manes. Thanks!