Transparent/pluggable input/output checksums

According to the docs, Gradle keeps track of “checksums” of task inputs and outputs to decide whether a task should execute.
However, there is no explanation of when and how these checksums are calculated.

Is it possible to make Gradle’s checksumming mechanism pluggable, or at least more transparent?

Use case: let’s say your project’s tasks operate on huge amounts of data as inputs and outputs. Let’s assume that naively calculating the MD5 of files is inefficient and does not scale well. What if there were an external service that efficiently keeps track of checksums, and Gradle could be instructed to ask that service for the checksums of given files instead of calculating them itself?
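To make the idea concrete, here is a rough sketch of what such a hook might look like. Everything in it is hypothetical: `DelegatingChecksums` and its `externalLookup` function are names made up for illustration, and nothing like this exists in Gradle's public API as far as I know. The external service is modelled as a plain function that may or may not know the checksum of a file; when it does not, the code falls back to hashing the file locally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical extension point, NOT part of Gradle. */
final class DelegatingChecksums {
    // Stand-in for the external checksum service (assumption for this sketch).
    private final Function<Path, Optional<String>> externalLookup;

    DelegatingChecksums(Function<Path, Optional<String>> externalLookup) {
        this.externalLookup = externalLookup;
    }

    /** Asks the external service first and hashes locally only if it has no answer. */
    String checksumOf(Path file) throws IOException {
        Optional<String> known = externalLookup.apply(file);
        if (known.isPresent()) {
            return known.get();
        }
        return md5Hex(file);
    }

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The point of the fallback is that the build would still be correct when the service is unavailable, only slower.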

For this purpose you can use name, Object value) where the value can be a Closure or a Serializable object

Of course, I can apply something like this for the 1% of all calculations that belong to my custom tasks.

What I am after is a way to say from the top-level build script: hey Gradle, do your usual thing with the whole plethora of tasks, but when it comes to file checksum calculation, use this replacement universally.

Wouldn’t it be useful?

I think that if you’ve got a fantastic algorithm for calculating checksums that outperforms Gradle’s algorithm, then you should open source it and let Gradle use that instead of its current mechanism.

I believe the algorithm is perfectly fine.
It’s about when checksums are calculated, not how.
If I understand it right (the Gradle docs are not at all clear on this), Gradle calculates checksums right at the time when tasks are performed. What if these operations could be done in advance to save build time?

I am considering two options:
a) a daemon that tracks file system events and calculates checksums asynchronously as files are changed
b) a FUSE-based file system that proxies file operations and calculates checksums asynchronously as files are changed

In both cases Gradle just asks “give me the checksum of this file” and will almost always receive a precalculated checksum nearly instantaneously.
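A minimal sketch of option (a), assuming a single non-recursive directory and MD5 as the checksum. The class name `ChecksumWatcher` and its methods are made up for illustration and this is not how Gradle does it internally. It uses the JDK's `WatchService` to receive file events and hashes changed files on a background thread, so that a later `checksumOf` call usually finds the result already computed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical checksum daemon, NOT part of Gradle. */
final class ChecksumWatcher {
    private final Map<Path, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    /** Called per create/modify event: hash in the background, replacing any stale entry. */
    void onFileChanged(Path file) {
        cache.put(file, CompletableFuture.supplyAsync(() -> md5Hex(file)));
    }

    /** Returns the precomputed checksum if there is one, otherwise computes it on demand. */
    String checksumOf(Path file) {
        return cache.computeIfAbsent(file,
                f -> CompletableFuture.supplyAsync(() -> md5Hex(f))).join();
    }

    /** Forwards events from one directory until the watch key becomes invalid. */
    void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE);
            while (true) {
                WatchKey key = ws.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        cache.clear(); // events were lost, so nothing cached can be trusted
                        continue;
                    }
                    Path changed = dir.resolve((Path) event.context());
                    if (Files.isRegularFile(changed)) {
                        onFileChanged(changed);
                    } else {
                        cache.remove(changed); // deleted, or not a regular file
                    }
                }
                if (!key.reset()) {
                    return;
                }
            }
        }
    }

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    static String md5Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A real daemon would also need recursive registration and some protection against files that change while they are being hashed; the sketch ignores both.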

Hi Genadz,

We are not planning on making the checksum algorithm pluggable. What we are doing, though, is watching the file system for changes so that we can re-use checksums as much as possible, even between builds. That should make calculating the checksums much faster.

We are also thinking about calculating checksums in the background, while the daemon is idle, every time we detect a change. That may come further in the future, once we see the need.

Maybe you could give file-system watching a spin and see if it helps with your use case.