Transparent/pluggable input/output checksums

bgeradz · March 7, 2020, 8:27am

According to the docs, Gradle keeps track of “checksums” of task inputs and outputs to decide whether a task should execute.
However, there is no explanation of when and how these checksums are calculated.

Is it possible to make Gradle’s checksumming mechanism pluggable, or at least more transparent?

Use case: let’s say your project’s tasks operate on a huge amount of data as inputs and outputs. Let’s assume that naively calculating the MD5 of each file is inefficient and does not scale well. What if there were an external service that efficiently keeps track of checksums, and Gradle could be instructed to ask that service for the checksums of given files instead of calculating them itself?
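
To make the idea concrete, here is a minimal Java sketch of what such a pluggable hook might look like. Everything in it is an assumption for illustration: `FileHasher`, `LocalMd5Hasher` and `ExternalServiceHasher` are hypothetical names, Gradle exposes no such extension point, and the external service is stood in for by an in-memory map.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PluggableHashing {

    /** Hypothetical extension point; Gradle does not expose this interface. */
    public interface FileHasher {
        String hash(String path, byte[] content);
    }

    /** Naive strategy: digest every byte of the file on each request. */
    public static class LocalMd5Hasher implements FileHasher {
        @Override
        public String hash(String path, byte[] content) {
            try {
                byte[] digest = MessageDigest.getInstance("MD5").digest(content);
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    /** Asks a (simulated) external service first and falls back to local hashing. */
    public static class ExternalServiceHasher implements FileHasher {
        private final Map<String, String> service; // stand-in for the remote service
        private final FileHasher fallback;

        public ExternalServiceHasher(Map<String, String> service, FileHasher fallback) {
            this.service = service;
            this.fallback = fallback;
        }

        @Override
        public String hash(String path, byte[] content) {
            String known = service.get(path);
            return known != null ? known : fallback.hash(path, content);
        }
    }

    public static void main(String[] args) {
        Map<String, String> service = new ConcurrentHashMap<>();
        service.put("data/huge.bin", "precomputed-by-service");
        FileHasher hasher = new ExternalServiceHasher(service, new LocalMd5Hasher());
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasher.hash("data/huge.bin", new byte[0])); // answered by the service
        System.out.println(hasher.hash("data/small.txt", small));      // computed locally
    }
}
```

The fallback keeps the build correct when the service has no answer for a file; only the cost of the lookup changes.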

Lance · March 7, 2020, 2:47pm

For this purpose you can use inputs.property(String name, Object value), where the value can be a Closure or a Serializable object.

bgeradz · March 8, 2020, 12:12pm

Of course I can apply something like this for 1% of all calculations for my custom tasks.

What I am after is a way to say, from the top-level build script: hey Gradle, do your usual thing with the whole plethora of tasks, but when it comes to file checksum calculation, use this replacement universally.

Wouldn’t it be useful?

Lance · March 8, 2020, 12:36pm

I think that if you’ve got a fantastic algorithm for calculating checksums that outperforms Gradle’s algorithm, then you should open source it and let Gradle use that instead of its current mechanism.

bgeradz · March 8, 2020, 3:14pm

I believe the algorithm is perfectly fine.
It’s about when checksums are calculated, not how.
If I understand it right (the Gradle docs are not at all clear on this), Gradle calculates checksums right at the time the tasks are executed. What if these operations could be done in advance to save build time?

I am considering two options:
a) a daemon that tracks file system events and calculates checksums asynchronously as files change
b) a FUSE-based file system that proxies file operations and calculates checksums asynchronously as files change

In both cases Gradle just asks “give me the checksum of this file” and almost always receives a precalculated checksum instantly.
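
Option (a) could be sketched roughly as follows, using the JDK's `WatchService` to refresh a cache whenever a file in one directory changes. This is only an illustration under stated assumptions: `ChecksumDaemon` and its methods are hypothetical names, it watches a single directory non-recursively, it uses MD5 purely because the thread mentions it, and it is not how Gradle itself works.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import static java.nio.file.StandardWatchEventKinds.*;

public class ChecksumDaemon {
    private final Map<Path, String> cache = new ConcurrentHashMap<>();

    /** Recompute and store the checksum of one file. */
    public void refresh(Path file) throws IOException {
        cache.put(file.toAbsolutePath(), md5Hex(file));
    }

    /** Return the precomputed checksum, computing it only on a cache miss. */
    public String checksum(Path file) throws IOException {
        Path key = file.toAbsolutePath();
        String cached = cache.get(key);
        if (cached == null) {
            refresh(file);
            cached = cache.get(key);
        }
        return cached;
    }

    /** Drop the cached entry for a deleted file. */
    public void forget(Path file) {
        cache.remove(file.toAbsolutePath());
    }

    /** Blocks, updating the cache as files in dir change, until dir can no longer be watched. */
    public void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
            while (true) {
                WatchKey key = ws.take();
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == OVERFLOW) continue; // events were lost; a real daemon would rescan
                    Path changed = dir.resolve((Path) ev.context());
                    if (ev.kind() == ENTRY_DELETE) forget(changed);
                    else if (Files.isRegularFile(changed)) refresh(changed);
                }
                if (!key.reset()) break;
            }
        }
    }

    /** Stream the file through MD5 so large files are never fully loaded into memory. */
    static String md5Hex(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) md.update(buf, 0, n);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ChecksumDaemon <dir>");
            return;
        }
        new ChecksumDaemon().watch(Paths.get(args[0]));
    }
}
```

A caller would run `watch` on a background thread and call `checksum` at build time. The weak point of the design is that the cache is only as fresh as the last delivered event, so a file changed while the watcher was not running, or during an `OVERFLOW`, would return a stale value unless it is rescanned.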

Stefan_Wolf · June 4, 2020, 4:45pm

Hi Genadz,

We are not planning on making the checksum algorithm pluggable. What we are doing, though, is watching the file system for changes so that we can re-use checksums as much as possible, even between builds: https://blog.gradle.org/introducing-file-system-watching. That should make calculating the checksums much faster.

We are also thinking about calculating the checksum in the background, while the daemon is idle, every time we detect a change. That may come further in the future, though, once we see the need.

Maybe you could give file-system watching a spin and see if it helps for your use-case.

Cheers,
Stefan
