Gradle build time sucks on large repo

deanhiller · July 31, 2022, 2:38pm

Webpieces in CI takes 5 minutes as seen in the build times and this would be 1 minute in a monobuild.
github project: GitHub - deanhiller/webpieces: A project containing all the web pieces (WITH apis) to create a web server (and an actual web server)
circleCI: https://app.circleci.com/pipelines/github/deanhiller/webpieces

Orderly Health has a monobuild with way more code, and if we change a microservice, it builds in 1-3 minutes (depending on how many tests there are). Using the gradle build and the matching webpieces build system with composite builds, a CLEAN build time goes to about 15 minutes because gradle incorrectly builds the whole world.

Can we get a gradle version that only builds changes and the projects that changed?

yes, yes, bazel is a product but it is way nicer to open gradle projects that then open all the projects I depend on.

ALSO, I love the ‘implementation lib’ vs. ‘api lib’ distinction, so can gradle actually have the ability to open a ‘lib’ so that it will open up all projects with ‘implementation lib’ as well as any projects that pull in libraries with ‘api lib’ that it gets exposed to, such that I can refactor that lib easily in my monorepo? The flipside is I have to find each and every project and see if I can add it, which doesn’t always work as it is not a feature in use by many I think. This one click open would rock for refactoring in a monorepo.

We will be releasing an open source build template that works great for smaller/mid-size companies and is not optimized for CI just yet (ie. it can be further split to build N leaf nodes on N machines to further reduce the 15 minutes above to 3-5 minutes).

ie. you want to just run ./monobuild.sh and it detects changes and only builds projects that change, dependencies and transitive dependencies.

Dean

Vampire · August 17, 2022, 10:33pm

If you do a clean build, how should any buildsystem in the world not build the whole world?
You just cleaned it, so nothing is there anymore.
Well, you can use the build output cache; it would take the parts that are worth caching, like for example class compilation, from the build cache.

Gradle is usually excellent in avoiding unnecessary work unless builds break it by bad configuration.
If things are rebuilt that shouldn’t because they didn’t change and you did not clean, then you should use build scans or --info output to find out why tasks are rerun incorrectly.

deanhiller · August 22, 2022, 9:30am

@Vampire I am glad you asked. This is actually what build systems for monorepos (pants/bazel) are designed to do. Their build times are much better since they do not build all of Twitter’s 100’s of projects and libraries but only what changed. The important thing here is to only build what changed AND the projects that depend on those projects. Twitter’s pants build system does this. I think Google’s Bazel does this too. This is the advantage of a monobuild and frankly why we ran one at Orderly Health for 2 years with master never being broken (except for flaky tests).

It gets way more advanced than that when you dive into the details. They even sometimes avoid building libraries the leaf nodes need by caching last built artifacts as well. There is parallelization (not needed for smaller companies) as well. I am happy to discuss further live if you like and are curious. (Calendly - Dean Hiller)

Vampire · August 22, 2022, 3:29pm

Sorry, but I’m still not getting your point.
Gradle does exactly all that too.
It even only rebuilds downstream projects if the ABI of the upstream project did change, so if you have project A and a project B that depends on A and then you only change private signatures or method implementations in A, B will not rebuild as it is not necessary.

deanhiller · August 23, 2022, 8:30am

I am failing to explain this, so let me go into quite some detail and see if that helps. You can simulate CI by having a clean computer (realize that after you do this process, you must blow away the webpieces directory if you want to simulate CI again).

git clone GitHub - deanhiller/webpieces: A project containing all the web pieces (WITH apis) to create a web server (and an actual web server)
modify any file in http-dev-router
run ./gradlew build

This results in building the world which is way slower than a monobuild that does ‘not’ build the world. In the monobuild, the only thing that is built is http-dev-router and the things it depends on as well as the things that depend on it (which is not the full repository).

Gradle on a clean build builds everything resulting in drastically slower build times.

We did a comparison of our ‘hacked monobuild that builds composite gradle projects’.

our hacked monobuild follows this process

Detect files changed by developer
map to projects
find projects that depend on those projects until hitting a ‘leaf node’
build all leaf nodes(all leaf nodes pull in what they need via composite projects)
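
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual monobuild script: the function names, the `project_dirs` mapping of project name to directory, and the `depends_on` map are all hypothetical, and "leaf node" is taken to mean an affected project that no other project depends on.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def owning_project(path, project_dirs):
    """Return the project whose directory contains `path`, or None."""
    parts = PurePosixPath(path).parts
    best, best_depth = None, -1
    for name, directory in project_dirs.items():
        dir_parts = PurePosixPath(directory).parts
        # Prefer the deepest matching directory in case projects are nested.
        if parts[:len(dir_parts)] == dir_parts and len(dir_parts) > best_depth:
            best, best_depth = name, len(dir_parts)
    return best


def leaf_nodes_to_build(changed_files, project_dirs, depends_on):
    """Changed files -> projects -> dependents -> affected leaf nodes."""
    # Invert the edges so we can walk from a project to its dependents.
    dependents = defaultdict(set)
    for project, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(project)

    # Steps 1 and 2: map every changed file to the project that owns it.
    changed = {owning_project(f, project_dirs) for f in changed_files}
    changed.discard(None)

    # Step 3: follow dependents transitively.
    affected, stack = set(changed), list(changed)
    while stack:
        for parent in dependents[stack.pop()]:
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)

    # Step 4: keep only affected projects that nothing else depends on.
    return sorted(p for p in affected if not dependents[p])
```

Building only the returned leaf nodes relies on each leaf pulling in what it needs through composite builds, as described in the last step.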

This rarely results in ‘build the world’ semantics. In fact, I am thinking about applying the monobuild to webpieces as I get tired of 5-7 minute build times when a leaf node change is only a 1 minute build time if it was a monobuild.

I hope this was more clear?

deanhiller · August 23, 2022, 8:44am

oh, another feature of monorepos/monobuilds - you open a leaf node project with intellij which brings in only projects it uses. This results in much faster load times. Back to the webpieces example again, if I bring in http-dev-router in intellij, not all of the pieces of webpieces are brought into intellij.

Vampire · August 24, 2022, 10:22pm

In the monobuild, the only thing that is built is http-dev-router and the things it depends on as well as the things that depend on it (which is not the full repository).

I guess I understand now. Those “monobuild” tools figure out from Git changes which projects are changed? Well, I was never a fan of monorepos and probably will never be. I always thought they are ugly and unmanageable. Everything that should be branched, built, and tagged together should be versioned together. Everything that is not, should not. But that’s probably more a philosophical discussion. If you really want to just build http-dev-router, its dependencies and dependents, do ./gradlew :http-dev-router:build :http-dev-router:buildNeeded :http-dev-router:buildDependents. Gradle builds what you tell it to build.

I totally disagree that CI builds should not build everything, or respectively not always the same. If it depends on included changes what is built, all CI statistics are useless. No build times history, no test amount history, no coverage movement over time, …
Then it’s better to have separate jobs for separate entrypoints and restrict VCS triggers to only react on relevant projects or similar.
How would the clean build even know what has changed if it doesn’t know anything about the past?

Or as I mentioned earlier, make use of the task output cache, then it doesn’t really hurt if the world is built, because the time intense tasks are delivered from cache and you can even use a remote cache’s entries populated by CI on local developer machines.
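
To illustrate the idea behind a task output cache, here is a toy sketch. It is not how Gradle implements its build cache (Gradle's actual cache key covers more than file contents); `cache_key`, `run_cached`, and the dict-based `cache` are hypothetical names used only for this example.

```python
import hashlib


def cache_key(task_name, input_files):
    """Hash the task name plus every input path and its content.

    `input_files` maps a path to the file's bytes.
    """
    digest = hashlib.sha256(task_name.encode())
    for path in sorted(input_files):
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(input_files[path])
    return digest.hexdigest()


def run_cached(task_name, input_files, action, cache):
    """Return (output, from_cache); run `action` only on a cache miss."""
    key = cache_key(task_name, input_files)
    if key in cache:
        return cache[key], True
    output = action(input_files)
    cache[key] = output
    return output, False
```

Because the key depends only on the inputs, a clean checkout with unchanged sources produces the same key, so the expensive work can be skipped even when no local build output exists. A shared `cache` plays the role of the remote cache populated by CI.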

If you really want such - imho strange - behavior, you could probably have some code in your build that just disables all projects build tasks that should not be run so that only the “interesting” one and its dependencies is run.

oh, another feature of monorepos/monobuilds - you open a leaf node project with intellij which brings in only projects it uses. This results in much faster load times. Back to the webpieces example again, if I bring in http-dev-router in intellij, not all of the pieces of webpieces are brought into intellij.

Should be the same with IntelliJ, shouldn’t it?

deanhiller · August 27, 2022, 8:48am

Very cool and interesting thoughts!! Thank you!!

"If it depends on included changes what is built, all CI statistics are useless. No build times history, no test amount history, no coverage movement over time, …"

This is really interesting. I do like to have ‘CI stats’ per leaf node (ie. the main microservices that people change). Seems like an interesting tradeoff: faster build times for developers that only change a microservice vs. changing a library that affects more of the repo. The cache you mention is interesting though it leads to the same issues regarding CI stats/code coverage stats, etc - though personally I think build times going down enough may trump the stats (though I want those still now that you mention it…lol)

“Should be the same with IntelliJ, shouldn’t it?” - I am not sure how here. Currently, I load webpieces root directory and it indexes the world. I can’t open a single project or intellij just fails because it is a multiproject build not a composite build. Is there a way to not have intellij index the world? That and downloading the world’s 3rd party deps is a killer vs. loading a leaf node and only indexing what I am going to work on and only loading 3rd parties of what I will work on.

I really like the stats idea. In our comparisons, the 5x speed of not building the world was great (though maybe caching is another way to do that). It might be a ton of work to get to caching at this point though…I was sort of hoping gradle had some progress in monorepo for those that wanted it. We also could consider going to bazel of course as well.

deanhiller · August 27, 2022, 8:52am

@Vampire Thinking more on the subject. CI stats, code coverage over time. This could exist actually if gradle wanted it to. ie. only stats for projects built are kept and stored over time so you ‘could’ in theory go into any library’s history just fine. Each lib has its own build time in a monorepo as well. Each lib has its own code coverage. All that could simply be stored in cases where it is actually built. I guess my plea above is for better monorepo support like github is doing. The codeowners file rocks as people work cross team and it just tags the teams of the code you touched, mapping code to team easily.

Vampire · August 27, 2022, 10:43pm

This is really interesting. I do like to have ‘CI stats’ per leaf node (ie. the main microservices that people change).

I guess if you need it per leaf node, you need to set up CI separately for each leaf node anyway.
And that means you run :that:leaf:node:build and it will only build that leaf node and what is necessary. Depending on the CI system, you can then probably also configure which paths trigger a build if changes are detected.

The cache you mention is interesting though it leads to the same issues regarding CI stats/code coverage stats, etc

Well, the “build time” stat was maybe a bad example, as it will depend on how much is rebuilt and how much is coming from cache. But all other stats should work with task output coming from caches. Amount of tests or test coverage for example is usually imported to CI by parsing the XML report file the test task or coverage report task is generating and even if these are taken from cache, they will have the correct numbers.
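
As a rough illustration of that parsing step, the sketch below totals the counters from JUnit-style XML reports of the kind the test task writes. The function name `summarize_test_reports` is hypothetical, and the sketch assumes each report carries `tests`, `failures`, `errors`, and `skipped` attributes on its `<testsuite>` elements; missing attributes are treated as zero.

```python
import xml.etree.ElementTree as ET


def summarize_test_reports(xml_documents):
    """Sum the test counters over a list of JUnit-style XML report strings."""
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for document in xml_documents:
        root = ET.fromstring(document)
        # A report may use <testsuite> as root or wrap several in <testsuites>.
        suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
        for suite in suites:
            for field in totals:
                totals[field] += int(suite.get(field, 0))
    return totals
```

The numbers come from the report file itself, so it makes no difference to this step whether the file was freshly generated or restored from the cache.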

though personally I think build times going down enough may trump the stats (though I want those still now that you mention it…lol)

Yes, build time is important for big builds, but with task output cache and upcoming configuration cache it should be much better than when rebuilding everything each time.

And especially for developer builds I think the time is much more relevant than on CI.
And developer builds should usually not be run clean, so even without build cache most tasks should simply be up-to-date if not much changed.

I can’t open a single project or intellij just fails because it is a multiproject build not a composite build.

Then I probably misunderstood you. I understood that they are combined through composite builds.
Maybe you should change that and it will allow to do it like that then.

Is there a way to not have intellij index the world?

If it is a big big multi-project and not a composite build forest where you can just open a part like assumed, then there might still be options. You can for example “Unload” modules in IntelliJ which marks them as “I don’t work with those currently, don’t index them, don’t highlight, and so on”.

And you can most probably combine this with idea-ext plugin. As far as I remember it does not have an option to unload modules. You could probably open a feature request for that. But as an ad-hoc measure afair it also has the “manipulate my XML” functionality the idea plugin also has, so you can probably at least with that configure the unloaded modules, you just have to check where IntelliJ persists this setting.

That and downloading the world’s 3rd party deps is a killer vs. loading a leaf node and only indexing what I am going to work on and only loading 3rd parties of what I will work on.

Well, those should be in the Gradle cache once you downloaded them, so this should not be a persistent problem.

I was sort of hoping gradle had some progress in monorepo for those that wanted it.

I have no idea whether Gradle has any plans in that direction or whether maybe some plugin for this exists or could exist. While I’m still not sure how you detect which leaves to build if you do a clean build as you don’t know which two code states to compare if you do not have locally modified files, assuming this is possible somehow, I guess it shouldn’t be too hard to write some plugin that determines which paths / leaves changed and then have some monoBuild task that depends on build, buildNeeded, and buildDependents for those projects that have changes which would probably achieve what you want.
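
Short of writing a plugin, a wrapper script could assemble the same invocation. The sketch below only builds the argument list from a set of changed Gradle project paths; `gradle_args` is a hypothetical helper, and how the changed paths are determined is left open here.

```python
def gradle_args(changed_project_paths):
    """Build a ./gradlew command line for the changed Gradle project paths.

    Paths are Gradle project paths such as ":http-dev-router".
    Returns an empty list when nothing changed, so the caller can skip the build.
    """
    tasks = ("build", "buildNeeded", "buildDependents")
    paths = sorted(set(changed_project_paths))
    if not paths:
        return []
    args = ["./gradlew"]
    for path in paths:
        args.extend(f"{path}:{task}" for task in tasks)
    return args
```

The resulting list can be handed to `subprocess.run` from the repository root when it is non-empty.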

But as I said, I’m not aware of any plans or options, as I’m personally no big fan of monobuilds yet.

Thinking more on the subject. CI stats, code coverage over time. This could exist actually if gradle wanted it to. ie. only stats for projects built are kept and stored over time so you ‘could’ in theory go into any library’s history just fine.

I don’t really get what you mean.
Those are CI stats, not Gradle stats.
And also how should Gradle keep anything anywhere if you always do a clean build as you described?
There is no place to persist anything and thus also no way to determine “what changed”.
Or what do I miss?
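
One possible answer to the "what changed" question is that version control history can supply the two states even on a clean checkout. The sketch below is an assumption about how such a tool might work, not a description of pants, bazel, or the monobuild script: `changed_files` is a hypothetical helper, the `run` parameter exists only to make it testable, and it assumes the checkout contains enough history to find the merge base.

```python
import subprocess


def changed_files(base_ref, head_ref="HEAD", run=subprocess.run):
    """List files that differ between head_ref and its merge base with base_ref."""
    # The three-dot form diffs head_ref against the common ancestor of both refs.
    result = run(
        ["git", "diff", "--name-only", f"{base_ref}...{head_ref}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

On CI this might be called as `changed_files("origin/master")`, where the base branch name is whatever the repository builds against.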
