Dependency management in multi-project build - Jar every Project?


(Simon Bethke) #1

Hi,
at the moment I am part of a project that is migrating a big product consisting of more than 10 Java projects from Ant to Gradle.
Currently we are stuck in a discussion on how to set up the build for multiple projects. We are trying to stay as close to any available best practice as possible. My question here is:

Should every single project result in a jar that is uploaded to our Nexus server, with downstream projects depending on these jars?

My impression is that we should rather have a central place with multi-project build files and start a build that does not jar every project but rather compiles everything, jars only the required artifacts, and uploads them as a complete release.

Are there any best practices we can follow? Should every multi-project build have an additional central place for starting the multi-project build?

Best Regards,
Simon


(Chris Doré) #2

Can you clarify what your current project’s layout looks like? Is there one mono-repo containing all of the projects, which are built together, are there multiple repos each buildable on their own, or is there a mix of the two? To put it another way, what’s in the master/trunk branch(es), individual projects or all projects?

Do you want to stay with the current layout, or do you wish to make a change/evolution towards something else?


(Simon Bethke) #3

Currently we have all projects below one branch/trunk. The versioning and building of the projects always run in sync, and feature implementations are spread across multiple projects.
Some projects are core projects; in the middle of the dependency graph are the API, client, and server projects; and the frontends are a rich Java application, a web-based client, and some add-ons.
Some of the project separations appear to be obsolete and could be merged, but most are functional units, and the separation could also help us build in parallel.

A big challenge is that the build cannot be migrated in one step, as this would never fit into a user story of a size acceptable for one sprint. The current state is that we have migrated two dependent core projects in two sprints.
Edit: a main motivation for this is to introduce dependency management. That allows us to get libraries out of our code repository, which in turn appears to be a prerequisite for getting from SVN to Git.


(Chris Doré) #4

How are you currently integrating these migrated projects back into the remaining ant build? Or is that to be determined by this conversation? :slight_smile:

When you say you want to get libs out of the code repo, I believe you are referring to 3rd party binaries (jars). However, are you also referring to your own library code? For example, you extract your Foo library into its own code repository and version it independently. Builds of this library are published to a binary repository and then consumed/packaged/released by downstream projects. Or, at the end of the migration from ant to Gradle, do you still wish to have a single “trunk” for all of your projects and only have 3rd party dependencies resolved from a binary repository?

Why do you say that getting the libraries out of the code repository is a prerequisite for moving from SVN to Git?


(Simon Bethke) #5

At the moment, we call the Gradle wrapper batch file with a simple exec task in Ant. So the migrated projects still have a build.xml that only contains the calls (clean and build) to Gradle. This is working pretty smoothly and allows us to use the migrated build productively already.

Yes, I am referring to 3rd party jars. Our own code is usually not kept precompiled as libraries (the only exception is a code generator for API stubs, which we want to migrate to a Gradle plugin).
So it’s rather the latter, ‘have a single trunk’, that applies to the software. Whether or not to keep our own libraries as precompiled jars in our binary repo is part of the question I asked initially. As features are often implemented across projects, I don’t see any additional use in mixing versions. We partially do this already with our add-ons, and it is very error-prone, as it is hard to know which versions are compatible and which are not.

In our SVN we currently have about 750MB of binary jars in every branch. This is very bad practice and simply has grown to that size over the years. When migrating to Git, this would require a huge Git repository and from what I’ve seen Git has more restrictions in size than SVN. (however, if I am wrong here, 750MB is a good reason on its own :slight_smile: )


(Simon Bethke) #6

A recent internal discussion about this topic showed that we think we should have different build strategies for building on a continuous-build server and for building on a development computer.

So a build server would build each project using only dependency libs from Nexus (including libraries of our own code) and upload the resulting project jar to Nexus again,
while a development machine would not use jars of our own code from Nexus but would build all class files locally.
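
A minimal sketch of the build-server half of this idea, assuming Gradle's `maven-publish` plugin and the Kotlin DSL. The group, version, repository URL, and the property names `nexusUser` and `nexusPassword` are placeholders made up for illustration, not values from this thread:

```kotlin
// build.gradle.kts of one project (sketch; all coordinates and URLs are hypothetical)
plugins {
    `java-library`
    `maven-publish`
}

group = "com.example.product"
version = "1.0.0"

publishing {
    publications {
        // Publish the jar produced by the Java plugin together with its metadata.
        create<MavenPublication>("mavenJava") {
            from(components["java"])
        }
    }
    repositories {
        maven {
            name = "nexus"
            url = uri("https://nexus.example.com/repository/maven-releases/")
            credentials {
                // Assumed to be supplied via gradle.properties or -P on the build server.
                username = findProperty("nexusUser") as String?
                password = findProperty("nexusPassword") as String?
            }
        }
    }
}
```

With something like this in place, the build server would run the `publish` task, while a development machine could simply never invoke it.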

Another item that came up is whether projects should build their dependencies recursively (so every project only knows its own parents) or whether the multi-project build script should know the whole dependency graph and start building at the root(s).

I am currently wondering if it makes sense to implement a recursive approach for development-machine builds while still having a multi-project configuration that is only aware of the leaf projects,
and at the same time to have a supervised build for the build server that always uploads all artifacts to Nexus.

Unfortunately we’d end up with a third build that is supervised by Eclipse. Maybe this third build should be the development-machine build, which would be very nice. But for that we’ll need to integrate a code generator into the build mechanism that is triggered in Eclipse (using Gradle Buildship).


(Chris Doré) #7

@simonbethke, sorry for taking so long to get back to you. Here are some of my thoughts and opinions.

I recommend you avoid going down the multi-repo road as long as you can, especially if you don’t plan on releasing the individual modules on different cadences. At the very least, I would stay with a single multi-project/module build until you can get everything working nicely with Gradle. Soon after you start splitting the modules into independently buildable units and start reintegrating them via binaries, you’ll likely start to encounter a number of extra complexities that just don’t exist with a single multi-project build. I would defer this extra work to a later phase if you still wish to go down that road.
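
For illustration only, a single multi-project build could be wired up roughly as follows in the Gradle Kotlin DSL. The project names are hypothetical, loosely following the core/API/server/client split described earlier in the thread:

```kotlin
// settings.gradle.kts at the root of the trunk (project names are hypothetical)
rootProject.name = "product"

include("core", "api", "server", "rich-client", "web-client")
```

```kotlin
// server/build.gradle.kts (sketch)
plugins {
    `java-library`
}

dependencies {
    // Project dependencies are built from source as part of the same build,
    // so no jar has to be published to or fetched from a binary repository.
    api(project(":api"))
    implementation(project(":core"))
}
```

In this setup Gradle derives the build order from the declared project dependencies, so running a task in `server` also builds `api` and `core` first.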

Most of the extra complexity revolves around dependency management; however, there will be added development complexity as well (which itself can also be related to dependency management). Development can become much more difficult if the modules are not clearly defined with good APIs and low coupling. If you have a legacy system anything like ours, then the coupling could be fairly bad. If modules are extracted without putting in the work to refactor them into clean(er) APIs, you’ll end up in a situation where developers are constantly having to work on multiple modules at the same time. You’ve already indicated that you’d like to have developers always working with all the code, but even if they do that, these modules will be built by independent CI processes with independent timings. With high coupling, the binaries will need to be in sync once resolved by their consumers, and this will be a headache (e.g., a consumer depends on two modules that work together, changes to both modules were made, but one builds faster than the other and you have a period of time where the module binaries are out of sync).

Dependency management and module integration in CI will probably be something you’ll need to address. When it comes to declaring dependencies on binaries, there are pretty much three main options for specifying versions: fully static, fully dynamic, and somewhere in the middle.
Fully static dependencies mean your modules will declare dependency versions like “1.2.3”. In this scenario a dependency change has to be made by a developer, by making a commit, which would naturally trigger a CI build. This option gives you the slowest rate of integration and the highest maintenance overhead.
Taking it to the opposite extreme, there are fully dynamic dependencies. A fully dynamic version is simply “+”, meaning, “give me the highest”. In this scenario developers no longer need to commit updated versions and you get the highest rate of integration with the lowest maintenance overhead. Every build will try to use the latest versions of everything, similar to if you had all the code in a single project.
If the modules are versioned in a meaningful way (i.e., semantically), then dependencies can be declared like “1.+”, meaning, “give me the latest version compatible with the 1.x API”. This adds in a bit more control and some safety against breaking changes…but…
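
As a hedged illustration, the three styles might be declared like this in a Gradle Kotlin DSL build script. The `com.example` coordinates are invented for the example:

```kotlin
// build.gradle.kts (sketch; module coordinates are hypothetical)
plugins {
    `java-library`
}

dependencies {
    // Fully static: only a commit that edits this line changes the version.
    implementation("com.example:foo:1.2.3")

    // In the middle: highest available version whose number starts with "1.".
    implementation("com.example:bar:1.+")

    // Fully dynamic: highest available version, whatever it is.
    implementation("com.example:baz:+")
}
```
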
In all three options, issues can still arise when resolving the dependencies. Given the following modules and dependencies:

A-1.0
B-1.0 dependsOn A-1.+
C-1.0 dependsOn A-1.+
D-1.0 dependsOn B-1.+, C-1.+

All is good here and D’s resolved dependencies are A-1.0, B-1.0, and C-1.0.
Version 2 of A comes out and B is updated to use it. It doesn’t impact B’s API, so the new version of B is 1.0.1. We now have:

A-1.0
A-2.0
B-1.0.1 dependsOn A-2.+
C-1.0 dependsOn A-1.+
D-1.0 dependsOn B-1.+, C-1.+

The resolved dependencies are now A-2.0, B-1.0.1, and C-1.0. Gradle’s default conflict resolution policy is to take the highest version, unless the build is configured to do otherwise (there are a number of ways to accomplish this in Gradle). Depending on what differs in A’s breaking API change, A may or may not work with both C and D. Both of them were coded, built, and tested against the A-1.+ API, but now they’ve had A changed “behind their backs”.
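
One of the ways to change that default, sketched here under the assumption of a Kotlin DSL build script for module D, is to make resolution fail instead of silently picking the highest version:

```kotlin
// D's build.gradle.kts (sketch)
plugins {
    `java-library`
}

configurations.all {
    resolutionStrategy {
        // Fail the build when two different versions of the same module
        // (such as A-1.0 and A-2.0 above) meet in the dependency graph.
        failOnVersionConflict()
    }
}
```

This does not decide which version of A is correct; it only forces someone to make that decision explicitly rather than having A swapped unnoticed.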

When describing the two dynamic version options there was a missing piece that was mentioned in the static version option. How do new versions of a module trigger CI builds of downstream modules that have declared dynamic dependencies? There’s nothing a developer can commit to bump up the version “1.+”. This is where solutions such as dependency locking can help, but implementing the CI jobs to manage automatic/periodic lock-build-test-commit cycles will be up to you.
While we’re on the topic of dependency locking, regardless of CI integration, you’ll want to use some type of locking in order to get reproducible builds. When using any type of dynamic dependency, nothing will frustrate developers more than:

  1. SCM checkout
  2. Build
  3. Do some testing.
  4. Make some changes.
  5. Build…FAILURE!
  6. Hmm, must have been my changes, revert changes.
  7. Build…same FAILURE! WTF!!!

What happened between the builds in steps 2 and 7 when none of the code changed? Some time later they figure out that between those builds a bad version of a dependency was produced and pulled down. Flaming emails and Slack messages follow. Having dependency locking in place makes those dynamic dependencies static, but then you need something/someone updating the lock files. The above-mentioned lock-build-test-commit CI jobs happen to do this (and devs can do it manually on demand when needed, e.g., when changing the dependencies in the build script). The above failure scenario won’t happen with dependency locking, since the only way devs will actually get new dependency versions is if they do an SCM update or regenerate the lock files themselves.
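
A minimal sketch of what this can look like with Gradle's built-in dependency locking (available in newer Gradle versions), assuming a Kotlin DSL build script:

```kotlin
// build.gradle.kts (sketch)
plugins {
    `java-library`
}

dependencyLocking {
    // Record and enforce resolved versions for every configuration.
    lockAllConfigurations()
}
```

Lock files are then written or refreshed by running a resolving task with the `--write-locks` flag, for example `./gradlew dependencies --write-locks`, and committed to SCM. Dynamic versions such as “1.+” resolve to the locked versions until the lock files are regenerated.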

I won’t go into much detail, but depending on how you manage your binaries throughout their lifecycle, you might want to implement some sort of binary promotion process as well. For example, our pipelines follow the general flow of:

  1. Build, unit test, publish binaries to build-repo.
  2. Additional automated or manual functional testing that may involve deploying the new binaries with others in order to do more integration testing.
  3. Automated or manual promotion of binaries to candidate-repo, making them available for downstream consumption (builds only resolve from the candidate or higher repo…there’s a release-repo as well that we promote to).
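
On the consuming side, "only resolve from the candidate or higher repo" could be expressed roughly like this. The repository URLs are placeholders and the layout of the binary repository manager is an assumption:

```kotlin
// build.gradle.kts of a downstream module (sketch; URLs are hypothetical)
repositories {
    // The build-repo is deliberately not listed, so unpromoted binaries
    // can never be resolved by this build.
    maven {
        name = "candidate"
        url = uri("https://repo.example.com/candidate-repo/")
    }
    maven {
        name = "release"
        url = uri("https://repo.example.com/release-repo/")
    }
}
```
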

Having a single multi-project build pretty much avoids all of this (and probably some other things I’ve encountered over the years and forgotten about). You still might have dependency resolution conflicts with 3rd party libraries, and the dynamic dependencies/locking/CI build issues may still apply should you choose to use that strategy with 3rd party libs. However, my experience is that 3rd party libs won’t change anywhere near as often as your own internal dependencies, so static dependency management could be an acceptable overhead/cost for them.

As I mentioned at the beginning, having a multi-repo setup is good if you have a need to version and deploy/release modules on independent cadences. However, if your goal for having a multi-repo setup is related to things like performance or build times, then I think Gradle has a lot of functionality that will get you a long way with a single multi-project build. Since you mentioned devs working with all the source code, I’m guessing that performance isn’t your goal (if it was, then checking out only the required modules would be the fastest). Some tips for making a single multi-project build perform well:

  1. Incremental builds and tasks, with well functioning task inputs and outputs, can make subsequent builds perform very well.
  2. Avoid doing expensive work during Gradle’s configuration phase whenever possible.
  3. Look into Gradle’s newish build cache (you don’t need Gradle Enterprise to use it). Having your CI builds push to the cache and dev builds pull from it can help make a fresh/clean build perform well. NOTE: if you don’t have #1 nailed down, this level of caching won’t be very effective, or worse, it will cause issues from incorrect cache hits.
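
As a sketch of tip 3, a remote HTTP build cache that only CI pushes to might be configured like this in the Kotlin DSL. The cache URL and the use of a `CI` environment variable to detect the build server are assumptions for the example:

```kotlin
// settings.gradle.kts (sketch; URL and the CI variable are hypothetical)
val isCiServer = System.getenv("CI") != null

buildCache {
    local {
        // Developers also keep a local cache; CI relies on the remote one.
        isEnabled = !isCiServer
    }
    remote<HttpBuildCache> {
        url = uri("https://build-cache.example.com/cache/")
        // Only CI builds populate the remote cache; dev builds just read from it.
        isPush = isCiServer
    }
}
```

The cache still has to be switched on, either with `--build-cache` on the command line or with `org.gradle.caching=true` in `gradle.properties`.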

Hopefully some of this is useful and isn’t just the ravings of a madman :slight_smile:
If something doesn’t make sense, feel free to ask more questions.


(Simon Bethke) #8

Hi Chris,

this is still a project in progress on our side and we are making small steps from time to time. Right now I would just like to let you know that I really appreciate the effort you spent helping us here. Not only I but also my colleagues read your answer, and the first important thing we take from it is that we will try to avoid doing any multi-repo thing.

Kind Regards,
Simon