Tooling API: Let us know if multiple projects got evaluated


(Attila Kelemen) #1

The problem
-----------

I’m currently in the process of allowing extensions to be written for the NetBeans Gradle integration. While doing so, I have realized that there is a very serious performance issue to be solved.

In NetBeans you may load each project of a multi-project build separately. This improves performance when dealing with large multi-project builds if you are not actually editing all the projects (less classpath to index). As of now, if you load an IdeaProject, all Gradle projects are actually loaded (by the Tooling API). Therefore, I can simply parse all of them and cache them, so subsequent project loads of the same multi-project build are close to a no-op.
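To make this caching strategy concrete, here is a minimal, self-contained sketch of the cache I have in mind. All names here are hypothetical illustrations; in the real integration, the "load everything" step is the single Tooling API call that evaluates the whole build (e.g. fetching an IdeaProject) and yields models for every project at once:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the cache described above. The BuildLoader interface is a
// hypothetical stand-in for the expensive Tooling API call that evaluates
// the whole multi-project build and returns a model per project path.
final class ProjectModelCache {
    interface BuildLoader {
        // Given any one project of the build, return models for ALL projects.
        Map<String, Object> loadAllProjectModels(String anyProjectPath);
    }

    private final BuildLoader loader;
    private final Map<String, Object> cache = new HashMap<>();

    ProjectModelCache(BuildLoader loader) {
        this.loader = loader;
    }

    // Loading any one project populates the cache for all of its sibling
    // projects, so subsequent lookups are close to a no-op.
    Object getModel(String projectPath) {
        Object cached = cache.get(projectPath);
        if (cached == null) {
            cache.putAll(loader.loadAllProjectModels(projectPath));
            cached = cache.get(projectPath);
        }
        return cached;
    }
}
```

The point is that the expensive loader runs at most once per build, no matter how many of its projects the user opens afterwards.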

However, Gradle 1.6 introduced the possibility of loading 3rd party models. This is nice, but as of the current version of the Tooling API it brings a major performance issue (given that I want to exploit the new feature of Gradle). Namely, it is likely that users keep multiple projects of the same multi-project build open. Assume that I request models from the Tooling API for one of those projects. Evaluating that project will likely cause the evaluation of multiple other Gradle projects (as of now, every project is evaluated). Given that multiple projects are evaluated, it is a considerable waste not to receive models for the other evaluated projects as well. That is, when I attempt to load the next project, I cannot refer to my cache because that project has not yet been loaded, and I cannot deduce what will happen if I request it from the Tooling API. This eventually leads to many unnecessary project evaluations, which has a huge performance impact.

Finding a solution for this problem is extremely important for making use of 3rd party models (or anything beyond the models currently available in Gradle). That is, without a solution I cannot use the new feature without a huge performance hit (even for projects not using 3rd party models).

Example
-------

Assume that someone writes a Jacoco model. Also assume that there is a NetBeans Gradle extension which is aware of this model. Since I have to faithfully serve models for extensions, I will have no choice but to load each project separately, even though the projects were already evaluated. Notice that having this Gradle extension installed will slow down the loading of plain old J2SE projects even if they do not actually use the extension mentioned previously. This is simply unacceptable.

A possible solution
-------------------

Unlike ‘ProjectConnection’, provide a short-lived connection to the daemon which does not reevaluate the project if a new model is requested. Also, I expect this connection to be able to tell which projects have already been evaluated. For example:

DaemonConnection connection = ...;
EvaluatedProject project = connection.evaluateProject(projectDir);
List<MyModel> models = new LinkedList<>();
models.add(project.getModel(MyModel.class));
for (EvaluatedProject otherProject: project.getOtherEvaluatedProjects()) {
    models.add(otherProject.getModel(MyModel.class));
}

I hope that the example code above describes the proposed solution clearly enough. Notice that this also solves my previously reported issue.

Thank you for your attention :)


(Adam Murdoch) #2

Thanks for your feedback. Excellent as always.

There are a few things we plan to do that should help solve your problem:

The first is to provide some way for you to tell us exactly which projects you are interested in. At the moment, the contract of the tooling API is ‘tell me about every project in this build’. But, as you know, the user is often not interested in every project of a given build. And often the user is interested in projects from multiple builds. Instead, I think we want some way for you to define a ‘workspace’ - that is the set of projects (from any build) that you are working with - and allow you to query models from this workspace.

This has several advantages:

  • We can configure only those projects that you need to do work on. For Gradle 1.x builds, this means we have to configure every project in a build - this is the contract and we have to honour this for backwards compatibility. For builds that use configure-on-demand (which is likely to be the default in Gradle 2.x) we can skip configuration of any project that is not required.
  • We can replace project dependencies on those projects that are not included with jar dependencies.
  • We can replace external dependencies on projects that are included with IDE project dependencies.
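To illustrate, the workspace idea above might look roughly like this. All names here are hypothetical (nothing of this exists in the Tooling API today), and the in-memory implementation is only a stand-in to make the shape of the API concrete:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of a 'workspace' API: the IDE declares the set of
// projects (possibly from several builds) it is working with, then queries
// models for exactly that set. A real implementation would configure only
// these projects (given configure-on-demand).
final class WorkspaceSketch {
    interface Workspace {
        void include(String projectDir);                  // add a project to the working set
        <T> Map<String, T> getModels(Class<T> modelType); // query one model type for the whole set
    }

    // Trivial in-memory stand-in so the sketch is self-contained.
    static final class FakeWorkspace implements Workspace {
        private final Set<String> projects = new LinkedHashSet<>();

        public void include(String projectDir) {
            projects.add(projectDir);
        }

        @SuppressWarnings("unchecked")
        public <T> Map<String, T> getModels(Class<T> modelType) {
            Map<String, T> result = new LinkedHashMap<>();
            for (String p : projects) {
                // A real daemon would configure project p and build the model here.
                result.put(p, (T) ("model of " + p));
            }
            return result;
        }
    }
}
```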

Another thing we plan to do is decouple the information about the hierarchy from information about a project, so that project models are facets of a project, or views over that project. At the moment, you are forced to ask about a single facet (the java model, say) for all projects in the build, and you have to do this one facet at a time. Instead, you should be able to query these things separately or together in any combination, and for any set of projects.

One fundamental question here is whether the API remains batch oriented (you have to tell us everything you will need up front, then we go and do the work and return the result), or whether we can do the work on-demand as you ask for it (and we take care of caching and reusing work that’s already been done).
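The "caching and reusing" half of the on-demand option essentially amounts to memoization keyed by project and model type. A minimal sketch (hypothetical names; the expensive configuration step is abstracted behind a function):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of 'on-demand with caching and reuse' from the daemon's point of
// view. Names are hypothetical; the point is only that repeated model
// requests reuse work already done instead of reconfiguring the project.
final class OnDemandModels {
    private final Map<String, Object> done = new ConcurrentHashMap<>();
    private final Function<String, Object> configureAndBuild; // the expensive part

    OnDemandModels(Function<String, Object> configureAndBuild) {
        this.configureAndBuild = configureAndBuild;
    }

    Object getModel(String projectPath, String modelType) {
        // Key on (project, model type); only the first request does real work.
        return done.computeIfAbsent(projectPath + "#" + modelType, configureAndBuild);
    }
}
```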

I imagine an on-demand API is going to be easier to use from your point of view. There are, however, a few issues with this that we’d need to sort out:

  • For Gradle 1.x builds, we’d still need to configure everything, for backwards compatibility reasons. I don’t think this is necessarily a problem, as we can hide this behind the API. Plus we are (gradually) moving away from this model towards an on-demand configuration model.
  • For some models, tasks need to be executed. Again, our Gradle 1.x task execution model assumes a single batch of tasks is run. Perhaps we allow up to one set of tasks to execute and then fail if the build is using the 1.x model. For the configure-on-demand model, we can run as many batches of tasks as we like.
  • With the on-demand approach, we need to deal with files that affect the model changing between invocations.
  • A project’s model can vary based on the type of build that is to be executed. At the moment, this is often inferred from the set of tasks to be executed. So, we’d need to know the tasks to be executed when we configure a project. Again, we want to move away from this approach.

In short, on-demand is just a bit too different to the Gradle 1.x execution model to fit well, but it fits very well with where we want to go over the long term.

My current plan is to offer some batch oriented API that solves these problems, and then later offer an on-demand API once we’re closer towards most people using the 2.x model.

Do you think this will work for you? That is, do you have some way to know exactly which projects and models you are interested in up front?


(Attila Kelemen) #3

Thank you for your detailed answer.

I don’t know how familiar you are with NetBeans, but for reference I will write down how things usually go there. Unlike Idea or Eclipse, NetBeans does not know what a multi-project build is (multi-project builds would be possible to implement but are generally not). It can, however, deduce that a project’s source set depends on another project’s source sets (which is effectively a project dependency). From the user’s point of view this is very convenient, because you don’t have to set up workspaces. You just open a single project and start working on it. Even better, if you happen to jump to code of a dependent project, you can edit that code without opening its project. Actually, this works very conveniently with Gradle, because building will consider the changed sources (unlike with Maven). That said, I don’t want to require the user to set up a workspace. It would just be unnecessarily inconvenient.

About the batch-oriented API vs. on-demand: assuming that there are already multiple opened projects awaiting model retrieval, I can request their models in a single call. There is a small problem with the first model request, as it may be issued too fast, before I know about the other projects. So I can make use of a batch-oriented API, but for me on-demand can be more efficient. Combining on-demand with batch would probably be best, though.

I was not very explicit in my “possible solution”, but what I need most is not the on-demand feature; I just thought the API looks clearer this way. What I will badly need is to know which other projects got evaluated. That is, even with configure-on-demand, evaluating a project will get several other projects evaluated. This means I can parse those projects right away and cache them. Therefore, if the user happens to open those projects, I will have their models readily available.

I will reflect on your points:

We can configure only those projects that you need to do work on. For Gradle 1.x builds, this means we have to configure every project in a build - this is the contract and we have to honour this for backwards compatibility. For builds that use configure-on-demand (which is likely to be the default in Gradle 2.x) we can skip configuration of any project that is not required.

As I mentioned above, I only expect to know which projects got evaluated, even if I did not explicitly request them.

We can replace project dependencies on those projects that are not included with jar dependencies.

I don’t see the benefit of this. The content of the jar still has to be indexed by the IDE, and you lose the ability to jump to the sources and edit them if needed. Also, I expect building jars to take considerably more time than evaluating the project.

We can replace external dependencies on projects that are included with IDE project dependencies.

I don’t understand what you mean. Can you elaborate?

For Gradle 1.x builds, we’d still need to configure everything, for backwards compatibility reasons. I don’t think this is necessarily a problem, as we can hide this behind the API. Plus we are (gradually) moving away from this model towards an on-demand configuration model.

I agree that configuring everything is not really something to worry about. I believe users should switch to configure-on-demand for better performance.

For some models, tasks need to be executed. Again, our Gradle 1.x task execution model assumes a single batch of tasks is run. Perhaps we allow up to one set of tasks to execute and then fail if the build is using the 1.x model. For the configure-on-demand model, we can run as many batches of tasks as we like.

I don’t know Gradle internals, but if something cannot be implemented efficiently in the non-configure-on-demand mode, then a less efficient way to mimic it is acceptable (again: switch to configure-on-demand for performance :)). Anyway, since I can see that configure-on-demand is a very important property, would it be possible to query through the Tooling API whether configure-on-demand is enabled, so that I could base decisions on that?

With the on-demand approach, we need to deal with files that affect the model changing between invocations.

If I understand you correctly, this would improve performance when editing build scripts and then reloading projects? If so, I would make this a lower-priority issue, since people usually code rather than edit the build script.

A project’s model can vary based on the type of build that is to be executed. At the moment, this is often inferred from the set of tasks to be executed. So, we’d need to know the tasks to be executed when we configure a project. Again, we want to move away from this approach.

I would expect that, in the future, each plugin could provide its own model (per project). For example, the “java” plugin could provide a JavaModel containing source sets, compiler arguments, and whatever else can be configured. So an IDE may request multiple models at once and, once it has them, parse them. I prefer to move away from IdeaProject and EclipseProject, because they are very limiting yet try to provide too much. That is, I believe Gradle should not try to design new abstract models, because IDEs are a better judge of how the information is actually needed.
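For illustration, such a per-plugin model could be as small as the following. This is a purely hypothetical interface of my own, not anything the “java” plugin actually exposes through the Tooling API:

```java
import java.io.File;
import java.util.List;

// Hypothetical per-plugin model, as described above: the 'java' plugin would
// expose exactly the information it configures, and nothing more. None of
// these types exist in the real Tooling API.
interface JavaModel {
    List<SourceSetModel> getSourceSets();
    List<String> getCompilerArgs();

    interface SourceSetModel {
        String getName();                 // e.g. "main", "test"
        List<File> getSourceDirs();
        List<File> getCompileClasspath();
    }
}
```

The IDE would then request JavaModel (and any other plugin models it understands) in one batch, instead of one large IdeaProject or EclipseProject.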

In conclusion, I think a batch-oriented API - although useful - provides only very limited help by itself. This is because I don’t want users to have to configure which projects they need prior to opening them. That would be user-unfriendly.