I am observing very slow completion of the build after the BUILD SUCCESSFUL and Total time messages. The process pauses for a long duration while the daemon consumes a full core, and then completes. I only noticed this when running the full test suite and receiving messages about the daemon disappearing, which I assume is because it became unresponsive for this reason. When I ran with --no-daemon and issued kill -3 during the pause, the only thing of interest in the thread dump was activity related to Kryo.
I am not sure when this began happening, as I observe it on older releases in addition to 3.2-rc-2. I may have increased the test count enough to make it noticeable. If I recall correctly, a single test task execution previously ran ~50k tests and now runs ~100k. My guess is that writing the binary test report is the cause of the lag. In total the build runs over 1M tests due to parameterization.
I think others should be able to observe this by checking out my project and running gradle clean :caffeine:strongKeysAndStrongValuesSyncCaffeineTest.
If I am straining the test reporting but it still worked, then I’d view this as an opportunity for performance tuning. But since running the full test suite now reports that the daemon disappeared, it’s a bug. My only workaround would be to break the test tasks apart further to run smaller subsets, or to reduce coverage.
I realized that a recent addition was introducing the build scan plugin. After removing it, the build completes normally. Even though the -Dscan flag was not used, the plugin seems to be interacting in some way that causes the long lag. Unfortunately that means I’ll probably have to remove it until the performance issue is fixed.
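For reference, the plugin is applied roughly like this (a minimal sketch; the version and license settings shown are illustrative rather than my exact configuration):

    plugins {
        // illustrative version; applied to the root project
        id 'com.gradle.build-scan' version '1.2'
    }

    buildScan {
        licenseAgreementUrl = 'https://gradle.com/terms-of-service'
        licenseAgree = 'yes'
    }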
Unfortunately the build scan result is also unable to show the test report. It correctly shows the summary of 94,505 tests, but then claims 0 tests were run when looking at the detailed view. I think it’s understandable for a beta feature to have scalability limitations that still need to be worked out.
We just pushed a fix for tests coming from TestNG.
You can browse the build scan mentioned above and see your tests in the expanded Tests section.
Using deep links, you can go directly to all tests (longest first) or to tests by project.
You may notice the UI crashing if you scroll too deep in the tests tree.
We are currently working on it, along with performance improvements for the UI to render the tests faster.
I ran a new build scan using the new 1.3 release. Unfortunately I observe the same pause reported earlier, while the plugin captures the information and uploads it to Gradle. I hope my build is useful for your improvement efforts.
Sorry, I wasn’t clear enough on my previous reply.
We fixed a bug that prevented TestNG tests from being shown properly in the UI, as you can see in your build scans.
The issue of the build scan plugin taking too long to upload so much data is still being worked on.
And as stated above, we are also working on improving the performance of the UI when displaying this much test data.
Hey Ben, just a quick update. We can reproduce this issue and have identified the hotspot in the build scan plugin. We’ll let you know once we have a fix in place.
I saw that 1.4 was released today and it works beautifully! Thank you and the rest of the team. I enabled it in my build so it will run with Travis.
Unfortunately there’s a failure when the cloud service tries to send the activation email, so I haven’t seen the report. This appears to be a parsing issue, as it works fine if I run the task with up-to-date skips. But I’m not concerned by that, since I did jump the gun and upgrade before you got a chance to notify me.
Yeah, I just wanted to reach out to let you know about the 1.4 release and ask you to test this. We’re investigating the build scan activation issue. It also seems to be related to the huge size of your test suites; it’s quite impressively big. We’ll keep you posted.
It appears that the build scan plugin causes subsequent up-to-date checks to fail. From the logs, the build ran the main test suite (22 minutes) with scanning enabled. Then when ./gradlew coveralls ran without scanning enabled, its task dependency caused the test suite to run again. Since the previous build time was 36 minutes, it would now become 36 + 22 = 58 minutes and exceed the threshold. I excluded the test task for coveralls and kicked off a new build.
I think the up-to-date check shouldn’t be invalidated by the build scan configuration, but I don’t know how tricky that would be to implement.
Thanks for your detailed explanation. Can you point me to an existing Travis build that exhibits the problem and guide me through your analysis? I’m not sure I understand what the problem is.
See this problematic build and the fixed build after force-pushing the change to use ./gradlew coveralls uploadArchives -x test.
In the first build I realized it was running the tests again, even though a previous step had already done so. I had verified that alternating between running with and without -Dscan reset the up-to-date check. The cleanest solution, in my case, was to explicitly skip the test tasks instead of relying on them being skipped implicitly.
There’s almost certainly something else going on here. I can’t see any way that the build scan plugin would influence incremental build for any task.
@Benjamin_Manes are you forwarding build system properties to your tests? That would explain it, as the presence of -Dscan would then change the inputs to your test task. If this is the case, I suggest being more restrictive in which system properties you forward.
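For illustration, a hypothetical sketch of the difference (the 'caffeine.someFlag' key is made up):

    // broad forwarding: every -D flag on the command line becomes a test input,
    // so adding -Dscan changes the task's inputs and defeats the up-to-date check
    test {
        systemProperties System.properties
    }

    // restrictive forwarding: only the specific keys the tests actually need
    test {
        systemProperty 'caffeine.someFlag', System.getProperty('caffeine.someFlag')
    }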
That would be a good explanation, but unfortunately I don’t see any broad forwarding. There are system properties being passed in, but from what I see all of them are set explicitly by key.
You’re right. It’s unrelated to the build scan plugin.
When narrowing it down, it seems that if :caffeine:test has any dependent test tasks then all of them are invalidated. When run individually multiple times, they are up-to-date. When a task is added via tasks.test.dependsOn(it), that task and all other test tasks are invalidated. They rerun on each subsequent execution despite nothing having changed, while all preceding tasks are up-to-date.
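For context, a simplified sketch of that wiring (not my exact build logic, but the shape of it): each variant test task is attached to the main test task, and with 3.3 this appears to invalidate the up-to-date state of all of them.

    tasks.withType(Test).matching { it.name != 'test' }.all {
        // every generated variant test task becomes a dependency of :test
        tasks.test.dependsOn(it)
    }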
When downgrading to 3.2.1, the problem goes away. This appears to be a bug introduced in 3.3-rc-1 that remains in the 3.3 release.