Weirdness in Gradle exec on Windows


(Dimitar Dimitrov) #1

I am trying to bring up a PostgreSQL instance as part of a build and then do some Liquibase migrations on it. Despite my best efforts, I can’t seem to get the pg_ctl start command to return control to the build.

I can confirm that the server is successfully started and functional even if I kill the gradle build (the pg_ctl takes care to detach the child process).

It is weird, because I use pg_sql stop in another task and it returns normally. Here are the two tasks - kill works, run gets stuck at the end.

task run(type: Exec) {
    workingDir pgWork
    executable "$pgBin/pg_ctl.exe"
    args 'start', '-w'
    args "-D", pgData
    args "-l", "$pgWork/postgres.log"
}

task kill(type: Exec) {
    ignoreExitValue = true
    workingDir pgWork
    executable "$pgBin/pg_ctl.exe"
    args 'stop'
    args "-D", pgData
    args "-m", "fast"
}

Running from the console returns just fine in both cases:


refdata-poc> C:\Users\1325647\pkg\pgsql\bin/pg_ctl.exe start -w -D build\pgData\ -l build\pgWork\postgres.log
waiting for server to start.... done
server started

refdata-poc> C:\Users\1325647\pkg\pgsql\bin/pg_ctl.exe stop -D build\pgData\ -m fast
waiting for server to shut down.... done
server stopped

1325647@933P66150902001 C:\sandbox\refdata-poc

The relevant stack trace is:

"Daemon worker" #29 prio=5 os_prio=0 tid=0x000000005f292000 nid=0x2f68 waiting on condition [0x000000006063c000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c06fbc70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at org.gradle.process.internal.DefaultExecHandle.waitForFinish(DefaultExecHandle.java:276)
        at org.gradle.process.internal.DefaultExecAction.execute(DefaultExecAction.java:29)
        at org.gradle.api.tasks.AbstractExecTask.exec(AbstractExecTask.java:54)
        ...
 
"Run command 'C:\Users\1325647\pkg\pgsql\bin/pg_ctl.exe'" #43 prio=5 os_prio=0 tid=0x000000005f1ed800 nid=0x514 waiting on condition [0x0000000064abe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0722038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
        at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
        at org.gradle.internal.concurrent.StoppableExecutorImpl.stop(StoppableExecutorImpl.java:62)
        at org.gradle.internal.concurrent.DefaultExecutorFactory$TrackedStoppableExecutor.stop(DefaultExecutorFactory.java:63)
        at org.gradle.internal.concurrent.StoppableExecutorImpl.stop(StoppableExecutorImpl.java:53)
        at org.gradle.process.internal.streams.StreamsForwarder.stop(StreamsForwarder.java:78)
        at org.gradle.process.internal.ExecHandleRunner.run(ExecHandleRunner.java:82)
        at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
        at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     

(Sterling Greene) #2

When you’re running from the command-line, are you in the same directory as the Exec task would be? i.e., workingDir

Does it make any difference if you don’t log to postgres.log?


(Dimitar Dimitrov) #3

Tried running from the same dir - works fine.

Omitting the -l option causes some async output to the console (didn’t know it’s possible in Windows), but otherwise the postgress process is correctly started in background. The control correctly returns to the console - i.e. I can type more commands and they would run when I press Enter.


(Dimitar Dimitrov) #4

Ok, I am back at this problem. There is one main difference from the case where exec terminates and the case where it doesn’t - the pg_ctl --start command forks new postgress process before it dies. It almost looks as if Gradle is waiting for the whole process tree (I have no idea how to achieve that in Java).

I can see in ExecHandleRunner#run() that the process dies, we get the exit status and then it gets stuck while it tries to shutdown the executor consuming the stdio streams.

I went to the executor threads and confirmed that the streams have the closed boolean field as false. Looking at the StreamsForwarder.stop() method, we close only the stdin

On the other hand, that should be ok as I expect the OS should close the process stream after the proves dies and any children would inherit separate file descriptors for the stdio… or does it work that way on Windows?

I will look more tomorrow, meanwhile here are the relevant stack traces in case they help:

"Run command 'C:\Users\...\pkg\pgsql\bin/pg_ctl.exe'@6109" prio=5 tid=0x4b nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at org.gradle.internal.concurrent.StoppableExecutorImpl.stop(StoppableExecutorImpl.java:58)
	  at org.gradle.internal.concurrent.DefaultExecutorFactory$TrackedStoppableExecutor.stop(DefaultExecutorFactory.java:63)
	  at org.gradle.internal.concurrent.StoppableExecutorImpl.stop(StoppableExecutorImpl.java:53)
	  at org.gradle.process.internal.streams.StreamsForwarder.stop(StreamsForwarder.java:78)
	  at org.gradle.process.internal.ExecHandleRunner.run(ExecHandleRunner.java:82)
	  at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
	  at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)


"Forward streams with process: command 'C:\Users\...\pkg\pgsql\bin/pg_ctl.exe' Thread 2@6120" prio=5 tid=0x4f nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
	  at java.io.FileInputStream.read(FileInputStream.java:233)
	  at org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:51)
	  at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
	  at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)

"Forward streams with process: command 'C:\Users\...\pkg\pgsql\bin/pg_ctl.exe' Thread 3@6121" prio=5 tid=0x50 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
	  at java.io.FileInputStream.readBytes(FileInputStream.java:-1)
	  at java.io.FileInputStream.read(FileInputStream.java:255)
	  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	  at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
	  at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	  - locked <0x183b> (a java.io.BufferedInputStream)
	  at java.io.FilterInputStream.read(FilterInputStream.java:107)
	  at org.gradle.process.internal.streams.ExecOutputHandleRunner.run(ExecOutputHandleRunner.java:51)
	  at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
	  at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
	  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	  at java.lang.Thread.run(Thread.java:745)

(Dimitar Dimitrov) #5

Ok, found the problem is caused by special behavior of pg_ctl - it passes the FD of stdout to its children, so one can redirect it to a file if needed. The PostgreSQL guys say it’s a feature and using -l should close the streams. But then I tried that and it still blocks on Windows.

http://www.postgresql.org/message-id/17855.1419977186@sss.pgh.pa.us

Will keep digging.


(Dimitar Dimitrov) #6

Final update in case anybody has the same problem.

In a nutshell, the code below works fine for Linux and there is a workaround for Windows.

task run(type: Exec) {
    group = 'pgSql'
    workingDir pgWork
    executable "$pgBin/pg_ctl.exe"
    args 'start', '-w'
    args "-D", pgData
    args "-l", "$pgWork/postgres.log"

    if (org.gradle.internal.os.OperatingSystem.current().windows) {
        def execActionField = task.getClass().superclass.superclass.getDeclaredField('execAction')
        execActionField.accessible = true
        def execAction = execActionField.get(task) as ExecHandleBuilder
        execAction.streamsHandler [:].withDefault { return { a = null, b = null, c = null -> } } as StreamsHandler
    }
}

Despite all my attempts, I could not get pg_sql to close the stdout pipe on Windows, so as a workaround I hacked the stream handler not to do anything (the pg_ctl command doesn’t output much, so we don’t risk on filling the buffer, so the only thing we lose is some diagnostics if the startup fails).

The problem goes quite deep in the way Postgresql spawns worker instances and handles the console I/O. The problematic code in pgSql can be seen here: pg_ctl::do_start => pg_ctl::start_postmaster => CreateRestrictedProces

There is something with the CreateRestrictedProcess that causes the stdio pipe of the caller to be kept open.

I will soon need to come up with a better general way of handling long running processes (i.e. to accomodate scenario such as: start primary and backup server, run some tests, kill primary server, do some more tests, do failover, run some more tests). If anybody has done anything like this or has some tips, please let me know.


(JR) #7

Any further luck here? It’s a blocker for me when attempting to use gradle to execute msbuild tasks.


(Dimitar Dimitrov) #8

The last code snippet I posted solves the problem for me. I don’t think anybody from Gradle Inc. has picked up the problem, as it is a relatively fringe use-case.