Grinder agent communication shut down when worker process launched

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Grinder agent communication shut down when worker process launched

alfinhaji
Hi all,

Looking for an issue we have been troubleshooting for a few days now.  This issue has exposed itself after our client moved over to a new data center requiring us to rebuild our grinder environments.  Our grinder test infrastructure is fully functional in the old environment, and these issues have exposed themselves in the new environment only.  Here is a run down of the issue.

1.  Running headless console as we haven't been provided with X11 terminal emulation. (this is new compared to the old environment, where we ran the Console UI)
2.  Console starts up fine, agent starts up fine, and connects to the console.

$ tail -f grinderAgentoutput.log
Setting argument grinder.consolePort=6372
+ JAVA_ARGS=' -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372'
+ java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372 net.grinder.Grinder -daemon 10
2015-11-19 15:26:41,336 INFO  agent: The Grinder 3.11
2015-11-19 15:26:41,358 ERROR agent: Failed to connect to 'L-01942-C5F/10.237.119.71:6372'
2015-11-19 15:26:41,358 INFO  agent: agent finished
2015-11-19 15:26:41,358 INFO  agent: sleeping for 10000 ms
2015-11-19 15:26:51,358 INFO  agent: The Grinder 3.11
2015-11-19 15:26:51,404 INFO  agent: connected to console at L-01942-C5F/10.237.119.71:6372
2015-11-19 15:26:51,405 INFO  agent: waiting for console signal

3.  At this point I can test the agent service and get a valid JSON response back:

$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]

4.  I kick off our test automation, which eventually makes a call to start the worker process:

http://L-01942-C5F:6373/agents/start-workers  (we pass the grinder properties in when we make this POST call using the REST API)

5. As soon as the worker process is started, we see the test kick off perfectly fine, validated from the agent log file:

2015-11-19 15:28:04,320 INFO  agent: received a start message
2015-11-19 15:28:04,325 INFO  agent: communication shut down
2015-11-19 15:28:04,473 INFO  agent: Worker process command line: /test_data/grinder_artifacts/java/jdk1.7.0_25/jre/bin/java '-javaagent:/test_data/grinder_artifacts/grinder/lib/grinder-dcr-agent-3.11.jar' '-Dpython.path=/test_data/projects/common/src' '-Dpython.cachedir=/tmp/cachedir' '-Xms2g' '-Xmx2g' '-XX:NewSize=1g' '-XX:MaxNewSize=1g' '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+UseConcMarkSweepGC' '-XX:+UseParNewGC' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+CMSConcurrentMTEnabled' '-XX:+AlwaysPreTouch' '-XX:+UseCompressedOops' -classpath '/test_data/grinder_artifacts/lib/jsoup-1.6.1.jar:/test_data/grinder_artifacts/lib/jyson-1.0.2.jar:/test_data/grinder_artifacts/grinder/lib/grinder.jar' net.grinder.engine.process.WorkerProcessEntryPoint
2015-11-19 15:28:04,493 INFO  agent: worker L-01942-C5F-0 started
10.940: [GC 10.940: [ParNew: 838912K->32398K(943744K), 0.0570450 secs] 838912K->32398K(1992320K), 0.0572580 secs] [Times: user=0.10 sys=0.00, real=0.06 secs]
2015-11-19 15:28:18,767 INFO  L-01942-C5F-0: starting threads
14.306: [GC 14.306: [ParNew: 871310K->39438K(943744K), 0.0550070 secs] 871310K->39438K(1992320K), 0.0551420 secs] [Times: user=0.11 sys=0.00, real=0.06 secs]
2015-11-19 15:28:19,128 INFO  L-01942-C5F-0: will shut down after 900000 ms
2015-11-19 15:28:56,156 INFO  L-01942-C5F-0: finished
Heap
 par new generation   total 943744K, used 179113K [0x000000077ae00000, 0x00000007bae00000, 0x00000007bae00000)
  eden space 838912K,  14% used [0x000000077ae00000, 0x00000007821cf968, 0x00000007ae140000)
  from space 104832K,  57% used [0x00000007b47a0000, 0x00000007b82badb0, 0x00000007bae00000)
  to   space 104832K,   0% used [0x00000007ae140000, 0x00000007ae140000, 0x00000007b47a0000)
 concurrent mark-sweep generation total 1048576K, used 0K [0x00000007bae00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 35456K, used 35295K [0x00000007fae00000, 0x00000007fd0a0000, 0x0000000800000000)
2015-11-19 15:28:56,503 INFO  agent: agent finished
2015-11-19 15:28:56,503 INFO  agent: sleeping for 10000 ms



Our custom test automation we have built, however, polls the agent status on a regular basis to see if the worker process has completed.  We do this to automate multiple tests in sequence, once a test ends (worker process complete), our automation will immediately kick off another worker process to start the next test we have defined.  Unfortunately, what I am seeing since moving to this "new" grinder environment is this polling of the agent status comes back with a valid HTTP 200 with nothing in the JSON body, completely blank, while the worker process is executing the test.  Here is what I see when I continuously send the CURL request off to check the agent status just as the worker process is starting:


$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]
$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"FINISHED","workers":[]}]
$ curl http://L-01942-C5F:6373/agents/status
[]    <---- blank JSON response

Also note in the agent log above we see a "communication is shut down" message (which we don't see in the old environment).

As soon as the worker process completes the grinder test, we see the agent re-establish connection to the console and start getting a valid RUNNING status on the agent status:

Agent log file:

2015-11-19 15:29:06,504 INFO  agent: The Grinder 3.11
2015-11-19 15:29:06,505 INFO  agent: connected to console at L-01942-C5F/10.237.119.71:6372
2015-11-19 15:29:06,505 INFO  agent: waiting for console signal

Curl output

$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]


Some more info:

Running console, agent, and worker process info:


$ ps -ef | grep java
grinder  28379     1  1 15:26 pts/0    00:00:17 /test_data/grinder_artifacts/java/jdk1.7.0_25/bin/java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.console.httpHost=L-01942-C5F net.grinder.Console -headless
grinder  28385 28380  0 15:26 pts/0    00:00:03 java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372 net.grinder.Grinder -daemon 10
grinder  29190 28385 42 15:53 pts/0    00:00:00 /test_data/grinder_artifacts/java/jdk1.7.0_25/jre/bin/java -javaagent:/test_data/grinder_artifacts/grinder/lib/grinder-dcr-agent-3.11.jar -Dpython.path=/test_data/projects/common/src -Dpython.cachedir=/tmp/cachedir -Xms2g -Xmx2g -XX:NewSize=1g -XX:MaxNewSize=1g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSConcurrentMTEnabled -XX:+AlwaysPreTouch -XX:+UseCompressedOops -classpath /test_data/grinder_artifacts/lib/jsoup-1.6.1.jar:/test_data/grinder_artifacts/lib/jyson-1.0.2.jar:/test_data/grinder_artifacts/grinder/lib/grinder.jar net.grinder.engine.process.WorkerProcessEntryPoint



So the million dollar question, why is it that the agent loses communication with the console when the worker process starts executing?  This is essentially causing our call the /agent/status to fail, and our test automation fails out even though worker process kicks off the test just fine.

Any assistance or input would be greatly appreciated.

Thanks,
Alfin Haji

------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Grinder agent communication shut down when worker process launched

Ouray Viney
Hi Alfin ,

How restrictive are the firewall rules between the agent and the console?

Cheers,

Ouray

On Thursday, November 19, 2015, alfinhaji . <[hidden email]> wrote:
Hi all,

Looking for an issue we have been troubleshooting for a few days now.  This issue has exposed itself after our client moved over to a new data center requiring us to rebuild our grinder environments.  Our grinder test infrastructure is fully functional in the old environment, and these issues have exposed themselves in the new environment only.  Here is a run down of the issue.

1.  Running headless console as we haven't been provided with X11 terminal emulation. (this is new compared to the old environment, where we ran the Console UI)
2.  Console starts up fine, agent starts up fine, and connects to the console.

$ tail -f grinderAgentoutput.log
Setting argument grinder.consolePort=6372
+ JAVA_ARGS=' -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372'
+ java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372 net.grinder.Grinder -daemon 10
2015-11-19 15:26:41,336 INFO  agent: The Grinder 3.11
2015-11-19 15:26:41,358 ERROR agent: Failed to connect to 'L-01942-C5F/10.237.119.71:6372'
2015-11-19 15:26:41,358 INFO  agent: agent finished
2015-11-19 15:26:41,358 INFO  agent: sleeping for 10000 ms
2015-11-19 15:26:51,358 INFO  agent: The Grinder 3.11
2015-11-19 15:26:51,404 INFO  agent: connected to console at L-01942-C5F/10.237.119.71:6372
2015-11-19 15:26:51,405 INFO  agent: waiting for console signal

3.  At this point I can test the agent service and get a valid JSON response back:

$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]

4.  I kick off our test automation, which eventually makes a call to start the worker process:

http://L-01942-C5F:6373/agents/start-workers  (we pass the grinder properties in when we make this POST call using the REST API)

5. As soon as the worker process is started, we see the test kick off perfectly fine, validated from the agent log file:

2015-11-19 15:28:04,320 INFO  agent: received a start message
2015-11-19 15:28:04,325 INFO  agent: communication shut down
2015-11-19 15:28:04,473 INFO  agent: Worker process command line: /test_data/grinder_artifacts/java/jdk1.7.0_25/jre/bin/java '-javaagent:/test_data/grinder_artifacts/grinder/lib/grinder-dcr-agent-3.11.jar' '-Dpython.path=/test_data/projects/common/src' '-Dpython.cachedir=/tmp/cachedir' '-Xms2g' '-Xmx2g' '-XX:NewSize=1g' '-XX:MaxNewSize=1g' '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' '-XX:+UseConcMarkSweepGC' '-XX:+UseParNewGC' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+CMSConcurrentMTEnabled' '-XX:+AlwaysPreTouch' '-XX:+UseCompressedOops' -classpath '/test_data/grinder_artifacts/lib/jsoup-1.6.1.jar:/test_data/grinder_artifacts/lib/jyson-1.0.2.jar:/test_data/grinder_artifacts/grinder/lib/grinder.jar' net.grinder.engine.process.WorkerProcessEntryPoint
2015-11-19 15:28:04,493 INFO  agent: worker L-01942-C5F-0 started
10.940: [GC 10.940: [ParNew: 838912K->32398K(943744K), 0.0570450 secs] 838912K->32398K(1992320K), 0.0572580 secs] [Times: user=0.10 sys=0.00, real=0.06 secs]
2015-11-19 15:28:18,767 INFO  L-01942-C5F-0: starting threads
14.306: [GC 14.306: [ParNew: 871310K->39438K(943744K), 0.0550070 secs] 871310K->39438K(1992320K), 0.0551420 secs] [Times: user=0.11 sys=0.00, real=0.06 secs]
2015-11-19 15:28:19,128 INFO  L-01942-C5F-0: will shut down after 900000 ms
2015-11-19 15:28:56,156 INFO  L-01942-C5F-0: finished
Heap
 par new generation   total 943744K, used 179113K [0x000000077ae00000, 0x00000007bae00000, 0x00000007bae00000)
  eden space 838912K,  14% used [0x000000077ae00000, 0x00000007821cf968, 0x00000007ae140000)
  from space 104832K,  57% used [0x00000007b47a0000, 0x00000007b82badb0, 0x00000007bae00000)
  to   space 104832K,   0% used [0x00000007ae140000, 0x00000007ae140000, 0x00000007b47a0000)
 concurrent mark-sweep generation total 1048576K, used 0K [0x00000007bae00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 35456K, used 35295K [0x00000007fae00000, 0x00000007fd0a0000, 0x0000000800000000)
2015-11-19 15:28:56,503 INFO  agent: agent finished
2015-11-19 15:28:56,503 INFO  agent: sleeping for 10000 ms



Our custom test automation we have built, however, polls the agent status on a regular basis to see if the worker process has completed.  We do this to automate multiple tests in sequence, once a test ends (worker process complete), our automation will immediately kick off another worker process to start the next test we have defined.  Unfortunately, what I am seeing since moving to this "new" grinder environment is this polling of the agent status comes back with a valid HTTP 200 with nothing in the JSON body, completely blank, while the worker process is executing the test.  Here is what I see when I continuously send the CURL request off to check the agent status just as the worker process is starting:


$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]
$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"FINISHED","workers":[]}]
$ curl http://L-01942-C5F:6373/agents/status
[]    <---- blank JSON response

Also note in the agent log above we see a "communication is shut down" message (which we don't see in the old environment).

As soon as the worker process completes the grinder test, we see the agent re-establish connection to the console and start getting a valid RUNNING status on the agent status:

Agent log file:

2015-11-19 15:29:06,504 INFO  agent: The Grinder 3.11
2015-11-19 15:29:06,505 INFO  agent: connected to console at L-01942-C5F/10.237.119.71:6372
2015-11-19 15:29:06,505 INFO  agent: waiting for console signal

Curl output

$ curl http://L-01942-C5F:6373/agents/status
[{"id":"L-01942-C5F:1509753335|1447964801332|1679207110:0","name":"L-01942-C5F","number":0,"state":"RUNNING","workers":[]}]


Some more info:

Running console, agent, and worker process info:


$ ps -ef | grep java
grinder  28379     1  1 15:26 pts/0    00:00:17 /test_data/grinder_artifacts/java/jdk1.7.0_25/bin/java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.console.httpHost=L-01942-C5F net.grinder.Console -headless
grinder  28385 28380  0 15:26 pts/0    00:00:03 java -cp /test_data/grinder_artifacts/grinder/lib/grinder.jar: -Dgrinder.useConsole=true -Dgrinder.consoleHost=L-01942-C5F -Dgrinder.consolePort=6372 net.grinder.Grinder -daemon 10
grinder  29190 28385 42 15:53 pts/0    00:00:00 /test_data/grinder_artifacts/java/jdk1.7.0_25/jre/bin/java -javaagent:/test_data/grinder_artifacts/grinder/lib/grinder-dcr-agent-3.11.jar -Dpython.path=/test_data/projects/common/src -Dpython.cachedir=/tmp/cachedir -Xms2g -Xmx2g -XX:NewSize=1g -XX:MaxNewSize=1g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+ExplicitGCInvokesConcurrent -XX:+CMSConcurrentMTEnabled -XX:+AlwaysPreTouch -XX:+UseCompressedOops -classpath /test_data/grinder_artifacts/lib/jsoup-1.6.1.jar:/test_data/grinder_artifacts/lib/jyson-1.0.2.jar:/test_data/grinder_artifacts/grinder/lib/grinder.jar net.grinder.engine.process.WorkerProcessEntryPoint



So the million dollar question, why is it that the agent loses communication with the console when the worker process starts executing?  This is essentially causing our call the /agent/status to fail, and our test automation fails out even though worker process kicks off the test just fine.

Any assistance or input would be greatly appreciated.

Thanks,
Alfin Haji


--
Sent from Gmail Mobile

Ouray Viney
Blog: https://www.viney.ca
Skype: Ouray Viney
LinkedIn: https://ca.linkedin.com/in/ourayvine


------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Grinder agent communication shut down when worker process launched

Gary Mulder-3
On 19 November 2015 at 22:09, Ouray Viney <[hidden email]> wrote:
Hi Alfin ,

How restrictive are the firewall rules between the agent and the console?

Firewalls dropping or blocking connections are unlikely to cause successful HTTP 200 codes but empty responses.

Chances are you're seeing a blank JSON document because one or more of the components that make up the document is not being generated by the worker. At the end of the test this value is likely populated and so valid info starts getting returned. I'd guess running the worker within a debugger might help isolate the issue.

Gary

------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Grinder agent communication shut down when worker process launched

Ouray Viney
Hey Alfin,

I have seen this before.  It sounds like the worker processor is not reporting back.  

Why else would a rest call to get agent status report nothing even though your tests are running?

Have you tried to trace the comms using netstat?  You can do from both the console and agent host.  The agent host is more interesting.  I am guessing the worker processor initially opens a socket to the console; but closes soon after init (which it should not).

HTH,

Ouray

On Thursday, November 19, 2015, Gary Mulder <[hidden email]> wrote:
On 19 November 2015 at 22:09, Ouray Viney <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;ouray@viney.ca&#39;);" target="_blank">ouray@...> wrote:
Hi Alfin ,

How restrictive are the firewall rules between the agent and the console?

Firewalls dropping or blocking connections are unlikely to cause successful HTTP 200 codes but empty responses.

Chances are you're seeing a blank JSON document because one or more of the components that make up the document is not being generated by the worker. At the end of the test this value is likely populated and so valid info starts getting returned. I'd guess running the worker within a debugger might help isolate the issue.

Gary


--
Sent from Gmail Mobile

Ouray Viney
Blog: https://www.viney.ca
Skype: Ouray Viney
LinkedIn: https://ca.linkedin.com/in/ourayvine


------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Grinder agent communication shut down when worker process launched

alfinhaji

Thanks guys for all the guidance.   I found the root cause of our issue.   The worker process jvm was passing grinder.useConsole=False as an argument.  This was causing the worker process to terminate the connection to the console when it's jvm was started up.  I believe during troubleshooting this flag was switched from true to false thinking it was required to run headless.

Hope this helps anyone who might hit a similar issue in the future.

On Nov 20, 2015 12:07 PM, "Ouray Viney" <[hidden email]> wrote:
Hey Alfin,

I have seen this before.  It sounds like the worker processor is not reporting back.  

Why else would a rest call to get agent status report nothing even though your tests are running?

Have you tried to trace the comms using netstat?  You can do from both the console and agent host.  The agent host is more interesting.  I am guessing the worker processor initially opens a socket to the console; but closes soon after init (which it should not).

HTH,

Ouray

On Thursday, November 19, 2015, Gary Mulder <[hidden email]> wrote:
On 19 November 2015 at 22:09, Ouray Viney <[hidden email]> wrote:
Hi Alfin ,

How restrictive are the firewall rules between the agent and the console?

Firewalls dropping or blocking connections are unlikely to cause successful HTTP 200 codes but empty responses.

Chances are you're seeing a blank JSON document because one or more of the components that make up the document is not being generated by the worker. At the end of the test this value is likely populated and so valid info starts getting returned. I'd guess running the worker within a debugger might help isolate the issue.

Gary


--
Sent from Gmail Mobile

Ouray Viney
Blog: https://www.viney.ca
Skype: Ouray Viney
LinkedIn: https://ca.linkedin.com/in/ourayvine


------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use


------------------------------------------------------------------------------

_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use
Loading...