AssertionError on barrier addWaiter

Amaury Denoyelle
Hello everyone,

I'm using The Grinder 3.11 for load testing. My configuration is:
- The Grinder console on one host
- 2 agents on two separate hosts, each running 10 worker processes with 500 threads per process (5,000 threads per agent)

So, 10,000 threads are running in total. During the scenario, every
thread synchronises on the same barrier (not all at the same time: each
thread calls time.sleep(threadNumber) before waiting on the barrier).
However, at random points during test execution, the console throws a
Java exception, probably when some threads call barrier.await().
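The staggered-arrival pattern described above can be sketched in plain Python 3. This is only an illustration under assumptions: the standard library's threading.Barrier stands in for The Grinder's distributed barrier (it is not the Grinder API), and the thread count and sleep unit are scaled down from 10,000 threads and one second per thread number so the sketch finishes quickly.

```python
import threading
import time


def run_scenario(num_threads=5, sleep_unit=0.01):
    """Simulate staggered arrival at one shared barrier.

    Thread n sleeps n * sleep_unit seconds (standing in for
    time.sleep(threadNumber)) and then waits on the shared barrier.
    Returns the monotonic time at which each thread was released.
    """
    # Local stand-in for the distributed barrier (assumption, not Grinder code).
    barrier = threading.Barrier(num_threads)
    release_times = [None] * num_threads

    def worker(thread_number):
        time.sleep(thread_number * sleep_unit)  # staggered arrival
        barrier.wait()                          # block until every thread has arrived
        release_times[thread_number] = time.monotonic()

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return release_times


if __name__ == "__main__":
    times = run_scenario()
    print("release spread: %.4f s" % (max(times) - min(times)))
```

The stagger only spreads out the arrivals: once the last thread reaches the barrier, every waiting thread becomes runnable at the same moment.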

Output of the console (the BarrierIdentity list is truncated; it
contains 411 entries):

Exception in thread "main" java.lang.AssertionError: (411 [BarrierIdentity[Process 'ibmblade01-4' [ibmblade01-4:411631404|1493911201434|2103114530:5], 1134], BarrierIdentity[Process 'ibmblade01-4' [ibmblade01-4:411631404|1493911201434|2103114530:5], 1375] [...]
        at net.grinder.synchronisation.AbstractBarrierGroups$BarrierGroupImplementation.addWaiter(AbstractBarrierGroups.java:221)
        at net.grinder.console.synchronisation.ProcessBarrierGroups$1.addWaiter(ProcessBarrierGroups.java:87)
        at net.grinder.console.synchronisation.WireDistributedBarriers$3.handle(WireDistributedBarriers.java:100)
        at net.grinder.console.synchronisation.WireDistributedBarriers$3.handle(WireDistributedBarriers.java:96)
        at net.grinder.communication.MessageDispatchSender.send(MessageDispatchSender.java:116)
        at net.grinder.console.communication.ConsoleCommunicationImplementation.processOneMessage(ConsoleCommunicationImplementation.java:287)
        at net.grinder.console.ConsoleFoundation.run(ConsoleFoundation.java:226)
        at net.grinder.Console.run(Console.java:69)
        at net.grinder.Console.main(Console.java:86)

Looking at the code where the exception is thrown
(synchronisation/AbstractBarrierGroups.java), it seems that m_barriers
and m_waiters.size() are equal (411).
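To make the failing condition concrete, here is a deliberately simplified, hypothetical model of a barrier group's bookkeeping. It is not The Grinder's AbstractBarrierGroups: the class, its methods and the removal path are invented for illustration, and the real code may behave differently. In this model a full group is released immediately, so a new waiter should never find as many waiters as barriers; the removal path shows one hypothetical way to reach that state anyway.

```python
class BarrierGroupModel:
    """Hypothetical, simplified model of one barrier group (NOT Grinder code)."""

    def __init__(self):
        self.barriers = 0     # registered barrier instances in this group
        self.waiters = set()  # identities of barriers currently waiting

    def add_barrier(self):
        self.barriers += 1

    def remove_barrier(self):
        # Hypothetical path: the count drops, but nothing checks whether the
        # remaining waiters now fill the group, so no release happens.
        self.barriers -= 1

    def add_waiter(self, identity):
        # Invariant: a newly arriving waiter must find fewer waiters than barriers.
        if self.barriers <= len(self.waiters):
            raise AssertionError((self.barriers, sorted(self.waiters)))
        self.waiters.add(identity)
        if len(self.waiters) == self.barriers:
            self.waiters.clear()  # release: every waiter proceeds


def demo():
    """Reach the 'waiters == barriers' state, then add one more waiter."""
    group = BarrierGroupModel()
    for _ in range(3):
        group.add_barrier()
    group.add_waiter("a")
    group.add_waiter("b")
    group.remove_barrier()  # now 2 barriers and 2 waiters, never released
    try:
        group.add_waiter("c")
    except AssertionError as e:
        return e.args[0]
    return None


if __name__ == "__main__":
    print(demo())
```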

The barrier is used as described in the documentation: it is defined in
the __init__ function, and await() is called in __call__. All threads
are started at the same time, and none are cancelled prematurely.

Do you have any idea what causes this problem? Are 10,000 threads
waiting on one barrier a bit overkill?

Thank you for your help,

--
Amaury Denoyelle

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
grinder-use mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/grinder-use

Re: AssertionError on barrier addWaiter

Gary Mulder
On 5 May 2017 at 08:02, Amaury Denoyelle <[hidden email]> wrote:

> Do you have any idea what causes this problem? Are 10,000 threads
> waiting on one barrier a bit overkill?

I can't speak to your barrier exception, but I can say that 5,000 threads per server is a large number for the JVM to manage. With reasonably simple test code, my rule of thumb is no more than 100 threads per core, with random sleeps between HTTP requests. That implies a 50-core server for 5,000 concurrent threads.

However, you may have a bigger problem if many of those 5,000 threads are being synchronised at a barrier. You say that the threads don't reach it at the same time, but with that number of threads the Linux scheduler is likely being hammered as many threads become ready to run at once. A 16-core hyper-threaded server has 32 logical cores, so the Linux scheduler can physically run only 32 threads concurrently. If you're using Windows, the problem is likely worse, as the Linux scheduler is much more lightweight than the Windows scheduler.
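The arithmetic behind the scheduler point can be sketched as follows. This is an idealised model under stated assumptions: it ignores other processes, priorities and time-slice lengths, and the function name is hypothetical. It reports how many runnable threads compete for each logical CPU and the minimum number of scheduling "waves" before every released thread has run at least once.

```python
import math
import os


def scheduling_pressure(ready_threads, logical_cpus=None):
    """Return (threads per logical CPU, minimum scheduling waves).

    Idealised: assumes all ready_threads become runnable at once and
    nothing else competes for the CPUs.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # logical CPUs on this host
    per_cpu = ready_threads / logical_cpus
    waves = math.ceil(ready_threads / logical_cpus)
    return per_cpu, waves


if __name__ == "__main__":
    # 5,000 threads released together on a 16-core hyper-threaded host (32 logical CPUs).
    print(scheduling_pressure(5000, 32))  # (156.25, 157)
```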

Finally, keep an eye on your JVM garbage collection. With so many threads creating objects, your GC is likely running quite hot.

Overall, you will be fighting your test harness's own performance a lot, and you need to prove to yourself that it isn't the primary bottleneck; otherwise your test times will reflect the performance of your test harness and not of the system under test.
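One inexpensive way to look for harness saturation (a check chosen for illustration, not one proposed in this thread) is to measure how late timed sleeps wake up while the load runs: if requested sleeps regularly overshoot by a large margin, the load generator itself is struggling and its timings are suspect. This is a plain CPython sketch with arbitrary thread counts and intervals; to observe a Grinder worker's own contention, the same idea would have to run inside that worker process.

```python
import statistics
import threading
import time


def measure_sleep_overshoot(num_threads=50, interval=0.01):
    """Return, per thread, how many seconds later than requested it woke up."""
    overshoots = [0.0] * num_threads

    def worker(i):
        start = time.monotonic()
        time.sleep(interval)
        # Extra time beyond the requested interval = scheduling/wake-up delay.
        overshoots[i] = time.monotonic() - start - interval

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return overshoots


if __name__ == "__main__":
    lateness = measure_sleep_overshoot()
    print("median %.4f s, max %.4f s" % (statistics.median(lateness), max(lateness)))
```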

Regards,
Gary

Re: AssertionError on barrier addWaiter

Amaury Denoyelle
Thank you for your quick answer.

I currently use two agents, with 16 cores each. You may be right that
this is too much for the system, but I really need 10,000 threads. I may
try adding more agents, but I do not have many of these servers.
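Applying the 100-threads-per-core rule of thumb from earlier in the thread to these numbers gives a rough agent count. This is a sketch under assumptions: the rule is a heuristic rather than a hard limit, the function name is hypothetical, and 10 processes per agent simply mirrors the current setup. The returned values are candidates for The Grinder's grinder.processes (worker processes per agent) and grinder.threads (threads per process) properties.

```python
import math


def plan_agents(total_threads, cores_per_agent, threads_per_core=100,
                processes_per_agent=10):
    """Estimate agent hosts needed and a per-agent thread split
    using a 'threads per core' rule of thumb (a heuristic assumption)."""
    capacity_per_agent = cores_per_agent * threads_per_core
    agents = math.ceil(total_threads / capacity_per_agent)
    threads_per_agent = math.ceil(total_threads / agents)
    # Round up so agents * processes * threads covers the target.
    threads_per_process = math.ceil(threads_per_agent / processes_per_agent)
    return {
        "agents": agents,
        "grinder.processes": processes_per_agent,
        "grinder.threads": threads_per_process,
    }


if __name__ == "__main__":
    print(plan_agents(10000, 16))
```

With these assumptions, 7 agents × 10 processes × 143 threads = 10,010 threads, slightly above the 10,000 target because of rounding up. By the same rule, the current two 16-core agents carry 10,000 / 32 = 312.5 threads per core.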

That said, the tests on the agents appear to be accurate. I do not
measure precise timings; I just check some basic conditions, and all
10,000 tests execute correctly. All agents are doing their work fine.
The problem I encounter is on the console, which runs on a dedicated
host with no other processes, so that confuses me a little.

--
Amaury Denoyelle
