KAAPI's NQueens performance at the PLUGTEST 2006

The MOAIS/KAAPI team successfully tested the KAAPI library at the 3rd NQueens contest, held during the Plugtest of the Grid@work event organized by ETSI, INRIA and CoreGRID.

Four key points made this performance possible:

1. Have a simple API for developing parallel programs. The program was developed on top of KAAPI's Athapascan API, which defines only two keywords to describe parallelism. The first version, based on the original Takaken code, was written in less than half a day. Three further full-time days were devoted to optimizing the sequential code using C++ template specialization, for a gain of about 34% over the original Takaken code. The optimized code is available with the KAAPI examples of the open-source distribution here.
2. Use a scheduling algorithm with proven theoretical guarantees for this kind of strict multithreaded computation (our NQueens is a purely series-parallel program), on both homogeneous and heterogeneous (in processor speed) clusters. This scheduling algorithm is essentially work stealing, which is theoretically efficient for programs whose critical path is small relative to the total work. Experiments during the contest demonstrated very good scalability up to 1458 processors.
3. Use an efficient implementation of work stealing. The KAAPI implementation follows the work-first principle, which argues for moving most of the extra instructions needed to handle parallelism onto the critical path rather than onto the work. Moreover, KAAPI implements a lock-free work-stealing algorithm and is built on a lightweight active-message inter-process communication layer that supports dynamic message aggregation and achieves good overlapping of communication with computation.
4. Finally, use parallel algorithms for most basic operations, such as deploying processes onto machines. In KAAPI, we have ported this deployment step on top of the TakTuk library. The result: in the time other participants needed to deploy their application, our NQueens application was both 1/ deployed and 2/ done computing the solution for N=22.
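Item 1 describes turning a sequential backtracking code into a series-parallel (fork/join) program. The Python sketch below only mimics that structure and does not use KAAPI: the first `depth` rows are expanded into independent partial boards (the "fork"), each partial board is counted by an ordinary bitmask backtracking routine, and the counts are summed (the "join"). The function names and the `depth` cut-off are hypothetical. This is neither the Athapascan API nor the Takaken code.

```python
def count_from(n, cols, diag1, diag2):
    """Sequential bitmask backtracking: count completions of a partial board.

    cols, diag1, diag2 are bitmasks of columns/diagonals attacked in the next row.
    """
    full = (1 << n) - 1
    if cols == full:                      # a queen in every column: one solution
        return 1
    total = 0
    free = full & ~(cols | diag1 | diag2)
    while free:
        bit = free & -free                # lowest free column in this row
        free ^= bit
        total += count_from(n, cols | bit,
                            ((diag1 | bit) << 1) & full,
                            (diag2 | bit) >> 1)
    return total


def split_tasks(n, depth):
    """Expand the first `depth` rows; each resulting partial board is an independent task."""
    full = (1 << n) - 1
    frontier = [(0, 0, 0)]
    for _ in range(depth):
        nxt = []
        for cols, d1, d2 in frontier:
            free = full & ~(cols | d1 | d2)
            while free:
                bit = free & -free
                free ^= bit
                nxt.append((cols | bit, ((d1 | bit) << 1) & full, (d2 | bit) >> 1))
        frontier = nxt
    return frontier


def count_queens(n, depth=2):
    # "Fork": the tasks share no state, so any scheduler may run them in any order.
    tasks = split_tasks(n, min(depth, n))
    # "Join": the only synchronization point is the final sum.
    return sum(count_from(n, *task) for task in tasks)


print(count_queens(8))  # 92
```

Here the tasks run one after another; a work-stealing runtime would distribute them (and recursively their subtrees) across processors, which is the role KAAPI plays in the real program.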
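Item 2's claim that work stealing is efficient "for programs with small critical path with respect to the work" refers to the classical bound for randomized work stealing on fully strict computations (Blumofe and Leiserson). Writing $T_1$ for the work (sequential execution time), $T_\infty$ for the critical path and $P$ for the number of processors, and assuming identical processors (the heterogeneous case requires extended analyses not shown here):

```latex
\mathbb{E}[T_P] \;\le\; \frac{T_1}{P} + O(T_\infty),
\qquad
\mathbb{E}[\#\text{steal attempts}] \;=\; O(P\,T_\infty).
```

When the parallelism $T_1 / T_\infty$ is much larger than $P$, the first term dominates and the speedup is close to linear, while the number of steals, and hence the communication they cause, depends only on $P$ and $T_\infty$, not on the total work.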
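Items 2 and 3 rest on the per-worker deque discipline of work stealing: the owner pushes and pops at one end, while idle workers steal from the other end, taking the oldest and typically largest pending task. The round-based Python simulation below illustrates only that discipline. It is single-threaded, so it says nothing about KAAPI's lock-free implementation; the task model (a range of unit leaves split in halves), the function name `simulate` and the one-action-per-round cost model are all assumptions made for illustration.

```python
import random
from collections import deque


def simulate(total_leaves, workers, seed=0):
    """Round-based simulation of randomized work stealing on a binary fork tree.

    Hypothetical task model: a task is a number of unit leaves; a task of size > 1
    forks into two halves. Each worker owns a deque: the owner pushes/pops at the
    bottom (right end), thieves steal from the top (left end).
    Returns (leaves_executed, successful_steals, rounds).
    """
    if total_leaves < 1:
        raise ValueError("total_leaves must be >= 1")
    rng = random.Random(seed)
    deques = [deque() for _ in range(workers)]
    current = [None] * workers
    current[0] = total_leaves               # the root task starts on worker 0
    leaves = steals = rounds = 0
    while any(c is not None for c in current) or any(deques):
        rounds += 1
        for w in range(workers):
            task = current[w]
            if task is None:
                # Idle: one steal attempt on a random victim, taking its oldest task.
                victim = rng.randrange(workers)
                if victim != w and deques[victim]:
                    current[w] = deques[victim].popleft()
                    steals += 1
                continue
            if task > 1:
                # Fork: keep working on one half locally, expose the other to thieves.
                half = task // 2
                deques[w].append(task - half)
                current[w] = half
            else:
                leaves += 1                 # leaf: one unit of sequential work
                current[w] = deques[w].pop() if deques[w] else None
    return leaves, steals, rounds


print(simulate(1 << 14, 16, seed=42))
```

Because thieves take the oldest entries, a single steal usually transfers a large subtree, so successful steals stay far below the number of forks. This is one intuition for why the network views further down show few steal requests while CPUs are busy.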
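Item 4 credits parallel deployment. The reason a parallel launcher scales is that machines already started can themselves start others, so the number of launch rounds grows logarithmically rather than linearly with the number of machines. The Python sketch below counts rounds for an idealized doubling scheme in which every started machine launches exactly one new machine per round. This uniform cost model is an assumption, and TakTuk's actual adaptive deployment strategy is not reproduced here; `tree_deploy_rounds` and `flat_deploy_rounds` are hypothetical helpers.

```python
def tree_deploy_rounds(n_machines):
    """Rounds to reach n_machines (launcher included) if each started machine
    launches one new machine per round (idealized doubling, not TakTuk's algorithm)."""
    started, rounds = 1, 0          # the launcher machine is already running
    while started < n_machines:
        started *= 2                # every running machine launches one more
        rounds += 1
    return rounds


def flat_deploy_rounds(n_machines):
    """Rounds if the launcher alone starts every other machine, one at a time."""
    return n_machines - 1


for n in (10, 100, 1000):
    print(n, tree_deploy_rounds(n), flat_deploy_rounds(n))
```

For 1000 machines the idealized tree needs 10 rounds versus 999 for a flat launcher, which illustrates why a tree-shaped or otherwise parallel launcher can keep deployment time low as the machine count grows.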


Date: 27-29 Nov. 2006
Location: Sophia-Antipolis
Country: France
Organizer: ETSI, INRIA, CoreGRID
Official site: here
Contest: NQueens

Documents

The official results sent by the ETSI organizers

Some timings recorded during the Plugtest

1. NQueens N=21 in 78 s on about 1000 processors
2. NQueens N=22 in 502.9 s on 1458 processors
3. NQueens N=23 in 4434.9 s on 1422 processors. On this run, the reported average idle time per process is 0.625% of the elapsed time.
4. Less than 20 s to deploy up to 1000 processes on 1000 machines, thanks to TakTuk
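The timings above can be put in perspective with simple arithmetic: elapsed time multiplied by the processor count gives the total CPU time reserved by a run, and the 0.625% idle figure converts to an absolute duration per process. The Python sketch below uses only the reported numbers; N=21 is left out because its processor count is given only approximately, and the names `RUNS` and `processor_hours` are illustrative.

```python
# Figures reported above: N -> (elapsed seconds, processors).
RUNS = {
    22: (502.9, 1458),
    23: (4434.9, 1422),
}
IDLE_FRACTION_N23 = 0.625 / 100     # reported average idle time per process, N=23


def processor_hours(elapsed_s, procs):
    """Total CPU time reserved by a run, in processor-hours."""
    return elapsed_s * procs / 3600.0


for n, (elapsed, procs) in RUNS.items():
    print(f"N={n}: {processor_hours(elapsed, procs):.1f} processor-hours")
# -> N=22: 203.7 processor-hours
# -> N=23: 1751.8 processor-hours

idle_s = RUNS[23][0] * IDLE_FRACTION_N23
print(f"N=23 average idle time per process: {idle_s:.1f} s")
# -> N=23 average idle time per process: 27.7 s
```

In other words, during the N=23 run of about 74 minutes, each process was idle for under half a minute on average.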

Related links

Ganglia view of the runs

Network view of the Orsay cluster showing 6 of the 8 instances for NQueens N=22. The first peak corresponds to the deployment of the application on all the nodes. The other peaks show the network activity due to steal requests. Between them, very few requests are emitted.

Overview of the load (CPU usage) on the Orsay cluster, corresponding to the picture above. When CPUs are busy, no steal requests are emitted. The capture shows 6 of the 8 instances for NQueens N=22.

This picture shows the CPU usage of the whole Grid5000 during our run of 3 instances of NQueens N=23. The time scale differs from that of the previous pictures. The peak on the right corresponds to the 3 NQueens N=23 computations. The center peak is one NQueens N=23 instance that we launched during the night. Between them, nobody was using the grid. All the peaks on the left are due to other participants in the Plugtest.

Capture of KAAPI's online monitoring tool. The tachometer reports the instantaneous speed of the execution.