Code Generation and Run-time Support for Multi-level Parallelism Exploitation

Marc Gonzàlez, Xavier Martorell, José Oliver, Eduard Ayguade and Jesus Labarta

This paper presents the compilation components of the NANOS environment. The main objective of NANOS is to investigate ways to achieve both high system throughput and high application performance for parallel applications in multiprogrammed environments on shared-memory multiprocessors. The project has targeted the development of a complete environment in which the interactions between mechanisms and policies at different levels (application, compiler, threads library and kernel) are carefully coordinated to achieve these goals.

The paper focuses on the main components of the NanosCompiler, an OpenMP compiler whose implementation is oriented towards the efficient exploitation of multiple levels of parallelism. It also analyzes what the threads library must provide to support this kind of parallelism, using our current implementation, NthLib, as the reference. Program parallelization relies both on the automatic parallelization capabilities of the base compiler and on the information obtained from user-supplied directives. The compiler uses a hierarchical internal representation that unifies both sources of parallelism. It then runs a task identification phase that adapts the granularity of the final tasks to the target architecture, and finally generates parallel code that uses the services offered by the threads library.
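To make "multiple levels of parallelism" concrete, the following is a minimal sketch of nested parallelism written with standard OpenMP directives in C. It is an illustration only, not NanosCompiler input or output: the function name nested_sum and the matrix-sum workload are hypothetical, chosen to show an outer parallel loop whose iterations each contain an inner parallel loop.

```c
#include <stddef.h>

/*
 * Hypothetical example: sum an n_rows x n_cols matrix stored in
 * row-major order, exposing two levels of parallelism.
 */
double nested_sum(const double *a, int n_rows, int n_cols)
{
    double total = 0.0;

    /* Outer level: the rows are divided among a team of threads. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n_rows; i++) {
        double row_sum = 0.0;

        /* Inner level: each outer thread may create its own team
         * to divide the columns of its current row. */
        #pragma omp parallel for reduction(+:row_sum)
        for (int j = 0; j < n_cols; j++) {
            row_sum += a[(size_t)i * n_cols + j];
        }

        total += row_sum;
    }

    return total;
}
```

Called on a 2 x 3 matrix holding the values 1 to 6, nested_sum returns 21. Whether the inner region actually runs in parallel depends on the OpenMP runtime and its settings (for example the OMP_MAX_ACTIVE_LEVELS environment variable); when nesting is disabled, or when the code is compiled without OpenMP support, the inner loop simply runs serially and the result is unchanged.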