Current parallelizing compiler technology does not handle irregular applications very well. Approaches in the past have either focused on specific irregular applications to generate good code, or relied on fall-back run-time techniques. This work proposes a compiler-driven approach that makes use of a novel run-time execution model for a cluster of computers. The model consists of data-flow style workers to enable load-balancing, and HPF-style data-blocking (or tiling) to divide computations into units of work. The run-time system is loosely based on Space-Time Memory abstraction developed at Compaq's Cambridge Research Lab, which enables addressing data values, existing as tiles, totally independently of their physical location.
The model has several advantages. It allows a high degree of concurrency and high flexibility in scheduling computations. In some cases where a traditional compiler optimization (like loop fusion) can not be performed statically, this framework may lead to a natural dynamic equivalent. Workers allow good automatic load-balancing while compiler analysis allows locality aware specific data and computation partitionings to be implemented based on application specific knowledge. The model achieves this with minimal changes to compiler's code-generation phase and within the HPF programming framework. We plan to leverage the dHPF compiler work at Rice University for this purpose. Finally, it is scalable in terms of space usage since space is required for only the currently relevant data tiles. The whole array may never exist at one time. Scalability in computation is being studied using hierarchical mapping of the model onto the underlying topology of a cluster.
We have developed this new approach via studying how it would work with two representative applications: blocked Cholesky factorization and N-body simulation (molecular dynamics). In the paper we will detail the approach and provide evidence of its effectiveness on these two applications.