Efficient Parallelisation using Combined Loop and Data Transformations

Michael O'Boyle

This paper attempts to minimise parallelisation overhead on distributed shared memory machines, such as the SGi Origin 2000, by the combination of non-singular loop and data transformations. By treating loop and data transformations in a unified manner, we show that conflicting requirements on a loop transformation may be resolved by using a data transformation and vice-versa. We develop optimisation criteria for locality, synchronisation and communication and show that neither loop nor data transformations can be solely used for efficient parallelisation. This leads to the development of a novel global optimisation heuristic which is applied to 3 SPEC benchmarks where it is shown to outperform techniques solely based on loop or data transformations and gives significant improvement over an existing state-of-the-art commercial auto-paralleliser.