This paper attempts to minimise parallelisation overhead on distributed shared memory machines, such as the SGi Origin 2000, by the combination of non-singular loop and data transformations. By treating loop and data transformations in a unified manner, we show that conflicting requirements on a loop transformation may be resolved by using a data transformation and vice-versa. We develop optimisation criteria for locality, synchronisation and communication and show that neither loop nor data transformations can be solely used for efficient parallelisation. This leads to the development of a novel global optimisation heuristic which is applied to 3 SPEC benchmarks where it is shown to outperform techniques solely based on loop or data transformations and gives significant improvement over an existing state-of-the-art commercial auto-paralleliser.