On time optimal supernode shape

Edin Hodzic and Weija Shang

With the objective of minimizing the total execution time of a nested loop with uniform dependencies on a distributed memory parallel computer, we discuss the selection of an optimal supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies transformed with supernode transformation, for sufficiently large supernodes and number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order n polynomial whose roots include the optimal supernode size. For two special cases: two dimensional problems, and n-dimensional problems with the communication cost dominated by the startup cost, and thus approximated by a constant, we give a closed form expression for the optimal supernode size, which is independent of the supernode relative side lengths and cutting hyperplane directions. In the same model, with the assumption that the communication cost is dominated by the startup cost, and therefore can be approximated by a constant, we derive closed form expression for an optimal linear schedule vector, and a necessary and sufficient condition for optimal relative side lengths. We prove that the total running time is minimized by cutting hyperplane direction matrix whose rows are vectors from the surface of the tiling cone, the polar cone to the cone spanned by dependence vectors. We also show that optimal hyperplane direction matrix for the special case of algorithms with exactly n extreme dependence directions, which is always the case with two dimensional algorithms, assumes rows in the direction of the extreme directions of the tiling cone. The results are derived in continuous space and should, for that reason, be considered approximate.