Tiling is a common compiler optimization, used for adjusting the parallelism granularity or for enhancing locality. For the case of a 2-dimensional iteration space (domain) tiled with oblique tiles, we address the problem of determining the tile size that minimizes the total execution time on a distributed memory parallel machine. We restrict our attention to uniform dependency computations where one of the tile boundaries is nevertheless parallel to the domain boundaries, in the interest of good load balance. Our solution is obtained by formulating and resolving a discrete nonlinear optimization problem. Experimental running times on an Intel Parageon and an IBM SP2 compare well with our analytical predictions.