Many modern parallel computers exhibit a hierarchical structure, e.g., by arranging the processors into clusters. The exchange of information between the processors within a cluster is usually faster then the exchange of information between clusters. Executing mixed task and data parallel programs on such machines may benefit significantly from an adaptation of the task parallelism to the hierarchical structure of the architecture. We describe a scheduling algorithm suitable for a compiler system to adjust data parallel tasks to the cluster organization of the target machine.