MLOpt: Machine-learning based automatic parallelization

Funding

ANR PRCI, 42 months, starting May 2025.

People

Christophe Alias (Coordinator, France)
Ali Jannesari (Coordinator, USA)
Sid Touati
Vincent Mastain, PhD student
Sid's PhD student
Ali's PhD student

Objectives and research hypothesis

Large Language Models (LLMs) have demonstrated impressive capabilities for generating quality text and can even assist programmers by producing code fragments, project skeletons, etc. However, they are inherently limited to "almost" correct code: no correctness guarantee can be given. The goal of MLOpt is to explore how LLMs coupled with translation validation can be applied to automatically rewriting sequential code with OpenMP parallelization constructs. The project is divided into the following tasks.

First, we will develop a framework that automatically generates the parallel version of a sequential program by combining graph-based Multimodal Contrastive Learning with LLMs. The framework consists of three major components. The first is a Contrastive Learning module based on graph neural networks (GNNs), which will leverage both the textual and the graph-based representations of programs to learn the intrinsic parallel characteristics (parallelism opportunities, parallel patterns, etc.) of sequential source programs. The second is a Prompt Generator, which will produce enhanced prompts from what the GNN-based Contrastive Learning module has learned. The third is an LLM-based code generator, which will produce the parallel counterparts of the sequential programs from the enhanced prompts. The framework aims to improve developer productivity, since developers will need to spend less time finding parallelism opportunities in sequential programs, and to improve code efficiency by generating high-performance parallel code.
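
The data flow between the three components can be sketched as follows. This is only an illustration under strong assumptions: every name here (`ParallelismHint`, `predict_hints`, `build_prompt`, `parallelize`) is hypothetical, the "learning module" is replaced by a toy textual heuristic rather than a trained GNN, and the LLM is passed in as an arbitrary callable. It does not describe the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ParallelismHint:
    """Hypothetical output of the contrastive-learning module."""
    loop_id: str
    pattern: str       # e.g. "reduction"
    confidence: float  # score in [0, 1]


def predict_hints(source: str) -> List[ParallelismHint]:
    """Stand-in for the GNN-based module.

    A toy heuristic (assumption): any '+=' is reported as a reduction.
    A real module would use learned program representations instead.
    """
    hints = []
    if "+=" in source:
        hints.append(ParallelismHint("loop0", "reduction", 0.9))
    return hints


def build_prompt(source: str, hints: List[ParallelismHint]) -> str:
    """Prompt generator: enrich the request with the predicted hints."""
    lines = ["Parallelize the following C code with OpenMP."]
    for h in hints:
        lines.append(
            f"Hint: {h.loop_id} looks like a {h.pattern} "
            f"(confidence {h.confidence:.2f})."
        )
    lines.append("Code:")
    lines.append(source)
    return "\n".join(lines)


def parallelize(source: str, llm: Callable[[str], str]) -> str:
    """Chain the three stages; `llm` is any function from prompt to code."""
    hints = predict_hints(source)
    prompt = build_prompt(source, hints)
    return llm(prompt)
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model or API, and makes each stage testable in isolation.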

Then, we will select the tasks computed by the program. Existing research has attempted to perform this selection with dedicated algorithms, but significant limitations persist: either the analyzed program must be written in a mathematical form to aid the analysis, or the analysis method handles arbitrary code but restricts itself to a syntactic form. We intend to approach the problem with a new methodology, leveraging our expertise in compilation and neural networks.

Finally, we will check the correctness of the inferred program transformation. Although the inferred transformations are likely to be correct, they remain speculative: no formal guarantee of correctness can be provided. Hence, we need to enforce translation correctness by adding a translation validation stage to our compilation framework. We aim to build a translation validator that handles, in a scalable way, the parallelizing program transformations addressed in our project. We will focus on algebraic transformations based on reduction optimization, a very common transformation in High-Performance Computing (HPC) kernels. We will address this challenge by leveraging our expertise in program equivalence and automatic parallelization.
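
To see why reduction transformations need validation, recall that a parallel reduction (for instance with the OpenMP `reduction` clause) gives each thread a private accumulator and combines the partial results at the end. This regroups the operations, which preserves the result only when the operator is associative. The sketch below simulates that regrouping in plain Python and compares it with the sequential result on sample inputs. It is differential testing, not a formal translation validator, and the helper names (`sequential_reduce`, `chunked_reduce`, `agree`) are hypothetical.

```python
import operator


def sequential_reduce(op, identity, values):
    """Left-to-right reduction, as in the original sequential loop."""
    acc = identity
    for v in values:
        acc = op(acc, v)
    return acc


def chunked_reduce(op, identity, values, num_chunks):
    """Mimic a parallel reduction: one private accumulator per chunk,
    then a final combination of the partial results."""
    n = len(values)
    size = max(1, -(-n // num_chunks))  # ceiling division, at least 1
    partials = [
        sequential_reduce(op, identity, values[i:i + size])
        for i in range(0, n, size)
    ]
    return sequential_reduce(op, identity, partials)


def agree(op, identity, values, num_chunks):
    """True if the regrouped reduction matches the sequential one."""
    return (sequential_reduce(op, identity, values)
            == chunked_reduce(op, identity, values, num_chunks))


# Integer addition is associative: regrouping is safe.
ints_ok = agree(operator.add, 0, [1, 2, 3, 4], 2)

# Subtraction is not associative: regrouping changes the result.
sub_ok = agree(operator.sub, 0, [1, 2, 3, 4], 2)

# Floating-point addition is not associative either, because of rounding.
floats_ok = agree(operator.add, 0.0, [1e20, 1.0, -1e20, 1.0], 2)
```

Here `ints_ok` is `True` while `sub_ok` and `floats_ok` are `False`. Passing such tests on a few inputs proves nothing in general, which is why a validator has to reason about the algebraic properties of the operator rather than about sample runs.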
