SM207-1 Multimodal Effective Representation Learning of Evolution of birds - Multisensory fusion
|
|
|
|
SM207-2 Multimodal Effective Representation Learning of Evolution of birds - Representational learning
|
|
|
|
SM207-3 ACtive Multimodal mErging
|
|
|
|
SM207-4 Domain Adaptation and Transfer Learning for Predictive Maintenance
|
|
Description
|
|
In most predictive maintenance problems, data are collected from various production lines or assembly lines, or captured by different devices. In this context, the data are called multi-modal or multi-view. The heterogeneity of these data perturbs the learning phase of machine learning algorithms and prevents good generalization and accurate prediction. These disturbances can in turn lead to malfunctions in production lines or potential hazards. To address this problem, we have recently developed new multi-view approaches based on Deep Learning and tested them on case studies. As the results are encouraging, we aim to extend them by testing our algorithms on real data with particular specificities: they are uncertain, heterogeneous, noisy, and unstructured.
URL sujet detaillé : https://github.com/HenneGalile/Stage-Recherche-Domain-adaptation-et-transfert-learning-pour-la-maintenance-predictive/files/9684533/Descriptif.stage.pdf
Remarques : Contacts:
- Mehdi Hennequin, Galilé: m.hennequin.galile.fr
- Khalid Benabdeslem, LIRIS: khalid.benabdeslem-lyon1.fr
- Haytham Elghazel, LIRIS: haytham.elghazel-lyon1.fr
Remuneration: €850/month
|
|
|
|
|
SM207-5 Variational Contrastive Learning
|
|
Description
|
|
The aim of the internship is to study how the representations learned by recent contrastive models can benefit from a variational dimension (as in variational auto-encoders). The planned tasks are: - to propose an extension of contrastive methods so that representations are no longer points in a high-dimensional space, but rather distributions; - to study how to regularize the learned distributions as in a VAE. This will require making the distributions learned by a contrastive method (related to a hypersphere) compatible with those of a VAE (related to a multivariate Gaussian).
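The first task can be sketched numerically. The snippet below (a minimal NumPy sketch; the loss combination, function names, and the weight `beta` are assumptions, not the internship's prescribed method) replaces point representations by Gaussian distributions, samples them with the reparameterization trick, and adds a VAE-style KL regularizer to a standard InfoNCE contrastive loss:

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), as in a VAE
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def reparameterize(mu, logvar, rng):
    # sample z ~ N(mu, diag(exp(logvar))) via the reparameterization trick
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def info_nce(z1, z2, temperature=0.1):
    # standard InfoNCE between two batches of paired samples
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def variational_contrastive_loss(mu1, logvar1, mu2, logvar2, beta=0.01, rng=None):
    # hypothetical combination: contrastive term on samples + KL regularizer
    if rng is None:
        rng = np.random.default_rng(0)
    z1 = reparameterize(mu1, logvar1, rng)
    z2 = reparameterize(mu2, logvar2, rng)
    kl = np.mean(kl_to_standard_normal(mu1, logvar1)
                 + kl_to_standard_normal(mu2, logvar2))
    return info_nce(z1, z2) + beta * kl
```

In a real model, `mu` and `logvar` would be the two heads of an encoder network; here they are free inputs so the loss shape can be inspected in isolation.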
URL sujet detaillé : https://perso.liris.cnrs.fr/mathieu.lefort/jobs/stage/VoCaL/sujet.pdf
Remarques :
|
|
|
|
|
SM207-6 Improving Diagnosis Quality and Performances of a Formal Verification Tool for Electric Circuits at Transistor Level
|
|
Description
|
|
Aniah is a start-up that offers tools for analyzing integrated circuits at an industrial scale (https://www.aniah.fr/). Aniah has introduced algorithms that significantly push the boundaries of the size of analyzable circuits, from a few hundred thousand elements to several trillion. Aniah is collaborating with the Laboratoire de l'Informatique du Parallélisme (LIP) and the Verimag laboratory to consolidate and generalize its approach by supplementing its practical results with a theoretical backbone. One of the objectives of this study is to explore the applicability of state-of-the-art model-checking techniques to the problem of circuit electric verification.
We already have a prototype using the Z3 solver to exhibit a list of errors from a circuit. One area for improvement of the tool is the quality of the diagnosis: since the list of error states is infinite, giving an exhaustive list is impossible; instead, we need to provide one example for each equivalence class of error states, for a well-chosen equivalence relation.
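The "one example per equivalence class" idea can be illustrated in a few lines. This is a toy sketch: the error-state format and the chosen equivalence relation (same pair of conflicting nets, regardless of concrete voltages) are hypothetical stand-ins, not Aniah's actual representation:

```python
def representatives(error_states, class_key):
    """Return one witness per equivalence class induced by class_key."""
    witnesses = {}
    for state in error_states:
        # keep only the first state seen in each class as its witness
        witnesses.setdefault(class_key(state), state)
    return list(witnesses.values())

# Hypothetical relation: two errors are equivalent if they involve the same
# pair of conflicting nets, regardless of the concrete voltage values.
errors = [
    {"nets": ("vdd1", "vdd2"), "v": 1.2},
    {"nets": ("vdd1", "vdd2"), "v": 0.9},
    {"nets": ("vdd1", "gnd"), "v": 1.8},
]
reps = representatives(errors, class_key=lambda e: e["nets"])
```

The interesting part of the internship is of course choosing the equivalence relation so that the finite set of witnesses is genuinely informative; the quotient construction itself is this simple.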
URL sujet detaillé : http://www.ens-lyon.fr/LIP/CASH/wp-content/uploads/2022/10/M2-diagnostique.pdf
Remarques : Co-advising between LIP and Verimag, the internship can take place in either of the laboratories, depending on the candidate's preference.
|
|
|
|
|
SM207-7 Dedicated Solver for Formal Verification of Electric Circuits with Multiple Power Supplies
|
|
Admin
|
|
Encadrant : Matthieu MOY |
Labo/Organisme : Aniah (https://www.aniah.fr/), in collaboration with the Laboratoire de l'Informatique du Parallélisme (LIP) and the Verimag laboratory |
URL : http://www.ens-lyon.fr/LIP/CASH/ |
Ville : Lyon or Grenoble |
|
|
|
Description
|
|
Aniah is a start-up that offers tools for analyzing integrated circuits at an industrial scale (https://www.aniah.fr/). Aniah has introduced algorithms that significantly push the boundaries of the size of analyzable circuits, from a few hundred thousand elements to several trillion. Aniah is collaborating with the Laboratoire de l'Informatique du Parallélisme (LIP) and the Verimag laboratory to consolidate and generalize its approach by supplementing its practical results with a theoretical backbone. We have already started one post-doc and one PhD on the topic. One of the objectives of this study is to explore the applicability of state-of-the-art model-checking techniques to the problem of circuit electric verification.
We already have a prototype using the Z3 solver to exhibit a list of errors from a circuit. Z3 is a powerful SMT solver able to solve formulas using advanced theories for numerical values (integers, rationals, etc.). We actually rely on a very small subset of these theories (essentially, we manipulate totally ordered sets of values, without operators like addition or subtraction). We believe that a solver tailored to our needs could perform better than a generalist solver like Z3.
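To see why the fragment is so light, here is a minimal decision procedure for it, sketched under the assumption that constraints are conjunctions of equalities `x = y` and strict comparisons `x < y` over a totally ordered domain (this is an illustration of the fragment described above, not Aniah's solver): merge equalities with union-find, then check that no strict edge lies on a cycle.

```python
def satisfiable(eqs, lts):
    """eqs: list of (a, b) meaning a = b; lts: list of (a, b) meaning a < b."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in eqs:
        parent[find(a)] = find(b)
    # build the strict-order graph between equivalence classes
    graph = {}
    for a, b in lts:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # x < x is unsatisfiable
        graph.setdefault(ra, set()).add(rb)
    # a model exists iff the strict-order graph is acyclic (DFS with colors)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    def has_cycle(v):
        color[v] = GRAY
        for w in graph.get(v, ()):
            c = color.get(w, WHITE)
            if c == GRAY or (c == WHITE and has_cycle(w)):
                return True
        color[v] = BLACK
        return False
    return not any(color[v] == WHITE and has_cycle(v) for v in list(graph))
```

No arithmetic theory is involved: satisfiability reduces to cycle detection, which hints at why a dedicated solver could beat a generalist SMT solver on this fragment.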
URL sujet detaillé : http://www.ens-lyon.fr/LIP/CASH/wp-content/uploads/2022/10/M2-SMT-avec-T-minimaliste.pdf
Remarques : Co-advising between LIP and Verimag, the internship can take place in either of the laboratories, depending on the candidate's preference.
|
|
|
|
|
SM207-8 Compiler Intermediate Representation for Algebraic Data Types
|
|
Description
|
|
The goal of this internship is to provide a dedicated intermediate representation for the compilation of Algebraic Data Types. This intermediate representation will help compiler writers by providing language constructs to build and pattern-match Algebraic Data Types, together with the algorithms to compile such pattern matching down to lower-level representations, such as LLVM.
During the internship, the intern will gain working knowledge of algorithms to compile pattern matching of Algebraic Data Types, propose language constructs for ADTs in MLIR, the intermediate representation of choice, and implement them.
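As a rough intuition for what "compiling pattern matching" produces, here is a toy sketch (the value encoding as `(constructor_tag, args)` tuples and the names are hypothetical, and much simpler than real MLIR lowering): a match over an ADT compiles to a flat dispatch on the constructor tag, the kind of jump table a lower-level IR would emit.

```python
def compile_match(arms):
    """arms: list of (constructor_tag, handler taking the constructor's args).
    Returns a function performing the compiled tag dispatch."""
    table = dict(arms)  # the "decision tree" degenerates to a dispatch table
    def matcher(value):
        tag, args = value
        if tag not in table:
            raise ValueError(f"non-exhaustive match: missing case {tag!r}")
        return table[tag](*args)
    return matcher

# An option-like ADT: Some(x) | None
eval_option = compile_match([
    ("Some", lambda x: x + 1),
    ("None", lambda: 0),
])
```

Real compilation schemes must additionally handle nested patterns, guards, and exhaustiveness checking, which is where decision-tree and backtracking-automaton algorithms come in.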
URL sujet detaillé : http://www.ens-lyon.fr/LIP/CASH/wp-content/uploads/2022/10/main.pdf
Remarques :
|
|
|
|
|
SM207-9 Semantics and Implementation of Actors in Multicore OCaml
|
|
Description
|
|
The objective of the internship is to develop a library of distributed actors in multicore OCaml. This work includes both theoretical and implementation aspects. The internship should cover both, but can be oriented more towards theoretical contributions or towards implementation, depending on the student.
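The core abstraction can be sketched in a few lines (shown here in Python rather than OCaml, with a hypothetical minimal API): each actor owns a mailbox and a worker thread that processes messages one at a time, so its state never needs locks.

```python
import queue
import threading

class Actor:
    def __init__(self, behavior, initial_state):
        self.state = initial_state
        self._mailbox = queue.Queue()
        self._behavior = behavior
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # asynchronous send: enqueue and return immediately
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)  # sentinel: stop after pending messages
        self._thread.join()

    def _run(self):
        while True:
            msg = self._mailbox.get()
            if msg is None:
                return
            # messages are processed sequentially, so state is race-free
            self.state = self._behavior(self.state, msg)

counter = Actor(lambda state, msg: state + msg, 0)
for i in range(5):
    counter.send(i)
counter.stop()
```

The internship's multicore-OCaml version would replace the thread with domains/effects and add distribution, but the mailbox-per-actor discipline is the same.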
URL sujet detaillé : http://www.ens-lyon.fr/LIP/CASH/wp-content/uploads/2022/10/actors-en.pdf
Remarques : Co-supervised with Ludovic Henrio.
|
|
|
|
|
SM207-10 Types for complexity analysis in a process calculus
|
|
Description
|
|
Some type systems have been introduced in the literature to analyse the time complexity of functional programs. With such a system, given a type derivation for a program M, one can extract an upper bound on the (sequential) execution time of M on any input. A more recent challenge is complexity analysis for parallel or concurrent computation. The Pi-calculus is a formal calculus to study parallel and concurrent computation, just as the Lambda-calculus allows one to study functional computation. It represents processes communicating by messages sent through channels. The work in this internship will consist first in exploring which notions of complexity are relevant for Pi-calculus concurrent systems, and then in introducing a type system to analyse these complexities and proving its properties.
Keywords: type systems, process calculus, complexity analysis
URL sujet detaillé : https://www.cristal.univ-lille.fr/profil/pbaillot/sujet_complexityanalysisPiCalculus.pdf
Remarques :
|
|
|
|
|
SM207-11 Implicit Computational Complexity in Pi-calculus
|
|
Description
|
|
How can one define simple and modular programming disciplines in a high-level programming or specification language in such a way that the corresponding programs exactly characterize a certain complexity class of functions, for instance polynomial time (FP) or polynomial space (FPSPACE)? This is the goal of implicit computational complexity (ICC), which aims at providing such machine-independent characterizations for a large variety of complexity classes and source languages, without referring to any explicit bound on time or space usage. This research area uses ideas and techniques from logic, recursion theory, and type systems. Most ICC results have been obtained for sequential languages, initially for the Lambda-calculus and the functional computing paradigm, and then for imperative and object-oriented languages. Just as the Lambda-calculus represents sequential computation, process calculi such as the Pi-calculus have been introduced to represent parallel and concurrent computation. This language represents processes communicating by messages sent through channels. The goal of this internship is to give an ICC characterization of a complexity class in the Pi-calculus, building on ideas from recursion theory and type systems. A natural candidate is the complexity class FPSPACE.
Keywords: process calculus, complexity classes, implicit computational complexity, type systems, recursion
URL sujet detaillé : https://www.cristal.univ-lille.fr/profil/pbaillot/sujet_ImplicitComplexity.pdf
Remarques :
|
|
|
|
|
SM207-12 Rationalization of CAD assemblies (Computer-Aided Design, Optimization, Geometry Processing)
|
|
Description
|
|
Context: Many objects that surround us are created by assembling simple parts, and the cost of fabricating and repairing these objects highly depends on the availability of their constituent parts. Our goal is to assist designers in creating families of objects composed of the same parts.
Approach: We will formulate this problem as an optimization, where the input is a set of assemblies created using Computer-Aided Design (CAD), and the output is a new set that best satisfies two competing objectives: - each assembly should remain as similar as possible to its initial state; - all assemblies should share as many parts as possible. Solving this problem requires identifying similar parts across assemblies, and modifying the dimensions of these parts until they are identical. Importantly, these geometric modifications should maintain the functionality of the original assemblies, which induces complex dependencies between parts in each assembly.
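The trade-off between the two objectives can be illustrated on a drastically simplified version of the problem, where each "part" is just a 1-D length (this sketch and its greedy clustering are illustrative assumptions, not the project's intended method): snapping similar lengths to a shared value increases part reuse while bounding how much each part is modified.

```python
import numpy as np

def rationalize(lengths, tol=0.5):
    """Greedily cluster lengths that are within tol of their neighbor and
    replace each by its cluster mean, so parts become identical (shared)
    while each individual modification stays small."""
    vals = np.asarray(lengths, dtype=float)
    order = np.argsort(vals)
    out = np.empty_like(vals)
    start = 0
    for i in range(1, len(order) + 1):
        # close the current cluster when the gap exceeds tol (or at the end)
        if i == len(order) or vals[order[i]] - vals[order[i - 1]] > tol:
            idx = order[start:i]
            out[idx] = vals[idx].mean()
            start = i
    return out
```

The real problem adds the hard part: the geometric dependencies within each assembly, which turn this independent snapping into a constrained joint optimization.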
More details: http://www-sop.inria.fr/members/Adrien.Bousseau/stages/CADAssembly.pdf
URL sujet detaillé : http://www-sop.inria.fr/members/Adrien.Bousseau/stages/CADAssembly.pdf
Remarques : Work environment: The internship will take place at Inria Sophia Antipolis. Inria will provide a monthly stipend of around 1100 euros for EU citizens in their final year of masters, and 400 euros for other candidates.
Requirements: Candidates should have strong programming and mathematical skills as well as knowledge in computer graphics, geometry processing and optimization.
|
|
|
|
|
SM207-13 Semantics and types for synchronous programming with state machines in a multi-periodic setting
|
|
Description
|
|
Synchronous programming languages such as LUSTRE or SIGNAL have been introduced to provide a high level of abstraction for programming real-time systems. They are based on solid, elegant, and yet simple mathematical foundations, which make it possible to handle the compilation and verification of a program in a formal way. However, there is a need for more expressive languages. First, complex embedded systems, such as aircraft flight control systems, are generally multi-periodic, because the different devices of the system have different physical characteristics. The PRELUDE language [FBLP10] has been designed precisely to program such systems by handling various real-time constraints explicitly (e.g. periodicity, deadlines). A second characteristic of control systems is that they are often multi-mode, where each mode implements a different behaviour [FF22, For22]. An intuitive way of thinking about multi-mode systems is a state machine representation, where each state corresponds to a mode. However, defining the semantics of such a system is not straightforward, and various choices are possible. In this internship we propose to explore a general semantics for a multi-periodic and multi-mode synchronous language and to investigate the corresponding static analysis, in particular a dedicated clock type system.
References
[FBLP10] Julien Forget, Frédéric Boniol, David Lesens, and Claire Pagetti. A real-time architecture design language for multi-rate embedded control systems. Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), pages 527-534. ACM, 2010.
[FF22] Frédéric Fort and Julien Forget. Synchronous semantics of multi-mode multi-periodic systems. SAC '22, pages 1248-1257. ACM, 2022.
[For22] Frédéric Fort. Programming adaptative real-time systems. PhD thesis, Université de Lille, 2022.
URL sujet detaillé : https://www.cristal.univ-lille.fr/profil/pbaillot/sujetPrelude2023.pdf
Remarques : Co-supervised with Julien Forget (CRIStAL, EPC SyCoMoRES) and Sylvain Salvati (CRIStAL, EPC LINKS).
Standard internship stipend.
|
|
|
|
|
SM207-14 Types for sensitivity analysis and differential privacy in functional programming
|
|
Description
|
|
Program sensitivity bounds the distance between the outputs of a program when run on two related inputs. This notion plays an important role in differential privacy, a rigorous approach to ensuring privacy in database queries and data analysis computations. Among the programming-language approaches to differential privacy is the Fuzz language [RP10], whose types are inspired by linear logic. In Fuzz, each type is equipped with its own notion of distance, and sensitivity analysis is carried out by type checking. The language is also equipped with a monadic type for probabilistic computation. This leads to theorems stating that if a program is well-typed in this system, then it is differentially private. In [jwdABG22], an extension of Fuzz called Bunched Fuzz was proposed, with a richer type system that accounts for arbitrary Lp distances. In this internship we propose to study further the properties of Bunched Fuzz, extend it with other distances on probability distributions, and/or compare it to other frameworks for reasoning about differential privacy.
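The sensitivity property that Fuzz-style types certify statically can be probed numerically. The sketch below (an illustrative testing harness, not part of Fuzz; function names and the random-pair scheme are assumptions) estimates the worst observed ratio d(f(x), f(y)) / d(x, y) over random input pairs, using the L1 metric on inputs; a c-sensitive function must keep this ratio below c.

```python
import numpy as np

def empirical_sensitivity(f, dim, trials=1000, seed=0):
    """Estimate sup d(f(x), f(y)) / d(x, y) over random nearby input pairs,
    with the L1 metric on inputs and absolute value on scalar outputs."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(dim)
        y = x + 0.1 * rng.standard_normal(dim)  # a related (nearby) input
        dxy = np.abs(x - y).sum()
        dfxy = abs(f(x) - f(y))
        if dxy > 0:
            worst = max(worst, dfxy / dxy)
    return worst

# A sum query is 1-sensitive for the L1 metric; scaling by 3 makes it
# 3-sensitive, which is exactly what a Fuzz-style type would record.
```

Random testing can only falsify a sensitivity bound, never prove it; the point of the type system is precisely to establish such bounds for all inputs.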
References:
[jwdABG22] june wunder, Arthur Azevedo de Amorim, Patrick Baillot, and Marco Gaboardi. Bunched Fuzz: Sensitivity for vector metrics. CoRR, abs/2202.01901, 2022.
[RP10] Jason Reed and Benjamin C. Pierce. Distance makes the types grow stronger: a calculus for differential privacy. In ICFP 2010. ACM, 2010.
URL sujet detaillé : https://www.cristal.univ-lille.fr/profil/pbaillot/sujetTypesDP.pdf
Remarques : Standard internship stipend.
|
|
|
|
|
SM207-15 Algorithmics of Isogenies of Abelian Varieties and Post-Quantum Cryptography
|
|
Description
|
|
Isogenies of elliptic curves have many applications in post-quantum cryptography. In this internship, we propose to work on expanding the algorithmic toolbox for isogenies of abelian varieties - which can be regarded as generalizations of elliptic curves. This approach is motivated by the objective of designing new isogeny-based cryptosystems.
URL sujet detaillé : https://members.loria.fr/PJSpaenlehauer/data/stage_isogenies.pdf
Remarques : Stipend provided.
|
|
|
|
|
SM207-16 Implicit Neural Representation for nondestructive imaging
|
|
Description
|
|
Implicit Neural Representations are powerful techniques to reconstruct shapes or render scenes from different viewpoints. They use a single neural network to represent a function over the ambient space; this function serves to render the scene or to extract the final shape. While these methods initially suffered from high computation times, fast variants have been developed that allow training these representations in a few seconds.
In this internship, we will explore how implicit neural representations can be used for nondestructive imaging, such as medical or archaeological imaging. The goal is to reconstruct relevant organs or archaeological artifacts from a sparse set of projections of an acquired volume. To do so, we will develop dedicated regularization techniques to handle data sparsity and possible measurement noise. As a secondary goal, we will explore how these representations can help with the later segmentation of the acquired objects, for example into different parts.
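To make the "coordinates in, intensities out, fit from sparse samples" idea concrete, here is a deliberately tiny 1-D stand-in for an implicit representation (all choices here are illustrative assumptions: random Fourier features replace the neural network, and ridge/Tikhonov regularization stands in for the dedicated regularizers the internship would develop):

```python
import numpy as np

def fit_inr(coords, values, n_features=64, scale=3.0, ridge=1e-3, seed=0):
    """Fit a continuous field x -> intensity from sparse (coords, values)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((1, n_features)) * scale  # random frequencies

    def features(x):
        proj = np.asarray(x)[:, None] * w
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

    phi = features(coords)
    # ridge term regularizes the fit against sparsity and measurement noise
    A = phi.T @ phi + ridge * np.eye(phi.shape[1])
    theta = np.linalg.solve(A, phi.T @ values)
    return lambda x: features(x) @ theta

xs = np.linspace(0, 1, 20)            # sparse samples of a smooth "volume"
field = fit_inr(xs, np.sin(2 * np.pi * xs))
```

The fitted `field` can then be queried at any coordinate, which is the property that makes implicit representations attractive for reconstructing volumes from sparse projections.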
URL sujet detaillé : https://perso.liris.cnrs.fr/julie.digne/stages/inr_nondestructive.pdf
Remarques : co-advised with Nicolas Bonneel. (funding available)
|
|
|
|
|
SM207-17 Generative Models for non Euclidean Data
|
|
Description
|
|
Synthesizing data is a major challenge of today's computer vision and computer graphics research. From the initial VAEs to GANs to diffusion generative models, image synthesis has reached a high level of realism, with globally coherent images and realistic high-frequency details. Shape synthesis, however, has not reached the same maturity. Several reasons explain this discrepancy, the most important one being the fact that shapes are non-Euclidean data, for which defining rotation-equivariant operations is an open problem.
In this internship, we will focus on a single issue: re-introducing topological knowledge into shape synthesis. The idea is to first synthesize a shape of the desired topology, and then deform it into a detailed geometric shape. This point of view will allow for topology-consistent shape interpolation. The internship will start with an extensive review of recent shape synthesis methods and of how shape topology can be introduced in these models.
URL sujet detaillé : https://perso.liris.cnrs.fr/julie.digne/stages/inr_topo.pdf
Remarques : (funding available)
|
|
|
|
|
SM207-18 Static Analysis under a Given Time Budget
|
|
Description
|
|
Static analyses by abstract interpretation are often guaranteed to terminate in finite time. However, it would be beneficial to be able to specify resource constraints (time, memory) that a static analyzer should respect, even if the precision of the analysis has to be reduced. An immediate use case is the software verification competition [3], which limits each verification task to 15 minutes of CPU time and 8 GB of RAM. We are submitting our static analyzer Mopsa [4] to this competition this year. We currently aim at running Mopsa through increasingly expensive and precise analyses, until (a) the program is proved correct or (b) the resources are exhausted. The goal of this internship is to develop techniques to estimate how long an analysis is going to take on a given program. A starting point could be to develop an offline method, acting as a pre-analysis, that estimates the complexity of the analysis of a program, possibly as a symbolic formula (Ballabriga et al. [2]).
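The escalation loop described above can be sketched as follows (a minimal illustration; the analysis functions, their names, and the three-valued result convention are hypothetical stand-ins, not Mopsa's API):

```python
import time

def analyze_with_budget(analyses, program, budget_seconds):
    """analyses: list of (name, fn) ordered from cheapest to most precise;
    each fn returns True (proved), False (alarm), or None (inconclusive)."""
    deadline = time.monotonic() + budget_seconds
    last = ("none", None)
    for name, fn in analyses:
        if time.monotonic() >= deadline:
            break  # budget exhausted: report the best result so far
        last = (name, fn(program))
        if last[1] is True:
            break  # program proved correct, no need to escalate further
    return last

cheap = lambda p: None    # imprecise analysis: cannot conclude
precise = lambda p: True  # expensive analysis: proves the program
result = analyze_with_budget([("intervals", cheap), ("polyhedra", precise)],
                             program="p", budget_seconds=1.0)
```

The internship's question is exactly what this sketch glosses over: predicting, before launching `fn`, whether it fits in the remaining budget, rather than only checking the clock between analyses.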
URL sujet detaillé : https://rmonat.fr/proposals/M2R_2022_time_budget.pdf
Remarques : Please contact me by email if you would like to have more information.
|
|
|
|
|
SM207-19 Incremental Static Analysis
|
|
Description
|
|
Traditionally, automatic program analyses do not reuse results they have previously established, although program verification is theoretically simpler than program analysis [3]. A program (or a slightly patched version of it) may be analyzed multiple times, for example when it is validated through a continuous integration pipeline. The goal of this internship is to explore the reuse of previous results on the same program, where different analyses (with different precisions) may be used. A starting point could be a theoretical study of this approach on loops, with an experimental evaluation within the Mopsa static analysis platform [4] if time permits.
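In its simplest form, result reuse is memoization keyed by source text. The sketch below (all names hypothetical; real incremental analysis must additionally track inter-procedural dependencies) re-runs the underlying analysis only on functions whose text changed between two runs:

```python
import hashlib

class IncrementalAnalyzer:
    def __init__(self, analyze_fn):
        self._analyze = analyze_fn
        self._cache = {}  # source hash -> previously computed result
        self.runs = 0     # how many times the underlying analysis ran

    def analyze(self, functions):
        """functions: dict name -> source text; returns dict name -> result."""
        results = {}
        for name, source in functions.items():
            key = hashlib.sha256(source.encode()).hexdigest()
            if key not in self._cache:
                self._cache[key] = self._analyze(source)
                self.runs += 1
            results[name] = self._cache[key]
        return results

analyzer = IncrementalAnalyzer(lambda src: len(src))  # dummy "analysis"
analyzer.analyze({"f": "x = 1", "g": "y = 2"})
analyzer.analyze({"f": "x = 1", "g": "y = 3"})        # only g changed
```

The interesting research question is soundness when the cached result came from an analysis with a different precision, which simple hashing cannot capture.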
URL sujet detaillé : https://rmonat.fr/proposals/M2R_2022_incremental.pdf
Remarques : Please contact me by email if you would like to have more information.
|
|
|
|
|
SM207-24 Design and Implementation of the Analysis Interface for a Web Data Lake
|
|
Description
|
|
--------------- Context: --------------- The LIFRANUM project (LIttératures FRAncophones NUMériques), led by the MARGE laboratory, aims to identify, index, and analyze natively digital literary productions in the French-speaking world. To this end, Heritrix, the reference web-archiving tool, was first used to build a corpus of preservation files in the Web ARChive (WARC) format. Metadata of the HTML pages were then extracted from the WARC files (textual content, PDF files, images, videos, etc.) and indexed in Solr. In addition, relying on blogs belonging to identified authors, a second corpus (JSON files) was built via the Wordpress and Blogger APIs. Information, pages, posts, and comments were extracted and constitute the metadata series, stored and indexed in MongoDB.
--------------- Subject and missions: --------------- The goal of this internship is to take over and improve a web interface (Dash-style technology in Python) common to the two types of metadata sources (those from the WARC files and those from the blog APIs), allowing the researchers of the MARGE laboratory to query and analyze the underlying data. This will require: - studying the existing data architecture; - designing an alignment schema for the metadata of the WARC files (web archiving data) and of the APIs; - taking over and improving the existing graphical interface for querying the data, and adding functionalities (search on links, on characteristics of the text, ...); - proposing ready-made (but configurable) or ad-hoc visualizations, in collaboration with the researchers of the MARGE laboratory (visualization of graphs, of key elements of the text, ...).
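The alignment step between the two metadata sources could be sketched as a mapping onto one common schema (all field names below are hypothetical placeholders; the actual Solr and blog-API fields would come from the existing data architecture):

```python
COMMON_SCHEMA = {
    # common field : (WARC/Solr field,  blog-API/JSON field)
    "url":          ("warc_target_uri", "link"),
    "title":        ("html_title",      "title"),
    "text":         ("extracted_text",  "content"),
    "published":    ("crawl_date",      "date"),
}

def align(record, source):
    """Normalize one record from 'warc' or 'api' to the common schema,
    so a single query interface can serve both collections."""
    col = 0 if source == "warc" else 1
    return {common: record.get(fields[col])
            for common, fields in COMMON_SCHEMA.items()}

doc = align({"link": "https://example.org", "title": "Billet",
             "content": "texte"}, source="api")
```

A declarative table like this keeps the alignment auditable by the MARGE researchers, who can extend it without touching the query code.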
--------------- Expected skills: ------- - Big data technologies - Web programming (Python) - Data visualization
--------------- How to apply: --------------- Please send your application to julien.velcin-lyon2.fr before November 17: - CV - cover letter - latest transcripts
URL sujet detaillé :
Remarques :
|
|
|
|
|
SM207-26 Computability and Complexity Theory for Models of Very Deep Learning
|
|
Description
|
|
General context
Without contest, models and approaches from deep learning have revolutionized machine learning. It is well known that when the number of layers increases (so-called very deep models, sometimes with more than 100 or 1000 layers), the models become very hard to train. Among a plethora of options that have been considered, Residual Neural Networks (ResNets) [8] have very clearly emerged as an important subclass of models. They mitigate the gradient issues [1] arising when training deep neural networks. The idea in these particular models is to add skip connections between successive layers, an idea partially bio-inspired. Since a residual neural network won the ImageNet 2015 competition, this particular architecture has become the most cited neural network of the 21st century according to some studies (see references on Wikipedia). To this date, winners of this competition are variations of such models. Some authors, such as [14], proved that there is a mathematical explanation for their performance in practice, as the discrete-time process used in these models can be shown to be the Euler discretization of some continuous-time Ordinary Differential Equation (ODE). The observed robustness and training properties then come from the well-known robustness of ODEs with respect to perturbations, including perturbations of their initial conditions. It was later realized and proved mathematically that various efficient models are actually nothing but reformulations of discretization schemes for ODEs. For example, following [12], the architecture of PolyNet [15] can be viewed as an approximation of the backward Euler scheme solving the ODE u_t = f(u). FractalNet [11] can be read as a well-known Runge-Kutta scheme from numerical analysis. RevNet [6] can be interpreted as a simple forward Euler approximation of a simple continuous dynamical system. All these models are very deep models, but this remains true for simpler models.
For example, following [10], it transpires that the key features of the well-known GRU [5] or LSTM [9], over generic recurrent networks, are update rules that look suspiciously like discretized differential equations. This led to models such as Neural ODEs [4], which can be seen as continuous versions of ResNets. While Neural ODEs do not necessarily improve upon the sheer predictive performance of ResNets, they allow the vast knowledge of ODE theory to be applied to deep learning research. For instance, the authors of [7] discovered that Neural ODEs are more robust to specific perturbations than convolutional neural networks. Moreover, inspired by the theoretical properties of the solution curves, they proposed a regularizer that improved the robustness of Neural ODE models even further. We do not intend to be exhaustive on the various applications of this new point of view on deep learning models.
Description of the work
We are experts in computability and complexity issues related to continuous-time models of computation, and in particular models based on ordinary differential equations. In particular, we know how to program with ordinary differential equations, and how to measure complexity for such models: see e.g. [3, 2, 13] for surveys. We have used this knowledge in various contexts to solve open problems in bioinformatics, applied mathematics, and other fields. We propose here to develop this approach for the above models of very deep learning. The purpose of the internship is to discuss complexity and computability issues for models of very deep learning. While most approaches in deep learning try to learn models without a clear understanding of what is feasible and what is not, the fact that we can build, on purpose, particular ordinary differential equations solving a given problem does provide lower and upper bounds on the hardness of the learning process. The objective will be to develop such results and provide the basis for a theory of models of very deep learning. Note that it is precisely because these very deep models are so close to models based on ordinary differential equations that this analysis is feasible, whereas complexity theory is not well adapted to discussing classical (not very deep) deep learning models.
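The ResNet/ODE correspondence invoked above is simple enough to state in code: a stack of residual blocks computes x_{k+1} = x_k + h * f(x_k), which is exactly the forward Euler scheme for x' = f(x). The sketch below (a minimal numerical illustration, with a shared residual function f as a simplifying assumption) shows the deep residual iteration converging to the ODE solution.

```python
import numpy as np

def resnet_forward(x0, f, depth, h):
    # a stack of residual blocks sharing the residual function f:
    # each block computes x <- x + h * f(x)
    x = x0
    for _ in range(depth):
        x = x + h * f(x)
    return x

def euler_solve(x0, f, t_end, steps):
    # forward Euler integration of x' = f(x) on [0, t_end]:
    # literally the same iteration, read as a numerical scheme
    return resnet_forward(x0, f, steps, t_end / steps)

# x' = -x has exact solution x(t) = x0 * exp(-t); a deep residual
# iteration with small step size converges to it.
approx = euler_solve(np.array([1.0]), lambda x: -x, t_end=1.0, steps=1000)
```

Here "depth" and "integration steps" are the same number, which is the observation that lets ODE theory (and, in this internship, continuous-time complexity theory) be brought to bear on very deep models.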
References
[1] David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, and Brian McWilliams. The shattered gradients problem: If resnets are the answer, then what is the question? In International Conference on Machine Learning, pages 342-350. PMLR, 2017.
[2] Olivier Bournez and Manuel L. Campagnolo. New Computational Paradigms. Changing Conceptions of What is Computable, chapter A Survey on Continuous Time Computations, pages 383-423. Springer-Verlag, New York, 2008.
[3] Olivier Bournez and Amaury Pouly. A survey on analog models of computation. In Handbook of Computability and Complexity in Analysis, pages 173-226. Springer, 2021.
[4] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571-6583, 2018.
[5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] Aidan N Gomez, Mengye Ren, Raquel Urtasun, and Roger B Grosse. The reversible residual network: Backpropagation without storing activations. Advances in Neural Information Processing Systems, 30, 2017.
[7] Hanshu Yan, Jiawei Du, Vincent Tan, and Jiashi Feng. On robustness of neural ordinary differential equations. In International Conference on Learning Representations, 2019.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
[9] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
[10] Patrick Kidger. On neural differential equations. arXiv preprint arXiv:2202.02435, 2022.
[11] Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. ICLR, 2016.
[12] Yiping Lu, Aoxiao Zhong, Quanzheng Li, and Bin Dong. Beyond finite layer neural networks: Bridging deep architectures and numerical differential equations. In International Conference on Machine Learning, pages 3276-3285. PMLR, 2018.
[13] Pekka Orponen. A survey of continuous-time computation theory. In D.-Z. Du and Ker-I Ko, editors, Advances in Algorithms, Languages, and Complexity, pages 209-224. Kluwer Academic Publishers, 1997.
[14] E Weinan. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 1(5):1-11, 2017.
[15] Xingcheng Zhang, Zhizhong Li, Chen Change Loy, and Dahua Lin. PolyNet: A pursuit of structural diversity in very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 718-726, 2017.
URL sujet detaillé : http://www.lix.polytechnique.fr/~bournez/load/ENS-LYON/sujet-2023-deep-learning.pdf
Remarques : The actual topic of the work is related to computability and complexity theory. It requires only common and basic knowledge of ordinary differential equations. Most of the intuitions behind today's constructions come from classical computability and complexity.
There is no specific prerequisite for this internship, except some knowledge of computability theory. This subject can be extended to a PhD; funding possibilities depend on the administrative situation of the candidate.
The subject can also be adapted according to requests, knowledge and skills of candidates. Please contact me if interested or in case of questions.
|
|
|
|
|
SM207-27 Programming with Ordinary Differential Equations and Continuous Time
|
|
Description
|
|
General Introduction
It has been understood quite recently that it is possible to program with Ordinary Differential Equations (ODEs). This was obtained as a side effect of attempts to relate the computational power of analog computational models to classical computability. The former work over continuous data (for example, voltages or concentrations) with continuous time (e.g., according to a system of ordinary differential equations corresponding to an analog electronic circuit, or to reaction kinetics), unlike the latter, such as Turing machines, which work on discrete entities such as words, with discrete time. Refer to [6, 3, 14] for surveys on analog computation from a computation-theoretic point of view, relating these models to discrete models, or to [17, 16, 13] for surveys discussing historical and technological aspects of analog models of computation. Analog models of computation include in particular the differential analyzers.
This was obtained by realizing that continuous-time processes defined by ODEs, and even by polynomial ODEs, can simulate various discrete-time processes. They can hence be used to simulate models such as Turing machines [7, 11], and even some more exotic models working with discrete time but over continuous data. This is based on various refinements of constructions done in [7, 11, 12, 15, 9]. We call this ODE programming, as it is indeed a kind of programming with various continuous constructions.
Setting aside analog machines and models of computation, it is important to realize that ODEs are a kind of universal language of mathematics, used in many, if not all, experimental sciences: physics, biology, chemistry, and so on. Consequently, once it is known that one can program with ODEs, many questions about universality or computation in experimental contexts can be solved.
This is exactly what has been done by several authors, including ourselves, to solve various open problems in contexts such as applied mathematics, computer algebra, and biocomputing. Applications include: characterization of computability and complexity classes using ODEs [12, 11, 15, 9]; proof of the existence of a universal (in the sense of Rubel) ODE [5]; proof of the strong Turing completeness of biochemical reactions [8]; and, more generally, various statements about the completeness of reachability problems for ODEs (e.g., PTIME-completeness of bounded reachability) [4].
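As a small illustration of what "programming with ODEs" means, the polynomial system y1' = -y2, y2' = y1 with y(0) = (1, 0) "computes" (cos t, sin t). The sketch below is purely illustrative and ours (the theory deals with exact continuous-time solutions, while here the ODE program is run with a hand-written RK4 integrator):

```python
import math

def rk4_step(f, y, h):
    """One classical Runge-Kutta step for the autonomous system y' = f(y)."""
    k1 = f(y)
    k2 = f([yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# A polynomial ODE "program": right-hand sides are polynomials in the state.
# y1' = -y2, y2' = y1, y(0) = (1, 0)  computes  (cos t, sin t).
def f(y):
    return [-y[1], y[0]]

y, t, h = [1.0, 0.0], 0.0, 1e-3
while t < 1.0:          # integrate the program up to time t = 1
    y = rk4_step(f, y, h)
    t += h
```

After the loop, `y` is numerically close to `(cos 1, sin 1)`: the ODE's exact solution is the "output" of the program.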
Description of the work
However, when it comes to saying more about the proofs, their authors, including ourselves, often feel frustrated: currently, the proofs are mostly based on technical lemmas and constructions done with ODEs, often mixing the ideas behind these constructions with numerical-analysis considerations about errors and error propagation in the equations. We believe this is one factor hampering a more widespread use of this technology in other contexts. The articles [2, 1] were born from an attempt to popularize this ODE programming technology to a more general public, and in particular to master's and even undergraduate students. We show how some constructions can be reformulated using notations that can be seen as a pseudo programming language. This provides a way to explain, in an easier and more modular way, the main intuitions behind some of the constructions, focusing on the algorithm-design part. As an example, we focused on how the proof of the universality of polynomial ODEs (a result due to [10], and fully developed in [11]) can be reformulated and presented. The purpose of the internship will be to develop this approach, in order to reformulate or improve some of the constructions. Possible new results include the characterization of complexity classes with ordinary differential equations, or applications in experimental sciences.
References
[1] Olivier Bournez. Informatique Mathématique: Une photographie en 2022. Cours donnés à l'École Jeunes Chercheurs en Informatique Mathématique, chapter Le Calcul Analogique.
[2] Olivier Bournez. Programming with ordinary differential equations: Some first steps towards a programming language. In Ulrich Berger, Johanna N. Y. Franklin, Florin Manea, and Arno Pauly, editors, Revolutions and Revelations in Computability - 18th Conference on Computability in Europe, CiE 2022, Swansea, UK, July 11-15, 2022, Proceedings, volume 13359 of Lecture Notes in Computer Science, pages 39-51. Springer, 2022.
[3] Olivier Bournez and Manuel L. Campagnolo. New Computational Paradigms: Changing Conceptions of What is Computable, chapter A Survey on Continuous Time Computations, pages 383-423. Springer-Verlag, New York, 2008.
[4] Olivier Bournez, Daniel S. Graça, and Amaury Pouly. Polynomial time corresponds to solutions of polynomial ordinary differential equations of polynomial length. Journal of the ACM, 64(6):38:1-38:76, 2017.
[5] Olivier Bournez and Amaury Pouly. A universal ordinary differential equation. Logical Methods in Computer Science, 16(1), 2020.
[6] Olivier Bournez and Amaury Pouly. A survey on analog models of computation. In Handbook of Computability and Complexity in Analysis, pages 173-226. Springer, 2021.
[7] M. S. Branicky. Universal computation and other capabilities of hybrid and continuous dynamical systems. Theoretical Computer Science, 138(1):67-100, 6 February 1995.
[8] François Fages, Guillaume Le Guludec, Olivier Bournez, and Amaury Pouly. Strong Turing completeness of continuous chemical reaction networks and compilation of mixed analog-digital programs. In Computational Methods in Systems Biology - CMSB 2017, 2017.
[9] Riccardo Gozzi. Analog Characterization of Complexity Classes. PhD thesis, Instituto Superior Técnico, Lisbon, Portugal and University of Algarve, Faro, Portugal, 2022.
[10] D. S. Graça, M. L. Campagnolo, and J. Buescu. Computability with polynomial differential equations. Adv. Appl. Math., 40(3):330-349, 2008.
[11] Daniel S. Graça. Computability with Polynomial Differential Equations. PhD thesis, Instituto Superior Técnico, 2007.
URL sujet detaillé : http://www.lix.polytechnique.fr/~bournez/load/ENS-LYON/sujet-2023-prog-ode.pdf
Remarques : The actual topic of the work is related to computability theory. It requires only common and basic knowledge of ordinary differential equations. Most of the intuitions behind today's constructions come from classical computability. There is no specific prerequisite for this internship, except some knowledge of computability theory. This subject can be extended to a PhD; funding possibilities depend on the administrative situation of the candidate.
The subject can also be adapted according to requests, knowledge and skills of candidates. Please contact me if interested or in case of questions.
|
|
|
|
|
SM207-28 Surface deformation and modeling
|
|
Description
|
|
Computer-aided design of 3D objects is becoming more and more accessible to the general public thanks to sophisticated modeling software. Surface deformation tools that allow sculpting a model directly in 3D have long been used by animation studios, through software accessible to all (ZBrush, Blender 3D). In this internship, we propose to create new modeling methods acting directly on geometric quantities (curvature, distances between points).
URL sujet detaillé : https://homepages.loria.fr/ECorman/Stages/Curvature.pdf
Remarques :
|
|
|
|
|
SM207-29 Mesh generation for numerical simulation
|
|
Description
|
|
Numerical simulations have become an essential step in mechanical design: they avoid the need to build prototypes in the preliminary stages of development. To simulate physical phenomena on a computer, it is first necessary to create a discrete representation of the geometry. To do this, the volume in which the simulation takes place is divided into a multitude of elementary domains on which the discrete physical quantities (velocity, acceleration, pressure, ...) are stored. This partition is crucial to obtain physically correct simulations. The most common elements are triangles or tetrahedra. However, partitions into quadrangle or hexahedron elements (deformed cubes) are particularly sought after for their regularity and convergence properties. They are unfortunately difficult to generate automatically.
URL sujet detaillé : https://homepages.loria.fr/ECorman/Stages/ff3d.pdf
Remarques :
|
|
|
|
|
SM207-30 Deformation transfer between 3D surfaces
|
|
Description
|
|
Computer-aided design of 3D objects is becoming more and more accessible to the general public thanks to complex modeling tools automating many tasks. Surface deformation tools that allow sculpting a model directly in 3D have long been used by animation studios, through software accessible to everyone (ZBrush, Blender 3D). During this internship, we will be interested in the problem of transferring deformations: given an animation of a character, how can its movements be transferred to another character? To do this, we must find correspondence points between the two surfaces and then apply a joint deformation.
URL sujet detaillé : https://homepages.loria.fr/ECorman/Stages/quat_map.pdf
Remarques :
|
|
|
|
|
SM207-31 π-calculus, internal mobility and behavioural properties
|
|
Description
|
|
The pi-calculus is a formalism to reason about concurrent systems exchanging messages on channels. The internship focuses on a subcalculus of the pi-calculus, and its usages to analyse the encoding of programming language features in the pi-calculus.
Keywords: π-calculus, λ-calculus, coinduction, bisimulation, type system.
URL sujet detaillé : https://perso.ens-lyon.fr/daniel.hirschkoff/dh-internal.pdf
Remarques :
|
|
|
|
|
SM207-32 Finding Hard Instances is Hard
|
|
Description
|
|
Most combinatorial optimization problems are NP-hard, meaning that there exist families of instances that require exponential time to solve exactly (unless P = NP). However, in practice, a large part of the instances can be solved efficiently. It is also known that random instances of a problem are often easy to solve (see also the phase transition phenomenon).
The goal of this project is to generate a family of hard (feasible) instances, without specific knowledge of the underlying problem (without expert rules or specific reductions), and understand why these instances are hard.
Hard instances are of importance to compare different algorithms solving the same problem or to improve the performances of an algorithm solving a specific problem (and understand where it struggles) or to disprove some graph conjectures.
Our approach will be to move these discrete problems to the continuous space and try different approaches, such as classical supervised Machine Learning techniques (to predict how to construct instances) or Reinforcement Learning (Deep Q-learning, curiosity-driven learning...). We can also think of starting from already known hard instances and trying to increase their hardness by incremental changes (hill climbing) or Monte Carlo Search. We will have to test different measures of hardness. This could be the time taken by an algorithm (but this is very dependent on a given solver and may not say much about the intrinsic difficulty of an instance), the depth of a search tree in a solver, or the number of backtracks needed by a solver. A challenging issue will be to gather enough data, since the instances should be hard to solve (and therefore require time to compute).
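To make the hardness-search idea concrete, here is a small sketch (our illustration, not the project's actual method): random 3-SAT instances near the clause/variable phase-transition ratio, hardness measured as the number of backtracks of a plain backtracking solver, and hill climbing that mutates one clause at a time:

```python
import random

def solve(clauses, n_vars, assign, stats):
    """Plain backtracking SAT solver; stats['backtracks'] is the hardness proxy."""
    sat_all = True
    for clause in clauses:
        satisfied, undecided = False, False
        for lit in clause:
            v = assign.get(abs(lit))
            if v is None:
                undecided = True
            elif v == (lit > 0):
                satisfied = True
                break
        if not satisfied:
            if not undecided:
                return False                 # clause falsified: conflict
            sat_all = False
    if sat_all:
        return True
    var = next(v for v in range(1, n_vars + 1) if v not in assign)
    for value in (True, False):
        assign[var] = value
        if solve(clauses, n_vars, assign, stats):
            return True
        stats['backtracks'] += 1
    del assign[var]
    return False

def random_clause(n_vars, rng):
    return [v if rng.random() < 0.5 else -v
            for v in rng.sample(range(1, n_vars + 1), 3)]

def hardness(clauses, n_vars):
    stats = {'backtracks': 0}
    solve(clauses, n_vars, {}, stats)
    return stats['backtracks']

# Hill climbing: replace one random clause, keep the mutant if at least as hard.
# The clause/variable ratio 34/8 sits near the ~4.26 3-SAT phase transition.
rng = random.Random(0)
n_vars, n_clauses = 8, 34
inst = [random_clause(n_vars, rng) for _ in range(n_clauses)]
h0 = best = hardness(inst, n_vars)
for _ in range(60):
    cand = list(inst)
    cand[rng.randrange(n_clauses)] = random_clause(n_vars, rng)
    h = hardness(cand, n_vars)
    if h >= best:
        inst, best = cand, h
```

In the real project, "hardness" would instead come from an industrial-strength solver, and feasibility (satisfiability) would be enforced as a constraint on the search.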
URL sujet detaillé : https://www.lamsade.dauphine.fr/~sikora/stage/2022-Finding_Hard_Instances_is_Hard.pdf
Remarques : Co-supervised with Florian Yger and Benjamin Negrevergne
|
|
|
|
|
SM207-33 Mixed precision preconditioned iterative methods for large sparse linear systems---with industrial applications
|
|
Description
|
|
The goal of the internship is to design new variants of preconditioned iterative methods that employ multiple levels of floating-point precisions to accelerate the solution of large, sparse linear systems. These methods will be tested with industrial applications coming from our partner IFPEN.
Detailed description (in French) is attached.
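The core mixed-precision idea can be sketched on the simpler, related setting of iterative refinement (a hedged illustration only, not the internship's target preconditioned Krylov methods): factorize and solve cheaply in low precision, but accumulate residuals in full double precision. Float32 is simulated below via `struct` round-trips:

```python
import struct

def f32(x):
    """Round a binary64 float to binary32: our simulated low precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def solve_f32(A, b):
    """Gaussian elimination with every operation rounded to float32
    (no pivoting; fine for this diagonally dominant demo system)."""
    n = len(A)
    A = [[f32(v) for v in row] for row in A]
    b = [f32(v) for v in b]
    for k in range(n):
        for i in range(k + 1, n):
            m = f32(A[i][k] / A[k][k])
            for j in range(k, n):
                A[i][j] = f32(A[i][j] - f32(m * A[k][j]))
            b[i] = f32(b[i] - f32(m * b[k]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = b[i]
        for j in range(i + 1, n):
            s = f32(s - f32(A[i][j] * x[j]))
        x[i] = f32(s / A[i][i])
    return x

def refine(A, b, iters=5):
    """Mixed-precision iterative refinement: low-precision solves,
    residuals computed in full double precision (the key step)."""
    x = solve_f32(A, b)
    for _ in range(iters):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))  # double precision
             for row, bi in zip(A, b)]
        d = solve_f32(A, r)                                  # low precision
        x = [xi + di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
x_true = [1.0, 2.0, 3.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x = refine(A, b)
```

Each refinement pass contracts the error by roughly the low-precision unit roundoff times the condition number, so a few iterations recover double-precision accuracy from a single-precision factorization.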
URL sujet detaillé : https://www-pequan.lip6.fr/~tmary/stages/stage_LIP6_iter.pdf
Remarques : Co-supervised with Fabienne JEZEQUEL
|
|
|
|
|
SM207-34 Implémentation de hiérarchies de structures algébriques en théorie des types / Implementation of hierarchies of algebraic structures in type theory
|
|
Description
|
|
Hierarchies of abstract mathematical structures today are the cornerstone of modern libraries of formalized mathematics. These structures provide interfaces for domains equipped with an algebraic structure and hierarchies describe the inheritance and sharing relations between these different abstractions. In type theory, it is possible to give an internal, first-class representation to these abstractions using telescopes (also called dependent tuples, or dependent records).
In this internship we propose to extend a tool called Hierarchy Builder, designed to help users define, extend and maintain these hierarchies in the Coq proof assistant. The objective is to study the generation of the relevant category associated with a structure, and to automatically set up the associated morphism hierarchy. In practice, such a generation would indeed make it possible to increase the range of automation tools that can be derived generically from a structure definition. For example, one could declare or generate a left adjoint to the forgetful functor: in the case of a structure without axioms, one would thus obtain the syntax tree corresponding to the signature of this structure, the fact that it is an instance of the given structure, and the morphisms performing the addition.
URL sujet detaillé : http://people.rennes.inria.fr/Assia.Mahboubi/MPRI23.pdf
Remarques : Co-advisor: Cyril Cohen (Inria). Internship fellowship available.
|
|
|
|
|
SM207-35 User-space interrupts for the BXI network
|
|
|
|
SM207-36 Replaying with feedback
|
|
Description
|
|
=== Topic ===
Researchers use simulations to compare the performance (execution time, energy efficiency, ...) of different scheduling algorithms for High-Performance Computing (HPC) platforms. The most common method is to replay historic workloads recorded on real HPC infrastructures: jobs are submitted to the simulation at the same timestamp as in the original log.
A major drawback of this method is that it does not preserve the submission behavior of the users of the platform. In reality, when the scheduling algorithm performs better and jobs finish earlier, users tend to submit their next jobs earlier as well. We propose to tackle this problem by replaying with feedback. There are different ways to do so. For example, instead of preserving the original submission dates in the simulation, one can preserve the thinking time between jobs (i.e., the time elapsed between the end of one job and the submission of the next). Alternatively, one can deduce "working sessions" for each user from the log and replay the jobs accordingly.
=== Objective of the internship ===
- Review the literature on replay with feedback for HPC simulations
- Propose different replay models and implement them in the datacenter simulator Batsim, using the Batmen layer that enables the simulation of users
- Conduct an experimental campaign to highlight the characteristics of each model
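A minimal sketch of the "thinking time" replay model (the function name is ours; it assumes a single user whose jobs start as soon as they are submitted, ignoring queueing and everything Batsim would actually model):

```python
def replay_with_feedback(jobs, new_runtimes):
    """jobs: list of (submit_time, runtime) for one user, in order, where
    each job was submitted after the previous one finished.  Returns the new
    submission times when runtimes change (e.g. under a faster scheduler),
    preserving the user's thinking time rather than absolute timestamps."""
    out = []
    prev_old_end = prev_new_end = None
    for (submit, runtime), new_rt in zip(jobs, new_runtimes):
        if prev_old_end is None:
            new_submit = submit                  # first job: unchanged
        else:
            think = submit - prev_old_end        # the preserved quantity
            new_submit = prev_new_end + think
        out.append(new_submit)
        prev_old_end = submit + runtime          # end time in the original log
        prev_new_end = new_submit + new_rt       # end time in the replay
    return out

# Jobs of 100s with 10s of thinking time; the new scheduler halves runtimes,
# so later submissions move earlier -- unlike a timestamp-faithful replay.
print(replay_with_feedback([(0, 100), (110, 100), (220, 100)],
                           [50, 50, 50]))  # -> [0, 60, 120]
```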
URL sujet detaillé : https://www.irit.fr/~Georges.Da-Costa/post/replaying/
Remarques : The student will be closely supervised by Maël Madon (PhD student, mael.madon.fr) and by Millian Poquet and/or Georges Da Costa, in a friendly atmosphere :). A computer and an office will be provided, as well as a monthly internship stipend of ~€600.
|
|
|
|
|
SM207-37 Scalable implementation of federated learning solutions on content-centric networks
|
|
Description
|
|
Federated learning is a paradigm that allows different clients to exchange their models without revealing the data used to build them. Content-centric networks appear as a promising solution to ensure the scalability of federated learning approaches. In the context of detecting distributed denial-of-service attacks in a multi-tenant public cloud environment, the objective will be to evaluate the performance of a federated learning algorithm, selected from the literature, implemented on a content-centric network such as Named Data Networking.
URL sujet detaillé : https://leolavaur.re/Offre_de_stage_ICN_FL.pdf
Remarques : Co-supervised with Guillaume Doyen (Prof.); paid internship. To apply, please provide a CV, a cover letter, and academic transcripts.
|
|
|
|
|
SM207-38 Driving Online Training with Adaptive Experimental Design
|
|
Description
|
|
In supervised learning, successfully training advanced neural networks requires annotated data of sufficient quantity and quality, which remains a limiting factor. One alternative is to synthetically generate training data. The advantages are that synthetic data can be generated at will, in potentially unlimited amounts; the quality can be degraded in a controlled manner for more robust training; and the coverage of the parameter space can be adapted to focus training where relevant. Today, a large variety of simulation codes are available, from computer graphics, computer engineering, computational physics, biology, chemistry, and so on. When training data are produced by simulation codes, they can be produced on-line, under the control of the training process. There are multiple benefits. This approach makes it possible to bypass the storage and I/O performance issues that impair traditional file-based training approaches: there is no need to store and move a huge data set. More fundamentally, the data can be generated adaptively according to the observed behavior of the training process. The training does not have to rely on repeated presentations of the same examples, as in epoch-based approaches: examples can always be new ones, potentially improving the quality of the training. But this on-line training process also requires the development of adapted infrastructure and learning strategies. Today, data parallelism enables training with thousands of concurrent accelerators. To match these massive processing capabilities, hundreds to thousands of simulations should run simultaneously to provide training data. Our team is working on developing an adapted software architecture to deploy and control such massive executions on large supercomputers, based on the Melissa architecture.
We have been working on a framework for training a neural network fed online with data produced by multiple concurrent simulation instances, extending the Melissa software initially developed for massive sensitivity analysis [SC2017]. INRIA has started investigating specific non-epoch-based learning-rate management strategies, including cyclic learning [Smith2017], which ensures the neural architecture keeps enough plasticity during on-line training, and replay buffers, introduced for large-scale deep reinforcement learning [Horgan2018][Andry2018] to limit the bias that may be introduced by the order in which the on-line data are produced. Since the context we investigate here is different, focused on massive on-line parallel training from simulation data within a short time frame (from minutes to a few days of training), we will likely not face the same problems, but we will stay aware of developments in these neighboring domains. Conversely, the on-line training frameworks we will develop can be a valuable tool for simulating lifelong learning and testing novel adapted training strategies (although we likely won't be able to go that far in the context of this project). The objective of this internship is to study and develop strategies to decide on the next simulations to be executed so as to maximize data efficiency, i.e., to figure out which simulations are the most relevant to run to produce the data that best improves training. For example, this could mean identifying simulations that could generate data for a specific area of the domain where training performs poorly. Existing techniques rely on Bayesian inference and an EIG (Expected Information Gain) metric measuring the reduction of entropy from the prior. Recent developments propose to use a neural network to empower such approaches. See the blog https://desirivanova.com/post/boed-intro/ for a quick introduction.
Techniques derived from Simulation-Based Inference (SBI) as well as deep reinforcement learning should also be considered. We already have use cases ready to run for on-line training, combining a solver code and a neural architecture designed to learn online a function approximating the solver (Lorenz equation solver, computational fluid dynamics solver). After getting some practice with this framework and the use cases, the candidate should quickly be ready to focus on the issue of adaptive experimental design. So far we only have very basic strategies for deciding how to draw the parameter sets for the next simulations to run. This work will take place in tight collaboration with the team's PhD students working on related topics.
Location: The internship will take place in the DataMove team, located in the IMAG building on the campus of Saint Martin d'Heres (Univ. Grenoble Alpes), near Grenoble. The DataMove team is a friendly and stimulating environment gathering professors, researchers, PhD and master's students, all leading research on High-Performance Computing. Grenoble is a student-friendly city surrounded by the Alps, offering a high quality of life and all kinds of mountain-related outdoor activities.
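A toy sketch of the adaptive-design loop described above (our illustration only: a 1-D stand-in simulator, a piecewise-linear surrogate in place of a neural network, and a curvature-based acquisition score as a crude proxy for an expected-information-gain criterion):

```python
import math

def simulator(x):
    """Stand-in for an expensive solver (toy model, not a real use case)."""
    return math.sin(6.0 * x)

def interp(samples, x):
    """Piecewise-linear 'surrogate' through the sampled (x, y) points."""
    pts = sorted(samples.items())
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    return pts[-1][1]

def acquire(samples):
    """Choose the next simulation input: bisect an interval in the region
    where the surrogate bends the most (a cheap curvature proxy)."""
    pts = sorted(samples.items())
    best_mid, best_score = None, -1.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        s0 = (y1 - y0) / (x1 - x0)
        s1 = (y2 - y1) / (x2 - x1)
        score = abs(s1 - s0) * (x2 - x0)    # slope change, weighted by width
        if score > best_score:
            # bisect the wider sub-interval: never resamples an existing point
            best_mid = 0.5 * (x0 + x1) if x1 - x0 >= x2 - x1 else 0.5 * (x1 + x2)
            best_score = score
    return best_mid

samples = {x: simulator(x) for x in (0.0, 0.5, 1.0)}
grid = [i / 200 for i in range(201)]
err_before = max(abs(interp(samples, g) - simulator(g)) for g in grid)
for _ in range(20):
    x_next = acquire(samples)
    samples[x_next] = simulator(x_next)     # the only "expensive" calls
err_after = max(abs(interp(samples, g) - simulator(g)) for g in grid)
```

The 20 adaptive runs concentrate where the response varies fastest, driving the surrogate error down much faster than the initial coarse sampling.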
URL sujet detaillé : https://gitlab.inria.fr/-/snippets/820
Remarques :
|
|
|
|
|
SM207-39 Graph Neural Network for Fluid Dynamics
|
|
Description
|
|
Computational Fluid Dynamics (CFD) solvers have benefited from strong developments for decades and are critical for many scientific and industrial applications. Eulerian CFD solvers rely on a discretization of the simulation space, i.e. a mesh, augmented with different fields such as velocity and pressure, and constrained by initial and boundary conditions. The solver progresses by discrete time steps, building from the state u_t at time t a new state u_{t+dt} compliant with the Navier-Stokes equations. This process is compute-intensive and often requires supercomputers for industrial-grade simulations. Connecting CFD with machine learning has received renewed attention with the emergence of deep learning. The goal is often to augment or supplant classical solvers for improved performance in terms of compute speed, error, and resolution. Here we focus on deep surrogates, where a neural network is trained to provide a quality solution to the Navier-Stokes equations for a given domain, initial and boundary conditions. Deep surrogates are currently addressed through different approaches. Data-free surrogates inject into the loss the different terms of the Navier-Stokes equations to comply with, leveraging automatic differentiation to compute the necessary derivatives; these approaches are known as Physics-Informed Neural Networks. Data-driven surrogates train from the data produced by a traditional CFD solver. The surrogate can mimic the solver's iterative process, being trained to compute u_{t+dt} from u_t. But as the fluid trajectories are available at training time, other surrogates are trained to directly produce u_t from the parameters characterizing the conditions at t. Surrogates also differ in their approach to space discretization. If the mesh is a regular grid, CNNs can be used. Irregular meshes or particle-based approaches are more challenging, and can be addressed through variations of Graph Neural Networks (GNNs).
In our team, we have been working on developing deep surrogate architectures for fluid dynamics based on GNNs; refer to our paper Deep Surrogate for Direct Time Fluid Dynamics (https://arxiv.org/abs/2112.10296) for the details, and this video. But GNNs are intrinsically limited to encoding information related to graphs rather than meshes: GNNs rely on isotropic operators, i.e., operators that disregard the geometric information a mesh carries. Recent works have identified this limitation and started to propose solutions referred to as anisotropic operators; see Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks, or this video from the ICL 2022 keynote. Papers trying to address this shortcoming include, first with CNNs, Learning Shape Correspondence with Anisotropic Convolutional Neural Networks, and the more recent DeltaConv: Anisotropic Operators for Geometric Deep Learning on Point Clouds. The goal of this internship is to study and develop solutions to improve the quality of deep surrogates for fluid dynamics, taking as a starting point the extension of our current neural architecture to anisotropic operators. We already have a full environment ready, with tests using various fluid dynamics simulations (the Von Kármán vortex street is our favorite) and parallel computers with GPUs for training. This work will take place in tight collaboration with the team's PhD students working on related topics.
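The isotropy limitation is easy to demonstrate in a few lines. The sketch below is a minimal stand-in of ours, not the team's architecture (real anisotropic operators learn their directions): mean aggregation cannot tell a left neighbor from a right one, while a direction-weighted aggregation can.

```python
import math

def isotropic_step(pos, feat, edges):
    """Isotropic message passing: each node averages incoming neighbor
    features, ignoring where those neighbors are in space."""
    out = []
    for i in range(len(feat)):
        msgs = [feat[j] for j, k in edges if k == i]
        out.append(sum(msgs) / len(msgs) if msgs else feat[i])
    return out

def anisotropic_step(pos, feat, edges, direction=(1.0, 0.0)):
    """Direction-aware aggregation: each message is weighted by how well
    the edge vector aligns with a reference direction."""
    out = []
    for i in range(len(feat)):
        num = den = 0.0
        for j, k in edges:
            if k != i:
                continue
            dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
            norm = math.hypot(dx, dy) or 1.0
            w = math.exp((dx * direction[0] + dy * direction[1]) / norm)
            num, den = num + w * feat[j], den + w
        out.append(num / den if den else feat[i])
    return out

# Node 0 sits between a left neighbor (node 1) and a right neighbor (node 2).
pos = [(0.0, 0.0), (-1.0, 0.0), (1.0, 0.0)]
edges = [(1, 0), (2, 0)]                       # (source, destination)
iso_lr = isotropic_step(pos, [0.0, 0.0, 1.0], edges)[0]
iso_rl = isotropic_step(pos, [0.0, 1.0, 0.0], edges)[0]
ani_lr = anisotropic_step(pos, [0.0, 0.0, 1.0], edges)[0]
ani_rl = anisotropic_step(pos, [0.0, 1.0, 0.0], edges)[0]
print(iso_lr == iso_rl, ani_lr != ani_rl)      # True True
```

Swapping the left and right neighbor features is invisible to the isotropic operator but changes the anisotropic output: exactly the geometric signal a mesh-based surrogate needs.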
URL sujet detaillé : https://gitlab.inria.fr/-/snippets/821
Remarques :
|
|
|
|
|
SM207-40 Ensemble Runs on Supercomputers
|
|
Description
|
|
Numerical simulations are today commonly used for modeling complex phenomena or systems in fields such as physics, chemistry, biology, and industrial engineering. Some of these numerical simulations require supercomputers to run high-resolution models. In general, a numerical simulation needs a set of input parameters in order to produce its outputs; the input parameters and the often complex internal model produce outputs that can be very large. Very large scale supercomputers have the capacity to support the execution of many instances of these numerical simulations, usually called an ensemble run or a parameter sweep. Having a large sample of executions is useful for many purposes, including sensitivity analysis, deep reinforcement learning, deep surrogate training, data assimilation, and simulation-based inference. Our team developed an original solution for running very large ensemble runs on supercomputers and processing the data on the fly. The framework, called Melissa, is open source and has been used for sensitivity analysis, data assimilation, and training of deep neural networks for physics (see the references). So far, the largest Melissa ensemble runs have handled 80,000 simulations, processed 278 TB of data on-line, and used up to 27,000 compute cores. The objective of this internship is to study and develop strategies to better control the execution of the ensemble, with the goal of reducing execution time and power consumption. Melissa relies on an orchestrator, called the launcher, to control the execution of the different simulation instances. The orchestrator has several degrees of freedom to tune each simulation (how many CPUs and/or GPUs to use). It can also kill a running simulation and restart it with a different configuration if needed, or increase or decrease the number of simulations running concurrently. The target execution platform is a supercomputer that, like the cloud, is shared between different applications.
So the execution environment is uncertain, as the availability of resources (CPUs, GPUs, file-system and network load) can change over time. The batch scheduler is the service on supercomputers in charge of deciding when and where (on which nodes) each application runs. Melissa already interacts with the batch scheduler to request simulation executions, but so far with simple strategies. We would like to extend Melissa to monitor the supercomputer environment and adjust the ensemble execution, trying to optimize its objectives (duration, energy).
Work: The work will start with a training period to master the different concepts and tools needed to control the execution environment and reproduce experiments. These include Melissa, and the Grid'5000 cluster for executions. Nixos-compose is a tool to deploy software and services to computing platforms; it will be used to deploy a self-contained "supercomputer" with Melissa, the OAR batch scheduler, and a controllable external load. We already have a setup ready, and you will need to learn to control it. Batsim is a batch-scheduler simulator. Deploying a whole Melissa cluster with Nixos-compose gives good confidence in the feasibility of the scenarios, but it requires a lot of resources and time to gather results; Batsim will therefore be used to simulate the strategies and find the best scenarios. You will learn how Batsim works and run simple simulations to understand its behavior. Then: build a simple model that captures the essential characteristics of the environment (the Melissa application and the supercomputer); start elaborating strategies and evaluate their potential benefits with Batsim and/or Nixos-compose; experiment with a promising strategy, analyse the results, then revisit, improve, and repeat. If required, we have complementary tools to extend the experimentation context (runs on larger production supercomputers, or simulation runs in the Batsim environment).
The work will be pursued in tight collaboration with the DataMove members, including engineers, PhD students and postdocs working on these topics. We are the developers of Melissa, the OAR batch scheduler, and the Nixos-compose deployment recipes (see the publications in the reference section), which gives us significant freedom to try different strategies. This work is part of the REGALE European project. We will for instance use the EAR tool developed by our partners at the Barcelona Supercomputing Center to monitor simulation energy usage. The candidate will have the opportunity to interact with the other REGALE teams. Our team regularly receives grants for funding PhD and engineering positions; this internship is a good way to join DataMove and, if the fit is good, stay for a PhD or an engineering position.
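A toy discrete-event model of the launcher's core decision, when to start pending simulations given the cores currently free (a hypothetical policy of ours, not the Melissa or OAR API), can illustrate what such an orchestration strategy has to optimize:

```python
import heapq

def run_ensemble(durations, total_cores, cores_per_sim):
    """Minimal discrete-event model of an ensemble launcher: start a pending
    simulation whenever enough cores are free; return the makespan."""
    assert total_cores >= cores_per_sim
    pending = list(durations)           # FIFO queue of simulation durations
    running = []                        # min-heap of (finish_time, cores)
    free, t = total_cores, 0.0
    while pending or running:
        # greedily start as many pending simulations as fit right now
        while pending and free >= cores_per_sim:
            heapq.heappush(running, (t + pending.pop(0), cores_per_sim))
            free -= cores_per_sim
        # advance time to the next completion and reclaim its cores
        t, cores = heapq.heappop(running)
        free += cores
    return t

# Four 10s simulations, 8 cores, 4 cores each: two waves -> makespan 20.
print(run_ensemble([10, 10, 10, 10], total_cores=8, cores_per_sim=4))
```

A real strategy would additionally vary `cores_per_sim` per instance, react to external load on the machine, and weigh energy against duration; the model above is just the baseline such strategies would be compared to.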
URL sujet detaillé : https://gitlab.inria.fr/-/snippets/819
Remarques :
|
|
|
|
|
SM207-41 Proving Correctness of Reconfigurable Systems
|
|
Description
|
|
The goal of this internship is the study of logics for the design and verification of complex component-based distributed applications, using the principles of locality (the ability to describe the effect of an update only from the parts involved, while ignoring the ones unchanged) and compositionality (the ability to join the results of local analyses into a global condition capturing the correctness requirement for the entire system). During this internship, the candidate will acquire command of advanced theoretical notions of logic and system verification. The internship comprises theoretical as well as implementation work.
URL sujet detaillé : https://nts.imag.fr/images/5/58/Reconfiguration.pdf
Remarques : Paid internship.
|
|
|
|
|
SM207-42 Verifying Concurrent Systems with Automata over Infinite Alphabets
|
|
Description
|
|
The goal of this internship is to study extensions of finite-state automata over infinite alphabets and apply them to the verification of concurrent and distributed systems with unbounded numbers of threads. The internship comprises theoretical as well as implementation work, and will explore orthogonal domains, such as logic, automata theory and concurrency.
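As a toy illustration (ours, not the formalism studied in the internship), a one-register automaton over an infinite alphabet can accept exactly the words in which some letter occurs twice, a property no finite-state automaton over an infinite alphabet can express. Its nondeterminism (the choice of when to store a letter in the register) is simulated here by trying every choice.

```python
# One-register automaton over an infinite alphabet: nondeterministically
# store one letter in the register, then accept if that letter reappears.
def accepts_repeat(word):
    for i, letter in enumerate(word):   # guess: store word[i] in the register
        if letter in word[i + 1:]:      # register content seen again later
            return True
    return False

print(accepts_repeat([7, 3, 7]))   # True
print(accepts_repeat([1, 2, 3]))   # False
```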
URL sujet detaillé : https://nts.imag.fr/images/5/5e/InfiniteAlphabetAutomata.pdf
Remarques : Paid internship.
|
|
|
|
|
SM207-43 Decision Procedures for Separation Logic Modulo Theories of Data
|
|
|
|
SM207-44 Unsupervised Learning for Imaging Inverse Problems
|
|
Description
|
|
CONTEXT In recent years, deep neural networks have obtained state-of-the-art performance in multiple imaging inverse problems, such as computed tomography and image super-resolution. Networks are generally trained with supervised pairs of images and associated measurements. However, in various imaging problems, we usually only have access to incomplete measurements of the underlying images, thus hindering this learning-based approach. Learning from measurement data alone is impossible in general, as the incomplete observations do not contain information outside the range of the sensing process.
Recent advances in self-supervised learning methods have highlighted the possibility of learning from measurement data alone if the underlying signals are invariant to groups of transformations such as translations or rotations. The potential of these unsupervised methods has been demonstrated on various inverse problems, including computed tomography and magnetic resonance imaging. However, invariance to translations and/or rotations are not enough for various band-limited inverse problems appearing in practice such as image deblurring or super-resolution. This project will explore the use of scale invariance to tackle such problems, investigate their theoretical properties and propose practical unsupervised learning algorithms using deep learning.
GOALS The main goals of the internship will be: 1) Understand the challenges and fundamental limitations of unsupervised learning in the context of inverse problems. 2) Propose new practical deep-learning-based unsupervised learning algorithms that enforce scale equivariance. 3) Study theoretical guarantees for learning from measurement data alone under scale invariance assumptions.
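A small numerical illustration (ours, not from the internship subject) of the idea behind the sensing theorems of [2]: a subsampling operator A alone is rank deficient, so its measurements carry no information outside its range, but if the signals are invariant to a transformation group (here: cyclic shifts T_k), the stacked operators A T_k can jointly cover the whole signal space.

```python
import numpy as np

n = 8
A = np.eye(n)[::2]                       # subsampling: keep even-indexed entries (4 x 8)

def shift(k):                            # cyclic-shift matrix T_k
    return np.roll(np.eye(n), k, axis=0)

rank_A = np.linalg.matrix_rank(A)
rank_stacked = np.linalg.matrix_rank(np.vstack([A @ shift(k) for k in range(n)]))
print(rank_A, rank_stacked)              # 4 8
```

The internship's question is, in essence, whether the same completion effect can be obtained from scale transformations rather than shifts or rotations.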
SKILLS The applicant should have a strong background in signal processing and machine learning and be proficient in Python programming. Knowledge of PyTorch and high-dimensional statistics is a plus, although not mandatory.
APPLICATION Potential applicants are invited to write to Julian with any questions about the project, or even to meet us at the physics department of ENS de Lyon. Applicants can contact us at julian.tachella-lyon.fr and patrice.abry-lyon.fr. Please include a CV and a statement of interest in your application email.
REFERENCES [1] Dongdong Chen, Julian Tachella, and Mike E Davies. Equivariant imaging: Learning beyond the range space. ICCV 2021 (Oral). [2] Julian Tachella, Dongdong Chen, and Mike Davies. Sensing theorems for learning from incomplete measurements. arXiv preprint arXiv:2201.12151, 2022. [3] Dongdong Chen, Julian Tachella, and Mike E Davies. Robust equivariant imaging: a fully unsupervised framework for learning to image from noisy and partial measurements. CVPR 2022 (Oral). [4] Patrice Abry, Paolo Goncalves, and Jacques Levy Vehel. Scaling, fractals and wavelets. John Wiley & Sons, 2013.
URL sujet detaillé : https://tachella.github.io/assets/pdf/internship22_scale.pdf
Remarques : Co-advisor: Patrice ABRY (LP, ENSL). The internship is remunerated at the standard rate.
|
|
|
|
|
SM207-45 Bayesian Inference and Lambda-Term Transformations
|
|
Description
|
|
This internship will focus on a special class of inference algorithms based on Bayesian networks, such as the Variable Elimination algorithm, and it will study how to express them as program transformations of a special class of let-terms, corresponding to a fragment of simply typed lambda-calculus.
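As a hedged sketch (ours) of the connection the internship explores, here is Variable Elimination on a toy chain network A -> B -> C with binary variables, written so that each eliminated variable becomes a let-bound intermediate factor, the shape one would capture as a let-term.

```python
import numpy as np

# Toy chain Bayesian network A -> B -> C, all variables binary.
pA   = np.array([0.6, 0.4])                  # P(A)
pB_A = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(B | A), rows indexed by A
pC_B = np.array([[0.7, 0.3], [0.5, 0.5]])    # P(C | B), rows indexed by B

# Variable Elimination as a let-term:
# "let pB = sum_A P(A) P(B|A) in let pC = sum_B pB P(C|B) in pC"
pB = pA @ pB_A        # eliminate A
pC = pB @ pC_B        # eliminate B
print(pC)             # [0.624 0.376]
```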
URL sujet detaillé : https://www.irif.fr/~michele/bayesian.pdf
Remarques :
|
|
|
|
|
SM207-46 Constraint Satisfaction Problems with Functional Constraints
|
|
Description
|
|
Constraint satisfaction problems are decision problems of the form "Given a set of variables, a domain, and constraints over the variables, does there exist a map from the variables to the domain that satisfies all the constraints?" Such problems are parameterized by the set of constraints allowed in the input, called the constraint language. For constraint languages consisting of relations on a finite set, the complexity of these problems has been completely characterized by a celebrated recent result of Bulatov and Zhuk: every CSP is either in P or NP-complete, and the border between the two cases can be characterized algebraically using tools from the theory of universal algebra. Recently, an extension was proposed by Barto, DeMeo, and Mottet, where constraint languages with both relations and operations are considered. A similar P vs. NP-complete dichotomy was obtained for constraints over a two-element set, but the problem for general finite sets remains wide open. The goal of the internship is to develop the existing methods and prove complexity dichotomies for larger classes of constraint languages, in particular in the 3-element case. The tools that will be employed are mathematical in nature, leveraging the theory of universal algebras.
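A brute-force illustration (ours) of a CSP parameterized by its constraint language: constraints are (relation, scope) pairs, a relation being the set of allowed tuples over the scoped variables. With the binary disequality relation as the constraint language, the CSP is exactly graph colouring.

```python
from itertools import product

def csp_solve(variables, domain, constraints):
    """Exhaustive search: return a satisfying assignment, or None."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(tuple(assignment[v] for v in scope) in relation
               for relation, scope in constraints):
            return assignment
    return None

neq = {(0, 1), (1, 0)}   # disequality over the two-element domain {0, 1}
print(csp_solve("xy", [0, 1], [(neq, "xy")]))                             # a 2-colouring
print(csp_solve("xyz", [0, 1], [(neq, "xy"), (neq, "yz"), (neq, "xz")]))  # None: triangle
```

The dichotomy results described above classify, for each such constraint language, whether this problem is solvable in polynomial time or NP-complete.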
URL sujet detaillé : http://amottet.github.io/internships.html
Remarques :
|
|
|
|
|
SM207-47 The theory of smooth approximations in infinite-domain constraint satisfaction
|
|
Description
|
|
Constraint satisfaction problems are decision problems of the form "Given a set of variables, a domain, and constraints over the variables, does there exist a map from the variables to the domain that satisfies all the constraints?" Such problems are parameterized by the set of constraints allowed in the input, called the constraint language. For constraint languages consisting of relations on a finite set, the complexity of these problems has been completely characterized by a celebrated recent result of Bulatov and Zhuk: every CSP is either in P or NP-complete, and the border between the two cases can be characterized algebraically using tools from the theory of universal algebra. By allowing the constraint language to have an infinite domain, the associated class of problems captures all decision problems! Looking for an extension of the Bulatov-Zhuk result to infinite-domain CSPs is therefore too ambitious in general, but one can restrict the class of constraint languages to so-called reducts of finitely bounded homogeneous structures. The Bodirsky-Pinsker conjecture states that this class of CSPs also admits a P/NP-complete dichotomy, where this time the border between tractable and intractable problems is conjectured to be characterized by algebraic and topological conditions. Recently, a new theory was proposed by Mottet and Pinsker to approach this conjecture. It relies on a new tool, called smooth approximations, with which one can give unified proofs of all known results related to the Bodirsky-Pinsker conjecture. The goal of the internship is to develop the theory of smooth approximations so as to overcome its current limitations. The internship will involve a mix of algorithmics, Ramsey theory, universal algebra, and model theory.
URL sujet detaillé : http://amottet.github.io/internships.html
Remarques :
|
|
|
|
|
SM207-48 Information Flow and Purity Analysis
|
|
Description
|
|
The goal of this Master's internship is to use static analysis techniques to detect interference in OCaml programs. A practical objective is to help OCaml developers by warning them when an OCaml program is sensitive to the initialization order of the compilation units it is composed of.
URL sujet detaillé : https://people.irisa.fr/Benoit.Montagu/internships/2022-2023_ocaml_ambiguity.pdf
Remarques : Internship co-supervised by Thomas Jensen.
|
|
|
|
|
SM207-49 Privacy-Preserving Biometric Authentication System
|
|
Description
|
|
Biometric-based solutions are a major privacy concern, as reconstructed biometric templates can be used to identify or impersonate an individual. A more viable approach is therefore to share a subset of the original database in a privacy-preserving manner and to give more control to the end user. This way, the raw biometric templates remain only on the end users' devices and are isolated from other organizations/providers. It enables other organizations to authenticate these individuals, but in such a way that the organizations cannot learn any biometric information from the shared template.
- To study the state of the art on secure, privacy-preserving biometric-based protocols.
- To develop a secure and privacy-preserving biometric authentication protocol using cryptographic primitives, providing privacy both for the user templates stored on their devices and for the biometric measurements captured by the sensors in the open setting.
- To document and present your work in a conference/workshop.
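One classical template-protection primitive from the literature is the fuzzy commitment scheme; here is a toy sketch (ours, with a 3-bit repetition code standing in for a real error-correcting code). The verifier stores only (hash, helper); without the secret key these reveal nothing directly usable about the raw template, which never leaves the device, yet a noisy re-measurement still authenticates.

```python
import hashlib, secrets

REP = 3  # each key bit encoded as 3 identical codeword bits (tolerates 1 flip per block)

def enroll(template_bits):
    """Commit to a random key masked by the biometric template."""
    key = [secrets.randbelow(2) for _ in range(len(template_bits) // REP)]
    codeword = [b for b in key for _ in range(REP)]
    helper = [c ^ t for c, t in zip(codeword, template_bits)]
    return hashlib.sha256(bytes(key)).hexdigest(), helper

def authenticate(noisy_bits, digest, helper):
    """Unmask with the fresh measurement, majority-decode, compare hashes."""
    codeword = [h ^ t for h, t in zip(helper, noisy_bits)]
    key = [int(2 * sum(codeword[i:i + REP]) > REP)     # majority vote per block
           for i in range(0, len(codeword), REP)]
    return hashlib.sha256(bytes(key)).hexdigest() == digest

template = [0, 1, 1, 1, 0, 0, 1, 0, 1]
digest, helper = enroll(template)
noisy = template[:]; noisy[4] ^= 1                     # one sensor-noise flip
print(authenticate(noisy, digest, helper))             # True
```

A real protocol would use a proper error-correcting code and additional protections against helper-data leakage; this sketch only conveys the structure.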
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-50 Solving Stackelberg Games with Strategic Information and Flexible End Users
|
|
Description
|
|
Internship Offer: The increasing integration of distributed energy resources and electrification of the end user energy usages (e.g., transportation and heating) raise challenges for the conventional centralized control of grid operations, due to the induced uncertainty, large-scale decentralized nature of the system, and changing load patterns. However, this new energy landscape also enables an unprecedented growing volume of invaluable flexibility. Consumption flexibilities come from individual end users or from groups of end users and are usually managed in an aggregated manner, by a particular entity (which is often called an aggregator): for example, flexibilities from the charging of electric vehicles can be reported by fleet or charging stations operators, flexibilities from water-heaters by an aggregator, etc.
The issue is that end users can often receive monetary gains by strategically misrepresenting their usage patterns (e.g., lying about their consumption baseline) and preferences to the utility company, and many of the incentive programs in deployment today are not robust to strategic data manipulation. To design such robust incentives, we will introduce an aggregator, which aims to learn the end users' private information, by designing a privacy contract incentivizing the end users to report their private information truthfully. The contracts are designed based on the readings, e.g., information about their nominal demand, sent by the end users to the aggregator and broadcasted to all the end users. These readings can be biased and noisy, i.e., the end users might have incentives to manipulate strategically their information to optimize some privacy metrics.
We will formulate the problem involving the end users and the aggregator, as a Stackelberg game [1,2], i.e., a class of non-cooperative games where every follower knows the leader's action and responds by computing its best response. In our setting, at the lower level of the Stackelberg game, the end users compute the reading to send to the aggregator to optimize some privacy metrics, as well as their demand and local generation, being flexible around a nominal private demand value, to maximize their usage benefit while minimizing their local production cost. The end users' demand and local generation, at equilibrium, incorporate endogenously the shifts caused by their possibly fake readings. At the upper level, the aggregator, anticipating the reaction of the end users, optimizes each end user's contract payment to minimize the cost resulting from its estimation of the end users' private information and contract payments. Properties such as individual rationality, incentive compatibility, and non-negativity of the payments will be taken into account in the Stackelberg game formulation. The goal of this internship is (1) to model the interactions between the end users and the aggregator as a Stackelberg game in static and dynamic settings, (2) to develop a mechanism that the aggregator can use to design incentives while estimating the end users' private information, (3) to characterize the set of Stackelberg equilibria and the impact of information patterns on this set.
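The backward-induction structure described above can be sketched on a toy Stackelberg pricing game (illustrative only, not the internship's model): the leader (aggregator) posts a price p; the follower (end user) best-responds with the demand d maximizing a quadratic utility v*d - d**2/2 - p*d, giving d*(p) = max(v - p, 0); the leader, anticipating this reaction, maximizes its revenue.

```python
v = 1.0                                    # follower's private valuation (assumed known here)

def follower_best_response(p):
    # argmax over d of v*d - d**2/2 - p*d  =>  d*(p) = max(v - p, 0)
    return max(v - p, 0.0)

# Leader anticipates d*(p) and maximizes revenue p * d*(p) over a price grid.
prices = [i / 1000 for i in range(1001)]
p_star = max(prices, key=lambda p: p * follower_best_response(p))
d_star = follower_best_response(p_star)
print(p_star, d_star)                      # 0.5 0.5
```

In the internship's setting the leader does not know v and must design incentive-compatible contracts to elicit it, which is what makes the problem substantially harder than this toy.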
[1] T. Basar, G. J. Olsder (1999) Dynamic Noncooperative Game Theory, SIAM, Philadelphia. [2] R. Taisant, M. Datar, H. Le Cadre, E. Altman, Balancing Efficiency and Privacy in a Decision-Dependent Network Game, Preprint HAL.
Profile of the Candidate: We are looking for a highly motivated second-year Master student, who would like to be involved in a 4- to 6-month internship. The internship can start as early as February but not later than September 2023. The intern will be hosted at Inria Lille-Nord Europe, in France. The candidate should have a very good background in mathematical optimization and basics in game theory and operations research; programming skills in Python are desirable but not mandatory.
URL sujet detaillé :
:
Remarques : The intern will have the possibility to work in close collaboration with a postdoctoral researcher, specialized in stochastic approximation theory, and possibly do a one week visit at Ecole Polytechnique Montréal, in Canada.
|
|
|
|
|
SM207-51 KK2DSAP
|
|
Description
|
|
------------------------ Context ------------------------
Aligning two sequences is a well-known way to measure the degree of similarity of two strings. It is a well-studied problem in computer science, especially in bioinformatics, where biological sequences are compared by looking for a series of nucleotides or amino acids that appear in the same order in the input sequences, possibly introducing gaps [1]. Depending on the shape of the sequences, we distinguish one-dimensional sequence alignment, where the sequences have the form of a vector, from two-dimensional alignment, used in applications such as natural language processing or speech processing [2, 3, 4]. Two-dimensional sequence alignment consists in comparing two sequences of matrix form X and Y to find similarities (or a distance) between them. Trees can also be compared using this type of alignment by indexing matrices [5] with suffix trees (or suffix arrays) to find exact repeats, or, more generally, approximate patterns.
Chanoni et al. [6] proposed an O(n^4)-time sequential algorithm for this problem based on dynamic programming. More precisely, they present recurrence formulas for performing global as well as local alignments of two-dimensional patterns of respective sizes M and N in O(M × N) space and time, where M = m1 × m2 and N = n1 × n2. For that, they need to precompute, in O(M × N) space and time, the similarities between all the prefixes of all the lines of the two patterns and between all the prefixes of all the columns of the two patterns. However, experiments performed on most clusters showed that the precomputation phase is time-consuming, uses a lot of memory, and accounts for nearly 99% of the total execution time of the algorithm.
Parallelizing this algorithm, especially the precomputation phase, could significantly reduce the execution time and enable efficient use of memory for large-scale experimentation. Since the high-performance computing hardware has diversified significantly over the last decade, many parallel programming models (such as CUDA, OpenMP, OpenACC, and MPI) have been built to exploit efficiently these machines. This forces programmers to (re)-write codes for each programming model and for each HPC system, especially as these systems evolve. The Kokkos programming model offers a solution for performance portability. It enables developers to write their code in such a way that it can leverage all current and future systems without major rewriting of the code itself [7].
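For reference, here is the one-dimensional dynamic program (Needleman-Wunsch global alignment) that the two-dimensional algorithm of Chanoni et al. generalizes; it shows the shape of the recurrences whose precomputation the intern will parallelize (the scoring parameters below are conventional defaults, not taken from [6]).

```python
def nw_score(x, y, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences x and y."""
    m, n = len(x), len(y)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap                       # align x[:i] against gaps only
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,  # substitution / match
                          D[i - 1][j] + gap,    # gap in y
                          D[i][j - 1] + gap)    # gap in x
    return D[m][n]

print(nw_score("AAA", "AAA"), nw_score("AAA", "AA"))  # 3 1
```

The anti-diagonals of the table D are independent, which is the classical source of parallelism such DP algorithms expose on GPUs and multicores.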
------------------------ Objective ------------------------
This work aims to propose a Kokkos-based tool to solve the problem. This tool should make it possible to perform large-scale experiments. To measure its performance, a comparison with solutions based on other programming models will be necessary. The candidate will compare several metrics, such as execution time and energy consumption.
------------------------ Keywords ------------------------
Dynamic programming, high-performance computing, parallel programming model, Kokkos
------------------------ Required profile ------------------------
* Good programming skills (especially in C++) are required for experimental validation. * Basic knowledge of programming models such as OpenMP, MPI, and CUDA is required.
------------------------ Advisors ------------------------
* Thierry GAUTIER, INRIA Researcher at LIP (http://perso.ens-lyon.fr/thierry.gautier) * Jerry LACMOU ZEUTOUO, Researcher Engineer at LIP
------------------------ How to apply? ------------------------
Send an email to thierry.gautier.fr and jerry.lacmou-zeutouo.fr with your CV, a short text describing your motivation, and any document that can support your application.
------------------------ References ------------------------
1. Alberto Apostolico, Mikhail J. Atallah, Lawrence L. Larmore, and Scott McFaddin. 1990. Efficient Parallel Algorithms for String Editing and Related Problems. SIAM J. Comput. 19, 5 (1990), 968-988. https://doi.org/10.1137/0219066
2. Vincent Derrien. 2008. Heuristiques pour la résolution du problème d'alignement multiple. Ph.D. Dissertation. Université d'Angers.
3. James Wayne Hunt and M Douglas MacIlroy. 1976. An algorithm for differential file comparison. Bell Laboratories Murray Hill.
4. Charu Sharma and AK Vyas. 2014. Parallel approaches in multiple sequence alignments. International journal of advanced research in computer science and software engineering 4, 2 (2014), 264-276.
5. Al Erives. 2019. Maximal homology alignment: A new method based on two-dimensional homology. bioRxiv (2019), 593228.
6. Thierry Lecroq, Alexandre Pauchet, Emilie Chanoni, Gerardo A. Solano. 2012. Pattern discovery in annotated dialogues using dynamic programming. International Journal of Intelligent Information and Database Systems 6, 6 (2012), 603-618. https://dx.doi.org/10.1504/IJIIDS.2012.050097
7. C. R. Trott et al. 2022. Kokkos 3: Programming Model Extensions for the Exascale Era. IEEE Transactions on Parallel and Distributed Systems 33, 4 (2022), 805-817. https://doi.org/10.1109/TPDS.2021.3097283
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-52 Graph Modelling for the Verification of Flight Constraints
|
|
Description
|
|
ONERA is the French public research centre for aerospace. Within ONERA, the DTIS department gathers research in computer science, mathematics, and automatic control. In this department, the MIDL research unit focuses on the use of formal methods in the context of large-scale aerospace systems.
The proposed internship takes place in collaboration with Laure Petrucci (Univ. Sorbonne Paris Nord) and focuses on the use of timed Petri nets to verify formally a conflict management module in an air traffic management system.
When applying this kind of method to a large-scale system, it is necessary to limit the combinatorial explosion. To this end, certain details must be abstracted away to limit the complexity of the models and make reasoning easier, before later reintroducing the abstracted details through refinement operations that must preserve the semantics of the models.
The considered system is an automated local conflict management system that adjusts an initial flight plan in real time in response to hazards (avoidance of dangerous zones, compensation of delays, ...) while still ensuring key safety properties (flyability of the trajectories, robustness to delays or emergency situations, ...).
The specific difficulty identified in the considered system is that some properties to be verified have a geometrical component, which is known to be difficult to discretize without losing too much information.
To tackle them, the intern will start from previous work that modelled the behaviour of an unmanned aircraft system, in an anticollision perspective, under changing wind conditions, and will analyze a graph of checkpoints to identify relationships between flight plans and geometrical properties. This way, the physical constraints will be converted into discrete properties following the abstraction process described above.
The internship will focus on one family of safety properties, but there is an opportunity to pursue it as a PhD cofinanced by ONERA and Univ. Sorbonne Paris Nord in the more general case.
URL sujet detaillé : https://w3.onera.fr/stages/sites/w3.onera.fr.stages/files/dtis-2023-48.pdf
Remarques : The internship takes place within a collaboration between ONERA, which has great expertise in formal methods for aerospace systems, and LIPN at Univ. Sorbonne Paris Nord, which has world-class expertise in model-checking techniques such as timed Petri nets. Depending on the status of the student, the internship may be paid.
|
|
|
|
|
SM207-53 Invariant Deep Learning Models for Object Detection
|
|
Description
|
|
Industrial adoption of Artificial Intelligence algorithms, in particular those based on neural networks, relies on the ability to show a high degree of robustness to different perturbations of the input data, while keeping the need for additional training inputs as low as possible. In many industrial areas, the available training data is scarce, and data augmentation techniques plus transfer learning are needed to obtain acceptable performance. However, these techniques are difficult to validate, in particular in terms of perturbation coverage, and lack theoretical trustworthiness criteria or metrics. Methods based on Convolutional Neural Networks (CNNs) are the state of the art in applications such as image classification and object detection. They are interesting because they provide robustness to translations of the objects to detect/classify. Through weight sharing, the result is inherently translation invariant, meaning that the detection does not depend on the position of the object in an image (ignoring border effects). However, CNNs are not invariant to other important transformations such as rotation, scaling, or contrast changes. During this internship, we propose to study differential invariants [1,4] and group-equivariant CNNs [2] for object detection in remote sensing or aerial images.
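The translation property that weight sharing buys a convolutional layer can be checked numerically; the small example below (ours, not from the internship subject) verifies that circular convolution with a shared kernel commutes with translation, i.e. shifting the input then convolving equals convolving then shifting the output.

```python
import numpy as np

def conv1d_circular(x, k):
    """Circular 1-D convolution with kernel k (shared weights)."""
    n = len(x)
    return np.array([sum(k[j] * x[(i - j) % n] for j in range(len(k)))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.25, 0.25])
lhs = conv1d_circular(np.roll(x, 2), kernel)   # translate, then convolve
rhs = np.roll(conv1d_circular(x, kernel), 2)   # convolve, then translate
print(np.allclose(lhs, rhs))                   # True
```

Group-equivariant CNNs [2] extend exactly this commutation property from translations to larger groups such as rotations.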
INTERNSHIP DESCRIPTION
In this context, your main missions as part of this internship will be to:
1) Study the state of the art on equivariant CNNs [1-4] (and others). New theoretical contributions are welcome! 2) Develop and benchmark different approaches for object detection on an existing remote sensing or aerial image database. 3) Write a report presenting the methods and conclusions of the internship. 4) Take part in Machine Learning / Deep Learning / Signal Processing / Radar meetings to share the internship findings with the scientific community.
Bibliography
[1] M. Sangalli, S. Blusseau, S.Velasco-Forero and J. Angulo, Moving Frame Net: SE(3)-Equivariant Network for Volumes, NeurReps Workshop, Symmetry and Geometry in Neural Representations, New Orleans, Louisiana, USA, 2022.
[2] Taco Cohen and Max Welling, "Group Equivariant Convolutional Networks", Proceedings of The 33rd ICML, PMLR 48:2990-2999, 2016.
[3] P.-Y. Lagrave and M. Riou, "Toward Geometrical Robustness with Hybrid Deep Learning and Differential Invariants Theory", Proceedings of AAAI Spring Symposium MLPS, 2021
[4] M. Sangalli, S. Blusseau, S.Velasco-Forero and J. Angulo, "Differential invariants for SE (2)-equivariant networks", International Conference in Image Processing, 2022 (Oral Presentation)
URL sujet detaillé :
:
Remarques : This is an M2 internship with the participation of Thales. The student will split most of their time between Massy (Thales) and Paris (Mines Paris).
|
|
|
|
|
SM207-54 Code Synthesis for the Evaluation of Special Functions
|
|
Description
|
|
This project is part of a long-term effort to automate the machine-precision floating-point implementation of special functions (mathematical functions slightly less ubiquitous than elementary functions like sin, cos, log, yet "common enough to deserve a name") and other mathematical functions appearing in specialized applications. Code generation has the potential to allow techniques developed for elementary functions to be applied at low cost to a much wider class of functions. This should provide faster and more accurate implementations that can also be better tailored to each application.
The intern will focus on Bessel functions, a family of special functions of central importance in mathematical physics. The aim of the project will be to automate the techniques used in a state-of-the-art human-written implementation of Bessel functions and integrate the result in a code generation framework under development.
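For concreteness, here is a minimal power-series evaluation of the Bessel function J0 (a sketch of ours, not the project's code generator): J0(x) = sum over k >= 0 of (-1)^k (x/2)^(2k) / (k!)^2. Production-quality implementations switch between several such formulas depending on the argument range, which is precisely the kind of decision the code generator aims to automate.

```python
def bessel_j0(x, terms=30):
    """Truncated power series for J0; accurate for small |x| only."""
    total, term = 0.0, 1.0          # term starts at k = 0: (-1)^0 (x/2)^0 / (0!)^2 = 1
    for k in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((k + 1) ** 2)   # ratio term_{k+1} / term_k
    return total

print(bessel_j0(0.0))   # 1.0
```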
URL sujet detaillé : http://marc.mezzarobba.net/offers/metabessel.pdf
Remarques :
|
|
|
|
|
SM207-55 Reliable numerical integration
|
|
|
|
SM207-56 Fast evaluation of elementary functions with medium precision
|
|
|
|
SM207-57 Sparse interpolation of rational functions
|
|
|
|
SM207-58 Numerical approach to structural parameter identifiability
|
|
|
|
SM207-59 How Can AI Advance the Building of Domain-Specific Languages?
|
|
Description
|
|
Software Engineering and Artificial Intelligence (AI) can be used side by side to better deal with the modelling, checking, and validation of IoT-based systems. Our aim is to propose a rigorous domain-specific language (DSL) applied to, though not exclusively, healthcare systems. In this context, we apply our approach to real-world case studies dealing in particular with two medical sensors, namely EEG and ECG. This topic follows data-driven approaches to derive models from observed data. We aim to use AI concepts such as Machine or Deep Learning for the modelling (including the syntax and semantics levels) of a target IoT domain-specific language (DSL). The use of AI enables us to create more precise and efficient models, which eventually yields better data analysis (control) applications. The models would be regularly updated and refined, e.g., once a week or once a month, with respect to the historical data accumulated from the domain environment, which can be gathered from, e.g., a big data warehouse.
URL sujet detaillé : https://github.com/omeriem/Offres-de-Stage-M2.git
Remarques : Co-supervisors: - Lotfi Chaari (lotfi.chaari.fr)
- Jean Baptist Raclet (raclet.fr)
This internship is funded by the IBCO - CIMI project (2021-2023)
|
|
|
|
|
SM207-60 (Deep) Learning the Mechanics of Cell-based Materials
|
|
Description
|
|
The OM theory team aims to understand the physics of living materials by combining analytical active matter theories, numerical models, and new image analysis techniques.
Epithelial tissues line our organs and protect them from damage. Most cancers originate from the epithelium.
A long-term objective in the field is to build an algorithm to infer the mechanical properties of epithelial tissues based on time-lapse images. As rigidity is a clinical criterion for recognizing pre-cancerous tissues, such an algorithm could provide new insights into the growing volume of data from live endoscopies.
The cellular material theoretical framework, first introduced for liquid foams, has been successfully applied to the understanding of confluent epithelial tissues [1]. This theoretical toolbox allows us to infer the tissue's mechanical properties from correlations between the local cell shape and strain-rate tensor fields [2]. Yet, in most cases, epithelial tissues remain much more difficult to analyze than liquid foams, due to the complexity of tracking individual cell shapes over extended periods of time.
Together with Marc Karnat (Ph.D. student in the OM team) and Sham Tlili (CNRS researcher at IBDM, Marseille), we have proposed a set of image analysis tools that allow us to directly assess the epithelial tissue visco-elastic properties. Our technique unraveled a relationship between pharmacological treatments and tissue mechanics in several in vitro experiments.
During this internship, we propose to use our vertex model [3] - with well-controlled mechanical properties - to expand the capability of the deep learning code currently developed in the team. Our objective is to infer the mechanical properties of our simulated tissues. The second phase of the internship will consist in applying our trained deep-learning tool to in vitro tissue data from our collaborators [4].
1. F. Graner, B. Dollet, ... P. Marmottant, Discrete rearranging disordered patterns, EPJE (2008). 2. Tlili et al. Physical Review Letters (2020). 3. S.-Z. Lin, M. Merkel, J.-F. Rupprecht arxiv (2022) 4. S. Sonam, L. Balasubramaniam, S.-Z. Lin, ... J.-F. Rupprecht, B. Ladoux, Nature Physics (2022)
URL sujet detaillé :
:
Remarques : Monthly remuneration possible.
|
|
|
|
|
SM207-61 Model Checking for Malware (Virus) Detection
|
|
Description
|
|
The number of malware samples that produced incidents in 2010 exceeds 1.5 billion. Malware can cause serious damage: e.g., the MyDoom worm slowed down global internet access by ten percent in 2004, and authorities investigating the 2008 crash of Spanair flight 5022 discovered that a central computer system used to monitor technical problems in the aircraft was infected with malware. It is thus crucial to have efficient, up-to-date virus detectors. Existing antivirus systems use various detection techniques to identify viruses, such as (1) code emulation, where the virus is executed in a virtual environment to be detected; or (2) signature detection, where a signature is a pattern of program code that characterizes the virus. A file is declared to be a virus if it contains a sequence of binary code instructions that matches one of the known signatures. Each virus variant has its corresponding signature. These techniques have some limitations. Emulation-based techniques can only check the program's behavior in a limited time interval; they cannot check what happens after the timeout, and thus might miss the viral behavior if it occurs after this interval. As for signature-based systems, it is very easy for virus developers to get around them: it suffices to apply obfuscation techniques that change the structure of the code while keeping the same functionality, so that the new version does not match the known signatures. Obfuscation techniques can consist in inserting dead code, substituting instructions with equivalent ones, etc. Virus writers update their viruses frequently to make them undetectable by these antivirus systems.
To sidestep these limitations, instead of executing the program or making a syntactic check over it, virus detectors need to use analysis techniques that check the behavior (not the syntax) of the program in a static way, i.e. without executing it. Towards this aim, we propose to use model-checking for virus detection.
Model checking is a mathematical formalism that can check whether a system satisfies a given property. It consists in representing the system by a mathematical model M and the property by a formula f in a given logic, and then checking whether the model M satisfies the formula f. Model checking has already been applied to malware detection. However, existing works have some limitations: the specification languages they use have only been applied to specify and detect a particular set of malicious behaviors, and cannot be used to detect all virus behaviors.
Thus, one of the main challenges in malware detection is to come up with specification formalisms and detection techniques that are able to specify and detect a larger set of viruses.
The purpose of this internship is thus to:
1- Define an expressive logic that can be used to compactly express malicious behaviors. Our goal is to be able to express a large set of malicious behaviors that were not considered previously.
2- Define an efficient model-checking algorithm for this logic.
3- Reduce the malware detection problem to the model-checking problem of this logic.
4- Implement these techniques in a tool for malware detection and apply this tool to detect several malwares.
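As a toy illustration of the intended approach, the sketch below (all state and API names hypothetical) models a program as a labeled transition system and checks by explicit-state exploration whether some execution path performs a GetModuleFileName call followed later by a CopyFile call, a self-replication pattern often cited as an example of malicious behavior. Real model-checking-based detectors work on richer models (e.g., pushdown systems) and richer logics; this is only a reachability check sketching the idea.

```python
# Toy "program model": a labeled transition system, and a check for the
# behavior "GetModuleFileName happens, then CopyFile happens later".
# Explicit-state exploration over (state, flag) pairs.

def violates(transitions, start, first, second):
    """Return True iff some path fires `first` and later `second`."""
    stack = [(start, False)]
    seen = set()
    while stack:
        state, got_first = stack.pop()
        if (state, got_first) in seen:
            continue
        seen.add((state, got_first))
        for action, nxt in transitions.get(state, []):
            now = got_first or action == first
            if now and action == second:
                return True
            stack.append((nxt, now))
    return False

# Hypothetical model of a suspicious program.
model = {
    "s0": [("GetModuleFileName", "s1"), ("ReadFile", "s0")],
    "s1": [("CopyFile", "s2")],
    "s2": [],
}
print(violates(model, "s0", "GetModuleFileName", "CopyFile"))  # True
```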
URL sujet detaillé : https://lipn.univ-paris13.fr/~touili/sujet.pdf
Remarques :
|
|
|
|
|
SM207-62 Advanced EM injection devices for the hardware security characterization of SoCs
|
|
Description
|
|
In order to exploit physical attacks in the forensic field as a way to bypass the security mechanisms of mobile devices such as smartphones, the main objective of this internship proposal consists in working on the characterization and modelling of a new attack approach based on Electro-Magnetic Fault Injection (EMFI). The CEA team has shown the potential of EMFI to corrupt security functions of complex targets such as the SoC (System-On-Chip) of a smartphone, allowing a privilege elevation by authenticating with an illegitimate password [1], or potentially allowing the extraction of a secret from an enclave such as the TrustZone [2]. However, advanced tools are increasingly required for the characterization of this type of target, since the technology used in the design of new SoCs keeps shrinking (e.g., 7 nm technology based on planar transistors for the Samsung S20). Consequently, attacks exploiting hardware flaws require state-of-the-art equipment that is not currently marketed (very small diameter probes, low output impedance injectors, etc.). The main mission of this internship is therefore to characterize a new electromagnetic pulse injection attack device, designed internally at the LTSO laboratory of CEA Leti [3].
[1] C. Gaine, D. Aboulkassimi, S. Pontie, J.-P. Nikolovski, and J.-M. Dutertre, "Electromagnetic Fault Injection as a New Forensic Approach for SoCs", in 2020 IEEE International Workshop on Information Forensics and Security (WIFS), 2020. https://www.wifs2020.nyu.edu/ [2] P. Leignac, O. Potin, J.-M. Dutertre, J.-B. Rigaud, and S. Pontie, "Comparison of side-channel leakage on Rich and Trusted Execution Environments", in 6th Workshop on Cryptography and Security in Computing Systems, 2019. https:// [3] https://www.leti-cea.fr/
URL sujet detaillé : https://filesender.renater.fr/?s=download&token=c78f31a2-d384-4879-98cf-eaafbf305017
Remarques : The candidate must be in the final year of a Master's programme or engineering school in Computer Science, Electronics, or Cybersecurity. Prior experience with physical attacks is not mandatory, but appreciated. The intern will be paid according to the CEA salary scales.
This internship may potentially be continued with a PhD thesis. The internship will start in 2023.
|
|
|
|
|
SM207-63 Static analysis of pseudo-LRU caches
|
|
Description
|
|
The naive vision of cache memory is that it stores "the most recently accessed data". In reality, each memory block can be stored in one specific part of the cache memory (the cache set determined by the block's address), and furthermore the block evicted from the cache set is not necessarily the least recently used (LRU) one. In fact, many processors implement "pseudo-LRU" policies, which are cheaper to implement in hardware and have similar practical performance. For safety-critical hard real-time applications, it is necessary to prove bounds on the worst-case execution time (WCET). For this, it is necessary to know which accesses are cache hits or misses. There are several good analyses, including "exact" ones, for LRU caches in the scientific literature. There are no such good analyses for pseudo-LRU policies. In fact, it can be shown that static analysis for pseudo-LRU policies belongs to higher complexity classes than for LRU. The topic of the internship (possibly leading to a thesis) is to research good and practically efficient analyses for pseudo-LRU policies.
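To make the LRU/pseudo-LRU gap concrete, here is a minimal simulation of the common tree-based pseudo-LRU policy for one 4-way set (a sketch; actual replacement policies vary by processor). On the access sequence below, tree-PLRU evicts a block that true LRU would keep:

```python
class TreePLRU4:
    """One 4-way cache set with tree-based pseudo-LRU replacement.
    Three tree bits; bit == 0 means "victim is in the left subtree"."""

    def __init__(self):
        self.bits = [0, 0, 0]      # [root, left pair, right pair]
        self.ways = [None] * 4     # cached blocks

    def _touch(self, way):
        # Point every bit on the path *away* from the accessed way.
        self.bits[0] = 1 if way < 2 else 0
        if way < 2:
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[2] = 1 if way == 2 else 0

    def _victim(self):
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3

    def access(self, block):
        """Access a block; return the evicted block on a miss, else None."""
        if block in self.ways:
            self._touch(self.ways.index(block))
            return None
        way = self._victim()
        evicted, self.ways[way] = self.ways[way], block
        self._touch(way)
        return evicted

plru = TreePLRU4()
for blk in ["A", "B", "C", "D"]:    # cold misses fill the set
    plru.access(blk)
plru.access("B")                    # hit
plru.access("C")                    # hit
evicted = plru.access("E")          # miss: tree bits pick the victim
print(evicted)                      # "D" (true LRU would evict "A")
```

Here true LRU would evict "A", the least recently used block, while the tree bits designate "D": exactly the kind of divergence between the two policies that makes static cache analysis harder for pseudo-LRU.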
URL sujet detaillé :
:
Remarques : co-supervised with Claire Maïza
|
|
|
|
|
SM207-64 Luminous Robot Swarms in 3D Environments
|
|
Description
|
|
The main goal of this Master internship is to study coordination problems for luminous robot fleets in 3D environments. Classical problems are exploration (going through every location of the discrete environment), gathering (assembling every robot to the same location), and scattering (dispersing robots to different locations).
The aim is to study feasibility, i.e., which abilities of the robots are necessary to fulfill their objective, and complexity, i.e., how many robots and how many colors are necessary in the various settings.
URL sujet detaillé : https://sancy.iut.uca.fr/~durand/sujetMasterRobots.pdf
Remarques : Possibility to continue with a Ph.D. thesis on related topics.
|
|
|
|
|
SM207-65 Algebraic analysis of the MinRank problem
|
|
Admin
|
|
Encadrant : Magali BARDET |
Labo/Organisme : LITIS Laboratoire d'Informatique, du Traitement de l'Information et des Systèmes UR 4180. |
URL : http://magali.bardet.free.fr |
Ville : ROUEN (St-Etienne-du-Rouvray) |
|
|
|
Description
|
|
Algebraic cryptanalysis of the MinRank Problem.
The MinRank problem is very simple to state: given K matrices with coefficients in a finite field GF(q) and an integer r, find a linear combination of these matrices whose rank is r. It is an NP-complete problem that is used to build post-quantum signature schemes or to attack systems. The goal of the internship is to understand the different modelings of the problem and to improve the methods for solving it.
See the detailed research subject.
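To make the problem statement concrete, here is a brute-force sketch over GF(2), using the common "rank at most r" variant. It is exponential in K and purely illustrative: actual algebraic cryptanalysis relies on modelings (e.g., Kipnis-Shamir or minors modelings), not enumeration.

```python
from itertools import product

def rank_gf2(M):
    """Rank over GF(2) of a 0/1 matrix, by Gaussian elimination."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank

def minrank_bruteforce(mats, r):
    """Try every nonzero GF(2) combination; return the coefficients of
    one whose rank is at most r, or None. Exponential in K."""
    n, m = len(mats[0]), len(mats[0][0])
    for coeffs in product([0, 1], repeat=len(mats)):
        if not any(coeffs):
            continue
        comb = [[0] * m for _ in range(n)]
        for c, M in zip(coeffs, mats):
            if c:
                comb = [[a ^ b for a, b in zip(ra, rb)]
                        for ra, rb in zip(comb, M)]
        if rank_gf2(comb) <= r:
            return coeffs
    return None

# Toy instance: neither matrix has rank <= 1 alone, but their sum does.
M0 = [[1, 0], [0, 1]]
M1 = [[1, 1], [0, 1]]
print(minrank_bruteforce([M0, M1], 1))  # (1, 1)
```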
URL sujet detaillé : https://ent.normandie-univ.fr/filex/get?k=a7UwBuurlNeQDAhAFe5
Remarques : Possibility of continuing with a PhD thesis after the internship.
|
|
|
|
|
SM207-66 Interoperability of proof systems
|
|
Description
|
|
This project is about formal proofs as digital objects, more specifically the translation of formal proofs between different proof systems. Formal proofs are used in mathematics but also in the industry for certifying the correctness of protocols, software and hardware.
Interoperability is a very important feature in computer science and engineering to avoid useless work duplication and allow more safety. Unfortunately, interoperability between proof systems is not well developed. One important difficulty is that proof systems may have incompatible features: their combination may be inconsistent. Therefore, to translate a proof from one system to the other, we need to analyze the features of the first system used in the proof and check whether they are compatible with the features of the target system.
The λΠ-calculus modulo rewriting, and its implementation Dedukti, is a powerful logical framework allowing users to define their own logics and represent proofs in those logics [1,2]. For instance, one can represent in Dedukti first-order logic and its proofs, simple type theory and its proofs, the Isabelle logic and its proofs [5], the Coq logic and its proofs [3], the Agda logic and its proofs [6], etc. In addition, there are a number of tools for transforming those proofs and translating them back to various other systems: HOL-Light, Coq, PVS, Lean, etc. [7].
There exist several tools for checking the correctness of Dedukti files: dkcheck, kontroli and lambdapi. While dkcheck and kontroli are mere checkers taking complete Dedukti files as input, Lambdapi is a proof assistant featuring implicit arguments, type inference, coercions, tactics, the possibility of calling external automated theorem provers, etc. for building Dedukti proofs interactively.
In this context, I propose different internship subjects, more or less theoretical/practical, related to automated theorem proving or interactive proof assistants, possibly with international collaborations: - Certifying PVS proofs - Generating Dedukti proofs from SMT solver proofs - Translating Lean to Dedukti - Inter-system translation of recursive functions - Integrating and reusing Dedukti proofs in other provers
Find more details on each subject on https://blanqui.gitlabpages.inria.fr/.
URL sujet detaillé : https://blanqui.gitlabpages.inria.fr/
Remarques :
|
|
|
|
|
SM207-67 Improving sparse penalties with non-convexity and coefficient clustering
|
|
Description
|
|
See pdf for mathematical details.
Sparse optimization problems have become ubiquitous in high-dimensional regression and classification. They are usually formulated as composite penalized optimization problems, with a datafit term and a penalty. Non-convex penalties such as Lp (0 < p < 1), SCAD or MCP have been shown to mitigate the well-known amplitude bias of their convex counterparts. Among these, the Minimax Concave Penalty (MCP) (Zhang, 2010) optimally satisfies the necessary conditions for unbiased recovery of sparse signals (Soubies et al., 2017). The supervisors have begun investigating various open questions in this field. The intern will be able to explore one (or both) of the following research directions, depending on his/her interests. - Improving the minimization of MCP: computing the solution of MCP-regularized problems is a challenge, due to the non-convexity of the latter, which leads to the existence of local minima that are not global. The first topic of this internship will be to improve the robustness of numerical solvers to spurious local minimizers. To that end, the intern will build upon SparseNet (Mazumder et al., 2011), which combines graduated non-convexity (Mobahi and Fisher, 2015) with warm-start strategies so as to compute the full MCP regularization path (i.e., all possible solutions from a large lambda to a small one). In particular, the goal is to exploit known properties of MCP (Soubies et al., 2020), as well as fast solvers (Bertrand et al., 2022), to improve and accelerate the search strategy. - The second topic of this internship will be to study the numerical feasibility of solvers for SLOPE-like MCP regression problems (see the pdf). SLOPE's practical success relies on the efficient computation of its proximal operator (Zeng and Figueiredo, 2014), which allows the use of fast first-order algorithms (Parikh and Boyd, 2014).
As a start, the intern will thus work on the computation of the SLOPE-MCP proximal operator so as to efficiently deploy first-order algorithms. Then, we shall also consider the use of iteratively reweighted algorithms.
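As background for the prox computations mentioned above, the scalar MCP proximal operator has a standard closed form (unit step size, gamma > 1); a minimal sketch:

```python
import math

def prox_mcp(x, lam, gamma):
    """Proximal operator of the Minimax Concave Penalty, scalar case,
    unit step size, gamma > 1 (standard closed form)."""
    ax = abs(x)
    if ax <= lam:
        return 0.0                  # small coefficients are set to zero
    if ax <= gamma * lam:
        # concave region: soft-threshold, then rescale
        return math.copysign((ax - lam) / (1.0 - 1.0 / gamma), x)
    return float(x)                 # large coefficients: identity (no bias)

print(prox_mcp(0.5, 1.0, 3.0))  # 0.0
print(prox_mcp(2.0, 1.0, 3.0))  # ~1.5, i.e. (2 - 1) / (1 - 1/3)
print(prox_mcp(5.0, 1.0, 3.0))  # 5.0
```

Unlike soft-thresholding (the Lasso prox), which shrinks every surviving coefficient by lambda, MCP leaves large coefficients untouched, which is the unbiasedness property discussed above. The SLOPE-MCP prox targeted by the internship has no such simple separable form.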
URL sujet detaillé : https://mathurinm.github.io/assets/pdf/sujet_stage_M2_nonconvexity.pdf
Remarques : Co-supervision with Emmanuel Soubiès (https://www.irit.fr/~Emmanuel.Soubies/). Internship in Lyon or Toulouse depending on the candidate's preference.
|
|
|
|
|
SM207-68 Mining data from social networks
|
|
Description
|
|
The APs project (for "Augmented Proxemic services") is a research project carried out within a cross-border Franco-Spanish thesis, aiming to set up a generic framework to (1) collect, (2) process, (3) analyze and then (4) leverage data from social networks, relying on the theory of proxemics. We choose the tourism domain as the case study for our experiments.
URL sujet detaillé : https://iutbayonne-my.sharepoint.com/
Remarques :
|
|
|
|
|
SM207-69 Formally verified optimizations for safety-critical embedded code
|
|
Description
|
|
Safety-critical applications are generally deployed on microcontrollers or specialized processors, and incur specific procedures for validation and qualification. General-purpose compilers, such as gcc or clang, may occasionally miscompile programs. CompCert is a formally verified compiler for the C programming language, in the sense that there is a machine-checked proof that, when compilation succeeds, it produces assembly code whose execution matches that of the source code. Correctness is not everything. Safety-critical systems must generally meet real-time specifications. Compiler optimizations result in faster code capable of meeting these specifications. At present CompCert is only moderately optimizing. At Verimag, we have already added various optimizations to CompCert (prepass and postpass scheduling, global common subexpression elimination, loop-invariant code motion, strength reduction...). Yet, there is still much to do for certain kinds of applications and certain target platforms, in particular 32-bit ARM, which is popular for embedded applications. The purpose of the internship is to investigate which optimizations CompCert is missing on certain classes of programs, and how to implement and prove them correct.
URL sujet detaillé : https://www-verimag.imag.fr/Formally-verified-optimizations-for.html
Remarques : Co-advised with Sylvain Boulmé. The internship may be followed by a CIFRE (part-time in industry) thesis.
|
|
|
|
|
SM207-70 Security counter-measures in a certified optimizing compiler
|
|
Description
|
|
CompCert is a compiler from the C programming language to the assembly languages of several processor architectures. In contrast to compilers such as Visual C++, GCC, or LLVM, its compilation phases are proved mathematically correct, and thus the compiled program always matches the source program: the formal correctness of CompCert states that if the compiler succeeds in producing an executable, then the observable behaviors of the executable are also observable behaviors of the source program. Other compilers may contain bugs that in some cases result in incorrect code being generated. The possibility of a compilation bug cannot be tolerated in certain applications with high safety requirements, where costly solutions such as disabling all optimizations are then used to get assembly code that is close to the source. In contrast, CompCert, despite not optimizing as well as gcc -O3 or clang -O3, allows using optimizations safely.
Security of embedded systems like smart-cards, secured dongles and IoTs relies on the robustness of devices against physical fault attacks (such as laser or electromagnetic attacks). Current certification schemes (e.g., Common Criteria) require protection mechanisms against multi-fault injection attacks. Typically, these protections often consist in Counter-Measures (CM): monitors which perform redundant computations in order to detect/prevent some attacks.
Some of these CMs are "manually" written in the source code by developers. For example, the counter of the number of attempts at typing a PIN code is duplicated in order to make successful hardware attacks on this counter more complex. However, because such CMs perform redundant computations during attack-free executions, optimizing compilers may remove them.
A solution has been experimented with in the LLVM compiler. It consists in introducing observations of the program state that are intrinsic to the correct execution of security protections, along with means to specify and preserve these observations across the compilation flow. Such observations complement the input/output semantics-preservation contract of compilers. In practice, they are given as annotations in the source code.
Building on these works, the internship will study how to use the formal notion of observable events in CompCert, in order to ensure that CMs are not removed by the compiler (as a consequence of its formal correctness).
URL sujet detaillé : https://www-verimag.imag.fr/Security-counter-measures-in-a-1019.html
Remarques : Co-advised with Sylvain Boulmé. This wide topic is also the subject of a future PhD thesis proposed as an extension to the internship.
|
|
|
|
|
SM207-71 Algorithm for reconstructing a continuous form from a discrete object
|
|
Description
|
|
Objects present in pixelated images are described by integer coordinates in the discrete space Z^n. Their geometrical and topological properties then differ from those of objects in a continuous space (R^n). One of the objectives of discrete geometry is to define operations on these discrete objects (transformations of the plane, estimators of geometric quantities, ...) reproducing as faithfully as possible the topological and geometric properties of continuous objects. In order to verify the transfer of these properties from the continuous to the discrete setting, discretization processes have been proposed to define a discrete equivalent for each continuous object. It then becomes possible to compare the topological properties and the geometric characteristics of the continuous object with those of its discretization. It is nevertheless necessary to impose certain regularity constraints on the continuous object so that its properties are preserved by the discretization. In a previous paper, we introduced a new family of objects, LTB (Locally Turn-Bounded) curves, verifying a regularity constraint that includes both polygons and smooth curves. We have shown that certain configurations cannot appear in the discretization of LTB curves, and have deduced properties of preservation of topology and of geometric characteristics (perimeter, curvilinear integral). The objective of the internship is to verify, through a case study, that each discrete object not containing these excluded configurations is indeed the discretization of an LTB shape, and to propose a reconstruction algorithm for this LTB shape. Otherwise, the objective is to identify exhaustively all the configurations excluded from the discretization of an LTB shape. This reconstruction method will be implemented in the sequel of the internship. Following the strategy proposed in [Ngo et al, ICPR 18], this reconstruction algorithm could make it possible to define rigid transformations of the discrete plane preserving the topology of the object.
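For illustration, one standard discretization process of the kind discussed above is Gauss digitization, which keeps the integer points lying inside the continuous shape; a minimal sketch:

```python
def gauss_digitization(inside, bbox):
    """Keep the points of Z^2 lying inside a continuous shape,
    given by its indicator function `inside` and a bounding box."""
    (xmin, xmax), (ymin, ymax) = bbox
    return {(x, y)
            for x in range(xmin, xmax + 1)
            for y in range(ymin, ymax + 1)
            if inside(x, y)}

# Example: a disk of radius 2.5 centred at the origin.
disk = lambda x, y: x * x + y * y <= 2.5 ** 2
pixels = gauss_digitization(disk, ((-3, 3), (-3, 3)))
print(len(pixels))  # 21 integer points
```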
URL sujet detaillé : https://seafile.unistra.fr/f/9c972aaf22bd41eb9009/
Remarques : co-advised by E. Le Quentrec
|
|
|
|
|
SM207-72 Generalization of plane permutations to 3D
|
|
Description
|
|
During this internship we will generalize known results on permutations to 3-permutations. More particularly, we will be interested in plane permutations.
URL sujet detaillé : https://www.labri.fr/~bonichon/plane_permutation_3D.pdf
Remarques : Internship co-supervised with Philippe Duchon (https://www.labri.fr/perso/duchon/)
|
|
|
|
|
SM207-73 Formal language theory with monoidal categories
|
|
Description
|
|
Monoidal categories provide a convenient and general setting for the study of formal language theory, unifying several concepts.
An example is the class of regular monoidal languages, with a paper available at the link below. This work was presented at MFCS 2022.
https://arxiv.org/abs/2207.00526
There is much work to do continuing in this direction, including the study of automata and recognisability, algebraic approaches, extensions to other language classes, etc.
The candidate will join a vibrant research team consisting of Prof. Sobocinski, eight PhD students and three postdocs. The successful candidate will meet weekly with Sobocinski, Matthew Earnshaw and Fosco Loregian to work on this research thread.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-76 Interoperability of Proof Systems
|
|
Description
|
|
The goal of this internship is to propose a pragmatic approach to translate, as automatically as possible, a large Coq proof library into an equivalent Lean proof library.
URL sujet detaillé : https://dpt-info.u-strasbg.fr/~magaud/Master_GeoLean.pdf
Remarques : Co-supervised with Julien Narboux.
Standard internship gratuity.
|
|
|
|
|
SM207-77 Dynamic Shape Analysis for Sparse Tensor Codes
|
|
Description
|
|
Sparse data structures are widely used in high-performance computing and machine learning. They make the code behavior highly dynamic and complicate the compilation process. Our overall objective is to delay the optimization of sparse code until runtime, when the sparse structure is known.
In this internship, we propose to address the very first step, which is to infer automatically, at runtime, the shape of the tensors flowing between the loop kernels of a scientific application. Well-known static analysis techniques such as linear relation analysis and, more generally, abstract interpretation might be involved.
This internship may possibly be followed by a PhD thesis.
More details: https://gitlab.inria.fr/alias/cours-m2-cr14/-/blob/master/2022/part-1-polyhedral-model/stages/stage-sparse.pdf
URL sujet detaillé : https://gitlab.inria.fr/alias/cours-m2-cr14/-/blob/master/2022/part-1-polyhedral-model/stages/stage-sparse.pdf
Remarques : Co-advised with ENS Paris (Xavier Rival) and Université de Strasbourg (Philippe Clauss). Stipend: 500 euros/month.
Possible continuation with a PhD thesis.
|
|
|
|
|
SM207-78 Modeling complex networks
|
|
Description
|
|
Many datasets from a wide variety of contexts, such as the social sciences, biology, linguistics, medicine, transportation, communications and others, are organized as networks. Understanding the structure of these very large complex networks is a major challenge for answering the questions raised about these objects in these different contexts.
One of the main problems in the study of these objects is modeling, that is, the random generation of synthetic networks having the same properties as those observed on networks from real contexts. On this question, the field has been stuck for many years on the difficulty of designing models that are sufficiently generic and able to reproduce the four properties most widely shared by complex networks: a low global density, a low average distance, a heterogeneous degree distribution, and a high local density. The last property is the most problematic one (the first three are satisfactorily reproduced by the configuration model), because the form of correlation it introduces between links is difficult to reproduce in a random generation process: links are more likely when their endpoints have a common neighbor, thus forming a triangle.
The goal of this internship is to design a random generation method based on the configuration model which additionally takes into account the propensity of links to close a triangle. The skills needed to achieve this goal are the following: - basic knowledge of graphs and probability - ease in programming and in manipulating large datasets - ability to synthesize and analyze experimental results - oral and written communication on scientific and technical topics.
The internship will take place at the I3S laboratory in Sophia Antipolis, near Nice.
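As a baseline, the stub-matching configuration model mentioned above can be sketched as follows (a simplified version that silently discards self-loops and multi-edges; the triangle-closure bias is precisely the part left open by the internship):

```python
import random

def configuration_model(degrees, seed=0):
    """Stub matching: node i gets degrees[i] stubs; stubs are paired
    uniformly at random. Self-loops and multi-edges are discarded here,
    a common simplification."""
    rng = random.Random(seed)
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges

degrees = [3, 3, 2, 2, 2, 2]            # target degree sequence (even sum)
edges = configuration_model(degrees)
print(len(edges) <= sum(degrees) // 2)  # True
```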
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-79 Link communities in complex networks
|
|
Description
|
|
Community detection is one of the most developed research topics on complex networks. It started after it was observed that most real-world networks can be partitioned into dense parts, called communities, that are sparsely connected to each other. The question is to design methods to automatically extract these communities from a network. The most commonly used approach is to partition the nodes of the network, and some very good algorithms exist to do so, the seminal one being the Louvain algorithm. It turns out that in many cases it would be more relevant to partition the links of the network rather than its nodes. Consequently, the field has put much effort toward this goal, but without reaching a clear consensus on a method. The goal of the internship is to fill this gap by following an approach similar to the one used for node partitioning. The main difficulty is that there is currently no quality function available to assess how good a partition of the links of a network into communities is; in particular, the quality function used for node partitions does not adapt to the case of link partitions. Resolving this issue, which is a severe limitation for the domain, is the main challenge of the internship.
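One existing line of work (e.g., Evans and Lambiotte's line-graph approach) reduces link partitioning to node partitioning: build the line graph, whose nodes are the links of the original network, and run any node-community algorithm on it. A minimal sketch, using plain label propagation as the node-level algorithm (the toy graph and algorithmic choices are illustrative):

```python
import random
from collections import Counter

def line_graph(edges):
    """Adjacency of the line graph: one node per link of G, two links
    adjacent iff they share an endpoint. Quadratic, for small graphs."""
    adj = {e: set() for e in edges}
    for e in edges:
        for f in edges:
            if e != f and set(e) & set(f):
                adj[e].add(f)
    return adj

def label_propagation(adj, rounds=20, seed=0):
    """Plain label propagation: each node repeatedly adopts the most
    frequent label among its neighbours."""
    rng = random.Random(seed)
    labels = {v: i for i, v in enumerate(adj)}
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)
        for v in order:
            if adj[v]:
                counts = Counter(labels[u] for u in adj[v])
                labels[v] = counts.most_common(1)[0][0]
    return labels

# Toy graph: two triangles joined by a bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
link_labels = label_propagation(line_graph(edges))
print(len(set(link_labels.values())))  # number of link communities found
```

This reduction sidesteps, rather than solves, the missing-quality-function issue targeted by the internship: a quality function optimized on the line graph is arguably not a principled quality function for link partitions of the original network.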
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-80 A polynomial kernel for chordal edge deletion
|
|
Description
|
|
One of the earliest polynomial kernels for edge modification problems is the one designed in 1999 by Kaplan et al. for chordal edge completion. In this problem, one aims at turning an arbitrary graph G into a chordal graph, i.e. a graph with no induced cycle on at least four vertices, by adding at most k edges to G. Since then, it has been shown that the problem admits a subexponential parameterized algorithm and that the related problems of chordal edge editing and chordal edge deletion (where one respectively allows both addition and deletion of edges, or deletion only) are also FPT when parameterized by k. But the question of whether chordal edge deletion (or editing) admits a polynomial kernel parameterized by k has remained open. One reason for this is that breaking induced cycles by adding edges is very different from breaking them by removing edges. The goal of this internship is to design a polynomial kernel for chordal edge deletion, if possible, or to provide a proof of non-existence otherwise, based on classical complexity hypotheses (typically P not equal NP or ETH). Another related question that may be investigated during the internship is the existence of a constant-ratio approximation algorithm for the problem.
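For very small instances, the problem itself (not a kernel) can be stated directly in code: test chordality by searching for an induced cycle on at least four vertices, and try all deletions of at most k edges. A brute-force sketch, exponential and purely illustrative:

```python
from itertools import combinations

def induces_cycle(S, E):
    """True iff vertex list S induces a (single) cycle in edge set E."""
    adj = {v: [u for u in S if u != v and frozenset((u, v)) in E] for v in S}
    if any(len(adj[v]) != 2 for v in S):
        return False
    seen, stack = {S[0]}, [S[0]]
    while stack:                      # connectivity walk
        for u in adj[stack.pop()]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(S)

def is_chordal(n, edges):
    """Chordal iff no vertex subset induces a cycle on >= 4 vertices."""
    E = {frozenset(e) for e in edges}
    return not any(induces_cycle(list(S), E)
                   for k in range(4, n + 1)
                   for S in combinations(range(n), k))

def chordal_deletion(n, edges, k):
    """Delete at most k edges to make the graph chordal (exhaustive)."""
    for t in range(k + 1):
        for removed in combinations(edges, t):
            if is_chordal(n, [e for e in edges if e not in removed]):
                return removed
    return None

# C4 with a pendant vertex: deleting one cycle edge suffices.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
print(chordal_deletion(5, edges, 1))  # ((0, 1),)
```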
The requirements for this internship are: - a strong taste for graphs and algorithms, - basic notions of computational complexity.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-81 Algorithms with Learnable Predictions for Combinatorial Optimization
|
|
Description
|
|
The advance of machine learning opens promising research avenues, especially in the design of algorithms for combinatorial optimization. Algorithms with machine-learned predictions have recently been introduced to circumvent worst-case analysis. The goal is to design algorithms that have near-optimal performance when the predictions are good, and that always perform at least as well as the worst-case guarantee even when the predictions are mediocre. At a high level, these methods incorporate machine-learning advice to adapt their behavior to the properties of the input distribution and consequently improve their performance, such as runtime, space, or quality of the solution. The field has blossomed with many applications. In this direction, we aim to develop a model that allows for imperfect predictions and to study the tradeoff between the quality of algorithms and that of the predictive information.
In this project, we plan to: 1. characterize the strength and the limits of predictive information: to which extent such information is useful and related to learnable predictions; 2. design algorithms with performance guarantees for problems with predictions; 3. design mechanisms with machine-learning predictions in the context of algorithmic game theory.
URL sujet detaillé : https://datamove.imag.fr/kimthang.nguyen/Stages/prediction.pdf
Remarques : The student will get the national standard payment for internships. The intern will additionally be in interaction with people in the chair Edge Intelligence https://edge-intelligence.imag.fr/.
Contact us for any further questions.
|
|
|
|
|
SM207-82 Enrichment analysis based on GNNs
|
|
Description
|
|
Analysing biological processes requires knowledge not only about the entities themselves but also about their interactions. Biological data, such as PPI (Protein-Protein Interaction) networks or GRNs (Gene Regulatory Networks), are naturally presented in the form of networks to highlight existing interactions. On the other hand, deep learning methods have been extended to support graph-structured data and to tackle many problems related to the analysis of biological networks. We focus on pathway embedding using Graph Neural Networks (GNNs). We aim to build the first GNN-based system that annotates, across multiple datasets, a list of genes or proteins related to an experiment.
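As a minimal illustration of the message passing underlying GNN-based embeddings, here is the linear part of one graph-convolution step, H' = D^{-1/2}(A + I)D^{-1/2} H, on a tiny hypothetical interaction graph (learned weights and nonlinearity omitted; a real pipeline would use a GNN library):

```python
import math

def gcn_layer(adj, feats):
    """Linear part of one GCN step: H' = D^{-1/2} (A + I) D^{-1/2} H.
    adj: node -> list of neighbours; feats: one feature row per node."""
    n = len(feats)
    A = [[1.0 if (i == j or j in adj[i]) else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in A]          # degrees including self-loop
    out = []
    for i in range(n):
        h = [0.0] * len(feats[0])
        for j in range(n):
            if A[i][j]:
                w = 1.0 / math.sqrt(deg[i] * deg[j])
                h = [a + w * b for a, b in zip(h, feats[j])]
        out.append(h)
    return out

# Hypothetical 3-node interaction graph, 2-dim features per gene.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(gcn_layer(adj, feats)[0])
```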
URL sujet detaillé : https://jacob.cea.fr/drf/ifrancoisjacob/Pages/Departements/CNRGH.aspx
Remarques : Remuneration of 700 euros/month
|
|
|
|
|
SM207-83 Smart Services for Edge Computing
|
|
Description
|
|
The global trend to extend connectivity to an increasing number of daily-life consumer devices has led to the emergence of edge computing, a paradigm in which information is processed at the edge of networks, as close as possible to the devices that produce this information and wish to transform or consume it. To this day, we still lack a complete solution for the development and orchestration of services at the edge of networks that rigorously takes into account the needs of users and the capacities, in terms of storage, availability and energy, of the devices available at the edge of the network.
Keywords : edge, energy, mobile, service, orchestration, networks
URL sujet detaillé : https://www.reveillere.fr/jobs/2022/edge.pdf
Remarques : Internship paid at the standard gratuity rate; possible travel to Norway to meet other researchers also working on this topic.
|
|
|
|
|
SM207-84 Physical-layer-oriented communication security
|
|
Description
|
|
The continuous increase in the number of devices sharing information in the context of the Internet of Things has been strongly driven by the proliferation of sensor networks, whether in the context of smart cities, smart factories or smart health. These small pervasive objects, heterogeneous in their tasks, are intended to be deployed either to measure and report information about their environment or to perform a precise physical action. A large body of work addresses the security questions related to the wireless communications between these sensors and the rest of the infrastructure. We propose to research a novel security approach, a novel paradigm using the physical layer as its cornerstone.
URL sujet detaillé : https://www.reveillere.fr/jobs/2022/PHYSec.pdf
Remarques : Internship paid at the standard gratuity rate
|
|
|
|
|
SM207-85 Parametric verification of periodic properties in a hybrid model of epidemiological dynamics
|
|
Description
|
|
CONTEXT
In an international context marked by a constant increase in major epidemic events, scientific research is in great demand to provide institutional actors with decision-making assistance in the face of urgency and uncertainty. Anticipating and controlling the spread of these emerging epidemics are two crucial issues that concern many scientific communities. A great deal of progress has recently been made in modelling the propagation dynamics of these events. To study the effect of human behaviour on biological dynamics, hybrid models can be built by coupling discrete modelling tools, such as automata from computer science, with continuous modelling tools, such as differential equations from mathematics. The verification of the properties of these complex models is now an essential area of research.
OBJECTIVES
The main objective of this internship is to study a hybrid model of epidemic dynamics and to implement a parametric verification method for studying the periodic properties of this model, in order to better explain the phenomena of epidemic resurgence. The student will first familiarise himself/herself with the subject by reading a few scientific articles in depth. Numerical simulations of the model will accompany the search for an algorithmic protocol for the parametric verification of periodic properties, followed by a formalisation of the solution obtained in a wider context.
URL sujet detaillé : https://www.ls2n.fr/stage-these/verification-parametrique-de-proprietes-periodiques-dans-un-modele-hybride-de-dynamique-epidemiologique
Remarques : Internship co-supervised by Guillaume Cantin. Paid by the LS2N at the statutory rate.
|
|
|
|
|
SM207-86 Hole detection in point clouds using computational topology
|
|
Description
|
|
Point clouds are a type of unstructured data that is widespread in the field of computer graphics and used in a wide range of applications. They are notably produced by surface acquisition techniques such as photogrammetry or LiDAR. The size and complexity of the produced data usually challenge their analysis. In this internship, we propose to apply concepts from Topological Data Analysis to perform a robust and efficient extraction of topological information in 3D point clouds: the number and type of "holes" that are present in the sampled surface.
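To make the notion of "counting holes" concrete, here is a self-contained toy computation of the Betti numbers β0 (connected components) and β1 (independent cycles) of a graph, i.e. a 1-dimensional complex, using union-find and the Euler characteristic. This is a deliberate simplification: real point-cloud pipelines build a Rips or alpha complex on the samples and compute persistent homology (e.g. with libraries such as GUDHI), which this sketch does not attempt.

```python
# Illustrative only: for a 1-dimensional simplicial complex (a graph),
# beta_0 = number of connected components (via union-find) and
# beta_1 = |E| - |V| + beta_0 (Euler characteristic).

def betti_numbers(vertices, edges):
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    b0 = len({find(v) for v in vertices})
    b1 = len(edges) - len(vertices) + b0
    return b0, b1

# A 4-cycle (one "hole") plus an isolated vertex: beta_0 = 2, beta_1 = 1.
```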
URL sujet detaillé : https://www.irit.fr/sslam/research-intern-holes-detection-in-point-clouds-using-computational-topology/
Remarques : The internship will take place in the STORM team (https://www.irit.fr/STORM/site/) at IRIT, in Toulouse. It will be co-advised by Jules Vidal (post-doc at IRIT, https://julesvidal.github.io), Nicolas Mellado (Researcher), and Loic Barthe (Professor). The expected salary is around 600 euros/month.
|
|
|
|
|
SM207-87 JIT Compiler Basic Block Splitting
|
|
Description
|
|
Basic Block splitting is a compiler technique that duplicates code to open later optimization opportunities. Deciding efficient splitting heuristics is challenging and critical for performance. First, splitting requires tracking type and constraint information in the compiler. Second, bad splitting decisions lead to code increases without performance improvements. Finally, splitting on JIT compilers requires those decisions to be taken fast.
This project aims to study basic block-splitting mechanisms in JIT compilation and meta-compilation schemes. It will target a new meta-compiler under development for the Pharo programming language (www.pharo.org). The student will implement extensions to the SSA (single-static-assignment) form to represent types and value constraints. Based on those, the student will implement optimisations such as dead path elimination, path splitting, and jump threading. The student will study the efficiency of those optimisations and cost models to take splitting decisions at compile time.
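The payoff of tracking constraints along split paths can be sketched with a toy example of dead-path elimination: once a path has established a fact (e.g. "x is an int" after a type check), any later branch testing the same fact on that path can be folded to its known successor. The tiny CFG encoding below (cond/then/else/next per block) is invented for this sketch; a real JIT would track such facts on an SSA form.

```python
# Toy dead-path elimination: fold branches whose condition is already
# entailed by the facts accumulated along the current path.

def eliminate_dead_paths(blocks, entry, known_facts):
    """Return a map label -> folded successor (None when the branch is kept)."""
    resolved = {}
    def walk(label, facts):
        if label is None or label in resolved:
            return
        block = blocks[label]
        cond = block.get("cond")
        if cond in facts:                        # condition entailed: fold it
            resolved[label] = block["then"]
            walk(block["then"], facts)
        elif cond is not None:                   # genuine branch: keep it
            resolved[label] = None
            walk(block["then"], facts | {cond})  # the fact holds on then-path
            walk(block["else"], facts)
        else:                                    # straight-line block
            resolved[label] = None
            walk(block.get("next"), facts)
    walk(entry, set(known_facts))
    return resolved

cfg = {
    "A": {"cond": "x_is_int", "then": "B", "else": "C"},
    "B": {"cond": "x_is_int", "then": "D", "else": "E"},  # redundant re-check
    "C": {}, "D": {}, "E": {},
}
folds = eliminate_dead_paths(cfg, "A", set())
```

On the then-path of A, block B's re-check is redundant, so B folds to D and E becomes unreachable; this is the kind of opportunity that splitting exposes, at the cost of code growth.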
URL sujet detaillé :
:
Remarques : Paid internship (gratification).
|
|
|
|
|
SM207-88 Automatic Benchmark Generation
|
|
Description
|
|
Designing application benchmarks that are good representatives of application behaviour and are not subject to internal runtime noise is a hard task for application developers. The objective of this internship is to use automatic program generation techniques (i.e., program synthesis) that are aware of runtime noise sources as well as application domain knowledge. Noise awareness will minimize internal noise by construction.
We study what properties turn existing application tests into relevant benchmarks. We will use two main techniques. First, we will extract runtime profiling information to detect application hot spots and identify code portions relevant to performance. Second, we will use such profiling information to guide static code analyses on existing application test cases. Such a study will lead us to the automatic identification of benchmark candidates from existing application tests.
We will then investigate how application tests can be automatically turned into benchmarks. Identified benchmark candidates will not exhibit the same performance profile as the application at runtime because they are by design built to run fast and have few dependencies. We will design program generation techniques to produce macro benchmarks from benchmark candidates. Such program generation techniques will produce benchmarks that remain relevant and minimize internal noises.
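The hot-spot detection step described above can be prototyped with standard Python profiling. A minimal sketch, where `workload` is a hypothetical stand-in for an application test case:

```python
# Run a workload under cProfile and rank functions by cumulative time.
import cProfile
import pstats

def workload():
    total = 0
    for i in range(200_000):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# pstats exposes a mapping (file, line, func) -> (cc, nc, tt, ct, callers);
# ranking by ct (cumulative time) surfaces the hot spots.
stats = pstats.Stats(profiler)
hot_spots = sorted(stats.stats.items(),
                   key=lambda item: item[1][3], reverse=True)
top_functions = [func for (_, _, func), _ in hot_spots[:3]]
```

In the project, such profiles would then guide the static analysis of existing test cases rather than be an end in themselves.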
URL sujet detaillé :
:
Remarques : Paid internship (gratification).
|
|
|
|
|
SM207-89 Representation Learning for Geographic Spatio-Temporal Generalisation
|
|
Description
|
|
Time-series are becoming prevalent in many fields, particularly for monitoring environmental changes of the Earth's surface in the long term (climate change, urbanisation, etc.), medium term (annual crop cycle, etc.) or short term (earthquakes, floods, etc.). With the current and future satellite constellations, satellite image time-series (SITS) expand remote sensing's impact. The project's goal is to develop domain-invariant representations (DIRs) using deep learning for SITS analysis. Such methods will enable geographic generalisation, which consists of reusing information from the analysis of one geographic area to analyse others by using, or not, the same sensors, as proposed in [5]. Current approaches work for single images because they generally originate from the computer vision community. The internship will start with an evaluation of the state of the art, then implement and extend approaches already developed at ICube [5,6]. Current work on domain adaptation (DA) for time-series either uses weak supervision [1] or attention-based mechanisms [2,3] for classification, or focuses on the related problem of time-series forecasting [4]. However, none of these approaches tackles the problem of learning DIRs that can be applied to several geographical locations simultaneously. The work has two benefits: on the one hand, to reduce the burden of ground truth collection when sensors of different characteristics are used; and on the other, to exploit the information contained in each data modality to learn representations that are more robust and general, i.e. to detect crops, land cover evolution, etc. in different countries that exhibit different characteristics. Your contributions will be part of the global work of the SDC researchers and will be validated through the partnership with CNES and potential collaboration with Tour du Valat.
SDC's aim is to propose and implement new generic methods and tools to exploit large sets of reference data from one domain/modality (sufficient to train an accurate detector) to train a multi-modal/domain detector that can be applied to imagery taken from another sensor for which there exists no reference data. As such, the work tackles key problems in many machine learning & computer vision applications.
URL sujet detaillé : https://seafile.unistra.fr/f/7b4b402e34124fb396b7/?dl=1
Remarques : Gratification: 600€/month. Possibility to be extended to a PhD thesis (also ICube/CNES) subject to funding and candidate suitability
Strong collaboration with the CNES
|
|
|
|
|
SM207-90 Text Summarisation with Quantum Natural Language Processing
|
|
Description
|
|
Context: Quantum natural language processing (QNLP) is the use of quantum computing to solve NLP tasks faster than any classical computer. In a recent approach, text is represented as parameterised circuits that are optimised using a hybrid classical-quantum algorithm. This approach was implemented on noisy intermediate-scale quantum (NISQ) hardware, with promising experimental results on text classification and question answering.
Objective: The aim of the internship is to apply QNLP to the problem of automatic text summarisation. The student will design quantum algorithms, investigate their asymptotic speedup compared to classical ones and implement proof-of-concept experiments to evaluate them.
URL sujet detaillé : https://alexis.toumi.xyz/jobs/22-11-16-qnlp-summarisation
Remarques : Co-supervised by:
- Alexis Toumi (alexis.email)
- Benoit Favre (benoit.favre-amu.fr)
- Giuseppe di Molfetta (giuseppe.dimolfetta-lab.fr)
|
|
|
|
|
SM207-91 Compiler Transpilation
|
|
Description
|
|
The objective of this project is to study transpilation techniques to translate compilers written in high-level languages to low-level languages such as Rust or C++.
JIT (Just-in-Time) compilers are an optimization technique often used for interpreted languages and virtual machines. They allow spending time optimizing only frequently used code while falling back on slower execution engines for infrequent code. For example, the Pharo and Java VMs run on a bytecode interpreter and eventually compile machine code for methods that are frequently called.
The current Pharo JIT compiler that is part of the Virtual Machine, aka Cogit, implements an architecture based on templates of native code per bytecode. When a method is compiled, each bytecode is mapped to its corresponding template. All templates are concatenated to form a single machine-code method. This architecture has the drawback that the behaviour of the Pharo language is duplicated in both the bytecode interpreter and the corresponding machine-code templates.
Currently, the Inria Lille team is working on a compiler infrastructure called DRUID, meant for meta-compilation and machine-code compilation, and meant to replace Cogit. The objective of this project is to study transpilation techniques to translate such compilers to a low-level language such as Rust or C++. The student will study how intermediate languages are represented in high-level object-oriented languages and how they translate to low-level imperative languages, how their memory is managed, and how code generation frameworks can be translated.
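At its smallest scale, the translation problem looks like the following toy sketch: a three-address intermediate representation lowered to C source text. The IR tuples, opcode names and example function are invented for illustration; DRUID's actual IR is richer (SSA, types, control flow).

```python
# Toy transpiler from a small three-address IR to C source text.

def transpile_to_c(name, params, instructions):
    lines = [f"int {name}({', '.join('int ' + p for p in params)}) {{"]
    for instr in instructions:
        op = instr[0]
        if op == "assign":                  # ("assign", dest, expr)
            _, dest, expr = instr
            lines.append(f"    int {dest} = {expr};")
        elif op == "binop":                 # ("binop", dest, lhs, oper, rhs)
            _, dest, lhs, oper, rhs = instr
            lines.append(f"    int {dest} = {lhs} {oper} {rhs};")
        elif op == "return":                # ("return", value)
            lines.append(f"    return {instr[1]};")
    lines.append("}")
    return "\n".join(lines)

c_source = transpile_to_c("square_plus_one", ["x"], [
    ("binop", "t0", "x", "*", "x"),
    ("binop", "t1", "t0", "+", "1"),
    ("return", "t1"),
])
```

The hard parts of the internship lie in exactly what this sketch elides: translating object-oriented IR class hierarchies, memory management, and code generation frameworks, not just straight-line instructions.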
URL sujet detaillé :
:
Remarques : Paid internship (gratification).
|
|
|
|
|
SM207-92 Dynamic exploration of multidimensional parameter spaces of complex system models (February-July 2023)
|
|
Description
|
|
Dynamic exploration of multidimensional parameter spaces of complex system models.
Proposal of precise methods for defining zones: the analysis of a given model on a zone of the parameter space can trigger a breakdown of this zone and an in-depth exploration of its sub-zones. In general, the areas of the space in which the model analysis shows high variability of the results will receive more in-depth treatment.
Search for suitable exploration methods to cover the parameter space.
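The zone-refinement idea above can be sketched in one dimension: sample a model on a zone and, if the outputs vary too much, split the zone and recurse on the sub-zones. The model, variance threshold, sample count and depth bound are all illustrative choices.

```python
# Recursive zone refinement driven by the variance of model outputs.
import statistics

def explore(model, lo, hi, threshold, depth=0, max_depth=6, samples=5):
    step = (hi - lo) / (samples - 1)
    values = [model(lo + k * step) for k in range(samples)]
    if depth >= max_depth or statistics.pvariance(values) <= threshold:
        return [(lo, hi)]                 # zone is homogeneous enough: keep it
    mid = (lo + hi) / 2                   # high variability: split and recurse
    return (explore(model, lo, mid, threshold, depth + 1, max_depth, samples)
            + explore(model, mid, hi, threshold, depth + 1, max_depth, samples))

# A model that is flat on [0, 1] but varies sharply on [1, 2]: the flat half
# is kept whole while the varying half is subdivided.
zones = explore(lambda x: 0.0 if x < 1 else (x - 1) ** 2 * 100, 0.0, 2.0,
                threshold=0.5)
```

In higher dimensions the same principle applies to boxes instead of intervals, which is where efficient covering strategies become the research question.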
URL sujet detaillé : http://pagesperso.ls2n.fr/~attiogbe-c/mespages/ENCADREMENT/sujetStageM2_ExplorDynamiquePrm.pdf
Remarques : Co-supervision: Christian ATTIOGBE (christian.attiogbe-nantes.fr)
Remuneration/gratification: funded by the LS2N laboratory.
|
|
|
|
|
SM207-93 Neural Tangent Kernels
|
|
Description
|
|
The recent advances of neural networks have led to impressive practical breakthroughs in various applications such as image generation or Natural Language Processing (NLP). The mathematics of deep networks, combined with the intuition gained from computational experiments, have unveiled an interesting property of wide neural networks which allows one to define a Tangent Kernel of a very specific nature, tailored for neural network applications. The goal of this internship is to explore the computational properties of Neural Tangent Kernels in various contexts and to implement these approximations for different applications.
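As a didactic sketch, the *empirical* (finite-width) NTK of a tiny one-hidden-layer network can be computed directly from its definition, K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩, with gradients approximated by central finite differences. The architecture, width and random initialisation below are illustrative choices, not the internship's setup.

```python
# Empirical NTK of f(x) = (1/sqrt(WIDTH)) * sum_j w_out[j] * tanh(w_in[j] * x).
import math
import random

random.seed(0)
WIDTH = 16
params = [random.gauss(0, 1) for _ in range(2 * WIDTH)]  # [w_in | w_out]

def f(x, theta):
    w_in, w_out = theta[:WIDTH], theta[WIDTH:]
    return sum(b * math.tanh(a * x)
               for a, b in zip(w_in, w_out)) / math.sqrt(WIDTH)

def grad(x, theta):
    # Central finite-difference gradient of f(x) w.r.t. every parameter.
    eps = 1e-5
    g = []
    for k in range(len(theta)):
        bumped = list(theta)
        bumped[k] += eps
        plus = f(x, bumped)
        bumped[k] -= 2 * eps
        g.append((plus - f(x, bumped)) / (2 * eps))
    return g

def ntk(x1, x2, theta=params):
    return sum(a * b for a, b in zip(grad(x1, theta), grad(x2, theta)))
```

The key theoretical property this approximates is that, in the infinite-width limit, this kernel stays fixed during gradient training, turning the network into a kernel method.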
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-94 Frustration index of signed graphs
|
|
Description
|
|
Signed graphs are graphs whose arcs are either positive or negative. They appear naturally in many applications. For instance, in the context of gene networks, experiments reveal the existence of activation or inhibition between genes, which is represented as a signed graph G. This signed graph G imposes some constraints on the underlying dynamics of the network. A very useful operation to analyze these constraints is the switch operation: switching a vertex in G means switching the sign of all the arcs incident to this vertex (negative arcs become positive, and vice versa). The nice fact is that the switch preserves the underlying dynamics (up to isomorphism) while it can make the signed graph much simpler. For instance, if every cycle of G contains an even number of negative arcs, there is a sequence of switches which makes all arcs positive, and the analysis of the underlying dynamics is then much simpler. More generally, one might think that the analysis of the underlying dynamics is simpler when the number of negative arcs is minimal. Two signed graphs are said to be switch equivalent if one can be obtained from the other by a sequence of switches. For a signed graph G, the minimum number of negative arcs in a graph which is switch equivalent to G is called the frustration index of G. Given an unsigned graph H, the frustration index of H is the maximum frustration index of the signed graphs having H as unsigned underlying graph.
Many simple questions about this parameter remain unsolved, both in the directed and undirected case. For H being an unsigned clique K, it is known that its frustration index f is the frustration index of the all-negative signed version K- of K. Moreover, any signed version of K that has frustration f is switch equivalent to K-. It has been conjectured that the same may hold not only for H being a clique but more generally for H being a chordal graph. The main purpose of this internship is to prove or disprove this conjecture. Another related question that may be investigated during the internship is the computational complexity of computing the frustration index of a signed chordal graph.
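The definitions above can be made concrete with a brute-force computation on small undirected examples: since switching a set S of vertices flips the sign of exactly the edges with one endpoint in S, the frustration index is the minimum number of negative edges over all 2^|V| switching sets. This is exponential and purely illustrative; the computational complexity of doing better on chordal graphs is precisely one of the internship's questions.

```python
# Brute-force frustration index of a small signed graph.
# Edges are (u, v, sign) triples with sign in {+1, -1}.
from itertools import combinations

def frustration_index(vertices, signed_edges):
    best = len(signed_edges)
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            # An edge's sign flips iff exactly one endpoint is in the set s.
            negatives = sum(
                1 for (a, b, sign) in signed_edges
                if (sign if (a in s) == (b in s) else -sign) < 0)
            best = min(best, negatives)
    return best

# All-negative triangle: the product of signs around a cycle is a switching
# invariant, so at least one negative edge must remain.
all_negative_triangle = [(0, 1, -1), (1, 2, -1), (2, 0, -1)]
```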
URL sujet detaillé :
:
Remarques : The internship will be co-advised by Florian Bridoux and Christophe Crespelle
|
|
|
|
|
SM207-95 Correct rounding of the atan2 function
|
|
|
|
SM207-96 A collaborative security-by-design platform using MDE approach
|
|
Description
|
|
Context Security-by-design is becoming the mainstream development method for building secure mission-critical and software-intensive systems [1], [2]. In such approach, security is built into the system from the ground up, starting with a robust requirement analysis and architecture design. Since these early phases in Software Development LifeCycle (SDLC) play a crucial role in the software development process, the consideration of security aspects during these phases can have a greater impact on the system's ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on cyber resources.
One of the most important security-by-design activities in software development is threat modeling [3]. It aims at identifying a coverage of all possible threats [4], vulnerabilities and preventing and/or mitigating the effects of threats and attacks on a software system. Several threat modeling methods exist, reviewed for example in [5], [6]. As part of all these methods, threat enumeration is at its core [7], which is traditionally carried out in brainstorming sessions. In a brainstorming session, vulnerabilities, both on a system-level and a component-level, are identified, and threats that exploit these vulnerabilities are enumerated.
Problem statement In software development, Model-Driven Engineering (MDE) is widely used to make sure the final product will satisfy customer needs, since it is based on the definition of models closer to the problem domain than to the implementation domain, alleviating the complexity of platforms. This is particularly relevant when developing Domain-Specific Modeling Languages (DSMLs), which are modeling languages specifically designed to carry out the tasks of a particular domain. However, security experts' participation in the DSML specification process is still rather limited nowadays [8].
In fact, in many cases security engineering and software engineering are "islands", in the sense that the disciplines work independently of one another [8]. Even in the case of the threat modeling, where there is a collaboration between security experts and software engineers, there is no detailed description of a procedure to support the brainstorming sessions, and no reference model to be used by such a procedure [9]. Due to the lack of guidance, the lack of sufficiently formalized process, the high dependence on participants' knowledge and the variety of participants' background, these sessions are often conducted sub-optimally and require significant effort [9].
Mission In order to address the above limitation, we aim at building a collaborative security-by-design platform using MDE approach. Firstly, the candidate will elaborate a DSML development process by enabling the active participation of all security-by-design participants, both software engineers and security experts, from the very beginning. Secondly, the candidate is supposed to propose a DSML, enabling the representation of security proposals during the language design and the discussion (and trace back) of possible solutions, comments and decisions arisen during the collaborative security-by-design. This DSML should evidently involve the definition of: abstract syntax (meta-model) which includes the collaborative security-by-design concepts, their relationships, the structuring rules that constrain the model elements and their combinations in order to respect the collaborative security-by-design domain rules; Concrete syntax, which provides a realization of the abstract syntax as a mapping between the metamodel concepts and their textual or graphical representation; Semantics using formal approaches.
Keywords MDE, security-by-design, threat modeling, collaborative platform
URL sujet detaillé :
:
Remarques : Co-advisor: Nan Messe.
Possibility of pursuing a PhD thesis.
|
|
|
|
|
SM207-97 Verifiable Graph Data Integration
|
|
Admin
|
|
Encadrant : Angela BONIFATI |
Labo/Organisme : LIRIS (Lyon) and LIG (Grenoble) |
URL : https://perso.liris.cnrs.fr/angela.bonifati/ |
Ville : Lyon/Grenoble |
|
|
|
Description
|
|
KEYWORD: Program Verification, Property Graph Data Integration
CONTEXT: Graphs are a flexible and agile data model for representing complex network-structured data used in a wide range of application domains, including social networks, biological networks, bioinformatics, medical data, quantum calculi and knowledge management [3]. One tangible example of an application of property graphs is the covidgraph.org initiative, powered by Neo4j as part of the Graphs4Good initiative. Such a property graph contains information about the spread of human coronaviruses, as well as related publications and patents, medical treatments and clinical trials on the disease. As such, this graph is constantly evolving as new information is injected into it on a daily basis. In the context of enterprise data management, many current graph database systems (e.g., Amazon Neptune, Neo4j, TigerGraph) support property graphs. A property graph is a multigraph where nodes and edges can have labels and properties (i.e., key/value pairs). The growing popularity of property graph data management is further witnessed by the fact that its standardization was recently taken up by the main international standards body, ISO (International Organization for Standardization).
Graphs constantly need to be transformed in order to be updated with new information and to be transferred between applications [3]. Our understanding of such transformations is still very preliminary: we completely lack a framework for checking their correctness, such as adherence to typing information or compatibility with the requirements of an application that uses the output of a graph query. In addition, graphs provide a very flexible data model that makes them conducive to integrating data from multiple disparate sources. The experience of relational data integration taught us the importance of several critical verification tasks, such as building target instances that guarantee correct query answers or correct manipulation of mappings between schemas.
Short Description of the Project: In this project, we consider the investigation of proof methods dedicated to the verification of graph data integration as being investigated in the LIRIS Lab. (Lyon) by A. Bonifati [1] and using graph transformation verification methods currently developed in the LIG Lab. (Grenoble) by R. Echahed [2]. The targeted framework will feature different kinds of property data graph transformations which may occur, for instance, when triggering update operators, processing graph to graph queries or performing data graph integration processes. When such transformations are not well specified, they may lead to inconsistent graph databases. To avoid such inconveniences, it is well known that verification techniques like model-checking or automated theorem proving methods can be used successfully to help mastering the correctness of the considered transformations. We will follow these lines of research in this project and develop new verification techniques the aim of which is to ensure the correctness of property graph data transformations. The successful candidate should have good programming skills. Basic knowledge of First-Order Logic would be a plus. The project can be easily adapted to the skills and motivations of the candidate (from theory to practice) and can be extended to a PhD thesis. The candidate will work in close collaboration with researchers at LIRIS lab (Lyon) and LIG lab (Grenoble), in addition to ENS/ULM (Paris), in the context of a larger grant supported by the French ANR agency.
[1] A. Bonifati, I. Ileana: Graph Data Integration and Exchange. Encyclopedia of Big Data Technologies, 2019.
[2] J. Brenas, R. Echahed, M. Strecker: Reasoning Formally About Database Queries and Updates. FM 2019: 556-572.
[3] S. Sakr, A. Bonifati et al.: The future is big graphs: a community view on graph processing systems. Commun. ACM 64(9): 62-71 (2021).
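A minimal runtime illustration of what "correctness of a property graph transformation" means: apply an update and check that a typing invariant still holds. The schema, labels and the "cites" invariant below are invented for this sketch; the project's aim is to establish such guarantees statically, by proof, rather than by runtime checks.

```python
# Toy property graph: nodes carry a label and properties (key/value pairs);
# the invented invariant says every "cites" edge must link two nodes
# labelled "Publication".

def check_invariant(nodes, edges):
    return all(nodes[src]["label"] == "Publication"
               and nodes[dst]["label"] == "Publication"
               for (src, dst, etype) in edges if etype == "cites")

nodes = {
    "p1": {"label": "Publication", "title": "Graph Data Integration"},
    "p2": {"label": "Publication", "title": "Reasoning About Updates"},
    "t1": {"label": "Treatment"},
}
edges = [("p1", "p2", "cites")]

def add_edge(nodes, edges, src, dst, etype):
    # Guarded update operator: reject transformations that would break
    # the schema invariant.
    candidate = edges + [(src, dst, etype)]
    if not check_invariant(nodes, candidate):
        raise ValueError("update rejected: it would break the schema invariant")
    return candidate

edges = add_edge(nodes, edges, "p2", "p1", "cites")   # accepted
```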
URL sujet detaillé : https://lig-membres.imag.fr/echahed/m2rBonifatiEchahed.pdf
Remarques : Co-adviser: R. ECHAHED (CNRS, LIG, Grenoble). Payment: Yes
Possibility to extend the project to a PhD thesis.
|
|
|
|
|
SM207-98 Understanding games and linear programming through tropical geometry
|
|
Description
|
|
Oriented matroids are a powerful framework for studying questions from optimization and real algebraic geometry. Parity games or more generally mean payoff games form well-studied classes of games with applications in machine verification and with an intriguing complexity status at the intersection of NP and co-NP. The run of the simplex method for linear programming as well as the run of policy iteration for mean payoff games can be described by sign patterns forming an oriented matroid. The project will focus on contrasting the sign patterns arising for parity games, mean payoff games and for linear programming. A geometric framework for analyzing these games and oriented matroids is provided by signed tropical convexity. In the project, we will use insights from tropical geometry in order to improve our understanding of mean payoff games.
Depending on the interests of the student, the project may include the following subjects: - Tropical vs. classical realizability of oriented matroids - Policy iteration algorithms for parity games and their tropical interpretation - Combinatorics and algorithms for signed tropical polyhedra
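As background for the mean payoff games mentioned above, their values can be approximated by a standard value iteration: v_n(s), the optimal total weight of an n-step play from s, satisfies v_n(s) = max (or min, depending on the owner of s) over arcs (w, t) of w + v_{n-1}(t), and v_n(s)/n converges to the mean payoff value. The two-state game below is invented for illustration.

```python
# Value iteration for a mean payoff game between players Max and Min.

def mean_payoff_values(arcs, owner, steps=100):
    v = {s: 0.0 for s in owner}
    for _ in range(steps):
        v = {s: (max if owner[s] == "Max" else min)(
                w + v[t] for (w, t) in arcs[s]) for s in owner}
    return {s: v[s] / steps for s in owner}

# From "a" (owned by Max) one can loop with weight 1 or move to "b";
# from "b" (owned by Min) one can loop with weight -1 or move back to "a".
# Optimal play: Max loops at "a" (value 1), Min loops at "b" (value -1).
game_arcs = {"a": [(1, "a"), (0, "b")], "b": [(-1, "b"), (0, "a")]}
values = mean_payoff_values(game_arcs, {"a": "Max", "b": "Min"})
```

The sign patterns of the choices made at each state during such iterations are exactly the combinatorial data that the tropical/oriented-matroid viewpoint organises.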
URL sujet detaillé : https://homepages.laas.fr/mskomra/files/internship.pdf
Remarques : The internship will be co-supervised by Georg Loho (University of Twente, https://lohomath.github.io/). This is a paid internship, financed by a FMJH Program PGMO.
|
|
|
|
|
SM207-99 Causality
|
|
Description
|
|
We propose four subjects as M2 internship to work on several areas of causality, potentially leading to a PhD thesis.
1/ Dealing with hidden confounders through partial information
Both causal discovery and inference from observational data are significantly harder in the presence of hidden confounders. This said, additional assumptions, such as the model being additive and every latent variable belonging to at least one latent atomic cover ([1]), can lead to procedures that identify all latent variables (and thus all hidden confounders) and the complete causal graph. These assumptions are however not realistic in all situations, and we want to explore here realistic assumptions that can be made on the hidden confounders themselves in order to infer from data the causal graph between observed variables.
[1] Biwei Huang, Charles Low, Feng Xie, Clark Glymour, Kun Zhang. Latent Hierarchical Causal Structure Discovery with Rank Constraints. NeurIPS 2022
Contacts: Charles Assaad (cassaad.com), Emilie Devijver (emilie.devijver-grenoble-alpes.fr), Eric Gaussier (eric.gaussier-grenoble-alpes.fr)
2/ Causal discovery and reasoning with partial graphs (PAGs, extended summary graphs, ...) in the presence of hidden confounders
Partial graphs are graphs which do not encode all relations between all variables. They are interesting inasmuch as they are in general easier and faster to infer from observations in the absence of common confounders. However, inferring them in the presence of hidden confounders may be more complex. Furthermore, identifying do-expressions in such graphs in the presence of hidden confounders raises challenges, in particular in terms of complexity ([1,2]). The goal of this internship will be to develop effective and efficient methods to infer and reason on such graphs in the presence of hidden confounders.
[1] C. Assaad, E. Devijver, E. Gaussier. Discovery of extended summary graphs in time series. UAI 2022. [2] A. Meynaoui, C. Assaad, E. Devijver, E. Gaussier, G. Goessler. Identifiability in time series extended summary graphs. Submitted to AISTATS 2023.
Contacts: Charles Assaad (cassaad.com), Emilie Devijver (emilie.devijver-grenoble-alpes.fr), Eric Gaussier (eric.gaussier-grenoble-alpes.fr)
3/ Counterfactual reasoning in time series from interventional data
Previous work on causal discovery and reasoning with time series has mainly focused on discovery methods for various graphs (window causal graph, extended summary causal graph, ...) and under various assumptions (stationarity, causal sufficiency, ...) as well as on the identifiability of do-expressions. We want here to go further in the causal analysis of time series by developing methods for answering counterfactual questions such as: knowing that there has been an intervention at time t-k leading to particular values at time t, what would the observed values at time t have been if another intervention (or no intervention at all) had occurred at time t-k? One challenge here will be to address this problem with different causal graphs ([1,2]).
[1] C. Assaad, E. Devijver, E. Gaussier. Discovery of extended summary graphs in time series. UAI 2022. [2] A. Meynaoui, C. Assaad, E. Devijver, E. Gaussier, G. Goessler. Identifiability in time series extended summary graphs. Submitted to AISTATS 2023.
Contacts: Charles Assaad (cassaad.com), Emilie Devijver (emilie.devijver-grenoble-alpes.fr), Eric Gaussier (eric.gaussier-grenoble-alpes.fr)
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-100 Inference of causal models from single observations
|
|
Description
|
|
A timely automated identification and root cause analysis (RCA) of the origins of performance issues allows executing the most adequate corrective actions and preventing their further propagation and global service degradation. In general, RCA is a hard problem in complex systems, because it requires a deep knowledge of cause-effect dependencies among the many features and the physical and logical components of the network nodes. In the data-driven approach, where most of this knowledge is assumed to be unavailable a priori, a major difficulty stems from the fact that many of the variables are hidden or unknown. Furthermore, even in a fully observable system we are faced with the combinatorial explosion of cause-and-effect dependencies and the difficulty of collecting enough information for distinguishing causal dependencies from spurious correlations.
The objective of this project is to explore techniques to infer a causal model that represents the dependencies between components (or nodes) of the network, given a set of event logs. The vector of event logs can be seen as a single data point, hence in the absence of prior knowledge - about, e.g., distributions of events - well-known statistical inference approaches are not applicable. We will in particular explore the use of non-reversibility to infer the direction of causation. The rationale is that the complexity of the "true" causal process is expected to be in a lower class than the complexity of reconstructing a cause by only knowing its effect. This principle has been studied in the literature, see e.g. [1]. However, in order to be able to effectively apply complexity-based causal discovery, the first goal of the project is to explore a similar approach in a computable setting. Our second goal is to study the application to the construction of causal explanations for network failures.
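A crude, computable illustration of this non-reversibility rationale uses an off-the-shelf compressor (zlib) as a rough proxy for Kolmogorov complexity: for each candidate direction, fit the empirical conditional mode and compress the residuals; the direction whose residuals compress better is preferred. This residual-compression heuristic is a didactic toy invented for this sketch, not the algorithmic-Markov-condition procedure of [1].

```python
# Compression-based direction inference on a synthetic cause/effect pair.
import random
import zlib
from collections import Counter, defaultdict

def residual_cost(cause, effect):
    # Cost of describing `effect` given `cause`: compressed size of the
    # residuals with respect to the empirical conditional mode.
    by_cause = defaultdict(Counter)
    for c, e in zip(cause, effect):
        by_cause[c][e] += 1
    mode = {c: counts.most_common(1)[0][0] for c, counts in by_cause.items()}
    residuals = bytes((e - mode[c]) % 256 for c, e in zip(cause, effect))
    return len(zlib.compress(residuals))

def infer_direction(x, y):
    return "x->y" if residual_cost(x, y) < residual_cost(y, x) else "y->x"

# y is a deterministic coarse-graining of x, so describing y from x is cheap
# while reconstructing x from y requires extra information.
random.seed(1)
x = [random.randrange(256) for _ in range(2000)]
y = [v // 16 for v in x]
direction = infer_direction(x, y)
```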
Contact: Armen Aghasaryan, Gregor Goessler
[1] D. Janzing and B. Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Trans. Inf. Theory 56(10), 2010.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-101 Software instrumentation for accountable computer systems
|
|
Description
|
|
Motivation/Context
Quoting a recent paper on the principle of traceability in computer systems [1], "Accountability is widely understood as a goal for well governed computer systems, and is a sought-after value in many governance contexts. But how can it be achieved?" In particular, how can it be achieved in a context of widely distributed systems and services, under different administration authorities?
A first requirement for accountability is the monitoring and logging of relevant causal dependencies, a prerequisite for any performance, security and fault management in computer systems. Recent surveys on software logging [2,3] bear witness to the importance and relevance of the topic today.
Goal
The first objective of the project will be to investigate the different techniques used to instrument, capture and analyze causal dependencies in distributed computer systems executions in different contexts and domain of use, focusing in particular on whole system monitoring approaches such as in Pivot tracing [4] for distributed causal monitoring, Canopy [5] for performance tracing, OmegaLog [6] and Alastor [7] for tracing security intrusions. From this analysis of the literature, the second objective of the project will be to highlight key software system instrumentation requirements and techniques needed for causal logging and analysis in distributed systems and services.
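The causal dependencies that such instrumentation must capture are classically tracked with vector clocks, which the systems cited above generalise in various ways. A minimal sketch of the mechanism:

```python
# Minimal vector-clock sketch: each process keeps a vector of counters; a
# local event increments the process's own entry, and receiving a message
# merges the sender's vector (component-wise max). Event e1 causally precedes
# e2 iff clock(e1) <= clock(e2) component-wise with one strict inequality.

class Process:
    def __init__(self, pid, n_processes):
        self.pid = pid
        self.clock = [0] * n_processes

    def local_event(self):
        self.clock[self.pid] += 1
        return list(self.clock)            # timestamp of this event

    def send(self):
        return self.local_event()          # timestamp travels with the message

    def receive(self, message_clock):
        self.clock = [max(a, b) for a, b in zip(self.clock, message_clock)]
        return self.local_event()

def happened_before(c1, c2):
    return all(a <= b for a, b in zip(c1, c2)) and c1 != c2

p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.local_event()    # [1, 0]
e2 = p1.local_event()    # [0, 1] -- concurrent with e1
msg = p0.send()          # [2, 0]
e3 = p1.receive(msg)     # [2, 2] -- causally after e1, e2 and the send
```

Whole-system approaches like Pivot Tracing propagate richer metadata along these same causal edges; the instrumentation requirements to be surveyed concern where and how such metadata is captured.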
Contact: Jean-Bernard Stefani and Gregor Goessler
References
[1] J.A. Kroll. Outlining Traceability: A Principle for Operationalizing Accountability in Computing Systems. In FAccT '21: ACM Conference on Fairness, Accountability, and Transparency, March 2021.
[2] Boyuan Chen and Zhen Ming Jiang. A Survey of Software Log Instrumentation. ACM Computing Surveys, Vol. 54, No. 4, April 2021.
[3] Sina Gholamian and Paul A.S. Ward. A Comprehensive Survey of Logging in Software: From Logging Statements Automation to Log Mining and Analysis. arXiv:2110.12489v2, January 2022.
[4] Jonathan Mace, Ryan Roelke, and Rodrigo Fonseca. Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems. In ACM SOSP '15, October 2015.
[5] Jonathan Kaldor, Jonathan Mace, Michał Bejda et al. Canopy: An End-to-End Performance Tracing And Analysis System. In ACM SOSP '17, October 2017.
[6] Wajih Ul Hassan, Mohammad A. Noureddine, Pubali Datta, Adam Bates. OmegaLog: High-Fidelity Attack Investigation via Transparent Multi-layer Log Analysis. In NDSS 2020, February 2020.
[7] Pubali Datta, Isaac Polinsky, Muhammad Adil Inam, Adam Bates, and William Enck. Alastor: Reconstructing the Provenance of Serverless Intrusions. In 31st USENIX Security Symposium, August 2022.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-102 Smoothing of surface meshes via global parameterization
|
|
Description
|
|
In the PIXEL team, we are interested in the generation of highly structured meshes (quadrangular surface meshes or hexahedral volume meshes). We have observed that such meshes strongly resemble regular grids that have been deformed to fit the geometry. We have therefore tried to define this type of mesh through that deformation or, more precisely, through its inverse: the map that deforms the object so that the quadrangular (respectively hexahedral) mesh lands on the unit grid. In practice, we look for atlases, because the deformations must admit discontinuities in order to represent more complex meshes (with vertices joining not 4, but 3 or 5 quadrangles, in 2D).
We wish to produce quadrangular meshes from triangulated meshes. The practical interest of this work lies in the nature of quadrangular meshes, which are more compact and structured than triangulated surfaces, offering considerable advantages for certain applications (numerical simulation, subdivision surfaces, etc.). In this internship, we will start from an initial quadrangular mesh and from an atlas of the initial surface. An atlas is a collection of charts that associates to each point of the surface a chart and a 2D position in that chart.
The objective is to optimize the shape of the quadrangles by moving the mesh vertices within the charts of the atlas. The interest and originality of this approach lie in using the atlas to guarantee that the vertices always remain exactly on the surface. The main difficulty will be handling chart transitions correctly, exploiting a specific property of our atlases: the transition functions preserve the unit grid.
URL sujet detaillé : https://members.loria.fr/DSokolov/wp-content/blogs.dir/123/files/sites/123/2022/11/main.pdf
Remarques :
|
|
|
|
|
SM207-103 Using reinforcement learning to improve the missions of a fleet of drones while taking radio communications into account
|
|
Description
|
|
The objective of this internship is twofold: (i) on the one hand, to provide a survey of solutions for controlling a fleet of drones based on multi-agent deep reinforcement learning (RL) techniques; we will then choose and adapt a multi-agent RL algorithm that allows the quality of communications to be taken into account in the reward function in a simple way; (ii) on the other hand, to design the simulation environment used to test and validate the chosen and adapted multi-agent RL algorithm. The simulation environment will be built on the ns-3 network simulator, which models radio communications with good accuracy and can interface with classical deep RL frameworks (ns3-ai, ns-gym).
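As a toy illustration of point (i), a reward that folds link quality into the mission objective might look as follows. This is only a sketch: all names, thresholds and weights are hypothetical, not part of the subject.

```python
def link_quality(snr_db, snr_min_db=5.0, snr_max_db=30.0):
    """Normalize SNR into [0, 1]; below snr_min_db the link counts as unusable.
    The thresholds are illustrative, not from the internship description."""
    if snr_db <= snr_min_db:
        return 0.0
    return min((snr_db - snr_min_db) / (snr_max_db - snr_min_db), 1.0)

def reward(coverage_gain, snr_db, w_comm=0.5):
    """Mission term plus a weighted communication term; w_comm tunes the
    trade-off between mission progress and keeping good radio links."""
    return coverage_gain + w_comm * link_quality(snr_db)

print(reward(coverage_gain=1.0, snr_db=20.0))  # 1.3
```

In a multi-agent setting, each drone agent would receive such a shaped reward, with the SNR term coming from the ns-3 radio model.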
See http://perso.ens-lyon.fr/isabelle.guerin-lassous/stage-DRONAR-2022.pdf
URL sujet detaillé : http://perso.ens-lyon.fr/isabelle.guerin-lassous/stage-DRONAR-2022.pdf
Remarques : Co-supervised with Laetitia Matignon (LIRIS)
|
|
|
|
|
SM207-104 Collaborative count-min-sketch for building large scale distributed systems resilient to Byzantine attacks
|
|
Description
|
|
Peer sampling is a first-class abstraction for building large-scale distributed systems. It is used in particular for the management of overlays and for information dissemination. In general, nodes have partial knowledge (also called their view) of the global and dynamic composition of the system. The goal of the peer sampling service is to build and continuously refresh this local view so that it corresponds as closely as possible to a uniform sample of the nodes that make up the system. The implementation of the peer sampling service is generally based on gossip protocols that implement periodic information exchanges between peers. A plethora of protocols have been published and studied addressing issues of node failures, churn, performance, ergodicity, and desirable structural properties such as balanced in-degree, small diameter, and the ability to quickly remove failed nodes from the views of active nodes.
The resilience of peer sampling protocols to Byzantine faults (i.e., malicious nodes) is crucial to the security of applications that depend on them. Indeed, malicious nodes that manage to be over-represented in the views of honest nodes can take control of higher-layer protocols. For example, Bitcoin's peer-sampling protocol was found to be exposed to eclipse attacks, opening the door to multiple types of attacks such as selfish mining or consensus-level double-spending. In the state of the art of Byzantine attack-resistant peer-sampling protocols, views of honest nodes can quickly be poisoned by Byzantine identifiers as the proportion of malicious nodes in the system increases. With BRAHMS, the protocol most resilient to Byzantine attacks, the views of honest nodes can contain up to 81% Byzantine identifiers when only 18% of the nodes in the system are malicious.
The objective of this internship is to propose solutions to improve the resilience of peer sampling protocols against Byzantine attacks. To do so, we propose to build on a recently published work, RAPTEE, which is based on the existence of a few trusted nodes that (1) can recognize each other and (2) will never deviate from the peer sampling protocol. In this work, trusted nodes slow down the dissemination of identifiers transmitted by untrusted nodes while they speed up the dissemination of identifiers transmitted by other trusted nodes.
In order to improve this work, we would like to explore the use of the count-min sketch, a probabilistic data structure allowing a node to estimate the frequency of appearance of an element in a data stream. The idea is to allow nodes to limit the probability of adding or keeping over-represented identifiers in their view. A first approach has already partially explored this track by considering that each honest node can locally use such a structure. We would like to combine this latter approach with the approach described in RAPTEE to allow trusted nodes to collaborate via their count-min sketches and de-pollute the partial knowledge they jointly possess on the system composition. In this way, these nodes could act as the least biased source of information possible for the honest nodes.
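For context, the count-min sketch mentioned above fits in a few lines. This is a minimal illustration with assumed parameters, not the structure of the cited works: estimates never undercount, so a node can use them as an upper bound on how often an identifier has appeared in the gossip stream.

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: estimates element frequencies in a stream.
    Collisions can only inflate counts, never deflate them, so the
    estimate is always >= the true frequency."""

    def __init__(self, width=64, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent-looking hash per row, derived from SHA-256.
        h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Take the minimum over rows: the least-collided counter.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

sketch = CountMinSketch()
stream = ["node-a"] * 50 + ["node-b"] * 3 + ["node-c"]
for node_id in stream:
    sketch.add(node_id)
print(sketch.estimate("node-a"))  # at least 50; with so few items, typically exactly 50
```

A node receiving identifiers through gossip could then refuse to keep any identifier whose estimated frequency exceeds a threshold, which is the over-representation filter sketched in the description.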
URL sujet detaillé : https://www.reveillere.fr/jobs/2022/dist_count_min_sketch.pdf
Remarques : Co-supervised with Pr. Laurent Réveillère, Pr. David Bromberg and Pr. François Taïani
|
|
|
|
|
SM207-105 Exploring Multiple Facets of Decentralized Learning
|
|
Description
|
|
There is a strong momentum towards data-driven services at all layers of society and industry. This started from large-scale web-based applications such as Web search engines (e.g., Google, Bing), social networks (e.g., Facebook, TikTok, Twitter, Instagram) and recommender systems (e.g., Amazon, Netflix) and is becoming increasingly pervasive thanks to the adoption of handheld devices and the advent of the Internet of Things. Recent initiatives such as Web 3.0 come with the promise of decentralizing such services, empowering users to regain control over their personal data and preventing a few economic actors from over-concentrating decision power. However, decentralizing online services calls for decentralizing the machine learning algorithms on which they heavily rely. This project aims at exploring various facets of decentralized machine learning, including Privacy, Robustness, Fairness and Performance. A master's student shall choose one of these facets to explore. The master's project can be extended into a PhD project.
URL sujet detaillé : https://docs.google.com/document/d/1kUDDmSUKUlxcxk22wfee-muc3qCexkxV7m8A_Lmp0Qk/edit?usp=sharing
Remarques : Remuneration: approximately €600/month
|
|
|
|
|
SM207-106 Tor network measurement and analysis
|
|
Description
|
|
The current importance given to the security of communications in general is the result of a collective awareness at all levels, from state actors to individuals: the protection of privacy requires the protection of communications. Protecting oneself from an observer capable of determining the sender and the recipient of these exchanges, as well as their content and communication methods, has become a crucial objective for an ever-increasing number of scientists, journalists and citizens.
Some tools try to offer truly anonymous communications, notably tools based on onion routing, of which the best-known example is the Tor network. This network, operated by volunteers around the globe, is among the most widely used tools bringing serious, although incomplete, anonymity to millions of people: 2,500,000 users a day.
In such a system, messages are encapsulated in successive layers of encryption, analogous to the layers of an onion. The encrypted data is passed through a series of network nodes called relays, each of which "peels" a single layer of encryption, discovering the next destination of the data. When the last layer is decrypted, the message arrives at its destination. The sender remains anonymous because each intermediary knows only the nodes immediately before and after it.
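The layered construction described above can be illustrated with a toy sketch. This uses XOR in place of the AES-based encryption Tor actually uses, and the keys and message are made up; it only shows the wrap-then-peel mechanics.

```python
from itertools import cycle

def xor_layer(data, key):
    # Toy "cipher": XOR with a repeating key. Illustration only; Tor
    # actually uses AES with per-relay keys negotiated during circuit setup.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def build_onion(message, relay_keys):
    """The client adds one encryption layer per relay,
    innermost layer for the exit relay."""
    cell = message
    for key in reversed(relay_keys):
        cell = xor_layer(cell, key)
    return cell

def peel(cell, key):
    """Each relay removes exactly one layer; in the real protocol the
    revealed content also tells it the next hop to forward to."""
    return xor_layer(cell, key)

keys = [b"guard-key", b"middle-key", b"exit-key"]   # hypothetical 3-hop circuit
cell = build_onion(b"GET /index.html", keys)
for key in keys:          # relays peel in path order: guard, middle, exit
    cell = peel(cell, key)
print(cell)  # b'GET /index.html'
```

Each intermediary only ever sees one layer's worth of information, which is what limits its knowledge to its immediate predecessor and successor.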
The goal of this internship is to measure and analyze the performances and failures of Tor. This internship will address, among others, the following questions: What is Tor's performance in terms of throughput and latency; how, when, how often, and why do some Tor circuits fail? How does the Tor network react to circuit failures? How can we measure the impact of circuit failures on the Tor network?
URL sujet detaillé : https://www.reveillere.fr/jobs/2022/metro_tor.pdf
Remarques : Co-supervised with Dr. Stephane Delbruel, Pr. Laurent Réveillère and Pr. David Bromberg.
|
|
|
|
|
SM207-107 Approximate index for weighted strings
|
|
Description
|
|
Biological sequences are often represented as strings over the alphabet {A, C, G, T}. However, in certain applications we do not know the letter at a particular position exactly: we can only say that, e.g., with probability 1/2 it equals A, with probability 1/3 it equals G, etc.
The way to model this uncertainty is with weighted strings. A weighted string can be viewed as a matrix whose rows correspond to letters of the alphabet and whose columns correspond to positions; cell (X, j) contains the probability of having letter X at position j. We can define the probability for a regular string P to match at position i of a weighted string T in a natural way, via the product of the corresponding probabilities.
The goal of this project is to design and implement a data structure for a collection of weighted strings to support the following queries efficiently: given a number z and a regular string P, find all positions where P matches with probability at least 1/z. This data structure, called an index, will be used to place genomic reads on a phylogenetic tree. Phylogenetic trees are acyclic graphs representing the lines of evolutionary descent of different species, organisms, or genes from a common ancestor, and are a fundamental tool for organizing our knowledge of biological diversity.
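A minimal sketch of this matching probability and of the query the index must support (the weighted string and threshold below are made up; a real index would answer without the naive scan):

```python
from math import prod

# A toy weighted string over {A, C, G, T}: one probability distribution
# per position (the values here are made up).
T = [
    {"A": 0.5, "C": 1/6, "G": 1/6, "T": 1/6},
    {"A": 1.0},
    {"C": 0.5, "G": 0.5},
    {"A": 0.9, "T": 0.1},
]

def match_prob(P, T, i):
    """Probability that pattern P occurs at position i of weighted string T:
    the product of the per-position letter probabilities."""
    if i + len(P) > len(T):
        return 0.0
    return prod(T[i + j].get(c, 0.0) for j, c in enumerate(P))

def query(P, T, z):
    """All positions where P matches with probability at least 1/z.
    Naive linear scan; the internship's index should avoid it."""
    return [i for i in range(len(T)) if match_prob(P, T, i) >= 1.0 / z]

print(query("AG", T, 4))  # → [1]
```

Here "AG" matches at position 1 with probability 1.0 × 0.5 = 0.5 ≥ 1/4, and nowhere else with non-zero probability.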
URL sujet detaillé :
:
Remarques : The internship will be co-advised by Krister Swenson from the university of Montpellier: https://www.lirmm.fr/~swenson/. The implementation part is important, but there are many interesting theoretical questions that are related to the research subject.
|
|
|
|
|
SM207-108 Efficient Planning and Learning for Resource Sharing
|
|
Description
|
|
Markov Decision Processes (MDPs) and Reinforcement Learning (RL) have seen success in various applications. However, they often require large volumes of data and computing power to arrive at satisfactory results.
To define the next generation of more "democratic" and widely applicable algorithms, such methods still need to deal with very demanding exploration issues as soon as the state/action spaces are not small. One way around this is to use the underlying knowledge and structure present in many MDPs. This is especially true for problems related to scheduling and resource sharing in, among others, server farms, clouds, and cellular wireless networks.
The research will revolve around this theme of improving the efficiency of learning algorithms by leveraging the structure of the underlying problem. Both model-based and model-free frameworks will be studied.
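As a toy illustration of the kind of structured MDP mentioned above, here is a minimal value-iteration sketch for a single queue whose action chooses a slow (free) or fast (costly) service rate. All numbers are made up; the point is that structure, such as the monotonicity of the value function in the queue length, emerges and is what planning and learning methods can exploit.

```python
GAMMA = 0.95
CAP = 4                               # queue capacity (states 0..4)
ARRIVAL = 0.5                         # arrival probability per slot
SERVE = {"slow": 0.3, "fast": 0.8}    # service completion probability
COST = {"slow": 0.0, "fast": 0.5}     # action cost; holding cost = queue length

def transitions(s, a):
    """Distribution over next queue lengths from state s under action a."""
    out = {}
    arrivals = ((1, ARRIVAL), (0, 1.0 - ARRIVAL))
    departures = ((1, SERVE[a]), (0, 1.0 - SERVE[a])) if s > 0 else ((0, 1.0),)
    for arr, pa in arrivals:
        for dep, pd in departures:
            nxt = min(s - dep + arr, CAP)
            out[nxt] = out.get(nxt, 0.0) + pa * pd
    return out

def q_value(s, a, V):
    # Immediate cost (action cost + holding cost) plus discounted future cost.
    return COST[a] + s + GAMMA * sum(p * V[n] for n, p in transitions(s, a).items())

def value_iteration(iters=1000):
    V = [0.0] * (CAP + 1)
    for _ in range(iters):
        V = [min(q_value(s, a, V) for a in SERVE) for s in range(CAP + 1)]
    policy = [min(SERVE, key=lambda a, s=s: q_value(s, a, V))
              for s in range(CAP + 1)]
    return V, policy

V, policy = value_iteration()
print(policy[0])  # "slow": with an empty queue, paying for fast service is wasted
```

The structural property a model-based method could exploit here is that V is non-decreasing in the queue length, so only threshold policies need to be searched.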
URL sujet detaillé : https://homepages.laas.fr/bala/Master-PhD-Toulouse-SOLACE-2022.pdf
Remarques : Several topics on this theme are proposed. They are co-supervised with Maaike Verloop (CR CNRS, IRIT), Urtzi Ayesta (DR CNRS, IRIT) and Balakrishna Prabhu (CR CNRS, LAAS).
Remuneration possible.
|
|
|
|
|
SM207-109 Development internship - network orchestration and reconfiguration
|
|
Description
|
|
Your role
Orange Innovation contributes to the software transformation of networks, which aims for greater flexibility, integration of cloud technologies and open-source components. In this context, Orange Innovation is developing automation and programming chains for the deployment and reconfiguration of these networks. The purpose of this internship is to implement an experimental network function deployment manager component in a cloud environment. This manager should allow the user (consumer) to dynamically reconfigure the topology of a network function. It will be based on a substitution mechanism specific to the OASIS TOSCA language, which makes it possible to reduce the complexity of templates by substituting nodes with templates of complete topologies. You will perform the following steps:
- Discovery of the OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) language.
- Implementation of a "Flavour" management function using the TOSCA substitution mechanism.
- Application to the case of deployment in a Kubernetes virtualization infrastructure.
- Integration in the TOSCA Cloudnet Toolbox open-source tool chain (https://github.com/Orange-OpenSource/Cloudnet-TOSCA-toolbox).
About you
Domain-Specific Languages (DSL). Python development, Docker (Kubernetes is a plus). Continuous integration tools (Git, CI/CD...).
You are curious and like working in a team.
Additional information
You will be integrated into a research team that has been leading a partnership with Inria for several years and which regularly welcomes and supports doctoral students and postdoctoral research engineers. You will work in a functional area at the heart of the telecom operator's business. You will practice the tools at the heart of the software transformation of networks.
Department
Orange Innovation brings together the research and innovation activities and expertise of the Group's entities and countries. We work every day to ensure that Orange is recognized as an innovative operator by its customers, and we create value for the Group and the Brand in each of our projects. With 740 researchers, thousands of marketers, developers, designers, and data analysts, it is the expertise of our 6,000 employees that fuels this ambition every day.
Orange Innovation anticipates technological breakthroughs and supports the Group's countries and entities in making the best technological choices to meet the needs of our consumer and business customers.
At Orange Innovation, you will be part of a research team on the cutting edge of expertise in cloud and network hosting infrastructures and automated deployment of future network services (virtualization, 5G+, continuous integration). You will benefit from a research ecosystem, working alongside anticipation engineers, with opportunities to run experiments and implement the concepts considered on concrete use cases.
URL sujet detaillé : https://orange.jobs/jobs/v3/offers/118589?lang=fr
Remarques :
|
|
|
|
|
SM207-110 A type safe quantum programming language for quantum control and indefinite causal orders
|
|
Description
|
|
The development of quantum programming languages that include quantum control primitives is one of the most important problems in quantum computing. During program execution, quantum control allows superpositions of executions that depend on the quantum state and, therefore, provides all the quantum functionality that a programmer may want to access.
A fundamental example of quantum control is the quantum switch [1] which inputs two quantum evolutions U and V, and a control qubit, and consists in applying U followed by V or V followed by U depending on the state of the control qubit. In particular, when the qubit is in a superposition state, U and V are in an indefinite causal order [2]: in one branch of the superposition U is applied before V, but after V in the other branch.
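The quantum switch can be simulated directly as an operator on the joint control-target system. The sketch below (with an illustrative choice of gates, not taken from the references) builds the 4x4 matrix |0⟩⟨0| ⊗ (V·U) + |1⟩⟨1| ⊗ (U·V) and applies it to a control qubit in superposition:

```python
from math import sqrt

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def quantum_switch(U, V):
    """4x4 operator on (control ⊗ target), basis order |c t⟩:
    control |0⟩ -> apply U then V (i.e. V·U), control |1⟩ -> U·V."""
    VU, UV = matmul(V, U), matmul(U, V)
    W = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            W[i][j] = VU[i][j]          # |0⟩⟨0| ⊗ (V·U) block
            W[2 + i][2 + j] = UV[i][j]  # |1⟩⟨1| ⊗ (U·V) block
    return W

# Example gates (an illustrative, non-commuting pair): Pauli-X and phase gate S.
X = [[0, 1], [1, 0]]
S = [[1, 0], [0, 1j]]

W = quantum_switch(X, S)                 # U = X, V = S
state = [1 / sqrt(2), 0, 1 / sqrt(2), 0] # control (|0⟩+|1⟩)/sqrt(2), target |0⟩
out = matvec(W, state)
# Branch c=0 carries (S·X)|0⟩ = [0, 1j]; branch c=1 carries (X·S)|0⟩ = [0, 1].
```

Since X and S do not commute, the two branches of the output differ: the order of the two operations is itself in superposition, which is exactly the indefinite causal order discussed next.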
Indefinite causal order is an important subclass of quantum control, which allows speed-ups compared to standard classically controlled quantum computing [3,4,5].
The objective of this internship is to develop the syntax and semantics of a programming language with quantum control and to design a type system so that typable programs are precisely the programs that correspond to indefinite causal orders.
As an application, we will consider how well-typed quantum programs can be compiled into quantum circuits.
References: [1] Giulio Chiribella, Giacomo Mauro D'Ariano, Paolo Perinotti, and Benoit Valiron. Quantum computations without definite causal structure. Physical Review A, 88:2, 2013. https://arxiv.org/abs/0912.0195
[2] O. Oreshkov, F. Costa, Č. Brukner. Quantum correlations with no causal order. Nat. Commun. 3, 1092 (2012). https://arxiv.org/abs/1105.4464
[3] Mateus Araújo, Fabio Costa, and Časlav Brukner. 2014. Computational advantage from quantum-controlled ordering of gates. Physical Review Letters 113, 25 (2014), 250402. https://doi.org/10.1103/PhysRevLett.113.250402 https://arxiv.org/abs/1401.8127
[4] Timoteo Colnaghi, Giacomo Mauro D'Ariano, Stefano Facchini, and Paolo Perinotti. 2012. Quantum computation with programmable connections between gates. Physics Letters A 376, 45 (2012), 2940-2943. https://doi.org/10.1016/j.physleta.2012.08.028 https://arxiv.org/abs/1109.5987
[5] Stefano Facchini and Simon Perdrix. 2015. Quantum circuits for the unitary permutation problem. In International Conference on Theory and Applications of Models of Computation. Springer, 324-331. https://doi.org/10.1007/978-3-319-17142-5_28 https://arxiv.org/abs/1405.5205
URL sujet detaillé : http://members.loria.fr/RPechoux/wp-content/blogs.dir/113/files/sites/113/2022/11/A-type-safe-quantum-programming-language-for-quantum-control-and-indefinite-causal-orders.pdf
Remarques : Co-advisor: Romain Péchoux
|
|
|
|
|
SM207-111 An innovative approach to collaborative time series classification
|
|
Description
|
|
1. CONTEXT AND OBJECTIVES: Faced with the overabundance of temporal data (from remote sensing in our case) arriving almost continuously, and with the complexity of both the data and the phenomena studied, the labelling phase of supervised learning can no longer be carried out by experts, as it is too tedious and time-consuming. Moreover, there are many methods capable of analysing these data, each with its own biases and its own analysis strategy, and each able to process certain data particularly well, making it difficult to choose the "optimal" method and its parameters. To address these two issues, the HERELLES project (https://herelles-anr-project.cnrs.fr/) aims to design an original approach, never proposed before, consisting of a collaboration between supervised AND unsupervised learning algorithms through the transfer of information extracted from complementary temporal data. For example, clusters will be used to label data or as potential thematic classes, training data will be used to validate clusters or to generate constraints, and information on clusters can be shared by all methods. This collaboration will be controlled by internal criteria (convergence of results, adequacy with a priori knowledge, ...) and by the expert, who will intervene by evaluating intermediate results and by injecting new information (constraints between objects, object labelling, ...).
2. THE SUBJECT: The internship will consist in participating in the team's current work on this collaboration. To do so, we will first rely on the two main concepts implemented in the SAMARAH collaborative clustering method. In the framework of Antoine Saget's thesis, initial answers and a global architecture have been proposed. The intern will be involved in the development of this architecture, mainly by participating in the implementation of the FoDoMuST platform (https://sdc.icube.unistra.fr/en/index.php/FODOMUST) and in the validation of these proposals in the context of the analysis of remote sensing time series, but above all, in a second phase, by proposing extensions to the platform through original ideas and proposals.
URL sujet detaillé : https://seafile.unistra.fr/f/27d7c7d1b5714c508510/?dl=1
Remarques : Supervisors: Pierre Gançarski (Prof.) - Baptiste Lafabregue (Assistant Prof.) Location: ICube - Strasbourg
Gratification: €600/month
Duration: 4 to 6 months
Date: Spring 2023
To apply you must be very interested in temporal data mining and image analysis/processing. We will provide "training" for the missing aspect(s). A good knowledge of JAVA and Python will be a definite plus.
Send an email to pierre.gancarski.fr with a CV, a cover letter and your transcripts since your BAC. A 3rd and 4th year ranking would be a plus, allowing us to complete the application.
|
|
|
|
|
SM207-112 Development of new features for the production management tool (NGL) of the CNRGH sequencing platform
|
|
Description
|
|
The CNRGH (Evry-Courcouronnes) is a genomics department of the CEA: a platform dedicated to the exploration of biodiversity and the interpretation of the human genome. This institute coordinates and participates in many genomics projects and relies on NGS (next-generation sequencing) to obtain the sequences that will be used in bioinformatics analyses. The projects this platform works on involve several thousand samples, for which complete traceability is required from the receipt of the sample to the generation of the sequence files.
The development and production management team develops and maintains applications for the traceability and allocation of production activities. It is in charge of developing a LIMS (Laboratory Information Management System): NGL. NGL is a set of web applications that can interact with each other. Each application has its own business logic but is based on the same REST architecture.
As part of a 6-month internship (first semester of 2023), we would like to offer the opportunity to participate in the evolution of the NGL application and its new developments.
We are looking for candidates interested in web development. The back end of the application is in Java, MySQL and MongoDB and the front end in Javascript and AngularJS.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-113 Open Automata meet Session Types
|
|
Description
|
|
In previous years, we have studied theoretical foundations for open systems and defined open automata, which can be seen as labelled transition systems (LTSs) with parameters and holes. The transitions of open automata are much more complex than those of an LTS: they include guards expressing the relations between the parameters of the automaton and the actions of the holes, and assignments encoding their effects.
The main objective of the internship is the characterization of the interactions of an open automaton with its environment. These interactions can be characterized as behavioural types.
URL sujet detaillé : http://www.ens-lyon.fr/LIP/CASH/wp-content/uploads/2022/11/2021-pNetsubjectM2.pdf
Remarques : Co-supervision: Ludovic Henrio and Rabea Ameur-Boulifa
|
|
|
|
|
SM207-114 Screaming fast symmetric cryptography
|
|
|
|
SM207-115 Properties guaranteed by graph reduction and impact on graph anonymization
|
|
Description
|
|
The objective of this internship is to study property preservation in graph compression. This internship will be carried out within the ANR project COREGRAPHIE (https://coregraphie.projet.liris.cnrs.fr/) and will be co-supervised by partners of the project: Sergey KIRGIZOV (LIB, Dijon), Hamida SEBA (LIRIS, Villeurbanne) and Olivier TOGNI (LIB, Dijon). During the internship, the student will be hosted in Dijon or Lyon according to his/her preference. For a detailed description see https://coregraphie.projet.liris.cnrs.fr/data/uploads/sujet_stage_m2_2023.pdf
URL sujet detaillé : https://coregraphie.projet.liris.cnrs.fr/data/uploads/sujet_stage_m2_2023.pdf
Remarques : Co-supervision: Sergey KIRGIZOV (LIB, Dijon), Hamida SEBA (LIRIS, Villeurbanne) and Olivier TOGNI (LIB, Dijon). Standard internship gratification.
|
|
|
|
|
SM207-116 Design and Evaluation of Strategies for Automated Proofs using Reasoning Modulo Equivalence
|
|
Description
|
|
The goal of the proposed research is to provide a decision procedure for the Coq proof assistant that would extend equality-based reasoning (i.e. the congruence tactic) to heterogeneous problems where equalities are expressed using multiple equivalence relations (i.e. setoids).
Previous internships have built a prototype tactic "setoid_congruence" and showed that the full problem is undecidable. Proposed work for the intern may include: developing heuristics for the full problem, developing a decision procedure for a decidable fragment, studying structural rules for constructors or logical connectives, etc. Implementation work improving the current prototype is possible if desired.
See the internship web page for more details.
URL sujet detaillé : https://www.verimag.fr/Master-Design-and-Evaluation-of.html?lang=en
Remarques : Co-advising with Karine Altisen and Pierre Corbineau
|
|
|
|
|
SM207-117 Enriching Kernel-Based testing for single cell transcriptomics
|
|
Description
|
|
Single-cell transcriptomics now allows the quantification of gene expression at the scale of individual cells, encoded in count matrices containing thousands of observations (cells) and tens of thousands of features (gene expression values). The analysis of such data requires new methodological frameworks dedicated to their complexity and size. A major challenge consists in comparing the distribution of gene expression between conditions (e.g., control vs treatment). In a recent work we developed a non-parametric test based on supervised kernel-based classification. Our procedure belongs to the family of Maximum Mean Discrepancy (MMD) tests, which rely on a distance between the expectations of distribution embeddings in a Reproducing Kernel Hilbert Space (RKHS). This strategy can be enriched by considering the dependency structure of the data in this RKHS, which appears central in the field of single-cell transcriptomics, to better account for biological variability. The test is then restated as a test based on kernel Fisher Discriminant Analysis (kFDA). The application of our procedure to experimental data is very promising, and raises new research challenges in machine learning. One main challenge is to quantify the importance of the features (genes) that explain the discrimination between populations. This is a very general question for (non-linear) kernel-based methods, for which there is no consensus framework to assess feature importance. A possible strategy would be to use permutations to quantify each feature's importance, which is computationally expensive but simple to implement and interpret for biologists. Another strategy could be to perform feature selection using penalized methods like the lasso.
Another very promising direction will be to incorporate the spatial positioning of cells in the tissue, thanks to the so-called spatial transcriptomics technology. This new challenge will consist in integrating the spatial component into the kernel-based test that compares distributions.
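As a minimal numeric illustration of the MMD statistic underlying such tests (this is not the authors' kFDA procedure; the data and kernel bandwidth are made up):

```python
from math import exp

def gaussian_kernel(x, y, bandwidth=1.0):
    return exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, k=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared MMD:
    E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)], always >= 0."""
    def mean_k(a, b):
        return sum(k(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

# Hypothetical one-gene expression values in three cell populations.
control   = [0.10, 0.20, 0.00, 0.15, 0.05]
treatment = [1.10, 0.90, 1.20, 1.05, 0.95]
replicate = [0.12, 0.18, 0.02, 0.10, 0.08]   # same condition as control

print(mmd2(control, treatment) > mmd2(control, replicate))  # True
```

A permutation test on this statistic (shuffling sample labels and recomputing `mmd2`) is exactly the kind of computationally expensive but interpretable strategy the description mentions for assessing significance.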
URL sujet detaillé :
:
Remarques : The candidate will be co-supervised by Franck Picard (CNRS, ENS Lyon) and Bertrand Michel (EC Nantes), experts in computational statistics and statistical learning. The candidate will work at the ENS de Lyon, in an interdisciplinary environment, between mathematics, computer science and biology. Moreover, the candidate will benefit from the SingleStatOmics ANR project that gathers an interdisciplinary consortium in machine learning / IA dedicated to single cell genomics, with experts in machine learning, optimal transport and statistics.
References:
- A kernel two-sample test; Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola. The Journal of Machine Learning Research, Volume 13, 2012, pp. 723-773.
- Testing for Homogeneity with Kernel Fisher Discriminant Analysis; Eric Moulines, Francis Bach, Zaïd Harchaoui. NIPS 2007.
- Regev A, Teichmann SA, Lander ES, et al. The Human Cell Atlas. Elife. 2017; 6:e27041.
- Lähnemann, D., Köster, J., Szczurek, E. et al. Eleven grand challenges in single-cell data science. Genome Biol 21, 31 (2020).
|
|
|
|
|
SM207-118 Graph-Based non-linear dimension reduction for single cell transcriptomics
|
|
Description
|
|
Recent technological advances in massively parallel sequencing and high-throughput cell biology technologies now give us the ability to describe population of cells with high dimensional molecular features. The so-called single-cell transcriptomic technology allows us to study cell-to-cell variability within a biological sample and investigate new questions like intra-tissue heterogeneity. Like many contemporary scientific fields, single-cell genomics raises new mathematical and computational challenges that are inherent to the massive production of large, high-resolution datasets that are complex and high-dimensional. In particular, unsupervised analysis is mandatory for researchers to handle the complexity of modern data, and machine learning methods known as dimensionality reduction have become a standard to reduce the size and complexity of data. Embedding high dimensional data into spaces with fewer dimensions is a central problem of machine learning, with the core motivation to preserve the intrinsic structure of the original data by keeping similar data points close and dissimilar data points distant in the low-dimensional space. In the literature, methods have been proposed, linear (like PCA) and nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps). Among non-linear methods, tSNE and UMAP have been the most successful in proposing new representations that respect the local complex geometry of single-cell datasets. These techniques are now routinely incorporated in most analysis pipelines. They consist in embedding the original dataset in a 2D space by preserving the non linear dissimilarity between points thanks to a Kullback Leibler divergence between distances in the original and in the embedded space. In a recent work we proposed a unifying statistical and probabilistic framework that encompasses many non linear embedding methods, thanks to the coupling of random graphs that govern the proximities of observations. 
Our model provides a probabilistic interpretation of the most used methods, like tSNE and UMAP, by showing how they rely on specific prior hypotheses on the underlying connectivity structure. Our first results concern simple graph priors, like Bernoulli or fixed-degree distributions, and the project is to build on our framework to generalize these methods to more complex topologies. Two research directions would be to consider very general priors, like scale-free networks or stochastic block model priors; the latter would have the advantage of introducing clustering information into the model, which could constitute a powerful extension of our first framework, performing non-linear dimension reduction and clustering at the same time.
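As a toy illustration of the Kullback-Leibler objective mentioned above (this is only an illustrative sketch with invented data, not the project's graph-coupling framework): SNE-style methods compare pairwise similarity distributions in the original space and in the embedding, and prefer embeddings with lower divergence:

```python
import math

def affinities(points, sigma=1.0):
    """Pairwise similarities p_ij proportional to exp(-||x_i - x_j||^2 / (2 sigma^2)),
    normalized over all ordered pairs."""
    n = len(points)
    w, total = {}, 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i, j] = math.exp(-d2 / (2 * sigma ** 2))
            total += w[i, j]
    return {k: v / total for k, v in w.items()}

def kl_divergence(p, q):
    """KL(P||Q): the embedding cost minimised by SNE-like methods."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items())

# High-dimensional data (two close points, one far) and two candidate 1-D embeddings.
X = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
Y_good = [(0.0,), (0.1,), (8.0,)]  # preserves the cluster structure
Y_bad = [(0.0,), (8.0,), (0.1,)]   # mixes near and far points

P = affinities(X)
print(kl_divergence(P, affinities(Y_good)) < kl_divergence(P, affinities(Y_bad)))  # True
```

The faithful embedding gets a lower divergence, which is exactly the signal these methods optimize.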
The candidate will be co-supervised by Franck Picard (CNRS, ENS Lyon) Thibault Espinasse (Institut Camille Jordan, Lyon) and Julien Chiquet (INRA Saclay), experts in computational statistics and statistical learning. The candidate will work at the ENS de Lyon, in an interdisciplinary environment, between mathematics, computer science and biology. Moreover, the candidate will benefit from the SingleStatOmics ANR project that gathers an interdisciplinary consortium in machine learning / IA dedicated to single cell genomics, with experts in machine learning, optimal transport and statistics.
References : - van Assel, H., Espinasse, T., Chiquet, J., Picard, F. A Probabilistic Graph Coupling View of Dimension Reduction. NeurIPS 2022. - van der Maaten, L.J.P., Hinton, G.E. Visualizing Data using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008. - McInnes, L., Healy, J., Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426, 2020. - Regev, A., Teichmann, S.A., Lander, E.S., et al. The Human Cell Atlas. Elife 2017; 6:e27041. - Lähnemann, D., Köster, J., Szczurek, E., et al. Eleven grand challenges in single-cell data science. Genome Biol 21, 31 (2020).
URL sujet detaillé :
Remarques :
|
|
|
|
|
SM207-119 Reasoning with hard and soft constraints to repair and query inconsistent data
|
|
Description
|
|
Context:
It is widely acknowledged that real-world data is plagued with quality issues, such as inconsistencies (due to false or outdated information) and incompleteness (missing information). A prominent approach to handling inconsistent data is to use declaratively specified knowledge in the form of logical constraints to identify errors and define a space of possible "clean" databases (called repairs). By reasoning over these repairs, it is possible to obtain meaningful query answers from contradictory data and to classify answers according to their confidence. This general approach was first explored for relational databases equipped with integrity constraints. More recently, it has been extended to ontology-enriched databases, where the domain knowledge provided by the ontology not only serves to identify errors but also to infer missing information (helping to tackle the incompleteness issue).
Topic:
In both the pure database and ontology settings, logical constraints are typically assumed to be hard (absolute), which means repairs must not contain any constraint violations. However, there exist natural constraints that are mostly true but nevertheless admit rare exceptions. For example, while we may assert (absolutely) that a person has precisely one birthplace, we cannot reasonably state that a person has only one residential address, even though this is true for most people. Clearly, such soft constraints can provide valuable guidance for handling inconsistent data (suggesting potential errors), but their integration into repair-based approaches remains little explored.
The aim of this internship is to investigate notions of repairs in the ontology setting that allow for both hard and soft constraints. After reviewing the state of the art and formally defining suitable repair notions, the main task will be to study the computational properties of such repairs. This will involve pinpointing the complexity of the main computational tasks: recognizing and generating repairs, and answering queries under repair-based inconsistency-tolerant semantics. It is likely that many of these problems will prove intractable, so it will be necessary to find ways to circumvent the high complexity, either by identifying relevant tractable subcases or by devising pragmatic algorithms that can be expected to behave well in practice.
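The repair notion above can be made concrete with a minimal sketch (the table, constraint and query below are invented for illustration, not taken from the project): under the hard key constraint "a person has exactly one birthplace", repairs keep one fact per conflicting group, and an answer is certain if it holds in every repair (the AR semantics):

```python
from itertools import product

# Hypothetical inconsistent data: (person, birthplace) facts violating the
# key constraint "a person has exactly one birthplace".
facts = [("alice", "paris"), ("alice", "lyon"), ("bob", "nice")]

def repairs(facts):
    """Maximal conflict-free subsets: pick exactly one birthplace fact per
    person (a standard repair notion for key violations)."""
    groups = {}
    for person, place in facts:
        groups.setdefault(person, []).append((person, place))
    return [set(choice) for choice in product(*groups.values())]

def certain(query, facts):
    """An answer is certain iff it holds in every repair (AR semantics)."""
    return all(query(r) for r in repairs(facts))

print(certain(lambda r: ("bob", "nice") in r, facts))     # True: bob is conflict-free
print(certain(lambda r: ("alice", "paris") in r, facts))  # False: some repair keeps lyon
```

Soft constraints would refine this picture by making some repairs more plausible than others instead of ruling conflicts out entirely.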
Profile:
Prior experience with knowledge representation and reasoning and/or database theory is desired, but we also welcome applications from motivated candidates with excellent academic records in theoretical computer science and a solid understanding of logic and computational complexity.
URL sujet detaillé : http://intended.labri.fr/documents/master-intended-soft-constraints.pdf
Remarques : Co-supervised by Camille Bourgaux (DI ENS, Paris).
The Master's internship is part of the INTENDED Chair on Artificial Intelligence, whose aim is to develop intelligent, knowledge-based methods for handling imperfect data.
The internship can lead to a funded PhD position within the INTENDED project.
|
|
|
|
|
SM207-120 Random Fourier Features for PAC-Bayesian Domain Adaptation
|
|
|
|
SM207-121 Exploration of temporal graphs
|
|
Description
|
|
https://drive.google.com/file/d/1ZMW0wG8KELHgU4mRdfZTQpF9nrS1trSD/view?usp=sharing
URL sujet detaillé :
Remarques : No funding is available from the Liverpool side
|
|
|
|
|
SM207-122 Extremal properties of the subchromatic number
|
|
Description
|
|
A k-subcolouring of a graph G is a variant of a proper k-colouring in which each colour class induces a disjoint union of cliques. The subchromatic number of G is the minimum k such that there exists a k-subcolouring of G. Not much is known about the extremal value of the subchromatic number as a function of natural graph parameters (number of vertices, maximum degree). A few bounds exist, but their proofs are either very simple or a consequence of a result that is not specific to subcolourings. In general there is a large gap between the lower and upper bounds, and it may be possible to reduce that gap; this will be the main motivation of this internship.
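For intuition, a colour class is a disjoint union of cliques exactly when it contains no induced path on three vertices, which gives a brute-force check for tiny graphs (an illustrative sketch only; the internship is about extremal bounds, not computation):

```python
from itertools import product

def is_subcolouring(n, edges, colour):
    """Each colour class must induce a disjoint union of cliques, i.e. contain
    no induced P3 (u-v-w with u,w non-adjacent, all the same colour)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for u, v, w in product(range(n), repeat=3):
        if len({u, v, w}) == 3 and colour[u] == colour[v] == colour[w]:
            if v in adj[u] and w in adj[v] and w not in adj[u]:
                return False
    return True

def subchromatic_number(n, edges):
    """Smallest k admitting a k-subcolouring (exhaustive search, tiny graphs only)."""
    for k in range(1, n + 1):
        for colour in product(range(k), repeat=n):
            if is_subcolouring(n, edges, colour):
                return k

print(subchromatic_number(3, [(0, 1), (1, 2)]))          # P3 itself needs 2 colours
print(subchromatic_number(3, [(0, 1), (1, 2), (0, 2)]))  # a clique needs only 1
```

Note how the subchromatic number can be much smaller than the chromatic number: a single colour suffices for any disjoint union of cliques, however large.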
URL sujet detaillé : https://www.lri.fr/~fpirot/stages/stage_subchromatic.pdf
Remarques :
|
|
|
|
|
SM207-123 Studying the quality of a network from the accessibility point of view
|
|
Description
|
|
The subject of this internship is part of the ANR ESCAPE and RIN ESCAPE SG projects, which aim to produce modeling and computer simulation tools to represent and explore different evacuation scenarios in the context of natural and technological risk management, in order to design the most appropriate plans. In concrete terms, the idea is to have at one's disposal a tool to simulate a disaster (e.g. a tsunami or the explosion of a chemical plant) and to study its consequences on the urban environment. The analysis of these consequences makes it possible to adapt the infrastructures and the evacuation plans foreseen in the framework of risk management for the cities concerned. The projects are based on multi-agent simulation and plan to develop libraries for tsunami, dike-rupture and industrial-accident disasters, with data from the cities of Rouen and Hanoi in Vietnam.
In the context of this internship, we are only interested in the analysis of the structure of the graph representing the space in which the agents move (a node represents an intersection and an arc a road section between two intersections of the network). The objective is to determine the parts that could cause a degradation of accessibility in the network (i.e. the vulnerable parts). Indeed, one of the critical points in the context of an evacuation is the capacity of the road network to resist deterioration related to the impact of the hazard (e.g. congestion problems); in other words, its capacity to maintain accessibility for all points of the network.
In this context, a first Master's internship was carried out to establish a state of the art of operators for identifying the vulnerable parts of this graph. However, applying these operators revealed that graphs based on real data need to be "cleaned" before the operators produce usable results. This suggests that fundamental preliminary work on randomly generated graphs could greatly help to characterize the biases encountered in real graphs and to correct them. This fundamental work is the object of this new internship, in which we will also focus on the complexity of the developed algorithms.
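One classical operator for flagging vulnerable parts of a road graph, shown here as an illustrative sketch (the internship will consider a state of the art of such operators, not this one specifically), is the set of articulation points: intersections whose removal disconnects the network.

```python
def articulation_points(adj):
    """Cut vertices of an undirected graph: intersections whose removal
    disconnects the network (Tarjan's DFS-based algorithm, O(V+E))."""
    disc, low, ap = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if parent is not None and low[v] >= disc[u]:
                    ap.add(u)  # no path from v's subtree bypasses u
        if parent is None and children > 1:
            ap.add(u)  # root with several DFS subtrees
        return None

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return ap

# Two neighbourhood loops joined by a single junction 2: removing it splits the city.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
print(sorted(articulation_points(adj)))  # [2]
```

On randomly generated graphs, the distribution of such cut vertices is one natural baseline against which biases of real road graphs can be characterized.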
URL sujet detaillé :
Remarques : Co-supervision: Paul Dorbec (GREYC, Caen); Pierrick Tranouez (LITIS, Rouen)
Paid internship: yes
Duration: 6 months
Start date: from January 2023
|
|
|
|
|
SM207-124 Understanding social motivations and success factors of crowdfunding via data analysis
|
|
Description
|
|
CentraleSupélec is hiring an M2 intern in computer science, in preparation for a PhD to be launched in October 2023. The candidate will join the research team of the French national research program (ANR) UMICROWD (Understanding, Modeling and Improving the outcome of Crowdfunding campaigns). UMICROWD gathers experts in economics, sociology, mathematics and data science to study the dynamics of crowdfunding and to promote sustainable and socially responsible project funding. The candidate will be hired by the CentraleSupélec / L2S laboratory (Paris-Saclay campus) and will work closely with researchers from the ESCE business school (Paris La Défense) and Avignon Université (Avignon).
Context: Crowdfunding (CF) allows entrepreneurs who wish to bypass classical funding channels, such as venture capital and credit loans, to directly address the crowd via Internet-based Crowdfunding Platforms (CFP). Crowdfunding started as a niche but has gained increasing importance over the past ten years, and its social and economic impacts can no longer be neglected.
Objective: The objective of this internship is to understand empirically the dynamics and success factors of crowdfunding campaigns, following a quantitative approach. The candidate will perform a quantitative analysis of a database extracted from a crowdfunding platform to understand the impact of the project parameters (goal, duration, category, promotion by the platform) on the outcome of the CF campaign (success and amount of collected funds). We will use classical statistical tools, such as correlation and multivariate analysis, to conduct an exploratory data analysis (EDA) of the corpus. We will also construct the underlying graph describing the interactions between funders and entrepreneurs, and describe the impact of the entrepreneur's internal social capital on success using graph analysis tools.
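A minimal sketch of the kind of exploratory analysis described above (the campaign records and field names are invented for illustration; they are not the platform's actual schema):

```python
import math

# Hypothetical campaign records: funding goal and amount collected, in euros.
campaigns = [
    {"goal": 1000, "collected": 1500},
    {"goal": 5000, "collected": 4000},
    {"goal": 20000, "collected": 3000},
    {"goal": 2000, "collected": 2500},
]

def pearson(xs, ys):
    """Classical Pearson correlation coefficient, a basic EDA tool."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

goals = [c["goal"] for c in campaigns]
success = [1 if c["collected"] >= c["goal"] else 0 for c in campaigns]
# On this toy sample, larger goals correlate negatively with success.
print(pearson(goals, success))
```

The real study would extend this to multivariate analysis over all project parameters and to graph statistics on the funder-entrepreneur interaction network.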
URL sujet detaillé :
Remarques : Co-supervision by researchers from the ESCE business school and Avignon Université. A PhD on the topic of the internship is planned to start in October 2023.
|
|
|
|
|
SM207-125 How can we transform our modes of production so as to respect environmental limits, increase our resilience and satisfy our vital needs?
|
|
Description
|
|
STEEP is a transdisciplinary INRIA research team that combines an analysis of the socio-environmental situation with the production of models that account for it, in order to disseminate knowledge grounded in material realities, to inform decisions, and to support transformations that encourage cooperation towards a radical change in our ways of life. https://steep.inria.fr/
Several internship and thesis topics are possible; to get an idea of the range of options, see the bottom of the following page: https://steep.inria.fr/rejoindre-lequipe/
Example topic: https://steep.inria.fr/rejoindreequipe/relocalisation-dactivites-productives-modelisation-multi-echelle-basee-sur-les-graphes-et-les-contraintes/
Our topics are systematically adapted to the constraints and aspirations of the candidates.
URL sujet detaillé : https://steep.inria.fr/rejoindreequipe/relocalisation-dactivites-productives-modelisation-multi-echelle-basee-sur-les-graphes-et-les-contraintes/
Remarques :
|
|
|
|
|
SM207-126 TinyML (Machine Learning on IoT devices)
|
|
Description
|
|
1. Administrative Context
Mines Saint-Etienne (MSE), one of the graduate schools of Institut Mines-Telecom, the leading group of graduate schools of engineering and management in France under the supervision of the Ministry of the Economy, Industry and Digital Technology, is assigned missions of education, research and innovation, transfer to industry, and scientific, technological and industrial culture. MSE comprises 2,400 graduate and postgraduate students, 400 staff, a consolidated budget of €46M, three sites on the Saint-Etienne campus (Auvergne-Rhône-Alpes region, Lyon Saint-Etienne metropolitan area), a campus in Gardanne (SUD region, Aix-Marseille metropolitan area), a site in Lyon within the digital campus of the Auvergne-Rhône-Alpes region, six research units, five teaching and research centres, and one of the leading French science community centres (La Rotonde: €1M budget and over 40,000 visitors per year). The Times Higher Education World University Ranking placed us in the 251-300 range for Engineering and Technology in 2022. Our work environment is characterised by high Faculty-to-Student, Staff-to-Faculty and PhD-to-Faculty ratios, as well as comprehensive state-of-the-art experimental and computational facilities for research, teaching and transfer to industry. The Henri Fayol Institute, one of the school's five training and research centers, brings together professors in industrial engineering, applied mathematics, computer science, environment and management around the theme of overall business performance. The Henri Fayol Institute is strongly involved in flagship projects of the Industry of the Future and the City of the Future.
2. Scientific Context
In recent years, Artificial Intelligence, and in particular Neural Networks (NN), has shown impressive results in many applications, often beating humans in domains ranging from games (AlphaGo...) to health care (skin and eye cancer detection...). However, training such models requires large amounts of computing power, and thus of energy; sometimes more than a small city consumes in a year (e.g. GPT-3). As energy production is the main source of CO2 released into the atmosphere, such technological progress unfortunately goes along with the destruction of our planet. This goes in the opposite direction of the UN's Sustainable Development Goals, which we need to achieve quickly to ensure our survival as a whole society.
3. Topic: TinyML
The field of TinyML seeks ways of implementing Machine Learning (ML) models (particularly NN) on small devices with limited CPU power, RAM capacity, network bandwidth and battery life. Techniques developed in this domain could provide elements of a global solution, allowing us to keep producing positive social impacts with AI/ML/NN (better health care, optimized transportation...) without destroying our planet. This internship proposes to explore state-of-the-art techniques for reducing both the size and the training time of a NN, using small devices to impose strict energy-consumption constraints. Keywords: Artificial Intelligence, Neural Network, Deep Learning, IoT, TinyML, Quantization, Pruning, Distillation, Training, Gradient Descent, Back-Propagation.
4. Organization
The internship will take place at Espace Fauriel in Saint-Etienne, in the ISI department of Institut Fayol.
The internship will follow a three-step plan:
1. The student will start by trying to reproduce the toy (but realistic) application that consists in designing a glove/bracelet able to recognize the characters a person draws in the air [Fre21]. Through this example the student will learn about techniques like Quantization, Pruning and Distillation, which reduce the size of a big NN previously trained on a standard computer. This solves the problem of energy consumption at inference time, but not at training time.
2. The student will then explore state-of-the-art techniques for training a NN directly on a small device, based on research such as [Lin+22].
3. Based on these experiments, the student will be able to explore more realistic scenarios adapted to Industry 4.0 (e.g. the "Augmented Technician") or Health Care (e.g. "Smart Orthosis"), where both inference and training need to be executed on-device, in order to detect custom gestures that can change over time.
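To give a flavour of one of the techniques named in step 1, here is a minimal sketch of post-training affine quantization (toy weights and a pure-Python model; real deployments would rely on frameworks such as TensorFlow Lite Micro):

```python
# Map float weights to 8-bit integers with a scale and zero-point, the
# standard affine scheme used to shrink a trained network for a microcontroller.

def quantize(weights, bits=8):
    lo, hi = min(weights), max(weights)
    qmax = 2 ** bits - 1
    scale = (hi - lo) / qmax or 1.0          # guard against constant weights
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, s, z = quantize(weights)
restored = dequantize(q, s, z)
# Each restored weight is within one quantization step of the original,
# while storage drops from 32 bits to 8 bits per weight.
print(all(abs(a - b) <= s for a, b in zip(weights, restored)))  # True
```

Pruning and distillation attack the same size problem from different angles: removing weights entirely, or training a small network to mimic a large one.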
5. Job requirements
The student should have the following skills:
- Solid background in Machine Learning, in particular Deep Learning
- Strong Python coding skills
- Minimal background in IoT/Arduino
- Curiosity about anything technological/scientific and motivation for Sustainable Development
- Master 2 or final year of engineering school
6. Application
To apply, please send your CV, cover letter, and any other useful information before January 15, 2023 to guillaume.muller.fr
References [Fre21] Zack Freedman. AI Data Glove: Somatic. 2021. url: https://www.youtube.com/watch?v=6raRftH9yxM [Lin+22] Ji Lin et al. "On-Device Training Under 256KB Memory". In: arXiv preprint arXiv:2206.15472 (2022). url: https://tinyml.mit.edu/
URL sujet detaillé : https://seafile.emse.fr/f/922d29e048a542cba197/
Remarques : Co-supervised by Mihaela Juganaru
|
|
|
|
|
SM207-127 Parameterized Synthesis via Data Word Automata, Logics and Games
|
|
Description
|
|
Parameterized systems arise in practice in, for instance, distributed algorithms, telecommunication protocols, and robotics. They consist of programs made up of an arbitrary number of processes that can communicate with each other. The distributed nature of parameterized systems makes their design hard to get right, and computer-assisted methods are necessary to support it.
The main objective of this project is to establish mathematical foundations for a theory of parameterized synthesis based on the data word abstraction. In particular, the goal is to identify classes of specifications with decidable parameterized synthesis problems, whether those specifications are defined using data word automata or data word logics, and to design synthesis algorithms able to automatically generate (models of) parameterized systems.
URL sujet detaillé : https://pageperso.lis-lab.fr/~pierre-alain.reynier/sujet-M2.pdf
Remarques : Co-supervised with Emmanuel Filiot (ULB, Brussels). The internship can take place in Marseille or in Brussels, with visits between the two sites.
|
|
|
|
|
SM207-128 Differential privacy for federated learning in Cosmetical Science. Application to safety data.
|
|
Description
|
|
Context/objective: All cosmetic products placed on the market must undergo a risk assessment for human health to ensure they are safe for consumers, including an assessment of skin sensitization risk. In house, logistic regression and Bayesian network models have been proposed to predict the probability that an ingredient belongs to a given potency class. The main players in the cosmetics world wish to develop a common reference approach while increasing model performance through the pooling of information. Sharing data between the different companies, however, poses obvious confidentiality problems. To avoid them, we want to develop a federated learning / data privacy approach. The objective of this internship is to build a first proof of concept that can then be used to convince the various actors of the interest of this collaborative approach for the construction of a reference model.
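A minimal sketch of the kind of mechanism involved (illustrative only, not the internship's actual design): each company clips its local model update, and the server adds Gaussian noise calibrated to the clipping bound when averaging, which is the building block of differentially private federated averaging:

```python
import math
import random

def dp_federated_average(updates, clip=1.0, noise_multiplier=1.0, seed=0):
    """Clip each party's update to L2 norm `clip`, average, and add Gaussian
    noise scaled by clip / n (the Gaussian mechanism behind DP-FedAvg-style
    schemes; the privacy accounting itself is omitted here)."""
    rng = random.Random(seed)
    dim = len(updates[0])
    clipped = []
    for u in updates:
        norm = math.sqrt(sum(x * x for x in u))
        factor = min(1.0, clip / norm) if norm > 0 else 1.0
        clipped.append([x * factor for x in u])
    sigma = noise_multiplier * clip / len(updates)
    return [
        sum(u[i] for u in clipped) / len(clipped) + rng.gauss(0, sigma)
        for i in range(dim)
    ]

# Three hypothetical parties' local model updates (toy 2-D parameter vectors);
# clipping bounds the influence of the outlier third party.
updates = [[0.3, -0.1], [0.2, 0.0], [10.0, 10.0]]
avg = dp_federated_average(updates, clip=1.0, noise_multiplier=0.1)
print(len(avg))  # one noisy value per model parameter
```

Clipping is what makes the noise level meaningful: it bounds each company's contribution, so no single party's data dominates or can be reconstructed from the average.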
Refs :
- Peter Kairouz et al. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning, 2021. https://www.nowpublishers.com/article/Details/MAL-083 ; https://arxiv.org/pdf/1912.04977.pdf
- Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9(3-4):211-407 (2014). https://www.cis.upenn.edu/~aaroth/privacybook.html
- Joanna Jaworska et al. Bayesian integrated testing strategy to assess skin sensitization potency: From theory to practice. Journal of Applied Toxicology 33(11) (2013). https://www.researchgate.net/publication/236740749_Bayesian_integrated_testing_strategy_to_assess_skin_sensitization_potency_From_theory_to_practice
URL sujet detaillé : https://www.researchgate.net/publication/365652875_Internship_L'Oreal
Remarques : Required profile M2 Data Science student or equivalent
Interest in life science applications and data privacy / federated learning tools
autonomy, scientific curiosity
Good level in R/Python
Good level in English
Internship for a period of 5-6 months at the L'Oréal research center in Aulnay-sous-bois starting in February or
March 2023.
Remuneration: around €1700/month
Supervisors
Philippe Bastien (L'Oréal)
Co-supervisor
Aurélien Bellet (INRIA - http://researchers.lille.inria.fr/abellet/) aurelien.bellet.fr
Send CV and cover letter to:
Philippe Bastien (philippe.bastien.loreal.com)
|
|
|
|
|
SM207-129 Smart Data Replication
|
|
Description
|
|
Motivation: Serverless environments allow the execution of functions on demand. Most of the literature focuses on optimizing their scalability, but little attention is paid to the data replication problem. Data is distributed in chunks located on different nodes and replicated for fault tolerance. The student will explore the use of AI to design a scheduler that optimizes the number and location of data (chunk) replicas as a trade-off between task completion time and data-center energy consumption. The next section presents two possible directions for the application of ML/DL.
Project Outline: The project builds on top of a prior publication [1]. To limit the scope, the student will focus on the following tasks:
- Propose a multi-objective scheduler that determines the best number of replicas per data chunk, and their location. The objective is a trade-off between execution time and energy consumption. This can be formulated as a traditional ILP problem, followed by an exploration of Reinforcement Learning algorithms (e.g. DQN, PPO, bandits). The work can be done iteratively: focus first on the number of replicas, then on their location.
- Develop or adapt forecasting capabilities for future resource consumption and/or the incoming number of requests. This can be formulated as a regression or time-series prediction problem; techniques range from traditional ones (SVM, linear regression) to DNNs such as LSTMs.
- Assess the quality of the proposed multi-objective scheduler using both a perfect prediction (an Oracle) and the forecasted resource consumption (with errors). Quality will be measured in terms of energy consumption and execution time. Depending on the student's background, simulation (using SimGrid), real deployment (on Grid'5000), or both will be conducted.
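The replica-count trade-off in the first task can be illustrated with a toy cost model (the numbers and the cost function are entirely invented; the internship would formulate this as an ILP or learn it with RL):

```python
# More replicas reduce expected access latency but raise storage energy, so
# the scheduler picks the count minimising a weighted combination -- here by
# exhaustive search over a tiny range.

def best_replica_count(max_replicas, latency_one_copy, energy_per_copy, alpha=0.5):
    def cost(r):
        latency = latency_one_copy / r   # parallel access to r copies
        energy = energy_per_copy * r     # each copy consumes storage energy
        return alpha * latency + (1 - alpha) * energy
    return min(range(1, max_replicas + 1), key=cost)

# Cheap storage -> replicate aggressively; expensive storage -> keep few copies.
print(best_replica_count(8, latency_one_copy=100, energy_per_copy=1))   # 8
print(best_replica_count(8, latency_one_copy=100, energy_per_copy=50))  # 1
```

An RL formulation would replace the closed-form cost with rewards observed from the (simulated or real) platform, and add the placement decision on top of the count.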
Scientific Extension: As a first extension of interest, the prior formulation can be extended to consider an unreliable environment where multiple (FaaS) nodes fail at different times. In a second step, the work can be extended to take economic issues into consideration in the choice of the number of replicas and their locations (different locations may have different prices for hosting data and executing FaaS).
References [1] Morgan Séguéla, Riad Mokadem, Jean-Marc Pierson. Dynamic Energy and Expenditure Aware Data Replication Strategy. IEEE International Conference on Cloud Computing Technical Program (CLOUD 2022), IEEE, Jul 2022, Barcelona, Spain. https://hal.archives-ouvertes.fr/hal-03696210
URL sujet detaillé :
Remarques : Paid internship; possible continuation as a PhD;
Co-supervised by Tania Lorido Botran, Roblox Inc., tbotran.com
|
|
|
|
|
SM207-130 Stochastic control for optimizing crowdfunding projects dynamics
|
|
Description
|
|
Crowdfunding is popular among entrepreneurs who aim to bypass classical funding channels and directly address the crowd via Internet-based Crowdfunding Platforms (CFP). There are essentially two sets of works describing the fund collection process and the success factors. The first set focuses on the behavior of funders and its impact on the project's success. [1] identifies friends and family (F&F) as a source of early funding for projects, accounting for 15 to 20% of raised funds for equity-based campaigns and 30 to 40% for reward-based campaigns. Other potential funders tend to procrastinate, and [2] studies the choice of donors between funding a project early or waiting until the end of the funding window. In [3], a game-theoretic approach was used to model the behavior of investors who must choose one out of many projects to fund. Two main conclusions can be drawn from these works and several other data-based studies: 1. A large portion of the funds is actually invested by F&F (friends and family), which means that this injection of funds can be controlled (in terms of temporal allocation). 2. The public/crowd is more likely to invest in projects that have a dynamic, rising collection of funds rather than in static projects; a project that is more likely to succeed also attracts funds. These two factors motivate us to look at the problem of the optimal allocation of funds in time for a project, so as to maximize its chance of success. The F&F budget is finite and given, and may be injected at the right times to keep the project alive while maximizing the total amount of funds collected. This problem can first be modeled as a stochastic optimization problem, and in particular as a Markov Decision Process [4], as it involves random parameters such as F&F investing actions. Such an optimization problem can be solved using a dynamic programming approach. The work of the intern will then be composed of the following steps: 1. Describe a stochastic model of F&F policies for improving the result of a crowdfunding campaign; 2. Study solutions of this model using different optimization techniques, such as dynamic programming; 3. Simulate the system and illustrate the results on different scenarios. This work is a first step towards a more ambitious research project for a PhD. In fact, a PhD fellowship (Contrat Doctoral, 1760/month) is planned to start on September 1, 2023 on the same subject.
URL sujet detaillé : https://sites.google.com/site/yezekaelhayelsite/
Remarques : Co-supervisor: V. Varma, vineeth.satheeskumar-varma-lorraine.fr
|
|
|
|
|
SM207-131 Parameterized complexity
|
|
Description
|
|
The parameterized complexity approach has proved very effective for handling single-objective parameterized problems. In this internship, we would like to extend the reach of parameterized complexity so that it can also handle multi-objective optimization problems.
In particular, we want a parameterized complexity approach to Vertex Cover on weighted graphs, where we aim to minimize both the size and the weight of the solution at the same time.
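A brute-force illustration of the bi-objective flavour of the problem (a parameterized algorithm is the actual goal; this sketch only enumerates the Pareto-optimal (size, weight) pairs of a tiny graph):

```python
from itertools import combinations

def pareto_vertex_covers(n, edges, weight):
    """All Pareto-optimal (size, total weight) pairs over the vertex covers
    of a tiny graph -- exhaustive search, exponential in n."""
    points = set()
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                points.add((len(s), sum(weight[v] for v in s)))
    return sorted(
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    )

# A star with a heavy center: the smallest cover {0} is heavy, while the
# lightest cover (all three leaves) is large -- a genuine size/weight trade-off.
weight = [10, 1, 1, 1]
print(pareto_vertex_covers(4, [(0, 1), (0, 2), (0, 3)], weight))  # [(1, 10), (3, 3)]
```

The example shows why a single optimum need not exist: the two objectives can be incomparable, which is precisely what a multi-objective parameterized algorithm has to handle.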
URL sujet detaillé : https://www.iut-info.univ-lille.fr/~julien.baste/research/Baste_Jourdan_Internship_2023.pdf
Remarques : - The internship is co-supervised by Laetitia Jourdan. - Funding is available to pay the intern.
- This topic is preliminary work towards a possible PhD.
- I cannot guarantee the existence of PhD funding.
|
|
|
|
|
SM207-132 Scientific Machine Learning for Integrated Assessment Modeling
|
|
Description
|
|
Integrated assessment models aim to link together the main features of society and the economy with the biosphere, in order to provide a unified framework to qualitatively forecast the evolution of the socioeconomic system. Several such models, such as the World3 model by Meadows et al. and the DICE model by Nordhaus, have largely influenced the scientific debate. Modern software engineering and machine learning techniques have the potential to revolutionize this field, allowing the development of models that are far more reliable than those considered so far. In particular, data-driven approaches to dynamical systems have already shown promising results in some scientific applications; this internship project aims at adapting these new techniques and contributing to the state of the art of integrated assessment modeling. In more detail, the goal of the internship is to contribute to WorldDynamics.jl, an open-source Julia library which aims at allowing scientists to easily use and adapt different integrated assessment models, leveraging the modern scientific-computing language Julia. In the first part of the internship, the student will contribute to the library by implementing an integrated assessment model that is not yet included. In doing so, the student will follow software engineering best practices, collaborating with the other team members through the git versioning system and writing high-quality Julia code. In the second part of the internship, the student will familiarize themselves with modern scientific machine learning techniques, e.g. SINDy (Sparse Identification of Nonlinear Dynamics), and work together with the other team members to adapt these techniques to improve the accuracy of current integrated assessment models.
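To give a flavour of SINDy's sequential thresholded least squares (a toy, pure-Python sketch with exact derivatives and a hand-rolled solver; real work would use the Julia SciML stack or pysindy):

```python
def solve(A, b):
    """Least squares via normal equations + Gaussian elimination (tiny systems only)."""
    n = len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
         for i in range(n)]
    y = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv], y[i], y[piv] = M[piv], M[i], y[piv], y[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
            y[r] -= f * y[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def sindy(xs, dxs, threshold=0.1):
    """Fit dx/dt against a library of candidate terms, zero out small
    coefficients, and refit on the surviving terms (one STLSQ round)."""
    library = [[x, x ** 2, x ** 3] for x in xs]
    coeffs = solve(library, dxs)
    support = [j for j, c in enumerate(coeffs) if abs(c) >= threshold]
    reduced = solve([[row[j] for j in support] for row in library], dxs)
    out = [0.0] * 3
    for j, c in zip(support, reduced):
        out[j] = c
    return out

# Samples of the toy dynamics dx/dt = -0.5 x (derivatives taken exactly);
# SINDy recovers the sparse model: only the linear term survives.
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
dxs = [-0.5 * x for x in xs]
print([round(c, 3) for c in sindy(xs, dxs)])  # expect [-0.5, 0.0, 0.0]
```

The same idea, applied to the state variables of an integrated assessment model, would replace hand-tuned equations with sparse equations identified from data.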
URL sujet detaillé : https://mycore.core-cloud.net/index.php/s/XhCNH3gcFSVfAdo
Remarques : The project will be co-supervised by Prof. Pierluigi Crescenzi (https://www.pilucrescenzi.it/), who will visit the laboratory from April to July 2023.
|
|
|
|
|
SM207-133 Universality, inclusion and separation in quantitative history-deterministic automata
|
|
Description
|
|
Universality, inclusion and separation in quantitative history-deterministic automata
History-deterministic automata are an intermediate model between determinism and nondeterminism, which enjoys some of the power of nondeterminism without losing all of the algorithmic properties of determinism. They are particularly useful for solving synthesis problems, represented as two-player games. Algorithmic problems such as inclusion and universality in particular tend to be easier for history-deterministic automata than for nondeterministic ones. In this project, we ask to what extent this is also the case for history-deterministic quantitative automata.
The student will study the complexity of the universality, inclusion and separation problems on quantitative history-deterministic automata, looking both for efficient algorithms and for hardness results.
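As a toy illustration of the quantitative setting (entirely hypothetical, not taken from the internship material): in a nondeterministic sum-automaton, the value of a word is the maximum over runs of the summed transition weights. Computing the value of a fixed word is easy by dynamic programming; universality asks whether every word's value meets a threshold, which is where history-determinism matters.

```python
from math import inf

# Hypothetical 2-state sum-automaton over {a, b}.  In state 0, reading 'a'
# offers a choice: stay (+1) or jump to state 1, which pays +2 per 'a' but
# cannot read 'b'.  Resolving that choice without seeing the future is
# exactly what history-determinism would require.
delta = {
    (0, 'a'): [(0, 1), (1, 0)],
    (0, 'b'): [(0, 0)],
    (1, 'a'): [(1, 2)],
    (1, 'b'): [],
}

def value(word, initial=0):
    # best[q] = maximal weight of a run on the prefix read so far ending in q
    best = {initial: 0}
    for letter in word:
        nxt = {}
        for q, v in best.items():
            for q2, w in delta.get((q, letter), []):
                nxt[q2] = max(nxt.get(q2, -inf), v + w)
        best = nxt
    return max(best.values(), default=-inf)

print(value("aab"), value("aaa"))   # a 'b' forces state 0; pure a's favor 1
```

The dynamic program works because sums are prefix-decomposable; for other value functions (discounted sum, limit averages) the picture is subtler, which is part of what makes the quantitative case interesting.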
URL sujet detaillé : http://www.pageperso.lif.univ-mrs.fr/~karoliina.lehtinen/internships/quantitativeHD-UIS/
Remarques :
|
|
|
|
|
SM207-134 Expressiveness and succinctness of quantitative history-deterministic automata
|
|
Description
|
|
Expressiveness and succinctness of quantitative history-deterministic automata
History-deterministic automata are an intermediate model between determinism and nondeterminism, which enjoys some of the power of nondeterminism without losing all of the algorithmic properties of determinism. They are particularly useful for solving synthesis problems, represented as two-player games. One of their appeals is that they can be more expressive and/or more succinct than deterministic automata.
In this project the student will study the relative expressiveness and succinctness of quantitative history-deterministic automata, as well as the question of deciding whether an automaton is history-deterministic.
There will be games, algorithms and automata.
URL sujet detaillé : http://www.pageperso.lif.univ-mrs.fr/~karoliina.lehtinen/internships/quantitativeHD-ES/
Remarques :
|
|
|
|
|
SM207-135 Parallel construction of binary partition trees for hierarchical image segmentation
|
|
Description
|
|
The objective of this internship is the parallel implementation of the Binary Partition Tree (BPT) structure. The BPT is a hierarchical image representation that is very efficient for object detection and segmentation purposes, especially for remote sensing images. Its construction process is very similar to hierarchical clustering: starting from an initial partition of the image, the regions are iteratively merged according to a similarity criterion until only one region remains (the root of the tree, corresponding to the full spatial support of the image). Each merging step includes the search for the two closest neighboring regions, their merging, the update of all distances between this newly created region and all adjacent regions, and then sorting the merging priority queue. This sequential building procedure is potentially very time and memory intensive, especially for large images. The objective of the internship is therefore to study whether the parallel construction techniques that have been proposed for the hierarchical clustering algorithm can be adapted to the structure of the BPT, to implement a solution and to thoroughly benchmark it.
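The sequential procedure described above can be sketched on a toy 1-D "image"; the similarity criterion (absolute difference of region means) and the lazy-deletion priority queue are illustrative choices, not the internship's prescribed ones.

```python
import heapq

# Minimal sequential BPT sketch: regions start as single pixels, the two
# closest adjacent regions are merged, each merge creates a new tree node,
# until one region (the root) spans the whole image.
pixels = [1.0, 1.1, 5.0, 5.2, 9.0]
parent = {}                                   # child region id -> parent id
mean = {i: v for i, v in enumerate(pixels)}   # region id -> mean value
size = {i: 1 for i in range(len(pixels))}
alive = set(mean)
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < len(pixels)} for i in mean}

heap = [(abs(mean[i] - mean[j]), i, j) for i in adj for j in adj[i] if i < j]
heapq.heapify(heap)
next_id = len(pixels)

while len(alive) > 1:
    d, a, b = heapq.heappop(heap)
    if a not in alive or b not in alive:
        continue                              # stale entry: lazy deletion
    node = next_id; next_id += 1
    parent[a] = parent[b] = node
    size[node] = size[a] + size[b]
    mean[node] = (mean[a] * size[a] + mean[b] * size[b]) / size[node]
    adj[node] = (adj[a] | adj[b]) - {a, b}
    for n in adj[node]:                       # update distances to neighbors
        adj[n] -= {a, b}; adj[n].add(node)
        heapq.heappush(heap, (abs(mean[node] - mean[n]),
                              min(node, n), max(node, n)))
    alive -= {a, b}; alive.add(node)

root = alive.pop()
print(root, sorted(parent))                   # last-created node is the root
```

The lazy-deletion trick (skipping heap entries whose endpoints are no longer alive) is one of the sequential bottlenecks the parallel version has to rethink.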
URL sujet detaillé : https://www.lrde.epita.fr/~gtochon/stage/sujet_stage_fin_detude_BPT_parallele.pdf
Remarques : The intern will be physically based in Strasbourg, under the supervision of Jimmy Randrianasoa (faculty member, LRE, EPITA Strasbourg), and co-supervised by Edwin Carlinet and Guillaume Tochon (both faculty members, LRE, EPITA Paris). Compensation: €1000 gross/month
|
|
|
|
|
SM207-136 Injection strategies for cellular traffic offloading over D2D communications
|
|
Description
|
|
In a content distribution system relying on device-to-device communications, the centrality of a terminal is directly linked to its ability to distribute content to a greater number of recipients. Identifying the terminals to which content should be "pushed" is an important step in maximizing the traffic offload rate. This innovation program focuses on the identification of these terminals.
In this internship, the candidate will be in charge of designing, evaluating, and experimenting with algorithms to select the most appropriate subset of devices to serve as seed nodes. The challenge here is to find an appropriate tradeoff between the number of seeds and the overhead on the cellular channel. The work will involve analysis of real-world data and hands-on testing of the proposed solutions.
(See details on the long version - link below)
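One natural baseline for choosing seed terminals is greedy maximum coverage over a contact graph (which devices can reach which over D2D links). The graph and node names below are invented for illustration; real contact traces are of course far larger and time-varying.

```python
# Toy contact graph: undirected D2D reachability between devices.
contacts = {
    "a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"},
    "d": {"b", "e"}, "e": {"d"}, "f": set(),
}

def reachable(node):
    # Devices a seed can eventually serve over multi-hop D2D (traversal).
    seen, todo = {node}, [node]
    while todo:
        for n in contacts[todo.pop()]:
            if n not in seen:
                seen.add(n); todo.append(n)
    return seen

def pick_seeds(k):
    # Greedily pick the seed with the largest marginal coverage gain.
    covered, seeds = set(), []
    for _ in range(k):
        best = max(contacts, key=lambda n: len(reachable(n) - covered))
        if not reachable(best) - covered:
            break                      # every reachable device is covered
        seeds.append(best); covered |= reachable(best)
    return seeds, covered

seeds, covered = pick_seeds(2)
print(seeds, sorted(covered))
```

Each seed injected over the cellular channel costs bandwidth, so the real tradeoff is coverage gained per seed, which is what the greedy marginal-gain criterion approximates.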
URL sujet detaillé : https://hopcast.eu/offers/2023-sujet-stage-injection.pdf
Remarques : - The intern will also work under the advisement of Farid Benbadis, CEO of Hopcast (farid.fr). - The intern will receive a monthly allowance of €1,000.
- Remote work is accepted.
|
|
|
|
|
SM207-137 Routing and Scheduling in LoRa satellite constellation
|
|
Description
|
|
The LIG lab has an ongoing mission aboard the cubesat STORK-1, in orbit since January 2022, carrying our LoRa (a low-power, long-range technology dedicated to the Internet of Things (IoT)) communication board. The internship aims to prepare future missions by studying and enhancing routing protocols in LEO (Low-Earth Orbit) satellite constellations to relay traffic from/to IoT devices located in areas where no terrestrial connectivity exists. We will use a network simulator (FLoRaSat) to evaluate different enhancements to routing protocols, taking advantage of LoRa link-layer mechanisms in different constellation configurations (number of satellites, orbits, ...).
- Cubesat STORK-1 : https://www.n2yo.com/satellite/?s=51087 - ThingSat : https://gricad-gitlab.univ-grenoble-alpes.fr/thingsat/public/-/blob/master/cubesat_mission/README.md - FLoRaSat : https://gitlab.inria.fr/jfraire/florasat
URL sujet detaillé : https://lig-membres.imag.fr/alphand/Stages/2022ThingSatMAC_M2R_PFE.pdf
Remarques :
|
|
|
|
|
SM207-138 Fully decentralized learning under fairness and heterogeneity constraints
|
|
Description
|
|
The goal of this internship is to study fairness in the context of Decentralized Machine Learning. The main objectives are (i) to review some of the existing literature in the field, (ii) to design new algorithms to learn fair models in a fully decentralized and heterogeneous context and (iii) to derive theoretical guarantees on the fairness and utility levels of the obtained models.
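A minimal decentralized-learning sketch may help fix ideas for objective (ii). The ring topology, gossip weights and quadratic local losses are all toy assumptions, and no fairness constraint is modeled here; the point is only the local-step-plus-gossip structure, with heterogeneous data across nodes.

```python
import numpy as np

# Decentralized gradient descent with gossip averaging on a ring of 4 nodes.
# Node i holds local loss (w - mu_i)^2 / 2; the optimum of the average loss
# is mean(mu), which the nodes approach by consensus.
mu = np.array([0.0, 1.0, 2.0, 5.0])           # heterogeneous local targets
n = len(mu)
W = np.zeros((n, n))                          # doubly stochastic gossip matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

w = np.zeros(n)                               # one scalar model per node
lr = 0.1
for step in range(500):
    grad = w - mu                             # local gradients (no exchange)
    w = W @ (w - lr * grad)                   # local step, then gossip round

print(np.round(w, 3))                         # all nodes near mean(mu) = 2.0
```

With a constant step size the nodes converge to a neighborhood of the consensus optimum, with a residual bias driven by data heterogeneity; quantifying (and correcting) exactly this kind of bias, with fairness constraints added, is what the theoretical part of the internship is about.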
URL sujet detaillé : https://batistelb.github.io/files/Fair_DSGD.pdf
Remarques : Co-supervised with Batiste le Bars
|
|
|
|
|
SM207-139 Link prediction in protein interaction networks
|
|
|
|
SM207-140 The performance cost of network privacy
|
|
Description
|
|
Recent developments in Internet protocols and services aim to provide enhanced security and privacy for user traffic. Apple's iCloud Private Relay [1] is a premier example of this trend, introducing a well-provisioned, multi-hop architecture to protect the privacy of users' traffic while minimizing the traditional drawbacks of additional network hops (e.g., latency). Announced in 2021, the service is currently in the beta stage, offering an easy and cheap privacy-enhancing alternative directly integrated into Apple's operating systems and core applications. This seamless integration makes a future massive adoption of the technology very likely, calling for studies on its impact on the Internet. Indeed, the iCloud Private Relay architecture inherently introduces computational and routing overheads, possibly hampering performance. Within this context, the goal of this internship will be to explore the performance impact that different configurations of HTTP proxies and servers can have on users' traffic. Through the use of an experimental testing environment, the student will investigate the options currently offered for operating web proxies and servers, with particular attention to recent proposals, most notably Multiplexed Application Substrate over QUIC Encryption (MASQUE) [2], which optimizes the QUIC transport protocol when used to contact a proxy. If time allows, the student will formalize the trade-offs between such configurations and the benefits that these have on user privacy.
References.
[1] Patrick Sattler, Juliane Aulbach, Johannes Zirngibl, and Georg Carle. Towards a tectonic traffic shift? Investigating Apple's new relay network. In Proceedings of the 22nd ACM Internet Measurement Conference (ACM IMC '22)
[2] Mirja Kuhlewind, Matias Carlander-Reuterfelt, Marcus Ihlar, and Magnus Westerlund. 2021. Evaluation of QUIC-based MASQUE proxying. In Proceedings of the 2021 Workshop on Evolution, Performance and Interoperability of QUIC (EPIQ '21)
URL sujet detaillé :
Remarques : Compensation according to the legislation in force
|
|
|
|
|
SM207-141 AutoML on network traffic
|
|
Description
|
|
The advent of automated machine learning pipelines (e.g., AutoGluon [1]) makes it possible to efficiently explore a wide range of machine learning models (and model parameters), and to find the highest model performance relatively quickly. However, these pipelines typically assume that models are trained and executed on the same data, which may not account for the trade-offs inherent in inference from network traffic. For example, a model that achieves 90% accuracy but uses 10% of the traffic collection system's resources might be more attractive than one that achieves 92% accuracy but requires 50% of the resources at deployment time. In our previous work [2], we studied how to define various network data inputs that trade off system constraints against model accuracy, to arrive at a model that is suitable for practical deployments. In particular, our work considered the processing and state (i.e., memory) costs of different network data representations. During the internship, the student will explore how to extend our platform to integrate with AutoML pipelines, so that we can more automatically determine which parameters and algorithms best meet operational constraints. In other words, the student will work on extending AutoML pipelines to not only consider the accuracy of models when making recommendations, but to additionally consider operational constraints and to make recommendations within this constraint space.
References.
[1] AutoGluon: AutoML Toolkit for Deep Learning. https://auto.gluon.ai/
[2] Traffic Refinery: Cost-Aware Data Representation for Machine Learning on Network Traffic. F. Bronzino, P. Schmitt, S.Ayoubi, H. Kim, R. Teixeira, and N. Feamster. In ACM Sigmetrics 2022, Mumbai, India.
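The constraint-aware recommendation idea can be sketched as model selection under a resource budget. The candidate models and cost figures below are invented for illustration (the 90%/10% vs. 92%/50% pair mirrors the example in the description); a real AutoML integration would measure these costs rather than tabulate them.

```python
# Hypothetical candidate models produced by an AutoML search, annotated with
# deployment cost: (name, accuracy, fraction of collection-system resources).
candidates = [
    ("flow-stats-forest", 0.90, 0.10),
    ("payload-dnn",       0.92, 0.50),
    ("packet-count-lr",   0.85, 0.02),
]

def best_under_budget(models, budget):
    # Recommend the most accurate model whose deployment cost fits the budget.
    feasible = [m for m in models if m[2] <= budget]
    return max(feasible, key=lambda m: m[1]) if feasible else None

print(best_under_budget(candidates, budget=0.20))  # the 90%-accurate model
```

Framing the recommendation as constrained (or multi-objective) selection rather than pure accuracy maximization is exactly the extension to AutoML pipelines the internship proposes.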
URL sujet detaillé :
Remarques : Compensation according to the legislation in force
|
|
|
|
|
SM207-142 Developing a web front-end to demonstrate the use of genetic algorithms to generate soft robots.
|
|
Description
|
|
The DEFROST team works on deformable robots. A deformable robot is composed of deformable structures and behaves by deforming. Their design is often inspired by the mechanical properties of living organisms [Kim et al. 2013]. These deformable robots have the advantage of being inexpensive to manufacture, robust, and less dangerous in the context of interaction with humans. This new branch of robotics opens up many application prospects.
In our research we developed a simulation platform, SOFA [Faure et al. 2012], that allows us to simulate and visualize, in real time, the physical behavior of soft materials. Designing soft robots is challenging as their shapes tend to be more and more complex. We are investigating the use of genetic algorithms for the semi-automatic generation of soft-robot shapes.
We would like our genetic-algorithm method to be made accessible from a website so that other researchers can use it. The website should include a graphical editor letting the user specify, in a simple way, the problem the soft robots have to match, and then start the automatic robot generation based on our simulation framework. Once the robots have been generated, the site should allow the user to visualize the different robots produced, as well as a 3D rendering of their behavior.
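The generation loop the front-end would drive can be sketched as a classic genetic algorithm. Here the "robot" is reduced to a bit-string and the fitness to a trivial count, standing in for a shape encoding and a SOFA simulation score; all constants are illustrative.

```python
import random

# Toy genetic-algorithm loop: truncation selection, one-point crossover,
# per-bit mutation.  Fitness = number of ones (a stand-in for a simulated
# soft-robot performance score).
random.seed(0)
LENGTH, POP, GENS = 20, 30, 60

def fitness(genome):
    return sum(genome)

def crossover(a, b):
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05):
    return [1 - g if random.random() < rate else g for g in genome]

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]               # keep the best half unchanged
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(fitness(best))
```

In the real system each fitness evaluation is a physics simulation, so the website mostly has to manage long-running jobs and then display the surviving population, which is what makes the front-end design interesting.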
URL sujet detaillé : https://team.inria.fr/defrost/job-offers/developing-a-web-front-end-to-demonstrate-the-use-of-genetics-algorithms-to-generate-soft-robots/
Remarques : Co-supervisors: damien.marchal-lille.fr; alexandre.bilger.fr
|
|
|
|
|
SM207-143 Tamari lattices and parking trees
|
|
Description
|
|
This internship is in algebraic combinatorics. One well-known combinatorial sequence is the Catalan numbers, which count the bracketings of a word of a given size, the planar binary trees, and the Dyck paths (paths from (0,0) to (n,n) with north and east steps staying above the line x=y). In 1962, Tamari endowed the set of planar binary trees with a partial order. The resulting poset is called the Tamari lattice. It has been well studied since then and has several different representations (according to the Catalan object on which you define it). We recently introduced an object called the parking tree, which is a generalisation of Catalan objects and encodes parking functions. The goals of the internship will be: - to use this new object to understand the enumeration of intervals in Tamari lattices; - to study a commutative generalization of Tamari lattices.
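A small sketch, assuming the usual encoding of planar binary trees as nested pairs (leaves are None): enumerating trees of each size recovers the Catalan numbers, and rotations give the cover relations generating the Tamari order.

```python
# Planar binary trees of size n (n internal nodes), and the rotation covers
# of the Tamari lattice.
def trees(n):
    if n == 0:
        return [None]
    return [(l, r) for k in range(n)
            for l in trees(k) for r in trees(n - 1 - k)]

def rotations(t):
    # Trees obtained by one rotation ((a,b),c) -> (a,(b,c)) at any node:
    # these are the cover relations of the Tamari order.
    if t is None:
        return []
    l, r = t
    out = []
    if l is not None:
        a, b = l
        out.append((a, (b, r)))
    out += [(l2, r) for l2 in rotations(l)]
    out += [(l, r2) for r2 in rotations(r)]
    return out

print([len(trees(n)) for n in range(6)])   # Catalan numbers 1, 1, 2, 5, 14, 42
```

Generating the covers this way makes it easy to build the whole lattice for small n and count intervals experimentally, a useful sanity check against the known interval-enumeration formulas.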
URL sujet detaillé : https://oger.perso.math.cnrs.fr/SujetCPT.pdf
Remarques : Co-supervised with Matthieu Josuat-Vergès (IRIF, Université Paris Cité)
|
|
|
|
|
SM207-144 Sequence reconstruction and analysis of the peptide graph
|
|
Description
|
|
In this internship, the goal is to provide algorithmic solutions to two applied problems: one is about merging/aligning biological sequences, the other is about analyzing a graph constructed from biological data. Although the data at hand is of a biological nature, no prerequisite knowledge of biology is necessary, as the internship is oriented towards algorithmics, implementation and tests.
More precisely, we are interested in retrieving peptides from mass spectrometry experiments. Peptides are small amino acid sequences (of length up to 25). An amino acid sequence can be viewed as a word on an alphabet of size 20, just as DNA is a word on an alphabet of size 4. In mass spectrometry experiments, we obtain spectra (one per peptide). From each spectrum Sp, one wants to precisely determine the peptide sequence p it comes from. However, because biological modifications can occur along the way, most often we do not obtain p, but several "approximates" of p. For instance, when p=IVHNIVEEDR, we obtain the four following approximates: p1=IV[251,10]IVEEDR, p2=IVHNI[357,15]DR, p3=IVH[114,04]IVEE[76,99]VI and p4=IVHN[212,15]EEDR. Here, a number [k] between brackets means one should insert one or several amino acid(s) of cumulated mass k in order to obtain p. The goal here is to design an algorithm able to retrieve p from its approximates, i.e., we want to best align the approximates so as to resolve all ambiguities arising from the numbers in brackets.
The second goal is to construct and analyze the "peptide/spectrum graph", where the nodes are the peptides and the spectra (from the human proteome), and an edge connects two nodes when the objects are sufficiently similar (according to a given score function). This graph can be complemented by information such as the approximate peptide sequence, as discussed above. Analyzing this graph requires new algorithms to be designed, implemented and tested. Among the questions of interest, we have the following: for a given spectrum node, what information can be drawn when looking at distance 2 in this graph?
In both cases, we have the data at hand, and once the algorithms are implemented, they will be tested and evaluated on our datasets.
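The bracket notation can be made concrete with a small consistency check: a bracketed mass should equal the summed residue masses of the amino acids it replaces. The mass table below uses standard monoisotopic residue masses; the greedy matcher is only a sketch (it handles p1, p2 and p4 from the example, but not p3, whose suffix differs from p and requires genuine alignment).

```python
import re

# Monoisotopic residue masses (Da) for the amino acids appearing in the
# example peptide p = IVHNIVEEDR.
MASS = {"I": 113.08406, "V": 99.06841, "H": 137.05891, "N": 114.04293,
        "E": 129.04259, "D": 115.02694, "R": 156.10111}

def consistent(full, approx, tol=0.01):
    # Greedily match literal amino acids; each [m] must account for the
    # mass of the stretch of `full` it skips over.
    i = 0
    for tok in re.findall(r"\[([\d,\.]+)\]|(.)", approx):
        if tok[1]:                       # literal amino acid
            if i >= len(full) or full[i] != tok[1]:
                return False
            i += 1
        else:                            # bracketed mass, e.g. [251,10]
            m = float(tok[0].replace(",", "."))
            total, j = 0.0, i
            while j < len(full) and abs(total - m) > tol:
                total += MASS[full[j]]; j += 1
            if abs(total - m) > tol:
                return False
            i = j
    return i == len(full)

p = "IVHNIVEEDR"
checks = [consistent(p, a) for a in
          ["IV[251,10]IVEEDR", "IVHNI[357,15]DR", "IVHN[212,15]EEDR"]]
print(checks)
```

For instance, [251,10] in p1 matches H+N (137.06 + 114.04). The same check rejects p3, since no suffix of p accounts for [76,99] followed by VI; disambiguating such cases is precisely why a proper multi-approximate alignment algorithm is needed.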
URL sujet detaillé : https://pagesperso.ls2n.fr/~fertin-g/Sujet_M2_Octobre_2022.pdf
Remarques : Co-supervised by Géraldine Jean and Emile Benoist. This internship can be paid if necessary.
|
|
|
|
|
SM207-145 Topological Optimization of Geometric Filtrations for Parametrized Models
|
|
Description
|
|
Topological Data Analysis (TDA) is a set of data science tools rooted in algebraic topology that enables the extraction of topological descriptors from structured objects, such as graphs, time series, or point clouds. In practice, these topological descriptors depend on the choice of a filtration, a real-valued map defined on the ambient space, and give detailed insight into the underlying topological structure appearing in the input observation.
The process of extracting a topological descriptor from an input observation is known to be Lipschitz, hence differentiable almost everywhere. This theoretically allows one to compute a gradient, and thus enables the minimization of objective functions that involve topological terms through gradient descent. For instance, one may want to remove small topological components from the point cloud (considered as noise), or in contrast to enforce some topological structure to be created in the point cloud, which is doable by optimizing an appropriate loss.
While this is theoretically appealing and numerically feasible, a practical limitation immediately appears: the gradient of the map only depends on the critical values of the filtration, which in the context of point clouds implies that only very few points are moved at each optimization step.
To overcome this limitation, we propose to parametrize the point cloud by means of a diffeomorphic flow. Namely, instead of optimizing the point cloud directly, we will optimize a flow that moves *all* the points together, yielding less sparse gradients and possibly improving the optimization scheme.
The goal of the internship is then to study the feasibility of this approach, and its pros and cons compared to the naive topological optimization currently practiced. Depending on the student's wishes, the work can be of a more theoretical nature (proving convergence, studying stability/smoothness, etc.) or practical (providing an implementation that provably beats state-of-the-art topological optimization methods).
The student must be familiar with standard statistical and machine learning notions (optimization, classification, estimation, overfitting, etc). A background in Topological Data Analysis is appreciated but not required. A background in Deep Learning and/or neural ODE is not required (but will be appreciated as well). More importantly, the will to implement and experiment with such models is crucial.
URL sujet detaillé : https://tlacombe.github.io/research/material/InternProp2023.pdf
Remarques : The internship will be supervised by Mathieu CARRIERE (https://www-sop.inria.fr/members/Mathieu.Carriere/).
Therefore, the internship may actually take place either at Champs-sur-Marne or at Sophia-Antipolis (Inria - Université Côte d'Azur), depending on discussion with the student.
In the latter case, Mathieu Carrière will be the main supervisor, and the institute of affiliation will be Inria.
|
|
|
|
|
SM207-146 Reconstruction Algorithms and Combinatorial Geometry for Arithmetic Circuits
|
|
Description
|
|
This internship proposal suggests several research problems connected to arithmetic circuit complexity. The student could work on one or several problems depending on his/her interests and the available time. One of these problems is of a combinatorial and (convex) geometric nature. The other problems are about reconstruction algorithms for arithmetic circuits. The latter topic is essentially an algebraic version of computational learning theory.
URL sujet detaillé : http://perso.ens-lyon.fr/pascal.koiran/GetSimpleCMS-3.3.16/data/uploads/algebraic_complexity_internship.pdf
Remarques :
|
|
|
|
|
SM207-147 Designing efficient algorithms to compute the diameter on median graphs
|
|
Description
|
|
The goal of the internship is to propose an exact algorithm computing the diameter of a median graph, with a better running time than the current ones. The diameter of a graph is the largest distance between two vertices of the graph.
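For reference, the baseline the internship aims to improve upon is the naive all-pairs BFS, which runs in O(n·m). A sketch on a small median graph (the 3-dimensional hypercube, whose diameter is 3):

```python
from collections import deque

# Build Q3: vertices 0..7, edges between words differing in exactly one bit.
# Hypercubes are median graphs, so this is a valid toy instance.
edges = [(a, b) for a in range(8) for b in range(8)
         if a < b and bin(a ^ b).count("1") == 1]
adj = {v: [] for v in range(8)}
for a, b in edges:
    adj[a].append(b); adj[b].append(a)

def ecc(src):
    # Eccentricity of src: the largest BFS distance from it.
    dist = {src: 0}; q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1; q.append(v)
    return max(dist.values())

diameter = max(ecc(v) for v in adj)
print(diameter)
```

Beating this baseline on median graphs means exploiting their structure (e.g., their hypercube-like metric properties) instead of running a full BFS from every vertex.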
URL sujet detaillé : https://perso.isima.fr/~piberge/sujet_stage_median23.pdf
Remarques : Co-supervisors: Vincent Limouzy and Lhouari Nourine
|
|
|
|
|
SM207-148 Mixed Precision DNN Training with Second-Order Optimization Methods
|
|
Description
|
|
In this internship we will investigate the development of scalable second-order optimization methods in mixed precision.
The student will design optimization algorithms capable of automatically tuning the precision of the computations in order to decrease the energy footprint of neural network training. We will also explore the convergence properties of such methods. The work will build upon mptorch, a custom-precision training simulation framework built atop PyTorch and developed by the internship coordinators.
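As a toy illustration of precision simulation (this standalone NumPy helper is our own sketch, not mptorch's API), one can emulate a reduced-precision format by rounding values to a smaller number of mantissa bits:

```python
import numpy as np

# Simulate a floating-point format with fewer mantissa bits: split each
# value into mantissa and exponent, round the mantissa, and reassemble.
def round_mantissa(x, bits):
    m, e = np.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 2.0**bits) / 2.0**bits
    return np.ldexp(m, e)

x = np.array([0.1, 1.0, 3.14159, 100.5])
for b in (23, 10, 4):                  # float32-like, float16-like, toy
    print(b, round_mantissa(x, b))
```

Applying such a rounding operator to weights, activations or gradients at chosen points of the training loop is the basic mechanism by which mixed-precision strategies (including the second-order methods targeted here) can be simulated and compared.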
URL sujet detaillé : https://perso.ens-lyon.fr/elisa.riccietti/doc/mpsecondorder_internship.pdf
Remarques : The internship will be paid by the Cominlabs project
|
|
|
|
|
SM207-149 Mixed Precision DNN Training with Multilevel Strategies
|
|
Description
|
|
The aim of the internship is to develop training algorithms for neural networks exploiting multiple arithmetic precisions to reduce the energy footprint of training.
To do so, we will leverage the framework of multilevel optimization, in which a problem is represented at different scales (in this case, precision levels) and the expensive computations are performed at the cheapest levels (low precision) while still allowing progress at the most expensive level (high precision).
The work will build upon mptorch, a custom-precision training simulation framework built atop PyTorch and developed by the internship coordinators.
URL sujet detaillé : https://perso.ens-lyon.fr/elisa.riccietti/doc/mpmultilevel_internship.pdf
Remarques : The internship will be paid by the Cominlabs project
|
|
|
|
|
SM207-150 Moment hierarchies for AC Optimal Power Flow
|
|
Description
|
|
As part of an industrial collaboration between LAAS and RTE, we propose this M2 research internship on moment hierarchies for the AC-OPF problem. The topic will involve applying polynomial optimization techniques based on hierarchies of convex relaxations. The resolution of these convex relaxations will be accelerated using decomposition algorithms and automatic differentiation (usually employed in the analysis of deep networks in AI).
URL sujet detaillé : https://homepages.laas.fr/vmagron/sujets/fastopf2.pdf
Remarques : Co-supervised with Adrien Le-Franc (LAAS), Manuel Ruiz (RTE) and Jean-Bernard Lasserre (LAAS). Depending on the intern's motivation and the progress of the work, it will be possible to continue the topic with a CIFRE PhD thesis.
The internship will be paid by the company RTE.
|
|
|
|
|
SM207-151 Christoffel-Darboux kernels with applications in deep learning explainability
|
|
Description
|
|
Lasserre's Hierarchy is a generic tool which can be used to solve global polynomial optimization problems under polynomial positivity constraints. The general idea is to reformulate the initial problem as an optimization problem over probability measures. Recent research has investigated the ability of Christoffel-Darboux kernels to capture information about the support of an unknown probability measure. A distinguishing feature of this approach is that it allows one to infer support characteristics from the knowledge of finitely many moments of the underlying measure. The first investigation track will consist of analyzing the last layer of an existing classification network with Christoffel-Darboux kernels. A more theoretical goal will be the study of Christoffel-Darboux kernels to extend the existing approach to measures supported on specific classes of mathematical varieties. In a further step, we intend to apply this framework to deep learning models, for which latent representations correspond to such low-dimensional varieties. Numerical experiments will be performed on several benchmark suites, including MNIST, CIFAR10 or Fashion-MNIST.
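The support-inference idea can be sketched with empirical moments. Given a monomial basis v and moment matrix M = E[v(x) v(x)^T], the Christoffel function 1 / (v(x)^T M^{-1} v(x)) takes small values far from the support of the measure. The 1-D data, basis degree and sample size below are toy choices.

```python
import numpy as np

# Empirical Christoffel function for uniform data on [-1, 1],
# degree-2 monomial basis v(x) = (1, x, x^2).
rng = np.random.default_rng(0)
data = rng.uniform(-1.0, 1.0, size=2000)

def v(x):
    return np.array([1.0, x, x * x])

M = np.mean([np.outer(v(x), v(x)) for x in data], axis=0)  # moment matrix
Minv = np.linalg.inv(M)

def christoffel(x):
    return 1.0 / (v(x) @ Minv @ v(x))

print(christoffel(0.0), christoffel(3.0))   # inside vs. far outside support
```

The same computation applied to the features of a network's last layer, with a higher-degree basis, is the kind of analysis the first investigation track would carry out.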
URL sujet detaillé : https://homepages.laas.fr/vmagron/sujets/CDkernelExplainM2.pdf
Remarques : This M2 internship will be funded by CNRS LAAS. The M2 candidate will be hosted at LAAS, CNRS in the POP team and co-supervised by Jean-Bernard Lasserre and Victor Magron. A PhD position can be granted if the candidate obtains satisfactory results. This PhD position will be funded by DesCartes (A CREATE Programme on AI-based Decision making in Critical Urban Systems), a hybrid AI project between CNRS and Singapore. It will be co-supervised between National University of Singapore (NUS) and LAAS CNRS. The PhD candidate will be hosted in NUS, Singapore.
|
|
|
|
|
SM207-152 Study of programming strategies and performance of the Intel Tofino 2 network switch
|
|
Description
|
|
Proposition de sujet de stage de Master 2
Etude des stratégies de programmation et de performances du commutateur réseau Intel Tofino 2
Equipes d'accueil du laboratoire ICube : équipes Réseaux et ICPS Encadrants : Jean-Romain Luttringer, Pascal Mérindol, Cristel Pelsser et Philippe Clauss ==================================== 1 Introduction -------------- Lors du routage de paquet dans un réseau, les fonctionnalités sous-jacentes sont classées dans deux plans. Le plan de controle contient les fonctionnalités complexes (p. ex., élaboration de la topologie et prises de décision) et le plan de données transfère les paquets de données re=A7us vers la bonne interface, telle que calculée par le plan de controle. Le plan de controle effectuant des opérations complexes, ce-dernier est principalement logiciel et utilise le CPU. A l'inverse, le plan de données devant assurer un traitement des paquets a linerate (respectant le débit maximum annoncé), ce dernier reposait traditionnellement sur du matériel dédié optimisé aux performances prévisibles (p. ex. des ASICs). En particulier, un type de mémoire particulier, la TCAM, est utilisé pour assurer un longest prefix match extremement rapide (quelques cycle d'horloge [1]). Cependant, depuis peu, le plan de données des équipements n'est plus aussi immuable qu'auparavant. Les puces Tofino [2] permettent de personnaliser le plan de donnée d'un commutateur tout en garantissant un débit extremement élevé (400 Gpbs par interface) : certains commutateurs embarquant la puce Tofino 2 peuvent garantir jusqu'a 12,8 Tbit/s grace a la technologie 7nm et prend en en charge une vitesse de port allant jusqu'a 400 GbE pour les environnements a très grande échelle. L'équipe Réseaux du laboratoire ICube dispose de plusieurs de ces équipements. Sur ces équipements, le plan de donnée (c-a-d, en principe et presque exclusivement, la manière dont le commutateur décide de l'interface de sortie adéquate pour chaque destination) peut etre programmé via le langage P4, offrant la possibilité de branchements conditionnels et de calculs simples. 
Le plan de données peut donc etre intelligent, par exemple en réagissant directement aux pannes (sans dépendre d'une mise a jour du plan de controle) et aux changements topologiques en général, et/ou encore en adoptant des stratégies de routage limitant la consommation mémoire et énergétique du commutateur pour améliorer ses performances en général. Notamment, P4 permet un controle relativement fin des ressources utilisées, permettant par exemple de limiter l'utilisation de la TCAM, une ressource particulièrement énergivore.
1.1 Les mémoires TCAM ------------------------- Les cellules de ces mémoires TCAM [3] ne sont pas accessibles a travers leurs adresses, comme pour les mémoires classiques, mais a travers leurs contenus. Pour une mémoire CAM binaire, l'application utilisatrice fournit un mot de donnée et la mémoire CAM recherche dans toute la mémoire pour voir si ce mot y est stocké. Si le mot est trouvé, la CAM renvoie une liste d'une ou plusieurs adresses où le mot a été trouvé. Ainsi, une CAM est l'équivalent matériel de ce que l'on appelle un tableau associatif en logiciel. La CAM ternaire (TCAM) permet un troisième état de correspondance appelé Â"XÂ" ou Â"quelconqueÂ" pour un ou plusieurs bits dans le mot de donnée stocké, permettant l'ajout de flexibilité dans la recherche. Par exemple, une CAM ternaire pourrait avoir un mot stocké de Â"10XX0Â" qui correspondra aux recherches des mots Â"10000Â", Â"10010Â", Â"10100Â", ou Â"10110Â". La flexibilité de recherche additionnelle vient avec un co"t additionnel par rapport aux CAM binaires parce que la cellule de mémoire interne doit a présent encoder les trois possibilités d'état au lieu des deux d'une CAM binaire. Cet état additionnel est typiquement implémenté en ajoutant un bit de masque a chaque cellule mémoire. Au contraire de la mémoire RAM classique, qui a des cellules de stockage simples, chaque bit de mémoire individuel dans une CAM complètement parallèle doit avoir son propre circuit de comparaison pour détecter une correspondance entre le bit stocké et le bit d'entrée. En plus, les sorties de correspondances de chaque cellule du mot de donnée doivent etre combinées pour aboutir a un signal correspondant au mot entier. Le circuit additionnel augmente la taille physique de la puce CAM ce qui augmente le co"t de fabrication, et augmente également la puissance de dissipation [4] puisque chaque circuit de comparaison est actif a chaque cycle d'horloge. 
En conséquence, une CAM n'est utilisée que dans les applications spécialisées où la vitesse de recherche ne peut pas etre atteinte en utilisant une méthode moins co"teuse. 1.2 Le langage de programmation P4 Considéré comme une évolution du Software Defined Networking (SDN), le langage P4 permet de programmer la fa=A7on dont le flux est traité par l'acheminement de paquets sur du matériel de transmission de paquets réseaux tels que des routeurs, les commutateurs ou les pare-feux, qu'ils soient matériels ou logiciels. Comme son nom Â"Programmation de processeurs indépendants des protocolesÂ" l'indique, le langage ne tient pas compte du format d'un paquet. En effet, les développeurs déclarent le traitement d'un paquet dans un programme écrit en langage P4, et le compilateur le met par la suite au format souhaité selon le matériel cible. La programmation en langage P4 est notamment utilisée pour mettre en oeuvre les fonctions de transfert de niveau 3 et les fonctions INT. En plus des instructions du langage lui-meme, des pragmas spécifiques au compilateur utilisé peuvent etre ajoutés, afin de controler explicitement la localisation de certaines données (TCAM ou SRAM). 2 Objectifs du stage -------------------- Il existe différents compilateurs P4, par exemple BMv2 et Tofino. Alors que le premier est ouvert, le second est propriétaire. Lors de ce stage, l'objectif est d'explorer les choix laissés au programmeur P4 en fonction de l'environnement utilisé. Il s'agira dans un premier temps d'implémenter des stratégies et des solutions de re-routage efficace (comme exemple accessible et pertinent de programmation dans le plan de données) afin de réduire les temps de coupure en cas de changement topologique : comment implémenter efficacement ces fonctionnalités dans le plan de données ? Quelles sont les performances observées en fonction des choix d'implémentation (mémoire, nombre de stages consommés, energie, etc) ? 
The second step will study service chaining [5]: how can the pipeline be shared when several features must be implemented in a chain (for example filtering, load balancing, and re-routing)? Resources being limited, how can the different programs be organized efficiently in space and time? Finally, the energy footprint of network equipment being far from negligible [6], the student will investigate which levers are available to reduce energy consumption. In the Tofino environment, this amounts to determining whether consumption can be influenced by using different memories and by varying the depth of the packet-processing chain. With BMv2, the compiler itself can be studied in order to propose code transformations that rely on less energy-hungry elements.
The three main tasks are summarized here (two out of the three will suffice in practice):
-----------------------------------------------
- Study of performance (in a broad sense) under various (re-)routing strategies in the data plane;
- Analysis of feature chaining, i.e. how to organize optimal resource sharing;
- Study of energy consumption under various programming strategies (pragmas, resubmit, number of operations, or even how operations are split between the data plane and the control plane).
References
-----------
[1] A. Rasmussen, A. Kragelund, M. Berger, H. Wessing, and S. Ruepp, "TCAM-based high speed longest prefix matching with fast incremental table updates," in 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR), 2013, pp. 43-48.
[2] Intel, "Intel Tofino Intelligent Fabric Processor," https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch.html.
[3] P. K. Sisira, N. Aswathy, B. Prameela, and A. George, "Ternary content addressable memory," International Journal of Engineering and Advanced Technology (IJEAT), vol. 9, no. 4S, pp. 2249-8958, May 2020.
[4] V. Ravikumar and R. Mahapatra, "TCAM architecture for IP lookup using prefix properties," IEEE Micro, vol. 24, no. 2, pp. 60-69, 2004.
[5] H. Soni, M. Rifai, P. Kumar, R. Doenges, and N. Foster, "Composing dataplane programs with μP4," in SIGCOMM '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 329-343. [Online]. Available: https://doi.org/10.1145/3387514.3405872
[6] S. Tabaeiaghdaei, S. Scherrer, and A. Perrig, "Carbon Footprints on Inter-Domain Paths: Uncovering CO2 Tracks on Global Networks," 2022. [Online]. Available: https://arxiv.org/abs/2211.00347
URL sujet detaillé : https://reseaux.icube.unistra.fr/img_auth_namespace.php/a/af/Sujet_de_M2_Tofino_.pdf
Remarques : Co-supervised by Jean-Romain Luttringer, Cristel Pelsser and Philippe Clauss
|
|
|
|
|
SM207-153 Floating-point expansions with low-precision formats
|
|
Description
|
|
The goal of this internship is to investigate the feasibility of using floating-point expansions based on low precision formats (sub 16-bit floating-point values), which offers the possibility of performing higher precision computations with just low precision hardware. The student will conceive and study algorithms for basic operations (such as addition, multiplication, dot product) using low precision expansions and investigate their use in a machine learning setting.
A more complete description along with the associated bibliography can be found at the URL with the detailed subject.
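As a taste of the topic, the classical error-free transformation underlying floating-point expansions (Knuth's 2Sum) can be sketched as follows; the sketch uses Python's binary64 arithmetic, whereas the internship targets sub-16-bit formats:

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, e) with
    s = fl(a + b) and a + b = s + e exactly (Knuth's 2Sum)."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    b_round = b - b_virtual
    a_round = a - a_virtual
    e = a_round + b_round
    return s, e

# The pair (s, e) is a length-2 "expansion" representing a + b
# to twice the working precision.
s, e = two_sum(1.0, 2.0**-60)
print(s, e)   # s == 1.0, e == 2.0**-60: the low part keeps the lost bits
```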
URL sujet detaillé : https://people.irisa.fr/Silviu-Ioan.Filip/files/internships/LPExpansionsInternship.pdf
Remarques : The internship will be co-supervised by Anastasia Volkova, Silviu Filip and Jean-Michel Muller, with the possibility for the student to be located in Lyon, Nantes or Rennes. The duration of the internship is 5 to 6 months. For further details, do not hesitate to contact the supervisors.
|
|
|
|
|
SM207-154 Exploring low precision arithmetic for continual learning tasks
|
|
Description
|
|
The goal of this internship is to look at the impact of using low precision arithmetic to perform continual learning tasks. The student will implement, test and adapt various low precision variants of continual learning methods (replay, regularization, and parameter isolation).
URL sujet detaillé : https://people.irisa.fr/Silviu-Ioan.Filip/files/internships/subject_continual.pdf
Remarques : The internship will be co-supervised by Olivier Sentieys and Silviu Filip. The expected duration of the internship is 5 to 6 months. For further details, do not hesitate to contact the supervisors.
|
|
|
|
|
SM207-155 Recognizability in symbolic dynamical systems
|
|
Description
|
|
Symbolic dynamics is a field at the frontier of discrete mathematics and theoretical computer science. The objective of the internship is to extend recognizability results for substitutive shifts to systems defined by sequences of morphisms, called S-adic shifts.
URL sujet detaillé : https://igm.univ-mlv.fr/~beal/stageM2ENSLyon2022.pdf
Remarques : The internship will be supervised by Marie-Pierre Béal and Dominique Perrin.
|
|
|
|
|
SM207-156 Secure Compilation of Counter-Measures against Spectre Attacks
|
|
Description
|
|
The current practice for cryptographic implementations is to harden them against side-channel attacks and to this end ensure that they are constant-time. Unfortunately constant-time security does not protect against information leakage due to speculative execution, e.g., the Spectre vulnerability. Nonetheless, specific counter-measures can be efficiently deployed, such as speculative load hardening (A. Shivakumar et al. 2022).
Compilation and program optimization may interfere with counter-measures so that vulnerabilities might be unexpectedly introduced at compile-time in otherwise secure programs. Fortunately some optimizing compilers do preserve some security properties; for instance recent work (Barthe et al. 2020, 2021) has shown how to formally prove preservation of the constant-time property.
The aim of this internship is to understand how to formally justify that program transformations (such as the ones found in optimizing compilers) do preserve security against side-channel attacks, in spite of speculative execution.
URL sujet detaillé : https://members.loria.fr/VLaporte/files/SecureCompilationOfSpectreCounterMeasures.pdf
Remarques : This subject can be extended to a PhD.
Possibilities of funding according to the administrative situation of candidates.
|
|
|
|
|
SM207-157 Exploring the use of low precision arithmetic for federated learning applications
|
|
Description
|
|
The goal of this internship is to look at the impact of using low precision arithmetic during federated learning tasks. The student will be tasked with identifying the main challenges of using low precision computations in a federated learning context and prototype potential solutions to these problems. To do so, we will use an existing federated learning library, Flower, inside which we will integrate support for custom precision arithmetic.
URL sujet detaillé : https://people.irisa.fr/Silviu-Ioan.Filip/files/internships/subject_federated.pdf
Remarques : The internship will be co-supervised by Olivier Sentieys and Silviu Filip. The expected duration of the internship is 5 to 6 months. For further details, do not hesitate to contact the supervisors.
|
|
|
|
|
SM207-158 Application of learning techniques for the automatic determination of mouse resolution
|
|
Description
|
|
Current systems are unable to determine the resolution of a computer mouse from classical events ("MouseEvents") or accessible system properties. The objective of this project is to develop a learning technique that determines this resolution from the raw information sent by the mouse.
The resolution of a mouse is expressed in counts per inch (CPI) and can vary between 400 and 20,000 CPI depending on the model; some models can even change this value at the press of a dedicated button. Very few models implement the HID standard feature that exposes this information to the system. However, the behavior of the mouse cursor is strongly influenced by this resolution, which affects user performance [1, 3] and prevents optimal exploitation of their accuracy [2].
The objective of the project is to develop a machine learning algorithm able to determine the resolution of a mouse from the raw information received by the system (dx and dy displacements in counts). A first collection of data from mice of different resolutions has been carried out, and encouraging initial results have been obtained with simple features and learning methods. The objective of the internship is to improve these results by collecting more general training data and by identifying other features and more advanced learning techniques.
Objectives:
1. State of the art of learning methods for temporal data
2. Additional data collection: different users, different uses, different resolutions
3. Implementation or adaptation of the most promising techniques to the problem at hand
4. Evaluation of these techniques
5. Implementation of an online demonstrator that analyses mouse resolution
6. Writing a research paper on the results obtained
URL sujet detaillé : https://loki.lille.inria.fr/jobs/int/2023-ResolutionSouris-en.pdf
Remarques : Co-supervised with Mathieu Nancel https://mathieu.nancel.net/ and Sylvain Malacria http://www.malacria.com/
|
|
|
|
|
SM207-159 Game Semantics in Coq
|
|
Description
|
|
Game semantics is a denotational semantics that proposes an interactive description of proofs and programs, along the lines of the Curry-Howard correspondence. The execution of programs is formalized as a play in a 2-player game that the program plays with its execution environment. Types are represented as games that describe the available computational events and valid computations, and proofs / programs are represented as strategies on those games. In contrast with traditional tools in denotational semantics that regard programs as functions (set-theoretic or domain-theoretic models), the interactive description of programs offered by game semantics natively supports a number of computational effects (ground type [1] or higher-order references [2], control operators [3], concurrency [4]...).
Despite these successes, game semantics is considered difficult to formalize. The most widespread models cited above lack mathematical maturity. Their technical underpinnings vary from one paper to another. They build on a notion called "plays with pointers" that is difficult to manipulate rigorously, and many papers lack detailed proofs. The few past formalization attempts we are aware of did not go very far -- the most advanced [5], due to Churchill, formalized a simple version of game semantics incompatible with the models used in denotational semantics.
Despite this, several recent works originating from the software-certification research community propose Coq formalizations of the interactive behaviour of simple first-order imperative programs [6,7,8]. Indeed, these are helpful for reasoning about programs compositionally. In spirit, their flavour is very much game-semantics-like, but their actual technical underpinning is completely independent of game semantics and denotational semantics. It is regrettable that game semantics has, as of yet, no formal tools to offer to the program-certification community.
The objective of this internship is thus to develop a Coq formalization of a game semantics model. The originality of the approach rests on two aspects:
- firstly, the games model to be formalized is not the traditional model based on plays with pointers, but a modern reformulation resting on a recent line of work (starting with Melliès' asynchronous games [9] and Winskel et al's concurrent games [10]) on the mathematical foundations of game semantics;
- secondly, rather than naively translating the pen-and-paper mathematical definitions in Coq, we will exploit the expressiveness of dependent types: we propose to build on a mutually inductive definition of plays and interactions following adequate notions of polarity, hopefully allowing for simpler and more elegant proofs.
References:
[1] Samson Abramsky, Guy McCusker: Linearity, Sharing and State: a fully abstract game semantics for Idealized Algol with active expressions, 1996.
[2] Samson Abramsky, Kohei Honda, Guy McCusker: A Fully Abstract Game Semantics for General References. LICS 1998.
[3] James Laird: Full Abstraction for Functional Languages with Control. LICS 1997.
[4] Dan R. Ghica, Andrzej S. Murawski: Angelic semantics of fine-grained concurrency. Ann. Pure Appl. Log. 151(2-3): 89-114 (2008).
[5] Martin Churchill: Game semantics and Agda, slides of a talk at the Centre for Verification and Semantics, AIST, Osaka, 2010.
[6] Gordon Stewart, Lennart Beringer, Santiago Cuellar, Andrew W. Appel: Compositional CompCert. POPL 2015.
[7] Li-yao Xia, Yannick Zakowski, Paul He, Chung-Kil Hur, Gregory Malecha, Benjamin C. Pierce, Steve Zdancewic: Interaction trees: representing recursive and impure programs in Coq. Proc. ACM Program. Lang. 4(POPL): 51:1-51:32 (2020).
[8] Nicolas Chappe, Paul He, Ludovic Henrio, Yannick Zakowski, Steve Zdancewic: Choice Trees: Representing Nondeterministic, Recursive, and Impure Programs in Coq. CoRR abs/2211.06863 (2022).
[9] Paul-André Melliès: Asynchronous Games 2: The True Concurrency of Innocence. CONCUR 2004.
[10] Simon Castellan, Pierre Clairambault, Silvain Rideau, Glynn Winskel: Games and Strategies as Event Structures. Log. Methods Comput. Sci. 13(3) (2017).
URL sujet detaillé :
Remarques : The internship will take place in Marseille, in the campus of Luminy, and will be co-supervised by Etienne Miquey (MCF I2M) and Pierre Clairambault (CR CNRS, LIS), hosted either at the LIS (computer science) or the I2M (mathematics).
|
|
|
|
|
SM207-160 Querying graph databases with RPQs
|
|
Description
|
|
Graph database management systems have increased in popularity over the last decades. In database theory, we abstract such databases as labelled graphs. Most real query languages are based on the well-known formalism of regular path queries (RPQs). Such a query is defined from a regular expression R. Any walk in the graph labelled with a word conforming to R is called a match, and in general there are infinitely many matches. The main challenge is to efficiently compute a finite and meaningful output from the matches.
Several approaches are used in practice and in theory to reach this goal. Homomorphism semantics is the most studied and enjoys nice theoretical properties, but it is not suitable for some practical applications (too little information is kept in the output). At the other end of the spectrum, the most widespread semantics in practice, trail semantics, seems unreasonable from a theoretical standpoint (high complexity, arbitrary restrictions).
In a recent work, we suggested a new approach, run-based semantics, which seems a reasonable compromise. It restricts the infinitely many matches to a finite number by stopping when a cycle occurs in the computation of the query and in the graph simultaneously. The internship is about further investigating run-based semantics, and more generally about exploring the properties and connections between the semantics of RPQs.
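To make the RPQ formalism concrete, the following sketch (illustrative, and computing endpoint pairs of matches rather than the run-based semantics to be studied) explores the product of the graph with an automaton for R; the tiny graph and automaton are invented examples:

```python
from collections import deque

def rpq_endpoints(edges, delta, q0, finals):
    """Endpoint pairs (u, v) such that some walk from u to v
    spells a word accepted by the automaton (delta, q0, finals).
    edges: list of (u, label, v); delta: dict (state, label) -> state."""
    adj = {}
    for u, a, v in edges:
        adj.setdefault(u, []).append((a, v))
    nodes = {u for u, _, _ in edges} | {v for _, _, v in edges}
    answers = set()
    for start in nodes:
        seen = {(start, q0)}
        queue = deque(seen)
        while queue:                      # BFS over the product space
            u, q = queue.popleft()
            if q in finals:
                answers.add((start, u))
            for a, v in adj.get(u, []):
                q2 = delta.get((q, a))
                if q2 is not None and (v, q2) not in seen:
                    seen.add((v, q2))
                    queue.append((v, q2))
    return answers

# Query R = a*b over a 3-node graph (hypothetical example).
edges = [(1, "a", 2), (2, "a", 1), (2, "b", 3)]
delta = {("p", "a"): "p", ("p", "b"): "f"}
print(sorted(rpq_endpoints(edges, delta, "p", {"f"})))  # pairs reaching 3 via a*b
```

Note that even though the matches (walks) may be infinitely many, the set of endpoint pairs is finite, which is exactly why the product construction terminates.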
URL sujet detaillé : https://victor.marsault.xyz/resources/Teaching/2022/2022-11_RunBased_M2ENS.pdf
Remarques : Co-encadrement avec Claire DAVID et Nadime FRANCIS.
|
|
|
|
|
SM207-161 Design and implementation of a soft manipulator
|
|
Description
|
|
Using soft manipulators in industrial pick-and-place applications requires them to be accurate and fast, both of which are major challenges in soft robotics today. At medium to large scale, these robots are especially subject to vibrations, due to their compliance, that need to be compensated. Current research on their dynamic control is limited by the absence of a suitable experimental platform, i.e. a prototype physically capable of being accurate, moving fast, and running a control law at high frequency (~1 kHz). The goal of the two internships below is to design and implement such a prototype in the context of pick-and-place operations. It will focus in particular on a parallel architecture, where several elastic legs are fixed to a common end-effector platform to provide better payload and accuracy.
URL sujet detaillé : https://defiant-eucalyptus-c8a.notion.site/Internships-Design-and-implementation-of-a-soft-manipulator-07d34a7c21f84cf6ad0d28cf4d25ac8f
Remarques : Quentin Peyron: quentin.peyron.fr
Christian Duriez : christian.duriez.fr
|
|
|
|
|
SM207-162 Extension of the Alloy model finder
|
|
Description
|
|
Alloy is an open-source formal tool aimed at analysing models of software systems. It has been used on topics such as the verification of distributed protocols or safety and security analysis of aeronautical systems (aircraft or drones). In its most recent version (developed by ONERA and Portuguese colleagues, leveraging work done at MIT), the Alloy language is based on first-order linear temporal logic (FO-LTL) and the analysis relies on model-checking techniques. To make the analysis decidable, the user must bound the size of the first-order domain: this approach remains relevant as it still allows efficient bug-finding and model-finding. Recent work [CAV 2021] at ONERA has also developed new fragments of FO-LTL and abstraction techniques that regain completeness in certain cases.
In this context, we propose a research internship, possibly followed by a PhD, on various topics (depending on the interests of the student): continuing the work on decidable fragments; developing new formal verification techniques (e.g. developing new SMT-based model-checking techniques, statistical model-checking, formal testing...).
URL sujet detaillé :
Remarques : Co-advisor: Julien Brunel.
Paid internship.
|
|
|
|
|
SM207-163 Software architecture & communication for soft manipulator
|
|
Description
|
|
The internship consists of designing the software architecture that allows the software to communicate with 6 motors at high frequency (~1 kHz).
The software relies on a simulation framework called SOFA (https://github.com/sofa-framework/sofa), developed in the team. It runs fast physical simulations that are used to control the motors. Despite the simulation software's high performance, it still runs below the rate required by the motors. To overcome this limitation, the program will need a dedicated control loop running at the desired rate.
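Such a dedicated fixed-rate loop might look as follows (an illustrative Python pattern only; the actual loop in the internship would interface with SOFA and real motor drivers, and `send_to_motors`/`get_command` are placeholders):

```python
import threading, time

def control_loop(rate_hz, get_command, send_to_motors, stop_event):
    """Run send_to_motors(get_command()) at a fixed rate, independently
    of the (slower) simulation that updates the command."""
    period = 1.0 / rate_hz
    next_tick = time.perf_counter()
    while not stop_event.is_set():
        send_to_motors(get_command())       # latest command from the simulation
        next_tick += period
        sleep = next_tick - time.perf_counter()
        if sleep > 0:
            time.sleep(sleep)               # keep the cadence
        else:
            next_tick = time.perf_counter() # we overran: resynchronize

# Hypothetical usage: the simulation thread updates `command`,
# the control thread streams it to the motors at 1 kHz.
command = [0.0] * 6
sent = []
stop = threading.Event()
t = threading.Thread(target=control_loop,
                     args=(1000, lambda: list(command), sent.append, stop))
t.start()
time.sleep(0.05)
stop.set(); t.join()
print(len(sent) > 0)  # commands were streamed while the sim ran slower
```

In a real deployment a plain Python thread would not reach a reliable 1 kHz; the same decoupling pattern would be implemented in a real-time thread or a dedicated process close to the motor bus.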
URL sujet detaillé : https://defiant-eucalyptus-c8a.notion.site/Internships-Design-and-implementation-of-a-soft-manipulator-07d34a7c21f84cf6ad0d28cf4d25ac8f
Remarques : Alexandre Bilger: alexandre.bilger.fr Yinoussa Adagolodjo: yinoussa.adagolodjo.fr
|
|
|
|
|
SM207-164 Mixed Precision stochastic gradient descent strategies
|
|
Description
|
|
State-of-the-art training methods are based on low precision variants of stochastic gradient descent (SGD), which are however not always convergent. The aim of this internship will be to study techniques that allow lower and higher precisions in different steps of the optimization, in order to guarantee convergence in all cases. We will devise strategies to automatically switch between the different available precisions. To achieve this, we will explore the framework of trust-region methods, schemes that automatically choose the learning rate and that could potentially be adapted to also choose the precision.
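As a toy illustration of precision switching (not the trust-region scheme to be developed here), one can emulate a low-precision gradient by rounding and fall back to full precision once the rounded gradient vanishes; all names, thresholds and the objective below are invented:

```python
import numpy as np

def round_to_precision(x, bits):
    """Crude emulation of a low-precision format: keep `bits`
    fractional bits (illustration only, not IEEE rounding)."""
    scale = 2.0 ** bits
    return np.round(x * scale) / scale

def mixed_precision_sgd(grad, x0, lr=0.1, bits=8, steps=200):
    loss = lambda x: 0.5 * np.dot(x, x)      # toy quadratic objective
    x, low_precision = np.asarray(x0, float), True
    for _ in range(steps):
        g = grad(x)
        if low_precision:
            g = round_to_precision(g, bits)
            if np.all(g == 0) and loss(x) > 1e-10:
                low_precision = False        # rounded gradient vanished: switch up
                continue
        x = x - lr * g
    return x

x = mixed_precision_sgd(lambda x: x, np.ones(3))
print(float(np.dot(x, x)))  # near 0: full precision took over when needed
```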
URL sujet detaillé : https://perso.ens-lyon.fr/elisa.riccietti/doc/sgd_internship.pdf
Remarques : The internship will be funded by the Cominlabs project
|
|
|
|
|
SM207-165 Decomposition of deterministic discrete dynamical systems via the direct product of graphs
|
|
Description
|
|
Finite discrete dynamical systems (DDS) are systems that can take a finite number of states and that, at each (discrete) time step, evolve from one state to another by following a transition available from the origin state. They are represented by their state graph, in which the vertices are the possible states of the system and an arc between two vertices indicates a possible transition from the first state to the second. When a DDS is deterministic, its state graph has out-degree 1.
A composition operation between two DDS that plays a key role in the study of these systems is the direct product. It is defined on state graphs as follows: the vertex set of the direct product of two graphs G and H is the Cartesian product of their vertex sets, and there is an arc from vertex (a,x) to vertex (b,y) iff there is an arc from a to b in G and from x to y in H. A central question in the study of DDS is whether a given graph can be written as the direct product of two graphs. A fundamental result on this question is that a connected non-bipartite graph admits a unique prime factorization, which can be computed in polynomial time.
This internship will aim to answer some of the following questions, in the particular case of deterministic DDS. Is the prime factorization of a directed graph G of out-degree 1 unique as soon as G is connected (without requiring it to be non-bipartite)? When it is unique, in what time can this factorization be computed? Can it be computed in linear time? Conversely, when the factorization is not unique, can the set of possible factorizations be represented succinctly? In linear space? And in what time can such a representation be computed?
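The direct product of two deterministic state graphs (out-degree 1, i.e. successor functions) is straightforward to compute; in this sketch the two small systems are invented examples:

```python
from itertools import product

def direct_product(f, g):
    """Direct product of two out-degree-1 graphs given as successor
    dicts: the unique successor of (a, x) is (f[a], g[x])."""
    return {(a, x): (f[a], g[x]) for a, x in product(f, g)}

# Two toy deterministic DDS: a 2-cycle and a 3-cycle.
f = {0: 1, 1: 0}
g = {0: 1, 1: 2, 2: 0}
h = direct_product(f, g)
print(h[(0, 0)])        # -> (1, 1)
print(len(h))           # -> 6 states; here the product is a single 6-cycle
```

For directed cycles the product of an m-cycle and an n-cycle splits into gcd(m,n) cycles of length lcm(m,n); with m=2 and n=3 this gives one 6-cycle, hinting at why factoring such graphs is subtle.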
URL sujet detaillé :
:
Remarques : The internship will be co-supervised by Christophe Crespelle
|
|
|
|
|
SM207-166 Pattern matching in ordered sequences
|
|
Description
|
|
Context: pattern matching on ordered sequences. The two main notions of pattern for ordered sequences are order-preserving matching and Cartesian trees. This subject has two objectives: 1) develop notions of non-exact pattern matching based on Cartesian trees and design efficient algorithms for them; 2) conduct a probabilistic study of the link between order-preserving matching and Cartesian trees.
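For reference, the Cartesian tree of a sequence (the root is the position of the minimum, with subtrees built recursively on each side) can be computed in linear time with a stack; the sample sequences are arbitrary:

```python
def cartesian_tree(seq):
    """Return parent[i] for the (min-rooted) Cartesian tree of seq,
    built in O(n) with a stack; the root has parent -1."""
    parent = [-1] * len(seq)
    stack = []                         # indices with increasing values
    for i, v in enumerate(seq):
        last = -1
        while stack and seq[stack[-1]] > v:
            last = stack.pop()         # becomes the left child of i
        if last != -1:
            parent[last] = i
        if stack:
            parent[i] = stack[-1]      # i is the right child of the stack top
        stack.append(i)
    return parent

# Two sequences match under the Cartesian-tree model iff their trees
# (here: parent arrays) coincide.
print(cartesian_tree([3, 1, 4, 1.5, 2]))
print(cartesian_tree([30, 10, 40, 15, 20]))  # same relative order, same tree
```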
URL sujet detaillé : https://lipn.fr/~david/doc/sujet_stage2023.pdf
Remarques : This internship is funded by the Normastic research federation. It is co-supervised by Julien David (Caen).
The internship can therefore take place in Caen or in Rouen. Possibility of continuing with a PhD.
|
|
|
|
|
SM207-167 Quantum Cryptanalysis of Multivariate Cryptosystems
|
|
Description
|
|
The goal of this internship will be to advance the security analysis of post-quantum multivariate cryptosystems. On the one hand, we will study quantum algorithms for solving multivariate quadratic equations (MQ2). On the other hand, we will study the security of concrete designs such as the recently proposed MAYO.
URL sujet detaillé : https://andreschrottenloher.github.io/docs/stage2023.pdf
Remarques : Co-supervised by Pierre-Alain Fouque & André Schrottenloher. Standard internship stipend.
|
|
|
|
|
SM207-168 Comparative study of several formal-analysis approaches on a model of attack propagation in a cyber-physical system
|
|
Description
|
|
This internship is part of joint work between the IBISC laboratory and the IRT SystemX institute within the RTI project (Resilience of Intelligent Transport), in particular its work on "Risk analysis (safety/security) adapted to resilient transport and collaborative design". An original model of cyber-attack propagation has been developed and is currently being enriched with new features. The modeling and analysis of these extensions are currently under study.
The objective of the internship is to apply several formal-analysis tools to attack-propagation models on industrial case studies. This work requires an adaptation for each tool, the development of tooling to handle the tools' input formats, and the interpretation of the results.
URL sujet detaillé : https://drive.google.com/file/d/1V7oveIHH2VbwVbCbmXikDU4B_vu2y_LB/view?usp=sharing
Remarques : The internship is co-supervised by Hanna Klaudel and Franck Pommereau. Paid internship.
Period: 2023.
|
|
|
|
|
SM207-169 Extending an urban-development simulation tool with AI
|
|
Description
|
|
The SITI project (funded by the ANR) aims at developing knowledge and new agent-based simulation tools to analyze a particular mode of urban production in the Pacific islands. These areas and habitats, called informal settlements or "villages in cities", are home to an increasingly large part of the population. They form residential pockets of small constructions on the fringes of regulatory urbanization and its development. The management of these informal settlements, at the intersection of traditional and modern lifestyles, is therefore a major issue for the authorities in charge of urban planning. The project is based on an interdisciplinary approach involving history, urban planning, geography, economics, remote sensing and artificial intelligence. It mobilizes researchers from five laboratories and two companies to study informal settlements in the capitals of Fiji, Vanuatu and New Caledonia. A first multi-agent system has been developed by a PhD student. This system simulates the evolution of buildings in such areas. It uses agent behavior rules defined by an expert and an influence model based on procedural generation. However, the calibration of these models remains mainly manual, requiring strong involvement of experts. The aim of this internship is to experiment with several machine learning approaches (e.g. CBA, reinforcement learning) to automate the system calibration (rules, thresholds, parameters, etc.).
URL sujet detaillé :
Remarques : - Co-supervisor: Alain Casali (LIS-AMU)
- In collaboration with the University of New Caledonia (E. Tack and G. Enée)
- Stipend: 3.90 euros per hour, i.e. about 600 euros per month
|
|
|
|
|
SM207-170 Evolution over time of the structure of social graphs
|
|
Description
|
|
Pre-requisites, if any: basics in probability and graph theory; programming in Python, C/C++ or Java.
Description: The goal of the project is to develop methods to analyse the evolution over time of a social network.
In the paper [1], the authors propose a model of random networks to study the impact of several types of lockdown policies during the COVID pandemic. The experiments built for France show that completely closing medium- and long-distance travel to slow down the spread of a random walk is more efficient than local restrictions. The goal of the internship will be to extend this work to larger countries, possibly to the whole world. More precisely, the first step will be to understand the model presented in [1] and the state of the art (for example [2]). The second step will be an implementation of a graph generator following the model. We will study the properties of the generated graphs, e.g. degree distribution, clustering, distances and diameter. The last step will be to analyse the different lockdown policies at a scale larger than France.
This work is part of a larger project studying social networks and their evolution; see for example [3,4,5].
The internship may be followed by a PhD for interested students.
References:
[1] Chatterji, I., & Lawson, A. (2021). Horospherical random graphs. arXiv preprint arXiv:2112.03535. https://arxiv.org/pdf/2112.03535.pdf
[2] Mauras, S., Cohen-Addad, V., Duboc, G., Dupré la Tour, M., Frasca, P., Mathieu, C., ... & Viennot, L. (2021). Mitigating COVID-19 outbreaks in workplaces and schools by hybrid telecommuting. PLoS computational biology, 17(8), e1009264.
[3] Thibaud Trolliet. Study of the properties and modeling of complex social graphs. Social and Information Networks [cs.SI]. Université Côte d'Azur, 2021. English. https://tel.archives-ouvertes.fr/tel-03468769/document
[4] Frédéric Giroire, Nicolas Nisse, Thibaud Trolliet, Malgorzata Sulkowska. Preferential attachment hypergraph with high modularity. [Research Report] Université Côte d'Azur. 2021. https://hal.inria.fr/hal-03154836
[5] Frédéric Giroire, Nicolas Nisse, Kostiantyn Ohulchanskyi, Malgorzata Sulkowska, Thibaud Trolliet. Preferential attachment hypergraph with vertex deactivation. [Research Report] Inria - Sophia Antipolis; UCA, I3S. 2022. https://hal.inria.fr/hal-03655631
URL sujet detaillé :
Remarques : The internship will be co-supervised by Nicolas Nisse. http://www-sop.inria.fr/members/Nicolas.Nisse/
|
|
|
|
|
SM207-171 Characterising Synthetic vs Real-World Combinatorial Problem Instances via Search Landscape Models
|
|
Description
|
|
When trying to find the best algorithm or parameter configuration for a given optimization problem, there are generally not enough real-world instances to train the prediction models. Therefore synthetic instances are required. Since these are not perfect, the models may not actually be well suited to real-world instances. In this project, we will analyse the differences between real-world and synthetic instances by looking at their search landscape. In particular, we will consider Local Optima Networks, a high-level graph representation of the landscape. The Course-Based Curriculum Timetabling problem will be used as a case study.
The suggested work plan is as follows. It may evolve to suit the interests of the student and to adapt to the results obtained. 1. Understand and run a timetable solver to gather data about the timetable search landscapes, on real-world and synthetic instances. 2. Identify and compute relevant traditional landscape and graph metrics. New metrics may be proposed, as well as relevant visualisations. 3. Compare real-world and synthetic landscapes. 4. Identity weaknesses in the instance generator and propose improvements.
URL sujet detaillé : https://nextcloud.univ-lille.fr/index.php/s/dyxjtRifPW8wPJ2
Remarques : Advisors: Marie-Eléonore KESSACI, Nadarajen VEERAPEN. Stipend: about €560 per month
|
|
|
|
|
SM207-172 Enhancing verification for modular distributed programming
|
|
Description
|
|
Distributed systems are increasingly widespread and critical. A typical distributed system is a composition of communicating, concurrent modules; this enhances scalable performance, modularity, clear APIs, elasticity and flexibility. However, composition is hard because of concurrency, failures, and lack of guarantees such as type checking.
Our Varda distributed programming environment is designed to reconcile performance and strong abstractions. A Varda program specifies a system as an architecture of independently-developed components.
The goal of the internship is to improve the verification capabilities of the Varda compiler. The intern shall work on two fronts: improving its type system, and exporting a Varda architecture as a formal model to an external checker (e.g., TLC, Coq or P), in order to statically verify invariants.
URL sujet detaillé : https://laurentprosperi.info/static/verif.pdf
Remarques :
|
|
|
|
|
SM207-173 Efficient compilation of a modular distributed programming language
|
|
Description
|
|
Distributed systems are increasingly widespread and critical. A typical distributed system is a composition of communicating, concurrent modules; this enhances scalable performance, modularity, clear APIs, elasticity and flexibility. However, composition is hard because of concurrency, failures, and lack of guarantees such as type checking.
Our Varda distributed programming environment is designed to reconcile performance and strong abstractions. A Varda program specifies a system as an architecture of independently-developed components. Furthermore, the developer specifies the system's orchestration, i.e., how components are created and destroyed, where they are located, how they interact, and what invariants must be maintained. A Varda system can be rearchitected flexibly thanks to transparent interception. The current Varda compiler emits Java code and interfaces with a runtime in the Akka environment.
The goal of the internship is to extend support to the Go language, interfacing with Kubernetes in order to support container as first-class value in the language, and to improve performance and flexibility.
URL sujet detaillé : https://laurentprosperi.info/static/go.pdf
Remarques :
|
|
|
|
|
SM207-174 Do we need another interactive proof assistant with related type theory?
|
|
Description
|
|
Do we need another interactive proof assistant? Surely not. Do we need to explore new forms of type polymorphism in proof assistants and type theories, in order to discover new features and possibly simplify current type theories (e.g. the calculus of constructions)? If you are confident with OCaml, Coq, lambda calculus, term rewriting systems, dependent type theories, subtyping, and optionally intersection and union types, you may be interested in this internship, which is a follow-up to a series of papers, one Ph.D. thesis and one proof-of-concept prototype software called Bull. A successful internship can certainly continue with a thesis. Please send a CV to Luigi.Liquori.fr and have a look at the following URL: https://hal.archives-ouvertes.fr/hal-02573605 (and related conference papers cited inside) and the following code: https://github.com/cstolze/Bull https://github.com/cstolze/Bull-Subtyping
URL sujet detaillé :
Remarques :
|
|
|
|
|
SM207-175 A Combination Approach to Congruence Closure Procedures
|
|
Description
|
|
Context:
Decision Procedures are very useful to discharge the conditions generated by verification tools. Nowadays, decision procedures are often encapsulated into solvers for the problem of Satisfiability Modulo Theory (SMT). The architecture of an SMT solver consists of different components, including satisfiability procedures (for different theories), a Boolean solver for propositional calculus, and techniques for combining the satisfiability procedures available for the considered theories. Indeed, formulas handled by SMT solvers are often expressed in unions of theories, e.g., a theory of arithmetic plus the theory of uninterpreted function symbols, UF. To consider unions of theories, a natural approach is to proceed in a modular way by applying a combination method [5,8]. In the case of some axiomatized theories, congruence closure techniques can be applied [6], leading to an efficient satisfiability procedure in the case of UF. To obtain powerful SMT solvers, it is important to study the tight integration of both congruence closure and combination principles.
Subject:
The project is to study a combination approach to congruence closure procedures. Several extensions to the classical congruence closure procedure known for UF have been proposed in the literature, especially for the Associativity-Commutativity of a function symbol [1] (AC for short), and for some theories including AC [2-4]. The first goal of this project is to get a better understanding of the class of axiomatized theories admitting a congruence closure procedure. Then, a natural question will arise: how to combine congruence closure procedures when handling unions of theories? This will allow us to handle theories defined by multiple uninterpreted function symbols and interpreted ones fulfilling axioms such as Associativity and/or Commutativity. A first step is to consider the disjoint case, where the component theories of the union are signature-disjoint. To this end, we believe that the notions of extended canonizer [7] and deduction completeness [9] can be very useful.
On the practical side, we may investigate the design of an architecture in which congruence closure procedures can be easily prototyped and combined. To develop this architecture, it would be interesting to reuse an existing toolkit initially implemented in OCaml to build an equational theorem prover for theories including AC [10].
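As background, the classic ground congruence closure construction for UF can be sketched in a few lines. This is a naive fixpoint version for illustration only, not the efficient algorithms of [6] nor the abstract presentation of [1]; the term encoding is an assumption of the sketch.

```python
# Minimal ground congruence closure for uninterpreted functions (UF).
# Terms are nested tuples: ('f', ('a',), ('b',)) stands for f(a, b).

def congruence_closure(equations, goal):
    """Decide whether `goal` (a pair of terms) follows from `equations`."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    def subterms(t):
        yield t
        for s in t[1:]:
            yield from subterms(s)

    terms = set()
    for l, r in list(equations) + [goal]:
        terms |= set(subterms(l)) | set(subterms(r))

    for l, r in equations:
        union(l, r)

    # Propagate congruence: merge f(a...) and f(b...) whenever the
    # arguments are pairwise equal, until a fixpoint is reached.
    changed = True
    while changed:
        changed = False
        ts = list(terms)
        for i, s in enumerate(ts):
            for t in ts[i + 1:]:
                if find(s) == find(t):
                    continue
                if s[0] == t[0] and len(s) == len(t) and \
                   all(find(x) == find(y) for x, y in zip(s[1:], t[1:])):
                    union(s, t)
                    changed = True
    return find(goal[0]) == find(goal[1])

# a = b entails f(a) = f(b):
a, b = ('a',), ('b',)
print(congruence_closure([(a, b)], (('f', a), ('f', b))))  # True
```

The quadratic fixpoint loop is what the efficient algorithms of [6] replace with a worklist over a signature table.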
References:
[1] Leo Bachmair, Ashish Tiwari, and Laurent Vigneron. Abstract congruence closure. J. Autom. Reason., 31(2):129--168, 2003.
[2] Sylvain Conchon, Evelyne Contejean, and Mohamed Iguernelala. Canonized rewriting and ground AC completion modulo Shostak theories: Design and implementation. Log. Methods Comput. Sci., 8(3), 2012.
[3] Deepak Kapur. A modular associative commutative congruence closure algorithm. In FSCD 2021, volume 195 of LIPIcs, pages 15:1--15:21.
[4] Deepak Kapur. Modularity and combination of associative commutative congruence closure algorithms enriched with semantic properties. CoRR, abs/2111.04793, 2021.
[5] Greg Nelson and Derek C. Oppen. Simplification by cooperating decision procedures. ACM Trans. Program. Lang. Syst., 1(2):245--257, 1979.
[6] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. J. ACM, 27(2):356--364, 1980.
[7] Silvio Ranise, Christophe Ringeissen, and Duc-Khanh Tran. Nelson-Oppen, Shostak and the extended canonizer: A family picture with a newborn. In ICTAC 2004, volume 3407 of Lecture Notes in Computer Science, pages 372--386. Springer, 2004.
[8] Robert E. Shostak. A practical decision procedure for arithmetic with function symbols. J. ACM, 26(2):351--360, 1979.
[9] Duc-Khanh Tran, Christophe Ringeissen, Silvio Ranise, and Hélène Kirchner. Combination of convex theories: Modularity, deduction completeness, and explanation. J. Symb. Comput., 45(2):261--286, 2010.
[10] Laurent Vigneron. Positive deduction modulo regular theories. In CSL 1995, volume 1092 of Lecture Notes in Computer Science, pages 468--485. Springer, 1995.
URL sujet detaillé :
Remarques : Laurent Vigneron (Laurent.Vigneron.fr) is the co-adviser of the internship.
The intern should have good knowledge of logic (first order logic,
automated deduction, rewriting). The internship may include an
implementation part and so good programming skills are welcome. If the
candidate is interested, a continuation is possible in the context of
a PhD thesis on related topics.
|
|
|
|
|
SM207-176 Programming heterogeneous architectures with SYCL
|
|
Description
|
|
This internship is part of the general problem of programming heterogeneous embedded platforms (multi-core, GPU, FPGA, CGRA). This internship is conducted and financed within the framework of the joint LATERAL Laboratory between the Lab-STICC and Thalès/LAS, for the "digital core" part of an on-board computer.
The work carried out in this context is based on the SYCL standard, which defines a universal programming model made up of computation kernels, mapped onto target architectures, that interact via memory buffers. SYCL is an emerging industry standard whose use is becoming well established for embedded computing platforms, and which aims to also be deployable on any high-performance computing (HPC) platform. The objective of this internship is to conduct a study on the programming of heterogeneous architectures with SYCL, to establish a complete overview of existing implementations and addressable hardware targets, and to characterize processing performance in the field of signal processing or image processing. This internship will consist of the following parts:
- State of the art of SYCL implementations
- Comparative study on FPGA component integration (Intel/Altera and AMD/Xilinx) in a SYCL program, impact on development time, underlying reconfiguration management modes
- Implementation of a set of representative supercomputing programs and performance measurements by varying the context
URL sujet detaillé : https://ubocloud.univ-brest.fr/s/tigsGpTbA3tMcKo
Remarques : Subject co-supervised with Loïc Lagadec (Ensta-Bretagne). Standard stipend for an internship in a public organization (around 600 euros per month).
|
|
|
|
|
SM207-177 Design of advanced processing on the VERSAL AI components
|
|
Description
|
|
This internship aims to study the architecture and the programming environment of the AI Engine integrated in the VERSAL AI components of AMD/Xilinx. This internship is conducted and financed by the joint LATERAL Laboratory between the Lab-STICC and Thalès/LAS, for the "digital core" part of an on-board computer. In order to characterize the performance of this component, significant examples will be developed, in the field of signal processing and AI.
URL sujet detaillé : https://ubocloud.univ-brest.fr/s/ciXEKkoDAn7jdm6
Remarques : Subject co-supervised with Catherine Dezan. Standard stipend for an internship in a public organization (around 600 euros per month).
|
|
|
|
|
SM207-178 Embeddability of acoustic classification algorithms
|
|
Description
|
|
This internship aims to study and characterize classification algorithms for acoustic signals in the marine environment, for implementation on reconfigurable FPGA circuits. This internship is conducted as part of the RESSACH project, and funded by the ISblue university research school.
URL sujet detaillé : https://ubocloud.univ-brest.fr/s/yr85jPtHsTLmXGL
Remarques : Subject co-supervised with Catherine Dezan. Standard stipend for an internship in a public organization (around 600 euros per month).
|
|
|
|
|
SM207-179 Design and implementation of a rollback-recovery algorithm for high-performance, parallel applications in the one-sided communication model followed by OpenSHMEM
|
|
Description
|
|
Large-scale systems are subject to failures. The parallel programming models and programming environments that support parallel applications running on such systems need to be able to tolerate them, and the execution needs to be able to progress in spite of failures.
This internship focuses on fault tolerance achieved using system-level checkpointing for rollback-recovery, in the one-sided communication model followed by OpenSHMEM.
The goals for this internship will be to:
- Design an algorithm for fault-tolerance using rollback-recovery in the one-sided communication model followed by OpenSHMEM;
- Implement it in an open-source implementation of OpenSHMEM;
- Evaluate its performance using benchmarks and real applications.
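The rollback-recovery principle itself, independently of OpenSHMEM and of the one-sided model, can be illustrated with a toy simulation: a computation periodically checkpoints its state and, on a simulated failure, rolls back to the last checkpoint instead of restarting from scratch. Failure probability and checkpoint interval are arbitrary illustrative values.

```python
import random

def run_with_checkpoints(n_steps=100, interval=10, fail_prob=0.05, seed=1):
    """Complete n_steps units of work despite random failures."""
    rng = random.Random(seed)
    state, checkpoint = 0, 0
    step = 0
    while step < n_steps:
        if rng.random() < fail_prob:
            # Failure: roll back to the last stable checkpoint.
            state, step = checkpoint, (step // interval) * interval
            continue
        state += 1          # one unit of "work"
        step += 1
        if step % interval == 0:
            checkpoint = state  # take a stable checkpoint
    return state

print(run_with_checkpoints())  # 100: the computation completes despite failures
```

The interesting design question for the internship is precisely what "take a stable checkpoint" means when remote processes can write into your memory with one-sided operations at any time.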
URL sujet detaillé : https://etsmtl365-my.sharepoint.com/
Remarques : A grant may be available if the student does not already have one.
|
|
|
|
|
SM207-180 Design and implementation of an application-level model for fault-tolerance for high-performance, parallel applications in the one-sided communication model followed by OpenSHMEM
|
|
Description
|
|
Large-scale systems are subject to failures. The parallel programming models and programming environments that support parallel applications running on such systems need to be able to tolerate them, and the execution needs to be able to progress in spite of failures.
This internship focuses on fault tolerance achieved at the application level and, in particular, on the design of a model that lets the user define and implement how his or her application can be fault-tolerant, in the one-sided communication model followed by OpenSHMEM.
The goals for this internship will be to:
- Design a model for application-level fault-tolerance in the one-sided communication model followed by OpenSHMEM;
- Implement it in an open-source implementation of OpenSHMEM;
- Make some existing benchmarks and mini-apps fault-tolerant in this model, and use them to evaluate its performance.
URL sujet detaillé : https://etsmtl365-my.sharepoint.com/
Remarques : A grant may be available if the student does not already have one.
|
|
|
|
|
SM207-181 Security analysis of an e-voting system
|
|
Description
|
|
The goal of this internship is to lay the basis for a longer project aimed at providing a regular analysis of the main e-voting systems used in France. Our recent work on the Neovote system --- used in 2022 for the French presidential primaries --- proved the existence of multiple major vulnerabilities. We suspect that similar vulnerabilities would be present in other competing systems and plan to study them one by one. The main goal of the internship would be to analyse the code of at least one competing software to detect vulnerabilities, inconsistencies, and departures from best practices in e-voting. The data would be either obtained by soliciting a trial or by using evidence from ongoing elections (e.g. on systems used by other universities).
URL sujet detaillé : https://koliaza.com/evoting-internship.pdf
Remarques :
|
|
|
|
|
SM207-182 Designing a prototype and experimenting with federated machine learning
|
|
Description
|
|
Spirals is an Inria Lille research team [1] studying self-adaptation of software systems: rooted in Software Engineering, its focuses also comprise privacy & security, and measuring the energy consumption of software systems. Spirals participates in the Inria FedMalin challenge [2], which aims at coordinating research on federated and decentralised machine learning. In this paradigm, learning occurs directly where data is produced: on personal devices. Only model gradients are exchanged for generalization---either with a central location (federated) or directly among learning peers (decentralised).
Spirals being one of the most applied research teams of the FedMalin consortium, it notably aims at implementing theoretical contributions from colleagues (e.g. Aurélien Bellet [3]) in order to shed light on their practical limitations and overcome them (obviously!). We are thus seeking an intern who would digest contributions from FedMalin partners, develop federated ML prototypes on mobile devices, and conduct data acquisition & processing experiments in the wild using a fleet of devices. You will need skills in both statistics/machine learning and software development. Enthusiasm, curiosity, and autonomy are sought-after qualities. Supervision will be undertaken by Adrien Luxey-Bitri (associate professor) and Rémy Raes (PhD student and former research engineer).
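The federated principle described above (only model updates leave the device, never raw data) can be sketched in a toy form. The data, 1-D linear model, and learning rate below are purely illustrative and not part of the subject:

```python
# Toy sketch of federated averaging on a 1-D linear model y = w * x.
# Each client computes a gradient on its private data; only gradients
# are shared with the server, which averages them.

def local_gradient(w, data):
    """Mean gradient of the squared error 0.5*(w*x - y)^2 on one client."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def federated_fit(clients, rounds=200, lr=0.1):
    w = 0.0
    for _ in range(rounds):
        # Local step: raw data never leaves the client.
        grads = [local_gradient(w, data) for data in clients]
        # Server step: average the gradients, update the global model.
        w -= lr * sum(grads) / len(grads)
    return w

# Two clients whose private data follow y = 2x exactly:
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
print(round(federated_fit(clients), 3))  # 2.0
```

In the decentralised variant mentioned above, the averaging step would happen among peers rather than at a central server.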
== Demanded work ==
1. State of the art review in federated/decentralised learning, with a particular focus on contributions from the FedMalin consortium [2,3];
2. Implementation of a mobile prototype and data digestion pipeline;
3. In-situ data collection and experimentation.
== Bibliography ==
[1] Spirals research group. https://team.inria.fr/spirals/
[2] FedMalin. https://www.inria.fr/fr/fedmalin - https://www.inria.fr/en/fedmalin
[3] Aurélien Bellet's personal website. http://researchers.lille.inria.fr/abellet/
URL sujet detaillé :
Remarques :
|
|
|
|
|
SM207-183 Performance evaluation of admission control policies for edge/fog systems
|
|
Description
|
|
Edge and fog computing paradigms aim to shift part of the computation of mobile services onto nodes closer to the end user [1][2][3]. This internship will focus on the development of a simulator to evaluate the performance of such systems under different deployment scenarios. The reference application studied in the internship is video analytics, one of the killer applications of edge computing. Video analytics applications process several video camera feeds in a tagged geographical area to perform sophisticated functionalities such as motion detection (e.g., counting moving vehicles, identifying an accident or a traffic jam) or feature extraction (e.g., recognizing a target vehicle's plate number). The simulator will make it possible to reproduce the dynamic admission of video flows onto the edge system -and their informative content- and to compare the performance of different edge orchestration policies. It will be able to: (1) simulate the performance of a given placement of processing functions in the fog/edge infrastructure, and (2) simulate the admission of mobile cameras based on their field of view and on video streaming parameters. The candidate will develop a platform to perform simulations that reproduce the performance experienced by the video analytics service.
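As a flavour of what such a simulator could compare, here is a toy admission loop contrasting two hypothetical policies: a greedy one that admits any flow that fits, and a selective one that also filters on informative content per unit of CPU cost. The arrival model, CPU costs, and content values are invented for illustration only:

```python
import random

# Toy comparison of admission policies for video flows on an edge node
# with fixed CPU capacity. All parameters are hypothetical.

def simulate(policy, capacity=10.0, n_flows=1000, seed=0):
    rng = random.Random(seed)
    load, value = 0.0, 0.0
    active = []  # CPU costs of currently admitted flows
    for _ in range(n_flows):
        cost = rng.uniform(0.5, 3.0)   # CPU demand of the camera flow
        info = rng.uniform(0.0, 1.0)   # informative content of its field of view
        # On average, one admitted flow departs between two arrivals.
        if active and rng.random() < 0.5:
            load -= active.pop(0)
        if policy(cost, info, load, capacity):
            load += cost
            active.append(cost)
            value += info  # total informative content served
    return value

greedy = lambda cost, info, load, cap: load + cost <= cap
selective = lambda cost, info, load, cap: load + cost <= cap and info / cost > 0.3

print(simulate(greedy), simulate(selective))
```

A real simulator would of course model processing placement and streaming parameters as well; the point here is only the shape of the admission loop.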
[1] F. Faticanti, F. De Pellegrini, D. Siracusa, D. Santoro, S. Cretti, "Throughput-Aware Partitioning and Placement of Applications in Fog Computing", IEEE Transactions on Network and Service Management 17(4), 2436-2450.
[2] F. Faticanti, F. Bronzino, F. De Pellegrini, "The case for admission control of mobile cameras into the live video analytics pipeline", Proc. of the 3rd ACM Workshop on Hot Topics in Video Analytics and Intelligent Edges, Oct. 2021.
[3] F. De Pellegrini, F. Faticanti, M. Datar, E. Altman, D. Siracusa, "Optimal Blind and Adaptive Fog Orchestration under Local Processor Sharing", Proc. of IEEE RawNet Workshop 2020.
URL sujet detaillé :
Remarques : Stipend in accordance with the legislation in force
|
|
|
|
|
SM207-184 Models and Optimization for Decoy Allocation in Cloud-to-Edge Computing Environments
|
|
Admin
|
|
Encadrant : Francesco DE PELLEGRINI |
Labo/Organisme : The Laboratoire Informatique d'Avignon (LIA) of the University of Avignon is seeking a highly motivated internship candidate in the field of security for cloud-to-edge computing networks. The internship focuses on the modelling of deception algorithms for the placement of decoys (e.g., honeypots, honeynets, etc.) in highly distributed computing environments. The candidate will work on the modelling of the problem, the resulting optimization problems and their solutions. Beyond a basic understanding of the security aspects involved, a solid background in optimization and algorithmics is expected. |
URL : https://univ-avignon.fr/recherche/upr-4128-lia-laboratoire-informatique-d-avignon-1195.kjsp |
Ville : Avignon |
|
|
|
Description
|
|
Work description: The Laboratoire Informatique d'Avignon (LIA) of the University of Avignon is seeking a highly motivated internship candidate in the field of security for cloud-to-edge computing networks. The internship focuses on the modelling of deception algorithms for the placement of decoys (e.g., honeypots, honeynets, etc.) in highly distributed computing environments. The candidate will work on the modelling of the problem, the resulting optimization problems and their solutions. Beyond a basic understanding of the security aspects involved, a solid background in optimization and algorithmics is expected.
Keywords: Stochastic optimization, cybersecurity, cloud computing
Context and challenges: Cyber deception is a defense strategy, complementary to conventional approaches, used to enhance the security posture of a system. The basic idea of this technique is to deliberately conceal and/or falsify a part of the system by deploying and managing decoys (e.g., "honeypots", "honeynets", etc.), i.e., applications, data, network elements and protocols that appear to malicious actors as a legitimate part of the system, and to which their attacks are misdirected. The advantage of an effective cyber deception strategy is twofold: on the one hand, it depletes attackers' resources while allowing system security tools to take the necessary countermeasures; on the other hand, it provides valuable insights into attackers' tactics and techniques, which can be used to improve the system's resilience to future attacks and to upgrade security policies accordingly. Although cyber deception has been successfully applied in some scenarios, existing deception approaches lack the flexibility to be seamlessly operated in highly distributed and resource-constrained environments. Indeed, while virtualization and cloud-native design approaches paved the way for ubiquitous deployment of applications, they widened the attack surface that malicious actors might exploit. In such a scenario, it is practically unfeasible to deploy decoys for each and every service or application of the system without dramatically depleting resources, especially in edge scenarios, where these are scarce. This calls for a novel approach to cyber deception, combining security, networking, cloud and algorithmics, that takes the tradeoff between security and efficiency into account and makes deception strategies more effective in cloud-to-edge environments.
The objective of the internship: the project will provide a model, based on stochastic processes and Markov Decision Process (MDP) theory, for the automated orchestration of decoys, together with the design and implementation of lightweight and flexible honeypot deployment strategies for cloud-to-edge computing systems. The aim of the model is to represent the major trade-offs in the allocation of decoys and to capture both infrastructure and application features of the microservice architectures composing modern cloud-to-edge applications. The intern is also expected to provide numerical solutions based on classic dynamic programming algorithms. More advanced models may include features such as partially observable system states and information asymmetry. Finally, the reference MDP framework will be compared against classic schemes for honeypot placement.
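As an illustration of the intended methodology (not of the actual model to be developed), here is value iteration on a tiny decoy-allocation MDP. The states, transition probabilities, and rewards are entirely made up:

```python
# Value-iteration sketch for a hypothetical decoy-allocation MDP.
# States: number of decoys deployed (0..2). Actions: add a decoy or hold.
# Rewards trade deception benefit against resource cost; all numbers
# are illustrative only.

GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = ["hold", "add"]

def step(s, a):
    """Return a list of (probability, next_state, reward) outcomes."""
    if a == "add" and s < 2:
        # Deploying a decoy costs 1 but increases expected misdirection.
        return [(1.0, s + 1, -1.0 + 2.0 * (s + 1))]
    # Hold (or "add" at capacity): the attacker may take down one
    # decoy with probability 0.2.
    if s > 0:
        return [(0.8, s, 1.0 * s), (0.2, s - 1, 0.0)]
    return [(1.0, 0, 0.0)]

def value_iteration(eps=1e-8):
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(
                sum(p * (r + GAMMA * V[t]) for p, t, r in step(s, a))
                for a in ACTIONS
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel) update
        if delta < eps:
            return V

V = value_iteration()
print({s: round(v, 2) for s, v in V.items()})
```

The subject's model would replace these toy dynamics with the actual infrastructure/application features, and its more advanced variants (partial observability, information asymmetry) would require richer solution methods than plain value iteration.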
To apply: candidates should contact as soon as possible: Francesco De Pellegrini: francesco.de-pellegrini-avignon.fr Yezekael Hayel: yezekael.hayel-avignon.fr for further information on the internship and the host laboratory. A follow-up Ph.D. position is going to be opened at the end of the internship.
References
1) Wang, Cliff, and Zhuo Lu. "Cyber deception: Overview and the road ahead." IEEE Security & Privacy 16.2 (2018): 80-85.
2) Li, Huanruo, et al. "An optimal defensive deception framework for the container-based cloud with deep reinforcement learning." IET Information Security 16.3 (2022): 178-192.
3) Sajid, Md Sajidul Islam, et al. "SODA: A System for Cyber Deception Orchestration and Automation." Annual Computer Security Applications Conference. 2021.
4) Sayari, Amal, et al. "Attack Modeling and Cyber Deception Resources Deployment Using Multi-layer Graph." International Conference on Advanced Information Networking and Applications. Springer, Cham, 2022.
5) H. Anwar, C. A. Kamhoua, N. O. Leslie, and C. Kiekintveld, "Honeypot Allocation for Cyber Deception Under Uncertainty," IEEE Transactions on Network and Service Management, pp. 1-1, 2022, doi: 10.1109/TNSM.2022.3179965.
6) Quanyan Zhu. 2019. Game theory for cyber deception: a tutorial. In Proceedings of the 6th Annual Symposium on Hot Topics in the Science of Security (HotSoS '19).
7) O. Tsemogne, Y. Hayel, C. Kamhoua, G. Deugoue, Game Theoretic Modelling of Cyber Deception against Epidemic Botnets in Internet of Things, IEEE Internet of Things Journal, special issue on Secure Data Analytics for Emerging Internet of Things, vol. 9, no. 4, 2022.
URL sujet detaillé :
Remarques : To apply: candidates should contact as soon as possible:
Francesco De Pellegrini: francesco.de-pellegrini-avignon.fr
Yezekael Hayel: yezekael.hayel-avignon.fr
for further information on the internship and the host laboratory.
A follow-up Ph.D. position is going to be opened at the end of the internship.
|
|
|
|
|
SM207-185 Breaking graphs
|
|
|
|
SM207-186 Fine-Grained Observation of Road Network Disruptions
|
|
|
|
SM207-187 Cyclic proofs for arbitrary well-founded recursion
|
|
Description
|
|
Cyclic proofs, through the lens of the Curry-Howard correspondence, may be viewed as typed functional programs 'with loops', closer to low-level machine models while nonetheless enjoying excellent metalogical properties. This computational viewpoint of cyclic proofs has been advanced through several works in recent years, including [BDS'16], [DS'19], [KPP'21], [D'21], [CD'22], and now comprises a vibrant area of research.
However an under-developed aspect of the cyclic-proofs-as-programs program is a general (higher-type) recursion theoretic analysis. In the traditional setting this is well-understood via a celebrated three-way correspondence: higher-type recursion on integers = recursion up to epsilon_0 = definability in Peano Arithmetic. Steps towards this have been taken in works such as [KPP'21] and [D'21], as well as [CD'21], where fragments of Gödel's system T have been recast in a circular setting. Along with analogous results on the arithmetical side, [S'17], [D'20], a coherent picture is starting to emerge: circular reasoning buys exactly one level of abstraction complexity, when compared to explicit recursion operators.
One stumbling block towards a general circular (higher-type) recursion theory is that the 'progressing thread' condition on cyclic proofs (to ensure totality of the underlying program) is infamously sensitive to minor variations in syntax. Typically unproblematic coding matters can pose serious problems for direct translations between circular systems, often requiring first going through systems of explicit recursion.
The aim of this project is to develop circular systems parametrised by one or more arbitrary well-founded relations, with the capacity of conducting (higher-type) recursion on them. The idea is to develop correctness conditions based on a 'guarded cut' rather than typical threading, rather mirroring classical recursion-theoretic approaches to recursion on well-founded relations. The point here is two-fold: (a) admit a robust theory of circular recursion on well-founded relations; (b) allow direct translations between circular type systems.
At least one application of these ideas is to develop circular systems for ordinal recursion, filling a gap in the literature on cyclic proofs with regards to the aforementioned three-way correspondence.
REFERENCES
[BDS'16] Infinitary proof theory: the multiplicative additive case. David Baelde, Amina Doumane & Alexis Saurin. Proceedings of CSL '16.
[S'17] Cyclic arithmetic is equivalent to Peano arithmetic. Alex Simpson. Proceedings of FoSSaCS '17.
[D'20] On the logical complexity of cyclic arithmetic. Anupam Das. Journal LMCS.
[KPP'21] Cyclic proofs, system T, and the power of contraction. Denis Kuperberg, Laureline Pinault & Damien Pous. Proceedings of POPL '21.
[DS'19] Infinets: the parallel syntax of non-wellfounded proof theory. Abhishek De & Alexis Saurin. Proceedings of Tableaux '19.
[D'21] A circular version of Gödel's T and its abstraction complexity. Anupam Das. arXiv preprint 2012.14421. (Preliminary version of part of this work in proceedings of FSCD '21)
[CD'22] Cyclic implicit complexity. Gianluca Curzi & Anupam Das. Proceedings of LICS '22.
URL sujet detaillé :
Remarques : Funding: there is funding to reimburse travel to and accommodation in Birmingham, UK during the internship.
This project may involve collaboration with other members of the Birmingham team: https://www.birmingham.ac.uk/research/activity/computer-science/theory-of-computation/people.aspx
Please contact me by email if you are interested.
|
|
|
|
|
SM207-188 Logical characterisations of monotone (deterministic) complexity classes
|
|
Description
|
|
Implicit Computational Complexity (ICC) is an area which aims to develop machine-free characterisations of complexity classes using concepts from mathematical logic, in particular recursion theory and proof theory. Celebrated results include Cobham's characterisation (C) of the polynomial-time functions (FP) using 'bounded recursion on notation', and later Bellantoni and Cook's characterisation (B) of FP using 'predicative' (or 'safe') recursion on notation [BC'92]. The latter is particularly notable, both for its simplicity of presentation and for the fact that it avoids any explicit mention of resource bounds. Higher-type versions of both algebras have been proposed, namely Cook-Urquhart's PV-omega corresponding to C [CU'93], and Hofmann's SLR corresponding to B [H'97], yielding a now rich proof theory of polynomial-time by way of the Curry-Howard correspondence.
Monotone (Boolean) functions are ones that preserve the underlying point-wise order on binary strings. Computational models for monotone classes abound in the literature, particularly for non-uniform classes, e.g. negation-free circuits or formulas, and monotone branching programs. What is notable here is that the natural syntactic monotone restriction of a computational model does not necessarily coincide with the corresponding semantic restriction on the computational model. For instance famous work of Razborov implies that there are monotone polynomial-time predicates computed by no family of negation-free polynomial-size circuits [R'85].
Grigni and Sipser commenced a line of work duly recovering monotone versions of uniform complexity classes [GS'92], yielding a robust 'monotone' version of a non-deterministic computational model. Roughly speaking, a computation is monotone just if whenever it can make a transition when reading a 0, it can make the same transition when reading a 1. The same natural idea does not quite work for deterministic models, but Lautemann, Schwentick & Stewart duly proposed a machine model and showed that there is a robust notion of 'monotone polynomial time', by showing that several natural such characterisations coincide [LSS'98].
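For intuition, the pointwise order on binary strings and the monotonicity of a Boolean function can be checked by brute force; it suffices to test covering pairs (flip one bit from 0 to 1), by transitivity of the order. The functions below are standard examples, not taken from the cited works:

```python
from itertools import product

# A Boolean function is monotone when flipping any input bit from 0 to 1
# can never flip the output from 1 to 0.

def is_monotone(f, n):
    """Check that f: {0,1}^n -> {0,1} preserves the pointwise order."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            if bits[i] == 0:
                flipped = bits[:i] + (1,) + bits[i + 1:]
                if f(*bits) and not f(*flipped):
                    return False
    return True

maj = lambda a, b, c: (a + b + c) >= 2   # majority: monotone
xor = lambda a, b: a ^ b                 # parity: not monotone
print(is_monotone(maj, 3), is_monotone(xor, 2))  # True False
```

The whole difficulty of the area discussed above is, of course, that this semantic property need not be witnessed by a syntactically monotone (negation-free) computation, as Razborov's result [R'85] shows.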
Very recently, Das and Oitavem commenced investigations into ICC-style characterisations of monotone classes [DO'18], in particular classifying monotone computation as a sort of uniformity during recursion. At least one interest from the logical point of view is that such characterisations lend themselves to program extraction for weak theories of arithmetic, where negation and implication are controlled. Via the bounded-arithmetic-proof-complexity correspondence, there is a direct link with monotone proof complexity too.
This aim of this project is twofold: (a) extend the recursion theoretic characterisation of [DO'18] to other deterministic classes, such as LOGSPACE, as well as more complex forms of nondeterminism, e.g. ALOGTIME, where the corresponding machine models are rather ad hoc. (b) connect these recursion theoretic characterisations with weak theories by means of monotone computational interpretations. The pinnacle here would be to achieve a monotone 'Cook's correspondence', linking monotone polynomial-time with a monotone version of Cook's PV.
REFERENCES
[BC'92] A new recursion-theoretic characterization of the polytime functions. Stephen Bellantoni and Stephen Cook.
[CU'93] Functional interpretations of feasibly constructive arithmetic. Journal APAL. (Preliminary version in proceedings of STOC '89)
[H'97] A mixed modal/linear lambda calculus with applications to Bellantoni-Cook safe recursion. Proceedings of CSL '97.
[R'85] Lower bounds on the monotone complexity of some Boolean functions. Alexander Razborov. Doklady Akademii Nauk SSSR.
[GS'92] Monotone complexity. Michelangelo Grigni & Michael Sipser. Proceedings of London Mathematical Society Symposium on Boolean Function Complexity.
[LSS'98] Positive versions of polynomial time. Clemens Lautemann, Thomas Schwentick & Iain Stewart. Journal Information and Computation. (Preliminary version in proceedings of CCC '96)
[DO'18] A recursion-theoretic characterisation of the positive polynomial-time functions. Anupam Das & Isabel Oitavem. Proceedings of CSL '18.
URL sujet detaillé :
Remarques : Funding: there is funding to reimburse travel to and accommodation in Birmingham, UK during the internship.
This project may involve collaboration with other members of the Birmingham team: https://www.birmingham.ac.uk/research/activity/computer-science/theory-of-computation/people.aspx
Please contact me by email if you are interested.
|
|
|
|
|
SM207-189 Learning of interactions between phytoplankton species and integration of preliminary knowledge
|
|
Description
|
|
Phytoplankton are small, often unicellular marine organisms at the base of the food chain. They have an impact on climate, water quality and fishing. Therefore, studying the evolution of their population over time is of major interest.
For this, we aim at using a framework of symbolic learning in order to obtain an explainable predictive model. This model will be used to find the possible interactions between species (predation, cooperation), which are still mostly unknown.
This subject is directed at M2 students. The student will have to:
- Understand the learning method and the work that is already available,
- Integrate preliminary knowledge (preferred temperature of species, etc.) into the learning pipeline,
- Produce a graph of interactions by extracting information from the predictive model,
- Discuss with the biologist colleague in order to validate the hypotheses made, the results obtained, etc.
See attached detailed subject. The internship can take place in French or in English.
URL sujet detaillé : http://maxime.folschette.fr/sujets/Stage%20M2%20-%20Integration%20des%20connaissances%20phytoplanctoniques%20-%202022-2023.pdf
Remarques : Co-supervision with Cédric Lhoussaine
|
|
|
|
|
SM207-190 Static analysis by under-approximation for synchronous automata networks
|
|
Description
|
|
The synchronous automata network is a formalism allowing the representation of many dynamic systems such as biochemical reactions, interacting computers or programs, etc. A naive analysis of their dynamics has a complexity exponential in their size, which calls for alternative methods. A possible approach is abstract interpretation, which consists in approximating their dynamics to obtain very fast results in some cases.
The objective of this internship is to propose an abstract interpretation approach for synchronous automata. The work of the student will be the following: - Understand the existing works on the subject (on similar formalisms), - Propose an extension for synchronous automata networks, - Implement the proposed method, possibly by integrating it into an existing library.
URL sujet detaillé : http://maxime.folschette.fr/sujets/Stage%20M1-M2%20-%20Abstraction%20des%20reseaux%20d%27automates%20-%202022-2023.pdf
Remarques : The internship advising can take place in French or English. Understanding of written English is required.
|
|
|
|
|
SM207-191 Word equations, Makanin's algorithm and palindromes
|
|
Description
|
|
Some equations on words over a free semigroup are simple: for instance, the solutions of the commutation equation XY=YX are exactly the words X and Y which are repetitions of the same word; for example, X=001001 and Y=001001001 are both repetitions of 001.
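The commutation example can be checked concretely: XY = YX holds exactly when X and Y are powers of a common primitive word. A minimal Python sketch of this fact (for illustration only; the function names are ours):

```python
def primitive_root(w):
    """Smallest word r such that w = r^k for some k >= 1."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

def commute(x, y):
    """Do x and y satisfy the word equation XY = YX?"""
    return x + y == y + x

# The example from the text: both words are repetitions of 001.
x, y = "001001", "001001001"
assert commute(x, y)
assert primitive_root(x) == primitive_root(y) == "001"
# A non-example: "01" and "10" do not commute.
assert not commute("01", "10")
```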
However, solving general word equations is very difficult. A general method for it was described by Makanin in 1977 and is worth a monograph chapter: "Makanin's Algorithm", by V. Diekert, in the 2002 book "Algebraic combinatorics on words" by M. Lothaire (which is a collective penname).
So, the minimal goal of this internship is to read and understand the main ideas of this chapter (available online), at least in particular cases. The maximal goal is to find modifications of the algorithm for the case where the words denoted by some of the variables are palindromes. This can also help with other combinatorial problems on palindromes, which can be discussed.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-192 Non-linear physics on quantum computer
|
|
Description
|
|
In this internship project, we will consider fully quantum circuits for classical non-linear PDEs, which model a broad class of phenomena in science, such as turbulence in hydrodynamics. The selected student will be hosted by Prof. Giuseppe Di Molfetta at the LIS in Marseille and will be co-directed by Prof. Pierre Sagaut, an expert in computational fluid dynamics and member of the M2P2. The candidate will also be a member of a large network of top scientists sponsored by the von Karman Institute for Fluid Dynamics and will have access to funded international thesis programmes. The selected candidate will be responsible for designing quantum circuits, optimising them and analysing their complexity. This is theoretical work that will require great inventiveness, motivation and ambition. Please do not hesitate to contact me for the detailed subject or for any other questions. Keep in touch!
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-193 Distributed algorithms on quantum (noisy) architectures
|
|
Description
|
|
Context and goals:
Mid-term quantum architectures will likely be distributed. Similarly to modern high-performance computing infrastructures, many quantum processors, memories and storage units can be interconnected via quantum communication networks, and computational tasks can be solved by adopting a distributed approach. However, distributed quantum computing remains challenging from several points of view: quantum teleportation as a means to transfer quantum information between interconnected devices, securing quantum networks, or abstracting and optimizing the execution of quantum algorithms based on the characteristics of the underlying distributed system. Our proposal will focus on this last issue. In particular, our strategy will be to use a quantum cellular automata-based architecture as a formal framework. The theory of QCA is well developed, as is their realizability, and we believe this may help to identify new fundamental strategies to construct multiparty entanglement and to elaborate quantum-enhanced distributed algorithms. Moreover, one major limitation is that the few known results are fault-free: nowadays we do not know how errors propagate in the network, how they depend on the topology, and how they affect the computational power. This is one of the main open problems to which this proposal aims to give an answer.
The successful candidate will, during the internship, profit from an existing and consolidated scientific collaboration with the Distributed Algorithms group at LIS (DALGO) and the Department of Mathematics in Marseille (I2M) on the quantum error coding side.
URL sujet detaillé : https://www.giuseppe-dimolfetta.com/projects-for-students
Remarques :
|
|
|
|
|
SM207-194 Towards Law Versioning
|
|
Description
|
|
Legal texts are represented by XML files. Such texts have many kinds of relations with one another: to name only two, they can cite or modify each other.
Theoretical work on version control exists, expressed in a category of files and patches. We propose to generalize it by working in a double category of XML files, with one kind of arrow representing modifications and another representing dependency.
The work program will start by defining an appropriate category of XML files (accounting for their tree-structures) and modifications and lifting the results on linear files to it; and then generalizing it to a double category.
It can also lead, in a more extended timeframe, towards more applied work, such as writing a versioning system implementing the aforementioned double category.
URL sujet detaillé : https://lacl.fr/~lpellissier/stage.pdf
Remarques : Paid internship (gratification)
|
|
|
|
|
SM207-195 IP adaptation layer for Underwater Acoustic Sensor Networks (UASN)
|
|
Description
|
|
Context: Several mega-projects for the deployment of offshore wind turbines are underway in France (the Saint-Brieuc, Saint-Nazaire and Yeu farms in particular). The Blue IoT Eolia project aims to support these deployments by providing an underwater acoustic sensor network for monitoring underwater infrastructure. The difficulty of conducting experiments at sea justifies studying these networks first on reliable simulators. In this study we will focus on the DESERT simulator provided by the University of Padova, Italy [1].
Objectives: The main objective of this project is to provide an Internet Protocol (IP) adaptation layer as is usually found in the Internet of Things [2]. This adaptation layer is useful for the interconnection and interoperability of many devices (sensors, robots, modems...). We will experiment with it on a classical MAC layer such as ALOHA or a more dedicated protocol such as TDA-MAC [3]. A second objective is to validate the approach taken by the DESERT simulator against experiments at sea conducted in the past (on offshore wind farms).
Work to do: - understand the specificities of underwater acoustic communications; - design the IP adaptation layer based on the SCHC protocol [3]; - practice DESERT, especially collisions in different times and spaces; - port TDA-MAC to DESERT (in the synchronization and data-transmission phases); - implement the IP adaptation on the simulator; - make a first QoS (Quality of Service) estimation (packet delivery ratio, end-to-end delay, payload size, ...)
Skills: wireless networks (with an interest in the physical layers); IP protocols; event-driven simulators such as NS2; C/C++, open-source development.
References: [1] https://desert-underwater.dei.unipd.it/ [2] Gomez, Minaburo, Toutain and Barthel, "IPv6 over LPWANs: connecting Low Power Wide Area Networks to the Internet (of Things)", IEEE Wireless Communications PP(99), October 2019. [3] Morozs, N., Mitchell, P., & Zakharov, Y. V. (2017). TDA-MAC: TDMA without clock synchronization in underwater acoustic networks. IEEE Access, 6, 1091-1108.
URL sujet detaillé :
:
Remarques : co-supervised with Dr Nils Morozs from University of York
|
|
|
|
|
SM207-196 Differential privacy and byzantine resilience in federated learning.
|
|
Description
|
|
Federated learning (FL) enables a large number of IoT devices (mobiles, sensors) to cooperate in learning a global machine learning model while keeping the data local to the devices [1,2]. For example, Google has applied FL in their application Gboard to predict the next word that users will enter on their smartphones [3].
During the training process, how to preserve the data privacy of users and how to prevent the failure of learning due to malicious participants are two of the critical concerns in FL [4].
For the privacy concern, differentially private (DP) algorithms have been introduced, injecting noise into the transmitted messages [5,6]. They ensure, to a certain extent, that even if a user changes just one training sample, the adversary cannot observe more than a bounded difference in the exchanged messages and thus cannot draw conclusions about individual samples. For the resilience of the system against malicious participants, byzantine-resilient algorithms have been proposed to filter out potentially malicious updates from the users [7,8,9].
However, recent research shows that a direct composition of these techniques makes the guarantees of the resulting algorithm depend unfavorably on the number of parameters of the ML model, making the training of large models practically infeasible [10]. New methods need to be carefully designed to ensure DP and byzantine resilience simultaneously [11].
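As a toy illustration of the two ingredients above (clipping plus Gaussian noise for DP, and a robust aggregation rule against byzantine clients), and not the specific method of [11], one could sketch:

```python
import random
import statistics

def privatize(update, clip=1.0, sigma=0.5, rng=random):
    """Clip an update to L2 norm <= clip, then add Gaussian noise (the DP step)."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip / max(norm, 1e-12))
    return [u * scale + rng.gauss(0.0, sigma * clip) for u in update]

def robust_aggregate(updates):
    """Coordinate-wise median: a simple byzantine-resilient aggregation rule."""
    return [statistics.median(coords) for coords in zip(*updates)]

random.seed(0)
honest = [privatize([1.0, -1.0]) for _ in range(9)]  # honest clients follow the protocol
byzantine = [[100.0, 100.0]]                         # a malicious client sends garbage
agg = robust_aggregate(honest + byzantine)
# The median stays close to the honest (clipped, noised) updates despite the outlier.
assert max(abs(v) for v in agg) < 5.0
```

The impossibility result of [10] concerns precisely how the noise scale and the robustness guarantee of such compositions degrade with model dimension; the sketch only shows the two mechanisms side by side.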
In this internship, the student will first acquire knowledge on federated learning and understand the theoretical impossibility result [10]. He/she will then implement the method of [11] using PyTorch and demonstrate its performance in terms of privacy and resilience against malicious adversaries in FL.
PREREQUISITES We are looking for a candidate with coding experience in Python for a machine learning task and good analytical skills.
REFERENCES [1] McMahan et al, Communication-Efficient Learning of Deep Networks from Decentralized Data, AISTATS 2017, pages 1273-1282 [2] Tian Li et al, Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, pages 50-60, 2020 [3] Hard, Andrew et al, Federated Learning for Mobile Keyboard Prediction. arxiv: 1811.03604, 2019 [4] Kairouz et al, Advances and Open Problems in Federated Learning [5] McMahan et al, Learning differentially private recurrent language model, ICLR 2018 [6] Bellet et al, Personalized and Private Peer-to-Peer Machine Learning, AISTATS 2018 [7] Yin et al, Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates [8] Krishna Pillutla et al, Robust Aggregation for Federated Learning [9] Blanchard et al, Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent, NIPS 2017 [10] Guerraoui et al, Differential Privacy and Byzantine Resilience in SGD: Do They Add Up?, PODC 2021 [11] Guerraoui et al, Combining Differential Privacy and Byzantine Resilience in Distributed SGD
URL sujet detaillé :
:
Remarques : The internship will be co-supervised by Chuan Xu. https://sites.google.com/view/chuanxu?pli=1
|
|
|
|
|
SM207-197 Chromatic number of graphs with bounded twin-width
|
|
Description
|
|
It has been recently shown by Pilipczuk and Sokołowski that the chromatic number of a graph with twin-width at most k is quasi-polynomially bounded by its clique number. The goal of this internship is to try to obtain a polynomial bound, analogous to the one already known for bounded clique-width by Bonamy and Pilipczuk.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-198 Projects in the MetaCoq ecosystem
|
|
Description
|
|
The MetaCoq project [1, 2] is a formalisation of Coq in Coq [5], which can be used as a basis for meta-programming of tactics and commands, to prove meta-theoretic properties of the type theory of Coq such as subject reduction, and to verify programs crucial to the implementation of Coq such as type checking or extraction. We offer various projects on MetaCoq, spanning the research areas of programming language theory, type theory, interactive theorem proving, compiler correctness, meta-programming, and implementation of user interfaces. For more details, see the pdf file available at https://yforster.de/downloads/m2-internship-lyon.pdf
URL sujet detaillé : https://yforster.de/downloads/m2-internship-metacoq-lyon.pdf
Remarques : Co-advising possible with Meven Lennon-Betrand (2), Kenji Maillard (1), Pierre-Marie Pédrot (1), Kazuhiko Sakaguchi (1), Matthieu Sozeau (1), Nicolas Tabareau (1), and Théo Winterhalter (3) 1 Gallinette Project-Team, Inria, Nantes, France
2 CLASH group, Computer Laboratory, Cambridge, UK
3 Deducteam Project-Team, Inria, Saclay, France
|
|
|
|
|
SM207-199 Malware analysis and detection
|
|
Description
|
|
Malware is a major cybersecurity threat. Detection of malware is very challenging, in particular because of the generation of variants from a known sample using obfuscation methods, and because of the difficulty of predicting a program's behavior and, as a result, of identifying new threats. Our long-term goal is to mitigate malware by leveraging recent advances in software analysis and AI in order to propose efficient tools.
We propose several research directions combining cutting-edge techniques (symbolic execution, fuzzing, static analysis, reverse engineering, dynamic analysis, and machine learning): - Advanced reverse engineering and decompilation with the goal of recovering the program meaning; of particular interest are self-modifying programs. - Identification of malicious behaviors and malware detection; of particular interest are IoT and SCADA. - CVE and vulnerability detection in compiled libraries and applications. - Generation of malware and program evolution, by combining obfuscation methods and deep generative models based on state-of-the-art deep learning methods.
More details on the topics will be happily provided! The list is not exhaustive, ask us if you have some project in mind.
URL sujet detaillé : https://members.loria.fr/JYMarion/
Remarques : The Carbone research group at LORIA is a leading group in low-level security and malware analysis, with regular publications in top-tier security venues. We work in close collaboration with other French and international research teams, industrial partners and national agencies. This internship will be funded and is part of the Defmal project, supported by a government grant managed by the National Research Agency under France 2030 with reference "ANR-22-PECY-0007". This internship may be continued as a PhD thesis (already funded by the Defmal project).
|
|
|
|
|
SM207-200 Incentives to federated learning
|
|
Description
|
|
The increasing size of data generated by smartphones and IoT devices motivated the development of Federated Learning (FL) [1,2], a framework for on-device collaborative training of machine learning models. FL algorithms like FedAvg [3] allow clients to train a common global model without sharing their personal data; FL reduces data collection costs and protects clients' data privacy. At the same time, clients' local datasets may be drawn from different distributions and the global model may be unsatisfactory for a given client, who may then prefer to train a local model autonomously. This issue is mitigated by new FL algorithms which enable model personalization at the client level [4,5]. In order to prevent clients' defection, it is also possible to incentivize clients' participation.
The goal of this research project is to survey the different approaches to promoting clients' participation in FL training, ranging from game-theoretic studies [6-8] and clients' incentives for contributing data and computation resources [9-10], to personalization approaches [4,5,12,13] and new approaches explicitly maximizing the fraction of clients incentivized to use the global model [14].
This research topic can lead to a PhD position. We are therefore looking for students with a strong motivation to pursue a research career.
Useful Information/Bibliography:
[1] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50-60, 2020. [2] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurelien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019. [3] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273-1282. PMLR, 2017. [4] Othmane Marfoq, Giovanni Neglia, Aurelien Bellet, Laetitia Kameni, and Richard Vidal. Federated multi-task learning under a mixture of distributions, NeurIPS 2021. [5] Othmane Marfoq, Giovanni Neglia, Laetitia Kameni, and Richard Vidal. Personalized federated learning through local memorization, ICML 2022. [6] Xuezhen Tu, Kun Zhu, Nguyen Cong Luong, Dusit Niyato, Yang Zhang, and Juan Li. Incentive mechanisms for federated learning: From economic and game theoretic perspective. arXiv preprint arXiv:2111.11850, 2021. [7] Kate Donahue and Jon Kleinberg. Model-sharing games: Analyzing federated learning under voluntary participation. In The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), 2021. [8] Kate Donahue and Jon Kleinberg. Optimality and stability in federated learning: A game-theoretic approach. In Advances in Neural Information Processing Systems, 2021. [9] Avrim Blum, Nika Haghtalab, Richard Lanas Phillips, and Han Shao. One for one, or all for all: Equilibria and optimality of collaboration in federated learning. In International Conference on Machine Learning, 2021. [10] Jingoo Han, Ahmad Faraz Khan, Syed Zawad, Ali Anwar, Nathalie Baracaldo Angel, Yi Zhou, Feng Yan, and Ali R. Butt. Tokenized incentive for federated learning.
In Proceedings of the Federated Learning Workshop at the Association for the Advancement of Artificial Intelligence (AAAI) Conference, 2022. [11] Jiawen Kang, Zehui Xiong, Dusit Niyato, Han Yu, Ying-Chang Liang, and Dong In Kim. Incentive design for efficient federated learning in mobile networks: A contract theory approach. In 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS), 2019. [12] Valentina Zantedeschi, Aurelien Bellet, and Marc Tommasi. Fully decentralized joint learning of personalized models and collaboration graphs. Volume 108 of Proceedings of Machine Learning Research, pages 864-874, 2020. PMLR. [13] Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. Ditto: Fair and robust federated learning through personalization. In International Conference on Machine Learning, pages 6357-6368. PMLR, 2021. [14] Yae Jee Cho, Divyansh Jhunjhunwala, Tian Li, Virginia Smith, and Gauri Joshi. To federate or not to federate: Incentivizing client participation in federated learning, arXiv:2205.14840.
URL sujet detaillé :
:
Remarques : The internship can be paid through "gratification" (about 550 euros per month).
|
|
|
|
|
SM207-201 Cooperative Machine Learning Inference
|
|
Description
|
|
An increasing number of applications rely on complex inference tasks based on machine learning (ML). Currently, there are two options to run such tasks: either they are served directly by the end device (e.g., smartphones, IoT equipment, smart vehicles), or they are offloaded to a remote cloud. Both options may be unsatisfactory for many applications: local models may have inadequate accuracy, while the cloud may fail to meet delay constraints. In [1], we presented the novel idea of inference delivery networks (IDNs), networks of computing nodes that coordinate to satisfy ML inference requests, achieving the best trade-off between latency and accuracy. IDNs bridge the dichotomy between device and cloud execution by integrating inference delivery at the various tiers of the infrastructure continuum (access, edge, regional data center, cloud). Nodes with heterogeneous capabilities can store a set of monolithic machine learning models with different computational/memory requirements and different accuracies, and inference requests can be forwarded to other nodes if the local answer is not considered accurate enough.
In this project, we want to explore the possibility of enlarging the set of actions for nodes in an inference delivery network beyond simple inference forwarding, by allowing models to be split across multiple nodes [3,4] and/or inferences from different nodes to be suitably combined to improve their quality. In particular, we aim to compare specific model-splitting techniques, with or without the insertion of bottlenecks [2], in terms of performance metrics such as inference delay and network load. We will evaluate different methodologies to estimate the quality of an inference online [5], and propose distributed bagging algorithms to combine inferences from different models [6-9].
This research topic can lead to a PhD position. We are therefore looking for students with a strong motivation to pursue a research career.
URL sujet detaillé :
:
Remarques : The internship can be paid through "gratification" (about 550 euros per month).
|
|
|
|
|
SM207-202 Formal approaches for AI explainability
|
|
Description
|
|
Artificial intelligence is present almost everywhere in industry, and is making its way into applications where its decisions can have dramatic consequences (autonomous vehicles, financial decisions, etc.). The question then arises: how much trust can be placed in artificial intelligence? How can this trust be justified?
The methods we are interested in use a training dataset to build a decision process. This decision process is then used on a testing dataset. Multiple methods exist; the internship focuses on decision trees, but other methods may be considered (e.g. neural networks).
The explainability of machine learning models aims to help humans understand the decisions made by these complex models. The need for explainability is one of the challenges for the use of machine learning techniques, especially in high-risk and safety-critical areas.
The objective of this internship is to develop tools that allow efficient and rigorous reasoning on machine learning models, in particular on explainability issues.
URL sujet detaillé : https://drive.google.com/file/d/1zlpGhXzaEXwoIoXjJJIDJNi41umfUEcu/view?usp=sharing
Remarques : This internship is part of a collaboration with Joao Silva-Marques, holder of a chair at the ANITI Institute (https://aniti.univ-toulouse.fr/).
Gratification: about 600 €/month
|
|
|
|
|
SM207-203 Schnyder woods for higher genus surfaces, with applications to graph drawing
|
|
Description
|
|
We will focus on the study of the generalization of Schnyder woods to the case of graphs embedded on surfaces.
Schnyder woods were introduced by Walter Schnyder in 1990 as a deep and elegant characterization of planar graphs, given in terms of edge orientations and colorations, which has led to a huge number of applications in several domains, especially graph drawing and graph encoding.
In this internship we will study the recent generalization of Schnyder woods to non-planar graphs: in particular, we will consider graphs embedded on low-genus surfaces, for which several interesting questions are still open, especially concerning the design of efficient algorithms for computing grid drawings with polynomial resolution for graphs of genus g>=2.
URL sujet detaillé : http://www.lix.polytechnique.fr/~amturing/stages.html
Remarques : Paid internship (gratification)
|
|
|
|
|
SM207-204 Exploring the tradeoffs between energy and performance of federated learning algorithms
|
|
|
|
SM207-205 Learning dynamics of high-level sports from automatic video analysis
|
|
Description
|
|
The LJK laboratory and the INRIA Grenoble center supervise the PerfAnalytics project (http://perfanalytics.fr ), a large project on automatic video analysis for the scientific and technical support of French athletes in their preparation for the 2024 Olympic Games. Using an existing dataset of motion and force measurements for standard activities such as walking, as well as sport climbing, the goal of the internship is to determine whether the prediction of forces from 3D motion extracted from video is stable enough, using a machine learning approach for time series.
URL sujet detaillé : https://perfanalytics.fr/docs/stageM2_ENS_PerfAnalytics_2023.pdf
Remarques :
|
|
|
|
|
SM207-206 k-local Hamiltonian problem and Quantum Cellular Automata
|
|
Admin
|
|
Encadrant : Giuseppe DI MOLFETTA |
Labo/Organisme : The CaNa research group (Giuseppe Di Molfetta, Kevin Perrot, Sylvain Sene, Enrico Porreca) seeks to capture at the formal level some of the fundamental paradigms of theoretical physics and biology, via the models and approaches of theoretical computer science and discrete mathematics. The group is located in Luminy, Marseille, France, and benefits from a rich scientific environment with the Cellular Automata experts of I2M (Pierre Guillon, Guillaume Theyssier) and the physicists from CPT (Alberto Verga, Thomas Krajewski). |
URL : https://www.giuseppe-dimolfetta.com |
Ville : Marseille/Paris |
|
|
|
Description
|
|
The goal: The question here is whether we can classify and characterize the complexity of Hamiltonian problems, using QCA as a mathematical framework and importing results from quantum simulation. Recently, Sellapillay, Verga and Di Molfetta introduced a QCA, based on the Toffoli gate, which undergoes a dynamical transition between a classical regime and one where the system is governed by a well-known operator, the PXP Hamiltonian, displaying long-range correlations. The aim of the internship is to understand this transition and how it affects the ground state of the system. We expect to be able to prove a corresponding transition between different complexity classes associated with the corresponding local Hamiltonian problems.
Keywords: k-local Hamiltonian, complexity theory, quantum cellular automata. Place: LIP6, QI Team. Scientific environment: The QI research group (Eleni Diamanti, Alex B. Grilo, Frederic Grosshans, Elham Kashefi, Damian Markham, Marco Quintino) works on topics ranging from very theoretical aspects of quantum information (complexity theory, cryptographic protocols, quantum foundations) to applied ones (implementation of protocols, machine learning). The group is located in Paris, France. The internship can also take place at the Laboratoire d'Informatique et Systèmes (LIS), Natural Computing team (CaNa), described above.
URL sujet detaillé : https://docs.google.com/document/d/1ZXO_OKkGambTRfRCb0QPEktYonDBX1lJ7BrAjPRdgNg/edit?usp=sharing
Remarques : The internship will be co-directed by Di Molfetta (LIS) and Grilo (LIP6) and co-supervised by Kevissen Sellapillay, postdoc at Jülich University.
(Possible extension in thesis)
Don't be afraid to ask ;)
|
|
|
|
|
SM207-207 Machine-checked undecidability proofs regarding contextual equivalence in PCF
|
|
Description
|
|
This project aims at giving machine-checked proofs of undecidability of various problems regarding contextual equivalence in PCF, in the Coq proof assistant. The final goal is to give a (possibly simplified) proof of undecidability of contextual equivalence in finitary PCF, a famously hard result in the area of programming language semantics due to Loader.
URL sujet detaillé : http://guilhem.jaber.fr/proposal-undecidable-ctxeq.pdf
Remarques : Co-supervised with Guilhem Jaber (Nantes Université, LS2N).
Funding is available to remunerate the internship.
|
|
|
|
|
SM207-208 Approximation algorithms for classical and quantum coding problems
|
|
Description
|
|
Information theory studies the optimal rates for performing a task given access to a resource. Since Shannon's seminal work, this question has been studied by formulating it in statistical terms. Recently, motivated by quantum information, the problem has started to be studied from an algorithmic angle.
For example, the point-to-point channel coding question was related to the maximum coverage problem in https://arxiv.org/abs/1508.04095. The objective of this internship is to extend this result in several directions (depending on the background/preferences of the intern): in the classical setting, we will consider the more general channel simulation question; in the quantum setting, we will consider the transmission of classical information over quantum channels. Depending on the preferences of the student, the internship can be either theoretical (establishing provable guarantees), numerical (designing and running algorithms for some cases of interest) or, ideally, both.
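For context, maximum coverage (pick k sets covering as many elements as possible) admits the classical greedy heuristic with a (1 - 1/e) approximation guarantee, which underlies such algorithmic reductions. A self-contained sketch (illustrative only, not the reduction of the cited paper):

```python
def greedy_max_coverage(sets, k):
    """Greedy heuristic for maximum coverage: pick k sets, each time choosing
    the one covering the most yet-uncovered elements; (1 - 1/e)-approximate."""
    covered, chosen = set(), []
    for _ in range(k):
        gains = [len(s - covered) for s in sets]
        i = max(range(len(sets)), key=gains.__getitem__)
        if gains[i] == 0:          # nothing new to cover
            break
        chosen.append(i)
        covered |= sets[i]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
chosen, covered = greedy_max_coverage(sets, 2)
# The greedy choice takes the largest set first, then the most complementary one.
assert covered == {1, 2, 3, 4, 5, 6, 7}
```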
URL sujet detaillé :
:
Remarques : Remuneration possible
|
|
|
|
|
SM207-209 Distance transform on cellular spaces and its computation
|
|
Description
|
|
Distance transforms are an important tool in image processing, used in many applications. Given a binary digital image, whose space can be seen as a (regular) cubical cellular space, its distance transform specifies the distance from each pixel to the nearest foreground pixel. Such distance transforms play a central role in comparing digital shapes, computing the medial axis of digital shapes, segmenting images into regions, etc. There exist several algorithms for computing the exact or approximate Euclidean distance transform in time linear in the image size on cubical cellular spaces.
In this research internship, we aim at studying more general cases, namely irregular cellular spaces generated, for instance, by the superimposition of two regular cellular spaces or by a Voronoi diagram. More precisely, we focus on the basics of distances and on algorithms for computing distance transforms on such cellular spaces in an efficient manner.
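On the regular cubical space, the simplest linear-time variant uses a multi-source breadth-first search for the L1 (city-block) distance; a minimal sketch for intuition (the Euclidean case and irregular spaces are the actual subject of the internship):

```python
from collections import deque

def distance_transform_l1(image):
    """L1 distance from each pixel to the nearest foreground (1) pixel,
    computed by multi-source breadth-first search in O(#pixels) time."""
    h, w = len(image), len(image[0])
    INF = float("inf")
    dist = [[INF] * w for _ in range(h)]
    queue = deque()
    for i in range(h):          # seed the BFS with all foreground pixels
        for j in range(w):
            if image[i][j] == 1:
                dist[i][j] = 0
                queue.append((i, j))
    while queue:                # expand in 4-connectivity
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] == INF:
                dist[ni][nj] = dist[i][j] + 1
                queue.append((ni, nj))
    return dist

img = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
d = distance_transform_l1(img)
assert d[1][1] == 0 and d[0][1] == 1 and d[0][0] == 2
```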
URL sujet detaillé : https://kenmochi.users.greyc.fr/tmp/sujetStageM2_2023.pdf
Remarques : Co-advising: Nicolas Passat (CReSTIC) and Phuc Ngo (LORIA)
|
|
|
|
|
SM207-210 Unguarding recursion with temporal refinements
|
|
Description
|
|
Functional programming on infinite datatypes, such as streams or non-wellfounded trees, is by now well established thanks to declarative definitions and equational reasoning on high-level abstractions, in particular when infinite objects are represented by coinductive types. The goal of the internship is to work towards a comprehensive framework for specifying input-output temporal properties of higher-order programs that handle infinite datatypes. This includes working with the guarded lambda-calculus, possibly within guarded type theory, establishing a guarded model of call-by-push-value, and investigating the notions of liveness and totality in categorical models of guarded recursion. More details can be found at https://kenji.maillard.blue/unguardingRecursion.pdf.
URL sujet detaillé : https://kenji.maillard.blue/unguardingRecursion.pdf
Remarques : Co-supervision with Guilhem Jaber (LS2N, Nantes) and Colin Riba (LIP, ENS-Lyon)
|
|
|
|
|
SM207-211 Heuristics for Birkhoff--von Neumann decomposition
|
|
Description
|
|
A doubly stochastic matrix is a square matrix with nonnegative entries in which every row sums to one, as does every column. The Birkhoff--von Neumann (BvN) decomposition writes a given doubly stochastic matrix as a convex combination of permutation matrices. The BvN decomposition of a given matrix is not unique, and applications call for the sparsest representation, in which we seek the minimum number of permutation matrices. Finding such a decomposition has been shown to be NP-hard, so effective heuristics are of significant interest.
Since a permutation matrix can be seen as a perfect matching in a bipartite graph, the BvN decomposition can be described in terms of bipartite graphs and perfect matchings. The aim of this internship is to investigate this connection in order to design one heuristic based purely on graphs, and another based on a numerical method whose building blocks are, hopefully, combinatorial, so that the most effective heuristics can be developed.
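The classical greedy Birkhoff step behind these heuristics can be sketched as follows: repeatedly pick a permutation supported on the positive entries of the remaining matrix (here found by brute force over permutations, standing in for a bipartite perfect-matching computation) and subtract it scaled by its smallest selected entry. A minimal sketch for tiny matrices, not the internship's target heuristic:

```python
from itertools import permutations

def birkhoff_decompose(A, tol=1e-12):
    """Greedy Birkhoff decomposition (brute force over permutations,
    so only suitable for tiny matrices). Note that picking the first
    positive-support permutation gives no sparsity guarantee; finding
    the sparsest decomposition is NP-hard."""
    n = len(A)
    A = [row[:] for row in A]            # work on a copy
    terms = []                           # (coefficient, permutation) pairs
    while True:
        chosen = None
        for perm in permutations(range(n)):
            entries = [A[i][perm[i]] for i in range(n)]
            alpha = min(entries)
            if alpha > tol:
                chosen = (alpha, perm)   # permutation with all-positive support
                break
        if chosen is None:
            return terms                 # remaining matrix is (numerically) zero
        alpha, perm = chosen
        terms.append((alpha, perm))
        for i in range(n):
            A[i][perm[i]] -= alpha       # peel off alpha * permutation matrix

A = [[0.5, 0.5],
     [0.5, 0.5]]
terms = birkhoff_decompose(A)            # two permutations, coefficients 0.5 each
```

The graph-based heuristic envisaged in the internship would replace the brute-force search by an actual perfect-matching computation on the bipartite support graph.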
Key references: + Richard A. Brualdi, Notes on the Birkhoff algorithm for doubly stochastic matrices, Canadian Mathematical Bulletin 25 (2) (1982), 191-199.
+ Fanny Dufossé and Bora Uçar, Notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices, Linear Algebra and its Applications, vol. 497 (2016), 108-115 (also available as Research Report RR-8852, Inria Grenoble Rhône-Alpes, Feb. 2016).
+ Fanny Dufossé, Kamer Kaya, Ioannis Panagiotas, and Bora Uçar, Further notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices, Linear Algebra and its Applications, vol. 554 (2018), 68-78 (also available as Research Report RR-9095, Inria Grenoble Rhône-Alpes, 2017).
URL sujet detaillé : https://hal.inria.fr/hal-01270331
Remarques : Potential collaboration with Jeremy Cohen (CNRS, at CREATIS Lab, Lyon), https://jeremy-e-cohen.jimdofree.com, and the standard internship salary.
|
|
|
|
|
SM207-212 Lambda-calcul probabiliste infinitaire
|
|
Description
|
|
The aim is to explore the possibility of simultaneously extending the lambda-calculus in two directions that have so far been developed separately.
On the one hand, one can consider a probabilistic lambda-calculus, i.e. one equipped, at a minimum, with a binary probabilistic choice primitive. The literature on the subject is abundant, with motivations ranging from the theoretical to the applied (including the study of probabilistic programming).
On the other hand, infinitary versions of the lambda-calculus have been introduced: they consider potentially infinite terms and infinite reduction sequences, whose convergence depends on additional hypotheses. This extension makes it possible to represent, in a single system, lambda-terms and their Böhm trees, and ensures that every term has a normal form, independently of any typing constraint (at the cost of admitting infinite normal forms).
A series of works from the field of linear logic provides tools for envisaging an infinitary version of the probabilistic lambda-calculus: * on the one hand, the probabilistic Böhm trees of Leventis, * on the other, the notion of Taylor expansion of programs (studied by Dal Lago and Leventis for the probabilistic lambda-calculus, and recently extended by Cerda and Vaux to the infinitary lambda-calculus).
The primary objective of the internship will be to adapt the infinitary approach to the probabilistic lambda-calculus, aiming at a normalisation result towards probabilistic Böhm trees. The properties of Taylor expansion may provide precious intuitions to this end.
URL sujet detaillé :
:
Remarques : Possible funding through an ANR project
|
|
|
|
|
SM207-213 Infinitary probabilistic lambda-calculus
|
|
Description
|
|
The goal is to explore the feasibility of extending the ordinary untyped lambda-calculus simultaneously in two directions.
On the one hand, we can consider a probabilistic lambda-calculus, i.e. the lambda-calculus augmented with an (at minimum, binary) probabilistic choice primitive. The literature on the subject is abundant, with various motivations (including the study of functional probabilistic programming).
On the other hand, infinitary variants of the lambda-calculus have been proposed in the past two decades: these introduce potentially infinite terms and reduction sequences, whose convergence relies on additional conditions. This extension makes it possible to represent, in a common framework, the ordinary finite lambda-terms and their Böhm trees (infinite terms in normal form), and ensures that every term is normalisable (at the cost of considering infinite normalisation), giving a rewriting-theoretic counterpart to various denotational models.
A series of works in the domain of linear logic suggest to consider an infinitary version of the probabilistic lambda-calculus. Notably: * the probabilistic Bohm trees introduced by Leventis, * the notion of Taylor expansion of lambda-terms (studied by Dal Lago and Leventis for the probabilistic lambda-calculus, and recently extended to the infinitary lambda-calculus by Cerda and Vaux).
The purpose of the internship will be to extend the infinitary approach to the probabilistic lambda-calculus, aiming at a result of infinitary rewriting theory that generalizes the normalization of probabilistic lambda-terms to their probabilistic Böhm trees. The properties of Taylor expansion will certainly offer precious intuitions to guide this research.
URL sujet detaillé :
:
Remarques : Possible funding through an ANR project
|
|
|
|
|
SM207-214 Real-time analysis and verification of ROS2 robotic applications
|
|
Description
|
|
ROS is the most popular framework for the development, prototyping and deployment of robotic applications, with thousands of off-the-shelf components ready to use across multiple platforms. However, a major weakness of ROS is the absence of real-time guarantees, making it difficult to prove (or at least guarantee with some confidence) the safety of ROS-based robotic applications with respect to timed properties, especially in the presence of data shared among ROS services. Since 2015, work has been done on a more real-time version of ROS, called ROS2 (a first release appeared in 2017). This new version offers more deterministic mechanisms for communication and execution. In particular, the addition of an execution management mechanism (called the Executor) makes it possible to define the behavior of the application threads more precisely. As for communication, it has gained in determinism thanks to the use of the DDS standard. Work in progress [1, 2] is investigating these different aspects and should eventually make it possible to develop reliable real-time robotic applications using ROS2. The objective of this Master's internship is to become familiar with the ROS2 Executor as well as with the DDS standard in order to enable rigorous analysis of timed behaviours. This study is a first step towards the verification of ROS2 applications. Several elements will be explored during this internship:
- Bibliographical study of DDS and the ROS2 Executor: identify precisely the existing solutions for services with data sharing, with a particular focus on methods for modeling and validating these mechanisms.
- Modeling and verification of real-time behaviors in a ROS2 application: formally model the behavior of an application developed in ROS2, taking into account DDS and/or Executors. The methods and tools for formal modeling will be chosen at the beginning of this task, with a focus on mixed methods combining transition-system-based verification and schedulability analysis in the style of [3].
- Setting up a ROS2 experimentation platform: deploy ROS2 on physical hardware. Several alternatives are possible depending on the hardware target, either a CPU with Linux support or a microcontroller with a dedicated RTOS. This platform will be used to evaluate the timing performance of these different supports.
[1] Varillon, Benoit; Chaudron, Jean-Baptiste; Doose, David; Lesire, Charles. Corail, ROS2 temps réel. In: ROSConFr 2021. [2] Casini, Daniel; Blaß, Tobias; Lütkebohle, Ingo; Brandenburg, Björn B. Response-Time Analysis of ROS 2 Processing Chains Under Reservation-Based Scheduling. In Proc. of the 31st Euromicro Conference on Real-Time Systems (ECRTS 2019). [3] Foughali, Mohammed; Hladik, Pierre-Emmanuel. Bridging the gap between formal verification and schedulability analysis. Journal of Systems Architecture, 2020.
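To illustrate the schedulability-analysis side of the mixed methods mentioned above (in the spirit of [3], not a ROS2-specific analysis), here is the classical fixed-point response-time computation for fixed-priority preemptive periodic tasks; the task set is hypothetical:

```python
import math

def response_time(tasks, i):
    """Fixed-point response-time analysis for task i under
    fixed-priority preemptive scheduling. `tasks` is a list of
    (C, T) pairs (worst-case execution time, period = deadline),
    sorted by decreasing priority. Returns the worst-case response
    time, or None if the iteration exceeds the deadline."""
    C, T = tasks[i]
    R = C
    while True:
        # Interference from all higher-priority tasks over a window R.
        R_next = C + sum(math.ceil(R / Tj) * Cj for Cj, Tj in tasks[:i])
        if R_next > T:
            return None                  # task i misses its deadline
        if R_next == R:
            return R                     # fixed point reached
        R = R_next

tasks = [(1, 4), (2, 6), (1, 12)]        # hypothetical (C, T) task set
rts = [response_time(tasks, i) for i in range(len(tasks))]
```

The ROS2-specific analyses of [2] refine this style of reasoning to account for Executor processing chains and reservation-based scheduling.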
URL sujet detaillé : https://www.dropbox.com/s/q06btmrhfpauo15/internship_ROS2_en.pdf?dl=0
Remarques : Co-supervised with Pierre-Emmanuel Hladik (LS2N).
|
|
|
|
|
SM207-215 Pebble transducers with superpolynomial growth
|
|
|
|
SM207-216 Randomized algorithm in an H-matrix solver
|
|
Description
|
|
We have developed with Inria Bordeaux a task-based solver (using StarPU and C++) to factorize and solve dense linear systems using hierarchical matrices. This solver is used to solve acoustic and electromagnetic problems in Airbus Design Labs. The internship will consist in evaluating and integrating randomized algorithms (for QR and SVD factorization) in this solver. This work lies halfway between applied maths, algorithms, computer science and task-based programming. A CIFRE PhD starting in autumn 2023 is already secured as a continuation of this internship. Its subject will be to implement new algorithms in the H-matrix solver to handle mixed precision, develop novel methods for handling large blocks, or work on a GPU implementation. The internship is exclusively reserved for candidates interested in continuing with a PhD.
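To illustrate the kind of randomized algorithm in question, here is a basic randomized SVD in the Halko-Martinsson-Tropp style, shown on a dense numpy matrix; the function name and parameters are illustrative, and in the solver such a kernel would be applied block-wise to the low-rank H-matrix blocks:

```python
import numpy as np

def randomized_svd(A, rank, oversample=5, seed=0):
    """Basic randomized SVD sketch: sample the range of A with a
    random Gaussian test matrix, orthonormalize the sample, and
    take the exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))
    Y = A @ Omega                        # sample of the range of A
    Q, _ = np.linalg.qr(Y)               # orthonormal basis of the sample
    B = Q.T @ A                          # small (rank + oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                      # lift back to the original space
    return U[:, :rank], s[:rank], Vt[:rank, :]

# On a matrix of exact rank 8, the rank-8 approximation is near-exact.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 50))
U, s, Vt = randomized_svd(A, rank=8)
err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
```

Compared to a full SVD, the cost is dominated by the two products with the thin sampling matrices, which is what makes the approach attractive for compressing H-matrix blocks.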
URL sujet detaillé :
:
Remarques : Payment: yes. Co-advising: yes, the internship is done within the Concace joint team in collaboration with Inria Bordeaux and Cerfacs.
Important: the internship is to be followed by a CIFRE thesis in autumn 2023.
An open-source version of the H-matrix tool (not task-based) is available under the name hmat-oss on GitHub.
|
|
|
|
|
SM207-217 Energy-efficient Green VNF-FG placement and chaining for softwarized 5G/6G mobile networks
|
|
Description
|
|
The main goal of this internship proposal is to define a proof-of-concept for building an intelligent energy-efficient approach to address Virtualized Network Function Forwarding Graph (VNF-FG) placement and chaining with VNFs shared across multiple tenants to optimize resource usage, reduce energy consumption, and increase provider revenue in the context of 5G/6G mobile networks.
Our approach leverages a multi-agent deep reinforcement learning technique for VNF-FG placement and chaining in NFV/SDN-enabled infrastructures.
URL sujet detaillé : https://www.e4c.ip-paris.fr/api/v1/job/download/?filename=information/job/OffreGreenVNF.pdf
Remarques : The internship is funded by the interdisciplinary centre E4C (Energy4Climate) of IP Paris.
As an associate professor at ENSIIE (teaching side) and at SAMOVAR (TSP, IP Paris) (research side), I co-supervise this internship with Omar Houidi (postdoc at TSP).
|
|
|
|
|
SM207-218 Enhancing low-rank updates performance for sparse direct solvers
|
|
Description
|
|
Sparse direct solvers are widely used in various applications. In order to enhance those solvers, low-rank compression techniques have been recently introduced. In this internship, we propose to enhance the low-rank update kernels, see the detailed subject to get more information.
URL sujet detaillé : http://perso.ens-lyon.fr/gregoire.pichon/sujet_updates.pdf
Remarques : Co-advised with Bora Uçar. Possibility to continue with a PhD.
|
|
|
|
|
SM207-219 Lower bounds for certifying non-colourability
|
|
Description
|
|
A classical model for studying distributed systems is the LOCAL model. In this model, a network is represented by a graph, and at the beginning of the computation each node knows only its incident edges. At each step, nodes may transfer an arbitrary amount of information to their neighbours along these edges. In distributed computing, one then asks in how many rounds of information transfer a given problem can be solved (for example, building a spanning tree or computing a colouring).
In local certification, one assumes these computations have already been carried out, and asks how much information must be kept in order to verify "later" that the graph has not been modified and that the property still holds. More formally, each vertex receives a certificate. Looking only at its own certificate and those of its neighbours in the graph, each vertex must decide whether or not to accept. The goal is that the global property is satisfied if and only if all vertices accept their certificates.
In recent years, local certification has received notable attention, in particular the line of research that seeks to establish the optimal certificate size for a given property (as a function of the number of nodes in the network, denoted n). One can show that every property can be certified with O(n^2) bits, and many problems require certificates of at least log n bits. However, for some problems that are nevertheless algorithmically hard, no non-trivial lower bound is known. In particular, for certifying that a graph is k-colourable, a lower bound of Omega(log k) is conjectured, but the only known bound states that a single bit is not sufficient.
The goal of this internship is to improve these lower bounds, ideally to prove the conjecture, and to adapt the techniques to other similar properties.
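On the upper-bound side, the certification mechanism itself is easy to illustrate: for k-colourability, a colouring is its own certificate, and each vertex only checks its own certificate against its neighbours'. A toy sketch of such a local verifier (the internship targets lower bounds for the complementary property, which is the hard direction):

```python
def locally_verify_coloring(adj, cert, k):
    """Local verification that `cert` is a proper k-colouring.

    Each vertex looks only at its own certificate and those of its
    neighbours, and accepts iff its colour is in range and differs
    from each neighbour's colour. The graph is k-colourable iff
    some assignment of certificates makes every vertex accept;
    certificates here use O(log k) bits per vertex."""
    def accepts(v):
        return cert[v] in range(k) and all(cert[u] != cert[v] for u in adj[v])
    return all(accepts(v) for v in adj)

# A 4-cycle is 2-colourable: a proper colouring is accepted,
# an improper one is rejected by some vertex.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
good = locally_verify_coloring(adj, {0: 0, 1: 1, 2: 0, 3: 1}, 2)
bad = locally_verify_coloring(adj, {0: 0, 1: 0, 2: 1, 3: 1}, 2)
```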
URL sujet detaillé :
:
Remarques : Co-advisor: Laurent Feuilloley (LIRIS) - https://perso.liris.cnrs.fr/lfeuilloley/
|
|
|
|
|
SM207-220 Hierarchies for graph partition problems
|
|
Description
|
|
Several graph partitioning problems, such as max-bisection and correlation clustering, admit algorithms achieving the best-known approximation guarantees that are based on natural mathematical programming (LP or SDP) relaxations augmented with constraints from hierarchies (Lasserre or Sherali-Adams). However, it is not clear whether these extra constraints are really necessary to achieve these approximation ratios. The goal of this project is to find rounding approaches for these problems using smaller relaxations and/or using hierarchies for related problems.
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-221 Termination arguments with the COQ proof assistant (WQO with COQ)
|
|
Description
|
|
Proving that algorithms terminate is a difficult problem. Indeed, one must show that loops stop, i.e. that they cannot run for an infinite number of iterations. To show this formally, one can use a function to the natural numbers that strictly decreases at each iteration, or more generally a function to a well-founded order that likewise decreases at each iteration.
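The ranking-function argument can be illustrated on Euclid's algorithm, where the natural-number measure b strictly decreases at each iteration; a toy sketch, with the decrease checked at run time rather than proved formally:

```python
def gcd(a, b):
    """Euclid's algorithm on nonnegative integers.

    The loop terminates because the measure `b`, a natural number,
    strictly decreases at every iteration (since a % b < b when
    b > 0), so no infinite descending sequence is possible. Here
    the decrease is merely asserted; a proof assistant such as COQ
    would establish it once and for all."""
    while b != 0:
        old_measure = b
        a, b = b, a % b
        assert b < old_measure           # the ranking function decreases
    return a

g = gcd(48, 18)
```

For loops over tuples or words, a single natural-number measure may not exist, which is where well-quasi-orders and the lemmas of Dickson and Higman come in.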
For some algorithms, these well-founded orders come from mathematical structures called well-quasi-orders. Formally, a preorder is a well-quasi-order if from every infinite sequence of elements one can extract an increasing subsequence. Well-quasi-orders can be combined, for instance by Cartesian product or by extension to words. The theorems establishing these results are called Dickson's Lemma and Higman's Lemma.
We wish to use these lemmas to formally prove the termination of certain algorithms with proof assistants, in particular with the COQ tool. To this end, the proofs of Dickson's and Higman's lemmas, as well as the definition of well-quasi-orders, usually given in classical logic, must be converted into intuitionistic logic.
In 2012, Dimitrios Vytiniotis, Thierry Coquand and David Wahlstedt gave an intuitionistic definition of well-quasi-orders, called almost-full relations, together with a proof of Dickson's Lemma.
There is another way to describe well-quasi-orders. The objective of the internship is to extend the work on almost-full relations to this other description, or to extend Higman's Lemma to almost-full relations. In either case, the work will start with a detailed paper proof before implementing it in the COQ tool.
Bibliography: https://link.springer.com/chapter/10.1007/978-3-642-32347-8_17 https://fr.wikipedia.org/wiki/Lemme_de_Dickson https://fr.wikipedia.org/wiki/Lemme_de_Higman
URL sujet detaillé :
:
Remarques : Co-supervisors: Thibault Hilaire
David Ilcinkas
Remuneration:
Grants from the University of Bordeaux are available. The grant application must be made quickly.
|
|
|
|
|
SM207-222 Collision-free resolution of 3D kinematic constraints
|
|
Description
|
|
The GENERAT3D project, funded by the French National Research Agency, proposes to design a method for the synthetic generation of CAD model assemblies. In this context, it is essential to automate the process of solving kinematic constraints without collisions so that the generated assemblies remain visually consistent. The goal of this project is to evaluate the advantages and disadvantages of the different kinematic solvers available in the literature. Once a solution has been chosen, it will have to be developed and implemented in Python or C++ in order to be integrated into the open-source software FreeCAD.
URL sujet detaillé : https://github.com/luvrgz/internship-proposal/raw/main/proposition_stage_solveur_en.pdf
Remarques :
|
|
|
|
|
SM207-223 Several topics in Reinforcement Learning
|
|
Description
|
|
The Supaero Reinforcement Learning Initiative (SuReLI) can host internships this year for outstanding MSc students. Check our website and publications for our current research interests: https://sureli.isae-supaero.fr
Among current hot topics: - offline RL - RL in the low data regime - robust MDPs - learning representations for generalization in RL (possible extensions of [1,2,3]) - better understanding of variance in SGD and implications on model-based Deep RL (following up on [4]) - representations of deep NNs (for evolutionary optimization among other things [5]) - neural architecture search [6]
Common application benchmarks in SuReLI: - the usual suspects in RL (OpenAI Gym, ProcGen, Deepmind Control Suite, etc.) - new line of research on the application of deep RL to the control of tumoral micro-environments - control of fluid flows and computational fluid dynamics - mobile robotic applications (simulated or real, we have a bunch of platforms in the lab) - coupling RL and mixed integer linear programming for industrial engineering topics
We can offer detailed internship topics, but we are interested in particular in students who take an interest in our research first. We are also open to discussing new research topics. Feel free to reach out to me to discuss your research proposal. Some opportunities to stay in the team as a PhD student might arise during the coming year.
URL sujet detaillé :
:
Remarques : These are paid internships, with a perspective to remain in the team as PhD students afterwards.
|
|
|
|
|
SM207-224 Commutative Submonads of Strong Monads
|
|
Description
|
|
This internship involves doing research in category theory and lambda calculi with computational effects that are modelled via monads. The concept of a (strong) monad was initially introduced in category theory; it subsequently found use in programming language theory, and monads are used today in programming languages such as Haskell, Idris and others.
Many monads of interest are strong, but not commutative. Categorically, commutative monads enjoy stronger mathematical properties compared to strong monads. Computationally, in commutative monads it is safe to change the order of monadic sequencing for admissible operations, whereas in a strong monad that need not be the case.
In recent work [1], we formulated the notion of "centre" for a strong monad that allows us to identify commutative submonads of strong monads that overcome some of the problems identified above. Less formally, the centre of a strong monad, whenever it exists, may be thought of in a similar way to the centre of a group, monoid, semiring, etc.
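For a finite monoid or group, the centre in the ordinary algebraic sense is directly computable, which may help build intuition for the monadic analogue described in [1]; a toy computation (the centre of the symmetric group S3 is trivial), with no claim about the categorical construction itself:

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p . q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def centre(elements, op):
    """Centre of a finite monoid: the elements commuting with everything."""
    return {z for z in elements
            if all(op(z, x) == op(x, z) for x in elements)}

# S3, the symmetric group on 3 letters, has trivial centre.
S3 = list(permutations(range(3)))
Z = centre(S3, compose)
```

The centre of a strong monad, when it exists, singles out a commutative submonad in a way analogous to how Z above singles out a commutative submonoid.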
The goal of this internship is to further develop the above results. This can include:
* Identifying interesting strong monads from the literature that have non-trivial centres.
* Studying the notion of commutants of a strong monad by taking inspiration from [2,3] and introducing a computational or logical interpretation.
* Studying the Eilenberg-Moore Category of a central submonad and its relationship with the Eilenberg-Moore category of the original monad.
* Studying how distributive laws can be used to combine the centres of two strong monads that can be combined via a distributive law.
* Potentially other related goals.
For more information, please write me an email and then we can discuss the internship in greater detail.
[1] https://arxiv.org/abs/2207.09190
[2] https://link.springer.com/article/10.1007/s10485-017-9503-1
[3] https://www.sciencedirect.com/science/article/pii/S0022404915002510
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-225 Revisiting A
|
|
Description
|
|
In their 2011 paper "A semantic measure of the execution time in Linear Logic", de Carvalho, Pagani and Tortora de Falco explain how the relational semantics of Linear Logic proof nets allows one to obtain exact bounds on the number of cut elimination steps leading to a normal form.
Recasting their work in the setting of Taylor expansion will hopefully shed new light on these important results.
See the full subject for details.
URL sujet detaillé : https://www.i2m.univ-amu.fr/perso/lionel.vaux/pub/internship-taylor-dcpt.pdf
Remarques :
|
|
|
|
|
SM207-226 Security Analysis of a Cyber-Physical Platform
|
|
Description
|
|
Information systems are increasingly complex and spreading in every sector, often built from off- the-shelf components, with little to no security guarantees whatsoever. To regain its sovereignty, a State needs to build its cybersecurity posture from certified software and hardware stacks, security-wise. To that end, it is necessary to design an evaluation methodology that enables the validation of the system's security from different perspectives. One of them deals with the technical audit of the system's components in the face of possible threats, with different levels of security enabled. In the CERES project, we study the security of different cyber-physical systems. One of our use cases is the building information (management) system or its modern instantiation, the smart building. In particular, we aim at studying the current and future threats and the impact related to the interconnection of different components. We have started to build a first demonstration platform featuring a single cyber-physical system made of sensors/actuators, programmable logic controllers (PLCs), and a SCADA (Supervisory Control and Data Acquisition) system, which we will use to model and evaluate threats, security measures and impacts. This internship aims to assist our team in operating the demonstration platform and study the security of its use case. In particular, based on risk assessment conclusions, we will orient our security analysis of the building use case with respect to the envisaged threats and implement potential attacks. We will study the ability of an attacker to penetrate one of the several layers of the target system.
Activities include:
- take charge of the platform components and complete their documentation
- perform a risk assessment (RA) of the use case for a given context
- study the impacted layers, protocols and components retained in the RA
- implement some of the attack scenarios
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-227 Federated learning with untrusted server
|
|
Description
|
|
MOTIVATION AND CONTEXT
Over the last decades, there has been an increasing interest in exploiting data. On the other hand, recently there has also been an increasing awareness of the risks of collecting sensitive data centrally, given the frequency of data leaks, hacking or abuse. INRIA's Magnet team is interested in decentralized privacy-preserving machine learning, where the sensitive data remains with the data owners, and machine learning is performed collaboratively by these data owners by participating in collaborative algorithms which (through the use of differential privacy and/or encryption) generate the desired statistical models while preventing sensitive data from being revealed. Important ongoing research projects in the team include the TIP (http://researchers.lille.inria.fr/jramon/projects/tip.html), TRUMPET (http://researchers.lille.inria.fr/jramon/projects/trumpet.html) and FLUTE (http://researchers.lille.inria.fr/jramon/projects/flute.html) projects. This internship fits into the larger research program including these projects.
Most current proposals for federated learning involve one or more servers which either have a trusted execution environment (TEE) or are assumed not to collude (exchange information). This is not fully satisfactory. First, in the current geopolitical environment, governments are unwilling to trust TEEs on processors built by large companies based on other continents. Second, as the media has reported in recent years on a large number of data leaks and on the huge economic interests in obtaining information, it is hard for consumers to trust that a set of servers do not collude, especially as such an assumption is very difficult to verify.
OBJECTIVES
Therefore, the goal of this internship project is to build a Federated Learning system with Untrusted Servers (FLUS), in particular for scenarios where a server is not trusted to properly protect secrets.
Simplifying assumptions. Of course, a purely decentralized approach is sometimes very inefficient, hence we will assume that a data user, e.g., a company who has interest in the success of a machine learning effort, may have an incentive to set up a server which properly coordinates a computation and organizes communication between parties. We will also assume that there is a proper public key infrastructure (PKI) so parties can prove they are unique citizens.
Prior work. In the past, the MAGNET team has researched this scenario and developed promising algorithms for it, e.g., in the context of the PhD thesis of Cesar Sabater. Also, in the TAILED project the MAGNET team is developing an open source library containing infrastructure to support developing secure, privacy-preserving machine learning algorithms. The road is now free to develop our first FLUS algorithms and experimentally evaluate them.
In particular, the objectives of the internship project are (1) to develop a FLUS algorithm (probably for logistic regression, but we may change to another machine learning task if our medical partners express strong preferences before the start of the internship project) (2) to experimentally evaluate its correctness and scalability on a medical benchmark dataset (3) optionally, if time allows, to develop a more programmer-friendly interface to the system so developers can run scikit-learn style scripts as FLUS algorithms on decentralized data.
Whether work on the third optional objective is started depends on the skills of the student and the challenges encountered underway. Both with and without this part an interesting project can be carried out.
PLAN
Here is a tentative work plan:
(1) Literature study: among others, getting familiar with distributed algorithms, basic multi-party computing concepts, the existing TAILED library, related work on federated machine learning, and logistic regression (2 weeks)
(2) Initial development cycle:
(2a) Implementation of basic statistics aggregation strategies (6 weeks)
(2b) Implementation of a FLUS logistic regression algorithm (3 weeks)
(2c) Testing correctness and scalability on a benchmark dataset (3 weeks)
(3) Extending features to improve privacy guarantees and scalability, and to add features requested by users/partners (3 weeks)
(4) Development of a programmer-friendly interface, 'compiling' scikit-learn scripts into FLUS operations (3 weeks)
(5) Testing (4 weeks)
(6) Completion of the internship report (2 weeks)
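The basic statistics aggregation of step (2a) can be illustrated by pairwise additive masking, a standard building block of secure aggregation: each pair of parties shares a random mask that one adds and the other subtracts, so individual contributions look random while the sum is preserved. A toy sketch only, ignoring dropped parties, differential privacy and the untrusted-server hardening that the project actually targets:

```python
import random

def masked_inputs(values, seed=0):
    """Pairwise additive masking for secure aggregation.

    For every pair i < j, a shared random mask r is added to party
    i's value and subtracted from party j's. Each masked value is
    statistically hidden from the server, but all masks cancel in
    the sum, so the aggregate is exact. (In a real protocol the
    masks come from pairwise key agreement, not a shared seed.)"""
    rng = random.Random(seed)
    n = len(values)
    masked = list(values)
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.uniform(-100, 100)   # secret shared only by i and j
            masked[i] += r
            masked[j] -= r
    return masked

values = [3.0, 1.0, 4.0]
masked = masked_inputs(values)
total = sum(masked)                      # the server only ever sees masked values
```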
URL sujet detaillé : http://researchers.lille.inria.fr/jramon/jobs/2022-FederatedLearningWithUntrustedServer.pdf
Remarques : INRIA will provide facilities and funding according to the applicable regulations.
|
|
|
|
|
SM207-228 A pilot study for federated learning on oncological data
|
|
Description
|
|
MOTIVATION AND CONTEXT
Over the last decades, there has been an increasing interest in exploiting data. On the other hand, recently there has also been an increasing awareness of the risks of collecting sensitive data centrally, given the frequency of data leaks, hacking or abuse. The Horizon Europe projects TRUMPET (http://researchers.lille.inria.fr/jramon/projects/trumpet.html) and FLUTE (http://researchers.lille.inria.fr/jramon/projects/flute.html) will work towards a platform for secure privacy-preserving federated machine learning, where the sensitive data remains with the data owners, and machine learning is performed collaboratively by these data owners by participating in collaborative algorithms which (through the use of differential privacy and/or encryption) generate the desired statistical models while preventing sensitive data from being revealed.
These projects also feature use cases in medicine, in particular the TRUMPET project will study lung cancer clustering and eligibility prediction for radiotherapy for head and neck cancer patients, while the FLUTE project will study prediction and diagnosis of prostate cancer.
Before tackling these use cases with federated learning, this internship will conduct a first, short exploratory study to understand how these medical machine learning problems could be solved using machine learning in a simpler setting with central data.
OBJECTIVES
The goal of this internship project is to find an adequate machine learning approach to at least one of the TRUMPET use cases.
In particular, the objectives are (1) to study the literature on the specific machine learning task at hand, (2) to describe the format and structure of the available data, the clinical objectives and the relevant background knowledge, (3) to shortlist a few adequate machine learning strategies and test them on public or synthetic data in conditions similar to those of the real data which will be used later, and (4) to select the best alternative and bring the chosen algorithm into a format that can be implemented in a federated learning framework.
URL sujet detaillé : http://researchers.lille.inria.fr/jramon/jobs/2022-trumpet-explore.pdf
Remarques : INRIA will provide facilities and funding according to the applicable regulations.
|
|
|
|
|
SM207-229 Numerical inference of differential privacy guarantees
|
|
Description
|
|
MOTIVATION AND CONTEXT
Over the last decades, there has been an increasing interest in exploiting data. On the other hand, recently there has also been an increasing awareness of the risks of collecting sensitive data centrally, given the frequency of data leaks, hacking or abuse. INRIA's Magnet team is interested in decentralized privacy-preserving machine learning, where the sensitive data remains with the data owners, and machine learning is performed collaboratively by these data owners by participating in collaborative algorithms which (through the use of differential privacy and/or encryption) generate the desired statistical models while preventing sensitive data from being revealed. Important ongoing research projects in the team include the TIP (http://researchers.lille.inria.fr/jramon/projects/tip.html), TRUMPET (http://researchers.lille.inria.fr/jramon/projects/trumpet.html) and FLUTE (http://researchers.lille.inria.fr/jramon/projects/flute.html) projects. This internship fits into the larger research program including these projects.
Even with perfect security, the output of an algorithm can still leak information. That is why it is interesting to study statistical privacy notions such as differential privacy. There is a large body of work proving differential privacy guarantees for specific machine learning algorithms. Unfortunately, the composition theorems which are commonly used do not give tight bounds. In reality, better privacy is achieved than what can easily be proven symbolically, especially if a machine learning algorithm is complex and involves iterations.
In this internship, we want to explore a different approach: rather than studying differential privacy guarantees using classic symbolic proof techniques, we attempt to find out how private a given algorithm is by an automated numeric analysis. The idea is inspired by other existing numeric approaches: for example, if one cannot give a simple closed-form expression for an integral, it is common to approximate its value using numeric integration, dividing the domain of integration into subintervals for which an accurate estimate can be obtained.
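As a toy illustration of this numeric approach (not part of the subject itself), one can estimate the differential privacy parameter epsilon of the Laplace mechanism by maximising the log density ratio between the output distributions on two neighbouring inputs over a fine grid. For this mechanism the exact value epsilon = sensitivity/scale is known, so it serves as a sanity check. A minimal Python sketch, assuming sensitivity 1:

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

def numeric_epsilon(b, sensitivity=1.0, grid_lo=-20.0, grid_hi=20.0, steps=100000):
    """Numerically estimate the DP parameter epsilon of the Laplace
    mechanism by maximising |log p(x)/q(x)| over a fine grid, where p and q
    are the output densities on two neighbouring inputs (0 and sensitivity)."""
    eps = 0.0
    for i in range(steps + 1):
        x = grid_lo + (grid_hi - grid_lo) * i / steps
        ratio = math.log(laplace_pdf(x, 0.0, b) / laplace_pdf(x, sensitivity, b))
        eps = max(eps, abs(ratio))
    return eps

# The Laplace mechanism with scale b is exactly (sensitivity/b)-DP,
# so the numeric estimate should be close to 1.0 for b = 1.
print(numeric_epsilon(1.0))
```

For iterative algorithms no such closed form exists, which is where a numeric analysis of this kind could outperform symbolic composition bounds.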
OBJECTIVES
The goal of this internship project is to research and validate algorithm(s) to give differential privacy guarantees for machine learning algorithms using numerical techniques.
The end result should consist of (a) an algorithm for providing differential privacy guarantees, (b) theory showing correctness / security of this algorithm, (c) empirical results showing how much better the guarantees are compared to classic symbolic approaches.
Depending on the skills and interest of the student, more emphasis can be put on either objective (b) theory or objective (c) experimentation.
WORK PLAN
Here is a tentative work plan: (1) Getting familiar with differential privacy and numeric techniques (3 weeks) (2) Formulating the problem, outlining one or more ideas to solve the problem (3 weeks) (3) Proving correctness of the suggested approach(es) (2-8 weeks) (4) Implementing a prototype and performing experiments (4-10 weeks) (5) Completion of the internship report (2 weeks)
The timing can be adapted according to the personal preferences of the student or the requirements of their school.
URL sujet detaillé : http://researchers.lille.inria.fr/jramon/jobs/2022-num-dp.pdf
Remarques : INRIA will provide facilities and funding according to the applicable regulations.
|
|
|
|
|
SM207-230 Implementation of critical applications on multi-core
|
|
Description
|
|
The implementation of critical applications on multi(many)-core platforms is a hot topic in the real-time research community. Interference on shared resources, such as shared memory, impacts the execution of critical tasks and their timing analysis. We focus on applications modeled by a directed acyclic graph (DAG) of tasks, where edges represent precedence constraints and communications. Our previous work revisits the link between timing analysis and implementation through a collaboration of code orchestration, mapping/scheduling, interference analysis and schedulability. We consider the DAG as a periodic application with one global period. However, applications described by DAGs are generally multi-periodic or subject to execution modes. The idea is to analyze execution modes and integrate them into our implementation and analyses. One of these steps will be explored during the internship.
The candidate must have knowledge of real-time systems, timing analysis or synchronous DAG applications.
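To make the setting concrete, here is a minimal Python sketch (not the team's actual toolchain) of greedy list scheduling of a task DAG on identical cores: it honours precedence constraints and worst-case execution times (WCET), but ignores the shared-resource interference that the internship targets.

```python
from collections import deque

def list_schedule(wcet, edges, n_cores):
    """Greedy list scheduling of a task DAG on identical cores.
    `wcet` maps task -> worst-case execution time; `edges` are
    precedence constraints (u must finish before v starts).
    Returns a map task -> (core, start time) and the makespan."""
    preds = {t: [] for t in wcet}
    succs = {t: [] for t in wcet}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    finish = {}
    core_free = [0.0] * n_cores
    indeg = {t: len(preds[t]) for t in wcet}
    ready = deque(t for t in wcet if indeg[t] == 0)  # topological order
    schedule = {}
    while ready:
        t = ready.popleft()
        # Earliest start: all predecessors must have finished.
        earliest = max((finish[p] for p in preds[t]), default=0.0)
        # Pick the core giving the earliest actual start time.
        core = min(range(n_cores), key=lambda c: max(core_free[c], earliest))
        start = max(core_free[core], earliest)
        finish[t] = start + wcet[t]
        core_free[core] = finish[t]
        schedule[t] = (core, start)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return schedule, max(finish.values())

# Diamond DAG: A precedes B and C, which precede D.
wcet = {"A": 1.0, "B": 2.0, "C": 2.0, "D": 1.0}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
schedule, makespan = list_schedule(wcet, edges, n_cores=2)
print(makespan)  # 4.0: B and C run in parallel on the two cores
```

A realistic analysis would inflate each task's WCET with an interference term on the shared memory, which is precisely the coupling between mapping and timing analysis mentioned above.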
URL sujet detaillé : https://www-verimag.imag.fr/Implementation-of-critical-applications.html
Remarques : Funded by and topic in the context of the CAOTIC ANR project.
|
|
|
|
|
SM207-231 Linear Logic Translations, Taylor expansion and fixpoints.
|
|
Description
|
|
Linear Logic is a logical system in which one can encode different paradigms of lambda calculus, such as Call-By-Name and Call-By-Value, through the graphical setting of multiplicative exponential proof nets.
Syntactic Taylor expansion is a setting, inspired by the semantics of lambda calculus, in which a calculus (that can be exponential, or even divergent) is simulated through a syntax of finite approximations. It is defined both for Linear Logic and the lambda calculus.
The goal of this internship is to obtain commutation results between Linear Logic translations and linear simulations. A first challenge is the choice of a convenient formalism for proof-nets, which should be compatible with a formalization in Coq. These results are meant to allow the comparison between different operational semantics with respect to their use of resources.
If possible, we will also consider the addition of explicit fixpoints to the calculus and study the consistency with the semantics of coinductive calculi, in order to include stream functions in the translation and linear simulation processes.
URL sujet detaillé :
:
Remarques : Remuneration possible (under the funding rules in force, i.e. about 600€).
|
|
|
|
|
SM207-232 Attention mechanisms for graphical models, with applications to protein structure analysis
|
|
Description
|
|
Keywords: deep learning, attention mechanisms, transformers, message passing, belief propagation, enumeration, approximation algorithms, free energy, protein structure analysis.
Context. Emerging from the field of natural language processing, (self-)attention mechanisms have proven essential to understanding the coupling between tokens in a sentence [2, 3]. In a different context, graphical models make it possible to express the conditional dependence of random variables encoded in graph nodes via the edges of the graph. On such models, message passing algorithms provide effective ways to compute various quantities of interest, in particular partition functions and free energies [4, 5]. Recently, attention mechanisms have also proven key to encoding the coupling between spatial patterns observed between amino acids in a protein structure [6]. The corresponding tool, AlphaFold by DeepMind, is considered a major achievement for predicting a plausible structure of a protein from its amino-acid sequence. In related work, message passing algorithms have been used to compute average properties of proteins [1].
Goals. AlphaFold is a key achievement but outputs a single structure. In fact, statistical physics teaches us that observable properties of molecules depend on ensembles of conformations (weighted by Boltzmann factors). (See also AI, molecular design and the Covid19.) The goal of this internship will be to extend attention mechanisms in the context of graphical models, to study ensembles of conformations rather than isolated observations. The work envisioned encompasses the design and analysis of algorithms, their coding (C++ and Python), as well as their experimental evaluation.
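For reference, the scaled dot-product self-attention of [2], which such extensions build on, can be sketched in a few lines of NumPy (a generic illustration with arbitrary made-up dimensions, not the envisioned graphical-model variant):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (Vaswani et al., [2]).
    X: (n_tokens, d) input features; Wq/Wk/Wv: (d, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # pairwise couplings
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V                             # weighted mixing

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))     # e.g. 5 residues, 8 features each
W = [rng.normal(size=(8, 4)) for _ in range(3)]
out = self_attention(X, *W)
print(out.shape)  # (5, 4)
```

The softmax weights play a role analogous to the pairwise couplings of a graphical model, which is one way to see why marrying the two formalisms is natural.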
References [1] H. Kamisetty, E.P. Xing, and C.J. Langmead. Free energy estimates of all-atom protein structures using generalized belief propagation. Journal of Computational Biology, 15(7):755-766, 2008. [2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017. [3] Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, and Wenyu Liu. CCNet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 603-612, 2019. [4] Jonathan S Yedidia, William T Freeman, Yair Weiss, et al. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, 8:236-239, 2003. [5] Jonathan S Yedidia, William T Freeman, and Yair Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282-2312, 2005. [6] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583-589, 2021.
URL sujet detaillé : http://www-sop.inria.fr/teams/abs/positions/master22attention.pdf
Remarques : Internship with gratification. Possibility to follow-up with a PhD thesis.
|
|
|
|
|
SM207-233 Sampling algorithms in high-dimensional spaces coding the geometry of proteins
|
|
Description
|
|
Keywords: sampling algorithms, Monte Carlo Markov Chain, volume calculations, statistical physics, conformational spaces, protein structure analysis.
Context. A well-studied problem in geometry is the calculation of the volume of a high-dimensional polytope. While this problem is NP-hard, probabilistic algorithms based on Monte Carlo Markov Chain techniques and random walks of the Hit-and-Run type have been developed [1]. Currently, state-of-the-art random walks compute volumes in hundreds of dimensions within minutes on a laptop [2, 3]. In structural biology, a central problem is the prediction of the function of a protein, which requires understanding its dynamics-see e.g. AI, molecular design and the Covid19. A key achievement in protein structure analysis is AlphaFold by DeepMind, which predicts a plausible structure from the amino acid sequence [4]. However, a single structure falls short of providing insight into the function of a protein. (For a metaphor, consider getting a single picture for an entire movie.) The generation of exhaustive molecular conformations is therefore a major challenge, and we recently contributed state-of-the-art algorithms for flexible loops, based on Hit-and-Run-like algorithms working in high-dimensional curved spaces coding the geometry of proteins [5, 6].
Goals. The work envisioned encompasses the analysis of our sampling algorithms, and their extension to compute provably correct averages in the so-called NVT ensemble. The analyses are likely to suggest modifications and improvements, which will also be implemented in C++ within the scope of the Structural Bioinformatics Library, an advanced software environment providing both low-level algorithmic classes and applications for end users.
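As a concrete illustration, a basic Hit-and-Run step in a polytope {x : Ax ≤ b} picks a uniform random direction, computes the feasible chord through the current point, and jumps to a uniform point on that chord. A minimal Python sketch (the internship's algorithms work in curved spaces coding protein geometry, not plain polytopes):

```python
import math
import random

def hit_and_run(A, b, x, n_steps, seed=0):
    """Hit-and-Run random walk in the polytope {x : A x <= b},
    starting from an interior point x."""
    rng = random.Random(seed)
    d = len(x)
    for _ in range(n_steps):
        # Uniform random direction on the sphere.
        u = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Feasible chord {x + t u}: each constraint a_i.(x + t u) <= b_i.
        t_lo, t_hi = -math.inf, math.inf
        for a_i, b_i in zip(A, b):
            au = sum(a * c for a, c in zip(a_i, u))
            slack = b_i - sum(a * c for a, c in zip(a_i, x))
            if au > 1e-12:
                t_hi = min(t_hi, slack / au)
            elif au < -1e-12:
                t_lo = max(t_lo, slack / au)
        # Jump to a uniform point on the chord.
        x = [c + rng.uniform(t_lo, t_hi) * uc for c, uc in zip(x, u)]
    return x

# Unit square {0 <= x1, x2 <= 1}, started from its centre.
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
p = hit_and_run(A, b, [0.5, 0.5], n_steps=200)
print(all(-1e-9 <= c <= 1 + 1e-9 for c in p))  # True
```

In the curved setting the chord computation is replaced by ray intersection with the boundary of the feasible region, which is where most of the algorithmic difficulty lies.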
References [1] B. Cousins and S. Vempala. A practical volume algorithm. Mathematical Programming Computation, 8(2):133-160, 2016. [2] A. Chevallier, S. Pion, and F. Cazals. Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary reflections and sweet arithmetics. J. of Computational Geometry, 13(1):55-88, 2022. [3] A. Chevallier, F. Cazals, and P. Fearnhead. Efficient computation of the volume of a polytope in high dimensions using piecewise deterministic Markov processes. In AISTATS, 2022. [4] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583-589, 2021. [5] T. O'Donnell, V. Agashe, and F. Cazals. Geometric constraints within tripeptides and the existence of tripeptide reconstructions. Technical report, 2022. [6] T. O'Donnell and F. Cazals. Enhanced conformational exploration of protein loops using a global parameterization of the backbone geometry. Technical report, 2022.
URL sujet detaillé : http://www-sop.inria.fr/teams/abs/positions/master22sampling.pdf
Remarques : Internship with gratification. Possibility to follow-up with a PhD thesis.
|
|
|
|
|
SM207-234 Automation of eukaryotic gene prediction workflows
|
|
Description
|
|
Constant and rapid progress in genomics technologies now makes it possible to sequence the genomes of many different species, with chromosome-scale reconstructions of these genome sequences. These technologies enable the exploration and preservation of biodiversity through projects such as Biodiversity Genomics Europe (https://biodiversitygenomics.eu/), of which the Genoscope (https://jacob.cea.fr/drf/ifrancoisjacob/Pages/Departements/Genoscope.aspx) is a partner.
The LBGB (https://www.genoscope.cns.fr/lbgb/) is actively involved in exploiting these technologies by deploying large-scale computational methods to, on the one hand, reconstruct the sequence of these eukaryotic genomes and, on the other hand, annotate their coding regions (the genes).
The internship will focus on the latter point, and more specifically on the automation of the tools used for gene prediction. Given this change of scale, the student will have to propose methods to absorb the production flow and deploy the tools as workflows executed on dedicated computing grids. The internship will focus in particular on evaluating the quality of gene predictions.
The student must be proficient in one or more scripting languages such as Python, Perl and Bash, in a Linux environment.
Keywords: big data, biodiversity, workflow, computing grid, scripting languages, genomics, annotation
URL sujet detaillé :
:
Remarques :
|
|
|
|
|
SM207-235 Validated Numerical Software For Algebraic Curves With Singularities
|
|
Description
|
|
Validated numerics is the art of designing numerical algorithms that are efficient yet reliable, i.e. with guaranteed error bounds encompassing all sources of error: uncertain data, rounding errors, discretization, etc. The goal is to provide scientists in a broad sense with a "certified pocket calculator". This includes engineers working on safety-critical applications, but also a novel generation of mathematicians using computers to prove their theorems.
The goal of this internship is to design and implement validated algorithms to compute with algebraic curves, which arise in many branches of science. More specifically, we are interested in the difficult case of singularities, which may cause catastrophic numerical errors if not dealt with properly. These achievements will later allow us to treat currently unreachable applications in computer algebra, physics and robotics.
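A classic building block of validated numerics is interval arithmetic, where every operation returns an enclosure of the exact result. The Python sketch below (without the directed rounding a real library would use) also illustrates the dependency problem, one source of the overestimation that singularities aggravate:

```python
class Interval:
    """Tiny interval-arithmetic sketch: each operation returns an
    interval guaranteed to enclose the exact result (a real library
    would additionally round lo down and hi up)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, o):
        return Interval(self.lo + o.lo, self.hi + o.hi)
    def __sub__(self, o):
        return Interval(self.lo - o.hi, self.hi - o.lo)
    def __mul__(self, o):
        ps = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(ps), max(ps))
    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# Enclose p(x) = x^2 - 2x for x in [1, 2]: the enclosure must
# contain the true range [-1, 0], but may be wider.
x = Interval(1.0, 2.0)
enclosure = x * x - Interval(2.0, 2.0) * x
print(enclosure)  # [-3.0, 2.0]: correct but overestimated (dependency problem)
```

The enclosure [-3, 2] is sound but much wider than the true range [-1, 0] because the two occurrences of x are treated as independent; taming such overestimation near singular points is part of what makes the topic challenging.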
URL sujet detaillé : https://cfhp.univ-lille.fr/files/students/Master-2023-Num-Puiseux.pdf
Remarques : co-supervised by Adrien Poteaux, Florent Bréhard and François Boulier
|
|
|
|
|
SM207-236 Code-level Cybersecurity & Program Analysis
|
|
Description
|
|
Code-level Cybersecurity & Program Analysis: Vulnerabilities, Verification, Reverse
Keywords: software security, vulnerabilities, reverse & deobfuscation, program analysis, formal methods, static analysis, symbolic execution, logic
The BINary-level SECurity research group (BINSEC) at CEA List has several open internship positions at the crossroads of software security, program analysis and formal methods, to begin as soon as possible at Paris-Saclay, France. Positions are 4-6 months long and can naturally open the way to a doctoral work. All these positions are articulated around the BINSEC open-source platform ( https://binsec.github.io ), which aims at providing automatic tools for low-level security analysis by adapting software verification methods initially developed for safety-critical systems.
== Context == Several major classes of security analyses have to be performed on machine code, such as vulnerability analysis of mobile code or third-party components, deobfuscation or malware inspection. These analyses are very challenging, yet still relatively poorly tooled. Our long-term goal is to leverage recent advances in software verification, formal methods and security analysis in order to propose efficient semantic tools for low-level security investigations.
== Current topics == We propose several research directions, each one aiming at extending recent work published in top-tier venues:
' Vulnerability detection at scale, with a combination of cutting-edge techniques such as symbolic execution, fuzzing and static analysis - the challenge here is to design effective combinations enjoying both precision and scalability for different classes of vulnerabilities;
' Binary-level formal verification of cryptographic implementations with (variants of) symbolic execution - the challenge here is to handle advanced security properties such as non-interference or side channel leaks (timing, cache, power), as well as low-level micro-architectural behaviours such as speculation (Spectre) or faults (Rowhammer);
' Advanced reverse and certified decompilation, through the combination of program analysis and artificial intelligence, with the ultimate goal of recovering legitimate high-level code equivalent to the original executable file.
More details on the topics will be happily provided! The list is not exhaustive, ask us if you have some project in mind.
All positions include theoretical research as well as prototyping (preferably in OCaml) and experimental evaluation. Results will be integrated in the open-source BINSEC platform.
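To give a flavour of the kind of technique involved, here is a toy path-exploration sketch in Python: it enumerates the path constraints of a two-branch program over one symbolic input and checks feasibility by brute force, where a real engine such as BINSEC would query an SMT solver (the program, domain, and use of `eval` are purely illustrative):

```python
def symbolic_paths(domain=range(4)):
    """Toy symbolic execution of a two-branch program over one
    symbolic input x: collect each path's constraints, then find a
    witness input by brute force (real tools use an SMT solver).
    The `eval` on hard-coded constraint strings is for illustration only."""
    # Program under analysis: if x > 1: (if x * 2 == 6: bug() else: ok()) else: ok()
    paths = [
        (["x > 1", "x * 2 == 6"], "bug"),
        (["x > 1", "x * 2 != 6"], "ok"),
        (["x <= 1"], "ok"),
    ]
    feasible = []
    for constraints, label in paths:
        for x in domain:
            if all(eval(c, {"x": x}) for c in constraints):
                feasible.append((label, x))
                break
    return feasible

print(symbolic_paths())  # [('bug', 3), ('ok', 2), ('ok', 0)]
```

The witness input 3 reaching `bug` is exactly the kind of test case a symbolic executor generates automatically; scaling this idea to real binaries is what the platform and the proposed topics are about.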
== Host ==
The BINary-level SECurity research group (BINSEC) of CEA List is a leading group in formal methods for low-level security, with regular publications in top-tier venues in security, formal methods and software engineering. We work in close collaboration with other French and international research teams, industrial partners and national agencies. CEA List is located in Campus Paris Saclay.
== Requirements & application ==
We welcome curious and enthusiastic students with a solid background in Computer Science, both theoretical and practical. A good knowledge of functional programming (OCaml) is appreciated. Some experience in verification, security, logic or compilation would be great. Applicants should send an e-mail to Sébastien Bardin ( sebastien.bardin.fr ) - including CV and motivation letter. Deadline: as soon as possible (first come, first served). Contact us for more information.
URL sujet detaillé : http://sebastien.bardin.free.fr/2023-internships-cyber-formalmethods.pdf
Remarques :
|
|
|
|
|
SM207-237 Quantitative Change Prediction for Reaction Networks
|
|
Description
|
|
We propose to develop novel bioinformatics methods in the area of systems biology to improve the production of microbial biosurfactants with methods from biotechnology. Biosurfactants produced by microorganisms are relevant to agroecology and medicine. It is therefore of the highest interest to enable the production of biosurfactants of better quality, in higher quantities, and at lower prices.
Nonribosomal lipopeptides are a class of microbial biosurfactants that can be produced from amino and fatty acids by many bacteria such as Bacillus subtilis. In our previous interdisciplinary work we developed successful methods for overproducing surfactin in the bioreactor, based on artificial strains of Bacillus subtilis obtained by knocking out dedicated genes. While the case of single gene knockouts has been studied extensively, we believe that multiple gene knockouts are the key to better optimization methods in the future. Multiple gene knockouts, however, raise serious difficulties for exhaustive tests in the wet lab, simply because the number of possibilities is too large.
The hope of model-based prediction methods from systems biology is to limit the number of candidates, so that only a few of the many possible candidates need to be tested experimentally. The restriction to purely qualitative reasoning techniques, however, makes it difficult if not impossible to distinguish the good multiple knockouts from the others. Therefore, we propose to add quantitative aspects to existing prediction methods and to the existing biochemical reaction networks of the metabolism of B. subtilis and its control. Of course, this is not generally possible, since much of the quantitative information on the regulatory control is unknown and will remain unknown in the near future. We therefore propose to integrate known aspects of the forms of the kinetic functions (mass-action, Michaelis-Menten, different kinds of repression, etc.), both into the formal models of the metabolic and control networks and into model-based prediction methods.
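For illustration, the two most common kinetic forms mentioned above can be written down directly; at S = Km the Michaelis-Menten rate is exactly half of Vmax, which is the usual sanity check (a generic sketch, not part of the subject's models):

```python
def mass_action(k, *concentrations):
    """Mass-action rate: k times the product of reactant concentrations."""
    rate = k
    for c in concentrations:
        rate *= c
    return rate

def michaelis_menten(vmax, km, s):
    """Michaelis-Menten rate v = Vmax * S / (Km + S): saturates at Vmax."""
    return vmax * s / (km + s)

# At S = Km the Michaelis-Menten rate is exactly Vmax / 2.
print(michaelis_menten(vmax=10.0, km=2.0, s=2.0))  # 5.0
print(mass_action(0.5, 2.0, 3.0))                  # 3.0
```

Integrating such partially known functional forms, rather than fully parameterized kinetics, is what lets quantitative reasoning proceed despite the missing regulatory data.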
URL sujet detaillé : http://researchers.lille.inria.fr/niehren/quantpred.pdf
Remarques : A follow-up PhD thesis is envisioned. Main reference:
https://hal.inria.fr/hal-01239198
D'autres references:
https://smarthal.lille.inria.fr/?author=joachim+niehren&labos_exp=&annee_publideb=&annee_publifin=&updatecache=yes&query=++(%0D%0A++++biology%0D%0A++%2B+biochem%0D%0A++%2B+biotech%0D%0A++%2B+bioinfo%0D%0A)%0D%0A%26+versari%0D%0A&title=Joachim+Niehren%27s+publications+on+Systems+Biology&inputjournal=&inputotherjournal=&inputconference=&inputotherconference=&inputbook=&inputotherbook=&inputinvited=&inputotherinvited=&inputthesis=
|
|
|
|
|
SM207-238 Reward-driven State Tabulation for Model-based Reinforcement Learning
|
|
Description
|
|
Reinforcement Learning (RL) has had several achievements, such as in the game of Go. These recent achievements were obtained from the combination of RL with deep neural networks, also known as deep RL. The vast majority of deep RL algorithms use their neural architecture to approximate value functions following the scheme of fitted value iteration, which is unfortunately unstable. The goal of the internship is to research an alternative where the neural network is used to discover an abstract Markov Decision Process (MDP) from which value functions can be computed in a model-based way. To do so, the work program of the intern will comprise the following steps: i) reproduce a prior paper to establish a baseline, ii) propose a new, reward-driven, variational loss function to train the neural state aggregator, iii) devise a model-based RL algorithm operating on the abstract MDP, iv) evaluate the algorithm against the baseline on tasks where states contain superfluous information, and investigate whether the neural state aggregator can learn to ignore it.
For the full description please see https://team.inria.fr/scool/files/2022/12/Reward_driven_State_Tabulation_for_Model_based_Reinforcement_Learning.pdf
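Step iii) relies on standard model-based computation on the abstract MDP. As a minimal illustration (with a hypothetical two-state MDP, not the internship's benchmark), value iteration computes the optimal values once the abstract model is known:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration on a small abstract MDP.
    P[s][a] is a list of (prob, next_state); R[s][a] is the reward."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [
            max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s])))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Hypothetical abstract MDP: action 1 in state 0 moves to state 1,
# which is absorbing and yields reward 1 per step.
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 1)]]]
R = [[0.0, 0.0], [1.0, 1.0]]
V = value_iteration(P, R)
print(round(V[1], 4))  # 10.0, i.e. 1 / (1 - gamma)
```

Unlike fitted value iteration with a neural approximator, this tabular computation on the discovered abstract MDP is a contraction and therefore provably stable, which is the motivation for the proposed alternative.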
URL sujet detaillé : https://team.inria.fr/scool/files/2022/12/Reward_driven_State_Tabulation_for_Model_based_Reinforcement_Learning.pdf
Remarques : Co-supervision: Matheus Medeiros Centa
|
|
|
|
|
SM207-239 Fast Matching of Deformable Objects
|
|
Description
|
|
In this project, the goal is to develop fast 3D shape matching techniques that loosely preserve the intrinsic geometric properties so that a quick and decently accurate registration can be obtained.
URL sujet detaillé : https://shaifaliparashar.github.io/proposal_shape_registration.pdf
Remarques : Co-advisor: Dr. Julie Digne
|
|
|
|
|
SM207-240 Modelling Deformable Objects for Robotic Manipulation
|
|
Description
|
|
In this project we will apply local geometric properties to the robotic context. We consider some common real-life objects available in the YCB dataset. Given a robot equipped with multiple imaging and depth sensors, we will use the local geometric properties of deformation to predict the robot-object interaction.
URL sujet detaillé : https://shaifaliparashar.github.io/proposal_robot_manipulation.pdf
Remarques : Co-advisor: Prof. Liming Chen
|
|
|
|
|
|