Differential Privacy for Decentralized Learning

Edwige Cyffers

249 pages, December 2024

Abstract

The collapse of storage and data processing costs, along with the rise of digitization, has brought new applications and possibilities to machine learning. In practice, big data often means collecting sensitive data. Hence, protecting privacy, in particular by preventing intentional or accidental data leakage, is one of the key challenges in trustworthy machine learning. A first direction towards more control over data is to keep it decentralized, exchanging only the information needed to run the learning process. This can be achieved through a central server orchestrating the learning process, as in federated learning, or through peer-to-peer communications. However, this alone does not guarantee that data is protected throughout the process, as federated learning is known to be vulnerable to privacy attacks. To reliably quantify and control the privacy loss incurred by machine learning algorithms, differential privacy is currently the gold standard in both research and industry. This thesis lies at the intersection of machine learning, decentralized algorithms and differential privacy. We present the first reconstruction attack in decentralized learning, targeting privacy leaks between participants that are not directly connected, demonstrating the need for defense mechanisms in this setting. We then introduce a new variant of differential privacy, Network Differential Privacy, suited to decentralized learning where each node only observes local communications. Using this variant, we analyze the privacy and utility guarantees of several decentralized algorithms, namely gossip algorithms and random walks for stochastic gradient descent, and ADMM. Our contributions demonstrate that decentralization can bring privacy amplification in the sense of differential privacy, and that the gains depend on the algorithm and the communication graph. This paves the way for using decentralization as a tool to develop more effective privacy-preserving machine learning.
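As background for the terms used above (this is the standard textbook definition, not a result of the thesis): a randomized mechanism M satisfies (epsilon, delta)-differential privacy if, for all neighboring datasets D and D' differing in a single record and every measurable set of outputs S,

  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta.

Roughly, privacy amplification by decentralization means that the (epsilon, delta) guarantee holding from the viewpoint of a given participant, who only sees part of the communications, is tighter than the baseline guarantee given by the local noise injection alone; the abstract states that the size of this gain depends on the algorithm and the communication graph.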

Bibtex

@phdthesis{cyffers2024differential,
  title = {Differential Privacy for Decentralized Learning},
  author = {Cyffers, Edwige},
  school = {Universit{\'e} de Lille},
  year = {2024},
  month = {12},
  day = {5},
  pages = {249},
  address = {Lille, France},
  language = {english},
  keywords = {Differential privacy, Decentralized Learning, Optimization, Privacy, Graph, Federated Learning, Confidentialit{\'e} Differentielle, Apprentissage d{\'e}centralis{\'e}, Optimisation, Vie priv{\'e}e, Graphe, Apprentissage f{\'e}d{\'e}r{\'e}},
  type = {Ph.D. Thesis},
  supervisor = {Bellet, Aur{\'e}lien},
  abstract = {The collapse of storage and data processing costs, along with the rise of digitization, has brought new applications and possibilities to machine learning. In practice, big data often means collecting sensitive data. Hence, protecting privacy, in particular by preventing intentional or accidental data leakage, is one of the key challenges in trustworthy machine learning. A first direction towards more control over data is to keep it decentralized, exchanging only the information needed to run the learning process. This can be achieved through a central server orchestrating the learning process, as in federated learning, or through peer-to-peer communications. However, this alone does not guarantee that data is protected throughout the process, as federated learning is known to be vulnerable to privacy attacks. To reliably quantify and control the privacy loss incurred by machine learning algorithms, differential privacy is currently the gold standard in both research and industry. This thesis lies at the intersection of machine learning, decentralized algorithms and differential privacy. We present the first reconstruction attack in decentralized learning, targeting privacy leaks between participants that are not directly connected, demonstrating the need for defense mechanisms in this setting. We then introduce a new variant of differential privacy, Network Differential Privacy, suited to decentralized learning where each node only observes local communications. Using this variant, we analyze the privacy and utility guarantees of several decentralized algorithms, namely gossip algorithms and random walks for stochastic gradient descent, and ADMM. Our contributions demonstrate that decentralization can bring privacy amplification in the sense of differential privacy, and that the gains depend on the algorithm and the communication graph. This paves the way for using decentralization as a tool to develop more effective privacy-preserving machine learning.},
  department = {CRIStAL UMR 9189},
  committee = {Smith, Virginia and Honkela, Antti and Cummings, Rachel and Hendrikx, Hadrien and Senellart, Pierre and Bellet, Aur{\'e}lien and Kairouz, Peter},
  domain = {Computer Science/Machine Learning, Computer Science/Artificial Intelligence, Computer Science/Distributed, Parallel, and Cluster Computing},
  note = {Doctoral school: MADIS}
}