Publication
Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models
Dominik Hintersdorf; Lukas Struppek; Kristian Kersting; Adam Dziedzic; Franziska Boenisch
In: Computing Research Repository (CoRR), Vol. abs/2406.02366, Pages 1-36, arXiv, 2024.
Abstract
Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts address this issue either by changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or by removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they carry the risk of adversaries circumventing the safeguards and are not effective when the DM itself is publicly released. To solve this problem, we introduce NEMO, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that, in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can avoid the replication of training data at inference time, increase the diversity in the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, NEMO contributes to a more responsible deployment of DMs.
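
The abstract's core intervention, deactivating individual neurons in a cross-attention layer, can be illustrated with a minimal sketch. The snippet below is an assumption-based illustration, not NEMO's actual implementation: it uses a PyTorch forward hook to zero out selected output channels of a linear projection, with the layer shape and neuron indices chosen purely for demonstration (in Stable Diffusion-style models, the value projection `to_v` of a cross-attention block would be a typical target).

```python
# A minimal sketch of the neuron-deactivation idea from the abstract.
# The targeted module, its dimensions, and the neuron indices are
# hypothetical; the paper itself identifies which neurons to silence.
import torch
import torch.nn as nn


def deactivate_neurons(layer: nn.Linear, neuron_indices: list[int]):
    """Zero out the outputs of selected neurons via a forward hook."""
    idx = torch.tensor(neuron_indices)

    def hook(_module, _inputs, output):
        output = output.clone()   # avoid in-place edits on autograd views
        output[..., idx] = 0.0    # silence the selected neurons
        return output

    return layer.register_forward_hook(hook)


if __name__ == "__main__":
    # Stand-in for a cross-attention value projection (hypothetical sizes).
    to_v = nn.Linear(768, 320)
    handle = deactivate_neurons(to_v, neuron_indices=[5, 17])

    out = to_v(torch.randn(1, 77, 768))
    print(out[..., 5].abs().max(), out[..., 17].abs().max())  # both 0.0

    handle.remove()  # detach the hook to restore the original behavior
```

Using a removable hook rather than editing the weights keeps the intervention reversible, which matches the abstract's framing: the memorization neurons are switched off at inference time rather than retrained away.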
1 Introduction
