Publication
LlavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
Lukas Helff; Felix Friedrich; Manuel Brack; Kristian Kersting; Patrick Schramowski
In: Computing Research Repository (CoRR), Vol. abs/2406.05113, Pages 1-24, arXiv, 2024.
Abstract
This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available at https://ml-research.github.io/human-centered-genai/projects/llavaguard.
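
To illustrate the kind of per-image annotation the abstract describes (a safety rating, a policy category, and a free-text rationale), the following minimal Python sketch shows one plausible record layout; the field names and the category label are illustrative assumptions, not the dataset's actual schema.

    # Hypothetical example of a single LlavaGuard-style annotation record.
    # Field names and the category string below are assumptions for illustration.
    annotation = {
        "image": "images/000123.jpg",        # path to the rated image
        "rating": "Unsafe",                  # overall safety rating
        "category": "O6: Weapons",           # policy category judged to apply (assumed label)
        "rationale": "The image depicts a handgun pointed at a person.",  # annotator's justification
    }

A safeguard assessment against a user-supplied policy could return a record of the same shape, which is what makes policy-specific (context-dependent) moderation decisions possible.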
