
Publication

LlavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment

Lukas Helff; Felix Friedrich; Manuel Brack; Kristian Kersting; Patrick Schramowski
In: Computing Research Repository (CoRR), Vol. abs/2406.05113, pages 1-24, arXiv, 2024.

Abstract

This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. For teaching a VLM safeguard on safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard’s performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework, including the dataset, model weights, and training code, publicly available at https://ml-research.github.io/human-centered-genai/projects/llavaguard
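
To illustrate how such a VLM-based safeguard might be queried in practice, the following is a minimal sketch using the Hugging Face transformers API. The model identifier, policy text, and prompt template are illustrative assumptions, not the official LlavaGuard interface; the released checkpoints and the exact policy prompt are documented on the project page linked above.

    # Hedged sketch: asking a LlavaGuard-style VLM safeguard for a safety assessment
    # of a single image against a (shortened, hypothetical) safety policy.
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    MODEL_ID = "AIML-TUDA/LlavaGuard-7B"  # assumed checkpoint name, for illustration only

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, device_map="auto")

    # Placeholder policy text; the paper's taxonomy is customizable, so the
    # categories and rating labels below are stand-ins, not the released taxonomy.
    policy = (
        "Assess the image against the following policy. "
        "Categories: violence, hate, sexual content, self-harm, illegal activities. "
        "Return a safety rating (safe/unsafe), the violated category if any, and a rationale."
    )

    image = Image.open("example.jpg")
    prompt = f"USER: <image>\n{policy} ASSISTANT:"  # generic LLaVA-style chat template

    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(processor.decode(output_ids[0], skip_special_tokens=True))

In this usage, swapping in a different policy string is what gives the safeguard its policy flexibility: the same model can be re-prompted with a stricter or more permissive taxonomy without retraining.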

Further Links