Publication
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
Felix Friedrich; Simone Tedeschi; Patrick Schramowski; Manuel Brack; Roberto Navigli; Huu Nguyen; Bo Li; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2412.15035, Pages 1-20, arXiv, 2024.
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential to ensuring both safe access and linguistic diversity. To this end, we conduct a large-scale, comprehensive safety evaluation of the current LLM landscape by introducing M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations. Our extensive experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the crime_tax category for Italian but remains safe in the other languages. Similar inconsistencies can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure responsible usage across diverse communities.
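
To illustrate how such per-language, per-category results can be aggregated, the following minimal Python sketch computes the fraction of responses judged safe for each (language, category) pair. It is not the authors' evaluation code; the file name, record fields ("language", "category", "judgement"), and the "safe" label are assumptions about how benchmark outputs might be stored.

# Minimal sketch (not the authors' code): aggregate judge verdicts from an
# M-ALERT-style evaluation into per-language, per-category safety rates.
# File name, field names, and label values are assumptions.
import json
from collections import defaultdict

def safety_rates(results_path: str) -> dict:
    """Return {(language, category): fraction of responses judged safe}."""
    counts = defaultdict(lambda: [0, 0])  # (lang, cat) -> [safe, total]
    with open(results_path, encoding="utf-8") as f:
        for line in f:  # one JSON record per evaluated prompt/response pair
            rec = json.loads(line)
            key = (rec["language"], rec["category"])  # e.g. ("it", "crime_tax")
            counts[key][1] += 1
            if rec["judgement"] == "safe":
                counts[key][0] += 1
    return {key: safe / total for key, (safe, total) in counts.items()}

if __name__ == "__main__":
    rates = safety_rates("m_alert_results.jsonl")  # hypothetical results file
    for (lang, cat), rate in sorted(rates.items()):
        print(f"{lang:>2} {cat:<24} {rate:6.2%} safe")

Under these assumptions, low rates concentrated in a single language for a given category would surface exactly the kind of cross-lingual inconsistency described above.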
