JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Authors: Shota Onohara, Atsuyuki Miyai, Yuki Imajuku, Kazuki Egashira, Jeonghun Baek, Xiang Yue, Graham Neubig, Kiyoharu Aizawa

Abstract: Accelerating research on Large Multimodal Models (LMMs) in non-English
languages is crucial for enhancing user experiences across broader populations.
In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale
Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the
Japanese cultural context. To facilitate comprehensive culture-aware
evaluation, JMMMU features two complementary subsets: (i) culture-agnostic (CA)
subset, where the culture-independent subjects (e.g., Math) are selected and
translated into Japanese, enabling one-to-one comparison with its English
counterpart MMMU; and (ii) culture-specific (CS) subset, comprising newly
crafted subjects that reflect Japanese cultural context. Using the CA subset,
we observe performance drop in many LMMs when evaluated in Japanese, which is
purely attributable to language variation. Using the CS subset, we reveal their
inadequate Japanese cultural understanding. Further, by combining both subsets,
we identify that some LMMs perform well on the CA subset but not on the CS
subset, exposing a shallow understanding of the Japanese language that lacks
depth in cultural understanding. We hope this work will not only help advance
LMM performance in Japanese but also serve as a guideline to create
high-standard, culturally diverse benchmarks for multilingual LMM development.
The project page is https://mmmu-japanese-benchmark.github.io/JMMMU/.

Source: http://arxiv.org/abs/2410.17250v1

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these