“Did my figure do justice to the answer?”: Towards Multimodal Short Answer Grading with Feedback (MMSAF)

Authors: Pritam Sil, Bhaskaran Raman, Pushpak Bhattacharyya

Abstract: Personalized feedback plays a vital role in a student’s learning process.
While existing systems are adept at providing feedback on MCQ-based
evaluation, this work focuses on subjective and open-ended questions,
a setting that corresponds to the problem of Automatic Short Answer Grading (ASAG) with
feedback. Additionally, we introduce the Multimodal Short Answer Grading with
Feedback (MMSAF) problem, which extends the traditional ASAG feedback problem to
scenarios where the student answer and reference answer may contain
images. We also present the MMSAF dataset with 2197 data points along
with an automated framework for generating such datasets. Our evaluations on
existing LLMs over this dataset achieved an overall accuracy of 55% on Level
of Correctness labels, 75% on Image Relevance labels, and a score of 4.27 out
of 5 for the correctness of LLM-generated feedback, as rated by experts. According to
the experts, Pixtral achieved a rating above 4 on all metrics, indicating
that it is more aligned with human judgement and that it is the best solution
for assisting students.

Source: http://arxiv.org/abs/2412.19755v1