mathvision-cuhk.github.io - Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset

The accuracies of four prominent Large Multimodal Models (LMMs), alongside random chance and human performance, are evaluated on our proposed MATH-Vision (MATH-V) benchmark across 16 subjects and 5 levels of difficulty, with Level 1 the easiest and Level 5 the most challenging. Human performance is assessed on the testmini subset.

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks.

To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.
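
As a rough illustration of how the dataset's structure could be inspected, the minimal Python sketch below tallies problems by subject and difficulty level. It assumes a hypothetical local export named math_vision.json containing a list of records with "subject" and "level" fields; the field names and distribution format of the official release may differ.

```python
import json
from collections import Counter

# Minimal sketch: tally MATH-V problems by subject and difficulty level.
# Assumes a hypothetical local file "math_vision.json" holding a JSON list
# of records with a "subject" field (one of 16 disciplines) and a "level"
# field (1-5); the actual release may use different names or formats.

with open("math_vision.json", encoding="utf-8") as f:
    problems = json.load(f)

by_subject = Counter(p["subject"] for p in problems)
by_level = Counter(p["level"] for p in problems)

print(f"Total problems: {len(problems)}")  # expected: 3,040 per the paper
for subject, count in by_subject.most_common():
    print(f"{subject:>24}: {count}")
for level in sorted(by_level):
    print(f"Level {level}: {by_level[level]}")
```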
