VisNumBench: Evaluating Number Sense of Multimodal Large Language Models

Published in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

VisNumBench evaluates whether multimodal large language models have robust number sense across visual numerical tasks. The project page is available at wwwtttjjj.github.io/VisNumBench.

Recommended citation: Weng, Tengjin, Jingyi Wang, Wenhao Jiang, and Zhong Ming. (2025). "VisNumBench: Evaluating Number Sense of Multimodal Large Language Models." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 3830-3840.