Analyzing the Performance of Large Language Models in Complex Spatial Reasoning Tasks

Rajiv Kumar

Abstract

Spatial reasoning, the ability to understand relationships between spatial entities (points, lines, and regions) through metric, topological, directional, and order relations, is a fundamental aspect of human cognition but remains a significant challenge for large language models (LLMs). To study this challenge, a specialized evaluation dataset focused on Australian geography was developed to minimize pre-training bias, featuring 239 carefully crafted spatial reasoning questions. Fifteen prominent LLMs from OpenAI, Google, Anthropic, Meta, and Mistral were assessed under controlled zero-shot conditions across five experiments: Toponym Resolution, Metric Relations, Directional Relations, Topological Relations, and Cyclic Order Relations. Results revealed significant variation in performance: models struggled notably on metric and cyclic order tasks while faring relatively better on qualitative topological reasoning. Metric relation errors often involved underestimation of distances, directional reasoning accuracy declined with task complexity, and cyclic order reasoning approached random performance. Models from Google and Anthropic demonstrated greater caution, abstaining more frequently in the face of uncertainty. These findings highlight the ongoing challenges LLMs face in mastering complex spatial reasoning.
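
The paper's evaluation harness and prompts are not reproduced here; the sketch below is a minimal illustration, under stated assumptions, of how such a zero-shot protocol could be scored. The `model_fn` callable, the two sample questions, and the 10% numeric tolerance are hypothetical, and the haversine helper shows one common way metric relations can be ground-truthed.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, used to ground-truth metric relations."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical items mirroring two of the paper's task categories; the actual
# 239-question Australian-geography dataset is not reproduced here.
SYDNEY, MELBOURNE = (-33.87, 151.21), (-37.81, 144.96)
QUESTIONS = [
    {"task": "metric",
     "prompt": "What is the straight-line distance in km between Sydney and "
               "Melbourne? Answer with a number only.",
     "answer": f"{haversine_km(*SYDNEY, *MELBOURNE):.0f}"},
    {"task": "directional",
     "prompt": "Is Perth east or west of Adelaide? Answer with one word.",
     "answer": "west"},
]

def evaluate(model_fn, questions, tolerance=0.10):
    """Zero-shot scoring: one question per model call, no exemplars in the prompt."""
    correct = 0
    for q in questions:
        reply = model_fn(q["prompt"]).strip().lower()
        if q["task"] == "metric":
            # Accept numeric answers within a relative tolerance of ground truth.
            try:
                correct += abs(float(reply) - float(q["answer"])) <= tolerance * float(q["answer"])
            except ValueError:
                pass  # non-numeric replies (including abstentions) score as incorrect
        else:
            correct += q["answer"] in reply
    return correct / len(questions)
```

Passing a model's completion function as `model_fn` keeps the harness provider-agnostic, which is one way the same question set could be run unchanged across all fifteen models.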

Article Details

Section: Research Paper

How to Cite

Kumar, R. (2025). Analyzing the Performance of Large Language Models in Complex Spatial Reasoning Tasks. Journal of Global Research in Electronics and Communications (JGREC), 1(6), 1-7. https://doi.org/10.5281/zenodo.15663123