Can Tunca
We’re excited to introduce the first AI indoor mapping benchmark, designed to set a new standard for evaluating AI in mapping. This initiative provides the mapping community with a structured, transparent way to assess AI mapping technology.
As part of our commitment to advancing the field of indoor mapping technology, Pointr is releasing a comprehensive benchmark designed to reliably evaluate AI models' capabilities in understanding and processing floor plans.
Evaluating these capabilities has been challenging due to the variety of architectural representations, the difficulty of consistently assessing generated maps, and the challenge of simulating real-world deployment scenarios.
One of the most significant barriers to progress in AI-driven mapping has been the lack of standardized evaluation frameworks. The AI Mapping Benchmark addresses this gap and provides a structured way to evaluate large language and vision models' abilities to solve real-world floor plan understanding challenges.
The benchmark involves providing models with raw floor plan data and challenging them to generate accurate spatial interpretations.
Each sample in the AI Mapping Benchmark was created from a real-world floor plan, and together the samples represent the complexity and variety seen in actual deployments. An important consideration in developing this benchmark was acknowledging that there is no worldwide standard for building CAD files. A key advantage of using AI is precisely its ability to generalize across different formats and conventions.
Our dataset includes 19 raw CAD files representing real-world floor plans from diverse indoor environments, including office buildings, hotels, universities, and hospitals. For each sample, models are provided with the original floor plan in vector CAD format along with relevant contextual information beyond just wall structures—which is essential for meaningful classification. Given these inputs, the model must process the floor plan to detect, classify, and identify all spatial elements.
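CAD floor plans are commonly exchanged as vector DXF files. As a rough illustration of the input side, here is a minimal sketch assuming DXF inputs readable with the open-source ezdxf library; the file name, layer handling, and the focus on closed polylines are illustrative assumptions, not the benchmark's actual ingestion code.

```python
# Minimal sketch: extract candidate room/wall outlines from a DXF floor plan.
# Assumes DXF input and uses the open-source ezdxf library; the file name
# and the focus on closed LWPOLYLINE entities are illustrative assumptions.
import ezdxf

doc = ezdxf.readfile("floorplan_sample.dxf")  # hypothetical file name
msp = doc.modelspace()

outlines = []
for entity in msp.query("LWPOLYLINE"):
    if entity.closed:  # closed polylines often outline rooms or walls
        points = [(p[0], p[1]) for p in entity.get_points()]
        outlines.append({"layer": entity.dxf.layer, "points": points})

print(f"Extracted {len(outlines)} closed outlines")
```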
The ground truth for evaluation consists of fully classified, detected, and identified elements at 100% accuracy through manual mapping, produced as GeoJSON files. These ground truth annotations are not shown to the model during testing.
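For a concrete picture of the evaluation side, the sketch below loads such a ground-truth file with the Python standard library and tallies its categories. The file name and the "category" property key are assumptions; the source only states that the ground truth is distributed as GeoJSON.

```python
# Minimal sketch: load ground-truth GeoJSON annotations and tally categories.
# The file name and the "category" property key are assumptions.
import json
from collections import Counter

with open("ground_truth_sample.geojson") as f:
    ground_truth = json.load(f)

# Each feature pairs a polygon geometry with classification/identification labels.
categories = Counter(
    feature["properties"].get("category", "unknown")
    for feature in ground_truth["features"]
)
print(categories.most_common())
```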
We define clear metrics to evaluate performance, focusing on three key dimensions:
Detection Accuracy – How well objects and rooms are detected within the floor plan. This measures the model's ability to correctly identify the presence and boundaries of spatial elements (see the scoring sketch after this list).
Classification Performance – How accurately elements are categorized (e.g., conference room, corridor, emergency exit). This evaluates the model's understanding of spatial function.
Identification Precision – How well the model differentiates and uniquely identifies individual spaces and features within the broader context of the building.
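To make the detection dimension concrete, here is a minimal scoring sketch that matches predicted room polygons to ground-truth polygons by intersection-over-union (IoU), using the shapely library. The 0.5 threshold and the greedy matching strategy are illustrative assumptions, not the benchmark's published protocol.

```python
# Minimal sketch: score detections by greedily matching predicted polygons
# to ground-truth polygons at an IoU threshold. The threshold and matching
# strategy are illustrative assumptions, not the benchmark's exact protocol.
from shapely.geometry import shape

def iou(a, b):
    union = a.union(b).area
    return a.intersection(b).area / union if union else 0.0

def detection_score(pred_features, gt_features, threshold=0.5):
    preds = [shape(f["geometry"]) for f in pred_features]
    gts = [shape(f["geometry"]) for f in gt_features]
    matched, hits = set(), 0
    for p in preds:
        best, best_iou = None, 0.0
        for i, g in enumerate(gts):
            if i in matched:
                continue
            score = iou(p, g)
            if score > best_iou:
                best, best_iou = i, score
        if best is not None and best_iou >= threshold:
            matched.add(best)
            hits += 1
    return hits / len(gts) if gts else 0.0  # recall-style detection score
```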
The benchmark follows a structured transformation process that mirrors the evolution of maps in practical applications:
1. Raw CAD / Floor Plan Input – The original vector data
2. Cleaned Version with Initial Detection – Single bounding boxes with Points of Interest (POIs)
3. Final Classified Output – Elements grouped by functional category
4. Final Identified Output in GeoJSON – Colored boxes with appropriate labeling (sketched in code below)
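As an illustration of what that final stage could look like in code, here is a minimal sketch that serializes identified elements into a GeoJSON FeatureCollection. The property names ("id", "category", "name", "fill") and the color palette are hypothetical; the benchmark's actual output schema may differ.

```python
# Minimal sketch: serialize identified elements to a colored GeoJSON
# FeatureCollection. Property names and the palette are hypothetical.
import json

CATEGORY_COLORS = {
    "conference_room": "#4C72B0",
    "corridor": "#DD8452",
    "emergency_exit": "#C44E52",
}

def to_identified_geojson(elements):
    """elements: dicts with 'polygon' (GeoJSON geometry), 'category',
    and 'label' keys, as produced by the earlier pipeline stages."""
    features = []
    for idx, el in enumerate(elements):
        features.append({
            "type": "Feature",
            "geometry": el["polygon"],
            "properties": {
                "id": f"unit-{idx}",                    # identification
                "category": el["category"],             # classification
                "name": el["label"],                    # display label
                "fill": CATEGORY_COLORS.get(el["category"], "#999999"),
            },
        })
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)
```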
Learn more about how we evaluate unit detection, classification, and identification by downloading our guide to how we measure MapScale®'s performance here.
To ensure fair comparison, we established strict rules for benchmark participation.
We are testing the benchmark with several leading AI systems, including models from Pointr (MapScale®), OpenAI (GPT-4o), Google (Pali-Gemma-3b), and Meta (Llama 3.2). It's important to note that most general-purpose AI models have limitations in this domain: they cannot produce complete map digitizations.
For example, when testing with ChatGPT, we used only the classification and identification metrics, as it lacks full map generation capabilities. In these cases, we provided the model with specific room contexts rather than complete floor plans.
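As an illustration of this room-level setup, here is a minimal sketch using the OpenAI Python SDK. The prompt wording, category list, and example room description are assumptions; the exact prompts used in the benchmark are not reproduced here.

```python
# Minimal sketch: classify a single room from a textual context description.
# The prompt wording and category list are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = ["conference room", "corridor", "emergency exit", "office", "restroom"]

def classify_room(room_context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You classify indoor spaces. Answer with exactly one "
                        f"category from: {', '.join(CATEGORIES)}."},
            {"role": "user", "content": room_context},
        ],
    )
    return response.choices[0].message.content.strip()

# e.g. classify_room("Rectangular space, ~40 m2, long table, 12 chairs, wall display")
```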
| AI Model | Detection | Classification | Identification |
| --- | --- | --- | --- |
| MapScale® | 90.2% | 60.9% | 69.4% |
| OpenAI GPT-4o | N/A | 46.5% | 51.1% |
| Google Gemma 3 | N/A | 39.3% | 30.7% |
| Meta Llama 3.2 | N/A | 19.4% | 25.2% |
The AI Mapping Benchmark scores reveal a clear performance gap between MapScale® and general-purpose LLMs like GPT-4o, Gemma 3, and Llama 3.2. With scores of 90.2% in Detection, 60.9% in Classification, and 69.4% in Identification, MapScale® significantly outperforms the rest.
MapScale®'s edge is expected, as it is a fine-tuned model built specifically to analyze architectural floor plans. Its domain-specific training enables it to understand layout patterns, object representations, and spatial relationships far more accurately than general LLMs.
In contrast, models like GPT-4o or Gemma are optimized for broad natural language understanding and generation tasks, not spatial reasoning or diagram interpretation. Moreover, general LLMs can't produce full map digitizations and were only tested on Classification and Identification using room-level inputs.
While general LLMs are steadily improving in their multimodal capabilities, these results highlight the ongoing importance of domain-specific models when precision and specialized understanding are critical.
The benchmark was developed by Pointr's dedicated mapping team, led by Maksim Vozniyk, and by Pointr's R&D team, led by Can Tunca. The mapping team consists of 25 specialists who bring over 80 years of combined experience in spatial data processing and indoor mapping; it includes GIS engineers, architects, and mapping experts who have processed more than 7 billion square feet across 5,000+ buildings worldwide. The R&D team consists of 10 AI scientists, including 4 PhDs, who have developed the world's first AI mapping engine, leveraging LLMs and Computer Vision.
We're calling on the mapping community to use, test, and improve this benchmark. The dataset and evaluation tools are openly available for the community to explore, contribute to, and improve.
By establishing a common standard for evaluation, we aim to accelerate innovation in AI-driven floor plan understanding and push the boundaries of what's possible in indoor mapping technology.
Can Tunca
Can Tunca is the Chief Research and Development Officer at Pointr, where he leads an R&D team of 10 in pioneering AI-driven mapping technology. Under his leadership, Pointr has developed the world’s first AI mapping engine, leveraging LLMs and Computer Vision. Can holds a PhD in Computer Science from Boğaziçi University, one of Turkey’s most prestigious institutions.