Summary

I analyzed a popular LLM’s ability to understand residential 2D floorplan images and to reason about common tasks in that domain. The analysis identified common failure modes. To address them, I propose generating a synthetic dataset with validated ground truth for training LLMs, with manual QA review applied to ensure correctness.

LLM tasks

The LLM will be asked to perform the tasks described below and will be evaluated based on the results.

PNG ⇒ JSON

Generate a JSON object that contains:

rooms: list of room names
doors with endpoints: a list of doors, each identifying the two rooms it connects
adjacency list: for each room, the list of rooms adjacent to it
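
As an illustration only, the object described above might look like the sketch below. The key names ("rooms", "doors", "adjacency"), the room names, and the helper validate_floorplan are assumptions made for this example, not a settled schema. The helper shows the kind of consistency check that could be applied to both ground truth and LLM output: every door must reference two known rooms, and adjacency must be symmetric.

```python
import json

# Hypothetical example object; key names and room names are assumptions.
example = {
    "rooms": ["entrance", "living room", "kitchen", "bedroom", "bathroom"],
    "doors": [
        {"rooms": ["entrance", "living room"]},
        {"rooms": ["living room", "kitchen"]},
        {"rooms": ["living room", "bedroom"]},
        {"rooms": ["bedroom", "bathroom"]},
    ],
    # Adjacent rooms share a wall; they do not necessarily share a door.
    "adjacency": {
        "entrance": ["living room"],
        "living room": ["entrance", "kitchen", "bedroom"],
        "kitchen": ["living room", "bathroom"],
        "bedroom": ["living room", "bathroom"],
        "bathroom": ["bedroom", "kitchen"],
    },
}


def validate_floorplan(plan: dict) -> list:
    """Return a list of consistency problems (empty if none were found)."""
    problems = []
    rooms = set(plan["rooms"])

    # Each door must connect exactly two known rooms.
    for index, door in enumerate(plan["doors"]):
        ends = door["rooms"]
        if len(ends) != 2:
            problems.append(f"door {index}: expected 2 rooms, got {len(ends)}")
        for room in ends:
            if room not in rooms:
                problems.append(f"door {index}: unknown room {room!r}")

    # Adjacency must reference known rooms and be symmetric.
    adjacency = plan["adjacency"]
    for room, neighbours in adjacency.items():
        if room not in rooms:
            problems.append(f"adjacency: unknown room {room!r}")
        for other in neighbours:
            if other not in rooms:
                problems.append(f"adjacency of {room!r}: unknown room {other!r}")
            elif room not in adjacency.get(other, []):
                problems.append(f"adjacency: {room!r} -> {other!r} is not symmetric")
    return problems


if __name__ == "__main__":
    print(json.dumps(example, indent=2))
    print(validate_floorplan(example))
```

Representing each door as a pair of room names keeps the door list and the adjacency list directly comparable, which makes mismatches between the two easy to flag during QA.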

PNG ⇒ Question answers

Answer specific text questions about the floorplan:

How many bedrooms are illustrated?
How many bathrooms are illustrated?
What is the shortest path between the entrance and a bathroom?
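
For the shortest-path question, ground truth could be derived from the door list with a breadth-first search. The sketch below is a minimal illustration under two assumptions not stated above: path length is measured as the number of doors crossed, and doors are given as pairs of room names. The function name shortest_path and the sample rooms are hypothetical.

```python
from collections import deque


def shortest_path(doors, start, goals):
    """Breadth-first search over the door graph.

    doors: iterable of (room_a, room_b) pairs, one per door (assumed format).
    start: name of the starting room.
    goals: set of acceptable destination rooms.
    Returns the list of rooms on a path crossing the fewest doors,
    or None if no goal room is reachable.
    """
    # Build an undirected graph: a door can be crossed in either direction.
    graph = {}
    for room_a, room_b in doors:
        graph.setdefault(room_a, set()).add(room_b)
        graph.setdefault(room_b, set()).add(room_a)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in goals:
            return path
        # Sort neighbours so the result is deterministic when paths tie.
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical floorplan used only for illustration.
sample_doors = [
    ("entrance", "living room"),
    ("living room", "kitchen"),
    ("living room", "hallway"),
    ("hallway", "bedroom"),
    ("hallway", "bathroom"),
]

if __name__ == "__main__":
    print(shortest_path(sample_doors, "entrance", {"bathroom"}))
    # prints ['entrance', 'living room', 'hallway', 'bathroom']
```

Passing a set of goal rooms lets the same function handle floorplans with more than one bathroom, since the question asks for the path to any bathroom.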

Manual experiment

I ran the LLM tasks manually against a small dataset and analyzed the results to identify common failure modes.