https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/

When you get the best route from Google Maps, explore a new place in Street View, look at your neighborhood on Google Earth, or check the weather forecast with Search, you’re using geospatial data. For decades, Google has organized the world’s geospatial information — data associated with a specific geographical location — and made it accessible through our products.

Geospatial information is essential in everyday situations and for a wide range of real-world enterprise problems. Whether you’re working in public health, urban development, integrated business planning, or climate resilience, Google’s data, real-time services, and AI models can accelerate your analyses and augment your proprietary models and data.

Geospatial information can be big, complex, and hard to understand — just like the real world! Gathering, storing, and serving data requires specialized sensors and platforms. Observations of the things you care about can be scarce or require time-consuming labeling. Use cases are diverse and often require various kinds of data that need to be aligned and cross-referenced (weather, maps, images, etc.), and recent breakthrough AI methods are not optimized for geospatial problems. Transforming geospatial information into understanding is a focus area for Google Research.

Last November we introduced two pre-trained, multi-purpose models to address many of the challenges of geospatial modeling: the Population Dynamics Foundation Model (PDFM), which captures the complex interplay between population behaviors and their local environment, and a new trajectory-based mobility foundation model. Since then, over two hundred organizations have tested the PDFM embeddings for the United States, and we are expanding the dataset to cover the UK, Australia, Japan, Canada, and Malawi for experimental use by selected partners.
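To make the pattern concrete, here is a minimal sketch of how partners typically experiment with the PDFM embeddings: join the released embedding vectors to your own location-keyed outcomes and fit a lightweight downstream model. The file names, the "place" key, and the "feature" column prefix below are illustrative assumptions, not a fixed schema.

```python
# Sketch: using PDFM embeddings as features for a downstream model.
# Assumes a CSV of embeddings keyed by a location column ("place") with
# "feature..." columns, plus your own outcomes table sharing that key.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

embeddings = pd.read_csv("county_embeddings.csv")   # hypothetical path
outcomes = pd.read_csv("my_outcomes.csv")           # your proprietary data

df = embeddings.merge(outcomes, on="place")
feature_cols = [c for c in df.columns if c.startswith("feature")]

X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df["outcome"], test_size=0.2, random_state=0)

model = Ridge().fit(X_train, y_train)
print(f"R^2 on held-out locations: {model.score(X_test, y_test):.3f}")
```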

We’re also exploring how generative AI can reduce the significant cost, time, and domain expertise required to combine geospatial capabilities. Large language models (LLMs) like Gemini can manage complex data and interact with users through natural language. When integrated into agentic workflows that are grounded in geospatial data, we’re starting to see that they can generate insights in various domains that are both surprising and useful.

Today, we're introducing new remote sensing foundation models for experimentation alongside a research effort called Geospatial Reasoning that aims to bring together all of our foundation models with generative AI to accelerate geospatial problem solving. Our models will be available through a trusted tester program, with inaugural participants including WPP, Airbus, Maxar, and Planet Labs.

[Video] An overview of Geospatial Reasoning: https://www.youtube.com/watch?v=g9F-_tCakL8

Grounding with geospatial foundation models

Our newest remote sensing foundation models are based on proven architectures and training techniques, such as masked autoencoders, SigLIP, MaMMUT, and OWL-ViT, and adapted to the remote sensing domain. All models were trained on high-resolution satellite and aerial images with accompanying text descriptions and bounding box annotations. These foundation models generate rich embeddings for images and objects, and can also be fine-tuned for specific remote sensing tasks, such as mapping buildings and roads, assessing post-disaster damage, or locating infrastructure. The flexible natural language interface provided by the models supports retrieval and zero-shot classification tasks, allowing users to, for example, find images of “residential buildings with solar panels” or “impassable roads”.
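As a rough illustration of that retrieval workflow, the sketch below uses the open-source SigLIP checkpoint from Hugging Face as a stand-in dual encoder; the remote sensing foundation models themselves are available only through the trusted tester program, and their interface may differ.

```python
# Sketch: zero-shot image retrieval with a SigLIP-style dual encoder.
# The public SigLIP checkpoint stands in for the remote sensing models.
import torch
from PIL import Image
from transformers import AutoProcessor, SiglipModel

model = SiglipModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

images = [Image.open(p) for p in ["tile_001.png", "tile_002.png"]]  # your imagery
query = "residential buildings with solar panels"

inputs = processor(text=[query], images=images,
                   padding="max_length", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# logits_per_image holds one similarity score per image for the query;
# rank tiles so the best matches surface first.
scores = out.logits_per_image.squeeze(-1)
for idx in scores.argsort(descending=True):
    print(f"tile {int(idx)}: score {float(scores[idx]):.2f}")
```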

We evaluated the foundation models on a broad range of remote sensing benchmarks covering classification, segmentation, and object detection tasks. We evaluated both with frozen model features and in fine-tuned and zero-shot setups, with promising results showing state-of-the-art performance on multiple metrics. Furthermore, we used these models for multiple remote sensing efforts across Google (including disaster response and mapping urban and agricultural landscapes), where they consistently improved task metrics. We plan to share more details about the models and evaluations in the coming months. Organizations can inquire about access to test these models on their own use cases through our trusted tester program.
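For readers unfamiliar with the frozen-feature setup: it is the standard linear-probe protocol, in which embeddings are extracted once with the fixed backbone and only a small classifier is trained on top. A generic sketch, where embed() is a hypothetical stand-in for any frozen backbone:

```python
# Sketch: linear-probe evaluation on frozen embeddings. embed() is a
# hypothetical helper wrapping a frozen foundation-model backbone.
import numpy as np
from sklearn.linear_model import LogisticRegression

def evaluate_frozen(embed, train_imgs, train_labels, test_imgs, test_labels):
    # Backbone weights stay fixed; only the linear head is trained.
    X_train = np.stack([embed(img) for img in train_imgs])
    X_test = np.stack([embed(img) for img in test_imgs])
    probe = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return probe.score(X_test, test_labels)  # classification accuracy
```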

Research into Geospatial Reasoning

Geospatial Reasoning, our newest research effort, is a framework for building agentic workflows that will allow developers, data analysts, and scientists to integrate Google’s most advanced foundation models, discussed above, with their own models and datasets. Last October, Google started piloting Gemini capabilities in Google Earth, using Gemini to create unique data layers, conduct GIS operations, and derive geospatial insights — accelerating geospatial analyses in Google Earth’s no-code environment. Geospatial Reasoning will extend this approach by demonstrating how developers can build bespoke workflows on the Google Cloud Platform to handle intricate geospatial queries using Gemini.

Gemini will orchestrate inference and analysis across Google’s models and datasets, the user’s proprietary sources, and public domain data sources. Given a complex natural language query, Gemini will plan and enact a chain of reasoning, analyzing multiple geospatial and structured data sources, and using advanced AI models for task-specific inference and grounding. Responding with insights and data visualizations, Geospatial Reasoning will provide rapid, trustworthy answers.
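Conceptually, this orchestration follows the LLM function-calling pattern: geospatial models and datasets are exposed as tools, and Gemini plans which to invoke and in what order. The sketch below uses the google-generativeai Python SDK’s automatic function calling; the two tools are hypothetical mocks standing in for real geospatial services, not released APIs.

```python
# Sketch: an agentic geospatial workflow via Gemini function calling.
# Both tools are hypothetical mocks, not released geospatial APIs.
import google.generativeai as genai

def get_flood_forecast(region: str) -> dict:
    """Return a (mocked) flood-risk forecast for a named region."""
    return {"region": region, "risk": "high", "peak_in_hours": 36}

def count_buildings(region: str, damaged_only: bool = False) -> dict:
    """Return a (mocked) building count from a remote sensing model."""
    return {"region": region, "buildings": 1240, "damaged_only": damaged_only}

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel(
    "gemini-1.5-pro", tools=[get_flood_forecast, count_buildings])

# The SDK runs the tool calls Gemini requests and feeds results back
# into the conversation until the model produces a final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message(
    "Which neighborhoods should we prioritize after the hurricane?")
print(response.text)
```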

For example, here's how a crisis manager responding to the aftermath of a hurricane could use Geospatial Reasoning: