Can Tessera identify plants from space?

Using Tessera foundation model embeddings to predict where data-deficient plant species might occur

<aside> 💡

Disclaimer: Claude Code wrote the initial draft of this post using our conversation transcript and git history. I simply post-edited it and made minor adjustments.

</aside>

The Problem: Most Plant Species Have Almost No Data


Before starting this project, I wanted to quantify exactly how biased GBIF's plant occurrence data is toward well-studied species. I queried their API, and the numbers confirmed the bias starkly:

354,357 plant species have occurrence data in GBIF
72.6% have 100 or fewer occurrences
36.6% have 10 or fewer occurrences
9.3% have just a single recorded occurrence
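
For readers who want to reproduce this kind of breakdown, here is a minimal sketch of the threshold arithmetic. It assumes the per-species occurrence counts have already been pulled from GBIF into a plain list; the function name `share_at_or_below` and the example counts are hypothetical and are not the script or data behind the figures above.

```python
from bisect import bisect_right


def share_at_or_below(counts, thresholds=(1, 10, 100)):
    """Return {threshold: percent of species with <= threshold occurrences}.

    `counts` holds one occurrence count per species (hypothetical input format).
    """
    if not counts:
        raise ValueError("counts must not be empty")
    ordered = sorted(counts)
    total = len(ordered)
    # bisect_right gives the number of values <= threshold in a sorted list
    return {t: 100.0 * bisect_right(ordered, t) / total for t in thresholds}


# Made-up counts for ten imaginary species, for illustration only
example_counts = [1, 1, 3, 8, 10, 42, 100, 250, 5000, 12000]
shares = share_at_or_below(example_counts)

for threshold, pct in shares.items():
    print(f"{pct:.1f}% of species have {threshold} or fewer occurrences")
```

Sorting once and using binary search keeps each threshold lookup cheap, which matters when the list holds several hundred thousand species.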

This creates a problem for conservation. If you're trying to assess whether a species is endangered, or plan where to survey for new populations, you need to know where it could occur, not just the handful of places where someone happened to document it. The African Baobab (Adansonia digitata) has 11,281 records; countless equally important species have fewer than 10.

The goal: Given just a few GPS locations of a plant species, can we predict other locations where it might occur?

The Key Insight: Habitat Preferences Live in Embeddings

This approach was inspired by Gabriel Mahler’s work on brambles using Tessera, a geospatial foundation model. Tessera produces 128-dimensional embedding vectors for every 10m x 10m pixel on Earth, encoding land cover, terrain, climate, and other environmental features the model learned from satellite imagery.

The hypothesis: if we know where a species occurs, we can sample the Tessera embeddings at those locations to learn what habitat "looks like" to the model. Then we can score every other location by how similar its embedding is to the known occurrences.
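
To make the hypothesis concrete, here is a minimal sketch of one way such scoring could work: mean cosine similarity between each candidate pixel's embedding and the embeddings at known occurrences. It assumes the embeddings have already been extracted into NumPy arrays and are non-zero. The function name `habitat_similarity` and the random stand-in data are hypothetical; this does not use the Tessera API and is not necessarily the scoring used later in this post.

```python
import numpy as np


def habitat_similarity(known_embeddings, candidate_embeddings):
    """Score candidates by mean cosine similarity to known-occurrence embeddings.

    known_embeddings:     array of shape (n_known, dim)
    candidate_embeddings: array of shape (n_candidates, dim)
    Returns an array of shape (n_candidates,) with values in [-1, 1].
    Assumes no embedding is the all-zero vector.
    """
    known = np.asarray(known_embeddings, dtype=float)
    candidates = np.asarray(candidate_embeddings, dtype=float)

    # L2-normalise each row so that dot products become cosine similarities
    known_unit = known / np.linalg.norm(known, axis=1, keepdims=True)
    cand_unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)

    # (n_candidates, n_known) similarity matrix, averaged over known points
    return (cand_unit @ known_unit.T).mean(axis=1)


# Random stand-ins for 128-dimensional embeddings, for illustration only
rng = np.random.default_rng(0)
habitat = rng.normal(size=128)

# Five "known occurrences" clustered around one habitat signature
known = habitat + 0.1 * rng.normal(size=(5, 128))

# One candidate that matches the habitat, one that is its opposite
candidates = np.stack([habitat, -habitat])

scores = habitat_similarity(known, candidates)
print(scores)  # the first candidate should score far higher than the second
```

Averaging over the known points treats every occurrence equally; with only a handful of records, one outlier location can shift the scores noticeably.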

What I Tried: The Evolution of Approaches

Attempt 1: Similarity-Based Scoring

My first approach was simple: