rewildingCities

Urban Thermal Analysis Toolkit: Cloud Experiments


🌷 "We embody, we learn, we release the idea of failure, because it is all data." — adrienne maree brown



What is rewildingCities?

An open-source platform where local communities run rigorous geospatial analyses to model climate-resilient futures for their cities. The infrastructure itself is pedagogical: it is designed for citizen scientists at every level of experience.

Our first toolkit: urban thermal analysis. Where are the heat islands? How much do parks cool their surroundings? What landscape features drive that cooling? Who bears the heat burden?


What we built this cycle

The analytical primitives

| Layer | What it does | Primitives |
|---|---|---|
| soil/ | Prepares raw data | validate raster/vector · repair geometry · reproject · scale units · filter by area · mask water · crop to boundary · fetch from APIs |
| roots/geometry/ | Spatial operations | generate buffer rings · calculate geometry |
| roots/metrics/ | Extract measurements | zonal statistics · landscape metrics · calculate PCI (TPM-M) · land cover proportions via crosswalk |
| roots/statistics/ | Statistical analysis | correlation (bootstrap + partial) · regression (VIF enforcement, train/test split) · Getis-Ord Gi* · cluster classification |
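The "land cover proportions via crosswalk" primitive in roots/metrics/ can be sketched as follows. This is a minimal Python illustration, not the toolkit's R implementation: it assumes the classified raster has already been cropped to a single zone (for example, one buffer ring) and loaded as a grid of integer class codes. The class codes, the `CROSSWALK` mapping, and the nodata value are hypothetical stand-ins for the project's crosswalk YAML.

```python
from collections import Counter

# Hypothetical crosswalk: raster class code -> analysis category.
# In the toolkit this mapping lives in a YAML file; here it is a plain dict.
CROSSWALK = {
    1: "blue",   # open water
    2: "green",  # trees
    3: "green",  # grass / shrub
    4: "grey",   # buildings
    5: "grey",   # roads / pavement
}

NODATA = 0  # hypothetical nodata code, excluded from the denominator


def land_cover_proportions(raster, crosswalk, nodata=NODATA):
    """Return ({category: share of mapped pixels}, warnings) for one zone."""
    counts = Counter()
    unmapped = Counter()
    for row in raster:
        for code in row:
            if code == nodata:
                continue
            category = crosswalk.get(code)
            if category is None:
                unmapped[code] += 1  # remember codes the crosswalk does not cover
            else:
                counts[category] += 1
    warnings = []
    if unmapped:
        warnings.append(f"unmapped class codes ignored: {sorted(unmapped)}")
    total = sum(counts.values())
    if total == 0:
        warnings.append("no valid pixels in zone")
        return {}, warnings
    # Report every category, including those with zero pixels.
    categories = sorted(set(crosswalk.values()))
    return {c: counts[c] / total for c in categories}, warnings


zone = [
    [2, 2, 4, 0],
    [2, 3, 5, 1],
    [9, 2, 4, 1],
]
props, warns = land_cover_proportions(zone, CROSSWALK)
print(props)  # {'blue': 0.2, 'green': 0.5, 'grey': 0.3}
print(warns)  # ['unmapped class codes ignored: [9]']
```

Returning unmapped codes as a warning, rather than silently dropping them, mirrors the toolkit's rule that a primitive never hides its limitations.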

Every analysis is composed of atomic R operations that each do one thing, document everything, and never lie about limitations.

The orchestration system

A Python orchestrator reads experiment YAML files, resolves references to city data and analytical choices, builds a dependency graph, and executes R primitives in sequence, each one conforming to the rewildr package contract.
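The orchestration flow (resolve references, build a dependency graph, run primitives in order) can be sketched with Python's standard-library `graphlib`. This is a minimal sketch, not the project's orchestrator: the experiment is shown as an already-parsed dict rather than read from YAML, and the step names, the `${...}` reference syntax, and the R script paths are all hypothetical. It only builds the `Rscript` command lines; it does not execute them.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical experiment, shown as the dict a YAML parser would return.
EXPERIMENT = {
    "city": {"name": "nyc", "boundary": "data/nyc/boundary.gpkg"},
    "steps": {
        "parks_clean": {"primitive": "soil/repair_geometry.R",
                        "input": "${city.boundary}", "after": []},
        "rings": {"primitive": "roots/geometry/generate_buffer_rings.R",
                  "input": "parks_clean", "after": ["parks_clean"]},
        "pci": {"primitive": "roots/metrics/calculate_pci.R",
                "input": "rings", "after": ["rings"]},
    },
}

REF = re.compile(r"\$\{([^}]+)\}")


def resolve(value, context):
    """Replace ${a.b} references with values looked up in the experiment."""
    def lookup(match):
        node = context
        for key in match.group(1).split("."):
            node = node[key]  # a KeyError surfaces a broken reference loudly
        return str(node)
    return REF.sub(lookup, value)


def plan(experiment):
    """Return (step name, command) pairs in dependency order."""
    steps = experiment["steps"]
    # graphlib expects {node: set of predecessors}.
    graph = {name: set(spec["after"]) for name, spec in steps.items()}
    order = TopologicalSorter(graph).static_order()  # raises CycleError on cycles
    return [
        (name, ["Rscript", steps[name]["primitive"],
                resolve(steps[name]["input"], experiment)])
        for name in order
    ]


for name, cmd in plan(EXPERIMENT):
    print(name, " ".join(cmd))
```

Using a topological sort means the order in the YAML file does not matter: a step runs only after everything it lists under `after`, and a circular dependency fails before any R code runs.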

The envelope system

Every transformation produces a JSON provenance document. Warnings accumulate and never disappear. If data is degraded, the envelope says so. If a park's buffer ate the ocean and produced a negative PCI, the envelope flags it. The system's honesty is infrastructure, not an afterthought.
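The "warnings accumulate and never disappear" rule can be sketched as an immutable record that each step derives from its parent. This is a hypothetical Python illustration, not the rewildr envelope schema: the `Envelope` fields, the `derive` method, and the `check_pci` helper are assumptions chosen to show the idea of append-only provenance.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical provenance record: one per transformation."""
    step: str
    inputs: tuple
    warnings: tuple = ()   # tuples keep the record immutable
    degraded: bool = False
    lineage: tuple = ()

    def derive(self, step, new_warnings=(), degraded=False):
        """Create the next envelope; earlier warnings are carried forward, never dropped."""
        return Envelope(
            step=step,
            inputs=(self.step,),
            warnings=self.warnings + tuple(new_warnings),
            degraded=self.degraded or degraded,  # once degraded, always flagged
            lineage=self.lineage + (self.step,),
        )

    def to_json(self):
        return json.dumps({
            "step": self.step,
            "inputs": list(self.inputs),
            "lineage": list(self.lineage),
            "degraded": self.degraded,
            "warnings": list(self.warnings),
        }, indent=2)


def check_pci(park_id, pci):
    """Return warnings for physically implausible cooling values."""
    if pci < 0:
        return [f"park {park_id}: negative PCI ({pci:.2f} C); buffer may overlap water"]
    return []


raw = Envelope(step="fetch_parks", inputs=("parks_api",))
clean = raw.derive("repair_geometry", ["12 invalid polygons repaired"], degraded=True)
pci_env = clean.derive("calculate_pci", check_pci("P-001", -0.4))
print(pci_env.to_json())
```

Because `derive` always starts from the parent's warnings and the dataclass is frozen, a later step has no way to quietly clear an earlier step's warning.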

Cloud validation (completed)

The soil pipeline has been deployed to AWS Batch (Fargate). The same Docker container runs identically locally and in the cloud. 2,055 NYC park boundaries were processed in 31 seconds with full provenance tracking.


Code updates from v1

The pilot study replicated Xiao et al. (2023) but had known issues. Here's what v2 addresses:

| Problem | v1 (pilot) | v2 (current) |
|---|---|---|
| PCI calculation | Fixed 480m radius, not real TPM-M | True gradient walk: 30m rings, find first local max |
| Water contamination | Coastal parks got negative PCI, no detection | mask_raster_by_class.R removes water pixels before extraction; envelope warns on high water % |
| Land cover | NDVI threshold hack for blue/green/grey | Crosswalk YAML maps classified raster directly |
| Regression | No train/test split, VIF > 10 ignored | Holdout validation, VIF < 7.5 enforced, bootstrap CIs |
| Code structure | Monolithic scripts, hardcoded paths, emoji cats | Atomic primitives, rewildr contract, envelope provenance |
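The v2 "gradient walk" can be sketched as a loop over ring temperatures that stops at the first local maximum. This is a minimal Python sketch under stated assumptions, not the toolkit's `calculate PCI (TPM-M)` primitive: it assumes mean land surface temperature (LST) per 30 m ring has already been extracted from a water-masked raster, nearest ring first; it takes PCI as the LST at the turning point minus the park's mean LST; and it reports cooling distance as the outer edge of the turning-point ring. The function name and return shape are hypothetical.

```python
def tpm_gradient_walk(park_lst, ring_lst, ring_width_m=30.0):
    """
    Walk outward through buffer rings and stop at the first local maximum of LST.

    park_lst: mean LST inside the park (deg C)
    ring_lst: mean LST of successive rings, nearest ring first
    Returns (pci, cooling_distance_m, warnings); pci is None if no turning point.
    """
    if not ring_lst:
        return None, None, ["no rings supplied"]
    warnings = []
    for i in range(len(ring_lst) - 1):
        if ring_lst[i + 1] <= ring_lst[i]:  # next ring is no warmer: turning point
            pci = ring_lst[i] - park_lst
            distance = (i + 1) * ring_width_m  # outer edge of ring i
            if pci < 0:
                warnings.append(f"negative PCI ({pci:.2f}); check water or edge effects")
            return pci, distance, warnings
    # Temperature kept rising through the last ring: the walk never turned.
    warnings.append("LST still rising at outermost ring; no turning point found")
    return None, None, warnings


# Hypothetical park: the gradient turns at the third ring (60-90 m).
pci, dist, warns = tpm_gradient_walk(28.0, [29.1, 30.0, 30.6, 30.4, 30.9])
print(round(pci, 2), dist, warns)  # 2.6 90.0 []
```

Unlike a fixed 480m radius, the walk ignores warmer rings beyond the first turning point (the 30.9 ring above), and when no turning point exists it says so instead of returning a number.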