Repository


GitHub - kalliannhale/rewildingCities: A modular toolkit that powers a modern data platform designed to support predictive analytics, policy insight, and community storytelling around green infrastructure and climate resilience.

Video Overview


rewildingCities_architectural implementation.mp4

Project Evolution, Design, & Implementation


The glossary

| term | definition |
| --- | --- |
| curiosity space | A research question that is domain-specific but method-agnostic. Articulates what we want to know without prescribing how. Lives in garden/curiosity-space/. |
| method | An abstract analytical approach with explicit choice points. Describes how to answer certain types of questions, with decisions that change meaning. Lives in garden/methods/. |
| experiment | An instantiation binding a curiosity to a method with resolved choices, applied to a specific place. Where abstraction becomes action. Lives in garden/experiments/. |
| primitive | Atomic operation (usually written in R) that does one analytical task and returns an envelope. Drawn from peer-reviewed urban ecology research. |
| envelope | JSON document carrying data path, metadata, provenance, and warnings. The system’s memory and conscience. |
| manifest | YAML file declaring a city’s available data, metadata, and quality notes. What a community has to work with. |
| plot | A city’s directory containing its manifest and cached data. Each city is a plot in the garden. |
| provenance | Chain of custody documenting what ran, when, with what parameters. Accountability as infrastructure. |
| semantic type | What a dataset means (not just its format). park_boundaries vs. just “vector.” |
| warning | The system’s honesty about limitations. Levels: info, warning, critical. Warnings accumulate; they never disappear. |
| validation tier | How strictly a field is required. Required (error), Expected (warning), Optional (silent). A gentle trellis, not a locked gate. |
| profile | Pre-configured scope settings (full, dev, test, neighborhood) affecting subsetting and hashing. |
| crosswalk | Mapping between different classification schemes. How we translate between communities’ data languages. |
| PCI | Park Cooling Intensity: a measure of a park’s thermal impact on its surroundings, from Xiao et al. (2023). |
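To make the envelope, provenance, and warning entries concrete, here is a minimal sketch in Python. It is not the project's actual schema: the class name, field names, the `warn` and `record` helpers, and the example path are all assumptions chosen for illustration. Only the four envelope components and the three warning levels come from the glossary.

```python
import json
from dataclasses import dataclass, field, asdict, replace
from datetime import datetime, timezone

LEVELS = ("info", "warning", "critical")  # levels named in the glossary


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope: data path, metadata, provenance, warnings."""
    data_path: str
    metadata: dict = field(default_factory=dict)
    provenance: tuple = ()
    warnings: tuple = ()

    def warn(self, level, message, source):
        """Return a new envelope with one more warning appended."""
        if level not in LEVELS:
            raise ValueError(f"unknown warning level: {level!r}")
        entry = {"level": level, "message": message, "source": source}
        # Append only: earlier warnings are carried forward, never dropped.
        return replace(self, warnings=self.warnings + (entry,))

    def record(self, primitive, params):
        """Return a new envelope with one more provenance entry appended."""
        step = {
            "primitive": primitive,
            "params": params,
            "ran_at": datetime.now(timezone.utc).isoformat(),
        }
        return replace(self, provenance=self.provenance + (step,))

    def to_json(self):
        """Serialize the envelope as a JSON document."""
        return json.dumps(asdict(self), indent=2)


# Hypothetical path; real plots may be laid out differently.
env = Envelope(data_path="plots/nyc/parks.geojson")
env = env.record("validate", {}).warn("info", "14 invalid geometries", "validate")
env = env.record("repair", {}).warn("warning", "geometries were modified", "repair")
print(env.to_json())
```

Making the envelope immutable is one way to enforce "warnings never disappear": a primitive can only return a longer history, never edit an earlier one.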

Creating rewildingCities


rewildingCities is an open-source platform facilitating localized explorations of socio-technical systems for climate resilience. The infrastructure itself is pedagogical, inviting collaboration from the entire community: children, elders, students, teachers, aspiring professionals, and established experts.

Local communities of citizen scientists are encouraged to conduct rigorous geospatial analyses that explore speculative futures for communal development in response to interconnected social and environmental crises. The aim is to model these possibilities and to identify priorities for broader interventions in municipal ecology and development.

We accept curated datasets from public portals, Google Earth Engine exports, academic institutions, and community-collected sources. Each participating community initializes and maintains a public manifest that feeds the analyses performed on its plot. The system is designed to help communities understand what their data can and cannot tell them, and to produce honest, contextualized analysis.
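As an illustration of how a manifest might be checked against the validation tiers from the glossary (Required raises an error, Expected raises a warning, Optional stays silent), here is a small sketch. The field names and their tier assignments are hypothetical, since the manifest schema is not shown here, and the manifest is assumed to have already been parsed from YAML into a dictionary.

```python
# Hypothetical tier assignments; the real manifest schema may differ.
FIELD_TIERS = {
    "city": "required",
    "crs": "required",
    "source_url": "expected",
    "collected_on": "expected",
    "quality_notes": "optional",
}


def validate_manifest(manifest):
    """Return (errors, warnings) for a manifest already parsed into a dict."""
    errors, warnings = [], []
    for name, tier in FIELD_TIERS.items():
        if manifest.get(name) not in (None, ""):
            continue  # field is present, nothing to report
        if tier == "required":
            errors.append(f"missing required field: {name}")
        elif tier == "expected":
            warnings.append(f"missing expected field: {name}")
        # Optional fields are skipped silently.
    return errors, warnings


manifest = {"city": "New York City", "crs": "EPSG:4326", "source_url": ""}
errors, warnings = validate_manifest(manifest)
print(errors)    # []
print(warnings)  # one entry each for source_url and collected_on
```

Only a missing Required field would block an analysis in this sketch; a missing Expected field lets the run proceed while leaving a visible note.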

Mitigating rising temperatures is one of the most salient problems from which to abstract a pilot prototype of this research engine. Urban heatwaves have quickly become the deadliest of natural disasters (Habeeb et al., 2015; He et al., 2022; Shafiei Shiva et al., 2019). Green infrastructure is the most effective, sustainable, and holistically beneficial way to create a legacy of resilience for our cities. However, the resources and tools needed to analyze the changes our urban landscapes require are often slow to filter through academic institutions.

This report documents the deployment and validation of a polyglot geospatial pipeline on AWS Batch, demonstrating the cloud-readiness of the rewildingCities platform architecture. The system processes urban park boundaries through a three-step soil layer pipeline (validate → repair → reproject), with full provenance tracking via the Envelope System.
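The shape of the validate → repair → reproject chain can be sketched as follows. This is a stand-in only: in the platform the primitives are typically R operations invoked from Python, whereas here each step is a plain Python function that passes a dictionary "envelope" forward. The function names, the `valid` flag on each feature, and the use of EPSG:2263 as the State Plane target are assumptions for illustration.

```python
from functools import reduce


def _advance(env, primitive, params, features=None, warnings=()):
    """Return a new envelope with one more provenance entry and any new warnings."""
    return {
        "features": env["features"] if features is None else features,
        "crs": params.get("to_crs", env["crs"]),
        "provenance": env["provenance"] + [{"primitive": primitive, "params": params}],
        "warnings": env["warnings"] + list(warnings),
    }


def validate(env):
    """Count invalid geometries without changing them."""
    invalid = [f["id"] for f in env["features"] if not f["valid"]]
    notes = [f"validate: {len(invalid)} invalid geometries"] if invalid else []
    return _advance(env, "validate", {}, warnings=notes)


def repair(env):
    """Mark every geometry valid; a real step would rebuild the geometry."""
    n = sum(1 for f in env["features"] if not f["valid"])
    fixed = [dict(f, valid=True) for f in env["features"]]
    notes = [f"repair: repaired {n} geometries"] if n else []
    return _advance(env, "repair", {}, features=fixed, warnings=notes)


def reproject(env, to_crs="EPSG:2263"):  # assumed target CRS
    """Record the CRS change; a real step would transform coordinates."""
    return _advance(env, "reproject", {"from_crs": env["crs"], "to_crs": to_crs})


def run_pipeline(env, steps=(validate, repair, reproject)):
    """Apply each step in order, threading the envelope through."""
    return reduce(lambda e, step: step(e), steps, env)


start = {
    "features": [{"id": 1, "valid": True}, {"id": 2, "valid": False}],
    "crs": "EPSG:4326",  # WGS84
    "provenance": [],
    "warnings": [],
}
result = run_pipeline(start)
print([p["primitive"] for p in result["provenance"]])  # ['validate', 'repair', 'reproject']
print(result["warnings"])
```

Because every step returns a new envelope built from the previous one, the provenance list doubles as an ordered log of what ran and with which parameters.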

Successful cloud executions processed 2,055 NYC park features in 31 seconds of compute time, detecting and repairing 14 invalid geometries (0.7%), transforming coordinates from WGS84 to NYC State Plane, and accumulating 12 warnings across the provenance chain. The same Docker container runs identically in local development and AWS Batch—validating the “local-first, cloud-ready” design principle.

Key findings: Fargate cold start adds ~99 seconds of queue time before execution begins, but once the container is running, the polyglot Python → R handoff performs efficiently. The Envelope System successfully accumulated warnings across primitive boundaries, demonstrating transparent provenance even in containerized cloud execution.
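The reported figures can be put in proportion with a few lines of arithmetic. This sketch uses only the numbers stated above and assumes that queue time and compute time do not overlap, so their sum approximates end-to-end wall-clock time.

```python
features = 2055   # NYC park features processed
invalid = 14      # invalid geometries detected and repaired
compute_s = 31    # seconds of compute time
queue_s = 99      # approximate Fargate queue time ("~99 seconds")

invalid_share = invalid / features   # fraction of features needing repair
throughput = features / compute_s    # features processed per compute second
total_s = queue_s + compute_s        # assumes queue and compute are sequential
queue_share = queue_s / total_s      # fraction of wall-clock time spent queued

print(f"invalid geometries: {invalid_share:.1%}")     # 0.7%
print(f"throughput: {throughput:.1f} features/s")     # 66.3 features/s
print(f"total: {total_s} s, queued {queue_share:.0%}")  # 130 s, queued 76%
```

Under that assumption, roughly three quarters of the elapsed time for a run of this size is cold-start overhead rather than analysis.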