Cloud Zero Professional SLA

This document describes the operational practices and support commitments for the hosted Zero service operated by Rocicorp, known as Cloud Zero.

Scope

This SLA applies to Cloud Zero when run under the Professional or Managed plans.

Monitoring

Cloud Zero is monitored 24/7 by automated alerting systems, which notify the on-call engineer when service health or availability degrades.

Service availability is evaluated using multiple operational signals. A violation of any of the following conditions triggers an alert and pages the on-call engineer:

Replication lag: mean replication lag exceeding 2.5 seconds over any rolling 30-second window.
Internal errors: the zero-cache service reporting internal errors while serving existing or new clients.
Service reachability: the zero-cache service failing health checks or becoming unreachable.
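
To make the replication lag condition concrete, the following is a minimal sketch of a rolling-window mean check. Only the 2.5-second threshold and the 30-second window come from this SLA. The class name, the sample format (timestamp and lag in seconds), and the eviction rule are illustrative assumptions, not a description of Rocicorp's actual monitoring.

```python
from collections import deque

LAG_THRESHOLD_SECONDS = 2.5  # from the SLA
WINDOW_SECONDS = 30.0        # from the SLA


class ReplicationLagMonitor:
    """Hypothetical monitor: flags when mean lag over the rolling window exceeds the threshold."""

    def __init__(self, threshold=LAG_THRESHOLD_SECONDS, window=WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self._samples = deque()  # (timestamp_seconds, lag_seconds) pairs, oldest first

    def record(self, timestamp, lag_seconds):
        """Add a sample and return True if the windowed mean now exceeds the threshold."""
        self._samples.append((timestamp, lag_seconds))
        # Evict samples that fall outside the window ending at the newest timestamp.
        while self._samples[0][0] <= timestamp - self.window:
            self._samples.popleft()
        mean_lag = sum(lag for _, lag in self._samples) / len(self._samples)
        return mean_lag > self.threshold
```

Under these assumptions, a single slow sample does not page anyone unless it pushes the mean over the window above 2.5 seconds, and a breach clears once the slow samples age out of the window.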

Additional signals are also monitored to detect degraded service or abnormal behavior.

Incident Response

When monitoring systems trigger a production alert:

An engineer will be paged immediately.
Rocicorp will acknowledge the incident and begin investigation within 30 minutes.

In practice, incidents are typically acknowledged and investigated within minutes. If the responding engineer cannot resolve the issue quickly, the incident will be escalated to additional engineers.
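
The 30-minute acknowledgment commitment can be checked with simple timestamp arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the clock starts when the alert fires, which this SLA does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

ACK_TARGET = timedelta(minutes=30)  # from the SLA


def ack_deadline(alert_at):
    """Latest acknowledgment time, assuming the clock starts when the alert fires."""
    return alert_at + ACK_TARGET


def ack_within_target(alert_at, acknowledged_at):
    """True if the incident was acknowledged at or before the deadline."""
    return acknowledged_at <= ack_deadline(alert_at)
```

For example, an alert fired at 12:00 UTC would need to be acknowledged by 12:30 UTC to meet the commitment.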

Deployments

To minimize risk to production systems:

Major changes will not be deployed during working hours (EST–PST). Minor changes that fix active issues will be deployed as soon as possible.
Changes are typically rolled out to other customers or internal environments before being deployed to your environment.
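
One possible reading of the working-hours rule is sketched below as a deployment gate. This SLA does not define the exact hours, so the 09:00 start, the 17:00 end, the weekday-only restriction, and the use of fixed EST and PST offsets (ignoring daylight saving time) are all assumptions made for illustration. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed standard-time offsets; daylight saving time is ignored in this sketch.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")

# Assumed values: the SLA does not state when working hours start or end.
WORKDAY_START = time(9, 0)   # 09:00 in EST
WORKDAY_END = time(17, 0)    # 17:00 in PST


def in_working_hours(moment):
    """True from 09:00 EST until 17:00 PST on weekdays. Expects a timezone-aware datetime."""
    est_local = moment.astimezone(EST)
    pst_local = moment.astimezone(PST)
    is_weekday = pst_local.weekday() < 5  # Monday=0 ... Friday=4
    return (
        is_weekday
        and est_local.time() >= WORKDAY_START
        and pst_local.time() < WORKDAY_END
    )


def may_deploy(moment, is_major):
    """Major changes wait for off-hours; minor fixes for active issues are never blocked."""
    return not (is_major and in_working_hours(moment))
```

Under these assumptions the blocked window runs from 14:00 UTC to 01:00 UTC the following day, Monday through Friday.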
