This document describes the operational practices and support commitments for the hosted Zero service operated by Rocicorp, known as Cloud Zero.
Scope
This SLA applies to Cloud Zero when run under the Professional or Managed plans.
Monitoring
Cloud Zero is monitored 24/7 using automated monitoring and alerting systems. Alerts notify the on-call engineer when service health or availability degrades.
Service availability is evaluated using multiple operational signals. Any of the following conditions triggers an alert and pages the on-call engineer:
- Replication lag: mean replication lag exceeding 2.5 seconds over any rolling 30-second window.
- Internal errors: the zero-cache service reporting internal errors while serving existing or new clients.
- Service reachability: the zero-cache service failing health checks or becoming unreachable.
Additional signals are also monitored to detect degraded service or abnormal behavior.
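As an illustration of the replication lag condition above, the following is a minimal sketch of a rolling-window mean check. It is not Cloud Zero's actual monitoring code: the `LagSample` shape, the `ReplicationLagMonitor` class, and the assumption that samples arrive in timestamp order are all hypothetical. Only the 2.5-second threshold and the 30-second window come from this document.

```typescript
// Hypothetical sketch: names and structure are illustrative only.
interface LagSample {
  timestampMs: number; // time the sample was taken, in milliseconds
  lagMs: number;       // replication lag observed at that time
}

const WINDOW_MS = 30_000;   // rolling 30-second window (from this document)
const THRESHOLD_MS = 2_500; // 2.5-second mean lag threshold (from this document)

class ReplicationLagMonitor {
  private samples: LagSample[] = [];

  // Records a sample and returns true when the mean lag over the
  // trailing window exceeds the threshold. Assumes samples are
  // recorded in non-decreasing timestamp order.
  record(sample: LagSample): boolean {
    this.samples.push(sample);
    const cutoff = sample.timestampMs - WINDOW_MS;
    // Keep only samples that are still inside the trailing window.
    this.samples = this.samples.filter((s) => s.timestampMs > cutoff);
    const total = this.samples.reduce((sum, s) => sum + s.lagMs, 0);
    // The window always contains at least the sample just recorded.
    const mean = total / this.samples.length;
    return mean > THRESHOLD_MS;
  }
}
```

In this sketch, a single slow sample does not trigger an alert on its own unless it pushes the window's mean above the threshold, which is the practical difference between a mean-based condition and a per-sample one.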
Incident Response
When monitoring systems trigger a production alert:
- An engineer will be paged immediately.
- Rocicorp will acknowledge the incident and begin investigation within 30 minutes.
- In practice, incidents are typically acknowledged and investigated within minutes. If the responding engineer cannot resolve the issue quickly, the incident will be escalated to additional engineers.
Deployments
To minimize risk to production systems:
- Major changes will not be deployed during working hours (EST–PST). Minor changes targeted at fixing active issues will be deployed as soon as possible.
- Changes are typically rolled out to other customers or internal environments before being deployed to your environment.
Support