Summary

From approximately 6:45am PDT to 9:45am PDT on 05/27/25, Opal’s web platform experienced a partial outage causing some users to be stuck in an indefinite loading state.

Impact

Some Opal users (fewer than 100), in both cloud and non-airgapped on-prem environments, were unable to access Opal web. Non-web surfaces (CLI, Slack, API) were unaffected and remained available.

Severity: SEV 1

This incident was classified as a SEV 1 in accordance with our SEV guidelines. Although not all users were affected, those who were could not accomplish necessary tasks in Opal and remained blocked for the duration of the outage.

Root cause analysis

The root cause was an outage at a downstream vendor, LaunchDarkly, which sat in a critical loading path. Opal uses LaunchDarkly to roll out new features, gate access, and enable emergency killswitches. LaunchDarkly feature flags are among the first things loaded when Opal initializes, and the outage there caused the entire web application to fail to load. See here for LaunchDarkly’s incident report.

Actions taken

Removed LaunchDarkly from the critical path and added a mode for Opal to load without feature flags if LaunchDarkly servers go down for any reason. Some features may be down during these periods, but Opal will generally be usable for core functionality.
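The fallback mode described above can be illustrated with a minimal sketch. This is not Opal's actual implementation and it does not use the LaunchDarkly SDK: `load_flags`, `fetch_flags`, `FALLBACK_FLAGS`, and the flag names are all hypothetical, chosen only to show the general pattern of bounding the flag fetch with a timeout and falling back to static defaults instead of blocking startup.

```python
import concurrent.futures
from typing import Callable, Dict, Tuple

# Hypothetical static defaults used when the flag service is unreachable.
# Assumption: new features stay hidden and kill switches stay off.
FALLBACK_FLAGS: Dict[str, bool] = {
    "new-dashboard": False,
    "emergency-killswitch": False,
}


def load_flags(
    fetch_flags: Callable[[], Dict[str, bool]],
    timeout_seconds: float = 2.0,
) -> Tuple[Dict[str, bool], bool]:
    """Return (flags, degraded) without raising or waiting past the timeout.

    fetch_flags is a hypothetical callable standing in for the real flag
    client. If it fails or is too slow, the static defaults are returned
    and degraded is True, so the app can keep loading in a reduced mode.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch_flags)
    try:
        remote = future.result(timeout=timeout_seconds)
        # Remote values override the defaults; flags missing remotely keep them.
        return {**FALLBACK_FLAGS, **remote}, False
    except Exception:
        # Covers timeouts, network errors, and malformed responses alike.
        return dict(FALLBACK_FLAGS), True
    finally:
        # Do not wait for a hung fetch; startup continues regardless.
        executor.shutdown(wait=False)
```

In this sketch the caller would use the `degraded` value to decide whether to hide flag-dependent features while keeping core functionality available.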

[Attachment: Screen Recording 2025-05-28 at 1.31.36 PM.mov]

Timeline

6:45am PDT: Received the first alert from a customer about Opal screens not loading, followed by reports from 3 more customers within the next 15 minutes.
6:56am PDT: Opal engineers started investigating the issue.
7:08am PDT: Opal engineers identified the sub-processor (LaunchDarkly) outage as the cause, then continued to monitor its status and determine the scope of the Opal outage.
9:44am PDT: LaunchDarkly services fully recovered, and all Opal flows were restored for all customers.

Next steps

Immediately

Within a day of the incident, we made the change described in “Actions taken”: an authenticated Opal app no longer depends on the LaunchDarkly (LD) client loading successfully, which removes it from the critical path. If LaunchDarkly has another outage, Opal will still function using fallback values for the flags. This change is to be merged and take effect immediately for cloud and on-prem instances.

Longer Term Q2+

Opal engineers will continue to investigate more robust fail-safe setups so that feature flags have sensible fallback values in the case of a sub-processor outage.