We want to share a transparent account of a service disruption that affected some Elicit users between February 25 and 26, 2026. We take incidents like this seriously, and we owe you a clear explanation of what happened, what we've done about it, and how we're preventing it from happening again.
What happened
Between February 25 and 26, some Elicit sessions — primarily Systematic Literature Reviews and other long-running workflows — became stuck or failed to load, and retrying did not resolve the issue. In total, 1,747 sessions belonging to 1,416 users were affected.
The root cause was a capacity issue in our session storage infrastructure. As usage grew, the system reached a threshold where it began automatically removing data to free up space — including data from sessions that were still actively in use. This caused those sessions to become permanently unresponsive.
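To make the failure mode concrete, here is a minimal, illustrative sketch in Python. It is not our actual storage system, and all names in it are hypothetical; it simply shows how a bounded store that evicts its least recently used entries once it hits a capacity limit can silently drop data belonging to a session that a long-running workflow still needs.

```python
from collections import OrderedDict


class BoundedSessionStore:
    """Illustrative only: a tiny LRU-style store with a hard capacity limit.

    Once the limit is reached, each new write evicts the least recently used
    entry, even if that entry belongs to a session that is still in progress.
    """

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data: OrderedDict[str, dict] = OrderedDict()

    def put(self, session_id: str, state: dict) -> None:
        if session_id in self._data:
            self._data.move_to_end(session_id)
        self._data[session_id] = state
        while len(self._data) > self.max_entries:
            # The oldest entry is removed to free space; in the incident,
            # this kind of removal happened silently.
            self._data.popitem(last=False)

    def get(self, session_id: str) -> dict | None:
        state = self._data.get(session_id)
        if state is not None:
            self._data.move_to_end(session_id)
        return state


store = BoundedSessionStore(max_entries=2)
store.put("review-A", {"step": "screening"})  # a long-running review
store.put("chat-B", {"step": "done"})
store.put("chat-C", {"step": "done"})         # capacity exceeded: review-A is evicted
assert store.get("review-A") is None          # the active session's data is gone
```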
Timeline
February 17 (initial reports)
- We received reports of slowness in large Systematic Literature Reviews and opened an internal incident.
- We identified and deployed fixes for two contributing performance issues (a connection pooling bug and a CPU bottleneck in our backend).
- The system stabilized, and we continued monitoring.
February 25, afternoon (UTC)
- Error rates increased again. We reopened the incident and began investigating.
February 26
- Morning: Continued investigation revealed that the issue was more serious than initially understood: sessions were not merely slow; their data was being silently lost due to storage capacity pressure. We escalated the incident to our highest active severity level and pulled in additional engineers.
- Late morning: We identified the proximate cause (a backend feature creating excessive load on the storage system) and disabled that feature.
- Afternoon: We identified the root cause (storage capacity exhaustion) and applied an infrastructure upgrade, doubling capacity. We also began restoring affected sessions.
- Evening: The infrastructure upgrade completed. Data loss stopped immediately.
February 26, evening onward (remediation)
- We restored 593 sessions from automatic backups. The remaining affected sessions had already been superseded by newer versions or did not require restoration.
- We began a broader cleanup of our storage system to prevent recurrence.
What we're doing about it