We want to share a transparent account of a service disruption that affected some Elicit users between February 25 and 26, 2026. We take incidents like this seriously, and we owe you a clear explanation of what happened, what we've done about it, and how we're preventing it from happening again.
What happened
Between February 25 and 26, some Elicit sessions — primarily Systematic Literature Reviews and other long-running workflows — became stuck or failed to load, and retrying did not resolve the issue. In total, 1,747 sessions belonging to 1,416 users were affected.
The root cause was a capacity issue in our session storage infrastructure. As usage grew, the system reached a threshold where it began automatically removing data to free up space — including data from sessions that were still actively in use. This caused those sessions to become permanently unresponsive.
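To make the failure mode concrete, here is a minimal, illustrative sketch in Python. It is not our actual storage system, and all names in it are hypothetical; it simply shows how a bounded store that evicts its least recently used entries once it hits a capacity limit can silently drop data belonging to a session that a long-running workflow still needs.

```python
from collections import OrderedDict


class BoundedSessionStore:
    """Illustrative only: a tiny LRU-style store with a hard capacity limit.

    Once the limit is reached, each new write evicts the least recently used
    entry, even if that entry belongs to a session that is still in progress.
    """

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data: OrderedDict[str, dict] = OrderedDict()

    def put(self, session_id: str, state: dict) -> None:
        if session_id in self._data:
            self._data.move_to_end(session_id)
        self._data[session_id] = state
        while len(self._data) > self.max_entries:
            # The oldest entry is removed to free space; in the incident,
            # this kind of removal happened silently.
            self._data.popitem(last=False)

    def get(self, session_id: str) -> dict | None:
        state = self._data.get(session_id)
        if state is not None:
            self._data.move_to_end(session_id)
        return state


store = BoundedSessionStore(max_entries=2)
store.put("review-A", {"step": "screening"})  # a long-running review
store.put("chat-B", {"step": "done"})
store.put("chat-C", {"step": "done"})         # capacity exceeded: review-A is evicted
assert store.get("review-A") is None          # the active session's data is gone
```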
Timeline
February 17 (initial reports)
- We received reports of slowness in large Systematic Literature Reviews and opened an internal incident.
- We identified and deployed fixes for two contributing performance issues (a connection pooling bug and a CPU bottleneck in our backend).
- The system stabilized, and we continued monitoring.
February 25, afternoon (UTC)
- Error rates increased again. We reopened the incident and began investigating.
February 26
- Morning: Continued investigation revealed that the issue was more serious than initially understood: sessions were not merely slow; their data was being silently lost due to storage capacity pressure. We escalated the incident to our highest active severity level and pulled in additional engineers.
- Late morning: We identified the proximate cause (a backend feature creating excessive load on the storage system) and disabled that feature.
- Afternoon: We identified the root cause (storage capacity exhaustion) and applied an infrastructure upgrade, doubling capacity. We also began restoring affected sessions.
- Evening: The infrastructure upgrade completed. Data loss stopped immediately.
February 26, evening onward (remediation)
- We restored 593 sessions from automatic backups. The remaining affected sessions had already been superseded by newer versions or did not require restoration.
- We began a broader cleanup of our storage system to prevent recurrence.
What we're doing about it