Degraded performance

Incident Report for Magic

Postmortem

What happened

A spike in traffic occurred, and our API pods did not scale out quickly enough to handle it.

The following were unavailable or partially degraded:

  • API Login
  • Logging into the Magic Dashboard

How we responded

During the outage, our engineers quickly convened a virtual war room to triage the situation and identify the fastest, most effective fix. After investigating the issue, we began increasing our overall server fleet size, which stabilized the service under the elevated traffic.

We also found that our scale-up policies had room for improvement. Our team immediately modified them to better accommodate high traffic spikes, and we observed the improved behavior during later traffic spikes.
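
To illustrate why a scale-up policy matters during a spike, here is a minimal sketch of a proportional scale-out rule, the general form used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current * metric / target)). This report does not describe our actual autoscaler configuration; the function name, the per-step cap, and all numbers below are hypothetical.

```python
import math
from typing import Optional


def desired_replicas(current: int, metric: float, target: float,
                     max_step: Optional[int] = None) -> int:
    """Proportional scale-out rule: ceil(current * metric / target).

    max_step is a hypothetical cap on pods added per evaluation,
    standing in for a conservative scale-up policy.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    wanted = math.ceil(current * metric / target)
    if max_step is not None:
        # A conservative policy limits how far one evaluation can scale.
        wanted = min(wanted, current + max_step)
    return max(wanted, 1)


# Hypothetical spike: 10 pods running at 240% CPU against a 60% target.
uncapped = desired_replicas(10, metric=240.0, target=60.0)
capped = desired_replicas(10, metric=240.0, target=60.0, max_step=4)
```

In this made-up scenario the uncapped rule asks for 40 pods in one evaluation, while the capped rule reaches only 14, so several evaluation cycles pass before capacity catches up with load. Loosening a cap like this is one way a scale-up policy can respond faster to sudden spikes.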

Posted Aug 11, 2022 - 11:55 PDT

Resolved

This incident has been resolved.
Posted Jul 24, 2022 - 20:16 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 24, 2022 - 19:25 PDT

Investigating

You may temporarily experience increased latency and error rates. We are actively investigating the degraded performance issue.
Posted Jul 24, 2022 - 19:25 PDT
This incident affected: Authentication, API, and Dashboard.