Original Post

Seriously, ALL of us who play the game rely on the API to play it. When it's down, we have to fly to the tower, then switch characters and fly back to the tower before we can even play with the weapons we want to use, and that's a HUGE DOWNER!

It's not like the API has been down just once; it's been down A LOT since Forsaken came out.

I get it, it's a complex system whose architecture is just as complex. You're running a CloudFlare worldwide content delivery front end on top of an Amazon Web Services platform of API endpoints, and people expect that to be operational 24x7x365. But outages really are unacceptable when there's no outage at your service provider. Having issues with load? Then just set up auto-scaling of your EC2 instances. Having issues with the API being down? Then set up some sort of monitoring solution like Nagios to page someone on call when the API is down. Set up an on-call rotation, and when the API is down on a Sunday, have someone log in from home and fix it. Literally every other company does this; much smaller, much less profitable companies with only a one- or two-person IT infrastructure staff do this. Yes, your systems are very large and scaled across the globe, but you can do it. You have the resources to boost the API's durability and uptime, so please give it the buff it sorely needs.
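To make the monitoring suggestion concrete, here is a minimal sketch of the "check the API, page someone if it stays down" idea. It is not Nagios and not how any real service is monitored: the health URL, the `http_probe` helper, the `DowntimeMonitor` class, and the three-failure threshold are all hypothetical names and values picked for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical endpoint, for illustration only; not a real API URL.
HEALTH_URL = "https://api.example.com/health"


def http_probe(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


class DowntimeMonitor:
    """Pages once after `threshold` consecutive failed probes (assumed policy)."""

    def __init__(
        self,
        probe: Callable[[], bool],
        page: Callable[[str], None],
        threshold: int = 3,
    ) -> None:
        self.probe = probe          # returns True when the API looks healthy
        self.page = page            # stand-in for a real paging integration
        self.threshold = threshold
        self.failures = 0
        self.paged = False

    def check(self) -> bool:
        """Run one probe, update the failure count, and page if needed."""
        if self.probe():
            # Recovery resets the state so a later outage pages again.
            self.failures = 0
            self.paged = False
            return True
        self.failures += 1
        if self.failures >= self.threshold and not self.paged:
            self.page(f"API down: {self.failures} consecutive failed checks")
            self.paged = True
        return False
```

In practice something like `DowntimeMonitor(http_probe, send_page)` would be called on a timer, where `send_page` is whatever hook the on-call tooling provides (also an assumption here). Requiring several consecutive failures before paging is a design choice to avoid waking someone up over a single dropped request.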

about 6 years ago - /u/Achronos

IT buzzword bingo aside, the problem isn't a load issue (high concurrency does make it worse, but it isn't the cause). There is a low-level bug in the part of the service that it uses to communicate with the game services. We do know what is causing it, but the fix is somewhat delicate and can't just be slapped in. We are working on it, though, and hope to fix it soon.