about 2 years ago - ZOS_MattFiror
I think the performance problems on the PC North American megaserver since Update 33 launched deserve some explanation. This post outlines what has been going on over the last week or so for our North American PC players.

First, last year (which seems like decades ago) we announced a plan to increase ESO’s stability and performance, and we have been diligently performing tasks behind the scenes with every update to implement it. One of the larger items on this list was "Database Sharding", which is a simple concept: take our giant player database (DB) and separate it into two sections for "current characters" and "older characters" so the entire DB doesn't have to be queried when a player logs in. Over time, our character DB (one per Megaserver) has been growing and about two years ago, its sheer size became a bottleneck. This is why the "requesting character load" part of the login process sometimes takes a lot longer than it should.

The DB Sharding process separates our character databases into a "live" DB and a "cold" DB; all accounts that have logged in over the past year are in the Live DB and older ones are in the Cold DB. The plan, once everything is complete, is that active accounts will pull their characters from the smaller Live DB on login, greatly decreasing login time. Older characters will pull from the Cold DB on login, which will take longer, but once an account logs in, its characters are moved over to the Live DB for faster access after the initial login. This character record separation happens the first time an account logs in after sharding has been enabled for that megaserver. The first login may take longer than normal while the copying happens, but every login after that should be much faster.
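To make the live/cold lookup described above concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not our actual server code: the class name, the method name, and the use of in-memory dictionaries in place of real databases are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedCharacterStore:
    """Toy model of a live/cold character store (all names are hypothetical)."""
    live: dict = field(default_factory=dict)  # accounts seen within the past year
    cold: dict = field(default_factory=dict)  # older accounts

    def load_characters(self, account_id):
        # Fast path: active accounts are served from the smaller live store.
        if account_id in self.live:
            return self.live[account_id], "live"
        # Slow path: first login after sharding. Move the records to the
        # live store so that later logins take the fast path.
        if account_id in self.cold:
            characters = self.cold.pop(account_id)
            self.live[account_id] = characters
            return characters, "cold"
        # Unknown account: nothing to load.
        return [], "missing"


store = ShardedCharacterStore(
    live={"active_account": ["Templar"]},
    cold={"returning_account": ["Nightblade", "Sorcerer"]},
)
first = store.load_characters("returning_account")   # served from cold, then moved
second = store.load_characters("returning_account")  # now served from live
```

In this sketch the first login for a returning account pays the cost of the cold lookup and the move, and every later login for that account hits the live store.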

The good news here is that we have already done this for most of the live megaservers over the last couple of months; all console megaservers have been upgraded already and login times have greatly decreased.

With that background information, you can now start to understand what happened since Update 33 launched last Monday. The PC character database (especially the North American megaserver) is far, far larger than console as ESO had a big launch year in 2014 (pre-console launch) and all those accounts are still there. In addition, all the Beta accounts (and characters) are still there as well.

So, Update 33 launched last Monday and the plan was to wait until the dust settled, then actually enable sharding on PC NA. On launch day, we tracked the usual in-game bugs and issues that tend to crop up and began work to address them. And there were indeed some problems. There were reports of in-game loading screen timeouts and that the Activity Finder was bogged down. Our first big failure was that we chalked these reports up to normal server startup issues after a big update. We later increased our real-time monitoring, which showed the Activity Finder and other processes were running a bit "hot" – they would spike a bit, then return to normal. We made adjustments both outside of and during primetime hours to try to alleviate queue issues, but this made it difficult to pinpoint whether our adjustments were working or primetime population on the server was simply easing. So we – and this was our second large error – decided to move ahead with enabling DB Sharding on the PC NA megaserver without addressing the Activity Finder issues.

And all of you who play on the PC NA megaserver know what happened once we flipped the DB Sharding switch: the entire server slowed down even more during primetime. The DB processes got backed up, which meant that all transfers between processes (i.e. zoning) were even slower, as were logouts (where your character's DB record is updated). The Activity Finder (which accesses your character records) became so bogged down it essentially ceased to function at all.

We had done the math and designed the DB Sharding system to work within normal server performance guidelines, so when we started addressing the slowdown issues, we naturally assumed that we had some bad calculations and started there. We made some changes (hence the downtime on Monday earlier this week) but they didn't help at all; performance was still terrible Monday night. Complicating matters, we could only troubleshoot on the live server, and only during primetime, because these problems cropped up mostly when the server was under moderate load. Since the system ran slowly again Monday night even after our changes, we knew it was something else.

On Tuesday, with the understanding that the problem was probably not connected to DB Sharding at all, we traced every log we could find to figure out where the bottleneck was and we finally found it – the issue was actually caused by a bad (as in failing) network port that was unable to process as much bandwidth as it was configured for. It wasn't a software problem at all; it was a hardware failure that, in essence, slowed down the entire megaserver. Tuesday’s maintenance was to take that device out of service and reconfigure a replacement, and once that was up, everything returned to normal and the DB Sharding process ran as intended: behind the scenes and with no player impact.

Obviously, there are no guarantees, but we do believe we have gotten to the root of this issue. The TL;DR is that it wasn't related to Update 33, Account Wide Achievements or DB Sharding at all, even though they all happened around the same time and we spent too much time investigating a red herring because of it.

I know this hasn't been an awesome time for any of you on PC. Many of you were unable to log in to play and take advantage of the Explorer's Celebration as you otherwise might have. You may have lost time and progress, and to acknowledge that, we are going to be giving out five 150% Experience Scrolls on the first day of April through the Daily Login Rewards calendar and will be tripling the number of Weekly Endeavor Seals the week of 4/4 for players on all ESO platforms.

We have so much to look forward to in April with Jester's Festival, the Anniversary Jubilee, and even more we can't wait to share with you. We hope you'll use these Experience Scrolls during the upcoming 100% bonus XP events and catch up to where you might have been, had the game been running as intended.

Thanks so much for bearing with us and for reading this long explanation. Given the circumstances, I think full disclosure was warranted.