For my (fellow) nerds, I wanted to address two main things.
1. Defining Plat+ (Classification Accuracy Shrinking Sample Size)
TL;DR 1: As outlined below, we currently use a classification system (A) that we believe prioritizes accuracy, at the cost of slightly shrinking the sample pool for a given class (ex: Plat+). The old system (B), which most sites use, inflates both win rates and sample sizes. Plain and simple, it's not acceptable to use a system that says a class (Plat+) has a 55% win rate on average. That would incorrectly imply that most builds have a positive win rate in Plat+, which skews the data far more than a smaller sample does. Keep in mind that sample size is not the be-all and end-all of accuracy.
Let's use this sample game:
Team 1 (W): Plat 4, Plat 4, Gold 1, Gold 1, Gold 1 | Team 2 (L): Gold 1, Gold 1, Gold 1, Gold 1, Gold 1
As a simple example, here are 2 classification strategies (among many). We currently use Class A.
Classification A (Average):
Approach: Average the MMRs of the players in the game and classify the whole game as the averaged MMR (here: Gold).
Result: This game counts as 5 Gold Wins and 5 Gold Losses.
Classification B (Individualized):
Approach: Each player's W/L counts towards their actual rank.
Result: This game counts as 2 Plat Wins and 3 Gold Wins and 5 Gold Losses.
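To make the two results concrete, here is a minimal sketch that tallies the sample game both ways. It is an illustration, not our production code: the numeric tier scale (`TIER_SCORE`) is a hypothetical stand-in for real MMR values, and the function names are made up for this example.

```python
from collections import Counter

# Sample game from above: tier of each player on the winning and losing team.
TEAM_1 = ["Plat", "Plat", "Gold", "Gold", "Gold"]  # winners
TEAM_2 = ["Gold", "Gold", "Gold", "Gold", "Gold"]  # losers

# Assumption: a simple ordinal scale standing in for real MMR numbers.
TIER_SCORE = {"Gold": 3, "Plat": 4}
SCORE_TIER = {score: tier for tier, score in TIER_SCORE.items()}


def classify_average(winners, losers):
    """Classification A: the whole game counts toward the averaged tier."""
    players = winners + losers
    avg = sum(TIER_SCORE[t] for t in players) / len(players)
    tier = SCORE_TIER[round(avg)]
    return Counter({tier: len(winners)}), Counter({tier: len(losers)})


def classify_individual(winners, losers):
    """Classification B: each player's result counts toward their own tier."""
    return Counter(winners), Counter(losers)


def win_rates(wins, losses):
    """Win rate per tier from win and loss tallies."""
    tiers = set(wins) | set(losses)
    return {t: wins[t] / (wins[t] + losses[t]) for t in tiers}


wins_a, losses_a = classify_average(TEAM_1, TEAM_2)
wins_b, losses_b = classify_individual(TEAM_1, TEAM_2)

print("A:", dict(wins_a), dict(losses_a), win_rates(wins_a, losses_a))
print("B:", dict(wins_b), dict(losses_b), win_rates(wins_b, losses_b))
```

Under A, Gold comes out at exactly 50%. Under B, the same single game gives Plat a 100% win rate and Gold a 37.5% win rate.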
Class A Pros:
+Will have 50% wins and 50% losses
+The numbers for Gold will add up to 100%
Class A Cons:
-Fewer champions analyzed. In the example above, we lose 2 Plat data points, which are counted as Gold instead.
-Less data for higher ranks. All-Challenger games are rare, so games mixing Challenger and Masters players will be classified as Masters games.
Class B Pros:
+More champions analyzed
+More data for higher ranks
Class B Cons:
-As MMR increases, win rates will be inflated as Plat players are more likely to win than Gold players, etc.
-Matches in a given MMR will be extremely unlikely to add up to 100%.
Conclusion: System B results in Plat+ players, as a class, having an average win rate of 55%+. If the sample game were the only game in our database, our site would say that Plat players have a 100% win rate and Gold players have a losing one (3 wins, 5 losses: 37.5%). You can see how this is problematic and inflates results across millions of games. A sample needs to be sufficiently large, but size is not the sole determinant of an accurate interpretation of the results.
2. Outages In Last Few Months Causing Small Data
Here are some of the issues that caused U.GG data outages in the past few months:
Patch 10.1: The season ID from the Riot API was 13, the same season ID used for season 9. We had assumed it would be 14, and that assumption crashed our systems.
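One way to avoid this kind of crash is to treat the season ID as advisory rather than asserting on it. The sketch below is hypothetical and not our actual crawler code: the function name, the `EXPECTED_SEASON_ID` table, and the example version string are all assumptions made for illustration. It derives the season from the patch version and only logs a warning when the ID is not what we expected.

```python
import logging

logger = logging.getLogger("crawler")

# Assumption: what we expected the API to return for each major patch number.
EXPECTED_SEASON_ID = {10: 14}


def season_from_match(game_version: str, season_id: int) -> int:
    """Return the season as the major patch number, warning on an unexpected ID."""
    major = int(game_version.split(".")[0])
    expected = EXPECTED_SEASON_ID.get(major)
    if expected is not None and season_id != expected:
        # Log and keep going instead of crashing the whole pipeline.
        logger.warning(
            "Unexpected season ID %s for patch %s (expected %s)",
            season_id, game_version, expected,
        )
    return major


# Illustrative version string; the ID 13 is the value described above.
print(season_from_match("10.1.1", 13))
```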
Patch 10.4: Rewrote our match crawler to more efficiently support profiles and tier lists. Previously we were seeding with a group of players and searching through their matches to add other players to our queue. Now we crawl through the leaderboard to grab every match. We ran into issues during the transition.
Patch 10.8: Adjusted our code base because one of our databases, Elasticsearch, changed many defaults in a recent update.
The default shard count for an index went from 10 to 2, which caused I/O issues with our database.
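A common guard against changing defaults is to pin the settings you depend on when creating an index. The sketch below only builds the request body for Elasticsearch's create-index API and does not talk to a cluster. The helper name is hypothetical, the shard count of 10 is simply the old default mentioned above, and the replica count is an assumed placeholder.

```python
import json


def index_body(shards: int = 10, replicas: int = 1) -> dict:
    """Build a create-index body with shard and replica counts set explicitly."""
    return {
        "settings": {
            "index": {
                # Pinned explicitly so an upgrade cannot silently change them.
                "number_of_shards": shards,
                "number_of_replicas": replicas,  # assumed value for illustration
            }
        }
    }


print(json.dumps(index_body(), indent=2))
```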
I would like to reiterate that these issues are not an excuse. Because of our position in the League of Legends ecosystem, our users deserve better. We were too reckless in deploying our changes and have caused headaches for a lot of you. We're working on refining our process to avoid these types of outages. We will make U.GG accuracy and reliability a top priority.