Original Post

Recently, Amazon and Riot had a hackathon that asked people to globally rank teams. Unfortunately I don't know Python and I have no idea how to use AWS, so I wasn't able to submit anything.

But I am doing a PhD in biostats, so I used R for data management and analytics and got some findings I wanted to share (that way my efforts weren't in vain).

For context: The data I used is from Oracle's Elixir. Similar to the hackathon, I'm only going to use 2020-2023 data.
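For anyone who wants to follow along in R, the data step is basically just reading the Oracle's Elixir export and filtering by year (the file and column names below are placeholders, not the exact schema):

```r
library(readr)
library(dplyr)

# Placeholder file name; use whatever export you download from Oracle's Elixir
matches <- read_csv("oracles_elixir_matches.csv") |>
  mutate(year = as.integer(format(as.Date(date), "%Y"))) |>
  filter(year >= 2020, year <= 2023)
```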

Application of Elo

I decided to create a modified Elo formula to rank teams. Quick recap: the generic Elo formula gives the expected probability of player A beating player B as E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and the ratings are then updated with R_A' = R_A + K(S_A - E_A), where S_A is the actual result (1 for a win, 0 for a loss) and K controls how much a single match can move a rating.
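In R, the generic version is only a few lines (K = 32 below is the conventional default, not necessarily the value I ended up using):

```r
# Expected probability that A beats B under standard Elo
expected_score <- function(r_a, r_b) {
  1 / (1 + 10^((r_b - r_a) / 400))
}

# Standard rating update: s_a is the actual result (1 = win, 0 = loss)
update_rating <- function(r_a, r_b, s_a, k = 32) {
  r_a + k * (s_a - expected_score(r_a, r_b))
}
```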

My modified Elo formula uses the difference in features (kills, assists, etc.) between blue side and red side, denoted by delta. The idea behind delta is that it captures how close or how one-sided the game was, and rewards/penalizes teams accordingly. The updating formula then weights each update by region, split, tournament, and time of match (earlier matches have lower weight, to reflect patches).
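As a rough sketch of what that looks like in R (the delta scaling and the weight values here are simplified stand-ins, not the precise forms I fitted):

```r
# Modified update: delta measures how one-sided the game was, and the
# match weight w combines region, split, tournament, and recency.
# The (1 + abs(delta)) scaling is illustrative, not the exact form I used.
modified_update <- function(r_a, r_b, s_a, delta,
                            w_region, w_split, w_tourney, w_time, k = 32) {
  e_a <- 1 / (1 + 10^((r_b - r_a) / 400))
  w <- w_region * w_split * w_tourney * w_time  # overall match weight
  r_a + k * w * (s_a - e_a) * (1 + abs(delta))  # bigger blowout, bigger update
}
```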

First, I naively just fitted the formula at the team level, and the results came out like this:

| Rank | Team | Elo |
|------|------|--------|
| 1 | DWG KIA | 1668.87 |
| 2 | T1 | 1656.81 |
| 3 | Gen.G | 1642.70 |
| 4 | Royal Never Give Up | 1620.54 |
| 5 | JD Gaming | 1613.45 |
| 6 | EDward Gaming | 1608.96 |
| 7 | Top Esports | 1605.18 |
| 8 | G2 Esports | 1584.90 |
| 9 | PSG Talon | 1581.88 |
| 10 | GAM Esports | 1580.11 |

I think we can all agree that this is not accurate. I suspect DWG KIA is so high because they won Worlds in 2020, which pushed their match weights up.

We also know that we should really consider the players on a team to determine its Elo, so next I applied the same formula at the player level, by role. Here delta is the difference in features within the role (mid vs. mid, and so on). In theory, if a player performed really well compared to his counterpart but his team lost the match, he wouldn't be penalized as much. The team Elo is then calculated by naively averaging all five player Elos (you could argue that teams emphasize different players in how they play, but I kept it simple).
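The aggregation step is then just an average (the data frame and column names below are placeholders for however you store the player ratings):

```r
library(dplyr)

# player_elo: one row per player, with columns team, role, elo
team_elo <- player_elo |>
  group_by(team) |>
  summarise(elo = mean(elo), .groups = "drop") |>
  arrange(desc(elo))
```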

Here are the results for the top 10 players in each role and their Elo:

| Rank | Top | Jungle | Mid | ADC | Support |
|------|-----|--------|-----|-----|---------|
| 1 | 369 (1724.46) | Oner (1721.28) | Faker (1746.72) | Ruler (1718.66) | Keria (1728.66) |
| 2 | Zeus (1698.00) | Kanavi (1709.42) | Chovy (1719.33) | Gumayusi (1706.42) | Lehends (1699.42) |
| 3 | Doran (1680.12) | Peanut (1704.80) | knight (1710.53) | Hope (1673.56) | BeryL (1695.11) |
| 4 | Bin (1639.76) | Canyon (1672.95) | Yagao (1686.82) | JackeyLove (1642.44) | Meiko (1640.20) |
| 5 | Canna (1605.51) | Clid (1614.24) | ShowMaker (1678.49) | Deft (1608.72) | SwordArt (1629.78) |
| 6 | Kiaya (1601.33) | Karsa (1611.76) | Scout (1663.86) | huanfeng (1592.67) | yuyanjia (1624.72) |
| 7 | Wayward (1592.10) | Tian (1610.72) | Bdd (1626.69) | Light (1589.97) | ON (1609.01) |
| 8 | BrokenBlade (1591.26) | Tarzan (1605.76) | Caps (1606.64) | Aiming (1580.85) | Kellin (1607.47) |
| 9 | Hanabi (1584.06) | Levi (1604.23) | Kati (1606.41) | GALA (1579.83) | Bie (1592.34) |
| 10 | Ale (1581.92) | Cuzz (1603.10) | Xiaohu (1604.56) | Viper (1578.70) | MISSING (1588.56) |

Relatively speaking, this is surprisingly accurate at the player level. Now applying the average to get the team Elo:

| Rank | Team | Elo |
|------|------|--------|
| 1 | T1 | 1720.22 |
| 2 | JD Gaming | 1657.90 |
| 3 | Gen.G | 1649.78 |
| 4 | Bilibili Gaming | 1614.64 |
| 5 | KT Rolster | 1614.45 |
| 6 | Dplus KIA | 1613.10 |
| 7 | LNG Esports | 1590.90 |
| 8 | Frank Esports | 1586.31 |
| 9 | Weibo Gaming | 1582.13 |
| 10 | G2 Esports | 1577.30 |

Here we get something OK. Intuitively, T1 is highest because of their performance at the past three Worlds. But interestingly, a wild Frank Esports from Hong Kong shows up in the top 8, so my formula definitely needs some adjustments. Overall, not too shabby for a second attempt.

Machine Learning on Winning Probabilities for Worlds 2023

Switching gears a little bit, I wanted to look at the probability of team A beating team B at Worlds.

With the Swiss Stage, this was a good opportunity to look at all possible team match-ups. I trained XGBoost, a gradient-boosting machine learning algorithm, on the 2023 season up to the beginning of Worlds, and tested the model on all possible pairings of the Worlds teams. Each team's Elo at that point in time, from my formula, was also one of the training features.
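In R it looks roughly like this (the feature columns are placeholders; the actual feature set was larger):

```r
library(xgboost)

# train_df: one row per 2023 match up to Worlds; feature names are placeholders
feat_cols <- c("elo_a", "elo_b", "kills_diff", "gold_diff", "dragons_diff")
dtrain <- xgb.DMatrix(as.matrix(train_df[, feat_cols]), label = train_df$a_won)

model <- xgb.train(
  params = list(objective = "binary:logistic", eta = 0.1, max_depth = 4),
  data = dtrain,
  nrounds = 300
)

# matchup_df: every pairing of the Worlds teams, built with the same columns
p_win <- predict(model, as.matrix(matchup_df[, feat_cols]))
```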

Here is the heatmap. It is read across the rows: each cell is the probability of the team on the Y axis beating the team on the X axis, and the lighter the color, the higher the probability. We can see that the model rates JDG and Gen.G to beat pretty much all the other teams. The model also surprisingly overvalues GAM Esports for some reason, and undervalues Bilibili Gaming.

If you read all this, thanks for reading. If anyone from Amazon or Riot reads this, sorry for not being able to submit anything in time. If there are any questions about the methodology, I'll try to answer them the best I can. Thanks again for reading, and enjoy Worlds!

TL;DR: Created a modified Elo formula to rank all the teams, then applied machine learning with the Elo as a feature to get winning probabilities for all teams competing at Worlds.

Reply — /u/soudle_noop, over 1 year ago

The use of additional statistics to complement the model is a pretty smart idea, especially when dealing with this kind of problem, where very few games are available.

Off the top of my head, I think another interesting thing to try is a model that can propagate through time. Say this one (warning: I've never personally used it), or any of the other TTT (TrueSkill Through Time) implementations that are out there. It might lead to more accurate estimates.