Original Post — Direct link
about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by Clearskky

God my brain was thoroughly exhausted by the time I finished reading the article, it's insane how many tweaks Riot had to perform to get the amount of juice they required from their servers. At times like this I wish I was a CS major instead of being self taught so I could better understand the concepts in the article.

I'm curious how Riot makes sure their tweaks to the Clocksource, C-states, OS Scheduler etc don't result in unusual, hard to replicate problems and what their cooling solution looks like.

Sorry if it was a bit much. I tried to walk a fine line between saying something meaningful and accurate and being overly technical. In terms of cooling/power we don't really have many special requirements. I initially was more concerned about it, but the actual power draw (and heat generation) wasn't even measurable in the test cases we looked at. I'd assume it has something to do with the way the power supply / board draws power, but I'm not certain as it's not my area of expertise.

Another thing to consider is that if we can reduce the number of hosts we even need to rack by tweaking, we can reduce overall power draw and heat generation of the data center.

about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by kolibrionextasy

I actually enjoyed the amount of technical detail! I just finished my degree and I'm thinking about getting into Backend-Development/Engineering. Is there more information available about which technology stack VALORANT is using on the backend and maybe how it influenced performance?

Glad you enjoyed it!

The servers that host games are a pretty simple tech stack. We deploy a single Docker container that contains a Go service that communicates back with the core game services that matchmake players. The service just starts game server processes (which are written in C++ using the Unreal Engine).
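The launcher half of that stack can be sketched in a few lines of Go. This is a minimal illustration, not Riot's actual service: the binary path and the `-port` flag are hypothetical stand-ins for whatever interface the real game server binary exposes.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startGameServer launches one game-server process and hands back the
// running command. The binary path and -port flag are hypothetical
// placeholders, not the real server's interface.
func startGameServer(binary string, port int) (*exec.Cmd, error) {
	cmd := exec.Command(binary, fmt.Sprintf("-port=%d", port))
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

func main() {
	// In the real service these launch requests would come from the core
	// game services; here we just spawn a harmless stand-in binary.
	cmd, err := startGameServer("/bin/true", 7000)
	if err != nil {
		fmt.Println("launch failed:", err)
		return
	}
	fmt.Println("started game server, pid:", cmd.Process.Pid)
	cmd.Wait()
}
```

The real service would also track each child process, report results back to matchmaking, and recycle slots as games end, but the spawn-and-supervise core is this simple.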

You can read more about how the entire backend is developed and deployed in this series of tech blogs if you're interested. https://technology.riotgames.com/news/running-online-services-riot-part-i

In terms of performance, it didn't affect us much. I mention the one story with the Erlang scheduler problem in the blog. We had another issue where we saw poor performance if the Docker image didn't match the host OS. Other than that, it's not a huge impact in terms of overall performance.

about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by Pontiflakes

That was a cool read. I wonder how Riot feels about server performance since go-live. I see low server fps notification every other game. Kills take a noticeable moment to register. Playing Counter-Strike feels like playing on LAN in comparison. Do the lower tick rate servers process frames that much more quickly or is there something else?

We do monitor server tick rates on LIVE using the same data I show in this chart. For the vast majority of servers we see a steady 128 tick rate. We do have a problem where, on round transition, the server takes a long time to set up the initial frame and spawn everyone in. That's usually when you'll see low server FPS show up in the top right.

You can actually see the server tick rate in your game if you go to Video -> Stats -> Server Tick Rate. You can see a graph of server frametime in real time. It's generally a boring straight line. There are specific regions and specific data centers that ran hot for periods of time and started to dip, but we're constantly adding hardware as it's available and the player base grows.

The more common "server problems" tend to stem from networking outages on the route from players to our servers and complications that cause dropped packets or significant increases in network latency. The network graphs also exist in the settings menu.

In terms of the feel difference you're seeing, I can't be sure. My guess would be potentially high ping or packet loss, but it's hard to say without seeing it. We're still working to improve the capabilities of Riot Direct and have added more data centers post-launch to help address outlier areas of poor ping. Hopefully any issues you're seeing will be addressed by this combination of efforts. We're always striving to get better.

about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by rrwoods

25 ms frame times is 40 FPS; 20 ms is 50 FPS. FPS (which is a measure of performance) went up by 25%.

... That said, since the previous sentence was talking about frame time rather than frame rate, I'd guess this is a mistake.

Your FPS example is close to what I intended. If I were to make a 20ms task take 10ms I would see that as a doubling of performance (100%) not a 50% increase in performance. You can do double the amount of the work with the same resources. I can see how this is confusing the way I wrote it. Sorry and thanks for trying to explain it.
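The distinction in the two posts above comes down to which direction you measure. A quick worked example in Go (helper names are mine): FPS is 1000 / frame time in ms, so 25 ms -> 40 FPS and 20 ms -> 50 FPS, a 25% FPS increase; but halving a 20 ms task to 10 ms doubles the work the same hardware can do per second, a 100% throughput gain.

```go
package main

import "fmt"

// fps converts a frame time in milliseconds to frames per second.
func fps(frameTimeMs float64) float64 { return 1000 / frameTimeMs }

// throughputGain is how many times more work per second the same hardware
// does when a task's time drops from beforeMs to afterMs (1.0 = no change).
func throughputGain(beforeMs, afterMs float64) float64 {
	return beforeMs / afterMs
}

func main() {
	fmt.Println(fps(25), fps(20))       // 40 50 : a 25% FPS increase
	fmt.Println(throughputGain(20, 10)) // 2 : double the work, i.e. +100%
}
```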

about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by JakeHomanics

Hey just wanted to say, as a Unity Developer of 5ish years a lot of what you talked about was very well understood and I have limited knowledge of networking. Unity uses RPCs and in a similar way the synced properties.

It was very interesting to see micro-managed solutions because when you’re developing you usually use whatever method “just works” with your code base. I can imagine the developers thought “these are just properties, let’s go the easy way and just have them automatically sync”. With RPCs you basically add an extra step in your code.

Have you guys ever experienced an impassable wall? In my experience, everything can be reworked into a completely viable solution, however it would take a lot of extra development time to build out the perfect solution at every step. But I imagine something just wouldn't be possible somewhere. I.e. a player's ability just becomes too much for the GPU or increases frame times no matter what you tried. In networking, maybe transferring a lot of necessary data takes too long for the packets to get across and there's no way to reduce the data. Has a gameplay mechanic had to be scrapped for any of these reasons?

My colleague wrote a great article on the lengths we went through to get through some pretty tough walls on the client with the visual targets we want to hit. https://technology.riotgames.com/news/valorant-shaders-and-gameplay-clarity

When we hit a truly impassable wall we try to reduce the problem back to its basic essence in the simplest terms and rethink our approach from there. In my experience, specific solutions can be unusable but you can usually solve the core problem another way. Some things have to be scrapped in the end though. A common joke on the team is that one day we'll do mirrors. Mirrors offer some interesting gameplay & visual opportunities but from a rendering perspective they're a nightmare to support because you effectively end up rendering the scene twice. You get "similar" interesting gameplay opportunities from something like Cypher's camera though!

about 4 years ago - /u/RiotBrentmeister - Direct link

Originally posted by Parrity

contains a go service

A wild gopher appears. :pikachuface:

I am curious, what ended up being the magic number settled on for game instances per server?

Once we addressed some of the hardware configuration problems, we were able to scale linearly with the number of cores. You can just take the number of cores on the system and multiply it by three. We use a variety of server sizes depending on regional availability and cost, but they all perform the same. The largest I've seen us use is a 56-core system that hosts 168 games, or 1680 players.
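That rule of thumb is easy to verify. A tiny sketch in Go (function names are mine): three game instances per core, and ten players per match since VALORANT is 5v5, which reproduces the 56-core example from the reply.

```go
package main

import "fmt"

// gamesPerHost applies the rule of thumb from the reply: three game
// instances per core.
func gamesPerHost(cores int) int { return cores * 3 }

// playersPerHost assumes 10 players per match (VALORANT is 5v5).
func playersPerHost(cores int) int { return gamesPerHost(cores) * 10 }

func main() {
	// The largest host mentioned: 56 cores.
	fmt.Println(gamesPerHost(56), playersPerHost(56)) // 168 1680
}
```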