Original Post

MTG Arena obviously has a problem with both disconnects and crashes, but there seems to be no consistency in when they occur or to whom. I, personally, have had neither problem, while others hit one every game or every other game.

I have been guessing for a while that the Arena software does not "play well" with certain chipsets, graphics cards or sound cards.

I would HIGHLY recommend that, when you submit a report to WOTC for these problems, you include as much technical information as possible. If you can get the info (look for diagnostic, configuration and other such programs that might have come with your computer), put it in your report: the make and model of your computer, processor type, operating system version, graphics card and its software/driver versions, sound card and its software/driver versions, etc.

WOTC has no way of knowing this info otherwise. If you provide it, they might start to see a pattern among those suffering from these problems, which in turn might allow them to work on a fix.

over 4 years ago - /u/wotc_beastcode

Originally posted by FooberticusBazly

Providing more information about system specs wouldn't hurt anything. But from the evidence I've seen, most if not all of these performance issues are directly tied to network/netcode/server issues.

The game's netcode (client and server side) does not handle network interruptions well. Furthermore, I've seen evidence that the game's render loop is bound to the netcode, blocking until all requests/responses are resolved before rendering frames. I've black box tested this (and other possible causes of these issues) using the game's resource meter, Wireshark, Windows Resource Monitor and various cpu/gpu monitoring tools.

Using the game's resource meter, you can observe the game's framerate is directly tied to the network latency between the client and server. When the latency rises, the framerate drops in lockstep.

Using Wireshark, you can observe all of the network traffic going to/from the server gateway. Nearly all client-side actions, including those that do not change the game state such as picking up a card and dragging it around the screen without playing it, fire a constant stream of packets to the server. These network requests directly relate to observable spikes in latency, which in turn directly reduce the frame rate.

From these observations you can deduce that the game's main loop is waiting to render frames until network requests/responses are resolved.

Network/netcode/server issues would explain the variance in the reported issues. Two people have the same cpu, gpu and ram. Both are using the same version of Windows. One has problems, the other doesn't. Factor out the commons (hardware, software) and that leaves the network.

Or one person has a monster gaming rig and another is playing on an 8-year-old laptop using the onboard gpu. The monster rig's gpu, cpu and ram aren't anywhere close to full load; the laptop is running at 90% cpu and gpu usage. The monster rig has performance issues, the laptop doesn't. Factor out the obvious disparities and that leaves the network.

When most people talk about "internet speed" they're thinking of bandwidth. Bandwidth is a measure of how much data you can transmit at one time, the width of your pipe, which isn't really a measure of speed at all. You can run one of those online "speed tests" that will tell you your bandwidth and your ping, or the latency between you and whatever internet speed test server you're testing against. This ping value is closer to a measure of internet speed, but all it tells you is the latency between you and that one specific server.
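To put rough numbers on the bandwidth vs latency point, here is a minimal Python sketch. It assumes a simple model where delivery time is the fixed latency plus the time needed to push the bytes through the pipe. The 200-byte message size and the link figures are made up for illustration, and `transfer_time_ms` is a hypothetical helper, not anything from the game.

```python
def transfer_time_ms(payload_bytes: int, bandwidth_mbps: float, latency_ms: float) -> float:
    """Rough one-way delivery time: fixed latency plus serialization time."""
    bits = payload_bytes * 8
    serialization_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + serialization_ms


# Assumed 200-byte game message on two hypothetical connections.
fat_pipe_high_ping = transfer_time_ms(200, bandwidth_mbps=1000, latency_ms=150)
thin_pipe_low_ping = transfer_time_ms(200, bandwidth_mbps=10, latency_ms=20)

print(f"1000 Mbps, 150 ms ping: {fat_pipe_high_ping:.2f} ms")
print(f"  10 Mbps,  20 ms ping: {thin_pipe_low_ping:.2f} ms")
```

For messages this small, the serialization term is a fraction of a millisecond on either link, so the ping dominates and the gigabit line with the worse ping is the slower one.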

The only measure of latency that really matters in this case is the latency between the game client and server. All your other network connections could be running at 1ms latency, but if the latency between the game client and server is higher, then the "internet speed" of all that other stuff doesn't matter.
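If you want to measure latency to one specific server rather than to a speed test site, one rough approach is to time a TCP handshake. This is only a sketch: `tcp_connect_latency_ms` is a hypothetical helper, the game's actual gateway host and port are not given anywhere in this thread, and connect time is an approximation of round-trip latency, not what the game client itself measures.

```python
import socket
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 2.0) -> list[float]:
    """Time several TCP connects to host:port and return each one in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000
        results.append(elapsed_ms)
    return results
```

Call it with whatever host and port you see the client talking to in Wireshark, and look at the spread between samples as well as the average, since the variance is what shows up as hitching.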

To make things more complicated, this client/server latency could vary for many different reasons. Playing over wifi vs a hardwired connection will introduce variance. Busy network legs between your home and the server gateway will introduce variance. The servers themselves, if they're under heavy load and are responding slowly, will introduce variance.

The solution to this kind of problem would mean a refactor of the game's core code, including decoupling the render code from the network code so it's not waiting for requests/responses to resolve before rendering frames. The netcode needs to be more resilient to handle network variance without disconnecting.

There is also likely an issue with the server architecture. There may not be enough servers or resources to scale under peak load, or the messaging architecture needs to be rethought, or any number of issues with optimization. What I do know, from reading WoTC's tech job postings, is that the server code is written in C#/.NET and the servers are running in a Microsoft cloud environment instead of using an industry proven, time tested stack and cloud environment.

I'm sure the developers at WoTC are well aware of all of this.

I'm an old software engineer (architect) with over 20 years of experience. These kinds of changes, especially when they touch the very core of both your client and server architecture, are not easy, fast or cheap. It's like keeping your Honda Civic from the 1990s vs buying a new Tesla. The Civic is ugly and has 90 horsepower. But it's paid for, gets you where you need to go and costs less to maintain than buying a newer, more efficient car. This is exactly how the people with the checkbooks at Hasbro/WoTC look at the issue. They may eventually replace the Civic, but they're in no big hurry to do it because it still carries the groceries.

Thanks for the detailed and well-thought-out post. Thanks to the OP as well for urging users to send us detailed reports. Believe me that we get them and believe me that we find them useful.

As far as network latency is concerned, I can tell you fairly categorically that the game does not hold rendering or game loop processing to wait for network replies. Game actions like a land being played or a spell resolving are definitely paused to wait for the server, but rendering is still occurring. In fact, one of the dead giveaways that the server is under heavy load or that your connection is having trouble is when you play a land and it pauses mid-flight and hangs out in the full-card mode. We start the animation for playing the land while the network request is in flight, but don’t finish the animation and put the card in play until we get the response.
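For readers who want to picture that pattern, here is a minimal Python sketch of a render loop that keeps drawing while a request is in flight and only finishes the card animation once the reply arrives. It is an illustration under assumptions, not Arena's actual code: the thread, the queue, the state names and the timings are all hypothetical.

```python
import queue
import threading
import time


def fake_server_reply(inbox: queue.Queue, delay_s: float) -> None:
    """Hypothetical stand-in for the server round trip; runs off the render thread."""
    time.sleep(delay_s)
    inbox.put("land_confirmed")


def play_land(max_frames: int = 60, frame_s: float = 0.005, server_delay_s: float = 0.05):
    """Render every frame; hold the card mid-flight until the reply arrives."""
    inbox: queue.Queue = queue.Queue()
    threading.Thread(target=fake_server_reply, args=(inbox, server_delay_s),
                     daemon=True).start()

    card_state = "mid_flight"        # animation started, request in flight
    frames_total = 0
    frames_waiting = 0
    for _ in range(max_frames):
        try:
            inbox.get_nowait()       # non-blocking poll, never stalls the frame
            card_state = "in_play"   # reply received: finish the animation
        except queue.Empty:
            pass                     # nothing yet; draw the frame anyway
        if card_state == "mid_flight":
            frames_waiting += 1
        frames_total += 1            # a real client would draw the frame here
        time.sleep(frame_s)
    return card_state, frames_total, frames_waiting


card_state, frames_total, frames_waiting = play_land()
print(card_state, frames_total, frames_waiting)
```

The point of the sketch is that `frames_total` keeps climbing while the card sits in `mid_flight`. A loop that blocked on the network would instead draw nothing until the reply came back.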

You can determine if the render loop is still running by looking at some piece of the game that is always in motion (flying creatures, particle effects on battlefield, the Elemental Cat, the glowing effect on the button that moves through phases). If this sort of hang is happening because of network latency, we’d love to know. We have not, ourselves, seen this manifestation, so we’d be very thankful for some metrics and test configurations.

One factor that probably confuses the issue is that a lot of stuff happens when we receive a message. Magic game states are complex and can cause a lot of changes. This often results in asset loads and a bunch of other hefty logic. So there is definitely a correlation between receiving a network message and frame rate hits. Naturally, we are looking into improving those.

You mentioned using Resource Monitor. I’d love to see what your disk access (the “IO Read Bytes/sec” perf counter) looks like in Resource Monitor or Process Explorer while you are experiencing full graphical freezes. We have some rogue disk reads that I think are contributing to the problem.

Thanks again for the effort you have put into this exploration.

over 4 years ago - /u/wotc_beastcode

Originally posted by And3riel

One factor that probably confuses the issue is that a lot of stuff happens when we receive a message. Magic game states are complex and can cause a lot of changes. This often results in asset loads and a bunch of other hefty logic. So there is definitely a correlation between receiving a network message and frame rate hits. Naturally, we are looking into improving those.

Whoa, hold on a minute there. You don't have all assets loaded before the match starts? They only load after the server issues an action?

Why would you decide to load assets on the fly like that? It seems like the amount of assets needed in one match is pretty much given by the decklists. Or are you telling me that the assets for 150 MTG cards do not fit into memory at once?

There are a number of reasons, but as an example, we can't pre-load your opponent's deck (animations, card art, etc...) until you see a given card. If we loaded that data ahead of time, the client could be hacked to show your opponent's deck. And naturally we don't want to load every possible asset.
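A rough Python sketch of that idea: preload only what the client is allowed to know about (your own deck) and load an opponent's card the first time it is revealed. `AssetCache`, the loader and the card names are hypothetical and only illustrate load-on-reveal. They are not how the Arena client is actually structured.

```python
class AssetCache:
    """Loads a card's assets at most once, on first request."""

    def __init__(self, loader):
        self._loader = loader      # callable that fetches assets for one card id
        self._loaded = {}

    def get(self, card_id):
        if card_id not in self._loaded:
            self._loaded[card_id] = self._loader(card_id)
        return self._loaded[card_id]

    def preload(self, card_ids):
        for card_id in card_ids:
            self.get(card_id)

    def is_loaded(self, card_id):
        return card_id in self._loaded


load_log = []


def fake_loader(card_id):
    """Hypothetical loader; a real one would read art and animations from disk."""
    load_log.append(card_id)
    return f"assets:{card_id}"


cache = AssetCache(fake_loader)
cache.preload(["Forest", "Llanowar Elves"])   # your own decklist is known up front

# The opponent's card is unknown to the client until the server reveals it.
opponent_card_loaded_early = cache.is_loaded("Lightning Strike")
cache.get("Lightning Strike")                 # revealed: load now, mid-match
cache.get("Lightning Strike")                 # second use hits the cache
print(load_log)
```

The first `get` for a newly revealed card is where the mid-match load happens, which lines up with the frame rate hits on receiving a message described earlier in the thread.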