Magic The Gathering: Arena Dev Tracker




30 Mar

Comment

Originally posted by randommuser90

What was the parser's first response to being fed Emrakul?

A syntax error due to being unfamiliar with the phrase "After that turn". #wotc_staff

Comment

Originally posted by PhRzN

Thanks for the info!

Oh, one other great reason is that the whole point of our regression tests is to identify situations where we end up generating wrong code. If we make a change for a future card that would break an old card (in a way that we've tested), we really want the old card's code to change too: our test will fail and we'll notice that the change we're making is the wrong change to make. If the old card's code is frozen, then our regression tests aren't really doing much for us! #wotc_staff
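
The point about frozen code can be pictured with a toy generator. This is only a sketch under assumptions: `generate`, the number-word tables, and `regression_passes` are hypothetical stand-ins invented for illustration, not Arena's parser or test framework.

```python
# Hypothetical number-word table used by a toy code generator.
NUMBER_WORDS_V1 = {"a": 1, "two": 2, "three": 3}
# A deliberately bad change, imagined as made while supporting a new card.
NUMBER_WORDS_V2 = {**NUMBER_WORDS_V1, "a": 0}


def generate(text: str, number_words: dict) -> tuple:
    """Toy generator: 'Draw <number word> card(s).' becomes ('draw', n)."""
    words = text.lower().rstrip(".").split()
    return ("draw", number_words[words[1]])


def regression_passes(code: tuple) -> bool:
    """Toy regression test for the old card 'Draw a card.'"""
    return code == ("draw", 1)


# Frozen: generated once with the old table and never rebuilt.
frozen_code = generate("Draw a card.", NUMBER_WORDS_V1)
# Regenerated: rebuilt with whatever the generator currently does.
regenerated_code = generate("Draw a card.", NUMBER_WORDS_V2)

frozen_result = regression_passes(frozen_code)            # passes, hides the bad change
regenerated_result = regression_passes(regenerated_code)  # fails, flags the bad change
```

In this toy setup the test over frozen code keeps passing even though the generator is now wrong, while the test over regenerated code fails and exposes the change.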

Comment

Originally posted by ZodiacWalrus

Did anyone else think from the title that "Pierce Flesh" and "Spirit Alike" were actually MTG cards and they had somehow made new card-breaking bugs after fixing Kunai/Crowbar lmao?

Thankfully not, though Pierce Flesh does sound like it would make a great black removal spell.

It's a reference to the flavor text of the Kunai! Thanks to /u/WotC_Megan for the idea. #wotc_staff

Comment

Originally posted by PhRzN

Why do you need to regenerate the rules for all cards on each release instead of freezing? Is it because new cards can change behaviors of old ones due to new mechanics?

Ack, sorry I missed this one! There are plenty of reasons to regenerate card code:

  • The biggest reason is consistency. We do not want to be in a world where two nearly identically worded abilities work totally differently because one was "locked in" years ago and behaves totally out of date. Worse yet would be something like two different printings of the same card behaving differently!

  • Frequently, updates we do to new cards want to apply to old cards too, particularly for rules-tangential behavior (client communication, autotap, trigger ordering, etc.). If we've identified behavior in an ability we want to treat differently for those tasks, it's good to apply it to all abilities with that behavior.

  • Rules changes aren't super infrequent in MTG. Having our code be regenerated makes it so that we don't have to manually identify which cards are affected by the rules change - as long as we correctly handle the new requirements ge...

Comment

Originally posted by DonRobo

How do you fix those? Do you hardcode fixes for those or can you develop generic solutions?

We almost never make kludges for cards - our normal workflow is to build solutions that would handle variation: what similar designs would have the same issue? Can we preemptively handle those? For example, for [[Muldrotha, the Gravetide]], our solution made it so we could handle a similarly worded card that allowed you to play different colors of cards, or different land types, etc., even though Muldrotha is one-of-a-kind. #wotc_staff

Comment

Originally posted by ClapSalientCheeks

Here's another instance of a weird "apply this effect to a permanent" occurrence: Urabrask's Forge will destroy any available UF-created creature token, ignoring whether or not the token is "THAT token".

Example: Forge makes a token.

Phase the token out; UF finds nothing to sacrifice.

Next turn, a new token is both created and destroyed before the end step.

At this end step, UF will still sacrifice the first token from a turn ago, even though the oracle text does not refer to it.

Haven't yet tested this effect on copies of the token.

We are unable to reproduce this bug, on live or in our dev environments. Are you sure the reproduction steps here are accurate? #wotc_staff

Comment

Originally posted by WotC_BenFinkel

Well, the dream has always been for the card parser to be a massive productivity boost for backfilling MTG's card catalog. There are a few reasons why it isn't just a snap-of-the-finger though:

  • The Pareto Principle applies: the parser is excellent at handling normal MTG card text, but a sizeable number of MTG cards do things that really no other card does. For example, [[Void Winnower]]'s prohibition on casting even-mana value spells would play some havoc with casting X-cost spells. Perhaps we could just dump the large proportion of cards that work "for free" in engine...

  • ... but in-engine isn't the only concern. There's also the client experience to consider. The engine's been worked on for longer and supports some interactions that the client has never needed to implement before. Plus there's our standards of presentation: we want new content we release to meet our standards of clarity to players and to work with our auxiliary systems like autotap...

Another factor here is QA time. Just because the parser thinks it understands a card, it doesn't necessarily mean that it's right (there are some great stories here that we'll tell sometime). We need to either write automated tests to validate behavior (which takes time) or have QA manually test the card in a variety of scenarios (also takes time).

We want to expand Arena's card pool just like players want. At the most obvious level, releasing new cards directly makes us money. But, more than that, we all work here because we love Magic, and we love Arena, and we want it to continue to grow. But we need to balance that with the people we have who can do the work required. As Ben noted, we're also ...

Comment

Originally posted by Douglasjm

The rules have a concept of "printed on", and use it in the definition of characteristic-defining abilities and the rules for linked abilities, and also for the starting point of applying layers, as well as resolving object references by card name. To implement the rules in a way that conceptually matches how they are written, the Arena code should also recognize and use this concept. Copy effects and shenanigans with Vecna and such do complicate it a bit, though.

For reasoning about it, let's define a concept I'll call "effectively printed on", or "EPO". In the simplest case, without copy effects or shenanigans, an ability is effectively printed on the card that it is, in fact, literally physically printed on. Any reference by name to that card should be interpreted as referring to the ability's EPO card.

To determine an ability's EPO card in the presence of copy effects, it is necessary to distinguish between conferred and non-conferred abilities. A conferred abil...

I guess the point I'm trying to make is that we do have such a concept (it is a bit hard to discuss given that "ability" means three separate but tangled concepts in MTG). There absolutely is a relationship between an ability-on-card and the card that possesses it, and that's normally how self-references are interpreted. A large part of the complication here (and complication -> misunderstanding -> bugs) is that self-references mean different things in different contexts in MTG text. Your summary of the correct answer is pretty accurate, and it reflects our current logic, but getting there took some iteration and seeing more examples of abilities that contradicted our understanding. #wotc_staff

Comment

Originally posted by ThoseThingsAreWeird

What's a "new feature" for us?

In my head I've always separated Arena out into "the actual game of Magic" and then "the bits Arena adds". So I guess new Rules (e.g. Incubate, I think that fits your Rule description), but then also new Arena bits (like the new Codex of the Multiverse)?

What if we have tests for "Draw two cards" already but now a card comes out with "Draw five cards" - is that a new feature?

I guess that depends on how your parser was set up, but I'd wager you've written the parser to be smart enough to say "Draw 2" is the same as "Draw 5" as those are two different tokens ("Draw" and a number). But then I guess that raises the question of whether something like "Draw 1, then Scry 1" is the same as "Draw 1" and "Scry 1" (i.e. combined vs. separate)?

Our line is "involved developer changes to the parser or engine"

Yeah that m...

Tokenization is a component of our parsing process, one very early in the process. It's true that replacing one token with another similar one is often not worth considering to be a big difference. But what about replacing one sentence structure with another? For example "If you would draw a card, draw two cards instead" vs. "If you would draw a card, instead draw two cards" should behave the same despite being worded differently (and both are in fact valid wordings). If we already handled a phrase like "If CARDNAME would deal damage, it deals twice that much damage instead" as well as "If CARDNAME would deal damage, instead it deals twice that much damage", then we've already handled that syntactical difference. Let's say we wrote tests for the latter two cards; we'd find in ad-hoc testing that once we got either version of the draw replacement working (and wrote tests verifying it), the other one would work too. Given that it worked "out of the box", how much effort should we spend testing...
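
As a rough illustration of why the two "instead" wordings can fall out of the same handling, here is a minimal sketch. It assumes a toy regex tokenizer and a hypothetical `parse_replacement` function; neither is Arena's actual parser, which is certainly far more involved.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z]+|[^\sa-z]", text.lower())


def parse_replacement(text: str) -> dict:
    """Parse 'If <event>, <replacement> instead' or 'If <event>, instead <replacement>'.

    Both wordings are mapped to the same structure, so supporting one
    sentence shape supports the other.
    """
    tokens = tokenize(text.rstrip("."))
    if not tokens or tokens[0] != "if" or "," not in tokens:
        raise SyntaxError(f"unrecognized sentence structure: {text!r}")
    comma = tokens.index(",")
    event = tokens[1:comma]
    rest = tokens[comma + 1:]
    if "instead" not in rest:
        raise SyntaxError(f"no 'instead' found in: {text!r}")
    rest.remove("instead")  # where 'instead' sits does not change the meaning here
    return {"event": " ".join(event), "replacement": " ".join(rest)}


first = parse_replacement("If you would draw a card, draw two cards instead.")
second = parse_replacement("If you would draw a card, instead draw two cards.")
same_structure = first == second
```

Because the position of "instead" is discarded before the structure is built, a test written against one wording exercises the same path as the other in this toy version.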

29 Mar

Comment

Originally posted by saxophoneplayingcat

How do you detect the 25% needing individual attention?

Two main ways:

  • The parser fails to generate code. This is great! It's recognizing that something is outside its current boundaries. We usually have a good idea of what we need to do from the error messaging.

  • The parser generates wrong code. Less great. Human QA needs to play the card to see that it's doing the wrong thing. The most common type of problem here is with "anaphora resolution" - figuring out what ambiguous phrases like "it" or "that creature" mean. Why, I just estimated the complexity of a few LTR bugs with that issue moments ago... #wotc_staff
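
The difference between the two failure modes can be sketched with a toy example. Everything here is an assumption made for illustration: the phrase table, `UnknownPhraseError`, and the "most recent mention" pronoun heuristic are hypothetical and do not describe how Arena's parser works.

```python
class UnknownPhraseError(Exception):
    """Raised when the toy parser meets wording it has no rule for."""


# Hypothetical phrase table; a real parser would use a grammar, not a lookup.
KNOWN_EFFECTS = {
    "draw a card": ("draw", 1),
    "draw two cards": ("draw", 2),
}


def generate_effect(text: str) -> tuple:
    """Return toy 'generated code', or fail loudly on unfamiliar wording."""
    key = text.strip().rstrip(".").lower()
    if key not in KNOWN_EFFECTS:
        # The "great" case: the error itself says what needs work.
        raise UnknownPhraseError(f"no rule for phrase: {text!r}")
    return KNOWN_EFFECTS[key]


def resolve_pronoun(mentions: list[str]) -> str:
    """Naive anaphora heuristic: 'it' means the most recently mentioned object.

    For a non-empty list this always returns an answer, so a wrong guess
    produces no error and can only be caught by a test or by playing the card.
    """
    return mentions[-1]
```

The first function fails visibly when it is out of its depth. The second always answers, which is why mistakes of that kind need human QA or an automated assertion to surface.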

Comment

Originally posted by Un111KnoWn

How similarly does MTG Arena work compared to how MTG Online works?

Pretty broad question. In one sense, we're somewhat similar: we both make code happen starting from English strings from new cards to make a good MTG play experience. But our engineering is completely different, from code generation to the actual engine design. #wotc_staff

Comment

Originally posted by gitgudds3

I thought the end of this story would be:

“Thank goodness!” said Bilbo laughing, and handed him the tobacco-jar.

Perhaps for an LTR implementation tale! #wotc_staff

Comment

Originally posted by Juuuuuuuules

As a noncoder, I found this super interesting. I know it’s more work for you but I’d love more of these posts when it’s relevant.

I'd love to tell more stories about "challenging developments that went smoothly". I think there's a couple challenges to that:

  • Less of a narrative! With a bug, there's an immediate hook of "how did that happen", then a cool investigation, a eureka of the issue, and often an embarrassingly simple fix (this bug was fixed just by deleting a line of code!). I think that makes for a pretty clear flow. Most implementation stories don't have such a narrative structure to them, which makes them harder to write about.

  • Scope of background. Even this post had coworkers dozing off with the groundwork I presented to describe the bug. New features are often even less cleanly described.

  • When? What? It can be hard after-the-fact to decide what would make an interesting story to talk about, or when to talk about it.

Still, the reaction to Ian's post and this has us pretty interested in doing more. Heck, I've always wanted to! I'm...

Comment

Originally posted by Flyrpotacreepugmu

That's quite an interesting look behind the scenes. Ever since someone mentioned that Gutter Grime was the cause, I've been trying to think of how that could possibly break these equipment, but I never would've guessed that was how it happened.

That bit about Falco Spara was also interesting. It also reminded me that multiple copies of [[Muldrotha, the Gravetide]] don't work properly (or at least didn't a couple months ago). Casting one spell of each type removes the option even if you have multiple Muldrothas that should each be able to cast one. I wonder if that's a similar issue to Falco Spara where they all try to do the same thing, or if it's because of Muldrotha's unique UI...

I believe that had been a UI bug, where the client was improperly batching the Muldrotha permissions in its presentation of your actions. #wotc_staff

Comment

Originally posted by Douglasjm

We decided that the salient feature of these cards was that they were on Auras and Equipment and made special code to handle self-references in those cases.

It seems obvious to me that the salient feature is which card the ability was printed on. This is not the first time I've seen a bug in Arena result from not properly considering the "printed on" relationship, though the other one I remember had to do with linked abilities. It makes me wonder if the dev team, and/or the design of the code base, need more awareness of the importance of that relationship.

Can you clarify what your suggestion is? "Printed on" is a pretty ambiguous concept:

  • What about copy effects? If Card A becomes a copy of Gutter Grime and triggers to make an Ooze, the reference to "Gutter Grime" on the Ooze means "Card A".

  • That still holds true even if Card A stops being a copy of Gutter Grime.

  • Through horrible shenanigans you're able to make The Book of Vile Darkness create a Vecna token that has Gutter Grime's triggered ability. In that case the "Gutter Grime" phrase on the Ooze it creates refers to the Vecna that made the Ooze token. Was that ability "printed on" Vecna?

I think perhaps what you're trying to say is "Gutter Grime" in the conferred ability refers to "the card that conferred this ability to this Ooze". But that's the whole point of this post - identifying when a self-reference is like that is nontrivial. Our original logic, due to the cards we had covered on Arena, was myo...
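
The copy-effect bullets above can be pictured with a small data model. This is a sketch under a big assumption: that the engine already knows which object conferred an ability, which is exactly the part described here as nontrivial. `Permanent`, `Ability`, and `resolve_self_reference` are hypothetical names, not Arena's types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Permanent:
    object_id: int
    name: str


@dataclass
class Ability:
    text: str
    holder: Permanent                         # object that currently has the ability
    conferred_by: Optional[Permanent] = None  # set when another object granted it


def resolve_self_reference(ability: Ability) -> Permanent:
    """A name in a conferred ability points at the conferring object, else at the holder."""
    if ability.conferred_by is not None:
        return ability.conferred_by
    return ability.holder


# Card A is a copy of Gutter Grime when it creates the Ooze.
card_a = Permanent(object_id=1, name="Gutter Grime")
ooze = Permanent(object_id=2, name="Ooze")
ooze_ability = Ability(
    text="(placeholder) refers to 'Gutter Grime' by name",
    holder=ooze,
    conferred_by=card_a,
)

# The copy effect ends; the object is the same, only its name changes.
card_a.name = "Card A"
referent = resolve_self_reference(ooze_ability)
```

Binding the reference to the object rather than looking it up by name is what lets it keep pointing at Card A after the copy effect ends.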

Comment

Originally posted by r_xy

So how do you choose what cards get a regression test?

If the conferred ability was such a headache to originally implement, wouldn't that make it a good candidate for one?

We test cards that involved a developer's effort to get to work in the first place. Human QA does a pass over a set to identify what didn't automatically work from the first time we generate code for a new card set. Anything that doesn't work at that point is, well, my day job! And work we do there gets verified against regression by an automated test.

When we're closer to release, QA does another full pass to hopefully identify regressions, again focusing on the new cards due to the huge explosion of possible interactions.

I identified in the OP the relevant cards in the story: Heliod's Punishment has plenty of tests that lean on "self-reference in conferred abilities". Unfortunately Heliod's Punishment's behavior doesn't involve the ProposeEffectCostResource rule, which was the center of this bug: its conferred ability's only cost is the tap-symbol. #wotc_staff

Comment

Originally posted by slavazin

I'm curious about regression testing. You said that those are difficult to write due to simulating parts of a full game. Why not actually run full games (or slices of full games from state A to B) in some headless mode? Either pull the gameplay from standard tournament games, or play a few games and record the gameplay? You can mix a lot of cards with unique interactions, and after each resolution of a trigger, compare the game state with the recorded state/delta. From my extremely limited POV the downsides would be a lot of computer time spent running through somewhat meaningless actions, but if they're fast enough, you can load a lot of unique game situations in 30 minutes of playing and recording a game. An error can then display the card/trigger that caused the trigger and the mismatch in outcome.
Just curious as to the drawbacks.

The problem with taking a recording of a game and saying "make sure it plays like that again" is in determining what "like that" means. We do plenty of changes to the game that don't change the gameplay outcome but do, for example, change the information in requests and responses to the client, change what information is available in the game, change autotap strategies, etc. The advantage with our "scripted game" tests is that we're able to decide precisely what is important to verify with automated assertions, and what aspects of the game's proceedings are allowed to vary over the development of Arena as a project. #wotc_staff
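
A toy comparison of the two testing styles, as a sketch only. `ToyGame`, `snapshot`, and `scripted_draw_test` are hypothetical, and the `verbose_messages` flag merely stands in for the kind of client-messaging change mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGame:
    """Hypothetical miniature game state; not Arena's engine."""
    hand_size: int = 0
    client_messages: list = field(default_factory=list)
    verbose_messages: bool = False  # stands in for a later messaging change

    def draw(self, count: int) -> None:
        self.hand_size += count
        message = f"player draws {count}" if self.verbose_messages else f"draw:{count}"
        self.client_messages.append(message)


def snapshot(game: ToyGame) -> dict:
    """Everything a full recording of the game would capture."""
    return {"hand_size": game.hand_size, "client_messages": list(game.client_messages)}


def scripted_draw_test(game: ToyGame) -> bool:
    """Scripted test: perform the action, then check only the gameplay outcome."""
    game.draw(2)
    return game.hand_size == 2


# A recording taken before the messaging change.
recorded_game = ToyGame()
recorded_game.draw(2)
recording = snapshot(recorded_game)

# The same script run after the messaging change.
current_game = ToyGame(verbose_messages=True)
scripted_ok = scripted_draw_test(current_game)
replay_matches = snapshot(current_game) == recording
```

Here the scripted check still passes because the hand size is unchanged, while the full-recording comparison fails on message text that was never the point of the test.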

Comment

Originally posted by RealisticCommentBot

Confusing as the Falco thing is, this exact scenario I think happens (and is a bit odd) when you have two copies of Serra Paragon out, as you have to choose which Serra Paragon you are using to cast a card from your graveyard.

It could totally be relevant mainly because that ability can be activated only once each turn, unlike Spara's, but as a user, once I'd seen it happen once or twice I understood what was happening.

I feel it would be similar for Spara, but it's definitely more confusing when they both have counters on them (which is likely the case, as they ETB with counters).

The notion is it doesn't matter which Falco you use - the action behaves the exact same way for either. For Serra Paragon, it does matter which you use - that one can't be used again this turn (and maybe you'd prefer to use the one with fewer +1/+1 counters on it just in case your opponent has removal!) #wotc_staff

Comment

Originally posted by ThoseThingsAreWeird

Therefore, we don't create such tests for every new card on MTG Arena

This surprises me a little bit, but it probably has a reasonable answer.

We create regression tests for each new feature, but we've done that from the start. So yeah that adds an extra bit of time onto creating each feature, but we've got a certain level of confidence that we're not breaking stuff in the future (assuming we write the tests correctly, which we always do every time ever...). In the grand scheme of things it's a lot of time, but for each release it's a relatively small amount of time.

Was there a period of time when you weren't creating regression tests? Or is it that your approach to regression tests wasn't covering every Rule? Presumably covering every Rule, would mean you cover every card with an ability? Or actually, that'd need to have regressions on every Rule interacting with every other Rule... Ok yeah I see where this is going...

...

What's a "new feature" for us? This has always been a pretty interesting question to me, for a code-generating system. When a vanilla creature comes out, do you recommend we make a regression test for it? What should the content of that test be? What about a french vanilla creature? What if we have tests for "Draw two cards" already but now a card comes out with "Draw five cards" - is that a new feature?

Our line is "involved developer changes to the parser or engine". This does miss bugs, but in my opinion it is rare. And the greater focus on "new work" allows us to put much more attention in testing the boundary scenarios for the riskiest new behaviors.

Slightly before I joined WotC 6 years ago, our regression test framework was much more inconvenient and brittle, but pretty much from day 1 of engine development there has been some form or another of testing.

As for our strategy for testing, our normal standard is a scripted game with assertions a...

Comment

Originally posted by jasonsavory123

Can I ask why this approach to creating rules was chosen and simultaneously we don’t have a larger card pool? If the rules are generated by reading Oracle rules text, why are Pioneer, Modern, Legacy, etc. not available?

I could understand the smaller card pool if rules were manually implemented as functions or equivalent, but this threw me for a loop as something that seems too complex for the limited card pool the game started with.

Well, the dream has always been for the card parser to be a massive productivity boost for backfilling MTG's card catalog. There are a few reasons why it isn't just a snap-of-the-finger though:

  • The Pareto Principle applies: the parser is excellent at handling normal MTG card text, but a sizeable number of MTG cards do things that really no other card does. For example, [[Void Winnower]]'s prohibition on casting even-mana value spells would play some havoc with casting X-cost spells. Perhaps we could just dump the large proportion of cards that work "for free" in engine...

  • ... but in-engine isn't the only concern. There's also the client experience to consider. The engine's been worked on for longer and supports some interactions that the client has never needed to implement before. Plus there's our standards of presentation: we want new content we release to meet our standards of clarity to players and to work with our auxiliary systems like autotap...
