Posted by Community_Team over 2 years ago
We sat down with Ben, one of our UI Programmers who fixed the "mangled text" bug that's been affecting Path of Exile for a long time. Because there was some interest in the bug itself, he was kind enough to write up a post-mortem for us. Check it out below!
After announcing we finally found a fix for the infamous "mangled text" bug players have been encountering for the past 6 years, some players expressed interest in getting a post-mortem on this age-old issue. I'd always been interested in trying my hand at a technical details post someday so here you are! The bug has gone by a few different names, such as "jumbled", "corrupted" or "krangled" text. Internally we mostly referred to it as being "mangled" so that's what I'll be calling it.
I can tell you that the bug was introduced to the codebase on April 25th, 2016 and made it into production in 2.3.0 with Prophecy League. It was introduced by some refactoring of the text engine in order to support the then-upcoming Xbox version of Path of Exile.
The Symptoms
I feel safe in saying that a majority of players encountered this bug at some point over the years, mostly showing up after longer play sessions, but some players encountered it much more often than others. It could affect any text in the game and there were two distinct effects of the bug:
- One where the kerning, or spacing between the individual drawn glyphs, was either too large or too small.
- And another where the individual characters were being graphically represented by the wrong glyph.
Some keen-eyed players had noticed that in the second case, applying a substitution cipher could restore the original text.
Example: "Pevpf`^l D^j^db" -> "Physical Damage", ^ maps to "a".
One thing which always struck us as being odd was that capital letters would have a different offset (or sometimes none at all) compared to lowercase letters.
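The example above is consistent with a fixed shift: "^" sits three code points below "a", and "j" three below "m". A minimal sketch of undoing that shift, tried on the second word of the example. `UnmangleSample` is a hypothetical helper, and both the shift of 3 and the "capitals are untouched" rule are assumptions read off this one sample; other occurrences of the bug may well have used different offsets.

```cpp
#include <cassert>
#include <string>

// Undo a fixed downward shift on every character except capitals and spaces.
// The default shift of 3 is an assumption taken from the single example above.
std::string UnmangleSample(const std::string& mangled, int shift = 3)
{
    assert(shift >= 0 && shift < 32);  // keep the char arithmetic in range

    std::string restored;
    restored.reserve(mangled.size());
    for (const char c : mangled)
    {
        const bool untouched = (c >= 'A' && c <= 'Z') || c == ' ';
        restored.push_back(untouched ? c : static_cast<char>(c + shift));
    }
    return restored;
}
```

Calling `UnmangleSample("D^j^db")` gives back "Damage".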
The Hunt
The earliest bug ticket for this issue was made on June 4th 2016, created from reports on the forums just after Prophecy's release. The biggest hurdle was that we could never find a way to reliably reproduce the bug on our machines; it only popped up very rarely and randomly. From what I heard it only ever occurred on a programmer's machine once or twice, and catching it there is key because it lets us inspect what's in memory to gather clues as to what went wrong. Until we could find reproduction steps, the best we could do was make some speculative fixes and hope the issue stopped getting reported. Because nothing turned up and it wasn't a show-stopping issue, it got downgraded to lower importance so more time could be given to new features and other fixes.
Many developers (myself included) had made their own attempts at finding the issue over the years, all while more links to user reports would get added every other month to remind us of this puzzling issue. From the report gathering, screenshots and my own experience I could tell the following things:
- It affected individual font styles (combination of typeface, size, italic/bold status), rather than particular text displays or strings.
- It didn't appear to be a texture generation, corruption or atlasing issue as none of the glyphs ever seemed to be clipped or cut in half. The rare times we got the bug on a programmer's machine also confirmed this.
- Logging out would not resolve the issue in most cases, only a client restart.
- I noticed we never got a report for this happening on Xbox, PlayStation or macOS, which ended up being the clue that helped me the most in narrowing it down to a particular area of the text engine.
Around the release of Scourge I noticed the bug seemed to start getting reported a bit more often and I started experiencing the bug more in my own play sessions. I recorded most of these occurrences, collected images from players and started building a couple of hunches, but still couldn't find reproduction steps other than "play the game a bunch". A couple of weeks ago I got a bit of a gap in my tasks and decided to make another serious attempt, spending a few days diving fully into the text engine to read through and understand all its intricacies.
The Fix
While deep diving through the text engine code, I finally came upon the following function:
SCRIPT_CACHE* ShapingEngineUniscribe::GetFontScriptCache( const Resources::Font& font )
{
    const auto font_resource = font.GetResource()->GetPointer();
    // `font_script_caches` here is a map of `const FontResource*` to `SCRIPT_CACHE` values
    auto it = font_script_caches.find( font_resource );
    if( it == std::end( font_script_caches ) )
        it = font_script_caches.emplace( std::make_pair( font_resource, nullptr ) ).first;
    return &it->second;
}
For non-programmers: this function takes in a reference to a particular font resource and uses its location in memory as a key (lookup value) for a SCRIPT_CACHE data object, creating a new entry if it doesn't already exist. The function then returns a pointer to the SCRIPT_CACHE object, which lets the function caller modify the stored SCRIPT_CACHE instead of a copy which wouldn't have its changes persisted in the `font_script_caches` map.
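To make the pointer-versus-copy point concrete, here is a small sketch. Everything in it is a stand-in: `FakeScriptCache` replaces the opaque SCRIPT_CACHE with a plain `int`, the map is keyed by a string purely for readability, and `GetCache`, `GetCacheCopy` and `PointerWritePersists` are hypothetical names, not functions from the game.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the opaque SCRIPT_CACHE handle (assumption for illustration).
using FakeScriptCache = int;

std::map<std::string, FakeScriptCache> caches;

// Get-or-create, returning a pointer to the entry stored inside the map.
FakeScriptCache* GetCache(const std::string& key)
{
    auto it = caches.find(key);
    if (it == caches.end())
        it = caches.emplace(key, 0).first;
    return &it->second;
}

// Same lookup, but returning by value hands the caller a detached copy.
FakeScriptCache GetCacheCopy(const std::string& key)
{
    return *GetCache(key);
}

// Writes through the pointer persist in the map; writes to a copy do not.
bool PointerWritePersists()
{
    FakeScriptCache copy = GetCacheCopy("body");
    copy = 7;                      // changes only the local copy
    assert(caches["body"] == 0);   // the stored entry is untouched
    *GetCache("body") = 42;        // changes the stored entry itself
    return caches["body"] == 42;
}
```

This is why the original function returns `&it->second`: a library that fills in the cache needs to write into the stored entry, not into a temporary.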
The SCRIPT_CACHE object here is an opaque data object used by the Windows Uniscribe library (which we only use for the Windows version of the client). The Uniscribe documentation doesn't give insight into what information is actually stored by this, just that the application must keep one of these for each "character style" used. From the effects of the text mangling bug though we can infer that it's used for kerning and mapping characters to glyph textures at least.
Upon first glance this function appears to be doing something completely reasonable, which is probably why the issue never got noticed all these years. You only spot the issue once you realise that font resources can be unloaded by our resource manager when they are no longer in use. The bug then occurs when another font (different typeface, style and/or size) happens to get loaded by the resource manager into the exact same location in memory, causing the new font to look up and reuse the old one's SCRIPT_CACHE.
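The failure mode can be sketched in isolation. This is not the game's code: `FontResource`, `FakeScriptCache` and `CacheSeenBySecondFont` are hypothetical stand-ins, the font names are arbitrary, and the sketch assumes the map entry is simply left behind when a font is unloaded. To make the address reuse certain rather than leaving it to the allocator, both fonts are constructed in the same fixed block of storage.

```cpp
#include <cassert>
#include <map>
#include <new>
#include <string>

// Hypothetical stand-ins; none of these are the game's real types.
struct FontResource { std::string typeface; int size; };
struct FakeScriptCache { std::string built_for; };  // which font filled this cache

std::map<const FontResource*, FakeScriptCache> font_script_caches;

// Same get-or-create shape as the original function, keyed by address.
FakeScriptCache* GetFontScriptCache(const FontResource* font)
{
    auto it = font_script_caches.find(font);
    if (it == font_script_caches.end())
        it = font_script_caches.emplace(font, FakeScriptCache{}).first;
    return &it->second;
}

// Returns the description stored in the cache that the second font receives.
std::string CacheSeenBySecondFont()
{
    // One fixed slot of storage, so the second font is guaranteed to land
    // at the first font's old address.
    alignas(FontResource) static unsigned char slot[sizeof(FontResource)];

    FontResource* first = new (slot) FontResource{"FontA", 12};
    GetFontScriptCache(first)->built_for = "FontA 12";
    first->~FontResource();  // "unload" the font; its map entry stays behind

    FontResource* second = new (slot) FontResource{"FontB", 18};
    assert(static_cast<void*>(second) == static_cast<void*>(slot));
    const std::string seen = GetFontScriptCache(second)->built_for;
    second->~FontResource();
    return seen;
}
```

Here `CacheSeenBySecondFont()` returns "FontA 12": the lookup for the second font finds the stale entry because the key only records where a font lived, not which font it was.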
Once I had found this I did a couple of tests to confirm my theory that this was the issue.
Forcing every font to use the same script cache immediately produced mangled text upon starting the game.
Huzzah! Both types of symptoms on display, which was also confirmation that these effects were from the same issue and not two separate ones. From this I was then able to reproduce the bug naturally by purposefully loading and unloading as many fonts as possible, until a new font ended up occupying an old one's memory location.
Now that we knew the problem, there were a few ways to go about fixing it: you could move the SCRIPT_CACHE object to belong to the Resources::Font object, delete the old SCRIPT_CACHE whenever the font gets unloaded, or swap the lookup value from the memory address to instead be based on the typeface, size and styling of the font, which is what actually makes a font unique. All these options work but each has its own pros and cons and should be weighed based on how it fits into the larger systems.
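As a rough illustration of two of those options, the sketch below keys the cache by a description of the font and also removes the entry on unload. It does not show which fix was actually shipped. `FontKey`, `FakeScriptCache` and `ScriptCacheStore` are hypothetical names, and the choice of fields in `FontKey` is an assumption based on the "typeface, size and styling" wording above. With the real SCRIPT_CACHE, removing an entry would presumably also need to release it first (Uniscribe provides `ScriptFreeCache` for that).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical description of what makes a font unique.
struct FontKey
{
    std::string typeface;
    int size;
    bool bold;
    bool italic;

    // Ordering over all fields so FontKey can be used as a std::map key.
    bool operator<(const FontKey& other) const
    {
        return std::tie(typeface, size, bold, italic)
             < std::tie(other.typeface, other.size, other.bold, other.italic);
    }
};

struct FakeScriptCache { int generation = 0; };  // stand-in for SCRIPT_CACHE

class ScriptCacheStore
{
public:
    // Get-or-create keyed by value, so a reused address cannot alias two fonts.
    FakeScriptCache* Get(const FontKey& key)
    {
        assert(!key.typeface.empty());
        return &caches[key];
    }

    // Drop the entry when the font is unloaded.
    void OnFontUnloaded(const FontKey& key) { caches.erase(key); }

    std::size_t Size() const { return caches.size(); }

private:
    std::map<FontKey, FakeScriptCache> caches;
};
```

Keying by value means two fonts only share a cache when they really are the same typeface, size and style, at the cost of a slightly more expensive comparison than a raw pointer.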
Summary
The actual cause of the bug isn't that interesting in itself, just realising that memory addresses can and do get reused, so you need to be careful if/when using pointers as keys. This bug will stick around in my memory though due to the particularly strange symptoms, being really annoying to track down and even just the notoriety of having stuck around for so long. I'll almost miss it in a way since it's one less "grand mystery" for my brain to pick at. Guess I just need to find the next mystery issue to fascinate myself with!
Thanks to everyone who has reported this and other issues over the years! Software development, and debugging in particular, can be strange, and the smallest of things can produce really weird bugs. Detailed bug reports are always really valuable for building a picture of what could be happening, and they help us reproduce the problem ourselves, which then lets us develop and test fixes instead of poking around in the dark.
R.I.P T l e e v