Hello,
We all love building bigger and bigger, but hitting the UPS ceiling really puts a damper on the mood.
Thats why we must continue our endless quest to optimize the game.
Roboports OptimizationRseding
I've profiled many save files over the years of working on Factorio and frequently see saves where logistics and or construction robots are taking a lot of update time. That's nothing new, but along with robots come Roboports - in large numbers.
A typical factory with lots of roboports.
Roboports have never been "slow" but they're always present and people are encouraged to build a lot of them - even more in the upcoming Space Age where you want to do a lot remotely. After the most recent play-testing session, the resulting save once again showed them taking some small, but non-zero amount of time, and it got me thinking about them again.
It would be really nice if they didn't need to be active and updated all the time. Most Roboports don't have anything to do except consume power. Every so often (relative to their lifetime) they will have some robots come along and need to charge/station. But, most don't really do anything except exist as an extension of the logistics network. So, I tried an experiment; what if I just turned them off when they don't need to do anything special? If a robot comes along and wants to charge, or one of the other rare situations where they need to do things, I just turn them back on until it's over.
There were of course more complexities to it than just that, but the end result worked.
The time spent on Roboports in our recent play-testing save file dropped from an average of 1ms to 0.025ms per tick.
Radar logic optimizationRseding
Earlier this month a minor feature card appeared on the to-do list to add a "small radar coverage" area to Roboports. Normally that would be about 5 minutes of work to implement, but the card had a small stipulation attached: "do it in a way where overlapping areas don't increase the performance cost".
Up until this point radars the entity and radar features built into other things (like the player) were a very simple "every so often, iterate around it and ask the map system to keep it revealed". That worked fine for the most part because you don't typically have a lot of overlapping players or radars. But now, we wanted Roboports - an entity you do build in very tight clusters with a lot of overlap - to do the same logic.
I settled on a registration style system where anything that wants to reveal an area of the map simply registers the chunks as "keep revealed" by increasing a counter for that chunk in the map system. As long as the counter is larger than zero the chunk stays visible. Things can overlap as much as they want and it simply increases the counters.
Debug option showing the 'keep revealed' counter.
If a chunk has a counter greater than 1, it is put in an update bucket, and we loop through 1 bucket each tick.
When the counter goes back to 0 for a chunk, its removed from the bucket and it stops getting updated.
Mp4 playback not supported on your device.
Looping through the buckets to keep the chunk charted.
(50% speed)
With this new system I was also able to replace the radar entity's logic and other entities radar logic so they all use the same shared registrations. This had a surprising effect in that radar entities take less update time, and combined with Roboports providing radar coverage meant you didn't need as many radars.
On another play-testing save file we had it improved the overall game performance by 3.6%. Given it wasn't meant to have any measurable impact (if anything I expected adding radar coverage to Roboports would make it worse), a 3.6% overall improvement was great.
So it was a success, and in 2.0 Roboports will have a small 'built-in' radar range of 2 chunks.
Lamp Always ONboskid
During our recent office LAN party when we were eating at a restaurant kovarex shared an idea that given it is possible to pick any RGB color for lamps, he wants to place an image using lamps. Only problem was in order to power the lamps he would need to place substations every so often making the image slightly ugly. At this point I said "unless you built them on a space platform" knowing that on platforms you do not need electric poles. He seemed to be happy and excited while I remained calm not knowing what will happen the next day.
When the next day of playtesting arrived, he made a 100 wide by 150 high blueprint containing lamps that he placed on the space platform with an image converted into RGB colors of those lamps. There was however one significant issue that was to solve: on the space platform it is always a daytime so lamps do not turn on. Lamps had to be connected to circuit network to force enable them during a day. This was quickly noticed as the update time on our quickly growing save file went up too fast.
One of the lamps forced on by control behavior.
Having 15,000 lamps with control behavior that needs to update every tick was not optimal so I made lamps possible to be forced on even during a day.
Lamp using Always ON.
This made our save file to update faster by about 1.2ms just by being able to skip having lamps connected to circuit network. However by looking at the update time of control behaviors after the change was made I was slightly worried why the control behavior update is still so high, so I had to identify it.
Belt reader and multithreading control behaviorsboskid
Control behavior update being still high was easy to explain. It was the feature that I made: the ability to read content of all belts in sequence (FFF-405).
The belt reader itself was added for similar reasons as the Lamp Always ON - to reduce the amount of active control behaviors as every belt piece had to be connected individually to read content. Ability to read content of all belts in a sequence means there was need for only one of those belts to be connected with wire, only one control behavior needs to count the items and transport lines themselves do not need to be split into 1 tile long pieces. Effectively by adding belt reader it was possible to significantly reduce update costs of control behaviors and update costs of transport lines while also gaining new abilities that were not possible before (like counting items that are in underground belts).
The problem with belt reader is that it is so easy to use, during our playtesting it was used quite a lot, not only on space platforms which was the intended use case but also in other places.
Belt reader used where I was not expecting it.
I could say that Hrusa was the primary saboteur of our playtesting. He was responsible for some of the belt reader usages that could be solved without them. There were also active provider chests placed on Fulgora that turned it into a logistic robots hell with more than 10k logistic robots being in the air at all times. I cannot punish him for playing the game so it was time to optimize. Optimizing code for logistic robots was for someone else, belt reader and control behaviors are however for me to handle.
Belt reader under the hood is really simple: every tick it has to see what items are on the belts and how many of them are in each stack on belt to count the total.
Optimizing belt readers went through couple of iterations. One of them was trying to keep running total on transport lines however it failed.
The turning point for me was realizing a really simple fact: belt reader is mostly a "read" operation: it just reads a lot of data from memory (content of belt stacks) to count items and at the end produces a single frame of signals to be sent to a circuit network. This structure means I should be able to do multithreading on it: multiple belt readers computed at the same time do not interfere with each other as they only read belt contents and their output is not used by other belt readers. Similar structure was easy to see in other places like roboports reading logistic network content, arithmetic combinators, selector combinators and decider combinators. With the exception of selector combinator (which could no longer use a global random generator) all of those were relatively easy to make multithreaded.
With those changes done, our playtesting save file could run about 9.5% faster.
A synthetic save file filled with 77k of combinators interconnected with 6k circuit networks I was using during development was running 14.9x faster while keeping a consistent 100% CPU usage (a benchmark time went down from 131s to just about 8.2s to complete).
Failed attempt: Multithreading electric network updateboskid
This is one of those things that I keep hearing from everyone: make electric network update multithreaded.
The electric network update was already improved (FFF-209), however it was still performed by only one thread which was often observed to be the slowest one to finish. In most cases save games have just 1 huge network, so multithreading wouldn't make much benefit. However with Space age, there are many large networks (different planets), so the potential is greater.
As for anything that is multithreaded, the first step that has to be done is identifying the moving parts and how they interact with each other. All of this is required because the game needs to remain fully deterministic or a desync would happen.
At first glance you would think it should be pretty simple, each thread works on a different electric network and its done. This is however not the case because there is one game mechanic that makes it slightly more complicated: the ability to have an entity powered by multiple electric networks.
One of the cases where electric networks are not independent.
In this example when the oil refinery is crafting, its energy storage is discharged and the electric network needs to charge it back again. Here we have 2 possible cases that can happen:
- Left electric network updates first - charging the oil refinery from the Steam engine, the boiler would burn fuel and the inserter would activate.
- Right electric network updates first - charging the oil refinery from the solar panels (in which case boiler remains idle).
Because of this, the networks are dependant on each other, and must be updated by the same thread.
After finding all the cases like this (like power switch closing causing multiple networks to merge), I was able to define what should an update group contain. If two electric networks had at least one entity powered by both networks, those networks had to be in the same update group, be updated by the same thread and desyncs will not happen.
I was able to start doing measurements...
The results
And this is where this idea failed completely. I could see our playtesting save file having at least 4 large electric networks completely independent since they are on different planets, however the electric network update time remained the same while the CPU usage went significantly up.
As it turns out, electric network update does not really do much: it just reads two variables, does one or two additions and goes to the next entity. It was memory throughput limited and by having more threads reading the data from memory, processor cannot simply read data faster. With multithreading here, instead of having one thread wait for memory, all of the threads doing electric network update were waiting for memory, slowing each other.
To fully reject this idea I had to use additional profiling tools like Intel's VTune which allowed me to give more numeric arguments that showed that electric network is memory throughput limited. Our playtesting save file had electric network update time improved from 0.5ms to 0.39ms while CPU usage went up from 0.5% to 15%. Overall the save file was not running faster.
Smarter update of worker robots kovarex
I noticed a problem with logistic robots in our office save: someone removed a cable from the automated system adding more robots to roboports on Fulgora. In a few hours, we ended up with an overpopulated logistic networks with 10k robots having nowhere to go, because all the roboports were full.
Since no one noticed this problems for hours, the first reaction was to add an alert for when the worker robots don't have any space to station, similar to the alert of no storage space.
Homeless robots huddling around roboports.
But the underlying problem is, that the worker robot update is too simplistic. All the robots are iterated every tick and their update logic executed. However the update logic is most of the time very very simple, things like "continue moving towards this target", "wait in the queue to start charging", etc.
It is not a new idea, to fake the smooth movement of something while actually updating only once in a while. We used the chunk update planner (FFF-161) and the smoke update trick (FFF-84) 9 years ago, so it was just a question of time until these techniques would be applied to the worker robots.
The problem with the worker robots, compared to just smokes is, that they do a lot of different things (different types of job, stationing, charging, etc.), but since I was motivated to find another way to optimise the game, I sat down, and jumped right in.
The main part of the challenge was the robots moving, as most of the logic worked in a simple way of:
- Are we there yet?
- No! Move closer
- Are we there yet?
- Yes, do the next part of the logic.
But now we need to have move intention stored in the robot and use that when we want to move somewhere. Once the move intention is known, the robot can enter limbo and the rendering code can use this intention to pretend the robot is moving smoothly all the time. This can also be used to calculate the arrival time at the destination, to know when to schedule the next update.
Mp4 playback not supported on your device.
Debug visualisation: Red (real robot) is only updated once every 20 ticks.
Ghost robot is the predicted position used for rendering.
There needs to be some limit, as the robot can be traveling for a long time, and maybe some distant robot travels into the center of our screen, but we wouldn't know about it, as its physical position would be too far. We check edges of screen for different reasons of similar kinds (lamps making light from offscreen for example), so there is some leeway. Because of that, I just decided to define some hardcoded magic numbers of maximum 20 ticks of movement without update and 60 ticks of stationary robot without updates. These could be probably increased, but the practical performance gain would have extremely diminishing returns at this point.
The result
In the office LAN party save, the overall save performance was increased by 15%, and it generally depends on the amount of robots, but with some heavy robot saves, the overall performance gain is usually around 10-25%. I call this a success.
With all of the changes combined, we are getting into more comfortable territory when it comes to performance with bigger space-age saves. But it would be nice to push it even a little bit more, to allow players to go more crazy. We have some ideas which could help lot, so stay tuned, and hope that these would be success stories and not failed postmortems.
As always, let us know what you think at the usual places.