[2024] Friday Facts #421 - Optimizations 2.0

8 months ago - /u/FactorioTeam - Direct link

8 months ago - Rseding, boskid, kovarex - Direct link

Hello,
We all love building bigger and bigger, but hitting the UPS ceiling really puts a damper on the mood.
Thats why we must continue our endless quest to optimize the game.

Roboports OptimizationRseding

I've profiled many save files over the years of working on Factorio and frequently see saves where logistics and or construction robots are taking a lot of update time. That's nothing new, but along with robots come Roboports - in large numbers.

A typical factory with lots of roboports.

Roboports have never been "slow" but they're always present and people are encouraged to build a lot of them - even more in the upcoming Space Age where you want to do a lot remotely. After the most recent play-testing session, the resulting save once again showed them taking some small, but non-zero amount of time, and it got me thinking about them again.

It would be really nice if they didn't need to be active and updated all the time. Most Roboports don't have anything to do except consume power. Every so often (relative to their lifetime) they will have some robots come along and need to charge/station. But, most don't really do anything except exist as an extension of the logistics network. So, I tried an experiment; what if I just turned them off when they don't need to do anything special? If a robot comes along and wants to charge, or one of the other rare situations where they need to do things, I just turn them back on until it's over.

There were of course more complexities to it than just that, but the end result worked.
The time spent on Roboports in our recent play-testing save file dropped from an average of 1ms to 0.025ms per tick.

Radar logic optimizationRseding

Earlier this month a minor feature card appeared on the to-do list to add a "small radar coverage" area to Roboports. Normally that would be about 5 minutes of work to implement, but the card had a small stipulation attached: "do it in a way where overlapping areas don't increase the performance cost".

Up until this point radars the entity and radar features built into other things (like the player) were a very simple "every so often, iterate around it and ask the map system to keep it revealed". That worked fine for the most part because you don't typically have a lot of overlapping players or radars. But now, we wanted Roboports - an entity you do build in very tight clusters with a lot of overlap - to do the same logic.

I settled on a registration style system where anything that wants to reveal an area of the map simply registers the chunks as "keep revealed" by increasing a counter for that chunk in the map system. As long as the counter is larger than zero the chunk stays visible. Things can overlap as much as they want and it simply increases the counters.

Debug option showing the 'keep revealed' counter.

If a chunk has a counter greater than 1, it is put in an update bucket, and we loop through 1 bucket each tick.
When the counter goes back to 0 for a chunk, its removed from the bucket and it stops getting updated.

Mp4 playback not supported on your device.
Looping through the buckets to keep the chunk charted.
(50% speed)

With this new system I was also able to replace the radar entity's logic and other entities radar logic so they all use the same shared registrations. This had a surprising effect in that radar entities take less update time, and combined with Roboports providing radar coverage meant you didn't need as many radars.

On another play-testing save file we had it improved the overall game performance by 3.6%. Given it wasn't meant to have any measurable impact (if anything I expected adding radar coverage to Roboports would make it worse), a 3.6% overall improvement was great.

So it was a success, and in 2.0 Roboports will have a small 'built-in' radar range of 2 chunks.

Lamp Always ONboskid

During our recent office LAN party when we were eating at a restaurant kovarex shared an idea that given it is possible to pick any RGB color for lamps, he wants to place an image using lamps. Only problem was in order to power the lamps he would need to place substations every so often making the image slightly ugly. At this point I said "unless you built them on a space platform" knowing that on platforms you do not need electric poles. He seemed to be happy and excited while I remained calm not knowing what will happen the next day.

When the next day of playtesting arrived, he made a 100 wide by 150 high blueprint containing lamps that he placed on the space platform with an image converted into RGB colors of those lamps. There was however one significant issue that was to solve: on the space platform it is always a daytime so lamps do not turn on. Lamps had to be connected to circuit network to force enable them during a day. This was quickly noticed as the update time on our quickly growing save file went up too fast.

One of the lamps forced on by control behavior.

Having 15,000 lamps with control behavior that needs to update every tick was not optimal so I made lamps possible to be forced on even during a day.

Lamp using Always ON.

This made our save file to update faster by about 1.2ms just by being able to skip having lamps connected to circuit network. However by looking at the update time of control behaviors after the change was made I was slightly worried why the control behavior update is still so high, so I had to identify it.

Belt reader and multithreading control behaviorsboskid

Control behavior update being still high was easy to explain. It was the feature that I made: the ability to read content of all belts in sequence (FFF-405).

The belt reader itself was added for similar reasons as the Lamp Always ON - to reduce the amount of active control behaviors as every belt piece had to be connected individually to read content. Ability to read content of all belts in a sequence means there was need for only one of those belts to be connected with wire, only one control behavior needs to count the items and transport lines themselves do not need to be split into 1 tile long pieces. Effectively by adding belt reader it was possible to significantly reduce update costs of control behaviors and update costs of transport lines while also gaining new abilities that were not possible before (like counting items that are in underground belts).

The problem with belt reader is that it is so easy to use, during our playtesting it was used quite a lot, not only on space platforms which was the intended use case but also in other places.

Belt reader used where I was not expecting it.

I could say that Hrusa was the primary saboteur of our playtesting. He was responsible for some of the belt reader usages that could be solved without them. There were also active provider chests placed on Fulgora that turned it into a logistic robots hell with more than 10k logistic robots being in the air at all times. I cannot punish him for playing the game so it was time to optimize. Optimizing code for logistic robots was for someone else, belt reader and control behaviors are however for me to handle.

Belt reader under the hood is really simple: every tick it has to see what items are on the belts and how many of them are in each stack on belt to count the total.

Optimizing belt readers went through couple of iterations. One of them was trying to keep running total on transport lines however it failed.

The turning point for me was realizing a really simple fact: belt reader is mostly a "read" operation: it just reads a lot of data from memory (content of belt stacks) to count items and at the end produces a single frame of signals to be sent to a circuit network. This structure means I should be able to do multithreading on it: multiple belt readers computed at the same time do not interfere with each other as they only read belt contents and their output is not used by other belt readers. Similar structure was easy to see in other places like roboports reading logistic network content, arithmetic combinators, selector combinators and decider combinators. With the exception of selector combinator (which could no longer use a global random generator) all of those were relatively easy to make multithreaded.

With those changes done, our playtesting save file could run about 9.5% faster.

A synthetic save file filled with 77k of combinators interconnected with 6k circuit networks I was using during development was running 14.9x faster while keeping a consistent 100% CPU usage (a benchmark time went down from 131s to just about 8.2s to complete).

Failed attempt: Multithreading electric network updateboskid

This is one of those things that I keep hearing from everyone: make electric network update multithreaded.

The electric network update was already improved (FFF-209), however it was still performed by only one thread which was often observed to be the slowest one to finish. In most cases save games have just 1 huge network, so multithreading wouldn't make much benefit. However with Space age, there are many large networks (different planets), so the potential is greater.

As for anything that is multithreaded, the first step that has to be done is identifying the moving parts and how they interact with each other. All of this is required because the game needs to remain fully deterministic or a desync would happen.

At first glance you would think it should be pretty simple, each thread works on a different electric network and its done. This is however not the case because there is one game mechanic that makes it slightly more complicated: the ability to have an entity powered by multiple electric networks.

One of the cases where electric networks are not independent.

In this example when the oil refinery is crafting, its energy storage is discharged and the electric network needs to charge it back again. Here we have 2 possible cases that can happen:

Left electric network updates first - charging the oil refinery from the Steam engine, the boiler would burn fuel and the inserter would activate.
Right electric network updates first - charging the oil refinery from the solar panels (in which case boiler remains idle).

Because of this, the networks are dependant on each other, and must be updated by the same thread.

After finding all the cases like this (like power switch closing causing multiple networks to merge), I was able to define what should an update group contain. If two electric networks had at least one entity powered by both networks, those networks had to be in the same update group, be updated by the same thread and desyncs will not happen.

I was able to start doing measurements...

The results

And this is where this idea failed completely. I could see our playtesting save file having at least 4 large electric networks completely independent since they are on different planets, however the electric network update time remained the same while the CPU usage went significantly up.

As it turns out, electric network update does not really do much: it just reads two variables, does one or two additions and goes to the next entity. It was memory throughput limited and by having more threads reading the data from memory, processor cannot simply read data faster. With multithreading here, instead of having one thread wait for memory, all of the threads doing electric network update were waiting for memory, slowing each other.

To fully reject this idea I had to use additional profiling tools like Intel's VTune which allowed me to give more numeric arguments that showed that electric network is memory throughput limited. Our playtesting save file had electric network update time improved from 0.5ms to 0.39ms while CPU usage went up from 0.5% to 15%. Overall the save file was not running faster.

Smarter update of worker robots kovarex

I noticed a problem with logistic robots in our office save: someone removed a cable from the automated system adding more robots to roboports on Fulgora. In a few hours, we ended up with an overpopulated logistic networks with 10k robots having nowhere to go, because all the roboports were full.

Since no one noticed this problems for hours, the first reaction was to add an alert for when the worker robots don't have any space to station, similar to the alert of no storage space.

Homeless robots huddling around roboports.

But the underlying problem is, that the worker robot update is too simplistic. All the robots are iterated every tick and their update logic executed. However the update logic is most of the time very very simple, things like "continue moving towards this target", "wait in the queue to start charging", etc.

It is not a new idea, to fake the smooth movement of something while actually updating only once in a while. We used the chunk update planner (FFF-161) and the smoke update trick (FFF-84) 9 years ago, so it was just a question of time until these techniques would be applied to the worker robots.

The problem with the worker robots, compared to just smokes is, that they do a lot of different things (different types of job, stationing, charging, etc.), but since I was motivated to find another way to optimise the game, I sat down, and jumped right in.

The main part of the challenge was the robots moving, as most of the logic worked in a simple way of:

Are we there yet?
No! Move closer
Are we there yet?
Yes, do the next part of the logic.

But now we need to have move intention stored in the robot and use that when we want to move somewhere. Once the move intention is known, the robot can enter limbo and the rendering code can use this intention to pretend the robot is moving smoothly all the time. This can also be used to calculate the arrival time at the destination, to know when to schedule the next update.

Mp4 playback not supported on your device.
Debug visualisation: Red (real robot) is only updated once every 20 ticks. Ghost robot is the predicted position used for rendering.

There needs to be some limit, as the robot can be traveling for a long time, and maybe some distant robot travels into the center of our screen, but we wouldn't know about it, as its physical position would be too far. We check edges of screen for different reasons of similar kinds (lamps making light from offscreen for example), so there is some leeway. Because of that, I just decided to define some hardcoded magic numbers of maximum 20 ticks of movement without update and 60 ticks of stationary robot without updates. These could be probably increased, but the practical performance gain would have extremely diminishing returns at this point.

The result

In the office LAN party save, the overall save performance was increased by 15%, and it generally depends on the amount of robots, but with some heavy robot saves, the overall performance gain is usually around 10-25%. I call this a success.

With all of the changes combined, we are getting into more comfortable territory when it comes to performance with bigger space-age saves. But it would be nice to push it even a little bit more, to allow players to go more crazy. We have some ideas which could help lot, so stay tuned, and hope that these would be success stories and not failed postmortems.

As always, let us know what you think at the usual places.

8 months ago - /u/FactorioTeam - Direct link

External link →

8 months ago - /u/kovarex - Direct link

Originally posted by 15_Redstones

Having the bots move once every 20 ticks looks interesting, but how does that work with robot-biter interaction? If a worm tries to kill a bot that flies overhead, it needs its current position updated at the speed of combat.

The construction robots (only kind of robots relevant to this), upate every single tick whenever enemies are around. This keeps the behaviour around enemies consistent, but keeps the performance boost when enemies are not around (which is usually almost always).

8 months ago - /u/kovarex - Direct link

Originally posted by Early-Pomegranate-54

I have a save file that i quit a while ago because of performance. I was about 900 hours in with 39% of research done. I wanted to complete a PY run with every intermediate having a train station but with only the base game, no LTN or Cybersyn

Some spec:

Trains: 1130

Train station: 5372

UPS: between 24-26

My train take 2/3 of the update time

https://preview.redd.it/64gfgsqm0ved1.png?width=942&format=png&auto=webp&s=b24d3eece5319361ddba5e1ad795d2d6bbf07f83

But 95% of those repath are not required. The train go in a straight line and wont encounter a fork until about 300-400 tile later. It should be scheduled for a repath. But the repath should not happen until he really need to figure out were he want to go for an optimal path… a fork

There is still quite a big reserve when it comes to both train movement and repathing, it just wasn't usually that much time consuming in our saves, but with the additional improvements, it is just a question of time before it does. Can you post a link to the save (or the forum with the save?)

8 months ago - /u/kovarex - Direct link

Originally posted by smurphy1

Good read. I still think the electric network update could be changed to having one electric buffer per entity type (eg all fast inserters on one network share the same buffer and all the stack inserters share a different buffer, etc) reducing the number of buffers to update from the number of electric entities to the number of entity types for each network. This would improve both the electric update obviously but should also help the entity update by significantly reducing cache misses for fetching the entity's electric buffer because the buffers are shared.

I believe I have a way to incorporate all electric network mechanics except drain but there are several ways to make something like drain depending on what the intended gameplay purpose is.

Wait a second ....

...

My brain is trying to figure out the reason why it can't work.

Eh? Derp?

Obviously, it would change the behaviour in some ways, but maybe not in ways that would matter. It might be weird for some bigger entities like roboports, which can charge up and discharge. So a lot of roboports in a big network sharing their buffer would mean it takes much longer to run out of energy completely, but maybe it doesn't really matter that much.

For small buffer entities like inserters, assemblers or laser turrets, they have buffers, but just for technical reasons, and we want them to slow down or shut down in a coordinated way when there is not enough electricity anyway, so a shared buffer wouldn't matter.

This would be a problem once we want to independtly update entities at the same time, but in this case, the buffers could still be split based on the group in which the entities would be updated.

And as you suggested, if it is still separated per entity, the ability to produce statistics would be kept intact.

This is stupidly simple and genius idea at the same time :)

8 months ago - /u/kovarex - Direct link

Originally posted by 10g_or_bust

As the game engines stands now, is it better to have more smaller rail blocks (signals closer on straight sections) or fewer larger blocks, assuming we don't make such a drastic change that the distance between trains going the same way changes?

For performance, it is always best to have as few blocks as possible, as blocks (parts of blocks actually, as junctions split it even more), are the steps the pathfindiner is using to find the goal.

This is because of simplification, the pathfinder is using the already existing block structure for itself. But it would be reasonable to build special data-structure for pathfinding, where only junction points would divide individual steps, which would greatly reduce the comlexity of the search in real-life scenarios.

8 months ago - /u/kovarex - Direct link

Originally posted by Soul-Burn

Isn't this already done for accumulators?

When they are placed or changed, they are separate until their energy level syncs up with the rest of the accumulators, and are then computed as a single entity.

Yes, but the difference is, that they can be used together, without any change in behaviour. Consumption machines can use different levels of energy based on activity (one fully beaconed assembler uses way more than assembler full of efficiency modules for example). So the problem with entities is little bit more complex, but solvable.

8 months ago - /u/kovarex - Direct link

Originally posted by smurphy1

My idea is for the electric update to calculate the satisfaction percent based on the energy demanded for the shared buffer and save that in a new field on the energy buffer. If you have a merged buffer of 100J and 50J is consumed but only 30J is available to refill the buffer then the satisfaction is 60% (30J/50J) even though the buffer would be 80% full. During the entity update each entity would key off the percent satisfied field on the energy buffer instead of the energy available in the buffer.

This results in behavior almost identical to current behavior even in low power states. The one way this changes behavior slightly is in a low power state where an entity goes from idle to active. Currently the entity will operate at 100% speed for 1 tick because its buffer is full at first before reaching the equilibrium speed based on how low the power is. With this idea the newly active entity would operate at the speed of the current satisfaction equilibrium but if you simulate this scenario over multiple ticks you'll see that the extra speed of the current behavior ends up spread out over multiple ticks with diminishing effect as it converges toward the same equilibrium as individual buffers.

For roboports and laser turrets the issue is the energy buffer is violating single responsibility. It's the energy distribution abstraction and the entity charge state. If entities with a charge state move that internally then the buffer is just a distribution abstraction again. This means the buffer would be scaled based on input_flow_limit. This would mean these entities would need to be active to pull energy from the buffer into their internal charge state but that's probably offset by the gains elsewhere. It would also allow for different behavior such as turrets resting in an uncharged state and only charging to shoot when enemies come into range.

You could even merge buffers of electric generators by doing the inverse of percentSatisfied with what percent of previous charge level was consumed and using that value to set the max value each generator could add to the shared buffer. Just like with the low power scenario with some active and some idle, the merged buffer for generators will converge on the same equilibrium value as current mechanics even if the generators have different amounts of steam available to them.

Not sure how to handle drain but it might not be necessary anymore since the amount of energy in the buffer no longer controls if an entity is considered powered. Also different combinations of module effects could complicate merging buffers or adjusting their size to account for max consumption changes. Entities in multiple networks I think can be handled by the entity consuming from buffers in each network in a consistent order. Probably worse performance for that case but should be more than offset.

Can you code in C++? If you want to work for us and program this (or something else), you can be my guest, and I will not require you to go through the testing process. (This is my reaction of your analysis which was exactly what went through my head thinking about it)

← More from Factorio

Recent Factorio Posts

9 days ago - /u/FactorioTeam

16 days ago - /u/FactorioTeam

18 days ago - /u/Twinsen01

23 days ago - /u/FactorioTeam

Get Discord notifications

[2024] Friday Facts #421 - Optimizations 2.0

Roboports OptimizationRseding

Radar logic optimizationRseding

Lamp Always ONboskid

Belt reader and multithreading control behaviorsboskid

Failed attempt: Multithreading electric network updateboskid

The results

Smarter update of worker robots kovarex

The result

Recent Factorio Posts

Version 2.0.43

Version 2.0.42

Post Space Age - Developer AMA

Version 2.0.41