Hello,
We've had a lot of requests to talk about map generation. It's difficult to talk about map generation without first
explaining noise expressions.
From time to time we need to talk about noise expressions anyway because they are a critical part of the game, but I
don't think we've ever done a good job of explaining what they actually are at a high level.
We will a closer look at planet mapgen again in the future, but for now this will introduce the basic concepts and act
as a primer for later.
What are noise expressions?Earendel
The "expression" part
When making a game world in Factorio you need to decide what gets placed where. All you really have to work with is the X and Y position. The terrain generator can't know anything about what is already placed. Somehow you need some code that converts X and Y into the type of tile to place, and which trees, rocks, resources, decoratives, cliffs, or enemies to place.
The middle of the map where X and Y are both 0 is the origin and the starting position. We always want this to be land otherwise you're stuck in water. We can calculate the distance from the origin to get a distance "cone" and use it to make a circular island where everything above some value is land, otherwise it is water.
(Images above are not to scale.)
Although, we don't always need to change the probability of both tiles at once, we only need to make sure that the tile we want has the most influence where we want it. For example, land could always have a "weight" of 1, and then water could just have a weight higher than 1 when we want water to appear.
If we can add values to the X and Y coordinates before it goes into the distance function then it shifts the cone to a different position. Offset cones could be used to make new islands, add a section to the existing island, or invert the new cone to take chunks out of an island.
We also have most of the arithmetic operators, like absolute value, modulo, exponents, and trigonometry functions. The trigonometric functions can be used to rotate positions instead of just using offsets, and that's the main trick to the starting area of Vulcanus.
The core part of the expressions is that we can chain these operations together for things like: test_1 = A + B * C. But we can also make one noise expression reference the output of another one, e.g: test_doubler = test_1 * 2.
So all of this is great, you could make an interesting crop circle pattern with this if you wanted to, but it's not great for a natural landscape. For that we want some noise.
The "noise" part
In terrain generation noise does not mean sound, it just means just random numbers.
When making random numbers, your most basic setup is to generate completely random numbers every time you need one. This is called incoherent noise and no point has any relation to any other point. If you zoom in you just get more complete randomness so the usefulness is quite limited.
Coherent noise is different. It makes good use of the X and Y coordinates so that nearby positions can have similar values. This means that as you move over the landscape things change smoothly and coherently.
The main coherent noise we use is basis noise (a Perlin style noise) from FFF-112. The output values end up with an approximate feature size. If you zoom out it's indistinguishable from incoherent noise, but if you zoom in (making the features larger) then everything keeps smoothing out until it is almost flat.
This is great because we can use some noise with a huge feature size for continents, a medium size for islands and bays, and a smaller noise to break up the coastline a bit more. This is the basic idea behind fractal noise. Multiple levels of different sizes are added together with smaller layers having progressively less impact as they add smaller details to certain areas.
The next type of noise is spot noise. It creates a number of spots on the landscape with a certain amount of spacing. This is what we usually use for resource placement. Each spot is actually a cone, so we can have higher richness in the middle of each patch. The resource cone is then perturbed by adding some basis noise so it's not such a clean circle. If you want to read more about spot noise, I recommend checking out FFF-258.
Left: Nauvis resource cones before the added noise to break up the circle.
Right: Nauvis map as you'd normally see it.
Putting it all together
The real power comes from artistic control over these things and finding the maths required to do it efficiently. Here's a couple of examples:
Spot noise is not just for resources, it can also be used for volcanoes. We add huge spot cones to the elevation to make the main volcano body. We can also invert the cone and "min" the tip to invert the mountain peak for a lava pit.
These sorts of inversions are critical for some things. If you want to have a mostly water map you can just reduce the elevation, but this will tend to make islands. What if you want a mostly water map but still have most of the land connected?
For this we can use absolute value and bounce any negative value to be positive. If we then invert it, all of the values are negative, but a small addition bumps a narrow band into the positive region. This makes a series of narrow land paths that almost always connect. We use this sort of pattern in the Vulcanus lava area to make sure that there's a way through the maze.
Noise toolsEarendel
Before joining Wube I was working on new planets for my Space Exploration mod (and still am). In Space Exploration each planet type will have unique map generation a bit like the planets in Space Age. Making just 1 completely new planet is a huge amount of work, so when embarking on making 14+ new types, I decided to invest in making some new tools to make the process easier. This resulted in my own set of Noise Tools, that do a number of things:
- Autoplace cleaning so that testing is much faster.
- Planet-switching presets to be able to preview other planets.
- A convenient inline way to add temporary debug sliders so that noise expression values can be adjusted from the map preview screen.
- And last but not least, the ability to use the map preview to visualise noise expression output directly.
The last one about noise visualisation is the one that is really difficult to do without. Without it you are very detached from the output of the system. Let's say for example, you add some new code that is supposed to make some of the existing sand on the map go from a yellow version to a red version. You load up the game, wander around, but you don't see any red sand. What went wrong? It could be any number of things. Maybe it is actually working but coincidentally the areas that should be red happened to be under water and grass instead of sand. More likely though, if you've explored a large area and still didn't find any then it is broken somewhere, but where?
The noise visualiser lets me choose a specific noise expression and convert that to an image in the preview screen. That way I can see things like the distribution and output values of expressions to say things like, the scale is way larger than I thought so you'd need to go 10km to find any difference, or the output range is too low and it's never strong enough to make a change, or maybe one value accidentally goes negative, multiplies with another negative and causes some other unexpected problems.
Left: Nauvis map as you'd normally see it.
Right: Nauvis elevation. Blue is elevation below zero with darker values being deeper. Yellow is high, green is
very high. Used for water and cliffs.
This really shines when working on things like multiple biomes. Usually if you just rely on tile change as an indicator, you can see that a change happened when going from one biome to another, but you can't tell what the rate of change is like. Often a soft rate of change is better so vegetation can fade out as things get drier, but usually there's no way to see that ahead of time. With the visualiser you can tell if the biome transition is hard or soft because it can display up to 255 different values and show you the gradient.
Left: Vulcanus elevation. Blue is elevation below zero with darker values being deeper. Yellow is high, green is
very high.
Right: Vulcanus temperature. Black is cool, red is warm, yellow is hot. Used to place hot tilesets.
The way it works is still a bit hacky because it runs as a mod and not part of the game engine. Essentially it gets all the tiles in the game, changes their map colours to different values from black to white, or blues > reds > greens, and then it reassigns those tiles to only appear somewhere based on the noise expression you are trying to visualise. Using blue for values below zero and other hues above zero is very useful for elevation because that makes the zero line transition very clear, and that is important because that tends to be our coastline. You can't really play the game like this but it's invaluable in the early stages of a new planet, or when trying to debug something.
So naturally, when I started making planets at Wube I updated my tools so I could use them for Space Age. I'm pretty confident in saying that with these tools I can work on planets about 10x faster.
Not only that, but Genhis and I have been working on some fancy new noise functions that we'll reveal along with a new planet later. The new functions have a lot of settings and trying to get them to all work without my noise tools would not have been fun. In fact, I have my doubts that we would have been able to finish all the features of the new system without some way of seeing what we were doing.
I'll release Noise Tools for 2.0 when the expansion is released.
Map generation in C++Genhis
When the first Vulcanus map generation overhaul was merged, we noticed a significant slowdown as we launched the game and generated the planet terrain. This would have been a mild inconvenience for players, but launching the debug version many times a day made it very noticeable. So, we would either have to give up on fancy map generation or spend some time to make it faster.
Map generation already runs in multiple threads where possible and we try to optimise the code for SIMD execution (single instruction, multiple data). Runtime map generation is therefore efficient enough which generally matched what we observed about Vulcanus. The issue had to be somewhere else. As I dived deeper and deeper into noise expressions, I found some areas for improvement.
In C++, noise expressions represent an abstract syntax tree (AST) of mathematical operations. Each noise expression is a class instance storing its children. If you don't know what that means, just think of it as a bag that can contain other bags. They can be combined and nested up to hardware limits. The structure is fully built from its Lua counterpart. When a surface is created, noise expressions are compiled to a noise program. In general, each NamedNoiseExpression is a procedure in the program. Procedures are useful because we can reuse data for multiple steps. The procedure contains a linear sequence of noise operations with already resolved dependencies, so it is guaranteed that children of a later operation have already been computed. This structure is optimised for fast computation when you need all data, so changes like short-circuiting if statements can't easily be done. Additionally, before noise expressions are compiled, they are recursively simplified – if all their children are constant, they can be folded into a constant as well. This step can't be done sooner because some variables depend on map settings.
Noise expressions lifecycle in 1.1
Optimising the internal structure
The system wasn't expected to be pushed to its limits, so the code didn't deduplicate identical expressions and allocated them separately. Base game allocated 31'000 objects and Vulcanus even more, 280'000. I added a global storage which cached expressions based on their hash and could retrieve one without constructing the full object first. It reduced the number of objects to 5'300 and saved us 125 MB of RAM.
Having so many duplicate expressions surely means there are other places which don't reuse things, right? For example, the simplification step. When it found an opportunity to apply constant folding, the whole branch of the AST had to be re-created to replace one expression with a constant. This step created many temporary objects – almost as many as the number of permanently allocated expressions (200'000 for Vulcanus). I wanted to merge this into the compilation step and simplify expressions "just in time". My attempt was successful and creating the Vulcanus surface from the inefficient noise expression tree became 15x faster.
I continued the refactoring having in mind that writing optimal noise expressions is hard. My goal was to make the compiler aid modders with the optimisation step. Therefore, I tried to deduplicate expressions used in multiple procedures. The new system would extract them to separate procedures, so that their result could be reused at runtime. You don't need to call `noise.delimit_procedure()` anymore, it's automatic!
However, it turns out that figuring out which expressions are used by multiple procedures is not easy and issues started popping up. I would have to sacrifice compilation performance if I wanted to fully deduplicate it. Not only that, but requesting to run a procedure would be more expensive because cross-procedural dependencies were computed on demand. I guess it wouldn't matter because it was still insignificant, but why do something inefficiently when we can optimise it and make the code simpler?
So I decided to remove procedures altogether. Instead of one noise program per surface we now have three (tiles, cliffs, entities + decoratives). These parts were run separately and didn't reuse procedure results anyway. Runtime map generation saw up to 20% improvement and noise program compilation was up to 50% faster, although results varied depending on complexity of noise expressions.
Now you can think of a noise program as one procedure with multiple outputs, which has its advantages. There are no dependencies and everything is inlined. If an intermediate result is not needed anymore, the memory is assigned to another noise operation. So it's like a long C++ function with stack variables being optimised away, although a somewhat simpler version.
Problems with the Lua format
The Lua noise expressions format was introduced in FFF-207. Although many people claim it is pure magic and hard to work with, it offers great flexibility to modders and allows them to create unique and atmospheric maps. Despite this, it has some issues. When you dump the `data.raw` prototype table, you will notice that a significant portion is taken by noise expressions. This is because the format is very verbose with each AST node being a Lua table. Creating so many individual strings and tables also impacts performance.
If we didn't use Lua functions and metatables provided by the noise library, the format would look like this just to compute "x + 5".
{
type = "function-application",
function_name = "add",
source_location = {filename = ..., line_number = ...}, -- debug symbols, optional
arguments =
{
{
type = "variable",
variable_name = "x",
source_location = {filename = ..., line_number = ...}
},
{
type = "literal-number",
literal_value = 5,
source_location = {filename = ..., line_number = ...}
}
}
}
Not readable, is it? Imagine chaining it 100x with other functions and operations. The noise library hides this away, but the performance penalty and "mangled" output are still there. So we decided to change it.
I was tasked with making the format read like math. That is, to create a parser which could process string expressions into AST. At first, I was fixated on saving as much performance as possible which resulted in a monolithic design. Functional, but hard to test, maintain and expand further. Then, I started reading about how other parsers do it. I even considered using an external grammar tool which would generate the parser, but I didn't think it would be worth the time to learn using it and the produced code would likely be suboptimal. In the end, I decided to go for an in-house solution.
The parser was split into three logical parts.
- Tokenizer, which processes a stream of characters and categorises individual character groups to token types. The operator character set is based on Lua with a few exceptions where both C++ and Lua syntax are supported. Apart from regular numbers, it supports scientific notation, and other formats can be added as needed.
- PostfixTokenizer, which converts infix tokens to postfix (reverse Polish) notation. This step is responsible for following operator precedence rules and for making sure that resulting expressions are unambiguous. It uses a modified version of the shunting yard algorithm to process data.
- NoiseExpressionParser, which takes postfix tokens and converts them to a noise expression tree (AST).
I wanted to move as much stuff to C++ as possible to avoid expensive Lua string concatenation. Therefore, I extended noise expressions and allowed defining named noise functions as prototypes. I also added support for local noise functions and expressions which aren't exposed to the global prototype table. There is more I could say about these improvements, but it would better fit a modding documentation and wouldn't make an interesting blog post.
So now all noise expressions are parsed from string, and the legacy format which uses Lua tables is removed for 2.0. Regarding the original issue this was all done for, noise expressions take 50% less time during prototype initialization and the prototypes now load 20% faster as a result.
Further work
My journey to improve map generation doesn't end here. The introduction of prototype-defined noise functions meant that AST contained constants which didn't have any effect on the result and were quite frequent. I implemented partial constant folding using arithmetic identities, so expressions like "1*x + 0" are folded into "x" and aren't evaluated for every chunk.
In addition, I noticed we hadn't been using basis noise (our Perlin-like noise function) efficiently. Its special case (x=x, y=y) is optimised because we know we work on a grid. We can reuse intermediate tile values, so it is 5x faster than the generic implementation. When we wanted to offset the grid using x and y parameters, it would no longer be interpreted as the special case. Adding separate offset parameters improved the performance further.
I did more work around map generation, but not everything fit into the blog post. For example, I removed noise layer prototypes, added more noise functions, and made a few other tweaks. However, I had to make some compromises. Some noise functions were removed, including array construction. It is possible to add array support to the new parser if requested, although there are more pressing issues right now.
The results
Several iterations have passed since the first Vulcanus generation prototype. Its individual noise expressions have been optimised as well. Together with C++ improvements, what took 18.35 ms per chunk now takes 2.83 ms. This is a result we are quite happy with.
I am sure you want to know how all this compares to 1.1:
- Base game initialises prototypes 7% faster and 87% less time is spent on noise expression prototypes.
- Nauvis noise programs are compiled 85% faster.
- Thanks to procedures removal, there are fewer noise operations (6'016 → 2'233).
- This brings us 25% faster chunk generation on average (4.8 ms → 3.58 ms).
All in all, around 90% of the noise expression engine was rewritten from scratch. I estimate I spent around 4 months working on the C++ side of map generation. It was well worth it, because not only do we have a faster system, but also it is more maintainable and we can easily add new noise expression types as we need them. Designing it was a fun challenge. The system may be a bit over-engineered just for map generation, but at least we have a solid foundation which we could reuse in other projects if we wanted to.
It's that time of year againEarendel
Well done, you made it to the end. Merry Cogmas.
Make some noise and express yourself in the usual places.