PDX Chakerathe, about 1 year ago
Bonjour à nouveau, friends! I don’t know if the third time makes it a tradition but I appear once more before release to give some updates from HoI’s tech team.
If you haven’t seen much of me since last year, it’s because I’ve partly been busy with **REDACTED**, **REDACTED** and also with **REDACTED**, but my attention has fully returned to HoI for the last couple of months.
So, what’s new with HoI’s tech in 1.13 “Stella Polaris”?
Mines, or the greatest urban legend
First off, great news, naval mines will not slow down the game anymore! Hurray!
Except that’s a lie. The “mine lag” I keep reading about in some community house rules isn’t a thing. Hasn’t been for 2 years. I know because I fixed it in 1.11 (Barbarossa). But don’t feel bad for missing the notice, as it turns out this change from December 2020 (shipped in 2021 with No Step Back) never ended up in the changelog.
commit da0ce4ac94ba1938459ffb6abe3bf2b0cd4bab1e
Author: Mathieu Ropert
Date: Mon Dec 14 17:38:59 2020 +0100
[X] Fixed performance issue when mines are used
Oops.
But I have even better news. While the usage of mines didn’t slow the game down, the existence of mines as a concept was still incurring a performance cost, whether you used them or not. This has also been greatly improved for 1.13.
So go ahead. Go nuts. Mine the whole ocean. See what happens.
V-Sync, G-Sync, Free-Sync, NSync…
While there is no big platform update this time (supported operating systems remain the same), we have introduced some small improvements to rendering tech in HoI.
The main one is support for variable refresh rate monitors and GPUs. If you use DX11 (the default since 1.12) and your hardware supports it (and you use Windows 10 or higher), you should get more FPS with vsync off. On my NVIDIA RTX 3060 at work, it goes up from 140-150 FPS in 1.12 to about 190 FPS in 1.13 when the game is paused.
I’m not saying you’ll actually be visually able to tell the difference, but you wouldn’t be our fans if you didn’t like when numbers go up, so this should give you some stonks for free.
We’ve also tweaked a bit how the game renders when unpaused, especially when your hourly/daily tick starts getting slower than 16/33/50ms (depending on game speed). Long story short, the “cost” of keeping the FPS going steady when the simulation gets more CPU heavy has been improved, meaning better FPS and less perceived “lag” especially in speed 4 and 5.
General performance improvements
I used the mines example earlier as a teaser, but in general this update comes with a good amount of performance improvements. Those were partly inspired by having spent time looking at our other titles and a bunch of secret stuff I sadly cannot delve into, and partly just due to getting a fresh set of eyes on old problems.
Here are some side-by-side comparisons between 1.12 (Avalanche) and 1.13 (Stella Polaris).
First, a fresh new 1936 game on my work i7-12700k and its 20 logical cores:
HOI4 1.12 “Avalanche” - 1936 - Intel i7-12700k
HOI4 1.13 “Stella Polaris” - 1936 - Intel i7-12700k
And here are 2 similar-ish saves from 1943, same machine:
HOI4 1.12 “Avalanche” - 1943 - Intel i7-12700k
HOI4 1.13 “Stella Polaris” - 1943 - Intel i7-12700k
This might look a tad overwhelming if you’ve never seen the in-game profiler from previous dev diaries (console command imgui show profiler to make it pop in game). The bits I would like to direct your attention to are the “Ticks per second” and “Last 24 ticks average” that give you a rough idea of how fast the game is going.
Long story short, the 1936 scenario goes from a 15ms average to 10ms average for the hourly tick (the time it takes to simulate one in-game hour). Or if you prefer, from 42 to 64 hours simulated per second.
In the 1943 scenario, where there are way more wars, divisions, fleets and planes involved, we go from 43ms to 29ms hourly tick average. Or again, up from 17 to 27 in-game hours simulated every real world second.
In both cases, this makes the simulation about 1.5 times faster. In practice, the perceived speed will be a bit less because we still need to account for GPU time and the like, but it should be fairly noticeable still.
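For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the speedup from the figures quoted above. The function names are made up for illustration; only the tick times and hours-per-second numbers come from the text.

```python
def speedup(before: float, after: float, lower_is_better: bool = True) -> float:
    """Return how many times faster 'after' is compared to 'before'."""
    return before / after if lower_is_better else after / before

# (tick ms in 1.12, tick ms in 1.13, hours/s in 1.12, hours/s in 1.13), as quoted in the text
SCENARIOS = {
    "1936": (15, 10, 42, 64),
    "1943": (43, 29, 17, 27),
}

def summarize(scenarios):
    """Map each scenario to (speedup by tick time, speedup by hours per second)."""
    rows = {}
    for name, (ms_old, ms_new, hps_old, hps_new) in scenarios.items():
        rows[name] = (
            round(speedup(ms_old, ms_new), 2),
            round(speedup(hps_old, hps_new, lower_is_better=False), 2),
        )
    return rows

if __name__ == "__main__":
    for name, (by_tick, by_hours) in summarize(SCENARIOS).items():
        print(f"{name}: {by_tick}x by tick time, {by_hours}x by hours per second")
```

Both ways of measuring land close to the "about 1.5 times faster" figure, with the 1943 save coming out slightly below it by tick time and slightly above it by hours per second.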
Of course, we understand that not all of you are running top-of-the-line 10th+ generation i7 CPUs, so it would only be fair to also look at what it means on lower end machines. Luckily, I also have access to a modest AMD Ryzen 3 1200, a quad core CPU that the internet tells me is roughly equivalent to an Intel i5-7400. Brace yourselves for more graphs.
First, 1936:
HOI4 1.12 “Avalanche” - 1936 - AMD Ryzen 3 1200
HOI4 1.13 “Stella Polaris” - 1936 - AMD Ryzen 3 1200
And then the same 1943 saves:
HOI4 1.12 “Avalanche” - 1943 - AMD Ryzen 3 1200
HOI4 1.13 “Stella Polaris” - 1943 - AMD Ryzen 3 1200
First off, if there was still any doubt in your mind, yes, switching to a beefier machine will absolutely make HoI run faster. As you can see in the numbers here, this ran about 3 times slower than the high end 20 core machine from before.
Second, you can also see that 1.13 still performs better than 1.12 (about 25-30% faster), but not as much as on a high end machine. “Why?” you ask. Because threads!
HOI4: You Can(not) Multithread
It is a recurring misconception that Paradox games don’t use more than one thread. This is obviously not the case. I’ve been talking about it since my days on Stellaris.
But it is true that some parts of the game simulation are better than others at making use of many cores. And once they’ve been made to only use one core, it’s usually quite difficult to change as the game simulation starts relying on it.
This all comes back to the olden days where Europa Universalis was a board game ported to PC. After all, it is fairly common for those to be designed in a way where each actor takes a turn. The German player moves all his troops on the board. Then the British. Then the Soviets. Etc… This is a very serial process. Easy to understand and design, but slower than if every player could go at the same time. And of course it is very hard to change after the fact, because if everyone suddenly starts moving at the same time then you get conflicts and need rules to identify and resolve them.
Crusader Kings 3 resolved most of those issues by taking a radically different approach to its technical design, but we obviously don’t have that luxury. And no, unlike what I’ve heard before, this has little to do with the engine. Understand that ol’ Clausewitz (and its friend Jomini) are mostly concerned with graphics, networking, loading and saving files, that kind of stuff. The bulk of the performance bottleneck isn’t there, it’s in the game simulation, and how to approach that is almost entirely up to each game. That decision is usually made (and becomes hard to change) in the first years of the game’s development, before it even releases.
Still, in this patch we were able to identify places where we could break that “turn-based” design and replace it with a simultaneous process and nobody would be the wiser. Which allowed us to throw all your cores at the problem, rather than rely on the one.
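To make the idea a bit more concrete, here is a toy Python sketch of swapping a turn-based loop for a "plan simultaneously, then resolve conflicts" process that gives the same result. Everything in it is hypothetical (the data layout, the function names, and the rule that turn order breaks ties); it is not how HoI4 is actually written, just one common way to structure this kind of change.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical snapshot: which provinces each country wants to move into this tick.
WORLD = {
    "countries": ["GER", "ENG", "SOV"],  # also the old turn order
    "targets": {
        "GER": ["Paris", "Warsaw"],
        "ENG": ["Paris"],
        "SOV": ["Warsaw", "Helsinki"],
    },
}

def tick_turn_based(world):
    """Old style: each country takes its turn and claims provinces immediately."""
    owner = {}
    for country in world["countries"]:
        for province in world["targets"][country]:
            owner.setdefault(province, country)  # first to move keeps the province
    return owner

def plan_moves(country, world):
    """Phase 1: read-only planning against the shared snapshot (safe to run concurrently)."""
    return [(country, province) for province in world["targets"][country]]

def resolve(orders, turn_order):
    """Phase 2: serial conflict resolution, using the old turn order as the tie-break rule."""
    rank = {country: i for i, country in enumerate(turn_order)}
    owner = {}
    for country, province in sorted(orders, key=lambda order: rank[order[0]]):
        owner.setdefault(province, country)
    return owner

def tick_parallel(world, workers=4):
    """New style: every country plans at the same time, then conflicts are resolved once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(lambda c: plan_moves(c, world), world["countries"]))
    orders = [order for batch in batches for order in batch]
    return resolve(orders, world["countries"])

if __name__ == "__main__":
    print(tick_turn_based(WORLD))
    print(tick_parallel(WORLD))
```

Both functions end up with the same owners, which is the "nobody would be the wiser" part. Note that in standard CPython, threads will not actually speed up CPU-heavy work like this; the sketch is only about the structure of the change, not about measuring the gain.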
This is, I believe, the main reason why you will notice more improvement the more cores you have. For example, if a process took, say, 2ms before, it will now take 0.5ms with 4 cores (2/4) and 0.1ms with 20 cores (2/20), roughly speaking. Here’s a rough outline of HoI4 CPU usage in a late 1942 save with 20 cores for example:
Notice the CPU chart at the top which shows how many cores are being used at a given time. Ideally you’d want it to be fully green all the time, but this is what we get with the upcoming release of HoI4.
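The back-of-the-envelope numbers above (2ms split across N cores) assume the work divides perfectly. As a rough illustration, the Python sketch below computes that ideal case next to the standard Amdahl's law estimate, which accounts for a part of the work staying on one core. The function names and the 90% parallel fraction are assumptions picked for the example, not measurements from the game.

```python
def ideal_time_ms(serial_ms: float, cores: int) -> float:
    """Perfect scaling: the whole task is split evenly across all cores."""
    return serial_ms / cores

def amdahl_time_ms(serial_ms: float, cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: only 'parallel_fraction' of the task benefits from extra cores."""
    return serial_ms * ((1 - parallel_fraction) + parallel_fraction / cores)

if __name__ == "__main__":
    for cores in (4, 20):
        ideal = ideal_time_ms(2.0, cores)
        realistic = amdahl_time_ms(2.0, cores, parallel_fraction=0.9)  # 0.9 is a made-up value
        print(f"{cores} cores: ideal {ideal:.2f}ms, with 10% serial work {realistic:.2f}ms")
```

The gap between the two columns is one way to read the parts of the CPU chart that are not fully green: whatever stays serial puts a floor under how fast the task can get, no matter how many cores you add.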
For those of you who are more technically inclined, I can recommend this (updated) talk I gave at ACCU 2023 about multithreading challenges in PDS Games.
Multi Threading Model in Paradox Games: Past, Present and Future - Mathieu Ropert - ACCU 2023
And with that, I’ve just noticed something off while making the screenshots for this dev diary, so I’ll go back to investigate and see if there’s more we can do to improve things further in the post-release patches 😎
I will leave you with this side by side comparison of letting the game run for 30 seconds on 1.12 vs 1.13. Keep an eye on the date/time at the top.