Hi Eugene, you probably know this already, but I think your current way of dealing with persistence is not optimal. Even a server restart that interrupts a write should not cause a wipe of any sort.
Instead, it should be some kind of event-sourced system, where every change is written incrementally, without touching a 'big state file'. The current, complete state would only be computed when these incremental changes are loaded.
This means that even if the server is killed in the middle of a write, only one change file would be broken. When that happens, you just leave it out when loading the state, as if the change never happened.
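To make the idea concrete, here is a minimal sketch in Python. It is not how the game actually stores data. The function names (`append_change`, `load_state`, `apply_change`), the one-file-per-change layout, and the checksum field are all assumptions made up for illustration.

```python
import hashlib
import json
import os
import tempfile


def append_change(log_dir, seq, change):
    """Write one change as its own small file; earlier files are never touched."""
    payload = json.dumps(change, sort_keys=True)
    record = {
        "payload": payload,
        # Checksum lets the loader detect a half-written or corrupted file.
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
    path = os.path.join(log_dir, f"{seq:08d}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to disk


def apply_change(state, change):
    """Hypothetical change format: set or delete a single key."""
    if change["op"] == "set":
        state[change["key"]] = change["value"]
    elif change["op"] == "delete":
        state.pop(change["key"], None)
    else:
        raise ValueError("unknown op")


def load_state(log_dir):
    """Replay all intact change files in order; leave out any broken ones."""
    state = {}
    skipped = []
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        try:
            with open(os.path.join(log_dir, name), encoding="utf-8") as f:
                record = json.load(f)
            payload = record["payload"]
            digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            if digest != record["sha256"]:
                raise ValueError("checksum mismatch")
            apply_change(state, json.loads(payload))
        except (ValueError, KeyError, TypeError, AttributeError):
            # Broken file: treat the change as if it never happened.
            skipped.append(name)
    return state, skipped


def demo():
    with tempfile.TemporaryDirectory() as log_dir:
        append_change(log_dir, 1, {"op": "set", "key": "name", "value": "base"})
        append_change(log_dir, 2, {"op": "set", "key": "ore", "value": 10})
        # Simulate a kill mid-write: the third change file is truncated.
        with open(os.path.join(log_dir, "00000003.json"), "w", encoding="utf-8") as f:
            f.write('{"payload": "{\\"op\\": \\"se')
        return load_state(log_dir)
```

Calling `demo()` replays the two intact changes and reports the truncated third file as skipped. One caveat about this sketch: skipping is only clearly safe for the last file in the sequence, which is the crash case described above. A broken file in the middle would need more careful handling, since later changes may depend on it.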
This system would not only make your persistence much more stable, it would also make it easier to reproduce and tackle persistence bugs.
Of course, it is a challenge to get it right and make it perform well. But I believe this is the way to go.
We know :). The thing is, we have been working on an entity system that should get us there at some point, but sadly atomic operations on this scale are a few months away.