Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISCUSS: Lua time budget for MapGen #12003

Closed
lhofhansl opened this issue Jan 28, 2022 · 14 comments
Closed

DISCUSS: Lua time budget for MapGen #12003

lhofhansl opened this issue Jan 28, 2022 · 14 comments
Labels
Discussion Issues meant for discussion of one or more proposals Feature request Issues that request the addition or enhancement of a feature @ Mapgen Performance

Comments

@lhofhansl
Copy link
Contributor

lhofhansl commented Jan 28, 2022

Problem

Server lag continues to be a problem.

Sources are:

  • Map Save
  • Lua for ABMs
  • Lua for MapGen
  • and more...

Map Save is somewhat mitigated by using zStd now, which is much faster at writing.
Lua on behalf of ABMs is already limited (default 200ms) per ABM interval (default 1s)

Lua for MapGen is not limited, and thus is a known cause of unlimited server lag. The more clients, the more customer Lua, and the more EmergeThreads... The higher the lag. I've seen 10-20s lag due to MapGen with MineClone with multiple EmergeThreads.

We also have assumptions in the code about maximum lag. For example collisionMoveSimple assumes lag < 0.5s or results will be incorrect leading to stuttering on the client (as client and server make different predictions).

Solutions

Limit MapGen Lua the same (or similar) as we limit ABM lua, 200ms per second, or similar.
The details are tricky as there is a single queue for emerging whether a block is loaded or generated. We do not want to stall loading.

I'd have to think whether two queues are fine 1 for generate requests and 1 for load requests. And dedicated threads for each. Does anyone see a problem offhand with that?

Alternatives

  1. Play with on_generated to avoid the env lock being held. Very tricky and lots of work.
    2. Fair locks would probably go a long way.
  2. Priority locks with mapgen as low priority.

Ask

Let's have a discussion here.

@lhofhansl lhofhansl added Feature request Issues that request the addition or enhancement of a feature Discussion Issues meant for discussion of one or more proposals labels Jan 28, 2022
@lhofhansl lhofhansl changed the title DISCUSS: Lua budget for MapGen DISCUSS: Lua time budget for MapGen Jan 28, 2022
@lhofhansl
Copy link
Contributor Author

Tried with a custom fair mutex implementation. That did not give the desired result. I expected that as long as the server thread can get in in order requested it would alleviate the problem. That was not the case.

@lhofhansl
Copy link
Contributor Author

lhofhansl commented Jan 29, 2022

BTW The following config properties do not do what they are described to do:

#    Maximum number of blocks to be queued that are to be loaded from file.
#    This limit is enforced per player.
#    type: int min: 1 max: 1000000
# emergequeue_limit_diskonly = 128

#    Maximum number of blocks to be queued that are to be generated.
#    This limit is enforced per player.
#    type: int min: 1 max: 1000000
# emergequeue_limit_generate = 128

At the time the blocks are enqueued for emerge we do not know whether they need to be generated or not.
Instead the second limit is used when the block may be generated, and the first is used if the block will not be generated but may be loaded.
So the first limit is used when a block's distance happens to fall in between the max_block_generate_distance and the max_block_send_distance. This is very confusing, and the first setting seems pretty useless.

Does someone know why we want different limits in these cases? (Again this does not limit generations and disk loads separately).

@erlehmann
Copy link
Contributor

Does someone know why we want different limits in these cases?

I am pretty sure server owners might want to limit lag from players exploring new areas. Is this not usable for that purpose?

@erlehmann
Copy link
Contributor

erlehmann commented Jan 29, 2022

I've seen 10-20s lag due to MapGen with MineClone with multiple EmergeThreads.

The reason for that is most likely the village generation, which lags literally everything else because it tries to work around the mapgen griefing already placed structures by overwriting the same mapblock up to 8 times (ask @kay27 for details about it).

@corarona has found a way to fix this by means of moving the village generation out of the map generator into “structure blocks” that get placed by the mapgen and turn into a village by means of an ABM: https://git.minetest.land/Mineclonia/Mineclonia/commit/e3b8e1a826d8e670db6e96288ef86c3d6ed7ab2b

Edit: I have found no other village generator that lagged the game as much as the one in MineClone2, even ones that have much more complex structures have caused no noticeable lag for me. I therefore think it is not necessary to fix this in the engine and that MineClone2 village generation should be fixed instead.

@lhofhansl
Copy link
Contributor Author

lhofhansl commented Jan 29, 2022

@erlehmann Thanks for looking

In general I'm trying to improve load/lag on the servers.
In the past I added server-side occlusion culling (#5320) which avoids sending up to 90% of the blocks in some settings, added better ABM time accounting (#8645 and #10290), cached ABM checks (#7555), I fixed a bunch of bugs where blocks were sent multiple times (generated or loaded at the server multiple times and then sent to the client multiple times) (#10537 and #10495), avoided unnecessary dirtying of blocks (#10670), and I fixed a bug with multiple emerge threads where the same block is generated multiple times in different threads with all the extra load on the server (#10637), and I added zstd compression to reduce lag and improve compression (disk and network) (#10788).

I'm actually not sure how servers could ever scale to any reasonable view_range and player counts before all of these...

This is a continuation of those efforts.

I am pretty sure server owners might want to limit lag from players exploring new areas. Is this not usable for that purpose?

Yes. And that you do with max_block_generate_distance.
Here's how the settings work (by actually looking at the code):

  1. max_block_generate_distance sets the radius around a player in which non-existent blocks generated or loaded.
  2. max_block_send_distance sets the radius in which non-existent blocks are allowed to be loaded from disk (if they exist) (but not generated). This is always strictly larger than the block_generate_distance
  3. emergequeue_limit_diskonly applies a limit to loading blocks beyond the generate distance.
  4. emergequeue_limit_generate applies a limit to generating or loading blocks within the generate distance

I get 1, 2. And I get some combo of 3 and 4.

But 3 and 4 are weird.
Want to give closer blocks a higher priority? Nope can't do that. Further blocks would never be generated (only loaded).
Want to slow down generation to be light on the server, but keep loading blocks at full speed? Again, can't do that. You'll also slow down loading blocks.

@corarona has found a way to fix this by means of moving the village generation out of the map generator into “structure blocks” that get placed by the mapgen and turn into a village by means of an ABM: https://git.minetest.land/Mineclonia/Mineclonia/commit/e3b8e1a826d8e670db6e96288ef86c3d6ed7ab2b

That works precisely because ABM Lua has a time budget of 200ms per second. :) That leaves budget for other Lua to run (entities, etc).
And I want to add such a time budget to MapGen Lua. As is MapGen Lua can hog the servers' CPU.

I though I can play with the emerge queue limits (such as lowering the generate limit), but these are useless for that.

Edit: Spelling.

@lhofhansl
Copy link
Contributor Author

I have found no other village generator that lagged the game as much as the one in MineClone2, even ones that have much more complex structures have caused no noticeable lag for me. I therefore think it is not necessary to fix this in the engine and that MineClone2 village generation should be fixed instead.

Oh but I disagree! The engine should provide a modicum of protection just like we do for ABMs. Should we remove ABM time budgets and just tell mod owners to fix their code when ABM introduce lag?

@corarona
Copy link
Contributor

Edit: I have found no other village generator that lagged the game as much as the one in MineClone2, even ones that have much more complex structures have caused no noticeable lag for me. I therefore think it is not necessary to fix this in the engine and that MineClone2 village generation should be fixed instead.

afaik mcl villages are pretty much the settlements mod (https://gitlab.com/Rochambeau/settlements) which, if i remember correctly did mention these freezes somewhere. Can't find where exactly rn though.

@corarona
Copy link
Contributor

Oh but I disagree! The engine should provide a modicum of protection just like we do for ABMs. Should we remove ABM time budgets and just tell mod owners to fix their code when ABM introduce lag?

Might as well they will just use node timers if abms aren't slow enough :). Jk i think that is a good idea.

@lhofhansl
Copy link
Contributor Author

Thinking about how to do this now. Fair-locking already failed :(
Budgeting here is more tricky than with ABMs since regular mapgen, Lua mapgen, and block loading from disk are all co-mingled on the same queues and threads that handle them. We only want to limit those sections that hold the env-lock (which ends up locking out all other Lua code).

@sfan5
Copy link
Member

sfan5 commented Jan 29, 2022

Limit MapGen Lua the same (or similar) as we limit ABM lua, 200ms per second, or similar.

This isn't as simple as it sounds.
Currently Minetest passes control to core.run_callbacks for it to work through core.registered_on_generateds. Then after a while control comes back and everything is done.

Stopping Lua mid execution is hard,
Executing each callback separately with timekeeping is possible but you likely run into a situation where one of a few takes 2s upwards at which point it's too late to take a break.
Not to mention all the possible breakage from having several server steps happen before all on_generated callbacks for a single block finish.

In the end the real solution is still #3182.

@lhofhansl
Copy link
Contributor Author

lhofhansl commented Jan 29, 2022

Yeah... I know. I think we do not need to break mid-lua (we do not do that for ABMs) to get some mileage.
In the end "regular" Lua just needs to be able to get some time in. I thought fair locks might help, but actually didn't. Another thing to try would be priority locks. Since all other Lua is budgeted, even a low-priority mapgen would eventually get in. Might have some side-effects, though.

#3182 is a lot of work to get right, though, especially without breaking stuff.

You think it's not worth pursuing?

@sfan5
Copy link
Member

sfan5 commented Jan 30, 2022

I'd rather go straight for the right approach but that doesn't mean something couldn't be found to minimize mapgen lag.

@kay27
Copy link
Contributor

kay27 commented Feb 3, 2022

The higher the lag. I've seen 10-20s lag due to MapGen with MineClone with multiple EmergeThreads.

Well, it's probably MCL2. It currently more likely in unmaintained state.

This lag arose from, among other things, the fact that we released MCL2 0.71 in sync with 5.4, I debugged the code on two engine versions for two sleepless weeks, as a result, villages emerge much larger than they actually need, and generally buggy :(

Please notify if you will occasionally notice the same lag in mcl5, though villages are under performance optimization, see #12042

@sfan5
Copy link
Member

sfan5 commented Dec 28, 2022

Closing this since the discussion is over and we know what the right approach is.

@sfan5 sfan5 closed this as completed Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Issues meant for discussion of one or more proposals Feature request Issues that request the addition or enhancement of a feature @ Mapgen Performance
Projects
None yet
Development

No branches or pull requests

6 participants