Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - [ecs] Improve Commands performance #2332

Closed

Conversation

NathanSWard
Copy link
Contributor

@NathanSWard NathanSWard commented Jun 11, 2021

Objective

  • Currently Commands are quite slow due to the need to allocate for each command and wrap it in a Box<dyn Command>.
  • For example:
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}

will have 3 separate Box<dyn Command> that need to be allocated and ran.

Solution

  • Utilize a specialized data structure keyed CommandQueueInner.
  • The purpose of CommandQueueInner is to hold a collection of commands in contiguous memory.
  • This allows us to store each Command type contiguously in memory and quickly iterate through them and apply the Command::write trait function to each element.

@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Jun 11, 2021
@NathanSWard NathanSWard added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled labels Jun 11, 2021
@NathanSWard
Copy link
Contributor Author

Here is a list of the minimal performance tests I created.
It seems that the AnyStack has a nice size performance improvement from the currently implementation on main.

group                                                         commands-anystack                      commands-main
-----                                                         -----------------                      -------------
empty_commands/Commands: empty                                1.00      6.9±0.10ns        ? ?/sec    1.24      8.5±0.16ns        ? ?/sec
spawning_commands/Commands: Spawn entities with components    1.00      2.4±0.29ms        ? ?/sec    1.77      4.3±0.29ms        ? ?/sec

@NathanSWard
Copy link
Contributor Author

I have a feeling that since this is a pretty specialized structure I should probably just move it into bevy_ecs from bevy_util.
This will also allow me to remove some of the unsafe code around the user_data: *mut u8.

@TheRawMeatball
Copy link
Member

Hmm, should we maybe pull the commands benchmark into a separate PR and merge it earlier? It's far less controversial than the unsafe anystack impl, and reducing the scope of PR s with unsafe is usually a good thing

Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of using an fn here. Apart from the alignment issue you have pointed out, this looks good.

crates/bevy_ecs/src/system/commands.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked over the tests in detail, but if this passes miri, I can believe it's good to go.

crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
crates/bevy_utils/src/anystack.rs Outdated Show resolved Hide resolved
@NathanSWard
Copy link
Contributor Author

I haven't checked over the tests in detail, but if this passes miri, I can believe it's good to go.

I'll go ahead and run miri with this sometime today/tomorrow and report back the results.

@NathanSWard NathanSWard requested a review from DJMcNab June 13, 2021 02:44
Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a few docs thoughts

crates/bevy_ecs/src/system/commands/command_queue.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/system/commands/mod.rs Show resolved Hide resolved
crates/bevy_ecs/src/system/commands/command_queue.rs Outdated Show resolved Hide resolved
@NathanSWard
Copy link
Contributor Author

NathanSWard commented Jun 14, 2021

Here are the most recent benchmarks (taken from #2334).
It seems this new implementation still has a decent performance gain.

group                                  COMMANDS-MAIN               COMMANDS-NEW
-----                                  -------------               ------------
commands/2000_entities                 1.68   818.8±13.29µs        1.00   486.1±13.52µs
commands/4000_entities                 1.71  1654.4±38.16µs        1.00   967.2±21.10µs
commands/6000_entities                 1.68      2.4±0.04ms        1.00  1449.6±59.79µs
commands/8000_entities                 1.71      3.3±0.05ms        1.00  1912.0±53.50µs
empty_commands/0_entities              1.33      8.4±0.15ns        1.00      6.4±0.12ns
rng_commands/2000_entities             1.61   493.1±27.43µs        1.00    306.3±8.56µs
rng_commands/4000_entities             1.64   984.3±24.50µs        1.00   599.8±14.73µs
rng_commands/6000_entities             1.60  1458.4±44.39µs        1.00   908.9±17.67µs
rng_commands/8000_entities             1.59  1917.4±39.45µs        1.00  1209.5±28.52µs

@NathanSWard
Copy link
Contributor Author

NathanSWard commented Jun 14, 2021

As @mockersf suggested I created a benchmark that utilized FakeCommands (e.g. commands that don't actually do anything to the world.
Here are the results for that test.

group                                  COMMANDS-MAIN                          COMMANDS-NEW
-----                                  -------------                          ------------
empty_commands/0_entities              1.32      8.4±0.18ns        ? ?/sec    1.00      6.4±0.11ns        ? ?/sec
fake_commands/2000_commands            3.26     66.4±1.15µs        ? ?/sec    1.00     20.3±0.35µs        ? ?/sec
fake_commands/4000_commands            3.24    132.0±2.17µs        ? ?/sec    1.00     40.7±0.78µs        ? ?/sec
fake_commands/6000_commands            3.22    195.1±3.33µs        ? ?/sec    1.00     60.6±1.07µs        ? ?/sec
fake_commands/8000_commands            3.22    261.0±4.17µs        ? ?/sec    1.00     81.0±1.45µs        ? ?/sec
spawn_commands/2000_entities           1.59   489.5±10.41µs        ? ?/sec    1.00   307.5±10.07µs        ? ?/sec
spawn_commands/4000_entities           1.56   946.9±27.91µs        ? ?/sec    1.00   605.3±16.40µs        ? ?/sec
spawn_commands/6000_entities           1.60  1428.0±36.34µs        ? ?/sec    1.00   893.3±17.30µs        ? ?/sec
spawn_commands/8000_entities           1.58  1874.1±39.38µs        ? ?/sec    1.00  1187.7±28.82µs        ? ?/sec

@NathanSWard NathanSWard force-pushed the nward/commands-no-alloc branch 3 times, most recently from ae297f1 to 2a614b8 Compare June 15, 2021 04:40
@NathanSWard
Copy link
Contributor Author

NathanSWard commented Jun 15, 2021

benchmarks for the most recent version. (with special casing for 0 sized commands and ensuring the vector isn't null).
This special casing for 0 sized seemed to squeeze out a bit more perf 🤷

group                           COMMANDS-MAIN                          COMMANDS-NEW
-----                           -------------                          -------------
empty_commands/0_entities       1.34      8.4±0.18ns        ? ?/sec    1.00      6.3±0.12ns        ? ?/sec
fake_commands/2000_commands     3.49     66.4±1.15µs        ? ?/sec    1.00     19.0±0.39µs        ? ?/sec
fake_commands/4000_commands     3.51    132.0±2.17µs        ? ?/sec    1.00     37.6±0.39µs        ? ?/sec
fake_commands/6000_commands     3.46    195.1±3.33µs        ? ?/sec    1.00     56.3±0.40µs        ? ?/sec
fake_commands/8000_commands     3.47    261.0±4.17µs        ? ?/sec    1.00     75.3±1.02µs        ? ?/sec
spawn_commands/2000_entities    1.63   489.5±10.41µs        ? ?/sec    1.00    300.8±5.55µs        ? ?/sec
spawn_commands/4000_entities    1.59   946.9±27.91µs        ? ?/sec    1.00   595.7±11.86µs        ? ?/sec
spawn_commands/6000_entities    1.65  1428.0±36.34µs        ? ?/sec    1.00   863.3±16.51µs        ? ?/sec
spawn_commands/8000_entities    1.61  1874.1±39.38µs        ? ?/sec    1.00  1167.6±27.94µs        ? ?/sec

@NathanSWard
Copy link
Contributor Author

@cart this should be good to go now :)

@cart
Copy link
Member

cart commented Jul 16, 2021

bors r+

bors bot pushed a commit that referenced this pull request Jul 16, 2021
# Objective

- Currently `Commands` are quite slow due to the need to allocate for each command and wrap it in a `Box<dyn Command>`.
- For example:
```rust
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}
```
will have 3 separate `Box<dyn Command>` that need to be allocated and ran.

## Solution

- Utilize a specialized data structure keyed `CommandQueueInner`. 
- The purpose of `CommandQueueInner` is to hold a collection of commands in contiguous memory. 
- This allows us to store each `Command` type contiguously in memory and quickly iterate through them and apply the `Command::write` trait function to each element.
@bors bors bot changed the title [ecs] Improve Commands performance [Merged by Bors] - [ecs] Improve Commands performance Jul 16, 2021
@bors bors bot closed this Jul 16, 2021
ostwilkens pushed a commit to ostwilkens/bevy that referenced this pull request Jul 27, 2021
# Objective

- Currently `Commands` are quite slow due to the need to allocate for each command and wrap it in a `Box<dyn Command>`.
- For example:
```rust
fn my_system(mut cmds: Commands) {
    cmds.spawn().insert(42).insert(3.14);
}
```
will have 3 separate `Box<dyn Command>` that need to be allocated and ran.

## Solution

- Utilize a specialized data structure keyed `CommandQueueInner`. 
- The purpose of `CommandQueueInner` is to hold a collection of commands in contiguous memory. 
- This allows us to store each `Command` type contiguously in memory and quickly iterate through them and apply the `Command::write` trait function to each element.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants