Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 0.10.x introduced sysinfo crate result in very slow speed under Linux #839

Closed
ttys3 opened this issue Dec 8, 2021 · 6 comments · Fixed by #845
Closed

🐛 0.10.x introduced sysinfo crate result in very slow speed under Linux #839

ttys3 opened this issue Dec 8, 2021 · 6 comments · Fixed by #845

Comments

@ttys3
Copy link

ttys3 commented Dec 8, 2021

brief description

0.10.x introduced sysinfo crate result in very slow speed under Linux.

a simple delta diff cost about 300ms (with 16 logic core cpu)

Append:

just test on another Linux computer, 6 core CPU

a simple delta diff cost about 100ms (with 6 logic core cpu)

so for each core, get_cpu_frequency() call cost about 15ms, if you CPU core is less than 10, this problem is not easy to found, because you are not very obvious to feel the delay gap.

for any >= 0.10.x version, the more (logic) CPU cores you have, the more slower you get

this issue is also mentioned in #841 (comment)

I also tried delta-0.9.2-x86_64-unknown-linux-gnu (https://github.com/dandavison/delta/releases/download/0.9.2/delta-0.9.2-x86_64-unknown-linux-gnu.tar.gz)

which works without slowdown problem on my machine.

the problem seems to be get_cpu_frequency() call in sysinfo crate

my patched version resolved the slowdown problem: https://github.com/ttys3/delta/releases/tag/0.11.2

related commit: ttys3@8f5a835

ttys3/sysinfo@53cf3dd

since this need to patch the sysinfo crate, it is hard to make a normal PR -_-


my investigation

here's my env:

------------
OS: Arch Linux x86_64
Kernel: 5.15.6-arch2-1
Shell: zsh 5.8
Resolution: 3840x2160
Terminal: tmux
CPU: Intel i9-9900 (16) @ 5.000GHz
GPU: NVIDIA GeForce GTX 1060 3GB
Memory: 6747MiB / 32038MiB


my git config related to delta:

[pager]
    #delta
    log = delta --color-only | less -RFX
    show = delta --color-only | less -RFX
    diff = delta --color-only | less -RFX

1. test with git diff

with delta 0.11.1, do time git diff Cargo.toml cost about 0.5s each time (repeated 3 times):

❯ delta --version
delta 0.11.1

git diff Cargo.toml  0.08s user 0.02s system 18% cpu 0.534 total
git diff Cargo.toml  0.07s user 0.04s system 23% cpu 0.505 total
git diff Cargo.toml  0.07s user 0.05s system 20% cpu 0.580 total

with delta 0.11.2, do time git diff Cargo.toml cost about 0.3s each time (repeated 3 times):

❯ delta --version
delta 0.11.2

git diff Cargo.toml  0.05s user 0.00s system 18% cpu 0.311 total
git diff Cargo.toml  0.05s user 0.01s system 18% cpu 0.307 total
git diff Cargo.toml  0.04s user 0.02s system 18% cpu 0.305 total

2. without git, run delta directly

time delta Cargo.toml.old Cargo.toml

delta 0.11.1

delta Cargo.toml.old Cargo.toml  0.06s user 0.06s system 20% cpu 0.600 total
delta Cargo.toml.old Cargo.toml  0.06s user 0.06s system 21% cpu 0.567 total
delta Cargo.toml.old Cargo.toml  0.07s user 0.06s system 21% cpu 0.571 total

delta 0.11.2

delta Cargo.toml.old Cargo.toml  0.06s user 0.00s system 18% cpu 0.322 total
delta Cargo.toml.old Cargo.toml  0.06s user 0.01s system 20% cpu 0.311 total
delta Cargo.toml.old Cargo.toml  0.05s user 0.01s system 20% cpu 0.293 total

the result keep the same. so it does not related to git.


the normal speed should be near 0.001s, just like this (for example time diff Cargo.toml.old Cargo.toml):

diff --color Cargo.toml.old Cargo.toml  0.00s user 0.00s system 82% cpu 0.001 total
diff --color Cargo.toml.old Cargo.toml  0.00s user 0.00s system 85% cpu 0.001 total
diff --color Cargo.toml.old Cargo.toml  0.00s user 0.00s system 83% cpu 0.001 total

I have found the root cause.

it is delta::utils::process::determine_calling_process

2021-12-08_23-12_1

expand the tower top:

2021-12-08_23-13

so the reason is clear:

delta::utils::process::determine_calling_process
|
V
sysinfo::linux::system::System::refresh_processors
|
v
sysinfo::linux::processor::get_cpu_frequency

oh NO NO NO NO, it try to detect the cpu frequency !!!!

just disable the call, the speed is up:

delta/target/release/delta 0.05s user 0.00s system 101% cpu 0.051 total
delta/target/release/delta 0.05s user 0.01s system 101% cpu 0.051 total
delta/target/release/delta 0.04s user 0.01s system 101% cpu 0.051 total
diff --git a/src/utils/process.rs b/src/utils/process.rs
index 827e44d..ac3a43e 100644
--- a/src/utils/process.rs
+++ b/src/utils/process.rs
@@ -39,7 +39,8 @@ lazy_static! {
 }
 
 fn determine_calling_process() -> Option<CallingProcess> {
-    calling_process_cmdline(ProcInfo::new(), describe_calling_process)
+    None
+    // calling_process_cmdline(ProcInfo::new(), describe_calling_process)
 }
 
 // Return value of `extract_args(args: &[String]) -> ProcessArgs<T>` function which is

the reason is:

delta src/utils/process.rs

info: sysinfo::System::new(),

impl ProcInfo {
    fn new() -> Self {
        ProcInfo {
            info: sysinfo::System::new(),
        }
    }
}

and sysinfo::System::new() is:

    fn new() -> Self {
        Self::new_with_specifics(RefreshKind::new())
    }

new_with_specifics will always try to refresh_processors:

by default, !refreshes.cpu() result in !false , so it is true:

fn new_with_specifics(refreshes: RefreshKind) -> System {
       // ...
        if !refreshes.cpu() {
            s.refresh_processors(None); // We need the processors to be filled.
        }
        s.refresh_specifics(refreshes);
 // ...
}

there seems no way to forbbide refresh_processors call.

if we change the new to sysinfo::System::new_with_specifics(sysinfo::RefreshKind::new().with_cpu())

the first s.refresh_processors(None); will not get called.

but in s.refresh_specifics(refreshes);, it get called again.


@ttys3 ttys3 changed the title 🐛 0.10.x introduced sysinfo crate result in very bad performance under Linux 🐛 0.10.x introduced sysinfo crate result in very slow under Linux Dec 8, 2021
@ttys3 ttys3 changed the title 🐛 0.10.x introduced sysinfo crate result in very slow under Linux 🐛 0.10.x introduced sysinfo crate result in very slow speed under Linux Dec 8, 2021
@ttys3
Copy link
Author

ttys3 commented Dec 9, 2021

@dandavison

I've just test and verified the same problem on another Linux computer, 6 core CPU

a simple delta diff cost about 100ms (with 6 logic core cpu)

so for each core, get_cpu_frequency() call cost about 15ms, if you CPU core is less than 10, this problem is not easy to found, because you are not very obvious to feel the delay gap.

basically, I think this affects all Linux users.

@ttys3
Copy link
Author

ttys3 commented Dec 9, 2021

a simple demo to re-produce the issue

I also made a simple demo to show that get_cpu_frequency() call from sysinfo crate nearly cost 264ms on my machine:

https://github.com/ttys3/get-cpu-frequency-slow/releases/tag/v0.1.0

usage: get-cpu-frequency-slow [your_machine_logic_CPU_num]

logic_CPU_num default to 16

I simply write 16 because my CPU have it. (yes, forgive me)

if you need build:

git clone https://github.com/ttys3/get-cpu-frequency-slow
make build
make run 

demo result:

most of the time it will look like this (for 16 logic core CPU):

./target/release/get-cpu-frequency-slow
total time elapsed in get_cpu_frequency() is: 264.490103ms

❯ make run
./target/release/get-cpu-frequency-slow
total time elapsed in get_cpu_frequency() is: 264.542344ms

❯ make run
./target/release/get-cpu-frequency-slow
total time elapsed in get_cpu_frequency() is: 265.281557ms

❯ make run
./target/release/get-cpu-frequency-slow
total time elapsed in get_cpu_frequency() is: 264.696121ms

this result is very similar to my previous tests result:

❯ delta --version
delta 0.11.2

git diff Cargo.toml  0.05s user 0.00s system 18% cpu 0.311 total
git diff Cargo.toml  0.05s user 0.01s system 18% cpu 0.307 total
git diff Cargo.toml  0.04s user 0.02s system 18% cpu 0.305 total

but there's also rarely times to result in very good result, like this (but I think this result can just ignored):

❯ make run
./target/release/get-cpu-frequency-slow
total time elapsed in get_cpu_frequency() is: 275.174µs

@ttys3
Copy link
Author

ttys3 commented Dec 9, 2021

using benchmark tool like hyperfine will NOT present the problem,

something like hyperfine -r 1000 'delta Cargo.toml Cargo.toml.bak' will give you wrong conclusion

the reason is caching affects the real world problem. although the benchmark tool is OK, and it also works as expected.

it shows the result like this (6 core machine):

❯ hyperfine -r 1000 'delta Cargo.toml Cargo.toml.bak'
Benchmark 1: delta Cargo.toml Cargo.toml.bak
  Time (mean ± σ):      55.3 ms ±   8.7 ms    [User: 47.1 ms, System: 5.8 ms]
  Range (min … max):    50.7 ms … 149.7 ms    1000 runs

but in real world, users will not always run git diff continuous. so the cacehing will not work.
and every time the user may get the delay.

in real world, it is more like this:

I use a simple sleep 1 to make the caching not work any more.

for (( c=1; c<=10; c++ ))
do  
   sleep 1
   time delta Cargo.toml Cargo.toml.bak > /dev/null
done

the result on a 6 core CPU, most of the execution cost >= 100ms :

# this test run under a 6 core CPU machinefor (( c=1; c<=10; c++ ))
do
   sleep 1
   time delta Cargo.toml Cargo.toml.bak > /dev/null
done
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 62% cpu 0.091 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 45% cpu 0.122 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 41% cpu 0.136 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.04s user 0.01s system 52% cpu 0.110 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 39% cpu 0.142 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 45% cpu 0.122 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 39% cpu 0.143 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 46% cpu 0.121 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 101% cpu 0.055 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 40% cpu 0.141 total

in a benchmark situation, it is more like this (with out the sleep), yes, you may found that it is at low latency,
about 50ms

# this test run under a 6 core CPU machinefor (( c=1; c<=10; c++ ))
do
   time delta Cargo.toml Cargo.toml.bak > /dev/null
done
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 100% cpu 0.055 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 101% cpu 0.054 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 101% cpu 0.055 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 100% cpu 0.056 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 100% cpu 0.055 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.04s user 0.02s system 101% cpu 0.055 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 90% cpu 0.063 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.06s user 0.00s system 85% cpu 0.068 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.00s system 99% cpu 0.058 total
delta Cargo.toml Cargo.toml.bak > /dev/null  0.05s user 0.01s system 100% cpu 0.055 total

@dandavison
Copy link
Owner

dandavison commented Dec 9, 2021

@ttys3 and @th1000s do you understand why the sysinfo library requires that refresh_processors is called even when the user specifies a RefreshKind for processes only?

fn new_with_specifics(refreshes: RefreshKind) -> System {
       // ...
        if !refreshes.cpu() {
            s.refresh_processors(None); // We need the processors to be filled.
        }
        s.refresh_specifics(refreshes);
 // ...
}

https://github.com/GuillaumeGomez/sysinfo/blob/c4e213130fd95faff439b59fe615f638f2c77644/src/linux/system.rs#L296-298

@dandavison
Copy link
Owner

dandavison commented Dec 9, 2021

Would one possibility be to use one of the competing Rust crates, for example https://docs.rs/heim/0.0.11/heim/process/struct.Process.html? (Or is there no attractive / performant way to call those functions from sync code?)

@dandavison
Copy link
Owner

I've opened GuillaumeGomez/sysinfo#632 to ask whether there's any scope for avoiding the CPU data querying.

dandavison added a commit that referenced this issue Dec 11, 2021
* Do not query CPU data when querying process data

Fixes #839
Ref GuillaumeGomez/sysinfo#632

* Update branch of sysinfo

* Update upstream sysinfo commit

Ref GuillaumeGomez/sysinfo#636

* Point sysinfo at an explicit commit rather than a symbolic branch name

commit d647acfbf216848a8237e1f9251b2c48860a547f
Merge: 989ac6c 67a586c
Author: Guillaume Gomez <[email protected]>
Date:   2 hours ago

    Merge pull request #636 from GuillaumeGomez/update-if-needed

    Only update processors if needed
dandavison added a commit that referenced this issue Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants