Skip to content

nikelborm/fetch_timeout_poc

Repository files navigation

fetch_timeout_poc

Repo for issue Leak and very long hanging when fetch throws ConnectTimeoutError(UND_ERR_CONNECT_TIMEOUT)

Steps to reproduce bug

QEMU VM approach (preferred)

  1. download compressed qemu vm disk image I've built after following steps from QEMU VM approach from scratch
  2. install zstd and qemu
  3. run zstd -d archlinux.qcow2.zst to decompress it on host
  4. create and run new QEMU vm in UEFI mode with mounted archlinux.qcow2 disk
  5. in vm login into root user with 0000 password
  6. in vm run cd fetch_timeout_poc/; npm start

QEMU VM approach from scratch

  1. install qemu
  2. download latest arch iso
  3. create and run new QEMU vm with mounted arch.iso in UEFI mode and set boot priority to booting from cd-rom first
  4. in vm when will options to boot from installation media be presented, choose the first one
  5. in vm run following commands one-by-one
    git clone https://github.com/nikelborm/fetch_timeout_poc.git;
    archinstall --silent --config ./fetch_timeout_poc/vm.archinstall.user_configuration.json --creds ./fetch_timeout_poc/vm.archinstall.user_credentials.json;
    reboot;
  6. in vm login into newly installed arch with root user and 0000 password
  7. in vm run following commands one-by-one
    touch .bashrc;
    echo '. .bashrc' > .profile;
    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash;
    git clone https://github.com/nikelborm/fetch_timeout_poc.git;
    cp ./fetch_timeout_poc/vm.wg.conf /etc/wireguard/wg0.conf;
    . .bashrc;
    nvm list-remote;
    nvm i 21;
    systemctl enable [email protected];
    systemctl start [email protected];
    cd fetch_timeout_poc/;
    npm start

To copy files between vm and host:

  1. on receiving side run netcat -l -p 1234 > /destination.file
  2. on sending side run cat /source.file | netcat 192.168.___.___ 1234
  3. on sending side press Ctrl + C

To fix internet (make internet work fine) in vm run wg-quick down wg0 To break internet (return conditions where bug appears) in vm run wg-quick up wg0

Host approach

  1. enable wifi and connect to network with internet
  2. connect to wireguard vpn with sending all trafic through this vpn server
  3. disable wifi
  4. run npm start

Current behavior

Fetch is being executed as expected. It kills itself after about 10 seconds and then returns control back to the script. But the problem is that somthing in the background is still blocking node.js and does not allow process to exit. It just hangs waiting for something to finish even if script was completed.

started fetch at:  2023-10-21T00:37:09.239Z
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11372:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:https:///home/nikel/projects/alarm/index.js:7:3 {
  cause: _ConnectTimeoutError: Connect Timeout Error
      at onConnectTimeout (node:internal/deps/undici/undici:6616:28)
      at node:internal/deps/undici/undici:6574:50
      at Immediate._onImmediate (node:internal/deps/undici/undici:6605:13)
      at process.processImmediate (node:internal/timers:478:21) {
    code: 'UND_ERR_CONNECT_TIMEOUT'
  }
}
finished fetch at:  2023-10-21T00:37:19.391Z
fetch took 10152 ms
timestamp when script should finish:  1697848639391
timestamp when script actually finished:  1697848749482

After all code in the script been executed it takes 110 seconds (1697848749482 - 1697848639391 = 110091) to exit from process.

But enabling wifi back (restoring internet connection) during process's hanging (after the moment fetch already failed and logged error) magically speeds up exit from process.

Expected behavior

Timeout of fetch itself and the error thrown are left unchanged. But script exits immediately after it completed. Difference between timestamp when script should finish and timestamp when script actually finished should be near zero.

Other behaviors

Running completely offline

If we skip 1-2 steps and just run script offline we get other kind of error and expected behavior.

Following is being printed to console:

started fetch at:  2023-10-21T00:54:31.085Z
TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11372:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:https:///home/nikel/projects/alarm/index.js:7:3 {
  cause: Error: getaddrinfo EAI_AGAIN jsonplaceholder.typicode.com
      at GetAddrInfoReqWrap.onlookupall [as oncomplete] (node:dns:118:26) {
    errno: -3001,
    code: 'EAI_AGAIN',
    syscall: 'getaddrinfo',
    hostname: 'jsonplaceholder.typicode.com'
  }
}
finished fetch at:  2023-10-21T00:54:31.250Z
fetch took 165 ms
timestamp when script should finish:  1697849671250
timestamp when script actually finished:  1697849671310

Difference between timestamp when script should finish and timestamp when script actually finished is 60 ms what is fine.

Running online

If we skip 3 step and just run script online we get no errors and expected behavior.

Following is being printed to console:

started fetch at:  2023-10-21T00:58:19.347Z
finished fetch at:  2023-10-21T00:58:19.995Z
fetch took 648 ms
timestamp when script should finish:  1697849899995
timestamp when script actually finished:  1697849900162

Difference between timestamp when script should finish and timestamp when script actually finished is 167 ms what also seems fine.

Running using node-fetch with described network configuration

If we install node-fetch with npm i node-fetch and then import it in the beginning using import fetch from 'node-fetch'; it will replace original fetch and will give us different behavior:

started fetch at:  2023-10-21T01:48:05.984Z
FetchError: request to https://jsonplaceholder.typicode.com/todos/1 failed, reason: getaddrinfo EAI_AGAIN jsonplaceholder.typicode.com
    at ClientRequest.<anonymous> (file:https:///home/nikel/projects/alarm/node_modules/node-fetch/src/index.js:108:11)
    at ClientRequest.emit (node:events:517:28)
    at TLSSocket.socketErrorListener (node:_http_client:501:9)
    at TLSSocket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'EAI_AGAIN',
  code: 'EAI_AGAIN',
  erroredSysCall: 'getaddrinfo'
}
finished fetch at:  2023-10-21T01:50:06.143Z
fetch took 120159 ms
timestamp when script should finish:  1697853006143
timestamp when script actually finished:  1697853006236

When fetch from node-fetch fails, it logs last few lines to console and script exits immediately. Difference between timestamp when script should finish and timestamp when script actually finished is 93 ms what also seems fine.

The interesting thing is that fetch from node-fetch throws the same error as error thrown by original fetch when vpn disabled and notebook completely offline.

fetch from node-fetch returns control after 120 seconds. Original fetch returns control after 10 seconds and then hangs in the background 110 seconds. Looks very suspicious (120 == 110 + 10).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published