0.14 sometimes hangs before running any actions #5336

Closed
lberki opened this issue Jun 6, 2018 · 14 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@lberki
Contributor

lberki commented Jun 6, 2018

Stack trace looks like this:

https://gist.github.com/mattklein123/070fbfe5c49d52b0457e9c694c148380

It appears that we unconditionally run docker info on every build, and if that hangs, we also hang. There is a fix at HEAD (4b80f24) but that doesn't help 0.14 by itself.

@mattklein123

Hi,

I'm happy to help debug this further. Can you let me know what additional info I can provide? I suspect that in certain cases the docker process is waiting to read from stdin, or something like that.

@mattklein123

mklein@localhost:~$ docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:17:20 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:15:30 2018
  OS/Arch:      linux/amd64
  Experimental: false
mklein@localhost:~$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 6
Server Version: 18.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 22
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-127-generic
Operating System: Ubuntu 16.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.779GiB
Name: localhost
ID: XIOC:BKJF:YGML:WTCP:33T7:7JZO:M3BU:AEBA:GNX7:RLWA:4SML:MI4T
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: mattklein123
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

For reference.

@lfpino lfpino assigned lberki and aehlig and unassigned lfpino Jun 7, 2018
@lfpino
Contributor

lfpino commented Jun 7, 2018

Assigning back to @lberki and @aehlig (Bazel sheriff) to further debug this if needed.

bazel-io pushed a commit that referenced this issue Jun 8, 2018
Baseline: 5c3f5c9

Cherry picks:
   + f96f037:
     Windows, Java launcher: Support jar files under different drives
   + ff8162d:
     sh_configure.bzl: FreeBSD is also a known platform
   + 7092ed3:
     Remove unneeded exec_compatible_with from local_sh_toolchain
   + 57bc201:
     Do not autodetect C++ toolchain when
     BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1 is present
   + 35a78c0:
     remote: recursively delete incomplete downloaded output
     directory.
   + 3c9cd82:
     distfile: pack the archives needed later in the build
   + 27487c7:
     Slightly refactor SpawnAction to improve env handling
   + 1b333a2:
     Fix Cpp{Compile,Link}Action environment and cache key computation
   + 3da8929:
     Make SymlinkTreeAction properly use the configuration's
     environment
   + eca7b81:
     Add a missing dependency from checker framework dataflow to
     javacutils
   + 10a4de9:
     Release 0.14.0 (2018-06-01)
   + 4b80f24:
     Add option to enable Docker sandboxing.
   + 6b16352:
     Allow disabling the simple blob caches via CLI flag overrides.

Bug fix for [#5336](#5336)
Bug fix for [#5308](#5308)
excitoon pushed a commit to excitoon-favorites/bazel that referenced this issue Jun 20, 2018
@hlopko
Member

hlopko commented Jun 25, 2018

Since the 0.14.1 release is out, can this be closed?

@mattklein123

If you want? Though don't you have to figure out why it hangs?

@hlopko
Member

hlopko commented Jun 25, 2018

I believe the fix was to hide docker sandbox behind a flag (c909639)

@lberki @dslomov is that correct?

@mattklein123

That's a workaround, not a fix. What if I want to use the feature?

@hlopko
Member

hlopko commented Jun 25, 2018

Fair point, then I'd change the title of the bug and lower the priority. Assignees? Wdyt?

@lberki
Contributor Author

lberki commented Jun 25, 2018

It'll be taken care of by @philwo when he's back at work.

@lberki lberki added P2 We'll consider working on this in future. (Assignee optional) and removed P1 I'll work on this now. (Assignee required) labels Jun 25, 2018
@lberki lberki assigned philwo and unassigned lberki and aehlig Jun 25, 2018
@ittaiz
Member

ittaiz commented Jun 25, 2018 via email

@philwo
Member

philwo commented Jul 4, 2018

@mattklein123 @ittaiz If docker info hangs, this sounds more like a Docker issue.

I checked the code and I don't think it ever waits for stdin. The "info" command makes a single API request to the Docker daemon, decodes the result and prints it (https://github.com/moby/moby/blob/master/client/info.go).

A common reason for commands that make network requests to hang is a misconfigured hostname or DNS setup. I notice that the "Name" field in your docker info output is "localhost", which is at least a bit weird. It should instead be the actual hostname of your machine.

Can you still reproduce the docker info hang with Bazel and/or without Bazel on your machine?

@philwo
Member

philwo commented Jul 4, 2018

There are two things we can do on Bazel's side:

  1. Delay initialization of the Docker module so that users who aren't actually using the Docker sandbox are not affected by any delays.
  2. Run docker info with a timeout of 10 seconds (or so) and, if it still hasn't finished by then, abort and print an error (and possibly whatever docker info printed to stdout/stderr, if anything).

There's nothing Bazel can do to fix or work around the hang itself, though.
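
For illustration only, the second option (a bounded wait around docker info) could look roughly like the Java sketch below. This is not Bazel's actual code: the class name TimedProbe, its Result type, and the run method are hypothetical names made up for this example, and the 10-second limit is just the figure suggested above. Output goes to a temporary file so the child cannot stall on a full pipe, and its stdin is closed right away.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a command and gives up after a timeout. */
public final class TimedProbe {

  /** Outcome of one probe run. */
  public static final class Result {
    public final boolean timedOut;
    public final int exitCode;   // -1 when the command timed out
    public final String output;  // combined stdout and stderr

    Result(boolean timedOut, int exitCode, String output) {
      this.timedOut = timedOut;
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  public static Result run(List<String> command, Duration timeout)
      throws IOException, InterruptedException {
    // Capture output in a temp file so a chatty child never blocks on a pipe.
    Path log = Files.createTempFile("probe", ".log");
    try {
      Process process = new ProcessBuilder(command)
          .redirectErrorStream(true)
          .redirectOutput(log.toFile())
          .start();
      // Close the child's stdin so it sees EOF instead of waiting for input.
      process.getOutputStream().close();

      boolean finished = process.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS);
      if (!finished) {
        // Abort the hung command and wait for it to be reaped.
        process.destroyForcibly();
        process.waitFor();
      }
      String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
      return new Result(!finished, finished ? process.exitValue() : -1, output);
    } finally {
      Files.deleteIfExists(log);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    try {
      Result result = run(Arrays.asList("docker", "info"), Duration.ofSeconds(10));
      if (result.timedOut) {
        System.err.println("docker info did not finish within 10 seconds; output so far:");
        System.err.println(result.output);
      } else {
        System.out.println("docker info exited with code " + result.exitCode);
      }
    } catch (IOException e) {
      // Typically means the docker binary could not be started at all.
      System.err.println("could not run docker info: " + e.getMessage());
    }
  }
}
```

Calling such a helper only when the Docker sandbox is actually requested, rather than on every build, would correspond to the first option.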

@mattklein123

docker info runs just fine on my machine. It was something with how Bazel was calling it through Java. That's all I can offer you.

werkt pushed a commit to werkt/bazel that referenced this issue Aug 2, 2018
@jin jin added team-Local-Exec Issues and PRs for the Execution (Local) team and removed category: local execution / caching labels Sep 3, 2019
@jmmv
Contributor

jmmv commented May 11, 2020

Given the age of this report, and that things might have changed, I'll have to close this due to the apparent difficulty in reproducing it. Please let me know if you think this is still a problem.

@jmmv jmmv closed this as completed May 11, 2020

9 participants