[Jenkins] couchjs segfaults #551
Comments
Based on @nickva's suppositions, I attempted a low-memory, low-stack-size invocation of couch and couchjs respectively. For reference:
$ docker run -it -m 256M --memory-swap 0 --memory-reservation 256M couchdbdev/centos-7-erlang-18.3 /bin/bash
Then, after pulling & building couch master inside the VM, editing
Finally, running the test in a loop:
$ for i in `seq 1 500`; do make suites=reduce.js javascript; done
No failures after 500 executions. It didn't help reproduce the problem. :(
I looked at this a bit today. Compiled couchjs from mozjs 1.8.5 source on a CentOS 7 VM. Then created a patch to log all couchjs inputs to a file. Then fed that input to that couchjs process in a loop. So far no crashes. Then ran valgrind against couchjs with a few lines of input. It found a whole bunch of uninitialized variable uses but apparently those are ok according to upstream and to reduce the noise have to set
Those are ok. It would be good to clean them up, but since the process exits at that point, it doesn't seem like the culprit.
So to me this suggests it's not a strict memory leak that's failing. Your idea from IRC about capturing all the stdio for the couchjs process and using that to playback against the process to find the source of the problem is intriguing. I'd also appreciate a review of the compile-time settings for our js libs across platforms, compilers used, and whether anything there might be at fault.
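The playback idea can be sketched roughly as follows. This is a minimal illustration, not the patch used in this thread: the function name `replay_until_crash` and its parameters are made up for the example, and it assumes a POSIX system where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess


def replay_until_crash(cmd, input_path, runs=500):
    """Feed the same captured input to `cmd` up to `runs` times.

    Returns (run_number, signal_name) for the first run that was killed
    by a signal, or None if every run exited on its own.
    """
    for run in range(1, runs + 1):
        with open(input_path, "rb") as captured:
            proc = subprocess.run(
                cmd,
                stdin=captured,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        # On POSIX, a negative return code means "killed by signal -N".
        if proc.returncode < 0:
            return run, signal.Signals(-proc.returncode).name
    return None
```

A call would look something like `replay_until_crash(["/path/to/couchjs", "/path/to/main.js"], "captured_input.log")`, where both paths and the log file name are placeholders. A `(run, "SIGSEGV")` result would point at an input that crashes the process outside the test suite.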
Logging is based on an environment variable: `COUCHDB_IO_LOG_DIR`

If set, logs will go to that directory. Logs are per `couch_os_process` Erlang process. There are 3 files saved for each process:

```
<unixtimestamp>_<erlangpid>.in.log  : Input, data coming from the process
<unixtimestamp>_<erlangpid>.out.log : Output, data going to the process
<unixtimestamp>_<erlangpid>.meta    : Error reason
```

Log files are saved as named (visible) files only if an error occurs. If there is no error, disk space will still be used as long as the process is alive. But as soon as it exits, the file will be unlinked and the space will be reclaimed.

Issue: apache#551
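The "keep the log only on error" behaviour described in the commit message can be illustrated with a small sketch. It is not the Erlang implementation. It swaps in a simpler technique (write to a hidden temporary file, rename it to a visible name on error, delete it otherwise), and the names `io_log`, `log_dir` and `proc_id` are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def io_log(log_dir, proc_id):
    """Yield a writable log file that survives only if the body raises."""
    stamp = int(time.time())
    final_path = os.path.join(log_dir, f"{stamp}_{proc_id}.out.log")
    # Hidden temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=log_dir, prefix=".iolog-")
    with os.fdopen(fd, "wb") as log:
        try:
            yield log
        except BaseException:
            # Error path: give the log a visible name so it can be inspected.
            log.flush()
            os.replace(tmp_path, final_path)
            raise
        else:
            # Clean exit: drop the log so the disk space is reclaimed.
            os.unlink(tmp_path)
```

Wrapping a worker's I/O in `with io_log(directory, some_id) as log:` leaves nothing behind on a clean run and one `<timestamp>_<id>.out.log` file if the body fails.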
Got a traceback from a Jenkins Docker CentOS 7 test runner:
Found a deterministic instance of a segfault. Apparently this bit
Segfaults in Haven't tried others. The same input is expected to generate an exception about an invalid function, and it does so for example in the It obviously doesn't always segfault because it runs as part of a test suite and it doesn't always crash in
So @davisp and I found out that building mozjs185 from source on the Ubuntu 12 Docker image doesn't reproduce the segfault, but rebuilding the deb package from source does. We tried applying the same ./configure options but still couldn't make the segfault happen. There are differences in CFLAGS, and some of those seem like they would enable stack protection mechanisms. Here is the compile options diff between a deb package build vs build from source. The first one is the deb package
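For anyone repeating this kind of comparison, a flag-by-flag diff of two compiler command lines can be produced with a few lines of Python. This is only a sketch: `diff_flags` is a made-up helper, and any flags passed to it here are placeholders, not the actual options from the builds discussed in this thread.

```python
import shlex


def diff_flags(first, second):
    """Return (only_in_first, only_in_second) for two flag strings."""
    first_flags = shlex.split(first)
    second_flags = shlex.split(second)
    # Preserve the original order so the result reads like the command line.
    only_first = [flag for flag in first_flags if flag not in second_flags]
    only_second = [flag for flag in second_flags if flag not in first_flags]
    return only_first, only_second
```

Passing the full compile line from each build log (for example, one line from the package build and the matching line from the source build) gives the options that appear on one side only, which makes it easier to test them one at a time.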
So we've been talking this over on #couchdb-dev IRC. The working theory is:
On Ubuntu 12.04, our Docker images currently use Debian's package, which is what @nickva analysed above. I will change our 12.04 image to build SM from source using flags as recommended via https://paste.apache.org/2UId. I'm not going to put a lot of effort in here - Ubuntu 12.04 is officially EOL at this point. Also worth noting is that the equivalent Debian package has a
Our CentOS 6/7 images currently build SM from source but do not include the build options above. That is,
The package builds (in couchdb-pkg) do use these options, though (derived from the official CentOS 7 js-devel package definition).
None of this helps Ubuntu 14, though, which has shown multiple segfaults (including at least 2 recorded in JIRA), and should have the flags listed above as ideal.
All of this mess, plus the problems with building SM1.8.5 with gcc6+, make me want to get a
Lunch gave me some perspective. I'm proceeding with a 3-step approach:
@janl, for #3 above, what do you think about adding the daemon in 2.2 and making it the default in 3.0?
Images are still uploading; once they have, I'll push the button on Jenkins to run a few times and will share results in this ticket. If it's not fixed, I'll reopen.
One run done so far with no segfaults (though 4 other errors).
Two runs done with no segfaults. I'm declaring victory for now - we would have had a segfault by now. Again, please reopen this ticket if another segfault happens in Jenkins.
Out of interest, I found out that on the Ubuntu 12 image the predictable segfault is caused by this linker flag:
The deb package adds that to config/autoconf.mk to
So just adding that to the vanilla upstream source on Ubuntu 12 ends up reproducing the segfault.
Great work isolating this, I was utterly lost.
I'll leave the full backtrace of the segfault here for future reference. Wohali's work on building from source seems to have fixed this problem (and disabling the JIT fixed other unpredictable segfaults).
Expected Behavior
couchjs shouldn't segfault.
Current Behavior
Sometimes, in an automated run, it does. Here's an example:
Sample couch.log content:
Possible Solution
There was a comment on IRC about this:
This is a recurrence of JIRA issue https://issues.apache.org/jira/browse/COUCHDB-3352