foundationdb coredumps on 24.05 calling into cxx1112regex_traits #319537

Open
siriobalmelli opened this issue Jun 13, 2024 · 2 comments

@siriobalmelli
Contributor

Describe the bug

foundationdb dumps core on nixpkgs release-24.05 but runs fine on release-23.11, regardless of architecture or kernel. Tested system configurations:

foundationdb  architecture  kernel  nixpkgs        status
7.1.32        aarch64       6.1.92  release-24.05  coredump
7.1.32        aarch64       6.1.92  release-23.11  ok
7.1.32        aarch64       6.6.32  release-24.05  coredump
7.1.32        aarch64       6.6.32  release-23.11  ok
7.1.30        aarch64       -       -              build fail
7.1.32        x86_64        6.1.92  release-24.05  coredump
7.1.32        x86_64        6.1.92  release-23.11  ok
7.1.32        x86_64        6.6.32  release-24.05  coredump
7.1.32        x86_64        6.6.32  release-23.11  ok
7.1.30        x86_64        6.1.92  release-24.05  coredump
7.1.30        x86_64        6.1.92  release-23.11  ok
7.1.30        x86_64        6.6.32  release-24.05  coredump
7.1.30        x86_64        6.6.32  release-23.11  ok

Steps To Reproduce

Set up a single machine test cluster using a minimal flake:

{
  description = "foundationdb crash reproduction";

  inputs = {
    nixpkgs-24_05.url = "github:nixos/nixpkgs/release-24.05";
    nixpkgs-23_11.url = "github:nixos/nixpkgs/release-23.11";
  };

  outputs = {self, ...} @ inputs: let
    inherit (inputs.nixpkgs-24_05.lib) nixosSystem; # toggle nixpkgs here
  in {
    nixosConfigurations.test-system = nixosSystem {
      system = "x86_64-linux"; # toggle architecture here
      modules = [
        ({
          modulesPath,
          pkgs,
          ...
        }: {
          imports = [
            "${modulesPath}/virtualisation/amazon-image.nix"
          ];

          # boot.kernelPackages = pkgs.linuxPackages_6_1;
          boot.kernelPackages = pkgs.linuxPackages_6_6; # toggle kernel here

          ec2.hvm = true;

          networking.useDHCP = true;

          services.foundationdb = {
            enable = true;

            extraReadWritePaths = ["/run/foundationdb"];
            listenAddress = "127.0.0.1:4500";
            listenPortStart = 4500;
            openFirewall = true;
            package = pkgs.foundationdb71;
            pidfile = "/run/foundationdb/fdb.pid";
            publicAddress = "127.0.0.1";
            restartDelay = 120;
            serverProcesses = 1;
            traceFormat = "json";
          };

          system.stateVersion = "24.05";
        })
      ];
    };
  };
}

See the comments in the flake above for where to toggle nixpkgs, architecture, and kernel.
Changing the foundationdb version is outside the scope of this simple
reproduction, but I have tested that as well (see the table above).

The resulting coredump can be inspected with:

coredumpctl list | grep fdbserver | tail -n 1 | awk '{ print $5 }' | xargs coredumpctl info

Example:

           PID: 1320 (fdbserver)
           UID: 118 (foundationdb)
           GID: 118 (foundationdb)
        Signal: 11 (SEGV)
     Timestamp: Thu 2024-06-13 09:22:18 UTC (17min ago)
  Command Line: /nix/store/cz1i01ckbvrxn1gli0bbrim16dvznqv7-foundationdb-7.1.32/bin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/4500 --listen_address 127.0.0.1:4500 --logdir /var/log/foundationdb --logsize 10MiB --maxlogssize 100MiB --memory 8GiB --public_address 127.0.0.1:4500 --storage_memory 1GiB --trace_format json
    Executable: /nix/store/cz1i01ckbvrxn1gli0bbrim16dvznqv7-foundationdb-7.1.32/bin/fdbserver
 Control Group: /system.slice/foundationdb.service
          Unit: foundationdb.service
         Slice: system.slice
       Boot ID: 4b0c405bd88a4031b58c8dceb9be882e
    Machine ID: ec26ef85d6581da22538098e8836259e
      Hostname: ip-172-29-141-193.eu-west-1.compute.internal
       Storage: /var/lib/systemd/coredump/core.fdbserver.118.4b0c405bd88a4031b58c8dceb9be882e.1320.1718270538000000.zst (present)
  Size on Disk: 558.0K
       Message: Process 1320 (fdbserver) of user 118 dumped core.
                
                Module libgcc_s.so.1 without build-id.
                Module libstdc++.so.6 without build-id.
                Module libboost_context.so.1.78.0 without build-id.
                Stack trace of thread 1320:
                #0  0x0000000002a25854 _ZNKSt7codecvtIDic11__mbstate_tE10do_unshiftERS0_PcS3_RS3_ (fdbserver + 0x2625854)
                #1  0x0000000001d88710 _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_readyEv (fdbserver + 0x1988710)
                #2  0x0000000001d88aac _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE25_M_insert_bracket_matcherILb0ELb0EEEvb (fdbserver + 0x1988aac)
                #3  0x0000000001d9a60d _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x199a60d)
                #4  0x0000000001d99083 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999083)
                #5  0x0000000001d9965b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x199965b)
                #6  0x0000000001d9a443 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x199a443)
                #7  0x0000000001d99083 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999083)
                #8  0x0000000001d99161 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999161)
                #9  0x0000000001d9965b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x199965b)
                #10 0x00000000023dc723 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEEC2EPKcS6_RKSt6localeNSt15regex_constants18syntax_option_typeE.constprop.0 (fdbserver + 0x1fdc723)
                #11 0x0000000001d8dbad _ZN8Hostname10isHostnameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x198dbad)
                #12 0x0000000001da2334 _ZN23ClusterConnectionStringC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x19a2334)
                #13 0x0000000001ca267c _ZN21ClusterConnectionFileC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x18a267c)
                #14 0x000000000139faaa _ZN12_GLOBAL__N_110CLIOptions17parseArgsInternalEiPPc (fdbserver + 0xf9faaa)
                #15 0x0000000000e001ca main (fdbserver + 0xa001ca)
                #16 0x00007fbf4e75a10e __libc_start_call_main (libc.so.6 + 0x2a10e)
                #17 0x00007fbf4e75a1c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a1c9)
                #18 0x0000000000e520d5 _start (fdbserver + 0xa520d5)
                ELF object binary architecture: AMD x86-64
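
The frames in the stack trace are mangled C++ symbols, which makes the call chain hard to read. As a minimal sketch (assuming a GCC or Clang toolchain that ships `<cxxabi.h>`; the helper name `demangle` is hypothetical and not part of foundationdb), they can be decoded like this:

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: demangle one Itanium-ABI symbol name.
// Returns the input unchanged when demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is malloc()-allocated by __cxa_demangle
    return result;
}
```

Applied to frame #11, this yields a signature beginning with `Hostname::isHostname(`. Frames #1 to #10 all resolve into libstdc++'s `std::__detail` regex compiler, so the crash happens while a `std::regex` is being compiled as the cluster connection string is parsed.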

Expected behavior

A running single-node foundationdb cluster. Checking with sudo fdbcli --exec status gives the following on each system:

Broken System

SIGNAL: Segmentation fault (11)
Trace: addr2line -e fdbcli.debug -p -C -f -i 0x7335ac 0x728a3d 0x72aa33 0x72ab11 0x72b00b 0xc02473 0x84810b 0x6214ea 0x7ff65d43d10e
Segmentation fault

Working System

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1
  Memory availability    - 7.5 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 06/13/24 10:14:01

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 210 MB

Operating space:
  Storage server         - 3.1 GB free on most full server
  Log server             - 3.1 GB free on most full server

Workload:
  Read rate              - 16 Hz
  Write rate             - 0 Hz
  Transactions started   - 4 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 06/13/24 10:14:01

Additional context

Looking at the dependency tree with:

nix-tree .#nixosConfigurations.test-system.config.services.foundationdb.package

The issue appears to be the glibc version change from glibc-2.38-77 to glibc-2.39-52.
glibc is both a direct dependency of foundationdb and an indirect dependency
via boost-1.78.0. It was not obvious to me how to test this further.

I am happy to collect additional data as needed.
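
To test the glibc theory outside of fdbserver, one option might be a tiny standalone program built against each nixpkgs release. The sketch below is only a starting point under stated assumptions, not a confirmed reproduction: `glibcVersion` and `looksLikeHostPort` are hypothetical names, and the pattern is a stand-in rather than FoundationDB's actual one. It was chosen only because a bracket expression sends `std::regex` compilation through `_M_insert_bracket_matcher`, as in frames #1 and #2 of the trace. `gnu_get_libc_version` is glibc-specific.

```cpp
#include <gnu/libc-version.h>  // gnu_get_libc_version (glibc only)
#include <regex>
#include <string>

// Runtime glibc version string, for comparing the two nixpkgs releases.
std::string glibcVersion() {
    return gnu_get_libc_version();
}

// HYPOTHETICAL pattern, not taken from foundationdb. It contains bracket
// expressions so that compiling it exercises the same libstdc++ code path
// (_BracketMatcher<regex_traits<char>, false, false>) seen in the trace.
bool looksLikeHostPort(const std::string& s) {
    const std::regex pattern("^[A-Za-z0-9.]+:[0-9]+$");
    return std::regex_match(s, pattern);
}
```

If `looksLikeHostPort("127.0.0.1:4500")` returns normally on release-24.05 while `glibcVersion()` reports the newer glibc, that would suggest the fault is specific to how fdbserver is built or linked rather than to glibc and libstdc++ in general.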

Notify maintainers

  1. foundationdb maintainers:

    @thoughtpolice @lostnet

  2. glibc maintainers:

    @eelco @Ma27 @ConnorBaker

Metadata

Broken System

  • system: "x86_64-linux"
  • host os: Linux 6.6.32, NixOS, 24.05 (Uakari), 24.05.20240613.bbc6229
  • multi-user?: no
  • sandbox: yes
  • version: nix-env (Nix) 2.18.2
  • nixpkgs: not found

Working System

  • system: "x86_64-linux"
  • host os: Linux 6.6.33, NixOS, 23.11 (Tapir), 23.11.20240612.5c2ec3a
  • multi-user?: yes
  • sandbox: yes
  • version: nix-env (Nix) 2.18.1
  • nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos
@siriobalmelli
Contributor Author

Bump.

If there's anything else I can do to help debug this, please let me know.

@lostnet
Contributor

lostnet commented Jun 21, 2024

It looks to me like the implementation of ClusterConnectionString was replaced in newer versions,
so updating the packaged version would probably avoid this crash. That may be an option.
(But I am not able to participate in that process.)
