Error when running run.sh #17

Open
Modesty-Li opened this issue Nov 5, 2021 · 1 comment


@Modesty-Li

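Hello! I ran into the problems below when running `./run.sh 3`, and I would appreciate it if you could take a look when you have time. I am using InfiniBand NICs, and communication between the machines works fine. Because my machines have limited memory, I reduced the amount of memory allocated.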
[server11:504910] Warning: could not find environment variable "CLASSPATH"
INFO: TOPO: 1nodes
INFO: node 0 cores: 0 1 2 3
INFO: #0: has 4 cores.
INFO: #0: allocate 5.0625GB memory
INFO: TOPO: 1nodes
INFO: node 0 cores: 0 1 2 3
INFO: #2: has 4 cores.
INFO: #2: allocate 5.0625GB memory
INFO: TOPO: 1nodes
INFO: node 0 cores: 0 1 2 3
INFO: #1: has 4 cores.
INFO: #1: allocate 5.0625GB memory
INFO: initializing RMDA done (2157 ms)
INFO: initializing RMDA done (2161 ms)
INFO: loading ID-mapping file: /home/robert/rdfdata/str_normal
INFO: loading ID-mapping file: /home/robert/rdfdata/str_normal
INFO: initializing RMDA done (2163 ms)
INFO: loading ID-mapping file: /home/robert/rdfdata/str_normal
INFO: loading ID-mapping file: /home/robert/rdfdata/str_index
INFO: loading ID-mapping file: /home/robert/rdfdata/str_index
INFO: loading ID-mapping (attribute) file: /home/robert/rdfdata/str_attr_index
INFO: loading string server is finished (46 ms)
INFO: loading ID-mapping (attribute) file: /home/robert/rdfdata/str_attr_index
INFO: loading string server is finished (47 ms)
INFO: loading ID-mapping file: /home/robert/rdfdata/str_index
INFO: loading ID-mapping (attribute) file: /home/robert/rdfdata/str_attr_index
INFO: loading string server is finished (75 ms)
INFO: allocate 256MB RDMA cache
INFO: gstore = 4294967296 bytes
INFO: header region: 153008209 slots (main = 12582917, indirect = 6543109)
INFO: entry region: 461708984 entries
INFO: allocate 256MB RDMA cache
INFO: gstore = 4294967296 bytes
INFO: header region: 153008209 slots (main = 12582917, indirect = 6543109)
INFO: entry region: 461708984 entries
INFO: 2 files and 2 attributed files found in directory (/home/robert/rdfdata/) at server 0
INFO: 2 files and 2 attributed files found in directory (/home/robert/rdfdata/) at server 1
INFO: allocate 256MB RDMA cache
INFO: gstore = 4294967296 bytes
INFO: header region: 153008209 slots (main = 12582917, indirect = 6543109)
INFO: entry region: 461708984 entries
INFO: 2 files and 2 attributed files found in directory (/home/robert/rdfdata/) at server 2
got bad completion with status: 0xc, vendor syndrome: 0x81, with error transport retry counter exceeded, qp n:1 t:0
wukong: /home/robert/wukong/rdma_lib/rdmaio.hpp:769: rdmaio::Qp::IOStatus rdmaio::Qp::poll_completion(uint64_t*): Assertion `false' failed.
[server11:504915] *** Process received signal ***
[server11:504915] Signal: Aborted (6)
[server11:504915] Signal code: (-6)
[server11:504915] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f4f64c883c0]
[server11:504915] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7f4f64ac518b]
[server11:504915] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7f4f64aa4859]
[server11:504915] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7f4f64aa4729]
[server11:504915] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7f4f64ab5f36]
[server11:504915] [ 5] ../build/wukong(+0x113dfb) [0x55af55f15dfb]
[server11:504915] [ 6] ../build/wukong(_ZN10BaseLoader13flush_triplesEii+0x1b3) [0x55af55f2db83]
[server11:504915] [ 7] ../build/wukong(_ZN10BaseLoader4loadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIS8_I8triple_tSaIS9_EESaISB_EESE_RS8_IS8_I13triple_attr_tSaISF_EESaISH_EE+0x44e) [0x55af55f7152e]
[server11:504915] [ 8] ../build/wukong(_ZN6DGraphC2EiP3MemP12StringServerNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1f3) [0x55af55f5f633]
[server11:504915] [ 9] ../build/wukong(main+0x6af) [0x55af55ef7d1f]
[server11:504915] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f4f64aa60b3]
[server11:504915] [11] ../build/wukong(_start+0x2e) [0x55af55efaade]
[server11:504915] *** End of error message ***
got bad completion with status: 0xc, vendor syndrome: 0x81, with error transport retry counter exceeded, qp n:0 t:0
wukong: /home/robert/wukong/rdma_lib/rdmaio.hpp:769: rdmaio::Qp::IOStatus rdmaio::Qp::poll_completion(uint64_t*): Assertion `false' failed.
[server12:447992] *** Process received signal ***
[server12:447992] Signal: Aborted (6)
[server12:447992] Signal code: (-6)
[server12:447992] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f7f340c53c0]
[server12:447992] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7f7f33f0218b]
[server12:447992] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7f7f33ee1859]
[server12:447992] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7f7f33ee1729]
[server12:447992] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7f7f33ef2f36]
[server12:447992] [ 5] ../build/wukong(+0x113dfb) [0x55a443ccadfb]
[server12:447992] [ 6] ../build/wukong(_ZN10BaseLoader13flush_triplesEii+0x1b3) [0x55a443ce2b83]
[server12:447992] [ 7] ../build/wukong(_ZN10BaseLoader4loadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIS8_I8triple_tSaIS9_EESaISB_EESE_RS8_IS8_I13triple_attr_tSaISF_EESaISH_EE+0x44e) [0x55a443d2652e]
[server12:447992] [ 8] ../build/wukong(_ZN6DGraphC2EiP3MemP12StringServerNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1f3) [0x55a443d14633]
[server12:447992] [ 9] ../build/wukong(main+0x6af) [0x55a443cacd1f]
[server12:447992] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f7f33ee30b3]
[server12:447992] [11] ../build/wukong(_start+0x2e) [0x55a443cafade]
[server12:447992] *** End of error message ***
got bad completion with status: 0xc, vendor syndrome: 0x81, with error transport retry counter exceeded, qp n:0 t:0
wukong: /home/robert/wukong/rdma_lib/rdmaio.hpp:769: rdmaio::Qp::IOStatus rdmaio::Qp::poll_completion(uint64_t*): Assertion `false' failed.
[server13:433953] *** Process received signal ***
[server13:433953] Signal: Aborted (6)
[server13:433953] Signal code: (-6)
[server13:433953] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7fd3872f63c0]
[server13:433953] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7fd38713318b]
[server13:433953] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7fd387112859]
[server13:433953] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x25729) [0x7fd387112729]
[server13:433953] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x36f36) [0x7fd387123f36]
[server13:433953] [ 5] ../build/wukong(+0x113dfb) [0x5622cef68dfb]
[server13:433953] [ 6] ../build/wukong(_ZN10BaseLoader13flush_triplesEii+0x1b3) [0x5622cef80b83]
[server13:433953] [ 7] ../build/wukong(_ZN10BaseLoader4loadERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIS8_I8triple_tSaIS9_EESaISB_EESE_RS8_IS8_I13triple_attr_tSaISF_EESaISH_EE+0x44e) [0x5622cefc452e]
[server13:433953] [ 8] ../build/wukong(_ZN6DGraphC2EiP3MemP12StringServerNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1f3) [0x5622cefb2633]
[server13:433953] [ 9] ../build/wukong(main+0x6af) [0x5622cef4ad1f]
[server13:433953] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fd3871140b3]
[server13:433953] [11] ../build/wukong(_start+0x2e) [0x5622cef4dade]
[server13:433953] *** End of error message ***

mpiexec noticed that process rank 0 with PID 504915 on node server11 exited on signal 6 (Aborted).
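Every process fails on a completion reporting `status: 0xc`. In libibverbs, work-completion status 12 (0xc) is `IBV_WC_RETRY_EXC_ERR`, whose `ibv_wc_status_str()` text is exactly "transport retry counter exceeded": on a reliable-connected QP, the sender retransmitted until its retry budget ran out without receiving an ACK from the remote side. The C++ sketch below decodes the status and roughly estimates how long the sender waited before giving up. It hard-codes only a subset of `enum ibv_wc_status` so it compiles without libibverbs, and the `timeout`/`retry_cnt` arguments are hypothetical inputs, since the log does not show what Wukong configures on its QPs.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Mirrors the first entries of libibverbs' `enum ibv_wc_status`.
// Hard-coded so this sketch builds without libibverbs; real code should
// call ibv_wc_status_str() from <infiniband/verbs.h> instead.
std::string wc_status_name(unsigned status) {
    static const char* const kNames[] = {
        "IBV_WC_SUCCESS",           // 0x0
        "IBV_WC_LOC_LEN_ERR",       // 0x1
        "IBV_WC_LOC_QP_OP_ERR",     // 0x2
        "IBV_WC_LOC_EEC_OP_ERR",    // 0x3
        "IBV_WC_LOC_PROT_ERR",      // 0x4
        "IBV_WC_WR_FLUSH_ERR",      // 0x5
        "IBV_WC_MW_BIND_ERR",       // 0x6
        "IBV_WC_BAD_RESP_ERR",      // 0x7
        "IBV_WC_LOC_ACCESS_ERR",    // 0x8
        "IBV_WC_REM_INV_REQ_ERR",   // 0x9
        "IBV_WC_REM_ACCESS_ERR",    // 0xa
        "IBV_WC_REM_OP_ERR",        // 0xb
        "IBV_WC_RETRY_EXC_ERR",     // 0xc: "transport retry counter exceeded"
        "IBV_WC_RNR_RETRY_EXC_ERR", // 0xd
    };
    const unsigned n = sizeof(kNames) / sizeof(kNames[0]);
    return status < n ? kNames[status] : "unknown (not in this table)";
}

// Rough estimate of how long an RC sender waits before reporting
// IBV_WC_RETRY_EXC_ERR. The local ACK timeout is 4.096 us * 2^timeout
// (timeout is a 5-bit QP attribute; 0 means wait forever), and the packet
// is sent once plus up to retry_cnt retransmissions (3-bit attribute).
double approx_retry_window_us(unsigned timeout, unsigned retry_cnt) {
    if (timeout == 0) {
        return std::numeric_limits<double>::infinity();
    }
    const double ack_timeout_us =
        4.096 * std::ldexp(1.0, static_cast<int>(timeout));
    return ack_timeout_us * (retry_cnt + 1);
}
```

For example, with hypothetical values `timeout = 14` and `retry_cnt = 7`, `approx_retry_window_us(14, 7)` gives 4.096 µs × 2^14 × 8 ≈ 536,871 µs, so the remote QP would have been silent for about half a second before the error surfaced. An error like this therefore points at the peer QP never answering (for example, a connection set up with the wrong remote address or port), rather than at the amount of memory allocated.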

@EsdeathYZH
Member

Hi, the codebase in this repository is relatively old. Please use our latest repository instead: https://ipads.se.sjtu.edu.cn:1312/opensource/wukong
