I have installed Intel oneAPI, which includes Intel MPI. I can compile my Fortran code with mpiifort, but when I try to run it using only mpirun, I get this error:
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
[1628187174.213489] [localhost:38060:0] select.c:406 UCX ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy
[1628187174.213511] [localhost:38061:0] select.c:406 UCX ERROR no active messages transport to <no debug data>: self/self - Destination is unreachable, rdmacm/sockaddr - no am bcopy
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1169)..............:
MPIDI_OFI_mpi_init_hook(1909): OFI get address vector map failed
Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(138)........:
MPID_Init(1169)..............:
MPIDI_OFI_mpi_init_hook(1909): OFI get address vector map failed
Following the advice here https://github.com/openucx/ucx/issues/4742#issuecomment-584059909, I set this environment variable:
export UCX_TLS=ud,sm,self
With that the executable runs, but I also get this error:
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
libi40iw-i40iw_vmapped_qp: failed to pin memory for SQ
libi40iw-i40iw_ucreate_qp: failed to map QP
[1628186868.761953] [localhost:35174:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1628186868.793554] [localhost:35173:0] sys.c:618 UCX ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
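For reference, UCX reports the failing call as shmget(size=2097152 flags=0xfb0). Assuming the standard Linux values for the shmget() flag bits (IPC_CREAT = 01000, IPC_EXCL = 02000, SHM_HUGETLB = 04000, all octal), the minimal Python sketch below decodes 0xfb0 into its named bits and permission mode. It should print IPC_CREAT | IPC_EXCL | SHM_HUGETLB | 0o660 and a size of 2 MiB, so the request appears to ask for a huge-page segment. The shmget(2) man page lists EPERM for SHM_HUGETLB requests from a caller that lacks CAP_IPC_LOCK and is not in the group named by /proc/sys/vm/sysctl_hugetlb_shm_group, which may be related to the "Operation not permitted" above. The helper name decode_shmget_flags is hypothetical.

```python
from __future__ import annotations

# Decode the shmget() flags reported in the UCX log line:
#   shmget(size=2097152 flags=0xfb0) ... failed: Operation not permitted
# Assumption: Linux flag values (IPC_CREAT/IPC_EXCL from <linux/ipc.h>,
# SHM_HUGETLB from <linux/shm.h>).
IPC_CREAT = 0o1000
IPC_EXCL = 0o2000
SHM_HUGETLB = 0o4000

NAMED_FLAGS = {
    "IPC_CREAT": IPC_CREAT,
    "IPC_EXCL": IPC_EXCL,
    "SHM_HUGETLB": SHM_HUGETLB,
}


def decode_shmget_flags(flags: int) -> tuple[list[str], int]:
    """Return (names of the set flag bits, permission bits) for a shmget() flags value."""
    names = [name for name, bit in NAMED_FLAGS.items() if flags & bit]
    mode = flags & 0o777  # the low 9 bits are the rwx permission bits
    return names, mode


if __name__ == "__main__":
    names, mode = decode_shmget_flags(0xFB0)
    size = 2097152  # bytes, as reported by UCX
    print(f"flags: {' | '.join(names)} | {oct(mode)}")
    print(f"size: {size // (1024 * 1024)} MiB")
```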
The output of ipcs -l is as follows:
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
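As a quick check of the hint in the UCX message ("please check shared memory limits by 'ipcs -l'"), the sketch below parses the shared memory part of the ipcs -l output shown above and compares the 2097152-byte (2 MiB) request against those limits. It assumes the "name = value" layout shown above; the helper names parse_ipcs_limits and fits_shm_limits are hypothetical. With these numbers the request fits well within every size limit, so the "Operation not permitted" error does not appear to come from these limits.

```python
from __future__ import annotations

import re

# Shared memory section of the `ipcs -l` output from this post.
SAMPLE = """------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18014398442373116
min seg size (bytes) = 1
"""


def parse_ipcs_limits(text: str) -> dict[str, int]:
    """Map each 'name = value' line of `ipcs -l` output to an integer.

    Section headers such as '------ Shared Memory Limits --------' contain no
    '=' and are skipped.
    """
    limits: dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits


def fits_shm_limits(limits: dict[str, int], request_bytes: int) -> bool:
    """True if a single segment of request_bytes is within the reported size limits."""
    max_seg_bytes = limits["max seg size (kbytes)"] * 1024
    max_total_bytes = limits["max total shared memory (kbytes)"] * 1024
    min_seg_bytes = limits["min seg size (bytes)"]
    return min_seg_bytes <= request_bytes <= min(max_seg_bytes, max_total_bytes)


if __name__ == "__main__":
    limits = parse_ipcs_limits(SAMPLE)
    print(fits_shm_limits(limits, 2097152))
```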
I can't figure out what the problem is or how to fix it. Can anyone help me?