Yesterday at work I added a new 3TB drive to our RAID5 array and let it rebuild overnight (the commands I used for the grow are sketched after the log below). Today I found the following errors in the journal:
Jan 16 07:49:42 iHugo kernel: INFO: task md0_resync:854 blocked for more than 120 seconds.
Jan 16 07:49:42 iHugo kernel: task:md0_resync state:D stack: 0 pid: 854 ppid: 2 flags:0x00004000
Jan 16 07:49:42 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 120 seconds.
Jan 16 07:49:42 iHugo kernel: task:jbd2/md0p1-8 state:D stack: 0 pid: 1006 ppid: 2 flags:0x00004000
Jan 16 07:51:43 iHugo kernel: INFO: task md0_resync:854 blocked for more than 241 seconds.
Jan 16 07:51:43 iHugo kernel: task:md0_resync state:D stack: 0 pid: 854 ppid: 2 flags:0x00004000
Jan 16 07:51:43 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 241 seconds.
Jan 16 07:51:43 iHugo kernel: task:jbd2/md0p1-8 state:D stack: 0 pid: 1006 ppid: 2 flags:0x00004000
Jan 16 07:53:44 iHugo kernel: INFO: task md0_resync:854 blocked for more than 362 seconds.
Jan 16 07:53:44 iHugo kernel: task:md0_resync state:D stack: 0 pid: 854 ppid: 2 flags:0x00004000
Jan 16 07:53:44 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 362 seconds.
Jan 16 07:53:44 iHugo kernel: task:jbd2/md0p1-8 state:D stack: 0 pid: 1006 ppid: 2 flags:0x00004000
Jan 16 07:55:45 iHugo kernel: INFO: task md0_resync:854 blocked for more than 483 seconds.
Jan 16 07:55:45 iHugo kernel: task:md0_resync state:D stack: 0 pid: 854 ppid: 2 flags:0x00004000
Jan 16 07:55:45 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 483 seconds.
Jan 16 07:55:45 iHugo kernel: task:jbd2/md0p1-8 state:D stack: 0 pid: 1006 ppid: 2 flags:0x00004000
Jan 16 07:57:45 iHugo kernel: INFO: task md0_resync:854 blocked for more than 604 seconds.
Jan 16 07:57:45 iHugo kernel: task:md0_resync state:D stack: 0 pid: 854 ppid: 2 flags:0x00004000
Jan 16 07:57:45 iHugo kernel: INFO: task jbd2/md0p1-8:1006 blocked for more than 604 seconds.
Jan 16 07:57:45 iHugo kernel: task:jbd2/md0p1-8 state:D stack: 0 pid: 1006 ppid: 2 flags:0x00004000
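For context, the grow itself was done roughly like this. This is a sketch from memory, not a verbatim transcript, and /dev/sdX is a placeholder for the new 3TB disk:

mdadm --add /dev/md0 /dev/sdX             # add the new disk as a spare
mdadm --grow /dev/md0 --raid-devices=4    # start the 3 -> 4 device reshape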
After that I rebooted the server. Since the reboot, the RAID does not come up anymore:
Jan 16 09:17:26 iHugo blkdeactivate[82348]: [MD]: deactivating part device md0p1...
Jan 16 09:17:26 iHugo blkdeactivate[82359]: cat: /sys/block/md0p1/md/sync_action: No such file or directory
Jan 16 09:35:58 iHugo kernel: md/raid:md0: not clean -- starting background reconstruction
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdd operational as raid disk 1
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdf1 operational as raid disk 3
Jan 16 09:35:58 iHugo kernel: md/raid:md0: device sdb operational as raid disk 0
Jan 16 09:35:58 iHugo kernel: md/raid:md0: force stripe size 512 for reshape
Jan 16 09:35:58 iHugo kernel: md/raid:md0: cannot start dirty degraded array.
Jan 16 09:35:58 iHugo kernel: md/raid:md0: failed to run raid set.
Here are some details.
mdstat:
root@iHugo:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : inactive sdf1[4] sdd[1] sde[3] sdb[0]
8790276327 blocks super 1.2
unused devices: <none>
mdadm -D:
root@iHugo:~# mdadm -D /dev/md0
mdadm: Unknown keyword INACTIVE-ARRAY
/dev/md0:
Version : 1.2
Creation Time : Thu Jan 13 18:57:19 2022
Raid Level : raid5
Used Dev Size : 1953378304 (1862.89 GiB 2000.26 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Update Time : Sun Jan 16 07:47:20 2022
State : active, FAILED, Not Started
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : unknown
Delta Devices : 1, (3->4)
Name : iHugo:0 (local to host iHugo)
UUID : dc5e662f:4f32bd91:95ee7139:7ef94601
Events : 59708
Number Major Minor RaidDevice State
- 0 0 0 removed
- 0 0 1 removed
- 0 0 2 removed
- 0 0 3 removed
- 8 64 2 spare rebuilding /dev/sde
- 8 48 1 sync /dev/sdd
- 8 16 0 sync /dev/sdb
- 8 81 3 sync /dev/sdf1
fdisk -l:
root@iHugo:~# fdisk -l
Disk /dev/sdb: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EFRX-68E
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Samsung SSD 840
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 1B771FD2-DAB6-428D-88A6-0A1F1D35671E
Device Start End Sectors Size Type
/dev/sda1 2048 1050623 1048576 512M EFI System
/dev/sda2 1050624 486395903 485345280 231.4G Linux filesystem
/dev/sda3 486395904 488396799 2000896 977M Linux swap
Disk /dev/sdf: 2.73 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: ST3000DM001-1CH1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: DCF1340A-3DA8-42EE-B7B1-7F439E571148
Device Start End Sectors Size Type
/dev/sdf1 2048 5860532223 5860530176 2.7T Linux filesystem
Disk /dev/sdc: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: SAMSUNG HD103UJ
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EFBD098F-4397-4EC9-B242-1712149A75C9
Device Start End Sectors Size Type
/dev/sdc1 2048 1953525134 1953523087 931.5G Linux filesystem
Disk /dev/sde: 1.82 TiB, 2000394706432 bytes, 3907020911 sectors
Disk model: WDC WD20EARS-00J
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/sdd: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WDC WD20EARS-00M
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
blkid:
root@iHugo:~# blkid
/dev/sdb: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="2a63392a-2f36-5e40-509a-8a968c132b66" LABEL="iHugo:0" TYPE="linux_raid_member"
/dev/sda1: UUID="CC0A-CBAA" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="a8c63d30-5ddd-4a1f-b9d5-faed36434457"
/dev/sda2: UUID="7fe4974d-d6ae-4c09-a8b7-8cb46ab978b8" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="39f49ee3-8061-4da8-9802-a77beaec158a"
/dev/sda3: UUID="e668eb26-e954-48d9-9fba-f92f6437c49f" TYPE="swap" PARTUUID="95f3627d-1e7f-4fa8-a39b-d91b2b2b7012"
/dev/sdf1: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="336f8167-c619-4553-8ab2-0b4516106ae1" LABEL="iHugo:0" TYPE="linux_raid_member" PARTLABEL="primary" PARTUUID="26c1353a-0796-48ba-92db-d4fdae4f7f98"
/dev/sdc1: LABEL="Leer" BLOCK_SIZE="512" UUID="01D437C7F0FED880" TYPE="ntfs" PARTUUID="975d1223-be02-471d-a291-e3433048e0ee"
/dev/sde: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="b66d96aa-5e60-83f2-8a8c-9f8c3d1caf65" LABEL="iHugo:0" TYPE="linux_raid_member"
/dev/sdd: UUID="dc5e662f-4f32-bd91-95ee-71397ef94601" UUID_SUB="6d33f5c9-e81f-3965-da7b-37a2366ed1d1" LABEL="iHugo:0" TYPE="linux_raid_member"
I would like to hear your thoughts on this.
The system sees the drives, and mdadm can see an array of sorts, but it cannot be started. Why do the logs report a dirty degraded array, while /sys/block/md0/md/array_state marks it as inactive?
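For reference, this is roughly how I have been checking the state (outputs omitted here; the grep pattern is just a convenience to pick out the event counts and state lines):

cat /sys/block/md0/md/array_state
mdadm --examine /dev/sdb /dev/sdd /dev/sde /dev/sdf1 | grep -E 'Events|State'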
What I have tried so far
Restarting the array with mdadm --run:
root@iHugo:~# mdadm --run /dev/md0
mdadm: Unknown keyword INACTIVE-ARRAY
mdadm: failed to start array /dev/md/iHugo:0: Input/output error
Forcing a reassembly:
root@iHugo:~# mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdd /dev/sde /dev/sdf1
mdadm: Unknown keyword INACTIVE-ARRAY
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
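As an aside, the recurring "Unknown keyword INACTIVE-ARRAY" warning seems to come from /etc/mdadm/mdadm.conf rather than from the array itself; my guess is that the output of mdadm --detail --scan was appended to the config while the array was inactive, and mdadm does not accept INACTIVE-ARRAY as a config keyword. Something like this should confirm it:

grep -n 'INACTIVE-ARRAY' /etc/mdadm/mdadm.conf   # the offending line, if present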
Edit 1
Output of dmesg -t --level=alert,crit,err,warn:
secureboot: Secure boot could not be determined (mode 0)
x86/cpu: VMX (outside TXT) disabled by BIOS
MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
pci 0000:00:02.0: BIOS left Intel GPU interrupts enabled; disabling
ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200925/utaddress-204)
lpc_ich: Resource conflict(s) found affecting gpio_ich
r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
md/raid:md0: cannot start dirty degraded array.
md/raid:md0: failed to run raid set.
md: pers->run() failed ...
at24 0-0050: supply vcc not found, using dummy regulator
r8169 0000:03:00.0: firmware: failed to load rtl_nic/rtl8168f-1.fw (-2)
firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
r8169 0000:03:00.0: Direct firmware load for rtl_nic/rtl8168f-1.fw failed with error -2
r8169 0000:03:00.0: Unable to load firmware rtl_nic/rtl8168f-1.fw (-2)
FAT-fs (sda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
OCFS2 User DLM kernel interface loaded
md/raid:md0: cannot start dirty degraded array.
md/raid:md0: failed to run raid set.
md: pers->run() failed ...
(the last three lines repeat another eleven times)