Re-added a drive that had dropped out of my RAID1 => the next day it dropped out of the mirror again

I happened to check and found my RAID1 running degraded, on only one of its two drives.

$ cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md126 : active raid5 sdd1[3] sde1[4] sdc1[1]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md127 : active raid1 sda1[3]
      2930134016 blocks super 1.2 [2/1] [U_]
      bitmap: 10/22 pages [40KB], 65536KB chunk

unused devices: <none>
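The telltale sign above is md127's "[2/1] [U_]": two members expected, one active, with "_" marking the missing slot. As a minimal sketch, assuming the /proc/mdstat layout shown above (find_degraded is a hypothetical helper name, not part of mdadm), the same check can be scripted in Python:

```python
import re

# Array header, e.g. "md127 : active raid1 sda1[3]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*\S+\s+raid\d+")
# Member status, e.g. "[2/1] [U_]" = [expected/active] [per-slot flags]
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def find_degraded(mdstat_text: str) -> dict:
    """Return {array: flags} for arrays running with fewer members than expected."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if active < expected or "_" in flags:
                degraded[current] = flags
            current = None  # only the first status line after a header matters
    return degraded


# Output of cat /proc/mdstat from this post (abridged)
SAMPLE = """Personalities : [raid1] [raid6] [raid5] [raid4]
md126 : active raid5 sdd1[3] sde1[4] sdc1[1]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md127 : active raid1 sda1[3]
      2930134016 blocks super 1.2 [2/1] [U_]
      bitmap: 10/22 pages [40KB], 65536KB chunk
"""

print(find_degraded(SAMPLE))  # {'md127': 'U_'}
```

On a live system you would pass it open("/proc/mdstat").read(). Inactive or reshaping arrays print differently, and this sketch does not try to handle them.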

md127 contains only sda1, so the missing mirror is sdb. Checked sdb's SMART data.

# smartctl -a /dev/sdb

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-514.6.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD30EZRZ-00Z5HB0
Serial Number:    WD-WCC4N7ES0157
LU WWN Device Id: 5 0014ee 2b8db729e
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Sep 12 17:54:24 2018 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

<<omitted>>

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       124
  3 Spin_Up_Time            0x0027   215   177   021    Pre-fail  Always       -       4241
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       4
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       10
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14250
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       3
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       963269
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       45
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       43
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       96

SMART Error Log Version: 1
No Errors Logged

<<rest omitted>>

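14250 hours ≈ 19.5 months of power-on time.
Reallocated_Event_Count (number of sector reallocation events) = 4.
Current_Pending_Sector (sectors waiting to be reallocated) = 45.
Offline_Uncorrectable (sectors found uncorrectable during offline scans) = 43.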
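To pull these figures out of a saved smartctl -a report instead of reading them by eye, here is a rough Python sketch. It assumes the ten-column attribute table shown above; parse_raw_values, watched_warnings and hours_to_months are hypothetical helper names, and treating a nonzero raw value of attributes 5, 196, 197 and 198 as a warning is this sketch's own assumption, not a smartmontools rule. The month conversion uses an average month of 365.25/12 days.

```python
import re

# One attribute row of "smartctl -a":
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

# Attributes treated as warning signs when their raw value is nonzero
# (an assumption of this sketch, not a smartmontools rule).
WATCHED = (5, 196, 197, 198)


def parse_raw_values(table_text):
    """Map attribute ID -> (name, raw value) for every row in the table."""
    values = {}
    for line in table_text.splitlines():
        m = ROW_RE.match(line)
        if m:
            values[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return values


def watched_warnings(values):
    """Return {ID: raw value} for watched attributes whose raw value is nonzero."""
    return {i: values[i][1] for i in WATCHED if i in values and values[i][1] > 0}


def hours_to_months(hours):
    """Convert power-on hours to months, using an average month of 365.25/12 days."""
    return hours / 24 / (365.25 / 12)


# Rows copied from the sdb report above
TABLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       4
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14250
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       45
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       43
"""

attrs = parse_raw_values(TABLE)
print(round(hours_to_months(attrs[9][1]), 1))  # 19.5
print(watched_warnings(attrs))  # {5: 4, 196: 4, 197: 45, 198: 43}
```

The regex only takes the leading integer of RAW_VALUE, so rows such as temperature with a trailing "(Min/Max ...)" still parse.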

# dmesg | grep md127

<<excerpt>>
[    8.837958] md/raid1:md127: active with 1 out of 2 mirrors

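The dmesg timestamp is seconds since boot, so the array already came up with only one mirror during startup. It has probably been degraded ever since I powered the machine back on after the last power outage.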
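A minimal sketch for spotting degraded RAID1 starts in saved dmesg text, assuming the kernel message format quoted above and the default seconds-since-boot timestamps (output from dmesg -T would not match). degraded_at_assembly is a hypothetical name:

```python
import re

# Message the raid1 driver logs when an array starts, e.g.
# "[    8.837958] md/raid1:md127: active with 1 out of 2 mirrors"
MIRROR_RE = re.compile(
    r"^\[\s*([\d.]+)\]\s+md/raid1:(md\d+): active with (\d+) out of (\d+) mirrors"
)


def degraded_at_assembly(dmesg_text):
    """Return (seconds_since_boot, array, active, total) for each RAID1 started degraded."""
    hits = []
    for line in dmesg_text.splitlines():
        m = MIRROR_RE.match(line)
        if not m:
            continue
        active, total = int(m.group(3)), int(m.group(4))
        if active < total:
            hits.append((float(m.group(1)), m.group(2), active, total))
    return hits


# Line quoted from this post
LOG = "[    8.837958] md/raid1:md127: active with 1 out of 2 mirrors"
print(degraded_at_assembly(LOG))  # [(8.837958, 'md127', 1, 2)]
```

A small seconds value like this one is what places the degraded start inside the boot sequence rather than later at runtime.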

On a whim, I wanted to see what would happen if I added sdb1 back into the RAID1.

# mdadm --add /dev/md127 /dev/sdb1

# cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md126 : active raid5 sdd1[3] sde1[4] sdc1[1]
      3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

md127 : active raid1 sdb1[2] sda1[3]
      2930134016 blocks super 1.2 [2/2] [UU]
      bitmap: 2/22 pages [8KB], 65536KB chunk

unused devices: <none>

It rejoined the array: md127 now shows [2/2] [UU].
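While a rebuild is running, /proc/mdstat adds a progress line for the array (for example "recovery = 12.6% (...)"); the output above has none, so nothing was resyncing when it was captured. A hedged sketch that reports such progress lines: rebuild_progress is a hypothetical name, the operation names matched are the common ones rather than an exhaustive list, and the DURING sample is made up for illustration only.

```python
import re

ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Progress line that appears only while an operation runs, e.g.
# "      [==>..................]  recovery = 12.6% (...) finish=... speed=..."
PROGRESS_RE = re.compile(r"\b(recovery|resync|reshape|check)\s*=\s*([\d.]+)%")


def rebuild_progress(mdstat_text):
    """Return {array: (operation, percent)} for arrays with an operation in progress."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        m = PROGRESS_RE.search(line)
        if current and m:
            progress[current] = (m.group(1), float(m.group(2)))
    return progress


# md127 as shown in this post after mdadm --add: no progress line
AFTER_ADD = """md127 : active raid1 sdb1[2] sda1[3]
      2930134016 blocks super 1.2 [2/2] [UU]
      bitmap: 2/22 pages [8KB], 65536KB chunk
"""

# Made-up example of an array in the middle of a rebuild (illustration only)
DURING = """md127 : active raid1 sdb1[2] sda1[3]
      2930134016 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (369324800/2930134016) finish=300.0min speed=100000K/sec
"""

print(rebuild_progress(AFTER_ADD))  # {}
print(rebuild_progress(DURING))  # {'md127': ('recovery', 12.6)}
```

Polling this every few minutes against open("/proc/mdstat").read() would show when a rebuild finishes; it says nothing about whether the disk itself is healthy.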

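Update:

The following night I found that sdb had dropped out of the mirror again.
Without further experiments, I powered the machine down and replaced the drive with an HDD of the same capacity.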