ubuntu dedicated server

ElectronicBanana

Hello all, I've had a dedicated server for about 4 years now, running Ubuntu 16.04. It has software RAID with 2x 256 GB SSD drives. For a few months now I've been receiving SMART health warnings saying drive failure is imminent in the next 24 hours. I've run a few tests on the drives and really can't find which one is failing. Can someone tell me from the output of these commands I ran?

smartctl -d ata -a /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-3.14.32-xxxx-grs-ipv6-64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Crucial/Micron MX100/MX200/M5x0/M600 Client SSDs
Device Model: Micron_M600_MTFDDAK256MBF
Serial Number: 154110C502A1
LU WWN Device Id: 5 00a075 110c502a1
Firmware Version: MU03
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Dec 16 15:23:44 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 814) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 5) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 0
5 Reallocate_NAND_Blk_Cnt 0x0032 100 100 010 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 25341
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 23
171 Program_Fail_Count 0x0032 100 100 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 000 Old_age Always - 0
173 Ave_Block-Erase_Count 0x0032 001 001 000 Old_age Always - 4668
174 Unexpect_Power_Loss_Ct 0x0032 100 100 000 Old_age Always - 22
180 Unused_Reserve_NAND_Blk 0x0033 000 000 000 Pre-fail Always - 1920
183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
184 Error_Correction_Count 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 066 048 000 Old_age Always - 34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Percent_Lifetime_Used 0x0030 000 000 001 Old_age Offline FAILING_NOW 100
206 Write_Error_Rate 0x000e 100 100 000 Old_age Always - 0
210 Success_RAIN_Recov_Cnt 0x0032 100 100 000 Old_age Always - 0
246 Total_Host_Sector_Write 0x0032 100 100 000 Old_age Always - 177689076668
247 Host_Program_Page_Count 0x0032 100 100 000 Old_age Always - 8846171358
248 Bckgnd_Program_Page_Cnt 0x0032 100 100 000 Old_age Always - 35802776690

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Vendor (0xff) Completed without error 00% 16878 -
# 2 Vendor (0xff) Completed without error 00% 9545 -
# 3 Vendor (0xff) Completed without error 00% 55 -
# 4 Short offline Completed without error 00% 4 -
# 5 Short offline Completed without error 00% 2 -
# 6 Short offline Completed without error 00% 2 -
# 7 Short offline Completed without error 00% 0 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sdb1[1] sda1[0]
20478912 blocks [2/2] [UU]

md2 : active raid1 sdb2[1] sda2[0]
229050304 blocks [2/2] [UU]
bitmap: 2/2 pages [8KB], 65536KB chunk



Thanks a bunch for the feedback.
Mark
 


Hey there - welcome to the forum!

I scanned through it and noticed that you queried sdb up top, which reports a failure, but at the bottom it shows that RAID 1 is set up between sdb2 and sda2.

I haven't used smartctl in a bit, so I'd need to re-read a few things. Does querying sda also result in failure warnings?
 
So how would you query the drive? Can you give me some commands to try, please? Thanks.

You mean this way?

smartctl -d ata -a /dev/sdb2
and I also did
smartctl -d ata -a /dev/sda2

You mean do that?
Percent_Lifetime_Used 0x0030 000 000 001 Old_age Offline FAILING_NOW 100
That's the only failure I get, and I get it on both. So I guess both drives are failing? Should I just replace one at a time, so the RAID rebuilds, then replace the other after? This has been going on for a few months now, so it's time to do something, I guess.
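
For a rough sense of how worn the sdb drive is, the raw counters from the smartctl output earlier in the thread can be turned into more readable numbers. This is only a sketch under two assumptions that are not confirmed in the thread: that attribute 246 counts 512-byte logical sectors, and that write amplification can be estimated as (attribute 247 + attribute 248) / attribute 247. The function name `wear_summary` is made up for illustration.

```shell
#!/bin/sh
# Raw values copied from the smartctl output for /dev/sdb in this thread.
host_sectors=177689076668   # attribute 246, Total_Host_Sector_Write
host_pages=8846171358       # attribute 247, Host_Program_Page_Count
bg_pages=35802776690        # attribute 248, Bckgnd_Program_Page_Cnt

# Assumption: attribute 246 is counted in 512-byte logical sectors.
# Assumption: write amplification ~ (host pages + background pages) / host pages.
wear_summary() {
    awk -v s="$host_sectors" -v h="$host_pages" -v b="$bg_pages" 'BEGIN {
        printf "host writes: %.1f TB\n", s * 512 / 1e12
        printf "write amplification: %.2f\n", (h + b) / h
    }'
}

wear_summary
```

Under those assumptions this works out to roughly 91 TB written by the host and a write amplification of about 5, which fits a drive reporting 100 percent of its rated lifetime used rather than one throwing read or write errors.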
 
Yep, well, I think you need to query the drive itself, not the partitions (so /dev/sda and /dev/sdb, not /dev/sda2 and /dev/sdb2).
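
Something like the following could be used to check both whole disks and print only the attributes that are flagged. It is a minimal sketch: the helper names `failing_attrs` and `check_disks` are made up here, the device names are taken from the /proc/mdstat output above, and the filter assumes the attribute table layout shown earlier in the thread, where WHEN_FAILED is the ninth column.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print ID, name and WHEN_FAILED
# for every attribute row whose WHEN_FAILED column (field 9) is not "-".
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $9 != "-" { print $1, $2, $9 }'
}

# Query the whole disks, not the partitions.
# Needs root and smartmontools; /dev/sda and /dev/sdb come from this thread.
check_disks() {
    for disk in /dev/sda /dev/sdb; do
        echo "== $disk =="
        smartctl -H -A "$disk" | failing_attrs
    done
}

# Only touch the disks when explicitly asked to:  sh check.sh run
if [ "${1:-}" = "run" ]; then
    check_disks
fi
```

If both disks list Percent_Lifetime_Used as FAILING_NOW, that matches what was reported above for the partition queries.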

If they both report an impending failure, replace both, one at a time, and let the RAID finish rebuilding before you pull the second drive.
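
As a rough outline of what "one at a time" could look like with mdadm, the sketch below only prints the commands instead of running them, so it can be reviewed first. It assumes the layout from the /proc/mdstat output above (md1 on partition 1, md2 on partition 2 of each disk) and that the partition table can be copied from the healthy disk with sfdisk. The function name `replacement_plan` is made up for illustration, and a bootloader reinstall on the new disk (for example with grub-install) may also be needed, which is not shown.

```shell
#!/bin/sh
# Print (do not execute) the commands for swapping one member disk of the
# two RAID1 arrays shown in /proc/mdstat in this thread.
#   $1 = disk being replaced (e.g. sdb), $2 = healthy disk (e.g. sda)
replacement_plan() {
    old="$1"
    good="$2"
    # Mark the old disk's partitions as failed and drop them from the arrays.
    for n in 1 2; do
        echo "mdadm --manage /dev/md$n --fail /dev/${old}$n"
        echo "mdadm --manage /dev/md$n --remove /dev/${old}$n"
    done
    echo "# swap the physical disk, then copy the partition table:"
    echo "sfdisk -d /dev/$good | sfdisk /dev/$old"
    # Add the new partitions back so the arrays start rebuilding.
    for n in 1 2; do
        echo "mdadm --manage /dev/md$n --add /dev/${old}$n"
    done
    echo "cat /proc/mdstat   # wait for [UU] before touching the other disk"
}

replacement_plan sdb sda
```

Once both arrays show [UU] again, the same plan with the arguments swapped (`replacement_plan sda sdb`) would cover the second drive.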
 
