XFS_Repair on a failed Raid Volume

jimleslie1977

New Member
Joined
Feb 25, 2020
Messages
16
Reaction score
1
Credits
111
Hello, and thank you for your support on this topic.

First, this is not my data; it has not been backed up and needs repairing if possible. I'm aware it should have been backed up, but I was only brought in after the event.

Following a RAID failure and the replacement of the affected drives, we have been left with an unusable filesystem. We have unmounted it and are considering using xfs_repair to hopefully resolve this.

We have run xfs_repair -nv to export a report, and I noticed roughly 85,000 entries stating:

"entry "NAME" in shortform directory 1769739 references non-existent inode 4687987753
would have junked entry "NAME" in directory inode 1769739"

Could you please advise what the outcome of this would be, i.e.:
Would the file "NAME" be fixed and accessible as it was prior to the failure (amazing if that's the answer)?
Would it be moved to the lost+found directory?
Would the file be unretrievable?
Or anything else I may not have thought about?

Thanks again
Jim
 


The message you received indicates that an entry named “NAME” in the shortform directory with inode number 1769739 references a non-existent inode (inode number 4687987753).
This typically occurs when an entry points to an inode that no longer exists due to corruption or other issues.
Possible Outcomes:
Unfortunately, the outcome depends on the severity of the corruption and the specific context. Here are some possibilities:
Best-case scenario: If the corruption is minor and limited to a few entries, xfs_repair might successfully fix the filesystem, and the file “NAME” could become accessible again.
Worst-case scenario: If the corruption is extensive or affects critical metadata, the file might be irretrievable.
Intermediate scenarios:
If xfs_repair finds inodes that still exist but are no longer referenced by any directory, it reconnects them under lost+found at the filesystem root, named by inode number. Note, however, that a "junked" entry pointing at a non-existent inode is simply removed from the directory; there is no surviving inode for lost+found to receive in that case.
Some data loss may occur, especially if the filesystem structure is severely damaged.
Recommendations:
Before proceeding, take a full block-level image of the volume (or at least an xfs_metadump of the metadata) so the repair can be retried if something goes wrong.
Then run xfs_repair without the -n (no-modify) flag to apply the repairs; -n only reports what would be done without changing anything.
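As a rough sketch of that workflow (the device path /dev/md0 and the output paths are placeholders; substitute your actual RAID volume, and keep the filesystem unmounted throughout):

```shell
# Hypothetical device path; substitute your actual RAID volume.
DEV=/dev/md0

# 1. Capture a metadata-only image first, so the state can be analysed or
#    replayed offline (xfs_metadump ships with xfsprogs; -o keeps real
#    file names instead of obfuscating them).
xfs_metadump -o "$DEV" /root/xfs-meta.img

# 2. Dry run: report what would be changed without modifying anything.
xfs_repair -nv "$DEV" > /root/xfs-repair-dryrun.log 2>&1

# 3. Real repair, only after the dry run and ideally a full block-level
#    backup of the volume. If xfs_repair refuses to run because of a dirty
#    log, the -L flag zeroes the log, at the cost of any in-flight
#    transactions, so use it only as a last resort.
xfs_repair -v "$DEV"

# 4. Mount and look for reconnected orphans, which appear as
#    inode-numbered entries in lost+found at the filesystem root.
mount "$DEV" /mnt
ls /mnt/lost+found
```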
 
RAID arrays need to be monitored, and any reported problem should be investigated immediately. RAID with redundancy (mirroring or parity) keeps data available through a physical failure because missing blocks can be reconstructed from the remaining drives. It can even survive physical damage on two drives, as long as the same blocks are not bad on both, which rarely happens. However, once more than one drive is degraded, replacing a drive can be catastrophic: the rebuild must read every sector of the surviving drives, and any unreadable sector there means lost data. You may need to put the old drive back into the array, pull everything off that way, and only then do the replacement.
The first step is to check the drives using S.M.A.R.T. to see what you are dealing with. Replace every drive that shows any errors, but only after backing up the data. It sounds like you may have more than one problem.
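A minimal sketch of that S.M.A.R.T. check using smartctl from smartmontools (the device names are placeholders; list your actual array members):

```shell
# Check SMART health for each member drive of the array.
for dev in /dev/sda /dev/sdb /dev/sdc; do
    echo "=== $dev ==="
    # Overall pass/fail health verdict.
    smartctl -H "$dev"
    # Attributes that most directly signal failing media:
    # Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable.
    smartctl -A "$dev" | grep -Ei 'reallocated|pending|uncorrect'
done
```

Any non-zero value in those attributes is a reason to treat the drive as suspect and replace it once the data is safe.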
 

