Check Disk Filesystems


Checking and fixing file systems

If you suspect or have been told you may have file system corruption, then you are on the right page!
The tools and instructions below are ONLY for the maintenance and correction of file system issues, NOT for hardware or other issues. If you have hardware issues, such as drive read and write errors or a drive dropping off-line, then you need to check the syslog for the type of error, and obtain a SMART report for the drive.
Very important!!! Do NOT run these tools on the parity drive. It does NOT have a file system, and running ANY file system repair tool on your parity drive can corrupt it! If you are here because of issues with your Parity drive, please leave NOW! You are in the wrong place! These tools and instructions are ONLY for formatted data drives.
Because the instructions are different depending on how your drive is formatted and which version of unRAID you are running, this page is divided into 5 sections. Please follow only the instructions in the correct section for your drive.
If you are running unRAID v6, you can use the webGui to check and fix the file system of any data drive. Unless you prefer to work at the command line, go to Checking and fixing drives in the webGui.
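If you are running unRAID v4 or v5, or prefer working at the command line, go to the section below that matches how your drive is formatted and which version of unRAID you are running.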


Checking and fixing drives in the webGui

The instructions here are designed to check and fix the integrity of the file system of a data drive, while maintaining its parity info.

Preparing to test

  • Stop the array, if it is currently started and not in Maintenance mode.
  • Start the array in Maintenance mode, by clicking the Maintenance mode check box before clicking the Start button. This starts the unRAID driver but does not mount any of the drives.
  • Click the name of the disk that you want to test and/or repair. For example, if the drive of concern is Disk 5, then click on Disk 5. If it's the Cache drive, then click on Cache.
  • You should see a page of options for that drive, beginning with various partition, file system format, and spin down settings. The section following that is the one you want, titled Check Filesystem Status. It contains a box showing the words Not available; this is where the progress and results of the command will be displayed. Below that is the Check button that starts the test or repair, followed by a box where you can type in options for the test/repair command.

Running the test

  • The default for all file system formats is a read-only check of the file system, with no changes made to the drive. Click the Check button, to start the test.
  • The progress and results will be displayed for you to review, so that you can decide what action to take next. Refresh the screen periodically, using your browser's refresh button or keystroke, until the display indicates the test or repair is complete.
  • If the results display a successful test, with no corruptions found, then you are done! Skip down to After the test and repair.

Running the repair

  • If, however, issues were found, the results will indicate the recommended action to take. Typically, that will involve repeating the command with a specific option, clearly stated, which you type into the options box (including any hyphens, usually 2 leading hyphens).
  • Then click the Check button again to run the repair.
  • Progress and results will again be displayed, and it's possible that you will have to run it again, with perhaps a different option.
  • For ReiserFS drives
    For more info on the reiserfsck tool and its options, see reiserfsck.
    There is more information about reiserfsck options in the command line sections for reiserfsck below.
  • For XFS drives
    For more info on the xfs_repair tool and its options, see xfs_repair.
  • For BTRFS drives
    Not yet documented; use your best judgement. --- work in progress ---
  • During and at the conclusion of the command, a report will be displayed. If errors are detected, this report may specify additional actions to take. For XFS and BTRFS, we don't yet have much expertise to advise you. Use your best judgement. Most of us will probably do whatever it suggests we do.
  • If your file system has only minor issues, then running the first option suggested should be all that is necessary. If it finds more significant corruption, then it may create a lost+found directory and place in it the files, directories, and parts of files it can recover. It will then be up to you to rename them and restore those files and directories to their correct locations. Many times it will be possible to identify them by their contents, or their size.
  • Note: If the repair command performs write operations to repair the file system, parity will be maintained.

After the test and repair

If you are in Maintenance mode, you can resume normal operations by stopping the array, then restarting the array with the Maintenance mode check box unchecked.

Additional comments

These test and repair tools may take a long time on a full file system (several minutes to a half hour, or more).
If there was significant corruption, then it is possible that some files were not completely recovered. Check for a lost+found folder on this drive, which may contain fragments of the unrecoverable files. It is up to you to examine these and determine what files they are from, and act accordingly. Hopefully, you have another copy of each file. When you are finished examining them and saving what you can, then delete the fragments and remove the lost+found folder. Dealing with this folder does not have to be done immediately. This is similar to running chkdsk or scandisk within Windows, and finding lost clusters, and dealing with files named File0000.chk or similar. You may find one user's story very helpful, plus his later tale of the problems of sifting through the recovered files.
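
As a rough aid for sorting through recovered items by size, the sketch below lists every file under a disk's lost+found folder, largest first. The function name list_recovered and the /mnt/disk1 mount point are only illustrative assumptions; substitute the disk you repaired, and note that the array must be started normally (not in Maintenance mode) for the disk to be mounted.

```shell
#!/bin/sh
# Hypothetical helper (not part of unRAID): list files recovered into
# lost+found, one per line as "size-in-bytes<TAB>path", largest first.
# $1 = mount point of the repaired disk, e.g. /mnt/disk1 (assumed path).
list_recovered() {
    dir="$1/lost+found"
    if [ ! -d "$dir" ]; then
        echo "no lost+found folder on $1"
        return 0
    fi
    find "$dir" -type f | while IFS= read -r f; do
        # wc -c reports the byte count; $(( )) strips any padding spaces.
        size=$(wc -c < "$f")
        printf '%s\t%s\n' "$((size))" "$f"
    done | sort -rn
}
```

For example, list_recovered /mnt/disk1 prints the recovered files on Disk 1 with the largest at the top, which can help when matching fragments to files you know were large, such as videos.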


Drives formatted with XFS

Note: for more info on the xfs_repair tool and its options, see xfs_repair.
The xfs_repair instructions here are designed to check and fix the integrity of the XFS file system of a data drive, while maintaining its parity info.

Preparing to run xfs_repair

Start the array in Maintenance mode, by clicking the Maintenance mode check box before clicking the Start button. This starts the unRAID driver but does not mount any of the drives.

Running xfs_repair

Now you are ready to run the XFS file system test. At the console or in a Telnet session, type this: (Note: the following example refers to Disk 1, as /dev/md1. You will need to substitute the correct drive for your case. For example, if it is your Disk 5 that you are testing, then substitute md5 for md1.)
xfs_repair -v /dev/md1
During and at the conclusion of the xfs_repair command, a report will be output. If errors are detected, this report may specify an additional action to take. We don't yet have much expertise to advise you about this. Use your best judgement. Most of us will probably do whatever it suggests we do.
If your file system has only minor issues, then running xfs_repair -v should be all that is necessary. If it finds more significant corruption, then it may create a lost+found directory and place in it the files, directories, and parts of files it can recover. It will then be up to you to rename them and restore those files and directories to their correct locations. Many times it will be possible to identify them by their contents, or their size.
Note: If xfs_repair performs write operations to repair the file system, parity will be maintained.
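
One cautious way to approach this is to run a read-only pass first, using the -n option described in the Tools section, and only then run the repair. The sketch below is an illustration, not an official procedure: the helper names run, xfs_check and xfs_fix are made up for this example, and it assumes array Disk N is /dev/mdN as described above. By default it only prints the commands, so nothing is executed until you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only. Assumption: array Disk N is /dev/mdN, as in the example above.
# DRY_RUN defaults to 1, so each function only prints what it would run.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Read-only pass: -n is "no modify" mode, so nothing on the drive is changed.
xfs_check() {
    run xfs_repair -nv "/dev/md$1"
}

# Repair pass: use only after reviewing the report from xfs_check.
xfs_fix() {
    run xfs_repair -v "/dev/md$1"
}
```

Calling xfs_check 5 shows the command for a read-only check of Disk 5, and xfs_fix 5 shows the repair command for the same disk.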

After running xfs_repair

If you are in Maintenance mode, you can resume normal operations by stopping the array, then restarting the array with the Maintenance mode check box unchecked.

Additional comments

The xfs_repair tool may take a long time on a full file system (several minutes to a half hour, or more).
If there was significant corruption, then it is possible that some files were not completely recovered. Check for a lost+found folder on this drive, which may contain fragments of the unrecoverable files. It is up to you to examine these and determine what files they are from, and act accordingly. Hopefully, you have another copy of each file. When you are finished examining them and saving what you can, then delete the fragments and remove the lost+found folder. Dealing with this folder does not have to be done immediately. This is similar to running chkdsk or scandisk within Windows, and finding lost clusters, and dealing with files named File0000.chk or similar. You may find one user's story very helpful, plus his later tale of the problems of sifting through the recovered files.
If you get an error indicating something like trouble opening the file system, it may indicate that you attempted to run the file system check on the wrong device name. For almost all repairs, you would use /dev/md1, /dev/md2, /dev/md3, /dev/md4, etc. If operating on the cache drive (which is not protected by parity), you would use /dev/sdX1 (note the trailing "1" indicating the first partition on the cache drive).
If you want to test and repair a non-array drive, you would use the drive's partition symbol (e.g. sdc1, sdj1, sdx1, etc), not the array device symbol (e.g. md1, md13, etc). So the device name would be something like /dev/sdj1, /dev/sdx1, etc.
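
The device naming rules above can be summarised in a small sketch. The function name device_for is hypothetical; it simply turns an array disk number into the md device, and a drive name such as sdj into its first partition.

```shell
#!/bin/sh
# Hypothetical helper illustrating the naming rules described above.
# $1 = array disk number (1, 5, 13, ...) or a drive name (sdc, sdj, ...).
device_for() {
    case "$1" in
        [1-9]|[1-9][0-9])
            # Array data disk: Disk N is /dev/mdN, so parity is maintained.
            echo "/dev/md$1"
            ;;
        sd[a-z]|sd[a-z][a-z])
            # Cache or non-array drive: use the first partition, not the drive.
            echo "/dev/${1}1"
            ;;
        *)
            echo "unrecognised disk: $1" >&2
            return 1
            ;;
    esac
}
```

For instance, device_for 5 gives /dev/md5, and device_for sdj gives /dev/sdj1.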


Drives formatted with BTRFS

--- work in progress ---


Drives formatted with ReiserFS using unRAID v5 or later

Note: for more info on the reiserfsck tool and its options, see reiserfsck.
Note2: unRAID data disks formatted with ReiserFS use ReiserFS version 3.6.
This section is only for users who are running unRAID v5.0-beta8d or later. If you are running an earlier version, including all unRAID v4, please go to the next section, Drives formatted with ReiserFS using unRAID v4.
The reiserfsck instructions here are designed to check and fix the integrity of the Reiser file system of a data drive, while maintaining its parity info.
Note: the following examples refer to Disk 1, as /dev/md1. You will need to substitute the correct drive for each case. For example, if it is your Disk 5 that you are testing, then substitute md5 for md1, in all of the instructions below.

Preparing to run reiserfsck

Start the array in Maintenance mode, by clicking the Maintenance mode check box before clicking the Start button. This starts the unRAID driver but does not mount any of the drives.

Running reiserfsck

Now you are ready to run the Reiser file system test. (Note: --check is the default option, not strictly required, but included here for clarification.) At the console or in a Telnet session, type this:
reiserfsck --check /dev/md1 [answer with the word Yes when prompted, do not type yes or YES, but Yes (capital Y and lower case es)]
At the conclusion of the reiserfsck --check command, a report will be output. If errors are detected, this report may specify an additional action to take. The most common ones are to re-run reiserfsck specifying the --fix-fixable switch or the --rebuild-tree switch, for example:
reiserfsck --fix-fixable /dev/md1 [answer with Yes when prompted. (capital Y and lower case es)]
If your file system has only minor issues, then running reiserfsck --fix-fixable should be all that is necessary.
Important Note!!! Do NOT run reiserfsck with the --rebuild-sb or --rebuild-tree switches, unless you are instructed to, by the instruction of a previous run of reiserfsck, or by an expert user! They are last-resort options, to repair a severely damaged Reiser file system, and recover as much as possible. They almost always create a lost+found directory and place in it the files, directories, and parts of files it can recover. It will then be up to you to rename them and restore those files and directories to their correct locations. Many times it will be possible to identify them by their contents, or their size.
Important Note #2!!! If the option --rebuild-sb is suggested, then PLEASE ask for assistance on the unRAID forums. The --rebuild-sb option requires answers that must be PERFECT. Please see this thread, and this too.
Note: If reiserfsck performs write operations to repair the file system, parity will be maintained.
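
The rules above for choosing the next reiserfsck run can be sketched as a small helper. This is only an illustration: the function name next_reiserfsck and its messages are made up, and it does nothing except print the command line you would type for the option that the previous report suggested.

```shell
#!/bin/sh
# Sketch only: turns the option suggested by a previous reiserfsck report
# into the next command line to type. Nothing is executed.
# $1 = suggested option, $2 = array disk number (Disk 1 is /dev/md1).
next_reiserfsck() {
    case "$1" in
        --check|--fix-fixable)
            echo "reiserfsck $1 /dev/md$2"
            ;;
        --rebuild-tree)
            # Last resort: only when a previous run or an expert user said so.
            echo "reiserfsck --rebuild-tree /dev/md$2"
            ;;
        --rebuild-sb)
            # The answers must be perfect, so get help first.
            echo "ask on the unRAID forums before using --rebuild-sb" >&2
            return 1
            ;;
        *)
            echo "unknown option: $1" >&2
            return 1
            ;;
    esac
}
```

For example, next_reiserfsck --fix-fixable 5 prints the --fix-fixable command for Disk 5, while next_reiserfsck --rebuild-sb 5 refuses and points you to the forums.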

After running reiserfsck

If you are in Maintenance mode, you can resume normal operations by stopping the array, then restarting the array with the Maintenance mode check box unchecked.

Additional comments

The reiserfsck tool may take a long time on a full file system (several minutes to a half hour, or more). Also re-mounting the disk can take up to 15 seconds or so.
If you were instructed to use special parameters such as --fix-fixable and --rebuild-tree, then it is possible that some files were not completely recovered. Check for a lost+found folder on this drive, which may contain fragments of the unrecoverable files. It is unfortunately up to you to examine these and determine what files they are from, and act accordingly. Hopefully, you have another copy of each file. When you are finished examining them and saving what you can, then delete the fragments and remove the lost+found folder. Dealing with this folder does not have to be done immediately. This is similar to running chkdsk or scandisk within Windows, and finding lost clusters, and dealing with files named File0000.chk or similar. You may find one user's story very helpful, plus his later tale of the problems of sifting through the recovered files.
If you get an error that says reiserfs_open: the reiserfs superblock cannot be found on /dev/sdX. Failed to open the filesystem. it usually indicates you attempted to run the file system check on the wrong device name. For almost all repairs, you would use /dev/md1, /dev/md2, /dev/md3, /dev/md4, etc. If operating on the cache drive (which is not protected by parity), you would use /dev/sdX1 (note the trailing "1" indicating the first partition on the cache drive).
If you want to test and repair a non-array drive, you would use the drive's partition symbol (e.g. sdc1, sdj1, sdx1, etc), not the array device symbol (e.g. md1, md13, etc). So the device name would be something like /dev/sdj1, /dev/sdx1, etc.


Drives formatted with ReiserFS using unRAID v4

Note: for more info on the reiserfsck tool and its options, see reiserfsck.
Note2: unRAID data disks formatted with ReiserFS use ReiserFS version 3.6.
This section is only for users who are running any version of unRAID prior to v5.0-beta8d, including all unRAID v4 versions. If you are running a later version, please go to the previous section, Drives formatted with ReiserFS using unRAID v5 or later.
The reiserfsck instructions here are designed to check and fix the integrity of the Reiser file system of a data drive, while maintaining its parity info.
Note: the following examples refer to Disk 1, as /dev/md1. You will need to substitute the correct drive for each case. For example, if it is your Disk 5 that you are testing, then substitute md5 for md1, and disk5 for disk1, in all of the instructions below.

Preparing to run reiserfsck

Start the array, then from the console or in a Telnet session, type this:
cd [this will make sure you are in the /root directory]
samba stop [all your shares will disappear from network]
umount /dev/md1 ['md1' corresponds to disk1, 'md2' to disk2, etc. note: it is 'umount', not 'unmount']
Note: you will not be able to unmount the disk if it is "busy." A disk is busy if any processes are using it, or referencing files/folders on it. If the "umount" command is not successful, you will need to stop any add-on processes you have running that reference the disk before you can unmount it. If you have "changed directory" to it, you must log off, or "cd" elsewhere (off the disk) before the "umount" will succeed.
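
One common cause of a busy disk is your own shell sitting in a folder on it. The sketch below checks for that case only; it does not detect other processes using the disk. The function name inside_mount is hypothetical, and /mnt/disk1 is assumed to be where Disk 1 is mounted.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $2 is at or below mount point $1.
# The trailing "/" keeps /mnt/disk10 from being mistaken for /mnt/disk1.
inside_mount() {
    case "$2/" in
        "$1"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use: move off the disk before trying umount.
if inside_mount /mnt/disk1 "$PWD"; then
    echo "current directory is on /mnt/disk1; type cd first"
fi
```

If the message appears, typing cd on its own returns you to the /root directory, off the disk.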

Running reiserfsck

Now you are ready to run the Reiser file system test. (Note: --check is the default option, not strictly required, but included here for clarification.) At the console or in a Telnet session, type this:
reiserfsck --check /dev/md1 [answer with the word Yes when prompted, do not type yes or YES, but Yes (capital Y and lower case es)]
At the conclusion of the reiserfsck --check command, a report will be output. If errors are detected, this report may specify an additional action to take. The most common ones are to re-run reiserfsck specifying the --fix-fixable switch or the --rebuild-tree switch, for example:
reiserfsck --fix-fixable /dev/md1 [answer with Yes when prompted. (capital Y and lower case es)]
If your file system has only minor issues, then running reiserfsck --fix-fixable should be all that is necessary.
Important Note!!! Do NOT run reiserfsck with the --rebuild-sb or --rebuild-tree switches, unless you are instructed to, by the instruction of a previous run of reiserfsck, or by an expert user! They are last-resort options, to repair a severely damaged Reiser file system, and recover as much as possible. They almost always create a lost+found directory and place in it the files, directories, and parts of files it can recover. It will then be up to you to rename them and restore those files and directories to their correct locations. Many times it will be possible to identify them by their contents, or their size.
Important Note #2!!! If the option --rebuild-sb is suggested, then PLEASE ask for assistance on the unRAID forums. The --rebuild-sb option requires answers that must be PERFECT. Please see this thread, and this too.
Note: If reiserfsck performs write operations to repair the file system, parity will be maintained.

After running reiserfsck

To resume normal operations, type this at the console or in a Telnet session:
mount /dev/md1 /mnt/disk1 [important to match up the 'md1' with 'disk1', 'md2' with 'disk2', etc.]
samba start [all shares should again be visible]
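
Because the md number and the disk number must be matched up in several places, it can help to print the whole v4 sequence for your disk before typing anything. The sketch below only prints the commands from this section with the disk number substituted; it runs nothing, and the function name v4_check_plan is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the unRAID v4 command sequence from this section for
# array disk number $1, with mdN and diskN kept in step. Nothing is executed.
v4_check_plan() {
    n="$1"
    cat <<EOF
cd
samba stop
umount /dev/md$n
reiserfsck --check /dev/md$n
mount /dev/md$n /mnt/disk$n
samba start
EOF
}
```

For example, v4_check_plan 5 prints the six commands for Disk 5, using /dev/md5 and /mnt/disk5 throughout.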

Additional comments

The reiserfsck tool may take a long time on a full file system (several minutes to a half hour, or more). Also re-mounting the disk can take up to 15 seconds or so.
If you were instructed to use special parameters such as --fix-fixable and --rebuild-tree, then it is possible that some files were not completely recovered. Check for a lost+found folder on this drive, which may contain fragments of the unrecoverable files. It is unfortunately up to you to examine these and determine what files they are from, and act accordingly. Hopefully, you have another copy of each file. When you are finished examining them and saving what you can, then delete the fragments and remove the lost+found folder. Dealing with this folder does not have to be done immediately. This is similar to running chkdsk or scandisk within Windows, and finding lost clusters, and dealing with files named File0000.chk or similar. You may find one user's story very helpful, plus his later tale of the problems of sifting through the recovered files.
If you get an error that says reiserfs_open: the reiserfs superblock cannot be found on /dev/sdX. Failed to open the filesystem. it usually indicates you attempted to run the file system check on the wrong device name. For almost all repairs, you would use /dev/md1, /dev/md2, /dev/md3, /dev/md4, etc. If operating on the cache drive (which is not protected by parity), you would use /dev/sdX1 (note the trailing "1" indicating the first partition on the cache drive).
If you want to test and repair a non-array drive, you would use the drive's partition symbol (e.g. sdc1, sdj1, sdx1, etc), not the array device symbol (e.g. md1, md13, etc). So the device name would be something like /dev/sdj1, /dev/sdx1, etc.


Tools

xfs_repair

Syntax
xfs_repair [ -dnv ] device
xfs_repair -V

Source: derived from Linux man page (there are many more options, but probably too dangerous for our use)

Description
xfs_repair repairs corrupt or damaged XFS filesystems (see xfs(5)). The filesystem is specified using the device argument which should be the device name of the disk partition or volume containing the filesystem. If given the name of a block device, xfs_repair will attempt to find the raw device associated with the specified block device and will use the raw device instead.
Regardless, the filesystem to be repaired must be unmounted, otherwise, the resulting filesystem may be inconsistent or corrupt.
Options
-n
No modify mode. Specifies that xfs_repair should not modify the filesystem but should only scan the filesystem and indicate what repairs would have been made.
-v
Verbose output.
-V
Prints out the current version number and exits.
-d
Repair dangerously. Allow xfs_repair to repair an XFS filesystem mounted read only. This is typically done on a root filesystem from single user mode, immediately followed by a reboot.
Notes
Disk Errors: xfs_repair aborts on most disk I/O errors. Therefore, if you are trying to repair a filesystem that was damaged due to a disk drive failure, steps should be taken to ensure that all blocks in the filesystem are readable and writeable before attempting to use xfs_repair to repair the filesystem. A possible method is using dd(8) to copy the data onto a good disk.
lost+found: The directory lost+found does not have to already exist in the filesystem being repaired. If the directory does not exist, it is automatically created if required. If it already exists, it will be checked for consistency and if valid will be used for additional orphaned files. Invalid lost+found directories are removed and recreated. Existing files in a valid lost+found are not removed or renamed.
Corrupted Superblocks: XFS has both primary and secondary superblocks. xfs_repair uses information in the primary superblock to automatically find and validate the primary superblock against the secondary superblocks before proceeding. Should the primary be too corrupted to be useful in locating the secondary superblocks, the program scans the filesystem until it finds and validates some secondary superblocks. At that point, it generates a primary superblock.
Examples
These are just examples; replace the drive ID with the correct drive symbol, either an md number (md2, md15, etc) or an sd symbol (sdc1, sdj1, etc).
xfs_repair -v /dev/md3 -> tests and reports, making changes when necessary
xfs_repair -nv /dev/sdc1 -> tests and reports without making changes
xfs_repair -V -> displays version and exits
xfs_repair -> displays options and exits
Note: As far as I can tell, these are the ONLY options we should be using.


unknown btrfs command

--- work in progress ---


reiserfsck

For a full description, please see the About.com page for reiserfsck.
Description
The reiserfsck tool checks for a Reiser file system (must be a partition like /dev/md3 or /dev/sdc1, not a drive like /dev/sdc), replays any transactions, and checks or repairs it.
Syntax examples
Note: in the following examples, the option is preceded by 2 hyphens. Drive 3 (/dev/md3) is just used as an example. If you were testing Disk 13, you would use /dev/md13.
reiserfsck --check /dev/md3 -> checks file system for errors
reiserfsck --fix-fixable /dev/md3 -> fixes file system errors
reiserfsck --rebuild-tree /dev/md3 -> rebuilds the file system (may have lost files)
reiserfsck --rebuild-tree -S /dev/md3 -> rebuilds the file system from entire partition (may have lost files, may recover old deleted files or their pieces)
reiserfsck --rebuild-sb /dev/md3 -> rebuilds superblock based on series of questions, answers MUST be accurate!
--- work in progress ---