Troubleshooting

From unRAID


This page has been partly superseded in unRAID v6 by "Need help? Read me first!" and by the Diagnostics tool on the Tools page. If you are running v6, please start there instead, although some sections below have been updated for v6 and are still useful. Update: also try the new Fix Common Problems plugin!

Trouble with your unRAID server?

If you are running v6, please see Need help? Read me first!.
Otherwise, this is the best place to start. The unRAID forums are a great place to find help, but work through the tips and suggestions on this page first.
If you still need assistance, there are many helpful unRAID users on the unRAID forums. Start by searching for posts similar to your issue. You don't need to register to browse the forums (although we would like to hear from you), but you MUST register before you can ask for help.
If you have questions, please check the FAQ and the Best of the Forums first.
Here is a community statement about unRAID support, how it is handled, how long it may take, commercial vs. community support, etc.


How to get help

Note: this section was designed for v4 and v5 versions of unRAID, not for v6. For v6, please see Need help? Read me first!.
If you have searched the forums (use advanced search) and this wiki and still cannot resolve your problem, here is an (almost) foolproof way of asking for help that gives the experts the information they need to help you quickly. If you don't provide enough detail, you make it far more difficult for the gurus, who may not be able to answer your cry for help correctly. This procedure works on ALL versions of unRAID from 4.2.1 onward.
Please post only after you have at least tried to find help on the topic. Many problems are common, and a little research can quickly have you up and running again. Try the FAQ first, then the Best of the Forums page, then the general wiki pages. Going the forum route takes more time, but is needed in some situations.
The rest of this page explains some of these commands in greater detail and offers other options. What follows is a "what, not why" set of steps for capturing and posting a log. It is a very basic process that can be used to report any problem, from network outages to disks not being recognized to other oddities.
Avoid the urge to try anything risky: a well-meant experiment can easily make the problem worse. Ask for advice first!
1. Go to the unRAID server
2. Log on. Unless you changed it, the user name is "root" with no password. (If you do not see a login prompt, press the Enter key first.)
3. Enter these commands
cp /var/log/syslog /boot/syslog.txt
chmod a-x /boot/syslog.txt
4. Shutdown the unRAID box by one of the following methods:
  • Stop the array from the unRAID Web Management page with the Stop button, then use the Power Down button to shut down your server. (For information about installing a stop script, please see here.)
  • If you have installed the powerdown package, then you can type powerdown at the console prompt, and a safe shutdown will proceed. (For information about installing powerdown, please see here and here and here.)
  • If you have installed a stop script on your flash drive, then you can use it in the next method.
  • The stop command is no longer included with unRAID. But if you are running an older version, earlier than around v4.3, you can try the following sequence of commands:
stop
sync
powerdown
Wait for each command to finish, or display an error, or turn the machine off. If the stop command results in an error, keep going. The sync command does not produce output. The powerdown command may not be available either.
  • If none of the methods above were available, or did not work (the machine is still running), then type the poweroff command. Unfortunately, if you have to use the poweroff command, the array may not be stopped correctly, and a parity check may begin on the next boot. Your array is safe though.
5. Wait for the computer to power off
6. Remove the USB stick from your unRAID server and plug it into your desktop computer
7. Start a new thread in the unRAID forums (see Troubleshooting#Creating a forum post about your problem)
8. Enter a description of the problem, and give the thread a useful title, such as "HELP: Drive with all my data says it's unformatted"
9. Attach the syslog.txt file from your flash drive to the post. If it is too large to attach, zip it first.
10. Take a Valium and wait patiently for a response.


Capturing your syslog

The more information you can give others about the problem, the sooner you can get back to normal operation. Capturing your syslog is probably the single most important thing you can do to help others help you.
Every time you reboot, the syslog file is replaced. So if you have a failure, it is important to capture the syslog before you reboot! Otherwise, any chance of understanding what happened to cause your failure will be lost.
Here are various methods to capture your syslog. Depending on the unRAID version and the state of your system, not all of them will be available:
  • unRAID v6.0 and later
The best and easiest way is to download the Diagnostics zip file, which contains the syslog, the SMART reports, and a lot of other diagnostic information. Go to the Tools tab, click on the Diagnostics icon, then click on the Download button (the Collect button in earlier v6 versions). Your browser will then download the zip file to your computer.
If you are sure that all you want is the syslog, then go to the Tools tab, click on the Syslog icon, then click on the Download button. As of v6.0-beta14b, the button was at the very bottom of the page, so you may have to scroll down to find it.
Clicking the Download button saves syslog.zip to your computer. It contains syslog.txt, a copy of your syslog with DOS-friendly line endings rather than Linux line endings, and is ready to attach to a forum post. (Since syslogs are highly compressible, we appreciate them being attached as zip files!)
  • All versions since unRAID v4.5-beta2
As of unRAID v4.5-beta2, you can access your syslog directly by browsing to http://tower/log/syslog (substitute 'tower' with your server name or static server IP). Enter that address in your browser's location box, go there, then save a bookmark or favorite for it. You can then use your browser's View, Save Page, Save Page As or Download options to obtain a copy of your syslog on your computer. If you have the option, use the .txt extension.
Depending on your browser and how you dealt with the syslog, the file downloaded may have just the simple name syslog with no file extension, and it may also have Linux line endings, which *may* be a problem in some editors or text viewers. You should probably rename it with a .txt extension and add a label and/or date (eg. "syslog 2015-04-12 disk3 failed.txt").
  • All unRAID versions
This is the only method that works if your network is down. It runs at the console and saves the syslog to your flash drive.
To obtain a copy of your current syslog, at the unRAID console or in a terminal session with SSH or Telnet, type the command:
cp /var/log/syslog /boot
This will make a copy of the system log in the root directory of your flash drive. You can then copy it from the flash share of your server, or plug the flash drive into your PC and access the syslog there. Any file manager, such as Windows Explorer, can access the file across the network. For example, if your unRAID server name is Tower, you can access your newly created syslog as \\Tower\flash\syslog. We recommend renaming it with the date, time and a .txt extension, for example syslog-2007-08-28-1630.txt.
One warning: many of the files on the flash drive, including the syslog, may be flagged System and Hidden, so don't forget to set Windows Explorer to show System and Hidden files. This command will remove the System and Hidden flags:
chmod a-x /boot/syslog
The above syslog copy command is the simplest form, but it leaves the copy on the flash drive with just the name syslog. Here's another form that gives it a fuller name, ready to copy to an unRAID logs folder, or to attach to an email or forum post. (Reminder: since syslogs are highly compressible, we appreciate them being zipped and attached as zip files!)
cp /var/log/syslog /boot/syslog-2008-04-10.txt
chmod a-x /boot/syslog-2008-04-10.txt
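If you capture syslogs often, a small helper saves typing the date by hand. This is a minimal sketch, assuming the default paths used on this page (/var/log/syslog, and the flash drive at /boot); the function name save_syslog and its optional label argument are made up for illustration and are not part of unRAID.

```bash
#!/bin/bash
# Sketch: copy the live syslog to the flash drive under a dated .txt name.
# Defaults are the paths used on this page; pass others to override them.
save_syslog() {
    local src=${1:-/var/log/syslog}   # live syslog
    local dest_dir=${2:-/boot}        # flash drive
    local label=${3:-}                # optional note, e.g. disk3-failed
    local stamp name
    stamp=$(date +%Y-%m-%d-%H%M)      # e.g. 2008-04-10-1630
    name="syslog-${stamp}${label:+-$label}.txt"
    cp "$src" "$dest_dir/$name" || return 1
    chmod a-x "$dest_dir/$name"       # same flag cleanup as the commands above
    echo "$dest_dir/$name"            # show where the copy went
}
```

For example, save_syslog /var/log/syslog /boot disk3-failed would write a file named like /boot/syslog-2008-04-10-1630-disk3-failed.txt. You could paste the function into a console or SSH session before using it.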
  • Using UnMENU
An easier way to access the syslog, if you have installed the UnMENU Add On, and you have network access to your unRAID server, is to go to the UnMENU web page, click the Syslog plugin link, then click the syslog download link. This will save it directly to your computer.
See also Viewing the System Log
How often should you save a copy of your syslog? Right now, of course, for a baseline copy, and then as often as needed. You can always delete the old and extra ones later. A syslog contains most of the unRAID setup, especially of the drives, plus most or all of the issues reported for that session. But once you reboot, it is gone forever, unless the syslog was saved.
Having multiple syslogs saved allows you, or someone helping you, to use file comparison tools that help to quickly isolate what is different between 2 syslogs, especially if there is a baseline syslog and another syslog that covers a problem period. Total Commander has a built-in file comparison tool for quick analysis. WinMerge (with a suggested 'prediffer' of 30 columns) provides better analysis, with better handling of moved lines and added or missing sections. Quick isolation of just the changes is important when reading syslogs, because it's often more about what to ignore, than what to look for. 'Before and after' syslogs or 'baseline and problem' syslog pairs are ideal for syslog analysis.
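If you don't have a graphical comparison tool handy, the same idea can be approximated with diff, on the server (if diff is available there) or on any Linux desktop. This sketch assumes the classic syslog layout, where each line starts with a 16-column timestamp such as "Apr 10 16:30:00 " (single-digit days are padded with a space), and strips it before comparing, similar in spirit to the column-skipping prediffer mentioned above. diff_syslogs is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: compare a baseline syslog with a problem syslog, ignoring the
# timestamps. Assumes each line starts with "Mmm dd hh:mm:ss " (16 columns),
# which cut -c17- removes.
diff_syslogs() {
    local baseline=$1 problem=$2 a b rc
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }
    cut -c17- "$baseline" > "$a"
    cut -c17- "$problem" > "$b"
    diff "$a" "$b"                  # "<" = only in baseline, ">" = only in problem
    rc=$?
    rm -f "$a" "$b"                 # clean up the stripped copies
    return $rc                      # 0 = same, 1 = different, 2 = trouble
}
```

Usage: diff_syslogs /boot/syslog-baseline.txt /boot/syslog-problem.txt. Lines marked ">" appeared only during the problem period, which is usually where to start reading.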


If you cannot copy your syslog

If the instructions above were not successful, then there may be other problems. If you cannot mount your flash drive, or have lost access to it, then /boot may have disappeared, and the instructions above won't work. If so, try this variation, which copies the syslog to the first data disk:
cp /var/log/syslog /mnt/disk1/syslog.txt
Then after rebooting, if you have network access to your unRAID server, you can copy it from Disk 1.
If the reason you cannot access the syslog is because the system appears to have crashed, then it is probably too late. Do try to capture a syslog BEFORE the system becomes unresponsive. It may not cover the time of failure, but may have the info needed for troubleshooting. Even a syslog copied immediately after booting may be helpful, certainly better than nothing.
Also, you can try the following command at the console prompt. It will fill your unRAID screen with the tail end of the syslog, and *may* show you the error(s) as they happen.
tail -f --lines=100 /var/log/syslog
Using your digital camera, you can take a photo of your screen, avoiding as much glare as possible, and post the image. A screen pic is better than nothing at all.
If you are running headless (no monitor and possibly no keyboard or graphics card), then you can try directing the output of the command above to your flash drive or a data drive. For example, the following will output the last syslog lines to syslogtail.txt on your flash drive. This should allow you to obtain the very last message that the system was able to log.
tail -f --lines=100 /var/log/syslog >/boot/syslogtail.txt
Be aware though that there are some troubleshooting issues where you *must* hook up a monitor!


Creating a forum post about your problem

The first thing to do is to select the appropriate forum board for your issue. Here are the support forums for different kinds of issues -
After selecting the correct forum, enter it by clicking the link above.
  • If your issue is with a plugin or Docker container or VM template, then find and enter its support thread. We strongly recommend reading through the thread, searching for someone with the same issue already resolved. If the topic seems way too long to read all of it (some are!), then at least read the last 3 to 10 pages. If you don't find anything helpful, then click the Reply button.
  • If your issue is not with a plugin or Docker container or VM template, then click the New Topic button. Start with an appropriate subject heading, not too general, but something relatively unique and specific to your problem (not just "Help!").
Your soapbox is now ready. Indicate the exact unRAID version. Describe the problem clearly, and quote the exact wording of any error messages. If it's not in your signature (consider adding it there), add some detail of your hardware setup, including motherboard, CPU, amount of RAM, your flash drive, add-on cards (especially disk controllers), and the drives you have installed.
And of course, as mentioned in the section above, if diagnostics or a syslog would be useful to a troubleshooter, attach it now! Don't wait to be asked for it! The v6 diagnostics zip file is the most desired troubleshooting tool, as it includes the syslog, all SMART reports, and a lot of other helpful system info. Some problems don't need a syslog, but most do. It is our window into the internal workings of the Linux kernel and the unRAID driver, what it is seeing and what it is doing. A problem getting a USB flash drive to boot is one type of issue that usually does NOT involve a syslog. For USB boot problems, see below.
There is a limit on the size of attachments.
  • If your diagnostics zip file is too large, you may have to split it into 2 zip files, or find an external public place (e.g. Pastebin, your ftp site, etc) to store it. Provide a link to it.
  • If you are only providing the syslog and it's too large, perhaps because of many error log entries, you should zip it, and attach the zip file. The syslog, especially if there are repetitive error entries, will compress very small. Normal syslogs zip to around 15% of original size, and those with errors are typically much larger and contain lots of repetition, which compresses way down, usually to 7% to 9% of original size. Zipping them also ensures they are received intact.
  • Note: if the syslog is too large, attaching a zipped copy of it is ALWAYS preferred. DO NOT split your syslog!
  • DO NOT attach it as an .rtf or .doc or .pdf or as anything but the original text file you captured.
  • If this is important to you, it's OK to edit out any personal or private info from the syslog, so long as the syslog file remains a text file!
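The zipping step can be done at the console too. This is a sketch, assuming the zip command may not be installed on every unRAID version, so it falls back to gzip (a .gz file is still far smaller, but zip is what is asked for above). pack_syslog is a made-up name. It also prints the compressed size as a percentage of the original, so you can see the kind of ratios described above.

```bash
#!/bin/bash
# Sketch: compress a captured syslog for attaching to a forum post, and
# report the compressed size as a percentage of the original.
# Assumption: zip may not be installed everywhere, so fall back to gzip.
pack_syslog() {
    local file=$1 archive orig packed
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    if command -v zip >/dev/null 2>&1; then
        archive="${file%.txt}.zip"
        zip -q -j "$archive" "$file" || return 1   # -j: store without folder path
    else
        archive="$file.gz"
        gzip -c "$file" > "$archive" || return 1   # -c: keep the original file
    fi
    orig=$(wc -c < "$file" | tr -d ' ')
    packed=$(wc -c < "$archive" | tr -d ' ')
    echo "$archive $(( packed * 100 / orig ))%"
}
```

Usage: pack_syslog /boot/syslog-2008-04-10.txt prints the archive path followed by its size as a percentage of the original.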


Boot problems

There is probably nothing more frustrating than getting all excited about unRAID, spending hours reading about it, then grabbing a spare flash drive, only to waste many more hours trying fruitlessly to get it to boot unRAID. This has possibly cost unRAID more potential users than any other problem. Some flash drives are harder to prepare than others, and some motherboard BIOSes are certainly much pickier about booting from a USB drive. Thankfully, this kind of problem is becoming rare.
The instructions at USB Flash Drive Preparation are very complete, especially the troubleshooting tips. If still unsuccessful booting unRAID, then check the tips below (some are already covered in the previous instructions). If still unsuccessful, then it is time to post a question on the unRAID forums.
  • One of the most common issues is forgetting to set the Volume Label on the flash drive to UNRAID, exactly 6 capital letters.
  • Some BIOSes reshuffle the boot order, especially when a new hard drive has been added. In your BIOS Setup Menus, try Harddisk-USB first. See here for more instructions and video guides.
  • Check USB Boot Issues
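On a Linux machine (or any system with blkid from util-linux), the volume label can be checked from the command line. This is a sketch; /dev/sdX1 is a placeholder for the flash drive's partition, blkid may need to be run as root, and both function names are illustrative only.

```bash
#!/bin/bash
# Sketch: verify the flash drive's volume label is exactly "UNRAID".
# Run on Linux with the flash drive plugged in; /dev/sdX1 is a placeholder.
is_unraid_label() {
    [ "$1" = "UNRAID" ]                # exact, case-sensitive comparison
}

check_flash_label() {
    local part=$1 label
    label=$(blkid -s LABEL -o value "$part") ||
        { echo "no label found on $part" >&2; return 2; }
    if is_unraid_label "$label"; then
        echo "OK: $part is labeled UNRAID"
    else
        echo "Wrong label on $part: '$label' (must be UNRAID)" >&2
        return 1
    fi
}
```

Usage: check_flash_label /dev/sdX1. A label like "unraid" or "UnRAID" fails the check, matching the "exactly 6 capital letters" rule above.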


Network not working

If you have added a NIC, make sure that any onboard LAN is disabled in the BIOS Setup Menus, and don't plug the network cable into the onboard NIC!
Make sure the workgroup name is the same on each of your machines.
See also the Networking FAQ.


Name Resolution

If you cannot connect to your unRAID server from other machines on the network by its hostname (e.g. tower), but connecting by the server's IP address works, you have a name resolution problem.
If you are having trouble accessing the web management page, make sure you are using //tower or http://tower and not \\tower.
Make the unRAID machine the local master network browser: log into the web management page (use the IP address to reach it) and set Local Master to Yes on the settings page. Reboot the unRAID server and the computer that cannot resolve the network name, then try again.
If name resolution still does not work, the workaround is to use the hosts file to set its IP address manually. For Windows, the hosts file is located at
%WINDIR%\System32\drivers\etc\hosts
For Linux it is at
/etc/hosts
Open the file in a text editor and at the bottom add
192.168.x.y<tab>hostname
Replace 192.168.x.y with the IP address of your unRAID server and replace hostname with the server's hostname (e.g. tower). Don't actually type <tab>; just press the Tab key.
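On Linux, the same edit can be made from a shell. This sketch appends the line only if the hostname isn't already listed (comment lines are ignored, and names are compared case-insensitively). It covers the Linux case only, must run as root to modify /etc/hosts, and add_hosts_entry is a hypothetical helper, not an existing command.

```bash
#!/bin/bash
# Sketch: append "IP<TAB>hostname" to a hosts file unless the name is
# already present. Editing /etc/hosts requires root.
add_hosts_entry() {
    local hosts_file=$1 ip=$2 name=$3
    # Already listed? Skip comments; compare hostnames case-insensitively.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if (tolower($i) == tolower(n)) found = 1 }
        END { exit !found }' "$hosts_file"; then
        echo "$name is already listed in $hosts_file"
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

Usage, with a placeholder address: add_hosts_entry /etc/hosts 192.168.1.100 tower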


SAMBA service not started

Check your Samba settings (when changing settings, make sure to stop the array in advance)
Sometimes the file /etc/samba/private/secrets.tdb gets corrupted and the Samba service won't start. Follow these instructions.


Hard drive failures

See also the Hard Drives FAQ.
unRAID can recover from a SINGLE disk failure. It is actually easy to miss a failure unless you notice degraded performance. (Note you will only have degraded performance reading from the one drive that has actually failed, not from reading other disks in the array.) You may not even notice the degraded performance, as it is very likely it will still be sufficient to serve media files fast enough over the LAN that you would not notice. The only way to tell for sure if unRAID has detected a drive failure is to look for a red ball next to one of your drives on the Main page of the Web interface. But it is easy for even unRAID to miss the fact that a drive has failed if it is not accessed for a while. This is another reason to run the monthly parity check - to make sure that unRAID "knows" that a drive has failed.


How to prevent drive failures?

The best way to avoid failures is some preventative maintenance.
  • Use only high quality SATA cables, preferably with locking connectors. The most common problem causing drive errors is NOT a problem with the drive, but a problem with its cables, either a bad SATA cable, or a loose SATA connector, or a loose or faulty power splitter in the power to the drive. Cables are cheap! Do NOT go cheap when you buy them! Bad SATA cables cause CRC errors, either the BadCRC or ICRC error flags in exception handler messages, and/or an increase in the SMART attribute UDMA_CRC_Error_Count. Loose SATA and power connectors make the drive appear to disconnect and reconnect, often causing the PHYRdyChg flag.
  • Make sure your cooling is adequate. High heat stresses all parts of your computer, including your hard drives. Although it is hard to give a precise temperature at which to start being concerned, temps up to 40C are good, 41-45C are getting warm, and temps above 45C should steer you towards adding active cooling for your hard disks. I (personally) would shut down my server with hard drive temps over 50C. See also the UnRAID Topical Index, Fans topic.
  • Run a periodic parity check (every month). For v6, go to Settings -> Scheduler and enable the parity check and set its schedule. For older unRAID versions, we recommend running the Monthly parity check script from the UnRAID Add Ons wiki page. Although you may not realize it, hard drives have an internal error checking system, known as S.M.A.R.T., that monitors all drive operations, including the media surface condition. If a spot on the disk starts to go bad, the drive can "remap" the sector, and avoid reporting a bad sector error. It does this by taking the bad sector offline, and mapping a spare sector (from a reserved pool of spares) into its place, and moving the contents of the bad sector to the replacement one. It does this quietly, transparently to the system, so that no errors are reported, just logged within the SMART system of the drive. But if the drive doesn't read a sector for a very long time, that sector can go from good to marginal to bad without the drive noticing. Running a parity check (besides verifying that your parity is being properly maintained) will also cause each and every sector of every disk to be read, and give your drive's SMART monitoring a chance to take corrective action, and prevent a future error.
  • Do not use round IDE cables. Although sold as premium cables, they do not meet the technical specifications for high-speed use. Just because they have worked fine for 5 years on your Windows box does not mean you will have good luck with them with unRAID. (You have been warned!) Instead, use the flat cables that come with most motherboards: use 80-conductor cables, not the 40-conductor cables made for CD-ROM use (both kinds have 40-pin connectors; see this picture, and use a cable that looks like the top cable, not the bottom one).
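The temperature rule of thumb above can be written as a small check. The thresholds are the ones quoted in this section (the author's personal guidelines, not manufacturer specifications). Reading the temperature from the Temperature_Celsius attribute is an assumption: some drives report a different attribute name, such as Airflow_Temperature_Cel. Both function names are hypothetical.

```bash
#!/bin/bash
# Sketch: classify a drive temperature in degrees C using the rough
# thresholds in the cooling tip above (rules of thumb, not vendor limits).
classify_temp() {
    local t=$1
    if   [ "$t" -gt 50 ]; then echo "shut down"
    elif [ "$t" -gt 45 ]; then echo "add cooling"
    elif [ "$t" -gt 40 ]; then echo "warm"
    else                       echo "good"
    fi
}

# Current temperature = first number of the RAW_VALUE column (10th field)
# of the Temperature_Celsius attribute. Assumption: some drives use a
# different attribute name, e.g. Airflow_Temperature_Cel.
drive_temp() {
    smartctl -A "$1" | awk '$2 == "Temperature_Celsius" { print $10; exit }'
}
```

Usage, with /dev/sdX as a placeholder: classify_temp "$(drive_temp /dev/sdX)"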


Cabling Problems

Hard drives can sometimes appear to fail or be failing when the problem is actually a loose or bad cable. Although this is most common when installing new drives, vibrations inside the computer can also work cables loose. The first step when you lose a drive, hear pops and clicks from your drive, or see resets or other errors in the syslog is to identify the drive causing problems, then unplug and replug (and if that doesn't work, replace) its data cable. Check the power cable while you're at it. If you are using backplanes, they become part of the cabling from unRAID's perspective, so make sure the backplane isn't causing the problem. (See the smartctl section below for more hints of cabling problems.)


What if I get an error?

If your array has been running fine for days/weeks/months/years and suddenly you notice a non-zero value in the error column of the web interface, what does that mean? Should I be worried?
Occasionally unRAID will encounter a READ error (not a WRITE error) on a disk. When this happens, unRAID will read the corresponding sector contents of all the other disks + parity to compute the data it was unable to read from the source. It will then WRITE that data back to the source drive. Without going into the technical details, this allows the source drive to fix the bad sector so next time, a read of that sector will be fine. Although this will be reported as an "error", the error has actually been corrected already. This is one of the best and least understood features of unRAID!
Other types of errors are possible, so it is certainly worth your while to capture a syslog after an error is detected, but this is likely what has happened. Also, if you notice this happening more than once in a very great while, you might want to consider testing and replacing the disk in question. Remapped sectors have been linked with higher-than-normal drive failure rates.
After getting an error, run a parity check soon after, to make sure that all is well.
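Why the read-error repair described above works: unRAID's single parity is the bitwise XOR (even parity) of the data disks at each position, so any one missing value can be recomputed by XORing together everything else. A toy illustration, with one made-up byte per disk standing in for a sector:

```bash
#!/bin/bash
# Toy example: one hypothetical byte per disk stands in for a whole sector.
d1=$((0x5A)); d2=$((0xC3)); d3=$((0x0F))   # data disks 1-3
parity=$(( d1 ^ d2 ^ d3 ))                 # what the parity disk holds

# Disk 2 returns a read error: XOR parity with every other data disk.
rebuilt=$(( parity ^ d1 ^ d3 ))
printf 'parity=0x%02X rebuilt_d2=0x%02X original_d2=0x%02X\n' \
    "$parity" "$rebuilt" "$d2"
# prints: parity=0x96 rebuilt_d2=0xC3 original_d2=0xC3
```

The same calculation, done for every byte of the unreadable sector, produces the data that unRAID writes back to the disk that reported the read error.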


What do I do if I get a red ball next to a hard disk?

What do I do if I get a red X next to a hard disk?

Note: until unRAID v6.0, the drive status indicator for a disabled drive was always a red ball. As of v6.0, it has been changed to a red X, to assist those with red/green color blindness. Be aware that the red X is still often referred to as a red ball, by many of the veteran users and much of the documentation.
[For v4 only] If you have moved your drives around (or sometimes even if you haven't), unRAID can get confused about what drive is assigned to what slot. It will NOT START the array, and some drives may have red balls next to them. You will also see italicized drive serial numbers. You need to go to the Devices page and re-assign the right drives to the right slots. (The italicized serial numbers on the main page will guide you to assign the right drive to the right slot.) You can then safely start the array.
If you see a red ball (or red X) next to a drive, and the array is started, that disk has been taken out of service because an attempt to write to it has failed. unRAID does not take a disk out of service casually, but a write failure is serious: a single failed write is enough for unRAID to disable the disk and show a red indicator next to it in the management interface.
Many things can cause such a failure that have nothing to do with the drive. Cables can (and do) go bad or wiggle loose. SATA cables in particular are notorious for slipping off their connectors, if they aren't the locking type. PSU's (power supplies) can do weird things and induce failures. Disk controllers can go bad. Motherboards can go bad. At a minimum, it is worth a little time to recheck all of the connections, to make sure something hasn't come loose. Whenever the computer case is opened, especially just before closing it up, cabling can shift and cause a connection to a drive to fail. When checking for loose connections, take care not to disturb connections to other drives, complicating your failure.
Gather information
  • Because there are many possible causes, more information is needed to know which fix to apply. The two sources of drive error information are the syslog and the SMART report for the drive.
  • [For v6 only] Go to Tools -> Diagnostics and download the diagnostics zip file. It contains the syslog and SMART reports for all drives (plus much more system info). You can also obtain the syslog from Tools -> Syslog, and the SMART report by clicking on the drive on the Main screen. Important! Check whether the SMART report is available or not! Is the right SMART report in the Diagnostics zip?
  • [For v4 and v5 only] See the Capturing your syslog section above to obtain your syslog. See the Obtaining a SMART report section below to obtain the SMART report for the drive.
  • It's possible that you won't be able to obtain a SMART report for the drive yet. If the drive has not completely failed, but there was a port or controller failure or other reason, and the drive has been disabled by the kernel, then you will not be able to obtain the SMART report until after rebooting. Just remember that you MUST obtain the syslog or diagnostics now, BEFORE you reboot. If after rebooting, the SMART report still cannot be obtained, then that is a clue in itself (not a good sign!).
  • Try connecting the drive to a different port known to be working, such as a motherboard port, and try different cables known to be good. If a SMART report still cannot be obtained, the drive is dead and needs to be replaced; see Replacing a Data Drive.
  • If multiple drives are having trouble, determine whether they are on the same disk controller. If they are, the controller is the likely culprit. One good thing though: the drives are probably fine! It was the controller's fault they could not be accessed.
  • If the drive has truly failed, then you cannot obtain a SMART report for it, even after powering off and rebooting. If the cables to the drive are good, and the controller and port are good, but the drive won't spin, it's finished. You can attempt to RMA it.
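To see which drives share a controller, the udev symlinks in /dev/disk/by-path can help, since their names begin with the PCI address of the controller each drive is attached to. This is a sketch assuming the usual "pci-<address>-..." naming; drives_by_controller is a hypothetical name. If lspci is available, lspci -s <address> shows which controller an address belongs to.

```bash
#!/bin/bash
# Sketch: print "<controller PCI address> <disk>" for every whole disk,
# sorted so drives on the same controller are grouped together.
drives_by_controller() {
    local dir=${1:-/dev/disk/by-path} link name addr
    for link in "$dir"/pci-*; do
        [ -e "$link" ] || continue                      # no match or dangling link
        name=${link##*/}
        case $name in *-part[0-9]*) continue ;; esac    # skip partitions
        addr=${name#pci-}
        addr=${addr%%-*}                                # e.g. 0000:00:1f.2
        echo "$addr $(basename "$(readlink -f "$link")")"
    done | sort
}
```

Run drives_by_controller with no arguments on the server. If all the troubled drives show the same address, suspect that controller first.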
Analyze the syslog and SMART report
  • Either do it yourself, or create a forum post about your problem in the General Support board for your unRAID version. Make sure you attach either the diagnostics zip or the syslog and SMART report for the drive, with a description of what happened, and which drive is at fault.
  • incomplete, work in progress, may move it to separate wiki page
What to do
  • Drive is fine - if you have determined that the issue was not the drive's fault, and you are sure from the SMART report that the drive is fine, then you will want to re-enable the drive.
  • Drive has completely failed - if analysis has determined that the drive has failed to the point you don't consider it salvageable, then you will need to RMA it or trash it, and purchase a replacement. We always recommend Preclearing the replacement drive first, to stress test it and catch an early failure before you trust it with your data. Then proceed to Replacing a Data Drive.
  • Drive had faults - if you have determined that the drive developed bad sectors or other faults that caused the red ball, but you aren't sure the drive itself is bad/failed, then you will probably want to take it offline for thorough testing. Since drive testing takes a while, either your array will be off for however long it takes, or it will be running in a degraded, unprotected state (dangerous!), or you will replace the drive with another, to give yourself plenty of time to test and decide. If you decide to replace it, buy the replacement drive (Preclear it) and go to Replacing a Data Drive. If you decide to leave the system off or run unprotected, then unassign the drive, test and clean it (by Preclearing it), then re-enable the drive.
Note: it's always good to have a prepared and tested replacement drive already on hand!


Obtaining a SMART report

Drives are self-monitoring through their SMART features.
It is important to understand how SMART reporting and testing work. Drives are little computers without monitors or keyboards, so we communicate with them using a tool called smartctl (whether at the command prompt or using the v6 webGUI). The smartctl tool is nothing more than a relay, a message passer between us and the drive. It does no drive testing or reporting. If we want a SMART report, we use smartctl to request one from the drive, and the drive then gathers the info, puts it into a report, and returns it to smartctl which relays it to us. If we want a drive test, we use smartctl to request one, and it passes the request to the drive (then quits), and in the background the drive performs the test on itself. Depending on which test was requested, it will take a short or long time. When we think the test should be complete, we use smartctl to request another SMART report, which contains a test report section near the bottom. It will indicate if the test is complete, whether it passed or failed, and at what point it failed.
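The request, wait and report cycle just described can be sketched as a shell function. smartctl -t short, smartctl -c and smartctl -l selftest are standard smartctl options; the device name, the function names and the fixed 120-second wait are assumptions (check the recommended polling time that smartctl -c prints for your drive, and allow much longer for an extended test). If the test has not finished yet, the newest log entry will still be from an earlier test.

```bash
#!/bin/bash
# Sketch: request a short self-test, wait, then read back the result.
last_selftest_status() {
    # Reads `smartctl -l selftest` output on stdin and prints the Status
    # column of the most recent entry ("# 1"). Columns are separated by
    # runs of two or more spaces; the status text itself contains spaces.
    awk -F'  +' '/^# 1 / { print $3; exit }'
}

run_short_test() {
    local dev=$1
    smartctl -t short "$dev" >/dev/null || return 1   # drive starts the test; smartctl exits
    sleep 120                                          # the drive tests itself in the background
    smartctl -l selftest "$dev" | last_selftest_status
}
```

Usage, with /dev/sdX as a placeholder: run_short_test /dev/sdX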
[For v6 only] You are welcome to use the smartctl commands below at a command prompt. However, it's easiest to go to Tools -> Diagnostics and download the diagnostics zip file. It contains the syslog and SMART reports for all drives (plus much more system info). You can also obtain the SMART report or view the SMART report sections by clicking on the drive on the Main screen. And you can perform drive testing there too.
[For v4 only] The smartctl tool has been included with unRAID since v4.3 (click this LINK if using pre 4.3 final version of unRAID). Here are some instructions on using it from Tom. Also see unRAID Addons and UnRAID Topical Index, SMART for more Smartctl links. If when trying the smartctl commands below, you get an error about a missing library, then see this post for instructions for installing it.
At the unRAID console, or from a terminal session with SSH or Telnet, type:
smartctl -a -d ata /dev/sda
or if you are using a newer SATA controller
smartctl -a -A /dev/sda
Note: If you get an error like "error while loading shared libraries: libstdc++.so.6", then you are using a very old version of unRAID (such as v4.4.2) that was missing a required library. Please see this post.
Look at the Main page (Devices in 4.7) for the device identifier (within the parentheses) for each disk, and substitute that for 'sda' on the command line.
This command will print out the SMART info for the drive. Refer to this article to better understand the SMART report.
To copy the results to a file called smart.txt on your USB stick that you can use to post to the forums, use this command:
 smartctl -a -d ata /dev/sda >/boot/smart.txt
or, if you are using a newer SATA controller:
 smartctl -a /dev/sda >/boot/smart.txt
Piping the output through todos converts its line endings to Windows (CRLF) format, which makes the smart.txt file easier to read from a Windows workstation:
 smartctl -a -d ata /dev/sda | todos >/boot/smart.txt
or, if you are using a newer SATA controller:
 smartctl -a /dev/sda | todos >/boot/smart.txt
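The commands above save one drive at a time. If you want reports for several drives in one go, something like the following minimal sketch may help. It assumes a shell prompt on the server (as above) and the standard awk tool; save_smart_reports is a hypothetical helper name, not an unRAID command, and awk is used here in place of todos to add the Windows line endings.

```shell
# Hypothetical helper (not part of unRAID): save one SMART report per drive.
# Usage: save_smart_reports DEST_DIR DEVICE...
save_smart_reports() {
    dest=$1
    shift
    for dev in "$@"; do
        out="$dest/smart-$dev.txt"
        # Same report command as above; add "-d ata" if your controller needs it.
        # awk appends a carriage return to every line, giving Windows (CRLF)
        # line endings like todos does.
        smartctl -a "/dev/$dev" | awk '{ printf "%s\r\n", $0 }' > "$out"
        echo "saved $out"
    done
}
```

For example, save_smart_reports /boot sda sdb would write /boot/smart-sda.txt and /boot/smart-sdb.txt.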


The smartctl output lists a number of statistics (attributes) that the drive captures about itself.
  • Perhaps the most important attribute to look at is "Reallocated_Sector_Ct". Its RAW_VALUE is a count of sectors that have been reallocated (remapped). If a sector goes bad, the drive can remap that sector's address to a spare sector. This is done at a low level, within the drive itself, so the OS doesn't even know it happened. (unRAID actually uses this feature to maintain the integrity of your array.) Each time this happens, the reallocated sector count is incremented. A few reallocated sectors are not necessarily a bad thing, but a count that starts to climb is often a sign that the drive is failing. Any time you see a value other than 0, monitor the drive closely. If the number holds steady and does not increase even after several parity checks, your drive is likely okay. But if it keeps going up, even by only 1 or 2 at a time, start to be concerned; this is likely the first hint that the drive is failing. A special note: bad cabling CANNOT cause reallocated sectors.
  • An equally important attribute is "Current_Pending_Sector". Its RAW_VALUE is a count of suspect sectors pending reallocation. It should ALWAYS be zero, and it must be zero if the drive is to be used to reconstruct another drive. If it is not zero, you will probably (but not always) see the Reallocated Sector Count increase later, when the pending count returns to zero. Before remapping a suspect sector, the drive tests it one last time, and *may* pass it and not remap it. (There are good reasons why it is designed to work this way.)
  • Another important stat to look at is the "Temperature_Celsius". It tracks the current and min/max temperatures of the drive. If your drives are running hot (see recommendations in the "Preventative Maintenance" section above), consider adding active cooling to your hard drives.
  • One user had a "UDMA_CRC_Error_Count" greater than zero. Research showed that this can be caused by bad cabling: "Possible causes of UDMA CRC errors are bad interface cables or cable routing problems through electrically noisy environments (e.g., cables are too close to the power supply cables, or too close to other SATA cables). SATA cables are not normally shielded, and tie-wrapping them into bundles to make it look neat can often cause crosstalk and induce noise from one cable to another. Cut the tie-wraps and get some space between the cables."
  • Near the end of the smartctl report is a list of the last few errors the drive encountered. Errors indicating that commands were not recognized are a sign of bad cabling, not necessarily of a bad drive.
  • (Add more info about specific smartctl statistics)
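If you have saved a report with the commands above, a small helper can pull out the RAW_VALUE of the attributes discussed in this list. This is a sketch only: smart_raw and smart_summary are hypothetical names, and the code assumes smartctl's default attribute table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE), which may differ between smartctl versions or output options. Run it on a report saved without todos, since the extra carriage returns would otherwise end up in the printed values.

```shell
# Hypothetical helpers (not part of smartctl or unRAID).
# smart_raw NAME: print the first token of RAW_VALUE for attribute NAME from a
# report read on stdin; exit status 1 if the attribute is not listed.
smart_raw() {
    awk -v name="$1" '
        $2 == name { print $10; found = 1; exit }
        END { if (!found) exit 1 }'
}

# smart_summary FILE: print the attributes discussed above from a saved report.
smart_summary() {
    for attr in Reallocated_Sector_Ct Current_Pending_Sector \
                UDMA_CRC_Error_Count Temperature_Celsius; do
        value=$(smart_raw "$attr" < "$1") || value="not reported"
        printf '%-24s %s\n' "$attr" "$value"
    done
}
```

For example, smart_summary /boot/smart.txt. Running it after each parity check gives you numbers to compare over time.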
Each smartctl attribute is provided in "raw" format (RAW_VALUE) as well as in "normalized" format (VALUE). The raw value is sometimes more human-readable (like the temperature in Celsius or the reallocated sector count), but not always, and its meaning can vary wildly from vendor to vendor. The normalized value is scaled by the drive to a number from 1 to 253 (higher is better). If the normalized value drops to or below the "THRESH" value, the attribute has failed, which means the drive is failing. The WORST column shows the lowest normalized value recorded so far.
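smartctl already flags these cases itself in the WHEN_FAILED column (FAILING_NOW or In_the_past), but the rule in the paragraph above can also be checked directly. A minimal sketch, assuming the default attribute column layout; smart_failing is a hypothetical helper name, and attributes with a THRESH of 0 are skipped because a normalized value can never fall that low.

```shell
# Hypothetical helper (not part of smartctl): read a SMART report on stdin and
# print attributes whose normalized VALUE (or WORST) is at or below THRESH.
# Assumes the default column layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_failing() {
    awk '
        # Attribute rows start with a numeric ID and have a hex FLAG like 0x0033.
        $1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ && NF >= 10 {
            value = $4 + 0; worst = $5 + 0; thresh = $6 + 0
            # A threshold of 0 can never be reached, so skip those attributes.
            if (thresh > 0 && value <= thresh)
                print $2 ": VALUE " value " <= THRESH " thresh " (failing now)"
            else if (thresh > 0 && worst <= thresh)
                print $2 ": WORST " worst " <= THRESH " thresh " (failed in the past)"
        }'
}
```

For example: smartctl -A /dev/sda | smart_failing prints nothing on a healthy drive.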
If you want to understand SMART reports better, please see Understanding SMART Reports. Unfortunately, it is rather incomplete, especially in the attributes section.
When reviewing SMART attributes, see this helpful chart of Known S.M.A.R.T. attributes.


Running a SMART test

Smartctl can also start drive self-tests, but it does not actually conduct them; it just tells the drive to initiate a test on itself. You then need to run the SMART report command shown above to get the results, and if the test is still in progress, the SMART report will tell you that as well. In general, there is no point requesting a report too soon. You can go by the rough estimates below, but it's best to use the recommended times given in the SMART DATA SECTION of the SMART report, just before the attributes: for the short test, use the "Short self-test routine recommended polling time", and for the long test, use the "Extended self-test routine recommended polling time". For the long test, it is strongly recommended that you temporarily disable drive spin down, because a spin down will terminate the test prematurely.
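To find the recommended waiting time without reading through the whole report, a helper along these lines can extract it. It is a sketch that assumes the two-line layout smartctl normally uses for these entries ("Extended self-test routine" followed by "recommended polling time: ( 182) minutes."); polling_minutes is a hypothetical name.

```shell
# Hypothetical helper (not part of smartctl): print the recommended polling
# time, in minutes, for a self-test type ("Short" or "Extended") from a report
# or from "smartctl -c" output read on stdin. Exit status 1 if not found.
polling_minutes() {
    awk -v kind="$1" '
        $0 ~ ("^" kind " self-test routine") { want = 1; next }
        want && /recommended polling time/ {
            line = $0
            sub(/^[^(]*\(/, "", line)   # drop everything up to and including "("
            sub(/\).*$/, "", line)      # drop ")" and everything after it
            gsub(/[ \t]/, "", line)     # strip padding spaces
            print line
            found = 1
            exit
        }
        END { if (!found) exit 1 }'
}
```

For example: smartctl -c /dev/sda | polling_minutes Extended (the -c option prints the drive's capabilities, including these polling times).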
[For v6 only] You are welcome to use the smartctl commands below at a command prompt. However, it's easiest to click on the drive on the Main screen, and you will then see the SMART report sections, and can perform any drive SMART testing right there.
The short test takes about 1 to 3 minutes (remember to substitute your drive's identifier for 'sda' as described above):
 smartctl -t short /dev/sda
The long test takes roughly 2 hours per terabyte:
 smartctl -t long /dev/sda
To see the results, or the progress of the test, use the SMART report command, shown above in the Obtaining a SMART report section.
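When you check back, the parts of the report that matter are the self-test execution status and the self-test log. A rough sketch that prints just those, assuming smartctl's usual wording ("Self-test routine in progress", "% of test remaining", and log entries numbered "# 1", "# 2", ... with the newest first); selftest_status is a hypothetical helper name.

```shell
# Hypothetical helper (not part of smartctl): read a SMART report on stdin and
# print the progress of a running self-test (if any) and the newest entry in
# the self-test log (the "# 1" line).
selftest_status() {
    awk '
        /Self-test routine in progress/ { running = 1 }
        running && /% of test remaining/ {
            sub(/^[ \t]+/, "")           # trim the indentation smartctl adds
            print "In progress: " $0
            running = 0
        }
        /^# *1 / { print "Latest result: " $0; exit }'
}
```

For example: smartctl -a /dev/sda | selftest_status.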


Resolving a Pending Sector

Pending sectors occur as a result of read failures. An unreadable sector will interfere with the reconstruction of a failed drive. Pending sectors need to be cleared as soon as possible, because if 2 drives have unreadable sectors, the affected data will most likely be unrecoverable within unRAID. Data disks with a small number of pending sectors should be fairly easy to recover with utilities in Linux or Windows. If anyone knows of a Mac utility that can recover ReiserFS, please update this entry.
The safest procedure is to replace the drive with a pre-cleared spare. The original drive can then be pre-cleared, and its pending sector count should go to zero, after which it can be used as a spare. Multiple pre-clear cycles should not be required; if 1 cycle doesn't clear the pending sectors, the disk should be RMAed. If the drive cannot be returned, multiple cycles may restore it to a usable state.
If no spare is available, follow the next procedure to re-enable the drive. The pending sector count should be zero after rebuilding; if it is not, replace the drive.
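To check the outcome of a pre-clear or rebuild, you can compare a SMART report saved before it with one saved after it. A minimal sketch, assuming both reports were saved without todos and use smartctl's default attribute layout; pending_and_realloc and compare_reports are hypothetical helper names. As described earlier, pending sectors that the drive could not reuse usually show up as an increase in Reallocated_Sector_Ct.

```shell
# Hypothetical helpers (not part of unRAID or smartctl).
# pending_and_realloc FILE: print "PENDING REALLOCATED" raw counts ("?" if absent).
pending_and_realloc() {
    awk '$2 == "Current_Pending_Sector" { p = $10 }
         $2 == "Reallocated_Sector_Ct"  { r = $10 }
         END { if (p == "") p = "?"; if (r == "") r = "?"; print p, r }' "$1"
}

# compare_reports BEFORE AFTER: show how both counts changed between two saved
# reports, e.g. one taken before and one after a pre-clear or rebuild.
compare_reports() {
    before=$(pending_and_realloc "$1")
    after=$(pending_and_realloc "$2")
    # "${x% *}" keeps the first field (pending), "${x#* }" the second (reallocated).
    echo "Current_Pending_Sector: ${before% *} -> ${after% *}"
    echo "Reallocated_Sector_Ct:  ${before#* } -> ${after#* }"
    if [ "${after% *}" = "0" ]; then
        echo "OK: no sectors pending"
    else
        echo "WARNING: sectors still pending"
    fi
}
```

For example: compare_reports /boot/smart-before.txt /boot/smart-after.txt, where both file names are whatever you chose when saving the reports.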


Re-enable the drive

Suppose you are sure the drive is good, but the cable was loose, the port was bad, the controller crashed, or you think the failure was a fluke (note: it is NEVER a fluke). How can you get unRAID to reuse the same disk that it has disabled?
If you are sure that the drive is fine, and the SMART report confirms it, then you will want to re-enable the drive and return it to service. There are 2 ways to do this. Remember, however, that while the first way is quick, there may have been writes to the drive since it was disabled, including the write that failed and caused the red ball. Those writes went to the emulated drive, not the physical drive, so by far the best and safest option is the second way: rebuild the drive onto itself, writing the up-to-date emulated drive to the physical drive.
  • If you are absolutely certain that you have not written to the drive since it was taken offline, you can use the Trust My Array procedure (for v4 and v5) to quickly return your drive and the array to an all-green condition. Remember, the drive was taken out of service when a "write" to it failed, so using the "trust" procedure will effectively discard any data written while the drive was disabled.
  • Unless you are certain you have not written to the disk, rebuild (reconstruct) it instead; this is the safest option. Only use the Trust procedure if a reconstruction is not possible.
You can re-enable the hard drive and reconstruct it as follows:
  • Stop the array.
  • Go to the Main page (Devices in version 4.7) and unassign the disk.
  • Go to the Main page (Array Operations section) and start the array.
  • Stop the array again.
  • Go to the Main page (Devices in version 4.7) and re-assign the disk.
  • Go to the Main page (Array Operations section) - the system should indicate there is a "new" drive to replace the disabled one. Check the confirmation box and click the Start button to start a reconstruct/rebuild of the disk.


Replace the drive

Please see Replacing a Data Drive


Remove the drive

Please see Shrink array


General hardware issues

Note: we are still waiting for editors (including you if you are reading this!) to provide good step by step analysis of hardware issues, like testing memory, checking cables, checking temps and airflow, disabling components, BIOS and firmware updates, etc.


More Links