[SOLVED] Drives Drop Randomly



I don't know if anyone can help, but here is what is going on:

 

A little while ago, I noticed that a drive was getting write errors. When I stopped the array, the drive was no longer available. It just disappeared! At first I thought it was the drive, so I replaced it, but then it happened again. I have since replaced the controller card and the cable. I have also run chkdsk on the USB drive, and drives still keep dropping off! Can anyone see where the problem lies? Here is the most recent log.

 

I have noticed that disk 13 and then disk 17 are the ones that drop off.

 

Any help would be MUCH appreciated!

syslog-2016-10-19_cropped.txt

Would an 850W PSU be sufficient for 20 drives?
No. Yes. Maybe. Depends.

 

Which specific model? How many amps are available to the hard drives? Single or split rails? What other components (motherboard, CPU, RAM, HBA, etc.) are pulling power at the same time?

 

All those questions have to be answered before we can hazard a guess.
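
To make those questions concrete, here is a rough sketch of a 12 V budget check. Every figure in it is an assumed placeholder (about 2 A per drive at spin-up, 15 A for everything else, a 70 A single rail), and the function name is made up for illustration. The real numbers have to come from the drive datasheets and the PSU label.

```python
def twelve_volt_budget(drive_count, spinup_amps_per_drive, other_amps, rail_amps):
    """Return (peak_amps, headroom_amps) for a single 12 V rail.

    Peak draw is modelled as every drive spinning up at once plus the
    rest of the system; this is a worst-case estimate, not a measurement.
    """
    peak = drive_count * spinup_amps_per_drive + other_amps
    return peak, rail_amps - peak


# Hypothetical figures, NOT taken from any real datasheet or PSU label.
peak, headroom = twelve_volt_budget(
    drive_count=20,
    spinup_amps_per_drive=2.0,  # assumed spin-up current per drive
    other_amps=15.0,            # assumed motherboard/CPU/HBA share
    rail_amps=70.0,             # assumed single-rail 12 V rating
)
print(f"peak 12 V draw: {peak:.1f} A, headroom: {headroom:.1f} A")
```

With split rails, the same check would have to be done per rail, using only the drives wired to that rail.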


First of all, thank you to everyone who has gotten back to me with ideas and comments!

 

Yesterday, I went to my local store and spoke to a rep there (who happens to use unRAID). He did the calculations for my hardware, and it came out to about 466W. The PSU is over 5 years old, so it seemed like a good idea to replace it anyway. Keeping the load at around 50% of the PSU's capacity is also a good idea, so I went with an EVGA Supernova 850G2. I replaced the PSU (fortunately I had some extra molex splitters, since there weren't enough in the box) and fired the box up.

I decided to remap all of the drives, and almost immediately drive 13 dropped off! The logs claimed corruption, so it was dropped. So I remapped the drives WITHOUT drive 13. Now, this morning, drive 20 has 2258864 errors, and I bet that if I stop the rebuild, I will find that drive 20 has dropped off. I happen to know that there is no data on that drive, but that's beside the point. Sadly, I cannot share anything more at the moment since there are no logs AT ALL! All I see on the logs page is "/usr/bin/tail -f /var/log/syslog" and the 'close' button.

 

Now my concern is not seeing a log. Could it really be the USB stick?

 

I'll make a note here that I do want to upgrade to unRAID 6, but I have been waiting to get over the hurdle of drives dropping off first. Is that necessary? Maybe unRAID 6 will reveal more? I don't know.


Something else I forgot to mention is that when I first fired the server up, one of the critical folders had no data in it. When I noticed this, I looked at the logs and discovered errors on drive 13. When I removed drive 13, the folder displayed its files again. Whew!

 

Just a thought: I have not replaced the RAM yet.


So here is the log after a reboot:

 

Oct 21 12:34:22 VVData emhttp: shcmd (68): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md20 /mnt/disk20 |& logger (Drive related)

Oct 21 12:34:22 VVData kernel: REISERFS (device md20): found reiserfs format "3.6" with standard journal (Routine)

Oct 21 12:34:22 VVData kernel: REISERFS (device md20): using ordered data mode (Routine)

Oct 21 12:34:22 VVData kernel: reiserfs: using flush barriers (Drive related)

Oct 21 12:34:22 VVData kernel: REISERFS (device md20): journal params: device md20, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 (Routine)

Oct 21 12:34:22 VVData kernel: REISERFS (device md20): checking transaction log (md20) (Routine)

Oct 21 12:34:23 VVData logger: mount: /dev/md20: can't read superblock (Drive related)

Oct 21 12:34:23 VVData emhttp: _shcmd: shcmd (68): exit status: 32 (Drive related)

Oct 21 12:34:23 VVData emhttp: disk20 mount error: 32 (Errors)

Oct 21 12:34:23 VVData emhttp: shcmd (69): rmdir /mnt/disk20 (Drive related)

Oct 21 12:34:23 VVData kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=2, nr_items=4, free_space=3840 rdkey  (Minor Issues)

Oct 21 12:34:23 VVData kernel: REISERFS error (device md20): vs-5150 search_by_key: invalid format found in block 32770. Fsck? (Errors)

Oct 21 12:34:23 VVData kernel: REISERFS (device md20): Remounting filesystem read-only (Drive related)

Oct 21 12:34:23 VVData kernel: REISERFS error (device md20): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD] (Errors)

Oct 21 12:34:23 VVData kernel: REISERFS (device md20): Using r5 hash to sort names (Routine)

Oct 21 12:34:24 VVData emhttp: shcmd (70): mkdir /mnt/user (Drive related)

Oct 21 12:34:24 VVData emhttp: shcmd (71): /usr/local/sbin/shfs /mnt/user -disks 16777214 -o noatime,big_writes,allow_other -o remember=0  |& logger (Drive related)

Oct 21 12:34:24 VVData emhttp: shcmd (72): crontab -c /etc/cron.d -d &> /dev/null (Drive related)

Oct 21 12:34:24 VVData emhttp: shcmd (73): /usr/local/sbin/emhttp_event disks_mounted (Drive related)

Oct 21 12:34:24 VVData emhttp_event: disks_mounted (Drive related)

Oct 21 12:34:25 VVData emhttp: shcmd (74): :>/etc/samba/smb-shares.conf (Drive related)

Oct 21 12:34:25 VVData avahi-daemon[6737]: Files changed, reloading. (Drive related)

Oct 21 12:34:26 VVData emhttp: get_config_idx: fopen /boot/config/shares/lost+found.cfg: No such file or directory - assigning defaults (Drive related)

Oct 21 12:34:26 VVData emhttp: Restart SMB... (Drive related)

Oct 21 12:34:26 VVData emhttp: shcmd (75): killall -HUP smbd (Minor Issues)

Oct 21 12:34:26 VVData emhttp: shcmd (76): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service (Drive related)

Oct 21 12:34:26 VVData avahi-daemon[6737]: Files changed, reloading. (Drive related)

Oct 21 12:34:26 VVData avahi-daemon[6737]: Service group file /services/smb.service changed, reloading. (Drive related)

Oct 21 12:34:26 VVData emhttp: shcmd (77): ps axc | grep -q rpc.mountd (Drive related)

Oct 21 12:34:26 VVData emhttp: _shcmd: shcmd (77): exit status: 1 (Drive related)

Oct 21 12:34:26 VVData emhttp: shcmd (78): /usr/local/sbin/emhttp_event svcs_restarted (Drive related)

Oct 21 12:34:26 VVData emhttp_event: svcs_restarted (Drive related)

Oct 21 12:34:26 VVData emhttp: shcmd (79): /usr/local/sbin/emhttp_event started (Drive related)

Oct 21 12:34:26 VVData emhttp_event: started (Drive related)

Oct 21 12:34:27 VVData avahi-daemon[6737]: Service "VVData" (/services/smb.service) successfully established. (Drive related)

Oct 21 12:40:26 VVData emhttp: shcmd (80): set -o pipefail ; mkreiserfs -q /dev/md20 |& logger (Drive related)

Oct 21 12:40:26 VVData logger: mkreiserfs 3.6.24 (Drive related)

Oct 21 12:40:26 VVData logger:  (Drive related)

Oct 21 12:42:26 VVData emhttp: shcmd (81): mkdir /mnt/disk20 (Routine)

Oct 21 12:42:26 VVData emhttp: shcmd (82): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md20 /mnt/disk20 |& logger (Drive related)

Oct 21 12:42:26 VVData kernel: REISERFS (device md20): found reiserfs format "3.6" with standard journal (Routine)

Oct 21 12:42:26 VVData kernel: REISERFS (device md20): using ordered data mode (Routine)

Oct 21 12:42:26 VVData kernel: reiserfs: using flush barriers (Drive related)

Oct 21 12:42:27 VVData kernel: REISERFS (device md20): journal params: device md20, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30 (Routine)

Oct 21 12:42:27 VVData kernel: REISERFS (device md20): checking transaction log (md20) (Routine)

Oct 21 12:42:33 VVData kernel: REISERFS (device md20): Using r5 hash to sort names (Routine)

Oct 21 12:42:33 VVData kernel: REISERFS (device md20): Created .reiserfs_priv - reserved for xattr storage. (Drive related)

Oct 21 12:42:33 VVData emhttp: resized: /mnt/disk20 (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (83): :>/etc/samba/smb-shares.conf (Drive related)

Oct 21 12:42:34 VVData avahi-daemon[6737]: Files changed, reloading. (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (84): chmod 777 '/mnt/disk20' (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (85): chown 'nobody':'users' '/mnt/disk20' (Drive related)

Oct 21 12:42:34 VVData emhttp: get_config_idx: fopen /boot/config/shares/lost+found.cfg: No such file or directory - assigning defaults (Drive related)

Oct 21 12:42:34 VVData emhttp: Restart SMB... (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (86): killall -HUP smbd (Minor Issues)

Oct 21 12:42:34 VVData emhttp: shcmd (87): cp /etc/avahi/services/smb.service- /etc/avahi/services/smb.service (Drive related)

Oct 21 12:42:34 VVData avahi-daemon[6737]: Files changed, reloading. (Drive related)

Oct 21 12:42:34 VVData avahi-daemon[6737]: Service group file /services/smb.service changed, reloading. (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (88): ps axc | grep -q rpc.mountd (Drive related)

Oct 21 12:42:34 VVData emhttp: _shcmd: shcmd (88): exit status: 1 (Drive related)

Oct 21 12:42:34 VVData emhttp: shcmd (89): /usr/local/sbin/emhttp_event svcs_restarted (Drive related)

Oct 21 12:42:34 VVData emhttp_event: svcs_restarted (Drive related)

Oct 21 12:42:35 VVData avahi-daemon[6737]: Service "VVData" (/services/smb.service) successfully established. (Drive related)

Oct 21 12:47:10 VVData sSMTP[8607]: Creating SSL connection to host (Drive related)

Oct 21 12:47:10 VVData sSMTP[8607]: SSL connection using AES128-SHA (Drive related)

Oct 21 12:47:12 VVData sSMTP[8607]: Sent mail for root@localhost (221 2.0.0 closing connection a26sm1760895qtb.32 - gsmtp) uid=0 username=root outbytes=8479 (Drive related)

 

The line "Oct 21 12:34:23 VVData emhttp: disk20 mount error: 32 (Errors)" puzzles me. Is it really a bad drive?
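
For anyone digging through a long syslog for one disk, a small filter can pull out just the suspicious lines. The sketch below is only an illustration: the function name `disk_errors` and the choice of patterns ("error" and "can't read superblock") are assumptions based on the excerpt above, not an official unRAID tool.

```python
import re

# Assumed markers of trouble, taken from the log excerpt in this thread.
ERROR_PATTERN = re.compile(r"error|can't read superblock", re.IGNORECASE)


def disk_errors(lines, disk_number):
    """Return the lines that mention mdN or diskN and look like errors."""
    # \b keeps disk 2 from matching md20 or disk20.
    device = re.compile(rf"\b(?:md|disk){disk_number}\b")
    return [
        line.rstrip("\n")
        for line in lines
        if device.search(line) and ERROR_PATTERN.search(line)
    ]


sample = [
    "Oct 21 12:34:22 VVData kernel: REISERFS (device md20): using ordered data mode (Routine)",
    "Oct 21 12:34:23 VVData logger: mount: /dev/md20: can't read superblock (Drive related)",
    "Oct 21 12:34:23 VVData emhttp: disk20 mount error: 32 (Errors)",
    "Oct 21 12:34:23 VVData kernel: REISERFS error (device md20): vs-5150 search_by_key: "
    "invalid format found in block 32770. Fsck? (Errors)",
]

for hit in disk_errors(sample, 20):
    print(hit)

# Against a real log, pass an open file instead of the sample list, e.g.
#   with open("/var/log/syslog") as log: disk_errors(log, 20)
```

In this sample, every hit is a mount or REISERFS message. That points at the file system on the disk, which is a different failure from a disk that has dropped off the bus.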


There's no clue above about the physical status of the drive, so I assume it's fine. However, Disk 20 has a badly corrupted file system: the mount couldn't read the Reiser superblock on the drive. See Check Disk File systems and run the check on Disk 20. You may have to rebuild the superblock; there's a section of that wiki page about that.
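
The check suggested above can be sketched as follows. This is only an illustration, assuming the `reiserfsck` tool with its `--check` and `--rebuild-sb` options and the `/dev/md20` device name seen in the log. The helper name `reiserfsck_command` is made up here. It builds the command line and prints it without running anything, because the file system should not be mounted while the check runs.

```python
def reiserfsck_command(disk_number, mode="check"):
    """Build (but do not run) a reiserfsck command for an unRAID array disk.

    The mdN device is used because that is the device the log above
    shows being mounted for disk N.
    """
    flags = {
        "check": "--check",            # read-only consistency check
        "rebuild-sb": "--rebuild-sb",  # rebuild a missing or damaged superblock
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["reiserfsck", flags[mode], f"/dev/md{disk_number}"]


# Print the commands for review instead of executing them.
print(" ".join(reiserfsck_command(20)))
print(" ".join(reiserfsck_command(20, mode="rebuild-sb")))
```

Starting with the read-only check and moving to the superblock rebuild only if the check asks for it keeps the riskier step as a last resort.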

 

Thanks for that! I now see that there can be a corrupt file system and not physical hard drive corruption. I didn't quite grasp that before. As it turned out, when I remapped disk20, unRAID thought it should be formatted. Fortunately, disk20 had no significant data on it, so I formatted the drive and let the rebuild finish. As of this morning, all is running OK again and all folders are available and there are no errors.

 

I will now start the recovery process and add drives as necessary. I am thinking that now would be a good time to upgrade to unRAID 6.0.


Formatting a drive is certainly one method to fix a file system!  Not recommended if there is data to be saved though!

