mfarlow

Members
  • Posts: 49
  • Joined

  • Last visited

Converted

  • Gender: Undisclosed


mfarlow's Achievements

Rookie (2/14)

Reputation: 5

Community Answers

  1. Well that seems to have sorted it out. Parity is rebuilt, no errors. Hopefully this new PSU will resolve my issue with random disks going bad. I appreciate all the help with this!
  2. Looking good so far. It's rebuilding parity and no errors yet. I will update this post once the parity sync completes. I do appreciate all your help. I wish I had the time to spend learning the intricacies of UnRaid.
  3. So I replaced the breakout cables to my data drives and the SATA cable on the parity drive, and I also replaced all the SATA power cables. I should mention I replaced the PSU and SATA cables prior to this issue: I was having the issue before the PSU swap, and since it was suggested that power might be the problem, I upgraded the PSU to a higher-wattage, Tier A model. I've attached the latest diagnostics. Thanks for any suggestions you can offer. tower-diagnostics-20240315-1512.zip
  4. I was swapping my parity drive for a larger drive. During the parity check, my first drive showed the following error: "Unmountable: No file system". I stopped the array, restarted it in maintenance mode, and ran a File System Check on the drive. I had to use the -L option to rebuild the log, which threw the following error: "fatal error -- failed to clear log". I still have the old parity drive that I can swap back in, and I also have a spare drive to replace the one that is throwing the error. My question is this: if I put the old parity drive back in and replace the failed drive, will it still rebuild from the old parity drive? My gut feeling is that it will start a new parity check and I will lose the data on that drive. Right now the array is in maintenance mode. I'm just looking for guidance on next steps if there is any way to save the data. If not, I can restore from backup, but I am trying to avoid that just due to the time and PITA'ness of it all. Thank you in advance. tower-diagnostics-20240314-2113.zip
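A minimal sketch of the check-then-repair order implied in the post above. The /dev/md1 target is a hypothetical placeholder (the actual device depends on the disk slot and Unraid version), and by default the script only prints each command rather than running it:

```shell
#!/bin/sh
# Hedged sketch: the usual xfs_repair escalation, least to most destructive.
# DISK is an assumed placeholder -- substitute the md device for the affected
# slot, not the raw /dev/sdX disk, so parity stays in sync with the repair.
DISK="${DISK:-/dev/md1}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the plan instead of executing it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

run xfs_repair -n "$DISK"   # read-only pass: report problems, change nothing
run xfs_repair "$DISK"      # real repair, only if the -n pass looked sane
run xfs_repair -L "$DISK"   # last resort: zero the log, may lose in-flight metadata
```

Running with DRY_RUN=0 would execute the commands for real; keeping -L as the final step matters because zeroing the log discards any metadata updates that had not been written back.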
  5. I was finally able to replace my power supply. I ordered a 650W EVGA that was on the A-Tier of the PSU list. The first one took two weeks to arrive from Amazon, and then due to work I was unable to rewire my Unraid server for a while. When I finally got the replacement PSU in, one of the SATA ports on it was bad, which limited me to powering only 4 drives, so it was off to order another replacement. Turns out they had sent me a returned PSU. The second replacement arrived and I was able to quickly get it installed. I was able to start the array, perform a new config to get rid of the red X on my "bad" drive, and run a full parity check, which took a couple of days. The parity check completed, but oddly I received an error message that there were x number of errors (it was a lot). After running a SMART report, though, I found no errors on any of the drives. So far the drives have not turned bad. I assume the error I saw very briefly had to do with the parity drive being in an error state during the parity check. I think it is too early to say for certain it was the power supply, but for now it is working fine. I plan on running another parity check in a week or so to see if the issue returns. For now, I am backing up the data just in case it happens again; I want a full backup before I run the parity check. I wanted to thank everyone who chimed in on this. I was getting pretty frustrated with Unraid and was considering switching to another storage solution. Glad I didn't have to switch.
  6. I currently have 2 Silverstone splitters each running 4 drives (the type with the capacitors). I have actually replaced those already, just to be safe. I'm going to try rewiring so I only have 2 drives per splitter; maybe that will help. In the meantime I will try to acquire another power supply and see if that helps. Do you think my current power supply might be underpowered? I appreciate all the help!
  7. I am running a Segotep 650W Gold 80plus. https://www.amazon.com/gp/product/B0832F6NS8/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
  8. Okay, I finally surrender. I have an ongoing issue where every time I run a parity check, one of the disks goes bad (red X). This has been going on for close to a year now. I run my parity check at the beginning of each month; each month the parity check runs for a short time, then the array goes bad and I am left with a bad disk. I should mention this started happening after I added my 10th data drive to the array. I should also mention that it is not the same disk going bad: one month it is disk2, the next month it might be disk8 or disk6. When I run a SMART check on the disk, it comes back with no errors. I am currently running Unraid 6.12.6. Right now I am sitting with the array stopped so the issue doesn't progress any further. In the past I would perform a new config and reassign the drives, and it would run for a short time before a disk goes bad and becomes unmountable. At that point I would run the built-in File System Check tool, which would tell me that something is corrupted and try to rebuild it. (Sorry, I forget what gets corrupted in the file system, probably one of the inodes.) At that point I would pull the drive, replace it with a new one, and everything would work again until the next month.
Initially I thought it might have been the drives, and I just kept replacing them with new ones. But then it started happening with the new drives as well. So I turned my attention to the HBA. My parity and cache plug directly into the motherboard SATA ports; my HBA supports 8 data drives (the other 2 are on the mobo). I figured maybe the HBA was struggling, so I replaced it. I got 1 parity check out of it before the issue returned. Next I decided to run 2 HBAs and split the load across them, 4 drives each. Again the issue returned. At that point I thought maybe the temps in the case were too high (they were), so I added cooling fans on top of the HBAs, which drastically reduced the temps. The issue still occurred.
At this point I thought perhaps it was a power draw issue. My data drives are connected to Silverstone CP06-E4 power splitters, 4 drives per splitter, 3 splitters in all. I decided to add 2 more splitters for the data drives so there are only 2 drives per splitter, and I also added another SATA power cable to my power supply so that I have more power connections to spread around. Again, none of this seemed to help.
I am not too worried about data loss as I have backups, but it is getting to be a PITA having to restore backups every single month; we're talking about 10-14 TB of data to restore each time. So at this point I am tired of banging my head against the wall and was hoping someone on this forum might have a suggestion or idea that I can try. tower-syslog-20240203-1851.zip tower-diagnostics-20240203-1350.zip
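Since the SMART checks above keep coming back clean, one thing worth scripting is a targeted look at the few attributes that implicate cabling or power rather than the disk itself. The report below is fabricated sample data for illustration only; the attribute IDs (5, 197, 199) are standard SMART attributes as reported by `smartctl -A`:

```shell
#!/bin/sh
# Triage a saved `smartctl -A` report for the attributes that distinguish
# "disk is dying" (5, 197) from "link/power problem" (199).
# This sample report is invented for the demo, not from the poster's disks.
cat > /tmp/smart_sample.txt <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   143
EOF

# A nonzero raw UDMA_CRC_Error_Count with zero reallocated/pending sectors
# points at the SATA link (cable, backplane, power), not the platters.
awk '$1 == 5 || $1 == 197 || $1 == 199 { print $2, "raw =", $NF }' /tmp/smart_sample.txt
```

In the sample, the CRC count is high while the media attributes are clean, which is the signature one would expect if splitters or PSU rails were the culprit.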
  9. Thank you! I was able to restore the partition using testdisk, and now the array sees the drive as mountable. All the data was placed into lost+found when I ran the xfs_repair, so now I need to figure out how to sort through all of that if possible, but it's still a win. I appreciate all the help with this!
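Sorting lost+found is mostly mechanical: xfs_repair names the recovered entries by inode number with no extension, so classification has to come from file contents. A minimal sketch, using sample files in temporary directories and a single real magic-byte signature (a full pass would use `file --mime-type` for broad coverage):

```shell
#!/bin/sh
# Sketch: sort extension-less recovered files into folders by magic bytes.
# SRC stands in for lost+found; the two sample files are created for the demo.
SRC="$(mktemp -d)"
DEST="$(mktemp -d)"

printf '\211PNG\r\n\032\n' > "$SRC/12345"   # starts with the PNG signature
printf 'hello world\n'      > "$SRC/67890"   # plain text, no known signature

kind_of() {
    # classify a file by its first four bytes, printed as hex
    case "$(od -An -tx1 -N4 "$1" | tr -d ' ')" in
        89504e47) echo png ;;
        *)        echo unknown ;;
    esac
}

for f in "$SRC"/*; do
    k=$(kind_of "$f")
    mkdir -p "$DEST/$k"
    cp "$f" "$DEST/$k/"
done
ls "$DEST"   # one folder per detected type
```

The same loop pointed at the real lost+found (with more signatures, or `file` doing the detection) at least groups the rubble into media, documents, and unknowns before any manual triage.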
  10. Makes sense.

root@Tower:~# fdisk -l /dev/sde
Disk /dev/sde: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Disk model: WDC WD101EMAZ-11
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Unfortunately for me, now my parity drive is showing the red X, so I don't think I will be rebuilding from parity. When it rains it pours. Thanks
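As a sanity check, the fdisk numbers above are self-consistent: the drive reports 512-byte logical sectors, so the sector count times 512 must equal the reported byte size.

```shell
#!/bin/sh
# Verify the fdisk arithmetic from the output above.
sectors=19532873728
bytes=$((sectors * 512))
echo "sectors * 512 = $bytes bytes"   # fdisk reported 10000831348736 bytes
```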
  11. First pass with the -n switch (xfs_repair1.png):

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 8
        - agno = 5
        - agno = 4
        - agno = 6
        - agno = 7
        - agno = 1
        - agno = 9
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

And running without any switches to repair (xfs_repair2.png):

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 4
        - agno = 8
        - agno = 2
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 9
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

Restarting the array in normal mode (xfs_repair3.png), disk 2 still shows as unmountable (xfs_repair4.png). Any idea what I should try next? Thanks for all the help!!
  12. I did try it with the partition number, but I received the following error:

/dev/sde1: No such file or directory
fatal error -- couldn't initialize XFS library
/dev/sde1: No such file or directory
fatal error -- couldn't initialize XFS library

I'll re-run the GUI and post that command and output.
  13. This is the command I am currently running from the command line: xfs_repair /dev/sde. Should I kill that command and re-run the GUI, then post that command and output?
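For context on why the GUI run can differ from a manual `xfs_repair /dev/sde`: the GUI targets the slot's parity-protected md device, so the repair's writes stay in sync with parity, while the raw /dev/sdX path bypasses parity (and the whole-disk node also skips the partition offset). The sketch below just illustrates the naming distinction; the /dev/mdN form is an assumption, and newer Unraid releases address the partition as /dev/mdNp1:

```shell
#!/bin/sh
# Illustrative only: map an array slot number to the GUI's repair target.
# /dev/mdN is assumed naming; it is not read from the poster's system.
slot=2
md_dev="/dev/md${slot}"
echo "GUI-equivalent target for disk ${slot}: ${md_dev}"
echo "raw-disk repair (bypasses parity): xfs_repair /dev/sdX1   # note the partition number"
```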
  14. Yeah, that was my last run, where I used the -n option to see if it found any other errors. I did one pass with the -n switch, then a second pass with no switches; that one looked like it found a backup superblock and took a little while to run. After that I stopped the array, restarted the array in normal mode, and it was still unmountable. I then went back and ran xfs_repair from the GUI with the -n switch to see if there were other errors; that is the output I posted. I did stop the array and restart it in normal mode, but the disk was still unmountable. I even restarted the server and started the array in normal mode, but it was in the same unmountable state. I started running the xfs_repair on the command line and it found the bad superblock again. The command line is still running, just giving me a bunch of .................. It's been running for about 3 hours now; not sure if it is supposed to go that long, but I want to see if it finishes.
  15. After the latest Unraid upgrade ran, I decided to reboot the server. After the server came back up, I noticed drive 2 was showing the "Unmountable: Unsupported partition layout" error. I'm not suggesting the upgrade was the cause; more likely it was the reboot. I attempted to perform an xfs_repair following the directions here. The first pass came back with a repair status of:

Phase 1 - find and verify superblock...
bad primary superblock - bad sector size !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

I followed that up by running the repair without the -n switch, figuring that would repair the superblock issue, which it did; however, the drive is still unmountable. Running the repair again gives me the following:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 9
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 1
        - agno = 2
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

At this point I am at a loss as to how to proceed. Is my only option to reformat the drive? Thanks for any input on this! tower-diagnostics-20230907-1452.zip
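Before experimenting further on the real disk, the superblock recovery described above can be rehearsed on a throwaway loopback image. A hedged sketch; it needs root and xfsprogs to actually run, so by default it only prints the steps:

```shell
#!/bin/sh
# Rehearse the bad-superblock repair on a disposable image, not the array.
# Needs root and xfsprogs; DRY_RUN=1 (the default) only prints each step.
DRY_RUN="${DRY_RUN:-1}"
step() {
    if [ "$DRY_RUN" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi
}

step truncate -s 1G /tmp/practice.img    # sparse backing file
step mkfs.xfs -q /tmp/practice.img       # fresh XFS filesystem to break on purpose
step dd if=/dev/zero of=/tmp/practice.img bs=512 count=1 conv=notrunc   # clobber the primary superblock
step xfs_repair -n /tmp/practice.img     # dry run: should locate a secondary superblock
step xfs_repair /tmp/practice.img        # real repair, restoring from a backup copy
```

Seeing the "attempting to find secondary superblock" path succeed on a practice image makes it much easier to judge whether the real drive's repair behaved normally or whether something deeper (like the partition layout error) is the actual blocker.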