Kernel crashes with potential ReiserFS corruption: 6.0-beta12-x86_64



I've had two hard crashes since running beta 12. After the first crash I set up PuTTY to log the session. Unfortunately I had another crash today, but at least I have some helpful info this time. The last few hours of the log are in the attached file. If more data is needed, just let me know.

 

I haven't looked at the logs at all, but I wonder if you're suffering from the RFS (ReiserFS) corruption issues that have plagued at least two other users. They were having seemingly random crashes or reboots, and their issues went away completely once they migrated to XFS.


There were three people that I remember; this would be number four if it's ReiserFS corruption.

I think it could go down to the metadata level. If I remember correctly, one user ran reiserfsck multiple times and was still getting corruption.

So my question would be, @doublewhatever: were any of the betas that were known to have potential corruption ever used?

In prior investigations, one member said no and another said yes. I cannot remember the third situation all that well.

Looking at these logs, I see some reiserfs calls and the stalled-CPU messages.


This log wasn't much help, as it is only a small piece of the syslog, and a piece that repeats over and over: a CPU stall with a Call Trace, once every 3 minutes. It does look exactly like the previous users' CPU stalls, and the CPU involved is executing Reiser code every time. I don't think we were ever able to conclude that it was directly related to the Reiser file corruption issue, only that it IS clearly doing something Reiser-related.

At least one of the other users found that reiserfsck would find issues, but this problem could occur again right after reiserfsck had declared the file system clean. I believe that one or more users 'solved' the problem by converting the disks to XFS. In your case, the log has little info and does not indicate that any particular drive is involved. I would start the system in Maintenance mode and check every one of the data drives (see Check Disk File systems).


Quote: So my question would be, @doublewhatever: were any of the betas that were known to have potential corruption ever used?

Yep. I was running beta 7, but just for a few days before I upgraded to beta 9. I never came across any corrupted files, though. Unfortunately, with beta 9 I experienced these same crashes, so I downgraded to beta 6 and ran that until beta 12 was available. I never had any crashes while running beta 6. This is what confuses me: if corruption was causing the issue, wouldn't I have seen the same crashes with beta 6? Basically, that beta is butter smooth for me, but anything after it seems to cause these crashes.

Quote: This log wasn't much help, as it is only a small piece of the syslog, and a piece that repeats over and over: a CPU stall with a Call Trace, once every 3 minutes.

It actually does repeat like that in the log. I've attached a larger portion of it. Basically, I was just moving a crap ton of files around before the crash, getting all of the frequently accessed stuff into a single share so I can have the directories cached on just that one share.

The server is an AVS-10/4 loaded with 4 TB drives and is 71% full across all drives. What's the best way to convert to XFS with a system this full? Can I pull and replace one drive at a time and let unRAID rebuild them as XFS?

log.zip


FYI, I have had crashes since migrating to XFS, with the same CPU stall messages. But I have not seen the corrupt file system messages I was seeing before, which I think led to my other crashes.

I have since disabled two Docker containers (nzbget and nzbdrone) and one plugin (SNAP), and have not had a crash in 3 days. If I can go a solid week without a crash, I think my culprit was either a Docker container or the SNAP plugin.

John

Quote: Can I pull and replace one drive at a time and let unRAID rebuild them as XFS?

No. unRAID can only rebuild an entire drive exactly as it currently is; it can't convert file systems on the fly. One way to accomplish a conversion like the one you describe is to empty a drive by moving its contents onto other drives, change its file system type to XFS, and let unRAID reformat it. Then you can use the new, empty XFS drive to receive the contents of the next drive you want to convert; lather, rinse, repeat until you are done.

There is no way I know of to convert a ReiserFS drive to XFS directly, and even if there were, I'm not sure I would trust it on a ReiserFS drive with suspected errors.

Bottom line: you have to come up with a totally blank drive with enough free space. Each copy should be done with a utility that supports checksum verification, and the source files should only be removed after the copy has been verified.
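A minimal sketch of that copy, verify, then delete step, assuming the source and destination drives are ordinary mounted directories. It is written in Python purely for illustration and is not unRAID's own tooling; the function names are hypothetical. Each file is hashed with SHA-256 before and after copying, and no source file is deleted unless every copy matched.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src_root: Path, dst_root: Path) -> int:
    """Copy every file under src_root into dst_root, verify each copy by
    checksum, and delete the sources only after all copies verified.

    Hypothetical helper. Returns the number of files moved; raises
    OSError on the first mismatch, leaving all sources untouched.
    """
    copied = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        src_hash = sha256_of(src)
        shutil.copy2(src, dst)  # copies data plus timestamps/permissions
        if sha256_of(dst) != src_hash:
            raise OSError(f"checksum mismatch: {src} -> {dst}")
        copied.append(src)
    # Only reached if every copy verified, so removing sources is safe.
    for src in copied:
        src.unlink()
    return len(copied)
```

Usage would look like `move_verified(Path("/mnt/disk1"), Path("/mnt/disk2"))`, where both mount paths are illustrative assumptions. Deleting only after the whole pass succeeds is deliberately conservative: a single mismatch stops the run with the source drive still intact. The emptied directory tree is left behind on the source.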


Quote: Bottom line: you have to come up with a totally blank drive with enough free space. Each copy should be done with a utility that supports checksum verification, and the source files should only be removed after the copy has been verified.

If you dare do so, you can use your parity drive as the intermediary drive if you do not have a spare drive. Be warned: your data will be unprotected during the migration process. Most here would advise against this.

John


Just had another crash. It definitely seems to be triggered by the mover somehow. The mover finished, and within a few seconds the CPU stall errors started flying.

I guess my biggest question at the moment is why these crashes never happened on beta 6 and earlier betas. I wasn't on beta 7/8 long enough to know whether they triggered the crashes, but the crashes were definitely a problem with beta 9. Does anyone have any idea what changes might have been made that could cause this problem?


Have we confirmed yet whether data has been corrupted here? Crashing and corruption can be related, but first I want confirmation that data corruption has truly occurred. Have you opened a file and witnessed corruption firsthand? What about CRC checks? reiserfsck?
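On the CRC-check question, one way to gather hard evidence of silent corruption is to record a checksum manifest while the data is believed good and re-verify it later. This is a minimal Python sketch, assuming the share or disk is a mounted directory. It uses SHA-256 rather than CRC32 (either will detect changed contents), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return a description of every file that changed or went missing."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif file_digest(p) != expected:
            problems.append(f"changed: {rel}")
    return problems
```

A "changed" entry on a file nobody edited is the kind of first-hand evidence asked for above. Files that were legitimately modified will also be flagged, so this approach suits mostly static media files best.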


I haven't come across any corrupted files. In my case, the crashes always seem to occur just after the mover has finished. It just happened again tonight. No log this time, unfortunately, as I forgot to have PuTTY write the session to a file.

OK, what about reiserfsck? Did you run it to check for issues?

  • 3 weeks later...

There are a couple of issues with this thread, so I'm going to move it to General Support.

1)  The OP doesn't provide enough information to go on, and didn't follow the defect report posting guidelines to give us something to test.

2)  No corruption (as the subject claims) has been proven to have occurred at this point.

3)  Replies to this thread by others saying "they've had a similar issue" are not about the same issue (Smallwood's issue is not related here, IMHO).

4)  No feedback has been provided on the requests for reiserfsck, so we have nothing to go on.

As I mentioned above, I'm moving this to General Support. When we can see a log that reports actual corruption, or a reiserfsck report, we can help further, but this is definitely not a bug in beta 12.


Sorry. I actually did run reiserfsck, but had a crash almost immediately. I just said screw it and have since converted to XFS. No issues since. I really wouldn't be so certain that there isn't a bug, though. My setup ran 100% fine on beta 6 and earlier, but crashed once a week with ReiserFS errors on anything after beta 6. So instead of just writing it off, you may want to call it a rare bug whose solution is to convert to XFS.


XFS seems to be the better way to go. I had file system corruption one too many times with ReiserFS, and zero times with XFS. I am so glad they opened unRAID up to multiple file systems.

 
