Pixar studio stories - The movie vanishes



The sad thing is that they DID have a backup and it failed. They needed a backup of the backup!

 

I tell my clients that if they haven't practiced and gotten a successful restore they don't have a backup.

 

I remember one client coming to me 5 years after a project, asking if we had a backup of a solution we had developed for them (they had the data, just not the programs) because their restore had failed. And not just one restore: they tried to restore from daily, weekly, and monthly backups going back I don't know how far. Their backup solution had been tested on a tiny subset of data when it was new, but it either never really worked or became unreliable without anyone noticing, while they kept using it to back up over and over and over again. It was a few hundred K of source code and executables, and a critical system for their business. My file copy on 2 floppy disks, pulled from central files, saved their bacon.

 

unRAID users should realize that unRAID's redundancy is not a backup. But that doesn't mean you have to back up everything. Often a few hundred megs of pictures are worth more than 20 TB of movies. Make sure you back up what you can't afford to lose.

 

And one more comment. unRAID is a tool. How users use that tool is important to how much protection it provides. Don't allow external access to your drives (to easily add and remove them)? Don't test the heck out of a new build? Don't monitor your SMART reports? Don't monitor your syslogs periodically, looking for hard resetting links and other things that look odd? Don't really understand what parity is and how one parity disk can protect a 20+ drive array? Think a monthly parity check and a look at the unRAID error column are enough? You are asking for trouble.

 

Do all of these things? Still back up the data you can't afford to lose. But rest easier knowing you'll probably never have to use that backup.

Don't monitor your SMART reports? Don't monitor your syslogs periodically, looking for hard resetting links and other things that look odd? Don't really understand what parity is and how one parity disk can protect a 20+ drive array? Think a monthly parity check and a look at the unRAID error column are enough? You are asking for trouble.

 

Hi, I am one of those only looking at 0 errors on the monthly parity check.

Would you mind elaborating on what to look out for? What would be suspicious, and how do you monitor the syslog?

 

Thanks :-)


I created a tool called myMain a while back to try to put all the information you need to monitor your array in one easy-to-use tool. myMain is a part of unmenu, an excellent tool by Joe L. that provides many other features besides myMain. HERE IS THE LINK to the post on installing unmenu, and myMain is one of the options. I put it out there for free to help the community, and get nothing but an occasional thank you, so there is no profit in my promoting it. If you've been here for very long, hopefully you know my goal is to help people around here who are having problems, and to provide tools like this to make using unRAID easier and more secure. I am not affiliated with LimeTech except as a moderator in this forum, but am a big supporter and long-time user.

 

To answer your questions I'll explain how to use myMain to monitor your array and system:

 

1. Monitor the syslogs. The syslog is a confusing thing to review, BUT once you have looked at it a few times you start to recognize the normal stuff vs the unusual stuff. One of the easiest problems to spot, if it is happening, is "hard resetting link" messages littering the syslog. These are almost always associated with loose cables. They slow down the server a little, but otherwise it is hard to know they are happening. A loose cable can cause a drive to drop out of the array and show a red ball on the unRAID GUI, and it is a bit of a pain to get it back in the array. And if you don't know what you're doing in the process, you can lose data.

 

Below is a myMain screenshot with some instructions on its use, but below the screenshot I will tell you how to review the syslog in particular.

 

mymain1.jpg

 

Notice in the info column, on the blue row for totals, you see a link for "Sy". That will bring up the full syslog. (There are other ways but this is one). You'll find a colorful syslog viewer that is helpful to review the rows. Please note that the highlighting is a little imperfect. It is just looking for certain keywords to decide how to do the highlighting. But I think you'll find it useful anyway 95% of the time. There are filtering features you can explore.

 

A lesser-known feature is the "sy" link in the info column next to each drive. Clicking that link brings up the syslog filtered for that drive only, normally only 25 lines or so. So by going down the table and clicking that link on each drive, you can monitor the syslog entries relating to your drives. Most of the lines are routine and colored green (Drive Related), but if you see red or other colored lines, ask about them. Someone can help you understand what's happening. And if you see "resetting links", it is a good time to check the cabling to that drive. The first time or two this will be very interesting and you'll have questions, but after a short time it becomes routine as you learn what to expect and what not to expect.
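For anyone who wants a rough command-line equivalent of that per-drive check, here is a minimal sketch in Python. It is not how myMain does it; it simply counts the kernel's "hard resetting link" messages per ATA port. The sample lines are made up for illustration, and reading the live log from /var/log/syslog is an assumption about where your server keeps it.

```python
import re
from collections import Counter

# Matches kernel (libata) lines such as "ata3: hard resetting link".
RESET_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: hard resetting link")


def count_link_resets(lines):
    """Return a Counter mapping ATA port name -> number of hard link resets."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in the usual syslog layout (NOT from a real server).
SAMPLE_SYSLOG = """\
Jan  5 03:12:01 Tower kernel: ata3: hard resetting link
Jan  5 03:12:02 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan  5 03:40:17 Tower kernel: ata3: hard resetting link
Jan  5 04:00:00 Tower kernel: ata5: hard resetting link
Jan  5 04:10:00 Tower kernel: mdcmd (42): spindown 2
""".splitlines()

if __name__ == "__main__":
    # To scan a real log instead (path is an assumption):
    #   with open("/var/log/syslog") as f: counts = count_link_resets(f)
    for port, n in sorted(count_link_resets(SAMPLE_SYSLOG).items()):
        print(f"{port}: {n} hard link reset(s); check the cabling on that port")
```

A port that keeps showing up here is the one whose cable deserves a reseat first.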

 

2. Monitor your SMART reports. Drives have a built-in self-monitoring feature called SMART, which tracks a few dozen parameters. The two most important ones to monitor are #5 (Reallocated_Sector_Ct) and #197 (Current_Pending_Sector). myMain gives two ways to monitor the SMART report. One is the "sm" link in the info column of the drives (see screenshot above). It will bring up the full SMART report for the drive. (You might get an error when trying this, depending on the controller and drive, and there is an easy fix if that happens. Just post a question.)
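As a rough illustration of what checking those two attributes means, the sketch below parses an attribute table in the layout printed by `smartctl -A` and reports attributes #5 and #197 when their raw value is above zero. The sample report is invented for the example, the function names are hypothetical, and this is not myMain's own code.

```python
WATCHED_IDS = {5, 197}  # Reallocated_Sector_Ct and Current_Pending_Sector


def parse_smart_attributes(report_text):
    """Return {attribute_id: (name, raw_value)} from a `smartctl -A` style table."""
    attrs = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


def watched_problems(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {i: attrs[i] for i in sorted(WATCHED_IDS)
            if i in attrs and attrs[i][1] > 0}


# Invented sample in the smartctl attribute-table layout (NOT a real drive).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8123
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in watched_problems(parse_smart_attributes(SAMPLE_REPORT)).items():
        print(f"#{attr_id} {name} = {raw}")
```

On a real server the text would come from running smartctl against the drive; the sketch only handles rows whose raw value is a plain integer.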

 

But there is an easier way to monitor the smart reports than clicking on them one at a time. Click on the word smart just above the drive table towards the right of the screenshot above (one of the Select View options).

 

A screen like the following will be shown.

 

mymain3.jpg

 

You will notice a mix of spinning and sleeping (spun-down) drives. Note that only WD drives are able to report SMART data while asleep, and myMain knows that. myMain will not spin up your disks unless you ask. The second refresh icon spins up the disks it needs (non-WD) to show you all of the current issues. The column called "Add'l Issues / Failures" is particularly important to monitor. It shows the attributes (the ones I think are important, anyway) that have values >0 that you should know about. In this example, "high_fly_writes=1" tells me that the SMART system detected one high fly write. Research should tell you that vibration can cause these. I am not particularly concerned about 1 or even several dozen of these over months or years of service, but if they start to increase just after you added a new PSU or fan to your server, you might want to check it out. Lots of different attributes might increase, including the reallocated and pending sector counts, and they will all show up in this column if they apply. (I recently did an update after discovering that new attributes have been added to some drives. I sent it to Joe L. and expect he has already deployed it, or will soon, so you might want to do an update.) This is a good way to monitor your drives. Very quick.

 

This screen has another feature I use all of the time. You can tell myMain that you know about that 1 high_fly_write and don't want to see it again unless it increases. By clicking on the error itself, you can go to another screen and tell it that "1" is normal for high_fly_writes on this drive, and you won't see it again (unless you click one of the "RAW" refresh buttons). But if it goes to "2", you will see it again. Whenever I see an issue I always research it, then set the value as the new normal so that anything that pops up on this screen is a new issue. Most times this screen shows no yellow (or red) highlighted issues. (Most SMART attributes never decrease in value.)
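The "new normal" idea boils down to comparing each attribute's current raw value with the value you last acknowledged. A minimal sketch of that comparison follows; the function and dictionary names are hypothetical and this is only an illustration of the logic, not myMain's implementation.

```python
def new_issues(current, acknowledged):
    """Return attributes whose raw value is above the acknowledged "normal".

    Attributes that were never acknowledged are compared against 0.
    """
    return {
        name: value
        for name, value in current.items()
        if value > acknowledged.get(name, 0)
    }


# The user has accepted one high fly write as normal for this drive.
acknowledged = {"high_fly_writes": 1}

print(new_issues({"high_fly_writes": 1}, acknowledged))  # {}
print(new_issues({"high_fly_writes": 2}, acknowledged))  # {'high_fly_writes': 2}
```

Because most SMART raw counters only ever go up, a simple "greater than the acknowledged value" test is enough to surface anything new.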

 

If you click on the "myMain Instructions" hyperlink in my sig you'll see more details about using myMain. I wrote this tool as a plugin to unmenu (with a lot of help from Joe L.'s work) several years ago to assist the community. There are other tools today, and I admit I have not spent much time loading and reviewing them, so you might find some projects that provide similar features and look much prettier. But from the screenshots I've seen, nothing provides the in-depth data analysis features included with myMain.

 

3. What is parity. Follow the hyperlink in my sig.
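For a quick feel of how one parity disk can cover many data disks, here is a minimal sketch that assumes single parity is a byte-by-byte XOR across all data disks. The three tiny "disks" are made-up values, and real arrays work on whole drives rather than three bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up "data disks" of three bytes each.
data_disks = [b"\x0f\xa0\x33", b"\xf0\x0a\x55", b"\x3c\xc3\x99"]
parity = xor_blocks(data_disks)

# Pretend disk 1 died: XOR the surviving disks with parity to rebuild it.
lost = 1
survivors = [d for i, d in enumerate(data_disks) if i != lost]
rebuilt = xor_blocks(survivors + [parity])
assert rebuilt == data_disks[lost]
```

The same arithmetic works for 3 disks or 20+, which is why one parity disk is enough for any single failure, and also why a second failure during the rebuild cannot be recovered this way.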

 

Hope this helps!!!

 

 

Definitely agree with monitoring the info in MyMain.  I've used UnMenu for years ... primarily for 3 things:  (1) UPS shutdown support (by far the #1 reason I use it -- NO system should be without a UPS, ESPECIALLY a NAS server !!);  (2) PowerDown -- this provides a more reliable way to do a "clean" powerdown on your server;  and (3) MyMain -- not an add-in package, but a very nice built-in extra feature that allows you to check the SMART status.    It's also a good idea to periodically go to Disk Management (within UnMenu) and check the more detailed SMART status for each individual drive.

 

By the way, just checking your drives so you know their status and hopefully notice any indications that they're on the path to failure (a good non-SMART indicator is when the drive starts running warmer than normal) ... is NOT a substitute for backing up.

 

But as the Pixar experience shows, don't just back up -- CHECK those backups !!
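One simple way to check a backup is to restore it somewhere and compare every file against the original. The sketch below does that with SHA-256 hashes; the function names and the example paths are hypothetical, and it reads each file fully into memory, so very large files would need chunked hashing instead.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original_root, restored_root):
    """Return relative paths that are missing or differ in the restored copy."""
    original = tree_digests(original_root)
    restored = tree_digests(restored_root)
    return sorted(path for path, digest in original.items()
                  if restored.get(path) != digest)


# Example call (both paths are hypothetical):
#   problems = verify_restore("/mnt/user/photos", "/mnt/restore_test/photos")
#   print(problems or "restore matches the original")
```

An empty result means every original file came back byte-for-byte identical; anything listed is a file the backup would not have saved you from losing.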

 

Wow, I am blown away by your response! Thank you so much!!! I have read about mymain but I thought it was "just some outdated simplefeatures, dynamix like webui."

But man was I wrong :-)

 

That tool is amazing and I will definitely give it a go!

Thanks again for sharing your work and explaining it in such detail. This should absolutely be stickied and/or added to the wiki! :-)

 

I agree it's a very nice tool.

 

Quite frankly, I don't understand why all the fuss over Simple Features and Dynamix.  A lot of compatibility issues seem to be associated with interactions between these GUI replacements and other plugins.

 

Personally, I'm quite happy with the stock Web GUI plus UnMenu ... the latter simply for UPS support, Clean PowerDown, and MyMain's SMART view.    The only other thing I ever use in UnMenu is the detailed SMART data on the Disk Management page ... which I look at anytime MyMain shows any anomalies in SMART.

 

 

"Hope this helps!!!"

What, are you kidding? That's outstanding. I think this ought to be a sticky or part of a mandatory tutorial.

Thanks

Mike

Thanks Mike. I at least added it to my sig (see "and [here]" after myMain instructions).
