File List Generation



I split the topic off because I had a PM conversation with someone else about this.

Since people seem interested, we can discuss it at length here.

I'm going to include my other discussion so everyone gets the full picture.

 

 

What command did you use to pull this summary count of files by disk?

===========================

Here's my breakdown.

 146485 disk1.filelist
4761906 disk2.filelist
   7141 disk3.filelist
2704215 disk4.filelist
 169797 disk5.filelist
      0 disk6.filelist
    610 disk7.filelist
   2647 disk8.filelist
   1601 disk9.filelist
   4979 disk10.filelist
   4270 disk11.filelist
   3013 disk12.filelist
   2377 disk13.filelist
   2530 disk14.filelist
  63326 disk15.filelist
7874897 total

===================

Full discussion here:

http://lime-technology.com/forum/index.php?topic=22906.15

 

 

 

 

I have a cron job that gets the day of the month then does

find /mnt/disk${DD} -type f > /mnt/cache/.flocate/filelist.disk${DD}

 

To simplify it, here are the basics.

DD=`date "+%e"|sed -e 's# ##g'`
[ ! -e /mnt/disk${DD} ] && exit 3
find /mnt/disk${DD} -type f > /mnt/cache/.flocate/filelist.disk${DD}
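Spelled out as a complete script, the cron job above looks something like the sketch below. The paths are the ones used in this thread; wrapping it in a function and making the paths arguments is my addition, so it can be tested outside a live unRAID box.

```shell
#!/bin/bash
# Sketch of the monthly per-disk file list cron job described above.
# Default paths match the thread; they are assumptions about a stock
# unRAID layout, so adjust for your system.

make_filelist() {
    local disk_root=${1:-/mnt}
    local out_dir=${2:-/mnt/cache/.flocate}

    # Day of month; %e pads single digits with a space, which sed strips.
    local dd
    dd=$(date "+%e" | sed -e 's# ##g')

    # No matching disk (e.g. the 30th on a 15-disk array): bail out.
    [ ! -e "${disk_root}/disk${dd}" ] && return 3

    mkdir -p "${out_dir}"
    find "${disk_root}/disk${dd}" -type f > "${out_dir}/filelist.disk${dd}"
}

# Example: make_filelist /mnt /mnt/cache/.flocate
```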

 

 

I create a 'list' of files by drive.

The 'list' is created once a month for each drive.

 

drive 1 1st day of month

drive 2 2nd day of month

etc, etc,

until 16th day of month.

 

This way, in a pinch, if you do lose a drive due to some unforeseen circumstance, you have a ready list of what you had.

Or if there is corruption, you can see what is missing.

 

Eventually I will make md5sums of the list so I can check for issues such as bit rot and to double check unraid's integrity.

The script segment can be re-written to list all files on all drives.

Then you do a wc -l on each file.
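The per-disk summary quoted near the top of the thread is just `wc -l` run over all the list files: one count per list, plus a grand total. A minimal sketch (the path is the one used earlier in the thread; the function wrapper is mine):

```shell
#!/bin/bash
# Count lines (i.e. files) in each per-disk list. With multiple
# arguments, wc -l prints one count per file plus a "total" line,
# which is exactly the summary format quoted above.

count_filelists() {
    local list_dir=${1:-/mnt/cache/.flocate}
    wc -l "${list_dir}"/filelist.disk*
}

# Example: count_filelists /mnt/cache/.flocate
```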

 

This is super cool. I am all about getting more integrity into my systems. There are 4 unRAID systems here: one is always live, and there are two offsite backups (essentially clones of the live box that rotate in and out and get updated by rsync).

 

Do you have the complete scripts to share? I would see great benefit in implementing this on my system too.

 

We need to package all these unRAID add-ons that smart people have put together into a repository of knowledge.

 

 

 

 

 

I still have to refine the script. It's a work in progress; once it's done I'll make an add-on package and post it to my Google Code page.

 

 

Discuss!

 

 

Link to comment

It certainly makes sense to split off and come over here, and not pollute the pre-clear thread.

 

Though I'm capable of writing my own scripts to do all of this, my scripting is somewhat rusty at this point, and if you're willing to share your scripts as you've indicated, I'm all for that!

 

I'll check back in later when I can be more coherent.

Link to comment

Thanks for sharing!

 

This may be a bit of a silly question..

 

Does each of the files created by the script contain a list of the folder/file names stored on that particular disk, or is it a count of the number of files stored on each disk?

 

I can see the value of both, just couldn't quite work out if it did one, the other, or both!  :-[

Link to comment

Thanks for sharing!

 

This may be a bit of a silly question..

 

Does each of the files created by the script contain a list of the folder/file names stored on that particular disk, or is it a count of the number of files stored on each disk?

 

I can see the value of both, just couldn't quite work out if it did one, the other, or both!  :-[

 

It is only a list of files (full paths). Not even a count; the count is done by running a line count (wc -l) on the file.

 

Using the filelist, you can derive the directories by splitting each path with basename/dirname.

Empty directories would not show up.
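To illustrate the point above: stripping the last path component from every line of a filelist (what dirname does, but sed handles the whole list in one pass) yields the directory tree. This is a sketch only; the list path is an assumed example.

```shell
#!/bin/bash
# Derive the directory list from a file list. Empty directories will
# never appear, since the source list only contains files.

dirs_from_filelist() {
    # Strip the final "/filename" component from each line, then
    # de-duplicate, since many files share a directory.
    sed -e 's#/[^/]*$##' "$1" | sort -u
}

# Example: dirs_from_filelist /mnt/cache/.flocate/filelist.disk1
```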

 

Link to comment

I use NeoFinder (Mac) and abeMeda (PC); the demo versions of these can handle up to 10 catalogs, which is more than the number of shares I have. They can also schedule updates and are very fast at indexing.

 

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

Link to comment

I use NeoFinder (Mac) and abeMeda (PC); the demo versions of these can handle up to 10 catalogs, which is more than the number of shares I have. They can also schedule updates and are very fast at indexing.

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

 

Do any of these tools do md5sums on the files?

Link to comment

I use NeoFinder (Mac) and abeMeda (PC); the demo versions of these can handle up to 10 catalogs, which is more than the number of shares I have. They can also schedule updates and are very fast at indexing.

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

 

Do any of these tools do md5sums on the files?

 

TreeSize does either MD5 or SHA256 for finding duplicate files only.

Link to comment

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

 

Neat for Windows shops. I would want an automated Linux process that fills the database; a Windows client for viewing the data would be fine. And it's ~$300...

 

I'll be happy with anything weebotech puts together....

Link to comment

I use NeoFinder (Mac) and abeMeda (PC); the demo versions of these can handle up to 10 catalogs, which is more than the number of shares I have. They can also schedule updates and are very fast at indexing.

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

 

Do any of these tools do md5sums on the files?

neofinder and abemeda have a md5 checksum function. I've never used it as I use it to just search for files.

Link to comment

Most tempting, if I had to splurge (well overkill for me), is SpaceObServer by JAM Software (the maker of TreeSize). It can track the growth and change of your data.

 

Neat for Windows shops. I would want an automated Linux process that fills the database; a Windows client for viewing the data would be fine. And it's ~$300...

 

I'll be happy with anything weebotech puts together....

 

I was planning to do a text filelist.disk# version so you can just grep what's needed.

Long term, I would make an SQLite database and keep track of mtime, size and md5sum to detect changes.

Also determine if files are missing and/or recoverable from fsck.

 

I planned to add a locate function to scan the DB, then add a quick search function in the unRAID front end.

 

These text files were the first step.

My time has been very limited with two jobs and DJ'ing on the side.

Perhaps I'll release that as package for now, then create new packages for the other functionality as I build it.

 

I really want to be able to export or build md5sum files for each disk as a separate text file so that disks can be checked if there are any problems or parity sync errors.
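The per-disk checksum files described above could look something like the following. This is my sketch of the idea, not the author's final script; the paths and filenames are assumptions, and the output format is the standard one that `md5sum -c` can re-verify later.

```shell
#!/bin/bash
# Build one md5sum list per disk, re-checkable later with `md5sum -c`
# to spot silent corruption / bit rot or parity sync discrepancies.

make_md5list() {
    local disk_dir=$1    # e.g. /mnt/disk1
    local out_file=$2    # e.g. /mnt/cache/.flocate/md5sum.disk1
    # -exec ... + batches files per md5sum invocation, which is much
    # faster than one process per file.
    find "$disk_dir" -type f -exec md5sum {} + > "$out_file"
}

# Later, to check a disk:
#   md5sum -c /mnt/cache/.flocate/md5sum.disk1
```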

 

Link to comment
  • 1 month later...

I have a cron job that gets the day of the month then does

find /mnt/disk${DD} -type f > /mnt/cache/.flocate/filelist.disk${DD}

 

To simplify it, here's the basics.

DD=`date "+%e"|sed -e 's# ##g'`
[ ! -e /mnt/disk${DD} ] && exit 3
find /mnt/disk${DD} -type f > /mnt/cache/.flocate/filelist.disk${DD}

 

Hoping someone can point me in the right direction to get this running.

 

I've pasted the above code into a file named 'filelist.sh' and saved it into my /boot/custom/ folder. I've also created a folder on my cache drive for the filelists to be saved into. When I try to run this from the console it returns the following:

 

: numeric argument required: 3

 

I've worked out that the first line calls the date command and splits the output so that only the day value is stored in the 'DD' variable, and the third line calls the find command and redirects the output to the location specified on the cache disk.

 

Trouble is, I've no clue what the second line is doing, and I have a feeling that's what is causing the error I'm seeing when I try to run the whole script.

 

Anyone able to help me out?

Link to comment

It is a "conditional test", followed by an "exit" (early termination) of the script with an exit status of 3.

 

The test is true if the directory /mnt/disk${DD} does not exist.  The equivalent "if" statement is

if [ ! -e /mnt/disk${DD} ]

then

  exit 3

fi

In other words, on the 30th of the month, /mnt/disk30 does not exist, so the script terminates.  On the first of the month /mnt/disk1 does exist, so the script does not exit and the file list is created.

 

I suspect your error is that you have a trailing carriage return in your script, and "exit 3" is actually "exit 3\r".
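If that's the cause (the script was saved with DOS line endings, which happens when editing on Windows), stripping the carriage returns fixes it. A minimal sketch, using the filename mentioned earlier in the thread:

```shell
#!/bin/bash
# Remove DOS carriage returns ("\r") from a script so the shell no
# longer sees commands like "exit 3\r".

strip_cr() {
    tr -d '\r' < "$1" > "$1.unix" && mv "$1.unix" "$1"
}

# Example: strip_cr /boot/custom/filelist.sh
# (dos2unix / fromdos do the same job where installed)
```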

 

Link to comment

That makes sense, since I don't have a disk18 in my system! Thanks for explaining that, Joe.

 

This is my first time working with bash scripts, and Weebo's script was clearly more complex and serving a slightly different purpose than I realised.

 

I'm attempting to modify the script so that it will enumerate all mounted array disks, run the find command on each, and redirect the output to a file named with:

 

a) the disk number the list corresponds to

b) the current date

 

I'm sure I can work out how to split the date output to my requirements, but I'm not sure how I'd go about enumerating the disks, and running the command on each array disk that has been found.

 

Could you offer any pointers on that?
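One way to approach the enumeration asked about above is to glob /mnt/disk* and loop over whatever matches, writing one list per disk with the disk number and date in the filename. This is a sketch under the assumptions stated in the thread (array disks mount at /mnt/diskN; output goes to a folder on the cache drive), not the final script:

```shell
#!/bin/bash
# Enumerate mounted array disks and write one dated file list each.

list_all_disks() {
    local disk_root=${1:-/mnt}
    local out_dir=${2:-/mnt/cache/FileLocations}
    local now d n
    now=$(date +%d_%m_%Y)
    mkdir -p "$out_dir"
    for d in "$disk_root"/disk*; do
        [ -d "$d" ] || continue      # skip if the glob matched nothing
        n=${d##*/disk}               # pull the disk number off the path
        find "$d" -type f > "$out_dir/filelist_disk${n}_${now}.log"
    done
}

# Example: list_all_disks /mnt /mnt/cache/FileLocations
```

Note that the glob disk* naturally skips /mnt/user and /mnt/cache, so the grep filtering becomes unnecessary.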

 

Link to comment
  • 2 months later...

I've made some progress on this... in case this is of help to others, here is what I've come up with:

 

now=$(date +"%d_%m_%Y")
find /mnt/ -maxdepth 1 -type d | grep -v cache | grep -v user | grep disk | xargs -n 1 -I {} find {} -type f > /mnt/cache/FileLocations/filelist_${now}.log

 

It's still something of a work in progress, as I still intend to try and split the output into multiple files - one for each disk.

 

I've put a copy of this script (filelocation.sh) in /boot/custom/, and have added a line to the go script to copy this into /sbin/ - is this the best location, or is there somewhere more appropriate to copy a custom script to?

 

Final question: I'm sure I've seen, in Linux scripts before, a line or two at the beginning, possibly with a # or other special character. Are these initial lines required? I've not got any in this script I've cobbled together, and am not sure if it matters!
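The line being asked about is the "shebang", and it's worth having even though the script works without it. A minimal illustration:

```shell
#!/bin/bash
# The first line above is the shebang: when a script is executed
# directly (e.g. ./filelocation.sh after chmod +x), the kernel reads
# it to choose the interpreter. It is optional when you run the
# script as `bash filelocation.sh`, but without it a directly-executed
# script runs under whatever shell invoked it, and bash-specific
# syntax can then break.
msg="this script is running under bash ${BASH_VERSION:-unknown}"
echo "$msg"
```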

 

Any other constructive criticism is welcome - finding this quite a steep learning curve!

 

Thanks in advance :)

Link to comment
  • 1 month later...

I generate lists from Windows using Glenn Alcott's Directory Printer [http://www.galcott.com/dp.htm].  It's not free -- but I've had it for years and it works great.

 

Before I bought that, I used Karenware's free Directory Printer, which also works quite nicely.

http://www.karenware.com/powertools/ptdirprn.asp

 

In either case, you should also install the free CutePDF (a utility EVERYONE should have -- it's very convenient for saving a variety of things you might otherwise print to paper and store in a folder).  http://www.cutepdf.com/  The free version is fine, and is completely "safe" => but be careful when installing it -- like a lot of things these days, it wants to install some other "stuff" that you most likely don't want, and you have to uncheck the boxes or you'll get it installed.  It does need to download a renderer after the install -- that's fine.

 

So ... you simply use Directory Printer to select what disk or folder you want to "print" a directory of, set a few options [i.e. what level you want the files printed to -- e.g. if I'm printing my "DVDs" share, I want the names of all the movies;  but don't want the subfolders that contain all the .VOB's, .IFO's, etc.;  so I check the "Include subdirectories" box and set "Levels" to 2]; and then PRINT - and in the print dialogue box you choose "CutePDF Writer".    Then CutePDF will prompt you for the filename and you're done.

 

Link to comment

In either case, you should also install the free CutePDF (a utility EVERYONE should have -- it's very convenient for saving a variety of things you might otherwise print to paper and store in a folder).    http://www.cutepdf.com/    The free version is fine, and is completely "safe" => but be careful when installing it

PDFCreator is also free, and in my opinion a better choice. Also, instead of having to be careful not to check the wrong thing when installing stuff, just use Ninite. I use it all the time, it's awesome. Just select all the programs you want to keep up to date, d/l the ninite app for that list, click it and it installs or updates everything on the list. I never click update on the adobe junk anymore, just let ninite handle it.
Link to comment
  • 3 weeks later...

Once I rebuild my tower unRAID server, I'll try to recover the tools I used to make the file lists.

It's been a slow recovery for me.

 

I would really like to get this moving too.  I have some backup servers, and I want to get a full file index of every file on my servers, and be able to track the changes to the servers.  Anybody have anything started on this yet? 

 

I'm guessing it's best to create an SQLite database?

 

Anybody have some sample code I can start playing with?
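As a starting point for the SQLite idea discussed earlier in the thread, here is a rough sketch: one row per file with size, mtime and md5, so a later run can compare and detect changes. The table and column names are invented for illustration, the `stat -c` flags are GNU/Linux-specific, and it requires the sqlite3 CLI; treat it as sample code to play with, nothing more.

```shell
#!/bin/bash
# Index a directory tree into an SQLite database for change tracking.

index_tree() {
    local root=$1 db=$2
    sqlite3 "$db" 'CREATE TABLE IF NOT EXISTS files
        (path TEXT PRIMARY KEY, size INTEGER, mtime INTEGER, md5 TEXT);'
    find "$root" -type f | while IFS= read -r f; do
        local size mtime md5 esc
        size=$(stat -c %s "$f")
        mtime=$(stat -c %Y "$f")
        md5=$(md5sum "$f" | cut -d' ' -f1)
        esc=$(printf %s "$f" | sed "s/'/''/g")   # escape quotes for SQL
        sqlite3 "$db" "INSERT OR REPLACE INTO files
            VALUES ('$esc', $size, $mtime, '$md5');"
    done
}

# A locate-style search then becomes:
#   sqlite3 files.db "SELECT path FROM files WHERE path LIKE '%movie%';"
```

Spawning sqlite3 once per file is slow on big arrays; a real version would batch the inserts into one transaction, but this shows the shape of the data.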

Link to comment
