diskmv -- A set of utilities to move files between disks



I've created a couple of bash scripts to facilitate moving files between disks. As these utilities touch user data, I'd like to have some experienced users review and/or test them. If there is interest and no major problems, I will look into making them easier to use (first a plugin for command-line usage, then maybe a GUI).

 

  • 'diskmv' will move a user share directory from one disk to another. It uses a find/rsync command similar to what is found in the standard mover script. It is suitable for merging a user share directory onto a disk that already contains that directory. Files that have duplicate file names on the destination disk will not be moved. By default, 'diskmv' will run in test mode and display some information about how directories would be moved, but will not actually move files unless forced.
     

  • 'consld8' can consolidate a user share directory from multiple disks onto one disk. If a destination disk is not specified, it will pick the best disk based on max usage and available space. By default, consld8 will run in test mode and display some information about how directories would be moved, but will not actually move files unless forced. 'diskmv' is required to actually move files.
     

Example usage:

 

diskmv "/mnt/user/video/tv/Pushing Daisies" disk4 disk1
    
consld8 /mnt/user/video/tv/Wonderfalls disk3

 

To get a help message:

 

diskmv -h

 

I am tagging releases for code that I trust and have used on my real data. Releases are here: https://github.com/trinapicot/unraid-diskmv/releases

 

The master branch has code that I have tested on sample data, but I have not necessarily used it on my real data.

 

These utilities move files around on an unRAID server. I have done my best to prevent any data loss, but there is always a chance something can go wrong. Use at your own risk and please ensure you have everything backed up.

 

Note for those wishing to use diskmv in the process of changing filesystems:

 

The diskmv script will not verify that copied files are the same as the source files.

 

With the default options, diskmv will rsync each file and if the rsync is successful, that file is then deleted before moving to the next file. There is no opportunity to compare the source and destination files. The script is relying on the OS, rsync and whatever else is involved to correctly read data from one disk and write it to another.
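A rough sketch of that per-file "rsync, then delete" loop is below. It is not diskmv's actual code: the function name move_files, the duplicate check and the rsync options are assumptions chosen to illustrate the behaviour described above.

```bash
#!/bin/bash
# Hypothetical sketch of a per-file "rsync, then delete" move loop.
# Not the real diskmv implementation; names and options are assumptions.
# Usage: move_files SRC_ROOT DEST_ROOT RELATIVE_DIR
# DEST_ROOT must be an absolute path, because we cd into SRC_ROOT.
move_files() {
  local src_root=$1 dest_root=$2 reldir=$3
  local f
  (
    # Work from the source root so rsync -R sees relative paths.
    cd "$src_root" || exit 1
    find "./$reldir" -type f -print0 |
      while IFS= read -r -d '' f; do
        # Never overwrite: skip names that already exist on the destination.
        [ -e "$dest_root/$f" ] && { echo "skipping duplicate: $f"; continue; }
        # Copy one file, recreating its relative path under DEST_ROOT ...
        if rsync -a -R "$f" "$dest_root/"; then
          rm -- "$f"   # ... and delete the source only if rsync succeeded.
        else
          echo "rsync failed, keeping source: $f" >&2
        fi
      done
  )
}
```

Nothing here compares the copy with the original; rsync's exit status is the only check before the source file is deleted, which is exactly the limitation described above. The sketch also leaves empty source directories behind.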

 

If you still want to use diskmv to move all the files and directories from one disk to another, you can use this syntax:

diskmv -f "" disk1 disk2

Or to keep source files (copy instead of move):

diskmv -f -k "" disk1 disk2


I didn't go too deep with it yet, but my first comment is

 

Test for empty arguments. In one script I see $1 and $2 referenced before they are tested for null:

 

if [ -z "$1" ] || [ -z "$2" ]; then
  usage
  exit 1
fi

 

Next, "$1" should always be enclosed in quotes to handle spaces. You could have some disastrous results without it, especially since you are running as root.
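A quick way to see the problem for yourself (the function names are made up for the demo): unquoted, a share name containing a space reaches the command as two separate arguments.

```bash
#!/bin/bash
# Count how many arguments a command actually receives.
count_args() { echo "$#"; }

demo() {
  echo "unquoted: $(count_args $1)"    # word-split into 2 arguments
  echo "quoted:   $(count_args "$1")"  # kept as 1 argument
}

demo "/mnt/user/video/tv/Pushing Daisies"
```

Passed to rm or rsync, the unquoted form would operate on "/mnt/user/video/tv/Pushing" and "Daisies" instead of the intended directory.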


Congratulations, these really caught my attention as being genuinely useful and powerful.

 

I have some ideas for consideration:

 

Since these tools are so unRAID-specific, check for the existence of "/etc/unraid-version" as a sanity check before doing anything.

 

It is not beyond the realms of possibility that a version of unRAID comes out with a missing dependency, or that a user simply breaks a dependency. To ensure this doesn't result in unpredictable behavior, check for dependencies (e.g. rsync, shopt) before execution.
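A sketch of those pre-flight checks is below. The marker file /etc/unraid-version comes from the suggestion above; the function name preflight and the list of required commands are assumptions and should be adjusted to whatever the scripts really call.

```bash
#!/bin/bash
# Sketch of pre-flight checks before touching any data.
# The marker path can be overridden for testing; it defaults to the
# file suggested above. The command list is an assumption.
preflight() {
  local marker=${1:-/etc/unraid-version} cmd
  if [ ! -f "$marker" ]; then
    echo "Not an unRAID server? $marker is missing" >&2
    return 1
  fi
  for cmd in rsync find readlink; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Required command not found: $cmd" >&2
      return 1
    fi
  done
  # shopt is a bash builtin, so check that we are running under bash.
  if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script must be run with bash" >&2
    return 1
  fi
}
```

At the top of a script this would be used as `preflight || exit 1`.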

 

Consider having the rsync "-n, --dry-run" option set by default in diskmv, so that a user essentially has to opt in to a write action (the same logic you have put in place with -t in consld8).

 

Consider adding more comments to the scripts. For example, I doubt most users will know what "-i -dIWRpEAXogt --numeric-ids --inplace" means without googling it, and in general comments will help users get a better feel for how the scripts work.
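For reference, here is that option string written out one flag per line, with the meaning of each flag as given in the rsync manual. Keeping the options in a bash array (the name RSYNC_OPTS is made up here) is one way to keep the comments next to the flags inside the script itself.

```bash
#!/bin/bash
# "-i -dIWRpEAXogt --numeric-ids --inplace", one flag per line.
RSYNC_OPTS=(
  -i              # --itemize-changes: print a summary line for each change
  -d              # --dirs: transfer directories without recursing
  -I              # --ignore-times: don't skip files that match size and time
  -W              # --whole-file: copy whole files, no delta-transfer algorithm
  -R              # --relative: recreate the relative path on the destination
  -p              # --perms: preserve permissions
  -E              # --executability: preserve executability (redundant with -p)
  -A              # --acls: preserve ACLs (implies -p)
  -X              # --xattrs: preserve extended attributes
  -o              # --owner: preserve owner (needs super-user on the receiver)
  -g              # --group: preserve group
  -t              # --times: preserve modification times
  --numeric-ids   # keep numeric uid/gid instead of mapping them by name
  --inplace       # write updated data directly into the destination file
)
```

The array is then expanded with quotes, e.g. `rsync "${RSYNC_OPTS[@]}" "$file" "/mnt/$DESTDISK/"`, so each flag stays a separate argument.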


test for empty arguments. In one script I see $1 and $2 referenced before they are tested with null

 

Yep, I failed to consider a call like:

diskmv "" disk1 disk2

but that could be valid usage if you want to move everything from one disk to another, so I think I'll leave it.

consld8 "" disk1

could also be valid, but I don't know if anyone would find that useful.

 

$2 is not tested for null, but the subsequent variable $SRCDISK is tested to be a valid disk, so a null value in $2 is caught. Is there some reason I need to test $2 for null at first reference?

 

"$1" should always be encapsulated in quotes to handle spaces.

you could have some disastrous results without it. Especially since you are running as root.

 

Word splitting is not performed in case statements and assignment statements, so

case $1 in ...

and

SRCDISK=$1

are OK.  But I will add some more quoting.
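A small demonstration of that point (the function name check is made up for the demo). The quotes are still needed wherever the value is later used as a command argument, as in the echo below.

```bash
#!/bin/bash
# Neither the right-hand side of an assignment nor the word in a
# case statement is word-split, so both see the full value "disk 1".
check() {
  SRCDISK=$1              # assignment: SRCDISK keeps the whole value
  case $1 in              # case word: matched as a single string
    "disk 1") echo "matched: $SRCDISK" ;;
    *)        echo "no match" ;;
  esac
}

check "disk 1"
```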

 

Thanks for looking and commenting.  I welcome more comments if you look deeper.



 

Thanks. All your ideas are under consideration.  I especially like the dry run default for diskmv. I was planning to remove the test option as default in consld8, but I now think having them both be "opt in" for write would be better.


$2 is not tested for null, but the subsequent variable $SRCDISK is tested to be a valid disk, so a null value in $2 is caught. Is there some reason I need to test $2 for null at first reference?

 

I would check any 'required' parameters for null rather than falling through to an assumed check later on.

 

Also consider adding the following snippet to the top:

[ ${DEBUG:=0} -gt 0 ] && set -x -v

 

Then consider

#!/bin/bash

[ ${DEBUG:=0} -gt 0 ] && set -x -v

if [ -d "$1" ]
then
  FULLNAME=$(readlink -e "$1")  # Handle relative path
  MERGEDIR=${FULLNAME#/mnt/*/}  # Remove any /mnt/*/ prefix
else
  MERGEDIR="$1"
fi

if [ ! -d "/mnt/user/$MERGEDIR" ]
then
  echo "$1 is not a valid user share"
  exit 1
fi

SRCDISK=${2#/mnt/}    #Remove any leading /mnt/ prefix
SRCDISK=${SRCDISK%%/*}    #Remove any trailing path
if [[ ! -d "/mnt/$SRCDISK" \
    || "$SRCDISK" != disk[1-9] && "$SRCDISK" != disk[1-9][0-9] && "$SRCDISK" != "cache" \
  ]]
then
  echo "$2 is not a valid disk"
  exit 1
fi

DESTDISK=${3#/mnt/}    #Remove any leading /mnt/ prefix
DESTDISK=${DESTDISK%%/*}    #Remove any trailing path
if [[ ! -d "/mnt/$DESTDISK" \
    || "$DESTDISK" != disk[1-9] && "$DESTDISK" != disk[1-9][0-9] && "$DESTDISK" != "cache" \
  ]]
then
  echo "$3 is not a valid disk"
  exit 1
fi

echo "Merging /mnt/$SRCDISK/$MERGEDIR into /mnt/$DESTDISK/$MERGEDIR"

 

then...

root@unRAIDx:/# /tmp/test.bash BACKUPS disk2
is not a valid disk

root@unRAIDx:/# DEBUG=3 /tmp/test.bash BACKUPS disk2

if [ -d "$1" ]
then
  FULLNAME=$(readlink -e "$1")  # Handle relative path
  MERGEDIR=${FULLNAME#/mnt/*/}  # Remove any /mnt/*/ prefix
else
  MERGEDIR="$1"
fi
+ '[' -d BACKUPS ']'
+ MERGEDIR=BACKUPS

if [ ! -d "/mnt/user/$MERGEDIR" ]
then
  echo "$1 is not a valid user share"
  exit 1
fi
+ '[' '!' -d /mnt/user/BACKUPS ']'

SRCDISK=${2#/mnt/}    #Remove any leading /mnt/ prefix
+ SRCDISK=disk2
SRCDISK=${SRCDISK%%/*}    #Remove any trailing path
+ SRCDISK=disk2
if [[ ! -d "/mnt/$SRCDISK" \
    || "$SRCDISK" != disk[1-9] && "$SRCDISK" != disk[1-9][0-9] && "$SRCDISK" != "cache" \
  ]]
then
  echo "$2 is not a valid disk"
  exit 1
fi
+ [[ ! -d /mnt/disk2 ]]
+ [[ disk2 != disk[1-9] ]]

DESTDISK=${3#/mnt/}    #Remove any leading /mnt/ prefix
+ DESTDISK=
DESTDISK=${DESTDISK%%/*}    #Remove any trailing path
+ DESTDISK=
if [[ ! -d "/mnt/$DESTDISK" \
    || "$DESTDISK" != disk[1-9] && "$DESTDISK" != disk[1-9][0-9] && "$DESTDISK" != "cache" \
  ]]
then
  echo "$3 is not a valid disk"
  exit 1
fi
+ [[ ! -d /mnt/ ]]
+ [[ '' != disk[1-9] ]]
+ [[ '' != disk[1-9][0-9] ]]
+ [[ '' != \c\a\c\h\e ]]
+ echo ' is not a valid disk'
is not a valid disk
+ exit 1

 

So while the error condition is reported, there is no usage message saying that there are 3 required parameters and how to use the program.
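A sketch of an up-front check that prints usage instead of the cryptic " is not a valid disk". It assumes any options have already been consumed (getopts plus shift), so exactly three positional arguments remain; the usage text and function names are illustrative, not diskmv's real help output.

```bash
#!/bin/bash
# Hypothetical usage message based on the examples in this thread.
usage() {
  cat >&2 <<'EOF'
Usage: diskmv [options] DIRECTORY SOURCE_DISK DEST_DISK
  DIRECTORY    user share directory, e.g. "/mnt/user/video/tv/Pushing Daisies"
  SOURCE_DISK  disk to move files from, e.g. disk4
  DEST_DISK    disk to move files to, e.g. disk1
EOF
}

# Require three arguments and non-empty disk names. An empty DIRECTORY ("")
# is still accepted, so moving a whole disk with diskmv "" disk1 disk2 works.
check_args() {
  if [ "$#" -ne 3 ] || [ -z "$2" ] || [ -z "$3" ]; then
    usage
    return 1
  fi
}
```

In the script itself this would be called as `check_args "$@" || exit 1` before any other validation.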


It's amazing what can be achieved with bash alone :o

 

I've been working on an app (written in Go) to "unbalance" the disks.

 

It basically tries to leave one disk, with as much space as possible.

 

It's still early alpha, and suited to my folder structure, but I'm attaching some pics so that you guys get a taste of what it does.

 

Please let me know if anyone would be interested in something like this, as a complement to what Freddie built.

 

Main interface after reading array configuration

[Screenshot: main interface]

 

After calculating how files will be moved around, it shows how much space will be left available on each disk (it has a hard lower limit of 250 MB).

[Screenshot: space left available on each disk]

 

It currently does a "dry-run" showing which commands would eventually be executed (mv commands).

[Screenshot: dry-run list of mv commands]

[Screenshot: unbalance2.png]


So while the error condition is reported, there is no usage condition that says, there are 3 required parameters and this is how you use the program.

I agree. The error message is not good and that may be a reason to have a separate check for a null value. I will make it better and add some usage output on error.

 

I will also add the DEBUG snippet.

 

Thank you.


In an ideal world 'unbalance' would be a front end to a 'diskmv' and 'consld8' back end.

 

I can see that 'unbalance' aims to allow partial merging but if that feature was dropped then in theory they are feature comparable and it seems a shame to have two separate code bases doing the same job.

 

Also, I agree that rsync should be used rather than mv, as it is much safer.


'diskmv' is a utility that moves files. 'consld8' is one of many possible utilities that determine how files should be moved, depending on the goal. These utilities then call 'diskmv' to get the job done. mbfrosty's 'rebalance' / 'pressure relief' and lboregard's 'unBalance' / 'disk packer' are other examples that could use 'diskmv' to get their job done. Another I've seen suggested is 'empty disk', which is very similar to 'unBalance', just with more available space.

3 months later...

Based on the comments and suggestions made earlier, I have released updated scripts here:

 

https://github.com/trinapicot/unraid-diskmv/tree/v0.2.0

 

This is very much a "use at your own risk" kind of thing, but I am tagging releases that I trust and have used on my own data. Even so, I always make sure my data is fully backed up before using it.

 

Anyone interested enough to test it out?

 

@NAS & @WeeboTech: If you are interested, the master branch has an option in diskmv to move only small files from one disk to another.


These appear to be great utilities. I was trying to follow your code (I am not a coder) and was trying to understand how you use "user", as in "consld8 /mnt/user/video/tv/Wonderfalls disk3". I am assuming you tear the path apart and use the actual disk(n) values so that data is not lost by using the /user paths.

 

With one more script to look through all of the subdirectories and feed them to "consld8", this could be a totally automated solution that could be scheduled from cron. Attached is a high-level flowchart that I cobbled together of how I would envision such a program stepping through the work. I had thought about trying to put something together myself, or better yet, hoping someone who actually knows how to code would develop it.

 

Any thoughts about the flowchart?

[Attached flowchart: Flowchart1.png]


I was trying to follow your code (I am not a coder) and was trying to understand how you use "user", as in "consld8 /mnt/user/video/tv/Wonderfalls disk3". I am assuming you tear the path apart and use the actual disk(n) values so that data is not lost by using the /user paths.

 

Yes, consld8 calls diskmv and diskmv only moves files from one disk to another. No moves are done to or from the user share.
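To illustrate the idea: the user-share path is only used to work out the directory relative to the share root, which is then looked up on each physical disk. A minimal sketch using the same ${FULLNAME#/mnt/*/} prefix stripping as the snippet earlier in this thread; the function name to_disk_path is made up.

```bash
#!/bin/bash
# Map a /mnt/user/... (or /mnt/diskN/...) path to the same directory
# on another disk. No file is read or written through /mnt/user.
to_disk_path() {
  local userpath=$1 disk=$2
  local rel=${userpath#/mnt/*/}     # "/mnt/user/video/tv/X" -> "video/tv/X"
  echo "/mnt/$disk/$rel"
}

to_disk_path "/mnt/user/video/tv/Wonderfalls" disk3
```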

 

Any thoughts about the flowchart?

 

I was trying to follow your flowchart, but I am not a flowcharter.

 

I think a lot of the logic involving disk usage and free space is already included in consld8. A relatively simple script could be made to loop through all first level subdirectories and perform consld8 on each of them, letting consld8 determine the destination disk. You could even use 'find' and do it with one line (probably).
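A sketch of such a loop, assuming consld8 is on the PATH. No destination disk is given, so consld8 picks one itself, and without -f it stays in test mode, so this only reports what it would do. The function name consolidate_all is made up.

```bash
#!/bin/bash
# Run consld8 on every first-level subdirectory of a share directory.
consolidate_all() {
  local parent=$1 dir
  for dir in "$parent"/*/; do
    [ -d "$dir" ] || continue       # the glob matched nothing
    consld8 "${dir%/}"              # drop the trailing slash from the glob
  done
}
```

For example `consolidate_all /mnt/user/video/tv`. The one-line find version would be something like `find /mnt/user/video/tv -mindepth 1 -maxdepth 1 -type d -exec consld8 {} \;`.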

 

An alternative solution might be to add a recursive option to consld8. If a disk is not found with enough free space for the specified directory, consld8 is performed on all of the first level subdirectories.

 

...this could be a totally automated solution that could be scheduled from cron.

 

I came at these tools from a perspective that day-to-day dispersal of files is controlled by split level settings. If a new season of a tv show starts then a new season folder is created on the same disk as the other seasons of that show. This can be accomplished with split level settings. But sometimes mistakes are made or disks fill up and things need to be rearranged. That is when I intended for diskmv and consld8 to be used.

 

That being said, if you want to run them on a cron schedule, I won't try to stop you.


@Freddie,

 

I had a little bit of time to test the scripts. Overall, they seem to work great. A couple of observations:

 

consld8 does not like a paren "(" in the subdirectory name.  It returns the following error:

-bash: syntax error near unexpected token `('

 

To get around it, I just quoted the entire subdirectory name like this:

consld8 -f -q /mnt/user/Test_Shows/"Archer (2009)" disk3

 

 

If you pass the quiet "-q" flag to consld8, it does not pass it on to diskmv when diskmv is called.

 

This is all I have noticed with the quick testing I was able to do.


Thanks for testing and providing feedback, switchman.

 

If you pass the quiet "-q" flag to consld8, it does not pass it on to diskmv when diskmv is called.

 

Yep, that should happen, but it doesn't. I'll work on that.
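One way to handle this (a sketch, not consld8's actual option parsing) is to collect the options that diskmv should also see while parsing consld8's own options, then pass them along on every diskmv call. Only -f and -q, the flags mentioned in this thread, are handled; the names parse_opts, DISKMV_OPTS and ARGS are made up.

```bash
#!/bin/bash
# Parse consld8-style options and remember which ones to forward to diskmv.
parse_opts() {
  local opt OPTIND=1            # local OPTIND lets getopts run again
  DISKMV_OPTS=()
  while getopts "fq" opt; do
    case $opt in
      f) DISKMV_OPTS+=(-f) ;;   # force: actually move files
      q) DISKMV_OPTS+=(-q) ;;   # quiet: forward to diskmv as well
      *) return 1 ;;            # unknown option (getopts prints an error)
    esac
  done
  shift $((OPTIND - 1))
  ARGS=("$@")                   # remaining positional arguments
}

# Later, when consld8 moves a directory from one disk to another:
#   diskmv "${DISKMV_OPTS[@]}" "$dir" "$srcdisk" "$destdisk"
```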

 

consld8 does not like a paren "(" in the subdirectory name.  It returns the following error:

-bash: syntax error near unexpected token `('

 

To get around it, I just quoted the entire subdirectory name like this:

consld8 -f -q /mnt/user/Test_Shows/"Archer (2009)" disk3

 

That is a "feature" of bash (the shell you are using to enter the commands). Spaces and parens are special characters in bash and they must be quoted or escaped when they are in file names. You get the same cryptic error for the same problem in very basic bash commands:

root@tower:~/testdata# ls Archer (2009)
-bash: syntax error near unexpected token `('
root@tower:~/testdata# ls "Archer (2009)"
Archer\ (2009)
root@tower:~/testdata# ls Archer\ \(2009\)
Archer\ (2009)

 

It's just something you have to deal with when working at the command line. The "\" tells bash to ignore the special meaning of the next character.
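If you are not sure how a name needs to be escaped, bash's built-in printf can show you: the %q format prints its argument quoted so that it can be pasted back into the shell.

```bash
#!/bin/bash
# Print a file name in a form that can be reused as shell input.
name="Archer (2009)"
printf '%q\n' "$name"     # prints: Archer\ \(2009\)
```

Quoting the whole path instead, as in consld8 -f -q "/mnt/user/Test_Shows/Archer (2009)" disk3, works just as well.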

 

The tab key can be very helpful in these situations: it will autocomplete file names when possible. Type in the first part of the file name, then hit tab, and it will autocomplete if there is a single file that matches. Hit tab twice and it will list all the possible matches. When it autocompletes, it includes the necessary backslashes to escape the special characters.


Would you consider adding an option to consld8 to exclude a disk? Obviously I am coming at this from the accelerator drive angle, which essentially precludes the use of consld8.

Edit: Actually, thinking about it, would it perhaps be a better idea to use consld8 with size and extension limits (as in diskmv)? That is, rather than skipping the accelerator drive, use consld8 to populate it.

Edit 2: I don't think that would work. It's fine for populating the accelerator drive, but without being able to exclude it, all other consld8 actions would start pulling files from it again.
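Purely as an illustration of the request, an exclude option could filter the list of candidate destination disks before consld8 picks one. Neither the option nor the function candidate_disks exists in consld8; this is a hypothetical sketch that assumes disks are mounted as /mnt/diskN.

```bash
#!/bin/bash
# List candidate disks under a mount root, skipping any disk named
# in the remaining arguments (e.g. an accelerator drive).
candidate_disks() {
  local mnt=$1; shift                # mount root, normally /mnt
  local exclude=" $* " d name
  for d in "$mnt"/disk[0-9]*; do
    [ -d "$d" ] || continue
    name=${d##*/}                    # "/mnt/disk3" -> "disk3"
    [[ $exclude == *" $name "* ]] && continue
    echo "$name"
  done
}
```

For example, `candidate_disks /mnt disk1` would list every data disk except disk1.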


So I pretty consistently receive the following error.  Any thoughts?

 

 

root@kingsnake:~# diskmv -f "/DVDs" disk3 disk6
.....
.//DVDs/AGI - Long Range Shooting/VIDEO_TS
.//DVDs/AGI - Long Range Shooting
Cannot stat file /proc/7924/fd/4: No such file or directory
.//DVDs/AGI - Spring Course/AGI - Spring Course.avi
.//DVDs/AGI - Spring Course
Cannot stat file /proc/8470/fd/4: No such file or directory
.//DVDs/AGI - Browning Hi-Power Armorer's Course/AGI - Browning Hi-Power Armorer's Course.avi
.//DVDs/AGI - Browning Hi-Power Armorer's Course/AGI - Browning Hi-Power Armorer's Course.gif
.//DVDs/AGI - Browning Hi-Power Armorer's Course
.//DVDs/High Standard/High Standard.avi
.//DVDs/High Standard
.//DVDs/AGI - Heckler and koch CETME G3/AGI - HK CETME G3.avi
.//DVDs/AGI - Heckler and koch CETME G3
.//DVDs/AGI - AK Trigger Job/AGI - AK Trigger Job.avi
.//DVDs/AGI - AK Trigger Job
.//DVDs/AGI - Remington 1110_1197 Shotgun Armorers Course/AGI-1124 - Remington 1110_1197 Shotgun Armorers Course.avi

 


would you consider adding an option to exclude a disk to consld8 .

 

Yes I will consider it.

 

So I pretty consistently receive the following error.  Any thoughts?

I have encountered that type of error in the past with the mover script. I traced it back to cache_dirs but never really understood what was going on. Are you running cache_dirs? For me, running 'ps 7924' (where the number matches the number in the error message) showed it was a cache_dirs process. That's about as far as my understanding of the proc filesystem goes. Maybe it's time for me to do some reading.
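To see what a PID from one of those messages belongs to, it has to be checked before the process goes away, either with ps as described above (e.g. ps -p 7924 -o pid,args) or by reading /proc directly. A small helper for the latter, Linux-specific; the name show_pid is made up.

```bash
#!/bin/bash
# Show the command line of a PID taken from a
# "Cannot stat file /proc/PID/fd/N" message.
show_pid() {
  local pid=$1
  if [ -r "/proc/$pid/cmdline" ]; then
    # cmdline is NUL-separated; turn the NULs into spaces for display.
    tr '\0' ' ' < "/proc/$pid/cmdline"; echo
  else
    echo "process $pid no longer exists"
  fi
}
```

For example `show_pid 7924`. Short-lived processes may already be gone by the time you look, which is also why their /proc entries cannot be stat'ed.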


If there is interest and no major problems, I will look into making them easier to use (first a plugin for command line usage, then maybe gui)

 

Having a plugin to install these and keep them current would be great!

 

 

 

There are a lot of people who are moving an entire disk's worth of data from one disk to another so they can switch filesystems.  These utilities seem to operate around shares, could there be an option (or a third script) to move entire disks?

 

It would be great to tell people to add a plugin and run a simple command to move an entire disk's worth of data around.


I'm not sure diskmv is the best tool to use for switching file systems, especially if you want to follow the recommendation to verify files on the destination disk before deleting files from the source disk. I've updated the first post to try and address this issue.

 

I think it would be good to have more testing before making a plugin. I only know of the few users that have posted in this thread. Has anyone else out there used these scripts? Any problems?

 

New users may want to wait until there's been more testing, but if you want to jump in, thestewman, these are just bash scripts that you run from the command line (telnet, ssh or a keyboard/monitor attached to the server). They work much like the preclear script, if you have used that. There are many ways to set them up; the easiest may be:

  • Download the latest release zip from github, currently https://github.com/trinapicot/unraid-diskmv/archive/v0.2.0.zip
  • Extract the zip and copy diskmv and consld8 to the root of your unRAID flash drive
  • At the command prompt, change to the directory that contains the scripts: cd /boot/
  • Run the script: ./diskmv -h


I have not looked up the switches, but you could probably do this in two steps. I would only do this for full-drive moves, which will probably be needed pretty soon. Always move between two drives of equal size, or from a smaller drive to a larger one.

 

Step 1

Use rsync to mirror the source drive to the destination drive. You now have two copies of the files on your system. Other flags will need to be set as appropriate.

 

Step 2

Use rsync with the checksum compare flag (-c) set. After you verify that all of the files moved correctly, you can format the source drive and proceed to the next one.

 

It will take a while, but it would probably be the safest way to go.

 

 

EDIT: I just had a second thought. For step 1, you could use the standard copy command to get the files mirrored initially, then use rsync to do the compare in step 2. This would probably speed up the entire process.
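A sketch of the two-step approach with rsync, assuming both disks are mounted under /mnt. The function names copy_disk and verify_disk are made up; the flags are standard rsync options (-a archive, -c checksum, -n dry run, -i itemize changes).

```bash
#!/bin/bash
# Step 1: mirror SRC into DEST, preserving attributes (-a).
copy_disk() {
  rsync -a "$1/" "$2/"
}

# Step 2: a dry run (-n) that compares file contents by checksum (-c)
# and itemizes (-i) anything that differs. No output means the trees match.
verify_disk() {
  rsync -a -c -n -i "$1/" "$2/"
}
```

For example, `copy_disk /mnt/disk1 /mnt/disk2`, then `[ -z "$(verify_disk /mnt/disk1 /mnt/disk2)" ] && echo "all files verified"`. The checksum pass reads every file on both disks, so it will take a long time. If cp -a is used for step 1 instead, verify_disk can still be run; any directory whose timestamp was not preserved would show up as a timestamp-only line rather than a content difference.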

 
