bitrot - a utility for generating sha256 keys for integrity checks (version 1.0)



This script creates SHA256 hash values for files and binds those values to the files themselves, which can be used to verify file integrity on file systems that have no built-in checksums.

 

This is an evolution of my inventory script that used a SQLite database; I realized, why not store the SHA string with the file itself? The SHA256 value and the date the file was scanned are stored in the user extended attributes, and you can export the SHA256 values to a text file for use in recovering files from the lost+found directory.
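The mechanics are easy to sketch with the standard attr tools. The attribute names below (user.sha256, user.sha256.date) are my illustration, not necessarily the exact keys the script uses:

```shell
# Stamp a file's SHA256 into its user extended attributes and read it back.
# Needs setfattr/getfattr from the attr package and an xattr-capable fs.
f=$(mktemp --tmpdir=.)                     # demo file in the current dir
printf 'hello' > "$f"
sum=$(sha256sum "$f" | awk '{print $1}')
if setfattr -n user.sha256 -v "$sum" "$f" 2>/dev/null; then
    setfattr -n user.sha256.date -v "$(date +%F)" "$f"
    getfattr --only-values -n user.sha256 "$f"   # prints the stored hash
else
    echo "no xattr support on this fs; hash was $sum"
fi
rm -f "$f"
```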

 

Why SHA256 and not MD5? SHA256 has no known collisions and no known practical attack. In my testing, the speed difference was negligible on large media files.
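If you want to sanity-check that claim on your own hardware, a rough comparison (the 64M size is arbitrary):

```shell
# Hash the same large random file with both digests and compare wall time.
f=$(mktemp)
head -c 64M /dev/urandom > "$f"
time md5sum "$f"    > /dev/null
time sha256sum "$f" > /dev/null
rm -f "$f"
```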

 

Options:

  -p, --path      Full path to scan (required)
                  For recovery, specify the path of the files to recover
  -a, --add       Scan for files with no SHA key and process them
  -v, --verify    Verify files against their SHA key and report mismatches
  -u, --update    Verify files against their SHA key, report mismatches, and update the key
  -d, --days      Do not verify files that have been scanned within the last x days
  -r, --remove    Remove extended attributes created by this script
  -e, --export    Export SHA keys to the file specified by the -f/--file option
  --recover       Recover files using SHA keys from the file specified by the -f/--file option
  -f, --file      File to import/export; defaults to /tmp/bitrot.sshkeys.txt
                  Will overwrite the file if it exists
  --import        Import/restamp files with a SHA key from an export file if none exists
  --ignorepath    Ignore the path when importing SHA keys; match on file name (TO DO)
  -m, --mask      Only process files matching the file mask
                  Default: *
  -l, --log       Log all added files to the syslog
  -i, --id        Specify an alphanumeric ID to use instead of a pseudo-random number

 

Examples:

Add new files in the share 'TV' and compute SHA256

  bitrot.sh -a -p /mnt/user/TV

Add only *.mov files in a user share that contains a space

  bitrot.sh -a -p "/mnt/user/My Videos" -m *.mov

Scan for files in a subdirectory on a user share

  bitrot.sh -a -p /mnt/user/Documents/John/Catalog

Verify SHA on previously scanned files

  bitrot.sh -v -p /mnt/user/Documents

Verify SHA on previously scanned files that have not been checked in the past 90 days

  bitrot.sh -v -p /mnt/user/Documents -d 90

Update SHA on all modified files in a share (best to do after a verify)

  bitrot.sh -u -p /mnt/user/Documents

Update SHA on a specific file

  bitrot.sh -u -p /mnt/user/Documents -m modifiedword.doc

Export SHA keys

  bitrot.sh -e -p /mnt/disk1/Movies -f /mnt/disk1/Movies/shakeys_disk1.txt

Recover lost+found files by matching the SHA key with an exported list

  bitrot.sh --recover -p /mnt/disk1/lost+found -f /tmp/shakeys_disk1.txt

 

To use the script, first download hashdeep-4.4-x86_64-1rj.txz and place it in the same directory as bitrot.sh; the script will auto-install the package if it hasn't been installed already. For 32-bit versions of unRAID, install the hashdeep tools from UnMenu.

 

To Do:

Auto append new files to the export log

Option to add SHA keys to matching file names in a different location

 

Download bitrot-v1.0.zip

Edited by jbartlett

I scanned the options quickly, but is it correct that there is no option to write the checksums out to a separate file, and this will only write to the files as it scans?

 

Sorry, but there is not a chance in hell I'd use this! I'd only scan read-only shares and write the checksums to a separate file or files.


I scanned the options quickly, but is it correct that there is no option to write the checksums out to a separate file, and this will only write to the files as it scans?

 

Sorry, but there is not a chance in hell I'd use this! I'd only scan read-only shares and write the checksums to a separate file or files.

 

My understanding is that the extended attributes are stored in the inode for the file, not in the file itself.


... but won't work if you're processing files on a read-only share?

 

It runs from a telnet shell; you're accessing directories, not shares. As long as user "root" can access the file, the script can process it. That should be the case for your media stored under /mnt/user or /mnt/diskx.

I think the point was that if you are interested in cataloguing checksums for a read-only file system (for whatever reason), this utility can't direct its output to a separate location; it has to modify the original files, which are read-only.

 

Many users are sceptical about using a utility that deliberately modifies their files; they are afraid it could corrupt the content. If you could provide a better explanation of how exactly your utility handles user attributes on various file systems, and what happens to that information when a file is copied, backed up, or restored, it may go a long way towards acceptance.

 

Mucking around in user attributes and alternate or hidden data streams is seen as black magic by many folks.


It's a brilliant way of storing the hash near the file.

Using tools like cp and rsync that preserve extended attributes lets the hash follow the file to the destination and provides a way to verify the copy was good, as long as the read is not served from cache.

 

Using fadvise, you can dump the cache for that particular file to ensure it was copied and read back correctly.

 

As long as you can export the extended attribute for reuse by the source command, it's a good way to double-check your data.


Awesome! Final question, do you backup files to a second server using rsync or similar? Am wondering if it's possible to preserve the extended attributes when copying the file to another location.

 

 

rsync has -X, --xattrs (preserve extended attributes).

Some quick googling seems to indicate NTFS has extended attributes. I rsync to SNAP mounted NTFS for my offsite backups. Any idea if this would work for me?
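A quick local way to convince yourself that -X carries the attribute along (the user.sha256 name and the dummy value are just for illustration). Whether an ntfs-3g/SNAP mount preserves user xattrs depends on the driver and mount options, so it's worth testing on a single file first:

```shell
# Copy a file with rsync -aX and check the xattr survives on the copy.
src=$(mktemp -d); dst=$(mktemp -d)
printf 'data' > "$src/file.bin"
setfattr -n user.sha256 -v deadbeef "$src/file.bin" 2>/dev/null || true
rsync -aX "$src/" "$dst/"                    # -X preserves user.* xattrs
cmp "$src/file.bin" "$dst/file.bin" && echo "contents match"
getfattr --only-values -n user.sha256 "$dst/file.bin" 2>/dev/null \
    || echo "xattr not carried (fs without xattr support?)"
rm -rf "$src" "$dst"
```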

Using fadvise, you can dump the cache for that particular file to ensure it was copied and read back correctly.

 

Not sure I understand this part of your post, how/why might you do this?

 

Also are you preferring this method to your proposed method using a SQL DB?

 

 

Yes and no. In my application, I plan to use fadvise like md5sum does, telling the kernel that the next reads are sequential (it has to do with caching). At the end, I will tell the kernel that we no longer need the file, which causes the kernel to drop it from the cache. That could help keep the directory inodes in cache instead of file data we may not need. The issue I'm considering is that for every file that is hash-checked, the whole file is read into the cache, pushing out other data. By advising the kernel that we are not going to need the file anymore, we preserve what is already in cache.

 

 

In my read tests, I would read a file, and the second read would take a minuscule amount of time versus the first.

Using the fadvise call to dump the cache frees the buffer-cache data for the file, so the next read comes from the disk itself. I think this could help in applications that move files from one place to another, i.e. read, copy, and hash the file as it's being read; dump its buffer cache; then re-read it to check hash integrity. Might be good for testing unRAID when new versions are released.

 

 

So, will this play into the SQL hash DB? Partially. I only plan to use it to dump the buffer cache after a file is hashed and stored in the DB. I added the information about fadvise simply to raise awareness that it exists.

 

 

If a hash file is created and files are then moved, it would be advisable to drop all caches before re-checking the hashes at the destination, or you might be verifying the buffer-cache copy.
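For reference, the per-file version can be done from the shell with no root needed: GNU dd (coreutils 8.11+) maps iflag=nocache onto posix_fadvise(POSIX_FADV_DONTNEED). The sync/drop_caches line is the whole-system sledgehammer and does need root:

```shell
f=$(mktemp)
head -c 1M /dev/urandom > "$f"
sha256sum "$f" > /dev/null                     # first read fills the page cache
dd if="$f" iflag=nocache count=0 2>/dev/null   # advise kernel: evict this file
sha256sum "$f" > /dev/null                     # re-read now comes from disk
if [ "$(id -u)" -eq 0 ]; then                  # root-only: drop every cache
    sync; echo 3 > /proc/sys/vm/drop_caches
fi
rm -f "$f"
```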


Great script! Sadly, when adding my Anime collection I get the following error:

 

 bitrot.sh -a -p /mnt/user/Anime

bitrot, by John Bartlett, version 1.0

Scanning for new files... (53.9%)
stat: cannot stat `/mnt/user/Anime/Kami Nomi zo Shiru Sekai/05 Kami nomi zo Shiru Sekai - Megami-hen [season 3] (BD 1280x720)/[JacobSwaggedUp] Kami nomi zo Shiru Sekai - Megami-hen - 01 (BD 1280x720).mp4': No such file or directory
./bitrot.sh: line 195: 874833978000 + : syntax error: operand expected (error token is "+ ")

 

Do you have an idea what goes wrong?


Could be. :) Here are the exact filenames in that directory:

01 Kami nomi zo Shiru Sekai/
02 Kami Nomi zo Shiru Sekai_II_-_[1280x720_Blu-ray_FLAC]/
03 Kami nomi zo Shiru Sekai - 4-nin to Idol [OVA] (BD 1280x720)/
04 Kami nomi zo Shiru Sekai - Tenri-hen [OVA] (BD 1280x720)/
05 Kami nomi zo Shiru Sekai - Megami-hen [season 3] (BD 1280x720\)/
06 Kami nomi zo Shiru Sekai - Magical Star Kanon 100% [DVD 576p 10bit AAC].mkv

Other special chars and even Japanese characters are possible. So if the script could be improved to handle such special cases, that would be great. :)
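The usual defensive pattern for names like these (spaces, parentheses, brackets, even a literal backslash) is NUL-delimited find output with every expansion quoted. This is a sketch of the pattern, not the script's actual loop:

```shell
# Walk a tree safely regardless of what characters the filenames contain.
d=$(mktemp -d)
touch "$d/01 Kami (BD 1280x720).mp4" "$d/odd \\ name [OVA].mp4"
find "$d" -type f -name '*.mp4' -print0 |
while IFS= read -r -d '' f; do
    size=$(stat -c %s -- "$f")    # quoted "$f" and -- survive any character
    printf '%s\t%s\n' "$size" "$f"
done
rm -rf "$d"
```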
