jbartlett Posted September 12, 2014 (edited)

This script creates SHA256 hash values for files and binds those values to the files themselves, which can be used to verify file integrity on file systems that have no built-in checksums. It's an evolution of my inventory script with a SQLite database: I realized, why not store the SHA string with the file itself? The SHA256 value and the date the file was scanned are stored in the user extended attributes, and you can export the SHA256 values to a text file for use in recovering files from the lost+found directory.

Why SHA256 and not MD5? No SHA256 collisions have been found, and none are currently considered practical to produce. In my testing, the speed difference was negligible on large media files.

Options:
-p, --path      Full path to scan (required). For recovery, specify the path of the files to recover.
-a, --add       Scan for files with no SHA key and process them
-v, --verify    Verify files against their SHA key and report mismatches
-u, --update    Verify files against their SHA key, report mismatches, and update the key
-d, --days      Do not verify files that have been scanned within x days
-r, --remove    Remove extended attributes created by this script
-e, --export    Export SHA keys to the file specified by the -f/--file option
--recover       Recover files using SHA keys from the file specified by the -f/--file option
-f, --file      File to import/export. Defaults to /tmp/bitrot.sshkeys.txt. Will overwrite the file if it exists.
--import        Import/restamp files with a SHA key from an export file if none exists
--ignorepath    Ignore the path when importing SHA keys; match on file name only (TO DO)
-m, --mask      Only process files matching the file mask. Default: *
-l, --log       Log all added files to the syslog
-i, --id        Specify an alphanumeric ID to use instead of a pseudo-random number

Examples:

Add new files in the share 'TV' and compute SHA256:
bitrot.sh -a -p /mnt/user/TV

Add only *.mov files in a user share that contains a space:
bitrot.sh -a -p "/mnt/user/My Videos" -m *.mov

Scan for files in a subdirectory on a user share:
bitrot.sh -a -p /mnt/user/Documents/John/Catalog

Verify SHA on previously scanned files:
bitrot.sh -v -p /mnt/user/Documents

Verify SHA on previously scanned files that have not been checked in the past 90 days:
bitrot.sh -v -p /mnt/user/Documents -d 90

Update SHA on all modified files in a share (best done after a verify):
bitrot.sh -u -p /mnt/user/Documents

Update SHA on a specific file:
bitrot.sh -u -p /mnt/user/Documents -m modifiedword.doc

Export SHA keys:
bitrot.sh -e -p /mnt/disk1/Movies -f /mnt/disk1/Movies/shakeys_disk1.txt

Recover lost+found files by matching the SHA key against an exported list:
bitrot.sh --recover -p /mnt/disk1/lost+found -f /tmp/shakeys_disk1.txt

To use the script, first download hashdeep-4.4-x86_64-1rj.txz and place it in the same directory as bitrot.sh; the script will auto-install the package if it hasn't been installed. For 32-bit versions of UnRAID, install the hashdeep tools from UnMenu.

To Do:
- Auto-append new files to the export log
- Option to add SHA keys to matching file names in a different location

Download bitrot-v1.0.zip

Edited January 29, 2021 by jbartlett
neilt0 Posted September 12, 2014

I scanned the options quickly, but is it correct that there is no option to write the checksums out to a separate file, and that this only writes to the files as it scans? Sorry, but there is not a chance in hell I'd use this! I'd only scan read-only shares and write the checksums to a separate file or files.
jbartlett Posted September 12, 2014 Author

I scanned the options quickly, but is it correct there is not an option to write out the checksums to a separate file -- this will only write to the files as it scans? Sorry, but there is not a chance in hell I'd use this! I'd only scan read-only shares and write the checksums to a separate file or files.

My understanding is that the extended attributes are stored in the inode for the file, not in the file itself.
neilt0 Posted September 12, 2014

What could possibly go wrong?! :-)
jbartlett Posted September 12, 2014 Author

What could possibly go wrong?! :-)

Hrm, as long as you're not running beta7 or 8, nothing?

ETA: User attributes have long been supported and are regarded as safe.
PeterB Posted September 13, 2014

Hrm, as long as you're not running beta7 or 8, nothing?

.... until next time!

ETA: User attributes have long been supported and are regarded as safe.

... but won't this fail if you're processing files on a read-only share?
jbartlett Posted September 13, 2014 Author

... but won't this fail if you're processing files on a read-only share?

It runs from a telnet shell; you're accessing directories, not shares. As long as user "root" can access the file, the script can process it. That should be the case for your media stored under /mnt/user or /mnt/diskx.
JonathanM Posted September 13, 2014

... but won't this fail if you're processing files on a read-only share?

It runs from a telnet shell; you're accessing directories, not shares. As long as user "root" can access the file, the script can process it.

I think the point was that if you are interested in cataloguing checksums for a read-only file system (for whatever reason), this utility can't direct its output to a separate location; it has to modify the original files, which are read-only.

Many users are sceptical about using a utility that modifies their files on purpose; they are afraid it could corrupt the content. If you could provide a better explanation of how exactly your utility handles user attributes on various file systems, and what happens to that information when a file is copied, backed up, or restored, it may go a long way towards acceptance. Mucking around in user attributes, alternate and hidden data streams is seen as black magic to many folks.
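On the "what happens when a file is copied" question, a quick sketch of how a user attribute behaves under GNU cp: a plain copy silently drops it, while --preserve=xattr carries it along. The attribute name and value below are made up for the demo; this isn't a claim about what bitrot.sh itself stamps.

```shell
# Illustrative attribute name/value; requires GNU cp and the attr tools.
echo data > src.txt
setfattr -n user.sha256 -v abc123 src.txt

cp src.txt plain.txt                  # default cp: the xattr is NOT copied
cp --preserve=xattr src.txt kept.txt  # the xattr follows the file

getfattr --only-values -n user.sha256 kept.txt && echo
getfattr --only-values -n user.sha256 plain.txt 2>/dev/null || echo "no attribute on plain copy"
```

The same caveat applies to backup and restore tools: unless the tool is explicitly told to preserve extended attributes, the stamped hash does not survive the round trip.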
TheDragon Posted September 15, 2014

Thanks for sharing this, it seems a very elegant solution compared to the script I'm currently using. I found some good info here regarding extended attributes: http://www.linux-mag.com/id/8741/

I presume writing the hash to metadata does not alter, and thereby immediately invalidate, the hash? Hopefully I've worded that question correctly :-)
jbartlett Posted September 15, 2014 Author

Thank you for that link, it's informative! And you presume correctly: changing the user attributes does not alter the hash of the file itself.
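For anyone who wants to see this for themselves, a quick test along these lines shows that the xattr lives outside the data stream the hash covers (the attribute name and file are made up for the demo):

```shell
# hash the file, attach an arbitrary user attribute, hash it again
echo "some movie bytes" > f.bin
before=$(sha256sum f.bin | awk '{print $1}')
setfattr -n user.note -v "metadata lives outside the data stream" f.bin
after=$(sha256sum f.bin | awk '{print $1}')
[ "$before" = "$after" ] && echo "hash unchanged"
```

The attribute is attached to the inode's metadata, so sha256sum, which reads only the file's contents, never sees it.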
TheDragon Posted September 16, 2014

Awesome! Final question: do you back up files to a second server using rsync or similar? I'm wondering if it's possible to preserve the extended attributes when copying the file to another location.
WeeboTech Posted September 16, 2014

It's a brilliant way of storing the hash near the file. Using tools like cp and rsync to preserve the extended attribute allows it to follow the file to the destination and provides the ability to verify the copy was good (i.e., as long as it is not cached). Using fadvise, you can drop the cache for that particular file to ensure it was copied and read back correctly. As long as you can export the extended attribute for reuse by the source command, it's a good way to double-check your data.
WeeboTech Posted September 16, 2014

Am wondering if it's possible to preserve the extended attributes when copying the file to another location.

rsync has -X, --xattrs: preserve extended attributes.
jbartlett Posted September 16, 2014 Author

If you create an exported hash file, be sure to regenerate it after moving files around so that it records the current file locations in the event you need to recover the files.

bitrot.sh -e -p /mnt/disk1 -f /mnt/cache/sha256.disk1.txt
TheDragon Posted September 16, 2014

What's the benefit of having an exported hash file as well as the hash in the metadata? Also, how often are you running your script? Do you have it scheduled?
TheDragon Posted September 16, 2014

Think the penny just dropped... is the exported hash file for comparing against the hash in the metadata of files in lost+found, to identify them?
TheDragon Posted September 16, 2014

Using fadvise, you can drop the cache for that particular file to ensure it was copied and read back correctly.

Not sure I understand this part of your post; how/why might you do this? Also, are you preferring this method over your proposed method using a SQL DB?
trurl Posted September 16, 2014

rsync has -X, --xattrs: preserve extended attributes.

Some quick googling seems to indicate that NTFS has extended attributes. I rsync to SNAP-mounted NTFS for my offsite backups. Any idea if this would work for me?
hawihoney Posted September 16, 2014

For 32bit versions of UnRAID, install the hashdeep tools from UnMenu.

On unMENU I find only md5deep-3.6.orig.tar.gz. Is that the package you call hashdeep? Thanks in advance.
WeeboTech Posted September 16, 2014

Think the penny just dropped... is the exported hash file for comparing against the hash in the metadata of files in lost+found, to identify them?

I would answer yes to this, in addition to catching any other corruption that may have occurred.
WeeboTech Posted September 16, 2014

Using fadvise, you can drop the cache for that particular file to ensure it was copied and read back correctly.

Not sure I understand this part of your post; how/why might you do this? Also, are you preferring this method over your proposed method using a SQL DB?

Yes and no. In my application, I plan to use fadvise the way md5sum does: telling the kernel that the next reads are sequential (it has to do with caching). At the end I will tell the kernel that we do not need the file, which causes the kernel to drop the file from the cache. I.e., it can help keep the directory inodes in cache rather than file data we may not need.

The issue I'm considering is that for every file that is hash-checked, the whole file is read into the cache, pushing out other data. By advising the kernel that we are not going to need the file anymore, we preserve what is already in cache. In my read tests, I would read a file, and the second read would take a minuscule time versus the first read. Using the fadvise call to drop the cache frees the buffer-cache data for the file, causing the next read to come from the disk itself.

I think this could help in applications that move files from one place to another: read, copy, and hash as the file is being read; drop the buffer cache for the file; then re-read the file to verify hash integrity. Might be good for testing unRAID when new versions are released.

So, will this play into the SQL hash DB? Partially. I only planned to use it for dropping the buffer cache after a file was hashed and stored in the DB. I added the information about fadvise simply to raise awareness that it exists.

If a hash file is created and files are then moved, it would be advisable to drop all caches before re-checking the hashes in the destination location, or you might be verifying the buffer-cached version.
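On the command line, GNU dd exposes this same advice via its nocache flag (which issues posix_fadvise with POSIX_FADV_DONTNEED), so you can evict a single file's pages from the cache before re-hashing it. A sketch with a made-up file name:

```shell
# create a small test file and hash it (this read populates the page cache)
head -c 1048576 /dev/urandom > big.bin
sha1=$(sha256sum big.bin | awk '{print $1}')

# count=0 + nocache: read nothing, just advise the kernel to evict this file's pages
dd if=big.bin iflag=nocache count=0 status=none

# this read now has to come from the device rather than the page cache
sha2=$(sha256sum big.bin | awk '{print $1}')
[ "$sha1" = "$sha2" ] && echo "re-read matches"
```

This is the per-file equivalent of the global "drop all caches" approach, without throwing away everything else the kernel has cached.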
SlrG Posted September 16, 2014

Great script! Sadly, when adding my Anime collection I get the following error:

bitrot.sh -a -p /mnt/user/Anime
bitrot, by John Bartlett, version 1.0
Scanning for new files... (53.9%)
stat: cannot stat `/mnt/user/Anime/Kami Nomi zo Shiru Sekai/05 Kami nomi zo Shiru Sekai - Megami-hen [season 3] (BD 1280x720)/[JacobSwaggedUp] Kami nomi zo Shiru Sekai - Megami-hen - 01 (BD 1280x720).mp4': No such file or directory
./bitrot.sh: line 195: 874833978000 + : syntax error: operand expected (error token is "+ ")

Do you have an idea what goes wrong?
BRiT Posted September 16, 2014

It looks like the script cannot handle spaces in file or directory names. It likely can't handle single quotes or other special characters either, such as square brackets or parentheses.
MyKroFt Posted September 16, 2014

Spaces are fine on mine, special symbols? Myk
SlrG Posted September 16, 2014

Could be. Here are the exact filenames in that directory:

01 Kami nomi zo Shiru Sekai/
02 Kami Nomi zo Shiru Sekai_II_-_[1280x720_Blu-ray_FLAC]/
03 Kami nomi zo Shiru Sekai - 4-nin to Idol [OVA] (BD 1280x720)/
04 Kami nomi zo Shiru Sekai - Tenri-hen [OVA] (BD 1280x720)/
05 Kami nomi zo Shiru Sekai - Megami-hen [season 3] (BD 1280x720\)/
06 Kami nomi zo Shiru Sekai - Magical Star Kanon 100% [DVD 576p 10bit AAC].mkv

Other special chars and even Japanese characters are possible. So if the script could be improved to handle such special cases, that would be great.
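For names like these, the usual defensive pattern in a bash script is a null-delimited find loop with every expansion quoted, which copes with spaces, brackets, parentheses, and Japanese characters alike. A sketch (the directory and file names are made up; this is a general technique, not a patch to bitrot.sh):

```shell
# build a directory with deliberately awkward names
mkdir -p 'test dir/[BD 1280x720] (season 3)'
echo x > 'test dir/[BD 1280x720] (season 3)/episode 01.mp4'

# -print0 / read -d '' keeps each path intact no matter what characters it contains
find 'test dir' -type f -name '*.mp4' -print0 |
while IFS= read -r -d '' file; do
    # always quote "$file"; -- stops stat treating odd names as options
    stat -c '%s' -- "$file"
done
```

An unquoted $file, by contrast, gets word-split on spaces and glob-expanded on brackets, which matches the "cannot stat" error shown above.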