
Rotating incremental backups with rsync



Does anyone have a simple script to do rotating incremental backups with rsync, running either on my desktop or on my array (not sure if the latter is possible)? The backup would have to go on my desktop's HDD.

 

The script would do a full backup on day 0, and then an incremental backup on days 1-n (seven days in my case). When n is reached, it would do a new full backup replacing the old one, then replace the daily incremental backups one by one; rinse, repeat. Usually such a script creates a separate backup folder for each day.
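
Something along these lines is what I'm picturing (an untested sketch on my part, with placeholder paths; strictly speaking, comparing each day against the full copy makes the dailies differentials rather than true incrementals):

#!/bin/sh
# Untested sketch of the rotation I have in mind; paths are placeholders.
SRC=/path/to/source/
DEST=/path/to/backups
DAY=$(( $(date +%s) / 86400 % 7 ))      # which slot of the 7-day cycle we are in

if [ "$DAY" -eq 0 ]; then
    # day 0: refresh the full backup
    rsync -a --delete "$SRC" "$DEST/full/"
else
    # days 1-6: replace that day's folder with files changed since the full backup
    rm -rf "$DEST/day$DAY"
    rsync -a --compare-dest="$DEST/full/" "$SRC" "$DEST/day$DAY/"
fi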

 

I used SyncBackPro to do this, but I'm on Linux Mint now. I tried this script on my desktop, but it gives two errors when run:

 

!/bin/sh: not found

 

and

 

rsync: change_dir "/run/user/1000/gvfs/smb-share:server=themonolith.local,share=rest/Personal/" failed: Permission denied (13)

Link to comment

Got the script linked in the OP to work. The first error was caused by a newbie mistake and/or my system not having the right shell (bash, sh, dash, or something like that). The fix was simple: adding "sh" to the beginning of the command.
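
In case it helps anyone else, the usual culprit for that first error is a shebang missing its leading # (i.e. the script's first line reads !/bin/sh instead of #!/bin/sh), or the script not being handed to a shell at all:

head -1 rsync-daily.sh     # first line should read "#!/bin/sh", not "!/bin/sh"
chmod +x rsync-daily.sh    # needed if you want to run it as ./rsync-daily.sh
sh rsync-daily.sh /path/to/source /path/to/destination   # or hand it to sh explicitly, as I did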

 

I did the initial seeding and added the following to my crontab to see if it works. It should run a sync at 4am every day and re-seed the backup once a week. I've stored the script in a separate directory, and I will have different scripts for my weekly and monthly backups.

 

I'll report back after I've confirmed this works.

 

0 4 * * * sh /home/[username]/Scripts/rsync-daily.sh /run/user/1000/gvfs/smb-share\:server\=[servername].local\,share\=[backup source] /media/[username]/[backup destination] >> /home/[username]/Scripts/[logfile] 2>&1

Link to comment

I have a version that does a daily backup to a dated directory.

The daily backup uses --link-dest so that, every day, unchanged files in the current backup are hard links to the prior day's copies.

Then an incremental rsync occurs from the source.

 

This has the benefit of keeping a FULL backup for every date while only the incremental changes take up disk space.
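
In a nutshell the idea looks something like this (a simplified sketch, not the actual script; paths and host are placeholders):

#!/bin/bash
# Simplified sketch of the idea; paths and host are placeholders.
SRC="user@server:/path/to/data/"               # pull from the source machine
BASE="/backups/data"                           # backup root on the destination side
TODAY="$BASE/$(date +%Y%m%d)"                  # e.g. /backups/data/20140108
PREV=$(ls -d "$BASE"/2* 2>/dev/null | tail -1) # most recent existing dated directory

if [ -n "$PREV" ] && [ "$PREV" != "$TODAY" ]; then
    # unchanged files in today's directory become hard links into yesterday's
    rsync -a --link-dest="$PREV" "$SRC" "$TODAY/"
else
    # first run: the 1x full seed
    rsync -a "$SRC" "$TODAY/"
fi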

 

Since the output directory name is the output of the date command, any strftime formatting could be used.

 

Currently I name the directory with

 

%Y%m%d

If it were named with %Y-%W it would be year-week number and you would have weekly backups.

 

If it were named as

%Y%m%d-%H then you could have hourly backups, and so on.
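
For example (sample output shown for 4am on 2014-01-08):

date +%Y%m%d       # 20140108    -> one directory per day
date +%Y-%W        # 2014-01     -> one directory per week (%W is the week number)
date +%Y%m%d-%H    # 20140108-04 -> one directory per hour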

 

It requires 1x the space for the initial seed.

It requires whatever space the incremental updates need, depending on how often you run them.

 

# du -hs 2014*
8.7G    20140101
99M     20140102
98M     20140103
104M    20140104
97M     20140105
96M     20140106
96M     20140107
100M    20140108

 

And if you examine it on a per-day basis:

# du -hs 20140102 20140103 20140104
8.7G    20140102
98M     20140103
104M    20140104

# du -hs 20140103 20140104         
8.7G    20140103
104M    20140104

# du -hs 20140104 20140105 20140106 
8.7G    20140104
97M     20140105
96M     20140106

# du -hs 20140106 
8.7G    20140106

 

This allows you to remove the days you do not want to keep, yet still have access to a full backup for any date you do keep.

 

See my script on Google Code, rsync_linked_backup.sh, for examples of how it works.

Note that it expects to run on the output (destination) side, i.e. it pulls the files.

 

What I have never implemented in this particular script is a purging routine.

I.e. I leave it up to the end user to remove whatever directories they choose.

 

Eventually I'll have something; however, it requires converting a date to a number of days, and some external tools to do that. I have them as bash loadables, but I've not been able to package them up for use yet.
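
For what it's worth, GNU date can already do that conversion without the loadables. A rough sketch, assuming directories named YYYYMMDD:

#!/bin/bash
# Rough sketch: age of a YYYYMMDD-named directory in days, via GNU date.
dir=20140102
dir_epoch=$(date -d "${dir:0:4}-${dir:4:2}-${dir:6:2}" +%s)  # directory name -> epoch seconds
now_epoch=$(date +%s)
age_days=$(( (now_epoch - dir_epoch) / 86400 ))
echo "$dir is $age_days days old"
[ "$age_days" -gt 64 ] && echo "$dir is a purge candidate"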

 

Link to comment

I have a version that does a daily backup to a dated directory.

The daily backup uses --link-dest so that, every day, unchanged files in the current backup are hard links to the prior day's copies.

Then an incremental rsync occurs from the source.

 

This has the benefit of keeping a FULL backup for every date while only the incremental changes take up disk space.

 

Do I understand correctly that I could go to any arbitrary date's directory, and do a cp of it to recover a full backup, thanks to the hardlinks?

 

Does it also include hardlinks to prior incremental backup(s) when necessary? An incremental backup only includes files which differ from the previous incremental backup, not from the full backup. For example, say FileA was changed on Day 3 and FileB on Day 5, with daily backups. For things to work correctly, the backup on Day 5 would include a copy of FileB, a hardlink to FileA of Day 3, and a hardlink to the full backup for all other files.

 

Or does the new backup on Day 5 include copies of all files that have changed since the last full backup? In the example, a copy of both FileA and FileB, and hardlinks to the full backup for all other files. If so, it's a differential backup, which is a whole other beast :) An incremental backs up everything changed since the last incremental backup; a differential backs up everything changed since the last full backup.

 

The main problem with rotating incremental backups is that to recover the data you have to restore each backup instance in turn, i.e. if you have a daily rotation refreshed every week, you would need up to seven recovery runs in sequence. While seven is probably easy enough, this can get really tedious and error-prone if you have a large number of backups (e.g. daily backups refreshed only monthly). With a differential you only need the full backup and the latest differential, so recovery is easier.

 

One reason to rotate backups is temp files or other files which change frequently. Without rotation your backup set would get larger and larger with incremental backup, and even faster with differential. How can you limit size requirements in your scheme with lots of temp files?

Link to comment

Every date or 'version' is a full backup.

This comes with the added benefit of saving space through hard links to files that have not changed.

 

For every file that does not change, the current file in today's directory is a hard link to the prior day's copy.

To do a restore to one particular day, one would only need to go to that particular day.
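
For example, to pull things back out of a given day (paths here are made up):

# restore the whole tree as it looked on Jan 4th (made-up paths)
rsync -a /backups/data/20140104/ /where/to/restore/
# or grab a single file back
cp -a /backups/data/20140104/some/file.odt ~/restored-file.odt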

 

 

Or does the new backup on Day 5 include copies of all files that have changed since the last full backup?

No.

 

 

backup on Day 5 would include a copy of FileB, a hardlink to FileA of Day 3, and a hardlink to the full backup for all other files.

Yes.

 

 

One reason to rotate backups is temp files or other files which change frequently. Without rotation your backup set would get larger and larger with incremental backup, and even faster with differential. How can you limit size requirements in your scheme with lots of temp files?

 

 

I think you might have to use the --delete option.

That deletes all files on the backup destination that do not exist in the current data being rsynced.  I'm not sure I use this option in my script.

Link to comment

Every date or 'version' is a full backup.

This comes with the added benefit of saving space through hard links to files that have not changed.

 

For every file that does not change, the current file in today's directory is a hard link to the prior day's copy.

To do a restore to one particular day, one would only need to go to that particular day.

 

 

Or does the new backup on Day 5 include copies of all files that have changed since the last full backup?

No.

 

 

backup on Day 5 would include a copy of FileB, a hardlink to FileA of Day 3, and a hardlink to the full backup for all other files.

Yes.

 

 

One reason to rotate backups is temp files or other files which change frequently. Without rotation your backup set would get larger and larger with incremental backup, and even faster with differential. How can you limit size requirements in your scheme with lots of temp files?

 

 

I think you might have to use the --delete option.

That deletes all files on the backup destination that do not exist in the current data being rsynced.  I'm not sure I use this option in my script.

 

Ok, got it now. Sounds like the hardlinks would fix one of the main drawbacks of incremental backups, i.e. the need to recover each and every backup instance in sequence.

 

But the backup set will continue to grow with each backup run. If you have a lot of changing files (especially large ones) or temp files, you will run out of space sooner or later. The --delete option would alleviate this, but not for files that change rather than disappear.

 

For example, I store Photoshop files which are 1GB+, and edit them over several days. Each edit would create a new copy of the file for that day. Purging the old ones might work, but how would you automate that? For this kind of use I think rotating backups are a good approach.

 

--delete also risks losing a file due to user error within a day, which is another reason to do rotating backups. I guess you could set it so that it deletes files only after n days, though.

Link to comment

 

rsync with the hardlink destination creates a new directory for that backup, hard-linking unchanged files from the previous directory.

Now if you edit the same file over 7 days and you keep 30 days of backups, you will have all 30 days of backups, with the 7 changed versions of that file stored among them.

 

If you choose to only keep 14 days, then so be it. It's all based on how many days of full backups you want to keep.

 

I have a script that keeps:

 

Every year (first of the year) forever.

Every month (first of the month) for the last 6 months.

Every week (Sunday) for the last 12 weeks.

Every day for the last 64 days.

 

Since I name my directories with YYYY-MM-DD dates, I can parse the directory name, take the date apart, and use strftime or epoch date values to determine what to purge.
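
Just to illustrate the idea, a purge pass along these lines could enforce that schedule (a rough sketch using GNU date, with the retention numbers from the list above; not the script I actually run):

#!/bin/bash
# Sketch: purge YYYY-MM-DD directories per the retention rules above.
BASE=/backups/data            # placeholder backup root
now=$(date +%s)

for dir in "$BASE"/????-??-??; do
    name=$(basename "$dir")
    epoch=$(date -d "$name" +%s 2>/dev/null) || continue   # parse the directory name
    age=$(( (now - epoch) / 86400 ))
    dom=$(date -d "$name" +%d)                             # day of month, 01-31
    dow=$(date -d "$name" +%u)                             # day of week, 7 = Sunday
    keep=no
    [ "$(date -d "$name" +%m%d)" = "0101" ] && keep=yes    # Jan 1st: keep forever
    [ "$dom" = "01" ] && [ "$age" -le 183 ] && keep=yes    # first of the month, ~6 months
    [ "$dow" = "7" ]  && [ "$age" -le 84 ]  && keep=yes    # Sundays, 12 weeks
    [ "$age" -le 64 ] && keep=yes                          # every day for the last 64 days
    [ "$keep" = "no" ] && echo rm -rf "$dir"               # echo first; drop the echo once trusted
done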

 

For my VMware hosts the directories are named as

 

YYYY-WW, where WW is the week number, so I have daily backups that get overwritten until the next week number arrives.

Then it starts a new YYYY-WW

 

For source directories I have hourly backups

 

YYYY-MM-DD-HH, again purging based on whatever the application needs.

 

So currently, the only logic missing from the DATED mechanism is the purging mechanism.

 

This is how I prefer to do it. I'm not fond of rotating backups because I need to keep certain backups of certain days around longer.

 

--delete also risks losing a file due to user error within a day, which is another reason to do rotating backups. I guess you could set it so that it deletes files only after n days, though.

 

I believe this would only work on the current date's backup. It's not going to go back and delete the previous day's backup (as far as I know), but that remains to be tested.

 

I have a few bash loadable plugins to make available. One is strftime for formatting the date; the other is strptime for converting a formatted date to epoch time.  Once you have the epoch time, you can do arithmetic to determine how old a directory is and remove it.

 

You could use the stat command to get the mtime of a specific file and use that as a semaphore or seed file to determine a backup's date.

 

I.e. touch a file at the start of the backup and another at the end, then use stat to pull the mtime of either file and do whatever you need with it.
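
For example (GNU stat; the marker file names and variables are just placeholders from the earlier sketch):

touch "$TODAY/.backup-started"                  # stamp the start of the run
rsync -a --link-dest="$PREV" "$SRC" "$TODAY/"
touch "$TODAY/.backup-finished"                 # stamp the end of the run

# later on, read the finish time back as epoch seconds and do date math with it
finished=$(stat -c %Y "$TODAY/.backup-finished")
echo "last backup finished $(( ( $(date +%s) - finished ) / 86400 )) days ago"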

 

Probably more than you want to do, so rotating your backup directories numerically may be preferable.

I like dated directories so I can go right to a point in time.

 

In some applications I need to keep the 8th, 15th and 25th day of a monthly set longer than others.

So I need a way of determining that easily.

 

Maybe one day I'll write a C program to test a date and provide some measure of its age and purge candidacy.

Link to comment

I have a script that keeps:

 

Every year (first of the year) forever.

Every month (first of the month) for the last 6 months.

Every week (Sunday) for the last 12 weeks.

Every day for the last 64 days.

 

That's a pretty extensive backup scheme. For my needs (only personal data) that's probably overkill.

 

When I only had my desktop to back up, I used to keep:

- a daily backup, with the full backup rotated weekly ("online", i.e. attached to the computer, not on the internet)

- a weekly full backup, with deleted files purged after three months (offline)

- a monthly full backup, with deleted files purged after six months (offline, offsite)

- Crashplan

 

Now with an unRAID box, the online backup would be from the array to my desktop, which is the topic of the OP.

 

I'm considering making both the weekly and monthly backups offsite now. That way I would always have at least one full backup offsite in case of fire or theft.

 

Haven't looked at how to purge aging files, but your scripts sound like they might be up to the task.

Link to comment
That's a pretty extensive backup scheme. For my needs (only personal data) that's probably overkill.

 

Keep in mind, if nothing changes, very little disk space is used.

For programmers, I've been asked to go back weeks and months.

For other customer-related data, I've been asked to go as far back as 6 months to a year.

 

If used the right way with dated directory names, there is hardly any thinking involved; you can go right to a date and target the file.

 

With the YYYY-WW (week number) naming it was easy to get a daily backup that was kept weekly without any thought to it.

 

Can't even stress enough how this saved my butt.  After Hurricane Sandy, I was able to access key documents and updates through this mechanism even though half the server was under salty, fuel-laden sewage water. LOL!

Link to comment

Can't even stress enough how this saved my butt.  After Hurricane Sandy, I was able to access key documents and updates through this mechanism even though half the server was under salty, fuel-laden sewage water. LOL!

 

Ouch! That's a compelling argument for offsite backups.

Link to comment
