Crashplan Performance


Hi Guys,

 

After literally several years of waiting and tinkering, I finally have two UnRaid boxes and have Crashplan running on both.  They're pretty standard boxes, copying the hardware of the "Official builds" that Tom provides.

 

Just wanted to share for those that are interested - both wired to a gigabit wireless router I get one backing up to the other at about 40Mbps, sometimes up to 150Mbps.  Not sure why it varies so much, but if you were estimating transfer time I'd estimate about 300 Megabytes per minute should be pretty achievable.  That's about 60 hours per TB.  Remember that Crashplan encrypts, versions, etc. so pure bandwidth isn't the only factor.

 

Going up to the cloud is obviously much, much slower.  Comcast business class looks like about 4Mbps, roughly ten times as long: 600 hours per TB (25 days).
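For anyone wanting to redo these estimates with their own numbers, here is a minimal sketch of the arithmetic. It assumes a sustained rate in megabits per second and decimal terabytes (10^12 bytes); the function names are made up for illustration and have nothing to do with Crashplan itself.

```python
def hours_per_terabyte(mbps: float) -> float:
    """Hours needed to move 1 TB (10**12 bytes) at a sustained rate in megabits/s."""
    bits_per_tb = 8 * 10**12
    seconds = bits_per_tb / (mbps * 10**6)
    return seconds / 3600


def days_for_backup(size_tb: float, mbps: float) -> float:
    """Days needed for a backup of size_tb terabytes at a sustained rate."""
    return size_tb * hours_per_terabyte(mbps) / 24


if __name__ == "__main__":
    # Rates taken from the post: 40 and 150 Mbps locally, about 4 Mbps to the cloud.
    for label, rate in [("LAN (typical)", 40), ("LAN (best case)", 150), ("Cloud uplink", 4)]:
        print(f"{label}: {hours_per_terabyte(rate):.0f} h/TB, "
              f"15 TB in {days_for_backup(15, rate):.1f} days")
```

At a steady 40 Mbps this works out to roughly 56 hours per TB, which is in line with the 60-hour figure, and to about 35 days for the full 15 TB.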

 

Any ways to boost my local backup performance so I can backup my 15 TB from one box to the other and then take the backup offsite?  :-)

 

Any way to get it to perform both backups at the same time?  Seems I can only do cloud or local, can't do both at the same time.

 

Thanks,

 

Russell

Link to comment

Here is what I would try.

1. Put them on a dedicated network. I realize this is kind of pointless if you want to access the data on them, though.

2. Make sure you have data de-duplication set to minimal.

3. http://support.crashplan.com/doku.php/recipe/speeding_up_your_backup

 

Good luck backing up 15TB to Crashplan over your 4 Mbps upload. You will need lots of patience, especially with Crashplan's variable upload speeds. Right now a faster connection won't even help, since I have only been getting 1.5-3 Mbps to their servers recently.

 

It sounds like you are on the right path having two servers, doing the initial backup locally and then sending it offsite. Once it is offsite, the incremental backups will go over your upload and won't be affected by Crashplan's servers. Just put that server on a connection that has a faster download than your upload.

Link to comment

Just wanted to share for those that are interested - both wired to a gigabit wireless router I get one backing up to the other at about 40Mbps, sometimes up to 150Mbps.  Not sure why it varies so much, but if you were estimating transfer time I'd estimate about 300 Megabytes per minute should be pretty achievable.  That's about 60 hours per TB.  Remember that Crashplan encrypts, versions, etc. so pure bandwidth isn't the only factor.

 

What are you using to measure the speed? The Crashplan client doesn't measure 'wire speed' but rather the speed at which it's churning through the data.

 

Therefore it would report a lower throughput rate (closer to your actual network transmission speed) on data that it can't compress. Compressible or dedupeable  files will have the client report a much higher 'file throughput' rate as it's pipelining the data more efficiently. This could explain why you're seeing it vary as it will depend on the data type.

 

Having said that, I have found there is a performance cap in Crashplan at some point. You can throw all the local bandwidth you want at it, but you'll end up gated somewhere and hit a brick wall. I don't really think the client is very good. I suspect it's single threaded, and we all know it's a memory hog! And Java.

 

Any ways to boost my local backup performance so I can backup my 15 TB from one box to the other and then take the backup offsite?  :-)

 

Turn off all the additional features: compression, dedupe, encryption, etc. The fewer of those you have, the closer you'll get to full line speed performance. You can also tweak the network settings to a degree, though I don't know how much use it would be.

 

Any way to get it to perform both backups at the same time?  Seems I can only do cloud or local, can't do both at the same time.

 

Not that I'm aware of. It's not a cloud or local thing - purely a target / backup set thing. You could have 12 local targets set up (i.e. 12 different local clients or 12 different local paths to back up to) and it will only back up sequentially. Backup to Crashplan Central is just another target.

 

I'm guessing - but see my previous comment about the lack of multithreading.

 

 

Link to comment

Oops this kills my Crashplan idea. 

 

Based on this, I guess that Crashplan won't compete with my rsync based backup of one server to another over gigabit LAN.  I had planned on looking at it as an alternative for backups that get taken locally yet stored offsite.  I still move too much data for a WAN update.  I bring the servers onsite and let them back up over gigabit LAN.  However, I was wanting to encrypt my backups so they don't sit offsite in a readable format.

 

With rsync, I can do a backup of 20 TB where little has changed in 2 hours over gigabit LAN.  If there is 1/2 TB of differences, it takes several more hours, but I am amazed how fast it runs.  Definitely orders of magnitude faster than you are describing.

 

Daily differences being backed up vary from only a few gigabytes to 1/2 TB. 

 

Link to comment

Oops this kills my Crashplan idea. 

 

Why? Unless it's posted somewhere else by OP so you can compare - it still depends largely on your hardware.

 

And also on what you're doing with rsync - you can make rsync veeery slow as well depending on what options you give it. By the time you switch everything off in Crashplan (which is then analogous to rsync's default behavior) it will be much faster.

 

Crashplan is also a different thing than rsync. The high level 'my data is in another place' result is the same but they offer different things and do it differently. Crashplan can also use inotify to understand what it needs to backup as opposed to rsync having to crawl the filesystem (which will take a while if you have a large number of files - assuming you're not doing something else to generate a list of files to specifically pass to rsync for backup).

 

Try it.

Link to comment

Well you are the expert, thanks for the tips.  I will try it then. 

 

The only reason I want this is to get an encrypted version of the backup.  Today rsync is trawling all the files and doing a disk to disk backup, so the backup server is a dup of the original server and all disks are at most 95% full.  Some user shares span multiple drives but I don't use them for backing up.  Here are the relevant rsync parameters being used:

 

rsync -av --stats --progress /mnt/disk2/ /mnt/t3disk2/  >> /boot/logs/cronlogs/t3disk2.log

 

disk1 -> disk1

disk2 -> disk2

disk3 -> disk3

etc....
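The disk-to-disk mapping above can be sketched as a small script that builds one rsync command per disk. This is only an illustration: it assumes every destination follows the same pattern as the disk2 example (/mnt/diskN/ to /mnt/t3diskN/, logging to /boot/logs/cronlogs/t3diskN.log), and the helper names are hypothetical.

```python
import subprocess


def build_rsync_jobs(disk_numbers, dest_prefix="t3"):
    """Return (argv, log_path) pairs, one per disk, for /mnt/diskN/ -> /mnt/t3diskN/."""
    jobs = []
    for n in disk_numbers:
        source = f"/mnt/disk{n}/"
        target = f"/mnt/{dest_prefix}disk{n}/"  # assumed naming, copied from the disk2 example
        log_path = f"/boot/logs/cronlogs/{dest_prefix}disk{n}.log"
        argv = ["rsync", "-av", "--stats", "--progress", source, target]
        jobs.append((argv, log_path))
    return jobs


def run_jobs(jobs):
    """Run the jobs one after another, appending rsync's output to each log (like >>)."""
    for argv, log_path in jobs:
        with open(log_path, "a") as log:
            subprocess.run(argv, stdout=log, check=True)
```

Calling run_jobs(build_rsync_jobs([1, 2, 3])) would copy the three disks in sequence; nothing is executed until run_jobs is called, so the command list can be printed and checked first.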

 

How does crashplan handle the disk swap.  I assume it will decide where to put things?  Will it fill a drive to overflowing before going to the next drive?  Or can I just do a similar backup to how it is working with rsync? 

 

(Most of the files being backed up are pictures either jpg, 25 megabyte raw image files or 200 megabyte sized tiffs.  )

 

Link to comment

Well you are the expert, thanks for the tips.  I will try it then. 

 

Far from an expert..believe me!

 

How does crashplan handle the disk swap.  I assume it will decide where to put things?  Will it fill a drive to overflowing before going to the next drive?  Or can I just do a similar backup to how it is working with rsync? 

 

Crashplan asks you where you want to store incoming backups. You give it a directory.

 

Within that directory it will create a structure based on the Crashplan ID of your machine and then what I think is the 'virtual block / byte range' it uses internally.

 

So you may end up with something like (numbers made up) :

 

/path/to/crashplan/backup/dir/25972358625256/cpbf0000000000000000000/

/path/to/crashplan/backup/dir/25972358625256/cpbf0000000000000259722/

 

Inside each of those directories will be a 4 gigabyte file (and a couple of other bits of metadata) that contains your backup data. You'll get another directory, with a byte / block id, for every 4 gigs you back up.
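To get a feel for how many of these directories a large backup produces, here is a rough sketch. It assumes the "4 gigabyte" figure means 4 x 10^9 bytes per data file and that the size passed in is what actually lands on disk (compression and dedupe would shrink it); the function name is made up.

```python
CHUNK_BYTES = 4 * 10**9  # assumption: one data file of about 4 decimal gigabytes per directory


def estimated_chunk_dirs(stored_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Estimate how many directories are needed for stored_bytes of backup data."""
    if stored_bytes <= 0:
        return 0
    # Ceiling division without floating point.
    return -(-stored_bytes // chunk_bytes)


if __name__ == "__main__":
    print(estimated_chunk_dirs(15 * 10**12))  # a 15 TB archive
```

Under those assumptions a 15 TB archive comes to a few thousand directories, which is why letting a user share spread them across disks is practical.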

 

What this means is that :

 

- You're not getting the same as rsync. Your backup isn't in 'plain text'. You can't just see the files via the filesystem and manipulate them from there. The files are in Crashplan's proprietary format and, especially if you're using compression, dedupe or encryption, are very obfuscated. You need the Crashplan client to make sense of it.

 

- As it's a directory and a bunch of subdirs, you can use the user share spanning in unRAID to let the backup span disks.

 

- As crashplan only takes a single directory as the backup location I'm not sure you can do a manual mapping of disk1 -> disk1 etc.

 

It will also be slower. Your rsync isn't doing much other than copying the data; if you enable encryption in Crashplan you will see a big hit from the overhead. If you're also using user shares, you'll get the overhead of those too.

Link to comment

Hi Guys,

 

I'm the OP.  Let me know if you have any questions and I'll see if I can find the answers for you in my current setup.  I've been afraid to tamper much as I've literally waited years to get this working.  While I am a programmer, I'm not a Linux guy at all - and couldn't get it working for ages.  As it happens, I had a rare issue that was recently solved.  I bought a four year CrashPlan package on a special promo and I've burned up 9 months of it before I got this working.

 

I could never figure out how to do the rsync setup.

 

I'm glad kbowman confirms my plan is decent - backup one unraid to another unraid with CrashPlan on my local network - then take the encrypted one offsite (family/friend's house).  They should be able to stay in sync pretty well, I hope.  Super critical things I'll send up to CrashPlan's Cloud also (may as well, I'm paying for it.)  Just FYI, the "unencrypted option" in CrashPlan doesn't mean the data will end up unencrypted - I guess it means the transfer is not encrypted.  (No way to accomplish an unencrypted copy, which is really what I preferred, using CrashPlan.)

 

Yes, Boof, I was using the CrashPlan client - wire speed doesn't really matter, does it?  It's the total time from reading disk to saved on disk at the destination that matters.  I was just trying to give others a heads-up on what they might expect for "overall transfer time."  I will try your performance ideas when I get some time.

 

Tr0910, I have the basic hardware...  Not wanting hassles, I've bought both my UnRaid boxes following the standard hardware that Tom provides in his prebuilt systems at the time I bought them.  One is about three years old, I think.  The other just under a year.  You can likely find the forum threads for my builds - definitely my first build is well documented here; several people copied me since I described every tiny step to build it.  They aren't powerful, just the basics, so you may get better performance - almost surely for your local transfer.  Somewhere I thought I read that you could also now bind multiple NICs in UnRaid for boosted performance locally.

 

Tr0910, to continue...  On my "Destination Unraid" system I simply created one share - creatively named "backup" then pointed CrashPlan to that location as the destination for incoming backups.  UnRaid handles spanning the data over multiple drives as CrashPlan pours the data into the share.  This was a simple way to set it up - since it's encrypted it didn't really make sense to keep family photos in a separate folder from taxes, etc.  CrashPlan client shows the files in the same folder structure as the original location.  Boof described well what the "backup" share looks like if you browse it - a whole bunch of numbered folders and numbered files and some cpbf (which I assume means "CrashPlan Backup File").  Boof is right, as far as I can tell - on the "destination" machine in CrashPlan you can only choose a single folder as the backup location - oddly the "Inbound Backup Settings" dialogue box calls it "Default backup archive location" as if you can choose more than one, but I can't find a way to do it.

 

Hope that helps - again, if you have any questions, I'll try to research them.

 

Russell

 

 

 

Link to comment
What this means is that :

 

- You're not getting the same as rsync. Your backup isn't in 'plain text'. You can't just see the files via the filesystem and manipulate them from there. The files are in Crashplan's proprietary format and, especially if you're using compression, dedupe or encryption, are very obfuscated. You need the Crashplan client to make sense of it.

 

- As it's a directory and a bunch of subdirs, you can use the user share spanning in unRAID to let the backup span disks.

 

- As crashplan only takes a single directory as the backup location I'm not sure you can do a manual mapping of disk1 -> disk1 etc.

 

It will also be slower. Your rsync isn't doing much other than copying the data; if you enable encryption in Crashplan you will see a big hit from the overhead. If you're also using user shares, you'll get the overhead of those too.

 

Perhaps I should try it without encryption first.  Might be obfuscated enough.  Then if it's fast enough, I can always try the encryption method.

 

Even though I don't use user shares at all for backups, it sounds like I will be forced to create at least one on the destination system, called backup, to allow it to span all the disks.

Link to comment

Interesting thread.  Hadn't even thought about using Crashplan, since I don't think a cloud backup is viable with a large UnRAID store -- my 25TB of data and 1Mb upload rate would take YEARS to upload ... and would probably realistically never get up-to-date.    Didn't realize Crashplan allowed local backup destinations ... in which case a 2nd UnRAID makes a very good backup choice.

 

Just curious:  What do you see as the advantages of this vs. simply using a free sync utility like SyncBack, SyncToy, etc. to do the backups?  SyncBack works very well for backups, and even supports encryption of the backups (although I don't use that feature).    I use it for backing up all of my non-UnRAID systems (I currently use a different approach to UnRAID backups; but a 2nd server is an attractive alternative that I'm considering).

 

Link to comment


Interesting thread.  Hadn't even thought about using Crashplan, since I don't think a cloud backup is viable with a large UnRAID store -- my 25TB of data and 1Mb upload rate would take YEARS to upload ... and would probably realistically never get up-to-date.    Didn't realize Crashplan allowed local backup destinations ... in which case a 2nd UnRAID makes a very good backup choice.
 
Crashplan also allows you to use your own offsite targets as well. Analogous to backing up to their crashplan central servers (I refuse to use the word cloud as once you scratch the surface of their system I can't see much elastic about it) but with your own tin.
 
At that point, coupled with 'pre-seeding' for want of a better word, you can have the best of both worlds presuming your daily churn / change rate is workable.
 
To mould to your example above you could build a second crashplan server and, locally over your LAN, transfer your 25TB to it.
 
You could then physically take this server offsite to somewhere else that is on the end of an internet connection (friend, family, workplace, paid hosting in a rack) and then you would just be transferring the daily deltas across - which could be much more palatable.
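Whether the daily deltas are "workable" comes down to simple arithmetic, sketched below. It assumes a sustained uplink rate in megabits per second and decimal gigabytes; the function names are made up for illustration.

```python
def hours_to_upload(delta_gb: float, uplink_mbps: float) -> float:
    """Hours needed to push delta_gb gigabytes over an uplink of uplink_mbps megabits/s."""
    bits = delta_gb * 10**9 * 8
    return bits / (uplink_mbps * 10**6) / 3600


def max_daily_delta_gb(uplink_mbps: float, hours_available: float = 24.0) -> float:
    """Largest daily change (in GB) the uplink can carry within hours_available."""
    bits = uplink_mbps * 10**6 * hours_available * 3600
    return bits / 8 / 10**9


if __name__ == "__main__":
    # Uplink rates mentioned in this thread: 4 Mbps and 1 Mbps.
    for rate in (4, 1):
        print(f"{rate} Mbps carries at most {max_daily_delta_gb(rate):.1f} GB per day")
    print(f"5 GB at 4 Mbps: {hours_to_upload(5, 4):.1f} h")
    print(f"500 GB at 4 Mbps: {hours_to_upload(500, 4):.0f} h")
```

A few gigabytes a day fits comfortably on a 4 Mbps uplink, while a half-terabyte day would take well over a week to catch up on, so the pre-seeded server only stays in sync if the churn is on the small side.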
 
I would also point out that it is worth really considering how much data you need to back up. I don't wish to come across as presumptuous about your data (it could be 25TB of your children growing up in pictures / video, after all), but I personally found that a lot of the stuff I have stored I don't really need to back up. It would be easy to regenerate should the worst happen. Of course there is a trade-off to consider here: the time it might take to do that (e.g. re-ripping all media) against the cost and practicality of the backup system. But you can manually move that balance point.
 
As a middle ground I do store 'metadata' about what I have and back that up - e.g. a generated text file listing the data I have - to help me know what I actually have to regenerate should the worst happen.
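A listing like that can be generated with a few lines of script. The sketch below is one possible way to do it, not the poster's actual method: it walks a directory tree and writes one line per file (size in bytes, a tab, then the relative path). The function name and output format are assumptions.

```python
import os


def write_manifest(root: str, manifest_path: str) -> int:
    """Write one line per file under root (size, tab, relative path); return the file count."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(full)
                except OSError:
                    continue  # skip files that vanish or cannot be read
                out.write(f"{size}\t{os.path.relpath(full, root)}\n")
                count += 1
    return count
```

Write the manifest somewhere outside the tree being listed (otherwise it will list itself), then include that small file in the backup set in place of the bulky media.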
 
Everyone is different though.
 
Whilst it would work, if I was looking to use crashplan *only* for local backups there would perhaps be more incentive to use something else. It still brings benefits but how much purely for local I'm not sure. Once you start mixing local with remote etc then it becomes a different game.
 

Just curious:  What do you see as the advantages of this vs. simply using a free sync utility like SyncBack, SyncToy, etc. to do the backups?  SyncBack works very well for backups, and even supports encryption of the backups (although I don't use that feature).    I use it for backing up all of my non-UnRAID systems (I currently use a different approach to UnRAID backups; but a 2nd server is an attractive alternative that I'm considering).
 
I haven't used SyncToy etc. so I suppose I'm not in a position to compare directly (though I can guess at how they work). Here is a quick list of my personal likes and dislikes around Crashplan. You can weigh them against what you know about SyncBack etc. if you like. Feel free to ask for more detail (here or by PM).
 
Likes
--------
 
  • Flexibility. As above with one single client I can backup to multiple remote locations, some under my control, accept remote backups from others, backup locally. I currently backup offsite to my parents house and to my workplace (fortunate enough to have oodles of 'free' bandwidth and storage to backoff onto). At the same time all the other machines in my house backup into crashplan on unraid. All this through one client.
  • Feature set. Encryption. Dedupe. Compression. Those are the big ones for me, but it does many other things others might find useful
  • Coupled with the above, which could be a like or dislike depending on what your stance is, obfuscation of the target data. Data is stored in Crashplan's own format, which means that for remote backups, coupled with encryption, it's difficult (impossible?) for anyone snooping to see what your data is. Even Crashplan, if you're careful about how you implement encryption. This is much more desirable for me than having just a normal copy of a file tree at an offsite location - especially if I don't fully control that location.
  • Other than a few bobbles it does seem to 'just work'. Which is nice. Whilst you might be able to replicate it yourself using rsync and some wrapper scripts - you need to manage it all yourself. And likely fettle and prod it now and again to keep it going. I don't have any time or patience for that any more.
  • Cross platform. It's java (see below as well..) so runs on pretty much anything. Official support for platforms is good and covers the main ones - but I have had it running on things not listed. Like FreeBSD. Flexibility again I suppose
     

 

I think the biggest one to take from the above is flexibility

 

Dislikes
-----------

  • Java. Yuck. Resource hog. Usual java issues of arbitrary reliance on jvm versions. Crashplan supply you (or can do) with a supported stack to sit and use exclusively with crashplan which gets round this. But still. I appreciate it brings cross platform benefits which might mean suffering java is worth it. But sometimes I think it would be nice to see a nice streamlined C client humming along.
  • Whether or not you use their storage, everything revolves around Crashplan's online portal / central service. Your client is always signed in there. You have a constant heartbeat connection. I understand the reason for this and it brings benefits, but it also brings reliance on Crashplan when there may be no real need for it. It also makes me nervous about what data they're actually getting from you. Even if you're not using their storage and are encrypting everything fully, the client still leaks data to them - e.g. what paths you're backing up - which may trouble you
  • Alongside the above, many important parts of their documentation can be vague. I don't know if that's to try and keep it simple - or to hide the detail from you on purpose. Tin foil hat or benefit of the doubt?
  • Coupled with the heartbeat you *need* the crashplan client to make sense of your backup data. If crashplan as a company went away tomorrow without doing anything to mitigate access issues you would be stuck - again even if not using their service
  • As hinted above they have had blips in working. For me that's fine, these things happen, but the way it manifested itself was concerning - specifically the client saying it had backed up ok when actually it had done no such thing. That's many times worse than the client spitting out an error and not doing the backup anyway
  • It can be slow. I attribute that to java + heavyweight feature set. This only really applies if you're looking to maximise local performance. It should, generally, be able to work hard enough to saturate a home uplink. Unless you have a very slow machine or a very fast uplink!
  • There is no concept of an overall 'snapshot' of the system. I can't tell the Crashplan client to restore my system as it was 5 days ago (as far as I'm aware). I can do that for individual files but not for the whole lot. This could be problematic if you know you've lost data but don't know what. At that stage being able to say 'put me back to how I was before' is very useful.

 

This is just about the client, not their actual Crashplan Central service (where they store your data), which is a whole different story again. I don't use it, and the price isn't the barrier. I just don't trust them and I don't like how they've implemented the service. That's just my opinion on it though - many people use it with no issues and, to their credit, when they say unlimited they do seem to mean it. If I didn't have my own options to back up offsite, would I use them? Probably - but with great care over the config of the client and the data I exposed to them.

 

Hope this helps. At the end of the day I use it - whilst it may not be a glowing recommendation it counts for something.

Link to comment
