Linux admins regularly have to keep directories synchronized. Here are two simple use cases. In the first, a backup of important data needs to be maintained at another location (either local or remote), and the backup runs at set intervals. In the second, a web server’s data is mirrored on another server, either for load sharing or for backup purposes. In both cases, there is a “source” and a “destination” that need to be kept in sync.
The preferred tool for this is “rsync”. But why? Why not just use the standard “cp” copy command that Linux provides and be done with it? Here is a list of reasons.
Rsync vs cp (Copy)
Rsync and cp differ in the following ways:
1. Copy only the changes
cp has an option called “-u” which ensures that a file is copied only if the source is newer than the destination, or if the destination is missing. This avoids unnecessary copying and transferring of data. However, rsync goes one step further. When transferring over a network, it sends only those parts of a file that have changed! This means that for very large files with small changes, rsync is remarkably efficient. Think databases with files that can span several GB. (For copies between two local paths, rsync defaults to copying whole files, because the delta calculation saves nothing there.)
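Also note that cp’s “-u” option relies on the timestamp to determine whether or not to copy a file. If the timestamp is the same but the contents have changed, the file will not be copied over. Rsync’s default check also looks at size and modification time, but its “-c” (checksum) option compares the file contents instead.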
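The timestamp limitation is easy to reproduce. The sketch below uses a hypothetical file, data.txt, in a temporary folder, and gives the source and destination copies the same modification time before running cp with “-u”.

```shell
#!/bin/sh
# Sketch: cp -u skips a changed file when the timestamps match.
# All paths and file names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"

echo "old contents" > "$work/dst/data.txt"
echo "new contents" > "$work/src/data.txt"

# Give the source file the same modification time as the destination.
touch -r "$work/dst/data.txt" "$work/src/data.txt"

# The source is not newer, so cp -u leaves the destination untouched.
cp -u "$work/src/data.txt" "$work/dst/data.txt"

cat "$work/dst/data.txt"   # prints: old contents
```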
2. Transfer Efficiency
Rsync is built around efficiency. It uses techniques like pipelining to ensure that the transfers proceed as quickly as possible with minimal overhead.
3. Delete if not in Source
This is an important one. When keeping two directories mirrored, you need to identify files that have been deleted from the source directory, and then remove them on the destination as well (if they exist). Without this, the destination directory will slowly fill up with useless files that no longer exist in the original location. If we want to maintain an exact “mirror” of a directory, this functionality is absolutely crucial. Rsync provides it through its “--delete” option.
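Here is a minimal sketch of mirroring with deletion, assuming rsync is installed. The temporary folder and the files a.txt and b.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: remove files from the destination that no longer exist in the source.
# All paths and file names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dst"

echo "keep me"   > "$work/src/a.txt"
echo "remove me" > "$work/src/b.txt"

# Initial sync: both files land in the destination.
rsync -a "$work/src/" "$work/dst/"

# Delete one file at the source, then sync again with --delete.
rm "$work/src/b.txt"
rsync -a --delete "$work/src/" "$work/dst/"

ls "$work/dst"   # lists only a.txt
```

Since “--delete” removes data, it is worth adding “--dry-run” (or “-n”) on the first attempt to see what rsync would do without changing anything.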
4. Compression and Encryption
Rsync has an option to compress data during the transfer (“-z”), and when it runs over SSH, which is the usual transport for remote copies, the data is encrypted in transit as well. If your information is sensitive, this reduces the chances of man-in-the-middle attacks. And for large data files, compression can be invaluable.
5. Human Readable Output
An underappreciated benefit of rsync is that it can tell you which files it is handling and how many bytes were transferred, and with the “--progress” option it even shows a running percentage for each file. cp, by contrast, is silent by default, and you have no idea what’s happening until the job is done.
6. Creating Target Folders
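It may seem like a small thing, but rsync also creates the destination folder for you if it doesn’t already exist (only the last component of the path; the parent folder must already be there). When you copy several files to a missing folder with cp, it will instead give you an error such as “xyz is not a directory”.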
Using Rsync – It’s Simple!
If you want to sync files and folders with rsync, the command couldn’t be easier. It’s just:
rsync -avh [source folder] [destination folder]
Explanation of options:
- -a: Archive mode. Recursive, keep all permissions, owner, group info, etc;
- -v: Verbose. Let us know what’s going on;
- -h: Human readable file sizes.
Here’s an example of copying a bunch of files and a directory to another folder:
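A run like this can be reproduced with a sketch along the following lines, assuming rsync is installed. The folder and file names are hypothetical and are created in a temporary directory, so the script is safe to try.

```shell
#!/bin/sh
# Sketch: sync a few files and a sub-directory into a new folder.
# All paths and file names here are hypothetical.
work=$(mktemp -d)
mkdir -p "$work/source/subdir"
echo "first"  > "$work/source/file1.txt"
echo "second" > "$work/source/file2.txt"
echo "nested" > "$work/source/subdir/file3.txt"

# The trailing slash on the source means "the contents of this folder".
# Without it, rsync would create destination/source/ instead.
rsync -avh "$work/source/" "$work/destination/"
```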
You can see that it creates the directory, and then gives you an update of each and every file copied over. In addition, it tells you the amount of data sent and received.
Here’s the same example using the “-z” option, which enables compression:
Now the data sent is just 314 bytes instead of 3.7K! That’s a huge saving in bandwidth. You can see that while we might get the same basic job done using cp, rsync is a much more well-rounded tool for keeping two directories in sync. And it’s easy to use as well!