Friday, December 17, 2010

Rsync with 3 machines

In this article I describe how to use rsync to keep a set of files in sync across three servers where users maintain the same list of files on every machine, i.e. a change to a file can happen on any machine. (This differs from a hierarchical sync, where files are pushed from a master server to a set of other machines.)

Rsync at your company

I assume that rsync is installed on all three servers, though not necessarily in the same location: the rsync binary needs to exist on both the local and the remote machine of each pair that talk to each other.
My test setup was two Solaris SPARC servers (they had rsync pre-installed) and one Solaris x86 server where I had to add it manually (I got it from a sunfreeware site and put it into /tools/rsync/bin/rsync; I leave the details to the reader).

Rsync mirror design

To keep proper control of what is synced, the main rsync process should run on one machine only and perform the following steps.

Assume you have 3 machines A, B, C.
  • A->B: update files on machine B which are newer on machine A.
  • B->A: update files on machine A which are newer on machine B.
At this stage A and B are in sync. Now do the same with machine C.
  • A->C: update files on machine C which are newer on machine A.
  • C->A: update files on machine A which are newer on machine C.
Now A and C are in sync, but B could be missing some updates from C, so do:
  • A->B: update files on machine B which are now newer on A (and came from C)
(Of course this algorithm is not 100% safe since the whole sync is not atomic: after A and B have synced, and while A and C are syncing, a change could happen on B that is not picked up in this round.)
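The five steps above can be sketched as a small shell script run on machine A. This is only an illustration under assumptions: the host names hostB and hostC, the variables RSYNC, CONF_DIR and OPTS, and the function names are all hypothetical, and the option list is simplified. By default the sketch only prints the rsync commands instead of running them.

```shell
#!/bin/sh
# Sketch of the five-step sync order, run from machine A.
# Hypothetical names: RSYNC, CONF_DIR, OPTS, push, pull, sync_all.
# RSYNC defaults to "echo rsync", so the commands are printed, not executed.
RSYNC="${RSYNC:-echo rsync}"
CONF_DIR="${CONF_DIR:-/app/foo/conf}"
OPTS="--archive --update --rsh=ssh"

# A -> remote: update files on the remote host which are newer on A
push() { $RSYNC $OPTS "$CONF_DIR/" "$1:$CONF_DIR"; }

# remote -> A: update files on A which are newer on the remote host
pull() { $RSYNC $OPTS "$1:$CONF_DIR/" "$CONF_DIR"; }

sync_all() {
    b=$1
    c=$2
    push "$b"    # A->B
    pull "$b"    # B->A  (A and B are now in sync)
    push "$c"    # A->C
    pull "$c"    # C->A  (A and C are now in sync)
    push "$b"    # A->B again, to forward changes that came from C
}
```

Calling sync_all hostB hostC prints the five commands in order. To run them for real, one would set RSYNC=rsync in the environment before the script is loaded.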

Here is the basic rsync command:
rsync --archive --update --verbose --stats --rsh=ssh --cvs-exclude \
        --files-from=somefile /app/foo/conf/ server2:/app/foo/conf
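The --archive option is a shorthand for a set of options (recursion and preserving almost everything such as timestamps, permissions and symbolic links, but not hard links). Note that when a file list is given with --files-from, --archive does not imply recursion.
The --update option tells rsync to skip files which are newer on the target system.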
The --verbose and --stats options are for reporting only.
The --rsh=ssh option means to use ssh as the login mechanism to the remote system.
The --cvs-exclude option excludes all CVS-related files from being checked. It is basically a filter for certain file names and file extensions.
The --files-from=somefile option names a file which lists all filenames to be checked (nothing else will be); the paths in it are relative to the source directory.
The first argument is the source directory and the second argument is the target machine (server2) and target directory. The two directories may differ: rsync does not require /app/foo/conf to be in the same place on both machines.

There are a couple of noteworthy additional options:
--rsync-path=/tools/rsync/bin/rsync tells rsync where to find rsync on the remote system.
--dry-run tells rsync to do a check only but not do a real file transfer.
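To show how these two options combine with the basic command, here is a minimal sketch of a helper that assembles and prints the command line for one target host. The function name rsync_cmd, the DRY_RUN variable and the host name server3 are assumptions made for illustration; the paths are the ones used in this article, and the file list is passed with rsync's --files-from option.

```shell
#!/bin/sh
# Print (not run) the rsync command for one target host.
# Hypothetical helper: rsync_cmd HOST [REMOTE_RSYNC_PATH]
# Setting DRY_RUN=1 adds --dry-run so that rsync only reports what it would do.
rsync_cmd() {
    host=$1
    remote_rsync=${2:-}
    cmd="rsync --archive --update --verbose --stats --rsh=ssh --cvs-exclude"
    cmd="$cmd --files-from=/tools/rsync/scripts/rsync.files"
    if [ -n "$remote_rsync" ]; then
        # rsync is not in the default location on the remote machine
        cmd="$cmd --rsync-path=$remote_rsync"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        cmd="$cmd --dry-run"
    fi
    echo "$cmd /app/foo/conf/ $host:/app/foo/conf"
}
```

For example, with DRY_RUN=1 set, rsync_cmd server3 /tools/rsync/bin/rsync prints a command that checks server3 without transferring anything. Reviewing the printed command before running it is a cheap way to catch a wrong path.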

Note also that if two files have identical content but different timestamps, rsync will update the timestamps so that they are in sync.


The above design has been implemented as follows:
  • the script is /tools/rsync/scripts/ on server A
  • the list of files to be checked is in /tools/rsync/scripts/rsync.files (a list of config files)
  • password-free ssh access from machine A to the remote machines has been enabled by adding the public key (.ssh/) to each remote machine's .ssh/authorized_keys file

Cron (UNIX tool to run regular jobs)

There is a simple cron job on machine A:
7,17,27,37,47,57 * * * * cd /tools/rsync/scripts; ./ | /usr/ucb/mail -s "rsync `date`"  foo@Bar.COM
i.e. it runs every 10 minutes and sends the output by email to 'foo' (this could be improved).
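One possible improvement is to send mail only when something was actually transferred. The sketch below is an assumption on my part, not what the cron job above does: it reads rsync output (as produced with --stats) on standard input and prints the total number of transferred files. The function name files_transferred is hypothetical. Older rsync versions print "Number of files transferred", newer ones (3.1 and later) print "Number of regular files transferred"; the pattern accepts both.

```shell
#!/bin/sh
# Hypothetical helper: sum the "files transferred" counts found in
# rsync --stats output read from standard input.
files_transferred() {
    awk -F': ' '
        /^Number of (regular )?files transferred:/ {
            gsub(/,/, "", $2)   # newer rsync prints thousands separators
            n += $2
        }
        END { print n + 0 }     # prints 0 when no matching line was seen
    '
}
```

A wrapper script could save the sync output to a temporary file, feed it to files_transferred, and call the mail command only if the result is greater than zero.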

Password free ssh

For the rsync hosts to communicate via ssh without a password, one needs to generate a public/private key pair on the central machine and add the public key to the remote machines.
  • On machine A: generate key files .ssh/id_rsa and .ssh/
ssh-keygen -t rsa        
Generating public/private rsa key pair.
Enter file in which to save the key (/app/foo/.ssh/id_rsa):  [Enter return]
Enter passphrase (empty for no passphrase):  [Enter return]
Enter same passphrase again:  [Enter return]
Your identification has been saved in /app/foo/.ssh/id_rsa.
Your public key has been saved in /app/foo/.ssh/
The key fingerprint is:
24:ab:31:1e:f1:74:16:4d:0f:8e:70:19:1b:31:2e:db foo@machineA
  • Check the public key (this is one line which I wrapped for readability)
cat $HOME/.ssh/
ssh-rsa AABAB3NzaC2yc2EAAAABIwCAAIEAvpzxLumVmSRPKmgwSk9NGPUDcxfFpypUAdi3UGpZ2QSqoak
P62JIVdlDOMsnY/eSYpK+ex/9RomVRa/bMw9b/D/e0= foo@machineA

  • Append this line to .ssh/authorized_keys on the remote machines (B and C)
(Note: the keys above are fake, just in case someone wonders)
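The last step can be scripted so that running it twice does not add the key twice. The following is a minimal sketch to be run on each remote machine once the public key file has been copied there; the function name add_key and its two arguments are hypothetical, and it assumes a grep that supports the POSIX options -q, -x and -F.

```shell
#!/bin/sh
# Hypothetical helper: add_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Appends the public key line only if it is not already present.
add_key() {
    pub=$1
    auth=$2
    dir=$(dirname "$auth")
    # sshd typically rejects keys when these permissions are too open
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$auth" && chmod 600 "$auth"
    # -x: match the whole line, -F: treat the key as a fixed string
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
}
```

It would be called with the copied public key file as the first argument and $HOME/.ssh/authorized_keys as the second.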

This scheme works nicely for systems where there aren't that many changes happening at the same time.
One can easily envision that if two users change the same file at the same time on machines B and C, the subsequent rsync will first copy B's version to A, then C's version to A (assuming C's change is the more recent one), and then forward A's version (now equal to C's) to B, thus wiping out the original change on B. In other words, this scheme does not guarantee data consistency. I am using it to maintain certain config files where there are changes only once or twice per week and only a handful of users have access to the files.
There is also the issue of files being synced while users are updating them. One would need a much more clever approach, such as file locking across multiple systems, to tackle this.

See also the rsync man page (lots of options).
