http://samba.org/rsync/) has a couple of issues when mirroring large directory trees (more than 100K files).
rsync's memory usage is directly proportional to the number of files in a tree, so mirroring a large tree takes a large amount of RAM.
rsync can recover from previous failures, but it always builds the complete list of files to transfer up front. If the connection fails before that list is complete, the mirror makes no forward progress.
The solution? Chop up the workload: use Perl to recurse through the directory tree, building smallish lists of files, and transfer each list with rsync. Most of the time these small batches transfer fine; if one fails, the script looks for that specific failure and retries that batch a couple of times before giving up.
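The batching-and-retry idea can be sketched roughly as follows. This is a minimal Python illustration, not the Perl script itself. The function names (`batched_files`, `run_rsync`, `transfer_with_retries`, `mirror`) are hypothetical, and the batch size, the default of two retries, and the choice of which rsync exit codes count as "retryable" (10 socket I/O error, 12 protocol data stream error, 30 and 35 timeouts) are assumptions made for the sketch. Each batch is handed to rsync through its `--files-from` option, with `--from0` so that file names containing newlines survive.

```python
import os
import subprocess
import tempfile
from typing import Callable, Iterator, List, Sequence

# ASSUMPTION: rsync exit codes treated as transient network/protocol trouble.
# 10 = socket I/O error, 12 = protocol data stream error,
# 30 = timeout in data send/receive, 35 = timeout waiting for daemon connection.
RETRYABLE_EXIT_CODES = {10, 12, 30, 35}


def batched_files(root: str, batch_size: int = 1000) -> Iterator[List[str]]:
    """Walk `root` and yield lists of at most `batch_size` paths relative to `root`."""
    batch: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a deterministic order
        for name in sorted(filenames):
            batch.append(os.path.relpath(os.path.join(dirpath, name), root))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def run_rsync(src: str, dest: str, files: Sequence[str]) -> int:
    """Transfer one batch via rsync --files-from; return rsync's exit code."""
    with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as fh:
        fh.write("\0".join(files) + "\0")  # NUL-separated list, read with --from0
        list_path = fh.name
    try:
        cmd = ["rsync", "-a", "--from0", f"--files-from={list_path}", src, dest]
        return subprocess.run(cmd).returncode
    finally:
        os.unlink(list_path)


def transfer_with_retries(
    batch: List[str], attempt: Callable[[List[str]], int], max_retries: int = 2
) -> bool:
    """Try a batch once, then retry up to `max_retries` times on retryable failures."""
    for _ in range(1 + max_retries):
        code = attempt(batch)
        if code == 0:
            return True
        if code not in RETRYABLE_EXIT_CODES:
            return False  # not a failure we know how to recover from; give up now
    return False  # retries exhausted


def mirror(
    src: str,
    dest: str,
    batch_size: int = 1000,
    max_retries: int = 2,
    runner: Callable[[str, str, Sequence[str]], int] = run_rsync,
) -> List[List[str]]:
    """Mirror `src` to `dest` batch by batch; return the batches that never succeeded."""
    failed: List[List[str]] = []
    for batch in batched_files(src, batch_size):
        if not transfer_with_retries(batch, lambda b: runner(src, dest, b), max_retries):
            failed.append(batch)
    return failed
```

A call such as `mirror("/data/src", "backup:/data/dst")` (hypothetical paths) walks the source tree and returns the batches that still failed after retrying. Because `--files-from` implies `--relative`, each batch lands under the same relative paths at the destination, so the batches add up to a complete mirror. Memory use stays bounded by the batch size rather than the size of the whole tree. The `runner` parameter exists only so the transfer step can be swapped out, for example in tests.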