Tuesday Tip: rsync Command to Include Only Specific Files
I find myself using rsync a lot: for moving data around, for creating backups with rsnapshot (yes, even on Windows!), and for mirroring public Open Source projects and repositories.
I used to create all sorts of filters and scripts to make sure I was getting only the files I wanted and needed, but I found a better way, and it wasn’t exactly intuitive.
--include="*/" --include="*.iso" --exclude="*"
For this to work as intended, the “include” patterns have to come before the “exclude” pattern. rsync checks each file and directory name against the patterns in the order you gave them, and the first pattern that matches decides whether that name is included or excluded. If the filename you want matches an exclude pattern first, it is dropped from the transfer and the later include never gets a look at it.
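To make the ordering rule concrete, here is a tiny shell model of "first match wins". It is only a sketch: the `decide` function is my own invention for illustration and not part of rsync, it looks at a single name rather than a full path, and it ignores rsync's anchoring and directory-traversal rules. Each rule is written as "+ PATTERN" for an include or "- PATTERN" for an exclude.

```sh
# decide NAME RULE...
# Simplified, hypothetical model of rsync's "first matching pattern wins" rule.
# Prints "include" or "exclude" for NAME.
decide() {
    name=$1
    shift
    for rule in "$@"; do
        action=${rule%% *}    # "+" or "-"
        pattern=${rule#* }    # the glob after the space
        case $name in
            $pattern)
                if [ "$action" = "+" ]; then
                    echo include
                else
                    echo exclude
                fi
                return
                ;;
        esac
    done
    echo include    # a name that matches no rule is transferred
}

# Includes first: the ISO is matched by "*.iso" before it reaches "*".
decide "CentOS.iso" "+ */" "+ *.iso" "- *"    # prints: include

# Exclude first: "*" matches everything, so the includes are never consulted.
decide "CentOS.iso" "- *" "+ */" "+ *.iso"    # prints: exclude
```

With the includes first, a name like README.txt falls through both includes and is caught by the final exclude, which is exactly the behavior we want.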
When dealing with a very large, possibly unknown remote directory structure, you either have to include all of the remote subdirectories individually like this:
--include="/opt" --include="/var" --include="/home"
Or you can use the following syntax to include all directories (not files) in the scope:
--include="*/"
Once you’ve included every directory below your target scope, you can pass the filespec you’re interested in (in this case, I wanted every bootable ISO file from a remote CentOS mirror), and then you exclude everything else that doesn’t match that filespec. It looks like this:
1.) Include every directory:
--include="*/"
2.) Include *.iso as your intended matching scope:
--include="*.iso"
3.) Exclude everything else:
--exclude="*"
That’s the magic sauce.
Some of these options, and the order they appear in, may seem very non-intuitive, so please read the rsync documentation carefully, paying specific attention to the “EXCLUDE PATTERNS” section of the docs (newer versions of the man page cover the same material under “FILTER RULES”).
When in doubt, always use “--dry-run --stats” to check your work before copying or modifying any data.
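Putting the three filters and the dry run together, a wrapper might look something like the sketch below. The mirror URL, the destination path and the `mirror_isos` function name are all placeholders I made up for this example, so substitute your own. I have also added `--prune-empty-dirs`, which is not part of the three-pattern recipe itself: because `--include="*/"` pulls in every directory, rsync would otherwise create all the empty ones on the destination too.

```sh
# Hypothetical source and destination; substitute your own.
SRC="rsync://mirror.example.org/centos/"
DEST="/srv/mirror/centos/"

# Copy only *.iso files while keeping the directory layout.
# Any extra arguments (for example --dry-run --stats) are passed through to rsync.
mirror_isos() {
    rsync -av --prune-empty-dirs \
        --include="*/" \
        --include="*.iso" \
        --exclude="*" \
        "$@" "$SRC" "$DEST"
}

# Preview first, then run for real once the file list looks right:
#   mirror_isos --dry-run --stats
#   mirror_isos
```

Keeping the filters inside one function means the preview and the real transfer are guaranteed to use the same patterns in the same order.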
Measure twice, cut once.