Using fdupes to Solve the Data Duplication Problem: I’ve got some dupes!
Well, 11.6 hours after kicking off the fdupes scan of the NAS, I can confirm that I’ve got some dupes across my system backups.
# time ./fdupes -R -Sm "/nas/Backups/System Backups/"
2153352 duplicate files (in 717685 sets), occupying 102224.5 megabytes

real    698m15.606s
user    38m20.758s
sys     92m17.217s
That’s 2.1 million duplicate files occupying about 100GB of storage capacity in my backups folder on the NAS. DOH!
Now the real work begins: making sense of what needs to stay and what needs to get tossed.
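To see where the waste is concentrated, one approach is to capture the full per-set listing instead of the summary (drop the -m flag, e.g. fdupes -r -S "/nas/Backups/System Backups/" > dupes.txt) and tally the reclaimable bytes per top-level backup folder. Below is a minimal Python sketch of that tally, assuming the usual fdupes -S listing format (an "NNN bytes each:" header followed by the member paths of each set); the dupes.txt file name and the one-level-deep grouping are just placeholders.

#!/usr/bin/env python3
"""Tally reclaimable duplicate space per top-level backup folder.

Reads a full fdupes listing (e.g. `fdupes -r -S <dir> > dupes.txt`), where each
duplicate set starts with an "NNN bytes each:" line followed by its member paths.
"""
import re
from collections import defaultdict
from pathlib import PurePosixPath

LISTING = "dupes.txt"                         # assumed capture of the fdupes run
ROOT = PurePosixPath("/nas/Backups/System Backups")

wasted = defaultdict(int)                     # reclaimable bytes per top-level folder

def account(size, members):
    # Everything past the first copy in a set is reclaimable space.
    for path in members[1:]:
        try:
            top = PurePosixPath(path).relative_to(ROOT).parts[0]
        except (ValueError, IndexError):
            top = "(outside root)"
        wasted[top] += size

size, members = None, []
with open(LISTING, encoding="utf-8", errors="replace") as fh:
    for raw in fh:
        line = raw.rstrip("\n")
        header = re.match(r"(\d+) bytes? each:", line)
        if header:                            # a new duplicate set begins
            if size is not None:
                account(size, members)
            size, members = int(header.group(1)), []
        elif line:                            # a member path of the current set
            members.append(line)
if size is not None:
    account(size, members)                    # don't forget the final set

for top, nbytes in sorted(wasted.items(), key=lambda kv: -kv[1]):
    print(f"{nbytes / 2**30:8.2f} GiB  {top}")

A report sorted like that should make it obvious which backup sets are worth cleaning up first.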
UPDATE: I may give up on fdupes altogether and jump to rmlint instead. rmlint is significantly faster and has more features. Here’s a sample of the output:
# rmlint -t12 -v6 -KY -o "/nas/Backups/System Backups/"
Now scanning "/nas/Backups/System Backups/".. done.
Now in total 3716761 useable file(s) in cache.
Now mergesorting list based on filesize... done.
Now finding easy lint...
Now attempting to find duplicates. This may take a while...
Now removing files with unique sizes from list...109783 item(s) less in list.
Now removing 3917500 empty files / bad links / junk names from list...
Now sorting groups based on their location on the drive... done.
Now doing fingerprints and full checksums..
Now calculation finished.. now writing end of log...
=> In total 3716761 files, whereof 1664491 are duplicate(s)
=> In total 77.66 GB [83382805000 Bytes] can be removed without dataloss.
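Whichever tool does the detection, I’d want to spot-check a few of the reported sets myself before deleting anything. Here’s a quick sketch of that check (the two paths on the command line stand in for members of one duplicate set), comparing sizes first and then full SHA-256 content hashes:

#!/usr/bin/env python3
"""Spot-check that two files reported as duplicates really are byte-identical."""
import hashlib
import os
import sys

def sha256(path, chunk=1 << 20):
    # Hash the file in 1 MiB chunks so large backup files don't blow up memory.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def same_contents(a, b):
    # Cheap size comparison first, then a full-content hash comparison.
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    return sha256(a) == sha256(b)

if __name__ == "__main__":
    a, b = sys.argv[1], sys.argv[2]
    print("identical" if same_contents(a, b) else "DIFFERENT -- do not delete!")

If a sample of the sets checks out, I’ll feel a lot better about letting a tool reclaim that 77 GB.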