Using fdupes to Solve the Data Duplication Problem: I’ve got some dupes!



Friday, August 23rd, 2013 at 12:05 pm | 1,446 views | trackback url

Well, 11.6 hours later after scanning the NAS with fdupes, I noticed that I’ve got some dupes across my system backups.

# time ./fdupes -R -Sm "/nas/Backups/System Backups/"
2153352 duplicate files (in 717685 sets), occupying 102224.5 megabytes


real	698m15.606s
user	38m20.758s
sys	92m17.217s

That’s 2.1 million duplicate files occupying about 100GB of storage capacity in my backups folder on the NAS. DOH!

Now the real work begins, making sense of what needs to stay and what needs to get tossed in here.

UPDATE: I may give up on fsdupes altogether, and jump to rmlint instead. rmlint is significantly faster, and has more features and functions. Here’s a sample of the output:

# rmlint -t12 -v6 -KY -o "/nas/Backups/System Backups/"
Now scanning "/nas/Backups/System Backups/".. done.
Now in total 3716761 useable file(s) in cache.
Now mergesorting list based on filesize... done.
Now finding easy lint...
Now attempting to find duplicates. This may take a while...
Now removing files with unique sizes from list...109783 item(s) less in list.
Now removing 3917500 empty files / bad links / junk names from list...
Now sorting groups based on their location on the drive... done.
Now doing fingerprints and full checksums..
Now calculation finished.. now writing end of log...
=> In total 3716761 files, whereof 1664491 are duplicate(s)
=> In total  77.66 GB  [83382805000 Bytes] can be removed without dataloss.

Last Modified: Sunday, August 25th, 2013 @ 21:38

Leave a Reply

You must be logged in to post a comment.