Removing thousands of duplicate messages from your email


I’ve been slowly loading all of my mail into Gmail to see whether it works as a better, “folder-free” way to manage my email.

Gmail tags emails with “labels” and archives messages instead of using the classic mail folder hierarchy. Productivity experts who know more than I do keep praising this system as the better one, so I decided to give it a try… on 10 years of my email: over 300,000 messages.

But today I noticed that some of my larger mail folders had duplicate emails in them. LOTS of duplicate emails (one folder had over 15,000 duplicates!). Removing that many dupes from hundreds of local IMAP folders was not going to be a fun task…

I looked around for good tools to do it and came up with several shell scripts, Python tools and other home-grown things, but nothing I really wanted to try on my large email archive.

Then I found the Remove Duplicate Messages add-on for the Mozilla Thunderbird mail client. I don’t use Thunderbird; I prefer Evolution or Outlook 2007 for managing my PIM data now (yes, I really do use Outlook 2007, because frankly, nothing in the Linux space even comes close to its functionality).

But I decided to give it a try. I configured my local IMAP account in Thunderbird, let it query my folder list and then installed the add-on. Here is the process to delete those duplicate messages:

  1. Once your IMAP account is configured in Thunderbird, expand the folder you wish to check for dupes.
  2. Right-click the folder and select “Remove Duplicate Messages” (highlighted in red in the screenshot below):
    [Screenshot: Thunderbird Remove dupes (right-click menu)]
  3. A window will pop up after it scans for dupes, offering the following:
    [Screenshot: Thunderbird Remove dupes (main window)]
  4. Click on “Delete Selected” to remove the duplicate messages it found.

That’s it. It’ll move those messages to the Trash folder, and you can go in there later, right-click the Trash folder and select “Empty Trash” to permanently delete them.

Pretty simple and easy. Obviously, make sure you back up your mail folders FIRST, before you try any of this, just in case.
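If you would rather get a count of dupes before touching anything, here is a minimal Python sketch of the general idea. It is not how the Thunderbird add-on works internally. It assumes that two messages sharing the same Message-ID header are duplicates and that the mail sits in a Maildir such as ~/Maildir. The function name `find_duplicates` is made up for illustration. It only reports what it finds and deletes nothing.

```python
import mailbox
from collections import defaultdict


def find_duplicates(maildir_path):
    """Return {message_id: [keys]} for every Message-ID seen more than once.

    Assumption: an identical Message-ID header means a duplicate message.
    This is a read-only check; no message is modified or removed.
    """
    # create=False so a mistyped path raises an error instead of
    # silently creating a new, empty Maildir.
    box = mailbox.Maildir(maildir_path, factory=None, create=False)

    by_id = defaultdict(list)
    for key, msg in box.iteritems():
        msg_id = msg.get("Message-ID")
        if msg_id is None:
            continue  # no Message-ID to compare, so leave it alone
        by_id[str(msg_id).strip()].append(key)

    # Keep only the IDs that occur on two or more messages.
    return {mid: keys for mid, keys in by_id.items() if len(keys) > 1}
```

Calling `find_duplicates` on the expanded path of ~/Maildir returns a dictionary whose values list the keys of each group of duplicates, so the number of extra copies is the sum of `len(keys) - 1` over all groups. This sketch only looks at the top-level folder; subfolders would need to be opened one at a time, for example via the `list_folders()` and `get_folder()` methods of `mailbox.Maildir`.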

Update: After I ran this through all of my folders and deleted a lot of “legacy” mail folders (old 3Com palm-dev Palm mailing lists going back to 1999), I now have 144,962 messages in my local mail archive (a 52% reduction in the number of messages).

Much better, and easier to manage, search and back up to the FreeBSD backup array. It also freed up 800 MB of space in ~/Maildir in the process.
