Removing thousands of duplicate email messages from your email



Tuesday, April 8th, 2008 at 3:58 pm | 70,093 views

I’ve been slowly loading all of my mail into GMail to see whether it is a better, “folder-free” way to manage my email.

GMail tags emails with “labels” and archives messages instead of using the classic mail folder hierarchy. Productivity experts far more qualified than me continue to praise the system as being better, so I decided to give it a try… on 10 years of my email; over 300,000 messages.

But today I noticed that some of my larger mail folders had duplicate emails in them. LOTS of duplicate emails (one folder had over 15,000 duplicates!). Removing that many dupes from hundreds of local IMAP folders was not going to be a fun task…

I looked around to find some good tools to do it, and came up with several shell scripts, Python tools and other home-grown things, but nothing I wanted to really try on my large email archive.

Then I found the Remove Duplicate Messages add-on for the Mozilla Thunderbird mail client. I don’t use Thunderbird; I prefer Evolution or Outlook 2007 for managing my PIM data now (yes, I really do use Outlook 2007, because frankly, nothing in the Linux space even comes close in functionality).

But I decided to give it a try. I configured my local IMAP account in Thunderbird, let it query my folder list and then installed the add-on. Here is the process to delete those duplicate messages:

  1. When your IMAP account is configured in Thunderbird, expand the folder you wish to check for dupes.
  2. Right-click the folder and select “Remove Duplicate Messages”.
  3. After it scans for dupes, a window will pop up listing the duplicate messages it found.
  4. Click on “Delete Selected” to remove the duplicate messages it found.

That’s it. It’ll move those messages to the Trash folder, and you can go in there later, right-click the Trash folder and select “Empty Trash” to permanently delete them.

Pretty simple and easy. Obviously make sure you back up your mail folders FIRST before you try any of this, just in case.
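For anyone curious what one of those home-grown Python approaches might look like, here is a minimal sketch. It is not how the Thunderbird add-on works; it simply assumes that two messages with the same Message-ID header are duplicates, and it only reports them rather than deleting anything. The function names `find_duplicates` and `scan_maildir` are made up for this example.

```python
import mailbox


def find_duplicates(messages):
    """Return the keys of every message whose Message-ID was already seen.

    `messages` is an iterable of (key, message_id) pairs. The first copy
    of each Message-ID is kept; later copies are reported as duplicates.
    """
    seen = set()
    dupes = []
    for key, message_id in messages:
        if message_id is None:
            # No Message-ID header: nothing to compare on, so never flag it.
            continue
        normalized = str(message_id).strip()
        if normalized in seen:
            dupes.append(key)
        else:
            seen.add(normalized)
    return dupes


def scan_maildir(path):
    """Scan one Maildir folder (read-only) and return keys of duplicates."""
    # create=False: fail loudly instead of creating an empty mailbox
    # if the path is wrong.
    box = mailbox.Maildir(path, create=False)
    pairs = ((key, msg.get("Message-ID")) for key, msg in box.iteritems())
    return find_duplicates(pairs)
```

Calling `scan_maildir` on the expanded path of `~/Maildir` would return the keys of the extra copies in that one folder. Subfolders are not visited here; `mailbox.Maildir` exposes them through `list_folders()` and `get_folder()`, so a full sweep would need to loop over those as well. Matching on Message-ID alone also means copies that differ slightly in size or headers still count as duplicates, which may or may not be what you want.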

Update: After I ran this through all of my folders and deleted a lot of “legacy” mail folders (old 3Com palm-dev Palm mailing lists going back to 1999), I now have 144,962 messages in my local mail archive (a 52% reduction in number of messages).

Much better and easier to manage, search and back up to the FreeBSD backup array. It also freed 800 MB of space in ~/Maildir in the process.

Last Modified: Wednesday, April 6th, 2011 @ 13:26

11 Responses to “Removing thousands of duplicate email messages from your email”

  1. I am often confronted with duplicate messages in my mailbox, and I spend quite a lot of time removing them. The Outlook 2007 duplicates remover helps me a lot and greatly simplifies this task. The program is easy to use and takes up little space.

  2. For removing duplicates, ODIR is much better, safer, faster and free.

    http://www.vaita.com/ODIR.asp

    But Outlook can’t handle my Gmail account anyway, because there are too many messages for it to swallow before Outlook gets confused and breaks the connection.

    This is why I had to use Thunderbird and the Thunderbird extension.

  3. Your article is very interesting, but I don’t like Thunderbird. I prefer to use the default Windows e-mail client, and I use other software to detect duplicate emails in Outlook.

  4. The Remove Duplicate Messages (Alternative) for Thunderbird (which I use anyway), would be a great solution – and it works on all folders (aka gmail labels), except on AllMail “folder” – and that’s exactly the “folder” that I have to check for duplicates. Is there a better solution?

  5. I switched from POP to IMAP on gmail/TB 6.0.2, and my duplicates are slightly different sized from the originals; so the add-on won’t recognize them.

    If anyone knows a solution, please write mmmmpppp1@gmail.com. TIA.

  6. I just installed Thunderbird 8.0, and I do not find the “Remove Duplicate Messages” option. It does not appear when I right click on the folder.

  7. correction: I couldn’t find the add-on

  8. Sorry, I did find the add-on. Still struggling to make it work :)

  9. Despite the fact that I would’ve preferred it if you had gone into a bit more detail, I still got the gist of what you meant. I agree with it. It may well not be a well-known idea, but it makes sense. Will surely come back for much more of this. Fantastic work

  10. Hey there! Do you know if they make any plugins to protect against hackers?
    I’m kinda paranoid about losing everything I’ve worked hard on. Any tips?

  11. I know about a tool that removes thousands of duplicate email messages in one step. Try the Kernel for Outlook duplicates removal tool to remove useless and duplicate emails from MS Outlook. The software works with all versions of MS Outlook.

