Importing a decade of email into Google Gmail



Saturday, April 5th, 2008 at 8:11 pm

I have over 10 years of email on my machine, which I refer to from time to time for various projects and historical reasons. Many of these emails are from very active mailing lists I’m still subscribed to. The total space consumed by all of these messages is currently 2.3 gigabytes, and it is stored in Maildir format.

I’ve spent the last 2-3 years pushing myself to become more and more productive using a collection of various systems mostly based around David Allen’s GTD system. The whole premise behind David’s system is to “dump your head” into a trusted system, and always filter every input through that system. There’s quite a lot more to it, but once you get the methodology down, it really, REALLY does improve how much you can do. Not only can you do more with less time, but you can do what you’re already doing now, and get a LOT more free time back in your day. No, seriously.

My own “hybrid” system encompasses analogue and digital formats, because of the specific needs of the kind of work I do. At the core of the hybrid system is my PDA, a Treo 680 smartphone. If you want to see why I use a Treo instead of an iPhone, read my previous post on the matter.

Still with me so far?

That PDA synchronizes with 3 physically separate calendaring systems in four steps (the personal Outlook calendar is synced twice), in this order:

1. Palm → Work email/calendar in Microsoft Outlook 2002 format
2. Palm → personal home calendar in Microsoft Outlook 2007 format
3. Palm ↔ Linux calendaring (Evolution/Kontact)
4. Palm → personal calendar in Outlook 2007 format

What is important to note here is the ORDER in which I sync these systems. In the end, I put everything into my personal Outlook session using my Treo as a “middle-man”. I use Windows in the middle of this process because it has some plug-ins which I require for my Palm device (Natara DayNotez, Natara Comet and a few others available in Windows-only versions, sigh).

I also transport anything on the Linux calendaring side into Outlook (step 4 above), so I can sync that with Google Calendar using the CompanionLink for Google Calendar plugin (Windows-only again, sigh). CompanionLink is set to sync my calendar with Google every hour. This lets people I have given access see my appointments and free/busy time while I’m unavailable, and anyone who has seen my calendar knows why this is so important to me.

So far, so good… but this is just calendaring. What about email?

About 2-3 years ago, I switched from using pine as my primary mail client to Evolution because I needed Maildir support in my email. I needed to move to Maildir because I wanted the ability to use IMAP, and leverage its ability to store email folders within folders. pine doesn’t support Maildir natively, so I had to find a compatible alternative.

Moving to Evolution gave me a lot of powerful features that help with my GTD and productivity (virtual folders being the most obvious of those). But I also lost a few things, most notably the ability to interact with it remotely over ssh. I can still use mutt for that if I have to though, so all is not lost.

Back to IMAP for a moment. I converted about 8 years of email from mbox to Maildir format when I switched to Evolution, and realized that I had close to 2 gigabytes of email at that time. Yikes! Maildir is faster and much more fault-tolerant though, and gives me a lot more flexibility.

As I write this, I now have pruned out a lot of older email and things I’ll never revisit, and my Maildir store is still 2.2 gigabytes of data, encompassing over 268,693 separate emails. Whew!
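
For anyone wanting to take the same kind of inventory of their own store, here is a minimal sketch using Python's standard `mailbox` module. It assumes a Maildir++ layout, where subfolders live in dot-prefixed directories under the top-level Maildir. The function name and the path in the usage comment are made up for illustration.

```python
import mailbox


def count_maildir_messages(path):
    """Return {folder_name: message_count} for a Maildir and its subfolders.

    Assumes a Maildir++ layout, where subfolders are dot-prefixed
    directories such as ".Palm.pilot-link-general".
    """
    root = mailbox.Maildir(path, create=False)
    counts = {"INBOX": len(root)}  # the top-level Maildir itself
    for name in root.list_folders():
        counts[name] = len(root.get_folder(name))
    return counts


# Usage (hypothetical path):
#   counts = count_maildir_messages("/home/me/Maildir")
#   print(sum(counts.values()), "messages in", len(counts), "folders")
```

Nested folders show up with dotted names (for example `Palm.pilot-link-general`), so the folders-within-folders hierarchy is still visible in the keys.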

I have a local IMAP instance running on my laptop using dovecot which I use to interact with my local email store, and my various other accounts also use IMAP, and those are configured in Evolution as well. This lets me use one single protocol to talk to all of my email, local or remote. I don’t ever speak “Maildir” directly, I let dovecot-imapd do that for me.

With over 10 years of archived email, it becomes necessary to start organizing it into a logical hierarchy of folders and folders-within-folders. I don’t go crazy with the organization though, I just organize it at 2-3 levels at most (for example Palm → pilot-link-general, or Business → gnu-designs → Clients).

After reading some posts from the more venerated GTD’ers and listening to their podcasts, I started wondering about Gmail’s use of “labels” instead of folders to sort and organize email. Everyone who has jumped to Gmail continues to tout the power of NOT using folders, and using labels instead.

I remained skeptical, but I decided to give it a try anyway. There are limited options out there for importing existing email from a local system into Gmail, and the tools that do it all have some pretty show-stopping flaws in my opinion.

The “Gold Standard” in this category is a small tool called “GML”, which stands for “Google Gmail Loader”. You point this little Python applet at your local Maildir store and it begins uploading messages from your system into Gmail, one message at a time, one folder at a time (you must re-run the tool for each new folder you want to import), with a delay of 2 seconds between messages (to prevent throttling). This bottleneck is easy to bypass, however, by editing the source file to change the sleep() time from 2 to 0. Beware, though: Google may block you if you slam their servers too fast.
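
To make the throttling trade-off concrete, here is a rough sketch of the kind of loop described above. This is not GML's actual source. The function names are invented, and `send` is a placeholder for whatever really delivers a message (for example, a wrapper around `smtplib.SMTP.sendmail`).

```python
import time


def upload_with_delay(messages, send, delay=2.0, sleep=time.sleep):
    """Call send(message) for each message, pausing `delay` seconds between sends.

    `send` is a placeholder for the real delivery function.
    Returns the number of messages sent.
    """
    sent = 0
    for index, message in enumerate(messages):
        if index > 0 and delay > 0:
            sleep(delay)  # throttle between consecutive sends
        send(message)
        sent += 1
    return sent


def sleep_overhead_days(message_count, delay=2.0):
    """Days spent purely sleeping between sends, ignoring transfer time."""
    pauses = max(message_count - 1, 0)
    return pauses * delay / 86400.0
```

For a store of 268,693 messages, the 2-second pauses alone add up to roughly 6.2 days before any transfer time is counted, which is why setting the delay to 0 is tempting.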

I tried this tool, and while it did work, it had some issues that made it impossible to use in my case (and some of these are core flaws in Gmail itself).

The biggest issue with GML is that since the tool requires an SMTP server to “send” email from your local machine to your Gmail account, the date on the final messages as shown in Gmail is the date that you sent them with GML, not the date that the message was actually originally received by your system. Oops! There’s nothing GML can do about this though, that’s just the way email works when creating the message envelope to send the message.

Another method is to use the Gmail Mail Fetcher, a hidden little tool inside your Gmail account. There’s a couple of ways to use this tool, and I’m currently using one of them to load my Gmail account with ALL of my archived email.

There are two methods to doing this. The first one is basically done by dragging your email from each and every folder of your local mail account into the Inbox of your Google account.

The second one involves creating a local mail account separate from your normal “daily use” account (or using one at your provider’s server), putting all of your mail into that account’s Inbox, marking it all as Unread and then setting up Google Mail Fetcher to grab it from your machine.

With Evolution either method is very easy, because I can create a virtual folder with the lone criterion of “Match All”, which will match all local and remote email in all of my accounts. Select all of the email and copy it into the new IMAP account’s Inbox. Right-click, “Mark as Unread”, and the biggest hurdle is done.

I’ll go through the settings step by step:

Method 1: IMAP to IMAP Copy

  1. Enable IMAP on your Gmail account through Settings → Account
  2. From your normal mail client, select the mail you want to move and drag-n-drop it from your local IMAP folder(s) into the Inbox of your new IMAP-enabled Gmail account.

That’s it, two steps. The main problem with this approach is that the dates on the emails are going to be neutered and set to the date on which you copy them from your local IMAP into your Gmail IMAP account.
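
Whether the date survives an IMAP copy depends partly on what the client sends. The IMAP APPEND command accepts an optional date-time argument for the message's internal date, and Python's `imaplib` exposes it. The sketch below derives that argument from a message's own `Date:` header. It has not been tested against Gmail, so treat it as an assumption that Gmail's IMAP server honors the value. The function name is hypothetical.

```python
import email
import imaplib
from datetime import timezone
from email.utils import parsedate_to_datetime


def internaldate_from_header(raw_message):
    """Build the IMAP APPEND date-time string from a message's Date: header.

    raw_message is the full message as bytes.
    Returns None if the header is missing or cannot be parsed.
    """
    msg = email.message_from_bytes(raw_message)
    header = msg.get("Date")
    if header is None:
        return None
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return imaplib.Time2Internaldate(when)


# Usage with an already-authenticated imaplib.IMAP4_SSL connection `conn`:
#   conn.append("INBOX", None, internaldate_from_header(raw), raw)
```

If the function returns `None`, `append` falls back to the current time, which is the same behaviour described above.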

There’s another approach which isn’t so harmful to the original structure of your emails. Here’s a second method for doing this:

Method 2: pop3 to Google Mail Fetcher (“push to poll”)

  1. Create a pop3 account on your local machine (assuming Linux and dovecot-pop3d that is), or put all of your email into your pop3 Inbox on your POP provider’s server by copying it and dragging it from your local mail client into the account on your provider’s server. DO NOT resend or forward the emails to that account, that will break the dates and threading.
  2. Mark all of these emails as Unread, using whatever facility your mail client provides.
  3. Log into Google Gmail with the account where you want all of this new email to show up and go to Settings → Accounts. In the middle of this page, you will see an “Add Account” option. Add an account and point it to the location of the pop3 account you just filled with email, at your server or your provider’s server. This account must be a pop3 account. Google Mail Fetcher doesn’t support IMAP (yet?).
  4. Click the “Check for mail..” link if it is available in this section of your Gmail preferences, and then go back to the Inbox in that account and refresh it after a minute or so. If you got everything correct, you should see new email showing up in the Inbox of this Gmail account. It takes a while for a lot of messages, so be patient. The speed is not limited by your bandwidth; Gmail fetches on a polling period, with timeouts between each fetch.
  5. Once all of the email is in the Gmail account (verify by going to Settings → Accounts and checking that there are no more emails indicated to fetch), you can select all of the emails in your local IMAP account’s Inbox and delete them.
  6. Select all from another local IMAP folder, and repeat the process until you’re done. Select all emails in a local IMAP folder, copy to Inbox on another local IMAP account, mark as Unread, poll with Google Mail Fetcher, delete from local Inbox. Lather, rinse, repeat.
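
The copy-and-mark-Unread part of that loop can also be scripted. The following is a minimal sketch that works on the Maildir files directly with Python's `mailbox` module instead of going through the mail client. It assumes both the source folder and the pop3 account's Inbox are plain Maildir directories on the same machine, and the function name and paths are hypothetical. Because the messages are copied as-is rather than re-sent, their headers (including `Date:`) are not rewritten.

```python
import mailbox


def stage_folder_as_unread(source_path, pop3_inbox_path):
    """Copy every message from one Maildir into another, marked as Unread.

    Returns the number of messages copied. The source is left untouched.
    """
    source = mailbox.Maildir(source_path, create=False)
    inbox = mailbox.Maildir(pop3_inbox_path, create=True)
    copied = 0
    for message in source:
        message.remove_flag("S")   # "S" is the Maildir "seen" flag
        message.set_subdir("cur")  # cur/ is where flag-bearing filenames live
        inbox.add(message)         # writes a new file in the destination
        copied += 1
    return copied


# Usage (hypothetical paths):
#   stage_folder_as_unread("/home/me/Maildir/.Palm.pilot-link-general",
#                          "/home/popuser/Maildir")
```

Other flags, such as replied or flagged, are carried over unchanged; only the seen flag is cleared.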

One thing to be aware of when you are pulling email into Gmail using this method is that sometimes your valid email will be marked as “Spam” by Gmail’s filters. Just go into the Spam folder, check that you aren’t catching any valid emails and purge the spammy ones from the view.

If valid emails are being flagged as “Spam”, just select them with the checkbox and click the button at the top labeled “Not Spam”.

Using “labels” in Gmail to sort your email
That’s it. If you want to apply “labels” to the mail AS Google Mail Fetcher is pulling it from your local pop3 account, you need to specify that when you create the account within Gmail. Alternately, while your email is being pulled into your Gmail account, you can apply labels to the mail in the Inbox, and then colorize them.

Colorizing your “labels”
To colorize the labels, just go to the “Labels” box on the left column and click the little down arrow on the far-right of each label’s title. In there, you will be able to select a foreground and background color for that label.

Archiving mail to keep the Inbox clean
If you have a lot of labels and a lot of mail is accumulating in your Inbox, you can select the labeled emails and click on “Archive”. This will remove them from the Inbox and put them in the “All Mail” folder. You can still search for the messages, you can still read them in the All Mail folder (along with… all of your mail), or you can click on the label in the left column and just read those emails already tagged with that label. This keeps your Inbox relatively clutter-free.

You can also Archive the emails as they are fetched, by ticking the last checkbox in the Account Set up dialog box to skip the Inbox entirely, and just move them into the archive.

At this point, you should have a Gmail account with all of the emails from your local account in it, with all original dates intact in the emails.

I’m still importing my mail into Gmail, and this process alone will take several months at the current speed, even with the machines talking to each other 24×7, but when it’s done, I should have something much more usable… so the “experts” say.

I’ll write up more on my “hybrid productivity system” shortly in another series of posts.

Last Modified: Saturday, April 5th, 2008 @ 20:11
