Archive for December, 2008

Yak-shaving with my Music and Media collection


This particular bit of yak-shaving all started because one of the Amtrak LSA staff asked me if I could write a tool to print out his MP3 collection by Artist, Album and/or Year. This LSA (Lead Service Attendant; they manage the café car) works as a DJ in his off-hours, doing various gigs for weddings and other parties.

So I took 15 minutes while traveling to the office to whip up something in Perl that did just that, and dumped it to a plain text file which I could then reformat in OpenOffice.org and then export as a pretty PDF he could print and hole-punch into his DJ binder. Problem solved, and he was impressed that it only took 15 minutes to cook that up.
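The original 15-minute script isn’t reproduced here, so what follows is only a minimal sketch of the idea, under some loud assumptions: it reads just the simple ID3v1 trailer (the last 128 bytes of each file) using core Perl, and the sub names `read_id3v1`, `collect_tracks` and `format_listing` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Fcntl qw(:seek);

# Read the 128-byte ID3v1 trailer of one file.
# Returns a hashref (title, artist, album, year) or nothing if there is no tag.
sub read_id3v1 {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return;
    return if (-s $fh || 0) < 128;
    seek($fh, -128, SEEK_END) or return;
    my $got = read($fh, my $buf, 128);
    close($fh);
    return unless defined $got && $got == 128;

    my ($magic, @fields) = unpack('a3 a30 a30 a30 a4', $buf);
    return unless $magic eq 'TAG';
    for (@fields) { s/\0.*//s; s/\s+$//; }    # drop NUL/space padding
    my %tag;
    @tag{qw(title artist album year)} = @fields;
    return \%tag;
}

# Walk a directory tree and collect the tags of every *.mp3 file that has one.
sub collect_tracks {
    my ($dir) = @_;
    my @tracks;
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.mp3$/i;
        my $tag = read_id3v1($_) or return;
        push @tracks, $tag;
    } }, $dir);
    return @tracks;
}

# Sort by artist, then album, then year, then title; one text line per track.
sub format_listing {
    my @sorted = sort {
           lc $a->{artist} cmp lc $b->{artist}
        or lc $a->{album}  cmp lc $b->{album}
        or $a->{year}      cmp $b->{year}
        or lc $a->{title}  cmp lc $b->{title}
    } @_;
    return join '', map {
        sprintf("%-30s  %-30s  %-4s  %s\n", @{$_}{qw(artist album year title)})
    } @sorted;
}

# Usage (hypothetical file name): perl mp3list.pl /path/to/music > list.txt
print format_listing(collect_tracks($ARGV[0])) if @ARGV;
```

The plain text it prints is the kind of file that can then be pulled into OpenOffice.org for formatting. Files that carry only ID3v2 tags would be skipped by this sketch.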

And that’s when it started. The yak-shaving.

“yak shaving is what you are doing when you’re doing some stupid, fiddly little task that bears no obvious relationship to what you’re supposed to be working on, but yet a chain of twelve causal relations links what you’re doing to the original meta-task.”

Here’s how it began:

While building that list of Artist/Album/Title/Year, I realized that some of my mp3 files were missing some pieces of information. Some had the years missing, some had the genre mixed up, some were missing the data altogether.

So I went in and started fixing that.

Then I realized that the album art I was storing as “folder.jpg” was missing in some directories, and each time I rebuild my music library via amaroK or iTunes or anything else, I have to go re-fetch those missing album covers from Amazon or other online places.

So I went in and started fixing that too.
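Finding the gaps is the easy part to automate. As an illustration only (the sub name `dirs_missing_cover` is hypothetical, and it assumes the cover is always called `folder.jpg`, in any letter case), a core-Perl sketch that lists every directory holding mp3 files but no cover image could look like this:

```perl
use strict;
use warnings;
use File::Find;

# Return the sorted list of directories under $root that contain at least
# one .mp3 file but no folder.jpg (both matched case-insensitively).
sub dirs_missing_cover {
    my ($root) = @_;
    my (%has_mp3, %has_cover);
    find({ no_chdir => 1, wanted => sub {
        return unless -f $_;
        my $dir = $File::Find::dir;
        $has_mp3{$dir}   = 1 if /\.mp3$/i;
        $has_cover{$dir} = 1 if m{(?:^|/)folder\.jpg$}i;
    } }, $root);
    my @missing = grep { !$has_cover{$_} } keys %has_mp3;
    return sort @missing;
}

# Usage: pass the top of the music tree as the first argument.
print "$_\n" for @ARGV ? dirs_missing_cover($ARGV[0]) : ();
```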

To do that, I had to use a Windows tool called TagTuner. I’m not a Windows person by any means, but there really is nothing as slick as TagTuner in Linux (yet?). There is kid3, but it lacks some pretty powerful features (but adds its own, like the ability to remove headers from the mp3 files).

I started adding in all of the missing cover art, storing the album art as an actual image file within the APIC field of the ID3v2 MP3 header. Some of the album art required that I scan in the actual covers from the CDs I have that aren’t available anymore, or aren’t online. Some of it was Google’d up, and others were found in other places on the ‘net.
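To check which files already carry an embedded cover, the tag can be walked by hand. The following is a rough sketch, not what TagTuner does internally: it assumes an ID3v2.3 or ID3v2.4 tag at the very start of the file, gives up on tags that use unsynchronisation or an extended header, and the sub names are invented for this example.

```perl
use strict;
use warnings;

# Decode a 4-byte "syncsafe" integer (7 significant bits per byte).
sub syncsafe_to_int {
    my @b = unpack('C4', shift);
    return ($b[0] << 21) | ($b[1] << 14) | ($b[2] << 7) | $b[3];
}

# Return 1 if the ID3v2.3/2.4 tag at the start of $path holds an APIC frame,
# 0 otherwise (including for files this sketch does not understand).
sub has_embedded_art {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or return 0;
    my $got = read($fh, my $hdr, 10);
    return 0 unless defined $got && $got == 10;

    my ($magic, $major, $rev, $flags, $size) = unpack('a3 C C C a4', $hdr);
    return 0 unless $magic eq 'ID3' && ($major == 3 || $major == 4);
    return 0 if $flags & 0xC0;    # unsynchronised or extended header: not handled

    read($fh, my $tag, syncsafe_to_int($size));
    close($fh);
    $tag = '' unless defined $tag;

    my $pos = 0;
    while ($pos + 10 <= length $tag) {
        my ($id, $raw) = unpack('a4 a4', substr($tag, $pos, 8));
        last unless $id =~ /^[A-Z0-9]{4}$/;    # reached the padding
        return 1 if $id eq 'APIC';
        # Frame sizes are syncsafe in v2.4 and plain big-endian in v2.3
        my $len = $major == 4 ? syncsafe_to_int($raw) : unpack('N', $raw);
        $pos += 10 + $len;
    }
    return 0;
}

# Usage: perl has-art.pl file1.mp3 file2.mp3 ...
print((has_embedded_art($_) ? "art:    " : "no art: "), "$_\n") for @ARGV;
```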

It was (and still is) an enormous task to make sure every piece of the mp3 metadata is correct, album art is intact (including bootlegs, bonus albums, NFR [not for resale] albums and others).

Then I decided to try to “enhance” the Perl script I wrote, by slapping a web front-end on it, so I could sort by Artist, Album, Year, Genre and so on, and export that to a nicely-formatted PDF file for “Shaggy” (the Amtrak LSA/DJ) or myself.

I started down the path of looking into the Apache::MP3 Perl module on CPAN, which looked promising. When I Google’d up some example code, I saw a reference in an obscure Ubuntu forum post that mentioned using an Apache2 module called mod_musicindex, which supersedes Apache::MP3.

I installed and configured that, and found discrepancies in the configuration: several web references on setting up mod_musicindex all pointed to the same default stanza, and some of the values in it were incorrect. Here’s what was suggested:

Alias /music/ "/Media/Music/mp3/"
<Directory "/Media/Music/mp3/">
    AuthType Basic
    AuthName "music"
    Require group music

    Options Indexes MultiViews FollowSymlinks
    AllowOverride Indexes
    MusicIndex On +Stream +Download +Search -Rss -Tarball
    MusicFields title artist length bitrate
    MusicPageTitle Media Library
    MusicDefaultCss musicindex.css
    MusicIndexCache file://tmp/music
    MusicDirPerLine 4
    MusicIceServer [ice.gnu-designs.com]:8000
    MusicCookieLife 300
</Directory>

The problem was that restarting Apache resulted in errors with some of those options. I found a small clue buried in the README for musicindex:

“The MusicIndex Option replaces altogether MusicLister, MusicAllowDownload, MusicAllowStream, MusicAllowSearch, and MusicAllowRss.”

Removing those options and replacing them with their new equivalents solved that problem.

Alias /music/ "/Media/Music/mp3/"
<Directory "/Media/Music/mp3/">
    AuthType Basic
    AuthName "music"
    Require group music

    Options Indexes MultiViews FollowSymlinks
    AllowOverride Indexes
    MusicIndex On +Stream +Download +Search -Rss -Tarball
    MusicSortOrder album disc track artist title length bitrate freq filetype filename uri
    MusicFields title artist album length bitrate
    MusicPageTitle Media Library
    MusicDefaultCss musicindex.css
    MusicIndexCache file://tmp/music
    MusicDirPerLine 4
    MusicIceServer [ice.gnu-designs.com]:8000
    MusicCookieLife 300
</Directory>

And that worked. But it was deathly slow to render a single directory of only a handful of music files. I tried to eke out more performance, but it was just too slow to be useful.

Then I found a reference in another forum thread to a replacement for mod_musicindex called “edna”, so I decided to download that and give it a try.

edna is a standalone Python script which listens on a port and can present your music collection in a very similar way to mod_musicindex, but it is VERY fast, and has quite a few additional features that mod_musicindex does not provide.

But… it’s Python, and I have a genetic distaste for anything written in that language. I played with it for quite a while and walked around my music collection with it. One of the limitations of edna that I found (besides being written in Python) was that it required that album art be in a single, separate file stored in the same directory as the mp3 files. Since I painstakingly took the time to store each and every album cover in the mp3 files themselves, this was a no-go for me.

So I went back to mod_musicindex while I kept looking for alternatives. One of the quirks I found with mod_musicindex was its rendering of proper Unicode characters. I jumped into the #apache IRC channel on Freenode to ask for some guidance with respect to “tricking” the right charset to be used (for example, Björk was showing up as B?ork) and one of the lurkers in #apache asked if I’d ever heard of “Ampache” before. I hadn’t, so I trundled over and installed a copy.

The installation was really clunky and challenging, and I had to go into the code at one point and gut out a check which was throwing an error, because it made assumptions about my Apache setup that were just not valid.

I installed that, configured it, and added a “catalog” (what Ampache calls a collection of your music) to begin navigating through the interface.

In doing so, I realized that there were still quite a few mp3 files with the wrong ID3v2 metadata or missing/incorrect album covers.

So that’s where I am now. I’m using a combination of Ampache + TagTuner to go through my entire MP3 collection and “normalize” all of the data in each file. It’s long and drawn-out work, but ultimately beneficial, since I only have to do it once.

And when I get back on the train to NY again and “Shaggy” is working, I can show him this system and see if it would be useful for his own DJ rig or parties.

THAT is yak shaving in the true sense and spirit of the term.

Information wants to be a ballerina

Funny quote of the day:

>>>> Information wants to be free!

>>> Information wants to be a ballerina!

>> Then information needs to get her fat ass on a diet or she's 
>> never going to fit into that tutu and make Mommy proud!

> That kind of parenting made information a heroine addicted 
> stripper, now come over here and rub your data against me 
> for a dollar.

Source: Slashdot Managing Last.FM’s “Mountain of Data”

WARNING! Do not install Lookeen and Xobni!

In my last post I wrote that I was comparing two third-party tools that are used to search and index Outlook content; Lookeen and Xobni.

DO NOT INSTALL OR USE THESE TOOLS!!

After I was done testing them and uninstalled them cleanly, they each corrupted my Outlook.pst file in two different ways… and the SCANPST.EXE tool (shipped with Office 12) does not fix the corruption. I even tried Advanced Outlook Recovery and it couldn’t fix the damage either.

I replayed the damage from a clean, previously scanned version of a .pst file before and after installing, running and uninstalling these tools, and the corruption is reproducible (and produces an unusable, unrecoverable file).

Now I have to blow away the damaged .pst file and start from clean backups, restoring from my data exports. The data in my main “Personal Store” (.pst) file goes back 10 years, and has over 11,000 entries in it. Recovery is not a simple process.

I repeat: DO NOT INSTALL OR USE THESE TOOLS!!

External tools to search Microsoft Outlook 2007


I’ve been a Linux and Open Source developer, supporter and user for as many years as I can remember. I’ve always sought to unify my personal space and environment starting with the tools available in the OSS community because they tend to fit my needs a LOT more than most of the proprietary vendor tools. The other benefit of using the OSS solution is that if I run into a bug, I can either fix it myself or report it and have it fixed upstream very rapidly so everyone can take advantage of the fix.

But there really is no better calendaring solution that I’ve personally found in any flavor or OS than Microsoft Outlook 2007. Sure, it has its faults, as does any product of this level of complexity, but it works well and seems to suit my needs… and also allows me to quickly pull that information around to my other calendars (Work, Personal, Seryn, Holidays, Friends) and I can easily sync my Treo 680 to it to keep my various calendars fresh. I can also sync it to Google Calendar and export it to a CSV or XLS file for import into other incompatible systems.

However, the single biggest flaw that I’ve run into with Microsoft Outlook 2007 is the “Search” functionality. It is downright useless for anything other than taking up toolbar real-estate.

Granted, I’m spoiled by the power and simple flexibility of Google’s search. I’ve never, not even once, needed to go beyond Page 1 of any Google Search Results (SERPs). When I get results, I look at the quality, then tweak my search accordingly and re-submit. I don’t page through multiple pages until I see what I want.

Outlook doesn’t even correctly search all of the words you put into the search box. For example, if you received a message that included a subject of: “::trap id::820 (Server online) ...” and searched for “Server online” (without quotes), you get results that include the word “Server” and results that include the word “online”. If you use quotes (a Google thing), you get 0 results. If you search for “trap id”, you get 0 results. Well that’s just helpful.

I won’t ramble on about the multiple hundreds of other searches that I can clearly see results for in my Inbox, which Outlook quietly ignores… there are plenty. The Outlook search not only sucks, it’s flat-out broken. I don’t see Microsoft spending any time fixing this any time soon.

Enter third-party search tools to the rescue!

I found this thread on a quick Google search for better search tools for Outlook. The results of that thread boiled down to two leading tools:

  1. Xobni (“Inbox” spelled backwards)
  2. Lookeen

I decided to give them both a thorough test drive and see which one would suit my needs. I pointed both of these plugins to three email accounts I have configured in Outlook 2007 on my laptop, including one that is very-much littered with attachments of every kind and variety, thousands of calendar items, meetings, and two Gmail accounts… one of them with over 300,000 separate emails in it stretching back 10 years. Believe me, these plugins are going to get a serious workout!

One important thing to note is that my work Outlook instance inside the corporate firewall (also Outlook 2007) is so locked down that I can’t install any add-ons into it. In fact, the whole machine is so locked-down that I can’t even double-click the clock in the corner to see the mini-calendar or add a printer or change the desktop wallpaper or screensaver! At work, I’m stuck with the default, broken-by-design Outlook 2007 search. Since that one uses Exchange, it’s even slower than my local Outlook 2007 instance running inside VMware on my laptop.

Xobni

http://www.xobni.com/

First, the bugs and negative points:

  • Slow, slow, slow!
  • Requires installing itself as a Windows service which runs all the time, not just when Outlook is running. The problem here is that I keep all of my Outlook data files password-protected (for obvious reasons), and when Outlook isn’t running, the service fires up to index the Outlook data and prompts me for a password. Very inefficient, and doing so loads Outlook, which I do not always want.
  • No advanced search (this thing -“not this one” +that)
  • Broken search when using punctuation (search for “DBD::Sybase”, for example)
  • Seems to incessantly try to get to the net to look up photos, Facebook profiles, LinkedIn profiles and so on with no way to disable it. Inside the corporate LAN, none of these sites are accessible, and it drags the search to a crawl, as it attempts to contact those sites and times out.
  • Related to the last item, disabling those “Third-party Extensions” (as Xobni calls them), does not remove the buttons for those extensions from the UI, so they are still accessible to be clicked, they still attempt to query the net, and they are still visible on the GUI.
  • Nasty bug when backspacing to the beginning of the search box and then typing again. 100% reproducible and annoying. I basically have to type at 1/3 the speed to stop it from triggering the bug. Unacceptable.
  • Minimizing the Xobni sidebar uses a minimize widget that sits on the bottom of the bar, completely out of place from the » chevrons for minimizing other widgets which are at the top of each of their respective sidebars.
  • Minor nit: The Xobni skin/theme doesn’t match the Outlook skin/theme, so it stands out like a sore thumb and looks like a clunky “bolt-on” rather than an enhancement. If you have the folder tree, Inbox, message preview, Xobni and Task bar all displayed in 4 vertical columns.. the Xobni one (3rd column) really sticks out. In fact, when you minimize the sidebar, it goes from a black/dark skin when exposed, to the proper, Outlook-compatible color theme/skin when minimized. It’s not consistent. Even the size of the chevrons differs in the minimized and maximized versions of that sidebar.

The good points:

  • Free as in beer
  • Lots of great features that no other search or plugin I’ve found has (“network”, statistics, attachments widget)
  • Relatively fast searching and display of results, including a hover preview mode

Lookeen

http://www.lookeen.net/

The bugs and negative points:

  • Everything is in a popup window, nothing docks with Outlook itself (like Xobni does, for example). Popups are sometimes ok because they can be minimized and moved out of the way, but in this case, having the search results side-by-side with the actual content would be better.
  • Can’t remove individual searches to re-edit them. Letter-case changes (changing “Search This” to “Search this”) are ignored when you type them, but the actual search results are case-sensitive. Ideally I should be able to expand the search history and hit Delete on each one, like I can with the search history in a web browser. I can clear the whole search history, but that’s an all-or-nothing operation; I can’t delete bogus searches and leave the others (which might be frequently used searches) in place.
  • $39.80 USD price tag. While I have no problem paying this for a functional plugin, there are quite a few niggling things that don’t justify that cost at this point.

The good points:

  • A LOT faster than Xobni. I mean a LOT. Searching is faster, indexing is faster, everything is faster.
  • Between the two, Lookeen handles all sorts of attachments much better and visually represents them in a way that is very quick and easy to visually filter in or out what you’re looking for, IMO. It also has a “Conversations” button in several places that makes zooming into a relevant conversation from a highlighted message or user very quick. Additionally, you can go from 1 day, 1 week, month of conversations to further narrow your search results and scope. Very well thought-out.
  • Lookeen docks in the toolbar, which has a few advantages, while Xobni docks in the sidebar (and based on its featureset, leverages slightly different advantages there)

The Conclusion

Ultimately, I’ll probably continue to use Lookeen, because it does what it advertises to do much better… searching and finding. Yes, their implementation of searching is a bit “clunky” with the popup windows instead of docking against Outlook, but I can deal with that for now. The graphical slowness and constant “Internet poking” that Xobni does completely negates any benefits it provides with the statistics and grouping of messages and search results.

I’m still looking for other options beyond what Lookeen and Xobni provide. If anyone has any ideas or suggestions, I’m all ears.

Submitting Tasks to Tracks via Email


I’ve recently installed Tracks on my server to help me manage my myriad of tasks, projects and timelines utilizing the GTD methodology created by David Allen.

After installing it on my server I almost immediately noticed that Ruby and Rails were going to have a problem coexisting with my very tweaked Apache instances (running roughly 300 separate websites).

I found a project called Passenger, and immediately got that working with Tracks running on top of it. No fuss, no muss.

But one problem with Tracks that I’ve found (well, one of several, none of them showstoppers) is that you can’t interact with Tracks any way other than the web interface. The interface is nice and clean and slick, but I don’t always have access to the web where I am, and that limits the functionality of Tracks for me. I do, however, have access to some sort of email account everywhere I go.

So I whipped up a quick little script to allow me to send email to Tracks and have it post Tasks in the right contexts and on the right due dates for me.

Here’s the code for that:

#!/usr/bin/perl
#        _._   
#       /_ _`.      (c) 2008, David A. Desrosiers
#       (.(.)|      setuid at gmail.com
#       |\_/'|   
#       )____`\     Email to Tracks interface
#      //_V _\ \ 
#     ((  |  `(_)   If you find this useful, please drop me 
#    / \> '   / \   an email, or send me bug reports if you find
#    \  \.__./  /   problems with it. I accept PayPal too! =-)
#     `-'    `-' 
#
##############################################################################
# 
# License
#
##############################################################################
#
# This script is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
# 
# This script is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
# more details.
# 
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
# 
# Email format can be any of the following:
#
#       Subject: The main description of the Task
#
#       @Context
#       01/02/2003
#       This is my note text
#
# or:
# 
#       Subject: Default Task description
#
#       c: @context
#       d: 2008/12/31
#       n: The note text to insert
#
# Most valid date formats are accepted, and this will do its best to
# "correct" and normalize them. You can also prefix your lines with the
# modifiers above, or read the regexes below for more. For example: 
# c:, context:, cxt:, con:, ct: are all valid prefixes for "context"
# n: and note: are all valid prefixes for "notes", and so on.
#
##############################################################################
use strict;
use XML::Simple;
use LWP::UserAgent;
use URI::Escape;
use Email::Abstract;
use Date::Manip;
use Date::Parse;
use Data::Dumper; 

my $url                 = "http://your.tracks.site/todos.xml";
my $contextUrl          = "http://your.tracks.site/contexts.xml";

# The default contextid where you want the todo added
# SELECT id,name FROM contexts;
my $contextid           = "8";

my $user                = "yourname";
my $password            = "yourpass";

# Leave these tokens alone.  They are valid as of Tracks 1.5 RESTful API.
my %todo = map { +($_ => "todo[$_]") } qw(notes context_id description due);

# Get the context legend in order to match by name
my $ua                  = new LWP::UserAgent;
my $req                 = new HTTP::Request 'GET',$contextUrl;
$req->authorization_basic($user,$password);
my $res                 = $ua->request($req);
my $contexts            = XMLin($res->content);

# Split apart the email into Subject and Body
my $message             = do { local $/; <STDIN> };
my $email               = Email::Abstract->new($message);
my $subject             = $email->get_header("Subject");
my $body                = $email->get_body;

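# These can probably be cleaned up a bit
# NOTE: $context_line (and $contexts, fetched above) are parsed here but not
# yet used; the POST below always files the task under the default $contextid.
my ($context_line)      = $body =~ /^(?:c:|ct:|cxt:|con:|context:|@)\s*(.+)$/mi;
my ($date_line)         = $body =~ /^(?:d:|date:)\s*(\d.*)$/mi;
# Fall back to today's date when no date line was found
$date_line              = UnixDate(ParseDate("today"), "%g") if (!defined($date_line) || length($date_line) == 0);
my $time                = str2time($date_line);
# str2time() assumes local time when no zone is given, so format with
# localtime (gmtime can shift the date by a day east of UTC)
my $due_date            = UnixDate(scalar localtime($time), "%m/%d/%Y");
my ($note_line)         = $body =~ /^(?:n:|note:)\s*(.*?)$/mi; 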

# Concatenate the data here before we send POST to the Tracks server
my $post_data = 
        $todo{'context_id'} . "=" . $contextid . "&" . 
        $todo{'description'}. "=" . uri_escape($subject) . "&" . 
        $todo{'notes'} . "=" . uri_escape($note_line) . "&" . 
        $todo{'due'} . "=" . uri_escape($due_date);

# Use LWP to do the posting ($ua was created earlier)
$req = new HTTP::Request 'POST',$url;
$req->content_type('application/x-www-form-urlencoded');
$req->content($post_data);
$req->authorization_basic($user,$password);
$res = $ua->request($req);

You can send emails in the format in the comments above to your Tracks install and it will create Tasks for you. I could refactor those regexes a bit, and that’s a task on my list, but so far, this works…
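One possible shape for that refactoring, offered as a sketch rather than as the next version of the script: a single sub (the name `parse_task_body` is hypothetical) that walks the body line by line and accepts both the prefixed and the bare formats from the header comment. It assumes that a bare line starting with a digit is the date, that the first other bare line is the note, and it strips the leading `@` from the context name in both formats.

```perl
use strict;
use warnings;

# Split an email body into context, date and note.
# A prefixed line ("c:", "d:", "n:" and their long forms) overrides a bare one;
# otherwise a bare "@Context" line, a line starting with a digit, and the
# first remaining non-blank line are taken as context, date and note.
sub parse_task_body {
    my ($body) = @_;
    my %task;
    for my $line (split /\r?\n/, $body) {
        next unless $line =~ /\S/;
        if    ($line =~ /^(?:c|ct|cxt|con|context):\s*\@?(.+?)\s*$/i) { $task{context} = $1 }
        elsif ($line =~ /^(?:d|date):\s*(\d.*?)\s*$/i)                { $task{date}    = $1 }
        elsif ($line =~ /^(?:n|note):\s*(.*?)\s*$/i)                  { $task{note}    = $1 }
        elsif ($line =~ /^\@(.+?)\s*$/)                               { $task{context} //= $1 }
        elsif ($line =~ /^(\d.*?)\s*$/)                               { $task{date}    //= $1 }
        elsif ($line =~ /^\s*(.*?)\s*$/)                              { $task{note}    //= $1 }
    }
    return \%task;
}
```

Keeping the parsing in one sub that returns a hashref also makes it easy to exercise with sample bodies before wiring it into the mail alias.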

Emails look like this:

Subject: Update regexes in Tracks perl email interface
From: "David A. Desrosiers" <desrod at gnu-designs.com>
To: 2e59561d76 at gnu-designs.com
Content-Type: text/plain
Message-Id: <1228182961.14783.123.camel at gnu-designs.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.3.1 
Content-Transfer-Encoding: 7bit
Date: Mon, 01 Dec 2008 20:56:02 -0500
X-Evolution-Format: text/plain

c: @Internet
d: 12/15/2008
n: Clean up the regexes that processes email-based Tracks tasks

Set it up in your MTA as an alias, as follows:

2e59561d76:                         "|/path/to/tracks-mail.pl"

Make sure you re-run newaliases(1) after adding this entry:

$ sudo /usr/sbin/newaliases
/etc/mail/aliases: 248 aliases, longest 82 bytes, 15141 bytes total

I tried to make it smart enough to figure out most common (valid please) date formats and if you omit that, it just sets it to today’s date so you can change it later.
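The script above leans on Date::Manip and Date::Parse for this. To show the fallback logic on its own, here is a much narrower sketch using only core modules; it understands just `YYYY/MM/DD` and US-style `MM/DD/YYYY` (with `/` or `-`), and the sub name `normalize_due_date` and its optional `$now` argument are assumptions made for the example.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Normalize a date string to mm/dd/YYYY, falling back to today's date
# (or to the epoch time passed as $now) when it is missing or invalid.
sub normalize_due_date {
    my ($text, $now) = @_;
    $now = time unless defined $now;
    my ($y, $m, $d);
    if (defined $text) {
        if    ($text =~ m{^\s*(\d{4})[-/](\d{1,2})[-/](\d{1,2})\s*$}) { ($y, $m, $d) = ($1, $2, $3) }
        elsif ($text =~ m{^\s*(\d{1,2})[-/](\d{1,2})[-/](\d{4})\s*$}) { ($m, $d, $y) = ($1, $2, $3) }
    }
    # timelocal() dies on impossible dates such as 02/30, so use it to validate
    if (defined $y && eval { timelocal(0, 0, 12, $d, $m - 1, $y); 1 }) {
        return sprintf('%02d/%02d/%04d', $m, $d, $y);
    }
    return strftime('%m/%d/%Y', localtime $now);
}
```

Anything the sketch cannot parse quietly becomes today’s date, which matches the behavior described above: the task still lands in Tracks and the due date can be corrected later.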

The next step is to figure out how to make this script multi-user aware, without forcing users to expose their username or password in email directly. I have some ideas for that and will test that in v1.1 of this script and release it later on.

The Tracks Forums are pretty busy and lots of people are beating up the code, testing it in many different ways. I’m just trying to contribute back in whatever way I can, in the hopes that the project continues to thrive and grow.

If you have any suggestions or ideas, let me know!
