Tags: Perl, Python
This particular bit of yak-shaving all started because one of the Amtrak LSA staff asked me if I could write a tool to print out his MP3 collection by Artist, Album and/or Year. This LSA (Lead Service Attendant; they manage the café car) works as a DJ in his off-hours, doing various gigs for weddings and other parties.
So I took 15 minutes while traveling to the office to whip up something in Perl that did just that, and dumped it to a plain text file which I could then reformat in OpenOffice.org and then export as a pretty PDF he could print and hole-punch into his DJ binder. Problem solved, and he was impressed that it only took 15 minutes to cook that up.
And that’s when it started. The yak-shaving.
“yak shaving is what you are doing when you’re doing some stupid, fiddly little task that bears no obvious relationship to what you’re supposed to be working on, but yet a chain of twelve causal relations links what you’re doing to the original meta-task.”
Here’s how it began:
While building that list of Artist/Album/Title/Year, I realized that some of my mp3 files were missing some pieces of information. Some had the years missing, some had the genre mixed up, some were missing the data altogether.
So I went in and started fixing that.
Then I realized that the album art I was storing as “folder.jpg” was missing in some directories, and each time I rebuild my music library via amaoK or iTunes or anything else, I have to go re-fetch those missing album covers from Amazon or other online places.
So I went in and started fixing that too.
To do that, I had to use a Windows tool called Tag Tuner. I’m not a Windows person by any means, but there really is nothing as slick as TagTuner in Linux (yet?). There is kid3, but it lacks some pretty powerful features (but adds its own, like the ability to remove headers from the mp3 files).
I started adding in all of the missing cover art, storing the album art as an actual image file within the APIC field of the ID3v2 MP3 header. Some of the album art required that I scan in the actual covers from the CDs I have that aren’t available anymore, or aren’t online. Some of it was Google’d up, and others were found in other places on the ‘net.
It was (and still is) an enormous task to make sure every piece of the mp3 metadata is correct, album art is intact (including bootlegs, bonus albums, NFR [not for resale] albums and others).
Then I decided to try to “enhance” the Perl script I wrote, by slapping a web front-end on it, so I could sort by Artist, Album, Year, Genre and so on, and export that to a nicely-formatted PDF file for “Shaggy” (the Amtrak LSA/DJ) or myself.
I started down the path of looking into the Apache::MP3 Perl module on CPAN, which looked promising. When I Google’d up some example code, I saw a reference in an obscure Ubuntu forum post that mentioned using an Apache2 module called mod_musicindex, which supersedes Apache::MP3.
I installed and configured that, and found that there were some discrepancies in the configuration, and that some of the values in the default stanzas indicated in several web references on setting up mod_musicindex all pointed to. They were all incorrect. Here’s what was suggested:
Alias /music/ "/Media/Music/mp3/" <Directory "/Media/Music/mp3/"> AuthType Basic AuthName "music" Require group music Options Indexes MultiViews FollowSymlinks AllowOverride Indexes MusicIndex On +Stream +Download +Search -Rss -Tarball MusicFields title artist length bitrate MusicPageTitle Media Library MusicDefaultCss musicindex.css MusicIndexCache file://tmp/music MusicDirPerLine 4 MusicIceServer [ice.gnu-designs.com]:8000 MusicCookieLife 300 </Directory>
The problem was that restarting Apache resulted in errors with some of those options. I found a small clue buried in the README for musicindex:
“The MusicIndex Option replaces altogether MusicLister, MusicAllowDownload, MusicAllowStream, MusicAllowSearch, and MusicAllowRss.”
Removing those options and replacing them with their new equivalents solved that problem.
Alias /music/ "/Media/Music/mp3/" <Directory "/Media/Music/mp3/"> AuthType Basic AuthName "music" Require group music Options Indexes MultiViews FollowSymlinks AllowOverride Indexes MusicIndex On +Stream +Download +Search -Rss -Tarball MusicSortOrder album disc track artist title length bitrate freq filetype filename uri MusicFields title artist album length bitrate MusicPageTitle Media Library MusicDefaultCss musicindex.css MusicIndexCache file://tmp/music MusicDirPerLine 4 MusicIceServer [ice.gnu-designs.com]:8000 MusicCookieLife 300 </Directory>
And that worked. But it was deathly slow to render a single directory of only a handful of music files. I tried to eek out more performance, but it was just too slow to be useful.
Then I found a reference in another forum thread of a replacement for mod_musicindex called “edna“, so I decided to download that and give it a try.
edna is a standalone Python script which listens on a port and can present your music collection in a very similar way to mod_musicindex, but it is VERY fast, and has quite a few additional features that mod_musicindex does not provide.
But… it’s Python, and I have a genetic distaste for anything written in that language. I played with it for quite awhile and walked around my music collection with it. One of the limitations of edna that I found (besides being written in Python) was that it required that album art be in a single, separate file stored in the same directory as the mp3 files. Since I painstakingly took the time to store each and every album cover in the mp3 files themselves, this was a no-go for me.
So I went back to mod_musicindex while I kept looking for alternatives. One of the quirks with mod_musicindex that I found, was its rendering of proper unicode characters. I jumped into the #apache IRC channel on Freenode to ask for some guidance with respect to “tricking” the right charset to be used (for example, Björk was showing up as B?ork) and one of the lurkers in #apache asked if I’d ever heard of “Ampache” before. I hadn’t, so I trundled over and installed a copy.
The installation was really clunky and challenging, and I had to go into the code at one point and gut out a check which was throwing an error, because it made assumptions about my Apache setup that were just not valid.
I installed that, configured it, added a “catalog” (what ampache calls a collection of your music) to begin navigating through the interface.
In doing so, I realized that there were still quite a few mp3 flies with the wrong ID3v2 metadata or missing/incorrect album covers.
So that’s where I am now. I’m using a combination of Ampache + TagTuner to go through my entire MP3 collection and “normalizing” all of the data in each file. It’s long and drawn out work, but ultimately beneficial, since I only have to do it once.
And when I get back on the train to NY again and “Shaggy” is working, I can show him this system and see if it would be useful for his own DJ rig or parties.
THAT, is yak shaving in the true sense and spirit of the term.