Archive for February, 2009

Converting FLOSS Manuals to Plucker format

Tags: ,

I stumbled across a site called “FLOSS Manuals” recently, and thought that it would be a great place to create some new Plucker documents for our users, and distribute them. I’ve create hundreds of other Plucker documents for users in years past, so this was a natural progression of that.

You can (and should!) read all about their mission and see what they’re up to. Credit goes to Adam Hyde (the founder), Lotte Meijer (for the design), Aleksandar Erkalovic (developer) and “Mr Snow” for keeping the servers running, among many other contributors.

The FLOSS Manuals Foundation (Stichting FLOSS Manuals) creates free, open source documentation for free, open source software. FLOSS Manuals is a community of free documentation writers that publish free manuals about free software across multiple languages.

Free software can be freely run, studied, redistributed and improved without the restrictive and often expensive licensing systems attached to commercial proprietary software. Developers can adapt free software to their own needs, and can contribute to its ongoing communal development. FLOSS Manuals specifically document software that is free in this development sense and also in price. Free software projects are developed using established methodologies and tools, and sites like Savannah and Sourceforge support established social production models for developing free software. FLOSS Manuals provides the methodologies, tools, and social production models for developing documentation of free software.

By supporting quality, user-friendly documentation of Free, Libre, Open Source Software, FLOSS Manuals aims to encourage the use of this software, to support the technical and social revolution it enables.

If you want to support their cause (and I strongly recommend you do), you can visit their bookstore directly. (Note: I get absolutely nothing for hosting a link to their bookstore here; no affiliate links or commisions whatsoever).

When I quickly Googled around, I found someone was already doing exactly that, albeit in a one-off shell script.

I decided to take his work and build upon it, making it self-healing, and created what I call the “Plucker FLOSS Manuals Autobuilder v1.0” :)

This is all in Perl, clean, and is self-healing. When FLOSS Manuals updates their site with more content, this script will continue to be able to download, convert and build that new content for you… no twiddling necessary.

The code is well-commented, and should be clear and concise enough for you to be able to use it straight away.

Have fun!

#!/usr/bin/perl -w

use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;
use HTML::SimpleLinkExtor;

my $flossurl    = 'http://en.flossmanuals.net';
my $ua          = 'Plucker FLOSS Manuals Autobuilder v1.0 [desrod@gnu-designs.com]';
my $top_extor   = HTML::SimpleLinkExtor->new();

# fetch the top-level page and extract the child pages
$top_extor->parse_url($flossurl, $ua);
my @links       = grep(m:^/:, $top_extor->a);
pop @links;     # get rid of '/news' item from @links; fragile

# Get the print-only page of each child page
get_printpages($flossurl . $_) for @links;

############################################################################# 
#
# Get the pages themselves, and return their content to the caller
#
#############################################################################
sub get_content {
        my $url = shift;

        my $ua          = 'Mozilla/5.0 (en-US; rv:1.4b) Gecko/20030514';
        my $browser     = LWP::UserAgent->new();
        $browser->agent($ua);
        my $response    = $browser->get($url);
        my $decoded     = $response->decoded_content;
 
        # This was necessary, because of a bug in ::SimpleLinkExtor,
        # otherwise this code would be 10 lines shorter. Sigh.
        if ($response->is_success) {
                return $decoded;
        }
}


############################################################################# 
#
# Fetch the print links from the child pages snarfed from the top-level page
#
#############################################################################
sub get_printpages {
        my $page = shift;

        my $sub_extor   = HTML::SimpleLinkExtor->new();
        $sub_extor->parse(get_content($page));

        # Single out only the /print links on each sub-page
        my @printlinks  = grep(m:^/.*/print$:, $sub_extor->a);

        my $url         = $flossurl . $printlinks[0];
        (my $title      = $printlinks[0]) =~ s,\/(\w+)\/print,$1,;

        # Build it with Plucker
        print "Building $title from $url\n";
        plucker_build($url, $title);
}


############################################################################# 
#
# Build the content with Plucker, using a "safe" system() call in list-mode
#
#############################################################################
sub plucker_build {
        my ($url, $title) = @_;

        my $workpath            = "/tmp/";
        my $pl_url              = $url;
        my $pl_bpp              = "8";   
        my $pl_compression      = "zlib";
        my $pl_title            = $title;
        my $pl_copyprevention   = "0";
        my $pl_no_url_info      = "0";
        my $pdb                 = $title;

        my $systemcmd   = "/usr/bin/plucker-build";

        my @systemargs  = (
                        '-p', $workpath, 
                        '-P', $workpath,
                        '-H', $pl_url,
                        $pl_bpp ? "--bpp=$pl_bpp" : (),
                        ($pl_compression ? "--${pl_compression}-compression" : ''),
                        '-N', $pl_title,
                        $pl_copyprevention ? $pl_copyprevention : (),
                        $pl_no_url_info ? $pl_no_url_info : (),
                        '-V1',
                        "--staybelow=$flossurl/floss/pub/$title/",
                        '--stayonhost',
                           '-f', "$pdb");

        system($systemcmd, @systemargs);
}

Bad Behavior has blocked 514 access attempts in the last 7 days.