Validating Blog Pingback Sites with Perl

Tuesday, May 20th, 2008 at 11:25 pm | 14,224 views | trackback url
Tags: , ,

Over the last few months I’ve been wondering what the slow response time has been when I am posting new entries to my blog. Granted, the hits to my blog have more than tripled in the last 2-3 months, but my servers can handle that load. The problem was clearly elsewhere.

Some more digging and I realized that the list of 116 ping sites I have in my blog’s “Update Services” list contains quite a few pingback sites that are no longer valid.

For those new to this, a “Pingback” is a specifically-designed XML-RPC request sent from one website (A, usually a blog site) to another site (B) in response to a new entry being posted on the site.

In order for pings (not to be confused with ICMP pings) to work properly, it requires a physical link in the form of a URL to validate. When (B) receives the notification signal from (A), it automatically goes back to (A) checking for the existence of a live incoming link to (B). If that link exists, the Pingback is recorded successfully.

This “validation” process makes it much less prone to spam attacks than something like Trackbacks. If you’re interested in reading more about how spammers are using Pingbacks and Trackbacks to their advantage, I suggest reading Blog trackback Spam analysis on the “From Information to Intelligence” blog site.

But I needed a way to test all of those ping sites and exclude the ones that are dead, down or throwing invalid HTTP responses… so I turned to Perl, of course!

My list of ping sites is a sorted, uniq’d plain-text list that has one ping site per-line. The list looks something like this:

http://api.moreover.com/ping
http://bitacoras.net/ping
http://blog.goo.ne.jp/XMLRPC
http://blogoole.com/ping
http://blogsearch.google.com/ping/RPC2
http://godesigngroup.com
http://godesigngroup.com/blog/feed
http://imblogs.net/ping
http://ping.bitacoras.com
http://ping.bloggers.jp/rpc
http://ping.blo.gs
http://pinger.blogflux.com/rpc
http://pinger.onejavastreet.com/
http://ping.myblog.jp
http://pingoat.com
http://pingomatic.com
http://rcs.datashed.net/RPC2
http://rpc.blogbuzzmachine.com/RPC2
http://rpc.blogrolling.com/pinger
http://rpc.pingomatic.com
http://rpc.weblogs.com/RPC2
http://rpc.wpkeys.com
...

I pass that list into my perl script and using one of my favorite modules (File::Slurp), I read that file and process each line with the following script:

use strict;
use URI;
use File::Slurp;
use HTTP::Request;
use LWP::UserAgent;

my @ping_sites          = read_file("pings");
my @valid_ping_sites    = ();

for my $untested (@ping_sites) {
        my $url         = URI->new($untested);

        my $ua          = LWP::UserAgent->new;
        $ua->agent('blog.gnu Ping Spider, v0.1 [rss]');
        $ua->timeout(10);

        my $req         = HTTP::Request->new(HEAD=>"$untested");
        my $resp        = $ua->request($req);
        my $status_line = $resp->status_line;

        (my $status)    = $status_line =~ m/(\d+)/;

        if ($status == '200') {
                push @valid_ping_sites, "$url\n";
        } else {
                print "[$status] for $url..\n";
        }
}

my $output      = write_file("pings.valid", @valid_ping_sites);

The output is written to a file called “pings.valid“, which contains all of the sites which return a valid 200 HTTP response. The remainder are sent to STDOUT, resulting in the following output:

$ perl ./pings.pl 
[403] for http://1470.net/api/ping..
[403] for http://api.feedster.com/ping..
[404] for http://bblog.com/ping.php..
[500] for http://blogbot.dk/io/xml-rpc.php..
[403] for http://blogmatcher.com/u.php..
[500] for http://blogsnow.com/ping..
[404] for http://fgiasson.com/pings/ping.php..
[404] for http://pingoat.com/goat/RPC2..
[500] for http://ping.syndic8.com/xmlrpc.php..
[500] for http://ping.weblogalot.com/rpc.php..
[403] for http://popdex.com/addsite.php..
[404] for http://www.blogdigger.com/RPC2..
[500] for http://www.blogsnow.com/ping..
[500] for http://www.blogstreet.com/xrbin/xmlrpc.cgi..
[404] for http://www.catapings.com/ping.php..
[500] for http://www.focuslook.com/ping.php..
[500] for http://www.holycowdude.com/rpc/ping..
[403] for http://www.popdex.com/addsite.php..
[500] for http://xmlrpc.blogg.de..
...

Those failed entries are then excluded from my list, which I import back into WordPress under “Settings → Writing → Update Services”.

The complete, VALID list of ping sites as of the date of this blog posting are the following 49 sites (marking 58% of the list I started with as invalid):

http://1470.net/api/ping
http://api.feedster.com/ping
http://api.moreover.com/ping
http://bblog.com/ping.php
http://bitacoras.net/ping
http://blogbot.dk/io/xml-rpc.php
http://blog.goo.ne.jp/XMLRPC
http://blogmatcher.com/u.php
http://blogoole.com/ping
http://blogsearch.google.com/ping/RPC2
http://blogsnow.com/ping
http://fgiasson.com/pings/ping.php
http://godesigngroup.com
http://godesigngroup.com/blog/feed
http://imblogs.net/ping
http://ping.bitacoras.com
http://ping.bloggers.jp/rpc
http://ping.blo.gs
http://pinger.blogflux.com/rpc
http://pinger.onejavastreet.com/
http://ping.myblog.jp
http://pingoat.com
http://pingoat.com/goat/RPC2
http://pingomatic.com
http://ping.syndic8.com/xmlrpc.php
http://ping.weblogalot.com/rpc.php
http://popdex.com/addsite.php
http://rcs.datashed.net/RPC2
http://rpc.blogbuzzmachine.com/RPC2
http://rpc.blogrolling.com/pinger
http://rpc.pingomatic.com
http://rpc.weblogs.com/RPC2
http://rpc.wpkeys.com
http://www.a2b.cc/setloc/bp.a2b
http://www.blogdigger.com/RPC2
http://www.blogsdominicanos.com/ping/
http://www.blogsnow.com/ping
http://www.blogstreet.com/xrbin/xmlrpc.cgi
http://www.catapings.com/ping.php
http://www.feedsky.com/api/RPC2
http://www.focuslook.com/ping.php
http://www.godesigngroup.com
http://www.holycowdude.com/rpc/ping
http://www.imblogs.net/ping
http://www.pingmyblog.com/
http://www.popdex.com/addsite.php
http://www.wasalive.com/ping/
http://www.xianguo.com/xmlrpc/ping.php
http://xmlrpc.blogg.de

Feel free to use this list in your own blog or pingback list.

If you have sites that aren’t on this list, add them to the comments and I’ll keep this list updated with any new ones that arrive.

Last Modified: Thursday, October 13th, 2011 @ 15:13

One Response to “Validating Blog Pingback Sites with Perl”

  1. Your last update says may 2008. Is this list still valid. I would love to use it but just like you found in 2008 are most no longer working. If you could let me know. Thanks for your time.


Leave a Reply

You must be logged in to post a comment.

Bad Behavior has blocked 894 access attempts in the last 7 days.