Advogato Search Engine

    Well, it’s coming along: I managed to get everyone’s diary indexed here. As I was going through it, I noticed that we have 4276 people listed in total, and of those, 3080 (about 72%) have never posted a single diary entry.

    wget came in handy:

    1. Wget People
      (a 1.48 meg file)

    2. Run the following:

      for i in `cat | grep "," | grep "a href=" |
      cut -f 2 -d '"'`; do wget -r -l1 -np -nc --accept=xml$i/diary.xml; done

      (all on one line; the marks around the cat pipeline are backticks, i.e. grave accents, not single quotes)

    3. Find all empty diary entries: find . -size 30c
    4. Remove them: rm -Rf `find . -size 30c`
    5. Index the data with ht://Dig
    6. Lather, rinse, repeat
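
    For what it's worth, the link extraction in step 2 and the cleanup in steps 3 and 4 can be sketched a bit more defensively. This is only a sketch, not what I actually ran: the function names extract_hrefs and remove_empty_diaries are made up for illustration, and it assumes, as above, that a diary.xml of exactly 30 bytes is an empty one.

```shell
#!/bin/sh

# extract_hrefs FILE (hypothetical helper)
# Same filter as step 2: keep lines that contain both a comma and 'a href=',
# then print the second double-quote-delimited field (the href value).
extract_hrefs() {
    grep ',' "$1" | grep 'a href=' | cut -f 2 -d '"'
}

# remove_empty_diaries [DIR] (hypothetical helper)
# Same idea as steps 3 and 4, but restricted to regular files and without
# backticks, so directories and odd file names are not a problem.
# Assumption: a file of exactly 30 bytes is an empty diary.
remove_empty_diaries() {
    find "${1:-.}" -type f -size 30c -exec rm -f {} +
}
```

    Usage would be something like extract_hrefs on the saved People page to see which links the loop will walk, then remove_empty_diaries . from the download directory before indexing. Passing the files to rm through -exec, rather than through backticks, also avoids running rm with no arguments when nothing matches.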

More later…

