Advogato Search Engine
Thursday, July 5th, 2001 at 12:00 am
- Well, it’s coming along: I managed to get everyone’s diary indexed here. As I was going through it, I noticed that we have 4276 people listed in total, and of those, 3080 have never posted a single diary entry.
- Wget People (a 1.48 meg file)
- Run the following:
for i in `cat www.advogato.org/index.html | grep "," | grep "a href=" | cut -f 2 -d '"'`; do wget -r -l1 -np -nc --accept=xml http://www.advogato.org/person/$i/diary.xml; done
(all on one line; those are backticks, i.e. grave accents, not single quotes)
- Find all empty diary entries: find . -size 30c
- Remove them: rm -Rf `find . -size 30c`
- Index the data with ht://Dig
- Lather, rinse, repeat
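For anyone repeating this, the fetch and cleanup steps above could be wrapped up roughly like this. It is only a sketch: the function names (extract_names, fetch_diaries, prune_empty) are made up for illustration, and it assumes, as above, that an empty diary.xml is exactly 30 bytes and that the saved People page puts each name in the first quoted href on a line containing a comma.

```shell
#!/bin/sh
# Hypothetical helpers sketching the steps from the list above.

# Print the first double-quoted value on each line of file $1 that
# contains both a comma and a link (same grep/cut chain as the one-liner).
extract_names() {
    grep "," "$1" | grep "a href=" | cut -f 2 -d '"'
}

# Read names from stdin, one per line, and fetch each person's diary.xml.
# Reading line by line avoids word-splitting surprises from backticks.
fetch_diaries() {
    while IFS= read -r name; do
        wget -r -l1 -np -nc --accept=xml \
            "http://www.advogato.org/person/$name/diary.xml"
    done
}

# Delete regular files under directory $1 that are exactly 30 bytes
# (assumed here to be the size of an empty diary). -type f keeps
# directories out of the match, so plain rm -f is enough.
prune_empty() {
    find "$1" -type f -size 30c -exec rm -f {} +
}
```

Usage would be something like `extract_names www.advogato.org/index.html | fetch_diaries` followed by `prune_empty .`, then the ht://Dig indexing run as before.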
wget came in handy for all of this.
More later…