Advogato Search Engine
					
			Thursday, July 5th, 2001 at 12:00 am 
			
				
Well, it’s coming along: I managed to get everyone’s diary indexed here. As I was going through it, I noticed that we have 4,276 people listed, and 3,080 of them (about 72%) have never posted a single diary entry.
wget came in handy:
- Wget the People page (a 1.48 MB file)
- Run the following:
 for i in `cat www.advogato.org/index.html | grep "," | grep "a href=" | cut -f 2 -d '"'`; do wget -r -l1 -np -nc --accept=xml http://www.advogato.org/person/$i/diary.xml; done
 (all on one line; those are backticks, i.e. grave accents, not single quotes)
- Find all empty diaries (files of exactly 30 bytes): find . -size 30c
- Remove them: rm -Rf `find . -size 30c`
- Index the data with ht://Dig
- Lather, rinse, repeat
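The fetch-and-prune steps could also be wrapped in a small script. This is only a sketch, not what I actually ran: the function names (list_people, fetch_diaries, prune_empty) are made up for illustration. It assumes the saved People page puts each person's name in the first double-quoted attribute of a line, and that an empty diary.xml is exactly 30 bytes, as in the find step.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the layout of the
# People page is an assumption.

# Print the first double-quoted value on every line of the saved page
# that contains both a comma and an "a href=" link.
list_people() {
    grep ',' "$1" | grep 'a href=' | cut -f 2 -d '"'
}

# Read one name per line from stdin and fetch that person's diary.xml.
# The wget options are the same ones used in the one-liner.
fetch_diaries() {
    while IFS= read -r person; do
        wget -r -l1 -np -nc --accept=xml \
            "http://www.advogato.org/person/$person/diary.xml"
    done
}

# Delete regular files of exactly 30 bytes below the given directory.
# Restricting to -type f keeps directories out of the rm.
prune_empty() {
    find "$1" -type f -size 30c -exec rm -f {} +
}
```

With those defined, the whole run would be something like: list_people www.advogato.org/index.html | fetch_diaries, followed by prune_empty . before handing the directory to ht://Dig. Reading names in a while loop instead of backticks also avoids building one enormous command line from several thousand names.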
More later…
