by Mike Shea on 30 December 2002
Now that I've ripped headlines from just about any news site, I have a new evil project in my head: forced accessibility and web standards.
Almost all sites on the web post bad HTML. They have navigation-heavy pages of non-standard code with way too many images, branding, ads, and all sorts of other expendable stuff. What if I could write a script that could rip content from presentation? What if I could separate the signal from the noise?
Doing this for any single page would be easy. Even doing it for a particular format of page wouldn't be too hard. Doing it reliably over a set of pages would be more difficult. If I were able to tie it to the headline ripper, I could feed the hrefs of the headlines to this page ripper, which would return the stripped-down content. It would work very well for Palm Pilots and other accessibility devices. How about I quit writing about it....
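Here is a rough sketch of what such a page ripper might look like, in Python using only the standard library's html.parser. Everything specific in it is an assumption made for illustration: the names PageRipper and rip_content, and especially the two tag lists that decide what counts as noise and what counts as signal. It is a starting point, not something that would hold up reliably across a set of real pages.

```python
from html.parser import HTMLParser

# Assumed "noise" containers: anything inside these is thrown away.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Assumed "signal" tags: text inside these is kept.
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "blockquote"}


class PageRipper(HTMLParser):
    """Collects text blocks from content tags, ignoring noisy containers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # how many SKIP_TAGS we are nested inside
        self.keeping = False  # True while inside a KEEP_TAGS element
        self.buffer = []      # text pieces of the block being collected
        self.blocks = []      # finished blocks of content text

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            # Bad HTML often leaves <p> unclosed, so a new content tag
            # also ends whatever block came before it.
            self._flush()
            self.keeping = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self._flush()
            self.keeping = False

    def handle_data(self, data):
        if self.keeping and self.skip_depth == 0:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # pick up a final unclosed block

    def _flush(self):
        # Join the pieces and collapse runs of whitespace.
        text = " ".join("".join(self.buffer).split())
        self.buffer = []
        if text:
            self.blocks.append(text)


def rip_content(html):
    """Return the content text blocks found in an HTML string."""
    ripper = PageRipper()
    ripper.feed(html)
    ripper.close()
    return ripper.blocks
```

Tying this to the headline ripper would mean fetching each headline's href and passing the returned HTML to rip_content; the fetching is left out here. The hard part, doing it reliably over many sites, lives almost entirely in how those two tag lists would have to be tuned per site.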
Mark Pilgrim has some interesting ideas on using XHTML instead of XML to create the semantic web. It's nice to see people finding new uses for existing technology rather than finding new technology with no uses at all.