Tagged in Java

Hackdiary
A diary of hacks by Matt Biddulph

Update: Screenscraping HTML with TagSoup and XPath

UPDATE: Oliver Roup has published updated code that uses the built-in XPath processor in JDK 1.5.

Some emails and comments on Screenscraping HTML with TagSoup and XPath alerted me to the fact that the example I gave on…
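
For readers who have not used the JDK's built-in XPath support mentioned in the update, here is a minimal sketch of the querying half only. It is not Oliver Roup's code or the original example: it assumes the page has already been turned into well-formed markup (the job TagSoup does in the post) and uses only the standard javax.xml.xpath and javax.xml.parsers packages. The class name XPathScrape, the method linkTargets, and the sample page are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not from the original post.
class XPathScrape {

    /**
     * Returns the href attribute of every <a> element in the input.
     * Assumption: the input is already well-formed XML/XHTML; real-world
     * HTML would first need cleaning up by a parser such as TagSoup.
     */
    static List<String> linkTargets(String wellFormedHtml) {
        try {
            // Parse the markup into a DOM tree with the JDK's default parser.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));

            // Evaluate the XPath expression with the built-in processor.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//a/@href", doc, XPathConstants.NODESET);

            // Collect the attribute values in document order.
            List<String> hrefs = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(nodes.item(i).getNodeValue());
            }
            return hrefs;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>"
                + "<a href=\"/one\">One</a> <a href=\"/two\">Two</a>"
                + "</p></body></html>";
        System.out.println(linkTargets(page)); // prints [/one, /two]
    }
}
```

The same XPath object can evaluate any other expression against the parsed document, so the scraping logic stays in the query string rather than in hand-written tree-walking code.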


An RDF crawler

I wrote an RDF crawler (aka scutter) in Java using the Jena RDF toolkit. It spiders the web, gathering semantic web data and storing it in any of Jena’s backend stores (in-memory, Berkeley DB, MySQL, etc.). Download it here.


Photo-annotating bot

For a while now I've had a background project: writing a bot to help me annotate the fairly large number of pictures I post to picdiary (1496 at the last count). Creating a document of RSS-based metadata is a slightly cumbersome text-editor job every time I post a new set of pics.