
atomflow-templates and atomflow-spider

Before I start, there have been a good number of atomflow-related entries in the last day. To mention a few: Matt has further explained many of his ideas, as has Ben. Michael has more thoughts on it, as well as links to other related tools. Matthew and Frank also added to the conversation, as did Danny and Grant.

Okay, back to the actual topic of this post.

Another hour-long session of hacking and there are two new tools in the atomflow package (download): atomflow-templates and atomflow-spider.


atomflow-spider is a simple spidering program that outputs the contents downloaded from a URL to the standard output. There are a number of other programs that do this already (wget and curl being the most prominent) but a similar tool is included with atomflow for completeness, particularly for platforms that don't have wget (e.g., Windows installs without cygwin or similar). Plus, it's good practice (for me) to keep thinking along the lines of simple, loosely coupled components that do one thing well.

The spider's command-line parameters are as follows:

java -jar atomflow-spider.jar -url <URL> [-prefsFile <PATH_TO_PREFS_FILE>]

The -prefsFile parameter is optional. When used, the preferences file stores the ETag and Last-Modified information for the URL, so that the content is not downloaded again when it hasn't changed. This is useful for RSS feeds. I am not sure if other command-line tools support this, but I don't think it's all that common.
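
For the curious, here is a rough sketch of how this kind of conditional download can work in Java. It is not the atomflow-spider source: the class name, the property keys ("<url>.etag" and "<url>.lastModified") and the positional arguments are all made up for illustration. The idea is simply to send the saved validators back as If-None-Match and If-Modified-Since, and to skip the download when the server answers 304 Not Modified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch only; not the actual atomflow-spider implementation.
class ConditionalFetchSketch {

    // Builds the conditional request headers from previously saved validators.
    // The key layout "<url>.etag" / "<url>.lastModified" is an assumption.
    static Map<String, String> conditionalHeaders(Properties prefs, String url) {
        Map<String, String> headers = new LinkedHashMap<>();
        String etag = prefs.getProperty(url + ".etag");
        String lastModified = prefs.getProperty(url + ".lastModified");
        if (etag != null) headers.put("If-None-Match", etag);
        if (lastModified != null) headers.put("If-Modified-Since", lastModified);
        return headers;
    }

    // Saves the validators from a response; a missing header clears the old value.
    static void rememberValidators(Properties prefs, String url, String etag, String lastModified) {
        setOrRemove(prefs, url + ".etag", etag);
        setOrRemove(prefs, url + ".lastModified", lastModified);
    }

    private static void setOrRemove(Properties prefs, String key, String value) {
        if (value == null) prefs.remove(key); else prefs.setProperty(key, value);
    }

    // Returns true if new content was copied to out, false on 304 Not Modified.
    static boolean fetch(String url, Properties prefs, OutputStream out) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conditionalHeaders(prefs, url).forEach(conn::setRequestProperty);
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return false;
        }
        rememberValidators(prefs, url, conn.getHeaderField("ETag"), conn.getHeaderField("Last-Modified"));
        try (InputStream in = conn.getInputStream()) {
            in.transferTo(out);
        }
        return true;
    }

    // Hypothetical usage: java ConditionalFetchSketch <URL> <PATH_TO_PREFS_FILE>
    public static void main(String[] args) throws IOException {
        String url = args[0];
        Path prefsPath = Paths.get(args[1]);
        Properties prefs = new Properties();
        if (Files.exists(prefsPath)) {
            try (InputStream in = Files.newInputStream(prefsPath)) {
                prefs.load(in);
            }
        }
        if (fetch(url, prefs, System.out)) {
            System.out.flush();
            try (OutputStream prefsOut = Files.newOutputStream(prefsPath)) {
                prefs.store(prefsOut, "HTTP validators");
            }
        }
    }
}
```

On the first run the preferences file doesn't exist, so no conditional headers are sent and the full content is downloaded. Later runs send the saved values and print nothing if the server reports that nothing has changed.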

Additionally, the spider supports downloading GZIP and Deflate compressed content to speed up downloads.
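
As a sketch of what handling compressed content involves (again, an illustration with made-up names, not the spider's actual code): the client would typically send an "Accept-Encoding: gzip, deflate" request header, then wrap the response stream according to the Content-Encoding header that comes back. The java.util.zip classes do the real work.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Illustrative sketch only; not the actual atomflow-spider implementation.
class ContentDecodingSketch {

    // Wraps the raw response body according to the Content-Encoding header value.
    // Unknown or missing encodings are passed through untouched.
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding == null) {
            return body;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // Expects zlib-wrapped data; servers that send raw deflate
                // would need new Inflater(true) instead.
                return new InflaterInputStream(body);
            default:
                return body;
        }
    }

    // Helper for the demo below: gzip-compresses a byte array in memory.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        }
        return buffer.toByteArray();
    }

    // Round-trips a small payload to show decode() in action, no network needed.
    public static void main(String[] args) throws IOException {
        byte[] original = "<rss version=\"2.0\"></rss>".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        try (InputStream in = decode(new ByteArrayInputStream(compressed), "gzip")) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

With a real connection, the string passed to decode() would come from the response's Content-Encoding header, for example via HttpURLConnection.getContentEncoding().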


atomflow-templates is the beginning of a templating system that can be used to transform content into (and eventually out of) atomflow, through pipes. This version supports only RSS to Atom conversion (basically all RSS formats are supported). I think this is pretty important as a basic tool in the package, since there's lots of content out there in RSS format.

atomflow-templates reads from standard input and writes to standard output. Currently it is run as follows:

java -jar atomflow-templates.jar -input rss -output atom

atomflow-templates can, for example, be connected to atomflow-spider and then to the storage core, and run from a cron job to monitor and store certain RSS feeds, as follows:

java -jar atomflow-spider.jar -url <URL> |
java -jar atomflow-templates.jar -input rss -output atom |
java -jar atomflow.jar -add -storeLocation <STORE_DIRECTORY> -input stdio -type feed

So that's it for tonight--between coding at work and then this, I'm all coded-out for the day :).

Posted by diego on August 24 2004 at 11:14 PM

Copyright © Diego Doval 2002-2011.