atomflow-templates and atomflow-spider
Before I start: there have been a good number of atomflow-related entries in the last day. To mention a few: Matt has further explained many of his ideas, as has Ben. Michael has more thoughts on it, as well as links to other related tools. Matthew and Frank also added to the conversation, as did Danny and Grant.
Okay, back to the actual topic of this post.
Another hour-long session of hacking, and there are two new tools in the atomflow package (download): atomflow-templates and atomflow-spider.
atomflow-spider is a simple spidering program that outputs the contents downloaded from a URL to the standard output. There are a number of other programs that do this already (wget and curl being the most prominent) but a similar tool is included with atomflow for completeness, particularly for platforms that don't have wget (e.g., Windows installs without cygwin or similar). Plus, it's good practice (for me) to keep thinking along the lines of simple, loosely coupled components that do one thing well.
The spider's command-line parameters are as follows:
The -prefsFile parameter is optional. When used, the preferences file stores the ETag and Last-Modified values for the URL, so the spider can skip the download when the content hasn't changed. This is useful for RSS feeds; I am not sure whether other command-line tools support this, but I don't think it's all that common.
Additionally, the spider supports downloading GZIP and Deflate compressed content to speed up downloads.
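To make the ETag/Last-Modified and compression handling concrete, here is a minimal Python sketch of the general technique (an HTTP conditional GET plus gzip/deflate decoding). It is not atomflow-spider's code: the JSON preferences file layout and every function name below are assumptions made up for illustration.

```python
import gzip
import json
import urllib.error
import urllib.request
import zlib
from pathlib import Path


def load_prefs(path):
    """Return the stored validators, or an empty dict if the file is missing."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def conditional_headers(prefs, url):
    """Build request headers from whatever validators were saved for this URL."""
    saved = prefs.get(url, {})
    headers = {"Accept-Encoding": "gzip, deflate"}
    if "etag" in saved:
        headers["If-None-Match"] = saved["etag"]
    if "last_modified" in saved:
        headers["If-Modified-Since"] = saved["last_modified"]
    return headers


def remember_validators(prefs, url, response_headers):
    """Record the ETag and Last-Modified values the server sent, if any."""
    saved = {}
    if response_headers.get("ETag"):
        saved["etag"] = response_headers.get("ETag")
    if response_headers.get("Last-Modified"):
        saved["last_modified"] = response_headers.get("Last-Modified")
    prefs[url] = saved


def decode_body(body, content_encoding):
    """Undo gzip or deflate compression; pass other bodies through untouched."""
    encoding = (content_encoding or "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate stream
    return body


def fetch(url, prefs_path):
    """Return the body as bytes, or None when the server answers 304 Not Modified."""
    prefs = load_prefs(prefs_path)
    request = urllib.request.Request(url, headers=conditional_headers(prefs, url))
    try:
        with urllib.request.urlopen(request) as response:
            body = decode_body(response.read(), response.headers.get("Content-Encoding"))
            remember_validators(prefs, url, response.headers)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since the last run: nothing to download
            return None
        raise
    Path(prefs_path).write_text(json.dumps(prefs, indent=2))
    return body
```

On the first run the preferences file does not exist, so a plain request is sent and the validators are saved. On later runs the same call sends them back, and an unchanged feed costs only a short 304 response.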
atomflow-templates is the beginning of a templating system that can be used to transform content going into (and eventually out of) atomflow, through pipes. This version supports only RSS-to-Atom conversion (basically all RSS formats are supported). I think this is pretty important as a basic tool in the package, since there's lots of content out there in RSS format.
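As a rough illustration of what an RSS-to-Atom transform involves, here is a small Python sketch. It is not the atomflow-templates implementation: it handles only plain RSS 2.0 (not the other RSS variants), it emits elements in the Atom 1.0 namespace, which is an assumption on my part, and all function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

ATOM_NS = "http://www.w3.org/2005/Atom"  # assumed target: Atom 1.0
ET.register_namespace("", ATOM_NS)       # serialize Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, tag)


def to_rfc3339(rss_date):
    """Convert an RSS (RFC 822) date to RFC 3339; use the current time if absent or invalid."""
    parsed = None
    if rss_date:
        try:
            parsed = parsedate_to_datetime(rss_date)
        except (TypeError, ValueError):
            parsed = None
    if parsed is None:
        parsed = datetime.now(timezone.utc)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.isoformat()


def add_text(parent, tag, text):
    """Append an Atom child element holding the given text."""
    child = ET.SubElement(parent, atom(tag))
    child.text = text
    return child


def rss_to_atom(rss_text):
    """Map an RSS 2.0 channel and its items onto an Atom feed and entries."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        raise ValueError("input does not look like RSS 2.0")
    feed = ET.Element(atom("feed"))
    link = channel.findtext("link", default="")
    add_text(feed, "title", channel.findtext("title", default=""))
    add_text(feed, "id", link)
    ET.SubElement(feed, atom("link"), href=link)
    add_text(feed, "updated", to_rfc3339(channel.findtext("lastBuildDate")))
    for item in channel.findall("item"):
        entry = ET.SubElement(feed, atom("entry"))
        item_link = item.findtext("link", default="")
        add_text(entry, "title", item.findtext("title", default=""))
        add_text(entry, "id", item.findtext("guid", default=item_link))
        ET.SubElement(entry, atom("link"), href=item_link)
        add_text(entry, "updated", to_rfc3339(item.findtext("pubDate")))
        add_text(entry, "summary", item.findtext("description", default=""))
    return ET.tostring(feed, encoding="unicode")


def main():
    """Pipe-friendly entry point: RSS on standard input, Atom on standard output."""
    sys.stdout.write(rss_to_atom(sys.stdin.buffer.read()))
```

Calling `main()` at the bottom of the file turns the sketch into a filter that can sit in the middle of a pipe, which is the same shape as the tool described here.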
atomflow-templates reads from standard input and writes to standard output. Currently it is run as follows:
atomflow-templates can be, for example, connected with atomflow-spider and then to the storage core through a CRON job to monitor and store certain RSS feeds, as follows:
So that's it for tonight. Between coding at work and then this, I'm all coded out for the day :).
Categories: soft.dev
Posted by diego on August 24 2004 at 11:14 PM
Copyright © Diego Doval 2002-2011.