
rss autodiscovery, take 2


It being a weekend, not a whole lot has happened, but there have been several good comments on the topic of rss autodiscovery, which have made me think further about the choices we face in making it a reality...

As a recap, Jeremy proposed coming up with an OPML-based standard for specifying lists of feeds on sites, and Russ brought up the idea of using RSS, which some, including me, liked. I followed that up with a couple of mockups in both OPML and RSS so that we could compare the pros and cons of each, along with some comments on the apparent tradeoffs.

A note, before going on. The second I read Jeremy's post I thought: "but don't we already have RSD?" I immediately checked and got my answer, but I thought that, for completeness, I'd include it here. The answer is: no. RSD is intended for autodiscovery of APIs that will be used to access/modify the content programmatically. Similar solutions that have been proposed for Atom also deal with APIs rather than feeds. The bottom line is that re-using current autodiscovery techniques/specs from APIs would imply re-spec'ing them, at least partially, which brings us back to square one.

Using either RSS or OPML seems to me like a good solution that will get things done. It might not be the perfect solution, but it will work. There seems to be some resistance to using OPML. The main basis for this resistance is that OPML can't be validated (or easily used) because its spec is relatively loose. However, I think that if OPML were used for this, it would have to be specified properly, which means that what could (and could not) be done would be known, and therefore validation would be possible. The fact that the current iteration of OPML cannot be validated is not enough grounds to reject it out of hand, in my opinion. A few small improvements in the spec of OPML, or of this new OPML-derived format, would do the trick.
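To make the validation point concrete, here is a minimal sketch of what checking a tightened-up OPML profile could look like. The rules are assumptions invented for illustration, not part of any OPML spec: every outline needs a text attribute, an outline with children is a category and carries no link, and a leaf outline is a feed that must carry its link in a single agreed attribute (url is picked arbitrarily here).

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set for a stricter OPML profile; none of this is in any spec.
URL_ATTR = "url"  # assumed single agreed-upon attribute for feed links


def validate_feed_list(opml_text):
    """Return a list of problems found; an empty list means the document passes."""
    problems = []
    root = ET.fromstring(opml_text)
    if root.tag != "opml":
        return ["root element must be <opml>"]
    body = root.find("body")
    if body is None:
        return ["missing <body>"]

    def check(outline, path):
        name = outline.get("text")
        if not name:
            problems.append(f"{path}: outline without a text attribute")
            name = "?"
        here = f"{path}/{name}"
        children = outline.findall("outline")
        if children:
            # A node with children is a category and must not carry a link.
            if outline.get(URL_ATTR):
                problems.append(f"{here}: category must not have a {URL_ATTR}")
            for child in children:
                check(child, here)
        elif not outline.get(URL_ATTR):
            # A leaf is a feed and must carry the link.
            problems.append(f"{here}: feed without a {URL_ATTR} attribute")

    for outline in body.findall("outline"):
        check(outline, "")
    return problems
```

Once rules like these are written down, a document that uses htmlUrl where url is expected stops being "valid but ambiguous" and becomes something a validator can flag.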

In summary, my (possibly narrow-minded) view is that all we're doing is agreeing on a set of tags and the structure of a document. Any solution will look similar to any other, and I think it's eminently useful to base things on a format that can already be parsed by most, if not all, aggregators, as is the case with both RSS and OPML.

There are other possibilities though. Tima put forward the idea of using WSIL (also echoed in this lockergnome entry). As I don't know the intricacies of the format, I can't re-write either of my two mockups in WSIL and be sure that the result won't be broken, and for comparison purposes we need, I think, to be looking at exactly the same content. Conclusion: if Tima or someone else has a bit of time to re-write my mock-up structure using WSIL, it would be most welcome!

Regardless of format, the main issue that, it seems to me, would drive how the format is used is how hierarchy is handled. Hierarchy will be necessary to provide the structure used by many news sites (e.g. "Technology/Mobile Technology/Phones"). So, with a heavy emphasis on how hierarchy would be represented, here's a summary of the issues in choosing one format or another, as far as I can see:

  • Regarding hierarchy, OPML is clearly the winner, since it is designed to support hierarchies. OPML would, however, require properly spec'ing a couple of elements to represent the data that we'd like to represent. OPML, for example, has been variously used to specify links with url, htmlUrl, as well as others like href, as this example from Philip Pearson demonstrates (in fact, Philip was actually using OPML to provide a feed directory there). That would be the extent of the work required for an OPML implementation.
  • RSS, on the other hand, is not "naturally" geared towards dealing with hierarchical content: the structure of the information represented is flat. This can be solved in one of two ways:
    1. It is possible to create an implied hierarchy within the file by using category names. All the feeds for a site would be in a single file, and hierarchy would be specified by using a forward slash "/" between category levels. Pros: it is simple, it doesn't stretch the use of RSS beyond its single-file origin, and it simplifies checking for new feeds on a given "watched" site since a single GET is required. Cons: it would be a semantic convention rather than a syntactic one, which makes it harder to verify properly.
    2. The alternative is to specify sub-feed sets through the use of the domain attribute in category elements. That is, whenever a category in an entry includes a domain, then the entry is defined as pointing to another feed-of-feeds subset, rather than to a particular feed itself. A backpointer to the original "parent" feed set can be defined by using the source element on RSS entries, which gives us the good side-effect of making the hierarchy fully traversable from any starting point. Pros: the connections between feed sets and their children would be syntactically defined, making them easier to validate and verify, all without having to bend in any way the definition of what an RSS feed is. Cons: it makes the structure a bit more difficult to maintain (multiple files) and to access (multiple GETs), which also makes validation a bit harder.
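As a rough illustration of the first RSS option (a single file, with hierarchy implied by "/" in category names), the sketch below turns a flat feed-of-feeds into a nested structure. The sample document, the example.com URLs, and the reserved "_feeds" key are all made up for the example; the only convention taken from the discussion above is the slash-separated category.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-file feed-of-feeds; all names and URLs are made up.
SAMPLE = """<rss version="2.0"><channel>
  <title>Example site feeds</title>
  <link>http://example.com/</link>
  <description>Feed of feeds (hypothetical)</description>
  <item>
    <title>Phones</title>
    <link>http://example.com/feeds/phones.xml</link>
    <category>Technology/Mobile Technology/Phones</category>
  </item>
  <item>
    <title>All technology</title>
    <link>http://example.com/feeds/tech.xml</link>
    <category>Technology</category>
  </item>
</channel></rss>"""


def build_tree(rss_text):
    """Group feed links into nested dicts keyed by category level."""
    tree = {}
    channel = ET.fromstring(rss_text).find("channel")
    for item in channel.findall("item"):
        category = item.findtext("category", default="")
        levels = [part.strip() for part in category.split("/") if part.strip()]
        node = tree
        # Each "/"-separated segment is one level of the implied hierarchy.
        for level in levels:
            node = node.setdefault(level, {})
        # Feeds at this level live under a reserved key (an assumption).
        node.setdefault("_feeds", []).append(item.findtext("link"))
    return tree


tree = build_tree(SAMPLE)
```

Nothing in the XML itself says that "/" is a separator, which is exactly the "semantic rather than syntactic" drawback: a parser has to know the convention in advance, and a category that legitimately contains a slash would be split by mistake.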

In his entry, Tima mentions that WSIL describes hierarchy through the use of multiple files, much like the second RSS alternative mentioned above.
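For the multi-file approach (the second RSS alternative above), a traversal could be sketched roughly as follows. It assumes, as in that alternative, that an item whose category carries a domain attribute points at another feed-of-feeds file rather than at a feed. The function name, the fetch callable, and the in-memory sample documents are hypothetical and only there to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def walk_feed_sets(url, fetch, seen=None):
    """Yield (feed_set_url, feed_url) pairs for every plain feed reachable from url.

    fetch is any callable that returns the RSS text for a URL.
    """
    seen = set() if seen is None else seen
    if url in seen:  # guard against cycles between feed sets
        return
    seen.add(url)
    channel = ET.fromstring(fetch(url)).find("channel")
    for item in channel.findall("item"):
        link = item.findtext("link")
        # Assumed convention: a category carrying a domain marks a sub-feed set.
        is_subset = any(c.get("domain") for c in item.findall("category"))
        if is_subset:
            yield from walk_feed_sets(link, fetch, seen)
        else:
            yield (url, link)


# Hypothetical documents standing in for files served by a site.
DOCS = {
    "http://example.com/feeds.xml": (
        '<rss version="2.0"><channel>'
        "<item><title>Technology</title>"
        "<link>http://example.com/tech/feeds.xml</link>"
        '<category domain="http://example.com/feeds.xml">Technology</category></item>'
        "<item><title>Front page</title>"
        "<link>http://example.com/index.xml</link></item>"
        "</channel></rss>"
    ),
    "http://example.com/tech/feeds.xml": (
        '<rss version="2.0"><channel>'
        "<item><title>Phones</title>"
        "<link>http://example.com/tech/phones.xml</link>"
        '<source url="http://example.com/feeds.xml">Example site feeds</source></item>'
        "</channel></rss>"
    ),
}

pairs = list(walk_feed_sets("http://example.com/feeds.xml", DOCS.__getitem__))
```

The seen set is what keeps the source backpointers from turning a traversal into a loop, and the one-fetch-per-file cost is visible in the recursion: every sub-feed set is another GET.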

A final element that would also have to be agreed upon is how this master file is usually found. Jeremy, in his original posting, proposed using a standard location similar to robots.txt, and with a standard name, like feeds.opml or rss.opml which sounds quite reasonable.
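A robots.txt-style lookup is easy to sketch: given any page on a site, an aggregator would derive the well-known URL at the site root. The file names below are just the two floated above, nothing is agreed yet, and the function name is made up for the example.

```python
from urllib.parse import urljoin

# Candidate names mentioned in the discussion; not a standard.
CANDIDATE_NAMES = ("feeds.opml", "rss.opml")


def master_file_urls(page_url):
    """Return the root-level URLs where a site's master feed list might live."""
    # A leading "/" makes urljoin resolve against the site root,
    # the same way robots.txt is located.
    return [urljoin(page_url, "/" + name) for name in CANDIDATE_NAMES]


urls = master_file_urls("http://example.com/blog/2003/09/post.html")
```

An aggregator would then try each candidate in turn and treat a 404 as "this site doesn't publish a feed list".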

Okay, so what would be the steps necessary to be able to spec this? A possible outline would be:

  • Define which format would be used, based on pros and cons.
  • For the format used, define the structure and the meaning of the tags used.
  • Agree on a standard location for the top feed-of-feeds set.
  • Formalize the results in a spec.
How does that sound? Did I miss anything?

Categories: soft.dev
Posted by diego on September 14 2003 at 6:30 PM

Copyright © Diego Doval 2002-2011.