on rss autodiscovery

After reading Jeremy's post on creating a sort of "auto discovery for RSS 2.0" (and agreeing that it was a great idea) I thought, what the hell. Why not try it on for size? Here it goes, then...

The idea is basically that a site could publish a centralized directory of all the feeds it serves. This would allow auto-discovery by aggregators and suggestions to users when new feeds come online on sites they already watch.

Jeremy was proposing to do it on OPML, and Russ floated the idea of using RSS directly, based on this, which I liked.

To get a concrete idea of what this would look like, I decided to whip up possible versions in both OPML and RSS. So I spent some time looking at both specs and thinking about how they would be used in this case (from my point of view, of course, which is that of a user and of someone who would have to add this to an aggregator :) -- it's possible that I missed something that would be obvious to a content producer).

The results of my little experiment are here: in OPML and in RSS, for a fictitious news site "News4Humans".

RSS, being a richer format than OPML (and possibly more generic as far as content is concerned), has no problem accommodating all the elements. There are a few elements that I added to the OPML version to mirror the data exposed; even though they are not included in the OPML spec, that's OK since the spec does not preclude adding new elements. That said (and as Dave mentioned specifically regarding the issue of recursive inclusion), everyone would have to agree on them or they would be useless--possibly an addendum to the spec would be useful as well.

One of the main differences is in structure. OPML supports recursivity, RSS does not. So where OPML can define the category and the feeds for that category as sub-tags, RSS needs to use the category tag, essentially making the structure flat. This seems to be fine to me, unless "deeper recursivity" (or is it recursiveness?) is needed--but I can't think of a news site with more than one level down from the main category at the moment, so I let it stand.
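To make the structural difference concrete, here is a rough sketch in Python of how an aggregator might read the same directory from each format. The two sample documents, the `news4humans.example` URLs, and the use of an `xmlUrl` attribute on OPML outlines are all assumptions made for illustration; they are not the mockups themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical directories for the fictitious News4Humans site.
OPML_DIRECTORY = """<opml version="1.0">
  <head><title>News4Humans feeds</title></head>
  <body>
    <outline text="Sports">
      <outline text="Football" xmlUrl="http://news4humans.example/sports/football.xml"/>
      <outline text="Tennis" xmlUrl="http://news4humans.example/sports/tennis.xml"/>
    </outline>
    <outline text="Politics">
      <outline text="World" xmlUrl="http://news4humans.example/politics/world.xml"/>
    </outline>
  </body>
</opml>"""

RSS_DIRECTORY = """<rss version="2.0">
  <channel>
    <title>News4Humans feeds</title>
    <link>http://news4humans.example/</link>
    <description>Directory of feeds</description>
    <item>
      <title>Football</title>
      <link>http://news4humans.example/sports/football.xml</link>
      <category>Sports</category>
    </item>
    <item>
      <title>Tennis</title>
      <link>http://news4humans.example/sports/tennis.xml</link>
      <category>Sports</category>
    </item>
    <item>
      <title>World</title>
      <link>http://news4humans.example/politics/world.xml</link>
      <category>Politics</category>
    </item>
  </channel>
</rss>"""


def feeds_from_opml(text):
    """Group feed URLs by category; the category comes from the nesting."""
    groups = {}
    body = ET.fromstring(text).find("body")
    for category in body.findall("outline"):
        urls = [feed.get("xmlUrl") for feed in category.findall("outline")]
        groups[category.get("text")] = urls
    return groups


def feeds_from_rss(text):
    """Group feed URLs by category; the category comes from a tag on each flat item."""
    groups = {}
    for item in ET.fromstring(text).iter("item"):
        category = item.findtext("category")
        groups.setdefault(category, []).append(item.findtext("link"))
    return groups


if __name__ == "__main__":
    print(feeds_from_opml(OPML_DIRECTORY))
    print(feeds_from_rss(RSS_DIRECTORY))
```

Both functions end up with the same grouping. The only difference is where the category lives: in the tree shape for OPML, in a tag on each item for RSS.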

Second, the OPML version contains two additional tags: link, to specify the feed of feeds'... well, link :) (to match the same tag in RSS, which could potentially be useful for redirects), and dateCreated, a per-entry element. The idea with dateCreated is that the aggregator can record when the feed of feeds was last checked and diff against that date to find out very easily which feeds were added since the last check. (Of course, keeping a full list and doing a diff on that is possible, but then again if there are, say, 50 feeds, and the user subscribes to only one, the aggregator would have to keep all 50 to do a proper diff against number 51, which seems kind of wasteful.) RSS, incidentally, supports this functionality through its own basic item date tag. And, again, the date could be useful for redirects if necessary: changing the date on an item you already knew about implies that it has moved.
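The date-based check could be sketched like this in Python, for the RSS flavor. It assumes the item date tag in question is `pubDate` in the usual RFC 822 format; the sample directory, the URLs, and the function name `feeds_added_since` are made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical directory: one old entry, one added recently.
RSS_DIRECTORY = """<rss version="2.0">
  <channel>
    <title>News4Humans feeds</title>
    <link>http://news4humans.example/</link>
    <description>Directory of feeds</description>
    <item>
      <title>Football</title>
      <link>http://news4humans.example/sports/football.xml</link>
      <pubDate>Mon, 01 Sep 2003 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Tennis</title>
      <link>http://news4humans.example/sports/tennis.xml</link>
      <pubDate>Fri, 12 Sep 2003 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def feeds_added_since(rss_text, last_check):
    """Return the links of entries dated after the last check.

    Only the timestamp of the last check has to be stored,
    not the full list of feeds seen so far.
    """
    added = []
    for item in ET.fromstring(rss_text).iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        if published > last_check:
            added.append(item.findtext("link"))
    return added


if __name__ == "__main__":
    last_check = datetime(2003, 9, 10, tzinfo=timezone.utc)
    print(feeds_added_since(RSS_DIRECTORY, last_check))
```

The aggregator keeps a single timestamp per directory, which is the saving over storing all 50 entries just to notice number 51.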

As I noted in the comments on Jeremy's entry, I sort of instinctively thought that RSS was a better idea. The OPML version however looks enticingly simple and still functional. Hm. Surprising.

Anyway, which one do you like best?

Update: As I mentioned in the comments (replying to Zoe's idea of establishing hierarchy through multiple files) the issue of hierarchy is not terribly clear with RSS. I see two ways of doing it:

  • One, as Zoe proposed, using files. This would require that we agree on a convention that says, for example, that if the item has only a link and nothing else (the RSS spec makes every sub-element of an item optional, though it does ask for at least a title or a description) then the link is to a sub-directory. This is feasible and would imply, on the client that subscribes, a multi-step process to obtain the full list.
  • Two, the creation of a "virtual" hierarchy by way of category names. Already the mockup is using category to specify the main topic to which the feed belongs. If the category is, for example, News/Sports and there's another category News/Politics, then the hierarchy is implicit in a single RSS file, even though the actual structure is flat. This would require a single GET but a bit more processing on the data once received.

I prefer option two since option one, while enticing, implies that we would be giving two different meanings to the link tag, something that's never desirable, but it's possible that I missed something there... Also, if the "hierarchy through files" method were chosen, the connections between the files could be made two-way, which is nice, by using RSS's source sub-element for item, so a feed can be traced back to its "parent" feed.
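The "bit more processing" for option two might be no more than this, sketched in Python. It assumes `/` is the agreed separator inside category names; the function name `build_tree` and the sample URLs are hypothetical.

```python
def build_tree(entries):
    """Turn flat (category_path, feed_url) pairs into a nested tree.

    Each node is a dict with the feeds filed directly under it
    and its child categories keyed by name.
    """
    root = {"feeds": [], "children": {}}
    for path, url in entries:
        node = root
        for part in path.split("/"):
            # Create intermediate categories on demand.
            node = node["children"].setdefault(part, {"feeds": [], "children": {}})
        node["feeds"].append(url)
    return root


if __name__ == "__main__":
    # (category, link) pairs as they would be read from the flat RSS items.
    entries = [
        ("News/Sports", "http://news4humans.example/sports/football.xml"),
        ("News/Politics", "http://news4humans.example/politics/world.xml"),
        ("News/Sports", "http://news4humans.example/sports/tennis.xml"),
    ]
    tree = build_tree(entries)
    print(sorted(tree["children"]["News"]["children"]))
```

Everything comes from one file, and the hierarchy exists only on the client after the split.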

Update #2: Another advantage of using RSS as-is that I keep forgetting to mention is that the language for the feed can be specified. You could automagically define that you only want to see feeds in a certain language and the aggregator could automatically disregard anything else. The same functionality for OPML would require adding a tag for that purpose.

Update #3: I've just noticed that, in the RSS spec, the category element has a domain attribute. If the domain is used to point to sub-category feeds, then hierarchy can be achieved cleanly. Therefore a simple solution that doesn't require any changes to the RSS spec (and as far as I can see doesn't bend its meaning either) could be as follows:

  • When an item contains a link, that item points to an actual news feed.
  • When there's no link, then the category must have a domain, which points to the sub-tree. In that case, description and title are the description and title of the subfeed's category, respectively.

How does that sound?
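On the aggregator side, the two rules could be applied roughly as follows. This is a Python sketch under the assumptions above: the sample directory, its URLs, and the function name `classify_items` are invented for the example, and treating an item with neither a link nor a domain as an error is just one possible choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory: one real feed, one pointer to a sub-directory.
RSS_DIRECTORY = """<rss version="2.0">
  <channel>
    <title>News4Humans feeds</title>
    <link>http://news4humans.example/</link>
    <description>Directory of feeds</description>
    <item>
      <title>Front page</title>
      <link>http://news4humans.example/front.xml</link>
      <category>News</category>
    </item>
    <item>
      <title>Sports</title>
      <description>All the sports feeds</description>
      <category domain="http://news4humans.example/feeds/sports.xml">News/Sports</category>
    </item>
  </channel>
</rss>"""


def classify_items(rss_text):
    """Split directory items into actual feeds and pointers to sub-trees."""
    feeds, subtrees = [], []
    for item in ET.fromstring(rss_text).iter("item"):
        title = item.findtext("title")
        link = item.findtext("link")
        if link:
            # Rule one: an item with a link is an actual news feed.
            feeds.append((title, link))
            continue
        # Rule two: no link, so the category's domain must point to the sub-tree.
        category = item.find("category")
        domain = category.get("domain") if category is not None else None
        if not domain:
            raise ValueError(f"item {title!r} has neither a link nor a category domain")
        subtrees.append((title, domain))
    return feeds, subtrees


if __name__ == "__main__":
    feeds, subtrees = classify_items(RSS_DIRECTORY)
    print("feeds:", feeds)
    print("sub-trees to fetch next:", subtrees)
```

Fetching each sub-tree URL and running the same function on it would walk the whole hierarchy, one GET per level.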

Posted by diego on September 13 2003 at 12:55 AM

Copyright © Diego Doval 2002-2011.