now with Atom support: java google feeds bridge


After last week's experiment of a Google-RSS bridge in Java, I took the next step and decided to check out how hard it was to generate a valid Atom feed as well. The result is an update on the Google bridge page and new code.

The idea, as before, was to write code that could generate valid feeds with as few dependencies as possible (it might even be considered a quick and dirty solution). For reference I re-checked the Atom Wiki as well as Mark's prototype Atom 0.2 feed. In the end, it worked. Adding support for Atom begged for some generalization and refactoring (which I did), but aside from that, adding Atom support took a few minutes. Here are some notes:

  • ISO 8601 Dates are terrible. I'd much rather Atom had used RFC 822 dates, which are not only easier to generate but way more readable. It's true, however, that once you have the date generator working it doesn't matter. But boy are they a pain. I put forward my opinion on that when date formats were being discussed, but I didn't get my way.
  • I was confused at the beginning regarding the entry content, particularly because of the more stringent requirements that the feed puts on content type. For example, the content of an entry must be tagged with something like "<content type="text/html" mode="escaped" xml:lang="en">". Now, I must be honest here: about a month ago there was a huge discussion on the Wiki about whether content should be escaped or not, and how, but I didn't think it was too crucial since on the parser side (which I added to clevercactus way back in July) it's pretty clear that you get the content type and you deal with it. But I was sort of missing the point, which is generation. When generating... what do you do? Do you go for a particular type? Is it all the same? Would all readers support it? The pain of generating multiple types would seem to outweigh any advantages... "Hard to answer, these questions are," Master Yoda would say. So I went for a basic text/html type enclosed in a CDATA section. (Btw, enclosing in CDATA doesn't seem to be required. The Atom feed validator was happy either way.)
  • Another thing that was weird was that the author element was required, but that it could go either in the entry or the feed. I understand the logic behind it, but it's slightly confusing (for whatever reason...)
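
To make the date-format comparison in the first note concrete, here is a minimal sketch of generating both styles in Java. It is not the bridge's actual code: the class name FeedDates and its two methods are hypothetical, it assumes all timestamps are emitted in UTC, and it uses a four-digit year in the RFC 822 style date (as RFC 1123 and RSS 2.0 allow).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, for illustration only.
class FeedDates {
    // RFC 822 style date with a four-digit year, always expressed in GMT.
    // Locale.US keeps the day and month names in English on any machine.
    private static final DateTimeFormatter RFC_822 =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    // ISO 8601 date-time in UTC, marked with the "Z" designator.
    private static final DateTimeFormatter ISO_8601 =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);

    static String toRfc822(Instant t) {
        return RFC_822.format(t.atZone(ZoneOffset.UTC));
    }

    static String toIso8601(Instant t) {
        return ISO_8601.format(t.atZone(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Sample timestamp; treating it as UTC is an assumption of this sketch.
        Instant posted =
            ZonedDateTime.of(2003, 9, 5, 22, 26, 0, 0, ZoneOffset.UTC).toInstant();
        System.out.println(toRfc822(posted));   // Fri, 05 Sep 2003 22:26:00 GMT
        System.out.println(toIso8601(posted));  // 2003-09-05T22:26:00Z
    }
}
```

Pinning both formatters to UTC sidesteps time zone abbreviations and offsets entirely, at the cost of not preserving the author's local offset.
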
Overall, not bad. But Atom, while similar to RSS, is more complex than RSS. While I have been able to implement a feed that validates relatively easily, it concerns me a bit that I might be missing something (what with all those content types and all). Maybe all that's needed is a simple step-by-step tutorial that explains the "dos and don'ts" of feed generation. Maybe all that's needed is a simple disclaimer that says "Don't Panic!" in good H2G2 style.
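
As a rough illustration of the content-type choice described in the notes, the sketch below builds the content element quoted above either with entity escaping or inside a CDATA section. It is a hedged example rather than the bridge's real code: the class AtomContent and its methods are hypothetical names, and only the attributes mentioned in this post are emitted.

```java
// Hypothetical helper, for illustration only.
class AtomContent {
    // Escape the characters that are significant in XML text and attributes.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wrap text in a CDATA section, splitting any "]]>" so it cannot
    // terminate the section early.
    static String cdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Build the content element with the attributes quoted in the post.
    static String contentElement(String html, String lang, boolean useCdata) {
        String body = useCdata ? cdata(html) : escapeXml(html);
        return "<content type=\"text/html\" mode=\"escaped\" xml:lang=\""
            + escapeXml(lang) + "\">" + body + "</content>";
    }

    public static void main(String[] args) {
        String html = "<p>Hi & bye</p>";
        System.out.println(contentElement(html, "en", false));
        System.out.println(contentElement(html, "en", true));
    }
}
```

To an XML parser the two variants carry the same character data, which fits with the validator accepting either form.
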

Is it bad that Atom would need something like a tutorial? Probably. Is it too high a price to pay? Probably not. After all, stricter guidelines for the content are good for reader software. I thought "maybe if there's a way to create a simple feed without all the content-type stuff..." but then everyone would do that, and ignore the rest, wouldn't they?

Of course, maybe I misunderstood the whole issue... comments and clarifications on this area would be most welcome.

I guess there's no silver-bullet solution to this. The price of stricter definitions is the loss of (some) simplicity. The comparison between a language with weak typing (say, LISP) and one with strong typing (say, Java) comes to mind when comparing RSS and Atom in this particular sense. I think that I would go with RSS when I can, since it will be more forgiving... on the other hand, I do like strong typing. But should content be "strongly typed"? I'll have to think more about this.

Interesting stuff nevertheless.

PS: there's a hidden feature for the search. It's a hack, yes. It might not work forever. Still worth checking the code for it though :-)

Categories: soft.dev
Posted by diego on September 5 2003 at 10:26 PM
Comments (please see the comments & trackback policy).

Don't panic! ;-)

1. Re: ISO dates. Having looked at the validation results of a large number of feeds, it is my opinion that ISO dates are easier to generate *correctly*. If you look at somebody else's feed and copy what you see, you are likely to get it right. This isn't quite so with RFC 822, particularly if you are outside of the US. The primary problem area is the limited number of time zones defined in the RFC. I also note that your existing RSS feeds are 1.0, and use this format.

2. The tagging of the content is intended to make you think. It turns out that some people put text/plain descriptions out there. Others include html. Some escape. Some do it literal. Many will yell *do it my way*. My thoughts are that this is a lost cause. Do it *your* way. Just take a moment to tell me what way you chose. Without these attributes, consumers are forced to guess.

3. Both RSS 1.0 and 2.0 allow dc:creator or managingEditor or author on channel or items. What Atom does is require that at least one is present.

And finally, all this will be captured in a consumer friendly manner before Atom goes 1.0.

Posted by: Sam Ruby at September 6, 2003 3:05 AM

Unless I'm missing something the RFC is horrible. And broken.

The ISO, OTOH, is fine on the date front if rather less so on the time front.

All date formats are a pain, it's a statement of existence (-: Mind you, generating them now appears to be trivial (given that my weapon of choice is currently .NET!!)

Posted by: Murph at September 7, 2003 10:48 AM

Copyright © Diego Doval 2002-2007.