
feedvalidator bug or feature?


I was just alerted that checking this feed against the feedvalidator gives a "This feed is invalid" message. You can try it yourself by clicking here. I was surprised, since I had checked the feed against the validator when I switched it to valid RSS 2.0. The error in particular says "description should not contain onclick tag". Essentially it's complaining about an onclick handler on one of the links, which is used to open an image in a popup window.

Reading the description of the error I get this explanation:

Some RSS elements are allowed to contain HTML. However, some HTML tags, like script, are potentially dangerous and could cause unwanted side effects in browser-based news aggregators. In a perfect world, these dangerous tags would be stripped out on the client side, but it's not a perfect world, so you should make sure to strip them out yourself.

I was sure there was nothing like this in the spec itself, and re-reading it (just in case) confirmed it.

The disconnect here is that this seems to me to be a guideline, something that calls for a warning rather than something that should mark the feed as "not valid", when it clearly is valid as far as the spec is concerned...
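
To make the warning vs. error distinction concrete, here is a minimal sketch of how a check like this could report risky markup without invalidating the feed. This is not the feedvalidator's code: the names RISKY_TAGS, RiskyMarkupFinder, check_description and is_valid are made up for illustration, and the tag list and the "starts with on" test for event handlers are rough assumptions.

```python
from html.parser import HTMLParser

# Hypothetical list for illustration; not taken from the feedvalidator sources.
RISKY_TAGS = {"script", "object", "embed"}


class RiskyMarkupFinder(HTMLParser):
    """Collects markup that a browser-based aggregator might execute."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (level, message) tuples

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag in RISKY_TAGS:
            self.findings.append(("warning", f"description contains {tag} tag"))
        for name, _value in attrs:
            # Assumption: treat any attribute starting with "on" as an event handler.
            if name.startswith("on"):
                self.findings.append(
                    ("warning", f"description contains {name} attribute")
                )


def check_description(html_text):
    """Return the findings for one item description."""
    finder = RiskyMarkupFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings


def is_valid(findings):
    # Only errors (actual spec violations) make the feed invalid.
    return not any(level == "error" for level, _message in findings)
```

With something like this, a description containing a link with an onclick handler would produce a warning, and is_valid would still return True, which is the behavior I'd expect for a feed that follows the spec.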

Interestingly enough, the RSS Validator (at rss.scripting.com) does (correctly) validate the feed; you can click here to see the result. I thought the two were based on the same code, but clearly there are some differences.

Update: Dave replies, clarifying that they haven't made significant changes to the validator since they took a snapshot of the sources a few weeks ago, and notes that he's seen other instances of this recently.

I had a hunch. Curiosity overtook me. :) I downloaded the sources and poked around in them a bit. Unless I'm reading it completely wrong, my suspicion was confirmed: the RSS and Atom validators share most of the code (which is just good design sense, since they are doing very similar things). However, this also means that errors that are flagged for Atom are also, in some cases, being flagged for RSS. For example, the error that I described above is detected by the function check4evil in validator.py, which is called by htmlEater in the same file, which is in turn called in item.py (which parses RSS items). In his reply Dave describes a different case, though: the validator rejecting duplicates, which is being done (as far as I can see) only for RSS (in the call do_pubDate(self) in item.py).
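
As a rough illustration of the design issue (and not a copy of the actual sources), shared code doesn't have to mean shared severity: the check can live in one place while each format decides how to report it. Everything in this sketch is hypothetical, including the names find_risky_markup, SEVERITY and validate_item_description, the simple regular expression, and the assumption that Atom would treat the finding as an error.

```python
import re

# Assumption: a crude pattern for event-handler attributes such as onclick=...
EVENT_ATTR = re.compile(r"\son\w+\s*=", re.IGNORECASE)

# Hypothetical per-format severity; the real validator may decide differently.
SEVERITY = {"atom": "error", "rss2": "warning"}


def find_risky_markup(text):
    """Shared check: True if the text appears to contain an event handler."""
    return bool(EVENT_ATTR.search(text))


def validate_item_description(text, feed_format):
    """Run the shared check, then report it at the level the format calls for."""
    if find_risky_markup(text):
        level = SEVERITY[feed_format]
        return [(level, "description contains an event-handler attribute")]
    return []
```

Called with "rss2", the same description that yields an error for "atom" comes back as a warning, so the detection code is still shared and only the reporting differs.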

I definitely think that, according to the RSS 2 spec, these two things are "should fix" guidelines rather than problems that make a feed invalid.

On a related note, the version history for some of the validator sources shows several changes in the last few weeks. Lots of activity there.

Another update: Sam (through email) encourages me, or others who are interested, to post to the list at sourceforge with suggestions. He also points to a message on that list from Phil Ringnalda commenting on what's discussed in this post (warnings vs. errors) and finally, as a reference, he sends a link to the original bug tracker item related to this issue (which includes dates of posting, resolution, etc.). Regarding warnings, I mostly agree with Phil, but hopefully I'll have time to add my 2c to the list during the next couple of days, even if it's something small (I have lots of work that has nothing to do with this, and that has priority of course...).

Plus: Generic question: what happens when a validator of anything is truly open source? This applies to RSS, Atom, or any other format that requires a validator. Suppose that in a couple of years the original designers of the validator have moved on. New developers have taken over. After some time, they decide that X is good (X is something that most people agree is good, but is definitely not in the spec). So they update the validator to reflect those views. Meanwhile, the spec hasn't changed. It seems to me that in this case either the validator loses credibility, or the spec does. Neither option is good (but the spec losing credibility is worse IMO). I wonder what the experience of other formats has been in this regard, but I do think that having many validators is good, and that it would be an automatic safeguard against one validator suddenly redefining the spec by itself, or taking it in a different direction. XML is probably a good example: there are many XML validators around... is that why XML has remained stable?

Categories: soft.dev
Posted by diego on February 29 2004 at 2:01 AM

Copyright © Diego Doval 2002-2011.