Okay, so I really should start using "Conversation Engine", the name Don suggested, which I think sounds cooler. But for the moment it's still the Conversation Finder. The first version is now live!

This is a very limited version. Only a few sites are being indexed, mostly out of concerns for speed, bandwidth and such. I'll see about expanding it later.

First thing to look at is this result, the conversations the engine (finder?) discovers between Don and Tim Bray.

Interestingly enough, it finds one more aside from their recent Atom conversation, something about flowers :). This is great! It is finding actual conversations!

But... the results are just a little bit off. I keep seeing what it finds and thinking, "come on, you're so close!". Some links are loops. Some links point to index pages (which might have the content, but...). Some of the text extracts are not relevant (look at "conversations" between the other sites that it's indexing).

I think a big factor here is that the engine knows nothing about archives, or about the people who run these blogs. Archives duplicate a lot of information, and the engine gets a little confused by that. So maybe the next step is to fiddle around with some of the metadata present in weblog pages. The metadata in RSS would be great, particularly the dates, to infer sequencing. However, RSS feeds only go back a few days or posts, so all that's left is parsing the different types of metadata embedded in HTML.
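To make the idea a bit more concrete, here is a rough sketch of what "parsing dates out of HTML metadata to infer sequencing" might look like. This is not the engine's code (it's Python, just for brevity), and the `<meta>` names it looks for (`DC.date`, `date`, `DC.date.created`) are only an assumption about what weblog tools might emit. The function names `extract_date` and `order_posts` are made up for the example too.

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumption: a post date might show up under one of these <meta name="..."> keys.
DATE_META_NAMES = {"dc.date", "date", "dc.date.created"}


class MetaDateParser(HTMLParser):
    """Collects the first parseable date-like <meta> value found in a page."""

    def __init__(self):
        super().__init__()
        self.date = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.date is not None:
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = attrs.get("content") or ""
        if name in DATE_META_NAMES:
            try:
                # Only the leading YYYY-MM-DD part is used; any time part is ignored.
                self.date = datetime.strptime(content[:10], "%Y-%m-%d").date()
            except ValueError:
                pass  # unparseable value: leave the date unset


def extract_date(html):
    """Return the page's date from its <meta> tags, or None if there is none."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.date


def order_posts(pages):
    """pages: dict of URL -> HTML. Returns the dated URLs, oldest first.

    Pages without a usable date (index pages, many archives) are dropped.
    """
    dated = [(extract_date(html), url) for url, html in pages.items()]
    return [url for _, url in sorted(d for d in dated if d[0] is not None)]
```

Dropping undated pages is a crude stand-in for "knowing about archives": it would probably throw away good posts from blogs that don't embed any date metadata, so a real version would need more than one source for the date.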

Anyway, not bad for a few hours of work and a 0.1 version. Looks promising! Now if I can just find a way of letting others enable spidering of their sites without killing my server's bandwidth... :))

PS: I wasted a couple of hours on Tomcat setup. Why? Because the JARs I was deploying in WEB-INF/lib didn't have write privileges. Tomcat wants them writable! And it was failing without any error messages, simply not loading the classes in the JARs (and yes, I tried common/lib). Anyway, all is well that ends well.

Update (5/12/2004): The Conversation Finder is now the Conversation Engine.

Posted by diego on December 3 2004 at 8:13 PM

