Now blogging at diego's weblog. See you over there!

rebooting...

Tomorrow, July 11, is the 5-year anniversary of my blog.

5 years! And over 2,000 posts!

Anyway, I thought that this would be a good time to start--again. I've been slowly (given that Ning takes up 99% of my time -- and a fun 99% it is!) building up a new blog to reboot. For the moment it has a basic template and there won't be much stuff there. I've also decided to keep this blog in place for now. I may add 301s for the feeds soon, but I'm still thinking about the rest.

The new blog is at blog.diegodoval.com. Here is the link to the new Atom feed. Update your bookmarks!

And see you on the other side. :-)

Categories: personal, soft.dev
Posted by diego on July 10, 2007 at 11:03 PM

the next blog you'll add to your feed reader...

...is Marc's.

Yup. Go read, and stop wondering what I'm talking about. :)

Categories: ning, soft.dev, technology
Posted by diego on June 3, 2007 at 9:16 AM

openfire and spark: cool stuff

Today I spent some time tinkering with Openfire and Spark, and they're both pretty cool.

I've been using GAIM (ok, ok, Pidgin) at home on my PC, but over the last few days it decided to start crashing when connecting to Yahoo. Great. Back to Trillian, but, oh, wait, even though Trillian tickles me, the fact that it looks like an app from 1992 drives me bananas. (Gaim ain't that great either.) Is it so hard to spend a bit of time on look and feel? Icons? UI matters!

Anyway. So Russ had mentioned recently I should give jabberd a try, but hey, I'm a Java guy, so off I go and I get Openfire. Simple install: check. Embedded Jetty for built-in web configuration: check. Easy way to add IM gateway: err... slightly convoluted, but yeah. Check.

Now for Spark: still in beta, so expect some clunkiness, but the UI is surprisingly clean, and its Synth L&F implementation (at least that's what I think it's using) is also pretty good. Bonus: it doesn't crash.

An advantage of this setup is that I can connect from anywhere to my account on the Openfire server over a TLS channel (something that you can require) and all my IM connections are encrypted, at least to the server. This means that, if necessary, I can use IM from open WiFi hotspots without (much) fear of snooping--something impossible if you're logging in directly to Yahoo, MSN, et al.

Overall, pretty good! My half-hour of weekly free time is over now though :-), so I'll have to wait until next week to tinker with it more.

Categories: soft.dev
Posted by diego on May 20, 2007 at 6:25 PM

jpc: holy emulators batman!

JPC is a pure Java emulation of an x86 PC with fully virtual peripherals. You can go to their site and run the applet demo, which runs FreeDOS and then lets you execute various classic PC-DOS games such as Lemmings or Prince of Persia. And it supports protected mode, so you can run Windows 95 and --gasp!-- Linux.

Because it's an emulator and not simply a hypervisor, you can run it anywhere a Java 5 or higher JVM can run.

Mindblowing.

ps: in the same vein, check out this Browser emulator which simulates the experience of older browsers within your... browser. Right.

Categories: soft.dev, technology
Posted by diego on May 17, 2007 at 2:42 PM

javafx = applets 2.0

So after spending a bit of time looking at JavaFX my impression is that it's a great idea... but I question the need for yet another scripting language in the form of JavaFX Script. Java 6 implements JSR 223 and even includes scripting based on Rhino, i.e., Javascript. Now, Javascript has its flaws (and they are many) but it's a standard, so why not start there?
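
For instance, here's a quick sketch (my own toy example, nothing JavaFX-specific) of what JSR 223 already buys you out of the box in Java 6, using the bundled Rhino engine:

  import javax.script.ScriptEngine;
  import javax.script.ScriptEngineManager;

  public class Jsr223Demo {
    public static void main(String[] args) throws Exception {
      //Java 6 ships a Rhino-based engine registered under "JavaScript"
      ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");
      js.put("name", "JavaFX"); //expose a Java value to the script
      Object result = js.eval("'hello, ' + name + '! ' + (1 + 2 * 3)");
      System.out.println(result); //prints: hello, JavaFX! 7
    }
  }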

That aside, JavaFX strikes me as applets 2.0, or rather Applets Done Right. Or, As Right As Possible, given the JVM requirement. While a lot of people probably worry about performance or UI, I don't (I have a long-standing position on this topic :)). However, I do worry about the Java webstart "requirement". JWS is a topic on which I've written before, and yes, that was a while ago, but JWS is still a bit clunky. And I am not entirely convinced that the way to create "Web 2.0" applications is to jump out of the web browser altogether. :)

Anyway, an interesting thing to watch as it develops.

Categories: soft.dev
Posted by diego on May 13, 2007 at 12:35 PM

ning javaone slides

I just posted the JavaOne presentation and some notes over at the Ning Developer Blog. Check it out!

Categories: ning, soft.dev
Posted by diego on May 13, 2007 at 10:46 AM

at javaone tomorrow!


Martin, Brian, and I will be at JavaOne tomorrow presenting Building a Web Platform: Java Technology at Ning. We'll talk about the evolution of the Ning Platform over the last two and a half years and how Java and some specific design choices let us continually grow and expand the platform, replacing and upgrading infrastructure, without affecting users or developers.

The session is TS-6039, in Esplanade 301, at 4:10 pm, so if you're around come say hello. I'll post the slides after and talk a bit more about that and other interesting things. :)

Categories: ning, soft.dev, technology
Posted by diego on May 9, 2007 at 4:02 PM

two ints and a Float...

Some geek humor to start the day:

Two ints and a Float are in a bar. They spot an attractive Double on her own. The first int walks up to her. 'Hey, baby', he says, 'my VM or yours?'. She slaps him and he walks back dejected. The second int walks over. 'Hey, cute-stuff, can I cook your Beans for breakfast?'. After a quick slapping, he too walks back. The Float then ambles over casually. 'Were those two primitive types bothering you?', he remarks. 'Yes. I'm so glad you're here', she says. 'They just had no Class!'
The utter nerdiness of this joke may well go beyond geek and kitsch to actually become cool. :-) [from Martin, via IM].

Categories: soft.dev
Posted by diego on June 1, 2006 at 9:53 AM

new atom parser -- in ruby

Martín has released a super-cool Atom 1.0 parser in Ruby, hosted at RubyForge and available under an MIT license. It's a really good showcase of the flexibility of Ruby, and extending it is very easy. If you're into Ruby (or Atom :)) check it out--the extensibility mechanism he's put in place is quite something.

Apropos (?):

Professor Farnsworth: "Let me show you around. That's my lab table and this is my work stool and over there is my intergalactic spaceship. And here is where I keep assorted lengths of wire."
Fry: "Wow, a real live space ship!"
Farnsworth: "I designed it myself. Let me show some of the different lengths of wire I used."
:-)

Categories: soft.dev
Posted by diego on April 9, 2006 at 12:19 PM

did you know...?

... that Ning is hiring? But of course you did! Well, here's a reminder then. :) If you're looking for something, go check out our list of current openings at http://jobs.ning.com/. From Java developers/architects to QA engineers and product management, there's something for everyone!

(Did I say Java? Wasn't Ning about PHP? Well, the apps are written in PHP. But there's a ton of Java in there --some really cool stuff-- even if it's not obvious... but that's a topic for another post).

And, hey, if you don't find what you want in there, but you think you want to work with us, send us an email anyway. :)

Categories: ning, soft.dev, technology
Posted by diego on April 4, 2006 at 9:08 PM

plugging the dns recursion hole

Via this Slashdot article I was reminded about a vulnerability in DNS configs that allow recursion and therefore let the server act as an open relay that could be used in a DDoS attack. I verified my DNS using DNS Report and this matched what I saw in my config files -- my DNS server was open. Rogers had a post last week on the topic which outlined the steps he took and served as a quick guide, and along with this page of the BIND9 manual I had the hole plugged in a few minutes, confirmed by the DNS Report tool. Phew!

Categories: soft.dev, technology
Posted by diego on March 19, 2006 at 11:47 AM

filebox: a quick way to share files


Many many times I want to quickly send a file to someone for them to look at, and I can never remember the names of the services that let you do this. But there's Ning! :)

So my 1-hour hack for tonight was to create filebox.ning.com, which allows you to upload files and then share the link with others, and it's deleted after a few days. Basically I cloned Brian's filedrop, modified some things in the code, made the uploads private, added messaging, and made it a little easier on the eye. The power of Ning at work. :)

Categories: ning, soft.dev, technology
Posted by diego on March 11, 2006 at 12:20 AM

the bogus Java-vs-everything argument


"Java is done for! Ruby will take over! PHP will rule! Perl wins!" ... and so forth. I have seen discussions on this topic for the last few months, so many that I won't even bother linking to them. If you read news, or, work in the tech sector and are, well, alive in any way, you'll know what I'm talking about.

The extreme argument goes like this: Java is becoming irrelevant, soon to be replaced by scripting languages such as Ruby and PHP.

The more measured argument says that Java is no longer on "the leading edge" of languages and has ceded that position to Ruby and PHP and so forth.

The extreme language is of course ridiculous. Java is not going to be "replaced" by Ruby or PHP any more than Java "replaced" C++ or C in the mid-90's. Will Ruby, PHP, etc, replace Java for lots of tasks, including rapid web app development, prototyping and such? Sure. Is that one language "replacing" another outright? I don't think so.

In my view, Java has evolved into its current position as the new "systems language". Other languages (yes: Ruby, PHP, etc) are taking precedence in the building of new lightweight web apps for various purposes. It's probably fair to say that the leading edge of development exists in these web 2.0-ish style of apps, which puts Java in the backseat a bit in that category.

In other areas, such as advanced IDEs for the language, Java wipes the floor with pretty much any language, which helps for many types of development.

But so what? Each language and tool has its place. Instead of useless pissing contests, we should be focusing on how to make these various languages and tools interoperate and complement each other better.

Update: Python! Damn, I forgot about Python. Blame the lack of sleep or something. The magic trio these days is definitely Python, Ruby, and PHP. Thanks Joe for the reminder! :) And while I'm updating, what is up with reporters comparing Java, or Ruby, PHP, Python, etc, to AJAX? I don't get that at all. Do they not understand that AJAX is a client-side scripting technique?

Acme coffee challenge update: 12 hours, 24 cups. Not bad.

Categories: soft.dev
Posted by diego on January 9, 2006 at 5:07 AM

give me back my focus!

<rant>
I am sick and tired of Windows stealing focus away from the current window to show me a helpful message about some dramatic action an app would like to take, exactly while I'm typing something else--so the letter I type lands in the dialog and ends up triggering something I don't want to do. I have no issue with apps "suggesting" actions (e.g., "would you like to autoarchive your old items now?") but it's high time Windows stopped sucking focus away from the current window. This is particularly bad if you touch-type (as I do).

Most X-Windows window managers don't do this (I can't remember if OS X does it... it very well may in some cases), but Windows is definitely the worst at it. The only acceptable message when stealing focus is an information window for something really important, without buttons for confirmation (otherwise you may click "Ok" and never see it again).

Can you tell that I just triggered something bad in the middle of typing furiously?
</rant>
I feel much better now. Thank you. :)

Categories: soft.dev
Posted by diego on October 15, 2005 at 11:49 AM

ning live: day one

What a day. Overwhelming, in a sense--an explosion of discussion around the blogosphere and beyond. Ning shot up to the #1 search in Technorati by around noon, and to the #1 tag in

Categories: ning, soft.dev
Posted by diego on October 4, 2005 at 8:18 PM

ning!


About an hour ago tonight we took the covers off ning: a playground for building and using social applications on the web (how's that for a brief description?). A super simple way of getting personalized social apps up and running, or a way to experiment without having to worry about all the stuff that's usually, well, way too hard (like, say, DB setup--look ma! no DB!).

It's been an incredible year so far, lots of work but lots of fun too, working with a fantastic team. The last few weeks have been... intense. A million things I want to write about, which for obvious reasons have been sidetracked.

Anyway, you can bet I'll be writing more about this later! :-)

Categories: ning, soft.dev
Posted by diego on October 3, 2005 at 10:02 PM

php's simplexml

One of my favorite features of PHP 5 is SimpleXML. In brief, it maps the structure of an XML document to variables within PHP, with subnodes as variables within the root object (recursively) and the text value of a node mapping (as you'd expect) to the value of the variable. For example, the code in PHP 5 to parse RSS 2.0 and Atom 0.3 feeds (maintaining a relatively clean structure, and just parsing for the basics) is super straightforward.

Observe!

<?php

  //each individual entry in the feed
  class FeedEntry {
    public $title;
    public $summary;
    public $content;
    public $created;
    public $modified;
    public $issued;
    public $id;
    public $link;
  	
    final function __construct() {
    }
  }

  //the feed (including simpleXML parsing)
  class Feed {
    public $xmlContent;
    //the resulting parsed entries
    public $entries = array();
	
    //the document information
    public $title;
    public $link;
    public $description;
    public $author;
    public $created;
    public $modified;
    public $issued;
		 
    final function __construct($xmlcontent) {
      $this->xmlContent = $xmlcontent;
    }
		 
    function parse()
    {
      $xml = simplexml_load_string($this->xmlContent);
      if ($xml['version'] == '2.0') { //rss 2.0
        $this->parseRSSFeed($xml);
      }
      else if ($xml['version'] == '0.3') { //atom  0.3
        $this->parseAtomFeed($xml);
      }
    }
		 
    function parseRSSFeed($xml) {
      $this->created = $xml->channel->lastBuildDate;
      $this->issued = $xml->channel->lastBuildDate;
      $this->modified = $xml->channel->lastBuildDate;
      $this->title = $xml->channel->title;
      $this->description = $xml->channel->description;
      $this->link = $xml->channel->link;
      foreach ($xml->channel->item as $item) {
       $this->parseRSSEntry($item);
      }
    }
		 
    function parseRSSEntry($entryToParse) {
      $entry = new FeedEntry();
      if ($entryToParse->description == '' &&
                  $entryToParse->title == '') {
        return;
      }
      if ($entryToParse->description !== '' &&
            $entryToParse->title == '') {
        $title = substr(strip_tags($entryToParse->description), 0, 50) . '...';
      }
      else { 
        $title = $entryToParse->title;
      }

      $entry->title = $title;
      $entry->summary = $entryToParse->description;
      $entry->content = $entryToParse->description;
	      
      $entry->created = $entryToParse->pubDate;
      $entry->issued = $entryToParse->pubDate;
      $entry->modified = $entryToParse->pubDate;
	      
      $entry->link =  $entryToParse->link;
      $entry->id = $entryToParse->guid;
		 		
      array_push($this->entries, $entry);
    }
		 
    function parseAtomFeed($xml) {
      $this->created = $xml->created;
      $this->issued = $xml->issued;
      $this->modified = $xml->modified;
      $this->title = $xml->title;
      $this->link = $xml->link;
      $this->description = $xml->tagline;
		 		
      foreach ($xml->entry as $entry) {
        $this->parseAtomEntry($entry);
      }
    }


    function parseAtomEntry($entryToParse) {
      $entry = new FeedEntry();

      $entry->title = $entryToParse->title;
      $entry->summary = $entryToParse->summary;
      $entry->content = $entryToParse->content;
	      
      $entry->created = $entryToParse->created;
      $entry->issued = $entryToParse->issued;
      $entry->modified = $entryToParse->modified;
	      
      $entry->link =  $entryToParse->link;
      $entry->id = $entryToParse->id;
		 		
      array_push($this->entries, $entry);
    }
  }

?>

This code can then be used as follows:

  $feed = new Feed($xmlcontent);
  $feed->parse();

  //print the title of each feed entry
  foreach ($feed->entries as $entry) {
    echo $entry->title;
  }

Where $xmlcontent is obtained through, say, a curl call:

  $ch = curl_init();

  curl_setopt($ch, CURLOPT_HTTPGET, 1);
  curl_setopt($ch, CURLOPT_URL,"http://www.dynamicobjects.com/d2r/index.xml");
  curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
  $xmlcontent = curl_exec ($ch);

  curl_close($ch);

All of which amounts to quite a lot of interesting functionality in a small amount of easy-to-understand code.

Does it have limitations? Sure. Is it enough for many XML parsing jobs? Yes.

This is why, btw, PHP is so appropriate for building web stuff where there are so many user-driven elements and "point solutions" (taken to the extreme, one per page in a website).

Who needs reuse when (re)coding is so easy? :)

PS: I would like to read the following statement which I am making of my own free will and without being coerced in any (ouch!) way: Reuse is good, particularly with Java, and no one should ever say that it's not, lest they find themselves in the depths of hell, or possibly a Standards Body terminology & rules meeting. In summary: Java, good. PHP, bad. (Ruby, great.)

PS2: Those that can quote something else from the Simpsons episode which the previous PS is spoofing get extra credit.

PS3: Those who react to PS1 above without knowing the context of the Simpsons episode should refrain from commenting. There's a fine line between good and bad references, a fine line indeed, and while it's never hard to know when it's been crossed, I'd rather not be made aware of the distance traveled since.
Categories: soft.dev
Posted by diego on August 15, 2005 at 11:40 PM

css + javascript = behaviour

Behaviour: "the missing link for your ajax apps", aka "use CSS selectors to specify elements to add javascript events to". Great! [via O'Reilly Radar]

The one thing that keeps bothering me a bit about AJAX and DHTML applications is that MVC tends to create fairly deep abstractions. Debugging, and, more importantly, maintaining this code, will become more and more difficult since the only way to navigate it efficiently (that I know of) is to use, well, grep, Ctrl+F, and such. Hard to refactor. Hard to analyze. Hard to validate. Full Javascript+DHTML Eclipse module anyone? I mean, one that goes beyond autocomplete. I've got that already. :)

Categories: soft.dev
Posted by diego on June 26, 2005 at 11:58 AM

atomflow redux

Mark Paschal has just published a cool perl module that continues along the atomflow idea:

To help myself build atomflow style tools, I wrote XML::Atom::Filter, a Perl module to build command line Atom processing tools around the XML::Atom library.
Fantastic!

I've been slowly (ever so slowly) adding bits and pieces to the atomflow tools over the last few months. This is just what I needed to finally shake me out of my slumber and pull together another release.

Ideas I want to explore in this space are not only related to more command-line tools (and Mark has some great ideas there in his post!), but also to creating pipes directly from server to server (so that the network is the disk drive again): straight flows that can be tied together without the data ever settling down. For the command line my thing is Java, but for the web I'd like to try some PHP, which is simple, fast, and flexible enough. Anyway, I'll try to make a couple of hours within the next few days to do this. Not right now. Too busy. Lots going on. :)

Categories: soft.dev
Posted by diego on June 24, 2005 at 12:22 AM

24 hour laundry: the view from inside

Well, well, well. :)

There's been a lot of discussion recently about a certain new startup called 24 Hour Laundry. It pretty much got started with this CNET article, then as highlights we've got Om, Mark. Even (perhaps predictably) Slashdot.

24HL, as it happens, is where I work. Remember this?

Yep. It's true. Aaaaaall this time and I didn't say anything. Outrageous! How could I?

Well, that's kinda the point.

You see, we didn't want to make any noise. CNET decided that they wanted to "scoop" a story that didn't exist (and is still not all that exciting at this point). We didn't have anything to do with that article.

Then, in the process of not asking for any press and minding our own business, we get branded a certain way, and told we are doing something wrong by focusing on our product.

What is confusing to me is that some of the comments out there begin with "Well, I don't know what they're doing but [insert your thought about why it's wrong here]".

It is one thing to speculate (which we all do a lot of, don't we) and draw tentative conclusions based on that, but it's another to take those assumptions and then categorically "paint a picture". I know: to a certain degree, these are the rules of the game. But there is a difference between saying "If X is doing W, then here are the problems I see" and saying "X appears to be doing W. They're crazy!" This was partially Russ's point with his great post yesterday. (Update 6/23: Jeff Clavier also makes good points on the topic).

For example, Mark Fletcher said:

[...] But creating a new web service is not rocket science and does not take a lot of time or money. My rule of thumb is that it should take no more than 3 months to go from conception to launch of a new web service. And that's being generous. I'm speaking from experience here. I developed the first version of ONEList over a period of 3 months, and that was while working a full-time job. I developed the first version of Bloglines in 3 months.

In other words: "whatever it is you're doing, you should be able to do it in three months."

Ah, those pesky generalizations--but this is actually an interesting point to bring up. Last year, it took me about 3 months to write the first version of clevercactus share, which didn't just include a website/webservice, but also an identity server, a relay server (to circumvent firewalls) as well as a peer to peer client app that ran on Windows, Mac and Linux.

One person, three months. Webservice, servers, clients, deployment systems, UI/design, architecture, code, even support.

Which proves... absolutely nothing.

You have to fit the strategy to the company and not the other way around. In our case, we're doing something a little different (not better, just different) than the next web service, so we're just trying to keep our heads down until we have something that makes sense.

Of course we want to release as quickly as we can. Of course we know that when we launch there will be dozens of features we wanted to add but didn't have time for. Of course we keep in mind that we can't release a "perfect" product.

We absolutely want to involve users in the product's eventual evolution. We just want to make sure that we have a few things figured out before we start sending out press releases to announce our video-blogging social scooter company.

We appreciate the patience, and the interest (even if in some cases it's a bit misguided!). We are working as hard as we can, as fast as we can, to come up with a good product.

Sounds reasonable? :-)

PS: this may be a good time to add "This is my personal website and blog. The views expressed here are mine alone and do not necessarily reflect those of my employer."

Categories: personal, soft.dev, technology
Posted by diego on June 20, 2005 at 11:17 PM

subclassing enums in Java 5

[via Erik]: Beyond the basics of enumerated types: how to use more than simple enums with subclassing and method overrides. Very interesting. While my favorite Java 5 change is the enhanced for loop, generics (i.e. templates) and enums are a close second.
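
For reference, the core trick (my own toy example, not taken from the linked article) is to give the enum an abstract method and let each constant override it:

  public enum Operation {
    PLUS {
      public double apply(double x, double y) { return x + y; }
    },
    TIMES {
      public double apply(double x, double y) { return x * y; }
    };

    //every constant must supply its own body
    public abstract double apply(double x, double y);

    public static void main(String[] args) {
      for (Operation op : Operation.values()) { //enhanced for loop, also new in Java 5
        System.out.println(op + " -> " + op.apply(6, 7));
      }
    }
  }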

Categories: soft.dev
Posted by diego on May 3, 2005 at 12:04 PM

the problem with scoble's linkblog

While I enjoy perusing Scoble's linkblog when I have time (there are pointers to a ton of interesting stuff in there) I have not been so thrilled about his full-republishing technique. In my opinion, the question of who exactly created the content is going to be slightly confusing for someone arriving there from a search engine (particularly for people who don't yet know what blogs are, much less linkblogs).

Even if it were obvious, though, the fact that he is republishing articles/posts wholesale without explicit permission means that a reader who would otherwise end up on my blog suddenly has no reason to do so. I have avoided commenting publicly on this, waiting to see if it changed, but it hasn't.

For example, check out his reposting of my take on AJAX. It's a long post (something relatively common for me) and by the time you scroll down to the second paragraph, you have forgotten that URL at the top. Many people will just get to the end, and move on to the next linkblog post.

Republishing content wholesale without permission is a bad idea. And a linkblog is supposed to be made of links, not full posts.

Robert, I suggest you simply post links and titles, rather than full posts--at most, a 50-word snippet or comment would do (similar to what Kottke does for his linkblog posts). If you think that's unreasonable, I'd ask you to remove any posts of mine that you may have republished over there and to avoid republishing other posts in the future. Thanks. :)

Categories: soft.dev, technology
Posted by diego on March 20, 2005 at 6:09 AM

more on ajax

Don talks about his POV on AJAX, and brings up something that (to keep things simple) I neglected to mention in my post.

For a long time I have been deeply ambivalent about using DHTML and Javascript for serious apps. I really like the reach these apps create, the ease of distribution, and all the good stuff that comes from that. But aside from the problems that Don mentions in terms of pushing the browser "platform" beyond its design limits, I keep wondering about what writing complex applications using these technologies would do to software engineering in general. We have advanced quite a lot, and it's taken a lot of effort, to get away from error-prone techniques and tools that we used as recently as 5 years ago. Thanks to advanced IDEs, we are no longer bound by pre-existing structures and we're much more free to refactor, analyze, and reuse code in new ways. Java has been crucial in that (and in finally establishing how sensible memory-managed languages were), but now we're onto Python, PHP, Ruby, and other languages.

Going to DHTML and Javascript would put us back a bit in this evolution. Javascript and DHTML IDEs are basically non-existent or too basic. Debugging apps in a browser environment is a nightmare. Software maintenance becomes another nightmare. And so on.

But assuming that AJAX apps prosper, I think that we'll just circle back, realize that we're facing the same problems, and then find solutions based on our previous turn on this particular loop. Maybe this will be another step towards what Marc Andreessen hoped for (almost ten years ago, and I still remember reading that article!): "A secure, truly mobile agent language -- way beyond Java -- will eliminate the Tower of Babel that prevents us from harvesting more of the benefits of computing and communications today."

Amen.

Categories: soft.dev
Posted by diego on March 19, 2005 at 6:57 PM

on ajax

One of the latest buzzwords to make its way around the web is AJAX (short for Asynchronous Javascript and XML, is that a "buzzcronym" then, rather than a buzzword?), which strikes me, as well as others, as one of those there and back again technologies, old ideas reborn in a different form. Recent weeks have seen the appearance of various AJAX toolkits, from SAJAX (for PHP) to JSonRPC (for Java). JPSpan, another PHP toolkit, is a little older.

But first, what is AJAX? Here's a good intro article on it. The basic concept is a simple one: turn a web browser into a more responsive client by making asynchronous requests to the server on another thread, handled by Javascript (there's nothing that says that this couldn't be done with, say, Flash or Java, but Javascript is a more universal platform on which to implement this). This separate thread can create the appearance of a more responsive UI by managing the requests in a manner transparent to the browser's default navigation mechanisms (e.g., Back, Forward, etc.).

Why do I say that AJAX is an old idea reborn in a different form? To answer that, allow me to take a little detour.

How web apps (used to) keep state, or the thin client way

Originally, a browser+HTTP+server combo was a stateless content-retrieval system. As more complex logic was added, browsers remained "thin" in that they held very little state information. Cookies (on the client side) and Sessions (on the server side) were created to address that as more complex applications were brought online. But the limitations of cookies for data storage mean that they are used primarily for two things: initialization information (e.g., automatic login to websites) and historical or subscription data (although even in this case cookies are largely used as browser-held keys that point to more complete server-side DB records). Sessions, therefore, have remained the primary method to maintain state through the lifecycle of a web application (some data can be held in the web pages themselves, e.g., hidden form fields, but that data has to be passed back to the server on each request). The problem with keeping sessions in the server is that, for nearly all significant operations, the client has to go back to the server to present a result, and it has to do it through the core UI thread, leading to a diminished user experience.
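
To make the session half of that concrete, here's a minimal (and entirely hypothetical) servlet sketch: the browser only carries the session cookie, the counter itself lives on the server, and every update costs a trip through the main request/response cycle:

  import java.io.IOException;
  import javax.servlet.http.HttpServlet;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  import javax.servlet.http.HttpSession;

  public class CounterServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws IOException {
      //the state lives server-side; the browser only holds a JSESSIONID cookie
      HttpSession session = req.getSession(true);
      Integer count = (Integer) session.getAttribute("count");
      count = (count == null) ? Integer.valueOf(1) : Integer.valueOf(count.intValue() + 1);
      session.setAttribute("count", count);

      resp.setContentType("text/plain");
      resp.getWriter().println("You have loaded this page " + count + " time(s).");
    }
  }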

The problem this creates is that as long as we're not all accessing the Internet over low-latency T3 lines, and as long as servers experience load spikes, unexpected loads, etc., the user experience of web applications (via thin clients) differs significantly from that of client-side applications (fat clients). AJAX bridges this gap by creating what amounts to (cue reference to Douglas Coupland's Microserfs) a "thin-fat client."

AJAX: web client/server, or the rise of the "thin-fat client"

In the early 90's, the buzzwords du jour were "client/server systems". These were systems where PCs actually performed a certain amount of processing on data obtained and passed back through tightly coupled connections (typically TCP). As important as servers were in that scheme, one of the keys of client/server computing was that the client maintained most of its state. True, the server did maintain a certain amount of state and logic (just keeping state on a TCP connection would count, for instance), but it was the client that drove the interaction, that kept information on a user's location in the dataflow, etc. The web, however, changed all that.

If the web thin client model decoupled UI from processing (at least relative to client/server), AJAX allows for a flexible "free form coupling" when necessary. By pulling more data-management logic back into the client, AJAX goes back to a more traditional client-server model. True, the server could maintain state if necessary, and undoubtedly some AJAX-powered applications, such as Gmail, do so to some extent. But consider the difference between Google maps and, say, Mapquest. Mapquest stores the current view's data in hidden fields in the page, which have to be sent back to the server on each request. While this is, strictly speaking, stateless operation, the server has to re-create the state of the client for every request, modify it as necessary, and then send it back. Google maps, on the other hand, can keep the state on the client, requesting new data from the server as the user moves around the map, zooms, etc. The result? The server is freed from creating/keeping/updating state and goes back to doing what it does really well, which is serve data.

So does this mean that we're going back to client/server? Doubtful. There is no silver bullet. As cool as AJAX apps (like Google Suggest, Google maps, or A9) are, I suspect that AJAX's greater value will be to add another tool to the toolset, allowing for hybrid thin client/fat client applications that improve web UI interactions and bring us to the next level of distributed applications.

Categories: soft.dev
Posted by diego on March 18, 2005 at 8:59 AM

a9's opensearch

Werner blogs about A9's new OpenSearch. From the site:

OpenSearch is a collection of technologies, all built on top of popular open standards, to allow content providers to publish their search results in a format suitable for syndication. [...]

Many sites today return search results as a tightly integrated part of the website itself. Unfortunately, those search results can't be easily reused or made available elsewhere, as they are usually wrapped in HTML and don't follow any one convention. OpenSearch offers an alternative: an open format that will enable those search results to be displayed anywhere, anytime. Rather than introduce yet another proprietary or closed protocol, OpenSearch is a straightforward and backward-compatible extension of RSS 2.0, the widely adopted XML-based format for content syndication.

Cool!

First Yahoo and now A9 are pushing the boundaries of Search APIs. Google's API, sadly, remains in its usual we-don't-seem-to-know-what-to-do-with-this-or-want-to-support-it state. Maybe all of these announcements will spur Google to crank it up.

Categories: soft.dev
Posted by diego on March 15, 2005 at 8:54 PM

there and back again

The objectives of [this software] are 1) to promote sharing of files (computer programs and/or data), 2) to encourage indirect or implicit (via programs) use of remote computers, 3) to shield a user from variations in file storage systems among hosts, and 4) to transfer data reliably and efficiently.
Hm. Where did I get that from? Gnutella? Freenet? Some other fancier P2P app?

Nope. It's from RFC 959, circa 1985, which defines the FTP protocol (RFC 765, which it obsoletes, dates from 1980). "To promote sharing of files (computer programs and/or data)". Ain't that a riot?

One of the points I made in my thesis was that initially the Internet was truly a P2P system, and only later it moved into the client/server direction, only to slowly creep back into decentralized mode. FTP, which we hardly think about anymore, was a great example of this that I didn't use.

Consider that the original mode in which FTP worked was one where the client was actually a server as well. How so? Well, these days most FTP connections are "passive mode" connections, and the "passive" there refers to the FTP client. In the original (active) mode, the client sends the server a port specification, and the server then opens the data connection back to the client--which must therefore run a small server of its own. Passive mode enabled clients to open all connections themselves, a clear necessity as systems started to find themselves behind firewalls, NATs, and such.
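
To make the contrast a bit more tangible, here's a rough (and error-handling-free) Java sketch of a passive-mode exchange; the host is hypothetical, and in active mode the PASV step would instead be a PORT command with the server connecting back to us:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.io.OutputStreamWriter;
  import java.io.Writer;
  import java.net.Socket;

  public class PasvSketch {
    public static void main(String[] args) throws Exception {
      //control connection: the client opens this one in either mode
      Socket control = new Socket("ftp.example.org", 21); //hypothetical anonymous server
      BufferedReader in = new BufferedReader(new InputStreamReader(control.getInputStream()));
      Writer out = new OutputStreamWriter(control.getOutputStream());
      System.out.println(in.readLine()); //220 banner
      send(out, "USER anonymous"); System.out.println(in.readLine());
      send(out, "PASS guest@");    System.out.println(in.readLine());

      //passive mode: ask the server to open the listening socket for data
      send(out, "PASV");
      String reply = in.readLine(); //e.g. "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)"
      String[] n = reply.substring(reply.indexOf('(') + 1, reply.indexOf(')')).split(",");
      String host = n[0] + "." + n[1] + "." + n[2] + "." + n[3];
      int port = Integer.parseInt(n[4].trim()) * 256 + Integer.parseInt(n[5].trim());

      //the client opens the data connection too: friendly to firewalls and NATs
      Socket data = new Socket(host, port);
      send(out, "LIST");
      System.out.println(in.readLine()); //150 opening data connection
      BufferedReader dataIn = new BufferedReader(new InputStreamReader(data.getInputStream()));
      String line;
      while ((line = dataIn.readLine()) != null) System.out.println(line);
      data.close();
      System.out.println(in.readLine()); //226 transfer complete
      send(out, "QUIT");
      control.close();
    }

    private static void send(Writer out, String cmd) throws Exception {
      out.write(cmd + "\r\n"); //FTP commands are CRLF-terminated
      out.flush();
    }
  }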

The point is that even FTP, which we tend to think of as one of the prototypical client/server applications, was actually one of the prototypical peer-to-peer applications. The client and server divided the load, with clients responsible for serving the data transfer connections and servers for serving the control channel connections.

There are many "loops" of this sort, sometimes repeated decade from decade. A lot of computing has been about making the same things easier, faster, or more scalable. And what's important about this is that, hard as it might be, it's always useful to know what happened before, from the 60s onward. Revisiting ideas is fun, as long as we avoid revisiting the mistakes: we should be trying to make new mistakes, not repeat the ones from the past. :)

PS: let's see how long it takes someone to note the title of the book from which the title of this entry was, er, "downloaded". :)

Categories: soft.dev
Posted by diego on March 13, 2005 at 9:33 PM

new mobibot, new GoogleME

Erik has released a new version of GoogleME and a new version of mobibot. Very cool. I still don't have a local phone, but once I do I'll actually be able to use the local features of GoogleME (which weren't very useful in Ireland :)). Also, mobibot is now uploading directly to del.icio.us, which is a great example of the use of the delicious API. I still have to make some time to play with that myself!

Categories: soft.dev
Posted by diego on March 8, 2005 at 3:17 AM

yahoo! wakes up

Yahoo does webservices!! Jeremy has a good roundup of pointers, and Yahoo search now has an excellent Developer site which includes information on their API and other things. Cool!

I keep saying that Yahoo! doesn't get enough credit for the stuff they do (probably because they have so many services and systems that it's difficult to see a "simple narrative"). Maybe this will start to change things.

Categories: soft.dev
Posted by diego on March 1, 2005 at 6:13 PM

wsss

Today I'm at the web spam squashing summit at Yahoo! HQ. I got here late, the result of a semi-crazy sequence of early-morning events that I will talk about later. Yes. More delayed posting. :)

Categories: soft.dev
Posted by diego on February 24, 2005 at 10:06 PM

myeclipse: wow

As it happens sometimes, I've now come full-circle on MyEclipse. To recap, I started thinking that, while nice, it was too slow for my needs. Then several comments on the entry pointed out that they were not seeing those problems. So after a few days, I decided to give it another try, and discovered that with the proper configuration it was actually not noticeably slower than Eclipse on its own.

Now, after about 5 days of using it constantly, I have to correct myself and say that it is truly a fantastic addition to Eclipse. The hot deployment feature is the key to it all. Once all the proper deployment features are configured, you can be editing a file, make a change, save it, and then go to the browser and reload, and the changes are there--and this applies to both JSPs and servlets. Sometimes the hot deploy fails (usually when modifying static members or persistent classes) but that's not a big deal, and restarting the server is a snap in those cases anyway. This works across pretty much every major application server that's out there, "out of the box". Other cool features include autocomplete on JSPs and support for JSTL and other JSP tag libraries.

Anyway, if you use Eclipse and develop web apps and haven't tried MyEclipse yet, give it a try. Just make sure you configure it properly. :)

Categories: soft.dev
Posted by diego on February 10, 2005 at 2:47 PM

add(E), but remove(Object)?

Martin notes that in Java 5, the new-super-shiny-generics-enabled Collection interface has some inconsistencies. Most notably, the add method depends on the specific type of the collection, while remove and contains do not. Here are the method contracts:

boolean add(E o)
boolean remove(Object o)
boolean contains(Object o)
Martin wonders why the discrepancy. I see two options.

Option one is a hilarious story that involves a monkey, a pantsuit, and a cherry cake interfering with the programmer's work and making her/him commit the grievous error for which she/he will feel regret the rest of her/his days. Let's call this the low-probability option.

Option two is that the Architecture Astronauts involved in the spec were carried away by the following definition of the remove method:

Removes a single instance of the specified element from this collection, if it is present (optional operation). More formally, removes an element e such that (o==null ? e==null : o.equals(e)), if this collection contains one or more such elements. Returns true if this collection contained the specified element (or equivalently, if this collection changed as a result of the call).
and this one from the contains method:
Returns true if this collection contains the specified element. More formally, returns true if and only if this collection contains at least one element e such that (o==null ? e==null : o.equals(e)).
My theory (and hoyven mayven, this is a theory only!) is that because the formal definition involves the use of the equals method, and that method takes an Object (which, being the class at the top of the Java object hierarchy, cannot know anything about "E"), the contract for contains and remove also follows with Object rather than E.
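
A quick illustration of how the asymmetry plays out in practice (my own snippet, not from Martin's post): add(E) is type-checked at compile time, while contains and remove happily accept an object of any type and simply return false when nothing matches:

  import java.util.ArrayList;
  import java.util.List;

  public class ContractTest {
    public static void main(String[] args) {
      List<String> names = new ArrayList<String>();
      names.add("diego");
      //names.add(Integer.valueOf(42)); //does not compile: add(E) is type-checked
      //these compile fine and just return false, since no element of the
      //list is equals() to the Integer:
      System.out.println(names.contains(Integer.valueOf(42))); //false
      System.out.println(names.remove(Integer.valueOf(42)));   //false
    }
  }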

Sounds reasonable?

If anyone has the skinny on this, please pass it on. Alternative theories will also be appreciated. Especially if they involve monkeys (monkey-butlers are also accepted).

Categories: soft.dev
Posted by diego on February 7, 2005 at 10:23 PM

myeclipse: it's all about the config

I got several comments to my myeclipse: slow post a few days ago, from others that said, basically, that they didn't see any of those performance problems in their daily usage of Eclipse. Additionally, there was a comment from Riyad (from MyEclipse support) which mentioned a few things to look at, etc. Thanks to everyone for their comments, they definitely made a difference in making me take a second look at the product.

Okay, now the problem was that I had uninstalled, reinstalled a "clean" eclipse, and didn't have time to try everything again, until this morning. I set out with patience early today. I installed a clean copy of tomcat (5.0.30), a clean copy of MyEclipse 3.8.4 (for Eclipse 3.0.x), and tried again.

Initially, I saw the same problems as before. JSPs would appear to have syntax errors (i.e., I couldn't get MyEclipse to find the JSTL files, etc.). I tried adding the JSTL config myself but couldn't make it work either. Riyad's instructions did not help.

Ah, but then I thought, wait a minute, maybe the problem is that I'm trying to use a pre-existing project with the web development features.

So I created a new project with J2EE/Web Development, and, sure enough, things worked. I selected JSTL right from the start, along with the proper directory configuration, and a few minutes later the "hot deploy" feature was working.

Additionally, I haven't (yet) experienced any slowdown related to all the features that myEclipse provides.

So, take note: if myEclipse is running slowly, it's possible that the problem is that your configuration is wrong somewhere. Try creating a new project from scratch (and when it's working, removing the old ones) and properly specifying libraries, features you're using (such as JSTL or Hibernate), etc.

Conclusion: much better now that everything seems to work. The hot deploy feature is pretty good--although I've already seen it get confused a few times--and the integration is definitely a plus. More later when I've had more time to play with it!

Categories: soft.dev
Posted by diego on February 6, 2005 at 3:28 PM

quack!

Martin on duck typing which is not, as you might expect, a discussion involving Daffy Duck and keyboards, but rather the differences between the Java and Ruby object systems. He's been working up his Ruby-ness recently, coming from Java, and has already posted a bunch of cool stuff on the topic. Most excellent.

Categories: soft.dev
Posted by diego on February 2, 2005 at 11:12 AM

software as sound

via Slashdot, The Sound of iPod, or how to extract software by sound. A really cool hack (as if dual-booting an iPod wasn't cool enough!). Reminded me of Matt's body-chemistry-based iPod music.

Categories: soft.dev
Posted by diego on January 30, 2005 at 10:04 AM

myEclipse: slow

And continuing with the "Eclipse" theme this morning... :)

In contrast to PHPEclipse, myEclipse actually has a nice, straightforward installer, and at the beginning everything seems to be fine. But eventually things st-art-to-slow-down. The pages I was editing include JSP tags that seemed to confuse the JSP editor to no end, and over about two days things started to get slower. The only reason I can think of for why this started happening is that the JSP pages I was editing got progressively bigger, with more tags and commands to verify. Restarting Eclipse didn't help. Changing the memory settings (giving Eclipse 384, then 512 megs of RAM) didn't help either.

The "hot deploy" feature of myEclipse is nice: it allows you to specify a variety of app servers and hot deploy to them. But. But. It severly constrains you from actually tailoring what is deployed and how, pretty much forcing you to create a certain structure in your source tree so that the deployment process works well. If you do that, however, you get some pretty good functionality.

Slow though, very slow. At least on my P4 with 1 gig of RAM. It was so slow, in fact, and I was getting so little benefit from it (I couldn't use the hot deploy, etc.) that I just had to uninstall it and go back to just Eclipse.

Btw, when I say "go back to Eclipse" I mean that I had to reinstall Eclipse, since myEclipse left trash everywhere in the .metadata and .plugin directories and was making certain things fail (e.g., the Ant build) in Eclipse after the uninstall. In particular, the Ant build problem had to do with invalid settings under the workspace removing the directory workspace/.metadata/.plugins/org.eclipse.debug.core/.launches solved that.

Next up to try: Eclipse Web Tools project. For now, though, back to work.

Bonus: Over at the PHPEclipse site I learned about the "-clean" Eclipse launch option (there's also "-initialize"). Good for when you change install directories, etc., and it is not documented in the page I generally end up at when looking for Eclipse command line options (it is mentioned here though).

Update (a week later): Here's a follow-up to this entry, where I finally got myEclipse to run properly.

Categories: soft.dev
Posted by diego on January 30, 2005 at 9:42 AM

PHPEclipse: a good start

In my recent post about Zend Studio, Gerd mentioned PHPEclipse, a PHP development plugin for Eclipse 3.x. I didn't know about it, and it is pretty good.

The setup is still pretty rough around the edges, particularly when going beyond just PHP and into the MySQL/Apache/Debugger stuff (basically you need separate plugins for each). I wasn't able to actually make the debugger work, but then again I didn't have a lot of time to spend on that.

In comparison, Zend Studio is well integrated and works really well, and at this point it blows PHPEclipse out of the water (Zend also has a built-in advantage since it's not only a company, but the founders are the creators of PHP). Zend does cost $249, as Gerd noted, so it's nice to see that in PHPEclipse there's a good option in the pipeline.

Categories: soft.dev
Posted by diego on January 30, 2005 at 9:32 AM

protected does what?

Okay, quick, what's the effect of adding the protected qualifier to a class variable in Java?

If you're like me (and many others) you'd say: "protected allows only subclasses to access the variable".

Wrong!

At least according to Java 1.4 and Java 5...

We discovered this a few months back while chatting with Erik and I forgot to blog about it. Today I remembered it in a conversation with Martin.

Let's look at the Java Language Spec section for protected access, 6.6.2.1 "Access to a protected Member", which states:

"Let C be the class in which a protected member m is declared. Access is permitted only within the body of a subclass S of C."
Hm. That sounds like only subclasses should be able to access it, doesn't it. But try this code. Create package "x". Then create two classes in there, test1 and test2, with the following contents:

test1.java

package x;

public class test1
{
  protected String hello = "TEST";
}

test2.java

package x;

public class test2
{
  public static void main(String[] args) {
    test1 xx = new test1();
    System.out.println(xx.hello);
  }
}

Not only will that compile, but running java x.test2 will print "TEST". Which means that protected is giving package-wide access, not just subclass access. If you read the first definition in the spec:
A protected member or constructor of an object may be accessed from outside the package in which it is declared only by code that is responsible for the implementation of that object.
It seems to be loose in specifying only access-denied for non-package classes that do not inherit (a "negative right"), rather than only for inheriting classes (a "positive right", or is it the other way around? nevermind).

Okay, so what do you think? Does the Java Lang Spec contradict itself by not spec'ing out completely access in the first paragraph? Or is section 6.6.2.1 enough to say that this is a bug? (And I have confirmed this behavior in JDK 1.4.x and Java 5 as well).

Update: Juha, in the comments, notes that section 6.6.1 (a little before the section I referenced) makes the package-distinction clear. Martin agrees. Me? I agree that putting sections 6.6.1 together with 6.6.2 gives us the behavior we're seeing. I might be wrong in reading the text like I am (eu mentioned that in the comments), but I can't quite see it that way. I still think that Section 6.6.2 reads as if it's an all-inclusive rule, particularly since it doesn't reference section 6.6.1, only two paragraphs above, and thus sec. 6.6.2 appears to contradict both the spec from Section 6.6.1 and language behavior.

In any case, there's no mystery. At most, I'd say that section 6.6.2 has to be clarified to avoid appearing to contradict 6.6.1. Or rather, the whole section might be rewritten for clarity...

Or, I should change the grammar/parser in my head and read the spec differently (admit it, that's what you think we really needed, don't you!). Then all problems would be solved. I'll see if I can call the factory and get an upgrade or something. :-)

Categories: soft.dev
Posted by diego on January 29, 2005 at 7:57 PM

tomcat & Java 5 rant

So I go over to Jakarta this morning to get the latest stable Tomcat. Spend some time, as usual, browsing through the bewildering array of choices, until I find that the latest stable version is 5.5.4. But wait! There's also 5.0.30! Both seem to be parts of branches that are currently maintained. So what's the difference?

5.5.4 is compiled against Java 5. So to run it under JDK 1.4.x, you need a special "binary compatibility package". Do you want to bet that the 5.0.x branch is going to get less attention now that the 5.5.x branch allows Tomcat devs to play with the latest language toys?

This is not just a Jakarta issue, btw. Java 5 naturally breaks compatibility in several areas. Yeah, yeah, I know. Evolution of the language and all that. But, I don't know, I'd much prefer it if I could choose when to migrate, rather than have the choice eventually forced on me because good open-source projects start to migrate as well. Bleh.

Categories: soft.dev
Posted by diego on January 28, 2005 at 11:58 AM

Zend Studio: PHP for Java developers

Well, almost. :) I've been learning PHP recently and while at first I was a bit lost (what with coming from an all-encompassing IDE like Eclipse and all), fortunately I quickly switched to Zend Studio, which completely rocks.

Ted recently had a cool post on the IDE issue related to Python and Java, and I think that his comments (and some of the underlying discussion linked from his post) also apply to PHP.

Zend is really great as an environment in that regard (plus the environment is written in Java, using Swing--the first clue was the menu-border bug in WinXP; they should be using winlaf for an easy fix!). Because of its integrated debugging and execution environment (it even installs Apache & PHP and allows you to work against that), and things like autocomplete, it feels very close to what Java people (read: me) are used to, and it surely makes things easier for newbies as well.

Anyway, if you use PHP or are interested in working with it, give Zend a try if you haven't yet, you will surely find it useful.

PS: I've been using their Zend Studio 4.0 Beta. I can't speak for 3.5 but I don't think it's that far from 4.0 in terms of functionality. And, even though 4.0 is beta, I haven't had any problems with it; it's been pretty solid so far.

Categories: soft.dev
Posted by diego on January 27, 2005 at 2:14 PM

the daily wtf

I was just introduced to this site: The Daily WTF "Curious Perversions in Information Technology." I would start pointing to each good entry but I can't, they're all hilarious (or sad, depending on which side of the fence you're on). The main page is (as far as I can see) a blog-like rendering of the Daily WTF Forum from the forums on the site (check out the other forums too). Most excellent--subscribed.

Categories: soft.dev
Posted by diego on January 26, 2005 at 11:49 PM

update: setting up apache, tomcat, and mysql


One of the most visited entries from last year was my configuring apache 2 + tomcat 5 + mysql + jdbc access on linux and windows post. However, after a few months there was a change in the default Tomcat configuration (as well as changes in the connectors) that rendered the Tomcat/Apache connection part nearly useless. That is, it does apply to those specific versions, but no version of Tomcat after 5.0.18 (which is the one I used there) matches the description in the post, even though the Tomcat major version --5.x-- hasn't changed. Sigh. Those are the problems that open source creates sometimes, constant changes in tiny things, config or code, that break a lot of stuff without a clear reason (maybe there is a reason, but it's usually not communicated properly, or at all).

Anyway, I've recently done a new configuration using the latest tomcat, so here's an update to that post on that section. The other sections (that deal with Apache installation, MySQL and so on) have remained largely relevant.

The tomcat config

As far as tomcat itself is concerned, the main change was the configuration for the Context. My post says that you should look for the line "<Context path="" docBase="ROOT" debug="0">" which in the newer default server.xml config files doesn't exist anymore. So where does the context line go? It turns out that the new server.xml includes a default config for a Host. Within the host there's a Logger element set up. After that is where I've now included the context path. Similarly, I've added a Resource tag after the Context for the JDBC connection. Summarizing, after the Logger tag in the Host I've added the following:
<Context path="/appPath" docBase="APACHE_DIR/htdocs" debug="0" reloadable="true" crossContext="true"/>
<Resource name="jdbc/mysql"
auth="Container"
type="javax.sql.DataSource"/>

The connector config

I'm still using JK2. The main difference is that instead of adding the connection point in workers2.properties I am now adding it in jk.conf (which I think is a new file in the recent versions of jk2). I added the following line at the end of jk.conf

JkMount /appPath/*.jsp ajp13

Note that "appPath" is the context that was defined above.

So, I think that's it as an update. It's not a huge difference, but judging from my experience (and from emails I keep getting on the topic) this was more than enough to complicate usage of the old instructions.

Configurations are always a problem, no matter how many HOWTOs you've got, but I hope this makes things at least a bit easier!

Categories: soft.dev
Posted by diego on January 26, 2005 at 5:40 PM

web technologies: a first step towards biomimetism?

Reading Jon Udell's comments on his recent conversation with Adam Bosworth (and Jon's own musings on Alchemy and related ideas: a next-gen client that would include an XML store and the means to manipulate it) I suddenly remembered a conversation I had with Russ last weekend about the usage of tags: Flickr's, Technorati's, Del.icio.us', everyone's. And I started to think about how all of this stuff is emerging at about the same time, and not by coincidence. This is all related.

It seems that I'm pulling the connection out of thin air, but not really. Consider the infrastructure on which the most innovative apps are currently being built.

What I'm thinking is that, through web technologies (by which I mean DHTML, Javascript, scripting, XML, REST, etc), we've spent the last few years walking back from decades of "hardening" of runtime environments and development tools, and consequently of applications.

By "hardening" I mean static checking (of which Java is a good example), fairly strict runtime checking, strict parameter and I/O checking, etc. The cost of which is, of course, the need for a fairly complex environment in which to run those applications. Complex, and delicate, in a sense. Easy to "break".

Many people have been talking for a while about the need for software to become more biomimetic, at least in the sense that systems in biology appear to deal fairly well with unpredictability, failure, and interaction without strong interdependencies.

Consider: to run certain apps (say, Windows, or Java) you need certain versions of DLLs, OSes, etc. Without those, the app doesn't run at all. Java bytecode is way more portable (and portable into future platforms) than, say, an i386-optimized EXE, but it's still tied to platform and libraries. This is the equivalent of an organism dependent on a certain type of plant to survive: remove the plant and the organism dies (i.e., the app doesn't run).

But (before you call the analogy police!) look at DHTML, XML, and web technologies in general: from the start, they can run in vastly different environments. If we look again at the app as depending on and adding to the ecology of its runtime environment (instead of viewing it as a static element that just sits on it), web technologies are fairly flexible creatures: they can survive in many types of environments, both server and client. Sure, they break easily, but they can be fixed almost as easily, and, more importantly, they evolve quickly, mutating, sometimes unnoticed (Google doesn't have versions, but that doesn't mean it hasn't evolved).

So, yes, a lot of web tech is "flimsy" (at least when looking at it through the lens of static/strict checking). But I think that's precisely why it works. And why, where things like applets (and, yes, ActiveX controls) failed, DHTML, JavaScript, XML, and simple REST interfaces are succeeding in creating a rich ecosystem of apps that build on each other.

PS: I have to think about this a bit more, certainly to come up with a better explanation. Hopefully it makes some sense though. But is this one of those great/crazy 3 am thoughts or what? :)

Categories: soft.dev
Posted by diego on January 21, 2005 at 2:50 AM

outputting dates in RFC822 and ISO8601 formats

Okay, this is something else that is simple but generally requires a look at the specs just to see how to do it properly. And these days, what with everyone generating RSS and such, creating properly formatted RFC 822 and ISO 8601 dates is important to make sure the feeds validate. So here's the code, using Java's SimpleDateFormat, to output both formats properly.


import java.util.Date;
import java.util.Locale;
import java.text.SimpleDateFormat;

//... class definition here...

//note: SimpleDateFormat is not thread-safe, so don't share these
//instances across threads without external synchronization
public static SimpleDateFormat ISO8601FORMAT
    = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
public static SimpleDateFormat RFC822DATEFORMAT
    = new SimpleDateFormat("EEE', 'dd' 'MMM' 'yyyy' 'HH:mm:ss' 'Z", Locale.US);

public static String getDateAsRFC822String(Date date)
{
  return RFC822DATEFORMAT.format(date);
}

public static String getDateAsISO8601String(Date date)
{
  String result = ISO8601FORMAT.format(date);
  //convert yyyy-MM-ddTHH:mm:ss+HH00 into yyyy-MM-ddTHH:mm:ss+HH:00
  //- note the added colon in the timezone offset
  result = result.substring(0, result.length()-2)
    + ":" + result.substring(result.length()-2);
  return result;
}


Later: Aside from some fixes due to comments (thanks!), Erik, via IM, notes that RFC 822 has been superseded by RFC 2822, in which years on dates use four digits rather than two. The RSS Spec actually encourages the use of four digits from RFC 2822 (even though it explicitly mentions RFC 822 only). So I modified the code for that.

Also, Zoe, via email, notes that a cleaner way of generating ISO 8601 dates is to use the UTC designator "Z", which has the added advantage of making dates easier to parse back in Java. Clearly this is a better solution when possible; however, it means that all times have to be converted to UTC, something that might not be desirable in the many cases where you want to maintain timezone information. Parsing an ISO 8601 date with the colon is similarly "dirty": you have to remove the ":" by hand before Java can parse it.
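For reference, the UTC variant looks more or less like this (a quick sketch--note that the formatter's time zone has to be set to UTC, since the trailing "Z" is a literal here):

import java.util.Date;
import java.util.TimeZone;
import java.text.SimpleDateFormat;

public static String getDateAsISO8601UTCString(Date date)
{
  //the 'Z' is a literal, so the formatter itself must be set to UTC
  SimpleDateFormat utcFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
  utcFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
  return utcFormat.format(date);
}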

Categories: soft.dev
Posted by diego on January 4, 2005 at 2:26 PM

hiding mailto: addresses with javascript

Making it harder for crawlers to discover email addresses is not hard to do, but it's one of those things that I, at least, keep redoing from scratch just because it's so easy to do. It should be instantaneous, though! :) So here's some code, which I've written in different forms over the years, neatly packaged.

What the code does is run a simple substitution cipher on the address and then show all the code you'd need to include on a page to make it work (using simple CSS/DHTML tricks, making a layer visible). It also lets you include text to put on the links, so that a single Javascript call put anywhere on the page generates the full mailto: link.

By changing the ordering of letters in the key variable, you can change the way it encrypts things for your page. Okay, "encryption" may be a bit of an overstatement here, but technically... :)

Just for clarity, here's the link again. It's all under an MIT License, so you're free to do with it as you will. Hope this is useful!

Categories: soft.dev
Posted by diego on January 4, 2005 at 12:57 PM

opml icon?

This webapp I'm writing for Nooked has, as one of its outputs, OPML (it outputs RSS as well, of course). Now for the RSS output I'm using the common white-on-orange XML icon:


But for OPML there doesn't seem to be a common icon. I wanted something along similar lines, and I did this:
opml.gif

Now, my question is: Does this exist already? I have the strongest feeling of having seen this, somewhere... has anyone else? Why blue? I don't know. It seemed appropriate. One thing that bugs me a bit is that OPML is technically XML as well (although many OPML files out there don't validate) so you really have to look "beyond meaning" and just say "orange icon: RSS, blue icon: OPML" (or whatever) regardless of what they say. Maybe it's time to have an orange "RSS" icon? Or is the XML icon too ingrained by now?

Btw, there are many possibilities for this set of questions to spawn a flamewar, I'm really not interested in that. :) Mostly, I wonder: do people have a favorite way of representing OPML feeds, either in their own apps or as users? Any feedback will be appreciated!

Categories: soft.dev
Posted by diego on December 28, 2004 at 12:32 PM

javaone 2005 call for papers

The JavaOne 2005 call for papers is now open. I wonder if I could submit something. But first I wonder if I'll have the time to actually write it. :)

Categories: soft.dev
Posted by diego on December 23, 2004 at 12:13 PM

a manifold kind of day

I'm about to go out for a bit, but before that, here's what I've been talking about for days now: the manifold weblog. (About my thesis work). The PDF of the dissertation is posted there (finally!), as well as two new posts that explain some more things:

as well as the older posts on the topic, plus a presentation and some code in the last one. Keep in mind that this is research work. Hope it's interesting! And that it doesn't instantaneously induce heavy sleep. :)

my favorite Java 5 change

I used Java 5 (with Eclipse 3.1) for the code I wrote last week to use as an example for manifold, and there's no question: the enhanced for loop, combined with generics, rocks.

Aside from the basic difference of going from this (note: using an inline iterator instead of indexing is also common):

//strings is an ArrayList holding Strings
for (int i = 0; i < strings.size(); ++i) {
  String s = (String) strings.get(i);
  //do something with s
}

to

//strings is now an ArrayList<String>
for (String s : strings) {
  //do something with s
}
there's also the cooler use of it to iterate over the contents of a HashMap, so instead of doing
HashMap m = new HashMap();
//...
//fill the map with String keys and Vector values
//...
Iterator it = m.keySet().iterator();
while (it.hasNext()) {
  String key = (String) it.next();
  Vector value = (Vector) m.get(key);
}

you can do

HashMap<String, Vector> m = new HashMap<String, Vector>();
//...
//fill the map with String keys and Vector values
//...
for (String key : m.keySet()) {
  Vector value = m.get(key);
}
Much more concise, and clearer, IMO. Very cool.
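As an aside, if you need both keys and values it's worth iterating over entrySet() instead, which avoids the extra get() per key--a small sketch along the same lines (Map.Entry comes from java.util.Map):

HashMap<String, Vector> m = new HashMap<String, Vector>();
//...
//fill the map with String keys and Vector values
//...
for (Map.Entry<String, Vector> entry : m.entrySet()) {
  String key = entry.getKey();
  Vector value = entry.getValue();
  //do something with key and value
}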

Categories: soft.dev
Posted by diego on December 20, 2004 at 6:19 PM

under attack

Through the last week the clevercactus site has been sporadically unavailable, and it's down right now. This means no web, no service, no emails getting through.

If you're trying to get through to clevercactus and can't please let me know through a comment or email to my personal address.

What happened is that we were attacked (I'm not sure when) and someone left a number of scripts there that are flooding the system (they do other things too, but at least one of them is clearly written simply to flood the network and disable it). This was obviously intended to bring down clevercactus, not just casual hacking. Why? What do they gain by bringing down the service of a small company that is going through hard times?

This kind of thing makes me sad, and is really discouraging.

I had this whole thing planned for today--getting the manifold site up and so on--but now I'm going to spend the time figuring out how to route around the problem until we can determine the extent of the hack. I don't even know how they got in yet--we constantly update our software with the latest patches. Needless to say, I'm seriously reconsidering the whole of the software I use and how to set it up so that this doesn't happen again.

Anyway. We'll see how it goes.

Categories: clevercactus, soft.dev, technology
Posted by diego on December 16, 2004 at 2:31 PM

conversation engine: the next step

Since the recent integration of Feedster results into the conversation engine, I've stopped coding for a bit, and while doing other stuff I've been thinking about how to make it more scalable, cover more weblogs, and stop wasting resources looking at pages with no meaning (read: make it more useful)---in short, how to solve the problems I mentioned in that entry.

The crucial problem is that Feedster provides only part of the picture. Scott Rafer (Feedster CEO) mentioned in the comments that I could use the Feedster links output, which provides a list of the references to a particular weblog. This doesn't quite do what I need, however. The reason is simple: Feedster indexes RSS feeds, not entire sites, so if someone is providing summary feeds, Feedster will not be able to find links between weblogs even if they exist. Because many, many weblogs provide summary feeds, it is clear that the only way to get the links between entries is to get the actual contents of the HTML page. But.

But what I can do is use Feedster as the source point for the list of pages to index. Right now I am indexing everything on a given website. This has two drawbacks. First, I am forced to download, store, and analyze waaay more content than I need (which accounts for the small number of sites the bot is crawling at the moment), particularly when weblogs point to other parts of a site, including wikis, dynamic apps, etc. Second, it slows down the processing for conversations, which depends on walking the link graph between two sites. This is a problem now, but if I move in the direction of adding multiple-participant conversations (as Don suggests in a comment to my previous conv. engine post, linked above) then this will be even more important.

So.

Next step, then, is to use Feedster as the data source for the entries of a given weblog. Then download/process the pages for each entry's permalink. Then analyze that and combine the results with the Feedster information.

Stay tuned! More in the next few days.

Categories: soft.dev
Posted by diego on December 7, 2004 at 10:36 AM

microsoft and software piracy: oops

This isn't new-news (it's about 3 weeks old), but it's still surprising enough that I'm blogging it now: Patroklos notes that Microsoft has used a cracked version of the application Sound Forge to create some of the media files in Windows XP. Patroklos has a screenshot there; I verified it myself in my own copy of Windows XP. To check it, just open any of the WAV files under C:\WINDOWS\Help\Tours\WindowsMediaPlayer\Audio\Wav in a text editor and go to the end: you'll see the "Deepz0ne" signature that marks the source app as cracked. This definitely seems to be what it appears to be.

Oops.

I'm sure that MS will fix this quickly; when an organization is this large, things like these are probably bound to happen. Still, it doesn't look good. If Microsoft itself can't properly enforce licensing with its own employees and contractors (though admittedly on a small scale, considering the size of its products), it weakens its own ability to condemn piracy and prosecute it.

I'm sure there's a lot more fortune-cookie-wisdom that could be expounded from this incident but it's too early in the day, so I'll leave it at that. :)

Categories: soft.dev
Posted by diego on December 6, 2004 at 9:11 AM

conversation engine, v0.2 (enter feedster)

I've been busy with other things, but I had a couple of hours this morning to make some mods to the conversation finder. First, I changed its name. It is now the conversation engine. This is a name that Don used and that I thought was much better than mine, so there it is.

The main change in this version is that I am now using Feedster's search results, combined with my own spidering, to find conversations. This has two consequences, one good and one bad.

The good consequence is that I can now use Feedster's stored metadata for the post, which is excellent. Check out the "canonical" :) search for conversations between Don and Tim. Now you've got pictures and everything! Great. But you'll notice there's one less conversation than there used to be, which brings me to the "bad" consequence.

The bad consequence is that, because I am at the moment using only Feedster's first 100 results (the maximum Feedster allows on queries) as a filter, I am losing the earlier data I had. The data is still in my DB, since it was spidered, but the engine is no longer finding it. This is an easy fix though: just iterating through more Feedster results will do the trick, "activating" the older spidered pages in my DB. (BTW, this is also why some conversations that showed up before aren't showing up anymore. I'll be sure to post once I have made this fix, since it's pretty crucial.)

Why am I using both the Feedster DB and my own results, you ask? Because Feedster is, at the moment, returning only a parsed version of the RSS feed for that site: just HTML. And that means that there are no links in the entry. And that means that I can't create the conversation graph, since there are no links to follow.

Even if Feedster were providing the raw entries, it would still be a problem. The reason is summaries: many RSS feeds provide summaries, not the whole content, so it's not guaranteed that you'll be able to extract links from a feed's description element. That means that you absolutely need to crawl the entire site, and then use the combination of Feedster's results and the crawling ("intersecting" them) to come up with the list of links you're interested in.

Anyway, looks much better now, doesn't it? :)

Categories: soft.dev
Posted by diego on December 5, 2004 at 6:40 PM

conversations, metadata, and the URL disambiguation problem

Since my first crack at the conversation finder experiment looks promising, I was looking at what didn't work, and thinking about how it could be improved.

The first point that was clear was that metadata is crucial for this--exactly the kind of metadata that is present in RSS feeds: author, top-level link, date for posts, etc. While it seems possible to infer a lot of this stuff from the raw HTML, one crucial component, the dates, can't be. While the Last-Modified header in HTTP responses would be mildly useful, it doesn't actually help much because the page can be rebuilt both by the author and by others (e.g., when they post comments).

The date, however, is important but not crucial. The sequencing of the "conversation" is determined by the direction of the link-graph, not by dates.

What is crucial though is solving the ambiguity/duplication of URLs. Most weblogs have archives, which repeat the information already present elsewhere. The result is that the same posts appear many times within the entire index of a single site. Archives cannot be avoided by some generic algorithm because their "shape" varies greatly. So you end up with many pages that have the same content and even appear to create loops within the conversation, particularly when a single archive page contains two posts that belong to the conversation. Right now, I am doing some analysis on the text that surrounds the link to determine whether this is a "duplicate" (see for example the "Other pages from this site with the same reference include" in this conversation list). But while putting duplicates together is clearly possible, I still don't know which was the actual original post and which are archive duplicates--and you'd want the original post of course.

The second problem with URLs is that multiple, completely different URLs can point to the same content. This is a notable case in weblogs where the blog has moved providers over time, or chosen a better URL, and both URLs are maintained. Scoble is a good example of this, having both http://scoble.weblogs.com/ and http://radio.weblogs.com/0001011/ pointing to exactly the same content. The only way for the software to "know" that this is actually the same site is by looking at the metadata in the feed (checking the content won't necessarily be foolproof, since the sites could be slightly out of sync).

Then there's the simpler issue of the host (rather than the full URL) being different in different links. The simplest case is of course something.com and www.something.com pointing to the same thing. But Rui, for example, has equivalents in www.taoofmac.com and the.taoofmac.com. Again, the feed would provide a lot of information to resolve the ambiguity and realize that these two seemingly different sites are actually the same. In this case other things are possible as well--IP checks, content checks, etc.--but metadata seems to me a simpler and more effective solution.

I have other things to do today, but this is definitely something interesting to keep thinking about in the background.

Bonus: In the comments to my previous post Don mentioned the idea of "combustible conversations," which means, as he describes it, "bringing past cluster of interactions to the present when and where it's relevant." This is a great idea! Also something to think about.

Categories: soft.dev
Posted by diego on December 4, 2004 at 3:36 PM

conversation finder v0.1

Okay, so I actually should start using conversation engine, the name Don suggested and which I think sounds cooler. But for the moment it's still the Conversation Finder. The first version is now live!

This is a very limited version. Only a few sites are being indexed, mostly out of concerns for speed, bandwidth and such. I'll see about expanding it later.

First thing to look at is this result, the conversations the engine (finder?) discovers between Don and Tim Bray.

Interestingly enough, it finds one more aside from their recent Atom conversation, something about flowers :). This is great! It is finding actual conversations!

But... the results are just a little bit off. I keep seeing what it finds and thinking, "come on, you're so close!". Some links are loops. Some links point to index pages (which might have the content, but...). Some of the text extracts are not relevant (look at the "conversations" between the other sites that it's indexing).

I think a big factor here is that the engine knows nothing of archives, or of the people that run these blogs. Archives duplicate a lot of information, and the engine gets a little confused by that. So maybe the next step is to fiddle around with some of the metadata present in weblog pages (the metadata in RSS would be great, particularly the dates, to infer sequencing; however, RSS feeds only go as far back as a few days or posts, so all that's left is parsing the different types of metadata embedded in HTML).

Anyway, not bad for a few hours of work and a 0.1 version. Looks promising! Now if I could just find a way of letting others enable spidering of their sites without killing my server's bandwidth... :))

PS: I wasted a couple of hours on Tomcat setup. Why? Because the JARs I was deploying in WEB-INF/lib didn't have write privileges. Tomcat wants them writable! And it was failing without any error messages, simply not loading the classes in the JARs (and yes, I tried common/lib). Anyway, all is well that ends well.

Update (5/12/2004): The Conversation Finder is now the Conversation Engine.

Categories: soft.dev
Posted by diego on December 3, 2004 at 8:13 PM

manifold, the 30,000 ft. view

As a follow-up to my thesis abstract, I wanted to add a sort of introduction-ish post to explain a couple of things in more detail. People have asked for the PDF of the thesis, which I haven't published yet, for a simple reason: everything is ready, everything's approved, and I have four copies nicely bound (two to submit to TCD) but... there's a signature missing somewhere in one of the documents, and they're trying to fix that. Bureaucracy. Yikes. Hopefully that will be fixed by next week. When that is done, right after I've submitted it, I'll post it here (or, more likely, I'll create a site for it... I want to maintain some coherency on the posts and here it gets mixed up with everything else).

Anyway, I was saying. Here's a short intro.

Resource Location, Resource Discovery

In essence, Resource Location creates a level of indirection, and therefore a decoupling, between a resource (which can be a person, a machine, a software service or agent, etc.) and its location. This decoupling can then be used for various things: mapping human-readable names to machine names, obtaining related information, autoconfiguration, supporting mobility, load balancing, etc.

Resource Discovery, on the other hand, facilitates searching for resources that match certain characteristics, allowing you then to perform a location request or to use the resulting data set directly.

The canonical example of Resource Location is DNS, while Resource Discovery is what we do with search engines. Sometimes, Resource Discovery will involve a Location step afterwards. Web search is an example of this as well. Other times, discovery on its own will give you what you need, particularly if the result of the query contains enough metadata and what you're looking for is related information.

RLD always involves search, but the lines seemed a bit blurry. When was something one and not the other? What defines it? My answer was to look at usage patterns.

It's all about the user

It's the user's needs that determine what will be used, how. The user isn't necessarily a person: more often than not, RLD happens between systems, at the lower levels of applications. So, I settled on the usage patterns according to two main categories: locality of the (local/global) search, and whether the search was exact or inexact. I use the term "search" as an abstract action, the action of locating something. "Finding a book I might like to read" and "Finding my copy of Neuromancer among my books" and "Finding reviews of a book on the web" are all examples of search as I'm using it here.

Local/Global, defining at a high level the "depth" that the search will have. This means, for the current search action, the context of the user in relation to what they are trying to find.

Exact/Inexact, defining the "fuzziness" of the search. Inexact searches will generally return one or more matches; Exact searches identify a single, unique item or set.

These categories combined define four main types of RLD.

Examples: DNS is Global/Exact. Google is Global/Inexact. Looking up my own printer on the network is Local/Exact. Looking up any available printer on the network is Local/Inexact.

Now, none of these concepts will come as a shock to anybody. But writing them down, clearly identifying them, was useful to define what I was after, served as a way to categorize when a system did one but not the other, and to know the limits of what I was trying to achieve.

The Manifold Algorithms

With the usage patterns in hand, I looked at how to solve one or more of the problems, considering that my goal was to have something where absolutely no servers of any kind would be involved.

Local RLD is comparatively simple, since the size of the search space is going to be limited, and I had already looked at that part of the problem with my Nom system for ad hoc wireless networks. Looking at the state of the art, one thing that was clear was that every one of the systems currently existing or proposed for global RLD depends on infrastructure of some kind. In some of them, the infrastructure is self-organizing to a large degree, one of the best examples of this being the Internet Indirection Infrastructure (i3). So I set out to design an algorithm that would work at global scales with guaranteed upper time bounds, which later turned out to be an overlay network algorithm (which ended up being based on a hypercube virtual topology), as opposed to the broadcast type that Nom was. For a bit more on overlays vs. broadcast networks, check out my IEEE article on the topic.

Then the question was whether to use one or the other, and it occurred to me that there was no reason I couldn't use both. It is possible to embed a multicast tree in an overlay and thus use a single network, but there are other advantages to the broadcast algorithm that were pretty important in completely "disconnected" environments such as wireless ad hoc networks.

So Nom became the local component, Manifold-b, and the second algorithm became Manifold-g.

So that's about it for the intro. I know that the algorithms are pretty crucial but I want to take some time to explain them properly, and their implications, so I'll leave that for later.

As usual, comments welcome!

Categories: science, soft.dev, technology
Posted by diego on December 3, 2004 at 12:41 PM

conversation finder, part 3

The conversation finder saga continues! :) (Parts one, two)

Parser's done, at least in basic form. Both parser and bot seem to be running and playing along nicely. I created a simple conversation finder site to have a fixed point of reference for all this, particularly for the Bot, which should start showing up in some logs any minute now. I am keeping it under control by manually specifying which sites it can download (I know this isn't scalable, but it's an easy fix once the rest is done), and at the moment only three are active: my weblog, Don's, and Tim's. Since the core of the idea came from a conversation Don was having with Tim, I'll use that as the "index case". If the finder actually finds conversations there, I'll start expanding the field, or possibly add a form so that others can "activate" spidering of a site.

But that's for later. Now, regarding the parser, a few interesting things.

As it turns out, parsing itself was a lot simpler than interpreting the information. I am using the HTML Parser in the JDK's HTMLEditorKit, which is actually quite easy to use: just define a Callback and specify what to do with each tag opening, closing, etc. But for the algorithm I'm using links between pages, which sounds simple enough... until you realize that links come in many shapes and sizes. Normalization of links into full URIs took a bit of figuring out. What I needed to do was, starting from any possible HREF, end up with a full URI, of the form: scheme + hostinfo + path + query + fragment (hostinfo actually has components, but let's leave that aside for now).
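For reference, the callback side of it boils down to something like this (a quick sketch that just collects the raw HREFs, before any of the normalization described below):

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

//collects the raw HREFs found in a page; normalization happens afterwards
static List<String> collectHrefs(Reader pageReader) throws IOException
{
  final List<String> hrefs = new ArrayList<String>();
  HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
      if (tag == HTML.Tag.A) {
        Object href = attrs.getAttribute(HTML.Attribute.HREF);
        if (href != null) hrefs.add(href.toString());
      }
    }
  };
  new ParserDelegator().parse(pageReader, callback, true);
  return hrefs;
}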

But URLs in HREFs can be either relative or absolute. Relative URLs can be absolute within the site (e.g., /d2r/) or relative to the current page (e.g., ../index.html). Absolute URLs vary in form even when they point at the same thing (e.g., www.dynamicobjects.com and dynamicobjects.com), and you can also use IP addresses.

Then there are parsing errors, and URLs that are malformed but may be "recoverable" through fancier parsing; I decided to ignore those for the moment. Another decision was to ignore IP-based URLs (i.e., not attempt to match them with the site), but this is an easy fix and not that critical, I think--no weblogs that I know of use IP addresses for permalinks.

For separating the pieces I'm using elements of the following regular expression:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
which is specified in Appendix B of the URI RFC, with the java.util.regex.* classes.
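In case it's useful, the splitting code boils down to something like this (a quick sketch; the group numbers come straight from Appendix B):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

static final Pattern URI_PATTERN = Pattern.compile(
    "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

//returns {scheme, hostinfo, path, query, fragment}; any element may be null
static String[] splitURI(String href)
{
  Matcher m = URI_PATTERN.matcher(href);
  if (!m.matches()) {
    return null; //shouldn't really happen, the pattern is extremely permissive
  }
  //per RFC 2396 Appendix B: group 2=scheme, 4=authority, 5=path, 7=query, 9=fragment
  return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
}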

Here's how I'm normalizing URLs at the moment.

For absolute URLs, the main case that has to be disambiguated is something.com vs. www.something.com, which in the vast majority of the cases applies only to the root domain (i.e., www.something.com might point to something.com, but www.other.something.com probably won't exist at all for other.something.com). So there I'm doing a couple of checks and converting something.com to www.something.com when necessary.

For relative URLs I use the current site and page to generate the full URI. With the full URI, I then normalize the path when the reference is relative-relative (e.g., ../index.html).

This solution is far, far from perfect (I don't even want to think about how many special cases I'm not covering), but it's good enough for now.
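(Incidentally, the JDK's java.net.URI class can take care of a good chunk of the relative-reference handling--resolution plus path normalization--so a shorter route, assuming you already have the URI of the page the link came from, is something along these lines:)

import java.net.URI;
import java.net.URISyntaxException;

//base is the URI of the page the link was found on,
//href is whatever came out of the anchor tag
static String resolveHref(String base, String href) throws URISyntaxException
{
  //resolve() handles relative references, normalize() collapses "." and ".."
  return new URI(base).resolve(href).normalize().toString();
}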

I'm now working on the algorithm to find conversations. Getting close!

Categories: soft.dev
Posted by diego on December 2, 2004 at 12:17 PM

conversation finder, part 2

Okay, so going forward with the conversation finder thingy, I think I'm pretty much done with the bot and DB layer for it (nothing to go crazy about, just a few classes, mysql statements, and such). My recent musings on search bots have been helpful, since I had already considered a number of problems that showed up.

DB-related, I simply created a few tables in MySQL (4.x) to deal with the basic data I know I'll need, and then I'll add more as it becomes necessary. To start with, I've got:

  • Site, including main site URL, last spidered date, and robots.txt content.
  • Page, which includes content, URL of a page, pointer to a Site, parsed state, last spidered.
  • PageToPageLinks, just two fields, source PageID and target PageID.
Add to that some methods to create, delete, and query and we're in business.

Bot-related, a simple bot using the Jakarta HttpClient. Why that and not the standard HttpURLConnection from the JDK? Because HttpURLConnection doesn't allow you to set timeouts. That simple. And when you're downloading potentially tens of thousands of links, you need tight timeouts; otherwise a single slow (or worse, non-responsive) site can throw a wrench into things. Even if you use thread pools to do simultaneous downloads, threads can end up locked for way too long, which diminishes the usefulness of pooling.

Anyway, with the basics out of the way: the bot records ETag and Last-Modified values (which sounds like a good idea but maybe won't always work--we'll see later) so it downloads only changed things. It performs a HEAD request and then a GET if necessary.
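For the curious, the conditional-download part looks roughly like this (a sketch against the HttpClient 2.x API, which I'm guessing is the version in play here; the stored ETag/Last-Modified values would come from the Page table, and error handling is omitted):

import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

//returns the page content, or null if the server says nothing has changed
static String downloadIfChanged(String url, String lastETag, String lastModified)
    throws IOException, HttpException
{
  HttpClient client = new HttpClient();
  client.setConnectionTimeout(10 * 1000); //don't let a dead site hold a thread hostage
  client.setTimeout(30 * 1000);
  GetMethod get = new GetMethod(url);
  if (lastETag != null) get.setRequestHeader("If-None-Match", lastETag);
  if (lastModified != null) get.setRequestHeader("If-Modified-Since", lastModified);
  try {
    int status = client.executeMethod(get);
    if (status == HttpStatus.SC_NOT_MODIFIED) {
      return null; //nothing new, skip parsing
    }
    return get.getResponseBodyAsString();
  } finally {
    get.releaseConnection();
  }
}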

But, since there's no parser yet, I can only download single pages I specify myself.

So, coming up: the parser. :)

PS: just to clarify again, I am doing this as a way to relax a bit. This might exist in other forms; in fact, James pointed out in a comment to the previous entry that BottomFeeder already supports this--very cool. I don't think it exists in this form, but even if it did, it would be good exercise for the brain anyway.

Categories: soft.dev
Posted by diego on December 1, 2004 at 5:10 PM

conversation finder

My next step is to write something different to get the gears moving in my head again, and Don's conversation category idea is appealing. So, let's do that. :)

Don described it as:

A conversation aggregator subscribes to the category feed of all the participants and merge them into a single feed and publishes a mini-website dedicated to the conversation. The 'referee' of the debate or the conversation moderator gets editorial rights over the merged feed and the mini-website. Hmm. This stuff is very close to what I am currently working on so I think I'll slip this feature in while I am at it.
Since this is probably too big to do in quick & dirty fashion, I was a little worried. But last night I thought of a different approach. How about something that finds conversations, rather than being subscribed to certain categories?

After all, we already have a mechanism to define a conversation thread: the permalink. Generally, when you're in a cross-blog thread, you point back at the last "reply" from the other person. A cross-blog thread also has the advantage of being a directed graph, with a definite starting point. So permalinks and some kind of graph traversal thingamagic could be used to find the threads that exist, and maybe the ones in progress.

As Don notes, sometimes you might refer to the other party by name, or make oblique references. That could be step two, using text-based search to add some more information to the graph formation. But let's say we start with permalinks only.
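Roughly, the traversal I have in mind would look something like this (a very rough sketch: "links" is an already-crawled map from page URL to outgoing permalinks, and siteOf() is a placeholder for "which weblog does this URL belong to"--here just the host name):

import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

//walk a permalink chain starting from one post, always crossing to the other blog
static List<String> followThread(String start, Map<String, Set<String>> links)
{
  List<String> thread = new ArrayList<String>();
  String current = start;
  while (current != null) {
    thread.add(current);
    Set<String> outbound = links.get(current);
    if (outbound == null) break;
    String next = null;
    for (String target : outbound) {
      //a reply points back at the *other* blog, so only follow cross-site links
      //to pages we've already crawled, and never revisit a page (to avoid loops)
      if (!siteOf(target).equals(siteOf(current))
          && links.containsKey(target) && !thread.contains(target)) {
        next = target;
        break;
      }
    }
    current = next;
  }
  return thread;
}

//placeholder for "which weblog does this URL belong to"; here just the host part
static String siteOf(String url)
{
  try {
    String host = new URI(url).getHost();
    return (host != null) ? host : url;
  } catch (Exception e) {
    return url;
  }
}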

Hm. Okay. So what do I need for this? First things that come to mind:

  • A crawler
  • A DB (the tables, I mean)
  • A parser (to find the set of links)
  • The algorithm to find the conversations
  • Some kind of web front end to make it more usable?
Neither the crawler nor the parser has to be super-sophisticated, so maybe they're doable in a few hours. Or a couple of days?

This sounds like a good starting point. First step should be DB & crawler. More later!

Categories: soft.dev
Posted by diego on December 1, 2004 at 6:52 AM

what my thesis is about

Since one of the starting points will be to talk more about my thesis, here goes the abstract. Maybe I'll start a new site or blog for it, to keep things cleaner (there's a lot of stuff to discuss) but for the moment this will do, and the abstract is as good a place to start as any.

Sorry about the use of the "royal 'we'" but this is pretty much a copy/paste of the abstract from the dissertation. Also, maybe it takes some flights of fancy in terms of possibilities, but that's the point of research, isn't it?

Anyway, here goes:

Self-Organizing Resource Location and Discovery

by Diego Doval (abstract - 30 September, 2003)

Networked applications were originally centered around backbone inter-host communication. Over time, communications moved to a client-server model, where inter-host communication was used mainly for routing purposes. As network nodes became more powerful and mobile, traffic and usage of networked applications have increasingly moved towards the edge of the network, where node mobility and changes in topology and network properties are the norm rather than the exception.

Distributed self-organizing systems, where every node in the network is the functional equivalent of any other, have recently seen renewed interest due to two important developments. First, the emergence on the Internet of peer-to-peer networks to exchange data has provided clear proof that large-scale deployments of these types of networks provide reliable solutions. Second, the growing need to support highly dynamic network topologies, in particular mobile ad hoc networks, has underscored the design limits of current centralized systems, in many cases creating unwieldy or inadequate infrastructure to support these new types of networks.

Resource Location and Discovery (RLD) is a key, yet seldom-noticed, building block for networked systems. For all its importance, comparatively little research has been done to systematically improve RLD systems and protocols that adapt well to different types of network conditions. As a result, the most widely used RLD systems today (e.g., the Internet's DNS system) have evolved in ad hoc fashion, mainly through IETF Request For Comments (RFC) documents, and so require increasingly complex and unwieldy solutions to adapt to the growing variety of usage modes, topologies, and scalability requirements found in today's networked environments.

Current large-scale systems rely on centralized, hierarchical name resolution and resource location services that are not well-suited to quick updates and changes in topology. The increasingly ad hoc nature of networks in general and of the Internet in particular is making it difficult to interact consistently with these RLD services, which in some cases were designed twenty years ago for a hard-wired Internet of a few thousand nodes.

Ideally, a resource location and discovery system for today's networked environments must be able to adapt to an evolving network topology; it should maintain correct resource location even when confronted with fast topological changes; and it should support work in an ad hoc environment, where no central server is available and the network can have a short lifetime. Needless to say, such a service should also be robust and scalable.

The thesis addresses the problem of generic, network-independent resource location and discovery through a system, Manifold, based on two peer-to-peer self-organizing protocols that fulfill the requirements for generic RLD services. Our Manifold design is completely distributed and highly scalable, providing local discovery of resources as well as global location of resources independent of the underlying network transport or topology. The self-organizing properties of the system simplify deployment and maintenance of RLD services by eliminating dependence on expensive, centrally managed and maintained servers.

As described, Manifold could eventually replace today's centralized, static RLD infrastructure with one that is self-organizing, scalable, reliable, and well-adapted to the requirements of modern networked applications and systems.

Categories: soft.dev
Posted by diego on November 30, 2004 at 6:36 PM

looking for the next big thing

So. A week has gone by with no posting. Lots has happened, but more than anything it's been a time of consolidation of what had been happening in the previous weeks. First, the short version (if you have a couple of minutes, I recommend you read the extended version below): tomorrow is my last day working for clevercactus. And that means I'm looking for the next thing to do. So if you know of anything you think I could be interested in, please let me know.

Now for the extended version.

For the last couple of months (and according to our plan) we have been looking for funding. Sadly, we haven't been able to get it. This hasn't just been a matter of what we were doing or how (although that must have been part of the problem) but also a combination of factors: the funding "market" in Europe and more specifically in Ireland (what people put money into, etc.), our target market (consumer), and other things. Suffice it to say that we really tried, and, well, clearly it was always a possibility that we wouldn't be able to find it.

On top of this, I haven't been quite myself in the last few weeks, maybe even going back to September (and my erratic blogging is probably a measure of that). By then I was quite burned out. Last year was crazy in terms of work, and this one was no different: between January and the end of July I only took two days off work (yes, literally, a couple of Sundays), and the workload plus the stress obviously got to be too much. I see signs of recovery, but clearly this affected how much I could do in terms of moving the technology forward in recent weeks. Since there are only two of us, and it's only me coding (my partner deals with the business side of things), this wasn't the most appropriate time to have a burnout like that. I screwed up in not pacing myself better. Definitely a lesson learned there.

At this point, the company is running out of its seed funding and we don't have many options left. Even though something could still happen (e.g., an acquisition), what we'll be doing now is stop full-time work on the company, which after all won't be able to pay our salaries much longer, and look for alternatives, since of course we need to, you know, buy food and such things. The service will remain up for the time being, and I'll try to gather my strength to make one last (long-planned) upgrade to the site and the app, if only just for the symmetry of the thing. Plus, you can't just make a service with thousands of users disappear overnight. Or rather, you can, but it wouldn't be a nice thing to do.

Now I have a few weeks before things get tight, and I'll use that time to get in the groove again and hopefully find something new to do that not only will help pay for the bills but is cool as well. Who knows? I might even end up in a different country! As I said at the beginning, if you know of something that I might find interesting, please send it my way. Both email and comments are fine (my email address can be found in my about page).

In the meantime, I'm going to start blogging more. No, really. I have some ideas I want to talk about, and maybe I can get back into shape by coding (or thinking about) something fun and harmless.

Or, as the amended H2G2 reads: Mostly harmless. :)

because search bots have feelings too

For reasons passing understanding, in the last couple of weeks I've developed a curiosity for the topology of both content and links in certain groups of webpages.

So today I sat down and wrote an extremely simple bot/parser to get some data. I was done in about an hour, tested a bit, fiddled, and it started to dawn on me just how hard it is to build a good search bot.

We hear (or read) to no end about the algorithms that provide search results, most notably with Google's. There's a vast number of articles about Google that can be summarized as follows: "PageRank! PageRank! PageRank is the code that Rules the Known Universe! All bow before PageRank! Booo!" (insert "blah blah" at your leisure instead of spaces).

But what's barely mentioned is how complex the Bots (for Google, Yahoo!, Feedster, etc) must be at this point (I bet the parsers aren't a walk in the park either, but that's another story). You see, the algorithm (PageRank! PageRank! Booo!) counts on data already processed in some form. Analyzing the wonderful mess that is the web ain't easy, but the "messiness" that it has to deal with is inherent to its task.

But the task of a Bot, strictly speaking, is to download pages and store them (or maybe pass them on to the parser, but I assume that no one in their right mind would tie in parsing with crawling--it seems obvious to me that you'd want to do that in parallel and through separate processes, using the DB as common point). And yet, even though the task of the Bot is just to download pages, it has to deal with a huge amount of, um, "externalities."

In other words, the bot is the one that has to deal with the planet (ie., reality), while the ranking algorithm (PageRank! PageRank! Booo!) sits happily analyzing data, lost in its own little world of abstractions.

Consider: some sites might lock on the socket and not let go for long periods. Tons of links are invalid, and yet the Bot has to test each one. There are 301s, 404s, 403s, 500s, and the rest of the lot of HTTP return codes. Compressed streams using various algorithms (GZIP, ZLib...). Authentication of various sorts. Dynamic pages that are a little too dynamic. Encoding issues. Content types that don't match the content. Pages that quite simply return garbage. And on and on.

What makes it even harder is that the chaotic nature of the Internet forces the Bot (and those in charge of it) to go down many routes to try to get the content. A Bot has to be:

  • extremely flexible, able to deal with a variety of response codes, encodings, content types, etc.
  • extremely lax in its error management (being able to recover from various types of catastrophic failures).
  • extremely good at reporting those errors with enough information so that the developers can go back and make fixes as appropriate (to deal with some kind of unsupported encoding, for example).
  • as fast as possible, minimizing bandwidth usage.
  • respectful of all sorts of external factors: sites that don't want to be crawled, crawling fast but not too fast (or webmasters get angry), robots.txt and meta-tag restrictions, etc.
  • massively distributed (with all that it entails).
...as well as any number of things that I probably can't think of right now.

Bots are like plumbing: you only think about them when they don't work. Of course, the algorithm is crucial, but the brave souls that develop and maintain the bots deserve some recognition. (The parser people too :)).

Don't you think?

PS: (tangentially related) Yahoo! should get a cool name for its algorithm, at least for press purposes. (Does it even have a name? I couldn't find it). Otherwise referring to it simply as a "ranking algorithm" --or something-- is kind of lame, and journalists steer towards PageRank and we end up with "PageRank! PageRank! Booo!". :)

Categories: soft.dev
Posted by diego on November 20, 2004 at 4:04 PM

the new MSN search: an unmitigated disaster

The first pointer I got to it was via Dave (Interestingly, there wasn't a Slashdot article on it--maybe I missed it, but I don't think so). There I went, to http://beta.search.msn.com/.

The home page loaded quickly, which was a good sign. I liked its simplicity, but I wasn't going to give them any points for copying Google.

Then I typed in a simple search: "microsoft", and waited.

And waited.

And waited.

Two minutes later, I got this result.

That didn't look good at all. But who knows, maybe it was a fluke.

So I did it again.

Same result.

"Maybe they have deep-seated psychological problems that prevent them from returning their own results properly," I thought. So I tried "linux" (without success), then switched back to MS-themed searches with "microsoft visual studio," then started trying random queries.

Nothing worked!!

This lasted for about twenty minutes. I kept trying, because I couldn't quite believe what I was seeing (I have tons of respect for Microsoft's software development prowess). Then, when I was about to give up, I tried "microsoft" again, you know, just in case, and there it was.

One result. (Yes, one result, look at the screenshot).

Just one?

Just one.

Not only was it just one result, but the response was "1-1 of 1", which must mean I've been asleep for a few centuries and now there's only a single page with the term "microsoft" on the planet. Also, note how there didn't seem to be any problem finding ads for it.

"There goes nothing," I thought, and I tried "linux". Another fantastic one-query-hit-page.

In fact, it wasn't just that it was returning a single result, it was also that it was splattering the page with ads, at the top, at the bottom, and to the right. After those two, er, "successes," I tried a few more queries that returned no results at all, or worse, outdated pages! (weeks and weeks old).

And, in case you're wondering, I am not making this up. Those screenshots are real.

I could add a thousand things: that they should have added more hardware, or made sure that the thing worked before releasing it, or whatever, but I'm not fond of repeating the utterly obvious.

I will say, though, that there are two search engines I use at the moment, Google and A9. Occasionally, I use Teoma and Yahoo!.

And it doesn't seem that I'll be adding Microsoft to the list any time soon.

PS: if any Microsofties happen to wander through this entry and want to know more for debugging purposes, I ran my searches between midnight and 1:00 am PST (8:00-9:00 AM GMT).

Categories: soft.dev
Posted by diego on November 11, 2004 at 6:49 PM

feedster developer contest

Feedster has launched a Developer Contest (see also). Prizes are iPods for the winner of each category (and there are more than a few categories). Normally I don't have time for contests, but in this case it seems that I already have entries ready for at least two of them: Feedster plugin for Firefox (which I wrote last year; it's linked to from the Feedster Help page and also available via Mycroft... but maybe it counts! :)) and Intro to RSS with my introduction to syndication (with its companion introduction to weblogs).

Should be interesting to see the things people come up with.

Hmmm... iPod....

Categories: soft.dev
Posted by diego on November 10, 2004 at 2:43 PM

slides in CSS

[via Joel]: S5: A Simple Standards-Based Slide Show System.

"S5 is a slide show format based entirely on XHTML, CSS, and JavaScript. With one file, you can run a complete slide show and have a printer-friendly version as well. The markup used for the slides is very simple, highly semantic, and completely accessible."
Most excellent. I was looking for something like this!

Categories: soft.dev
Posted by diego on November 10, 2004 at 12:30 PM

some thoughts on metadata

Through a series of random links I ended up at a recent post by Ramesh Jain on metadata. He raises a number of issues that have crossed my mind a lot recently, particularly with all the hoopla about podcasting ("how do I search all that audio content?"), and makes a number of good points. Quote:

Text is effectively one dimensional – though it is organized on a two-dimensional surface for practical reasons. Currently, most meta data is also inserted using textual approaches. To denote the semantics of a data item, a tag is introduced before it to indicate the start of the semantics and another closing tag is introduced to signal the end. These tags can also have structuring mechanisms to build compound semantics and dictionaries of tags may be compiled to standardize and translate use of tags by different people.

When we try to assign tags to other media, things start getting a bit problematic due to the nature of media and the fact that current methods to assign tags are textual. Suppose that you have an audio stream, may be speech or may be other kind of audio, how do we assign tags in this? Luckily audio is still one dimensional and hence one can insert some kind of tag in a similar way as we do in texts. But this tag will not be textual, this should be audio. We have not yet considered mechanisms to insert audio tags.

[...]

I believe that we can utilize meta data for multimedia data. But the beauty of the multimedia data is that it brings in a strong experiential component that is not captured using abstract tags. So techniques needs to be developed that will create meta data that will do justice to multimedia data.

I agree. However, I'd point out that the problem is not just one of metadata creation, but of metadata access.

Metadata is inevitably thought of as "extra tags" because, first and foremost, our main interface for dealing with information is still textual. We don't have VR navigation systems, and voice-controlled systems rely on Voice-to-Text translation, rather than using voice itself as a mechanism for navigation.

Creating multimedia metadata will be key, but I suspect that this will have limited applicability until multimedia itself can be navigated in "native" (read: non-textual) form. Until both of these elements exist, I think that using text both as metadata (even if it's generated through conversions, or context, like Google Image Search does) and text-based interfaces will remain the rule, rather than the exception.

Categories: soft.dev
Posted by diego on November 6, 2004 at 3:56 PM

the synth look and feel: what Sun should do next

duke.jpgOne of the much-hyped new features in JDK 1.5 (or "Java 5" as we're supposed to call it now) was the new Synth Look and Feel, which is a "skinnable" L&F that allows non-programmers to create new look and feels by editing an XML file. Since creating a look and feel before involved complex acts of witchcraft, this is actually good news for programmers as well.

But.

There's very little documentation available. The most referenced article on Synth is this one by SwingMaster Scott Violet, which is a good intro but doesn't go into much detail. There's a mini-intro over at JDC. There's a more recent article by John Zukowski over at IBM DeveloperWorks, which also covers the new Ocean theme (which replaces Metal's absolutely-positively-obsolete default "Steel" theme). Then there are the API docs for Synth and the Synth descriptor file format. And... that's about it, as far as I can tell. All the examples stop at the point of showing a single component, usually a JTextField or JButton.
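(For the record, the part that is documented--loading a Synth descriptor--is simple enough. A minimal sketch, where "synth.xml" and the class name are just placeholders for whatever your app actually uses:)

import javax.swing.UIManager;
import javax.swing.plaf.synth.SynthLookAndFeel;

public class SynthDemo {
  public static void main(String[] args) throws Exception {
    //load the XML descriptor (here assumed to be "synth.xml" next to this class
    //on the classpath), then install the L&F before creating any components
    SynthLookAndFeel synth = new SynthLookAndFeel();
    synth.load(SynthDemo.class.getResourceAsStream("synth.xml"), SynthDemo.class);
    UIManager.setLookAndFeel(synth);
    //... build the Swing UI here ...
  }
}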

But, let's assume that documentation will slowly emerge. There is something that Sun should do as quickly as possible (and that in fact it should have done for this release), which is to use Synth for its own L&Fs. What better chance to show off Synth than to rewrite the Metal L&F in it? (I am fairly sure that this hasn't happened yet, since the way to load the Metal L&F remains the same, and all the Metal L&F classes remain under its javax.swing.plaf locations in the JDK 1.5 distribution).

In fact, while we're at it, why not write all the look and feels with Synth, including Windows, which would make it much easier to correct the inevitable problems with it that appear after every release (and because of which something like winlaf exists)?

This is also known in the vernacular as "eating your own dog food". :)

Re-writing Metal in Synth would also be a perfect use-case that would serve both as a testing platform and example for others. As it stands, it's hard to know if this wasn't done because of performance limitations, limitations in Synth, time-constraints, or what.

So I'd like to see Sun clearly spell out the reasons why Synth wasn't used for Metal, and where they are taking it next. I, for one, am not thrilled about the idea of yet another look and feel that will remain dead in the water (like Metal did all these years), when there are so many other important things that Sun could be improving in the JRE (platform integration, anyone?).

If all L&Fs will eventually be Synth-etized, that would simplify usage and fixes of L&Fs for all developers (and maintenance on Sun's side), and prove that Synth is the way of the future.

PS: it would also be a good idea to add built-in support for the notion of L&F hierarchies to Synth files (currently all the commands must live in a single file; you could assemble a single XML descriptor stream out of multiple Synth files, but who's gonna do that?). Having to copy+paste everything and then change two or three lines in a file because all you want is a different image somewhere doesn't sound like good practice to me.

Categories: soft.dev
Posted by diego on November 6, 2004 at 3:14 PM

eclipse visual editor

Turning back to software for a moment, the find of the day is the Eclipse Visual Editor, useful for quick prototyping and trying out ideas both in Swing and SWT, and easily installed through Eclipse Software Updates. Here's a good article (if a bit old) describing its basics. Add that to the list of things I didn't know about.

Categories: soft.dev
Posted by diego on October 30, 2004 at 2:07 PM

removing images from google's image index

Something that has increased load on my blog (and therefore server) beyond its already-high levels is the sudden appearance of images from my weblog as the first hit on many queries in Google Image search (example: "the lord of the rings").

This leads to people linking to the image from profiles on web boards or homepages, without hosting the image themselves. I ended up blocking external linking for images, but the hits still arrive, since people tend to copy the link without checking whether it actually works for them.

So that's a pain, and tempted as I am to replace the images with ads (heh) or something (such as an H2G2 'Don't Panic' image), I think a better solution is to remove them from the Google Image index altogether, which also means Googlebot won't be reindexing them all the time.

So, how to do it? The first search provided the answer. Done. Now to wait until the next Googlebot pass and Google dance, and see if it works.
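(For the record, the gist of it is a robots.txt rule along these lines--the /images/ path here is just an example, adjust the Disallow to wherever your images actually live:)

User-agent: Googlebot-Image
Disallow: /images/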

Categories: soft.dev
Posted by diego on October 20, 2004 at 10:12 AM

atomflow-templates and atomflow-spider

Before I start, there's been a good number of atomflow-related entries in the last day. To mention a few: Matt has explained further many of his ideas, as has Ben. Michael has more thoughts on it as well as links to other related tools. Matthew and Frank also added to the conversation, as did Danny and Grant.

Okay, back to the actual topic of this post.

Another hourlong session of hacking and there are two new tools in the atomflow-package (download): atomflow-templates and atomflow-spider.


atomflow-spider

atomflow-spider is a simple spidering program that outputs the contents downloaded from a URL to the standard output. There are a number of other programs that do this already (wget and curl being the most prominent) but a similar tool is included with atomflow for completeness, particularly for platforms that don't have wget (e.g., Windows installs without cygwin or similar). Plus, it's good practice (for me) to keep thinking along the lines of simple, loosely coupled components that do one thing well.

The spider's commandline parameters are as follows:

java -jar atomflow-spider.jar -url <URL> [-prefsFile <PATH_TO_PREFS_FILE>]

The -prefsFile parameter is optional. When used, the preferences file stores ETag and Last-Modified information on the URL, to minimize downloads when the content hasn't changed (useful for RSS feeds---I am not sure if other command-line tools support this, but I don't think it's all that common).

Additionally, the spider supports downloading GZIP and Deflate compressed content to speed up downloads.
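For the curious, the mechanism looks roughly like this. This is a sketch of the general technique, not the actual atomflow-spider source, and the preferences handling is simplified to a plain Properties object:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

public class ConditionalFetch {
    // returns null when the server says the content hasn't changed (HTTP 304)
    public static InputStream open(URL url, Properties prefs) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");

        // send the validators saved from the previous run, if any
        String etag = prefs.getProperty("etag");
        String lastModified = prefs.getProperty("last-modified");
        if (etag != null) conn.setRequestProperty("If-None-Match", etag);
        if (lastModified != null) conn.setRequestProperty("If-Modified-Since", lastModified);

        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return null; // nothing new since the last fetch
        }

        // remember the new validators for the next run
        if (conn.getHeaderField("ETag") != null) prefs.setProperty("etag", conn.getHeaderField("ETag"));
        if (conn.getHeaderField("Last-Modified") != null) prefs.setProperty("last-modified", conn.getHeaderField("Last-Modified"));

        // transparently unwrap compressed responses
        InputStream in = conn.getInputStream();
        String encoding = conn.getContentEncoding();
        if ("gzip".equalsIgnoreCase(encoding)) return new GZIPInputStream(in);
        if ("deflate".equalsIgnoreCase(encoding)) return new InflaterInputStream(in);
        return in;
    }
}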


atomflow-templates

atomflow-templates is the beginning of a templating system that can be used to transform content into (and eventually out of) atomflow, through pipes. This version supports only RSS-to-Atom conversion (basically all RSS formats are supported). I think this is pretty important as a basic tool in the package, since there's lots of content out there in RSS format.

atomflow-templates reads from standard input and writes to standard output. Currently it is run as follows:

java -jar atomflow-templates.jar -input rss -output atom

atomflow-templates can be, for example, connected with atomflow-spider and then to the storage core through a CRON job to monitor and store certain RSS feeds, as follows:

java -jar atomflow-spider.jar -url <URL>
| java -jar atomflow-templates.jar -input rss -output atom
| java -jar atomflow.jar -add -storeLocation <STORE_DIRECTORY> -input stdio -type feed
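A hypothetical crontab entry wiring the three together every twelve hours would look something like this (the feed URL and store directory are placeholders):

0 */12 * * * java -jar atomflow-spider.jar -url http://example.com/feed.rss | java -jar atomflow-templates.jar -input rss -output atom | java -jar atomflow.jar -add -storeLocation /var/atomflow/store -input stdio -type feed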

So that's it for tonight--between coding at work and then this, I'm all coded-out for the day :).

Categories: soft.dev
Posted by diego on August 24, 2004 at 11:14 PM

atomflow

I'll probably continue babbling about EuroFoo-related ideas for a week (or a month), but this is a good one to start with.

First there was a conversation with Matt and Ben on Friday night regarding syndication and Atom. Matt was describing something he had written about a few days ago: how he'd like to have a sort of "atom storage/query core" that would allow you to a) add, remove and update entries and then b) query those entries (by postIDs, dates, etc., to which I immediately added keyword-based queries in my head).

Matt had two points.

One, that by using Atom as input format, you could simplify entry into this black-box system and use it, for example, on the receiving end of a UNIX pipe. Content on the source could be either straight Atom or come in some other form that would require transforming it into Atom, but that'd be easy to do, since transforming XML is pretty easy these days.

Two, that by using Atom as the output format you'd have the same flexibility. To generate a feed if you wanted, or transform it into something else, say, a weblog.

(This is, btw, my own reprocessing of what was said; Matt's entry, to which I linked above, is a more thorough description of these ideas.)

So, for example, you'd have a command line that would look like this for storage (starting from a set of entries in text format):

cat entries.txt | transform-to-atom | store

and then you'd be able to do something like

retrieve --date-range 2004-04-08 2004-04-09 | transform-to-html

Now, here's what's interesting. I have of course been using pipes for years. And yet the power and simplicity of this approach had simply not occurred to me at all. I have been so focused on end-user products for so long that my thoughts naturally move to complex uber-systems that do everything in an integrated way. But that is overkill in this case.

I kept thinking that a system like that shouldn't be too hard to do.

So far so good.

Then on Saturday Ben gave a talk on "Better living through RSS" where he described some of the ways in which he uses feeds to create personal information flows that he can then read in a newsreader, and other ways in which content can be syndicated to be reprocessed and reused.

This fit perfectly with what Matt had been talking about on Friday. During the talk and afterwards we talked a bit more about it, and by early afternoon I simply couldn't wait. So I sat down for an hour or so and coded it. By Saturday night it was pretty much ready, but by then there were other things going on so I let it be. I nibbled at it yesterday and today, improving the command-line parameters to make it more usable, and now it's good enough to release.

More importantly, I settled on a name: atomflow. :)

But of course, what would be the point of releasing something without a good use example?

So after a bit more thinking I realized there was something that would make a good example for atomflow, for, well, educational purposes.

News.com provides feeds, but they don't contain the whole stories. Sometimes, after some time has passed, a story becomes unavailable (I'm not sure exactly when that happens, but it has happened to me). Other times you want to look for a particular story, based on keywords, but within a certain timeframe, to avoid getting dozens of irrelevant results from more recent news items.

So I built a scraper for News.com that takes the stories on their frontpage and outputs them to stdout as Atom entries. Then I pipe the result into atomflow, which allows me to query the result any way I need and, more interestingly, to narrow down the content through subsequent pipes into further atomflow calls when the query parameters alone are not enough.

So, without further ado, here's the first release: atomflow.zip.

There's a README.txt in the ZIP file that explains how to use it, but here's an example:

To add items:

java -jar newscraper.jar | java -jar atomflow.jar -add -storeLocation /tmp/atomflowstore -input stdio -type feed

The previous command, run at intervals (say, every twelve hours) will "inject" the latest stories into the local database. Then, at intervals (or at your leisure! :)) the database can be queried, for example, by doing:

java -jar atomflow.jar -storeLocation /tmp/atomflowstore -query "latest 3"

which will return an Atom feed with the latest 3 items.

The following are the query parameters currently supported:

<tag> is an Atom date field, e.g. "created"
--query "range <tag> 2004-04-20 2004-04-21" //date-range
--query "day <tag> 2004-04-20" //day
--query "week <tag> 2004-04-20" //week (starting on date)
--query "month <tag> 2004-04" //month (starting on date)

looks in summary, title and content
--query "keywords <keyw1 [keyw2 ... keywn]> [n]"

tag here must be one of [content|content/type|author/name|author/email|title|summary]
--query "keywords-tag <tag> <keyw1 [keyw2 ... keywn]> [n]"
--query "id <id>"
--query "latest <n>"

additionally, you can add a --sort parameter as
--sort <created|issued|modified> [false|true]

So that's it for now. Obviously there's more information on how the apps work, their options, and certain sticky issues that might not be so easy to solve cleanly (e.g., assigning IDs when entries don't have them; currently atomflow creates an ID itself, but in most cases you'll want to specify a certain format). Also, the "scraping" of News.com is done as a quick-and-dirty example. In a few cases the parsing will fail, and it will simply ignore the item and move on (the sources tell the full story of how ugly that parsing is!).

Anyway, comments and questions most welcome. I hope this can be useful, it was certainly a blast to do it. :)

Update (24/08): The first rev of two new atomflow-tools, atomflow-spider and atomflow-templates released.

PS: I still have the comment script disabled, pending a solution (at least partial) to comment spam. In the meantime, please send me email (address is on my about page).

PS 2: atomflow uses lucene for indexing and kxml2 for XML parsing. (I repackaged the binaries into a single JAR to make it easier to run it.)

Categories: soft.dev
Posted by diego on August 23, 2004 at 10:55 PM

conditional HTTP GETs and compressed streams in Java

Just posted to my O'Reilly Weblog: Optimizing HTTP downloads in Java through conditional GET and compressed streams.

PS: kind of a longish title isn't it? :)

Categories: soft.dev
Posted by diego on July 21, 2004 at 11:37 AM

and now it's StAX's turn

It seems I need some bug spray or something...

I might be wrong (corrections and comments most welcome!) but I think I've found a bug in StAX 1.0.

The bug is as follows: when parsing an element of the form:

<id>&lt;code01&gt;</id>

which should return

<code01>

when calling getElementText() (or when parsing based on CHARACTER event types), StAX actually returns:

<code01
To prove it, I wrote this small test program that uses both StAX and kXML 2 (which implements the Common XML Pull Parsing API) to parse the same XML document (included in the program as a String, and read through a StringReader).
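For reference, a minimal StAX-only version of that check looks more or less like this (a sketch, not the exact test program linked above):

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class StaxEntityCheck {
    public static void main(String[] args) throws Exception {
        String xml = "<id>&lt;code01&gt;</id>";
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // position the reader on the <id> start element
        // expected output: <code01>   (the 1.0 RI appears to drop the trailing '>')
        System.out.println(reader.getElementText());
    }
}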

This bug is a deal-breaker for my use of StAX, and what's much worse is that I have no way of looking at the code to fix it (and yes, I've tried parsing after that element to see if there's more text, but it seems that StAX is just making the "&gt;" at the end disappear). Yes, StAX was supposed to be hosted at codehaus now, but when I go to the site there's nothing there in the way of sources, the JSRs don't include sources (they reference private BEA packages) and there is no indication of when or where this might change.

So I guess I'll have to switch everything to one of the other parsers, such as kXML. Oh, well.

Categories: soft.dev
Posted by diego on July 12, 2004 at 6:48 PM

a JList "feature"

Yesterday I was doing performance testing on pro when I hit upon something strange: deleting an email from the mail view was taking too long.

What was even more strange was that this only happened when using the Delete key, and not when using the toolbar icon (strange since essentially the process is the same in both cases). This pointed to something with the keys--but what?

After a bit of debugging I discovered that the slowdown was happening on the repaint thread, which made it more complicated to debug since it happened outside of my control, so to speak. More testing.

Backtracking for a second: JLists can be used in "dynamic" mode: when you set the appropriate parameters, the list only loads the items that are visible in the viewport, which is critical when disk-based data is involved. So generally any kind of list activity implies only a few hits on the DB, say 10-20, which happens in a few milliseconds.
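As a rough illustration of that setup (not pro's actual code; MessageStore is a hypothetical stand-in for the disk-backed mail database, and the fixed prototype cell is one way to keep the list from measuring every row):

import javax.swing.AbstractListModel;
import javax.swing.JList;

// hypothetical stand-in for the disk-backed mail database
interface MessageStore {
    int count();
    Object load(int index);
}

class LazyMailList {
    static JList create(final MessageStore store) {
        JList list = new JList(new AbstractListModel() {
            public int getSize() { return store.count(); }
            public Object getElementAt(int index) {
                return store.load(index); // one DB hit per visible row
            }
        });
        // a fixed prototype keeps JList from calling getElementAt on every row just to size itself
        list.setPrototypeCellValue("A reasonably long subject line");
        return list;
    }
}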

So I eventually placed a print statement on the getElementAt call of the ListModel and discovered the problem: when using the Delete key to delete an item, the entire contents of the ListModel were scanned, twice. On a typical email view this meant loading thousands of objects from the DB. Caching was kicking in, of course, but memory usage went through the roof. That, and the time it was taking, was not acceptable. Not good.

But why was this happening? I checked and double checked the code and I couldn't find what could possibly be triggering a full reload of the list (twice!). Finally, I started checking the call traces on the getElementAt and I discovered that there's an inner class called KeyHandler inside BasicListUI (that handles the underlying UI for JList) that was doing something strange in its keyPressed method.

So what was it doing? The javadocs for that method say the following:

Moves the keyboard focus to the first element whose first letter matches the alphanumeric key pressed by the user. Subsequent same key presses move the keyboard focus to the next object that starts with the same letter.
When I read this, all I could think of was: Oh. My. God.

The problem, of course, is that to determine whether "the next object starts with the same letter" or not, it needs to obtain the String for that cell. This means using the contents directly if they are a String, or calling toString() on the object.

Skipping for a moment over why on earth you'd want complex behavior like this pre-built... what if you're using the JList for an arbitrary component that doesn't translate into a String? What if the toString() is meaningless? Then the listener iterates through the entire list and, of course, fails to lock on to what it was looking for. And when the list is in the process of changing, it does it twice.

Oh yeah, it's a great "feature."

Even worse, the keyPressed call doesn't check that an actual alphanumeric key was pressed. Any key that is not a function key, a cursor key, or a key registered through KeyStrokes (such as Page Up) goes through this loop.

Even if you're not hitting this problem head on as I was, it's strange to think that people that need this behavior would rely on a completely generic implementation that doesn't take into account the underlying storage mechanism.

So how to disable it? Since this "really helpful" listener is registered automatically on any list created, the only way I've found of disabling it is removing it "by hand" right after the list is created, as follows:


JList list = new JList();
KeyListener[] keyListeners = list.getKeyListeners(); // java.awt.event.KeyListener

// remove the pre-registered BasicListUI key listener that triggers the scan
for (int i = 0; i < keyListeners.length; i++) {
    list.removeKeyListener(keyListeners[i]);
}
By the way, this is the only listener that is pre-registered (I checked) so doing the loop actually just removes the listener in question. It's ugly, but it works.

This leaves me wondering what other "features" are lurking in there. It's pretty bad that something like this is set as default behavior with no warning; on the other hand, it's good that once I identified the problem I could fix it relatively easily, even if the solution is less than ideal. In any case, it was an interesting (if at times maddening) trip into the deep core of Swing.

So if you're wondering why key presses are slowing down your JList implementation, it's quite possible this is the reason why.

Categories: soft.dev
Posted by diego on July 11, 2004 at 6:01 PM

why we dropped java web start

On a question in the clevercactus forums, John was asking why we stopped using Java Web Start to distribute share (and in fact other applications). We started using JWS late last year to simplify deployment and updates. Making sure that updates in particular didn't require a reinstall was critical for us, since we usually spin out new versions fairly quickly during betas, so JWS was good in that sense. As soon as we released the "internal" beta of share, it became obvious that JWS was a huge problem for most users. We want share to be easy to use, and that includes easy to install (and, yes, uninstall if necessary), and JWS was getting in the way in more ways than one. Here's why.

Problem one was installation: we had to detect whether people had JWS installed or not, and the procedure for detecting this is quite simply a joke. You need to code for multiple browsers and in some cases it isn't guaranteed to work.

Now, if you didn't have JWS (and skipping over the problem of detection), you had to install the virtual machine yourself, usually by being redirected to the java.com site. The site, while no doubt a good showcase for the Java brand, is terribly confusing to a user who only wants to install share, not Java; most have no idea what Java is, and, more importantly, they shouldn't have to care.

Now let's skip over the problem of java.com being utterly confusing (you'll note we're going to be doing a lot of "skipping over" in this discussion) and say we hosted the JVM ourselves for download to simplify the process: users still had to install a separate piece of software, and one that installs all sorts of icons on your desktop and in your programs folder, which again terribly confuses non-technical users.

So let's skip over all of that (see what I meant before?) and say that the detection does work and JWS is installed: the user is often presented with a dialog asking to run a file of type "JNLP", which makes little sense. Now say that you ignore that and click "OK". The app runs. You are presented with a horrible warning dialog with an unfamiliar look and feel (the Metal L&F; why Sun doesn't use the native L&F for this is beyond me) that, for someone who's never seen one, sounds like you are about to give something permission to do all sorts of evil things on your PC, create havoc and possibly come into your house at night while you're sleeping and steal all your furniture.

The point is not that the security warning is wrong, since it is accurate, but that users are put off by it. They already know that they are downloading an application, and they do so every day for other things, and native applications have as much ability (in theory) to do Bad Things as a JWS app with full privileges. The warning is confusing because it seems to be something extra that they're giving permission for. (Incidentally, I don't fault JWS on this point; users should be warned. The problem is that the warning is completely different from the other warnings they see when downloading applications the way they usually do.)

Much worse than this was when users ran into the invalid Java certificate problem, which didn't allow them to install the application at all even if they wanted to.

But let's say that they accept the certificate (yes, the skipping over again). The app runs. Bing! Window locked. Why? Because JWS is asking you to create shortcuts to the app. It was very easy to get lost here because the dialog for this was modal, and sometimes it would end up behind the splash screen or the app, which left most users confused. (Yes, this bug has probably been fixed in the latest JREs, but how many broken JREs are out there?)

All of this, and I haven't even mentioned the problems that occur when you, say, switch download locations for the JARs, for example, if you need to deploy on a server with more bandwidth. Or the problems that come from detecting that JWS exists but that it's an incompatible version (and you can't tell). Or the problems that exist when JWS has cached the old JNLP file and doesn't want to let go. Or the fact that you've only got limited resources and it's impossible to test against all the JREs that are out there. Or... well, you get the idea.

As a development environment, JWS has significant shortcomings as well, since it completely insulates you from the native platform, which is good, but it gives you almost no way to access the native environment, which is a disaster. Simple platform-integration things, like launch-on-startup, or right-click integration, become nearly impossible. Btw, this isn't getting better in the next release, as Erik noted recently. Aside from cosmetic changes and a few new features (like the ability to make changes to the launcher UI) Java 5.0 is pretty much the same in terms of JWS and other platform integration features.

I think that JWS has its place in tightly controlled environments, for example, to quickly deploy point applications within a corporation, where the target machines are well known (in terms of software installed), the infrastructure is small, etc. In those situations, JWS is an excellent way of quickly and easily deploying your app to users without having to deal with creating installers and such.

But for end-user applications that have to be easy to use and install and widely deployed, JWS, in my experience, doesn't quite cut it---and even though we put a lot of effort into making it work, that's why we had to stop using it.

Categories: soft.dev
Posted by diego on July 9, 2004 at 2:26 PM

on open source java

A couple of weeks ago I was invited to write on the Weblogs section of the O'Reilly Network -- and here's my first post: Open Source Java: No magic pixie dust.

Categories: soft.dev
Posted by diego on June 29, 2004 at 7:27 AM

eclipse 3.0

I am now downloading Eclipse 3.0 final (released yesterday, apparently) for both Windows and Mac OS X. I had the luck of finding a mirror that is giving speeds of 100 KB/sec, which is excellent (the main download sites all maxed out at around 5 KB/sec, at least for me). I also found a BitTorrent tracker list, but it didn't include 3.0 -- only up to RC3, and then again it was only for Linux and Windows.

Anyway, this comes in handy as I am finishing the setup on a new machine--I had installed RC3 but now I'll just move over immediately and finish making the changes to the configuration (one of the main problems I have with Eclipse is how difficult it is to move from one version to the next--settings have to be changed, reset, exported and imported in multiple places--maybe I just don't know what to do exactly, I don't know). Aside from that, and various quirks notwithstanding, I've been quite happy with Eclipse as an environment, its integration with CVS and Ant, etc. The release of 3.0 final is a major step forward.

Good stuff.

Update: Don has a number of interesting comments on the 3.0 release, including wondering why they released ahead of schedule, and mentioning version migration problems (which is sort of a relief to read -- now I know I'm not the only one!)

Categories: soft.dev
Posted by diego on June 26, 2004 at 1:31 PM

StAX (aka JSR 173) @ codehaus

One more: Don alerts me (ok, maybe not just "me", but no one was next to me staring at the monitor when I saw his entry! Lame excuse for egocentrism, I know, but what are you gonna do...) to something that I wasn't aware of: the open-sourcing of BEA's StAX (JSR 173) at codehaus. (Rather, I knew this was supposed to happen, but last time I checked I couldn't find signs of it, and I ended up getting the 1.0 StAX RI from JavaSoft.) Here's the link to the main StAX site there. Nice. I've been using StAX for a while now and it's become an indispensable component in our toolkit.

ps: to anyone that might be thinking that I enjoy writing subjects full of acronyms that are understood only by a small band of geeks on Earth, as in the case of this entry, I'll confess: yes, yes I do. Very much so. :))

Categories: soft.dev
Posted by diego on June 21, 2004 at 10:28 AM

Berkeley DB -- in Java

[via Don] The new version of Berkeley DB in pure Java is out. I'm definitely behind the curve on this--I didn't know there was an old version, much less a new one! Will have to check it out.

Unrelated, Don also notes a disturbing new development in phishing techniques that uses yet another of IE's "features" (why anyone would want to place an overlay anywhere on the screen is beyond me--yes, I'm sure there are applications, but I'm also sure that there are other solutions for whatever problem those applications are solving). Hopefully other browsers don't allow this sort of thing!

Categories: soft.dev
Posted by diego on June 17, 2004 at 9:39 AM

win32 and the web

Joel: How Microsoft lost the API War. Must read (Comments later).

Categories: soft.dev
Posted by diego on June 17, 2004 at 8:59 AM

((no single == many) && (no single != no)) point(s) of failure

So today an outage of some sort at Akamai's distributed DNS service brought down access to some major sites from various parts of the world, including Google, Yahoo, and Microsoft. Pretty quickly, as evidenced by this Slashdot thread, questions started to pop up over whether the days of "no single point of failure" are over.

The myth of the Internet being so resilient that it would never fail is an interesting one. More accurately, it's a set of layers of myths, which go back to the often-repeated idea that "the Internet was designed to survive a nuclear attack".

One of the crucial ideas of ARPAnet was that it would be packet-switched, rather than circuit switched. With packet-based communications, clearly the packets will attempt to reach their destination regardless of the circuit used, and there is no question that packet-based networks are much more resilient to failures than circuit-switched networks.

Let me be clear: part of my argument is semantic. That is, the fact that packet-switching means "no single point of failure" doesn't mean that there are no points of failure at all. The problem, however, is that we end up ignoring the word "point" and reading "no failure". The idea of "no single point of failure" eventually ends up implying "failure proof". Which is why we are so surprised when a systemic failure does occur.

ARPAnet, however, never qualified as a failure-proof network, and the points of failure were few enough that "no single point of failure" had little meaning. In the early days you could literally take out most of the Internet by cutting a bunch of cables in certain areas of Boston and California. With time, yes, more lines of communication became available, reducing the probability of failure even further, but even today the amount of trans-continental and intercontinental bandwidth is certainly not infinite.

But, ok. Let's concede the point that a systemic failure at the packet-switching level is of very low probability in today's Internet. What about the services?

Because it is the services that create today's Internet. And many of the services that the Internet depends on are centralized.

Take DNS. Originally, name resolution occurred by matching names against the contents of the local hosts table (stored in /etc/hosts), and when a new host was added a new hosts table was propagated across the participating hosts. Eventually, this process became impossible, since hosts were being added too fast. This led, in the 80s, to the development of DNS, which eventually became the standard.

DNS, however, is a highly centralized system, and it was designed for a network a couple of orders of magnitude smaller than what we have today. The fact that it does work today is more a credit to sheer engineering prowess in implementation than to design, although the design was clearly excellent for its time.

Even today, if the root Internet clusters (those that serve the root domains) were to be seriously compromised, the Internet would last about a week until most of the cached DNS mappings expired. And then we'd all be back to typing IP numbers.

And it doesn't stop with DNS. What if Yahoo! was to go offline? What if Google vanished for a week? What if someone devised a worm that flooded, say, 70% of the world's email servers?

For users, the Internet has now become its applications and services rather than its protocols. And the applications and services leave a lot to be desired.

What's missing is a shift at the service and application level in all fields: routing, security, and so on (spam is just the tip of the iceberg). Something that brings the higher levels of networking in line with the ideas behind packet switching.

So, today, Akamai sneezes and the rest of the world gets a cold. Tomorrow, it will be someone else. This will keep happening until the high-level infrastructure we use every day becomes decentralized itself. Only then will the probability of systemic failure be low enough. Low enough, mind you, not non-existent: biomimetism and self-organization, after all, don't guarantee eternity. :)

Categories: soft.dev, technology
Posted by diego on June 15, 2004 at 7:41 PM

a couple of java links

On a (less than regular) visit to java.sun.com, I notice that a couple of weeks ago they released Beta 2 of JDK 1.5, a.k.a. Tiger (I completely missed this; you can tell I've been busy, no? :)). I assume their plan is to announce the final version at JavaOne (which starts on June 28 this year). Then, reading through the Tiger docs, I noticed this link, which leads to a "New To Java Center" that was (probably recently) introduced directly into the documentation. Interesting. Another place to point people to.

Continuing with my random Java-related navigation, I found this link on the history of Java, and this page which partially documents the evolution of the java.sun.com site. Both pages have pointers to yet other interesting "historical" destinations, including this "Java Technology: The Early Years" article.

Ah, the good old days... :)

Categories: soft.dev
Posted by diego on June 12, 2004 at 3:51 PM

JDIC: JDesktop Integration Components

Sun has released the first version of JDIC (here's a link to the documentation). These APIs provide desktop-environment integration for Java apps, such as registration of components for certain filetypes or hyperlinks, a simple component that embeds a native web browser, and a few other things.

Sun is very late to this party, particularly considering that the Eclipse framework has had similar functionality for months. But, "as they say" :), better late than never. Let's hope that Sun isn't just paying lip service to desktop integration with this, and that they are starting to put some full-time resources behind it (i.e., that they use the LGPL license as a good way to spread technology rather than as a way to off-load development to others). Sun still has a lot to do in this area.

Categories: soft.dev
Posted by diego on June 8, 2004 at 11:43 AM

two steps forward, one step back

Reading this News.com article ("Longhorn goes to pieces"), any number of paragraphs stand out, for example:

Advanced search features that Gates has termed the "Holy Grail" of Longhorn, the next major version of Windows, won't be fully in place until 2009, Bob Muglia, the senior vice president in charge of Windows server development, told CNET News.com.
And I immediately remembered my post from last year on Cairo and WinFS, where I said:
Seriously now: To anyone that might say that the technology could not be built... please. Microsoft is one of the top engineering organizations in the world. NeXT could do it. Why not Microsoft? The only reasonable explanation, as far as I can see, would be a realignment of priorities and the consequent starving of resources that go along with it (which is what killed both OpenDoc and Taligent, for example). Which is all well and good.

But then the question is: could it happen again?

Probably not--then again, never say never.

2009? Wow. I keep wondering if this is because they start in a direction and realign the schedule, or simply because there are parts of the vision that are spoken out loud but never properly defined.

But another possibility that comes to mind is that Microsoft's push for vertical integration in Windows makes it harder to build these components. Wouldn't it be an idea to work on this thing as a completely separate component, and then let the features percolate upwards into the UI over time? (Of course this idea must have been considered, but, put another way, is there any technical reason why it shouldn't be done like this? I can't see one.)

I say this because monolithic design is a tendency that is difficult to avoid, and this has been very much on my mind lately. You start pulling pieces together (for different reasons) at compile time, and before you realize it you end up with a set of highly interdependent binaries. (Regardless of language or platform, all modern platforms support dynamic binding to different degrees.) And it's deployment where things start to slip, because it's there that the potentially different versions of a component have to be reconciled. In other words, deployment is one of the largest half-cracked nuts in software development.

What I realized recently is that deployment is not a separate problem within an application's life-cycle; it is integral to the UI of the app. Deployment should be properly defined from day one, and the goals set for it have to be pursued in parallel with everything else. An installable app or system should be the target from the start. Because from the moment a user sees your screen, it's your UI: what they have to do to keep up to date, to install or uninstall components or the whole system, and so on. Java, for example, has a problem in this sense, but no platform is perfect.

Anyway, just a thought for the day. :)

Categories: soft.dev
Posted by diego on May 15, 2004 at 11:30 AM

an internet of ends

Yesterday I gave a talk at Netsoc at TCD titled "An Internet of Ends". Here's the PDF of the slides. There are many ideas that I think are in there that finally jelled in the last few days, ideas that have been buzzing around my head for quite some time but that I haven't been able to connect or express in a single thread up to this point. I thought it would be a good idea to start expanding on them here, using this post as a kick-off point.

Yes, more later. And as always, questions and comments most welcome!

Categories: science, soft.dev, technology
Posted by diego on April 29, 2004 at 9:21 PM

yet another way in which Sun should get its act together

It's no secret that I appreciate (perhaps more than many) the work Sun has done in building a true multi-OS platform with Java, even as I've talked in the past about some of the shortcomings of the platform as an environment in which to deploy end-user client-side apps. But what we discovered today was unbelievable.

"Discovered" here is not accurate since it was an issue that had been discussed before--I just wasn't aware of it. The problem is this: Verisign Root Certificates on many of Sun's JVMs were set to expire on Jan 7, 2004.

What this means is that any Java webstart application or Applet signed with a perfectly valid Verisign certificate might not run at all.

Let me say that again: a perfectly valid signed application might not run at all on any of the affected versions of the JDK. Not only that, they are not even sure of when this happens (note all the conditional statements). Not only that, when it fails, it fails spectacularly. The user sees a horrible warning from which they can't do anything. There is absolutely no way to run the application in that JDK, aside from Sun's "fantastic" workaround in which they suggest people should be copying updated root certificates into the certs directory.

It wasn't enough that the JWS installation process was so confusing to most end users, and that it has a few incredibly annoying bugs. No. The certificate had to be invalid. On some systems. Maybe. And when that happens, there's no easy way to fix it for an end user who could not care less about root certificates (and why should we force users to even have to know what a root certificate is?).

I can't overstate how much something like this incenses me. Instead of spending so much time creating a GTK+ compatible skin (for example), Sun should spend a little more time on quality control of the actual basic pieces of the JDK.

Or is that too much to ask for?

Categories: soft.dev
Posted by diego on March 30, 2004 at 12:45 AM

testing... testing...

Since we're hiring, this topic has been on my mind lately. Can you really know the person... before you know the person? Microsoft, for example, is famous for putting out these little puzzles that you have to solve in your interview. Other companies give "tests" to prove your knowledge. So let's take a multiple-choice test as an example... I was thinking about what one I wrote would look like, and I came up with this:

I like XML because...

  1. it lets me validate what people tell me.
  2. properly closed tags feel all warm and fuzzy around my neck.
  3. where would flamewars be without it?
  4. Big-endian down, Little-endian all the way!
  5. The W3C should burn in hell.
  6. X... M... ?

What comes to mind when you hear 'XSLT'?

  1. To validate is glorious, but to transform is divine.
  2. Velocity! Velocity! Velocity!
  3. you know the movie 'Se7en'...?
  4. Is that that new instruction on IBM's Power3 processor that I keep hearing about?
  5. The W3C should burn in hell.
  6. My cat's name is Mittens.

What do you know about Enterprise JavaBeans?

  1. EJB rulez! Wooooohooooo! Wooooo! Wooooo!
  2. I have three words for you: nice and slow.
  3. Aside from the fact that they suck?
  4. I prefer tea, thank you.
  5. The W3C should burn in hell.
  6. I ate my crayon.

Your thoughts on Java vs .Net.

  1. Windows will be the end of our species.
  2. What's the difference? Windows is better anyway.
  3. Linux. MacOS, maybe, if you asked nicely.
  4. Anything other than HEX and Assembly is for thin-skinned freaks.
  5. The W3C should burn in hell.
  6. My tummy hurts.

Did you enjoy this test?

  1. Very much so. I found it soothing.
  2. Not really. But the pastries were good.
  3. Where's the door?
  4. Stop with the damn test already and get me to a keyboard!
  5. The W3C should burn in hell.
  6. What test?

Categories: soft.dev
Posted by diego on March 21, 2004 at 7:55 PM

programmers at work

Via Scott, a great article in Salon that touches on some of the subjects of a book he's working on. The article talks about a panel, which included:

Andy Hertzfeld, who wrote much of the original Macintosh operating system and is now chronicling that saga at Folklore.org; Jef Raskin, who created the original concept for the Macintosh; Charles Simonyi, a Xerox PARC veteran and two-decade Microsoft code guru responsible for much of today's Office suite; Dan Bricklin, co-creator of VisiCalc, the pioneering spreadsheet program; virtual-reality pioneer Jaron Lanier; gaming pioneer Scott Kim; and Robert Carr, father of Ashton-Tate's Framework.
The quotes are fascinating (Now if I could only just find a full transcript for the panel discussion... and I just wish Bill Joy would've been there too!). Lanier has some very interesting viewpoints, like his one-half of a manifesto that he wrote in response to Bill Joy's Why The Future Doesn't Need Us (which I mentioned in my ethics and computer science post recently), and when he says that
"Making programming fundamentally better might be the single most important challenge we face -- and the most difficult one." Today's software world is simply too "brittle" -- one tiny error and everything grinds to a halt: "We're constantly teetering on the edge of catastrophe." Nature and biological systems are much more flexible, adaptable and forgiving, and we should look to them for new answers. "The path forward is being biomimetic."
He's definitely right on target. (Making software biomimetic is something that Joy advocates too as far as I can remember). There are other great moments, such as this:
Bricklin sent waves of laughter through the auditorium by reading a passage from Lammers' interview with Bill Gates in which the young Microsoft founder explained that his work on different versions of Microsoft's BASIC compiler was shaped by looking at how other programmers had gone about the same task. Gates went on to say that young programmers don't need computer science degrees: "The best way to prepare is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating systems."

Bricklin finished reading Gates' words and announced, with an impish smile, "This is where Gates and [Richard] Stallman agree!"

LOL! Bill Gates dumpster-diving for operating system code listings!

As principal Skinner would say: Ooooh, mercy.

Categories: soft.dev
Posted by diego on March 21, 2004 at 12:09 PM

erik goes blogging

Erik, Master of the URL, Finder of the Link, has really started blogging now (let's ignore that no one can say what "real blogging" is for the moment...). His weblog now contains more, um, personal meditative entries, and his linkblog is the new location where his daily blast of interesting links is posted. Excellent! Check out this guest column he wrote for the JavaLobby newsletter, or this post on syndication formats to see what we've been missing.

And, his posts are still identified by beats, which is both obscure and cool (or cool because it's obscure? Or obscure because it's cool?) in a retro kind of way which should be deeply appreciated by geeks worldwide. :)

Categories: soft.dev
Posted by diego on March 11, 2004 at 1:35 PM

it probably wasn't google

Ok, after a bit of digging I arrived at the conclusion that the problem I mentioned yesterday, of Google's click tracking breaking my logfiles, was probably not Google. Rather, it looks like some search engine is taking Google's results for a query, doing their own parsing, and possibly presenting them as their own. As a hint, the referers were wrong or generally empty (something I should have noticed yesterday but didn't), as was the user-agent field.

So it probably wasn't Google. Ok. But this small "incident" presents a number of interesting questions. Not just for Google (how do you stop something like that from happening? My guess is throwing lawyers at them is the only option, or at most being careful about monitoring source IPs for requests... but then again, if whoever is doing this is smart they could get around that too). But also for end users on both sides. On my side, this is creating a problem that I have to keep an eye on, and the person who is doing the search is looking at something that looks legitimate but isn't. Hm. The problems of openness.

PS: Note that in yesterday's post I mentioned the webmasterworld thread that clarified that Google was tracking through JavaScript. The beta site (new design) clearly tracks directly through URLs. What is not clear is whether the beta site also has a new form of tracking, or whether the tracking will again be through JavaScript when the new site is released. We'll have to wait and see.

Categories: soft.dev
Posted by diego on March 8, 2004 at 11:01 PM

another bad consequence of click tracking

In relation to my post yesterday on click tracking, Yahoo and Google, there's another consequence of the practice of link tracking that I just realized noticeably affects the experience of using search engines: the "visited link" problem. When you get a result from a search engine (and are looking for particular information using multiple keyword searches), it's incredibly useful to be able to see at a glance which URLs you have already visited. This depends on the CSS style settings of the page (or lack thereof), and it's something enabled on the client side. When the browser detects a link you have visited, it shows it differently. So far so good.

But when search engines do permanent click tracking, they are affecting the URLs that you receive. If you've visited a certain site in the past, or even during the same sequence of searches but through a referral chain or from another search engine, you're out of luck. The link will be shown as if you've never "seen it" before. Any change whatsoever in the way the click tracking is done affects it, since it affects the URL.

This is a problem, IMO, that search engines should be looking at hard, since it affects the usability of their product quite a lot. As it is, for example, the new Google interface (that I mentioned I was trying out) is tracking every single link, I assume during the testing phase (since they've never done such aggressive tracking before). I'm not talking about that in particular since it's "beta" anyway. But when using different search engines this is definitely a problem. I wonder how it could be solved...

Categories: soft.dev
Posted by diego on March 8, 2004 at 7:34 PM

on click tracking, logfiles--and back to google

So I was wondering today what was happening with one of my logfiles, which couldn't be parsed (This after spending ONE HOUR cleaning up a rash of spam comments on this weblog). The parsing was failing even though the log format was fine, and I finally identified the culprit as a line which included something similar to the following baffling text in the referer field:

onmousedown=\"return clk(2,this)\"

This was, needless to say, extremely weird. After some digging I found this thread at webmasterworld. Quote:

I know, Google has been sporadically tracking clicks on result listings through redirects. It looks like click tracking is obviously now a default but instead of rewriting the listings' urls to redirects, the tracking is done through a "hidden" JavaScript function that activates a image request to track the click + position of the listing.

[...]

The listings' urls within the serps look like this:
<a href="http:*//www.webmasterworld.com/forum3/" onmousedown="return clk(2,this)">Google News</a>

By itself, tracking is a teeny bit worrying. But tracking that destroys my ability to parse logfiles without fuss is evil. Evil I tellsya!

Okay, maybe not Evil. But certainly a problem. I've never seen this form of tracking from Google myself before (see below), and I'm not sure if Google is aware of the problem with this method--hopefully they'll fix it.

Speaking of tracking.

Currently with Google when I see tracking (which happens more and more often these days) I see something of the form:

http://www.google.com/url?sa=U&start=1
&q=http://www.dynamicobjects.com/d2r/archives/002587.html&e=7898

While Yahoo! Search shows the following:

http://rds.yahoo.com/S=2766279/K
=d2r+new+yahoo+search/v=2/TID=F223_45/SID=w/l=WS1/R=1/H=0/*-
http://www.dynamicobjects.com/d2r/archives/002587.html
Yikes! (Sorry for the line breaks, those are some pretty long and ugly URLs!).

Now, since the new Yahoo! Search came online I have never, never seen a link NOT tracked. They are tracking everything. And as the example above shows, Yahoo! tracks not only what you're clicking, but also the query and a number of other things that are unknown. Is that a cookie value there? Are they tying cookie IDs to searches? Their privacy policy didn't clarify this, as it has no reference to tracking that I can see.

I switched to Yahoo! a few weeks ago, and there are a number of other things that I find annoying about their new search. While the results are relevant and it's generally fast, I find the "sponsor results", which appear both at the top and at the bottom, pretty intrusive. In only two weeks I've learned to ignore them completely (I never click on them, barely see them), but now they're getting annoying. Sometimes I have to scroll down to get past the "sponsor results", which is ridiculous. Not only that, but the constant tracking slows down the request, if not by much, at least noticeably (something that doesn't happen with Google in my experience).

Finally, a small point: the page is too busy. It's weird that I feel this way now, but that's how it seems. The current Google results page is too busy as well, but less so. Then, through Aaron, I found a way to activate the new (as yet unreleased) Google look. So I tried it, and I liked it! I like it better than anything else out there. Fast, and simple (the ads look a little weird without clear borders, though, and are a bit confusing as a result). Let's hope the hack that makes it available doesn't stop working anytime soon.

So I'm switching back to Google, and Google is again my default in the Firefox/Firebird search box. All the reasons are fine, but in the end, it's as simple as this: at the moment, minor problems aside, Google is a better product, period.

Categories: soft.dev
Posted by diego on March 7, 2004 at 3:51 PM

the wonders of SSH

I've been using SSH more and more (for a number of reasons that include a restrictive internet access setup at the office until everything is set correctly). Tunneling X Window over it is common, but sometimes I forget that it's useful for almost anything you need to do that involves sockets. For example, today I set it up to tunnel SQL access, using PuTTY and these instructions. Not too complex, and much more secure. :)
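For the command-line OpenSSH client, the equivalent of that PuTTY setup is a single local port forward (hostnames here are placeholders, and 3306 assumes a MySQL-style database port):

ssh -L 3306:localhost:3306 user@dbserver.example.com

After that, pointing the SQL client at localhost:3306 sends everything over the encrypted channel.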

Categories: soft.dev
Posted by diego on March 7, 2004 at 2:11 AM

top twelve tips...

...for running a beta program from Joel. Good reading. :)

Categories: soft.dev
Posted by diego on March 3, 2004 at 9:02 PM

feedvalidator bug or feature?

I was just alerted that checking this feed against the feedvalidator gives a "This feed is invalid" message. You can try it yourself by clicking here. Now, having checked the feed against the validator when I made the change to valid RSS 2.0, I was surprised. The error in particular says "description should not contain onclick tag". Essentially it's complaining about an onclick handler on one of the HREFs, which is used for a popup window of an image.

Reading the description of the error I get this explanation:

Some RSS elements are allowed to contain HTML. However, some HTML tags, like script, are potentially dangerous and could cause unwanted side effects in browser-based news aggregators. In a perfect world, these dangerous tags would be stripped out on the client side, but it's not a perfect world, so you should make sure to strip them out yourself.
I was sure there was nothing like this on the spec itself, and re-reading it (just in case) proved it.

The disconnect here is that this seems to me to be a guideline, something more suitable for a warning, than something that would mark the feed as "not valid", when it clearly is valid as far as the spec is concerned...

Interestingly enough, the RSS Validator (at rss.scripting.com) does (correctly) validate the feed; you can click here to see the result. I thought they were based on the same code, but clearly there are some differences.

Update: Dave replies, clarifying that they haven't made significant changes to the validator since they took a snapshot of the sources a few weeks ago, and notes that he's seen other instances of this recently.

I had a hunch. Curiosity overtook me. :) I downloaded the sources and poked around them a bit. Unless I'm reading it completely wrong, my suspicion was confirmed: the RSS and Atom validators share most of the code (which is just good design sense, since they are doing very similar things). However, this also means that errors that are flagged for Atom are also, in some cases, being flagged for RSS. For example, the error that I described above is detected by the function check4evil in validator.py, which is called by htmlEater in the same file, which itself is called from item.py (which parses RSS items). In his reply Dave describes a different case though, of the validator rejecting duplicates, which is being done (as far as I can see) only for RSS (in the call to do_pubDate(self) in item.py).

I definitely think that these two things are more "should fix" type of guidelines than problems that define non-validity, according to the RSS 2 spec.

On a related note, the version history for some of the validator sources note several changes in the last few weeks. Lots of activity there.

Another update: Sam (through email) encourages me or others interested to post to the list at sourceforge with suggestions. He also points to a message on that list from Phil Ringnalda in which he comments on what's discussed in this post (warnings vs. errors) and finally, as reference, he sends a link to the original bug tracker item which is related to this issue (and includes dates of posting, resolution, etc.). Regarding warnings, mostly I agree with Phil, but hopefully I'll have time to add my 2c to the list during the next couple of days, even if it's something small (lots of work which has nothing to do with this, and that has priority of course...).

Plus: Generic question: what happens when a validator of anything is truly open source? This applies to RSS, Atom, or whatever format requires validators. Suppose that in a couple of years the original designers of the validator have moved on. New developers have taken over. After some time, they decide that X is good (X being something that most people agree is good, but that is definitely not in the spec). So they update the validator to reflect those views. Meanwhile, the spec hasn't changed. It seems to me that in this case either the validator loses credibility, or the spec does. Neither option is good (but the spec losing credibility is worse, IMO). I wonder what the experience of other formats has been in this regard, but I do think that having many validators is good, and that would be an automatic safeguard against one validator suddenly redefining the spec by itself, or taking it in a different direction. XML is probably a good example: there are many XML validators around... is that why XML has remained stable?

Categories: soft.dev
Posted by diego on February 29, 2004 at 2:01 AM

the lack of an ethics conversation in computer science

In April 2000 Bill Joy published in Wired an excellent article titled Why the future doesn't need us. In the article he was saying that, for once, maybe we should stop for a moment and think, because the technologies that are emerging now (molecular nanotechnology, genetic engineering, etc.) present both a promise and a distinctive threat to the human species: things like near immortality on one hand, and complete destruction on the other. I'd like to quote a few paragraphs at relative length, with an eye on what I want to discuss, so bear with me a little:

["Unabomber" Theodore] Kaczynski's dystopian vision describes unintended consequences, a well-known problem with the design and use of technology, and one that is clearly related to Murphy's law - "Anything that can go wrong, will." (Actually, this is Finagle's law, which in itself shows that Finagle was right.) Our overuse of antibiotics has led to what may be the biggest such problem so far: the emergence of antibiotic-resistant and much more dangerous bacteria. Similar things happened when attempts to eliminate malarial mosquitoes using DDT caused them to acquire DDT resistance; malarial parasites likewise acquired multi-drug-resistant genes.

The cause of many such surprises seems clear: The systems involved are complex, involving interaction among and feedback between many parts. Any changes to such a system will cascade in ways that are difficult to predict; this is especially true when human actions are involved.

[...]

What was different in the 20th century? Certainly, the technologies underlying the weapons of mass destruction (WMD) - nuclear, biological, and chemical (NBC) - were powerful, and the weapons an enormous threat. But building nuclear weapons required, at least for a time, access to both rare - indeed, effectively unavailable - raw materials and highly protected information; biological and chemical weapons programs also tended to require large-scale activities.

The 21st-century technologies - genetics, nanotechnology, and robotics (GNR) - are so powerful that they can spawn whole new classes of accidents and abuses. Most dangerously, for the first time, these accidents and abuses are widely within the reach of individuals or small groups. They will not require large facilities or rare raw materials. Knowledge alone will enable the use of them.

Thus we have the possibility not just of weapons of mass destruction but of knowledge-enabled mass destruction (KMD), this destructiveness hugely amplified by the power of self-replication.

I think it is no exaggeration to say we are on the cusp of the further perfection of extreme evil, an evil whose possibility spreads well beyond that which weapons of mass destruction bequeathed to the nation-states, on to a surprising and terrible empowerment of extreme individuals.

[...]

Nothing about the way I got involved with computers suggested to me that I was going to be facing these kinds of issues.

(My emphasis). What Joy (who I personally consider among some of the greatest people in the history of computing) describes in that last sentence is striking not because of what it implies, but because we don't hear it often enough.

When we hear the word "ethics" together with "computers" we immediately think about issues like copyright, file trading, and the like. While at Drexel as an undergrad I took a "computer ethics" class where indeed the main topics of discussion were copying, copyright law, the "hacker ethos", etc. The class was fantastic, but there was something missing, and it took me a good while to figure out what it was.

What was missing was a discussion of the most fundamental problems of ethics of all when dealing with a certain discipline, particularly one like ours where "yesterday" means an hour ago and last year is barely last month. We try to run faster and faster, trying to "catch up" and "stay ahead of the curve" (and any number of other cliches). But we never, ever ask ourselves: should we do this at all?

In other words: what about the consequences?

Let's take a detour through history. Pull back in time: it is June, 1942. Nuclear weapons, discussed theoretically for some time, are rumored to be under development in Nazi Germany (the rumors started around 1939--but of course, back then most people didn't quite realize the viciousness of the Nazis). The US government, urged by some of the most brilliant scientists in history (including Einstein), started the Manhattan Project at Los Alamos to develop its own nuclear weapon, a fission device, or A-bomb. (Fusion devices --also known as H-bombs--, which use a fission reaction as the starting point and are orders of magnitude more powerful, would come later, based on the breakthroughs of the Manhattan Project.)

But then, after the first successful test at the Trinity test site on July 16, 1945, something happened. The scientists, who up until that point had been so busy with the technological questions that they had forgotten to think about the philosophical ones, realized what they had built. Oppenheimer, the scientific leader of the project, famously said

I remembered the line from the Hindu scripture, the Bhagavad-Gita: Vishnu is trying to persuade the Prince that he should do his duty and to impress him he takes on his multi-armed form and says, "Now I am become Death, the destroyer of worlds."
While Kenneth Bainbridge, who was in charge of the test, later recalled telling Oppenheimer at that moment:
"Now we are all sons of bitches."
Following the test, the scientists got together and tried to stop the bomb from ever being used. To which Truman said (I'm paraphrasing):
"What did they think they were building it for? We can't uninvent it."
Which was, of course, quite true.

"All of this sanctimonious preaching is all well and good" (I hear you think) "But what the hell does this have to do with computer science?".

Well. :)

When Bill Joy's piece came out, there was a lot of discussion on the topic. Many reacted viscerally, attacking Joy as a doomsayer, a Cassandra, and so on. Eventually the topic sort of died down. Not much happened. September 11 and then the war in Iraq, contrary to what one might expect, did nothing to revive it. Technology was called upon in aid of the military, spying, anti-terrorism efforts, and so on. The larger question, of whether we should stop to think for a moment before rushing to create things that "we can't uninvent", has been largely set aside. Joy was essentially trying to jump-start the discussion that should have happened before the Manhattan Project was started. True, given the Nazi threat, it might have been done anyway. But the more important point is that if the Manhattan Project had never started, nuclear weapons might not exist today.

Uh?

After WW2 Europe was in tatters, and Germany in particular was completely destroyed. There were only two powers left, only two that had the resources, the know-how, and the incentive to create nuclear weapons. So if the US had not developed them, it would be reasonable to ask: what about the Soviets?

As has been documented in books like The Sword and the Shield (based on KGB files), the Soviet Union, while powerful and full of brilliant scientists, could not have brought its own nuclear effort to fruition but for two reasons: 1) the Americans had nuclear weapons, and 2) they stole the most crucial parts of the technology from the Americans. The Soviet Union was well informed, through spies and "conscientious objectors", of the advances in the US nuclear effort. Key elements, such as the spherical implosion device, were copied verbatim. And even so, it took the Soviet Union four more years (until its first test on August 29, 1949) to duplicate the technology.

Is it obvious, then, that had the Manhattan Project never existed, nuclear weapons wouldn't have been developed? Of course not. But it is clear that the nature of the Cold War might have been radically altered (if there was to be a Cold War at all), and at a minimum nuclear weapons wouldn't have existed for several more years.

Now, historical revisionism is not my thing: what happened, happened. But we can learn from it. Had there been a meaningful discussion on nuclear power before the Manhattan Project, even if it had been completed, maybe we would have come up with ways to avert the nuclear arms race that followed. Maybe protective measures that took time, and trial, and error, to work out would have been in place earlier.

Maybe not. But at least it wouldn't have been for lack of trying.

"Fine. But why do you talk about computer science?" someone might say. "What about, say, bioengineering?" Take cloning, for example, a field similarly ripe with both peril and promise. An ongoing discussion exists, even among lawmakers. Maybe the answer we'll get to at the end will be wrong. Maybe we'll bungle it anyway. But it's a good bet that whatever happens, we'll be walking into it with our eyes wide open. It will be our choice, not an unforeseen consequence that is almost forced upon us.

The difference between CS and everything else is that we seem to be blissfully unaware of the consequences of what we're doing. Consider for a second: of all the weapon systems that exist today, of all the increasingly sophisticated missiles and bombs, of all the combat airplanes designed since the early 80's, which would have been possible without computers?

The answer: Zero. Zilch. None.

Airplanes like the B-2 bomber or the F-117, in fact, cannot fly at all without computers. They're too unstable for humans to handle. Reagan's SDI (aka "Star Wars"), credited by some with bringing about the fall of the Soviet Union, was a perfect example of the influence of computers (unworkable at the time, true, but a perfect example nevertheless).

During the war in Iraq last year, as I watched the (conveniently) sanitized nightscope visuals of bombs falling on Baghdad and other places in Iraq, I couldn't help but think, constantly, of the number of programs and microchips and PCI buses that were making it possible. Forget about whether the war was right or wrong. What matters is that, for ill or good, it is the technology we build and continue to build every day that enables these capabilities for both defense and destruction.

So what's our share of the responsibility in this? If we are to believe the deafening silence on the matter, absolutely none.

This responsibility appears obvious when something goes wrong (like in this case, or in any of the other occasions when bugs have caused crashes, accidents, or equipment failures), but it is always there.

It could be argued that after the military-industrial complex (as Eisenhower aptly described it) took over, and market forces, which are inherently non-ethical (note: non-ethical, not unethical), took the wheel, we lost all hope of having any say in this. But is that the truth? Isn't it about people in the end?

And this is relevant today. Take cameras in cell phones. Wow, cool stuff, we said. But now that we've got 50 million of the little critters out there, suddenly people are screaming: the vanishing of privacy! aiee! Well, why didn't we think of it before? How many people were involved in the early stages of this development? A few, as with anything. And how many thought about the consequences? How many tried to anticipate and maybe even somehow circumvent some of the problems we're facing today?

Wanna bet on that number?

Now, to make it absolutely clear: I'm not saying we should all just stow our keyboards away and start farming or something of the sort. I'm all too aware that this sounds too preachy and gloomy, but I put myself squarely with the rest. I am no better, or worse, and I mean that.

All I'm saying is that, when we make a choice to go forward, we should be aware of what we know, and what we don't know. We should have thought about the risks. We should be thinking about ways to minimize them. We should pause for a moment and, in Einstein's terms, perform a small gedankenexperiment: what are the consequences of what I'm doing? Do the benefits outweigh the risks? What would happen if anyone could build this? How hard is it to build? What would others do with it? And so on.

We should be discussing this topic in our universities, for starters. Talking about copyright is useful, but there are larger things at stake, the RIAA's pronouncements notwithstanding.

This is all the more necessary because we're reaching a point where technologies increasingly deal with self-replicating systems that are even more difficult to understand, not to mention control (computer viruses, anyone?), as Joy so clearly put it in his article.

We should be having a meaningful, ongoing conversation about what we do and why. Yes, market forces are all well and good, but in the end it comes down to people. And it's people, us, that should be thinking about these issues before we do things, not after.

These are difficult questions, with no clear-cut answers. Sometimes the questions themselves aren't even clear. But we should try, at least.

Because, when there's an oncoming train and you're tied to the tracks, closing your eyes and humming to yourself doesn't really do anything to get you out of there.

Categories: science, soft.dev, technology
Posted by diego on February 23, 2004 at 10:27 PM

Ant within Eclipse: switching JDKs and finding tools.jar

I've been doing quite a lot of work creating new Ant build processes and grokking Eclipse (installing and reinstalling on different machines), and this is a problem that keeps recurring. This morning I cleaned up vast amounts of garbage on my main Windows machine, garbage left over from old J2SDK installs (I had FOUR--when will Sun fix the install problem?), and I reinstalled JDK 1.4.2_03 and then tried running everything again within Eclipse (v3.0 M7). Needless to say, Ant was running fine before, after I had pointed it to tools.jar, but now that I had changed JDKs it wasn't guaranteed to keep running--and it didn't. While it is possible that this is so well known that people do it without thinking, it certainly isn't clearly documented, and it's a situation that people will probably run into regularly when doing a clean install of Eclipse and the JDK on a machine, or when upgrading JDKs after the settings were done long ago--and forgotten. :)

First, the situation. On restart, Eclipse correctly detected the new JRE (presumably from the registry entries created by the JDK/JRE install), pointing to the one the J2SDK installs in C:\Program Files\Java\...--though IMO it's better to change that pointer to the JRE within the JDK. Even then, Ant doesn't work. The error message you get from Ant is:

[javac] BUILD FAILED: [build file location here]
Unable to find a javac compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK

Of course, JAVA_HOME is pointing to the right location, in both the OS environment and within Eclipse (This variable can be set within Eclipse through Window > Preferences > Java > Classpath Variables).

So how to fix the Ant build problem?

I found various solutions searching, for example running Eclipse with "eclipse -vm [JDKPATH]\bin", but that didn't quite satisfy me (I wanted something that could be configured within the environment). Other solutions to the problem were even more esoteric.

The best solution I've found (after a little bit of digging through Eclipse's options) is to edit Ant's runtime properties. Go to Window > Preferences > Ant > Runtime. Choose the Classpath tab. Select the Global Properties node in the tree and click Add External JARs. Select tools.jar from your JDK directory (e.g., j2sdk1.4.2_03\lib\tools.jar). Click Apply and OK, and you should be on your way. Not too hard when you know what to do. Now if this could only be done automatically by Eclipse on install...

Categories: soft.dev
Posted by diego on February 22, 2004 at 4:26 PM

gender and computer science

Jon had a good post a couple of days ago titled gender, personality, and social software, based on a column of his at InfoWorld: is social software just another men's group?. He makes some interesting points in both.

There is a part of that thread that I wanted to comment on (but didn't get around to until now for some reason!), and it's the question of how much computer science is "gender biased." Towards men, of course.

Having spent a good part of the last 10 years in academic institutions one way or another, in various countries and continents (as well as in companies of all sizes), here are my impressions. This is, of course, just what I've observed.

That there are few women in computer science is obviously true. Surveys or not, you can see it and feel it. That said, I have noted that something similar happens in other disciplines, such as civil engineering. In the basic sciences, there are more men than women in Physics for example, but the difference is not as marked. In Chemistry, or Biology, the differences largely disappear.

One thing I can say, from experience, is this: among my groups of students, both here in Ireland and in the US, an interesting thing happened: even though there are fewer women (much fewer) than men, the number of women that are very good is roughly similar to the number of men that are very good. (Hacker types, the "take no showers or bathroom breaks until I finish coding this M-Tree algorithm using Scheme, just for the fun of it" kind, have been invariably men in my experience, and generally there has been at most one of those per class, if any; but I have no doubt that there are women like this, I just haven't met them. :)) Note that I'm referring specifically to computer hacking (in the good sense) here--I know women with the same attitude toward their work, just not computer hacking :).

To elaborate a bit on the point of the last paragraph: if, say, you have a CS class of 40 people, maybe 5 at most would be women. But of those five women, two would be very good. And there would be maybe three, at most four, good computer-scientists-in-the-making on the boys' side.

My conclusion after all this time is that it's a matter of quality over quantity. In some weird way, talent for computer science seems to me to be constant regardless of gender (maybe this is the case for everything?). There might be more men doing development, sure, but there are also more that are not very good at it (or do it for the wrong reasons, such as money, or parents' pressure, or just "because"; in my opinion, if you don't really like doing something, you shouldn't be doing it, period).

The other thing I've noticed in recent years is that, as software (and hardware) have become more oriented towards art, social and real-world interactions (The stuff done at the Media Lab is a good example), I've seen more women on that side of the fence. In fact, in some of these areas women dominate the landscape.

Now, I don't want to get carried away speculating on the reasons for this split, since any explanations would almost certainly be hand-waving of the nth order. I will say, however, that I think sexism (which I despise--for example, I enjoy James Bond movies, but the blatant misogyny in them gives me the creeps; and, btw, if you want to know how serious I am in using the word "despise" here, you can read this to see what I think about semantics in our world today) has to be partly a factor here. But I'm sure there are others, and history plays a part too. Consider that we're still using UIs and sometimes tools that have very clear roots going back twenty, sometimes thirty or even forty (!) years (e.g., LISP, or COBOL, or mainframes). Back then gender-based prejudices were even worse, and it's reasonable to assume that we're still carrying that burden in indirect fashion.

So maybe it's not a surprise that now that we're working on technologies that had their start ten or fifteen years ago, women are getting more into it? Maybe. I sure hope so.

What do others think? Women's opinions are especially welcome. And if any of this sounds ridiculous (I'm under no illusion that what I've said here is completely accurate), please feel free to whack me in the head. I'm taking painkillers for a horrible pain in the neck I have, so it won't hurt too much. :-)

Categories: science, soft.dev
Posted by diego on February 19, 2004 at 11:01 AM

snipsnap: wow

snipsnap-logo.png

A few days ago I was asking on #mobitopia what people preferred as a wiki/weblog system and someone (I think it was csete) mentioned SnipSnap. I didn't have time to try it out until today. My comments: WOW.

It took me literally five minutes to set up. It seamlessly connected to the local MySQL installation (all I had to do was create a db and a user for it) and ran under my Tomcat/Apache config. After setting a couple of options I was on my way. It combines the idea of wikis (easily creating links to pages) with the format/structure/features of a weblog. The "wikiness" of SnipSnap does not extend to requiring WikiWords, which is, as far as I'm concerned, a relief. WikiWords inevitably end up requiring weird names for links.

It's a java app, so it runs everywhere. The only potential problem I could find is that in edit mode there are tons of options to edit content and sometimes it can be confusing (or rather, a little overwhelming), but I get the impression that it wouldn't be hard to get used to it.

If you're looking for a weblog/wiki solution in Java that's easy to get started with, SnipSnap is definitely worth checking out.

Categories: soft.dev
Posted by diego on February 14, 2004 at 5:47 PM

eclipse 3.0 M7 release

Finally! As R.J. said, commenting on my recent entry about IDEs, Eclipse 3.0 M7 has been released today. There are a number of changes, including better templates and managing of annotations, new ways of navigating code (inherited methods, for example), and improvements to SWT components such as tables and trees, and the browser component. Also, there is new automagic creation of bundles for Mac OS X (which can be done through a free Apple tool as well, but Eclipse makes it much easier), better "scalability" of the UI, and a number of APIs have finally been frozen.

Anyway. Will have more comments once I've used it for a bit longer.

Categories: soft.dev
Posted by diego on February 13, 2004 at 9:36 PM

the key is real, the lock is not

In the movie The Game, part of the plot centered around a (simulated) "attack" on a rich man (Michael Douglas) that forced him to give up the passwords and such to his bank accounts: the attackers intercepted the cell phone call and answered it themselves, pretending to be the bank. The basic idea (make the environment familiar enough so that you slip up) has been used online in various forms, but so far any attentive person could figure out that things were not what they seemed.

Don has posted about an unsettling idea he calls visual spoofing. Essentially he's exposing the biggest threat of all: that we end up becoming used to our UIs to the point where we trust them implicitly.

I brought up the movie at the beginning because Don's example is the online version of it (granted, there are details missing, but does anyone doubt that you could conceivably spoof the entire UI? And what then?). Douglas' character in the movie has no way at all of telling that the person on the other side is not working for the bank, but for the enemy. His keys (passwords) are intact, but the lock (bank) isn't real.

The problem is, at the core, that we tend to guard (and trust, or distrust), the key, while we implicitly trust the lock. Why? The lock is "solid, real". It's "unmovable": built into the door, or ever present in your computer screen. The key can be duplicated without you knowing. The lock cannot.

Except that the locks we've got on computer screens are themselves open to duplication. Seamlessly. What Don is talking about applies to browsers. But given the ever-present infestation of all kinds of worms and viruses, how long will it take until this applies to other software too? Software that monitors keypresses has been around for a long time, but digging through all the information generated is a mess (never mind having to get it out of the machine). This is targeted--targeted at the user, not at the system. You could simulate accounting software. Social engineering meets cracking, or phreaking (no, I don't like to use the term hacking, which I prefer to use in its original context).

Thanks, Don, for the eye-opener. Looking forward to the follow-up, where he'll talk about an idea he had to minimize this problem. I don't want to start thinking about possible solutions yet: I haven't even finished absorbing all the implications.

Categories: soft.dev
Posted by diego on February 12, 2004 at 8:59 PM

configuring apache 2 + tomcat 5 + mysql + jdbc access on linux and windows

Heh. That title took almost as long to write as it took to complete the configuration. :)

I spent some time today preparing the basics for webapp development and runtime. The "basics" include Apache 2, Tomcat 5, MySQL, and JDBC connectivity between Tomcat and MySQL.

Update (Jan 26, 2005): I've posted some new information related to the tomcat config for versions higher than 5.0.18.
To make sure I understood the differences across environments, I configured the system in parallel on both a Linux Red Hat 9 machine and a Windows XP machine. Before I begin describing the steps I took to configure it, I want to thank Erik for his help in finding/building the right packages and distributions, particularly on the Linux side of things. It would have taken a lot longer without his help.

So here are the steps I took to get it up and running...

Apache

The installation of Apache is pretty straightforward, both for Linux and Windows. Red Hat 9 usually includes Apache "out of the box" so there's one less step to go through. When in doubt, the Apache docs usually fill in the picture (documentation for 2.0 has improved a lot with respect to 1.3.x).

Tomcat

Here's where things got interesting. The last time I used Tomcat was when 4.0 was about to be released, and I had switched over to the dev tree for 4 since 3.x had serious scalability problems. There are tons of new things in the package, but the basic configuration doesn't need most of them. Installing Tomcat itself is in fact also quite straightforward (again, the docs are quite complete), but it's when you want to access it through Apache that things get a little more complicated.

Apache + Tomcat: The Fun Begins

To access Tomcat through Apache you need a connector. Tomcat has connectors for both Apache and IIS, but the problem is that apache.org doesn't include RPMs (and in some cases, binaries). The connector that I wanted to use was JK2, but binaries for RH9 were not available (I got the Windows binaries from there though). So. I first tried downloading the package supplied at JPackage.org (which is a really handy resource for Java stuff on Linux), but after a few tries (both getting the binaries and rebuilding from the source RPMs, including having to install most of the dev stuff which still wasn't on the server) it wasn't working. The most probable reason for this is that these packages are actually being built for Fedora, not RH9... it's amazing that Fedora hasn't officially taken over and already we've got compatibility problems. Finally Erik pointed me to this page at RPM.pbone.net where I could actually get the binary RPMs directly and install them. So far so good. Now for the configuration.

Configuring the connector is not really that complicated, and it worked on the first try. The steps are as follows ("APACHE_DIR" is Apache's installation directory, and "TOMCAT_DIR" is Tomcat's install dir):

  1. Go to TOMCAT_DIR/conf and edit the file jk2.properties, adding the following line:
    channelSocket.port=8009
    (This is just because I chose to use a port other than the default).
  2. In the same directory, open server.xml and look for the "non-SSL Coyote HTTP/1.1 Connector". This is the standard Tomcat-only connector; comment it out, since we'll be using Apache to handle HTTP requests. In the same file, look for a commented line that says "<Context path="" docBase="ROOT" debug="0">". Right after that line, add the following Context path:
    <Context path="" docBase="APACHE_DIR/htdocs" debug="0" reloadable="true" crossContext="true"/>
    Btw, htdocs is the standard Apache document root dir; you will need to change that if you have moved it to another location.
  3. Now go to the APACHE_DIR/conf directory. There, create a file workers2.properties, with the following contents:
    [shm]
    file=APACHE_DIR/logs/shm.file
    size=1048576

    # socket channel
    [channel.socket:localhost:8009]
    port=8009
    host=127.0.0.1

    # worker for the connector
    [ajp13:localhost:8009]
    channel=channel.socket:localhost:8009

    Note that the port matches that defined in server.xml above for Tomcat.
  4. Copy the module file into APACHE_DIR/modules (for Windows this will be something of the form mod_jk2*.dll and for linux mod_jk2*.so).
  5. Edit the file APACHE_DIR/conf/httpd.conf and add the following lines at the end of the list of modules loaded into Apache:
    LoadModule jk2_module modules/MODULE_FILE_NAME

    <Location "/*.jsp">
    JkUriSet worker ajp13:localhost:8009
    </Location>

    <Location "/mywebapp">
    JkUriSet worker ajp13:localhost:8009
    </Location>

    The "mywebapp" reference points to a directory that will be handled by Tomcat, you can add as many mappings/file types as you need to be handled by the connector.
And we're set! You can now drop a simple test.jsp file into the Apache document root, something like this:
<HTML>
<BODY>
<H1><%= " Tomcat works!" %></h1><%= "at " + java.util.Calendar.getInstance().getTime() %>
</BODY>
</HTML>
And then access it simply with
http://localhost/test.jsp

JDBC + MySQL

Before moving on to configuring database pools on Tomcat and so on, it's a good idea to test JDBC in an isolated environment. This is easy. First, get the MySQL Control Center application to create a test user and database/table to prepare the environment. This app is quite complete, and multiplatform to boot (Erik also mentioned Navicat as an app with similar functionality but a better UI for Mac OS). For this test I created a database called testdb and a single table in it, called user. I added three fields to the table: name (varchar), password (varchar) and id (int). I also created a test user (username=test, password=testpwd). Note that the user has to be allowed access from the host that you'll be running the test on, typically localhost, as well as granted permissions on the database that you'll be using (in this case, testdb).
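
(If you'd rather not use a GUI tool, the same test environment can be scripted through JDBC itself. Here's a throwaway sketch--it assumes you can connect as the MySQL root user, the root password is a placeholder, and the GRANT syntax is the MySQL 4.x style:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SetupTestDB {
    public static void main(String[] args) throws Exception {
        // Connect as an administrative user (the root password is a placeholder)
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mysql?user=root&password=ROOT_PASSWORD_HERE");
        Statement stmt = conn.createStatement();

        // Create the test database and the single table used in this example
        stmt.executeUpdate("CREATE DATABASE testdb");
        stmt.executeUpdate("CREATE TABLE testdb.user " +
                "(name VARCHAR(64), password VARCHAR(64), id INT)");

        // Create the test user with access from localhost (MySQL 4.x GRANT syntax)
        stmt.executeUpdate("GRANT ALL ON testdb.* TO 'test'@'localhost' " +
                "IDENTIFIED BY 'testpwd'");

        stmt.close();
        conn.close();
    }
}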

Once the db is ready, you can get MySQL's JDBC driver, Connector/J, from this page. After adding it to the classpath, you should be able to both compile and run the following simple JDBC test app:

import java.sql.*;

public class TestSQL {

    public static void main(String[] s)
    {
        try {
            // Load the MySQL JDBC driver and open a connection to testdb
            Connection conn = null;
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String dbUrl = "jdbc:mysql://localhost/testdb?user=test&password=testpwd";
            conn = DriverManager.getConnection(dbUrl);

            // Insert a test row (fields are name (String), password (String) and id (int))
            String statement = "INSERT INTO user (name, password, id) VALUES ('usr1', 'pwd1', '1')";
            Statement stmt = conn.createStatement();
            stmt.execute(statement);

            // Read everything back and print it
            String query = "SELECT * FROM user";
            ResultSet st = stmt.executeQuery(query);
            while (st.next()) {
                String name = st.getString("name");
                String pwd = st.getString("password");
                String id = st.getString("id");
                System.err.println("Query result="+name+"/"+pwd+"/"+id);
            }

            // Clean up
            if (stmt != null) {
                stmt.close();
            }
            if (conn != null) {
                conn.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

Which, when executed, should print Query result=usr1/pwd1/1.

Using JDBC/MySQL from Tomcat

Once JDBC and MySQL are up and running, we can move to the final step, namely, accessing the MySQL database through JDBC from Tomcat. For this I used the guide provided here within the Tomcat docs. For completeness (and to maintain the context of this example), the following are the steps required to set up the JNDI reference to the connection pool (managed by code built into Tomcat that uses the Apache Commons Pool library, among other things):

  1. Copy the JDBC driver JAR file into TOMCAT_DIR/common/lib
  2. Edit TOMCAT_DIR/conf/server.xml and after the Context tag for the root dir (added at the beginning) add the following:
    <Context path="/testdb" docBase="APACHE_DIR/htdocs/testdb"
            debug="5" reloadable="true" crossContext="true">

      <Logger className="org.apache.catalina.logger.FileLogger"
                 prefix="localhost_DBTest_log." suffix=".txt"
                 timestamp="true"/>

      <Resource name="jdbc/TestDB"
                   auth="Container"
                   type="javax.sql.DataSource"/>

      <ResourceParams name="jdbc/TestDB">
        <parameter>
          <name>factory</name>
          <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
        </parameter>

        <!-- Maximum number of dB connections in pool. Make sure you
             configure your mysqld max_connections large enough to handle
             all of your db connections. Set to 0 for no limit.
             -->
        <parameter>
          <name>maxActive</name>
          <value>10</value>
        </parameter>

        <!-- Maximum number of idle dB connections to retain in pool.
             Set to 0 for no limit.
             -->
        <parameter>
          <name>maxIdle</name>
          <value>5</value>
        </parameter>

        <!-- Maximum time to wait for a dB connection to become available
             in ms, in this example 10 seconds. An Exception is thrown if
             this timeout is exceeded.  Set to -1 to wait indefinitely.
             -->
        <parameter>
          <name>maxWait</name>
          <value>10000</value>
        </parameter>

        <!-- MySQL dB username and password for dB connections  -->
        <parameter>
         <name>username</name>
         <value>test</value>
        </parameter>
        <parameter>
         <name>password</name>
         <value>testpwd</value>
        </parameter>

        <!-- Class name for mm.mysql JDBC driver -->
        <parameter>
           <name>driverClassName</name>
           <value>com.mysql.jdbc.Driver</value>
        </parameter>

        <!-- The JDBC connection url for connecting to your MySQL dB.
             The autoReconnect=true argument to the url makes sure that the
             mm.mysql JDBC Driver will automatically reconnect if mysqld closed the
             connection.  mysqld by default closes idle connections after 8 hours.
             -->
        <parameter>
          <name>url</name>
          <value>jdbc:mysql://localhost/testdb?autoReconnect=true</value>
        </parameter>
      </ResourceParams>
    </Context>

    This sets up the JNDI resource and paths.
  3. Now edit TOMCAT_DIR/conf/web.xml and add the following after the default servlet:
      <resource-ref>
          <description>DB Connection</description>
          <res-ref-name>jdbc/TestDB</res-ref-name>
          <res-type>javax.sql.DataSource</res-type>
          <res-auth>Container</res-auth>
      </resource-ref>
    Finally, compile and package into a JAR the following java file (TestSQLLoad.java):
    import javax.naming.*;
    import javax.sql.*;
    import java.sql.*;

    public class TestSQLLoad
    {
        String user = "Not Connected";
        String pwd = "no pwd";
        int id = -1;

        public void init() {
            try {
                // Look up the DataSource that Tomcat exposes through JNDI
                Context ctx = new InitialContext();
                if (ctx == null) {
                    throw new Exception("Boom - No Context");
                }

                DataSource ds = (DataSource) ctx.lookup("java:comp/env/jdbc/TestDB");

                if (ds != null) {
                    // Borrow a connection from the pool and read the first row
                    Connection conn = ds.getConnection();

                    if (conn != null) {
                        user = "Got Connection " + conn.toString();
                        Statement stmt = conn.createStatement();
                        String q = "select name, password, id from user";
                        ResultSet rst = stmt.executeQuery(q);
                        if (rst.next()) {
                            user = rst.getString(1);
                            pwd = rst.getString(2);
                            id = rst.getInt(3);
                        }
                        // Closing the connection returns it to the pool
                        conn.close();
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public String getUser() {
            return user;
        }

        public String getPassword() {
            return pwd;
        }

        public int getID() {
            return id;
        }
    }

  4. Once you've got the JAR ready, drop it into TOMCAT_DIR/common/lib as well.
    Important note: Normally we would configure the JNDI resources and drop the JAR file into an independent web application, but I am placing it into the default web app and dropping everything into common/lib (not a good idea in general, except for the JDBC driver's library) to simplify the example. Quick and dirty, and yes, maybe a bit too dirty, but it's better not to be configuring a web app at the same time; all we need to know is that the JDBC driver, the JNDI reference, and the pool work properly.
  5. We're almost ready. Now create a directory testdb under APACHE_DIR/htdocs (or whatever your Apache document root is) and create a file test-db.jsp with the following contents:
    <html>
      <head>
        <title>DB Test</title>
      </head>
      <body>

      <%
        TestSQLLoad tst = new TestSQLLoad();
        tst.init();
      %>

      <h2>Results</h2>
        User <%= tst.getUser() %><br/>
        Pwd <%= tst.getPassword() %><br/>
        Id <%= tst.getID() %>

      </body>
    </html>

    You should now be able to access it through the following URL:
    http://localhost/testdb/test-db.jsp
And that's it! Phew!

Apache, Tomcat and MySQL are ready to go. Hope this helps, and as usual, comments/questions are welcome.

Categories: soft.dev
Posted by diego on February 11, 2004 at 12:16 AM

realvnc rocks

If you are in need of a remote desktop solution that is simple, small (500K to 1MB depending on server or server+client download), and "just works", check out RealVNC. It's fantastic. (Thanks Dylan for the pointer! :)).

Categories: soft.dev
Posted by diego on February 9, 2004 at 10:12 PM

...and the great IDE hunt

Aside from trying out the new J2SE 1.5 beta, I've been looking at IDEs; since we're now going to buy some extra licenses, I wanted to make sure we made a good choice. IDEA, a longtime favorite of mine, is sadly out of the picture for reasons unrelated to development which I'll discuss later. (The increased bloatedness of the product --there are a bazillion features in the upcoming IDEA 4.0 that mean nothing to me whatsoever-- also weighs in.) Don't bother posting comments saying that I'm an idiot for ditching IDEA. I think it's one of the best IDEs out there and it's probably a good choice for many people, but there are circumstances that go beyond the IDE that made it impossible to depend on it. As I said, I'll talk about that later.

So what have I been looking at--particularly with the change to JDK 1.5 now on the horizon? Well, the first IDE I checked out was CodeGuide from Omnicore. Using CodeGuide today took me back to how I felt when I tried IDEA for the first time nearly three years ago. It is simple, small, fast, and it looks good (Best looking Java IDE I've seen, in fact, better than Eclipse). Additionally, the latest CodeGuide is the first IDE with a final release (6.1) to fully support Tiger features, including an uber-cool refactoring called "Generify" which helps a lot in converting old projects to use generics. What's even better about CodeGuide is what's on the pipeline: CodeGuide 7.0 (codenamed "Amethyst") will include a new "back in time" debugger. Check out the webpage where they describe this new feature. Is that fantastic or what? It seems that Omnicore is really committed to keeping an edge on good functionality and maintaining a simple IDE while including more advanced features.

CodeGuide does have some bad points: it doesn't seem to support some of the standard keybindings on Windows (Ctrl+Insert, Shift+Insert, etc) which is not good for keyboard junkies like me, and its code generation/formatting facilities are pretty limited (among other things). Sadly, these seemingly trivial problems are pretty big when dealing day-to-day with large amounts of code, and they can easily end up being show-stoppers.

I also tried out the latest JBuilder (JBuilder X) and it's improved quite a lot over the last few revs, and is now easier to use as well. The UI designer is nice but as usual it has the terrible habit of sprinkling the code with references to Borland's own libraries (layout classes are a good example), which bloat your app without a clear advantage. Pricing is ridiculous for anything but the Foundation version though, and their focus on Enterprise features means that there are probably more control panels in it than on the main console of the Space Shuttle.

Finally, I tried NetBeans 3.6 Beta, and I have to say I was impressed (my expectations were pretty low though, having used early versions of it...). It's reasonably fast and looks pretty good, and the UI designer generates simple code, which I think makes it very useful for prototyping (I don't really believe in UI designers for building the final app, but that's just me). Charles commented on the release here. It is a bit on the heavy side in terms of features, and that's always a problem since I end up navigating menus with feature after feature that I don't really care about (Eclipse can also be daunting in this sense).

And what about Eclipse? Well, I'm waiting for the release of 3.0M7, due tomorrow. We'll see. :) I'll post an update with my impressions after I've tried it, with conclusions to follow.

Categories: soft.dev
Posted by diego on February 8, 2004 at 1:26 PM

eye of the tiger...

duke_swinging.gifSo the J2SE 1.5 Beta1 (codenamed Tiger) was released a few days ago. Here are some of the changes in the release, which should probably be called Java 3, since there are so many changes, both to the deployment/UI elements and the language itself (with the biggest IMO being the addition of Generics, of which I did a short eval about a year ago).

Predictably enough there has been quite a lot of coverage of the release on weblogs. Some of them: Brian Duff on the new L&Fs, Guy on Tiger Goodies, Brian McAllister on what he likes about it. Some of the conversation has centered around the new Network Transfer archive format, which brings JARs down to 10% of their original size by doing compression tailored to the Java class file format and its usage. Eu does some analysis on it and Kumar talks about his experience when using it.

I installed it yesterday and played around a bit with Generics and tried the new L&Fs with the internal clevercactus b3r7. Alan has had problems with it, but I haven't seen anything like what he describes--maybe I'm immune to having multiple JDKs by now and I unconsciously route around the problems before they happen (which is a problem when designing UIs too, btw). Not that it has to stay this way. :)

My experience has been surprisingly good. Everything works as it should, and aside from a few UI glitches or weird momentary lockups it all went well. For a beta, it looks incredibly promising, and I'm really, really itching to start using Generics all over the place (btw, looking at the Collections package docs now is a bit daunting, with all the generics stuff now included).
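
For anyone who hasn't played with Generics yet, a trivial before/after sketch (nothing more than an illustration) shows why I'm so eager to use them:

import java.util.ArrayList;
import java.util.List;

public class GenericsTaste {
    public static void main(String[] args) {
        // Pre-Tiger: the list holds Objects, so every read needs a cast,
        // and putting the wrong type in only blows up at runtime.
        List oldStyle = new ArrayList();
        oldStyle.add("hello");
        String s1 = (String) oldStyle.get(0);

        // Tiger: the element type is part of the declaration; no casts,
        // and type mistakes are caught by the compiler.
        List<String> newStyle = new ArrayList<String>();
        newStyle.add("world");
        String s2 = newStyle.get(0);

        System.out.println(s1 + " " + s2);
    }
}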

The new L&Fs are very, very nice. Particularly welcome is the change to the Java Look and Feel (Ocean, replacing what used to be Metal) which by 1.4x was looking not just old, but downright crappy. How good is it? Well, let me put it this way: if Ocean was available today I'd have no problems deploying it. Plus, the new Synth L&F, which supports skins, is what we've all been waiting for.

Overall: looks like Tiger is going to be the best update in years. Can't wait for the final release.

Next: looking for a new IDE.

Categories: soft.dev
Posted by diego on February 8, 2004 at 1:03 PM

social software: representing relationships, part 3

[See parts one and two]

Anne linked to my first entry on representing relationships and Alex (sorry, no link) posted a comment there that centered around the following point:

it's interesting that dimensions are here thought of in terms of as static. That the space of visual representation is either 2d/3d. I was under the impression that interactive real-estate is multi-dimensional. I suppose if the design of such renderings is informed by scientific or mathematical diagrams then you are bound - to some degree - to the constraints of such formulations.
I started to reply in a comment there and I just kept typing and typing, so I came to the conclusion it'd be better to post it here...

I noted in my post that relationship patterns are actually n-dimensional, that is, I agree with Alex's comment in that sense. My reasons for looking at 2D/3D formulations are, however, less abstract than Alex implies. Plus, I'll go a bit further (since I don't think that Alex was suggesting that we should all suddenly move to n-dimensional maps) in analyzing why there is a tendency to go after linear, planar (and eventually, at most, volumetric) representations for data.

The 2D/3D representation "lock-in" that we see in UIs today actually has a solid basis in reality. Beyond the physiological limitations that our neural structure and binocular vision create, the laws of physics (as we understand them today :)) dictate that we'll never go beyond 3D visualization. Additionally our current technology dictates that it's impractical to design everything around a 3D display. (Sorry if this seems a bit too obvious, I just want to clarify in which context I'm looking at things).

From that it follows that, if we represent n-dimensional data structures, we'll have to create projections. Projections are easy stuff, mathematically speaking (i.e., they involve fairly simple vector math). Visualizing them is not too difficult either. Consider hypercubes, which are one of the easiest cases because they're fully symmetrical graphs. This is what projections of hypercubes of dimensions n > 3 into 2D look like [source]:

hypercubes.gif

A 2D projection of a, say, 12D space might be pretty to look at, but I think most users would avoid that kind of complexity and its consequent cognitive overload.
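
(As an aside, the "fairly simple vector math" mentioned above really is simple. Here's a minimal, purely illustrative Java sketch--the two direction vectors are an arbitrary choice of mine--that projects the 2^n vertices of an n-dimensional hypercube onto a 2D plane by taking dot products with two fixed n-dimensional directions:)

public class HypercubeProjection {
    // Projects each vertex of the n-dimensional unit hypercube onto 2D
    // by taking dot products with two fixed direction vectors.
    public static void main(String[] args) {
        int n = 4; // try 4 for the classic tesseract-style picture

        // Two arbitrary n-dimensional directions defining the projection plane
        double[] u = new double[n];
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            double angle = Math.PI * i / n;
            u[i] = Math.cos(angle);
            v[i] = Math.sin(angle);
        }

        // Each vertex of the hypercube is a string of 0/1 coordinates,
        // conveniently enumerated by the bits of an integer.
        for (int vertex = 0; vertex < (1 << n); vertex++) {
            double x = 0, y = 0;
            for (int i = 0; i < n; i++) {
                int coord = (vertex >> i) & 1;
                x += coord * u[i];
                y += coord * v[i];
            }
            System.out.println("vertex " + Integer.toBinaryString(vertex)
                    + " -> (" + x + ", " + y + ")");
        }
    }
}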

My point (I do have a point apparently) is that if we agree that we are bound by 2D (eventually 3D) displays, and that n-dimensional projections onto 2D/3D spaces are confusing to navigate for the majority of users, then we should try to use, as much as possible, actual 2D or 3D representations, simply because they are in their "native" form and can be properly optimized for easy, useful tasks that users might have to perform. The data is "transparent"; there are no abstractions to understand to make use of it (which would be necessary for higher-order spaces).

Additionally, those diagrams (while plausible UIs) are in my view more useful as tools for visualizing what is important about the data we're trying to represent (and allow to be manipulated/analyzed). And while they might be "overused" in the research sense, they haven't been used in actual software products that much. Part of the reason is that they feel "alien" as a way of manipulating data. Products like The Brain have been around for a long time, and yet they haven't taken over the world. Clearly, it's not something that can be simply assigned to, say, a failure of marketing or whatever. People like linearity, they are more comfortable with it, and in a pure design sense the less data there is to deal with the more users can focus on their work instead of focusing on how to navigate around the damn thing. All the major interfaces today are linear: the most complicated data structure people usually deal with (in filesystems, email programs, etc) are linearized hierarchies where they can deal with one linear subspace at a time (the current folder). Yes, there are historical reasons for this, but I also think that there's a strong component of user preference in it.

So, if we could pull off a switch from linear to 2D interfaces, even if they are a bit ancient as far as research is concerned, it would be a good step forward, always with the goal of providing the most accurate representation of the complexity we know is there, within the constraints we've got. After all, this is about making it easy for the majority of users, not for people that will read something like this and not run away from the room in a panic. :)

Categories: soft.dev
Posted by diego on February 5, 2004 at 2:14 PM

social software: automatic relationship clustering

[See also this post and a follow-up here]

Regarding my post on Tuesday on social software: representing relationships, my mind kept coming back to one of the things I wrote:

People don't always agree on what the relationship means to each other. This to me points to the need to let each person define their own relationship/trust structures and then let the software mesh them seamlessly if possible.
The reason I kept thinking about it is that I didn't really explain properly what I meant by that.

So what did I have in mind when I said that? Well, lots of things :), but let's start with the basics.

The first thing that the software should be able to do is infer what groups are there, rather than be told what the groups are. With this in hand, if you simply define relationships to your friends, and you take into account their friends and how they relate to you, you should be able to create a graph of probable relationship clusters, that is, groups formed around strong interpersonal relationships. Sounds farfetched? Read on...

Well, as it happens, while I was at Drexel I did research on automatic graph clustering, applying genetic algorithms to techniques developed by my advisor at the time, Spiros Mancoridis, except back then it was applied to software systems. But what I realized a couple of days ago is that the same technique, maybe with a few mods, would work just as well here (if not better, because the graphs are smaller). The technique is described here, but to make a long story short, there's basically a set of equations that can be used to provide an objective measure of how well the clustering is done on a certain graph. The graph must define clusters to which nodes belong, along with the relationships between the nodes. The equation system favors loosely coupled clusters with dense intra-cluster relationships between the nodes (cluster coupling is determined by the edges that connect nodes across different clusters). The objective value is called the modularization quality (MQ) of the graph, which is calculated by combining an inter-cluster connectivity measure and an intra-cluster connectivity measure.

To make things more concrete, let's look at the simplest type of MQ, one for a directed graph with unweighted edges. Don't panic, it's not as complex as it looks at first glance! :) The intra-cluster measure is calculated as follows:

clustering-ai.gif

Where Ai is the intra-connectivity measure for cluster i, Ni is the number of nodes in the cluster, and mi is the number of edges in the cluster.

The inter-connectivity measure Eij is calculated as:

clustering-eij.gif
with i and j the cluster numbers (or IDs) to calculate interdependency for, Ni and Nj the number of nodes in each, and eij the number of edges between them.

These two measures are combined to calculate the MQ of the whole graph:

clustering-mq.gif

With k the number of clusters in the graph. (Btw, this is all much better explained in the paper, but this is good enough to get an idea of what's going on).
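
To make the measures concrete in code, here's a minimal Java sketch of the MQ calculation for a directed, unweighted graph with a given clustering. It assumes the simple forms discussed above (Ai = mi / Ni^2, Eij = eij / (2 * Ni * Nj), averaged as shown); check the paper for the authoritative definitions.

public class ModularizationQuality {

    /**
     * Computes MQ for a directed, unweighted graph with a given clustering.
     * clusters[c] holds the node IDs assigned to cluster c (every node that
     * appears in an edge must appear in exactly one cluster);
     * edges[e] = {from, to} lists the directed edges of the graph.
     * Assumes the simple definitions discussed above:
     * Ai = mi / (Ni * Ni) and Eij = eij / (2 * Ni * Nj).
     */
    public static double mq(int[][] clusters, int[][] edges) {
        int k = clusters.length;

        // Map each node to the cluster it belongs to
        int maxNode = 0;
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < clusters[c].length; i++) {
                maxNode = Math.max(maxNode, clusters[c][i]);
            }
        }
        int[] clusterOf = new int[maxNode + 1];
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < clusters[c].length; i++) {
                clusterOf[clusters[c][i]] = c;
            }
        }

        // Count edges inside each cluster (mi) and between clusters (eij)
        int[] intraEdges = new int[k];
        int[][] interEdges = new int[k][k];
        for (int e = 0; e < edges.length; e++) {
            int ci = clusterOf[edges[e][0]];
            int cj = clusterOf[edges[e][1]];
            if (ci == cj) {
                intraEdges[ci]++;
            } else {
                interEdges[ci][cj]++;
            }
        }

        // Intra-connectivity term: average of Ai over all clusters
        double intra = 0.0;
        for (int c = 0; c < k; c++) {
            double n = clusters[c].length;
            intra += intraEdges[c] / (n * n);
        }
        intra /= k;
        if (k == 1) {
            return intra;
        }

        // Inter-connectivity term: average of Eij over all cluster pairs
        double inter = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                double eij = interEdges[i][j] + interEdges[j][i];
                inter += eij / (2.0 * clusters[i].length * clusters[j].length);
            }
        }
        inter /= (k * (k - 1)) / 2.0;

        // Higher MQ means a "better" clustering under these measures
        return intra - inter;
    }
}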

Now, the problem with this calculation is that it depends on a particular graph clustering, which is precisely what we want to find out, since we are starting from a set of relationships with no clustering. We have one advantage though: we know that the MQ function has the property of being higher the "better" the clustering is (according to the measures just described).

So what we need to do is treat this as an optimization problem.

There are a number of sub-optimal techniques to traverse a space of values, including hill climbing, genetic algorithms, and so on. We just need to choose one, with the caveat that the larger the space, the larger the probability that we hit a local maximum (rather than the overall maximum) of the space. This is not a problem for relatively small graphs (<50 nodes); at that size we can even do an exhaustive (i.e., optimal) search of the space.

If all of this sounds a bit iffy, let me demonstrate with an example. Let's say that we've got the following relationship set from the, um, "real world" (heh):

  • Homer and Marge are married, and they have three children, Bart, Lisa and Maggie. Grandpa is Homer's Dad.
  • Homer works with Carl and Lenny.
  • Homer, Barnie and Moe are friends.
  • Bart, Milhaus, and Lisa know Otto.
The relationship graph can be represented simply as a text file, with one relationship per line. (Click here to see the file.) Note: because all the relationships described are bidirectional, we need to describe the connection between each pair of "nodes" twice, i.e., both "Marge Homer" and "Homer Marge" are necessary to describe the relationship.

With the relationship file ready, but no clusters defined in the file, we can now process the graph and see what the optimization process discovers as "clusters". Here is the result (graph visualized using AT&T's dotty tool--ignore the labels for each cluster, they're automatically generated IDs):

cluster-rel-graph.gif

It is easy to underestimate the significance of obtaining this graph automatically, since the "clusters" that we see are obvious to us (if you've watched The Simpsons, that is :)), but keep in mind: the software has no knowledge of the actual groups, just of the relationships between nodes, i.e., individuals. Additionally, there are more complex MQ calculations that involve weights on the node relationships; using different dimensions for different target groups allows creating different clustered views based on them. It wouldn't be hard to adapt this to parse, say, FOAF files and do some pretty interesting things.

This is clearly only a first step, but once reasonable clusters are obtained the software can begin doing more interesting things, such as suggesting which people you should meet (e.g., when someone belongs to the same cluster as you but you don't know them), defining levels of warning for requests (exchanges between individuals in the same cluster would have less friction), etc.

Cool eh? :)

PS: Speaking of clustering, check this out. The clustering in this case is fairly obvious, but for more complicated sets of relationships the technique I describe would also apply in this space. [via Political Wire]

Categories: soft.dev
Posted by diego on February 5, 2004 at 1:53 AM

social software: representing relationships

[Followups to this post here and here]

In all the recent talk about social software (particularly a lot of discussion generated by the release of Orkut, see Ross' Why Orkut doesn't work, Wired's Social nets not making friends, Cory's Toward a non-evil social networking service, Anne's Social Beasts, Zephoria's Venting about Orkut (many good follow-up links at the end of her post as well), David on the identity ownership issues that arise), one of the oft-mentioned points is that these tools force people to define relationships in binary fashion ("Is X your friend? Yes or no.") or along limited one-dimensional axes. Also, a lot of the talk has been attacked as mere bashing of beta services by the "digerati" (particularly in what relates to Orkut), and while there is definitely an element of hype-sickness that contributes to it (felt more by those who see new things every day), I also think that some of these concerns are valid and part of the process of figuring out how to build better software in this space.

Don had an interesting post on Sunday in which he discusses his idea of "Friendship circles" to define relationships. I think this is most definitely an improvement over current binary or one-dimensional approaches (and I think it's quite intuitive too). I do think, though, that relationship maps like these are often multi-dimensional. While Don's approach covers, I'd say, 80-90% of the cases, there will be overlaps where someone might belong to two or three categories, which makes it harder to place someone in a certain section of the circle (with two categories you could at least place someone on the edge where they connect). I see a chooser of this sort as something more along the lines of a Venn diagram, as follows:

rel-diagram.gif
However, Don's diagram has one big, big advantage, which is that it visually relates people to you rather than placing them in categories. That is, with his diagram you can visually define/see the "distance" between a given person and yourself, or in other words, how "close" they are to you, while the approach I'm describing requires that you define in abstracto how you view these people, but not in relation to you.

What I'm describing is thus probably more accurate for some uses (and scalable to self-defined categories, rather than predetermined ones, which would show up as additional circles) but also has more cognitive overload.

This point of "scalability" however is important I think, because it addresses the issue of fixed representation more directly. How so? Well, current "social networking" tools basically force every person in the network to adapt to whatever categories are generally common. Furthermore, they force the parties in a relationship (implicitly) to agree on what their relationship is. I think it's not uncommon that you'd see a person as being, say, an acquaintance, and that person to view you as a friend (if not a close one). People don't always agree on what the relationship means to each other. This to me points to the need to let each person define their own relationship/trust structures and then let the software mesh them seamlessly if possible.

In the end I think that a more accurate representation would be three-dimensional (okay, maybe the most accurate would be n-dimensional, but we can't draw that very well, can we? We always need transformation to 2D planes, at least until 3D displays come along :)). Something that mixes Venn diagrams with trust circles like Don describes.

Needless to say, this is but a tiny clue of a small piece of the puzzle. Whatever solutions we come up with now will be incomplete and just marginally useful, as all our theories (and consequently what we can build with them, such as software) are but a faint, inappropriate (read: linear) reflection of the complexity (read: nonlinearity) that exists in the world.

Another thing that I find interesting about the discussion is that there seems to be an implicit assumption that you'd want to expose all of this information to other people. That's how current tools generally work, but it doesn't mean that you can't selectively expose elements of your relationships/trust circles to certain people and not others (and keep some entirely private). The problem is that this usually requires complex management, and a web interface is not well suited to that. You need rich (read: client-side) UIs, IMO (but that's just me). Client-side software also helps with privacy issues.

We have lots to figure out in this area yet, but we're getting there, inch by inch. Or should it be byte by byte? :)

Categories: soft.dev
Posted by diego on February 3, 2004 at 2:30 PM

web application stress testing

Was thinking about this topic today, and I remembered a few years back I used Microsoft's Web Application Stress Tool. It did the job (simple stuff, nothing terribly complicated), and it was free, if sometimes a little difficult to use properly. Apparently it's not maintained anymore, since the listed version is still compatible only with W2K.

Now, are there any apps out there that people really like for this job, on any platform? What do you use/recommend for web apps stress testing?

Categories: soft.dev
Posted by diego on February 3, 2004 at 1:16 AM

the 2.6 linux kernel

[via Jon] a great article at InfoWorld comparing versions 2.4 and 2.6 of the Linux kernel. Upsides of the new kernel: speed and scalability. Downsides: Not much support for drivers, etc. The benchmark results are really impressive. I guess that it was worth the wait then. :)

Categories: soft.dev
Posted by diego on February 2, 2004 at 4:44 PM

blogtip: pinging technorati and yahoo

A small tip, probably not new to most, but anyway: It is common to have weblog tools "ping" a change-server such as weblogs.com. This is used by blog-oriented search engines to both find your blog and provide faster updates. MovableType includes built-in "ping" support for weblogs.com and blo.gs. However, you can also add your own. Jeremy recently posted how to do it for Yahoo! (very useful now that My Yahoo! supports RSS) and you can do it for Technorati as well, using the information in this page.

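For the curious, the "ping" itself is just a tiny XML-RPC call named weblogUpdates.ping. A minimal sketch in Java of what it looks like on the wire follows; note that the endpoint URL and the blog name/URL are placeholders (use the actual ping URLs documented in the pages linked above), so treat this as an illustration rather than a drop-in tool.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BlogPing {
    public static void main(String[] args) throws Exception {
        // weblogUpdates.ping takes two parameters: the blog's name and its URL
        String payload =
            "<?xml version=\"1.0\"?>" +
            "<methodCall><methodName>weblogUpdates.ping</methodName><params>" +
            "<param><value>Your Blog Name</value></param>" +
            "<param><value>http://yourblog.example.com/</value></param>" +
            "</params></methodCall>";

        // Placeholder endpoint -- substitute the ping URL documented by the service
        URL endpoint = new URL("http://rpc.example.com/ping");
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);
        conn.getOutputStream().write(payload.getBytes("UTF-8"));

        // The XML-RPC response says whether the ping was accepted
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);
        }
        in.close();
    }
}
```
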
When I have time over the next few weeks I'll post a follow up to my introduction to weblogs and introduction to syndication, which have turned out to be quite popular. Sounds like a good idea to write down incrementally which of the more "advanced" topics would be in it. :)

Categories: soft.dev
Posted by diego on February 2, 2004 at 9:11 AM

atomenabled.org

[via Don] Sam released AtomEnabled.org. Missing: a feed to keep up with updates. Site looks great though!

Categories: soft.dev
Posted by diego on January 25, 2004 at 3:02 PM

porting swing to swt: a tutorial, and the Tiger is out

Must check these out when I have the time (in a couple of days? early next week? maybe). They look interesting; both items via Erik. First, a tutorial from IBM DeveloperWorks on porting Swing apps to SWT. It would be a good way to follow up on my first impressions.

And, in other Java-related news (I saw this a couple of days ago, but was reminded again today), a pre-release of Tiger (J2SE 1.5) is now available for testing, very quietly (here's the download page). From the looks of it the installation is still a pain, particularly when juggling multiple JDKs on a single machine (the other day I tried out JBuilder Foundation for a moment and it decided to set itself as the default JDK. Reminds me of the days when media apps like Windows Media, Quicktime and Realplayer used to take control of the registry entries for media formats without asking). Hopefully Sun plans to fix these problems at some point.

Categories: soft.dev
Posted by diego on January 21, 2004 at 11:53 PM

feedexplorer

Over the weekend I released clevercactus feedexplorer, a simple free app to browse the data from the Share Your OPML commons (thanks Dave for making this resource available!) and choose feeds that you find interesting, then save them into OPML files that can be imported into a news aggregator. It runs on Windows, Mac OS X, and other OSes.

Here is the page with installation instructions and a short user guide.

If you can, take a moment to read the user guide as it explains how to change the sorting, perform searches, etc (Btw, I think the UI is pretty self-explanatory, but reading the doc should leave little doubt as to how to do something :)).

Note: if you have any problems with the installation, please take a moment to read the installation page, as it answers common questions and problems.

Another Note: the first time the app loads it will obtain the data from the site, but afterwards it only downloads the changes (through combined use of the ETag and Last-Modified HTTP headers with the data in the main feed, which also includes change dates). Additionally, transfers use GZIP compression to minimize both server load and download times.

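For those curious about the mechanics, this is a minimal sketch of the conditional-GET-plus-GZIP pattern described in the note above. It is not feedexplorer's actual code; the URL is a placeholder and the persistence of the validators is left out.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class ConditionalFetch {
    // cachedEtag/cachedLastModified are whatever was saved from the previous fetch (may be null)
    static InputStream fetch(String url, String cachedEtag, String cachedLastModified)
            throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");  // ask for compressed transfers
        if (cachedEtag != null) {
            conn.setRequestProperty("If-None-Match", cachedEtag);
        }
        if (cachedLastModified != null) {
            conn.setRequestProperty("If-Modified-Since", cachedLastModified);
        }

        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return null;  // 304: nothing changed, keep the local copy
        }

        // Remember the new validators for next time (persist these somewhere)
        String newEtag = conn.getHeaderField("ETag");
        String newLastModified = conn.getHeaderField("Last-Modified");
        System.out.println("New validators: " + newEtag + " / " + newLastModified);

        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);  // server honored Accept-Encoding: gzip
        }
        return in;  // parse the updated data from this stream
    }

    public static void main(String[] args) throws Exception {
        InputStream data = fetch("http://example.com/data.xml", null, null);
        System.out.println(data == null ? "not modified" : "got new data");
    }
}
```
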
Yet Another Note: I find the incremental search function to be strangely mesmerizing. :)

Screenshots
Here are a couple of screenshots (click on the images to see a larger version).
feedexplorer running on WinXP:

sshot-small-winxp.png

And under Mac OS X (thanks to Erik for the image):

sshot-small-macosx.png

A bit of background
As I noted a few days ago, Dave had released Share Your OPML. After that he released an SDK to allow others to tap into the data and provide new applications. I had an idea last week for an app that would use the data, but was too busy to do it. Finally, on Saturday morning I decided to let off some steam by coding something else, and this application seemed like a good idea: it had a simple goal, and I could do it quickly. In the end it took me about three hours to write the app, and a couple of hours more to finish the docs and the install pages. :)

So what's the idea?
The idea is that you can peruse subscription lists in two ways: one, by starting from a feed and seeing who subscribes to it; and two, by looking at the people who have shared their subscription lists and seeing what they subscribe to individually. As you look through the list you can choose feeds you find interesting and add them to your own subscription list, which you can then save into an OPML file on your local hard drive to import into your news aggregator.

Now, there are other ways of getting subscription lists, but what I found interesting about this dataset is that it tells you who's reading what, which maybe leads you to find feeds (that you'd otherwise not look at) simply because someone you know is reading them--sort of an implicit recommendation system. If your own feed is listed, you can find out some of the people who are subscribing to you.

And what's feedexplorer have to do with clevercactus pro?
Well, eventually functionality like this would be added to cc pro. I think that it would make it easier for people to subscribe to feeds within the app, find out what's going on in the blogsphere, etc. feedexplorer, however, stands on its own as a simple, free utility that is generic and not tied to any particular product.

So that's it! And, as usual, comments welcome (if I have to close the comment section of this entry due to spam, you can always send me an email).

Categories: soft.dev
Posted by diego on January 19, 2004 at 3:34 PM

postel's law is for implementors, not designers

Another discussion that recently flared up (again) is regarding the applicability of constraints within specifications, more specifically (heh) of constraints that should or should not be placed in the Atom API. The first I heard about this was through this post on Mark's weblog, where among other things he says:

Another entire class of unhelpful suggestions that seems to pop up on a regular basis is unproductive mandates about how producers can produce Atom feeds, or how clients can consume them. Things like “let’s mandate that feeds can’t use CDATA blocks” (runs contrary to the XML specification), or “let’s mandate that feeds can’t contain processing instructions” (technically possible, but to what purpose?), or “let’s mandate that clients can only consume feeds with conforming XML parsers”.

This last one is interesting, in that it tries to wish away Postel’s Law (originally stated in RFC 793 as “be conservative in what you do, be liberal in what you accept from others”). Various people have tried to mandate this principle out of existence, some going so far as to claim that Postel’s Law should not apply to XML, because (apparently) the three letters “X”, “M”, and “L” are a magical combination that signal a glorious revolution that somehow overturns the fundamental principles of interoperability.

There are no exceptions to Postel’s Law. Anyone who tries to tell you differently is probably a client-side developer who wants the entire world to change so that their life might be 0.00001% easier. The world doesn’t work that way.

Mark then goes on to describe the ability of his ultra-liberal feed parser to handle different types of RSS, RDF and Atom. (Note: I do agree with Mark that CDATA statements should be permitted, as per the XML spec.) In fact I do agree with Mark's statement that there are no exceptions to Postel's Law, but I don't agree with the context in which he applies it.

Today, Dave points to a message on the Atom-syntax mailing list where Bob Wyman gives his view on the barriers created by the "ultra-liberal" approach to specifications, using HTML as an example.

I italicized the word "specifications" because I think there's a disconnect in the discussion here, and the context in which Postel's Law is being applied is at the center of it.

As I understand it, Mark is saying that writing down constraints in the Atom spec (or any other for that matter) is something to be avoided when possible, because people will do whatever they want anyway, and it's not a big deal (and he gives his parser as an example). But whether his parser or any other can deal with anything you throw at it is beside the point I think, or rather it proves that Postel's law is properly applied to implementation, but it doesn't prove that it applies to design.

Mark quotes the original expression of Postel's Law in RFC 793, but his quote is incomplete. Here is the full quote:

2.10. Robustness Principle

TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.

(my emphasis). The comment in the RFC clearly states that implementations will be flexible, not the spec itself. I agree with Mark's statement: there are no exceptions to Postel's law. But I disagree with how he applies it, because it doesn't affect design, but rather implementation.

Getting a little bit into the semantics of things, I think it's interesting to note that placing a comment like that in the RFC is actually defining accepted practice (dealing with reality rather than the abstractions of the spec) and so it is a constraint (a constraint that requires you to accept anything, rather than reject it, is nevertheless a constraint). So the fact that this "Robustness principle" is within that particular RFC shows, by example, that placing constraints is a good idea.

Implementations can and often do differ from specs, unintentionally (i.e., because of a bug) or otherwise. But the fewer constraints there are in a spec, the easier it is to get away with extensions that kill interoperability. So I don't think it's bad to say what's "within spec" and what is not within spec. Saying flat-out that "constraints are bad" is not a good idea IMO.

One example of a reasonable constraint that I think would be useful for Atom would be to say that if an entry's content is not text or HTML/XHTML (e.g., it's a Word document, something that as far as I can see could be done in an Atom feed according to the current spec) then the feed must provide the equivalent content in plain text or HTML. Sure, it might happen that someone starts serving Word documents, but they'd be clearly disregarding the spec, and so taking a big chance. Maybe they can pull it off. Just as Netscape introduced new tags that they liked when they had 80 or 90% market share. But when that happened, no one had any doubts that using those tags was "non-standard". And that's a plus I think.

So, my opinion in a nutshell: constraints are good. The more things can be defined with the agreement of those involved, the better, since once something is "out in the wild" accepted practices emerge and the ability to place new constraints (e.g., to fix problems) becomes more limited, as we all know.

What I would say, then, is: Postel's law has no exceptions, but it applies to implementation, not design.

Categories: soft.dev
Posted by diego on January 11, 2004 at 2:23 PM

from components to modules

Right now I'm refactoring/rebuilding the user interface of a new release coming out soon (oh right... Note to self: talk about that) and I'm facing the fight against "sticky" APIs. Or, in more technical terms, their coupling.

Ideally, a certain component set that is self-contained (say, an HTML component) will be isolated from other components at the same level. This makes it simpler, easier to maintain and, contrary to what one might think, often faster. While I was at Drexel, at the Software Engineering Research Group, I did work on source code analysis, studying things like automatic clustering (paper) of software systems, that is, creating software that was able to infer the modules present in a source code base using API cross-references as a basis. Since then I've always been aware (more than I was before that, that is) of the subtle pull created by API references.

The holy grail in this sense is, for me, to create applications that are built of fully interchangeable pieces, that connect dynamically at runtime, thus avoiding compile-time dependencies. In theory, we have many ways of achieving this decoupling between components or component sets; in practice there are some barriers that make it hard to get it right the first time. Or the second. Or...

First, the most common ways of achieving component decoupling are:

  1. Through data: usually this means a configuration file, but it could be a database or whatever else is editable post-compilation. This is one of the reasons why XML is so important, btw.
  2. Through dynamic binding: that is, references "by name" to classes or methods. This is useful mostly with OO languages, as you'll generally end up dynamically instantiating a concrete class by name and then using an interface (or superclass) to access the underlying object without losing generality (and thus without increasing coupling). A minimal sketch combining both approaches follows this list.
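
Here is a minimal sketch of how 1 and 2 combine in Java. All the names (Viewer, HtmlViewer, viewer.properties) are made up for illustration: the concrete class name lives in a config file, and the caller only ever touches the interface, so implementations can be swapped without recompiling the caller.

```java
import java.io.FileInputStream;
import java.util.Properties;

interface Viewer {
    void render(String content);
}

class HtmlViewer implements Viewer {
    public void render(String content) {
        System.out.println("rendering as HTML: " + content);
    }
}

public class ViewerLoader {
    public static void main(String[] args) throws Exception {
        // 1. Decoupling through data: which implementation to use lives outside the code
        Properties config = new Properties();
        config.load(new FileInputStream("viewer.properties")); // e.g. viewer.class=HtmlViewer
        String className = config.getProperty("viewer.class", "HtmlViewer");

        // 2. Decoupling through dynamic binding: instantiate by name, use via the interface
        Viewer viewer = (Viewer) Class.forName(className).newInstance();
        viewer.render("<b>hello</b>");
    }
}
```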

Achieving decoupling in non-UI components is not too difficult (the data model has to be flexible enough, though; see below). But UIs are almost by definition something that pulls together all the components of a program so they can be used or managed. The UI references (almost) everything else by necessity, directly or indirectly, and visual components affect each other (say, a list on the left that changes what you see on the right).

In my experience, MVC is an absolute necessity to achieve at least a minimal level of decoupling. Going further is possible by using data (i.e., config files) to connect dynamically loaded visual components, which removes the coupling created at the UI level, but that is difficult to achieve because it complicates the initial development process (with dynamically loaded components, bugs become more difficult to track, the build process is more complex, etc.) and because development tools in general deal with code units (e.g., classes, or source files) rather than with modules. They go from a fine-grained view of a system (say, a class or even a method) to a project, with little in between. We are left with separating files into directories to make a project manageable, which is kind of crazy when you think how far we've come in other areas, particularly in recent years.

The process then becomes iterative, one of achieving higher degrees of decoupling on each release. One thing I've found: the underlying data model of the application has to be flexible enough, completely isolated (as a module) and relatively abstract, not just so it can evolve itself but also to allow the developer to change everything that's "on top" of it and improve the structure of the application without affecting users, etc.

Yes, this is relatively "common knowledge", but I'm a bit frustrated at the moment because I know how things "should be" structured in the code I'm working on but I also know that time is limited, so I make some improvements and move on, leaving the rest for the next release.

Final thought: Until major development tools fully incorporate the concept of modules into their operation (and I mean going beyond the lame use of, for example, things like Java packages in today's Java tools), until they treat a piece of user interface as more than a source file (so far, all of the UI designers I've seen maintain a pretty strict correspondence between a UI design "form" and a single file/class/whatever that references everything else), it will be difficult to get things right on the first try.

Categories: soft.dev
Posted by diego on January 11, 2004 at 10:17 AM

weird referer

Recently (but I just noticed it today) I started getting HTTP referers that are a variation of the following: "XXXX:+++++++++++++++++++++++" (the number of plus signs varies). A Google search with appropriate terms quickly turned up discussions like this one that suggest that the referer is someone using an anonymizer or Internet security product of some kind. Without that information it smells like an attempt at an exploit of some kind... but of what kind (and if so, one I've never heard of)?

Does anyone know about this? Has anyone else seen it? I'm curious. :)

Categories: soft.dev
Posted by diego on January 7, 2004 at 12:59 AM

spambots get smarter

Since today started as a "spam kind of day"...

Something I noticed over the last few weeks is that I've started to receive spam that is way more targeted than before. In what sense?

Well, let's say this: I'm getting spam that not only knows my full name, but also my address. Okay, not my current address, but I've already gotten spam that explicitly mentions both my New York address (from 5+ years ago) and my SF Bay address (from 2+ years ago). This is bad: not only do they know my email address, they also know where I live(d)! Yes, we know that with time and money you can get a lot of information on anyone, but this has to be done automatically and massively, or otherwise it wouldn't be a practical option for spammers.

Clearly, one way this could happen is if someone (say, buy.com) has been selling their customer information. Since I usually take care to buy online only when my privacy is more or less protected, this is unlikely, though certainly possible.

There's a more likely way in which this connection was made: Google.

Google not only knows the web, it also knows other information... like phone numbers (at least in the US). Jon mentioned this some time ago.

A spambot to get "connected information" would work like this. Say you write an automated script to go through phone numbers on Google. The script takes the address data and the person's name, and then googles the person's name. It takes the first few results (or maybe only the first one) and scans the resulting pages to match an email address to the person's name. Sure, this won't be 100% correct, but spammers don't care about that. And Google's reach makes it reasonable to think that you'd get a fairly high hit rate. You could even write a program that uses the Google API for it.

Sure, we could say, as Scott McNealy does, that "your privacy is gone, get over it". Even if you agree with that statement (and I don't, at least I want to resist it!), this is nevertheless disturbing. And the question that follows is: does Google have any responsibility for this? They'd probably say that they're providing a service by integrating yellow pages information, which would be true.

I'm not picking on Google; rather, Google is the example here because of its reach and pervasiveness. I'm sure that similar things can be done with other search engines, and if not, it won't be long before they can. Can we fix this at all? If so, how?

Since this is the tip of the iceberg, my main thought at the moment is that I'm a character from Lost in Space and all I hear is "Danger Will Robinson! Danger!".

Categories: personal, soft.dev, technology
Posted by diego on January 6, 2004 at 5:10 PM

swt and swing, cont'd.

Yesterday Russ was ranting (his term :)) about how Sun was botching it by not getting behind SWT, because SWT is, in his view, better than Swing. I have written about both a few times before, more recently in this short review of my initial impressions of developing with SWT, and earlier here, here and here among others. Specifically on what Russ is saying, I have a couple of things to add. One is that, although I'm obviously partial on this :), I think that clevercactus shows that Swing interfaces need not feel out of place, or be slow, or whatever. And I think it looks better than LimeWire too :). IDEA is also a fine example IMO. However, it's true that all of that is subjective and that for hard-core Windows users there are small differences. For power users in particular the differences might indeed be difficult to accept. The situation is much better on other platforms though.

That aside, there is the other matter that Russ mentions, that of Sun not joining the Eclipse consortium. The main reason given for this is that, for all its platform appeal, Eclipse is still, at heart, an IDE toolkit. If you doubt that's true, spend some time perusing the Eclipse APIs, and you'll notice how many times you have to use components from within the IDE packages rather than "platform" packages (e.g., "org.eclipse.swt"). Restated, what I mean is that the boundaries between platform and IDE APIs are not clear at all. I guess some people would say that's precisely the point: Eclipse is both an IDE and a platform, and that's fine. Fine indeed, but what does that matter? Well, keep in mind that Sun has NetBeans to take care of, with its own community, and plugins, and additional tools, and so on. Were Sun to ditch NetBeans in favor of Eclipse as a platform, they would have to port all sorts of plugins and code to the new platform, not to mention "convert" their community, both open source developers and third-party developers, to Eclipse. This is by no means impossible, but it's not easy either.

Then there is the small matter of SWT. If Sun joined Eclipse, SWT would have to be included in the JDK, would it not? Sun would have to maintain and release three different windowing toolkits simultaneously with each release: AWT, Swing, and SWT. That doesn't sound good either. And while I like some things about SWT, ditching Swing completely is, to me, not an option.

Why?

First, Swing does run on every single platform that the full JDK runs on. For example, some users today are running clevercactus on OS/2. That would be impossible if cc were written in SWT.

Second, Swing is, for all its complexity (or perhaps because of it), an incredibly rich and flexible toolkit. Much more so than SWT. Surely this will change as SWT evolves, but that's the reality at the moment. With SWT you are forced to write custom components more often than with Swing, as I discovered when I worked for about a week replicating the clevercactus UI using SWT.

And, finally (although this is a small matter compared to the two above), SWT still requires releasing resources "by hand". I find this a horrible step back. Moreover, debugging becomes more difficult. Something might fail not just in your Java code, or in the SWT-to-native layer (say, if you're running it on Windows), but also at the native component level. Suddenly bugs have to be tracked on three levels. SWT will be buggy for a while, particularly on non-Win32 platforms (Win32 support is pretty good). And native errors are very difficult to pin down.

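To illustrate what "by hand" means, here is a minimal SWT sketch (assuming the org.eclipse.swt libraries are on the classpath, and not taken from any real app): graphics resources like Color wrap native handles, and forgetting the dispose() calls at the end leaks OS resources -- something a Swing java.awt.Color never asks of you.

```java
import org.eclipse.swt.SWT;
import org.eclipse.swt.graphics.Color;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Label;
import org.eclipse.swt.widgets.Shell;

public class DisposeExample {
    public static void main(String[] args) {
        Display display = new Display();
        Shell shell = new Shell(display);
        // Colors (like Fonts and Images) wrap native handles and must be
        // released explicitly when no longer needed.
        Color red = new Color(display, 255, 0, 0);
        Label label = new Label(shell, SWT.NONE);
        label.setText("hello");
        label.setForeground(red);
        shell.pack();
        shell.open();
        while (!shell.isDisposed()) {
            if (!display.readAndDispatch()) display.sleep();
        }
        red.dispose();      // release the native resource by hand
        display.dispose();
    }
}
```
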
Please note, these are not reasons why "Swing is better than SWT" but reasons why I think Swing can't be discarded at the moment and for some time to come. And that puts Sun in a difficult position.

Ideally, yes, Sun would join Eclipse, ditch AWT in favor of SWT, keeping the latter as an alternative to Swing, and use something like the SWTSwing project to bridge the two worlds. But for the moment, staying out of Eclipse might have been a good choice by Sun to avoid creating even more confusion.

Update: more news today on Sun's efforts regarding standardization on the non-Eclipse side of development tools.

Categories: soft.dev
Posted by diego on January 6, 2004 at 12:40 PM

quick mobi & java links

Erik is not feeling well; hope you get better soon, Erik! I've been fighting a cold for the last week, but I've kept it at bay. Hopefully things are going uphill now. Anyway, here's a mini-list of quick links, Erik-style. Paltry compared to the deep-wide filtering Erik does. It's good I don't do this every day. :-)

Ok. That's it for now. Back to work. :)

Categories: soft.dev
Posted by diego on December 5, 2003 at 1:34 PM

the atom discussion heats up again

I'm listening to Sunday Bloody Sunday Live at Slane 2001, and there's Bono saying "Compromise: Another dirty word. Compromise."

Ok, enough with the hyperbole. Here it goes...

I've been pretty busy through the day (damn, actually I just looked at the time and I should say yesterday), but just now I checked and another blog-firestorm is developing. And once again, the discussion seems to be close to turning into a pile of rubble.

Deja-vu.

I've been seeing some things that were not apparent to me when the Atom process started back in late June. And I think that Don's idea is good: given the current situation, it would be preferable if Atom adopted RSS as its feed format.

I know that many have said that Atom sacrificed backward compatibility for the sake of more flexibility in the future, but looking at the current spec I can't see clearly where this additional flexibility is obtained. I'd like to see an example of a feature that can be done with Atom but not with RSS 2.0. This would go a long way toward making me (and, I'm sure, others) understand more clearly why we should revise our position.

True, it is highly unlikely that RSS embedded in Atom will happen--positions seem to be too entrenched for that. Blogger will probably release soon. MT is sure to follow. But Don is right: at least we can make our views known. Of course, I contributed to this in my own small way. What can I say: my position in July might have been reasonable, but that's no longer the case. Here's why.

First, the background.

Things flared up again yesterday, when Robert pointed out that Evan had posted a link to his Atom feed, saying "(generated by Blogger)" and nothing more. This led Robert to ask why a new syndication format was necessary, and why Microsoft shouldn't just develop its own. (This last thing was half in jest, as I understand it.) This in turn created a major discussion on his entry, with lots of different participants, but very few posts by the major stakeholders in Atom. Then Don posted some thoughts and Dave put forth his opinion. I think, as I posted in the comments, that the issue was not necessarily whether the format was going to be used by Blogger or not, but rather that Blogger was not giving any context for what was happening or explaining clearly what the path forward was (more on that below), which led to speculation and some fiery responses.

When Atom began I was for it: as I had noted, the API situation in blogland was not good, and Atom pointed to a solution. I still think that a unified API would be a step forward, and I am protocol-agnostic (XML-RPC, REST, SOAP--I might have my preference but mainly I just care that everyone agrees to support it). Then it became clear that Atom would also redefine the syndication format, and I said that shouldn't be a problem (see here and here). But then, over the next couple of months, things changed.

Changed how?

  • The first thing that changed is that I noticed that some people had enough time to spend in the Wiki to "out-comment" anyone else. At the same time, the format was quickly evolving, in at least one instance changing completely from one day to the next. There was no clear process to how decisions were made, and voting on different things was repeatedly delayed and in some cases (such as naming) ignored and/or set aside. I put forth my opinion more than once (for example, here and here, and others, like Russ, expressed similar ideas) that someone should take charge and responsibility for the ultimate decisions made: as it stood (and as it stands), the process is opaque, and the Wiki didn't (doesn't) help matters. A mailing list got started, and though I did not subscribe to it I kept updated by reading the archives. The truth is that I didn't subscribe because (rightly or wrongly on my part) I felt things were happening without me being able to contribute anything of value. Sam once pointed out to me through email that the spec was influenced by "running code" more than by words, but even though I was one of the first people to add Atom support to an RSS reader (as well as adding it later for other things, like the Java Google-Feeds bridge), there was no effect from that either, even though I was engaged in the discussion at the time I was working on the code. It didn't matter one bit. This is not a question of me not "getting my way"; it's a question of civility and of giving real answers to questions, of giving real-world examples instead of going off on theoretical tangents, and of giving reasons instead of saying "your idea is ridiculous" and leaving it at that.
  • Microsoft and others (e.g. AOL) are now in the game. Had Atom converged on a spec within four weeks, we might be talking about something different today. Instead, it's nearly six months after it started and the spec is still at 0.3 (although the newest "full spec" I could find is 0.2), with no clear explanation of why 0.2 was declared 0.2, of what the reasons were for choosing A over B (which was supposed to be one of the pluses of the Atom process), and so on. Blogger is coming out with 0.3, according to Jason (he mentioned this in the comments to Robert's entry). Robert's in-jest "threat" of "why shouldn't Microsoft do its own format" is a very real concern that anyone should have. There is a comment here that says: "When people ask 'why Atom', Atom's answer should be 'because we can'." Microsoft could rightly say exactly the same thing.
  • One big concern some Atom backers had was that Dave had control over the RSS spec (this is still being mentioned today). Dave disputed this all along, but right now it's irrelevant: this claim should have changed (but it didn't) when Dave gave control of the spec to Berkman (on which I also commented here). In fact, it's been said over and over that it was "too little, too late". But Atom feeds (note: feeds, not the API) don't provide anything that cannot be done today with RSS 2 (i.e., by including namespaces). If the Atom feed format is still at 0.2 or 0.3, and even that has taken 5 months to define, is that not "too little, too late" as well?
  • The Atom camp started as a genial group of people wanting to improve things. But it has turned ugly. Some of its defenders at the moment are resorting to anonymous comments that say that a) RSS is dead and b) you're either on the Atom bandwagon or you will be left behind in your poor little RSS world. Regardless of the truth of those statements, I find it worrying that constructive criticism or a clear-minded defense of a belief in a certain direction has given way to (anonymous) aggression. Instead of including people like me, anyone who doesn't agree is attacked. But "evangelizing" is important as well as useful: communicating what's being done helps developers churn out better code and create better conditions for users, and it's difficult for this to happen if developers are attacked when they ask questions. This is not a recipe for friendliness from developers and users alike. Which brings me to my final point.
  • One might always find people who resort to aggression, and obviously anyone can lose it once in a while. But I think that if Blogger and MT were to clearly spell out their position, saying why they want another format, why (or whether) they'd like to replace RSS, and, perhaps more importantly, what evolution path they have planned, then it would be easier to get above the noise. At a minimum it would be easier to, based on that position, take it or leave it. I tend to think that it's completely within a company's or an individual's right to scrap something and start again. I might not like it, but they should be able to do it and let the marketplace have a go at it. Maybe it works! But when there's an installed user base, and developers that have a stake in things, not giving clear information or plans makes me, at least, wary. By not saying anything, by not participating in the discussion a lot more than they do today, the main Atom stakeholders are allowing others to define what they mean, others that might or might not reflect their true intent. No one can speak for Blogger or MT; they have to do it for themselves. I pointed this out in the comments to Robert's entry, when Mark was giving some reasons that were fine in themselves (that is, you might not agree, but that's not the point) but they were Mark's reasons, not Blogger's. As it is, one side is asking questions but no one is replying on the other end.


I simply don't understand why, if we are building communication tools, it appears that there's a lot of talk back and forth, but no real communication is happening.

Things can get better. The question is, Will we try? Given how things are, I think that just having a reasonable conversation would be a big step forward.

Categories: soft.dev, technology
Posted by diego on December 4, 2003 at 1:28 AM

XAML and... Swing

Let's see. There's this new language+API. It is, in theory, platform independent. It's pretty high level. Below the high-level description, it runs on top of a virtual machine. It's verbose. Some people say it will never work.

Gotta be Swing, right?

How about XAML?

On Saturday Sam commented on a XAML example. He makes a number of good points. Which jump-started earlier XAML-related musings.

XAML will be Windows-only, so in that sense the comparison is stretched. But this is a matter of practice; in theory an XML-based language could be made portable (when there's a will there's a way). XAML was compared a lot to Mozilla's XUL, and rightly so, but I think there are some parallels between it and Swing as well.

One big difference XAML will have, for sure, is a nice UI designer, something that Swing still lacks. On the other hand, I think that whatever code an automated designer generates will be horribly bloated. And who will be able to write XAML by hand? And: the problem of "bytecode protection" in Java comes back with XAML, but with a vengeance. How will the code be protected? Obfuscation of XML code? Really? How would it be validated then? And why hasn't anyone talked about this?

And another thing: Sun has shown in the past few years that they've taken a liking to countering Microsoft announcements with some of their own. I.e., MS comes out with web services, they come out with web services. MS does X, Sun does it too, but in Java. One wish: that Sun would ignore XAML and just continue improving Swing, and create a simple, good UI designer for it. Supposedly Project Rave will do this... but here's hoping there won't be any course corrections simply to show up Microsoft. Please, pretty please, Sun.

On a related note, Robert says this regarding XAML:

[...] you will see some business build two sites: one in HTML and one in XAML. Why? Because they'll be able to offer their customers experiences that are impossible to deliver in HTML.
Come on, Robert, these days, when everyone's resources are stretched to the limit, when CIOs want to squeeze every possible drop of code from their people, when everyone works 60-hour weeks as a matter of common practice, are you seriously saying that companies will have two teams to develop a single website? Is this Microsoft's selling point? "Here, just retrain all of your people, and double the size and expense of your development team, and you'll be fine."

Of course not. Most companies will have one team, not two. Hence, logically, either people will use it or won't, without a lot of middle ground in between. That leaves two possibilities: 1) XAML will be niche and never really used a lot (think ActiveX, or, hey, even Java Applets!) or 2) XAML will kill HTML.

Which one do you think Microsoft is betting on?

Categories: soft.dev
Posted by diego on December 1, 2003 at 12:09 PM

lopica

Looking for Java Web Start-related things I found Lopica. Tons of information on JWS and related things. Very, very useful.

Categories: soft.dev
Posted by diego on November 28, 2003 at 11:32 PM

blogstax

Don Park has released BLogStAX, a StAX-based RSS 2.0 parser. Very cool.

Categories: soft.dev
Posted by diego on November 28, 2003 at 2:24 AM

on Microsoft: a walk down memory lane (aka wading through the Byte archives)

"But it may be that although the senses sometimes deceive us concerning things which are hardly perceptible, or very far away, there are yet many others to be met with as to which we cannot reasonably have any doubt, although we recognise them by their means. For example, there is the fact that I am here, seated by the fire, attired in a dressing gown, having this paper in my hands and other similar matters."

Rene Descartes, Meditations On First Philosophy, 1641.

Wow! Quoting Descartes! This must be good!

Actually, that was probably the high point of this post. :) But it does tie in with what I wanted to talk about; it's not that I quote philosophers I don't agree with at all just for kicks.

Besides, I am not sitting by the fire (the warm glow of the LCD doesn't count, I'm sure), and I am not attired in a dressing gown (now that's a thought! Who needs WiFi or distributed object systems? Dressing gowns for all! Forget high tech!).

Boy, am I a riot tonight.

What was the point again? Oh, right. Longhorn, Microsoft, and that other magic word that for tech people, for a short period of time, became more than the name of the capital of Egypt and started to embody The Future (in Technicolor). Those were the days, when Microsoft OSes were named after cities--remember Chicago and Daytona? (Win95 and WinNT 3.5 respectively.) How about Memphis? (Win98). And by the way, since Cairo ended up being NT 4, Memphis clearly did not point to the city in Tennessee, the city of the Blues and of the Civil Rights struggles of the 60's, but rather was a reference to the ancient capital of Egypt. Memphis and Cairo, the old and the new. City names were cool.

Certainly better than Longhorn. I mean, come on!

(Yes, I anticipate running for cover when people start explaining what it's in reference to--I'm sure there's a reason.)

Anyway, it's not about the names, although that's kind of interesting. This is about the technology (or I think it is), as I'll try to explain below.

Through the afternoon I kept thinking about the long-winded history of all the "innovation" that will be showered on us in a couple of years when Longhorn is released, particularly all the brouhaha surrounding its object filesystem and such.

Descartes came to mind at first because I remembered his leisurely attitude when writing his Meditations. Throughout the text, Descartes repeatedly goes back to the whole sitting-by-the-fire thing to use as an example and so on, impressing us with his (to me, flawed) logic, but he also tends to create the unwelcome impression that he's basically just a well-off guy (consider when he wrote it...and the conditions in general back then) with too much time on his hands.

Both that situation and the quote have some parallels to what Microsoft is doing, methinks.

The Cairo-Longhorn connection has been raised before (I remember seeing it mentioned in at least one weblog recently). This is not new. But the similarities are just so startling that it's interesting to take a closer look.

Part of what I thought about were those excellent articles in Byte through which I gathered a lot of useful information (PC Magazine was always crap as far as I could see, except for their lab tests). I started wading through the Byte print archives, looking for some of those articles.

Let's begin with this one (which, as I remember, wasn't an article but a box in a bigger section on OO technology) entitled Signs to Cairo. Choice quote:

Now peek into the future. The top level will no longer be a separate application such as PowerPoint, but the Cairo desktop itself. The streams comprising the compound document will no longer be inside a DOS file allocation table (FAT) file system. Cairo's Object File System (OFS) makes the whole hard disk a single huge docfile that exposes its internal objects to the user.

That was in November, 1995. Eight years ago.

And another one, from Jon Udell (now at InfoWorld): Exploring Chicago and Daytona:

In Daytona's successor, Cairo, OLE structured storage will be able to attach to, and extend, the file system. As the Explorer navigates from a file store into an object store, control will be transferred from Explorer's viewer to an object-supplied viewer. Object internals won't be stored in user-visible directory structures, so users won't trip over them.

And more, from Inside the Mind of Microsoft:

OLE DB, the newest member of the OLE family, interfaces OLE to multiple databases. Among them is Microsoft's future object-oriented file system for Windows NT (see the sidebar "A Peek at OFS"). Ultimately, we could be looking at a distributed file system based on this technology.

Almost all this technology is expected to converge in Cairo. By then, 16- or 24-MB systems will be the baseline, so hardware shouldn't be a limitation. Cairo will inherit desirable features from Windows 95 and Memphis, until finally the day arrives when Microsoft can offer a single OS to all desktop users.

Even more interesting is Cairo Inside, an article from 1996 (with the tagline "An object-oriented, next-generation operating system called Cairo may never ship. However, future versions of Windows NT will enjoy the fruits of the Cairo development effort."). Most interesting of all on this page is the following description of Cairo's OFS:

Lets you create a pseudodirectory that unifies local, network, and Internet files.
Internet files. Interesting, no?

Even when it was clear that Cairo would never be "Object Oriented" at all, it was still commonly described as "Microsoft's Next Generation Object Oriented OS". This is almost certainly due to the fact that NextStep was seen as the coolest thing around and it was, well, yes, truly Object Oriented.

Now, this is another Byte column from Jon (hosted at his site), from 1999, when NT4 had been released for some time, and the promises of Cairo's OO attitude were just a memory: From Virtual Memory to Object Storage. Quote:

MS Cairo was headed in this direction, and I saw early demos of some of these ideas back in 1993.
By the way, there's something about those articles that makes them fascinating to read, even now that their vaporware roots have been exposed in all their guts and glory.

Now, these people were not hallucinating, even though the breathable air at COMDEX was probably dwindling by then. This is what they were told by Microsoft. This was the promise.

I think about what Dave said in the comments to my post about how the browser is not the web: "it's Microsoft's dream to turn the clock back to 1993, the end of their brief period of total and utter domination of the computing world," and the history shows that the parallels are quite striking.

Specifically regarding that post, Robert replied by saying that there was an RSS aggregator in Windows among other things, a show of support for the web/openness in part, but even though that's cool, I don't see how it changes the potential ambitions that MS might have. (By the way, on that particular point of whether the browser is the web or not, for more viewpoints check out Karlin, who agreed (if briefly :)), Dare, who agreed only partially, and Christian, writing in Lockergnome, who did not agree with me).

Now, I was basically saying that Robert's argument that RSS was taking him away from the web was flawed. Dare's answer in particular was trying to nail down a pretty exact definition, but I don't think that's what's at stake here (even though it's of course useful and important).

Comparing Microsoft's proclamations of 10 years ago with those of today makes the little hairs on the back of my neck stand up. Notice how everything seemed to be going along this exact same path that MS is on today (the parallels don't stop with OFS) until the Internet happened. Then MS had to divert itself for a few years to crush Netscape and so on, and now it's back to the old game.

But what is the old game really? "Providing a better user experience," Microsoft will say. "Taking over the world," others will say. I personally think it's a mix of both :).

And, setting the arguments aside for a moment: what happened to all of this technology? Why didn't OFS ship, if they had demos going back to 1993? It would be interesting to know, just for historical reasons; it's weird that all of this just vanished. Maybe portions of it eventually made it into the product, but certainly not the whole enchilada.

As far as the reason for not seeing it through, my theory is that (as I said) what happened was the Internet. Suddenly OO wasn't all that hot anymore, and why deliver technology that doesn't let you create cool brochures?

Okay, I'm being a bit flip here. Seriously now: to anyone who might say that the technology could not be built... please. Microsoft is one of the top engineering organizations in the world. NeXT could do it. Why not Microsoft? The only reasonable explanation, as far as I can see, would be a realignment of priorities and the consequent starving of resources that goes along with it (which is what killed both OpenDoc and Taligent, for example). Which is all well and good.

But then the question is: could it happen again?

Probably not--then again, never say never.

Just to close with some constructive criticism, since Robert's spirited defense (though a bit flawed in my opinion) deserves it.

We've heard a lot about how Microsoft intends to push these new technologies. Ok. But I'd also like to hear what, exactly, they will do to strengthen the Web and its foundations. I think that right now there's a lot of uncertainty because all of these new technologies seem to imply that Microsoft is back to its old tricks after the brush with the DOJ and the European Commission (something that's still not over yet). But I think that a lot of people would give Microsoft a chance if they announced, publicly and clearly, that they will commit to respecting and supporting web standards. Examples: That Microsoft Word will stop generating HTML files that look terrible on browsers whose only problem is that they're not IE. That a future Microsoft blogging tool, if any, will not start embedding MS Office documents and such in the middle of RSS files by default (users embedding them at their own whim is another matter). And so on.

Put another way (hopelessly idealistic as all of this may sound): I think that a lot of people would give Microsoft a chance if they made it clear that they will do a good job of supporting web standards for both reading and writing; that most people would give Microsoft a chance if they came up with all the innovation they liked, but didn't force it on anyone, and just played fair on web standards; and that most people would accept the challenge if Microsoft, for once, really stood up to competition based on product quality rather than on leveraging their market share.

I know I would.

Categories: soft.dev
Posted by diego on November 20, 2003 at 12:40 AM

back to windows (for now) part 3

Okay, things are more or less back to normal. Installed most -- not all -- of the software I need, and definitely everything I need to run the tests (now completed as well). Two things. One, once I finished I got this nagging feeling in the back of my head, a little voice telling me that, if I could go through a Windows install, or any install, in "automatic" mode, trying things until they worked, more or less with all the answers... then something was wrong. I should be using brain cells for more useful stuff than this. Oh well.

The second thing was that as soon as I was finished I experienced a moment of total confusion, as in "Now, what the hell was I doing?". The install became an end in itself. Not that it can be avoided when you have to be doing so many things just to get stuff to work normally.

A bunch of comments came in on the other two posts; I will get to them in a moment. For now, I can do what I need and that's enough. I wish Java support on Red Hat et al. were exactly on par with Windows and the Mac (there are other small annoyances, like the microphone not working under Java on my machine). Hopefully soon I'll be able to get back to Linux! I already miss playing with all the fancy X stuff. Windows PowerToys are a sorry excuse for tweaking the system. :)

Categories: soft.dev
Posted by diego on November 19, 2003 at 12:46 AM

back to windows (for now) part deux

It's now about 6 hours or so since I began the reinstall. Seeing the install/update/patch process all at once is quite an experience. I've now spent close to two hours downloading updates and patches (at 50 KBytes/sec!). First, there was a batch of about five "critical" updates (10 MB). Warning! Your PC may do bad things if you don't install them! And so on. Then Windows Update suggested Service Pack 4. 50 MB. Right after SP4 installed, another check (this time thinking that was it), and now there were twenty (TWENTY!) critical-install-this-right-now-or-it's-the-end-of-the-world-as-you-know-it patches. Another 50 MB. Plus, I'm not even done with the "recommended" patches (as opposed to the "critical" ones), which also fix problems for various calamities that might visit you or your loved ones if you don't apply them.

Can anyone in their right mind think that this is normal? We have gotten used to this whole patching idea, but it's ludicrous. By now, every security warning, every patch, elicits an "oh, another one of those...". Mind you, lots of those patches are not just for security problems; many are fixes for bugs that apparently have various disastrous consequences under different circumstances.

Windows is not going away. Would it be too much to ask of Microsoft that, instead of drooling all over XAML or whatever new thing they are planning to conquer the world with, they put their considerable resources and smarts into finding a solution? You know, I think that Longhorn would be fantastic if, instead of all the thingamagic promiseware that it will supposedly have, it was simply Windows XP (or even 2000) and it just worked. Who cares about 3D icons if I'll probably need to find a new "3D Icon critical patch" every fifteen seconds?

Sorry, I know that this has been discussed to death, everyone knows this, Microsoft knows this... but the experience of seeing this whole process in the space of a couple of hours has activated my gripe-cells. We now return to our original programming.

Categories: soft.dev
Posted by diego on November 18, 2003 at 4:13 PM

back to windows (for now)

I've been a happy camper since I switched back to Linux (Red Hat 9) on my Thinkpad T21 laptop about three months ago. Everything worked fine. And aside from some annoyances, such as the tendency of Gnome to crash a few times a day, it was great.

But yesterday I needed to test some of the sound features in clevercactus and Linux bailed. For some time I thought this was a Linux problem; after all, the Gnome sound recorder crashed when I recorded more than once, and it didn't record anything at all. Then (this morning) I realized that the problem was the internal microphone (not supported), and that using an external mike worked ok. The sound recorder still crashed, but at least it worked. Once.

Problem is, I need Java to work with it, not just a native Linux app. And Java sound support has been spotty outside of Windows and the Mac (Sun is devoting basically no resources to it). Even though output worked ok, microphone input did not. LineUnavailableException.

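For reference, this is roughly where things break. Here is a minimal sketch (not the clevercactus code) that tries to open a capture line through the javax.sound.sampled API; when the mixer can't provide one, you get the LineUnavailableException mentioned above.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

public class MicCheck {
    public static void main(String[] args) {
        // 44.1 kHz, 16-bit, mono, signed, little-endian -- a common capture format
        AudioFormat format = new AudioFormat(44100f, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        try {
            TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);  // this is where capture fails when the mixer can't provide a line
            System.out.println("Microphone line opened: " + format);
            line.close();
        } catch (LineUnavailableException e) {
            System.err.println("No capture line available: " + e.getMessage());
        }
    }
}
```
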
At the moment I really don't have time to spend two days fixing whatever the problem is. I think that with enough tweaking it should end up working (that's the Linux way after all) but that's not an option right now.

Back to Windows it is, at least for the moment. I dusted off the original Windows 2000 Pro installation disk that came with the notebook (after I found it :)) and I am now in the middle of the install. Disgusting experience. FDISK. FORMAT. Abort, Retry, Fail? messages. I'm now doing the recovery of the install (the IBM recovery disk creates its own partition setup, one more reason to wince) and the file copy is in progress.

*Sigh*. I hope to be able to go back to Linux soon in the machine. Running cygwin is a poor excuse for it.

Update: Linux doesn't want to let go. After reformatting, Fdisk, and such, the Linux boot loader is still there, except that now it just hangs. Damn. Trying again...

Update 2: running "fdisk /mbr" took care of the problem.

Categories: soft.dev
Posted by diego on November 18, 2003 at 12:29 PM

wanted: a breakthrough for testing distributed systems

The last few days I've been heads-down working most of the time on a distributed system that involves going deeply into the arcana of TCP/IP. It's an eye-opening experience. First, because Java proves once more to be a rock, obtaining pretty high data transfer rates (currently 37 KB/sec on a 50 KB/sec connection) even with all the twists and turns the code is taking. Second, and most important, it has reminded me of how little we know about distributed systems, how to build them properly, and how to test them.

By distributed systems, I mean truly distributed. I mean that you can't count on a server to be happily taking over stuff for you with a bunch of TCP ports open and a four-way processor core ready to handle incoming tasks. We might be tempted to call this peer-to-peer (as opposed to client-server), but not really, since I could easily see this being used in a "traditional" client-server environment. The difference is subtle, in terms of what you assume on the server side, and how you get around the constraints imposed by today's Internet.

That aside, being a test-first-code-later kind of person, I tend to put the burden on testing, or rather, on the testing framework. So I thought I'd write down my wish-list for a distributed testing framework (as food for thought more than anything else). This framework would work as follows: you'd have a "test listener" that can run on any machine, and a "test controller" app that can run on your desktop. Once the listener is running on the other machines (and maybe even on your desktop too) you can easily choose a JAR to deploy to all the target machines, then run it. The system automatically routes the output (System.err and System.out) to your "test controller" in multiple windows. You can control any of the clients through simple play/pause/stop/restart buttons. Clear the consoles, etc. You would be able to script it, so that this whole process can be run in loops, or automatically every day or every week, or whatever. You would be able to define output values to check for, which could alert you to results that don't match expectations.

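Just to make the wish a bit more concrete, here is a minimal sketch of what the "test listener" half could look like. It is entirely hypothetical -- the port and the RUN/STOP mini-protocol are made up -- but it shows the shape of the thing: wait for the controller to connect over TCP, accept simple commands, and report status back.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class TestListener {
    public static void main(String[] args) throws Exception {
        int port = 9090;  // hypothetical control port
        ServerSocket server = new ServerSocket(port);
        System.out.println("Test listener waiting on port " + port);
        while (true) {
            Socket controller = server.accept();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(controller.getInputStream()));
            PrintWriter out = new PrintWriter(controller.getOutputStream(), true);
            String command;
            while ((command = in.readLine()) != null) {
                if (command.startsWith("RUN ")) {
                    // A real framework would load the deployed JAR, launch its main
                    // class, and stream its System.out/System.err back to the controller.
                    out.println("STARTED " + command.substring(4));
                } else if (command.equals("STOP")) {
                    out.println("STOPPED");
                    break;
                } else {
                    out.println("UNKNOWN " + command);
                }
            }
            controller.close();
        }
    }
}
```
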
Looking around, I found the DTF at SourceForge, but it seems to be dead (no binaries, and no updates since February this year). I found papers (if you look hard enough, you can find papers on every conceivable topic I guess, so this doesn't mean much), like this one. But not much, really. Or is there some vast download area somewhere that I'm overlooking?

In any case, I know for a fact that CS curricula still don't pay enough attention to testing, much less to distributed testing. For one, distributed testing is difficult to generalize. But there should be more happening in this area, shouldn't there? Or does anyone doubt that half the future lies with large-scale distributed applications? (The other half is web services :-)).

Categories: soft.dev
Posted by diego on November 13, 2003 at 12:58 PM

linux confusion

A bunch of Linux-related news has hit the ... err... "newsstands" recently. Novell announced they would acquire SUSE for $210 million. Predictably, this generated a lot of comments, including this great (if brief) analysis from Charles Cooper over at News.com. A less-noticed element of the Novell announcement was that IBM is buying into Novell as well, getting about 5% of the stock. This sounded strange to me at first, but IBM is clearly covering their bases. Now they have "deep" alliances with both Red Hat and Novell, which are by now the two most prominent Linux "vendors". Since IBM is so intertwined with the Linux thing, this makes some sense. But I still wonder exactly what it means. After all, you don't sell 5% of the company just for money if you have revenue streams, etc. (or do you?).

Red Hat, on the other hand, has made some strategic changes to their product line that are still confusing to me. First, they are discontinuing the Red Hat professional line (support of any kind for RH9 ends in April next year). The focus is now profitability, which they go after with their Enterprise Edition. But does this mean they killed the "workstation"-type product? Apparently not. There's a new RH "for hobbyists" in the pipeline. I don't get it. Why announce that they would kill RH9 and then say they'd release another one? What exactly is the difference between the current "workstation" version and the new one that's coming up? Faster changes? Fees? Maybe I'm missing something, or maybe this will become clear in the next few months. In the meantime, I'm already wondering if I will have to go back to the days of chasing around the net for updates and patches to the version of Red Hat 9 I already have installed.

Categories: soft.dev
Posted by diego on November 6, 2003 at 11:19 AM

on longhorn

Ole has posted a comprehensive comment on Microsoft's Longhorn. Great analysis, I agree with basically everything he says. The upshot?

Longhorn will certainly hurt speed. Whether it helps robustness remains to be seen; we can only hope it will, given that this is a big problem for Windows currently.
Microsoft always seems to design OSes for the next generation of technology (for whatever reason). I remember how impressed I was the first time I installed Linux on a 386. Even running X Windows worked well. I think this is something that shouldn't be underestimated. In any case, Longhorn will take a long, long time to have any impact. Most big MS customers are well known for waiting until the first Service Pack to switch to the new technology. If they release in 2006 (as they say), then it won't be until 2007 that it is reasonably deployed.

Four years. A lot can happen in that time.

Update: Scoble responds to Ole's piece. Interesting read.

Categories: soft.dev, technology
Posted by diego on November 3, 2003 at 4:52 PM

an introduction to weblogs, part two: syndication

The first part of this introductory guide was basically about publishing, but there is a second, perhaps equally important, component of weblogs to cover: reading them.

Note: for those that already know about weblogs, syndication, etc., I would greatly appreciate any feedback on this piece. This is a bit more technical than I would have liked, but there are some issues that, in my opinion, can't be ignored. If you have ideas on how this can be improved for end users (both technical and non-technical), or pointers to other descriptions where they can get a different take on this, please send them over. Thanks!

Moving on...

The need for syndication

Wait a minute (I hear you say) what do you mean reading? Don't we use the web browser for that? What would I need to know about reading weblogs?

Well, the answer is that, technically, web browsing is just fine. But there is another component of weblog infrastructure that is quite important today, and that will probably become even more important as we have to deal with an ever-increasing number of information sources. This component is usually referred to as syndication. Syndication is also usually known as aggregation or news aggregation. The exact definition of the concept, or how and what it should be used for, is something that people could (and do) discuss ad infinitum, but without getting into specifics we can say that everyone at least agrees that certain things represent the idea of syndication or news aggregation quite well.

In the "hard copy" publishing world, syndication implies arrangements to republish something. Popular newspaper comic strips, for example, are usually syndicated, as are some news articles. While the meaning is similar in the web, it is primarily concerned with the technology, rather than with the contracts, or syndication agencies, etc.

Generally speaking, we could say that syndication is a process through which publishers make their content available in a form that software (as opposed to people) can read.

That is, if a site supports syndication, and you are using appropriate software, you can subscribe to that site using the software. Updates to the site are then presented to you automatically by the software, on your desktop (or on a web site that you use for that purpose).

This means that you can forget about checking certain websites for updates or news: the updates and news come to you.

Syndication is a dry, unassuming word for a powerful concept (as far as the web is concerned, at least). It ties together many ideas, and it is instrumental in sustaining the 'community' part of weblogs that I talked about in part one.

Why?

For an answer, let's go to an example. You have started your weblog, and you have been running it for a bit of time. You have found other weblogs that you enjoy reading, or that you find useful; you are also reading weblogs of friends and coworkers. Very quickly, you might be reading maybe ten or fifteen different weblogs. Additionally, you might also regularly check news sites, such as CNN or the New York Times. Suddenly, it's difficult to keep up. Bookmarks in the browser don't seem to help anymore, and you find yourself checking sites only every so often. Sometimes you miss a big piece of news that you'd have liked to hear about sooner---or sometimes you find yourself wading through stuff simply because you haven't kept up. If you are a self-described 'news junkie' (as I am) you might already know about this problem, since keeping up with multiple news sources is also difficult. But with weblogs, the problem is greatly amplified: weblogs put the power of publishing in the hands of individuals, and as a result there are millions of weblogs. There are simply too many publishers. The problem of just 'keeping up' with what others are saying becomes unavoidable.

This is the problem that syndication solves. And the software that does the magic is usually called an aggregator.

Simply put, an aggregator is a piece of software designed to subscribe to sites through syndication and automatically download updates. It does this regularly during the day, at intervals you can specify, or only when you are connected to the Internet. If the aggregator is running on your PC or other device, once you have the content you can read it in "offline" mode (web-based aggregators, on the other hand, require connectivity to the Internet at all times). For a more detailed take on what aggregators are, I recommend you read Dave's what is a news aggregator? piece. As usual in the weblog world, there is discussion about these definitions; see here, for example, for comment on Dave's piece.

A word of caution before we go on: for non-technical people, the issues surrounding syndication, aggregators and such can appear to get complicated if you start reading some of the links I provide here. There are acronyms and terms used here and there that you might see, such as "XML", "RSS", "RDF", "namespaces", and so on, that can be confusing. Let's skip that for the moment. I will try to go into them (when necessary) in the section 'using your aggregator' below.

Which aggregator?

Aggregators (or "news aggregators") come in different "shapes and sizes", and there are two main categories of aggregators on which everyone generally agrees on: 'webpage style' and 'email style' (also referred to as 'three-pane aggregators'). 'Webpage style' aggregators present new entries they have received as a webpage, in reverse chronological order (and so the end result looks very much like a weblog on the web does, but of pieces that are put together dynamically by the software). 'Email style' aggregators generally display new posts as messages (also in reverse chronological order) that you can click on and view on a separate area of the screen.

As in other cases, there are good arguments for preferring one over the other, and in the end it comes down to personal choice. Reading different weblogs you might find people that are for one or for the other, and other people who propose to do away with the whole thing and come up with something completely new. As with other things in the weblog world, reading different opinions and coming to your own conclusions is best. This is probably good advice in life in general :) but with web-related things it becomes so easy to do that you generally end up doing it, sometimes without realizing it. Search around, look at other weblogs you like, leave comments, ask people you know, then try some of the software out. You'll find the one you prefer in no time (and, likely, as your usage changes and you have different needs, you might end up switching from one to another).

As is the case with weblog software, all aggregators are invariably free to try, and many of them have to be purchased after a trial period (usually a month). Aggregators and weblog software are complementary: you could use both, or you could use one and not the other. It's quite possible that there are more people that use aggregators than people that have weblogs. (Certainly there are more people that read weblogs than people that write them.)

If you have a weblog, chances are you already have a news aggregator as well, because some weblog software includes news aggregation built in (as you'll see below). There are lots of news aggregators (and by lots I mean more than fifty, probably a lot more), and more on the way. Additionally, the underlying technology for syndication is simple enough that many software developers implement and use their own aggregators.

This means that I can't possibly list all the aggregators that exist here, and besides, there are other pages that do this already, such as this one, this one, this one, or this one. As was the case with weblog directories, no listing of aggregator software is 100% complete (and probably can never be). However, I will mention a few aggregators that I know about and have tried myself, or have seen in action (and, in the case of clevercactus, that I developed :-)). (Lists in alphabetical order.)

Some webpage-style aggregators

Some email-style, or 'three-pane' aggregators

All have one or two distinguishing features that make them unique. In the end, which one works better for you is all about personal taste and work patterns. Check out the aggregator listings I mentioned above, and look for something that grabs your attention. Try them out, and see which one you like best. Note: in many cases, aggregators are ongoing projects. Some are open source and are updated often. My advice, to save time in choosing an aggregator, is to go for the simplest route possible at first, and then try out new things over time. For example, if you're a paying LiveJournal or Radio user, it will probably be better to use the built-in aggregator at first, especially as you try to find your way around all of these new concepts. If you want to try other options, or you use different weblog software, I think that a quick glance at the webpage of a product is usually a good indication of how much knowledge you need to set it up: if you don't understand what the page says, it's probably not for you. It all depends on your knowledge and how much time you want to spend on it. But, by all means: if you have the inclination or the time or both, spend some time looking at the different options. You'll be surprised at some of the cool features that some aggregators have, even if they are sometimes 'experimental'.

Using your aggregator

Once you've installed an aggregator (or decided to use the built-in aggregator of your weblog software), it's time to subscribe to some feeds.

Feeds (or newsfeeds) are usually the name given to the sources of information that aggregators use to obtain the content they display. Feeds are technically similar to web pages, like those that are displayed in a web browser. Web pages, however, are written in HTML (HyperText Markup Language), which is designed to create pages readable by humans. Feeds, on the other hand, are intended to be "read" (or rather, processed) by software, and so they carry a different type of information and are more structured and strict in the data they can contain. Feeds are written using a language called XML (eXtensible Markup Language), in a de-facto standard "dialect" of it called RSS.
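
(For the technically curious: this is, at its core, all an aggregator does. The sketch below, plain Java using only the JDK's XML parser, fetches an RSS 2.0 feed and prints the title and link of each item. The feed URL is just a placeholder, and a real aggregator would of course handle other formats, errors, encodings, polling schedules and so on.)

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // Print the title and link of every item in an RSS 2.0 feed.
    public class FeedPeek {
        public static void main(String[] args) throws Exception {
            // URL of an RSS 2.0 feed; "http://example.com/index.xml" is just a placeholder.
            String feedUrl = args.length > 0 ? args[0] : "http://example.com/index.xml";

            // A feed is plain XML, so the JDK's standard parser can read it.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(feedUrl);

            // In RSS 2.0 each post is an <item> element containing <title> and <link>.
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String title = item.getElementsByTagName("title").item(0).getTextContent();
                String link = item.getElementsByTagName("link").item(0).getTextContent();
                System.out.println(title + " -> " + link);
            }
        }
    }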

Aggregators let you 'subscribe' to these feeds in different ways. Most pages identify their feeds as 'Syndication', or 'RSS', or 'RSS+version number'. (See the next section if you'd like to know more about these differences.) Many pages also mark the feed with an orange icon that says "XML". Depending on the software you are using, subscription itself can be easier or more difficult. In all cases, the following set of steps will work to subscribe to a feed:

  • Find the link on the page that says "Syndication", "Syndicate this site", "XML", "RSS", etc.
  • Right-click (or press-hold on a Macintosh) over that link. Your browser will show a menu of options, and one of them will be "Copy Link Location" or "Copy Shortcut". Select that option.
  • Now go to your aggregator and find the option to Add or Subscribe to a new feed. Select it, and when you are asked to type in the URL (link) of the feed, right-click (or press-hold on a Macintosh) on the field and select "Paste". This will paste the URL into the field. If right-click doesn't work, you can try the keyboard: Ctrl+V or Shift+Insert on Windows, or Command+V on the Mac.
Now, just to be clear: these instructions are the most basic of all, and supported everywhere, and they are important so that you can use them "when all else fails". Many aggregators come with a built-in choice of feeds that you can subscribe to with only the click of a button. Many aggregators allow you to type in simply the URL of the page you are visiting (as opposed to that of the feed) and then discover the feed for you (see the sketch just below for how that works). Others also establish a "relationship" with your web browser so that when you click on the icon or link for a feed, they give you the option of automatically subscribing to that feed.
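
(For the technically curious, the "discover the feed for you" trick is usually just a matter of scanning the page's HTML for a link element that advertises a feed. Here's a rough Java sketch; the page URL is a placeholder, and the pattern assumes the attributes appear in a particular order, which real autodiscovery code does not.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Look for a feed "autodiscovery" <link> element in a page's HTML.
    public class FeedDiscover {
        public static void main(String[] args) throws Exception {
            // "http://example.com/" is just a placeholder page URL.
            String pageUrl = args.length > 0 ? args[0] : "http://example.com/";

            // Download the page's HTML.
            StringBuilder html = new StringBuilder();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(pageUrl).openStream()));
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
            in.close();

            // Feeds are usually advertised as:
            //   <link rel="alternate" type="application/rss+xml" href="...">
            // This pattern assumes type comes before href, which is a simplification.
            Pattern p = Pattern.compile(
                    "<link[^>]*type=[\"']application/(rss|atom)\\+xml[\"'][^>]*href=[\"']([^\"']+)[\"']",
                    Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(html);
            if (m.find()) {
                System.out.println("Found feed: " + m.group(2));
            } else {
                System.out.println("No feed link advertised on " + pageUrl);
            }
        }
    }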

Okay, now for a bit of a detour. If you'd like to know a bit more about RSS and related technologies and have an interest in the technical background, or are technically proficient, please read the next section. Otherwise, skip to the following section.

Ready?

Okay, tell me more about RSS.

Before I start: this is a highly charged (and even emotional) issue in the weblog developer community. People have very different opinions, and this is just my take on the situation. By all means, go to different search engines and search for "history of RSS", "history of syndication", "RSS politics" and similar terms to find pointers to different sides of the argument.

Politics? Did I say "Politics"?

Yes. Yes I did.

Sometimes people mean different things when they say "RSS". Some people see it only as a way to syndicate web content. Others see it as a way to pull all sorts of information into clients. There are different opinions as to how it should be used, how it should do what it does, etc.

In the majority of cases RSS stands for "Really Simple Syndication", but you might come across other places where it is described as meaning "RDF Site Summary" (RDF, which stands for "Resource Description Framework", is yet another XML dialect, more flexible but also more complex). I prefer to separate them clearly and call RSS-based feeds RSS feeds and RDF-based feeds RDF feeds, but I might be in the minority (so when I say "RSS" I mean Really Simple Syndication, not the RDF-based format). There is another syndication format being developed at the moment (also XML-based) called "Atom". Additionally, there are different versions of RSS: 0.91, 0.92 ... 2.0 (the current version of RSS is 2.0)... and RDF-based syndication is sometimes called RSS 1.0 (yes, this last one in particular is quite confusing). These are all various formats for syndication. If we lived in a perfect world, we'd only have one format. But that is not the case.

I can imagine you're thinking: So, even if I know about technology, why do I care about all of this "XML mumbo-jumbo"?

Well, if you start your own weblog and begin to discover new weblogs and new feeds, and are curious about the technology, more likely than not you will read about this: people passionately arguing about these things, mentions of RSS of this version and that, and so on. And so it's a subject that can't really be completely avoided. If you're interested in knowing more, not telling you about this would be like pretending that you can fly across the Atlantic without ever having to hear that your departure is likely to be delayed.

But, as the Hitchhiker's Guide to the Galaxy says: Don't panic. :-)

More specifically, I'm mentioning this for two reasons:

  1. Because you, as a user (that is nevertheless aware or interested in the technology behind this), are likely to encounter this in subtle forms. For example, you might go to one news site and see that they say they provide "RSS 0.91 Feeds". Or you might see the XML orange icon shown above. Or you might see they say just "RSS", or "RDF". You will quite possibly see mention of all of these names and acronyms when you're looking at aggregator software. So, in seeing all this, you might wonder: is one better than the other? Which one should I choose? To that I'd say: start by not worrying too much. RSS is the most common format by a mile, and that is good because it's the one you'll be most likely to encounter and so you won't have to think about this much. Additionally, all aggregators support the most used formats, and many of them support all the formats in existence. In general, you don't really have to even know which of these formats is actually being used. But once you get a grasp on things, it might be a good idea to read about the history behind these differences and make up your own mind about them. (If you are a software developer, and know about XML, good starting points are Dave's RSS 2.0 political FAQ and Mark's What is RSS? and History of the RSS Fork. Beware: these discussions tend to get quite technical very quickly).
  2. Because as you're perusing weblogs or news sites, it's better to be aware of what these things mean to avoid getting confused. Once you have this bit of information, it becomes easy to look for the XML orange icon, or some link that says "RSS" or "RSS+version number" or RDF or whatever.
Again, if we lived in a perfect world only a few people would ever have to deal with this stuff. But this is relatively new technology, and we are still trying to figure out all of its uses, and in some cases, what it is exactly. The important thing, in my opinion, is that you as a technically proficient user don't feel as if things are beyond your grasp. My pet peeve is when people using software say "I don't know what I did, it seems I broke something". But when (say) the washing machine stops working, we never say "I broke the washing machine" simply because we were using it. We say "the washing machine broke down". So why is that? I can think of many reasons: error messages in computers generally put the burden on the user, for a start. But regardless of that, what I would say is: if something seems complicated (like all of this "XML mumbo jumbo") it's not a problem with your knowledge of computers. It's our problem, a problem of the software developers. (If you get involved in the technology or the community in any way, then it will be your problem too :-).)

And so what? you ask. Well. Weblogs allow a new level of interaction. You can make a difference. Perhaps for the first time ever, users can actually influence and participate directly in the creation of the tools they use every day, through those very tools. So if there is something that is difficult to use, something confusing, it's likely that you can find a weblog or contact reference for the software author(s). Post a comment. Write your own post about it. Get involved if you can, and by that I don't mean 'develop software'---simply giving opinions and ideas is a good start. People will listen, and the problem might even be fixed!

Now back from the technical depths of this section, and to simpler things.

So how do I find these 'feeds'? And how do I create them?

First of all, let's deal with feed creation. Just as your weblog software automatically generates the HTML page that is displayed in a browser (when you post an entry), most weblog software also generates the feeds for you, and places a link for the feed in your homepage. All of this is done automatically by default---if you are not sure of whether or how this is happening, check your weblog software's help page for "Syndication" or "RSS" and you should be on your way.

Finding feeds to subscribe to is not so difficult. If you're reading other weblogs, or you find one that looks interesting and would like to keep up with what they are writing, just look for the link or icon that identifies their feed and subscribe to them "as you go". In some cases, the aggregator software will come pre-subscribed to some feeds, or will suggest new feeds to subscribe to. The methods used to find weblogs (mentioned in part one) apply to finding feeds as well; for example, both Technorati and Blogstreet (as well as other sites) allow you to find new weblogs, and hence new feeds to subscribe to. There are "feed directories", like Syndic8, that allow you to find feeds on certain topics easily. Finally, there's Feedster, a cool search engine that deals specifically with syndication. All the results in Feedster come from feeds, so it lets you not only look for information but also find new feeds that you'd be interested in looking at.

At the beginning you mentioned news sites. Are there 'feeds' for news sites too?

Yes. Many news sites and organizations today support news feeds. Examples: the BBC, Rolling Stone Magazine, and News.com. Look in your favorite news site, or use the feeds recommended by your aggregator (if any), or use some of the directories mentioned in the previous section to find more.

Why did you say that syndication is 'instrumental' for weblog communities?

I think that weblogs are cool but syndication+weblogs is really cool. It's a case of 1+1 = 4. Because syndication allows you to subscribe to many sources, you can keep up to date with a lot more, and so it becomes easy to stay current on what others in a particular community are doing. Things like Feedster and Technorati reinforce the "loop" that feeds create. These loops are "loosely coupled": connected through links, with people notified of updates through feeds, both in unobtrusive ways. The conversation moves across sites, as people find the time or have the interest to do so.

Posting from your aggregator

Since a big part of weblogs is the 'conversation' that is established between different sites, it would be great if you could just re-post a piece of something you've read, or comment on it, no? Many aggregators let you do just that. For example, since Radio is both weblog software and an aggregator, you can use the two 'in tandem' to post comments on things you're reading about. NetNewsWire, NewsGator, FeedDemon, and clevercactus (as well as others) all allow you to post to weblog software as well as read feeds. I won't go into the details of how to do this, mainly because the configuration varies from software to software, but I wanted to mention it as something that exists, and that you might find useful as you get more comfortable with weblogs and aggregation.

Final final words

Both part one and two are an overview of concepts that (as I said) are still relatively new. As a result, things are still evolving, and new applications are being created all the time. Sometimes the technology can appear to be daunting, but there's lots of people working on making it better, and easier to use. Once you are more 'embedded' in the world of weblogs, you will start finding new uses and applications---things that were simply not possible only a few years ago---to communicate, collaborate and express yourself.

See you in the blogsphere! :-)

Categories: soft.dev, technology
Posted by diego on November 2, 2003 at 7:33 PM

an introduction to weblogs

During the last Dublin webloggers' meeting I was asked the question, "How do I start a weblog?" I began answering, somehow under the impression that it would be a simple answer. But it wasn't. As I went into more detail I realized that I was giving out more information than anyone in their right mind could digest easily. I then decided to write up this short intro so that I could use it in the future. A big part of this for me is an exercise in writing down things that might seem obvious to me (and others) but not so much to those that aren't involved in weblogs yet.

For this short intro I will assume very little: that you use the Internet regularly and that you might check news sites now and then, such as CNN or the New York Times. And that's it!

And of course, any corrections, additions and comments are most welcome. A note: this deals only with weblogs, not with newsfeeds, RSS, newsreaders, and such. Hopefully I'll get around to writing another similar introduction sometime in the near future, or to adding to this one soon. :)

Update: I have posted part two of this guide here.

So, here it goes...

Intro to the intro

Before we begin...

...some terminology: there are words that you will see often with weblogs: client, server, host (or hosting). Some of these words might be familiar or not (and probably they're obvious to everyone!), but just to be 100% sure, here are some definitions as I'll use them, trying to avoid taking too much liberty with their actual technical definitions:

  • weblog: the subject of this piece. :-) Seriously though, weblogs are often also called blogs, and some publications refer to them as "web logs" (note the space between the words). There are all sorts of blog-related terms, such as blogsphere, blogosphere, blogland, etc.
  • content: basically, information. Content is anything that can be either produced or consumed. Web pages (HTML), images, photos, videos, are all types of content. The most common type of content in weblogs today is text and links, with images growing in popularity and audio in a slightly more experimental phase. Videos are not common, but it's possible to find examples.
  • client: a PC, or a mobile device such as a Palm or a cellphone. Clients, or client devices, allow you to create content (text, images, etc.) and then move it to a server at your leisure.
  • server: a machine that resides somewhere on the Internet and has (nearly) 100% connectivity. Servers are where the content for a weblog is published, that is, made available to the world. (We'll get to how to choose a server, or whether you even need to, later; in most cases this choice depends on the software used.) Servers are also commonly called hosts, and the "action" of leaving information on a server is usually referred to as hosting.
  • URL: or "Uniform Resource Locator," the text that (usually) identifies a webpage and (generally) begins with "http://...". Sample URLs are: http://www.cnn.com, http://www.nytimes.com, and so on.
  • link: a hyperlink, essentially a URL embedded within a webpage. Hyperlinks are those (usually blue) pieces of text that take you to another page. Links are a crucial component of the web, but more so (if that's at all possible) with weblogs. Links are what bind weblogs together, so to speak. Through links I can discover new content, follow a discussion, and make my viewpoint on something known in an unobtrusive manner (more on this later).
  • client/server: the basic model through which weblogs are published today, and the model around which the Internet itself is largely based (This is not technically 100% true, since the original Internet was peer to peer (P2P), and we are all using P2P applications such as Kazaa these days, but let's overlook that for the purposes of this document). Clients create the content and then send (publish) it to a server. The server then makes the content available.
  • post: or posting, or entry, a single element of one or more types of content.
  • referrer: another crucial component of weblogs, referrers are automatically embedded by your web browser when you click on a link. (See more about referrers in the subsection on 'community' near the end of this page.)
  • public weblog and private weblog, are two terms I'll use to separate the two main types of weblogs that exist. Public weblogs are published on the Internet without password or any other type of "protection", available for the world to see. Private weblogs are either published on the Internet but protected (e.g., by a password) or within a company's network. Most of what I'll discuss here applies to both public and private weblogs.
  • permalink: a permalink is a "permanent link", a way to reference a certain post "forever". "Forever" here means until a) the person that created the post changes their weblogging software, or b) their server goes down for whatever reason. If you read news sites, you'll notice that news stories generally have long, convoluted URLs; this is because every single news article ever published is uniquely identified by its URL. If you copy a URL and save it in a file, and then use it again six months or six years later, it should still work. All weblog software automatically and transparently generates a permalink for each post you create, and the way in which weblogs reference each other is by using the permalink of the posts or entries.

Getting started

First of all, what is a weblog?

There are many good descriptions of what weblogs are (and aren't) scattered through the web. Meg's article what we're doing when we blog is a good starting point. One of the oldest descriptions around is Dave's history of weblogs page, and he went further in his recent essay what makes a weblog a weblog. Other interesting essays are Rebecca's weblogs: a history and perspective, and Andrew's deep thinking about weblogs.

Not surprisingly (as you might have noticed from reading the articles/essays linked above), people have slightly different takes on what exactly constitutes a weblog, but there is general acceptance that the format in which content is published matters, as well as the style in which the content is created. Additionally, weblogs are usually defined by what they generally are, rather than by an overarching definition.

Here's my own attempt at a short list of common characteristics of weblogs. Weblogs:

  • generally present content (posts) in reverse chronological order.
  • are usually informal, and generally personal
  • are updated regularly
  • don't involve professional editors in the process (that is, someone who is getting paid explicitly to review the content)
Beyond that, format, style and content vary greatly. I think that this is because weblogs, being as they generally are personal, that is, by or about a person, have and will have as many styles as there are personal styles.

What's the difference between weblogs and "classic" homepages? Technically, there isn't a lot of difference. The main difference is in terms of how current they are, how frequently they are updated, and the navigation format in which they are presented (weblogs have a strong component of dates attached to entries). I'd say that homepages are a subset of weblogs. You could easily create a weblog that looked like a homepage (by never updating it!) but not the other way around.

Sometimes weblogs have been billed as "the death of journalism," which I think isn't true. If there are any doubts, you can check out weblogs written by journalists, and compare them to the articles they write. They are qualitatively different. I think there will always be some room for people that make a living reporting and searching for stories, and for editors that correct what they write. The role of news organizations and journalism might change a bit because of weblogs, maybe it will become more clear and focused, but that doesn't mean it will disappear. Weblogs are a different kind of expression, period, and as such they are complementary to everything else that's already out there.

However, the best way to see what weblogs are like is to read them (as opposed to reading about them), and then try one yourself. As I've mentioned, weblogs come in different shapes and sizes. Some people tend to post long essays, some people just write short posts. Some talk about their work, or about their personal life. There are an untold number of weblogs that are simply ways for small groups to share information efficiently within their company's network, to create a "knowledge store" for projects. Some people post links that they find interesting. Some add commentary. Others only comment on other people's weblog entries. Some weblogs are deeply personal. Some talk only about politics, or sports. Quite a number of them talk about technology. Some weblogs have huge numbers of readers. Others only a few dozen. Even others are completely personal and are only read by the person who writes them. Some public weblogs (relatively few) are anonymous; most identify the person. Some are updated many times a day, others once a day, others a few times a week.

You get the idea. :-)

So, here are some good examples of well-known weblogs (at least within their communities) to read and get an idea of what they're about. Check them out, and read them and about the people that create them (alphabetically): Anil, Atrios, Betsy, Burningbird, Dan, Dave, Doc, Esther, Erik, Evan, Glenn, Gnome-Girl, Jason, Jon, Joi, Karlin, Halley, Mark, Meg, Rageboy, Russ. All of these weblogs are, in my opinion, great examples of weblogging in general. You may or may not agree with what they say, you may or may not care, but they are all a good starting point to show what weblogs are and what they make possible.

Those that are more embedded in the weblogging community might object to presenting such a small list to represent anything, or might put forward different names, so I just want to say: yes, I agree. But to show different styles of weblogging, and provide some initial pointers, we have to start somewhere. I'll go further on the subject of discovering weblogs below, in the subsection about community.

This all sounds intriguing, but will I like it?

That's a difficult question. :) I guess my answer would be "try it to see if it fits". As mentioned below, weblog software is invariably free to try (at least), so there is no cost in getting started. My opinion is that some people are more attuned to the concept than others, because they are already sort of weblogging even if they don't describe it as such. For example, if you like to rant about anything, if you keep pestering your friends, family and coworkers about different things that you've seen or read or thought about, or if you regularly launch into diatribes about any and all kinds of topics (e.g. "The emerging threat to African anthills and their effect on the landscape"), then you might be a Natural Born Blogger. :-)

So, again, just try it out. If it doesn't work out, no harm done. It's certainly not for everyone. But you just might discover a cool way of expressing yourself, create a new channel to communicate with the people you know, and find a way to make new friends and for other people to find out about you.

Okay, I'm sold. How do I get started?

The first step in starting a weblog is choosing the software you will use. There are many products available.

But before going into them, there are two main categories of software to choose from. I'd ask: how much do you know about software, or how much do you want to know? Do you run or maintain your own server? Are you interested in running a private, rather than public, weblog, say for your workgroup, and you don't want to worry (too much) about passwords and such, and can handle yourself technically?

If the answer to any of the questions above is yes, skip this next item and go directly to 'Self-managed weblog software' below. Otherwise, you'll probably be better off with 'End-user software'.

End-user weblog software

Here are some of the most popular products (in alphabetical order). All of them have been around for several years and have been important drivers of the weblog phenomenon (except for TypePad, which launched in mid-2003 but is based on MovableType, another popular tool "from the old days"--see below).

  • Blogger. A fully hosted service, Blogger lets you post and manage your weblog completely from within your web browser. Blogger is now part of Google (yes, the search engine). A good starting point for blogger use is blogger's own help page.
  • LiveJournal. A hosted service, like Blogger, with a long-time emphasis on community features. For help, check out LiveJournal's FAQ page.
  • Radio Userland. Radio runs a client as well as a server on your PC and lets you look at your content locally through your web browser. To publish information, Radio sends the content to Userland's public servers. Radio's homepage contains a good amount of information and links to get started, and a more step-by-step introduction to Radio can be found in this article.
  • TypePad. Fully hosted service. An end-user version of MovableType (see below) with more capabilities (in some cases) and some nice community features. To get started, check out the TypePad FAQ.
So, which one of these should I choose?

Short answer: it depends.

Long answer: it depends. :) That is, it depends on which model you prefer. Blogger is free, LiveJournal has a free and a paid version. Radio and TypePad are not free but offer trial versions. Blogger, LiveJournal and TypePad are fully hosted, while Radio keeps a copy of your content on your PC as well as hosting your content on a public server. All of them are free to try, so looking around for which one you find best is not a bad idea. :)

Self-managed weblog software

Here are some of the most popular products (again, in alphabetical order).

All of these products involve some sort of setup and, at a minimum, some knowledge of Internet servers and such. (All the links from the list contain information on installation and setup). If you have set up anything Internet-related at all in the past (say, Apache or IIS), you should be able to install and configure these products without too much of a problem. (If you don't know what IIS or Apache is you should probably be looking at the previous section, 'end-user software').

Beyond the first post

Are there any rules to posting?

Generally, weblogs being what they are, the answer is no. But there are some things that I personally consider good practice that I could mention:

  • Links are good for you. Always link back to whatever it is you're talking about, if possible. A hugely important component of weblogs is the context in which something is said, and links provide a big part of that context.
  • The back button rules: Never repost a full entry from another person without their permission. "Reposting" means taking someone's text and including it in your own entry. Usually this is done to comment on it, but I think it's better to send people to whatever it is you're talking about, with quotes when necessary to add specific comments, rather than reposting everything. All web browsers have "back" buttons; once someone's read what you're talking about they can always go back and continue reading your take.
  • Quote thy quotes: Quotes of another person's (or organization's) content should always be clearly marked.
  • Thou shalt not steal. Never, ever, ever, repost a full entry that someone else wrote without at the very minimum providing proper reference to the person who wrote it. Even then, try to get permission from the author. See 'the back button rules' above.
There is another question that usually starts up discussion in the weblogging community, the subject of editing. As I mentioned above, weblogs in general are self-edited, but even if they are, how much self-editing is appropriate? Again, it depends on your personal style. Some bloggers don't edit at all and just post whatever comes to their mind. Some write, post, and then edit what they posted. Others do self-editing before posting and publish something only when they're happy with it. You should choose the style you're comfortable with.

What about comments to my posts? And what's this 'Trackback' I keep hearing about?

Weblog software usually allows you to activate (or comes by default with) the ability for readers to leave comments to your posts. This is generally useful but you might not want to do it. As usual, it's up to you.

Trackback is something that allows someone who has linked to you to announce explicitly that they have done so, thus avoiding you (and others) having to wade through referrers to find out who is linking to you, and providing more context for the conversation. Some weblogging systems (e.g., TypePad, Radio, MovableType, Manila) support Trackback, but some don't (e.g., Blogger, LiveJournal). Once you have become familiar with weblogs, Trackback is definitely something that you should take a look at to see if you might be interested in using it. Here's a beginner's guide to Trackback from Six Apart, the company behind MovableType and TypePad (that created the Trackback protocol), as well as a good page that explains in detail how Trackback works.

These mechanisms are useful more for the community aspects of weblogs than anything else, and usage of them varies widely from weblog to weblog.

And what about all this 'community' stuff?

Because weblogs are inherently a decentralized medium (that is, there is no single central point of control, or one around which they organize), it's much harder to account for the communities they create and to track their usage. (For example, the actual number of weblogs worldwide is estimated at the moment to be anywhere between 2 and 5 million. Not very precise!). But there are ways to find new weblogs, and here are a few of my favorites.

Update directories

There are sites like weblogs.com and blo.gs as well as others that are usually notified automatically by weblog software when a new entry is posted. Because of that they are a good way of (randomly) finding new weblogs.

Blog directories

What? This sounds a lot like "the central point" I just said didn't exist. Well, it does and it doesn't. There are directories, but they are not 100% complete because they rely on automatically finding new weblogs (for example, through weblogs.com updates and other means) or through people registering their weblogs with them, and both methods are fallible. Two examples of this are Technorati and Blogstreet. When you go to those sites you'll notice they talk about "ecosystems" to refer to weblogging communities, and that's a pretty accurate word for what they are. Those sites, as others that perform different but related functions (such as Blogshares, or BlogTree), also let you explore communities around your weblog, discover new weblogs, etc. Daypop, Blogdex and blogosphere.us focus a bit more on tracking "trends" within the weblog community (particularly Blogdex). Technorati and Blogstreet do this as well.

Search engines

A lot (and I mean a lot) of results for search engine queries these days lead to weblog entries on or related to the topic you're looking for. Chances are, those weblogs contain other stuff that you'll find interesting as well. Some good search engines are Google, Teoma, and AllTheWeb.

Targeted directory/community sites

There are sites that center around a particular topic and put together a number of weblogs that are devoted to or usually talk about that topic. For example, Javablogs is a weblog directory for weblogs that have to do with the Java programming language.

Referrers

Referrers are a mechanism that has existed since the early days of the web, but that has acquired new meaning with weblogs. The mechanism is as follows: if you click on a link on a page, the server hosting the page you are going to will record both the "hit" on that page and the source of the link. Those statistics are generally analyzed frequently (e.g., once every ten minutes, once every hour) and displayed on a page for your perusal. So if someone posts a link to your weblog and people start clicking on that link to read what you've said (and depending on the weblog software you're running), you will be able to see not only how many people are reaching you through that link, but also who has linked to you, which then helps you discover new opinions, people that have similar interests, etc. Directories like Technorati also track who is linking to your site, and so serve a similar function (but, again, as they are not 100% accurate, you might not get the "full picture" just from looking at them).
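
(For the curious, the "analysis" part is not magic either. Here's a rough Java sketch that tallies the referrer field of a web server access log, assuming the Apache "combined" log format; real statistics packages do far more, but this is the essence of that referrers page your weblog software or hosting stats show you.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    // Tally the "referer" field of a web server access log (Apache "combined" format).
    public class ReferrerCount {
        public static void main(String[] args) throws Exception {
            Map<String, Integer> counts = new HashMap<String, Integer>();
            BufferedReader in = new BufferedReader(new FileReader(args[0])); // path to access log
            String line;
            while ((line = in.readLine()) != null) {
                // Combined format: ... "request" status bytes "referer" "user-agent"
                // Splitting on double quotes leaves the referer at index 3.
                String[] parts = line.split("\"");
                if (parts.length < 4) continue;     // malformed or truncated line
                String referrer = parts[3];
                if (referrer.equals("-")) continue; // direct visit, no referrer sent
                Integer n = counts.get(referrer);
                counts.put(referrer, n == null ? 1 : n + 1);
            }
            in.close();
            // Print alphabetically; a real stats page would sort by count instead.
            for (Map.Entry<String, Integer> e : new TreeMap<String, Integer>(counts).entrySet()) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }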

Other options

There are many. :) The best additional example I can think of is some of the community features of LiveJournal and TypePad, which allow you to create groups of friends with whom you prefer to share what you write, etc.

Is blogging dangerous?

Yes. Most definitely. And addictive, too. :-)

Seriously though, while blogging might not be literally dangerous, it is most definitely not free of consequences. We sometimes have a tendency to take ourselves too seriously, or to misinterpret, or to rush to judgement (I wrote about these and other things in rhetoric, semantics, and Microsoft). Some people have been fired from their jobs because of their weblogs. Others have lost friends, made enemies, and gotten into huge fights (mostly wars of words, but ones that nevertheless have impact on both online and offline life). On the bright side, weblogs have been at the core of a large number of positive developments in recent years, mostly technical but also of other kinds, and have provided comfort and even news when everything else seemed to be collapsing, both at large scale (for example, the Sept. 11 terrorist attacks in the US) and for individuals and small communities. People have made scores of new friends, gotten job offers, and started companies through them.

The number one reason for this is that, contrary to what you might think (and unless you're writing for yourself and not publishing anything anywhere), people will read what you write. It might be a few people. It might be many. It might be your family, your friends, boss, or your company's CEO, or a customer. (Robert Scoble, who works at Microsoft, posted some thoughts on this topic today, here and here). This is easier to see with a private weblog, but I'm always surprised at how easy it is for me to forget that it happens (of course) with public weblogs, all the time.

My opinion is that in weblogs, as in life, whenever you expose part of yourself in any way, whenever you engage in a community, whenever you express yourself, these things tend to happen. :-)

Final words

You might have noticed that there are a lot of "do what you think is best" comments interspersed with the text above. This is not a coincidence. Blogs are, above all, expression. Blogs and the web in general allow us to look at many viewpoints easily, cross-reference them, etc. Check things out. Look for second, third, fourth, and n-th opinions (and this definitely includes the contents of this guide!).

You have the power!, or in other words: It's up to you.


Read more in an introduction to weblogs, part two: syndication.

Categories: art.media, soft.dev, technology
Posted by diego on October 31, 2003 at 10:34 PM

why sun should change the Java app icon

Because it looks horrible, that's why. Look:

java-icon.PNG

This is the crappy icon that a) is shown whenever Java is loading with Java Web Start, at any stage, b) is used for Java config options, and c) appears on dialogs or frames that don't have an icon properly set.
I know that doing a good 16x16 icon of the Java coffee mug must be hard. But right now every time a Java application that isn't properly set up loads, or JWS loads, I cringe. It looks broken. It looks outdated (outdated as in 1992 outdated, not even 1997). It looks bad.
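
(To be fair, developers can avoid the default on their own frames easily enough; something like the snippet below, with a placeholder icon file name, does it. That doesn't excuse the default being this ugly, though.)

    import java.awt.Toolkit;
    import javax.swing.JFrame;

    public class IconDemo {
        public static void main(String[] args) {
            JFrame frame = new JFrame("My App");
            // Use our own image instead of the default Java icon.
            // "app-icon.png" is just a placeholder file name.
            frame.setIconImage(Toolkit.getDefaultToolkit().getImage("app-icon.png"));
            frame.setSize(300, 200);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        }
    }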

Java apps deserve a cool, 3D-ish, metallic thingamajig of an icon that will do justice to the platform. If Sun has staked so much on the logo and the brand, it's high time that details like these (which are nevertheless hugely important) are also taken seriously.

And, while we're at it, make it so that JDK 1.5 loads the system L&F by default. Right now loading the Metal L&F makes apps look terrible. Metal might have looked cool in 1998, but we're a bit past it, don't you think?
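
(In the meantime, nothing stops individual apps from asking for the native look themselves; a couple of lines early in main(), before any components are created, do it:)

    import javax.swing.UIManager;

    public class NativeLookAndFeel {
        public static void main(String[] args) {
            try {
                // Ask Swing for the platform's native look and feel instead of Metal.
                UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
            } catch (Exception e) {
                // If the native L&F can't be loaded, just keep the default.
            }
            // ... create and show the Swing UI after this point ...
        }
    }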

Okay, end of rant. :)

Update: Martin pointed out that JDK 1.4.2_02 had a new icon. I had avoided 1.4.2_02 because of some reports of problems with JWS, but I decided to try anyway. Here's the new icon:

new-j-logo.gif

It's definitely an improvement, but still not good enough. Funny that I said metallic, 3D-ish looking: this is very much that... but it should be brighter, similar to the actual Java logo. Right now it's hard to relate the new logo to this icon.

Categories: soft.dev
Posted by diego on October 30, 2003 at 8:50 PM

rhetoric, semantics, and Microsoft

I got quite a lot of feedback on my Microsoft press release parody. Even Scoble had fun :). Anyway, I wanted to add something a bit more serious to it, particularly after I read Scoble's entry on the reactions to his "How to hate Microsoft" post.

I've written about my own feelings towards MS before (a good starting point is here), so I won't go into that. But I wanted to address the issue of the rhetoric involved.

About the only thing that I found to be truly a problem for me is the way Scoble talks about people. His definition of "there are two kinds of people: people that hate Microsoft, and people that hate Microsoft but want to see it improve" is probably a good reflection of how Microsofties in general see the world. Now, I know that Scoble was in part making fun of the situation (I certainly hope that is the case), but as usual, when we say something, even jokingly, there's always a kernel of truth to it, at least from our subjective perspective.

I have written before about an excellent book on Microsoft called Breaking Windows, which shows not just how Microsoft works in a Darwinian fashion within itself but also how it views the world: everything is a threat, and Microsoft is always the underdog about to be wiped out by whatever New New Thing comes along. This view has obviously served them well in staying competitive, but there comes a point when you should (simply from the point of view of being a good citizen) really consider whether what you think is actually the truth of the situation. So, news flash, Microsoft: you are not the underdog. You are not even the proverbial 800-pound gorilla. You are the only gorilla left, because all the other gorillas are dead and you are holding the smoking machine gun.

You liked that metaphor? It depends on your view. But precisely the fact that the metaphor is seen as funny and maybe true by some people (I know some part of me sees it that way) shows how far we've gone in terms of applying extreme rhetoric to this whole situation. Which brings me to my point.

(Gasp! Yay! He has a point!)

Yeah. Heh. Anyway. My point is that Language (yes, capital L) has been steadily distorted to the point where the only way to get attention is to scream at the top of your lungs and be "controversial." This is not just Microsoft, or just the technology industry. It's a trend in all societies in general, and particularly in the US, where phrases that involve the words "culture wars" are currently bandied about with apparent disdain. We could argue forever about the roots of this: the desensitization of the general public to harsh news, extremism (not just of the religious kind), an appetite for voyeurism (witness the meteoric rise of all sorts of reality crap shows) that implicitly says that our simple lives are not interesting enough, and so on. The roots and solutions might be interesting, but I don't think we have really pinpointed the problem yet, so this is my take.

The problem, in my opinion, is that we have gotten used to extreme rhetoric and we take it as a fact of life ("You're either with us, or against us"), but at the same time, we have forgotten that 99% of the time extreme rhetoric is just that. Words. A facade. In western societies in particular, we seem to have huge problems in differentiating between our public and private personas (something I've also written about in the past).

In the vast majority of cases, when we use extreme rhetoric we don't really, really, really mean it. We are just trying to make a point, and we know it's not a matter of "life and death".

But we forget that.

Let me try to say this again more concisely and in a slightly different way: we forget that what a person says is not who the person is. The mapping between what's in our heads and what comes out of our mouths (or fingers!) is imperfect. It takes a while to get it right. And some things can't be expressed at all without totally missing the point (it's not a coincidence I like Taoism so much, is it?).

Case in point, while we're at it: Blogging.

There are a number of great bloggers that have the ability or the psychological endurance or the need or all of the above to be a lot more open than others about their personal life (put me in the "others" pile there, at least that's how I see myself). Examples don't really abound, but some immediately come to mind: Halley, Dave, Mark, Russ.

If your reaction when seeing any of the names linked above is "why the hell is he putting so-and-so as an example of anything? they're [insert expletive here]!", then, I'd say: thanks, you've made my point.

You see, as open as people can be on their weblogs, there is really no substitute for knowing the person. A weblog is a slice of life. It is not life. Sure, this is obvious. We tend to forget it anyway. It's the double-edged sword of expression: you can never make it truly objective because interpretation is a step in the process. But we treat weblogs as if they're objective anyway, which is probably one of the greatest flaws of the deconstructionist approach of the western way of looking at things.

Some lines from Eminem's Sing for the Moment come to mind:

See what these kids do/hear about us toting pistols/they wanna get one/they think this shit's cool/not knowing we're really just protecting ourselves/we're entertainers/of course this shit's affecting our sales/you ignoramus/but music is reflection of self/we just explain it/and then we get our checks in the mail

It's not a coincidence that Eminem's lyrics are often misunderstood: they are often personal. And it's easy to lose sight of the context, the personal context in which something is said by someone. Maybe they didn't fully explain themselves. Fine. But did they have to? Why do we have this need to rush to judgement before we've heard it all? Or why do we have to pass judgement at all? What if someone is just expressing something, as completely as they can?

So, in this context, :) back to Microsoft and Scoble and all of those ridiculous generalizations. If I say "Windows sucks", does it mean I hate Microsoft? Of course not. But it's much, much easier to jump from A to B, and so we do it all the time. It's easy because we don't really have to look at the problem. It's easy because if I hate Microsoft then, there you go, that's the explanation, right? Nothing's wrong with Windows itself; in a single act of simplification we have just shifted the discussion from the real point (which is that Windows might have, er... a few tiny problems) to something that is completely unrelated and, even worse, not real: that a person somehow "hates" a corporation. I tried to make this point in the fake press release: give me a break, do we really have the time or the inclination to hate a software company? When you get home and start to cook dinner, are you thinking "God, how I hate Microsoft" while you stir the ravioli? And let's not even get into the hundreds of millions of people that don't even have food to begin with. Yes, I'm sure that there are people that are truly filled with hate. But I'd contend that they are very, very few, and that it's ludicrous to lump large sections of the population in that corner simply because we find it easier to deal with our problems, or ignore them.

Just check out Scoble's own entries for a single day from about two years ago. Does it mean that he "hated" Microsoft back then? No. Does it mean that he "loves" Microsoft now? (The false choice we are presenting when we put hate in the other basket) No. So why oversimplify? It's not as if we have 15 seconds to express ourselves. We have the luxury of a medium that allows for more complete expression.

And even if expression is at fault, if someone is not fully clear, even if they think they are, rushing to judgement (and worse, extreme judgement) doesn't sound like a good idea to me either.

Hate is a very strong word, one of the strongest we can ever use to express what we feel. Do we really want to trivialize it that much? Because if we do, then other words lose their meaning too. Words like love, trust, friendship, honesty, heroism, and yes, hate, despise, disgust. These words should not be used often, or otherwise they lose all meaning. They simply stop working, their semantics vanish and we are left with empty shells that don't communicate anything at all (and don't get me started on the misuse of the word "heroism" these days. It seems to me that if the Media is right, if I make it safely across to the convenience store to get some bananas then I'm a hero too).

The other problem with extreme rhetoric is that it forces people to choose between two choices that are not even real. If I say flatly "you either hate Microsoft or you hate Microsoft but want to see it improve," then what is someone who's in between to do? Suddenly I've been forced to put myself in a category. By saying something like that, I have instantly forced everyone to become extremists, even though they are not. The nuances are completely lost. And with them, the truth is lost too. Suddenly, we are just flinging dung at each other.

My wish is that we could, for once, go past the rhetoric. Or use it, but use it well. Separate. This works comparatively well in literature (but for some reason not in music, or even weblogs--probably because it's easy to go down that route when things get more "personal," as both of those things are). That is, if someone says "I hate that book," it's understood that you don't like the book, not that you "hate" the author. So if I'm pissed off about something, maybe, just maybe, it's not that I hate you or your group or your organization. Maybe I'm just pissed off about that one thing. Stop putting the general population into some bag that allows us to easily categorize them and forget about whatever they say. Stop shifting the discussion and talk about the real issues. Or not. But don't pretend you are. It is not real, it is not useful. At all.

Sure, all sides engage in this game. But someone should start by setting the bar just a little bit higher. And someone who has a bigger stake in the process than others, like Microsoft, should have more reasons than most to do it. Or maybe the blogosphere could lead by example by showing that it is possible. As a community we have shown that we quite often fall into the same extreme/destructive patterns as we do in the "real world," but sometimes the light shines through. Here's hoping that the latter will triumph over the former.

"Just a thought". :-)

Categories: soft.dev
Posted by diego on October 28, 2003 at 2:36 PM

hate microsoft? like working for free? we've got a job for you.

REDMOND, WA -- Microsoft Corp. unveiled a new strategy today designed to off-load development of its products to the very same people that hate them. Under the program, self-described Microsoft haters who subscribe to Microsoft's MSDN program, at the low cost of between 500 and 2000 USD, will be able to download the latest build of Longhorn, Microsoft's next-generation operating system. After spending untold hours setting the system up, those users will be able to write up and even publish their ideas and criticism on their own weblogs or in public forums, or talk about them with friends and family. More significantly, Microsoft vowed to actually pay attention to some of the feedback. Robert Scoble described this unprecedented move of allowing people to talk about things as follows: "Why is this a massive change? Everytime we've released a version of Windows before we kept it secret. We made anyone who saw it sign an NDA (non-disclosure agreement). Even many of those of you who signed NDAs weren't really given full access to the development teams and often if you were, it was too late to really help improve the product." Microsoft noted that they hoped these new hate-filled testers would prove more effective than the estimated 50,000 internal and 20,000 external testers that had given feedback on previous versions, going as far back as Windows 2000. "Honestly," said one Microsoft executive who wished to remain anonymous, "All those guys must have been asleep at the wheel. I mean, look at the stuff we've released in the last three, four years. Nothing works. We've had so many viruses and worms that we've got calls from WHO offering to send out a team to help."

In his posting, Scoble added "The problem is, there are two types of people: 1) Those who hate Microsoft. 2) Those who hate Microsoft but want to see it improve."

When asked if Scoble had grossly over-simplified the situation by assuming that everyone in the planet was into either of those two categories, a Microsoft representative said "Not at all. We did a lot of research on this. People care about three things: food, whether Ben and J.Lo will get married, and hating Microsoft." The representative added that there is always a margin of error. "The survey was world-wide, so there were flukes. For example, some respondents from a small town north of London put someone or something called Robbie Williams, or Williamson instead of Ben & J.Lo, but we don't know who or what that is. We presume it's codename for a Linux Kernel build."

And what about people that say they don't hate Microsoft, but would simply, only, like to see it play fair in the market and stop leveraging one monopoly to get to the next? What about people that say that Windows is fine and that the only problem is with how Microsoft attacks competitors with a lot less resources? "Nonsense. Those people are just confused, or need to get off the glue. Just like those losers in the middle of Africa or whatever. Like, people that say they can't afford computers, or are worried about wars, famine, terrorism, AIDS, whatever. They watch too much TV and they get ideas."

"There is no bigger deal in the world right now aside from Longhorn, and people, all people, understand that. They want to improve their lives and testing Windows for us for free is the way to go." The representative went on to note that their research had shown that "people" were "tired of dealing with bugs" in the "old" versions of Windows. "This is all about giving customers what they want, and guess what, they want new bugs, too. They're tried OS X, for example, but it just works, and they have go back to Windows." Many people said they "missed the thrill" of dealing with the possibility of losing a day's work in a crash, while others loved rebooting, because "it allows them to go get coffee regularly, or a sandwich, things of that nature, which is not surprising since the survey also found that food is somehow important to people."

Offloading the design and testing process has other benefits too. "It's also the blame factor," the executive added. "Imagine. Longhorn is released and it doesn't work very well. All those Microsoft haters--I mean, they are the ones who signed off on it in the first place, right? How are they going to criticize it then? It would be their fault, right?"

Would the hatemongers be rewarded in some way? "Hell no," the executive said. "With Windows XP we actually charged people to get the beta, and it worked like a charm. Although it has been suggested that we try out BillG's ham sandwich bundling theory, we probably won't do it--ham has too much fat. It's just not healthy."

Although all companies appreciate and use feedback from users and developers, industry commentators noted that much smaller companies such as Apple or QNX, as well as the group of developers that work on Linux, have been able to develop OS products without formally off-loading design and early testing tasks to the general development public for free. But when asked why Microsoft, who holds USD 50 billion in cash and short-term securities as well as two of the most profitable monopolies in history, can't deploy resources to develop the product on its own, the executive explained. "Well, the truth is that a large part of that 50 billion is going to be used for our new project, a Microsoft theme park. Bill wants to buy Seattle, including the Boeing factories to the north, and turn it all into an amusement park. You'll have all the classics: the SteveB roller coaster, the Blaster Worm House of Horrors, and the DOJ shooting range."

Finally, he hinted "And watch out. ClipIt will be a favorite character."

[Scoble's "How to hate Microsoft" originally via Dave]

Categories: soft.dev, technology
Posted by diego on October 23, 2003 at 5:05 PM

auto-everything

I had a mini-epiphany while writing a search plugin for Firebird, about being as automated as possible when exposing the functionality of the clevercactus plugin API. In Java, we are used to thinking in terms of classes and classpaths and so on--even ResourceBundles (for i18n) are primarily a class-based mechanism (although, yes, you can load a ResourceBundle from a property file).

As a result of the mini-epiphany, clevercactus is now undergoing refactoring to make the configuration both changeable and dynamic. Not necessarily runtime-dynamic (i.e., ala Swing) but reload-dynamic. All configuration is stored in XML files. Example? The resources.

Resource management for i18n is relatively simple: you create a level of indirection and you use the current locale (or a user-specified locale) to access the string you want for a particular case. This theoretically applies to any type of resource, but 99% of the time strings are what's used, either as text/labels/etc. or as filenames to access stuff (such as images).

So, how does clevercactus work now? There's a new directory called resources which must exist somewhere on the classpath. That directory contains a main file that points to all the resources and system-wide settings (such as which resource is the default, or the default for a particular language). A sample resource file looks like this:

<?xml version="1.0" encoding="UTF-8"?>

<resources>
<locale>
  <language>en</language>
  <country>US</country>
</locale>
<parent-resource-set>
  <locale>
    <language>en</language>
  </locale>
</parent-resource-set>
<resource>
	<name>REZ1</name>
	<value>some resource</value>
</resource>
<resource>
	<name>REZ2</name>
	<value>some other resource</value>
</resource>
<resource>
	<name>REZFILE</name>
	<value>/images/test.gif</value>
</resource>
<resource>
	<name>MULTIREZ</name>
	<value>value 1</value>
	<value>value 2</value>
	<value>value 3</value>
</resource>
</resources>
Because it's XML you can cleanly support multiple character sets, do validation (when the DTD is available :)), etc. I also added support for multi-valued resources, which ResourceBundle doesn't allow (not easily anyway). Finally, note the parent-resource-set tag, which allows you to specify resource set hierarchies (the country is always optional, so it's not present in that particular example, but it could be).

ResourceManager, the main class, has two main access methods (aside from management methods such as load(String path) and setSelectedResourceSet(Locale selectedLocale)):

  public static String getString(String name);
and
  public static java.util.List getList(String name);
And the manager converts transparently between types if you request a list as a string and vice versa.
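
So, purely as illustration, client code ends up looking something like this (the resource names are the ones from the sample file above):

  // single-valued resource, returned as a String
  String label = ResourceManager.getString("REZ1");

  // multi-valued resource, returned as a List; asking for MULTIREZ via
  // getString() (or for REZ1 via getList()) also works, since the manager
  // converts between types transparently
  java.util.List values = ResourceManager.getList("MULTIREZ");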

Then, if you drop a new resource file (something with extension .resources) in the resources directory, cc will pick it up automatically on the next restart. And presto! The new language is supported in cc. Nice eh?

This mechanism will apply to basically everything: plugins, menus, commands, etc. It will be particularly interesting for plugins: as part of the refactoring, the cc components themselves (email, etc.) are being turned into plugins. Nothing better to test the framework properly. :)

Oh, and here's a nice simple trick I came up with to find the resources automatically in the classpath (it might be a well-known trick, but whatever). The problem is that you know the main dir ("/resources") but you don't know where it resides in the classpath. So the code does something like this to perform the autodetection:

  // path_sd holds the resources directory path ("/resources" in this case);
  // getResource() resolves it against the classpath
  java.net.URL classUrl = ResourceManager.class.getResource(path_sd);
  java.io.File fdir = new java.io.File(classUrl.getFile());
  String[] files = fdir.list();
  if (files != null) {
    for (int i = 0; i < files.length; i++) {
      String file = files[i].toLowerCase();
      if (file.endsWith(".resources")) {
        // load the resource file, etc.
      }
    }
  }
The getResource()/File lines are what's important here: Java resolves the path within the classpath, and from the resulting URL you can derive a File object, which then allows you to list the files in the directory. Useful and simple. :)

Okay, back to work. After this is all done, it's IMAP and the Calendar all the way.

Categories: soft.dev
Posted by diego on October 21, 2003 at 11:18 AM

micro-everything

According to Sun's CTO, microprocessors are on their way out. Quote:

"Microprocessors are dead," Papadopoulos said, trying to provoke an audience of chip aficionados at the Microprocessor Forum here. As new chip manufacturing techniques converge with new realities about the software jobs that computers handle, central microprocessors will gradually assume almost all the functions currently handled by an army of supporting chips, he said.

Eventually, Papadopoulos predicted, almost an entire computer will exist on a single chip--not a microprocessor but a "microsystem." Each microsystem will have three connections: to memory, to other microsystems and to the network, Papadopoulos said.

He predicted that as more and more circuitry can be packed onto a chip, not just a single system but an entire network of systems will make its way onto a lone piece of silicon. He dubbed the concept "micronetworks."

This trend is certainly visible in low-end systems, where chips (particularly for notebooks) come with video, sound, network, and other features built-in. In that sense, I don't see what's there to "predict". The idea of a system-on-a-chip is already here. Sure, he's probably talking about even more integration, which is reasonable since he's from Sun Mic... well, you get the idea.

I think that it's a little premature to assume that individual components for devices and boards will disappear outright, for a simple reason: when you have different components created by different manufacturers, competition can be more partitioned among different areas of a system, resulting in better quality overall. It's not a coincidence that PCs, which are essentially a bunch of chips from many, many different sources brought together, are cheaper and faster today than anything else in their category. Hey, even Apple switched over to PCI eventually.

Speaking of Sun, there's an interesting article in today's Wall Street Journal (subscription required) on Sun's new strategy. The article's title ("Cloud Over Sun Microsystems: Plummeting Computer Prices") makes it sound as if it's going to be less harsh than it is. The writer all but declares Sun dead, as is customary these days--as an example, consider the quote: "The Silicon Valley legend that once boasted of putting the dot in dot-com is staring into the abyss". Staring into the abyss. Jeez. And then the writer adds snippets like: Sun is sitting on $5.7 billion in cash and securities. Heh. I'd have no problems "staring into the abyss" under those conditions. :)

The truth is that the picture that emerges from the information, the quotes from customers and Sun execs, etc., is less clear. I think there's simply confusion on the part of "analysts" who see everything as either-this-or-that regarding Sun's new ideas, which are a relatively new breed (and quite a gamble, one might add). It's going to take one or two more years to see if Sun is really going to get through this or not.

One of the problems that the article focuses on is Sun's chaotic behavior during the past two years, as the new strategy was developed and put in place:

In December 2002, Jim Melvin, chief executive of Siva Corp., a Delray Beach, Fla., firm that runs back-office systems for restaurants, wanted to invest several million dollars to build a corporate data center using Sun equipment. But when Mr. Melvin approached Sun, he found the company's restructuring was causing chaos. "Sun said call back in two months, because the guy I was talking to there didn't know if he'd still have his job," he says. Frustrated, Mr. Melvin bought IBM and Dell gear instead.
and which I saw for myself here and there. Sun also made the mistake of being a dot-com baby itself, instead of just selling stuff to dot-coms (as IBM did):
Sun compounded its problems by responding slowly to the slumping market. Even as tech spending dried up in 2001, the company increased its work force to 44,000 employees from 37,000. Other firms axed costs early, or launched big deals to remake themselves. Cisco Systems Inc. cut nearly 20% of its work force beginning in March 2001. H-P launched a controversial purchase of Compaq Computer Corp. that same year, and has since slashed $3.5 billion in annual costs.

Sun put the brakes on the hiring in the fall of 2001 and trimmed around 10% of its work force that October -- the first of several big cuts. But by mid-2001, Sun's quarterly sales had dipped to $4 billion and it began reporting net losses. Its share of world-wide server revenue fell to 13.2% in 2001 from 17% in late 2000, according to Gartner.

The new strategy does make sense. As McNealy is quoted as saying in the article, "we're long on strategy. If we execute well, we'll do just fine."

Exactly.

Categories: soft.dev, technology
Posted by diego on October 16, 2003 at 9:16 PM

google's lack of manners

Gather 'round children, for a sad tale of many emails and waiting for a reply from the Google Pantheon...

It all began about 2 months ago when I had the idea to use the Google API within clevercactus to provide a new way of looking at result sets. Essentially you could get Google results in a structured form within the app and then manipulate them (create a link in the app, forward as a message, etc). It was the tip of the iceberg, as I saw it; eventually it would be much more useful to integrate Groups and News into that, but there's no API for them--and the search engine was a good, simple case to start with anyway.

So the first step was to email the Google API team asking for a) an increase in my Google key limit and b) even a little feedback on the idea. I gave them a URL to look at the app, and a description of what I was going to do. So I sent the email and waited.

I sent the first email on July 16.

The reply finally came on August 6 (note: three weeks later). What did they say? "Please provide more information about your project". Yes, I had provided information about the project in my first email. But maybe it wasn't enough. So I expanded on my first email and sent another email the next day (August 7).

The reply to that email came immediately. Wow! I thought. It had been a fluke before.

They upped the key limit, and gave me an answer that I thought was ok, but in re-reading it now I can see that it was simply a non-answer. "Google grants you a limited right to use the Google Web APIs service for commercial purposes". That was it--while I was actually asking about the redistribution issue, how to deal with their library inside cactus, etc. Anyway, at the time I didn't quite realize that my questions hadn't been answered, so I proceeded happily.

Then, on September 1, I sent them another email because I wanted to release the app but suddenly I realized that there were Google icons and functionality all over the place. It was subtle, but there was no denying that they might get frazzled. So to be on the safe side, I asked them to please review the app and let me know if the usage was ok. I provided them with a link to the beta with instructions and a pretty extensive explanation of what we wanted to do, including a request to be put in touch with someone directly. Obviously this was going to affect our release schedule otherwise. I expected that it might take a while to sort out since I'd never heard of the API being used in this way before and I could imagine they'd need some time to decide how to treat it.

So. September 1 I sent that email. And waited. And waited.

And waited.

On September 11 I sent another email requesting a reply to my previous email. Again, I waited.

And waited.

The reply finally came in a few days ago, on October 6. More than a month later.

What did the reply say? Nothing. They told me to go read the terms and conditions of API usage, and they said that regarding a contact, well, they had AdSense (which I had mentioned of course in the previous email). So essentially I waited a month to be told again what I already knew, and furthermore what I had already told them I knew.

Obviously if I thought that simply by reading the API terms the problem would have been fixed I'd have done that. But the API terms are pretty restrictive, and it's not 100% clear what you can and can't do.

The conclusion is, of course, that the Google functionality has, as of yesterday, been removed from cactus. I wrote a new search plugin system that uses a similar format to Mozilla's, and the new default search engine for cactus is Teoma. (Btw, IMO, the look and behavior of Teoma is better than Google's--although their results are still not as comprehensive, they've gotten better).

Now, I understand that Google is growing fast, that they're under immense pressure, etc. I sympathize, yes. But. But. If you put out an API, then why not support it? I get the impression that they've put this out as a show of "look how cool we are" and then ignored it. Furthermore, giving condescending replies that show you obviously haven't read the email people sent (i.e., replies that repeat the information you sent back to you, psychologist-style) is not a very good idea, is it? The API has potential, but by ignoring it and the people who use it they are killing the very thing that it might help with.

Looking back, having spent nearly three months waiting for them to reply has been a complete waste of time. I put myself in their shoes, imagine the thousands of emails and requests they get, and I understand that it must be hard to deal with this. But if they don't want to deal with fire, then don't run around lighting matches. It's that simple.

I want Google to succeed, and I've put time and effort into doing something that I thought was useful to users and that, further, Google should have been interested in as well since it's a new avenue for their product. But Google's attitude says to me that they are not interested in developers. Why put the API out then? Who knows. I can't see any reason at all for it except, as I said, to "be cool". Which people might appreciate when you're dealing with a small company, but not when the one who does it is essentially a monopolist with a penchant for secrecy.

And yes, objectively, Google is a monopolist with a penchant for secrecy. That's fine (and besides, monopolies by themselves are not illegal), but it changes how much slack I am prepared to give them.

My wish: that Google would open up a bit. Not much. Just a bit. Let us see what's going on inside. Let us understand why requests are ignored or brushed aside. Put out simple one-paragraph explanations that allow people that don't believe in conspiracy theories to explain things like the recent AdSense license agreement brouhaha. It's so simple! Instead of having to read dozens of angry emails, put out a simple reply on a website explaining what happened. Then route queries to that explanation. Put out a press-release. Whatever. Or get the PR people to host an interview with developers instead. Google's PR seems to be pretty good from all the coverage they get. Most people would understand, I think.

However, it's not a wish that I expect will be fulfilled soon. Google must be, by now, preparing for its IPO. And they still haven't dealt with Microsoft at all. The Bill Kill Machine is still gearing up for them (no pun intended--but maybe we should ask Tarantino to do a movie on that? :)). I don't have any hopes of them suddenly rerouting resources to deal with this.

Why?

Because Google is basically an advertising company. Have you heard many advertising companies engaged with their community or working with developers on anything? No, right?

But even advertising companies should have good manners. :-)

Categories: soft.dev
Posted by diego on October 14, 2003 at 12:02 PM

rss autodiscovery, take 3

Dave has noted the beginnings of his spec for RSS autodiscovery. He has gone the OPML way, and basically his discussion mirrors what I wrote regarding the topic and the choices we faced about a month ago. Jeez. A month! A month! What is up with time these days? Aiieeeeee! [diego runs around flailing his arms]

[diego gets back to the computer]

Ok. Sorry about that. I was saying. We're basically in agreement, I think, and I had mockups for different options (Tima had also put forward a mockup in WSIL as another option). Anyway, regarding OPML, here's the mockup I did then, which differs from Dave's proposal basically on the tag names. (In the comments of Jeremy's original post, while some people, including me, thought that using RSS or something else might be better, everyone preferred getting something out the door, no matter the format. In the end it's only agreeing on a few tags, isn't it?)

Just one comment: I think that as a final step, aside from completing a spec (adding samples, etc), OPML should be more rigidly specified, so that people can't create public feed listings with whatever tags they want (the OPML spec is too flexible in this) and so that OPML files can be validated. Only minor changes are necessary, such as clarifying that no new tags can be used, and maybe that the strings for text/description are UTF-8, to make sure we don't get into a situation where, say, a Japanese news agency decides to use a Japanese encoding instead of Unicode.
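
Just to make this concrete, here's a rough, purely illustrative sketch of the kind of feed listing I mean. This is not the actual mockup (that's linked above), and the tag and attribute names are placeholders along the lines of what has been discussed, reusing the made-up News4Humans site from my earlier mockups:

<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
  <head>
    <title>News4Humans feeds</title>
  </head>
  <body>
    <!-- one outline per category, nested outlines for the actual feeds -->
    <outline text="News">
      <outline text="Sports"
               url="http://news4humans.example.com/feeds/sports.xml"
               htmlUrl="http://news4humans.example.com/sports/"/>
      <outline text="Politics"
               url="http://news4humans.example.com/feeds/politics.xml"
               htmlUrl="http://news4humans.example.com/politics/"/>
    </outline>
  </body>
</opml>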

Update: Sam also noted the need for this format and started a Wiki to deal with it. I have a feeling of deja-vu here.

Categories: soft.dev
Posted by diego on October 14, 2003 at 10:16 AM

java google tag library

Yesterday Erik released version 1.0 of the google tag library. Nice! We've talked with Erik about integrating my Google RSS/Atom code into it. Anyway, go check it out! (both literally and figuratively :)).

Categories: soft.dev
Posted by diego on October 11, 2003 at 11:46 AM

feedster search plugin for Mozilla Firebird

If you use Moz Firebird you'd probably agree that one of the coolest small features it has is the search box at the top right. Out of the box, that search box supports Google and a Mozilla search engine of some sort that I've never used. Its workings are a bit opaque, but once you've got a handle on it it's not bad.

That box is in fact an easily extensible mechanism. Since I find myself depending on weblogs to find stuff that I want (and Google is returning weblog results all over the place anyway), I spent a bit of time this morning creating a plugin for Feedster search. So here it goes!

Steps to install:

  • First, download the plugin ZIP file, which contains two tiny files: the plugin spec file (.src) and an icon that I cooked up since I couldn't find a Feedster icon on the site (not that I spent too much time looking...)
  • Then, go to the directory in which you've installed Firebird. In there you should find a subdirectory called "searchplugins"; uncompress the contents of the ZIP file into it. You'll notice that the other options are also present in that directory as .src files, with .gif files for the icons.
  • Restart Firebird.
And it's done! Now you can simply choose it by clicking (one left-click) on the icon in the search box. That pulls up a list of options, in which Feedster should now be one. Choose it, and then search away!

PS: This is actually the format that Mozilla (the original) uses for search plugins--it might work on Mozilla too. Haven't tried it though. If anyone does try it, let me know the results.
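
For the curious, these .src files use the old Sherlock-style plugin format: a small HTML-ish description of the engine's search form. Very roughly--and from memory, so the attributes, URL, and parameter name below are illustrative placeholders rather than the actual contents of the Feedster plugin--it looks something like this:

<search
   name="Feedster"
   description="Feedster RSS search"
   method="GET"
   action="http://www.feedster.com/search.php"
>

<input name="q" user>

</search>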

Categories: soft.dev
Posted by diego on October 9, 2003 at 12:02 PM

java xml pull and push: a comparison

Yesterday Russ pointed to an article at O'Reilly's XML.com about StAX (the Streaming API for XML), a Java API that allows parsing of XML through a pull mechanism, currently in the final laps of the JSR process as JSR 173. I was immediately intrigued. While many find it common to use additional APIs to solve some problems, I tend to prefer removing layers of complexity and abstraction that aren't absolutely necessary, using only JDK-standard (or standard extension) classes as much as possible. Since this API appears to be not only frozen, but also on track to become a standard extension (and hopefully it will be included in the next JDK release!), I decided to give it a try.

While the benefits in simplicity of parsing are quite obvious, I was a little wary of the performance of this package, since it's a reference implementation and is almost certainly not fully optimized (one of the main uses that I'd give it is to parse RSS, and I was already looking at how to improve the performance of RSS checking--but that's another story). So I created two parsers, one using StAX and one using plain SAX, and compared them in terms of usability and performance.

I ran the tests against an RSS 2 feed of 500 KB, essentially 20 copies of this morning's RSS feed for my weblog.

The results are pretty surprising. StAX wins by a mile. Check out the results:

SAX results

-----------------------------------------------
Start element count = 2109
Characters event count = 4303
-----------------------------------------------
Time elapsed = 140 (msec)
-----------------------------------------------
StAX (Streaming) results
-----------------------------------------------
Start element count = 2109
Characters event count = 4236
-----------------------------------------------
Time elapsed = 63 (msec)
-----------------------------------------------
A couple of notes: the "characters event" count refers to each time the parser reports character data--in StAX, when the current event is of type characters; in SAX, when the characters method is called by the parser. For an entry that contains text (e.g., CDATA), multiple character events may be received, depending on new lines, etc. Apparently StAX splits the text a bit differently than plain SAX, since the character event counts differ slightly, but that's ok (to parse those you only need to keep state and append each new characters value to the current element you're parsing). Both element counts are the same, which only proves that both are parsing the same structure properly. The results are, of course, consistent over several runs, with the expected slight differences. (And, btw, I tried testing memory usage, but the variability in initial free memory, etc., was too big to be able to measure which one is better. At a minimum, they appear to be equivalent in that sense.)

Here is the code used in the tests: for the TestSAXParser class and the TestXMLStream class. Note: to run the code you'll need to download the current specification of the JSR. In that ZIP file there's the spec itself (a PDF) and a JAR file, jsr173.jar, which contains a number of classes, API docs and such. Of those jars, the only ones necessary to run the example (i.e., that must be added to the classpath) are jsr173_07_api.jar and jsr173_07_ri.jar, that is, the API and the reference implementation respectively.

As is plain to see, the StAX code is a lot simpler than the SAX code, because it doesn't require a wrapper to make it look more event-based. Add to that the fact that the StAX code runs at more than twice the speed of SAX, and, as I said, StAX wins hands down. No contest.
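
To give a flavor of the pull style, here's a minimal sketch of the same counting idea. This is not the TestXMLStream code linked above, just an illustration written against the javax.xml.stream API from the spec (the packaging in the reference implementation JARs may differ slightly):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class PullCountSketch {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
            factory.createXMLStreamReader(new FileInputStream(args[0]));
        int startElements = 0, charEvents = 0;
        // the application "pulls" events from the reader, instead of
        // registering callbacks as in SAX
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                startElements++;
            } else if (event == XMLStreamConstants.CHARACTERS) {
                charEvents++;  // text may arrive split across several events
            }
        }
        reader.close();
        System.out.println("Start element count = " + startElements);
        System.out.println("Characters event count = " + charEvents);
    }
}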

Cool eh?

Categories: soft.dev
Posted by diego on September 19, 2003 at 10:40 AM

update on rss autodiscovery

In response to my previous entry that further explored some of the issues we face on RSS autodiscovery, Tima has posted examples of how my original mockups would look in WSIL. Looks interesting--a bit more complex, but it's a recognized standard.

It seems to me that the main element that would have to be decided at this point is whether to go with OPML, RSS, or WSIL (the decision of "stakeholders" in this process--such as Jeremy, Dave, and other members of the community, particularly those that would either create the content or write the aggregators--being the most important IMO).

This format could be a big help in simplifying the subscription of news feeds for users. Hopefully we will be able to get it done quickly!

Categories: soft.dev
Posted by diego on September 18, 2003 at 6:48 PM

google code jam 2003

Coming in October: Google Code Jam 2003. The proverbial carrot-on-a-stick is, in this case, $10,000 for a first prize and the possibility of an interview at Google. Will certainly be something to check out--it's always fun to solve interesting problems under time-pressure!

Categories: soft.dev
Posted by diego on September 18, 2003 at 10:17 AM

sun's new strategy

I spent some time yesterday watching the presentations and keynotes for Sun's NC Q3, which happened in sync with SunNetwork in San Francisco...

The highlights were McNealy's and Schwartz's keynotes, in which they finally presented the big picture of the strategy that Sun has been embarking on for the last year or so, bringing together all the stuff we've been hearing about for the last few months: Orion, N1, the MadHatter desktop, etc.

The strategy is, actually, simplicity itself (here's coverage from Wired and eWeek). Essentially, Sun is switching to offering single, integrated (sorry, "integratable" as they say, which means they're integrated but you can integrate your own stuff if you want) products for servers, desktops, and developer environments, with a flat annual subscription model for each, each comprising a different "Java System". $100 per employee for the server stack. $60 per employee for the desktop. And an additional $5 per employee for the development environment (or a one-time $1895--I'm not terribly clear about these two options, but anyway). That's it.

With that, you get everything, the software, training, migration, support, setup, and the ability to run this to any scale you want, no limits. That is, if you have 200 employees you pay $20,000 for all your server software needs per year and that lets you serve an infinite (well, theoretically at least) number of customers. Sounds good eh?

There's more. The license agreement for this thing is three pages. Yes. Three. Isn't that great?

It's not clear to me, however, what's the lower limit for the licenses, if there is one. Somehow I can't believe that you'd get all that for $100 per year per employee if you only have two employees, but who knows.

The other note of interest is that, of course, you have to run this somewhere so you'll need to get a ton of hardware. Guess from whom. Heh.

I wondered at times if this was all a clever ploy to sell more metal, but the strategy is reasonable, and it would set a good precedent for properly-priced, simple-licensed software subscription services. (Oh, and btw, why use employee count for licenses? Because it's one of the few measures of a company that doesn't require extensive audits to be determined).

They also announced that Sun would indemnify anyone using the Java Desktop System (Mad Hatter). This was an obvious reference to SCO's FUD, but it was unclear if it went beyond that (say, Star Office crashes and you lose a crucial document...)--I'd say it doesn't, and it only applies to litigation related to UNIX licensing. Again, who knows.

And for those who are skeptical that Microsoft might be too entrenched on the desktop, there's some anecdotal evidence that the recent (okay, long-time) security problems of Windows at all levels might be beginning to make a dent. There's a story in today's WSJ (subscription required) that talks about a number of companies that are considering switching out of a Microsoft environment on the basis of security problems alone. I'm sure that cost will also start playing into the picture, at least in part, with Sun's new offering.

This info was spread across both keynotes. Schwartz's presentation, aside from the details, went on to some demos (one of which bombed on stage). In the Mad Hatter demo, Schwartz showed the features of a standard desktop, using Mozilla, Star Office 7, etc. Nothing groundbreaking here, except that it was Star Office 7 (which is about to be released, I guess...)

Much, much better than the Mad Hatter demo was a demo of a futuristic user-interface project titled Looking Glass. Now, it was totally unclear what relation, if any, this had with everything else that came before (Schwartz said that the project was open source, but after a bit of googling all I could find that was remotely concrete was this news item about another demo given by Schwartz a month ago). It was a nice demonstration of 3D applied to windowed user interfaces, with perspective and transparency (layers--you can put one window behind another and see through the one in front as though it were translucent), among other things. To make space on the screen, Schwartz rotated several windows back and forth along the Z axis (like opening and closing a door), performing 180- and 360-degree vertical rotations of the windows, etc. All of this while some of the windows were playing MPEG video--and the icons in the taskbar showed a minimized version of each window. Yes, it sounds awfully similar to OS X (except for the 3D). As I said, it was impressive, but not clear at all what Looking Glass had to do with anything else.

Overall though, the strategy is reasonable and they don't even need to sell me on the idea of simplifying software and licensing. The idea of different Java Systems for different usage needs is also very cool.

The question is, will they pull it off? We'll know in a year or two.

Categories: soft.dev
Posted by diego on September 18, 2003 at 10:09 AM

rss autodiscovery, take 2

Being a weekend, not a whole lot has happened, but there have been several good comments on the topic of rss autodiscovery, which has made me think further about the choices we face in making it a reality...

As a recap, Jeremy proposed coming up with an OPML-based standard for specifying lists of feeds on sites, and Russ brought up the idea of using RSS, which some, including me, liked. I followed that up with a couple of mockups in both OPML and RSS so that we could compare the pros and cons of each, along with some comments on the apparent tradeoffs for each.

A note, before going on. The second I read Jeremy's post I thought: "but don't we already have RSD?" I immediately checked and got my answer, but I thought that, for completeness, I'd include it here. The answer is: no. RSD is intended for autodiscovery of APIs that will be used to access/modify the content programmatically. Similar solutions that have been proposed for Atom also deal with APIs rather than feeds. The bottom line is that re-using current autodiscovery techniques/specs from APIs would imply re-spec'ing them, at least partially, which brings us back to square one.

Using either RSS or OPML seems to me like a good solution that will get things done. It might not be the most perfect solution, but it will work. There seems to be some resistance to using OPML. The main basis for this resistance is that OPML can't be validated (or easily used) because its spec is relatively loose. However, I think that if OPML was used for this, it would have to be specified properly, which means that what could (and could not) be done would be known and therefore it could allow validation. The fact that the current iteration of OPML cannot be validated is not enough grounds to reject it out of hand, in my opinion. A few small improvements in the spec of OPML or this new OPML-derived format would do the trick.

In summary, my (possibly narrow-minded) view is that all we're doing is agreeing on a number of tags and the structure of a document. Any solution will look similar to any other, and I think it's eminently useful to base things on a format that can already be parsed by most, if not all, aggregators, as is the case with both RSS and OPML.

There are other possibilities though. Tima put forward the idea of using WSIL (also echoed in this lockergnome entry). As I don't know the intricacies of the format, I can't rewrite either of my two mockups in WSIL and be sure the result won't be broken, and for comparison purposes we need, I think, to be looking at exactly the same content. Conclusion: if Tima or someone else had a bit of time to rewrite my mock-up structure using WSIL, it would be most welcome!

Regardless of format, the main issue that seems to me would drive how the format is used is how hierarchy in the feed is handled. Hierarchy will be necessary to provide the structure used by many news sites (e.g. "Technology/Mobile Technology/Phones"). So, with a heavy emphasis on how hierarchy would be represented, here's a summary of the issues in choosing one format or another as far as I can see:

  • Regarding hierarchy, OPML is clearly a winner here since it is designed to support hierarchies. OPML would, however, require properly spec'ing a couple of elements to represent the data that we'd like to represent. OPML, for example, has been variously used to specify links with url, htmlUrl, as well as others like href, as this example from Philip Pearson demonstrates (in fact, Philip was actually using OPML to provide a feed directory there). That would be the extent of the work required for an OPML implementation.
  • RSS, on the other hand, is not "naturally" geared towards dealing with hierarchical content: the structure of the information represented is flat. This can be solved in one of two ways:
    1. It is possible to create an implied hierarchy within the file by using category names. All the feeds for a site would be in a single file, and hierarchy would be specified by using a forward slash "/" between category levels. Pros: simple, it doesn't stretch the use of RSS beyond its single-file origin, and it simplifies checking for new feeds on a given "watched" site since a single GET is required. Cons: it would be a semantic convention, rather than a syntactical one, which makes it harder to verify properly.
    2. The alternative is to specify sub-feed sets through the use of the domain attribute in category elements. That is, whenever a category in an entry includes a domain, then the entry is defined as pointing to another feed-of-feeds subset, rather than to a particular feed itself. A backpointer to the original "parent" feed set can be defined by using the source element on RSS entries, which gives us the good side-effect of making the hierarchy fully traversable from any starting point. Pros: the connections between feed sets and their children would be syntactically defined, thus making it easier to validate and verify, all without having to bend in any way the definition of what an RSS feed is. Cons: it makes the structure a bit more difficult to maintain (multiple files) and to access (multiple GETs), which also impacts the ease of the validation process a bit. (A rough sketch of what this could look like follows this list.)
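
Purely as illustration of the second alternative--the category/domain and source elements are plain RSS 2.0, while the URLs and the News4Humans site from my earlier mockups are made up--the two levels could look like this:

<!-- in the site's top-level feed-of-feeds: an item with no link, whose
     category carries a domain attribute pointing at a sub-feed-set -->
<item>
  <title>Sports</title>
  <description>All the sports feeds</description>
  <category domain="http://news4humans.example.com/feeds/sports-feeds.xml">News/Sports</category>
</item>

<!-- in sports-feeds.xml: an item with a link points to an actual feed,
     and source provides the backpointer to the parent feed set -->
<item>
  <title>Baseball</title>
  <link>http://news4humans.example.com/feeds/baseball.xml</link>
  <source url="http://news4humans.example.com/feeds.xml">News4Humans feeds</source>
</item>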

In his entry, Tima mentions that WSIL describes hierarchy through the use of multiple files, much like the second RSS alternative mentioned above.

A final element that would also have to be agreed upon is how this master file is found. Jeremy, in his original posting, proposed using a standard location, similar to robots.txt, and a standard name, like feeds.opml or rss.opml, which sounds quite reasonable.

Okay, so what would be the steps necessary to be able to spec this? A possible outline would be:

  • Define which format would be used, based on pros and cons.
  • For the format used, define the structure and the meaning of the tags used.
  • Agree on a standard location for the top feed-of-feeds set.
  • Formalize the results in a spec.
How does that sound? Did I miss anything?

Categories: soft.dev
Posted by diego on September 14, 2003 at 6:30 PM

on rss autodiscovery

After reading Jeremy's post on creating a sort of "auto discovery for RSS 2.0" (and agreeing that it was a great idea) I thought, what the hell. Why not try it on for size? Here it goes, then...

The idea is basically that a site could publish a centralized directory of all the feeds it serves. This would allow auto-discovery by aggregators and suggestions to users when new feeds come online on sites they already watch.

Jeremy was proposing to do it on OPML, and Russ floated the idea of using RSS directly, based on this, which I liked.

To get some actual idea of how something would look like, I decided to whip up possible versions of this both in OPML and in RSS. So I spent some time looking at both the OPML and RSS specs and thinking how they would be used in this case (the use, of course, from my point of view which is that of a user and someone who would have to add this to an aggregator :) -- it's possible that I missed something that would be obvious to a content producer).

The results of my little experiment are here: in OPML and in RSS, for a fictitious news site "News4Humans".

RSS, being a richer format than OPML (and possibly more generic as far as content is concerned), has no problem accommodating all the elements. There are a few elements that I added to the OPML version to mirror the data exposed; even though they are not included in the OPML spec, that's ok since the spec does not preclude adding new elements. That said (and as Dave mentioned specifically regarding the issue of recursive inclusion), everyone would have to agree on them or they would be useless--possibly an addendum to the spec would be useful as well.

One of the main differences is in structure. OPML supports recursivity, RSS does not. So where OPML can define the category and the feeds for that category as sub-tags, RSS needs to use the category tag, essentially making the structure flat. This seems to be fine to me, unless "deeper recursivity" (or is it recursiveness?) is needed--but I can't think of a news site with more than one level down from the main category at the moment, so I let it stand.

Second, the OPML version contains two additional tags: link, to specify the feed of feeds'... well, link :) (to match the same tag in RSS, which could potentially be useful for redirects), and dateCreated, a per-entry element. The idea with this tag is that the aggregator can record the last time the feed of feeds was checked and diff against this date to very easily find out which entries were added since the last check (of course, keeping a full list and doing a diff on that is possible, but then again if there are, say, 50 feeds, and the user subscribes only to one, the aggregator would have to keep all 50 to do a proper diff against number 51, which seems kind of wasteful). RSS, incidentally, supports this functionality through its own basic item date tag. And, again, the date could be useful for redirects if necessary: changing the date on an item you already knew about implies that it has moved.

As I noted in the comments on Jeremy's entry, I sort of instinctively thought that RSS was a better idea. The OPML version however looks enticingly simple and still functional. Hm. Surprising.

Anyway, which one do you like best?

Update: As I mentioned in the comments (replying to Zoe's idea of establishing hierarchy through multiple files) the issue of hierarchy is not terribly clear with RSS. I see two ways of doing it:

  • One, as Zoe proposed, using files. This would require that we agree on a convention that says, for example, that if an item has only a link and nothing else (allowed by the RSS spec--all of an item's sub-elements are optional), then the link is to a sub-directory. This is feasible and would imply, on the client that subscribes, a multi-step process to obtain the full list.
  • Two, the creation of a "virtual" hierarchy by way of category names. The mockup already uses category to specify the main topic to which the feed belongs. If one category is, for example, News/Sports and another is News/Politics, then the hierarchy is implicit in a single RSS file, even though the actual structure is flat. This would require a single GET but a bit more processing on the data once received (a quick sketch of this option follows the list).
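
Purely as illustration (again reusing the made-up News4Humans feeds and URLs), option two within a single feed-of-feeds file would look roughly like this:

<!-- every item links to an actual feed; the implied hierarchy
     lives entirely in the category value -->
<item>
  <title>Sports scores</title>
  <link>http://news4humans.example.com/feeds/scores.xml</link>
  <category>News/Sports</category>
</item>
<item>
  <title>Election coverage</title>
  <link>http://news4humans.example.com/feeds/election.xml</link>
  <category>News/Politics</category>
</item>
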
I prefer option two since option one, while enticing, implies that we would be giving two different meanings to the link tag, something that's never desirable, but it's possible that I missed something there... Also, if the "hierarchy through files" method was chosen, the connections between the files could be made two-way, which is nice, by using RSS's source sub-element for item, so a feed can be traced back to its "parent" feed.

Update #2: Another advantage of using RSS as-is that I keep forgetting to mention is that the language for the feed can be specified. You could automagically define that you only want to see feeds in a certain language and the aggregator could automatically disregard anything else. The same functionality for OPML would require adding a tag for that purpose.

Update #3: I've just noticed that, in the RSS spec, the category element has a domain attribute. If the domain is used to point to sub-category feeds, then hierarchy can be achieved cleanly. Therefore a simple solution that doesn't require any changes to the RSS spec (and as far as I can see doesn't bend its meaning either) could be as follows:

  • When an item contains a link, that item points to an actual news feed.
  • When there's no link, then the category must have a domain, which points to the sub-tree. description and title in that case are the desc and title of the subfeed's category, respectively
How does that sound?

Categories: soft.dev
Posted by diego on September 13, 2003 at 12:55 AM

you like your myths rare or well done?

This pisses me off slightly, which accounts for the more sarcastic tone. In case you were wondering. :-)

I was just reading Vasanth's entry "study reveals not-so-hot java" in which he happily perpetuates myths that for some reason keep sticking to the Java platform. The only point that is half-true in his list is that Java is not managed by an open standards body. I know that many people are not happy with the JSRs, but it's halfway there, and Eclipse keeps gaining momentum (memory refresh: Eclipse started about three years ago).

As for his other "points", I'd suggest this: the next time you hear someone say things like "Java is slow on the desktop", you should ask:

  • compared to what? Assembly code? and
  • Where, exactly, is your proof? How about a few examples of non-performing Java desktop applications?
I have a number of examples that actually prove quite the opposite to the slow-on-the-desktop claim.

"Write once, run anywhere not true" he says. Really? Then how is it possible that I could write a client application that was deployed successfully on everything between Windows, Linux, MacOS, and even OS/2? Without a single line of platform-dependent code? Is it magic? I can't remember any chanting being involved...

And as far as those much discussed "scalability problems", hey, isn't eBay's use of J2EE enough proof that Java can scale?

Case closed.

Categories: soft.dev
Posted by diego on September 12, 2003 at 8:35 PM

simplicity applied

I was going off the deep end in Win32 (yes, I know...) to finish some tests I need to do for my thesis research, and I decided to get some instant gratification by doing something simple: update my templates, since there were a couple of weblogs in my blogroll that had recently changed location, and check out my feeds to see what, exactly, was being generated. I had been using the default Movable Type templates (which in my installation, an upgrade from 2.4 or something, were RSS 0.91 and RDF). In the process I discovered that my original 0.91 feed did not validate due to the date format, which was not RFC 822 (as the RSS spec requires), but ISO 8601...

So I went looking and I found that Movable Type now has a template for RSS 2.0 feeds. Nice! Grabbed it, updated, and tested through the feed validator. It worked.

So far so good.

Then I read Sam's great presentation on RSS at Seybold, and there he had a mention of "Funky" feeds.

I remembered that a big argument had started a few weeks ago in this regard. At the time the discussion had turned ugly so fast that I simply stayed away, and didn't even follow it that much.

But now I was intrigued. So I started looking at the RSS 2.0 feed that was being generated by MT, and I understood what the discussion was about.

What was happening was that MT's RSS 2.0 template was using Dublin Core elements to replace elements for which RSS 2.0 had equivalents.

Aha! It wasn't clear to me why this was being done. The feed was valid, true, but somehow it didn't feel quite right... I felt it was like using JNI to access C code for, say, calculating the tangent of a value when using java.lang.Math would suit just fine.

If RSS 2.0 had the elements, then why replace them with something else? I revisited the discussion a bit and saw that Mark had argued that DC elements were more of a standard than the RSS 2.0 equivalents, which was a fair point but still didn't quite explain why you'd require aggregators to deal with additional namespaces when you could get away with simply using "built-in" tags. Besides, it was Mark's opinion, rather than MT's, so as reasonable as his argument was it didn't definitively explain why MT was going in a certain direction. Furthermore, I didn't quite agree with the logic; as much as I like the idea of Dublin Core, I'd prefer to go with built-in elements any day of the week (as Atom has done, btw, in not using DC elements even when it could have done so). I now had the opportunity to follow up on what I had been talking about a couple of days ago regarding simplicity, with something small but concrete.

Okay, so I started investigating more and trying to change the feed template into pure RSS 2.0 (no namespaces). Everything seemed to be going fine until I hit the pubDate and lastBuildDate elements. MT was using, for example, dc:date. When I tried to take the date "out of the namespace" it didn't work, even if I changed the formatting to match that of RFC 822. Why? Because MT does not have a tag to generate RFC 822 timezones. The only tag to generate a timezone included in MT is $MTBlogTimezone$, which generates ISO 8601-style timezones.
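
To spell out the difference, here's the same timestamp expressed both ways (the date itself is just illustrative):

<!-- RSS 2.0 built-in element, RFC 822 date format -->
<pubDate>Fri, 12 Sep 2003 11:46:00 +0100</pubDate>

<!-- Dublin Core equivalent, ISO 8601 date format -->
<dc:date>2003-09-12T11:46:00+01:00</dc:date>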

Things now started to make sense. MT didn't have a tag for that, hence the best way to generate a valid feed was to use an ISO 8601 date, which can only be included if you're using Dublin Core elements rather than the built-in RSS elements, which require RFC 822 dates. And after you include one namespace, well, why not do it all with namespaces, since the line has been crossed, so to speak. At first I thought that this "line crossing" had been because of the use of category in an entry through dc:subject tags, but rechecking the RSS spec I saw that RSS 2.0 has a category tag for items as well as for feeds, so that wasn't it. It was only the date that was bringing this whole cascade of namespaces tumbling in. That's my theory anyway. :-)

Regardless of why this was happening, I was sure there must be a solution. The Movable Type tutorials at feedvalidator.org were empty, so no luck there. One googling, though, turned up John Gruber's RFC 822 plugin for MT. John's plugin adds the $MTrfc822BlogTimeZone$ tag, which is all that was missing to generate the correct date. Great! Now I had all I needed.

The result is this template which depends on John's RFC plugin and generates valid RSS 2.0 with the tags that I need and avoids using namespaces (maybe when adding more functionality not supported in the base spec, namespaces will be necessary, but I prefer to avoid them if possible). Now I have a pointer for both RSS and RDF feeds on the page. Still have to re-generate the whole site, though, which will take a while.

Phew!

Categories: soft.dev
Posted by diego on September 12, 2003 at 11:46 AM

the power of simplicity

and this from the it-seems-obvious-in-retrospect dept...

hit by a virtual hammer

This rant has been brewing in my head for a couple of weeks now, but every time I started writing it fizzled out, for whatever reason. I can see that this might seem obvious to a lot of people. It wasn't to me though! :)

The thought process behind this started when I spent a couple of hours writing my google-rss bridge. While I had written an RSS reader component, I had never written an application that created RSS. I was pretty astonished at the power of a standard that can be used so easily both ways. It was like being hit by a virtual hammer. I thought about it further when I added Atom support to it, and I mentioned some of these issues in the entries in a roundabout sort of way...

What jelled today in my head was the distinction of three elements of the process that weblogs and RSS make possible.

The three elements are: Content creation, publishing, and access.

The magic is that each step can happen in a decentralized fashion. All tied together through the thin ice of a few XML and HTML tags.

HTML is very similar (and similarly disruptive), but less oriented towards decentralization because it has evolved to be oriented towards display rather than automated consumption (something that, at least theoretically, CSS was supposed to fix).

Understanding that split between creation and publishing is what brought everything together for me. Where I write the content (creation) has nothing to do with where it resides (publishing). The last part, access, is obviously separate. The other two weren't, at least not to me, before today.

And why is the creation/publishing split important? Because it's what drives the full decentralization of the process, and it's what makes simplicity a lot more relevant than before. Without full decentralization, simplicity is a lot less powerful. Decentralization+simplicity means that everyone's invited to the game. After all, if creation and publishing are tied together, and you need expensive or complex centralized infrastructure (and let's face it, infrastructure to publish web content is no child's play) to set up a content system, then no matter how simple the content format or protocols themselves are, the system will still have limited impact.

the price of complexity

Complexity, with its associated cost, and monopolies (or oligopolies) go hand in hand, since complexity constitutes one of the most important barriers to entry. But weblogs and feeds, like the web itself, have split the lever of power: now the glue that ties the components together is as much a point of control as actually creating the clients or the servers themselves. In the web in particular, as better development tools for both clients and servers have evolved, the format itself became the most important element that brought complexity into the equation. And the reason, I think, is the split that happened between creation, publishing, and access.

Consider, first, HTML. In the days of HTML 2.0, it was relatively trivial to write a web browser. The biggest problem in writing a browser was not, in fact, in parsing or displaying HTML: it was in using the TCP/IP stacks, which at the time were difficult to use. Over time, the shift of complexity into HTML has brought us to the situation we have today, where writing a standards-compliant browser requires huge investment and knowledge, and the earlier barrier to entry (the network stack) is now easy to use and readily accessible. Sure, there are many web browsers in the market today. But power is not distributed evenly. HTML 4.0 raised the bar, and in fact IE 4 won over many people simply because it worked better than Communicator 4 (I was one of those people).

What I realized today is: there's a huge side effect that the format has on content access: monopolizing the market for access becomes easier the more complex the content format is.

My point: This should give pause to anyone in the "go-Atom-crowd", including me.

Keeping the barrier to entry low applies in more than one case, of course, but here it's crucial because weblogs, creating RSS feeds, accessing them, etc., are just so damned easy today. This allows developers to concentrate on making the tool good rather than dealing with the format.

Dave has said things along these lines repeatedly, but honestly I hadn't fully understood what he meant until now.

Let's see if we're all on the same page. The message is: It is no coincidence that basically every single RSS reader out there is high-quality software.

Big and small companies, single developers, groups, whatever.

A simple statement, with profound implications.

Back to Atom.

I am not implying that the (slight) additional complexity found in Atom will make it fail. I am saying that its increased flexibility brings complexity that also raises the barriers to entry for using it, with a consequent loss of vitality in the area. Without proper care, these barriers can slowly chip away at the ease with which tools can be created, and in the process split the field into incomplete or low-quality software used for tinkering and mainstream software available for general users, with a lot of entries in the first category and a few in the second.

and why is this important?

This stuff matters. There are many examples today of how weblogs are changing things, from influencing politics to breaking down proprietary software interfaces and affecting how the spread of news itself happens. In my opinion a big part of that is because weblogs really, finally, put the power of publishing in individuals' hands, something that "the plain web" had promised but had failed to do (after all, here's-a-picture-of-my-dog-type homepages were around for quite a while without anything interesting happening). But if barriers are raised, the Microsofts and the AOLs suddenly have a fighting chance.

Jump to the future: Microsoft announces support for Atom, built into a new IIS content-management system. Great! says everyone. Then you look at the feed itself and you discover that every single entry is published using content type "application/ms-word-xml". This wouldn't be new. Microsoft already claims to great effect that Office supports XML, but everyone knows that trying to parse a Word XML document is practically impossible. XML itself is too generic to be taken over, though; in a sense it was designed for that, as a template for content formats. HTML wasn't, but it was subverted anyway. RSS is, quite purposefully I think, holding out. With Atom coming up, there's a chance it might happen.

I hate to point out problems without also proposing at least one possible solution. So: An extremely simple way of getting around this problem would be to specify that text/html content is required on a feed. If it's not, then your feeds don't validate. That simple.

Call it the anti-monopoly requirement. :-) The same focus on simplicity should be, IMHO, the drive of every other feature.

Another example: Today, with email, Microsoft has used MIME to great effect to screw up clients that are not Outlook or Outlook Express. It couldn't happen with simple plain text. But MIME allowed it. The result: people can send each other Microsoft-generated HTML that only the Outlook+IE combination can display without hacks.

We can't allow that to happen.

Some might argue (quite persuasively) that it doesn't really matter whether content-creation is slightly more complex. To that I'd say: it's just my opinion, but I think it does. It could also be argued that this is all simply a matter of evolution, it was bound to happen, etcetera. But it wasn't "bound to happen". We are making it happen. It's in our hands.

If the rise of the web was a lost opportunity in this sense, well, amazingly, we have been given a second chance.

Let's not blow it.

Categories: soft.dev
Posted by diego on September 10, 2003 at 9:03 PM

the new Swing GTK look and feel in JDK 1.4.2

I took a few minutes today to test clevercactus against the new GTK look and feel, introduced with JDK 1.4.2, in my Red Hat 9 Linux machine. My first reaction was sheer horror at seeing how awful the application looked. I think I even blacked out for a moment.

A little investigation showed what was at the root of how the app looked: the GTK L&F does not behave like a "typical" Swing L&F but rather defines its appearance dynamically, based on gtkrc files, and it ignores the programmatic settings you might give to your components.

Let me say that again: the Swing GTK L&F ignores the programmatic settings you give to your components.

Are you setting your own borders for, say, a panel? Gone. Different colors for menus? Poof. You prefer a different font for your lists? Sorry, can't help you. Changing the look of a button by setting setBorderPainted(false)? Bye-bye.
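
To make it concrete, here's a minimal sketch of the kind of code I mean (the L&F class name is the one shipped in Sun's JDK 1.4.2; under Metal the two customization calls below visibly change the button, under the GTK L&F they're effectively ignored):

import java.awt.Color;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.UIManager;

public class GtkLafTest {
    public static void main(String[] args) throws Exception {
        // Sun's GTK look and feel, new in JDK 1.4.2
        UIManager.setLookAndFeel("com.sun.java.swing.plaf.gtk.GTKLookAndFeel");

        JButton b = new JButton("Flat button?");
        b.setBorderPainted(false);    // honored by Metal, ignored by the GTK L&F
        b.setBackground(Color.white); // same story

        JFrame f = new JFrame("GTK L&F test");
        f.getContentPane().add(b);
        f.pack();
        f.setVisible(true);
    }
}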

But no fear: all of these things are set in the gtkrc file. Therefore, whatever you were doing programmatically now has to be duplicated in the RC file. And there is a relatively simple way to load (i.e., package) your own RC file for your application. Which means that, yes, you can modify the L&F, but in a non-Swing-standard way.

In the end, after some tinkering with the RC file, cc still doesn't look quite right: the default colors and fonts for lists are all wrong and I can't find which setting is responsible for that. Using the Metal L&F (or Motif) on Linux is still the only viable option until I get a decent RC file in place.

Overall, the new GTK L&F is a good addition. We just have to hope that by the time Javasoft makes it the default L&F for Linux (something that's due to happen in JDK 1.5) programmatic overrides work exactly as with the other L&Fs.

Categories: soft.dev
Posted by diego on September 10, 2003 at 3:21 PM

SWT: first impressions

After spending a couple of days actually using SWT and trying out things, these are my first impressions.

First, for an IDEA-junkie like me it takes a while to adapt to Eclipse. There are a few refactoring functions that just aren't there and the editor behaves just... well, weird. But that's not a huge issue.

Specifically about SWT, it is simple and works reasonably well. However, it is too simple. In fact, it is downright primitive, and it seriously changes the way you think about operating system resources (more specifically, Graphics resources). Maybe that's good, but being used to the idea of Swing, where you can create components or colors or whatever and move them around and pass them between contexts with impunity, it is, well, shocking to, for example, not be able to create a component without a parent.

This is more a change in style (application-oriented, rather than component-oriented). What's a bigger problem is how completely, utterly primitive the tools to deal with graphics are. (Yes, even more primitive than AWT). Take, for example, fonts. You can create a font (and remember to dispose of it!!) but if you want to calculate the length of a string in that font, you're out of luck. In fact, if you want to calculate the length of anything related to fonts without referencing an existing GC (SWT's graphics context), you're out of luck altogether. It can't be done. (While in AWT/Swing you have Toolkit.getDefaultToolkit().getFontMetrics(font)). Even if you do get a FontMetrics with a reference to a GC, the methods you do have are simply pathetic: getAverageCharWidth(). That's it. There's another method in GC to obtain the actual width of a character (getCharWidth) and the extent of a string (textExtent). Color management is also bad: essentially the only way to create colors is to use RGB values directly -- no predefined constants for anything, not WHITE, not BLACK, and no way to do what's so useful in Swing, namely call a brighter() method to obtain a variant of the color. (And, again, once you create them, they have to be dispose()d of.)
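
A minimal sketch of what I mean with the font measuring (the font name is arbitrary; the Swing/AWT one-liner is in a comment at the end for contrast):

import org.eclipse.swt.SWT;
import org.eclipse.swt.graphics.Font;
import org.eclipse.swt.graphics.GC;
import org.eclipse.swt.graphics.Point;
import org.eclipse.swt.widgets.Display;

public class MeasureText {
    public static void main(String[] args) {
        Display display = new Display();
        Font font = new Font(display, "Tahoma", 10, SWT.NORMAL);

        // In SWT you need a live GC just to measure a string...
        GC gc = new GC(display);
        gc.setFont(font);
        Point extent = gc.textExtent("Hello, world");
        System.out.println("width=" + extent.x + ", height=" + extent.y);

        // ...and everything you created has to be disposed of by hand.
        gc.dispose();
        font.dispose();
        display.dispose();

        // The Swing/AWT equivalent needs no graphics context at all:
        // Toolkit.getDefaultToolkit().getFontMetrics(awtFont).stringWidth("Hello, world");
    }
}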

Lists, Trees, Tables and TreeTables are good, and in fact they are easier to use than Swing's. But they are wayyy less customizable. For example, you can't insert an arbitrary component in a table. You can only show strings (single line) or images, or other one-line components (like a combobox). More complex components are also lacking. Take, for example, rich text editing or display. While Swing's JEditorPane/EditorKit might be a massive nightmare, at least it exists. SWT has no equivalent to it. JFace, which is a higher-level library built on top of SWT, is an improvement, but not enough.

On the other hand, Eclipse itself is built on SWT and Eclipse does have some of these components. It's not clear, however, how to access them. Documentation is improving, but still lacking.

Now for the good points: the platform is thought of as a layer on top of any OS, rather than an independent platform, so it has some simple ways of doing crucial things that the JDK should have added long, long, long ago (think 1996 :-)). Example: launching the default program for a document. In standard Java, you have to resort to ridiculous Runtime.getRuntime().exec() calls that fail half the time and have to be tested in more combinations than is possible. Eclipse, on the other hand, has a handy Program class that lets you obtain the program for a given file extension with Program.findProgram(".html"), and then obtain the icon (cool!), launch it, etc. Native browser support is currently in beta and it works relatively well; the only question that opens up is whether it's reasonable to resort to platform-dependent browsers when you are bringing in all their baggage (I'm thinking of security problems mainly, yes).
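
For instance (a minimal sketch; the file path here is just a made-up example):

import org.eclipse.swt.program.Program;

public class LaunchDoc {
    public static void main(String[] args) {
        // Ask the OS which program handles .html files
        Program p = Program.findProgram(".html");
        if (p != null) {
            System.out.println("Default handler: " + p.getName());
            // p.getImageData() gives you the program's icon;
            // execute() opens the document with it.
            p.execute("C:\\docs\\index.html"); // made-up path
        }
    }
}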

And programs in SWT look fantastic without a lot of work (programs in Swing can look fantastic, but only with a lot of work). In particular if you have ClearType on Win XP, it's a huge improvement, something that can't be done in Swing at all. Even with antialiasing Swing text doesn't look as good, and enabling it is a hack. Swing can get there though, as the excellent Mac OS X implementation of JDK 1.4.2 shows, so if only Sun would get really into supporting the desktop and doing a reasonable implementation of text rendering for WinXP...

For many people, I think, SWT would be a good choice. For programmers that are only now approaching Java, I get the feeling that it is definitely easier than Swing. OTOH, it's less customizable once you get a handle on it (the reverse of Swing), although I imagine that will be fixed as the platform evolves.

Overall: As Gordon Gekko says in Wall Street: "Mixed emotions Buddy... like Larry Wildman going over a cliff... in my new Maserati."

The Rich Text editor problem is probably the main issue that seems difficult to ignore at the moment, at least for me. Looking into that now (and have been for the last few hours). More later!

Categories: soft.dev
Posted by diego on September 9, 2003 at 12:27 PM

now with Atom support: java google feeds bridge

After last week's experiment of a Google-RSS bridge in Java, I took the next step and decided to check out how hard it was to generate a valid Atom feed as well. The result is an update on the Google bridge page and new code.

The idea, as before, was to write code that could generate valid feeds with as few dependencies as possible (it might even be considered a quick and dirty solution). For reference I re-checked the Atom Wiki as well as Mark's prototype Atom 0.2 feed. In the end, it worked. Adding Atom support begged for some generalization and refactoring (which I did), but aside from that it took a few minutes. Here are some notes:

  • ISO 8601 Dates are terrible. I'd much rather Atom had used RFC 822 dates, which are not only easier to generate but way more readable. It's true, however, that once you have the date generator working it doesn't matter. But boy are they a pain. I put forward my opinion on that when date formats were being discussed, but I didn't get my way.
  • I was confused at the beginning regarding the entry content, particularly because of the more stringent requirements that Atom puts on content types. For example, the content of an entry must be tagged with something like "<content type="text/html" mode="escaped" xml:lang="en">". Now, I must be honest here: about a month ago there was a huge discussion on the Wiki about whether content should be escaped, or not, and how, but I didn't think it was too crucial since, on the parser side (which I added to clevercactus back in July), it's pretty clear that you get the content type and you deal with it. But I was sort of missing the point, which is generation. When generating... what do you do? Do you go for a particular type? Is it all the same? Would all readers support it? The pain of generating multiple types would seem to outweigh any advantages... Hard to answer, these questions are, Master Yoda would say. So I went for a basic text/html type enclosed in a CDATA section; there's a sketch of the escaped alternative right after this list. (Btw, enclosing in CDATA doesn't seem to be required. The Atom feed validator was happy either way.)
  • Another thing that was weird: the author element is required, but it can go either in the entry or in the feed. I understand the logic behind it, but it's still slightly confusing (for whatever reason...)
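
Here's the kind of thing I mean for the escaped case, using the same JAXP/SAX machinery the bridge relies on (a minimal sketch, not the actual bridge code; the serializer escapes the markup passed to characters() for you):

import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class EscapedContent {
    public static void main(String[] args) throws Exception {
        // In practice the default TransformerFactory is also a SAXTransformerFactory
        SAXTransformerFactory tf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler h = tf.newTransformerHandler();
        h.setResult(new StreamResult(System.out));

        h.startDocument();
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "type", "type", "CDATA", "text/html");
        atts.addAttribute("", "mode", "mode", "CDATA", "escaped");
        h.startElement("", "content", "content", atts);
        String html = "<p>Hello, <b>world</b></p>";
        h.characters(html.toCharArray(), 0, html.length()); // gets escaped on output
        h.endElement("", "content", "content");
        h.endDocument();
    }
}
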
Overall, not bad. But Atom, while similar to RSS, is more complex. While I have been able to implement a feed that validates relatively easily, it concerns me a bit that I might be missing something (what with all those content types and all). Maybe all that's needed is a simple step-by-step tutorial that explains the dos and don'ts of feed generation. Maybe all that's needed is a simple disclaimer that says "Don't Panic!" in good H2G2 style.

Is it bad that Atom would need something like a tutorial? Probably. Is it too high a price to pay? Probably not. After all, more strict guidelines for the content are good for reader software. I thought "maybe if there's a way to create a simple feed without all the content-type stuff..." but then everyone would do that, and ignore the rest, wouldn't they.

Of course, maybe I misunderstood the whole issue... comments and clarifications on this area would be most welcome.

I guess there's no silver-bullet solution to this. The price of more strict definitions is the loss of (some) simplicity. The comparison between a dynamically typed language (say, Lisp) and a statically typed one (say, Java) comes to mind when comparing RSS and Atom in this particular sense. I think that I would go with RSS when I can, since it will be more forgiving... on the other hand I do like strong typing. But should content be "strongly typed"? I'll have to think more about this.

Interesting stuff nevertheless.

PS: there's a hidden feature for the search. It's a hack, yes. It might not work forever. Still worth checking the code for it though :-)

Categories: soft.dev
Posted by diego on September 5, 2003 at 10:26 PM

a google-rss bridge in java

I've been experimenting with the Google API for an upcoming Google-query feature in clevercactus and I thought that it would be cool to show how simple it is to generate RSS 2.0 feeds (which validate) from within Java based on the Google query results, using only JAXP and SAX, which come built-in with JDK 1.4. (Incidentally, if you haven't checked the RSS spec recently I recommend another look; there have been a few --but key-- clarifications and improvements made to it in the past few weeks.)

The result is Google RSS Feeds, which is a simple (emphasis on simple :-)) web service using pure Servlets that generates an RSS feed based on a query. (The feed generation is encapsulated in a separate class, though, so it can be reused in other contexts.) The whole thing took about an hour to write, package and deploy. Not bad. :)

The page linked above is an example of the service working; and since the idea of all of this is to provide a sort of "code-tutorial" for Java, the sources are also available, provided under a simple (again!) open source license that allows modification and redistribution, both in source and binary forms, as long as the original copyright notice is maintained.

As always, comments (and in this case, improvements!) are most welcome.

Enjoy! :-)

Categories: soft.dev
Posted by diego on August 30, 2003 at 1:04 AM

diego's excellent symbian adventure, part two

Mobitopia

in which diego discovers that J2ME is a lot less, and a lot more, than previously thought

[Part one, notes].

I concluded in part one that native development in Symbian is a difficult proposition at the moment if your main concern is, like mine, minimizing development time by targeting as many devices as possible with a single code base. Creating 80-85% portable "native" Symbian apps is possible, but complex, and difficult for new developers. Because of this, for many applications, J2ME will be the way to go.

micro-Java, not micro-J2SE

J2ME is truly a "micro" version of Java, rather than a reduced J2SE (Here's a good take on the topic from Russ). When J2ME was first launched there was another "contender" called Personal Java (based on JDK 1.1.8) which we'll ignore since it's currently being phased out by Sun. To get some context, let's look again at the varieties of Java:

  • J2ME: a Java-based platform for small/memory limited devices.
  • J2SE: the Standard Java platform for desktop/mobile computing and above.
  • J2EE: Enterprise-oriented additions to J2SE, mostly in the forms of new libraries and APIs that plug into enterprise runtime environments.

J2ME targets areas that, unlike the (slightly) more uniform target of J2SE/J2EE, require support for wildly different devices and capabilities, in terms of I/O, processing power, memory, and everything in between. As a consequence, J2ME is actually a set of specifications. Each specification targets a configuration, which then can be further defined by using it with different profiles.

And herein begins the acronym-fest. There are, currently, two main types of configurations. The CLDC and the CDC (not a bad name, in our virus-ridden times). The CLDC ("Connected Limited Device Configuration") is a low-end configuration target: cellphones, low-end PDAs, etc. CDC (Connected Device Configuration) on the other hand targets everything between the high-end of CLDC and the low end of devices that begin to support J2SE. (CDC + the Foundation Profile, the Personal Basis Profile and the Personal Profile define the equivalent of Personal Java but with more flexibility, which is why Personal Java has been discontinued, btw).

Now, both the CLDC and CDC reference VMs differ from the J2SE VM, and from each other. I'll be most interested in the CLDC VM (the KVM, or Kilobyte Virtual Machine), since it's the one used in most Symbian phones. The KVM is limited compared to a J2SE VM, not just in features (e.g., no advanced JIT techniques) but also in capabilities (e.g., floating point arithmetic is not required by the CLDC spec).

Talking about VMs is all well and good, but in the end what makes Java a platform is its libraries as much as it is its VM. So what about libraries in J2ME? Well, that's where the profiles come in.

profiles

Profiles add packages and classes to configurations, and each configuration has one or more associated profiles. And for CLDC, the most popular profile is one called the Mobile Information Device Profile or MIDP.

The MIDP adds basic networking, UI elements and minimal storage capabilities to the CLDC, and it is normally used for wireless devices (phones, PDAs, etc). Recently Sun announced the release of a new and improved MIDP, MIDP 2.0. But most Symbian phones implement CLDC with MIDP 1.0, and only one so far (the Nokia 6600) supports MIDP 2.0 (that I know of). SonyEricsson's P800 supports Personal Java, but that's a dead end since it has been discontinued.

In general, profiles try to use subsets of classes from J2SE. Whenever a class from J2SE is used, only methods already existing in the J2SE version can appear in the J2ME version (i.e., no new methods can be added). That's why the classes/packages are either subsets of what we already know from J2SE (say, java.lang.*), or completely different (like the storage classes in MIDP 1.0).

MIDP allows various optional packages, such as the Mobile Media API and the Wireless Messaging API (included on some phones, such as the Nokia 3650), newer APIs such as the Bluetooth API (on the 6600), and APIs still in development, such as a 3D Graphics API for future models.

enough with the theory

After a while it's quite clear that the best way to do multi-device deployment with J2ME is to use MIDP 1.0. Using this profile, you can create Midlets, similar to Applets. Like applets, midlets are limited in their access to the local device, something that is even more visible because of the limitations of the MIDP 1.0 API (PDF, 171 KB).
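
For reference, a midlet in its entirety can be as small as this (a minimal MIDP 1.0 sketch, nothing more):

import javax.microedition.lcdui.Display;
import javax.microedition.lcdui.Form;
import javax.microedition.midlet.MIDlet;

public class HelloMidlet extends MIDlet {
    protected void startApp() {
        // Build a trivial screen and make it current
        Form form = new Form("Hello");
        form.append("Hello from MIDP 1.0");
        Display.getDisplay(this).setCurrent(form);
    }

    protected void pauseApp() {
    }

    protected void destroyApp(boolean unconditional) {
    }
}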

Now, what does getting a J2ME development environment together for Symbian require? To start with, I'll need Sun's J2ME WTK (Wireless Toolkit), either version 1.0.4 (for MIDP 1.0) or version 2.0 (for targeting MIDP 2.0). So I get WTK 1.0.4. Plus, let's say that I'd like to test with a Nokia simulator. I need to get the Nokia Developer's Suite for J2ME, plus an emulator such as the Series 60 Emulator/SDK (which for this phone includes cool things like the Bluetooth Java APIs--but careful: other MIDP 1.0 phones might not support that API yet).

The final element in all of this is the IDE integration. The steps necessary to go from source to binary, and then to deploy it into the phone emulator (or into the phone) are Definitely Not Fun. So getting an IDE that will do them automatically is essential to keeping my sanity. The first option I looked at was CodeWarrior, but CW came integrated with WTK 1.0.3 (instead of 1.0.4) and didn't allow upgrading it. JBuilder 9 for Mobile Dev didn't have a trial download. Eclipse... well. Eclipse was an option :). But I thought that maybe IDEA, which I use for other Java development, had some plug-in... and sure enough, there it was. Excellent. A bit of fiddling, and finally I was able to see my simple Hello World! application running in one of the default "phone covers".

Note: a big Thanks to Russ, who guided me in my search for tools and toolkits. Without his help, it would all have taken a lot longer than it did.

Now, this is all a lot more complicated than it needs to be, and Sun isn't necessarily helping matters. Hopefully the situation will improve in the near future, to make it easier for new developers to approach J2ME on Symbian platforms (and others, too).

native v. java

What is probably one of the most ridiculous limitations of MIDP 1.0 is its inability to access the device's store (e.g., contacts, calendar entries, notes, etc). This hugely limits the kinds of useful applications that can be written in Java (which in many cases will want to interact with phone data and services). And there's still no standard to access the bluetooth functionality (although it's coming, too).

At a minimum, creating a prototype with J2ME should be easy enough (once the development environment is set up) that specific Symbian devices can then be targeted with native development. J2ME apps can also be targeted at devices beyond Symbian, such as Palm. However, they have some serious limitations. If an application needs to access low-level functions or device data, native development is a must. Otherwise, J2ME might be a good alternative to simplify portability.

Categories: soft.dev
Posted by diego on August 24, 2003 at 9:12 AM

location and services

(Now to change the topic from all that email-related whining...) Jamie has an excellent entry comparing the differences between JXTA and Jabber. Choice quote:

The key thing JXTA can do which Jabber can't, is Discovery (Jabber browsing and any other Jabber discovery specs aren't really up to the task as far as I can see). But then Jabber has presence information, permanent user addresses and the facilities to handle 1 user, multiple devices, all of which would have to be built on top of JXTA.
Okay, since this is what my research is on ;-) I wanted to add my 2c about this topic. What Jamie refers to as Discovery is more generally considered to be part of Resource Location and Discovery, which is a huge area that covers everything from DNS to LDAP to other seemingly unrelated stuff like Mobile IP (consider that a Mobile IP node has a home address, and essentially systems that want to connect to the mobile node have to use that home address as a sort of mini dynamically-updated location server to find the node). Discovery implies finding resources that match certain characteristics, which might then require an additional step to locate them. A typical case of discovery is finding a printer in your immediate vicinity. Once you've found the name, a location service of some sort will be used to connect to it.

Now, in that sense, Jabber does not have discovery, unless you count some kind of centralized directory of Jabber users as a source for discovering Jabber IDs, but it does provide location. JXTA, on the other hand, due to its nature allows discovery of services.

But I think that what Jamie was referring to as "JXTA has discovery" (since he was talking about it in the context of ad hoc networks) was actually the potential of JXTA to perform self-organizing location (it can do discovery too, but that's another matter--and, Jamie, corrections welcome if I got it wrong :)). Consider that an ad hoc network might not be connected to the Internet. Then there's no fixed infrastructure available (to, say, talk to a Jabber server) and Jabber clients in an ad hoc environment won't be able to find each other. JXTA, on the other hand, works perfectly well. JXTA "blows up" the location service that Jabber provides on servers, making it work in a self-organizing fashion.

Then there's the problem of routing. Once you've located the client you want to talk to, you want to connect to it to communicate, request tasks, etc. With Jabber, this happens over TCP/IP, which is simple and well understood. JXTA adds another layer on top of TCP/IP. This additional layer may or may not be written using a direct TCP/IP channel: the routing could be happening across the JXTA network itself. And that is a problem in some situations.

It's only a terminology mismatch between what's in my head and what Jamie was saying though, as his conclusions hit the mark: JXTA is good for location/discovery, particularly in ad hoc environments. But if you're on the Internet, or if you have access to fixed infrastructure, a centralized system like Jabber is probably the way to go. And, in all cases, it's uncommon that anyone will want to do routing over anything but the logical transport, such as TCP/IP or, in the case of wireless ad hoc, maybe DSR or AODV.

Plus: unrelated-- Jamie mentions a Slashdot discussion on the relevance of PhDs, and how they might affect job opportunities. Surprisingly enough, the discussion is actually relatively civil (for /. that is). The idea that a PhD hurts your employment opportunities sounds just plain silly to me. Ethernet, itself born of a PhD thesis, created a multibillion dollar industry, and quite literally changed our lives. For a more recent example, consider that REST was also the product of a PhD thesis.

There are people who fall into a PhD because they have no idea what else to do, and that's lame. Any kind of multi-year commitment like that should imply that it's something you really want to do, rather than something you do because it gives you status or because society says so, or whatever (I feel the same way about any third-level education, btw). As for me, I got into it because I really wanted to do it: I wanted to dig into areas that interested me, create something new if I could, and learn about the "really bleeding edge" in the process. As far as I'm concerned, the same thing can be achieved in other ways. It's just a matter of which way seems right at a particular moment. (Very new-wavey kind of idea, I know. Heh.)

Categories: soft.dev
Posted by diego on August 20, 2003 at 7:57 PM

the one-minute-guide to JXTA

After my IEEE article on P2P network topologies I got some comments (including one in the entry) about JXTA. Recently I was looking in more detail at the JXTA site and I realized that there's a lot of documentation, but a lot (and I mean a lot) of it is in PDF form, and so not "browser ready". Heretics who suggest that opening a PDF inside a browser is still web browsing--thereby equating the experience and simplicity of HTML with what is essentially a PostScript derivative--should continue along their path and not bother trying to convince me that PDF is like the web but better, or something like that. It's a nice format for eBooks and documents meant to be printed or archived at high resolution, or where formatting is key. Period.

But I digress. The lack of HTML documentation on the JXTA site (at least docs that are easily accessible as intro material; the "tutorials page" is quite useful as a hands-on programming guide, however) seems to be a problem for those looking for a short introduction. So here goes a one-minute introductory guide to JXTA concepts (i.e., essentially sans programming examples). For more detail check out this longer article from a couple of years ago at O'Reilly's website. Also useful.

intro

JXTA is a generic, protocol-independent P2P system. JXTA is not based on, or dependent on, Java. JXTA depends on XML as a message-passing format, and nothing else. Although the first implementation of JXTA was written in Java, there are JXTA implementations under way in other languages, such as C, Perl, Python and Ruby.

JXTA is open source, and its code is licensed under the Apache Software License. There always seems to be a fair amount of activity in the projects section (including the core), with quite a number of projects going on, even if some of them are not evolving too fast.

abstractions and protocols

JXTA is a set of specifications to handle the core functions of P2P communication. Through these protocols and abstractions, JXTA establishes a P2P network on top of the Internet and non-IP networks, allowing peers to interact directly and organize independently of their network location. All nodes connected using the JXTA system form the JXTA network. JXTA employs five abstractions over existing computing networks:

  • Addressing. A uniform peer addressing scheme that spans the entire JXTA network. Every node is uniquely identified by a Peer ID, connected to a peer endpoint. Peer endpoints encapsulate all the network interfaces available to a node.
  • Peergroups. Nodes can organize autonomously into protected domains called peergroups. A node can belong to as many peergroups as it wishes. Users, service providers, and network administrators can create peergroups to control peer interactions.
  • Advertisements. Advertisements are XML documents that allow a node to publish and discover network resources through a uniform interface.
  • Binding. JXTA defines a universal binding mechanism, called the resolver, to perform all resolution operations required (for example, resolving a name to an IP address, binding an IP socket to a port, or locating a service).
  • Pipes. Pipes are dynamically defined communication channels that enable services and applications to advertise communication access points through advertisements. Pipes allow nodes to dynamically connect to each other. Pipes can be of multiple types: point-to-point, point-to-multipoint, encrypted, and others.

These abstractions are handled through a set of protocols (although the mapping between abstractions and protocols is not one-to-one), as follows:

  • Peer Discovery Protocol. Enables nodes to discover services on the network.
  • Peer Resolver Protocol. Allows nodes to send and process generic requests. This search/retrieval mechanism is what enables nodes to locate each other, to find peergroups, find services, and retrieve content, also performing authentication and verification of credentials.
  • Rendezvous Protocol. Defines the details of message-propagation between nodes.
  • Peer Information Protocol. Allows nodes to obtain information about other nodes in the network.
  • Pipe Binding Protocol. Provides a mechanism to bind a virtual communication channel to a peer endpoint.
  • Endpoint Routing Protocol. Provides a set of messages used to enable message routing from a source peer to a destination peer.

Each JXTA protocol defines a set of XML messages to coordinate one of the aspects of JXTA interactions. In JXTA, all resolution operations are unified under the simple discovery of one or more advertisements. All binding operations are implemented as the discovery or search of one or more XML documents. However, JXTA does not specify how the search of advertisements is performed. Each peergroup can tailor its resolver implementation to use a decentralized, centralized, or hybrid search approach to match the peergroup's requirements.

back to the real world

JXTA has made a lot of progress recently, but using it can still be difficult at first, particularly if you'd like to tinker with the source (check out the instructions in the core build download page and see for yourself), and in some cases its performance is terrible. However, it's very useful as a base to whip up a quick P2P prototype (quick in terms of development times, not in performance :-)).
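
To give an idea, the skeleton of such a prototype with the Java reference implementation looks roughly like this (a sketch from memory of the JXTA 2.x tutorials, so treat class and method names as approximate rather than gospel):

import net.jxta.discovery.DiscoveryService;
import net.jxta.peergroup.PeerGroup;
import net.jxta.peergroup.PeerGroupFactory;

public class DiscoverySketch {
    public static void main(String[] args) throws Exception {
        // Join the default net peergroup (this starts the JXTA platform)
        PeerGroup netPeerGroup = PeerGroupFactory.newNetPeerGroup();

        // Ask remote peers (via rendezvous) for up to 10 advertisements of any kind
        DiscoveryService discovery = netPeerGroup.getDiscoveryService();
        discovery.getRemoteAdvertisements(null, DiscoveryService.ADV, null, null, 10);

        // Responses arrive asynchronously; a DiscoveryListener (or polling
        // getLocalAdvertisements) is how you'd actually pick them up.
    }
}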

Hm. If I keep typing, this is going to take longer than one minute to read. Okay, I'll stop. Done.

Categories: soft.dev
Posted by diego on August 17, 2003 at 10:28 AM

in praise of RMI

I meant to write this sooner; now I'm running a long test on an algorithm and I've got some time; so here it goes.

Ever since Java RMI was released, it has been derided by critics at all levels. Too slow, they said. Hides too many of the details, they said. Does not hide enough, said others. And everyone hated those checked RemoteExceptions. In my experience, however, all of those complaints are baseless, or simply miss the point.

Not to boast <grin> but I was one of the first people to actually write a useful application in RMI (see this Sun list of applications--the one I wrote was the real-time collaboration server for dynamicobjects' perspectives; and btw, dynamicobjects was not a corporation, it was a development group of me and three friends, but we were labeled as a "corporation" for some reason). When I wrote the application RMI had just been released, and, even then, it just worked. And it worked well. Of course, the context in which it worked was a LAN. Forget about the Internet--too much latency. Problems with firewalls. Etcetera.

Since then, RMI has become a lot better: IIOP interoperability, proxies, and a very cool automatic fallback mechanism that defaults to using RMI-over-HTTP when a firewall is preventing TCP communication.

The "Internet problem" still remains, though. Latencies over TCP are an issue, the marshalling/unmarshalling of the objects is partially an issue (since it increases data transfer requirements through serialization), and the fact that RMI hides so many of the connection details from the developer means that it's more difficult to create a "controlled environment" that deals properly with firewall situations, etc.

Even so, RMI is still the best choice for two tasks:

  • Prototyping of any Java networked application. Creating a network application with RMI is by far the fastest and most reliable way of getting it up and running quickly (a minimal sketch follows this list). Often, a first shot at an idea is enough to understand the issues involved. Then, if the design was correct in the first place, you can just rewrite the transport mechanisms, maybe even using serialization directly over TCP sockets. This is exactly what I did recently when working on clevercactus collaboration (though it hasn't been released yet), and the process worked really well for me.
  • LAN-only applications. If the application to be deployed is LAN-only, then there is no question that RMI is the way to go. RMI is easier to debug and maintain, and it performs really well over fast, low-latency networks without firewalls.
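
Here's roughly what I mean by fast: a minimal sketch of a complete RMI service (JDK 1.4-era, so you'd still run rmic on the implementation class to generate the stub; the names are made up):

import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;

public class ClockServer extends UnicastRemoteObject implements ClockServer.Clock {

    // The remote interface: this is all a client needs to compile against.
    public interface Clock extends Remote {
        long currentTime() throws RemoteException;
    }

    public ClockServer() throws RemoteException {
    }

    public long currentTime() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099); // in-process registry
        Naming.rebind("rmi://localhost/clock", new ClockServer());
        System.out.println("clock service bound");

        // A client is just as short:
        // Clock c = (Clock) Naming.lookup("rmi://localhost/clock");
        // System.out.println(c.currentTime());
    }
}
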
So, in conclusion: RMI is not for everything (and no one claimed it would be, but I think a lot of the criticism comes from the misplaced expectation that it should be good for everything), but, in many cases, it's exactly what is required.

Categories: soft.dev
Posted by diego on August 12, 2003 at 12:53 PM

reverse engineering: a case study

From CMU's Software Engineering Institute: Into the Black Box: A Case Study in Obtaining Visibility into Commercial Software:

We were recently involved with a project that faced an interesting and not uncommon dilemma. The project needed to programmatically extract private keys and digital certificates from the Netscape Communicator v4.5 database. Netscape documentation was inadequate for us to figure out how to do this. As it turns out, this inadequacy was intentional-Netscape was concerned that releasing this information might possibly violate export control laws concerning encryption technology. Since our interest was in building a system and not exporting cryptographic technology, we decided to further investigate how to achieve our objectives even without support from Netscape. We restricted ourselves to the use of Netscape-provided code and documentation, and to information available on the Web. Our objective was to build our system, and to provide feedback to Netscape on how to engineer their product to provide the capability that we (and others) need, while not making the product vulnerable or expose the vendor to violations of export control laws. This paper describes our experiences peering "into the black box."
Great analysis, very detailed and extremely interesting.

Categories: soft.dev
Posted by diego on August 12, 2003 at 3:00 AM

the java almanac

Another one: Russ mentioned the Java Almanac yesterday. Tons of code samples, including organization by packages that makes it even more useful. Great resource.

Categories: soft.dev
Posted by diego on August 11, 2003 at 5:46 PM

an apology to javablogs readers

In my previous entry there was a comment from either a) someone from Javablogs, or b) someone that wished to remain anonymous while venting his/her frustration at the entries from d2r being "off-topic" from Javablogs. (Charles also wrote to say that he thought it was ok. Thanks! :))

What happened was: several months ago, when the first discussion about the on- or off-topic"ity" of javablogs occurred, I created the category indexes and so on, and edited the config of javablogs to point to the soft.dev category, which is where I'd comment on Java, etc. I tested it and it seemed to work, so afterwards I just forgot about it. Well, apparently, it wasn't working. I have just changed it by deleting the old configuration and adding a new one (since editing of the URL wasn't possible).

So, apologies for "polluting" the javablogs space with non-java related entries from my weblog. Hopefully it's fixed for good now.

Categories: soft.dev
Posted by diego on August 5, 2003 at 9:57 AM

excellent java tools

For whatever reason I kept forgetting to mention this. Anyway, not anymore. For the last few months I've been testing or using tools for Java development from ej-technologies. They are really, really well done tools, reasonably priced, with good UIs, fast, etc. They don't get in the way. They do what they're supposed to, and they do it well. Highly recommended.

Categories: soft.dev
Posted by diego on July 31, 2003 at 5:44 PM

diego's excellent symbian adventure, part one

Mobitopia

in which diego discovers that navigating a sea of acronyms requires some patience

This is the first in a series of articles that describe my experiences learning my way around the Symbian platform. Going into it, what I knew wasn't much. One of my interests was using Java to develop applications for Symbian phones. I had played with the first versions of the KVM (the Kilobyte Virtual Machine, which is at the core of J2ME) when it was first released for Palm at JavaOne 1999, but since then I had lost touch with the platform. I did know that Java's Micro Edition (J2ME) wasn't enough for many applications, and that its limitations were related to something mysterious called MIDP 1.0. I knew that I preferred to stay with Java, but that it was likely I'd have to develop native applications, most probably in C++. So, where to begin?

I first evaluated Symbian as a development platform years ago, when it was mainly known as EPOC. Back then the main choices for development on it were C++ and OPL (a relatively simple high-level language that is unique to Symbian OS), and Java (the full 1.x JRE) would become available for it soon afterwards on some devices. "Back then" there was, as far as I remember, a single SDK that you could get from Symbian and use with an IDE. I thought that the situation couldn't have changed that much since then.

I was wrong.

The best starting point for Symbian development is, at the moment, the website of the Symbian Developer Network (which is commonly referred to as 'DevNet' throughout the site even though the name DevNet appears mentioned only once in the homepage, in the 'About DevNet' link). DevNet contains a lot of information, but its organization is a bit confusing. For example, in the homepage we can see links for SDKs, Tools, Languages (C++, Java, OPL and VB). Clicking into one of the options shows even more options, which require some context. In the next few paragraphs I'll outline some of the technologies available and how they relate to each other. I will consider options available for development with C++ and Java, because that's what I know. OPL is a favourite of many in the Symbian development community, so it shouldn't be discounted (Ewan maintains a cool OPL site here).

The idea is to structure things as follows: an introduction (this section), then move on to high-level looks at C++ development and Java, and finally an overview of the different tools for both C++ and Java development.

going native

Starting off with C++, DevNet has a page for "getting started". Again, lots of information, not much context, which I'll try to provide here.

First, at the moment there are development toolkits for three versions of Symbian OS: 5.0, 6.0 and 7.0 (Additionally, there are multiple SDKs for each OS, sometimes multiple SDKs --from different vendors-- for the same language for a single platform combination!). 5.0 is a legacy product, as far as I could see, so we can safely ignore it (It is used basically on Psion handhelds at the moment).

If your interest is (like mine) development for Symbian phones, then Symbian 6/6.1 is the way to go right now: as the list of Symbian phones shows, all but two of the Symbian phones out there or soon to be released run Symbian 6.x. A few interesting devices that won't be released until closer to the end of the year (e.g., the Nokia N-Gage) will also run v6.x, so it's not as if the platform is being phased out.

But the real question is: if I develop a native application for, say, OS 6, will it run on OS 7?

The short answer to that is: It depends.

Binary compatibility isn't yet the issue, because the compatibility problems are still happening at a higher level. The most popular Symbian phones today use different user interface toolkits, and here is where two new acronyms come in: S60 (Short for Series 60) and UIQ.

S60 is a UI toolkit originally developed by Nokia and freely licensed to other manufacturers, although the licensing has so far been, shall we say, a bit limited (to only one other manufacturer so far: Samsung). S60 phones currently on the market include the Nokia 7650, the Nokia 3650, and the Samsung SGH-D700. The S60 platform is primarily oriented towards "one-hand browsing" (that is, typical cell phone use), and it includes browsing, multimedia services, messaging, PIM functions and a UI library. Sounds useful, yes. But, technically, it's not a standard, since other handset manufacturers using Symbian have to license it from Nokia first, albeit for free.

UIQ on the other hand, was originally developed by Symbian and then spun off into a company called UIQ Technology and initially adopted by SonyEricsson for its P800 phone. One of the main points of UIQ is that it is a UI designed for larger, touch-sensitive screens (i.e., pen-based interfaces), and maybe devices that are a bit more powerful. UIQ runs, at the moment, only on Symbian OS 7.0.

While both UI toolkits are C++, and they are in fact very similar, they are not source-code or binary-compatible. Moving an application from S60 to UIQ implies changing the libraries you're compiling against, include files, and possibly some of the functions themselves. Another option is to use a toolkit like the recently announced S2S from Peroon, which also provides Symbian-to-PocketPC compiler tools. At the moment, then, the best you could wish for is to have to recompile from one platform to another, using this additional toolkit (Pricing or availability of S2S is unclear, since the site doesn't have downloads or information--maybe it's not ready yet).

Reading documentation for some of these technologies can sometimes be a baffling ordeal. Consider for example this paragraph from a paper describing UIQ's approach to user interfaces: "On top of the generic technologies are the application UIs and a few other reference design-specific libraries. There’s one library of particular interest that straddles this gap, called Qikon. Qikon is the UI library for UIQ, and is derived from Uikon, the generic UI library across all of the DRFDs. Uikon in turn was an iteration of Eikon from Symbian OS Release 5." (Note that this article, while still applicable, is a bit old: it applies to an older version of UIQ, 1.x, while the P800 uses UIQ 2.0--which is why Symbian 5 gets mentioned.) If you think that the "Eikon" mention is strange, consider this step-by-step introductory tutorial for developing a Hello World application for Series 60. Quote: "From the wizard dialog you can select which type of application you would like to create. Leave the "EIKON Control" option selected and specify "Hello World" as the Application Title." You see, EIKON shows up again here. We are supposed to "leave the EIKON option selected" but we have no idea why (in fact, the tutorial does not explain what EIKON is at all, but we get the impression that this Eikon fellow was quite something--and a bit of digging tells us that EIKON was Psion's UI framework for EPOC).

It might sound a bit unfair of me to take paragraphs out of context, but I wanted to show an example of what one regularly runs across when reading Symbian development documents of any kind. You're happily reading and suddenly you get a sentence (like in the first example) that says: "Qikon is the UI library for UIQ, and is derived from Uikon, the generic UI library across all of the DRFDs". That leaves you thinking: DRFDs? What the hell is a DRFD??? A Google search on the acronym appears to imply that it's a Demolition Remote Firing Device, but I'm sure that's not the case. Searching for "Symbian DRFD" gives only three results, but Google suggests "Symbian DFRD" --note the transposition of the F and the R-- which yields more results, and after a few clicks and searches we discover that DFRDs are "Device Family Reference Designs", which are variations of Symbian OS tailored for different types of mobile devices. The Symbian FAQ at Nokia's site (PDF) has some useful information on the acronyms and such. When clicking on Symbian articles, be careful: a surprising number of them exist only in PDF, so you'll end up waiting quite a bit to see anything at all. Better to look at the file type on the link and download them if necessary.

The point of this digression is to give an example of what you are likely to find as you delve into Symbian development. Keep in mind that Symbian is responding to the needs of different manufacturers that compete with each other but are all shareholders in Symbian; this is bound to create some conflicts. Additionally, there are still "historical leftovers" like the EIKON references that tend to be confusing and only known to those that have been working with Symbian/EPOC for a long time.

Russ pointed me to this presentation (PDF) from the Symbian Exposium '03 on "Targeting multiple Symbian OS Phones", which is a good summary of how portability can be handled between phones running the different toolkits. Highlights are "Some application components have to be altered", and "80% of code can be kept unchanged between ports".

Things are getting better though, and we are still in an early stage of this process (For example, a few months ago Nokia and SonyEricsson announced that they'd align their developer tools for S60 and UIQ, so you'll be able to use a single toolset to target applications for both environments.)

close, but no dice

The conclusion is that, at the moment, there aren't any options for developing a cross-platform application in C++ that would port across all Symbian 6.0/7.0 devices. The rational choice is to target S60 or UIQ (and then possibly use another toolkit to recompile the sources to target the other platform). In any case, S60 is not yet supported by all Symbian phones, so you should check whether the phone you want to target supports it.

The last point in particular means that for true cross-platform applications you will have to use Java. J2ME and its flavors (with Symbian-specific information) will be the topic of the next part in this series. Until then!

Categories: soft.dev
Posted by diego on July 27, 2003 at 12:43 PM

the three-pane question revisited

There were some good comments to my entry a few days ago "the three-pane question", as well as great posts in other weblogs from Eugene (of JetBrains, the makers of the excellent IDEA) and from Cristian. Thanks to everyone for the feedback.

Both Eugene and Cristian think the UI is good given the correct implementation, while some of the comments on the entry are against it (such as Russ's, or Bruno's). My personal intuition is that this change is good. There are all sorts of theoretical UI reasons I can come up with for why the linear three-pane is better than the "old" version. For example, the eye-movement effect is better (particularly for western readers, with their left-to-right writing/reading system), and it's been studied to death how we read faster and how our eyes get less tired when they have to move less horizontally. (If you don't believe it, pick any book from your shelf. Count the words per line in a few sentences at random. You'll rarely find more than 11 or 12 on average, maximum. This is not a coincidence.)

What I find the most interesting, however, is that several comments in the entry felt that they'd have less real estate for the content-viewing area, which is not the case. A simple example: look at the first drawing in the post, with the "classic" three-pane view, and assume that the first-level pane has a width (at the base) of 1, while the second-level pane has a height (at the far right) of 1 as well. Then assume that the width/height ratio of the display is 4:3, a standard TV or monitor (not those new fancy widescreen monitors though...). Okay, so the total area is 12. The area for the first-level pane is 3, and the area for the second-level pane is 3 (because one unit of width is taken by the 1st-level pane, so 3 by 1). The content-view pane is 3 by 2, area 6.

Now, same widths for the second case: 1 at the base for both the 1st and 2nd level navpanes, which leaves 2 of width of the content pane. Area of 1st level nav: 3. Area for 2nd level nav, 3, area for content pane: 6.

Every single area is the same. Doesn't look like it, does it?

However, now, if you want to see more rows you can (by showing only one line per row) or if you want to retain almost the same level of information and still show more rows, you can do that too. No space lost.

All that said, some people might have their own reasons for why they'd prefer to keep the old UI (resolution used, or simply more comfort, and after all, that's what UI design is all about), so providing choice on the matter is probably a good idea. :)

Besides, there are other issues very specific to email that also have to be taken into account. This display mode is great for flow-formats like HTML, but in the comments Adrian pointed out that the display of quoted text emails would almost certainly require horizontal scrolling.

I've been playing on-and-off with an altogether different possibility, I'll post more details when I have processed it enough to explain it coherently.

I will add this though: I think that one of the problems with making an objective analysis of the linear three-pane UI (for everyone, whether for or against it) is that it doesn't look radical. Where's the 3D? Where's the VR? Where are the visual gizmos? You call this "innovation"? We have sort of come to expect a certain "revolutionary" feel from UI changes over the years, and the linear three-pane UI disappoints there.

Anyway. More later!

Categories: soft.dev
Posted by diego on July 7, 2003 at 12:48 AM

the sound of java

For reasons I won't go fully into at the moment <wink>, I've been looking at the JavaSound API, and of course, looking for information all over regarding known problems, best practices, etc. I got a strange feeling while looking: the information was updated, but not quite... Even the information on the J2SE 1.5 roadmap for JavaSound seemed a bit... half-hearted.

Then I found this article/entry/whatever (java.net is great, but with the weblogs there I sometimes can't tell whether something is a personal bullhorn or really a weblog, since many of them post so rarely... anyway) by Jonathan Simon, where he discusses the history of the JavaSound API and the state it's currently in.

First of all, my own experiences with JavaSound haven't been as bad as he describes (constant crashes, etc), quite possibly (almost certainly) because I don't use any advanced features of the API.

So, is the state of JavaSound really so bad? It seems to me that the criticism of Sun in this regard is for not turning Java into some kind of super-sound-processing platform. But is that really necessary? IMO, no. We need good, stable support for the basic functionality, working well across all platforms. Adding advanced features should come well behind that, a distant second priority.

As I've argued before, what Java needs at this point, more than anything, is a consistent user experience across platforms. It starts with an invisible installer, but it should continue with consistent behavior of the JDK across platforms. The Swing team has done a great job of making Swing work consistently across platforms (adapting when necessary), but some things are still missing (hooks into common platform-dependent elements, such as taskbars, wouldn't be a bad idea). JavaSound could improve in stability and in supporting some features in software rather than depending on the hardware. From there on, we can worry about adding real-time sound processing.
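To be concrete about what "basic functionality" means to me, here's roughly the amount of code it should take to play a clip with javax.sound.sampled (a minimal sketch; the file name is made up and there's no error handling):

    import java.io.File;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.Clip;
    import javax.sound.sampled.DataLine;

    public class PlayClip {
        public static void main(String[] args) throws Exception {
            // Decode the file; the API figures out the format.
            AudioInputStream in = AudioSystem.getAudioInputStream(new File("test.wav"));
            // Ask the default mixer for a Clip that can handle that format.
            DataLine.Info info = new DataLine.Info(Clip.class, in.getFormat());
            Clip clip = (Clip) AudioSystem.getLine(info);
            clip.open(in);
            clip.start();
            // start() is asynchronous, so wait for the clip to finish before exiting.
            Thread.sleep(clip.getMicrosecondLength() / 1000);
            clip.close();
        }
    }

If that much works reliably, everywhere, without surprises, most applications are covered.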

Categories: soft.dev
Posted by diego on July 2, 2003 at 11:17 AM

invisible features

[via Erik, Charles]: Joel gets it right in his short comment about invisible software features:
Unnecessary UIs [...] that pop up to brag about a cool feature the developer implemented are a little bit obnoxious. Too many software developers just can't bring themselves to implement completely invisible features. They need to show off about what a great feature they just implemented, even at the cost of confusing people. Really great UI design disappears. It's a matter of taking away, not adding.
I agree 100%.

This is really hard to do, and it definitely takes discipline. For example, a lot of work in clevercactus has gone into creating deep integration without forcing abstractions, a bit like playing around with puzzle-pieces until they fit. The result is that, if you look at a screenshot, cactus looks very much like an email reader. This is both good and bad.

It's good because there is less cognitive overload (in UI-speak), and because features "appear" when you need them and can be ignored when you don't. It's bad because users might tend to focus on one feature alone and not see the other features at all. And this is not through any fault of the user's, but because the program doesn't expose them properly.

For example, a couple of weeks ago Francois posted a comment about cactus on his weblog that is a good example of what I mean. He discovered things incrementally, things that were "invisible". Francois knows software, and he can poke around the program and understand what he's seeing. Most users, though, won't. They probably won't get to the point where they have to ask themselves "What is this weblog posting thing he's talking about?". They won't see it at all. And, arguably, no one should have to spend time discovering features.

So cactus needs more work in creating "soft exposure" of features that are available. A short tutorial would be useful, but the UI needs soft clues to show the user what the program can do. Clues that are soft enough so they can be safely ignored, but also visible enough so that if the user is curious, and has some time to spare, the feature can be checked, help on it obtained, etc.

Which brings me back to Joel's example.

My case is not quite the one Joel was describing, though. His example is a feature where there should be only one behavior, so asking the user is overkill (and/or showing off). The feature should be invisible, yes. But somehow the user should be made aware of it. Why? Well, for starters, it's good to know what your software is doing to your data (if the data was created by the program for internal use it's a different matter), and then there's the more prosaic case of a user wondering, "I know this site changed its URL; how did the program deal with it?". Giving information in this case seems eminently useful. So how to do it? In this particular case, in my opinion the URL should update itself in the bookmarks with no message, but the next time the user opens the menu the bookmark should appear in a different color, maybe with a temporary submenu or other UI widget to "learn more" about what the color means. The temporary widget would explain what changed and why. User satisfied, feature exposed. But only when necessary.
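To make the "soft clue" idea concrete, here's a rough Swing sketch of what I mean (completely hypothetical, invented for this post; not code from any actual product):

    import java.awt.Color;
    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import javax.swing.JMenu;
    import javax.swing.JMenuItem;
    import javax.swing.JOptionPane;

    public class SoftExposure {
        // Build the menu entry for a bookmark whose URL was silently updated.
        static JMenu bookmarkEntry(String title, final String oldUrl, final String newUrl) {
            JMenu entry = new JMenu(title);
            // Soft clue: the entry is tinted, but nothing pops up to interrupt the user.
            entry.setForeground(new Color(0, 0, 160));
            // Temporary "learn more" item, there only for the curious.
            JMenuItem learnMore = new JMenuItem("This bookmark's address changed...");
            learnMore.addActionListener(new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    JOptionPane.showMessageDialog(null,
                        "The site moved from " + oldUrl + " to " + newUrl +
                        ", so the bookmark was updated automatically.");
                }
            });
            entry.add(learnMore);
            return entry;
        }
    }

The point is that the explanation costs the user nothing unless they go looking for it.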

Categories: clevercactus, soft.dev
Posted by diego on June 30, 2003 at 12:40 PM

J2SE 1.4.2 and the J2SE roadmap

[via Matt]: A Roadmap for the Java2 Platform, Standard Edition. Some good information there.

Also, J2SE v1.4.2 has been released. I've been using the beta for the past couple of months, and it's quite good (I do have some issues with the installation system, though... we'll see if they've been addressed), particularly the WinXP L&F (clevercactus looks exactly like a native WinXP app with it). Oddly enough, the download page lists the JDK-plus-NetBeans-IDE bundle as the first option, which might be interesting to developers (well... theoretically at least) but will be confusing for users. Bad choice of ordering, I think. Anyway. Great to see the release happening on time!

Categories: soft.dev
Posted by diego on June 27, 2003 at 8:13 PM

IEEE article on overlay networks

Here's an article (PDF, 190KB) I wrote which will appear in the July/August issue of IEEE Internet Computing. (IEEE Copyright Notice: Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.)

As usual, comments welcome!

It's a short introduction to overlay networks and how they compare to "standard" flooding-type P2P networks (i.e., Gnutella-type). Overlays are also discussed in the literature as distributed hash tables, because of the way they allow exact key/value mappings to be done over a network, and because they support the basic hashtable operations: put/get/remove. It's written more for developers (or, if you will, for a general audience with technical proficiency) than for researchers (there isn't enough space to go in depth for that). It's quite something to write with limited space on a subject like this one, which tends to be, err... "mathematical". I end up feeling that not all the possibilities/ambiguities are explained, that in simplifying things people will sometimes get the wrong idea, etc. This always happens, on any topic, in any magazine or journal, or even when presenting at a conference (the typical 12- or 15-page limit sounds like a lot; it isn't). In the end the only way to scratch this particular "completeness" itch is to write a book.
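If the "distributed hashtable" framing sounds abstract, the whole programming model fits in an interface along these lines (names invented here for illustration, not taken from any particular overlay):

    /**
     * What a structured overlay (a distributed hash table) exposes to applications.
     * Keys are fixed-length identifiers (usually a hash of something meaningful);
     * the overlay routes each operation to whichever node is responsible for the key.
     */
    public interface DistributedHashTable {
        void put(byte[] key, byte[] value);
        byte[] get(byte[] key);
        void remove(byte[] key);
    }

All the interesting work (routing, replication, node joins and departures) happens behind those three calls, which is exactly why they're hard to explain in a few magazine pages.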

It was an interesting experience, spanning several months: initial draft, review, approval... a short period of quiet, and then a flurry of activity in the last week or so, where we went from an ugly zero-format Word document (using the Track Changes feature in Word to collaborate with the editor) to a nicely finished, final layout version, ready for inclusion in the magazine. The article's editor, Keri Schreiner, was great to work with, and I learned a lot from the process. There are several "editors" involved in the magazine, of course (Lead Editor, Department Editor, and so on)... when you think about it, it's fascinating: a process that might take, say, six months in total (from idea to camera-ready copy), happening in parallel for a set of articles that will appear in a single issue. It's mostly a top-down process. It got me thinking about how it could be done in a less centralized way, which would improve feedback for all parties. I won't go into more detail now, though; I still have to post a follow-up on the decentralized media discussion. :-)

Categories: soft.dev, technology
Posted by diego on June 24, 2003 at 8:40 PM

a new java resource

New for me that is... :-) java.net. Even more interesting, they have a weblogs section which includes a link to James Gosling's weblog. Cool!

Categories: soft.dev
Posted by diego on June 10, 2003 at 9:34 PM

coming up: javaone

JavaOne begins today and lasts until Friday. As usual, a barrage of announcements is expected, particularly in the areas of J2ME and development tools, along with a new branding campaign. I imagine there will be some discussion of JDK 1.5 ("Tiger") as well, maybe even an estimated release date (and one for JDK 1.4.2 too). Should be interesting!

Categories: soft.dev
Posted by diego on June 10, 2003 at 2:57 PM

more bluetooth and java

Jamie has some good comments on my previous post on Bluetooth and Java:

I could just find that Symbian OS roadmap which showed that Symbian 8 will include the Java Bluetooth API (JSR-82). Oh, and you could you use JNI (Java Native Interface) from PersonalJava and access the Bluetooth API but C++ programming on Symbian is not an easy experience. It produces great stable applications for the user but headaches for the developer (from what I remember, there are macros surrounding everything and errors are produced if the application doesn't release 100% of it's memory on exit - no mem leaks). Good but tough.
So PersonalJava does support JNI. That's good (J2ME most probably doesn't, which limits the usefulness of this to PersonalJava devices; of the new Symbian mobiles I think the SonyEricsson P800 is the only one that supports it). Jamie also talks in more detail about similar concerns regarding how flexible the API is, another important factor I was mentioning.

Categories: soft.dev
Posted by diego on June 1, 2003 at 12:49 AM

bluetooth on java

Russ posted a cool entry on Thursday about Bluetooth on Java, and the lack of OSS toolkits. What seems more worrying is that most of the toolkits the book he mentions points to are for Win32 devices, Linux, or Palm, with no mention of Symbian. I guess that to use Bluetooth from Java on the new Symbian mobiles you'd have to wrap it with JNI, but I don't think that's possible on PersonalJava, or even J2ME. In any case, at this stage Java would probably be too high-level for this kind of thing; I'd bet (no knowledge whatsoever of this, just speaking from past experience) that the Bluetooth API for Java is going to miss some options that turn out to be important, and that they won't be included until the next release. (An example of this is the HttpURLConnection class in java.net, which still doesn't let you do certain things, like supporting WebDAV by simply changing the method type on the call, just because the API refuses to accept anything it doesn't know about.)
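(The HttpURLConnection complaint, by the way, is easy to reproduce. A sketch, with a made-up URL:)

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebdavAttempt {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/dav/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // PROPFIND is a perfectly legal WebDAV method, but the API only accepts
            // the handful of HTTP methods it already knows about.
            conn.setRequestMethod("PROPFIND"); // throws java.net.ProtocolException
        }
    }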

Categories: soft.dev
Posted by diego on May 31, 2003 at 1:07 AM

'Visual Java'?

News.com: New Java tool targets simplified development. They also mention Sun ONE Studio, "which is targeted at more highly skilled developers." No kidding. Sun ONE Studio is a mess, incredibly difficult to use. We'll see what they come up with to make Java easier to program with; it's certainly doable.

Categories: soft.dev
Posted by diego on May 23, 2003 at 3:00 PM

when abstractions don't work

As I've been reimplementing the functionality provided by the JavaMail API for spaces, I've realized that JavaMail has one important flaw: its level of abstraction. That is, it's too abstract.

Theoretically, from a design point of view, the JavaMail API makes sense: you've got Folders, Sessions, Transports, and so on, with the appropriate subclasses. But then...

POP3 and IMAP4 are both for receiving, but they are very different. IMAP essentially provides a server-side back-end for a mail client, while POP provides only transient storage between sender and receiver (you can use it for long-term storage, but it's not really designed for it). IMAP supports folder hierarchies; POP doesn't. IMAP supports searching and understands the structure of RFC 822 messages; POP can tell you the size of a message and a server UID, and beyond that you have to fetch the entire message. IMAP lets you subscribe to a folder; POP doesn't even know that folders exist.

And then, of course, SMTP is for sending, rather than receiving.

Similar "impedances" could be identified, for example, for NNTP, something else that is sometimes implemented with JavaMail interfaces.

In the end, the main things these protocols have in common are that: a) they deal with MIME messages (or RFC 822 plain text), and b) they require connecting, disconnecting, and possibly logging in. Everything else is different.

So, does it really make sense to fit these wildly different systems under the same interface?

Having used JavaMail on the server side, I understand that there is an advantage to its abstraction, and it's relatively easy to use. But when you need to build the functionality of a full email client on top of it, the abstractions start to become unwieldy.
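To be fair about the upside, this is roughly what the shared abstraction buys you: the same few lines read a folder over either protocol, with only the provider name changing (host and credentials below are placeholders):

    import java.util.Properties;
    import javax.mail.Folder;
    import javax.mail.Message;
    import javax.mail.Session;
    import javax.mail.Store;

    public class MailCheck {
        public static void main(String[] args) throws Exception {
            String protocol = "imap"; // or "pop3" -- same code either way
            Session session = Session.getInstance(new Properties());
            Store store = session.getStore(protocol);
            store.connect("mail.example.com", "user", "password");

            // With IMAP this is one folder among a whole hierarchy;
            // with POP3 it's the only folder the protocol knows about.
            Folder inbox = store.getFolder("INBOX");
            inbox.open(Folder.READ_ONLY);
            Message[] messages = inbox.getMessages();
            System.out.println(messages.length + " messages");

            inbox.close(false);
            store.close();
        }
    }

That's great until you need the protocol-specific behavior, and an email client needs it constantly.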

That said, I don't think it was intended for that! (Or was it?) I think it was meant to provide basic mail functionality to applications that need it but where mail isn't central. Maybe I just got carried away!

Talk about rants that come full circle. So this was a feature of the design then, and not a flaw? Depends on how you look at it...

Yeah, too much coffee, I know... :-)

Categories: soft.dev
Posted by diego on April 14, 2003 at 10:54 PM

version control systems: the comments

Several good comments, through email and on the weblog, to my short review of SCM systems.

First point is regarding Perforce. Jason said that I was overestimating the cost of hardware, and Chris Bailey agreed. Although I did mention the hardware/maintenance costs, I didn't say that Perforce would end up being more expensive than BitKeeper; I said: "[BitKeeper] is probably cheaper in the long run (at the very least, the cost should be similar)". By "long run" I meant as the dev teams get bigger.

I think it depends on how centralized and "careful" the organization is. If it's a control-freak org (and they have money!), then they will probably end up having an admin just for the SCM server (software+hardware). I think most cases would probably be like Jason described. My guess (note: I say guess) is that the cost of BitKeeper and Perforce would end up being similar (especially since BK is leased rather than licensed). So cost would not seem to be much of an issue. In any case, this factor only becomes big when you have more than a couple of dozen developers, and at that point you can probably afford either one, so again cost is not so much an issue as usability and features.

In fact, if CVS were somehow wiped off the planet today, I'd switch to Perforce. The tools are excellent, branch management is very good, it's relatively easy to manage, and it integrates well with the underlying platform. BitKeeper, while "cooler" because it is decentralized, probably makes sense only if you have an all-*NIX shop; the Windows version is just awful. That said, I think BitKeeper's "staging" concept is brilliant, and I wish all tools had it.

Chris also mentioned CVS's default of keeping files read-write, saying he liked it better than Perforce's "check-out-to-edit" model (which is more like RCS). I definitely agree. Having to check out every single file I have to work on is a pain.

Finally, both Alsak and Christian commented on my shunning of Rational's ClearCase. I think they are right: it is complex and difficult to use, and that's partly why I didn't consider it much. The only Rational tool I've ever been able to use for an extended period is Purify, and only because what it does is so simple that you'd have to work actively to make it complicated, and because when you're working with C/C++ code it is absolutely essential. The other big problem with Rational is their licensing process, which is the most convoluted I've ever seen. They have "node-locked licenses" that are essentially tied to a single machine (!), or "floating licenses" that are several times more expensive. I wasn't surprised when IBM acquired Rational; the top-down approach IBM takes with its tools (the shining exception being Eclipse) is just like Rational's.

ClearCase is an expensive, ugly beast that is probably useful if you're using other rational tools (they integrate well, I'll give you that) but in few other cases. That's why I didn't consider it much.

It seems to me that if Perforce adds some level of decentralization (for example, the ability to handle multiple server-side repositories that synchronize transparently) and then things like staging, it will end up being the leader for a while.

Okay, on to other things now...

Categories: soft.dev
Posted by diego on April 9, 2003 at 4:24 PM

version control systems: a short review

Last week I spent some time looking at Source Control Management (SCM) systems (previous related entries here and here). Before my impressions, it's useful to state what I was looking for.

For one, I wanted to get a feel for how SCM had evolved in the last couple of years. I've been using CVS for the last two years or so, and before that I mostly used Visual SourceSafe. I knew that both CVS and VSS had remained largely unchanged since then, but I had also heard about new systems that had recently appeared and made some things easier. One of the things that was important to me was better branch management, and possibly finding something that helped the process itself, aside from managing the code base.

Anyway, so, the three main systems I looked at were subversion, Perforce and BitKeeper. There were others I found in the process of looking, but nothing major or too evolved.

Subversion looks promising, but it's just getting started. It's not easy to find plugins for many development tools, and there doesn't seem to be one available for IDEA, which is what I use. (With the new machine I tried switching to Eclipse, but I kept wasting time trying to find my way around its refactoring features and other things, including keyboard shortcuts, so I gave up after about an hour. No time for that. Another thing: I use a local CVS repository, and Eclipse only supports remote repositories! So I had to install a CVS server locally just to try it. Ridiculous.) Subversion does have a plugin for Eclipse, though.

Perforce is a good improvement over CVS, with good client GUI tools. They use their own terminology for many things (for example, they have "submits" instead of commits), which can be confusing. The client is loaded with features, but that can be confusing as well. You can use Perforce for free for up to two developers; after that it's a commercial license at $750 per developer, including support costs.

Both Subversion and Perforce are server-based, so as you scale you need to add hardware (and the time of the person who has to maintain both the server software and the server itself), so for many developers the real cost will be more than the license alone suggests (Subversion is free and open source, btw).

BitKeeper, however, is fully distributed, so there is no server cost that grows with the number of developers (you have a main repository for all the code, but it doesn't have to scale with the organization). The main repository in BitKeeper is used only as a storage facility where the changes from the child (and even grandchild) repositories typically end up. Another cool feature of BitKeeper is staging. It allows you to stage the commits from a set of machines onto one machine (a sort of "mock" main commit) and test from there. I can see this greatly helping when work from several teams has to be integrated into a single release and you want to test all the commits before putting everything into the "real" main repository. As for cost, BitKeeper is free for one developer, then $1,750 per developer in the form of a one-year lease, so it's actually $1,750 per year. Apparently you can purchase the license outright, but that makes sense only after having leased the seat for about five years. It sounds pretty expensive, but then again there's less cost in terms of servers and management, so it is probably cheaper in the long run (at the very least, the cost should be similar).

In terms of ease of installation, Perforce wins by a mile, with Subversion coming second. The BitKeeper installation process is just awful, requiring three separate installs on Windows (Cygwin, Tcl, and then BitKeeper), and in the end you end up with... a bash shell. There are Tcl-based GUI tools, though. BitKeeper is very clearly a UNIX tool with a Windows version that feels more like a hack than a product.

All of these tools support something that to me is crucial (and that CVS doesn't have): atomic commits. When committing multiple files, either the whole set is committed or nothing is, so an error in one file will not leave a broken version in the repository.

So, conclusion?

Some interesting new tools, but you need a reason to switch (like requiring atomic commits, for example). Most small to medium-sized projects are probably fine with CVS, and maybe eventually Subversion. Both Perforce and BitKeeper are more sophisticated systems, but prepare yourself for the learning curve (and get out your wallet).

Categories: soft.dev
Posted by diego on April 7, 2003 at 9:36 AM

the IMAP connection

While looking for information on the IMAP protocol, server compliance and so on, I found this: The IMAP connection. Very useful site, maintained by the people at the University of Washington, who also maintain one of the best UNIX IMAP servers available.

Categories: soft.dev
Posted by diego on April 6, 2003 at 8:28 PM

JDK 1.4.2 beta

The JDK 1.4.2 beta was released a couple of days ago. It includes an improved look and feel for Windows XP! Here are some screenshots. Too bad Sun has to chase a moving target in terms of UI conformity, but they are doing a reasonably good job. If JDK 1.4.2 ships within a month or two, they'd be about six months behind the OS release for the UI changes (not to mention the nice side effect of all the bugs they fix). Maybe I won't need to find another look and feel to give spaces a more modern UI.

Categories: clevercactus, soft.dev
Posted by diego on April 6, 2003 at 12:53 AM

overloading the whitespace operator

My friend Martin sent me this through email: a paper by Bjarne Stroustrup on overloading the whitespace operator in C++. It would allow expressions such as z = x y instead of z = x * y, among other interesting (and incredibly useful!) applications. Most interesting of all are the impact of this new operator on non-ASCII character sets, and the final recommendations they make for implementing it, near the end, so that it can be useful on 3D display devices. A must-read.

Chris then sent another, more current (from this year) effort in this direction: the Whitespace programming language:

Most modern programming languages do not consider white space characters (spaces, tabs and newlines) syntax, ignoring them, as if they weren't there. Whitespace is a language that seeks to redress the balance. Any non whitespace characters are ignored; only spaces, tabs and newlines are considered syntax.
LOL!

Categories: soft.dev
Posted by diego on April 1, 2003 at 6:49 PM

SCM and TCO

Murph posted a good comment on my previous entry on SCM systems:

Hmm, coming from an MS background I know and, more or less, understand VSS. But that costs if you can't afford a sufficiently serious MSDN sub (I can) so I too have looked. I've been somewhat blinkered by the desire for IDE integration but think I'm going to have to give up on that )-:

Perforce is seriously not cheap (even by the heady standards of version control software...)

I might have taken a punt on subversion if a) I hadn't quite given up on IDE integration and b) I hadn't discovered that I already have CVS on my server (!)

CVS... well there is a certain weight of argument and an abundance of tools available across several platforms. Time will tell...

Cost can definitely be a problem, and it's one of the variables I'm looking at. I've used VSS in the past and I'd choose CVS over VSS any day of the week, even though both integrate with the tool I'm using (IDEA). What I want from an SCM system is advanced features such as proper branch management, distributed development, and so on. We'll see how they compare after I do some testing.

Categories: soft.dev
Posted by diego on March 31, 2003 at 2:19 PM

more on version control

Still looking into the recent advances in source control management (SCM) systems. I began looking about two months ago, and at that time I saw BitKeeper and Subversion. Then Chris Bailey from CodeIntensity and Dylan both recommended Perforce. So now I've downloaded both Perforce and BitKeeper; I've installed Perforce, created a repository, etc., but haven't installed BitKeeper yet. I'll post impressions of both systems once I've given them a workout, which is not so hard since I already have a source structure that is reasonably complex.

Categories: clevercactus, soft.dev
Posted by diego on March 31, 2003 at 10:52 AM

look and feel, part 3

On the previous entry about my search for a different Swing L&F, Mark posted a reference to JGoodies Looks. Nice! Another one to consider.

In another comment Roberto was wondering about my thoughts on SWT (I've made some comments/references on the state of Swing before, here and here). I have several reasons, not the least of which is that using an SDK that is more recent (and therefore buggier) sounds a bit risky, but the main one is this: objects allocated in SWT have to be released "by hand". To me, this is unacceptable. I don't want to go back to hunting for memory leaks. If they fix that, I might reconsider :-).
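For the curious, this is the kind of thing I mean. In SWT, every Color, Font, or Image you allocate wraps an OS resource that you must dispose of yourself; a minimal sketch (not from any real app):

    import org.eclipse.swt.SWT;
    import org.eclipse.swt.graphics.Color;
    import org.eclipse.swt.widgets.Display;
    import org.eclipse.swt.widgets.Label;
    import org.eclipse.swt.widgets.Shell;

    public class SwtDisposeExample {
        public static void main(String[] args) {
            Display display = new Display();
            Shell shell = new Shell(display);
            Color red = new Color(display, 255, 0, 0); // an allocated OS resource
            Label label = new Label(shell, SWT.NONE);
            label.setText("hello");
            label.setForeground(red);
            shell.pack();
            shell.open();
            while (!shell.isDisposed()) {
                if (!display.readAndDispatch()) display.sleep();
            }
            // Forget either of these and the OS handles leak; the GC won't help you.
            red.dispose();
            display.dispose();
        }
    }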

Categories: clevercactus, soft.dev
Posted by diego on March 27, 2003 at 4:56 PM

looking for a new look and feel, cont.

A couple of comments on my previous entry about my search for a new look and feel. Matt commented that the Alloy L&F is quite good (also a comment here). I agree (I had mentioned it in passing in my entry), but as I said, I find it slightly outdated in some way I can't quite describe. I should try it a bit more, though.

Aside from noting my search, Cristian said:

Decent font rendering by the Java VM is what I hope Sun will provide, sooner or later.
I couldn't agree more. Although font rendering improved vastly with the addition of Java2D, there are still holes. In particular, an easy way to turn on anti-aliasing would be a godsend (right now the only way to do it is to subclass the component, override its painting method, and change the rendering hints on the Graphics2D object that is passed in). Well-designed, platform-independent font management would also be an important step forward.
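For anyone wondering, the workaround looks more or less like this: subclass, grab the Graphics2D, set the hint, and let the normal painting run (a minimal sketch):

    import java.awt.Graphics;
    import java.awt.Graphics2D;
    import java.awt.RenderingHints;
    import javax.swing.JLabel;

    public class AntiAliasedLabel extends JLabel {
        public AntiAliasedLabel(String text) {
            super(text);
        }

        protected void paintComponent(Graphics g) {
            // Turn on text anti-aliasing just for this component, then let the
            // normal painting code run with the modified Graphics2D.
            Graphics2D g2 = (Graphics2D) g;
            g2.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                    RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
            super.paintComponent(g2);
        }
    }

Doable, but you shouldn't have to do it per component.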

Categories: clevercactus, soft.dev
Posted by diego on March 26, 2003 at 8:46 PM

using all those GHz and MB

Now that I have a machine that can handle more than two applications at once, I installed VMWare Workstation 4 (Beta). Such an amazing application. So well done, so simple and powerful. I've already installed W2K Server in one VM, now I'm going for Red Hat 7.2 in another. Testing spaces is about to get a lot easier.

Something else I've been looking at, in the interest of making life easier for users, is InstallShield MultiPlatform, which builds native installers for multiple target platforms from a single build configuration. I still haven't tamed it (it's somewhat non-intuitive at times), but I'm getting there. In the end, for Windows installs the MultiPlatform product might not be the best idea, since the installation doesn't look too good. But for targeting multiple UNIX environments, plus MacOS and Windows, it's great. Then again, many people who use UNIX don't have much use for installers. ;-)

Categories: clevercactus, soft.dev
Posted by diego on March 26, 2003 at 5:20 PM

looking for a new look and feel

As part of the changes for the beta I've been exploring the area of pluggable look and feels for Java. There aren't that many, really, and most are not very complete. Javootoo has a good listing of the best-known ones (there are several more out there, like the Alloy L&F). One I found that looks really good is the Skin Look and Feel. Alloy also looks nice, although a bit dated. Oyoaha is also interesting. And finally, the Simple L&F. I've been modifying one of the L&Fs from Skin, and it looks good.

It's possible that a different look and feel might become the default for spaces, always with the option of reverting to the "native" L&F. I don't think that would be too much of an impact for users, who are by now used to seeing different UIs (Winamp, QuickTime, Media Player, games...).
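Offering that choice is cheap, too; switching (and reverting) is basically a one-liner plus a UI refresh. A sketch, with a placeholder class name for the custom L&F:

    import javax.swing.JFrame;
    import javax.swing.SwingUtilities;
    import javax.swing.UIManager;

    public class LookAndFeelSwitcher {
        // Apply a pluggable L&F by class name, or fall back to the native one.
        static void apply(JFrame frame, String lafClassName) {
            try {
                if (lafClassName == null) {
                    UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
                } else {
                    UIManager.setLookAndFeel(lafClassName); // e.g. "com.example.SomeSkinLookAndFeel"
                }
                // Push the new L&F to components that already exist.
                SwingUtilities.updateComponentTreeUI(frame);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }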

Categories: clevercactus, soft.dev
Posted by diego on March 26, 2003 at 2:44 PM

that microsoft thing

Russ responds to a comment by Robert Scoble regarding his anti-Microsoft tendencies:

I have no problems admitting that I hate all things Microsoft. You can check my weblog for a variety of rants against the company and their technology. Institutionally the company lies and cheats. They use their illegal monopoly to manipulate smaller companies, developers and customers. They copy great technology and ideas (WIMP, Netscape, Java) and then use their financial power to muscle the originator out of the market. Microsoft's dominance of the industry is literally a crime. There is NOTHING to like about that company.
I disagree. I don't like Microsoft's aggressive behavior, I think their products aren't as good as they should be, and I agree that some of the things they do are inexcusable. They do stifle innovation. But that doesn't mean they are "criminals". As I've argued before, I think they are simply better than many others at playing the game of capitalism. Their monopoly is not illegal (monopolies per se aren't illegal, and in fact network effects tend to dictate that monopolies happen more or less "naturally" in market economies), but the use Microsoft makes of its monopoly is illegal in many cases.

Yes, my view is possibly overly "Darwinian", but I think that's the way things are. It's not Microsoft's fault that monopolies encourage this kind of predatory behavior, that free-markets-for-all policies squash the little guy, and so on. It is their fault that, having so much power, they don't rise above it and become a more "benevolent" force, but hey, they're human after all. They have the power and the money to create better things from the start, instead of playing the waiting game and doing things only when absolutely necessary, but they don't. How many others, in their position, wouldn't defend it at all costs? I'm not sure. It's easy to take the moral high ground when there's nothing to lose. I like to think that I personally would not behave like that, but I can't say that my own principles are above others'. They're just my own.

Russ then goes on to say:

There is no middle of the road here. If you don't actively oppose Microsoft then you are just conceding them whatever market they want and this will directly affect you sooner or later. If you're one of the millions of Java programmers you need to actively oppose ANY Microsoft advancement into your company, otherwise your time and effort learning Java will go straight in the trash. I don't mind competition - Java is improving already by the presence of Dot Not for example - but Microsoft doesn't compete, remember? They use their monopoly advantage to take over whichever market they set their eyes on. And this means that whether you're a Java programmer or a mobile developer, Microsoft is actively planning to make your skills and livelihood obsolete. Don't forget it.
It's true. They are always a serious threat, and no one should forget it or dismiss it. That doesn't mean we have to consider them evil, or despise them.

Now, I don't know if my argument qualifies as "middle of the road" or what. The problem I see with saying "you're either with us or against us" is that it polarizes the argument unnecessarily. For example, I find many things about Microsoft impressive: their single-minded focus, their ability to somehow make their products work in spite of the size of their market and of their code base. Windows XP is more than 35 million lines of code, and it still works (more or less). Now, to me, that says there is a really good development organization in place, and that many of their people must be talented. They are in more software markets than anyone else and they have products in many areas. They weren't always so powerful; keep in mind that through the 80s the biggest software company in the world was Lotus. So they are doing something right. Whether it's moral or not is another matter.

The argument that, more often than not, they bully their way into markets using their monopoly position is true, but then we should remember that all the companies that have been crushed by them made fatal mistakes along the way. Those that didn't have survived. Intuit survived Microsoft's determined push into end-user accounting/tax products in the late 90s. AOL survived the bundling of MSN into every consumer version of Windows since Windows 95. In fact, not only did AOL and Intuit survive, they also maintained their market dominance. How? Better products, better prices, better understanding of what the customer needed or wanted. And, each in their own way, through innovation. This is not to say that Microsoft behaved lawfully in the Netscape case, for example, but Netscape compounded Microsoft's onslaught with its own set of crucial errors (can you think of anyone who considered Communicator 4.0 anything other than bloated, buggy software?). Additionally, a monopoly comes with no guarantees. Microsoft keeps moving: slowly, making many mistakes along the way, sometimes behaving in a predatory manner, but they keep moving. Many companies had monopolies in the past, and some lost them because they couldn't adjust. Take IBM, for example, which invented the PC only to see it erode its own market dominance. Sure, the antitrust trial that ended at the beginning of the 80s had an effect, but when the company's leadership thinks that "there is probably a worldwide market for 5 computers" (as Watson had expressed at the end of the 50s), it's not a surprise that when they created the PC they didn't know what they had. Compare that to Microsoft's mission statement for its first 20 years: "A computer on every desk and in every home." They had this since the mid-1970s. Now we take it for granted, but saying that back then was truly crazy, as was thinking that they could build a business selling only software, at a time when people saw software as an "add-on" to hardware sales. So there was some vision in place, even at the beginning, and Gates deserves some credit for that.

My point is: sure, we don't like their tactics and we certainly don't like their dominance. But that doesn't mean that everything they do is absolutely bad. I think that there are many things that can be learned from Microsoft, and the only way to do that is to acknowledge where they have been successful, and why.

PS: Russ's comments about all weblogs being subjective are right on the mark. And isn't that why we like weblogs? Because of their inherent subjectivity?

Categories: soft.dev, technology
Posted by diego on March 24, 2003 at 6:52 PM

reference handling in JNI

As I was working on memory optimizations for spaces I was also trying to fix a memory leak that seemed to be coming from the DLL used to import data from Outlook. Of course, this would only affect the Outlook import process, but it was nevertheless an important part. So I started looking at the JNI code.

The Outlook import DLL reads Outlook information using MAPI (actually wrapping the MAPI objects with ATL) and creates string pairs (field/value) that are then passed to Java. The C++ call looks like this:

jniEnv->NewStringUTF(fieldValue)
Where fieldValue is a char pointer. Now, the JNI documentation doesn't say anything at all about having to release strings or data created with this particular method, but they do have to be released. The way to do it is by calling
const char* reschr2 = jniEnv->GetStringUTFChars(fieldName, JNI_FALSE); // get the string's UTF-8 chars
jniEnv->ReleaseStringUTFChars(fieldName, reschr2);                     // free that temporary char buffer
jniEnv->DeleteLocalRef(fieldName);                                     // drop the local reference so the GC can reclaim the jstring
Now, once you've seen this code it seems straightforward, but the JNI documentation does not specify that calling ReleaseStringUTFChars and DeleteLocalRef is necessary or, in fact, required to release the allocated objects. Online discussions in various places are also silent on this. The JNI tutorial doesn't mention it when talking about how to create strings from the C/C++ side. Which raises the unsettling thought that there are thousands of JNI apps out there that almost certainly have memory leaks of this type in them, because unless you're creating tens of thousands of objects, it's difficult to even notice that the leak is there.

Anyway, the bug is now fixed, and I just did an import from Outlook into spaces of 10,000 items (emails, calendar items, contacts, etc.) with the default memory settings for the JVM (64 MB max heap), and it worked fine, finishing in 20 minutes. According to the different tests I've done, that speed is linear, so about 500 items imported per minute. At some point I should create a bigger Outlook database to see how far it'll go.

Categories: clevercactus, soft.dev
Posted by diego on March 22, 2003 at 6:47 PM

some cool features of JSE 1.4

Roaming the net while I was waiting for a performance/memory test of the spaces database to complete, I found this old article from the O'Reilly Network: Top Ten Cool New Features of Java 2SE 1.4. The features they list are:

  • Parsing XML
  • Transforming XML
  • Preferences
  • Logging
  • Secure Sockets and HTTPS
  • LinkedHashMap
  • FileChannel
  • Non-Blocking I/O
  • Regular Expressions
  • Assertions
You might argue whether some of them belong in a "top ten" or not, but they are all extremely useful and important additions since JSE 1.3. How important? Looking over the list I realized that basically all of them are used one way or another in spaces, and not just because they are new; it's because they are necessary. Or at least, once you've done something using them, you'll never want to go back to the old workarounds.
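Just as a taste, two of the smaller items on that list already save a surprising amount of code (a quick, self-contained example):

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class NewIn14 {
        public static void main(String[] args) {
            // Regular expressions: no more hand-rolled string scanning.
            Pattern p = Pattern.compile("(\\w+)@(\\w+\\.\\w+)");
            Matcher m = p.matcher("mail me at someone@example.com please");
            if (m.find()) {
                System.out.println("user=" + m.group(1) + " host=" + m.group(2));
            }

            // LinkedHashMap: a map that remembers insertion order.
            Map ordered = new LinkedHashMap();
            ordered.put("first", "1");
            ordered.put("second", "2");
            for (Iterator it = ordered.keySet().iterator(); it.hasNext();) {
                System.out.println(it.next());
            }
        }
    }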

Categories: soft.dev
Posted by diego on March 14, 2003 at 3:22 AM

Java garbage collection algorithms

[via Matt]: an article on the JDK garbage collection algorithms. Interesting.

Categories: soft.dev
Posted by diego on March 11, 2003 at 1:01 PM

on tabbed browsing

Dave Hyatt, the developer of tabbed browsing in Mozilla, Phoenix, and Chimera, talks about his views on tabbed browsing, the various usability issues that affect it, and other related things. Very cool.

Categories: soft.dev
Posted by diego on March 8, 2003 at 12:05 AM

C# vs. Java: A debate

A debate on Builder.com about C# and Java. A little "light" and short, but still interesting.

Categories: soft.dev
Posted by diego on March 7, 2003 at 10:49 PM

Microsoft's P2P Breaks Windows

In a comment to my earlier post on JXTA, RefuX said that his experience with JXTA had been very negative. Mine wasn't very good with the initial releases either, but it's gotten better.

Part of the problem, IMO, is that P2P APIs are new, and we haven't yet quite figured out what works and what doesn't. Regardless of the API, what's cool about JXTA is its API-, platform-, and language-independent protocol, based on XML. It has improved a lot over the past two years. I don't want to sound like an apologist (JXTA still has a ways to go), but when something new is being done, problems are to be expected.

Microsoft, that recently announced some P2P apps and protocols, has had a few problems of its own:


With this week's unveiling of threedegrees for MSN Messenger and the Windows XP Peer-to-Peer Update, users got a taste of Microsoft's plans in the P2P space. But the experience quickly turned sour for many who were left with broken Internet connectivity or an inability to reach certain Web sites including those of AOL and retailer NewEgg.com.

A bit more serious than I'd expect, though... after all, P2P should be a layer on top of all the other standard services of the OS, right? And this isn't even a toolkit, it's an application, and it should not affect other apps in the system. I guess this is the bad side of Microsoft's taste for "ultra-integration" of everything into the Windows core.

Categories: soft.dev, technology
Posted by diego on March 5, 2003 at 5:13 PM

on Sun's Orion

An InfoWorld article with some more details on Sun's new Orion offering. Going for a really close integration between software and hardware, it seems, and hoping to make its N1 strategy (and the rest of its updates) more predictable.

Categories: soft.dev
Posted by diego on March 5, 2003 at 4:57 PM

one million JXTA downloads!

A press release from Sun announces that:

one million developers have downloaded Project JXTA from the Sun Web site. JXTA is the only open source, standards-based, peer-to-peer technology that supports collaboration and communication on any networked device anywhere, anytime. Sun also announced that the National Association of Realtors and the National Association of Convenience Stores are implementing JXTA-based applications and that InView Software and Internet Access Methods have released commercial products based on JXTA.
Impressive. Microsoft is a little behind the curve on this one, no?

Categories: soft.dev, technology
Posted by diego on March 4, 2003 at 5:47 PM

Microsoft's P2P Toolkit

Almost two years after JXTA got going, Microsoft has announced the availability of a P2P software development kit for Windows XP. Some or all of the APIs included in the kit are expected to make their way into Windows. No hope that they would use the JXTA protocol for that, huh?

Categories: soft.dev, technology
Posted by diego on March 2, 2003 at 10:03 PM

RSSLibJ

In a comment to a previous entry about more RSS fixes, Joseph pointed out that:

You might want to look at RSSLibJ, which takes a common object model (items/channels/etc) and renders an RDF/RSS stream according to the type you want, so you can change a string and get a completely different (valid) feed.
Nice! I didn't know RSSLibJ could do that. Though I was aware of its existence, for some reason I thought it was "just" a Java library for parsing RSS files, but it's more than that; in fact, the FAQ points out that it is primarily intended for writing RSS feeds rather than reading them.

As a sidenote, my main complaint was that all of these versions of RSS have to be supported at all. One should be enough, either 1.0 or 2.0.

Pointless griping, I know. Just venting.

Categories: soft.dev
Posted by diego on March 2, 2003 at 7:44 PM

jdk 1.4.1_02 released

I just saw that JavaSoft released JDK 1.4.1_02. The download location is the same as for all 1.4.1.x releases. Here are the release notes for 1.4.1_02. Many fixes, including, apparently, some related to video cards (for example, this one) that I had mentioned some time ago.

Categories: soft.dev
Posted by diego on February 28, 2003 at 11:12 PM

on data and views

First post on the newly created spaces category! Partially taken from an email I sent yesterday to the dev-list...

A couple of days ago Greg posted an entry on 'Data Views' and some things he would like to see in software that deals with information. I agree with most of what he says, although I think it's not always easy to make views purely "virtual" while keeping it clear to the user what's going on. Many people assume that information is "physical" somehow, and they might be confused when an item deleted in one place disappears from another as well. So UI elements are really important to making this work properly and to avoiding confusion. That said, spaces will definitely include a way to create "dynamic spaces", essentially views generated from a query. A space today is already that, although you're not allowed to really modify the query that creates it. That mechanism just needs a bit of an extension for dynamic spaces to exist.

Besides the idea of dynamic spaces there is something else I'm calling "cross-cutting filters" (the 'cross-cutting' liberally borrowed from 'cross-cutting concerns' such as those AspectJ deals with), which basically define an orthogonal category of filtering on a space. This is not new (other programs do it as well), although I want it to be easier and more frequently used in spaces. Examples of cross-cutting filters would be "see unread only", "see sent msgs only", "see RSS msgs only", and so on. Additionally, cross-cutting filters will be available for user-defined tags on items (kind of like categories). This will allow a space (which is commonly a reflection of a real-world activity or task) to be filtered by information-dependent parameters, making it much easier to navigate. I will talk about this in more detail over the next few days, and post some screenshots.
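In code terms, a cross-cutting filter is just a predicate that any space can apply on top of its own query. A purely illustrative sketch, with names invented for this post (nothing from the actual cactus source):

    /** Minimal stand-in for an item in a space (invented for this example). */
    interface Item {
        boolean isRead();
    }

    /** A cross-cutting filter: an orthogonal yes/no test applied to the items of any space. */
    interface ItemFilter {
        boolean accepts(Item item);
    }

    /** Example: "see unread only", independent of which space is being viewed. */
    class UnreadFilter implements ItemFilter {
        public boolean accepts(Item item) {
            return !item.isRead();
        }
    }

Because the filter knows nothing about the space it's applied to, the same "unread only" toggle works everywhere, which is the whole point of calling it cross-cutting.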

Categories: clevercactus, soft.dev
Posted by diego on February 28, 2003 at 7:29 PM

Copyright © Diego Doval 2002-2007.
Powered by Movable Type 4.37