Now blogging at diego's weblog. See you over there!

what my thesis is about

Since one of the starting points will be to talk more about my thesis, here goes the abstract. Maybe I'll start a new site or blog for it, to keep things cleaner (there's a lot of stuff to discuss) but for the moment this will do, and the abstract is as good a place to start as any.

Sorry about the use of the "royal 'we'" but this is pretty much a copy/paste of the abstract from the dissertation. Also, maybe it takes some flights of fancy in terms of possibilities, but that's the point of research, isn't it?

Anyway, here goes:

Self-Organizing Resource Location and Discovery

by Diego Doval (abstract - 30 September, 2003)

Networked applications were originally centered around backbone inter-host communication. Over time, communications moved to a client-server model, where inter-host communication was used mainly for routing purposes. As network nodes became more powerful and mobile, traffic and usage of networked applications have increasingly moved towards the edge of the network, where node mobility and changes in topology and network properties are the norm rather than the exception.

Distributed self-organizing systems, where every node in the network is the functional equivalent of any other, have recently seen renewed interest due to two important developments. First, the emergence on the Internet of peer-to-peer networks to exchange data has provided clear proof that large-scale deployments of these types of networks provide reliable solutions. Second, the growing need to support highly dynamic network topologies, in particular mobile ad hoc networks, has underscored the design limits of current centralized systems, in many cases creating unwieldy or inadequate infrastructure to support these new types of networks.

Resource Location and Discovery (RLD) is a key, yet seldom-noticed, building block for networked systems. For all its importance, comparatively little research has been done to systematically improve RLD systems and protocols so that they adapt well to different types of network conditions. As a result, the most widely used RLD systems today (e.g., the Internet's Domain Name System, DNS) have evolved in ad hoc fashion, mainly through IETF Request For Comments (RFC) documents, and so require increasingly complex and unwieldy solutions to adapt to the growing variety of usage modes, topologies, and scalability requirements found in today's networked environments.

Current large-scale systems rely on centralized, hierarchical name resolution and resource location services that are not well-suited to quick updates and changes in topology. The increasingly ad hoc nature of networks in general and of the Internet in particular is making it difficult to interact consistently with these RLD services, which in some cases were designed twenty years ago for a hard-wired Internet of a few thousand nodes.

Ideally, a resource location and discovery system for today's networked environments must be able to adapt to an evolving network topology; it should maintain correct resource location even when confronted with fast topological changes; and it should support work in an ad hoc environment, where no central server is available and the network can have a short lifetime. Needless to say, such a service should also be robust and scalable.

The thesis addresses the problem of generic, network-independent resource location and discovery through a system, Manifold, based on two peer-to-peer self-organizing protocols that fulfill the requirements for generic RLD services. Our Manifold design is completely distributed and highly scalable, providing local discovery of resources as well as global location of resources independent of the underlying network transport or topology. The self-organizing properties of the system simplify deployment and maintenance of RLD services by eliminating dependence on expensive, centrally managed and maintained servers.

As described, Manifold could eventually replace today's centralized, static RLD infrastructure with one that is self-organizing, scalable, reliable, and well-adapted to the requirements of modern networked applications and systems.

Posted by diego on November 30, 2004 at 6:36 PM

location, location, location

First, the support I've received in the last 24 hours regarding my post on clevercactus has been amazing and heartening. When I collect myself I'll be more specific, but for now I just wanted to mention this.

One of the comments I've heard most since yesterday has to do with the funding thing. As Dave said, "Consumer focused companies are always difficult to get funding for in Europe". I can definitely attest to that. Lots of enterprise and "vertical" focus, and generally low tolerance for the risk/opportunity equation that consumer-focused companies present. To their credit, VCs are very candid about this, so it's not as if it's a secret or anything.

One thing we tried to get across is that we'd be totally open to moving to the US if that's what it took to get funded. We'd have no problem with that. But then again, not being there, it's hard to get in the door.

But the "virtuous cycle" created by Silicon Valley is hard to beat. One of the conclusions I will take away from this experience is that all the talk about the "Indian Silicon Valley" or "European Silicon Valley" or "[Insert geolocation here] Silicon Valley" is just plain ridiculous. There isn't a place like it in terms of investors, press, talent, etc., all deeply interconnected (well, maybe a few places in the US come close, namely the Seattle/Redmond area, Route 128, and NYC). I remember going to Il Fornaio in Palo Alto and just feeling it. It's where things happen. (Plus it ain't a bad place to eat).

So: there's only one Valley, the rest are close, but no dice. :)

Categories: clevercactus
Posted by diego on November 30, 2004 at 5:35 PM

where to begin?

I've been up for more than three hours now (woke up at 5 or so, that's what I get for going to sleep at midnight), and I've been thinking more about what to write and/or do in the next couple of weeks, aside from looking for the next big thing. As you can imagine I'm sort of in a bit of a hole right now and I think the best way to climb out of it is to get moving.

So... let's see.

  • I want to talk about my thesis work, now that the whole process is basically over (I picked up the bound copies of the dissertation last Friday, they look great). There's a ton of stuff there, but to begin with, the title of the thesis is Self-Organizing Resource Location and Discovery. :)
  • Ideally I'd spend a bit of time coding something interesting and totally, absolutely, positively unrelated to what I've been doing for the last two and a half years. Reading (and re-reading) weblogs today, looking for some inspiration, I came across this post from Don Park in which he talks about the idea of creating a conversation category, to formalize (structure? make more "solid"?) a bit the tenuous links between cross-weblog conversation threads. I need to think a bit about that today. Anne's Forgetting Machine concepts are enticing, and with a cool name to boot, but when I think about that my brain keeps dragging itself into Gibsonesque (or maybe Stephensonesque) vistas of the datastream where bots run around forgetting where they've been and asking for directions. Entertaining, but not something I'd be able to code in a few days.
  • I'd also like to write up what I've learned, what my experience was with clevercactus. But I need a few days for that. Minimum. Will revisit that next week.
Okay. That sounds like a good starting point, no?

Categories: personal
Posted by diego on November 30, 2004 at 7:09 AM

ads in rss - not as easy as it sounds?

Last week Jeremy was talking about ads in RSS and how it seems a foregone conclusion that they will, eventually, become the norm. I agree that this is more likely than not, but I doubt that today's web ad infrastructure (as exemplified by what Yahoo! and Google do) will be used directly.

The reason why I say this was actually mentioned by Jeremy, but not explored. While talking about the options (full text with ads, summaries without), he said:

I don't want to have to choose between ad-laden full-content feeds and the pain in the ass summary only feeds. Anyone whose ever tried to catch up on their reading while on an airplane or train gets this.
The problem with ads in RSS lies in the second sentence: "Anyone whose ever tried to catch up on their reading while on an airplane or train gets this."

Many RSS readers are web-based, and those would always work for web ads (unless a plugin is added to stop them, see below). But many, many RSS readers are rich clients, and clients will sometimes be working in disconnected mode.

"Disconnected mode" throws a wrench in the ad-serving business model, by either preventing the download of the ad, or preventing clickthrough.

If that's the case, then how do you serve the ads? You could embed them into the content, sure, but then you'd have the problem of a) showing relevant/up-to-date ads, b) measuring ad-views and c) allowing click-throughs, which are impossible while disconnected.

Someone might say that most people are wired most of the time, and so this problem is minimal. But I have no doubt that, were ads in RSS to become pervasive, rich clients would include a simple way of working in "disconnected mode" (and those that don't would fall behind those that do), not to mention plugins that would surely be developed, both for clients and browsers, just like adblock exists for Mozilla.

If the readers were to be integrated into the ad serving-viewing-clicking cycle (keeping stats, allowing clickthroughs, etc), then maybe things would be closer to web ads, but who is to say that users will not flock to RSS readers that will support the "ad-free" mode? Or modify their ad-friendly readers?
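A minimal sketch of what "integrating the reader into the ad cycle" could look like, in Python: the client logs ad views locally while disconnected and reports them whenever a connection is available. Everything here is hypothetical (the `ImpressionLog` class, the `send` callback standing in for an upload to some ad server, the record fields); it doesn't correspond to any real ad network's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List
import time


@dataclass
class Impression:
    ad_id: str
    item_id: str     # feed item the ad was shown with
    shown_at: float  # Unix timestamp, recorded locally by the client


@dataclass
class ImpressionLog:
    """Hypothetical client-side buffer for ad views recorded while offline."""
    pending: List[Impression] = field(default_factory=list)

    def record(self, ad_id: str, item_id: str) -> None:
        # Always log locally; there may be no connection right now.
        self.pending.append(Impression(ad_id, item_id, time.time()))

    def flush(self, online: bool, send: Callable[[List[Impression]], bool]) -> int:
        """Try to report buffered views; return how many were delivered."""
        if not online or not self.pending:
            return 0
        batch = list(self.pending)
        if send(batch):  # send() is a stand-in for the (hypothetical) upload call
            self.pending.clear()
            return len(batch)
        return 0  # upload failed: keep everything for the next attempt
```

Even in this sketch, click-throughs still need a live connection, and nothing forces a modified or "ad-free" client to ever call `flush`, which is exactly the weakness described above.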

So even though ads in RSS might be just around the corner, I'd bet that they (and the business model behind them) will have to change at least a bit--the current way in which web ads work probably won't be enough.

Categories: technology
Posted by diego on November 30, 2004 at 6:57 AM

the drive to discover


In the latest issue of Wired magazine, James Cameron has a great article, The Drive to Discover. Reminds me of this post I wrote a little over a year ago.

Also in the latest Wired, lots of other great exploration-related articles, such as The New Space Race by Bruce Sterling and Taming the Red Planet, by Kim Stanley Robinson, author of the Mars trilogy (Red Mars, Blue Mars, and Green Mars).

Categories: science
Posted by diego on November 30, 2004 at 6:34 AM

looking for the next big thing

So. A week has gone by with no posting. Lots has happened, but more than anything it's been a time of consolidation of what had been happening in the previous weeks. First, the short version (if you have a couple of minutes, I recommend you read the extended version below): tomorrow is my last day working for clevercactus. And that means I'm looking for the next thing to do. So if you know of anything you think I could be interested in, please let me know.

Now for the extended version.

For the last couple of months (and according to our plan) we have been looking for funding. Sadly, we haven't been able to get it. This hasn't just been a matter of what we were doing or how (although that must be partly a problem) but also a combination of factors: the funding "market" in Europe and more specifically in Ireland (what people put money into, etc), our target market (consumer) and other things. Suffice it to say that we really tried, and, well, clearly it was a possibility that we wouldn't be able to find it.

On top of this, I haven't been quite myself in the last few weeks, maybe even going back to September (and my erratic blogging is probably a measure of that). By then I was quite burned out. Last year was crazy in terms of work, and this one was no different: between January and the end of July I only took two days off work (yes, literally, a couple of Sundays), and the stress plus the lack of rest obviously got to be too much. I see signs of recovery, but clearly this affected how much I could do in terms of moving the technology forward in recent weeks. Since there are only two of us, and I'm the only one coding (my partner deals with the business side of things), this wasn't the most appropriate time to have a burnout like that. I screwed up in not pacing myself better. Definitely a lesson learned there.

At this point, the company is running out of its seed funding and we don't have many options left. Even though something could still happen (e.g., an acquisition), what we'll be doing now is to stop full-time work on the company, which after all won't be able to pay our salaries much longer, and look for alternatives since of course we need to, you know, buy food and such things. The service will remain up for the time being, and I'll try to gather my strength to make one last upgrade (long-planned) to the site and the app, if only just for the symmetry of the thing. Plus, you can't just make a service with thousands of users disappear overnight. Or rather, you can, but it wouldn't be a nice thing to do.

Now I have a few weeks before things get tight, and I'll use that time to get in the groove again and hopefully find something new to do that not only will help pay the bills but is cool as well. Who knows? I might even end up in a different country! As I said at the beginning, if you know of something that I might find interesting, please send it my way. Both email and comments are fine (my email address can be found in my about page).

In the meantime, I'm going to start blogging more. No, really. I have some ideas I want to talk about, and maybe I can get back into shape by coding (or thinking about) something fun and harmless.

Or, as the amended H2G2 reads: Mostly harmless. :)

because search bots have feelings too

For reasons passing understanding, in the last couple of weeks I've developed a curiosity for the topology of both content and links in certain groups of webpages.

So today I sat down and wrote an extremely simple bot/parser to get some data. I was done in about an hour, tested a bit, fiddled, and it started to dawn on me just how hard it is to build a good search bot.

We hear (or read) to no end about the algorithms that provide search results, most notably Google's. There's a vast number of articles about Google that can be summarized as follows: "PageRank! PageRank! PageRank is the code that Rules the Known Universe! All bow before PageRank! Booo!" (insert "blah blah" at your leisure instead of spaces).

But what's barely mentioned is how complex the Bots (for Google, Yahoo!, Feedster, etc) must be at this point (I bet the parsers aren't a walk in the park either, but that's another story). You see, the algorithm (PageRank! PageRank! Booo!) counts on data already processed in some form. Analyzing the wonderful mess that is the web ain't easy, but the "messiness" that it has to deal with is inherent to its task.

But the task of a Bot, strictly speaking, is to download pages and store them (or maybe pass them on to the parser, but I assume that no one in their right mind would tie in parsing with crawling--it seems obvious to me that you'd want to do that in parallel and through separate processes, using the DB as common point). And yet, even though the task of the Bot is just to download pages, it has to deal with a huge amount of, um, "externalities."
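The crawl/parse separation described above can be sketched with a shared database table: the crawler only stores raw responses, and the parser independently picks up whatever hasn't been processed yet. This is a minimal Python illustration using the standard `sqlite3` module; the table layout and the function names are assumptions made for the example. It uses an in-memory database so it runs standalone, whereas real crawler and parser processes would need a file-backed (or server) database they can both reach.

```python
import sqlite3
from typing import Callable, Dict


def make_store() -> sqlite3.Connection:
    """Create the shared store that decouples crawling from parsing."""
    db = sqlite3.connect(":memory:")  # illustration only: not shareable across processes
    db.execute(
        "CREATE TABLE pages ("
        " url TEXT PRIMARY KEY,"
        " status INTEGER,"
        " body BLOB,"
        " parsed INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def store_page(db: sqlite3.Connection, url: str, status: int, body: bytes) -> None:
    """Crawler side: save the raw response and move on; no parsing here."""
    db.execute(
        "INSERT OR REPLACE INTO pages (url, status, body, parsed) VALUES (?, ?, ?, 0)",
        (url, status, body),
    )
    db.commit()


def parse_pending(db: sqlite3.Connection, parse: Callable[[bytes], object],
                  limit: int = 100) -> Dict[str, object]:
    """Parser side: take unparsed successful pages, run parse(), mark them done."""
    rows = db.execute(
        "SELECT url, body FROM pages WHERE parsed = 0 AND status = 200 LIMIT ?",
        (limit,),
    ).fetchall()
    results = {}
    for url, body in rows:
        results[url] = parse(body)
        db.execute("UPDATE pages SET parsed = 1 WHERE url = ?", (url,))
    db.commit()
    return results
```

Because the two sides only meet at the table, a slow or crashing parser never stalls the crawler, and either side can be scaled out on its own.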

In other words, the bot is the one that has to deal with the planet (i.e., reality), while the ranking algorithm (PageRank! PageRank! Booo!) sits happily analyzing data, lost in its own little world of abstractions.
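To illustrate what "analyzing data already processed" means, here is the simplified, textbook form of PageRank computed by power iteration in Python. Note that it never touches the network: its whole world is the link graph `links` that the bot and parser have already built. This is only the published, simplified formulation with the commonly cited damping factor of 0.85; Google's production ranking is far more involved and isn't public.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank by power iteration over an in-memory link graph.

    links maps each page to the pages it links to. Pages with no outgoing
    links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank
```

For example, in `{"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}` page `c` ends up ranked highest, since three pages link to it.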

Consider: some sites might lock on the socket and not let go for long periods. Tons of links are invalid, and yet the Bot has to test each one. There are 301s, 404s, 403s, 500s, and the rest of the lot of HTTP return codes. Compressed streams using various algorithms (GZIP, ZLib...). Authentication of various sorts. Dynamic pages that are a little too dynamic. Encoding issues. Content types that don't match the content. Pages that quite simply return garbage. And on and on.
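Two of those externalities sketched in Python: undoing a response's `Content-Encoding` with the standard `gzip` and `zlib` modules, and mapping status codes to a crawl decision. The action names and the retry/drop policy are assumptions for illustration, not how any particular search engine's bot behaves.

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding of an HTTP response body."""
    enc = (content_encoding or "identity").strip().lower()
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if enc == "deflate":
        # "deflate" is supposed to be zlib-wrapped, but some servers send
        # raw deflate streams, so fall back to raw decoding on failure.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if enc == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


def classify_status(status: int) -> str:
    """Map an HTTP status code to a (hypothetical) next step for the crawler."""
    if 200 <= status < 300:
        return "store"
    if status in (301, 302, 303, 307, 308):
        return "follow-redirect"
    if status in (404, 410):
        return "drop"
    if status in (401, 403):
        return "skip"  # authentication required or forbidden: don't retry blindly
    if status == 429 or 500 <= status < 600:
        return "retry-later"
    return "report"  # anything unexpected goes to the error log for the developers
```

Anything the sketch doesn't understand (including stacked encodings such as `gzip, deflate`) either raises an error or lands in `report`, which is where error reporting would hook in.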

What makes it even harder is that the chaotic nature of the Internet forces the Bot (and those in charge of it) to go down many routes to try to get the content. A Bot has to be:

  • extremely flexible, able to deal with a variety of response codes, encodings, content types, etc.
  • extremely lax in its error management (being able to recover from various types of catastrophic failures).
  • extremely good at reporting those errors with enough information so that the developers can go back and make fixes as appropriate (to deal with some kind of unsupported encoding, for example).
  • as fast as possible, minimizing bandwidth usage.
  • respectful of all sorts of external factors: sites that don't want to be crawled, crawling fast, but not too fast, (or webmasters get angry), robots.txt and meta-tag restrictions, etc.
  • massively distributed (with all that it entails).

As well as any number of things that I probably can't think of right now.
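Part of the "respectful" requirement can be sketched with Python's standard `urllib.robotparser`: check each URL against a site's robots.txt and honor its `Crawl-delay`, falling back to a default when there isn't one. The `PoliteFetcher` class and its one-second default are hypothetical; a real bot would also fetch and cache robots.txt per host and handle meta-tag restrictions, which this doesn't cover.

```python
from typing import List, Optional
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Hypothetical per-host helper: robots.txt checks plus request spacing."""

    def __init__(self, robots_lines: List[str], user_agent: str,
                 default_delay: float = 1.0) -> None:
        self.user_agent = user_agent
        self.rules = RobotFileParser()
        self.rules.parse(robots_lines)  # lines of an already-downloaded robots.txt
        delay = self.rules.crawl_delay(user_agent)
        # default_delay is an assumed fallback, not a standard value.
        self.delay = float(delay) if delay is not None else default_delay
        self.last_request: Optional[float] = None

    def allowed(self, url: str) -> bool:
        return self.rules.can_fetch(self.user_agent, url)

    def wait_time(self, now: float) -> float:
        """Seconds to wait before the next request to this host."""
        if self.last_request is None:
            return 0.0
        return max(0.0, self.last_request + self.delay - now)

    def mark_request(self, now: float) -> None:
        self.last_request = now
```

Passing `now` in explicitly (instead of reading the clock inside) keeps the spacing logic easy to test and lets one scheduler juggle many hosts.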

Bots are like plumbing: you only think about them when they don't work. Of course, the algorithm is crucial, but the brave souls that develop and maintain the bots deserve some recognition. (The parser people too :)).

Don't you think?

PS: (tangentially related) Yahoo! should get a cool name for its algorithm, at least for press purposes. (Does it even have a name? I couldn't find it). Otherwise referring to it simply as a "ranking algorithm" --or something-- is kind of lame, and journalists steer towards PageRank and we end up with "PageRank! PageRank! Booo!". :)

Posted by diego on November 20, 2004 at 4:04 PM


Russ will be consulting for Yahoo! starting Monday. Congratz!

Between Russ in mobile stuff and Jeremy in search (another cool move) you've got two big areas of Y! covered by great bloggers. (I wonder if someone is blogging from the services side...Y!Mail, etc.). Most excellent.

Categories: technology
Posted by diego on November 20, 2004 at 3:55 PM

some of the small advantages of living in Dublin


About two years ago I was walking through Dublin and I noticed that the then-new U2 Best of... collection had gone on sale. So I got it (of course). Yesterday I was about to go into town but the terrible weather discouraged me, and I ended up going today, around noon.

Since I appear to be attuned to releases that interest me (or my subconscious knows more than I do and is in charge, take your pick) I happen to wander into town when the new U2 album has just been released, three days before the rest of the world. Eventually I walk up to HMV and go in, get the Collector's Edition of HTDAAB, which contains an extra track on the CD, Fast Cars---a song that made me think of Arabian, Flamenco, and Indian styles of music, all at the same time (!)---a DVD, and a book with photos and writings by the band. I don't even look at the price, something that happens to me with certain categories of goods which my head apparently refuses to consider from a financial point of view, such as almost any kind of book--this is why I avoid browsing bookstores: I go in, walk out five minutes later and somehow I've bought a book or two. But I digress...

I put the package in my backpack, and walk out.

And at the door my first thought is: What the...?

Right there, obviously just arriving, are the Macnas U2 heads (Macnas is a performance arts group out of Galway), which first made their appearance in the ZooTV tour. So naturally I got my camera out and snapped a few pictures, such as the one above, and here are a few more: one, two, three, four. Or, as Bono would say: Uno, dos, tres, catorce!

Anyway, such are some of the advantages of living in Dublin and walking Grafton Street now and then... :)

PS: if that isn't enough, the latest survey by The Economist says that Ireland is the best place to live in the world, quality-of-life-wise.

Posted by diego on November 19, 2004 at 3:16 PM

the market of one

Tangentially related to my previous post, in terms of usage patterns, context, and so on, I was thinking of this notion I call "the market of one".

The market of one is... yourself. You (in theory at least :)) have the best insights into what drives you and what doesn't, what you like about something, use patterns, etc. It is the "eat your own dog food" concept but with some insight applied, and only for one person.

The market of one seems crucial to me when either a) your organization is small (large companies being able to create focus groups, commission marketing studies, etc, and then being able to survive massive failures of product lines) or b) when you're doing something completely new (when focus groups aren't much help--people generally react badly to the unknown).

It's not just using what you're creating, but also asking yourself: how much am I using it? Does it satisfy my needs? Why, or why not? And so on.

When the product doesn't exist, it is, I think, the "standard way of thinking." We project ourselves and our own needs and based on that we evaluate whether we think something's good or not. For example, a lot of the disagreement over web-on-mobiles usage mentioned in the previous post comes from people transparently applying their own use-cases to what they think the product is, and then extrapolating from there.

I use this idea all the time when thinking about a product, or when designing it. But I think I haven't been as consistent in applying it during and after development, when reality takes over the grand designs that are in my head.

Somehow, I think that I keep seeing the product that I know it will eventually be, rather than what it is today. That future-vision can become harmful if it blinds you to the problems that exist today. Something to pay more attention to in the future.

Categories: technology
Posted by diego on November 18, 2004 at 11:30 AM

the web is not the browser, a year later

About a year ago (a year and three days, actually) I wrote a post titled the web is not the browser. At that point the discussion was whether "RSS usage" was "web usage" or not. Russ posted his thoughts on the Mobile Web yesterday, and there's a bit of deja-vu there, but going in some new directions. So here are my thoughts, updated. :)

The Web isn't just HTML+HTTP

Russ starts:

Now, when I say "web", you know what I saying right? I'm generally thinking HTML over HTTP and though you could probably say there's a lot of "dark content" out there on the internet - like in email, etc. it's generally not publicly accessible. The web in 2004 is the lingua franca of internet based information, I don't think there's much argument on this...
Actually, I think there is some argument on this, and Russ and I basically discussed it in the comments of that entry of mine a year ago. Russ made exactly the same argument in his comment and I replied with this in another comment:
Russ: you say the web is HTTP + HTML. Okay. However, what you type in your browser is a URL. The URL is post-web (RFC 1738 is dated Dec. 1994), and as it says in the RFC: "The specification is derived from concepts introduced by the World-Wide Web global information initiative" [Just to clarify, I mean "post-web" in that the concepts in general use today were developed after the initial web was launched]. Yes, originally "The Web" was the world wide web, HTML+HTTP. But very quickly things became intermixed. Today, you'll click on a link that is "ftp://" to download a file. FTP dates back to the pre-WWW days. Is clicking on ftp:// *not* the web too? If people click on a text file obtained through FTP and read it on their browser, is that not the web? Or on a quicktime video?

Today we often use HTTP to download files, not to see HTML, or to stream video, or audio, or Flash animations, or, yes, RSS, and it's an integral part of the "web experience". Some clients take HTML and present it in different ways, "syndicating" it. So what I'm saying is that, even though at the beginning the web was indeed just HTTP+HTML, it went beyond that very quickly. Even if you only consider what you see within the browser, the "web experience" includes a multitude of formats and protocols.

Purely in terms of content "weight" (storage capacity + bandwidth) I'd bet that web protocols are carrying as much content of types other than text/html as text/html itself, if not more. As an example of what I mean, consider that downloading the full page for Russ's post on the mobile web clocks in at 35,703 bytes--34 KB or so, and it's a typical post: an image, a decent amount of text, some comments. Now, about a month ago Russ was talking about the 25 GB of downloads he saw from an MP3 he posted (size 5,649,826 bytes, or about 5.5 MB).

That single MP3 amounts to about 160 posts.

Russ generally posts once a day. So that MP3 equaled his production for half a year in terms of size (the bandwidth ratio is probably a bit lower than 1:1 though, since the MP3 doesn't get hits from search). Since Russ has also been doing audioblogging, there are several MP3s posted on his blog, which I'd wager account for quite a lot more than all of the text/html content he's ever produced (even if you count images, CSS, etc). Then, elsewhere, there's Flash, Java, video, and everything else.
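The arithmetic behind "about 160 posts" and "half a year" is easy to check. A quick Python sketch using the byte counts from the text; reading "25 GB" as 25 billion bytes and using an average month of 30.4 days are assumptions, so the download count it yields is only a rough estimate.

```python
POST_PAGE_BYTES = 35_703     # full page of a typical post, from the text
MP3_BYTES = 5_649_826        # the single MP3, from the text
POSTS_PER_DAY = 1            # "Russ generally posts once a day"
MP3_TRAFFIC_BYTES = 25e9     # "25 GB", assumed to mean 25 billion bytes
DAYS_PER_MONTH = 30.4        # average month length, approximate

posts_equivalent = MP3_BYTES / POST_PAGE_BYTES      # post pages per MP3
days_of_posting = posts_equivalent / POSTS_PER_DAY  # days of output at one post a day
downloads = MP3_TRAFFIC_BYTES / MP3_BYTES           # approximate full downloads

print(f"one MP3 is about {posts_equivalent:.0f} post pages")
print(f"that is about {days_of_posting / DAYS_PER_MONTH:.1f} months of daily posts")
print(f"25 GB of traffic is about {downloads:.0f} full downloads")
```

That works out to roughly 158 post pages, a little over five months of daily posting, and on the order of 4,400 complete downloads of the file.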

So, I think the assumption that most content is text/html is wrong. I don't see this as a big problem for the discussion that follows though, because mobile browsers will eventually support most if not all of the advanced features.

To mobile or not to mobile

Russ's next point is that delivering a "dumbed down" version of the web (or using some other kind of content delivery system) is wrong because a) devices are appropriate and b) people want "the whole web". I agree with the second, but not the first (not for every use-case+context at least, more on that in a bit).

Paradoxically, the title of Russ's post undermines his argument a bit. If you have to use an adjective for something, then it's different right? Russ's vision is not a "mobile web" but "the web on mobiles". After all, we didn't start calling the web "the video web" or "the audio web" when streaming arrived--it was just the web. This is just semantics though. :)

Russ then goes through the common arguments against browsing on mobiles. These arguments generally fall into one of two categories:

  • Capabilities. These include: "The screen is too small," "Mobile phone browsers are too limited (no Javascript, no frames, etc.)," "Other platforms (Flash, Java, Brew) provide more richer experience (with less latency)," "Mobile data connectivity - even using 3G networks - has massive latency between pages."
  • Usage. "People use their mobile devices "differently" - thus need snippets of data." "Why use a mobile phone when you're not more than 10 minutes away from a PC ever," "Prefer laptops and WiFi, rather than struggling with a mobile."
To this list I'd add "Navigation" in the Capabilities section. I think the small screen is less a factor than the fact that with most phones you have to use the keypad/tiny-joystick for input. The Web, however, has been designed with keyboard/mouse in mind. I think solving the navigation issue is more important (and more difficult) than worrying about the latency for example. People browsed the web with 14.4 kbps modems for quite a while, after all.
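For a sense of what those modem days meant, here is an idealized transfer-time calculation in Python for the 35,703-byte post page cited earlier: payload bits divided by link speed, ignoring latency, protocol overhead and modem compression (all simplifying assumptions). Strictly speaking that's transfer time rather than latency, but it makes the same point.

```python
def transfer_seconds(size_bytes: int, link_bps: float) -> float:
    """Idealized transfer time: payload bits divided by link rate.

    Ignores latency, protocol overhead and modem compression.
    """
    return size_bytes * 8 / link_bps


MODEM_BPS = 14_400  # a 14.4 kbps modem
PAGE_BYTES = 35_703  # the full post page cited earlier in the text

print(f"{transfer_seconds(PAGE_BYTES, MODEM_BPS):.1f} s")
```

Under those assumptions the page takes roughly 20 seconds, the kind of wait dial-up users lived with for years.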

The capabilities problems might be a factor at the moment but clearly will not go on forever. If there's a need, they'll be fixed (eventually). Moore's Law and all that. So I don't see those things as a major problem.

Usage is to me the most important category, but it's generally overlooked. Capabilities matter, but assuming you fix most of them (it will be hard to get around the screen-size problem for a few years though--but even so I don't think it's a major roadblock), usage is really where the crux of the matter is, and one that can easily get muddled. When Russ mentions "People use their mobile devices "differently" - thus need snippets of data," he's no doubt writing down something he's heard many times. And while the first part is hard to argue with (people definitely use their mobile devices differently), the second part doesn't follow from it. At all. Blackberry users handle quite a lot of email. Many users enjoy streaming on phones, or online games. Those hardly count as "snippets" of data. Whoever makes that statement is pushing forward their own assumptions about phones: "People use mobiles differently, and mobiles are small and puny, hence you need snippets of data."

But the use cases are different and it's worthwhile to spend a bit of time on that topic, because I think that's a big part of what's actually being discussed, though not directly, when someone says "I don't want the web on a mobile."

It's all about the use case and the context

Use-cases and context are what drive usage, not product features (the "capabilities" from above). Product features can enable new use cases, but given a certain base of devices (i.e., given today's technology, maybe looking a few months into the future at most), it's the use-cases+context that matter most IMO.

Consider the following use-cases+context:

  1. If I'm sitting at a PC with a broadband connection, it's almost unthinkable that I'd use a phone for browsing the web. In fact, if I'm sitting at a PC, it's almost unthinkable that I'd use any other device. And why would I? Everything is right there.
  2. I'm on the road, and I have a laptop that is turned on with WiFi running, and a smartphone. I'd use the laptop. Cheaper, faster, better UI, etc.
  3. Same case as before: laptop and smartphone. Only this time the laptop is turned off. The phone is on standby. How long does it take to get everything up and running on the laptop? Maybe 5 minutes? The phone, though, being always-on, is nearly instantaneous. So if I just need to check something (say, the weather) the phone is probably a better choice. But if I'm going to be online for a while, I'll probably use the laptop anyway.
  4. If I have nothing but a phone, that's clearly what I'd use. But would I use it just like a PC? Doubtful. A phone doesn't go well with "multitasking", e.g., I can't easily browse the web and talk, and check my calendar, and check a relevant email at the same time, and I've often found myself doing all of those things while in the middle of a conversation (real-world, phone, or VoIP). Phones are more of an immersive experience (speaking of phones being immersive, check this out :)).

My point is that what we think "people" will do or won't do is heavily influenced by our own experience and usage. I suspect that phones will find entrenched niches at first, things where the availability, mobility, and form factor take precedence (similar to how game consoles have their place against ever-more-powerful PCs). If I'm carrying a laptop around most of the time, then it's unlikely that I'll see the phone as more than a stopgap measure. If, however, I only carry a phone around most of the time, the phone will gain importance.

Do I want the whole web on a phone? Absolutely.

Will it eventually become a much smoother experience? Certainly.

Does that mean that I'll (that is, me personally) stop using PCs and PC-like devices, and use phones as my primary method of browsing? Not a chance.

But does that mean that nobody is going to use the phone as their primary interface to the web? Of course not.

For some people though, the use-case, or the context, or both, will be there, and maybe what makes the discussion more complex is that "some people" here means tens of millions of users. As I understand it, in Japan, for example, people use their phones to access online services more than their PCs, or at least nearly as much. Phones will grow in importance, and take their rightful place in the continuum of capabilities we have today--browsing the whole web, yes, but not necessarily pushing PCs out of the way because of that.

Categories: technology
Posted by diego on November 17, 2004 at 9:12 AM

the true story of audion

[via Frank]: The True Story of Audion. Quote:

As the kids say, upon seeing some awesome frags and/or gibs: OMFG.
Must read.

Categories: technology
Posted by diego on November 16, 2004 at 1:58 PM

the new sony ultralight pc

Some are wondering whether the new Sony ultralight PC (6x4 inches, 1.2 lb, 512 MB RAM, 20 GB drive) will take on the iPod. This is a ridiculous idea for a number of reasons, starting with the $2000 price tag for the same storage capacity you get on an iPod for about one-tenth of the price (or less, considering that probably 25% of the drive goes to Windows and various apps).

What I'd be wondering instead is whether this will finally be the ultraportable that cracks the US market, or whether it's the first of many "webtop" devices people use around the house, a kind of portable display. Ebook-reading, web browsing, quick note-taking, tasks, and email would be good uses for this machine.

Regardless, it looks fantastic, doesn't it?

Categories: technology
Posted by diego on November 16, 2004 at 12:56 PM

today's reading

The Internet as a Complex System (PDF) by Kihong Park, Chapter 1 of The Internet as a Large-Scale Complex System from Oxford University Press, and Anda's Game, a short story by Cory Doctorow. Both highly recommended :).

Categories: technology
Posted by diego on November 15, 2004 at 3:59 PM


Today I got the final approval on my Ph.D.! As I mentioned back in August, after I defended it I had to do a final submission taking into consideration the suggestions and comments of the examiners. They were generally small things (explain why you did such-and-such more clearly) and one relatively big example that had to be added (along with a misplaced section) that, I must say, definitely improved the readability of the dissertation.

Now to print a few copies, get them nicely bound and do the final submission.

And, one more thing: :-)

Categories: personal
Posted by diego on November 12, 2004 at 8:17 PM

the new MSN search: an unmitigated disaster

The first pointer I got to it was via Dave (interestingly, there wasn't a Slashdot article on it; maybe I missed it, but I don't think so). There I went, to

The home page loaded quickly, which was a good sign. I liked its simplicity, but I wasn't going to give them any points for copying Google.

Then I typed in a simple search: "microsoft", and waited.

And waited.

And waited.

Two minutes later, I got this result.

That didn't look good at all. But who knows, maybe it was a fluke.

So I did it again.

Same result.

"Maybe they have deep-seated psychological problems that prevent them from returning their own results properly," I thought. So I tried "linux" (without success), then switched back to MS-themed searches with "microsoft visual studio," then started trying random queries.

Nothing worked!!

This lasted for about twenty minutes. I kept trying, because I couldn't quite believe what I was seeing (I have tons of respect for Microsoft's software development prowess). Then, when I was about to give up, I tried "microsoft" again, you know, just in case, and there it was.

One result. (Yes, one result, look at the screenshot).

Just one?

Just one.

Not only was there just one result, but the response was also "1-1 of 1", which must mean I've been asleep for a few centuries and now there's only a single page with the term "microsoft" on the planet. Also, note how there didn't seem to be any problems in finding ads for it.

"There goes nothing," I thought, and I tried "linux". Another fantastic one-query-hit-page.

In fact, it wasn't just that it was returning a single result, it was also that it was splattering the page with ads, at the top, at the bottom, and to the right. After those two, er, "successes," I tried a few more queries that returned no results at all, or worse, outdated pages! (weeks and weeks old).

And, in case you're wondering, I am not making this up. Those screenshots are real.

I could add a thousand things: that they should have added more hardware, or made sure that the thing worked before releasing it, or whatever, but I'm not fond of repeating the utterly obvious.

I will say, though, that there are two search engines I use at the moment: Google and A9. Occasionally, I use Teoma and Yahoo!.

And it doesn't seem that I'll be adding Microsoft to the list any time soon.

PS: if any Microsofties happen to wander through this entry and want to know more for debugging purposes, I ran my searches between midnight and 1:00 AM PST (8:00-9:00 AM GMT).

Posted by diego on November 11, 2004 at 6:49 PM

new design time?

A few days ago Dylan changed the design of his blog, and now Russ has changed his as well. For the last couple of weeks, whenever I want to relax (or use CSS infuriation as a distraction, depending on how you look at it), I've been playing with a new newspaper-like design, but it hasn't convinced me. I guess I'll add it as an alternate stylesheet so that Firefox-enlightened users can switch to it using the little gizmo that appears at the bottom-right of the window when a page has alternate stylesheets (have you noticed that one?). Maybe this weekend...

One thing though: new blog designs are always reinvigorating for some reason, even if many readers don't see them, courtesy of syndication. :)

Categories: technology
Posted by diego on November 11, 2004 at 6:36 PM

content, sharing, and user interfaces

A couple of days ago Russ posted an interesting entry (long, but worth the time) on what he dubbed 'communicontent':

Communicontent to me, is a byproduct of communication where traditional content is magically created. As a corollary, the forms of communication that can best be expressed as content almost naturally become communicontent. See this weblog? This is communicontent. I used to drive my friends on mailing lists crazy by writing all these long, in-depth emails. Now I just write all the same thoughts in my weblog instead. The only difference is that the viewers aren't restricted. I'm still just communicating my personal thoughts. It's communication, but because it's been captured in a fixed state to be found later, it's also content.

This is more than just the famous "user generated content." If I take a picture (content I've generated) it doesn't really matter until I decide I want to send that picture to someone. Then it becomes something different. The act of communicating that piece of content makes it more special. In practical terms, it simply adds more meta-data at the very minimum: a title, a description, a place, etc. But it also gives it an inherent value as well: I think this is important enough to send, therefore you may want to think it's important enough to take time to look at.

In general I agree that content that is communicated becomes a different sort of beast (the Google-Gmail analogy he mentions at one point is stretching it a bit, IMO). There are a couple of things I'd add, particularly about what contributes to the success of this type of shared content.

First, content relevance (and quality) matters, a lot. Most content people generate is relevant only to themselves and a small group; even when we blog, we sometimes (or maybe most of the time :)) post about things a lot of people simply don't find interesting. Quality has a lot to do with the kind of information you're sharing, and with the kind of device/interface you use to create it. For example, there is no way someone can write a well-thought-out argument on anything using T9 on, say, a Nokia 3650. Why? Because the interface gets in the way. Similarly, you might be able to post high-resolution pictures from your PC, but not from most phones (camera quality... network speed... ability to crop/edit if necessary).

Second, as Russ notes:

In order to create communicontent, pure content needs meta-data, and pure communication needs organization.
Taken together with what I said in the previous paragraph, this brings back my recent thoughts on metadata. That is, the ability to create metadata or organization is worthless if there aren't also good ways of navigating that metadata, and vice versa. Both ends have to be covered. FOAF has, in my view, suffered from this. There's no way for non-geeks to make use of all that metadata, and conversely they don't have easy-as-pie ways to create it, which results in limited appreciation of it by non-geeks.

Putting these two thoughts together, what I'd add to Russ's ideas is that the process by which this shared content is created matters a great deal, as does the way it's accessed afterwards. Both ends of the equation have to be covered, that is:

  • The content (and if possible its accompanying metadata) has to be extremely easy to create and share
  • Once content is created, the content access interface has to be adequate for its purpose
I think moblogs work because it's easy to take a picture, then (relatively) easy to post it, and then the software on the server does the rest for you (organizing pictures by time, creating a slideshow, etc.), which covers both ends. And it's when both of these conditions are met that apps cross the boundary from the cool to the useful.

Categories: technology
Posted by diego on November 10, 2004 at 3:43 PM

feedster developer contest

Feedster has launched a Developer Contest (see also). Prizes are iPods for the winner in each category (and there are more than a few categories). Normally I don't have time for contests, but in this case it seems that I already have entries ready for at least two categories: Feedster plugin for Firefox (which I wrote last year; it's linked from the Feedster Help page and also available via mycroft... but maybe it still counts! :)) and Intro to RSS, with my introduction to syndication (and its companion introduction to weblogs).

Should be interesting to see the things people come up with.

Hmmm... iPod....

Posted by diego on November 10, 2004 at 2:43 PM

slides in CSS

[via Joel]: S5: A Simple Standards-Based Slide Show System.

"S5 is a slide show format based entirely on XHTML, CSS, and JavaScript. With one file, you can run a complete slide show and have a printer-friendly version as well. The markup used for the slides is very simple, highly semantic, and completely accessible."
Most excellent. I was looking for something like this!

Posted by diego on November 10, 2004 at 12:30 PM

the new U2 album

I just listened to How to Dismantle an Atomic Bomb and it's excellent. The Vertigo single will be released tomorrow, and the album on Nov. 22 in most of the world (including the UK) and Nov. 23 in the US. U2Log has the full track listing as well as some more details on the album (it's also at, but their navigational structure is an external-link-preventing disaster).

Overall, you get this funny feeling that you've heard this before, somewhere, but of course you haven't, which is one of U2's trademarks IMO. There are some definite whiffs of Electrical Storm, the song released on their second Best of... collection, and also of Always and Summer Rain, b-sides from one of the Beautiful Day singles.

As with most other U2 albums, it starts with a bang (Vertigo) and then mellows out a bit, with bursts of energy in between (such as City of Blinding Lights, which seems to be this album's Where the Streets Have No Name, and All Because of You) and U2-style love songs like Miracle Drug and Original of the Species. Then there's Love and Peace or Else, not only great rock 'n' roll but also the political track of the album. Sometimes You Can't Make It on Your Own feels like a worthy follow-up to Kite from ATYCLB (the line "You're the reason why the opera is in me," for example, is clearly a reference to Bono's father).

One prediction: City of Blinding Lights (which refers to New York City, I think) will sound great live when the tour kicks off next year. I can already imagine an entire stadium singing "Oh, you look so beautiful tonight."

Bonus: via Anne, I discovered Do Make Say Think. They sound like I feel at times. Very cool.

Later: I knew that the beginning of City of Blinding Lights reminded me of something: the beginning of Sweetness Follows from R.E.M.'s Automatic for the People.

Posted by diego on November 7, 2004 at 12:47 PM

some thoughts on metadata

Through a series of random links I ended up at a recent post by Ramesh Jain on metadata. He raises a number of issues that have crossed my mind a lot recently, particularly with all the hoopla about podcasting ("how do I search all that audio content?"), and makes a number of good points. Quote:

Text is effectively one dimensional – though it is organized on a two-dimensional surface for practical reasons. Currently, most meta data is also inserted using textual approaches. To denote the semantics of a data item, a tag is introduced before it to indicate the start of the semantics and another closing tag is introduced to signal the end. These tags can also have structuring mechanisms to build compound semantics and dictionaries of tags may be compiled to standardize and translate use of tags by different people.

When we try to assign tags to other media, things start getting a bit problematic due to the nature of media and the fact that current methods to assign tags are textual. Suppose that you have an audio stream, may be speech or may be other kind of audio, how do we assign tags in this? Luckily audio is still one dimensional and hence one can insert some kind of tag in a similar way as we do in texts. But this tag will not be textual, this should be audio. We have not yet considered mechanisms to insert audio tags.


I believe that we can utilize meta data for multimedia data. But the beauty of the multimedia data is that it brings in a strong experiential component that is not captured using abstract tags. So techniques needs to be developed that will create meta data that will do justice to multimedia data.

I agree. However, I'd point out that the problem is not just one of metadata creation, but of metadata access.

Metadata is inevitably thought of as "extra tags" because, first and foremost, our main interface for dealing with information is still textual. We don't have VR navigation systems, and voice-controlled systems rely on Voice-to-Text translation, rather than using voice itself as a mechanism for navigation.

Creating multimedia metadata will be key, but I suspect that it will have limited applicability until multimedia itself can be navigated in "native" (read: non-textual) form. Until both of these elements exist, I think that text will remain the rule rather than the exception, both as metadata (even if it's generated through conversions, or from context, as Google Image Search does) and as the interface for navigating it.

Posted by diego on November 6, 2004 at 3:56 PM

the synth look and feel: what Sun should do next

One of the much-hyped new features in JDK 1.5 (or "Java 5" as we're supposed to call it now) was the new Synth Look and Feel, a "skinnable" L&F that lets non-programmers create new look and feels by editing an XML file. Since creating a look and feel used to involve complex acts of witchcraft, this is actually good news for programmers as well.
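To make the "edit an XML file" idea concrete, here is a minimal sketch. It is written against a modern JDK rather than 1.5 (it uses `StandardCharsets` and a lambda), and the class name `SynthDemo`, the style id, and the color and insets values are all made up for illustration. The Swing calls themselves, `SynthLookAndFeel.load(InputStream, Class)` and `UIManager.setLookAndFeel`, are the standard entry points. The descriptor binds one style to every button and the result is installed as the current look and feel.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.SwingUtilities;
import javax.swing.UIManager;
import javax.swing.plaf.synth.SynthLookAndFeel;

final class SynthDemo {
    // Hypothetical descriptor: one style (padding plus a blue background)
    // bound to every component in the "Button" region.
    static final String DESCRIPTOR =
        "<synth>"
        + "<style id=\"buttonStyle\">"
        + "<opaque value=\"true\"/>"
        + "<insets top=\"4\" left=\"8\" bottom=\"4\" right=\"8\"/>"
        + "<state><color value=\"#336699\" type=\"BACKGROUND\"/></state>"
        + "</style>"
        + "<bind style=\"buttonStyle\" type=\"region\" key=\"Button\"/>"
        + "</synth>";

    // Parses a Synth descriptor into a look and feel ready to install.
    // The Class argument is only used to resolve relative resource paths
    // (e.g. images), which this descriptor doesn't reference.
    static SynthLookAndFeel loadSynth(String xml) throws Exception {
        SynthLookAndFeel laf = new SynthLookAndFeel();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        laf.load(in, SynthDemo.class); // throws java.text.ParseException on bad XML
        return laf;
    }

    public static void main(String[] args) throws Exception {
        // Install the L&F before creating any Swing components.
        UIManager.setLookAndFeel(loadSynth(DESCRIPTOR));
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Synth sketch");
            frame.add(new JButton("Styled by an XML file"));
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Note that, like the examples the post complains about, this only styles a single component type; everything else falls back to Synth's empty default style.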


There's very little documentation available. The most referenced article on Synth is this one by SwingMaster Scott Violet, which is a good intro but doesn't go into much detail. There's a mini-intro over at JDC, and a more recent article by John Zukowski over at IBM DeveloperWorks, which also covers the new Ocean L&F (which replaces the absolutely-positively-obsolete Metal L&F). Then there are the API docs for Synth and the Synth descriptor file format. And... that's about it, as far as I can tell. All the examples stop at the point of showing a single component, usually a JTextField or JButton.

But let's assume that documentation will slowly emerge. There is something that Sun should do as quickly as possible (and in fact should have done for this release): use Synth for its own L&Fs. What better chance to show off Synth than to rewrite the Metal L&F in it? (I am fairly sure that this hasn't happened yet, since the way to load the Metal L&F remains the same, and all the Metal L&F classes remain in their javax.swing.plaf locations in the JDK 1.5 distribution.)

In fact, while we're at it, why not write all the look and feels with Synth, including Windows, which would make it much easier to correct the inevitable problems with it that appear after every release (and because of which something like winlaf exists)?

This is also known in the vernacular as "eating your own dog food". :)

Re-writing Metal in Synth would also be a perfect use-case that would serve both as a testing platform and example for others. As it stands, it's hard to know if this wasn't done because of performance limitations, limitations in Synth, time-constraints, or what.

So I'd like to see Sun clearly spell out the reasons why Synth wasn't used for Metal, and where they are taking it next. I, for one, am not thrilled about the idea of yet another look and feel that will remain dead in the water (like Metal did all these years), when there are so many other important things that Sun could be improving in the JRE (platform integration, anyone?).

If all L&Fs will eventually be Synth-etized, that would simplify usage and fixes of L&Fs for all developers (and maintenance on Sun's side), and prove that Synth is the way of the future.

PS: it would also be a good idea to add built-in support for the notion of L&F hierarchies to Synth files. Currently all the definitions must exist in a single file; you could create a single stream of XML descriptor out of multiple Synth files, but who's gonna do that? Having to copy and paste everything and then change two or three lines in a file, because all you want is a different image somewhere, doesn't sound like good practice to me.
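The single-stream workaround mentioned in the PS can be sketched in a few lines of Java. This is a naive, hypothetical helper (`SynthMerge.merge` is our own name, not part of Swing) that assumes each fragment is a complete `<synth>` document whose root tag carries no attributes; it strips each outer wrapper and concatenates the bodies so the result can be handed to `SynthLookAndFeel.load`. How Synth resolves two styles bound to the same region is not addressed here and should be checked against the Synth file format documentation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.plaf.synth.SynthLookAndFeel;

final class SynthMerge {
    private static final String OPEN = "<synth>";
    private static final String CLOSE = "</synth>";

    // Naively concatenates the bodies of several <synth> documents into one.
    // Assumes each root tag is exactly "<synth>" (no attributes); anything
    // outside the root element, such as an XML declaration, is dropped.
    static String merge(String... documents) {
        StringBuilder merged = new StringBuilder(OPEN);
        for (String doc : documents) {
            int start = doc.indexOf(OPEN);
            int end = doc.lastIndexOf(CLOSE);
            if (start < 0 || end < start + OPEN.length()) {
                throw new IllegalArgumentException("not a <synth> document: " + doc);
            }
            merged.append(doc, start + OPEN.length(), end);
        }
        return merged.append(CLOSE).toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical "base" theme plus a small add-on file with a button style.
        String base = "<synth><style id=\"base\"><opaque value=\"true\"/></style>"
                + "<bind style=\"base\" type=\"region\" key=\".*\"/></synth>";
        String addOn = "<synth><style id=\"button\">"
                + "<insets top=\"4\" left=\"8\" bottom=\"4\" right=\"8\"/></style>"
                + "<bind style=\"button\" type=\"region\" key=\"Button\"/></synth>";
        String merged = merge(base, addOn);
        // Parse the combined stream exactly as if it had been a single file.
        SynthLookAndFeel laf = new SynthLookAndFeel();
        laf.load(new ByteArrayInputStream(merged.getBytes(StandardCharsets.UTF_8)), SynthMerge.class);
        System.out.println(merged);
    }
}
```

A string splice like this is fragile (comments or attributes on the root tag would break it), which is rather the point: it's the kind of thing that ought to be built into the file format.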

Posted by diego on November 6, 2004 at 3:14 PM

the switch from berkeley db over to mysql

One of the objectives of my recent server switch was to move from Berkeley DB to MySQL. Aside from an expected performance improvement, I was tired of seeing plug-ins for MT that I couldn't run (example). Plus, I feel more comfortable with MySQL. I followed the instructions for this in the Movable Type documentation and, aside from some glitches during conversion (weird messages such as "WARNING: Use of uninitialized value in string eq at lib/MT/ line 754.") and one timeout (which forced me to restart the process after deleting the old tables), everything went fine.

One thing to note, though, which the documentation doesn't make completely clear: when you run the mt-db2sql.cgi script, the configuration must contain both the existing BerkeleyDB location and the new MySQL settings. Once the conversion is done, you can comment out the BerkeleyDB location line. During the process, I also renamed the trackback and comment scripts to avoid overlaps.

Now all seems to be running fine, and simple tests seem to show that some things are a bit faster (example: posting takes about 20 seconds, as opposed to 30 seconds before).

One more thing I can cross off the list. :)

Categories: technology
Posted by diego on November 6, 2004 at 3:01 PM


I keep forgetting to mention this, so here it goes. :) I wrote my dissertation in LaTeX/TeX, which, as any sentient mammal knows, is the best system for writing papers and scientific documents ever conceived in the history of the universe (it's good for everything else too).

Okay, maybe I exaggerated a little bit. But I do like it a lot. :)

Anyway, the editor I've used throughout these last couple of years has been WinEdt, an excellent, excellent LaTeX editor that integrates seamlessly with MiKTeX, IMO the best LaTeX/TeX distribution for Windows. If you must use Windows and need a good LaTeX editor, the MiKTeX/WinEdt combination can't be beaten.

ps: and is the TeXbook a great read or what? :)

Categories: personal
Posted by diego on November 4, 2004 at 11:52 AM

four more years it is

For the last couple of days I've been busy with a couple of other things (work, and still recovering from the flu/cold I've had for the last two weeks), but of course, politics junkie that I am, I closely watched the comings and goings of the US Presidential election.

Yesterday I watched both Sen. Kerry's concession speech and President Bush's victory speech. I thought that Kerry did a good thing by not dragging this out once it was clear that winning was almost impossible, and I wish more people would point out that it was a graceful gesture. His side could have kept going, but didn't, and everyone was spared another draining and bitter fight that would almost certainly have ended with the same result. I also thought that Bush's speech was OK, and I sincerely hope he will act on some of the things he said (such as working to earn the support of those who didn't vote for him). And maybe, just maybe, now that the GOP has such clear control and Mr. Bush isn't running for re-election again, he will tilt a bit more towards the center and help generate a climate of more cooperation and mutual respect. Likely? Maybe not. Possible? Yes. As the New York Times noted yesterday: "[...] after the inevitable, and necessary, period of disappointment, mourning, and even anger, among those who opposed his re-election, there should be a period in which his calls today for partisan healing should be taken at face value." No less would have been asked of the other side had Kerry won.

Bush won a clear majority, but it was still determined by a difference of only a couple of percentage points and about 5% of the electoral votes. Half of the US still thinks differently. It's a nation where political (and even philosophical) discourse is held on a global scale, and there has to be a way for it to become a bit more reasonable, and reasoned. There was a brief moment after the first debate during which the campaigns suddenly started debating real issues: questions of use of force, the US's role in the world, etc. After a few days it quickly degenerated back into the usual he said/I said baloney. But that moment showed that a real discussion is possible. Here's hoping that becomes the norm rather than the exception (I know, I'm an idealist, what can I say).

At least the clear victory I was hoping for more or less happened. There was no protracted legal fight and little uncertainty, which is good (again, kudos to Kerry for that).

Finally: one comment I got a couple of days ago pointed to William Gibson's weblog; he had started blogging again in mid-October to make his voice heard. I totally missed it: I had kept the link but unsubscribed from the feed (which, you can bet, won't happen again), because back at the beginning of the year he said he wouldn't blog again until his new book came out. Anyway, yesterday he had a good quote:

Virgil, as ever, has it down: "Dis aliter visum."

Categories: geopolitics
Posted by diego on November 4, 2004 at 10:40 AM

one more day

I started writing an entry that somehow, very quickly, became a complete mess of ideas.

So I'll just say this: I hope that the result tomorrow is clear-cut. A prolonged fight like in 2000 (except this time it's likely to be in multiple states) will not be a good thing. At least a clear electoral-college victory--I have the feeling this will happen, regardless of how close the popular vote is, but I have no idea why I think that!

I was watching CNN and they were describing all the new "safeguards" they've put in place to avoid the embarrassment of 2000. Which means they'll make entirely new mistakes this time.

Anyway. Tomorrow night will surely be interesting. :)

Categories: geopolitics
Posted by diego on November 1, 2004 at 8:20 PM

Copyright © Diego Doval 2002-2011.