Now blogging at diego's weblog. See you over there!

manifold, the 30,000 ft. view


As a follow-up to my thesis abstract, I wanted to add a sort of introduction-ish post to explain a couple of things in more detail. People have asked for the PDF of the thesis, which I haven't published yet, for a simple reason: everything is ready, everything's approved, and I have four copies nicely bound (two to submit to TCD) but... there's a signature missing somewhere in one of the documents, and they're trying to fix that. Bureaucracy. Yikes. Hopefully that will be fixed by next week. When that is done, right after I've submitted it, I'll post it here (or, more likely, I'll create a site for it... I want to maintain some coherency on the posts and here it gets mixed up with everything else).

Anyway, I was saying. Here's a short intro.

Resource Location, Resource Discovery

In essence, Resource Location creates a level of indirection, and therefore a decoupling, between a resource (which can be a person, a machine, a software services or agents, etc.) and its location. This decoupling can then be used for various things: mapping human-readable names to machine names, obtaining related information, autoconfiguration, supporting mobility, load balancing, etc.

Resource discovery, on the other hand, facilitates search for resources that match certain characteristics, allowing then to perform a location request or to use the resulting data set directly.

The canonical example of Resource Location is DNS, while Resource Discovery is what we do with search engines. Sometimes, Resource Discovery will involve a Location step afterwards. Web search is an example of this as well. Other times, discovery on its own will give you what you need, particularly if the result of the query contains enough metadata and what you're looking for is related information.

RLD always involves search, but the lines seemed a bit blurry. When was something one and not the other? What defines it? My answer was to look at usage patterns.

It's all about the user

It's the user's needs that determine what will be used, how. The user isn't necessarily a person: more often than not, RLD happens between systems, at the lower levels of applications. So, I settled on the usage patterns according to two main categories: locality of the (local/global) search, and whether the search was exact or inexact. I use the term "search" as an abstract action, the action of locating something. "Finding a book I might like to read" and "Finding my copy of Neuromancer among my books" and "Finding reviews of a book on the web" are all examples of search as I'm using it here.

Local/Global, defining at a high level the "depth" that the search will have. This means, for the current search action, the context of the user in relation to what they are trying to find.

Exact/Inexact, defining the "fuziness" of the search. Inexact searches will generally return one or more matches; Exact searches identify a single, unique, item or set.

These categories combined define four main types of RLD.

Examples: DNS is Global/Exact. Google is Global/Inexact. Looking up my own printer on the network is Local/Exact. Looking up any available printer on the network is Local/Inexact.

Now, none of these concepts will come as a shock to anybody. But writing them down, clearly identifying them, was useful to define what I was after, served as a way to categorize when a system did one but not the other, and to know the limits of what I was trying to achieve.

The Manifold Algorithms

With the usage patterns in hand, I looked at how to solve one or more of the problems, considering that my goal was to have something where absolutely no servers of any kind would be involved.

Local RLD is comparatively simple, since the size of the search space is going to be limited, and I had already looked at that part of the problem with my Nom system for ad hoc wireless networks. Looking at the state of the art, one thing that was clear was that every one of the systems currently existing or proposed for global RLD depends on infrastructure of some kind. In some of them, the infrastructure is self-organizing to a large degree, one of the best examples of this being the Internet Indirection Infrastructure (i3). So I set about to design an algorithm that would would work at global scales with guaranteed upper time bounds, which later turned out to be an overlay network algorithm (which ended up being based on a hypercube virtual topology), as opposed to the broadcast type that Nom was. For a bit more on overlays vs. broadcast networks, check out my IEEE article on the topic.

Then the question was whether to use one or the other, and it occurred to me that there was no reason I couldn't use both. It is possible to to embed a multicast tree in an overlay and thus use a single network, but there are other advantages to the broadcast algorithm that were pretty important in completely "disconnected" environments such as wireless ad hoc networks.

So Nom became the local component, Manifold-b, and the second algorithm became Manifold-g.

So that's about it for the intro. I know that the algorithms are pretty crucial but I want to take some time to explain them properly, and their implications, so I'll leave that for later.

As usual, comments welcome!

Categories: science, soft.dev, technology
Posted by diego on December 3 2004 at 12:41 PM

Copyright © Diego Doval 2002-2011.
Powered by
Movable Type 4.37