December 22, 2004

a bit more on manifold-g

Since the Manifold-B algorithm is relatively straightforward, I thought I'd spend a bit more time on Manifold-G. Plus Manifolg-G is a bit cooler and has potentially more applications than this one.

Revisiting: Manifold-G creates a virtual n-cube topology on top of the network, to make search paths predictable (on the virtual topology at least).

The core idea is that any name is in essence a bit-string. This bit string can be obtained directly by either converting the ASCII or Unicode into its bit values, or by hashing (which can cause overlap, but that's not a big problem). Normally you'd use hashing anyway, to compress the string space a bit since lots of strings are not valid names and would never be used.

While a lot of what relates to Manifold-G in the dissertation is treated with set theory, it's useful to visualize it in an euclidean space. Here, the bit strings represent the value for the axis in a particular dimension. Thus a 3-bit string would create a three dimensional cube, a 4-bit string a 4-cube, and so on). In this virtual space the strings are connected by the (imaginary) edges of the n-cube in question, which in the real network become node connections.

Now, a cube can be "navigated" along its edges, and the path between any two nodes is always predictable (and easy to calculate). This is why, for example, hypercube topologies where used in parallel systems like the Connection Machine. The problem is that a network will never be a complete hypercube, it will always be incomplete.

So the question was, how to make it behave as complete, even when it wasn't? The key was to create a "shadow" structure that would also be built with predictable rules. This shadow structure would be self-organizing (and self-balancing), distributing load between nodes evenly. Thus the concept of shadow nodes. Shadow nodes are "representatives" of a node that should be present in the network but isn't. At its simplest, the shadow nodes are actual memory structures (as in the sample code at the end), but because the balancing is mathematically predictable, they can largely be implicit by the values on the nodes. Thus the shadow nodes can be used to "complete" the topology and retain the bounded properties of a hypercube even when not all the vertices for it are present.

I imagine this explanation is a bit confusing, so here is a sample presentation (HTML) that shows how an incomplete 3-cube being built. Hopefully that'll clarify things. :)

Finally, some code. Here's a ZIP file containing a sample manifold-g simulator that runs locally (ie., no network). Crucially though, the information used by the nodes is all local. A small manifold-g network (say, max 5,012 nodes) could be created using exactly this technique (which is easier to see in code, but has the downside of average memory usage = [max number of nodes - actual number of nodes / actual number of nodes]) by simply replacing node-to-node calls with network calls.

PS: The code is GPL'ed, for simplicity.

PS2: yes, there are security issues with this, but they are actually not all that different from DNS itself or any other P2P algorithm for that matter. But that's a topic for another post. :)

PS3: as usual, comments and questions most welcome!

Posted by diego at 12:09 PM | Comments (0)

the manifold algorithms

Okay, so, roughly, what are the Manifold algorithms? They are two different self-organizing algorithms. The first, Manifold-B (Manifold-Broadcast) is used for local/inexact searches, while Manifold-G (Manifold-Global) is used for global self-organizing exact resolution.

Manifold-B

Manifold-B is a typical broadcast-style (or flooding-style) p2p system, similar to gnutella. Basically, on startup a node looks for one or more nodes already in the network (using a list of previously known nodes if available) and establishes connections to some or all of them (depending on individual node load, etc). Once at least one connection is complete queries can begin to propagate. Queries are constrained by a time-to-live (TTL) parameter, propagating across all connections of each node as long as the maximum number of hops has not been reached, as is common in these types of networks.

Some advantages of Manifold-B are:

Some disadvantages:

Manifold-G

Manifold-G is an Overlay, basically a virtual topology built on top of an actual network. Why an overlay? Because when building a virtual topology, we can give it a well-defined structure, that is predictable, with lookup path lengths bounded by a certain limit.

In the case of Manifold-G, the topology used is that of an n-dimensional hypercube (or n-cube, or l-cube, depending on the letter used to specify dimensions :)). Manifold-G assigns each name a particular bit-string, which is controlled by the node with that name (local control was an important component of name resolution, other overlays don't necessarily guarantee locality in that sense, and although you could force them to, they are mostly designed as distributed hashmaps, which maintain a set of keys on each node--which has other great uses!).

The tricky part is that a hypercube can only be navigated predictably when it is complete, i.e., when all its nodes (vertices) are present. So Manifold-G uses the predictability of a structure to its advantage to virtualize those missing nodes. More on the specifics in a later post (if you can't wait, you can also check the text of the dissertation).

Manifold-G advantages:

Disadvantages:

A couple of things

Aside from serving well for the two main usage modes that I was targeting, the algorithms complement each other. For example, Manifold-B has much lower consistency requirements and so can operate when Manifold-G might not be available, and thus either partially cover some queries, or help Manifold-G in building the network.

It has been shown that it is possible to embed broadcast trees within overlays, but in my opinion maintaining two networks is a small price to pay for the advantages of both, particularly on wireless ad hoc (small-to-medium), where Manifold-B has a clear advantage over the overlay.

Posted by diego at 11:27 AM | Comments (0)