wanted: a breakthrough for testing distributed systems


For the last few days I've been heads-down, working most of the time on a distributed system that involves going deep into the arcana of TCP/IP. It's an eye-opening experience. First, because Java proves once more to be a rock, obtaining pretty high data transfer rates (currently 37 KB/sec on a 50 KB/sec connection) even with all the twists and turns the code is taking. Second, and more important, it has reminded me of how little we know about distributed systems: how to build them properly, and how to test them.

By distributed systems, I mean truly distributed. I mean that you can't count on a server to be happily taking over stuff for you with a bunch of TCP ports open and a four-way processor core ready to handle incoming tasks. We might be tempted to call this peer-to-peer (as opposed to client-server), but not really, since I could easily see this being used in a "traditional" client-server environment. The difference is subtle: it lies in what you assume on the server side, and in how you get around the constraints imposed by today's Internet.

That aside, being a test-first-code-later kind of person, I tend to put the burden on testing, or rather on the testing framework. So I thought I'd write down my wish-list for a distributed testing framework (as food for thought more than anything else). This framework would work as follows: you'd have a "test listener" that can run on any machine, and a "test controller" app that can run on your desktop. Once the listener is running on the other machines (and maybe even on your desktop too) you can easily choose a JAR to deploy to all the target machines, then run it. The system automatically routes the output (System.err and System.out) to your "test controller" in multiple windows. You can control any of the clients through simple play/pause/stop/restart buttons, clear the consoles, and so on. You would be able to script it, so that this whole process can be run in loops, or automatically every day or every week, or whatever. You would be able to define output values to check for, so that it can alert you to results that don't match expectations.
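To make the last item on that wish-list a bit more concrete, here is a minimal sketch of how the output check might look, assuming the controller receives each node's output as a plain stream. The class name OutputExpectations and the method unmet are hypothetical, invented for illustration; they don't come from any existing framework.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: these names are assumptions, not part of any real framework.
class OutputExpectations {

    /**
     * Reads a node's captured output line by line and returns the expected
     * regular expressions that never matched any line. An empty result means
     * every expectation was met.
     */
    static List<String> unmet(InputStream nodeOutput, List<String> expectedRegexes)
            throws IOException {
        List<Pattern> pending = new ArrayList<>();
        for (String regex : expectedRegexes) {
            pending.add(Pattern.compile(regex));
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(nodeOutput, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final String current = line;
                // Drop every expectation that this line satisfies.
                pending.removeIf(p -> p.matcher(current).find());
            }
        }
        List<String> result = new ArrayList<>();
        for (Pattern p : pending) {
            result.add(p.pattern());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the System.out stream routed back from one test listener.
        String captured = "node-1 started\ntransferred 1024 bytes\nnode-1 done\n";
        InputStream stream =
                new ByteArrayInputStream(captured.getBytes(StandardCharsets.UTF_8));
        List<String> missing =
                unmet(stream, Arrays.asList("transferred \\d+ bytes", "checksum ok"));
        System.out.println("Unmet expectations: " + missing);
    }
}
```

In a real setup the stream would presumably come from the listener's connection to the controller rather than from a byte array, and the controller would raise an alert whenever the returned list is not empty. Here the sample output never reports a checksum, so "checksum ok" is the one expectation left unmet.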

Looking around, I found the DTF at SourceForge, but it seems to be dead (no binaries, and no updates since February this year). I found papers (if you look hard enough, you can find papers on every conceivable topic I guess, so this doesn't mean much), like this one. But not much, really. Or is there some vast download area somewhere that I'm overlooking?

In any case, I know for a fact that CS curricula still don't pay enough attention to testing, much less to distributed testing. For one, distributed testing is difficult to generalize. But there should be more in this area happening, shouldn't it? Or does anyone doubt that half the future lies with large scale distributed applications? (The other half is web services :-)).

Categories: soft.dev
Posted by diego on November 13 2003 at 12:58 PM
Comments

Perhaps Joshua (http://cs.allegheny.edu:8080/gkapfham/6) might be helpful?

Posted by: jeje at November 13, 2003 3:26 PM

Jerome, thanks for the link. Joshua looks interesting. From what I can see, Joshua does distributed regression testing, rather than testing of a distributed system itself. The core might be useful for other things if it's properly done though. Something to investigate when there's some time. :)

Posted by: Diego at November 14, 2003 1:20 AM

This sounds exactly like SysUnit...

http://sysunit.codehaus.org/

The only difference is there's no graphical / command shell front end - you write the test controller in Java code to start/stop things etc. There's distributed synchronization to orchestrate the different nodes in the test case.

It also allows multiple builds & developers to have their own classpaths - the jars are distributed for each test (as you describe).

Posted by: James Strachan at November 14, 2003 4:52 AM

I think with a clever name like Cactus you should at least investigate it - http://jakarta.apache.org/cactus/

Posted by: mal at November 14, 2003 6:32 PM

This post keeps bugging me, mostly because I don't have a good answer for it. Browsing around led me to the Hyades Project http://www.eclipse.org/hyades/ which looks promising. By a strange coincidence, your recent post titled "it's full of stars!" almost captures the Hyades, which sit in the Taurus constellation just to the right of your snapshot of Orion http://www.astro.wisc.edu/~dolan/constellations/java/Orion.html. From the Hyades project Formation document...

Why Hyades?
Project Hyades is named for an open cluster of stars that form the head of the
constellation of Taurus near the star Aldebaran. The Hyades cluster was involved in a
significant milestone in astronomy and physics. During the 1919 eclipse of the sun,
the Hyades enabled scientists to collect the first empirical proof of Einstein's theory of
relativity, by showing how light from the stars of the cluster was bent by the sun’s
gravitational field. The Hyades cluster has helped us measure the universe.
The Eclipse based Hyades project is a groundbreaking initiative to advance the state
of Automated Software Quality tools. Hyades is an open source integrated test, trace
and monitoring environment, based on the open source Eclipse platform, that provides
a common framework for tool interoperability across the software development
process.

Posted by: mal at November 16, 2003 4:03 AM

The latest Java Developer's Journal (Nov. 2003) has an article about testing (and primarily managing) J2EE systems with JMX and JUnit. The tool they feature is JMX4ODP http://jmx4odp.sourceforge.net/

Posted by: mal at November 23, 2003 3:13 AM

Copyright © Diego Doval 2002-2007.