diego's weblog: February 2004 Archives
weather chaos: an analysis of its strategic implications
Related to my post a few weeks ago on 'weather chaos', I was just reading a Pentagon report on the strategic implications of such a change (SF Chronicle article here). Here's a link to the full report (PDF, about 1 MB). Quote from the summary:
There is substantial evidence to indicate that significant global warming will occur during the 21st century. Because changes have been gradual so far, and are projected to be similarly gradual in the future, the effects of global warming have the potential to be manageable for most nations. Recent research, however, suggests that there is a possibility that this gradual global warming could lead to a relatively abrupt slowing of the ocean’s thermohaline conveyor, which could lead to harsher winter weather conditions, sharply reduced soil moisture, and more intense winds in certain regions that currently provide a significant fraction of the world’s food production. With inadequate preparation, the result could be a significant drop in the human carrying capacity of the Earth’s environment.
Harsh, yes, but a good objective analysis as far as I can see. Must read. The only criticism I can think of is that the report seems to completely downplay internal strife within the US, whose absence would be unlikely given that coastal communities would be seriously disrupted, creating internal migration patterns and the subsequent pressures on society (not counting that the SF Bay area and the New York area are responsible for huge amounts of the economic output of the US). They predict that the political integrity of the EU would be in doubt given these conditions, but similar (though milder) results could be expected within the US. Maybe they consider that as part of the "internally manageable" stuff... I'm not sure.
I'm working non-stop, but that doesn't mean I can't take a break for a moment and read depressing stuff like this. Right.
feedvalidator bug or feature?
I was just alerted that checking this feed against the feedvalidator gives a "This feed is invalid" message. You can try it yourself by clicking here. Now, having checked the feed against the validator when I made the change to valid RSS 2.0, I was surprised. The error in particular says "description should not contain onclick tag". Essentially it's complaining about an onclick handler on one of the HREFs, which is used for a popup window of an image.
Reading the description of the error I get this explanation:
Some RSS elements are allowed to contain HTML. However, some HTML tags, like script, are potentially dangerous and could cause unwanted side effects in browser-based news aggregators. In a perfect world, these dangerous tags would be stripped out on the client side, but it's not a perfect world, so you should make sure to strip them out yourself.
I was sure there was nothing like this in the spec itself, and re-reading it (just in case) proved it.
The disconnect here is that this seems to me to be a guideline, something more suitable for a warning, than something that would mark the feed as "not valid", when it clearly is valid as far as the spec is concerned...
Interestingly enough, the RSS Validator (at rss.scripting.com) does (correctly) validate the feed; you can click here to see the result. I thought they were based on the same code, but clearly there are some differences.
Update: Dave replies, clarifying that they haven't made significant changes to the validator since they took a snapshot of the sources a few weeks ago, and notes that he's seen other instances of this recently.
I had a hunch. Curiosity overtook me. :) I downloaded the sources and poked around in them a bit. Unless I'm reading it completely wrong, my suspicion was confirmed: the RSS and Atom validators share most of the code (which is just good design sense, since they are doing very similar things). However, this also means that errors that are flagged for Atom are also in some cases being flagged for RSS. For example, the error that I described above is detected by the function check4evil in validator.py, which is called by htmlEater in the same file, which itself is called in item.py (which parses RSS items). In his reply Dave describes a different case though, that of the validator rejecting duplicates, which is being done (as far as I can see) only for RSS (in the call do_pubDate(self) in item.py).
I definitely think that these two things are more "should fix" type of guidelines than problems that define non-validity, according to the RSS 2 spec.
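To make the warning-versus-error distinction concrete, here is a minimal sketch in Python. It is not the feedvalidator's code: the names check_item, RiskyHtmlFinder, is_valid and the list of risky tags are all hypothetical, made up for illustration. The only spec rule it encodes is the RSS 2.0 requirement that an item carry at least one of title or description; anything about risky HTML is reported as a warning and does not affect validity.

```python
from html.parser import HTMLParser

# Hypothetical severity levels; neither name comes from the feedvalidator sources.
ERROR = "error"      # violates the RSS 2.0 spec itself
WARNING = "warning"  # legal per the spec, but risky for some aggregators

# Illustrative list only, not an exhaustive set of dangerous tags.
RISKY_TAGS = {"script", "object", "embed"}


class RiskyHtmlFinder(HTMLParser):
    """Collects risky tags and on* event-handler attributes in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag in RISKY_TAGS:
            self.findings.append(f"<{tag}> tag")
        for name, _value in attrs:
            if name.startswith("on"):
                self.findings.append(f"{name} attribute on <{tag}>")


def check_item(item):
    """Return (severity, message) pairs for one RSS item given as a dict."""
    results = []
    # RSS 2.0: every item element is optional, but at least one of
    # title or description must be present.
    if not item.get("title") and not item.get("description"):
        results.append((ERROR, "item needs a title or a description"))
    finder = RiskyHtmlFinder()
    finder.feed(item.get("description", ""))
    finder.close()
    for finding in finder.findings:
        results.append((WARNING, f"description contains {finding}"))
    return results


def is_valid(results):
    """A feed item is valid as long as nothing was reported as an error."""
    return all(severity != ERROR for severity, _message in results)
```

With this split, an item whose description has a link with an onclick handler would get a warning in the report while is_valid still returns True, which is the behavior I would expect from a validator that sticks to the spec.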
On a related note, the version history for some of the validator sources notes several changes in the last few weeks. Lots of activity there.
Another update: Sam (through email) encourages me or others interested to post to the list at sourceforge with suggestions. He also points to a message on that list from Phil Ringnalda in which he comments on what's discussed in this post (warnings vs. errors) and finally, as reference, he sends a link to the original bug tracker item which is related to this issue (and includes dates of posting, resolution, etc.). Regarding warnings, mostly I agree with Phil, but hopefully I'll have time to add my 2c to the list during the next couple of days, even if it's something small (lots of work which has nothing to do with this, and that has priority of course...).
Plus: Generic question: what happens when a validator of anything is truly open source? This applies to RSS, Atom, or whatever format requires validators. Suppose that in a couple of years the original designers of the validator have moved on. New developers have taken over. After some time, they decide that X is good (X is something that most people agree is good, but is definitely not in the spec). So they update the validator to reflect those views. Meanwhile, the spec hasn't changed. It seems to me that in this case either the validator loses credibility, or the spec does. Neither option is good (but the spec losing credibility is worse IMO). I wonder what the experience of other formats has been in this regard, but I do think that having many validators is good, and that would be an automatic safeguard against one validator suddenly redefining the spec by itself, or taking it in a different direction. XML is probably a good example, there are many XML validators around... is that why XML has remained stable?
subtlety in 'the simpsons'
Through a referral chain I found Subtly Simpsons which documents some of the great background references made by the writers. I had caught most of those, but some were new to me. Worth the read.
Recently there have been a couple of things that I can't get out of my head regarding the Simpsons which, while not cultural references, totally crack me up. One is when Moe becomes friends with Maggie and they are spoofing The Godfather saga left and right, and they have to go find Maggie, who followed some mobsters to an Italian restaurant. The following exchange ensues:
Moe: We're going to Little Italy
Heh. The other one is when Homer is trying to prove to himself (and Bart and Lisa) that he's still "cool" and he's trying to explain it to his Dad:
Homer: You wouldn't understand, Dad, you're not 'with it'.
LOL.
can you spell "hoax"?
I'll try to describe my thinking process in the two (2) seconds that followed reading this (and I recommend you get to the end of this post, where the mystery is revealed!).
As linked in the previous paragraph, Seb at Many2Many has posted a link to a message on the reputation mailing list where Orkut is "outed" as a "Master's thesis" of a random person who (they say) works for Orkut (the man, not the site).
Let's see. You are Google, right? You are a 1500+ person company, one of the most respected in the world, that is soon to be going public. So when a developer comes up from somewhere saying they want to do a "Master's thesis" using the company's name and reputation (including a link of "in affiliation with Google"), you say "suure, go ahead". Furthermore, the student wishes to remain anonymous, so another developer (Orkut) is recruited (somehow) to use his own name and take all the credit and responsibility.
So far so good, right?
Now you launch, and after a few days pass the experiment is a success. So this student (presumably Eric Schmidt at this point, or his alter-ego) tells Google to throw a party (and he gets random people to talk about it afterwards) in a posh San Francisco location for hundreds of people. Since Orkut (the man) is still the "patsy", the party in question is announced as Orkut's idea, and is made to coincide with his birthday. To this, Google says "but of course! My pleasure!" and happily pays for the expenses (alternatively, this student is also a millionaire and pays out of his own pocket).
Luckily for everyone involved, someone posts a message to a mailing list quoting an "article" from an unknown writer with no links to an organization of any kind (even a personal site) to back it up.
But of course, it must be true!
Now you look for the source, and you discover it's this page at HACT (What do you mean what's HACT? you mean you've never heard of it? What rock have you been hiding under?).
Oh, but wait, this is the same page that, at the bottom, says: "Please note that this is a humor article and is not true in any way, shape or form, except in that it rings true in a scary way".
Damn. It wasn't true I guess. I thought that my explanation above was so incredibly reasonable, so universal, that you could post happily in indignation about it without thinking twice.
Hm. So you mean I shouldn't post about that message I read somewhere that said that Orkut in reverse spelled the name of the Alien race that actually sent this student to Stanford to get a Master's and discover what the earthlings are up to?
PS: I find it interesting that something so patently unbelievable could disseminate at all without a ton of smileys and LOLs before and after the text.
link from places unexpected
Just looking at my referers, I see a link that looks a bit strange. I go there. It turns out it's MIT's Research and Innovation project to help MIT's faculty, staff, students and researchers, pointing to my introduction to weblogs. It made me feel good to see a small contribution of mine being useful, even in environments like MIT where they have so many resources at their disposal. :)
Reminds me of the truism that content, especially digital, has a life of its own.
what's in an IPO?
Wired has a good article in their latest issue: The complete guide to google, which actually starts by talking about the challenges any company goes through when pre-, in- and post-IPO. Just the first section alone makes it worth reading. Oh, yes. It talks about Google too. :)
and the 'ethics' rant...
...leaves me with incredibly mixed feelings. I don't like the idea of questioning, questioning, questioning, without any clear answers, but that's all there is in this case, or rather, all I see.
This topic has been brewing in my head for some time, and I think it was the open-ended nature of it that kept me from mentioning it. It is easy to misconstrue (or misinterpret--both failures of this writer's limited abilities, and not of the reader) the ideas as a rant against software, or technology, or nuclear weapons, or whatever; when the point is most decidedly not whether these things are good or bad but whether we have some ideas and some ground covered before we turn them into reality or we're always left scrambling after the fact.
Even as it is, in some senses, personal, limited, and introspective, its scope and scale are still daunting.
The good news is: that's never stopped us before. :-)
the lack of an ethics conversation in computer science
In April 2000 Bill Joy published in Wired an excellent article titled Why the future doesn't need us. In the article he was saying that, for once, maybe we should stop for a moment and think, because the technologies that are emerging now (molecular nanotechnology, genetic engineering, etc) present both a promise and a distinctive threat to the human species: things like near immortality on one hand, and complete destruction on the other. I'd like to quote at relative length a few paragraphs with an eye on what I want to discuss, so bear with me a little:
["Unabomber" Theodore] Kaczynski's dystopian vision describes unintended consequences, a well-known problem with the design and use of technology, and one that is clearly related to Murphy's law - "Anything that can go wrong, will." (Actually, this is Finagle's law, which in itself shows that Finagle was right.) Our overuse of antibiotics has led to what may be the biggest such problem so far: the emergence of antibiotic-resistant and much more dangerous bacteria. Similar things happened when attempts to eliminate malarial mosquitoes using DDT caused them to acquire DDT resistance; malarial parasites likewise acquired multi-drug-resistant genes.
(My emphasis). What Joy (who I personally consider among the greatest people in the history of computing) describes in that last sentence is striking not because of what it implies, but because we don't hear it often enough.
When we hear the word "ethics" together with "computers" we immediately think about issues like copyright, file trading, and the like. While at Drexel as an undergrad I took a "computer ethics" class where indeed the main topics of discussion were copying, copyright law, the "hacker ethos", etc. The class was fantastic, but there was something missing, and it took me a good while to figure out what it was.
What was missing was a discussion of the most fundamental problems of ethics of all when dealing with a certain discipline, particularly one like ours where "yesterday" means an hour ago and last year is barely last month. We try to run faster and faster, trying to "catch up" and "stay ahead of the curve" (and any number of other cliches). But we never, ever ask ourselves: should we do this at all?
In other words: what about the consequences?
Let's take a detour through history. Pull back in time: It is June, 1942. Nuclear weapons, discussed theoretically for some time, are rumored to be under development in Nazi Germany (the rumors started around 1939--but of course, back then most people didn't quite realize the viciousness of the Nazis). The US government, urged by some of the most brilliant scientists in history (including Einstein), started the Manhattan Project at Los Alamos to develop its own nuclear weapon, a fission device, or A-Bomb. (Fusion devices--also known as H-bombs--which use a fission reaction as the starting point and are orders of magnitude more powerful, would come later, based on the breakthroughs of the Manhattan Project).
But then, after the first successful test at the Trinity Test site on July 16, 1945, something happened. The scientists, who up until that point had been so preoccupied with technological questions that they had forgotten to think about the philosophical ones, realized what they had built. Oppenheimer, the scientific leader of the project, famously said:
I remembered the line from the Hindu scripture, the Bhagavad-Gita: Vishnu is trying to persuade the Prince that he should do his duty and to impress him he takes on his multi-armed form and says, "Now I am become Death, the destroyer of worlds."
While Kenneth Bainbridge, in charge of the test, later said that at that time he told Oppenheimer:
"Now we are all sons of bitches."
Following the test, the scientists got together and tried to stop the bomb from ever being used. To which Truman said (I'm paraphrasing):
"What did they think they were building it for? We can't uninvent it."
Which was, of course, quite true.
"All of this sanctimonious preaching is all well and good" (I hear you think) "But what the hell does this have to do with computer science?".
When Bill Joy's piece came out, there was a lot of discussion on the topic. Many reacted viscerally, attacking Joy as a doomsayer, a Cassandra, and so on. Eventually the topic sort of died down. Not much happened. September 11 and then the war in Iraq, surprisingly, did nothing to revive it (contrary to what one might expect). Technology was called upon in aid of the military, spying, anti-terrorism efforts, and so on. The larger question, of whether we should stop to think for a moment before rushing to create things that "we can't uninvent", has been largely set aside. Joy was essentially trying to jump-start the discussion that should have happened before the Manhattan Project was started. True, given the Nazi threat, it might have been done anyway. But the more important point to make is that if the Manhattan Project had never started, nuclear weapons might not exist today.
After WW2 Europe was in tatters, and Germany in particular was completely destroyed. There were only two powers left, only two that had the resources, the know-how, and the incentive, to create Nuclear Weapons. So if the US had not developed them, it would be reasonable to ask: What about the Soviets?
As has been documented in books like The Sword and the Shield (based on KGB files), the Soviet Union, while powerful and full of brilliant scientists, could not have brought its own nuclear effort to fruition but for two reasons: 1) The Americans had nuclear weapons, and 2) they stole the most crucial parts of the technology from the Americans. The Soviet Union was well informed of the advances in the US nuclear effort through spies and "conscientious objectors". Key elements, such as the spherical implosion device, were copied verbatim. And even so, it took the Soviet Union four more years (until its first test on August 29, 1949) to duplicate the technology.
Is it obvious then, that, had the Manhattan project never existed, nuclear weapons wouldn't have been developed? Of course not. But it is clear that the nature of the Cold War might have been radically altered (if there was to be a Cold War at all), and at a minimum nuclear weapons wouldn't have existed for several more years.
Now, historical revisionism is not my thing: what happened, happened. But we can learn from it. Had there been a meaningful discussion on nuclear power before the Manhattan Project, even if it had been completed, maybe we would have come up with ways to avert the nuclear arms race that followed. Maybe protective measures that took time, and trial, and error, to work out would have been in place earlier.
Maybe not. But at least it wouldn't have been for lack of trying.
"Fine. But why do you talk about computer science?" someone might say. "What about, say, bioengineering?". Take cloning, for example, a field similarly ripe with both peril and promise. An ongoing discussion exists, even among lawmakers. Maybe the answer we'll get to at the end will be wrong. Maybe we'll bungle it anyway. But it's a good bet that whatever happens, we'll be walking into it with our eyes wide open. It will be our choice, not an unforeseen consequence that is almost forced upon us.
The difference between CS and everything else is that we seem to be blissfully unaware of the consequences of what we're doing. Consider for a second: of all the weapon systems that exist today, of all the increasingly sophisticated missiles and bombs, of all the combat airplanes designed since the early 80's, which would have been possible without computers?
The answer: Zero. Zilch. None.
Airplanes like the B-2 bomber or the F-117, in fact, cannot fly at all without computers. They're too unstable for humans to handle. Reagan's SDI (aka "Star Wars"), credited by some with bringing about the fall of the Soviet Union, was a perfect example of the influence of computers (unworkable at the time, true, but a perfect example nevertheless).
During the war in Iraq last year, as I watched the (conveniently) sanitized nightscope visuals of bombs falling on Baghdad and other places in Iraq, I couldn't help but think, constantly, of the amount of programs and microchips and PCI buses that were making it possible. Forget about whether the war was right or wrong. What matters is that, for ill or good, it is the technology we built and continue to build every day that enables these capabilities for both defense and destruction.
So what's our share of the responsibility in this? If we are to believe the deafening silence on the matter, absolutely none.
This responsibility appears obvious when something goes wrong (like in this case, or in any of the other occasions when bugs have caused crashes, accidents, or equipment failures), but it is always there.
It could be argued that after the military-industrial complex (as Eisenhower aptly described it) took over, along with market forces, which are inherently non-ethical (note, non-ethical, not un-ethical), we lost all hope of having any say in this. But is that the truth? Isn't it about people in the end?
And this is relevant today. Take cameras in cell phones. Wow, cool stuff we said. But now that we've got 50 million of the little critters out there, suddenly people are screaming: the vanishing of privacy! aiee!. Well, why didn't we think of it before? How many people were involved at the early stages of this development? A few, as with anything. And how many thought about the consequences? How many tried to anticipate and maybe even somehow circumvent some of the problems we're facing today?
Wanna bet on that number?
Now, to make it absolutely clear: I'm not saying we should all just stow our keyboards away and start farming or something of the sort. I'm all too aware that this sounds too preachy and gloomy, but I put myself squarely with the rest. I am no better, or worse, and I mean that.
All I'm saying is that, when we make a choice to go forward, we should be aware of what we know, and what we don't know. We should have thought about the risks. We should be thinking about ways to minimize them. We should pause for a moment and, in Einstein's terms, perform a small gedankenexperiment: what are the consequences of what I'm doing? Do the benefits outweigh the risks? What would happen if anyone could build this? How hard is it to build? What would others do with it? And so on.
We should be discussing this topic in our universities, for starters. Talking about copyright is useful, but there are larger things at stake, the RIAA's pronouncements notwithstanding.
This is all the more necessary because we're reaching a point where technologies are increasingly dealing with self-replicating systems that are even more difficult to understand, not to mention control (computer viruses, anyone?), as Joy so clearly put it in his article.
We should be having a meaningful, ongoing conversation about what we do and why. Yes, market forces are all well and good, but in the end it comes down to people. And it's people, us, that should be thinking about these issues before we do things, not after.
These are difficult questions, with no clear-cut answers. Sometimes the questions themselves aren't even clear. But we should try, at least.
Because, when there's an oncoming train and you're tied to the tracks, closing your eyes and humming to yourself doesn't really do anything to get you out of there.
Ant within Eclipse: switching JDKs and finding tools.jar
I've been doing quite a lot of work creating new Ant build processes and grokking Eclipse (installing and reinstalling on different machines), and this is a problem that keeps recurring. This morning I cleaned up vast amounts of garbage on my main Windows machine, garbage left over from old J2SDK installs (I had FOUR--when will Sun fix the install problem?) and I reinstalled JDK 1.4.2_03 and then tried running everything again within Eclipse (v3.0 M7). Ant had been running fine before, after I had pointed it to tools.jar, but now that I had changed JDKs it wasn't guaranteed that it would run--and it didn't. While it is possible that this is so well known that people do it without thinking, it certainly isn't clearly documented, and it's a situation that people will probably run into regularly when doing a clean install of Eclipse and the JDK on a machine, or when upgrading JDKs after the settings have been done long ago--and forgotten. :)
First, the situation. On restart, Eclipse correctly detected the new JRE (clearly from the registry entries created by the JDK/JRE install) and pointed to the one the J2SDK installs in C:\Program Files\Java\... but it's better to change the pointer to the JRE within the JDK IMO. Even then, Ant doesn't work. The error message you get is for Ant:
Of course, JAVA_HOME is pointing to the right location, in both the OS environment and within Eclipse (This variable can be set within Eclipse through Window > Preferences > Java > Classpath Variables).
So how to fix the Ant build problem?
I found various solutions searching, for example running Eclipse with "eclipse -vm [JDKPATH]\bin" but that didn't quite satisfy me (I wanted something that could be configured within the environment). Other solutions to the problem were even more esoteric.
The best solution I've found (after a little bit of digging through Eclipse's options) is to edit Ant's runtime properties. Go to Window > Preferences > Ant > Runtime. Choose the Classpath tab. Select the Global Properties node in the tree and click Add External JARs. Select tools.jar from your JDK directory (e.g., j2sdk1.4.2_03\lib\tools.jar). Click Apply and OK, and you should be on your way. Not too hard when you know what to do. Now if this could only be done automatically by Eclipse on install...
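Before clicking through the preferences it can help to confirm that the JDK you just installed actually has a tools.jar where you expect it. What follows is a small sketch in Python, not anything that ships with Eclipse or Ant: find_tools_jar is a hypothetical helper, and it assumes the JDK 1.4-style layout mentioned above, with tools.jar sitting in the lib directory under JAVA_HOME.

```python
import os
from pathlib import Path


def find_tools_jar(java_home=None):
    """Return the path to lib/tools.jar under java_home, or None if it is missing.

    Falls back to the JAVA_HOME environment variable when no path is given.
    Assumes the JDK 1.4-style layout (<JDK>/lib/tools.jar).
    """
    home = java_home or os.environ.get("JAVA_HOME")
    if not home:
        return None
    candidate = Path(home) / "lib" / "tools.jar"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    jar = find_tools_jar()
    if jar is None:
        # A plain JRE does not ship tools.jar, so this is the usual symptom
        # of JAVA_HOME being unset or pointing at a JRE instead of the JDK.
        print("tools.jar not found; check that JAVA_HOME points at the JDK")
    else:
        print(f"Add this JAR in the Ant runtime classpath: {jar}")
```

If the script reports a path, that is the file to select with Add External JARs; if it doesn't, the problem is with JAVA_HOME rather than with the Ant settings.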
so long, ZIP
I'm spending a couple of hours today retiring the few backups I still have on ZIP (250 MB) and moving them over to CD-R. ZIP is just too slow for large amounts of data (at least compared to 48X CD drives) and keeping two separate mediums (CD and ZIPs) is too much of a pain. Plus, CD-R is simply too inexpensive these days to justify ZIPs (can't say I've tried the new 750 MB ZIPs, but I'm not inclined to either). I'm disconnecting the drive (which I haven't used in the last 3 months) and leaving it there for one of those just-in-case situations. After all, it is light and easy to carry around... so it's useful when making backups on the road (unless your notebook has a built-in CD-R that is--mine doesn't).
It was good while it lasted. ZIP was a great technology in its early days, and it certainly had a good 3-4 year run, considering how fast things move in storage technologies. Now to wait for the day when DVD-Rs replace CD-Rs...
the G5 has landed
More later. :-))
gender and computer science
Jon had a good post a couple of days ago titled gender, personality, and social software, based on a column of his at InfoWorld: is social software just another men's group?. He makes some interesting points in both.
There is part of that thread that I wanted to comment on (but didn't get around to doing it until now for some reason!), and it's the question of how much computer science is "gender biased." Towards men, of course.
Having spent a good part of the last 10 years in academic institutions one way or another in various countries and continents (as well as in companies of all sizes), here are my impressions. This is of course, just what I've observed.
That there are few women in computer science is obviously true. Surveys or not, you can see it and feel it. That said, I have noted that something similar happens in other disciplines, such as civil engineering. In the basic sciences, there are more men than women in Physics for example, but the difference is not as marked. In Chemistry, or Biology, the differences largely disappear.
One thing I can say from experience is this: among my groups of students, both here in Ireland and in the US, even though there are fewer women (far fewer) than men, the number of women that are very good is roughly similar to the number of men that are very good. (Hacker types, the "take no showers or bathroom breaks until I finish coding this M-Tree algorithm using Scheme, just for the fun of it" kind, have been invariably men in my experience, and generally there has been at most one of those per class, but I have no doubt that there are women like this; I just haven't met them. :)) Note that I'm referring specifically to computer hacking (in the good sense) here--I know women with the same attitude toward their work, just not computer hacking :).
To elaborate a bit on the point of the last paragraph, if, say you have a CS class of 40 people, maybe 5 at most would be women. But of those five women, two would be very good. And there would be maybe three, at most four good computer-scientists-in-brewing on the boys' side.
My conclusion after all this time is that there's a difference of quality over quantity. In some weird way, talent for computer science seems to me to be constant regardless of gender (maybe this is the case for everything?). There might be more men doing development, sure, but there are also more that are not very good at it (or do it for the wrong reasons, such as money, or parents' pressure, or just "because"; in my opinion, if you don't really like doing something, you shouldn't be doing it, period.)
The other thing I've noticed in recent years is that, as software (and hardware) have become more oriented towards art, social and real-world interactions (The stuff done at the Media Lab is a good example), I've seen more women on that side of the fence. In fact, in some of these areas women dominate the landscape.
Now, I don't want to get carried away on speculating on the reasons for this split since they would almost certainly be hand-waving of the nth order. I will say however that I think that sexism (which I despise--for example I enjoy James Bond movies but the blatant misogyny in them gives me the creeps--, and, btw, if you want to know how serious I am in using the word "despise" here, you can read this to see what I think about semantics in our world today) has to be partly a factor here. But I'm sure there are others, and history plays a part too. Consider that we're still using UIs and sometimes tools that have very clear roots twenty, sometimes thirty or even forty (!) years ago (e.g., LISP, or COBOL, or Mainframes). Back then gender-based prejudices were even worse, and it's reasonable to assume that we're still carrying that burden in indirect fashion.
So maybe it's not a surprise that now that we're working on technologies that had their start ten or fifteen years ago women are getting more into it? Maybe. I sure hope so.
What do others think? Women's opinions are especially welcome. And if any of this sounds ridiculous (I'm under no illusion that what I've said here is completely accurate), please feel free to whack me in the head. I'm taking painkillers for a horrible pain in the neck I have, so it won't hurt too much. :-)
the new yahoo search
This happened as predicted a little more than a month ago. The new Yahoo! search design had been active for some time already, but using Google for results.
Now consider that Google itself is working on a new design (here are some screenshots of what it might look like, via Aaron). The new Yahoo search looks like a more modern version of what Google does today (at least to me, this is of course subjective). But Google might be changing its design soon. So Yahoo! will end up looking like a "nicer" Google, and Google will end up looking like something else. Funny, isn't it?
I can immediately tell that the results it provides are very good. Comparisons with Google's results for similar keywords show similar (though a bit slower) speeds, less obsession with trackbacks and such, and a good mix of weblog and non-weblog results. I got the Firebird/Firefox plugin for Yahoo! search and replaced my current default (Google of course, although I tried Teoma for a while, it didn't work as well). Let's see if the results are consistently good enough that they convince me to switch.
Plus: here is the link to find the search plugin for the Firebird/Firefox search bar. (Look for "Yahoo" on its own).
that's the spirit!
the russian space program wakes up
From CNN: Russia to build new spacecraft:
The new craft will be able to carry at least six cosmonauts and have a reusable crew section, Russian Aerospace Agency director Yuri Koptev said at a news conference. Soyuz carries three cosmonauts and isn't reusable.
I'll forget about the political implications for the moment (a new cold-war style space race?) and just be happy that things seem to be moving again in this area. I'll say one thing though: I'm sure that this and this had a little something to do with it. :)
not everything that shines is made of gold...
Over the last few days an interesting story has developed in the US marketplace, namely Vodafone's bid for AT&T Wireless and then Cingular's counter-bid (Cingular won today). The Economist has a couple of interesting articles on it (see Who's the real winner? and Vodafone's dilemma), noting that AT&T Wireless might be less of a prize than one might think at first sight. Problems are not only related to technology integration (AT&T Wireless runs two networks, on different technologies) but also to cost and the real potential of the US market.
The technology is moving so fast that business models are also very susceptible to shifts (e.g., is it content they're selling? Bandwidth? Hosted services? A platform? All of the above?), which makes it much riskier to get stuck with old-to-new rather than new-to-next generation transitions. In my view, Vodafone might have actually been lucky in losing this bidding war. It's not just subscriber numbers that count.
Plus: some good comments on the topic over at Wi-Fi Networking News.
Meet Tim McAuley (or his blog rather), the newest member of the clevercactus team. :-) (He actually starts working with us next week). When I met him a few weeks ago he was (as he mentions in his post) a bit of a skeptic regarding weblogs. Okay, maybe more than a bit. You can imagine, however, that I babbled on about the benefits of weblogs, decentralized communities, and so on, for quite a while, enough for him to consider giving it a try. Very cool.
Tim, welcome to the cat squad! :-)
more on demo 2004
Lots of cool announcements for Demo 2004. Big focus on weblogs and decentralized communities (which I find to be intimately linked with weblogging, in spirit at least if not in practice). This is a follow-up to my previous post on WaveMarket's release of a location-based moblogging tool (here's Russ's own entry on the topic). Doc has a good set of pointers to what went on, but here are a couple of other things that caught my attention:
location-based blogging on mobiles
From the press release:
WaveIQ consists of three software products, all now available:
Uber-blogger eh? :-)
Sounds very cool. Congrats again, Russ. And, when can we try this on for size? :)
A few days ago I was asking on #mobitopia what people preferred as a wiki/weblog system and someone (I think it was csete) mentioned SnipSnap. I didn't have time to try it out until today. My comments: WOW.
It took me literally five minutes to set up. It seamlessly connected to the local MySQL installation (all I had to do was create a db and a user for it) and ran under my Tomcat/Apache config. After setting a couple of options I was on my way. It combines the idea of wikis (easily creating links to pages) with the format/structure/features of a weblog. The "wikiness" of SnipSnap does not extend to requiring WikiWords, which is, as far as I'm concerned, a relief. WikiWords inevitably end up requiring weird names for links.
It's a Java app, so it runs everywhere. The only potential problem I could find is that in edit mode there are tons of options to edit content and sometimes it can be confusing (or rather, a little overwhelming), but I get the impression that it wouldn't be hard to get used to it.
If you're looking for a weblog/wiki solution in Java that's easy to get started with, SnipSnap is definitely worth checking out.
At ETech Matt Webb presented Glancing (slides here):
Glancing is an application to support small groups by simulating a very limited form of eye contact online. By small groups I mean about 2 to a dozen people.
Which covers part of the often overlooked area of underlying and implicit group dynamics, rather than the more overt and explicit kind. Anne has some interesting comments on it too. Very interesting ideas. Will have to think more about this (I've been saying this a lot recently both here and to myself, which is probably a measure of how many other things I have to do... so many things going on this week... :)).
eclipse 3.0 M7 release
Finally! As R.J. said, commenting on my recent entry about IDEs, Eclipse 3.0 M7 has been released today. There are a number of changes, including better templates and management of annotations, new ways of navigating code (inherited methods for example), and improvements to SWT components such as tables and trees, and the browser components. Also, there is new automagic creation of bundles for Mac OS X (which can be done through a free Apple tool as well, but Eclipse makes it much easier) and better "scalability" of the UI, and a number of APIs have finally been frozen.
Anyway. Will have more comments once I've used it for a bit longer.
the key is real, the lock is not
In the movie The Game part of the plot centered around a (simulated) "attack" on a rich man (Michael Douglas) that forced him to give up the passwords and such to his bank accounts by intercepting the cell phone call and answering it, pretending to be the bank. The basic idea (make the environment familiar enough so that you slip up) has been used online in various forms, but so far any attentive person could figure out that things were not what they seemed.
Don has posted about an unsettling idea he calls visual spoofing. Essentially he's exposing the biggest threat of all: that we end up becoming used to our UIs to the point where we trust them implicitly.
I brought up the movie at the beginning because Don's example is the online version of it (granted, there are details missing, but does anyone doubt that you could conceivably spoof the entire UI? And what then?). Douglas' character in the movie has no way at all of telling that the person on the other side is not working for the bank, but for the enemy. His keys (passwords) are intact, but the lock (bank) isn't real.
The problem is, at the core, that we tend to guard (and trust, or distrust), the key, while we implicitly trust the lock. Why? The lock is "solid, real". It's "unmovable": built into the door, or ever present in your computer screen. The key can be duplicated without you knowing. The lock cannot.
Except that the locks we've got on computer screens are themselves open to duplication. Seamless duplication. What Don is talking about is applied to browsers. But given the ever-present infestation of all kinds of worms and viruses, how long will it take until this applies to other software too? Software that monitors keypresses has been around for a long time, but digging through all the information generated is a mess (never mind having to get it out of the machine). This is targeted, and targeted at the user, not at the system. You could simulate accounting software. Social engineering meets cracking, or phreaking (no, I don't like to use the term hacking, which I prefer to use in its original context).
Thanks, Don, for the eye-opener. Looking forward to the follow-up where he'll talk about an idea he had to minimize this problem. I don't want to start thinking about possible solutions yet: I haven't even finished absorbing all the implications.
first day at the new office
Here are some pictures I took of the offices. There aren't that many of our space (actually I appear to have a certain fixation with my own desk!). I did take a couple but they were out of focus so I didn't include them. We spent the afternoon rearranging the desks and so on. When I was getting back home I realized we should have taken pictures of the place before we got it ready. Oh well. I suppose we can always spend a day messing it up, taking pictures and then putting it back together. :)
Technically the first day was yesterday, but not all the legalities were finished and we just used one of the meeting rooms and couldn't set up the offices.
The phones we got installed are Cisco IP phones: they connect through Ethernet and have an Ethernet-100 output that gives us Internet access on each desk (from the phone!). I babble about the phone because this is the first desk phone that I've used that has its own IP address (network config, DNS setup, :)), ringtones like mobiles, LDAP directory integration (!), updatable firmware, and the first one for which I've found an online tutorial (!). Very cool. (And maybe I'm wrong, but I think those are the phones they use at the fictional CTU in the series 24, which I've always liked). Anyway, not that the cables are that important: once we finished arranging the space it was about 5 minutes before I set up an 802.11g router.
Anyway. It's been one of those days where you feel the change. We were talking with Paul about yesterday's meeting and he said "It feels like it was a week ago, doesn't it?", and it was true. I didn't want to leave :) but since we haven't received the machines yet there wasn't much I could do there (the laptop is good for some things, but can't handle all I need for full development). If I'm not very responsive to email (or IM, or whatever), this is why; just keep badgering me :). We've got so many things planned for the next few weeks that it's a bit overwhelming, but that's how it is.
In case you missed the link above, one more time: here are the pictures.
:-) :-) :-)
digital music and subculture
An interesting paper by Sean Ebare on:
[...] a new approach for the study of online music sharing communities, drawing from popular music studies and cyberethnography. I describe how issues familiar to popular music scholars — identity and difference, subculture and genre hybridity, and the political economy of technology and music production and consumption — find homologues in the dynamics of online communication, centering around issues of anonymity and trust, identity experimentation, and online communication as a form of "productive consumption." Subculture is viewed as an entry point into the analysis of online media sharing, in light of the user–driven, interactive experience of online culture. An understanding of the "user–driven" dynamics of music audience subcultures is an invaluable tool in not only forecasting the future of online music consumption patterns, but in understanding other online social dynamics as well.
While focused on music, there are interesting ideas for the area of sharing in general.
Some comments: anonymity, in my opinion, is a big factor in the (exploitable, see also here) power-law behavior of virtual communities (alluded to but not explicitly mentioned in the paper with the "citizen/leech" concept among other things), and it also affects group dynamics, possibly even producer/consumer dynamics. The fact that these networks are anonymous is almost implicit in the paper; I find it interesting that it is so often taken as an axiom. Additionally, the perceived "safety" (also mentioned in the paper) given by anonymity is mostly a mirage: in most cases the only thing you are achieving is partial hiding (networks like Freenet are a different matter in this sense), and the possibility that the content might be manipulated (remember this?) or used as a trojan for something else (read: ads, viruses...) is very real and yet barely considered. These networks create a parallel universe that requires people to engage in behavior like that described in the paper, since your "identity" has to be created from the ground up. Forget music: even types of content sharing that are not in dispute are generally of this type. So what are we missing?
I think that advances in this sense will be seen in the mixing of meatspace trust/knowledge relationships with the ability to share/utilize the network, and in fact feed back into it. Cyberspace and meatspace complementing each other, not moving in parallel universes.
configuring apache 2 + tomcat 5 + mysql + jdbc access on linux and windows
Heh. That title took almost as long to write as it took to complete the configuration. :)
I spent some time today preparing the basics for webapp development and runtime. The "basics" include:
Update (Jan 26, 2005): I've posted some new information related to the tomcat config for versions higher than 5.0.18.
To make sure I understood the differences across environments, I configured the system in parallel on both a Linux Red Hat 9 machine and a Windows XP machine. Before I begin describing the steps I took to configure it, I want to thank Erik for his help in finding/building the right packages and distributions, particularly on the Linux side of things. It would have taken a lot longer without it.
So here are the steps I took to get it up and running...
The installation of Apache is pretty straightforward, both for Linux and Windows. Red Hat 9 usually includes Apache "out of the box" so there's one less step to go through. When in doubt, the Apache docs usually fill in the picture (documentation for 2.0 has improved a lot with respect to 1.3.x).
Here's where things got interesting. The last time I used Tomcat was when 4.0 was about to be released, and I had switched over to the dev tree for 4 since 3.x had serious scalability problems. There are tons of new things in the package, but the basic configuration doesn't need most of them. Installing Tomcat itself is in fact also quite straightforward (again, the docs are quite complete), but it's when you want to access it through Apache that things get a little more complicated.
Apache + Tomcat: The Fun Begins
To access Tomcat through Apache you need a connector. Tomcat has connectors for both Apache and IIS, but the problem is that apache.org doesn't include RPMs (or, in some cases, binaries). The connector that I wanted to use was JK2, but binaries for RH9 were not available (I got the Windows binaries from there though). So. I first tried downloading the package supplied at JPackage.org (which is a really handy resource for Java stuff on Linux) but after a few tries (both getting the binaries and rebuilding from the source RPMs, including having to install most of the dev stuff which still wasn't on the server) it wasn't working. The most probable reason for this was that these packages are actually being built for Fedora, not RH9.... it's amazing that Fedora hasn't officially taken over and already we've got compatibility problems. Finally Erik pointed me to this page at RPM.pbone.net where I could actually get the binary RPMs directly and install them. So far so good. Now for the configuration.
Configuring the connector is not really that complicated, and it worked on the first try. The steps are as follows ("APACHE_DIR" is Apache's installation directory, and "TOMCAT_DIR" is Tomcat's install dir):
And then access it simply with
JDBC + MySQL
Before moving on to configuring database pools on Tomcat and so on, it's a good idea to test JDBC in an isolated environment. This is easy. First, get the MySQL Control Center application to create a test user and database/table to prepare the environment. This app is quite complete, and multiplatform to boot (Erik also mentioned Navicat as an app with similar functionality but better UI for Mac OS). For this test I created a database called testdb and a single table in it, called user. I added three fields to the table: name (varchar), password (varchar) and id (int). I also created a test user (username=test, password=testpwd). Note that the user has to be allowed access from the host that you'll be running the test on, typically localhost, as well as permissions on the database that you'll be using (in this case, testdb).
Once the db is ready, you can get MySQL's JDBC driver, Connector/J, from this page. After adding it to the classpath, you should be able to both compile and run the following simple JDBC test app:
Which, when executed, should print Query result: usr1/pwd1/1.
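For reference, here is a minimal sketch of what such a test app might look like. It assumes the setup described above (database testdb, table user with name, password and id columns, user test/testpwd on localhost) and, additionally, that a row with the values usr1, pwd1 and 1 has already been inserted into the table. The class name JdbcTest and the helper formatRow are made up for illustration, and the code uses try-with-resources, so it needs a newer JDK than the ones discussed in this post. Recent Connector/J versions register themselves with DriverManager automatically; older ones needed an explicit Class.forName call for the driver class.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class JdbcTest {
    // Assumed connection settings, matching the test environment described above.
    static final String URL = "jdbc:mysql://localhost/testdb";
    static final String USER = "test";
    static final String PASSWORD = "testpwd";

    // Formats one row of the user table as name/password/id.
    static String formatRow(String name, String password, int id) {
        return "Query result: " + name + "/" + password + "/" + id;
    }

    public static void main(String[] args) throws SQLException {
        // Backticks keep MySQL from confusing the table/column names with keywords.
        String query = "SELECT `name`, `password`, `id` FROM `user`";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            // Print every row; with the single assumed row this is one line.
            while (rs.next()) {
                System.out.println(formatRow(
                        rs.getString("name"),
                        rs.getString("password"),
                        rs.getInt("id")));
            }
        }
    }
}
```

If the connection fails, the SQLException message usually points at either the host permissions for the test user or a missing driver on the classpath.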
Using JDBC/MySQL from Tomcat
Once JDBC and MySQL are up and running, we can move to the final step, namely, accessing the MySQL database through JDBC from Tomcat. For this I used the guide provided here within the Tomcat docs. For completeness (and to maintain the context of this example), the following are the steps required to set up the JNDI reference to the connection pool (managed from code built into Tomcat that uses the Apache Commons Pool library among other things):
Apache, Tomcat and MySQL are ready to go. Hope this helps, and as usual, comments/questions are welcome.
If you are in need of a remote desktop solution that is simple, small (500K to 1MB depending on server or server+client download), and "just works", check out RealVNC. It's fantastic. (Thanks Dylan for the pointer! :)).
a new kind of science - online
[via Danny] Stephen Wolfram's A New Kind of Science is online here. Good seeing it like that, but it's better to get the book (slightly expensive though, but it was worth it). I got it when it came out in 2002. Must-read. :)
angels in america
I first heard of Angels in America through this Salon review and it left me intrigued. However, knowing how slowly Irish/UK TV (we get both here in Ireland) moves to get shows, both good and bad, from across the pond, I didn't read the review and then simply forgot about it. Why bother?
But then Channel 4 showed both parts of this six-hour miniseries Saturday and yesterday. It was a surprising experience. These days it's very difficult to walk into a theater or watch a show or anything where you haven't seen previews, opinions, discussions on it, etc. But it happened to me in this case. I had no idea what I was about to see.
What I saw was a great piece of art, funny, sad, and deep, all at the same time. There are moments when the characters turn to poetry mid-conversation and it almost feels (as the Salon review says) as if they're reciting Shakespeare. Al Pacino and Meryl Streep are excellent (as usual) but the other actors are on par with them, except maybe for Mary-Louise Parker, who doesn't quite pull it off. The story ostensibly centers around five gay men and two women whose lives are intertwined one way or another at the start of the AIDS pandemic in Reagan-era US. The writing is intensely political, but it never gets preachy. Magical realism is the order of the day. But there's more than that, for example people just trying to regain their balance in a world that is undergoing a massive tectonic shift, something some of the characters can perceive but not quite put their finger on, dealing with the ghosts of your past and the shadows cast by the future... Anyway, I don't want to ruin it for someone who hasn't seen it :), but if you can, check it out.
...and the great IDE hunt
Aside from trying out the new J2SE 1.5 beta, I've been looking at IDEs: since we're now going to buy some extra licenses, I wanted to make sure we made a good choice. IDEA, a longtime favorite of mine, is sadly out of the picture for reasons unrelated to development which I'll discuss later. (The increased bloatedness of the product --there's a bazillion features in the upcoming IDEA 4.0 that mean nothing to me whatsoever-- also weighs in). Don't bother posting comments saying that I'm an idiot for ditching IDEA. I think it's one of the best IDEs out there and it's probably a good choice for many people, but there are circumstances that go beyond the IDE that made it impossible to depend on it. As I said, I'll talk about that later.
So what have I been looking at--particularly with the change to JDK 1.5 now on the horizon? Well, the first IDE I checked out was CodeGuide from Omnicore. Using CodeGuide today took me back to how I felt when I tried IDEA for the first time nearly three years ago. It is simple, small, fast, and it looks good (best looking Java IDE I've seen, in fact, better than Eclipse). Additionally, the latest CodeGuide is the first IDE with a final release (6.1) to fully support Tiger features, including an uber-cool refactoring called "Generify" which helps a lot in converting old projects to use generics. What's even better about CodeGuide is what's in the pipeline: CodeGuide 7.0 (codenamed "Amethyst") will include a new "back in time" debugger. Check out the webpage where they describe this new feature. Is that fantastic or what? It seems that Omnicore is really committed to keeping an edge on good functionality and maintaining a simple IDE while including more advanced features.
CodeGuide does have some bad points: it doesn't seem to support some of the standard keybindings on Windows (Ctrl+Insert, Shift+Insert, etc) which is not good for keyboard junkies like me, and its code generation/formatting facilities are pretty limited (among other things). Sadly, these seemingly trivial problems are pretty big when dealing day-to-day with large amounts of code, and they can easily end up being show-stoppers.
I also tried out the latest JBuilder (JBuilder X) and it's improved quite a lot over the last few revs, and is now easier to use as well. The UI designer is nice but as usual it has the terrible habit of sprinkling the code with references to Borland's own libraries (layout classes are a good example), which bloat your app without a clear advantage. Pricing is ridiculous for anything but the Foundation version though, and their focus on Enterprise features means that there are probably more control panels in it than on the main console of the Space Shuttle.
Finally, I tried NetBeans 3.6 Beta, and I have to say I was impressed (my expectations were pretty low though, having used an early version of it...). It's reasonably fast and looks pretty good, and the UI designer generates simple code which I think makes it very useful for prototyping (I don't really believe in UI designers for building the final app, but that's just me). Charles commented on the release here. It is a bit on the heavy side in terms of features and that's always a problem since I end up navigating menus with feature after feature that I don't really care about (Eclipse can also be daunting in this sense).
And what about Eclipse? Well, I'm waiting for the release of 3.0M7, due tomorrow. We'll see. :) I'll post an update with my impressions after I've tried it, with conclusions to follow.
eye of the tiger...
So the J2SE 1.5 Beta1 (codenamed Tiger) was released a few days ago. Here are some of the changes in the release, which should probably be called Java 3, since there are so many changes, both to the deployment/UI elements and the language itself (with the biggest IMO being the addition of Generics, of which I did a short eval about a year ago).
Predictably enough there has been quite a lot of coverage on weblogs of the release. Some of them: Brian Duff on the new L&Fs, Guy on Tiger Goodies, Brian McAllister on what he likes about it. Some of the conversation has centered around the new Network Transfer archive format, which brings JARs to 10% of their original size by doing compression tailored to java class format and usage. Eu does some analysis on it and Kumar talks about his experience when using it.
I installed it yesterday and played around a bit with Generics and tried the new L&Fs with the internal clevercactus b3r7. Alan has had problems with it, but I haven't seen anything like what he describes--maybe I'm immune to having multiple JDKs by now and I unconsciously route around the problems before they happen (which is a problem with designing UIs too, btw). Not that this has to stay this way. :)
My experience has been surprisingly good. Everything works as it should, and aside from a few UI glitches or weird momentary lockups it all went well. For a beta, it looks incredibly promising, and I'm really, really itching to start using Generics all over the place (btw, looking at the Collections package docs now is a bit daunting, with all the generics stuff now included).
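To give an idea of why Generics are so tempting, here's a small sketch contrasting the pre-Tiger style with the new one. The class and method names are made up for illustration; only the java.util collections are real.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GenericsSketch {
    // Pre-Tiger style: a raw List hands back Objects, so every read needs a cast,
    // and a wrong element type only shows up at runtime as a ClassCastException.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(List names) {
        int total = 0;
        for (int i = 0; i < names.size(); i++) {
            String name = (String) names.get(i);
            total += name.length();
        }
        return total;
    }

    // Tiger style: the element type is part of the signature, the compiler checks it,
    // and the enhanced for loop (also new in 1.5) removes the index and the cast.
    static int totalLength(List<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("usr1", "pwd1"));
        System.out.println(totalLengthRaw(names) + " " + totalLength(names));
    }
}
```

Both methods do the same work; the difference is that passing a list of, say, Integers to the second one is rejected at compile time.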
The new L&Fs are very, very nice. Particularly welcome is the change to the Java Look and Feel (Ocean, replacing what used to be Metal) which by 1.4x was looking not just old, but downright crappy. How good is it? Well, let me put it this way: if Ocean was available today I'd have no problems deploying it. Plus, the new Synth L&F, which supports skins, is what we've all been waiting for.
Overall: looks like Tiger is going to be the best update in years. Can't wait for the final release.
Next: looking for a new IDE.
why we do what we do
Over at Anne's weblog an interesting conversation developed, and I thought that my latest comment there merited an entry here (i.e., posting here a slightly edited version of what I wrote over there). We were talking about how we look at design, and in one of her comments Anne said:
And here's another context issue: Diego runs a business and wants to build "useful" things right now.
Anne is right about my context, but just as a clarification, the situation is actually the other way around. That is, I don't think like this because I'm building a company, rather, I'm building a company because I think like this.
A company shouldn't, can't, be an end in itself (As I've said before). A successful company (IMO) is not one that only makes money (although that's important of course) but also contributes to the life of its employees, its community and society, and does its part, to put it simply, in making things better.
What I probably didn't make quite clear (as usual :)) is that I do agree with taking a long-term view of things, doing basic research, and generally pushing boundaries. (Plus, I enjoy these things immensely). But as a history buff in general (and tech history buff in particular) I also think often of the hundreds of great ideas that have fallen by the wayside simply because they never left the lab, and not because they were "bad" ideas but because of external market factors (price, availability, compatibility, etc). I'm often frustrated with how little of all the great research actually reaches most end-users. My (possibly misplaced) personal brand of idealism (or is that pragmatism?) pushes me to try to reconcile far-out concepts with the reality of markets--which inevitably leads to compromises of one sort or another.
If I could only make small, incremental changes that make things a bit better and help simplify our lives in some way, I'd take them any day of the week. But these conversations are hugely important to me because they are a good reminder of how far we have yet to go before we definitely leave behind us this Era of Crappy Software (TM!) from which we can't seem to escape :).
Plus, it is my opinion that lots of the "new breed" of companies (particularly those that deal with blogging, search, collaboration and social software) have similar goals, implicitly if not explicitly. I don't want to name names :), but don't hide, you know who you are. Which makes me feel like less of a crackpot in saying all of this. :)
movabletype and db versions
Dylan has a great post on recovering from a database version change that left his MT weblog data inaccessible. Got me thinking about my recent brush with disaster, and the possibility of moving to mysql (I didn't know BerkeleyDB had problems with large DBs, and my weblog is well over 1,500 entries right now). Not with this server but, since I'm planning to switch servers soon maybe I'll do it then.
social software: representing relationships, part 3
it's interesting that dimensions are here thought of in terms of as static. That the space of visual representation is either 2d/3d. I was under the impression that interactive real-estate is multi-dimensional. I suppose if the design of such renderings is informed by scientific or mathematical diagrams then you are bound - to some degree - to the constraints of such formulations.
I started to reply in a comment there and I just kept typing and typing, so I came to the conclusion it'd be better to post it here...
I noted in my post that relationship patterns are actually n-dimensional, that is, I agree with Alex's comment in that sense. My reasons for looking at 2D/3D formulations are, however, less abstract than Alex implies. Plus, I'll go a bit further (since I don't think that Alex was suggesting that we should all suddenly move to n-dimensional maps) in analyzing why there is a tendency to go after linear, planar (and eventually, at most, volumetric) representations for data.
The 2D/3D representation "lock-in" that we see in UIs today actually has a solid basis in reality. Beyond the physiological limitations that our neural structure and binocular vision create, the laws of physics (as we understand them today :)) dictate that we'll never go beyond 3D visualization. Additionally our current technology dictates that it's impractical to design everything around a 3D display. (Sorry if this seems a bit too obvious, I just want to clarify in which context I'm looking at things).
From that it follows that, if we represent n-dimensional data structures, we'll have to create projections. Projections are easy stuff, mathematically speaking (i.e., they involve fairly simple vector math). Visualizing them is not too difficult either. Consider hypercubes, which are one of the easiest cases because they're fully symmetrical graphs. This is what projections of hypercubes of dimensions n > 3 into 2D look like [source]:
A 2D projection of a, say, 12D space might be pretty to look at, but I think most users would avoid that kind of complexity and its consequent cognitive overload.
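The "fairly simple vector math" can be made concrete with a short sketch. This is just one possible projection, not necessarily the one used for the images referenced above: each of the n axes is assigned a direction in the plane (evenly spaced angles, an arbitrary choice), and a vertex lands at the sum of the directions of the axes on which its coordinate is 1. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class HypercubeProjection {
    // A vertex of the n-cube is a bitmask: bit i set means coordinate i is 1.
    // Axis i is drawn in the plane at angle pi * i / n (an arbitrary, assumed choice).
    static double[] project(int vertex, int n) {
        double x = 0.0;
        double y = 0.0;
        for (int axis = 0; axis < n; axis++) {
            if ((vertex & (1 << axis)) != 0) {
                double angle = Math.PI * axis / n;
                x += Math.cos(angle);
                y += Math.sin(angle);
            }
        }
        return new double[] {x, y};
    }

    // Two vertices share an edge when they differ in exactly one coordinate.
    static List<int[]> edges(int n) {
        List<int[]> result = new ArrayList<>();
        for (int v = 0; v < (1 << n); v++) {
            for (int axis = 0; axis < n; axis++) {
                int w = v ^ (1 << axis);
                if (v < w) {
                    result.add(new int[] {v, w}); // keep each edge once
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 4; // a tesseract
        for (int[] edge : edges(n)) {
            double[] a = project(edge[0], n);
            double[] b = project(edge[1], n);
            System.out.printf("(%.3f, %.3f) -> (%.3f, %.3f)%n", a[0], a[1], b[0], b[1]);
        }
    }
}
```

The edge count grows as n * 2^(n-1), so a 12D cube already has 24,576 line segments crossing each other in the plane, which is the kind of overload I mean.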
My point (I do have a point apparently) is that if we agree that we are bound by 2D (eventually 3D) displays, and that n-dimensional projections onto 2D/3D spaces are confusing to navigate for the majority of users, then we should try to use, as much as possible, actual 2D or 3D representations, simply because they are in their "native" form and can be properly optimized for easy, useful tasks that users might have to perform. The data is "transparent"; there are no abstractions to understand to make use of it (which would be necessary for higher-order spaces).
Additionally, those diagrams (while plausible UIs) are in my view more useful as tools for visualizing what is important about the data we're trying to represent (and allow to be manipulated/analyzed). And while they might be "overused" in the research sense, they haven't been used in actual software products that much. Part of the reason is that they feel "alien" as a way of manipulating data. Products like The Brain have been around for a long time, and yet they haven't taken over the world. Clearly, it's not something that can be simply assigned to, say, a failure of marketing or whatever. People like linearity, they are more comfortable with it, and in a pure design sense the less data there is to deal with the more users can focus on their work instead of focusing on how to navigate around the damn thing. All the major interfaces today are linear: the most complicated data structures people usually deal with (in filesystems, email programs, etc) are linearized hierarchies where they can deal with one linear subspace at a time (the current folder). Yes, there are historical reasons for this, but I also think that there's a strong component of user preference in it.
So, if we could pull off a switch from linear to 2D interfaces, even if they are a bit ancient as far as research is concerned, it would be a good step forward, always with the goal of providing the most accurate representation of the complexity we know is there within the constraints we've got. After all, this is about making it easy for the majority of users, not just for people who will read something like this and not run away from the room in a panic. :)
social software: automatic relationship clustering
Regarding my post on Tuesday on social software: representing relationships, my mind kept coming back to one of the things I wrote:
People don't always agree on what the relationship means to each other. This to me points to the need to let each person define their own relationship/trust structures and then let the software mesh them seamlessly if possible.
The reason I kept thinking about it is that I didn't really explain properly what I meant by that.
So what did I have in mind when I said that? Well, lots of things :), but let's start with the basics.
The first thing that the software should be able to do is infer what groups are there, rather than be told what the groups are. With this in hand, if you simply define relationships to your friends, and you take into account their friends and how they relate to you, you should be able to create a graph of probable relationship clusters, that is, groups formed around strong interpersonal relationships. Sounds farfetched? Read on...
Well, as it happens, while I was at Drexel I did research on automatic graph clustering, applying genetic algorithms to techniques developed by my advisor at the time, Spiros Mancoridis, except back then it was applied to software systems. But what I realized a couple of days ago is that the same technique, maybe with a few mods, would work just as well here (if not better, because the graphs are smaller). The technique is described here, but to make a long story short, there's basically a set of equations that can be used to provide an objective measure of how well the clustering is done on a certain graph. The graph must define clusters to which nodes belong, along with the relationships between the nodes. The equation system favors loosely coupled clusters with dense intra-cluster relationships between the nodes (cluster coupling is determined by the edges that connect nodes across different clusters). The objective value is called the modularization quality (MQ) of the graph, which is calculated by using an inter-cluster connectivity measure and an intra-cluster connectivity measure.
To make things more concrete, let's look at the simplest type of MQ, one for a directed graph with unweighted edges. Don't panic, it's not as complex as it looks at first glance! :) The intra-cluster measure is calculated as follows:
Where Ai is the intra-connectivity measure for cluster i, Ni is the number of nodes in the cluster, and mi is the number of edges in the cluster.
The inter-connectivity measure Eij is calculated as:
These two measures are combined to calculate the MQ of the whole graph:
With k the number of clusters in the graph. (Btw, this is all much better explained in the paper, but this is good enough to get an idea of what's going on).
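For reference, here is how I'd write the three measures in LaTeX. Take it as a sketch from my reading of the paper's basic MQ rather than a quote from it: A_i, N_i, m_i and k are as defined above, while e_{ij} (the number of edges running between clusters i and j, in either direction) is a name I'm introducing here, and I'm assuming the sum over E runs once over each pair of distinct clusters.

```latex
A_i = \frac{m_i}{N_i^2}

E_{ij} =
\begin{cases}
  0 & \text{if } i = j \\
  \dfrac{e_{ij}}{2 N_i N_j} & \text{if } i \neq j
\end{cases}

MQ =
\begin{cases}
  \dfrac{1}{k} \sum_{i=1}^{k} A_i \;-\; \dfrac{1}{k(k-1)/2} \sum_{i<j} E_{ij} & \text{if } k > 1 \\
  A_1 & \text{if } k = 1
\end{cases}
```

Both terms are averages between 0 and 1, so MQ rewards clusterings whose clusters are dense inside and sparsely connected to each other.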
Now, the problem with this calculation is that it depends on a particular graph clustering, which is precisely what we want to find out, since we are assuming a set of relationships with no clustering. We have one advantage though: we know that the MQ function has the property of being higher the "better" the clustering is (according to the measures just described).
So what we need to do is treat this as an optimization problem.
There are a number of sub-optimal techniques to traverse a space of values, including hill climbing, genetic algorithms, and so on. We just need to choose one, with the caveat that the larger the space, the larger the probability that we are hitting a local maximum (rather than the overall maximum) of the space. This is not a problem for relatively small graphs (<50 nodes); at that size we can even do an exhaustive (i.e., optimal) search on the space.
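To give a feel for what "treat this as an optimization problem" means, here's a minimal Python sketch. It is not the code I used back then: it assumes the basic MQ is A_i = m_i / N_i^2 for intra-connectivity and E_ij = e_ij / (2 N_i N_j) for inter-connectivity, it simply tries every possible clustering (only practical because the graph is tiny), and the six-person relationship set is made up for illustration. All function names are hypothetical.

```python
def mq(edges, assignment):
    """Basic MQ for a directed, unweighted graph (assumed formulas, see above).

    edges: set of (src, dst) pairs; assignment: dict of node -> cluster label.
    """
    clusters = {}
    for node, label in assignment.items():
        clusters.setdefault(label, set()).add(node)
    labels = sorted(clusters)
    k = len(labels)

    # Intra-connectivity per cluster: edges inside it over N_i^2.
    intra = []
    for label in labels:
        members = clusters[label]
        m = sum(1 for a, b in edges if a in members and b in members)
        intra.append(m / len(members) ** 2)
    if k == 1:
        return intra[0]

    # Inter-connectivity per pair of clusters: edges between them over 2*N_i*N_j.
    inter = 0.0
    for x in range(k):
        for y in range(x + 1, k):
            ci, cj = clusters[labels[x]], clusters[labels[y]]
            e = sum(1 for a, b in edges
                    if (a in ci and b in cj) or (a in cj and b in ci))
            inter += e / (2 * len(ci) * len(cj))
    return sum(intra) / k - inter / (k * (k - 1) / 2)


def partitions(nodes):
    """Yield every way of splitting nodes into clusters, as node -> label dicts."""
    def grow(prefix, used):
        if len(prefix) == len(nodes):
            yield dict(zip(nodes, prefix))
            return
        # A node may join any existing cluster or open exactly one new one.
        for label in range(used + 1):
            yield from grow(prefix + [label], max(used, label + 1))
    yield from grow([], 0)


def best_clustering(nodes, edges):
    """Exhaustive search: keep the clustering with the highest MQ."""
    return max(partitions(nodes), key=lambda a: mq(edges, a))


def as_groups(assignment):
    """Turn a node -> label dict into a sorted list of sorted member lists."""
    groups = {}
    for node, label in assignment.items():
        groups.setdefault(label, []).append(node)
    return sorted(sorted(g) for g in groups.values())


# Made-up data: two tight groups joined by a single relationship.
family = ["homer", "marge", "bart"]
work = ["burns", "smithers", "lenny"]
edges = {(a, b) for group in (family, work) for a in group for b in group if a != b}
edges.add(("homer", "lenny"))
nodes = family + work

best = best_clustering(nodes, edges)
print(as_groups(best), round(mq(edges, best), 3))
```

The number of possible clusterings explodes as nodes are added, so for anything beyond a toy graph you'd swap best_clustering for hill climbing or a genetic algorithm and accept the risk of a local maximum.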
If all of this sounds a bit iffy, let me demonstrate with an example. Let's say that we've got the following relationship set from the, um, "real world" (heh):
With the relationship file ready, but no clusters defined in the file, we can now process the graph and see what the optimization process discovers as "clusters". Here is the result (graph visualized using AT&T's dotty tool--ignore the labels for each cluster, they're automatically generated IDs):
It is easy to underestimate the significance of obtaining this graph automatically, since the "clusters" that we see are for us obvious (if you've watched The Simpsons, that is :)), but keep in mind: the software has no knowledge of the actual groups, just of the relationships between nodes, i.e., individuals. Additionally, there are more complex MQ calculations that involve weights on the node relationships; using different dimensions for different target groups allows creating different clustered views based on them. It wouldn't be hard to adapt this to parse, say, FOAF files and do some pretty interesting things.
This is clearly only a first step, but once reasonable clusters are obtained the software can begin doing more interesting things, such as suggesting which people you should meet (e.g., when someone belongs to the same cluster as you but you don't know them), defining levels of warning for requests (exchanges between individuals in the same cluster would have less friction), etc.
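As a rough illustration of the "people you should meet" idea, here's a small Python sketch. It assumes the clustering step has already produced a person -> cluster mapping; the function name, the data layout and the names in the example are all made up for the occasion.

```python
def suggest_introductions(person, cluster_of, knows):
    """People in the same cluster as `person` with no direct relationship to them.

    cluster_of: dict of person -> cluster label (output of the clustering step).
    knows: set of (a, b) pairs meaning "a has defined a relationship to b".
    """
    mine = cluster_of[person]
    return sorted(
        other
        for other, label in cluster_of.items()
        if label == mine
        and other != person
        and (person, other) not in knows
        and (other, person) not in knows
    )


# Hypothetical data: ana and cleo share a cluster but only know each other via ben.
cluster_of = {"ana": 0, "ben": 0, "cleo": 0, "dov": 1}
knows = {("ana", "ben"), ("ben", "ana"), ("ben", "cleo")}

print(suggest_introductions("ana", cluster_of, knows))
```

The same lookup could feed the "levels of warning" idea: a request from someone in your own cluster would get less friction than one from outside it.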
Cool eh? :)
PS: Speaking of clustering, check this out. The clustering in this case is fairly obvious, but for more complicated sets of relationships the technique I describe would also apply in this space. [via Political Wire]
Adding to the List of Cool Things I Didn't Know About: MythTV. Sam explains the different things he's tried to get the system running. I had seen/read of other DIY PVR systems or projects, but nothing as sophisticated as what MythTV appears to be. Something else to keep an eye on.
social software: representing relationships
In all the recent talk about social software (particularly a lot of discussion generated by the release of Orkut, see Ross' Why Orkut doesn't work, Wired's Social nets not making friends, Cory's Toward a non-evil social networking service, Anne's Social Beasts, Zephoria's Venting about Orkut (many good follow-up links at the end of her post as well), David on the identity ownership issues that arise), one of the oft-mentioned points is that these tools force people to define relationships in binary fashion ("Is X your friend? Yes or no.") or along limited one-dimensional axes. Also, a lot of the talk has been attacked as mere bashing of beta services by the "digerati" (particularly in what relates to Orkut), and while there is definitely an element of hype-sickness that contributes to it (felt more by those who see new things every day), I also think that some of these concerns are valid and part of the process of figuring out how to build better software in this space.
Don had an interesting post on Sunday in which he discusses his idea of "Friendship circles" to define relationships. I think this is most definitely an improvement over current binary or one-dimensional approaches (and I think it's quite intuitive too). I do think though that relationship maps like these are often multi-dimensional. While Don's approach covers, I'd say, 80-90% of the cases, there will be overlaps where someone might belong to two or three categories, which makes it harder to place someone in a certain section of the circle (with two categories, though, you could place someone on the edge where they connect). I see a chooser of this sort as something more along the lines of a Venn diagram, as follows:
What I'm describing is thus probably more accurate for some uses (and scalable to self-defined categories, rather than predetermined ones, which would show up as additional circles) but also has more cognitive overload.
This point of "scalability" however is important I think, because it addresses the issue of fixed representation more directly. How so? Well, current "social networking" tools basically force every person in the network to adapt to whatever categories are generally common. Furthermore, they force the parties in a relationship (implicitly) to agree on what their relationship is. I think it's not uncommon that you'd see a person as being, say, an acquaintance, and that person to view you as a friend (if not a close one). People don't always agree on what the relationship means to each other. This to me points to the need to let each person define their own relationship/trust structures and then let the software mesh them seamlessly if possible.
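To make the "each person defines their own structure" point a bit more concrete, here's a tiny Python sketch of one way it could be stored: every relationship is kept per direction, so the two sides are free to disagree, and the software can find the disagreements before trying to mesh them. The layout, the labels and the function name are assumptions of mine, not how any existing tool works.

```python
def mismatched_views(views):
    """Find pairs of people who label their mutual relationship differently.

    views: dict of (from_person, to_person) -> label chosen by from_person.
    Returns (a, b, a's label for b, b's label for a) tuples, one per pair.
    """
    found = []
    for (a, b), label in sorted(views.items()):
        other = views.get((b, a))
        # Report each pair once (a < b) and only when both sides have an opinion.
        if a < b and other is not None and other != label:
            found.append((a, b, label, other))
    return found


# Hypothetical data: alice sees bob as an acquaintance, bob sees alice as a friend.
views = {
    ("alice", "bob"): "acquaintance",
    ("bob", "alice"): "friend",
    ("alice", "carol"): "friend",
    ("carol", "alice"): "friend",
}

print(mismatched_views(views))
```

Nothing here forces the two labels to be reconciled; each person's view stays theirs, and the mismatch is just extra information the software can use.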
In the end I think that a more accurate representation would be three-dimensional (okay, maybe the most accurate would be n-dimensional, but we can't draw that very well, can we? We always need transformation to 2D planes, at least until 3D displays come along :)). Something that mixes Venn diagrams with trust circles like Don describes.
Needless to say, this is but a tiny clue of a small piece of the puzzle. Whatever solutions we come up with now will be incomplete and just marginally useful, as all our theories (and consequently what we can build with them, such as software) are but a faint, inappropriate (read: linear) reflection of the complexity (read: nonlinearity) that exists in the world.
Another thing that I find interesting about the discussion is that there seems to be an implicit assumption that you'd want to expose all of this information to other people. But that's just how current tools generally work; it doesn't mean that you can't selectively expose elements of your relationships/trust circles to certain people and not others (and keep some entirely private). The problem is that this usually requires complex management, and a web interface is not well suited to that. You need rich (read: client-side) UIs, IMO (but that's just me). Client-side software also helps with privacy issues.
We have lots to figure out in this area yet, but we're getting there, inch-by-inch. (Or should it be byte-by-byte? :))
web application stress testing
Was thinking about this topic today, and I remembered a few years back I used Microsoft's Web Application Stress Tool. It did the job (simple stuff, nothing terribly complicated), and it was free, if sometimes a little difficult to use properly. Apparently it's not maintained anymore, since the listed version is still compatible only with W2K.
Now, I was wondering: are there any apps out there that people really like for this job, on any platform? What do you use/recommend for web app stress testing?
Just found faifzilla.org, home to Free as in Freedom: Richard Stallman's Crusade for Free Software, a biography of sorts of Richard Stallman. Read bits and pieces of it, very, very interesting. And: an essay on the online book by Eric Raymond (linked from the main site).
The pluses of random web navigation... :)
mt-rebuild: rebuilding movabletype from the command line
My attempt yesterday at doing a full rebuild ended in pathetic failure, as the normal load on the machine plus the Rebuild process meant that the page never got to the second stage. This was clearly a problem with timeouts on the web browser (through which MT is 100% controlled) because of the speed at which the process happened, rather than the process itself. So I spent some time today looking for a way to manage MovableType from the command line. I had tried this a few weeks ago but didn't get anywhere. This time I had more luck and I quickly found Timothy's excellent mt-rebuild: The rebuild script to end all rebuild scripts, which solved my problem (it did take a few hours to do a full rebuild though, which has nothing to do with the script and everything to do with the machine's load and speed) with a simple command of the form "mt-rebuild.pl -mode="all" -blog_id=xx". Only comment I'd have is that it doesn't seem to have a switch to provide feedback, so you don't know what's going on, but so what, it's not as if it's a consumer application or anything.
Yes, this is old hat (release date was almost a year ago) but I missed it when it came out and we know how it is with the web and its tendency to bury yesterday's news under a new avalanche of discussion, comments, posts, news, and other interesting stuff :).
This is exactly what I needed, thanks Timothy for making it available!! His other MT plugins are pretty cool too, including mt-publish-on, which I'll check out when I have the time, since I've talked about something like it before.
the 2.6 linux kernel
[via Jon] a great article at InfoWorld comparing versions 2.4 and 2.6 of the Linux kernel. Upsides of the new kernel: speed and scalability. Downsides: Not much support for drivers, etc. The benchmark results are really impressive. I guess that it was worth the wait then. :)
my wired | tired | expired
Since I thought the latest wired | tired | expired (which I linked to in the previous entry) was pretty lame, I decided to write my own. :) Here it goes.
Yes, the !wired reads NOTwired.
I went to sleep late today as usual but I had this idea that I'd actually have a decent sleep this time, "like eight hours or something like that" (I thought). Well, my biorhythm thought otherwise, and I woke up two hours later and immediately was wide awake. So I got up and started working. What else could I do? (Don't say "go back to sleep" :)). The plus is, I've been listening to Rachmaninoff (and some Beethoven) which I can't do while asleep, not enjoying it consciously at least. :)
And, yes, the title is in reference to this. (But you knew that, didn't you).
blogtip: pinging technorati and yahoo
A small tip, probably not new to most, but anyway: It is common to have weblog tools "ping" a change-server such as weblogs.com. This is used by blog-oriented search engines to both find your blog and provide faster updates. MovableType includes built-in "ping" support for weblogs.com and blo.gs. However, you can also add your own. Jeremy recently posted how to do it for Yahoo! (very useful now that My Yahoo! supports RSS) and you can do it for Technorati as well, using the information on this page.
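For the curious, a "ping" of this kind is typically just a small XML-RPC call. Here's a minimal Python sketch of what goes over the wire, assuming the usual weblogUpdates.ping method that takes the weblog's name and URL. The function names are mine, and the endpoint is left as a parameter because each service publishes its own ping URL, so check its documentation rather than trusting anything hardcoded.

```python
import xmlrpc.client


def build_ping_request(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to `endpoint` (the service's ping URL) and return its reply.

    Not called in this sketch, since it needs network access and a real endpoint.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Hypothetical blog name and URL, just to show the request that would be sent.
print(build_ping_request("my weblog", "http://example.com/blog/"))
```

In MovableType you don't need any of this, of course: adding the service's ping URL to the weblog configuration is enough.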
When I have time over the next few weeks I'll post a follow up to my introduction to weblogs and introduction to syndication, which have turned out to be quite popular. Sounds like a good idea to write down incrementally which of the more "advanced" topics would be in it. :)
what went wrong?
One year after Colin Powell's presentation to the UN prior to the war, the New York Times revisits the claims Powell made and how they hold up to what has actually been found so far, going all the way from the "imminent danger of WMD" to the newly en-vogue doublespeak phrase "weapons of mass destruction program-related activities." On this side of the Atlantic, the Guardian has some news in relation to all this just as it seems that an investigation will be launched in the US to look at what went wrong with the assessment of the US intelligence community (predictably enough, the results would be known after the US elections, and the panel would study problems in other areas too, such as Iran or North Korea, presumably to avoid admitting that Iraq was the biggest failure of all, with the most serious consequences). About Iraq, I remember that others, including the French, Russians, and Germans, agreed with many US and UK intelligence estimates in this regard (although they didn't read them in such alarming terms). I think that this will be a wake-up call to all intelligence agencies and governments. Obviously Cold-War-style intelligence gathering doesn't quite work anymore... but what will take its place?
microsoft and google
Still catching up on some of yesterday's articles that I left open for reading later (does it show?). From the New York Times comes the shocking (shocking I tellsya!) revelation that Microsoft is taking on Google. Seriously though, quote:
"We took an approach that I now realize was wrong," [Bill Gates] said of his company's earlier decision to ignore the search market. But, he added pointedly, "we will catch them."
"We will catch them." Simple and to the point, don't you think? The comparisons with Netscape are the order of the day of course. Yahoo! gets a short mention (less than what it deserves IMO; after all, they are probably the one company aside from MS that has the technology reach and depth in the area to be a big factor, as they are in fact today--AOL doesn't quite have the tech know-how to make the list, even if they have the millions of users). Still, there's a few other interesting tidbits of information in the article that make it worth reading.
An article in the Washington Post on nanotechnology. A good read, even if (as usual) compressing topics like these to a few pages invariably creates some oversimplifications. Reminded me of this too.
and before I forget...
... another thing. On Friday I mentioned, among other clevercactus news, that we are hiring. I've gotten some comments on that, specifically that a) there is no info on what positions we're hiring for, and b) the website is not very up-to-date. Both true. I said in that entry that I would add more in the coming days but I wanted to clear this up now: we'll be making changes over the next week to the information on the site, etc. That is, we haven't posted any info yet on that. In the meantime, if you'd like to know more just send us an email.
I spent a couple of hours today updating the templates and doing some design changes: colors, location of elements on the page, link updates, things of that nature (a refresh or hard-refresh on the page might be necessary to get the new stylesheet). I've removed the calendar, since it seemed that it was taking up space more than doing anything useful (I can't remember the last time I clicked on it, I generally just search for whatever I want to find). This seemed reasonable to do, but I'd be interested in hearing other opinions on the matter :). I linked the headings for each day to the content posted for that day, both on the main page and the category index. Finally, I changed the category name "spaces" to "clevercactus", to make it more relevant. At the moment I've just launched a full rebuild which will take a while to complete--in the meantime pages will be in flux.
The image (that now appears at the top-left of the blog) is of a Julia set. Julia sets are generated by quadratic maps ("quadratic" because they are based on quadratic functions) of the form z^2+C, where C is a complex number and the function is applied recursively. Since you can use any complex number for C, there are an infinite number of Julia sets. Now, wouldn't it be nice to have time to spend a few days writing code to generate a new fractal automatically every week or something. :)
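Not the weekly fractal generator, but here's a minimal Python sketch of the iteration itself, drawn in ASCII. It uses the standard escape-time approach: a starting point z counts as part of the (filled) set if repeatedly applying z^2+C hasn't pushed it beyond radius 2 after a fixed number of steps. The value of C, the viewing window and the iteration cap are arbitrary choices of mine, and this is not how the image on the blog was made.

```python
def escape_time(z, c, max_iter=50):
    """Count applications of z -> z*z + c before |z| exceeds 2 (capped at max_iter)."""
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(c, width=60, height=24, max_iter=50):
    """Return an ASCII picture: '#' where the starting point never escaped."""
    rows = []
    for row in range(height):
        y = 1.2 - 2.4 * row / (height - 1)      # imaginary part, top to bottom
        line = ""
        for col in range(width):
            x = -1.6 + 3.2 * col / (width - 1)  # real part, left to right
            inside = escape_time(complex(x, y), c, max_iter) == max_iter
            line += "#" if inside else "."
        rows.append(line)
    return "\n".join(rows)


# One arbitrary choice of C; changing it gives a different Julia set.
print(render(complex(-0.8, 0.156)))
```

Picking a different C every week would be the easy part; picking one that actually looks good is where the few days would go.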
Okay, back to the real world.
Copyright © Diego Doval 2002-2011.