Now blogging at diego's weblog. See you over there!

conditional HTTP GETs and compressed streams in Java

Just posted to my O'Reilly Weblog: Optimizing HTTP downloads in Java through conditional GET and compressed streams.

PS: kind of a longish title isn't it? :)
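The article isn't reproduced here, but the two techniques in its title can be sketched with the standard java.net.HttpURLConnection API. This is a minimal sketch, not the article's code. ConditionalFetcher and its methods are hypothetical names, and the caller is assumed to have saved the Last-Modified time and ETag from the previous download.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper sketching conditional GET + gzip; not the article's code.
class ConditionalFetcher {

    /** Configures (but does not open) a conditional GET that also accepts gzip. */
    static HttpURLConnection prepare(URL url, long lastModified, String etag) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (lastModified > 0) {
            conn.setIfModifiedSince(lastModified); // sent as If-Modified-Since
        }
        if (etag != null) {
            conn.setRequestProperty("If-None-Match", etag);
        }
        conn.setRequestProperty("Accept-Encoding", "gzip");
        return conn;
    }

    /** Unwraps the body only if the server says it actually gzipped it. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    /** Returns null on 304 Not Modified, otherwise the (decompressed) body. */
    static InputStream fetch(URL url, long lastModified, String etag) throws IOException {
        HttpURLConnection conn = prepare(url, lastModified, etag);
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            conn.disconnect();
            return null; // the cached copy is still current; no body was sent
        }
        // A real client would also save conn.getLastModified() and
        // conn.getHeaderField("ETag") for the next request.
        return decode(conn.getInputStream(), conn.getContentEncoding());
    }
}
```

A 304 response carries no body at all, which is where most of the savings come from when polling feeds. Gzip helps on the downloads that do happen.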

Posted by diego on July 21, 2004 at 11:37 AM

homer on small-scale media

"See, Lisa? Instead of one big shot controlling all the media, now there's a thousand freaks xeroxing their worthless opinions."

From ep. 22 season 15, "Fraudcast News"

Posted by diego on July 20, 2004 at 10:23 PM

clevercactus pro beta is back!

<victory-dance>With this post on the cactuslog, clevercactus pro is officially re-released. And within the timeframe I mentioned last week, no less!</victory-dance>

There's still much to do, of course. There are some known problems with view updates under some circumstances, and updating Atom support to 0.3 has brought new challenges: a few Atom 0.3 feeds confuse the parser by including un-escaped XHTML within the content element. They are labeled as xhtml+xml, yes, so it's the parser that needs refining.
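For the Atom 0.3 case, the refinement boils down to not treating inline XHTML as plain text. StAX's getElementText() throws an XMLStreamException as soon as it meets a child element, so the markup inside content has to be walked and re-serialized instead. Here is a rough sketch using javax.xml.stream. It is not clevercactus's parser, AtomContentReader is a hypothetical name, and to keep it short it drops namespace prefixes and declarations and does not handle base64 mode.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch; namespaces are ignored and base64 mode isn't handled.
class AtomContentReader {

    /** Reads the first <content> element found in the document. */
    static String firstContent(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "content".equals(r.getLocalName())) {
                return readContent(r);
            }
        }
        return null;
    }

    /** Expects r to be positioned on the START_ELEMENT of <content>. */
    static String readContent(XMLStreamReader r) throws XMLStreamException {
        String type = r.getAttributeValue(null, "type");
        String mode = r.getAttributeValue(null, "mode");
        boolean inlineXhtml = type != null && type.endsWith("xhtml+xml")
                && !"escaped".equals(mode);
        // getElementText() fails on child elements, so inline XHTML is re-serialized.
        return inlineXhtml ? innerMarkup(r) : r.getElementText();
    }

    private static String innerMarkup(XMLStreamReader r) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int depth = 1; // we start inside <content>
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    depth++;
                    sb.append('<').append(r.getLocalName()); // prefixes dropped
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        sb.append(' ').append(r.getAttributeLocalName(i)).append("=\"")
                          .append(escape(r.getAttributeValue(i))).append('"');
                    }
                    sb.append('>');
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (--depth == 0) {
                        return sb.toString(); // reached </content>
                    }
                    sb.append("</").append(r.getLocalName()).append('>');
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    sb.append(escape(r.getText()));
                    break;
                default:
                    break; // comments, processing instructions, etc. are skipped
            }
        }
        throw new XMLStreamException("document ended inside <content>");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Escaped content (mode="escaped") still goes through getElementText(), which hands back the unescaped HTML string.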

Now my focus goes back, for the most part, to share. There are some bugs to fix, improvements to make, and some important features to add (among them, exposing support for multiple locations for a single identity).

But for now--a bit of rest. Tomorrow I have to catch up with email and comments that were posted yesterday, particularly to the JList and StAX posts from a few days ago.


Categories: clevercactus
Posted by diego on July 16, 2004 at 11:11 PM

feedster v2

Yesterday Scott announced Feedster version 2. Excellent. It's now much faster and has several new features. Congrats to Scott & the Feedster team!

PS: by the way, I have a question: what does the version="2.0/XSS-extensible" mean in the feedster blog RSS 2.0 feed? I was recently doing a small study of feed types and out of 20,000 this is the only feed that identifies itself as that. Bug? Feature? Just wondering :).
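A survey of feed types like that only needs the root element and its version attribute, so a pull parser can stop after the first start tag. Here is a minimal sketch with javax.xml.stream. FeedSniffer, the "root/version" label format and the "(unparseable)" bucket are made up for illustration and are not how the study was actually done.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for a feed-type survey.
class FeedSniffer {

    /** Returns "root/version" for the document element, e.g. "rss/2.0" or "feed/0.3". */
    static String describe(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    // Stop at the root element: that's all a type survey needs.
                    String version = r.getAttributeValue(null, "version");
                    return r.getLocalName() + "/" + (version == null ? "(none)" : version);
                }
            }
            return "(no root element)";
        } finally {
            r.close();
        }
    }

    /** Counts how many documents identify themselves each way. */
    static Map<String, Integer> tally(List<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : documents) {
            String key;
            try {
                key = describe(doc);
            } catch (XMLStreamException e) {
                key = "(unparseable)";
            }
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```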

Categories: technology
Posted by diego on July 15, 2004 at 7:38 PM

when task manager makes you smile...

... you probably need to get a life. But that's a topic for another day.

I was just doing the final dogfood testing of the upcoming release of clevercactus pro on my own data, and it made me happy enough that I decided to post about it.

Btw, by "dogfood testing" I mean passing the "eat your own dogfood" test, which stands for completely switching over to (and relying on) the shiny new widget that you're about to unleash on the world.

One of the major changes in the new release of pro is in performance and memory usage (which are closely related in a memory-managed environment). Previous releases of pro, while efficient, started to become a burden once the database reached a certain size. In my case, with about 15,000 items in my mail store, pro regularly used up all of the memory available to the VM. It did work when the available memory was constrained, but the point is not to keep working past the limit; it's never to reach the limit at all. So for the last few days I've concentrated on further improving that aspect of it.

While I have been using the app myself for a while now --months actually--, I've relied on the previous version as well, just in case. Finally, today, I flipped the switch for good and ceremoniously deleted the previous version from my system. All that remained was double-checking the effect of the upgrades, including the most recent changes.

Lo and behold, when I loaded the database and checked memory usage, the app was holding at around 15 MB for regular usage and peaking at around 50 MB when I pushed it hard enough. Over time, usage dropped back down to 15 MB. And when minimized, which eliminates the memory needs of the display, etc., usage dropped to literally nothing: 2 MB.
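Task Manager reports memory for the whole process (the JVM itself, native libraries and all), so its figures won't match what the Java heap reports. For a rough in-app view of the heap, java.lang.Runtime is enough. HeapProbe is a hypothetical helper, not something from clevercactus. Since System.gc() is only a hint the VM is free to ignore, treat the numbers as approximate.

```java
// Hypothetical helper: a rough in-VM view of heap usage.
class HeapProbe {
    static final long MB = 1024L * 1024L;

    /** Bytes of heap currently in use (memory the VM has claimed minus what's free). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Used heap in whole megabytes, after asking (not forcing) a collection. */
    static long usedMegabytesAfterGc() {
        System.gc(); // just a hint; the VM may ignore it
        return usedBytes() / MB;
    }

    /** One-line summary, e.g. for a debug log. */
    static String report() {
        return "heap used: " + usedMegabytesAfterGc() + " MB (max "
                + Runtime.getRuntime().maxMemory() / MB + " MB)";
    }
}
```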

Much work remains, and while you might argue (rightly so) that this only brings the app to par with others, it's still a small rush when it's your code that is doing it.

Yeah, I'm happy about the new RSS/Atom parser, and the many other improvements in it, but this was a long-standing goal that I'm glad to finally cross off the list.

Anyway, I'm itching to get this rev out and concentrate back on share. Now for a bit of rest--more tomorrow!

Categories: clevercactus
Posted by diego on July 13, 2004 at 11:10 PM

and now it's StAX's turn

It seems I need some bug spray or something...

I might be wrong (corrections and comments most welcome!) but I think I've found a bug in StAX 1.0.

The bug is as follows: when parsing an element of the form:

Which should return
when calling getElementText() (or when parsing based on CHARACTER event types) StAX actually returns:
To prove it, I wrote this small test program that uses both StAX and kXML 2 (which implements the Common XML Pull Parsing API) to parse the same XML document (included in the program as a String, and read through a StringReader).
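The original test program and the exact XML from the post didn't survive. Here is a StAX-only sketch of that kind of check, using the JDK's built-in javax.xml.stream implementation rather than the BEA StAX 1.0 release the post is about, and leaving out kXML. The sample element is hypothetical; it simply ends in an escaped &gt;, the character the post says disappears. The sketch reads the text both ways the post mentions. Concatenating CHARACTERS events matters because a non-coalescing StAX parser may split a single text node into several chunks.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical check in the spirit of the post's test program (StAX side only).
class StaxTextCheck {

    static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    /** Text of the first element, read with getElementText(). */
    static String viaGetElementText(String xml) throws XMLStreamException {
        XMLStreamReader r = open(xml);
        while (r.hasNext() && r.next() != XMLStreamConstants.START_ELEMENT) {
            // skip the prolog
        }
        return r.getElementText();
    }

    /** Same text, rebuilt by concatenating every CHARACTERS event. */
    static String viaCharacterEvents(String xml) throws XMLStreamException {
        XMLStreamReader r = open(xml);
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                text.append(r.getText()); // one text node may arrive in several chunks
            }
        }
        return text.toString();
    }
}
```

With a correct parser, both methods return the same string for a sample such as `<description>a &lt;b&gt; tag &gt;</description>`, trailing ">" included.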

This bug is a deal-breaker for my use of StAX, and what's much worse is that I have no way of looking at the code to fix it (and yes, I've tried parsing after that element to see if there's more text, but it seems that StAX is just making the "&gt;" at the end disappear). Yes, StAX was supposed to be hosted at codehaus now, but when I go to the site there's nothing there in the way of sources, the JSRs don't include sources (they reference private BEA packages) and there is no indication of when or where this might change.

So I guess I'll have to switch everything to one of the other parsers, such as kXML. Oh, well.

Posted by diego on July 12, 2004 at 6:48 PM

a JList "feature"

Yesterday I was doing performance testing on pro when I hit upon something strange: deleting an email from the mail view was taking too long.

What was even stranger was that this only happened when using the Delete key, and not when using the toolbar icon (strange because the process is essentially the same in both cases). This pointed to something with the keys, but what?

After a bit of debugging I discovered that the slowdown was happening on the repaint thread, which made it more complicated to debug since it happened outside of my control, so to speak. More testing.

Backtracking for a second: JLists can be used in "dynamic" mode: when you set the appropriate parameters the list will only load the items that are visible in the viewport, which is critical when there's access to disk-based data involved. So generally any kind of list activity implies only a few hits on the DB, say 10-20, which happens in a few milliseconds.
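The post doesn't say which parameters it means. A likely candidate (my assumption) is giving the list a fixed cell height and width, either directly or through setPrototypeCellValue(), so that the list UI doesn't have to call getElementAt() on every row just to measure it. Here is a minimal sketch in which a hypothetical LazyMessageModel stands in for the disk-backed store. Its fetches counter plays the role of a print statement in getElementAt(), and the cell sizes are illustrative.

```java
import javax.swing.AbstractListModel;
import javax.swing.JList;

// Hypothetical stand-in for a disk-backed list model.
class LazyMessageModel extends AbstractListModel<String> {
    private final int size;
    int fetches = 0; // rows "loaded from the DB" so far

    LazyMessageModel(int size) {
        this.size = size;
    }

    @Override
    public int getSize() {
        return size;
    }

    @Override
    public String getElementAt(int index) {
        fetches++; // a real model would hit the (cached) store here
        return "message #" + index;
    }

    /** Fixed cell sizes let the list UI lay out rows without fetching each one. */
    static JList<String> createList(LazyMessageModel model) {
        JList<String> list = new JList<>(model);
        list.setFixedCellHeight(18);  // illustrative values
        list.setFixedCellWidth(400);
        return list;
    }
}
```

Without the two fixed sizes, the basic list UI measures every row through the renderer when it computes its layout, which for a disk-backed model means loading everything.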

So I eventually placed a print statement in the ListModel's getElementAt method and discovered the problem: when the Delete key was used to delete an item, the entire contents of the ListModel were scanned, twice. On a typical email view this meant loading thousands of objects from the DB. Caching was kicking in, of course, but memory usage went through the roof. On top of that, the time it was taking was unacceptable. Not good.

But why was this happening? I checked and double-checked the code and couldn't find anything that could possibly trigger a full reload of the list (twice!). Finally, I started checking the call traces on getElementAt and discovered that an inner class called KeyHandler inside BasicListUI (which implements the underlying UI for JList) was doing something strange in its keyPressed method.

So what was it doing? The javadocs for that method say the following:

Moves the keyboard focus to the first element whose first letter matches the alphanumeric key pressed by the user. Subsequent same key presses move the keyboard focus to the next object that starts with the same letter.

When I read this, all I could think of was: Oh. My. God.

The problem, of course, is that to determine whether "the next object starts with the same letter" it needs to obtain the String for each cell. That means using the value directly if it's a String, or calling toString() on it otherwise.

Setting aside for a moment why on earth you'd want complex behavior like this pre-built: what if you're using the JList for arbitrary data that doesn't translate into a String? What if toString() is meaningless? Then the listener iterates through the entire list and, of course, fails to lock on to what it was looking for. And when the list is in the process of changing, it does it twice.
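As far as I can tell, in JDK 1.4 and later the search the key handler performs is exposed as the public JList.getNextMatch(String prefix, int startIndex, Position.Bias bias). That method walks the model from startIndex, calling getElementAt() (and toString() on anything that isn't a String) on each row until one starts with the prefix. The sketch below uses hypothetical names and rows whose toString() never starts with a letter. It counts how many rows a single failed search touches.

```java
import javax.swing.AbstractListModel;
import javax.swing.JList;
import javax.swing.text.Position;

// Hypothetical demo of what one failed type-ahead search costs.
class TypeAheadCost {

    /** A row whose toString() is useless for type-ahead. */
    static final class OpaqueRow {
        final int id;
        OpaqueRow(int id) { this.id = id; }
        @Override public String toString() { return "@" + id; } // never starts with a letter
    }

    /** Model that counts how many rows were requested. */
    static final class CountingModel extends AbstractListModel<OpaqueRow> {
        final int size;
        int fetches = 0;
        CountingModel(int size) { this.size = size; }
        @Override public int getSize() { return size; }
        @Override public OpaqueRow getElementAt(int i) { fetches++; return new OpaqueRow(i); }
    }

    /** Runs one prefix search and reports how many rows it had to fetch. */
    static int rowsTouched(int size, String prefix) {
        CountingModel model = new CountingModel(size);
        JList<OpaqueRow> list = new JList<>(model);
        model.fetches = 0; // ignore anything the UI did during setup
        int match = list.getNextMatch(prefix, 0, Position.Bias.Forward);
        if (match != -1) {
            throw new IllegalStateException("unexpected match at " + match);
        }
        return model.fetches;
    }
}
```

For a list of 5,000 such rows, rowsTouched(5000, "a") visits every one of them: one keystroke, a full scan of the model.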

Oh yeah, it's a great "feature."

Even worse, the keyPressed call doesn't check whether an alphanumeric key was actually pressed. Any key that isn't a function key, a cursor key, or a key registered through KeyStrokes (such as Page Up) goes through this loop.

Even if you're not hitting this problem head on as I was, it's strange to think that people that need this behavior would rely on a completely generic implementation that doesn't take into account the underlying storage mechanism.

So how to disable it? Since this "really helpful" listener is registered automatically on any list created, the only way I've found of disabling it is removing it "by hand" right after the list is created, as follows:

import java.awt.event.KeyListener;
import javax.swing.JList;

JList list = new JList();
KeyListener[] keyListeners = list.getKeyListeners(); // includes BasicListUI's KeyHandler
for (int i = 0; i < keyListeners.length; i++) {
    list.removeKeyListener(keyListeners[i]);
}

By the way, this is the only listener that is pre-registered (I checked), so doing the loop actually just removes the listener in question. It's ugly, but it works.
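There is one caveat with removing the listener once. The UI delegate installs its listeners whenever the list's UI is (re)installed, for instance when the look and feel changes and SwingUtilities.updateComponentTreeUI() calls updateUI(), so the listener comes back. One way around that is to override updateUI() in a JList subclass and strip the listeners right after the new UI is installed. The sketch below assumes that it's acceptable to lose every key listener on the list. On newer JDKs the type-ahead handler may not be the only one, so this is as blunt as the loop above. NoTypeAheadList is a hypothetical name. Add any key listeners of your own after construction, and re-add them after a look-and-feel change.

```java
import java.awt.event.KeyListener;
import javax.swing.JList;
import javax.swing.ListModel;

// Hypothetical JList that re-strips key listeners every time its UI is (re)installed.
class NoTypeAheadList<E> extends JList<E> {

    NoTypeAheadList(ListModel<E> model) {
        super(model); // JList's constructor calls updateUI(), so the override runs here too
    }

    @Override
    public void updateUI() {
        super.updateUI(); // installs a fresh ListUI, which registers its key listener again
        for (KeyListener listener : getKeyListeners()) { // getKeyListeners() returns a copy
            removeKeyListener(listener);
        }
    }
}
```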

This leaves me wondering what other "features" are lurking in there. It's pretty bad that something like this is set as default behavior with no warning; on the other hand, it's good that once I identified the problem I could fix it relatively easily, even if the solution is less than ideal. In any case, it was an interesting (if at times maddening) trip into the deep core of Swing.

So if you're wondering why key presses are slowing down your JList implementation, it's quite possible this is the reason why.

Posted by diego on July 11, 2004 at 6:01 PM

comment spam, cont'd

On my previous post on comment spam Phil made a good point, that since I generally close comment threads once spam appears (because I can't afford to be removing comments and rebuilding too often) it makes for a Godwin's Law sort-of-thing, where someone could essentially "engineer" the closing of a thread by posting spam.

I hadn't thought of that! I can say though, that generally when I get comment spam posted I also ban the originating IP, so whoever did it will have problems posting in the future (and the IP should also let me see if a spam looks iffy by being the same as someone who just posted a comment). For people with dynamic IPs (most of us?) it won't necessarily work, and someone could re-login (to their DSL, etc) purposefully to try to change IPs... I guess it's not a perfect solution, but it's good enough under the circumstances...

Categories: technology
Posted by diego on July 11, 2004 at 4:26 PM

why we dropped java web start

In a question on the clevercactus forums, John asked why we stopped using Java Web Start to distribute share (and, in fact, other applications). We started using JWS late last year to simplify deployment and updates. Making sure that updates in particular didn't require a reinstall was critical for us, since we usually spin out new versions fairly quickly during betas, so JWS was good in that sense. But as soon as we released the "internal" beta of share, it became obvious that JWS was a huge problem for most users. We want share to be easy to use, and that includes easy to install (and, yes, uninstall if necessary), and JWS was getting in the way in more ways than one. Here's why.

Problem one was installation: we had to detect whether people had JWS installed or not, and the procedure for detecting this is quite simply a joke. You need to code for multiple browsers and in some cases it isn't guaranteed to work.

Now, if you didn't have JWS (and skipping over the problem of detection), you had to install the virtual machine yourself, usually by being redirected to the Java site. The site, while no doubt a good showcase for the Java brand, is terribly confusing to users who only want to install share, not Java; most of them have no idea what Java is and, more importantly, they shouldn't have to care.

Now let's skip over the problem of it being utterly confusing (you'll note we're going to be doing a lot of "skipping over" in this discussion) and say we hosted the JVM ourselves to simplify the process. Users still had to install a separate piece of software, and one that puts all sorts of icons on the desktop and in the programs folder, which again terribly confuses non-technical users.

So let's skip over all of that (see what I meant?) and say that detection does work and JWS is installed. The user is often presented with a dialog asking to run a file of type "JNLP", which makes little sense to most people. Now say that you ignore that and click "OK". The app runs. You are presented with a horrible warning dialog with an unfamiliar look and feel (the Metal L&F; why Sun doesn't use the native L&F for this is beyond me) that, for someone who's never seen one, sounds like you are about to give something permission to do all sorts of evil things on your PC, create havoc, and possibly come into your house at night while you're sleeping and steal all your furniture.

The point is not that the security warning is wrong, since it is accurate, but that users are put off by it. They already know that they are downloading an application, they do so every day for other things, and native applications have as much ability (in theory) to do Bad Things as a JWS app with full privileges. The warning is confusing because it seems to be something extra that they're giving permission for. (Incidentally, I don't fault JWS on this point; users should be warned. The problem is that the warning is completely different from the ones they usually see when downloading applications.)

Much worse than this was when users ran into the invalid Java certificate problem, which didn't allow them to install the application at all even if they wanted to.

But let's say that they accept the certificate (yes, the skipping over again). The app runs. Bing! Window locked. Why? Because JWS is asking you to create shortcuts to the app. It was very easy to get lost here because the dialog for this was modal, and sometimes it would end up behind the splash screen or the app, which left most users stuck. (Yes, this bug has probably been fixed in the latest JREs, but how many broken JREs are out there?)

All of this, and I haven't even mentioned the problems that occur when you switch download locations for the JARs, for example to deploy on a server with more bandwidth. Or the problems that come from detecting that JWS exists but is an incompatible version (and you can't tell). Or the problems that come up when JWS has cached the old JNLP file and won't let go. Or the fact that you've only got limited resources and it's impossible to test against all the JREs that are out there. Or... well, you get the idea.

As a development environment, JWS has significant shortcomings as well, since it completely insulates you from the native platform, which is good, but it gives you almost no way to access the native environment, which is a disaster. Simple platform-integration things, like launch-on-startup, or right-click integration, become nearly impossible. Btw, this isn't getting better in the next release, as Erik noted recently. Aside from cosmetic changes and a few new features (like the ability to make changes to the launcher UI) Java 5.0 is pretty much the same in terms of JWS and other platform integration features.

I think that JWS has its place in tightly controlled environments, for example, to quickly deploy point applications within a corporation, where the target machines are well known (in terms of software installed), the infrastructure is small, etc. In those situations, JWS is an excellent way of quickly and easily deploying your app to users without having to deal with creating installers and such.

But for end-user applications that have to be easy to use and install and widely deployed, JWS, in my experience, doesn't quite cut it. Even though we put a lot of effort into making it work, that's why we had to stop using it.

Posted by diego on July 9, 2004 at 2:26 PM


from the me-too-dept. :)

Don talks about his favorite wargames. While I am always interested in the technology involved in games (which almost invariably involves the most cutting-edge software development of the day), I am not a gaming fan. Or, rather, make that a PC gaming fan: for years I played RPGs (Rolemaster was my favorite) along with strategy and tactical games (such as Warhammer 40,000).

On the PC I played a couple of games extensively (including, of course, Doom), but the PC games I keep coming back to are those of the Command and Conquer series. The last one I've played is Command and Conquer: Generals. The online play is good, but unless you have friends online who are willing to play for fun instead of to advance in the rankings, online battles end up being short-lived affairs where the focus is on "rushing" your opponent before they rush you, which is entertaining for about 30 seconds.

There's a big difference, of course, between "fantasy" wargames such as C&C and the ones Don mentions, mainly in terms of logistics and resource management, which is the biggest tradeoff games like C&C have to make, i.e., sacrificing realism for "playability". In the real world, supplies and logistics are as crucial as anything you can do on a battlefield, and historically the stretching of supply lines has played a major role in defeats or changes in strategy. (In fact, if I remember correctly, in the Iraq war last year the forward units of the US Army advanced so far so fast that they outpaced their supply lines, leaving the convoys to cross long distances without enough protection, which created the well-known security problems they experienced.)

Anyway, if you like wargames and have never tried C&C, give it a shot (heh). (There is even an OS X version available, but I've tried the demo and it was quite slow, even on my G5 with 1 GB of RAM, so I wouldn't recommend it.) I haven't played C&C: Generals for months now, but if you've played and would like to meet up online, let me know :) No rushing though! :-)

Posted by diego on July 9, 2004 at 1:26 PM

comment spam

BTW, comment spam has been quite a problem recently, which explains the number of posts with comments closed. Basically, whenever a spam comment is posted I close the comments for that entry, hence the seemingly random mix of comments-on and comments-off posts. Normally I'd just delete them and be done with it, but Movable Type on my poor Celeron 700 server is dog slow, and rebuilding an entry to remove a comment takes about a minute (I'm not exaggerating). Part of the problem is probably that I'm inching toward 2,000 posts, which (maybe) makes MT slower, but in any case I can't be removing spam comments every five seconds, and that forces me to close the thread. Sorry about that.

Categories: technology
Posted by diego on July 9, 2004 at 1:02 PM

burnout? no, just busy

So yesterday, as I was pondering why I hadn't posted anything in a few days, I read this Wired article on blogger burnout. Although I've experienced a lack of blogflow before, this time it was something different: I was just too absorbed in what I was doing to do anything else.

So what was I doing? Simple: working on a new release of clevercactus pro. Not that share is taking a back-seat or anything, mind you, this is something that we had planned for a while and finally there was time to do it. (The release will be out sometime next week).

Anyway--I have had a couple of posts swirling in my head for a couple of days now, so I'll get to that now. :)

Categories: clevercactus, personal, technology
Posted by diego on July 9, 2004 at 12:54 PM

music for the ages

CNN: only 636 years left on longest concert:

In an abandoned church in the German town of Halberstadt, the world's longest concert was coming two notes closer to its end Monday: Three years down, 636 to go.


The concert began Sept. 5, 2001 -- the day Cage would have turned 89. The composition, originally written to last 20 minutes, starts with a silence, and the only sound for a first 1 1/2 years was air. The first notes were played in February 2003.

After debates in Germany about what exactly "as slow as possible" could mean -- anywhere from a day to stretching on infinitely -- the group of German music experts and organ builder behind the project chose the concert's 639-year running time to commemorate to the creation of the city's historic Blockwerk organ in 1361.

Another plus: the composer won't have to deal with art critics once it ends. :)

Posted by diego on July 5, 2004 at 7:50 PM

apology accepted

Jon Udell quoted me in a piece for InfoWorld, but somehow my name ended up being "Diego Rivera". I sent him an email and within a few hours he replied apologizing (the change will be online at some point in the near future, but the ship has probably sailed on the print copies), and he posted a correction on his weblog.

At the risk of sounding hokey, I find it a small honor to be quoted by Jon, mistaken attribution and everything (and hey, there are worse things than being confused with Diego Rivera!)--as I've said before, I learned many things from his columns, going back to the days of Byte.

It's often been said how weblogs "push back" against media (big or small), but not a lot has been said about how the media can use them to improve (or maybe I just haven't read much about that). Had this happened in a print-only medium, a correction would probably have taken a week or more, and no one would see it anyway because there's no context. Of course, Jon is ahead of the curve on this, but we can hope that feedback loops of this nature eventually become the norm rather than the exception.

So, thanks, Jon, and apology accepted. :)

Categories: technology
Posted by diego on July 3, 2004 at 8:58 PM

Copyright © Diego Doval 2002-2011.