I spent a bit of time last night going through our DNA information as mapped by the Human Genome Project. Aside from learning a few things, two things caught my eye. The first is that the genome has "builds", and as of today it stands at "build 34 version 3", which sounds like a meld of technology and our humanity in interesting if subtle ways (did they ever have a beta? Will we have genomic procedures only compatible with certain builds? Okay, that's in jest, but you know what I mean).

The other was that they've got XML dialects for the genetic information: witness this sequence, which is part of our first chromosome. For some reason I can't quite explain, I also find this fascinating. They have different XML dialects that show more information too, with names like TinySeq XML, GBSeq XML, and just "XML" (is this one the standard?), all with DTDs. I imagine they had fights similar to those in other fields (such as syndication) over which tags to use, formats, and such. I wonder if there's a mailing list where we can find geneticists and molecular biologists arguing over which tag is best...

June 25 2004

