DNA in XML


I spent a bit of time last night going through our DNA information as mapped by the Human Genome Project. Aside from learning a few things, I noticed two things that caught my eye. The first is that the genome has "builds", and as of today it stands at "build 34 version 3", which sounds like a meld of technology and our humanity in interesting if subtle ways (did they ever have a beta? Will we have genomic procedures only compatible with certain builds? Okay, that's in jest, but you know what I mean).

The other was that they've got XML dialects for the genetic information: witness this sequence, which is part of our first chromosome. For some reason I can't quite explain, I also find this fascinating. They have different XML dialects that show more information too, with names like TinySeq XML, GBSeq XML, and just "XML" (is this one the standard?), all with DTDs. I imagine they had fights similar to those in other fields (such as syndication) over which tags to use, formats, and such--I wonder if there's a mailing list where we can find geneticists and molecular biologists arguing over which tag is best...
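For the curious, here is a minimal sketch of what reading one of these dialects might look like in Python. The element names follow the TinySeq layout as I understand it (`TSeqSet`, `TSeq`, `TSeq_sequence`, and so on), but the record itself, including the accession and the bases, is made up for illustration, and the helper names `read_sequences` and `gc_content` are my own, not part of any official toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names follow the TinySeq layout as I
# understand it; the accession, description, and bases are invented.
SAMPLE = """<TSeqSet>
  <TSeq>
    <TSeq_seqtype value="nucleotide"/>
    <TSeq_accver>EXAMPLE_000001.1</TSeq_accver>
    <TSeq_orgname>Homo sapiens</TSeq_orgname>
    <TSeq_defline>Made-up fragment for illustration</TSeq_defline>
    <TSeq_length>12</TSeq_length>
    <TSeq_sequence>GATTACAGATTA</TSeq_sequence>
  </TSeq>
</TSeqSet>"""


def read_sequences(xml_text):
    """Return one dict per <TSeq> element found in the document."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("TSeq"):
        # findtext returns None for a missing child, so fall back to "".
        bases = (seq.findtext("TSeq_sequence") or "").strip().upper()
        records.append({
            "accession": seq.findtext("TSeq_accver"),
            "organism": seq.findtext("TSeq_orgname"),
            "declared_length": int(seq.findtext("TSeq_length") or 0),
            "sequence": bases,
        })
    return records


def gc_content(bases):
    """Fraction of the bases that are G or C (0.0 for an empty string)."""
    if not bases:
        return 0.0
    return (bases.count("G") + bases.count("C")) / len(bases)


records = read_sequences(SAMPLE)
for rec in records:
    print(rec["accession"], rec["organism"], len(rec["sequence"]),
          gc_content(rec["sequence"]))
```

Comparing the declared length against the actual length of the sequence string is a cheap sanity check that a record was not truncated somewhere along the way.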

Categories: science
Posted by diego on June 25 2004 at 12:09 PM
Comments

'I wonder if there's a mailing list where we can find geneticists and molecular biologists arguing over which tag is best.'

nearly ;) the field is bioinformatics, and it's a bizarre hybrid of computer science and biology. very interesting stuff.

There's been some amazing work coming out of that corner recently -- for example, the Human Genome Project relied heavily on open source, and Perl in particular, I hear. It's fascinating IMO.

BioPerl, for example, is a good starting point: http://bioperl.org/

Posted by: Justin Mason at June 25, 2004 7:40 PM

Thanks for the pointer, Justin! For some reason Perl sounds like an odd choice for this... or maybe its string-processing abilities are what make it useful in this case. Fascinating indeed!

Posted by: Diego at June 26, 2004 11:25 AM

Copyright © Diego Doval 2002-2007.