It occurred to me today that reaching taxonomic consensus, or building a master database of vetted names like the one undertaken by The Catalogue of Life Partnership (CoLP), is not unlike software development, which necessarily requires some sort of framework to manage versioning. Taxonomic work and checklist building, however, currently have no such development framework. We likely have a set of rules and guidelines, but infighting and bickering no doubt fragment interest groups, which ultimately leads to the stagnation, abandonment, and eventual distrust of big projects like CoLP. We have organizations like the International Commission on Zoological Nomenclature to manage the act of naming animals, but there is nothing concrete at the other end to actually organize the names. Publications are merely the plums in a massive bowl of pudding, and it is equally frustrating to actually find these publications.
One way to approach a solution is to equate systematics with perpetual software development, where subgroups manage branches of the code and occasionally perform commits to (temporarily) freeze a snapshot of the code. As in software development, groups of files (i.e., branches on the tree of life) and the files themselves (i.e., publications, images, genomic data, etc.) ought to be tracked with unique identifiers and time-stamps. This would be a massively complex shift in how taxonomic business is conducted, but what other solution is there?
Without really understanding distributed environments in software development (it's too geeky for me), I spent a few moments watching a Google TechTalk about Git, a project spearheaded by Linus Torvalds. Randal Schwartz delivered the presentation at Google on October 12, 2007: http://video.google.com/videoplay?docid=-1019966410726538802 (sorry, embedding has apparently been disabled by request).
There are some really interesting parallels between distributed software development environments like Git and what we ought to be working toward in systematics, especially as we move toward using Life Sciences Identifiers (LSIDs). Here are a few summarized points from Randal's presentation:
- Git manages changes to a tree of files over time
- Optimized for large file sets and merges
- Encourages speculation with the construction of trial branches
- Anyone can clone the tree, make & test local changes
- Uses "Universal Public Identifiers"
- Supports multi-protocol transport, such as HTTP and SSH
- One can navigate back in time to view older trees of file sets via universal public identifiers
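To make the identifier idea in the points above a little more concrete, here is a rough Python sketch of content-derived identifiers. The `git_blob_id` function follows the way Git hashes a single file's content with SHA-1. Everything else is an assumption made for illustration: `snapshot_id` is a simplified, hypothetical stand-in for Git's real tree and commit formats, and the file names, taxon names, and time-stamps are made-up placeholders.

```python
import hashlib
from typing import Dict, Optional


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git assigns to a file (blob) with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def snapshot_id(files: Dict[str, bytes], parent: Optional[str], timestamp: int) -> str:
    """Hypothetical, simplified snapshot id (NOT Git's real tree/commit format)."""
    # One line per file, sorted by name so the same file set always hashes the same.
    lines = [f"{git_blob_id(data)} {name}" for name, data in sorted(files.items())]
    lines.append(f"parent {parent or 'none'}")
    lines.append(f"time {timestamp}")
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


# Made-up checklist: one file per taxon name (placeholder names and time-stamps).
v1_files = {"Aus_bus.txt": b"Aus bus Smith, 1900\n"}
v1 = snapshot_id(v1_files, parent=None, timestamp=1192147200)

# A later revision adds a name; the new snapshot records v1 as its parent.
v2_files = dict(v1_files)
v2_files["Aus_cus.txt"] = b"Aus cus Jones, 1950\n"
v2 = snapshot_id(v2_files, parent=v1, timestamp=1192233600)

# Keeping every snapshot under its id lets us "navigate back in time".
history = {v1: v1_files, v2: v2_files}
print(v1, sorted(history[v1]))
print(v2, sorted(history[v2]))
```

Because each identifier is derived from the content itself, two people who hold identical files compute identical identifiers without consulting a central registry, and any change to a file or to its history yields a different identifier.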
Given a cross-platform solution and an easy-to-use interface, perhaps thinking in these terms will help engage taxonomists and ultimately lead to a ZooBank global registry of new taxon names.