Wednesday, July 4, 2007

Digital Species Descriptions & the new GBIF portal

The Biodiversity Information Standards (previously known as TDWG, the Taxonomic Database Working Group) has recently launched a new subgroup called "Species Profile Model", led by Éamonn Ó Tuama (GBIF). Thirteen people attended a workshop in Copenhagen, DK on April 16-18, 2007, shortly after the Encyclopedia of Life informatics workshop in Woods Hole, MA. The point of the Copenhagen workshop was to hash out a specification to support the retrieval and integration of data, with the lofty goal of "reaching consensus and avoiding fragmentation" of existing species-level initiatives. I'm all for this, but I wonder whether it will work. I believe "consensus" as it's described here means a common way of presenting the data rather than a true taxonomic, ecological, or political consensus. A specification does not prevent several variants of a species profile from being served by multiple providers (or even by the same one). These could of course contain conflicting or dated information and ultimately result in misleading COSEWIC-type recommendations. So, what about consensus as we usually define it? Or is that beyond the responsibility of this subgroup?

A standard for specimen data (Darwin Core, ABCD, etc.) is obvious, but I'm not convinced that a standard for species descriptions is wise unless it is developed and solely hosted by the nomenclators and sanctioned by the various Codes. A standard for species descriptions without ties to the nomenclators and to the authors who conducted the original species description or revision merely democratizes fluff. Before a standard Species Profile Model is put into practice, its RDF representations must at least explicitly incorporate peer review, authorship, and a date stamp.

I also noticed that the Species Profile Model is attempting to integrate citations to the scientific literature. I suggest the team take a good close look at OpenURL, which lends itself to useful functionality when building lists of references in front-end applications (see Rod Page's post in iPhylo on this very subject and several posts in this blog). The OpenURL format will influence how the elements in the proposed Species Profile Model ought to be constructed.
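As a rough illustration of why the OpenURL format matters for element design, here is a sketch that builds a key/encoded-value (KEV) OpenURL 1.0 (Z39.88-2004) link for a journal article. The resolver address and the short field names on the left of the mapping are hypothetical; the `rft.*` keys on the right are the standard journal-format keys. If a profile's citation elements map cleanly onto those keys, a front end can turn every reference into a resolvable link.

```python
from urllib.parse import urlencode

# Hypothetical link resolver; substitute an institution's real OpenURL base.
RESOLVER = "http://resolver.example.org/openurl"

# Hypothetical citation field names mapped to OpenURL 1.0 KEV journal keys.
KEV_KEYS = {
    "atitle": "rft.atitle",  # article title
    "jtitle": "rft.jtitle",  # journal title
    "aulast": "rft.aulast",  # first author's surname
    "date": "rft.date",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "spage": "rft.spage",
    "epage": "rft.epage",
    "issn": "rft.issn",
}


def build_openurl(citation: dict, base: str = RESOLVER) -> str:
    """Build a KEV-encoded OpenURL 1.0 link for a journal article citation."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Emit only the fields the citation actually has, in a stable order.
    for field, key in KEV_KEYS.items():
        value = citation.get(field)
        if value not in (None, ""):
            params.append((key, str(value)))
    return base + "?" + urlencode(params)
```

For a placeholder citation such as `build_openurl({"atitle": "An example species description", "jtitle": "Example Journal", "date": 2007, "volume": 12, "spage": 1, "epage": 10})`, the result is a single URL a resolver can use to locate the article; missing fields (here, issue and ISSN) are simply omitted.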

On other fronts, GBIF just rolled out their new portal: http://data.gbif.org. It looks as if the whole index and back-end were reconstructed, and some provider data tables are still missing. In time, these will probably blink on as they appeared in the old portal. What I appreciate seeing for the first time is a concerted effort to give providers some auto-magic feedback about what is being served from their boxes. Vetting data is a very important part of federation and I hope providers sit up and take notice. GBIF calls these "event logs", which is too obscure. I'd like to see them called "Questionable Data Served from this Provider", "Problem Records", "The Crap You're Serving the Scientific Community", or something similar. "Event logs" is easily dismissed and overlooked. For example, here are the event logs for the University of Alaska Museum of the North Mollusc Collection: http://data.gbif.org/datasets/resource/967/logs/. GBIF also has a flashy new logo and plenty of easy-to-use web services.
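I don't know which checks GBIF actually runs, so what follows is a purely hypothetical sketch of the kind of automated "Problem Records" report a portal could hand back to a provider. The `Record` fields and the specific checks (missing name, coordinates out of range, a suspicious 0,0 placeholder) are my own illustrative assumptions, not GBIF's rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical minimal occurrence record as served by a provider."""
    catalog_number: str
    scientific_name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def problems(record: Record) -> list[str]:
    """Return human-readable problems found in a single record."""
    issues = []
    if not record.scientific_name.strip():
        issues.append("missing scientific name")
    if record.latitude is not None and not -90 <= record.latitude <= 90:
        issues.append("latitude out of range")
    if record.longitude is not None and not -180 <= record.longitude <= 180:
        issues.append("longitude out of range")
    if record.latitude == 0 and record.longitude == 0:
        issues.append("coordinates are exactly 0,0 (likely a placeholder)")
    return issues


def problem_report(records) -> dict[str, list[str]]:
    """Map catalog numbers to their problems, listing only flagged records."""
    return {r.catalog_number: p for r in records if (p := problems(r))}
```

Labeling the output by catalog number and in plain words is the point: a list titled "Problem Records" keyed to a provider's own identifiers is much harder to dismiss than a generic log.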