Monday, April 30, 2007

Biodiversity Informatics relevance

For a long time now, I have been thinking about the relevance of biodiversity informatics in entomology/arachnology circles. Most entomologists grasp the idea of federated data from museums & private collectors, but I don't think many realize the potential of these data for their own research programs, or would even think of looking for data outside their immediate reach. Like the majority of ecologists, entomologists and arachnologists do not have any desire to share data. In fact, most will refuse to do so for fear of being "scooped". This may simply be the "old-guard" stigma, but I fear not.

I just read a review in Annual Review of Entomology entitled, "Biodiversity Informatics" by Norm Johnson (doi:10.1146/annurev.ento.52.110405.091259). In all honesty, the review seemed dumbed down and I suspect this wasn't Norm's doing, but was done at the behest of the editor or reviewers. In particular, I would have liked to have seen more on GUIDs and how these relate to aggregation of data, literature, etc. This is sadly lacking and we need real-world reasons or examples for making use of GUIDs and not merely name strings.

Friday, April 27, 2007

Citizen Science...spider style

I have been developing The Nearctic Spider Database for a number of years now. All the nomenclature, database, and web page development are under my purview, but individuals who have demonstrated some form of expertise in spider systematics, biogeography, etc. have the option to author and/or review species pages. They can upload imagery, select references, add descriptions, plus a number of other functions, all via the website. These productions are then open for review in the very traditional sense. Once three reviews have been received and the author has made any changes suggested by the anonymous reviewers, I receive notice, flick the switch, and the species page is tagged "Peer reviewed" then locked against further editing. However, all point collection maps, other taxonomic references, lists of synonyms & chresonyms, and a phenological chart are dynamically created and may change with additional data from the specimen side of the database.

Since this sort of "expert" authoring/reviewing cut off any option for the casual browser of these pages to contribute, I created a "drop a comment" feature whereby anyone and everyone may write a casual comment on a species, a sighting report, etc. in a manner much like leaving a comment on someone else's blog post. Response to this new feature has been fairly good so, at the request of a few contributors to the database, I created "Spider WebWatch" - a citizen science initiative for anyone and everyone to submit observations on spiders they see in their backyard and elsewhere.

Granted, there is no way to track misidentifications, problems arising from nomenclatural change in the event of a revision, or the other issues that plague observational data and bring into question their longevity and utility in scientific research. But the point of Spider WebWatch is for anyone to contribute. The hope is that this will foster more pervasive interest in spider biodiversity research...sort of like a gateway or an introduction to araneology. To limit some of the issues with observational data, there are only 9 species in Spider WebWatch. A discussion in The Nearctic Arachnologists' Forum helped choose these 9 species.

I took the "drop a comment" feature on species pages in The Nearctic Spider Database to a much more interactive level: "WebWatchers" in Spider WebWatch may not only upload an observation with an image but also comment on anyone else's observation, thus building threads of discussion in a manner very much like a forum. A contributor may edit their observations or comments at any time, and the system for contributing an observation is stripped down to the bare minimum. It was brought to my attention that a web form with too many fields or boxes to tick/fill is overwhelming.

So, try out Spider WebWatch. I also have a poll underway to get some feedback on the possibility of using a web-enabled mobile phone to submit an observation.

Client-orchestrated Data Repurposing

A lot of work is underway by various working groups within TDWG to connect one machine to another for intelligent data exchange systems. For example, DiGIR and BioCASE (soon to be superseded by TAPIR) are nifty systems that create on-the-fly XML documents whose contents can be dumped into other databases. This of course is all behind-the-scenes with no direct benefit to providers of biodiversity data except, I suppose, a demonstration to administrators that they have contributed to the greater good. Eventually, somewhere down the line, there may be some sort of attribution, but there's no guarantee.

GBIF does a great job of maintaining attribution because the ultimate goal is to permit someone who uses their website to discover where a specimen can be found & to contact the curator. However, there's nothing stopping anyone from aggregating data from DiGIR providers and repurposing it without any sort of attribution or "link" back to the provider. In other words, an institution could potentially have to cough up a lot of funds to keep the bandwidth pipes flowing and there may not be any immediate value. These sorts of thoughts fly in the face of open access. Don't get me wrong, I'm all for open access, I'm just not certain if such a model for aggregating biodiversity data in museums and elsewhere is sustainable. What is at least needed is an auditing & logging tool associated with DiGIR (or TAPIR) such that providers of biodiversity data may collect data on who used their resource, what was downloaded and what traffic patterns have been like over 'x' number of days, weeks, months, etc. But, I know of no such add-on for DiGIR or BioCASE providers.
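To make the kind of usage report I have in mind a little more concrete, here is a minimal sketch in Python. It is not a DiGIR or TAPIR add-on and uses no part of either protocol; it simply assumes a provider can already log, for each request, the date, some identifier for the requester, and the number of records returned. The `Request` class, its field names, and the sample hosts are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Request:
    """One logged request to a data provider (hypothetical field names)."""
    day: date
    client: str    # host name or IP address of the requester
    records: int   # number of records returned for this request


def summarize(requests, start, end):
    """Total records served to each client from start to end, inclusive."""
    totals = Counter()
    for r in requests:
        if start <= r.day <= end:
            totals[r.client] += r.records
    return totals


# Made-up sample log, for illustration only.
log = [
    Request(date(2007, 4, 1), "aggregator.example.org", 500),
    Request(date(2007, 4, 2), "aggregator.example.org", 250),
    Request(date(2007, 4, 2), "192.0.2.10", 20),
    Request(date(2007, 5, 1), "192.0.2.10", 5),
]

april = summarize(log, date(2007, 4, 1), date(2007, 4, 30))
for client, n in april.most_common():
    print(f"{client}: {n} records")
```

Changing the `start` and `end` dates gives the same report over 'x' number of days, weeks, or months.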

So, I have been looking into alternative means to share resources and have played around with various ideas. One such idea takes the form of gadgets to share imagery. There are a ton of really useful images of immense biological value, and when these are shared around, it becomes impossible to know where an image was first made available and who provided it. One could use meta tags and embed that data within the image, but who does that? If there were a browser-based meta tag reader for images that web programmers could tap into, then meta tags would be the obvious choice. However, I'm not aware of any browser plug-in that can do that. So, here's a gadget script that can be copied and pasted onto a web site:

And here's the result:

The gadget itself is dynamically-created JavaScript that pulls all the bits from the Nearctic Spider Database. The species nomenclature, attribution, and link to the species page are automatic & could change should I change anything in the database. These changes would of course cascade through all instances of the script wherever these may be. The individual who "made" the gadget can, however, pretty it up as they like through a little configuration tool I have. Feel free to mess with that by clicking a "Link it" button here:

Something like this gadget system is by no means rocket science but has immediate value to the provider and the individual wishing to repurpose it for their web site.
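For anyone curious how dynamically-created JavaScript of this sort might be produced on the server, here is a rough sketch in Python. It is not the code behind the Nearctic Spider Database gadget; the function name `render_gadget`, the record keys, and the example URLs are all assumptions made for illustration. The idea is only that the server looks up the current name, image, and credit, then returns a line of JavaScript that writes attributed markup into whatever page includes it.

```python
import html
import json


def render_gadget(record):
    """Return JavaScript that writes an attributed image into the host page.

    `record` is a hypothetical dict with the keys species, image_url,
    page_url and credit, as might be looked up in a database per request.
    """
    markup = (
        '<div class="spider-gadget">'
        f'<a href="{html.escape(record["page_url"], quote=True)}">'
        f'<img src="{html.escape(record["image_url"], quote=True)}" '
        f'alt="{html.escape(record["species"], quote=True)}"></a>'
        f'<p><i>{html.escape(record["species"])}</i>, '
        f'image by {html.escape(record["credit"])}</p>'
        '</div>'
    )
    # json.dumps produces a valid JavaScript string literal. Escaping "</"
    # stops the markup from closing the <script> tag that loads the gadget.
    literal = json.dumps(markup).replace("</", "<\\/")
    return f"document.write({literal});"


# Made-up record, for illustration only.
record = {
    "species": "Araneus diadematus",
    "image_url": "https://example.org/images/123.jpg",
    "page_url": "https://example.org/species/123",
    "credit": "A. Photographer",
}

js = render_gadget(record)
print(js)
```

A web page would then include the gadget with a single script tag whose `src` points at a URL returning this JavaScript. Because the markup is rebuilt on every request, a change to the name or credit in the database shows up everywhere the tag has been pasted.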

Thursday, April 26, 2007

Inaugural Post

I'm a latecomer to the blog scene so thought I'd try my hand at it.

This blog will include bits that have fallen off the wagon as it were while developing The Canadian Arachnologist, The Nearctic Spider Database, The Nearctic Arachnologists' Forum and Spider WebWatch. The latter is a citizen science initiative that accepts observation data on 9 ambassador species in North America. I have a strong interest in federating biological data so there will undoubtedly be posts about nomenclatural management, species concepts, data aggregation techniques and the like.