The Global Biodiversity Information Facility (GBIF) has done an excellent job of designing the infrastructure to support the federation of specimen and observation data. The majority of contributing institutions use Distributed Generic Information Retrieval (DiGIR), an open-source, PHP-based package that translates columns of data in one's dedicated database (e.g. MySQL, SQL Server, Oracle, etc.) into Darwin Core fields. So even if your data columns don't match the Darwin Core schema, you can use the DiGIR configurator to map your columns to what's needed at GBIF's end. Europeans tend to prefer the Access to Biological Collection Data (ABCD) schema as their transport mechanism. The functionality of these will soon be rolled into the TDWG Access Protocol for Information Retrieval (TAPIR). To the uninitiated like me, this is a jumbled, confusing alphabet soup, and at first I couldn't navigate this stuff at all.
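If you're wondering what that column matching actually amounts to, here's a toy sketch (Python, purely for illustration; DiGIR itself is configured through an XML resource file, and the local column names below are made up):

```python
# Conceptual sketch only: DiGIR is configured through an XML resource file,
# but the idea boils down to a mapping from your local column names
# (hypothetical here) to the Darwin Core concepts GBIF expects.
LOCAL_TO_DARWIN_CORE = {
    "species_name": "ScientificName",
    "collected_by": "Collector",
    "prov_state":   "StateProvince",
    "lat":          "Latitude",
    "lon":          "Longitude",
}

def to_darwin_core(row: dict) -> dict:
    """Relabel one record from local column names to Darwin Core fields."""
    return {dwc: row[local] for local, dwc in LOCAL_TO_DARWIN_CORE.items() if local in row}

print(to_darwin_core({"species_name": "Araneus diadematus", "prov_state": "Ontario"}))
```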
Suffice it to say, the documentation isn't particularly great on either the TDWG or GBIF web sites. To the TDWG folks: a screencast with a step-by-step install for both Windows & Linux would go a long way! I don't mean a flashy Encyclopedia of Life webcast, I mean a basic TDWG for Dummies. If you have a dedicated database and a web server that can serve PHP-based pages, it's actually pretty straightforward once you get going. It's really just a matter of jumping through a few simple hoops: click here, do this, match this, click there - not much more difficult than managing an Excel datasheet. The downloads & step-by-step guides for DiGIR can be found HERE. The caveat: you need a dedicated database, a dedicated web server, and your resource has to be recognized by a GBIF affiliate before it's registered for access. That's unfortunately how all this stuff works.
So what about the casual or semi-professional collector, who may have a much larger collection than what can be found in museums? It's not terribly likely that countless hard-working people like these have the patience to fuss with dedicated databases (we're not talking Excel here) or web servers. Must they wait to donate their specimens to a museum before these extremely valuable data are made accessible? In many cases, a large donation of specimens to a museum sits in a corner and never gets accessioned because there simply isn't the human power at the receiving end to manage it all. Heck, some of the pre-eminent collections in the world don't even have staff to receive donations of any size! This is a travesty.
An attractive way around this is to complement DiGIR/ABCD/TAPIR with a fully online tool akin to Google Spreadsheets. For the server at the other end, this means a beefy pipe and a hefty set of machines to cope with AJAX-style, rapid & continuous access. But for small, taxon-centric communities, this isn't a problem. In fact, I developed such a Google Spreadsheet-like function in The Nearctic Spider Database for collectors wanting to manage their spider data.
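For the curious, here's a rough sketch (Python/Flask, purely illustrative and not the actual code behind the Nearctic Spider Database) of what the server side of such a spreadsheet-style editor boils down to: the browser posts one changed cell at a time, and the server writes it to that user's table and answers immediately, so the page never reloads.

```python
# Minimal, hypothetical sketch of a spreadsheet-style edit endpoint.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)
DB = "collections.db"  # hypothetical; a real deployment would point at MySQL, SQL Server, etc.

@app.route("/cell", methods=["POST"])
def update_cell():
    # e.g. {"user": "jdoe", "row": 42, "column": "Locality", "value": "Algonquin Park"}
    edit = request.get_json()
    allowed = {"ScientificName", "Collector", "StateProvince", "Locality"}
    if edit["column"] not in allowed:  # never interpolate untrusted column names
        return jsonify(ok=False, error="bad column"), 400
    with sqlite3.connect(DB) as conn:
        conn.execute(
            f'UPDATE records SET "{edit["column"]}" = ? WHERE user = ? AND id = ?',
            (edit["value"], edit["user"], edit["row"]),
        )
    return jsonify(ok=True)
```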
Watch the video above or HERE (better resolution). Everything is hosted on one machine on a residential Internet connection, & I've had up to 5 concurrent users plus the usual 2,500-3,500 unique visitors a day with no appreciable drop in performance. Granted, things are a little slower in those instances, but the alternative is no aggregation of data at all. To help users, each of whom has their own table in the database, I designed some easy and useful tools. For example, they can query their records for nomenclatural issues, do some real-time reverse geocoding to verify that their records are actually in the State or Province they specified, and check for duplicates, among a few other goodies like mapping everything as clickable pushpins in Google Maps. Of course, one can export to Excel or tab-delimited text files at any time. The other advantage to such a system is that upon receiving user feedback and requests, I can quickly add new functions & they are immediately available to all users. I don't have to stamp and mail out new CDs, urge users to download an update, or maintain versions of various releases. If you're curious about doing the same sort of thing for your interest group, check out Rico LiveGrid Plus, the code base upon which I built the application.
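To give a flavour of what a couple of those checks involve, here's a rough Python sketch. The reverse-geocoding call is just a stub standing in for whatever service you actually query, and the field names are illustrative, not the database's real schema.

```python
# Illustrative sketch of two record checks: coordinates vs. stated province, and duplicates.
from collections import Counter

def reverse_geocode(lat: float, lon: float) -> str:
    """Placeholder: ask your geocoding service of choice which
    state/province this coordinate falls in."""
    raise NotImplementedError

def flag_misplaced(records):
    """Yield (record, found_province) for records whose coordinates fall
    outside the State or Province the collector entered."""
    for rec in records:
        found = reverse_geocode(rec["Latitude"], rec["Longitude"])
        if found.lower() != rec["StateProvince"].lower():
            yield rec, found

def flag_duplicates(records):
    """Return (key, count) for records sharing the same name, date and locality."""
    keys = Counter((r["ScientificName"], r["DateCollected"], r["Locality"]) for r in records)
    return [(k, n) for k, n in keys.items() if n > 1]
```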
What would be really cool is if this sort of thing could be made into a Drupal-like module & bundled into Ubuntu Server Edition. A taxon focal group with a community of, say, 20-30 contributors could collectively work on their collection data in this manner & never have to think about the tough techno stuff. They'd buy a cheap little machine for $400, slide the CD into the drive to install everything, & away they go.
The real advantage to the online data management application in the Nearctic Spider Database is the quick access to the nomenclatural data. So the Catalog of Life Partnership & other major pools of names ought to think about simple web services from which such a plug-and-go system can draw its names. It's certainly valuable to have a list of vetted names such as what ITIS and Species 2000 provide, but to really sell these lists to funding agencies, they no doubt have to demonstrate how the names are being used. Web services bundled with a little plug-and-go CD would allow small interest groups to hit the ground running. Such a tool would give real-world weight to this business of collecting names and would go a long way toward avoiding the shell games these organizations probably have to play. I suspect these countless small interest groups would pay a reasonable annual subscription fee to keep the names pipes open. Agencies already exist to help monetize web services using such a subscription system. Perhaps it's worth thinking like Amazon Web Services (AWS), where users pay for what they use. Unlike AWS, however, incoming monies would only support Catalog of Life Partnership wages and infrastructure, taking some weight off chasing grants.
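To make the idea concrete, here's the sort of thin client a plug-and-go package could ship with. The endpoint and the response format are entirely hypothetical; the point is simply how little a vetted-names provider would need to expose for small interest groups to start drawing on those names.

```python
# Sketch of a thin names-service client; endpoint and response shape are hypothetical.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

NAMES_SERVICE = "https://names.example.org/lookup"  # hypothetical endpoint

def check_name(scientific_name: str) -> dict:
    """Ask the names service whether a name is accepted, a synonym, or unknown."""
    url = NAMES_SERVICE + "?" + urlencode({"name": scientific_name})
    with urlopen(url) as resp:
        return json.load(resp)  # e.g. {"status": "accepted", "accepted_name": "..."}

if __name__ == "__main__":
    print(check_name("Araneus diadematus"))
```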