Wednesday, October 24, 2007

Biodiversity Informatics Needs a Business Model

Publishers and (most) librarians understand that digital object identifiers (DOIs) associated with published works are more than just persistent codes that uniquely identify items. They are built into the social fabric of the publishing industry. Because money changes hands for the application and maintenance of a DOI, the identifier is persistent. It is really because of this "feature" that tools like cross-linking and forward-linking can be built, and that these new tools will themselves persist. The nascent biodiversity informatics community (myself included) is attempting to do all the fun stuff, like building taxonomic indices and gadgetry to associate names and concepts with other things like literature, images, and specimens, without first establishing a long-term solution for how all these new tools will persist. Let me break it down another way:

Publishers buy DOIs and pay an annual subscription. In turn, the extra fee for the DOI is passed down the chain to the journal and its society. The society then passes the extra fees on either to authors by way of page fees or to the journal's subscribers. Since the majority of subscribers are institutions and authors receive research grants from federal agencies, the fractions of pennies that combine to pay for a single DOI ultimately come from taxpayers' wallets and purses. So, DOIs fit nicely into the fabric of society and really do serve a greater purpose than merely uniquely identifying a published object. Then, and only then, can the nifty tools CrossRef provides be made available, and third parties may use these tools with confidence.

Not surprisingly, the biodiversity informatics community has latched on to the nifty things one can do with globally unique identifiers because everybody wants to "do things" by connecting one another's resources. Some very important and extremely interesting answers to tough questions can only be obtained by doing this work. Also not surprisingly, there is now a mess of various kinds of supposed globally unique identifiers (GUIDs) because big players want to be the clearinghouse, much as CrossRef is the clearinghouse for DOIs. But they have all missed the point.

So, how do we instill confidence in the use of LSIDs, ITIS TSNs, the various NCBI database IDs, etc. without a heap of silos with occasional casualties? Get rid of them, or at least clearly associate what kind of object gets what kind of identifier, along with a business model where there will be a persistent, demonstrable transfer of funds. The use of Semantic Web tools is merely a band-aid for a gushing wound. When I say persistent transfer of funds, I don't mean assurances that monies will come from federal grants or wealthy foundations in order to maintain those identifiers. I mean an identifier that is woven into the fabric and workflow of the scientific community. This may be easier said than done because the scientific community (especially systematists and biologists) isn't in the business of producing anything tangible except publications, and CrossRef has that angle very well covered. So, what else do scientists (the systematics community is what I'm most interested in) produce that can be monetized? Specimens, gene sequences, and perhaps a few other objects. We need several non-profits like CrossRef with the guts to demand monies for the assignment of persistent identifiers. Either we adopt this as a business model or we monetize some services (e.g. something like Amazon Web Services, as previously discussed) that directly, clearly, and unequivocally feed into the maintenance of all the shiny new GUIDs.


