Thursday, September 27, 2007

Would You Cite a Web Page?

Species pages in The Nearctic Spider Database are peer-reviewed in the very traditional sense. But, instead of doling out pages that need to be reviewed, I leave it up to authors to anonymously review each other's work. Not just anyone can author a species page; you at least need to show me that you have worked on spiders in some capacity. Once three reviews for any page have been received and the author has made the necessary changes (I can read who wrote what and when), I flick the switch and the textual contributions by the author are locked. There is still dynamically created content on these species pages like maps, phenological charts, State/Province listings, etc. However, at the end of the day, these are still just web pages, though you can download a PDF if you really want to.

Google Scholar allows you to set your preferences for downloading an import file for BibTex, EndNote, RefMan, RefWorks, and WenXianWang so I thought I would duplicate this functionality for species pages in The Nearctic Spider Database, though limited to BibTex and EndNote. I'm not at all familiar with the last three reference managers and I suspect they are not as popular as EndNote and BibTex. Incidentally, Thomson puts out both EndNote and RefMan and recently, they released EndNoteWeb. As cool as EndNoteWeb looks, Thomson has cut it off at the knees by limiting the number of references you can store in an online account to 10,000. Anyone know anything about WenXianWang? I couldn't find a web site for that application anywhere. So, here's how it works:

First, it's probably a good idea to set the MIME types on the server though this is likely unnecessary because these EndNote and BibTex files are merely text files:

EndNote: application/x-endnote-refer (extension .enw)
BibTex: ?? (extension .bib)
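
As a rough illustration of the MIME setup, here is a minimal Python sketch that registers the two extensions and picks a Content-Type for a download. The helper name content_type_for and the file names are hypothetical, and the .bib type is an assumption taken from a reader comment further down rather than from any registry.

```python
import mimetypes

# EndNote tagged import files, as listed above.
mimetypes.add_type("application/x-endnote-refer", ".enw")
# Assumption: suggested in a reader comment below, not an official registration.
mimetypes.add_type("application/x-bibtex", ".bib")


def content_type_for(filename):
    """Return the MIME type to send for a citation download.

    Falls back to text/plain, since both formats are plain text anyway.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "text/plain"


print(content_type_for("ozyptila-conspurcata.enw"))
print(content_type_for("ozyptila-conspurcata.bib"))
```

The text/plain fallback reflects the point above that these are merely text files, so a missing mapping should not break the download.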

Second, we need to learn the contents of these files:

EndNote: called "tagged import format", where fields are designated with a two-character code that starts with a % symbol (e.g. %A). After the field code, there is a space, and then the contents. It was a pain in the neck to find all these but at least the University of Leicester put out a Word document HERE. Here's an example of the file for a species page in The Nearctic Spider Database:

%0 Web Page
%T Taxonomic and natural history description of FAM: THOMISIDAE, Ozyptila conspurcata Thorell, 1877.
%A Hancock, John
%E Shorthouse, David P.
%D 2006
%N 9/27/2007 10:40:40 PM
%~ The Nearctic Spider Database
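
A file like this could be produced with a few lines of Python. This is only a sketch under the assumption that each field is a (tag, value) pair written as one line; the function name endnote_record is hypothetical, and the values are the ones from the example above.

```python
def endnote_record(fields):
    """Render (tag, value) pairs as EndNote tagged import lines.

    Each line is '%', the one-character tag, a space, then the value.
    Whitespace inside a value is collapsed so a field never spans lines.
    """
    lines = []
    for tag, value in fields:
        clean = " ".join(str(value).split())
        lines.append(f"%{tag} {clean}")
    return "\n".join(lines) + "\n"


record = endnote_record([
    ("0", "Web Page"),
    ("T", "Taxonomic and natural history description of FAM: THOMISIDAE, "
          "Ozyptila conspurcata Thorell, 1877."),
    ("A", "Hancock, John"),
    ("E", "Shorthouse, David P."),
    ("D", "2006"),
    ("N", "9/27/2007 10:40:40 PM"),
    ("~", "The Nearctic Spider Database"),
])
print(record, end="")
```

Keeping the fields as an ordered list rather than a dictionary preserves the order of the lines and would allow repeated tags, such as several %A lines for multiple authors.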

BibTex: Thankfully, developers at BibTex recognize the importance of good, simple documentation and have a page devoted to the format. But, the examples for reference type are rather limited. Again, I had to go on a hunt for more documentation. What was of great help was the documentation for the apacite package, which outlines the rules in use for the American Psychological Association. In particular, pp. 15-26 of that PDF were what I needed. However, where's the web page reference type? Most undergraduate institutions in North America still enforce a no web page citation policy on submitted term papers, theses, etc., so it really wasn't a surprise to see no consideration for web page citations. So, what is the Encyclopedia of Life to do? The best format I could match for EndNote's native handling of web pages was the following:

author = {Hancock, John},
title = {Taxonomic and natural history description of FAM: THOMISIDAE, Ozyptila conspurcata Thorell, 1877.},
editor = {Shorthouse, David P.},
howpublished = {World Wide Web electronic publication},
type = {web page},
url = {},
publisher = {The Nearctic Spider Database},
year = {2006}
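
For illustration, the fields above can be wrapped into a complete entry with a short Python sketch. The function name bibtex_entry and the citation key hancock2006 are hypothetical, and the choice of @misc as the entry type is an assumption borrowed from the reader comments below, not something settled in the post.

```python
def bibtex_entry(entry_type, key, fields):
    """Render an entry type, citation key, and (name, value) pairs as BibTeX.

    Values are wrapped in braces and fields are separated by commas,
    with no trailing comma after the last field.
    """
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@{entry_type}{{{key},\n{body}\n}}\n"


entry = bibtex_entry("misc", "hancock2006", [
    ("author", "Hancock, John"),
    ("title", "Taxonomic and natural history description of FAM: THOMISIDAE, "
              "Ozyptila conspurcata Thorell, 1877."),
    ("editor", "Shorthouse, David P."),
    ("howpublished", "World Wide Web electronic publication"),
    ("type", "web page"),
    ("url", ""),  # left empty, as in the listing above
    ("publisher", "The Nearctic Spider Database"),
    ("year", "2006"),
])
print(entry, end="")
```

Whether a given bibliography style prints fields such as url, editor, or publisher for this entry type depends entirely on the style file, so the output is best checked against the style actually in use.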

Now, BibTex is quite flexible in its structure so there could very well be a proper way to do this. But, the structure must be recognized by the rule-writing templates like APA; otherwise, it is simply ignored.

The EndNote download is available at the bottom of every authored species page in the database's website via a click on the EndNote icon. I have no idea if the BibTex format above is appropriate, so I welcome some feedback before I enable that download.

But, all this raises a question...

Would you import a reference to a peer-reviewed web page into your reference managing programs and, if you are an educator, should you consider allowing undergraduates an opportunity to cite such web pages? Would you yourself cite such pages? Do we need a generic, globally recognized badge that exclaims "peer-reviewed" on these kinds of pages? Open access does not mean content is not peer-reviewed or any less scientific. Check out some myth-busting HERE. What if peer-reviewed web pages had DOIs, thus taking a great leap away from URL rot and closer toward what Google does with its index - calculations of page popularity? Citation rates (i.e. popularity) are but one outcome of the DOI model for scientific papers. If I anticipated a wide, far-reaching audience for a publication, I wouldn't care two hoots if it was freely available online as flat HTML, a PDF, or as MS Word or if the journal (traditional or non-traditional) has a high impact factor as mysteriously calculated by, you guessed it, Thomson ISI. If DOIs are the death knell for journal impact factors, are web pages the death knell for paper-only publications?


Rod Page said...

Few technical comments first.

The MIME type for BibTeX is application/x-bibtex, at least that is what I used in my old online bibliography manager MyPHPBib, and I think it's pretty much standard.

I think that historically EndNote has been the favourite of Mac users (I first encountered it at Oxford in the early 90's). RefMan has probably been more widely used, judging by the ubiquity of the RIS format. It is documented in detail here.

misc does seem to be the reference type to use in BibTeX for URLs, certainly the BMC journals use this (see, e.g. BMC Bioinformatics).

Here at Glasgow, undergraduates can cite web pages in their essays and projects, and regularly do. In this age of Open Access journals, many journals are effectively online-only (even if some will provide paper versions to institutions). Hence the distinction between a web site and a journal article starts to dissolve.

I think the real issues are permanence (will the web site disappear?), which is an issue even for major journals (e.g., doi:10.1096/fj.05-4784lsf), and authority (can I trust this content?). Both are social issues.

David Shorthouse said...

Interesting how educational practices differ across institutions. Thanks for the MIME type and also for the RIS format link. That looks easy, so I will do the same for RefMan users.

Indeed permanence and authoritativeness are the underlying problems with online publications, especially for a nascent organization like EOL. But, it would certainly be worth kick-starting the latter with DOIs, which would hopefully encourage authors to contribute. Without DOIs, species pages on EOL will be nothing more than the same species pages on DiscoverLife, FishBase, WikiSpecies, Animal Diversity Web, etc. once you get past the pizzazz.

Rod Page said...

Continuing the theme of citation, elsewhere I've argued that citation measures could be extended to other sources, such as sequences and specimens, hence one could demonstrate the value of a specimen collection by a measure based on how many times individual specimens (or sequences derived from those specimens) are cited.

I've also suggested that citation measures could help make sensible choices of what paper(s) represent a good summary of phylogenetic knowledge of a taxon.

Two issues will also need attention: versioning and credit. Versioning is an issue in that the EoL page will be dynamic, in which case how do we keep track of what version a user cited when? If a page has multiple authors, how is credit apportioned? Will contributors feel that they receive enough recognition for their contribution?

burning silo said...

Agree with what Rod has said about journals which are now online only. One example being Journal of Insect Science:
I would hope that it would be acceptable for one to cite pages from JIS or similar publications. Likewise, if other types of pages have adequate authority, then it seems that citation should be allowed.
I do agree that there should be some system for specifying a version of pages which have information which is subject to revision.

David Shorthouse said...

Bev (Burning Silo) - thanks for reminding me about Journal of Insect Science. I noticed that they have citations in almost exactly the required format for Rod's and my citation-searching script, so I will write Henry Hagedorn to see if he'd be interested in implementing it.

Kevin Z said...

In the Cnidaria world, Daphne Fautin's Hexacorallia of the World website/database is cited in almost every anemone systematics paper. I have it in my EndNote catalogue.

At Penn State, and most of the rest of the US I presume, we encourage the students to use the internet to find articles but not to cite websites for lab reports and term papers. I usually tell mine that databases are OK as long as a database is not your main reference in the paper. I also give a list of reputable databases.

So long as a website contains peer-reviewed material, I wouldn't have a problem using it or letting my students cite for lab reports. But are databases, like FishBase, CoML, OBIS etc. under some sort of peer-review? I've never really thought about it before.

Open access journals are certainly all right and I encourage their use. EOL will definitely need DOIs for each species page or else it becomes moot. It might as well be as relevant to taxonomy as my blog.

As with citation measures, with websites you can keep track of hits, unique visitors, return visitors, etc., which you cannot do with the paper versions of articles. So should web stats be weighed against hard citations? I say why not! What you really want to measure is impact. I've argued on my own blog that taxonomic papers should in theory be the most highly cited papers known to mankind because everyone not working on humans studies an organism with a particular definition and a history of amendments to that definition. In a Drosophila genetics paper, do you ever see a citation to the original description?

David Shorthouse said...

kevin z - I agree, EOL definitely needs to seriously consider DOIs if it hopes to find its way in the scientific literature. $1.5M+ for DOIs (a dollar per species) is cheap, especially if you have the authors produce their $1 as a nominal page fee. Disambiguation is of course another game that has to be played and won. But, as Rod points out, DOIs to approximate permanence and of course for cross-linking is but one aspect to this. Other pressing issues are accreditation and authority/trust.

Judging by what I have seen on FishBase, NatureServe, DiscoverLife, and on other aggregators, "species pages" are not peer-reviewed & rarely is any sort of recommended format for a citation provided. Then again, there is a fine, grey line between peer-review and wiki. We have all seen garbage get published in traditional peer-reviewed journals and many articles on Wikipedia are astoundingly good. Maybe Citizendium is a nice mix of both worlds.

As for citation rates for taxonomic papers, perhaps if these were made more widely accessible, citation rates would increase. Will be interesting if we start seeing more of this when the BHL has lots of accessible papers (with DOIs I hope!!!). Currently, wares on the BHL site are woefully inaccessible for this purpose.

Anonymous said...

Thanks for the elaborate article on using websites in BibTex. I'm using the standard report documentclass and natbib/chicago citation style and found out that the format as outlined in your blog does not work that well. Some of the keys are not recognized (including url).

Another issue is key sorting. Using @misc, with my settings, it does not sort or display organizations (as websites do not always mention authors). So I had to switch from @misc to @manual which serves my purpose without resorting to different styles or classes. An example:

author = {David Shorthouse},
title = {Just Gimme the Current Name!},
organization = {iSpiders},
address = {},
note = {Accessed 2 July, 2008},
year = {2008}