Saturday, August 21, 2010

Reference Parser Revived



Many moons ago, I developed a tool that does real-time discovery of scientific references using a combination of the aged (though still very useful) ParaTools and CrossRef's OpenURL service. With the demise of my server, this bit of code was lost. I have just revived the code and its functionality, and I provide it here for anyone else to take and refine. UPDATE: parsing is now executed with a Ruby gem: http://refparser.shorthouse.net/. This location is not likely to persist, so get it while you can.

To get a sense of what it does, here are some verbatim references. Click the magnifying glass after each reference to experience the magic. The browser's restrictions on cross-domain AJAX requests are circumvented by using jQuery's clever JSONP handling.

Bell, C. D., & Patterson R. W. (2000). Molecular phylogeny and biogeography of Linanthus (Polemoniaceae). American Journal of Botany. 87, 1857-1870.

Epling, C., & Dobzhansky T. (1942). Genetics of natural populations. VI. Microgeographic races in Linanthus parryae. Genetics. 27, 317-332.

Epling, C., Lewis H., & Ball F. M. (1960). The Breeding Group and Seed Storage: A Study in Population Dynamics. Evolution. 14, 238-255.
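For readers who want a feel for the two steps involved (split a verbatim reference into fields, then hand those fields to an OpenURL resolver), here is a minimal Python sketch. It is not the ParaTools or Ruby gem code behind the tool: the regular expression is a hypothetical pattern that only fits references shaped like the three above, the function names are made up for illustration, and the endpoint and query parameter names (`pid`, `aulast`, `title`, `volume`, `spage`, `date`, `redirect`) are assumptions that should be checked against CrossRef's OpenURL documentation before use.

```python
import re
from urllib.parse import urlencode

# Hypothetical pattern for "Authors (Year). Title. Journal. Volume, Start-End."
# It only fits references laid out like the three examples above.
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+)\.\s+"
    r"(?P<journal>[^.]+)\.\s+"
    r"(?P<volume>\d+),\s*(?P<spage>\d+)-(?P<epage>\d+)\.?$"
)

# Assumed resolver endpoint; verify against CrossRef's documentation.
CROSSREF_OPENURL = "https://www.crossref.org/openurl"


def parse_reference(reference):
    """Split one verbatim reference into a dict of fields, or return None."""
    match = REFERENCE_PATTERN.match(reference.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The first author's surname is the text before the first comma.
    fields["aulast"] = fields["authors"].split(",")[0].strip()
    return fields


def build_openurl(fields, pid):
    """Build an OpenURL-style query string from parsed fields.

    `pid` stands in for the account identifier the resolver expects;
    the parameter names used here are assumptions, not a verified API.
    """
    params = {
        "pid": pid,
        "aulast": fields["aulast"],
        "title": fields["journal"],  # journal title, not article title
        "volume": fields["volume"],
        "spage": fields["spage"],
        "date": fields["year"],
        "redirect": "false",
    }
    return CROSSREF_OPENURL + "?" + urlencode(params)


if __name__ == "__main__":
    example = (
        "Bell, C. D., & Patterson R. W. (2000). Molecular phylogeny and "
        "biogeography of Linanthus (Polemoniaceae). American Journal of "
        "Botany. 87, 1857-1870."
    )
    parsed = parse_reference(example)
    if parsed is not None:
        print(build_openurl(parsed, pid="you@example.org"))
```

A single regular expression like this breaks as soon as the citation style changes, which is exactly why a dedicated parser does the real work; the sketch only shows how parsed fields map onto a resolver query.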


Similarly, this can be done with an input box. Paste a reference and press enter:

2 comments:

Joe Lapp said...

You're doing great stuff, David. I've been in discussions with various people about the future of biological databases. There's much to share, but the basic idea is that every person or institution would have their own database/web site, and these databases would generate RSS-like feeds of specimen information, which subscribing servers could aggregate and map. Individuals would have full control over their local database but still be able to contribute to a greater good without doing any extra work. Institutions could also choose to host software that runs many virtual databases, so users don't have to set up the PHP/MySQL themselves.

Anyway, I was wondering if maybe your Nearctic database might make a good starting point for this new platform.

David Shorthouse said...

More for me, but this may be useful for others:

Parsing Tools/Code:
Perl Biblio::Citation::Parser - http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10/lib/Biblio/Citation/Parser/Jiao.pm
ParsCit (Document Parser) - http://aye.comp.nus.edu.sg/parsCit/
CiteSeerX - http://sourceforge.net/projects/citeseerx/
Ruby FreeCite - http://freecite.library.brown.edu/
Python bibliograph.parsing - http://pypi.python.org/pypi/bibliograph.parsing/1.0.0

Standards:
OpenURL - http://en.wikipedia.org/wiki/OpenURL
COinS - http://ocoins.info/
BibJSON - http://www.bibkn.org/bibjson/index.html

Applications:
jQuery References Parse - http://refparser.shorthouse.net/ (based on Biblio::Citation::Parser API)

Additional:
OpenURL Resolver Registry - http://www.oclc.org/productworks/urlresolver.htm
CrossRef OpenURL Resolver - http://www.crossref.org/02publishers/openurl_info.html
BHL OpenURL Resolver - http://www.biodiversitylibrary.org/openurlhelp.aspx
BioStor - http://biostor.org/openurl
Bibliographic Ontology - http://bibliontology.com/

What's really needed is a centralized templating system whereby distributed, front-end parsers have an opportunity (somehow) to help embellish and improve the parsing accuracy for others.