Wednesday, August 22, 2007

JSON is Kewl

While messing around with the newfangled reference parser script that connects to bioGUID to get the goods, it occurred to me that this slick, simple technique, which requires next to no mark-up, can be applied to all sorts of nifty things. Yahoo produces JSON for its search results and lets you specify your own callback function. So, for kicks, I adjusted my JavaScript file a bit to use Yahoo instead of Rod Page's bioGUID reference parser and also added some cool DHTML tooltips developed by Walter Zorn. So, hover your mouse over a few animal and plant names that I list here with no particular relevance: Latrodectus mactans, Blue Whale, blue fescue, and, Donald Hobern's favourite, Puma concolor. Incidentally, I may as well try it with Donald Hobern himself (Disclaimer: I take no responsibility for what may pop up in the tooltip).

Now that I have been messing with this JavaScript for pulling JSON with a callback, this stuff is quite exciting. Remember that there is next to NO mark-up or any additional effort for someone to take advantage of this. I only have a few JavaScript files in the <body> section of this page, and I mark up the stuff I want to have a tooltip with <span class="lifeform">name here</span>. This is pretty cool if I do say so myself.
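The callback trick is easy to sketch. Here is a minimal, illustrative version of the pattern — the object and function names are my own inventions, not Yahoo's or bioGUID's actual API:

```javascript
// Minimal JSONP helper (illustrative names, not the actual script on this page).
var jsonp = {
  counter: 0,
  callbacks: {},
  // Register a uniquely named callback and build the script URL for it.
  buildUrl: function (baseUrl, query, handler) {
    var name = 'cb' + (this.counter++);
    this.callbacks[name] = handler;
    return baseUrl + '?q=' + encodeURIComponent(query) +
           '&callback=jsonp.callbacks.' + name;
  }
};

// In the browser you would then append a <script src="...url..."> element to
// the page; the remote service responds with jsonp.callbacks.cbN({...}) and
// the data lands in your handler -- no XMLHTTP, no cross-domain restriction.
var url = jsonp.buildUrl('http://api.example.com/search', 'Puma concolor',
                         function (data) { /* fill the tooltip */ });
```

Because the response is just a script the browser executes, it works from any domain — which is exactly why the callback name matters so much to services like Yahoo and Flickr.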
I initially tried this technique with Flickr, but they don't permit square brackets in a callback function. So, I wrote the developers and alerted them to this cool new toy. Hopefully, they'll open the gates a little more and not be so restrictive.

Forgive me...I just can't help myself:
Carabus nemoralis
Argiope aurantia
Culex quinquefasciatus
duck-billed platypus
slime mould
...How many more million to go?...

Sunday, August 19, 2007

Gimme That Scientific Paper Part III

Update Sep 28, 2007: Internet Explorer 6 refuses to cache images properly, so I have an alternate version of the script that disables the functionality described below for those users. You may see it in action HERE. Also, the use of spans (see below) may be too restrictive for you to implement, so I developed a "spanless" version of the script HERE. This version only requires the following mark-up for each cited reference; you can, of course, change a line in the script if you're not pleased with the class name and want to use something else:
<p class="article">Full reference and HTML formatting allowed</p>

Those who have followed along in this blog will recall that I dislike seeing references to scientific papers on web pages when there are no links to download the reprint. And, even when the page author makes a bit of effort, the links are often broken. One solution to this in the library community is to use COinS. But, this spec absolutely sucks for a page author because there is quite a bit of additional mark-up that has to be inserted in a very specific way. [Thankfully, there is at least one COinS generator you can use.] I was determined to find a better solution than this.
You may also recall that I came up with an AJAX solution together with Rod Page. However, that solution used Flash as the XMLHTTP transport, which meant that a crossdomain.xml file had to be put on Rod's server, i.e. it really wasn't a cross-domain solution unless Rod were to open up his server to all domains. Yahoo does this, but it really wasn't practical for Rod. As a recap, this is what I did in earlier renditions:
The JavaScript automatically read all the references on a page (as long as they were sequentially numbered) and auto-magically added little search icons; when clicking these, the references were searched via Rod Page's bioGUID reference parsing service. If a DOI or a handle was found, the icon changed to a link icon; if a PDF was found, the icon changed to a PDF icon; if neither a PDF nor a link via DOI or handle was found, the icon changed to one that let you search for the title on Google Scholar; and finally, if the reference was not successfully parsed by bioGUID, the icon changed to an "un"-clickable one. If you wanted to take advantage of this new toy on your web pages, you had to either contact Rod and ask that your domain be added to his crossdomain.xml file or set up a PHP/ASP/etc. proxy. But Rod has now been very generous...
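The icon decision amounts to a small cascade. A hypothetical sketch — the result fields here are my guesses for illustration, not bioGUID's documented response schema:

```javascript
// Hypothetical icon-selection logic for a parsed reference (the result
// fields are illustrative, not bioGUID's actual response format).
function chooseIcon(result) {
  if (!result.parsed) return 'unclickable';       // bioGUID could not parse it
  if (result.doi || result.handle) return 'link'; // direct link to the article
  if (result.pdf) return 'pdf';                   // reprint found
  return 'scholar';                               // fall back to a title search
}
```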

Rod now spits out JSON with a callback function. What this means is that the cross-domain problems that typically plague XMLHTTP programming no longer apply. To make a long story short, if you are a web page author and include a number of scientific references on your page(s), all you need do is grab the JavaScript file HERE, grab the images above, adjust the contents of the JavaScript to point to your images, then wrap each of your references in span elements as follows:

<p><span class="article">This is one full reference.</span></p>
<p><span class="article">This is another reference.</span></p>
How easy is that?!
To see this in action, have a peek at the references section of The Nearctic Spider Database.

Or, you can try it yourself here:

Buckle, D. J. 1973. A new Philodromus (Araneae: Thomisidae) from Arizona. J. Arachnol. 1: 142-143.

For the mildly curious, and for those who have played with JSON outputs with a callback function, I ran into a snag that caused no end of grief. When a JSONP script element is appended to the page header, the response arrives as a dynamically inserted function call. This works great when only one instance of that function is needed at a time. In this case, however, a user may fire several searches in rapid succession before any previous call has finished. As a consequence, the appended callback functions may pile up on each other and steal each other's scope. The solution was to dump the callback functions into an array, which was mighty tricky to handle.
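A minimal sketch of that fix, with illustrative names (this shows the pattern, not the actual script):

```javascript
// Each request gets its own slot in an array instead of sharing one global
// callback, so overlapping JSONP responses cannot clobber each other.
var pending = [];

function startSearch(reference, handler) {
  var id = pending.length;
  pending.push(function (data) {
    pending[id] = null;   // free the slot once this response is handled
    handler(data, reference);
  });
  // The script URL would then end in "...&callback=pending[" + id + "]".
  return id;
}
```

Each in-flight search keeps its own closure over `id` and `reference`, so responses can arrive in any order without stepping on one another.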

Wednesday, August 15, 2007


Since Adobe has been driving across the US, selling some AIR, I thought I'd take a closer look at Flex/Flash applications that might fit the bill for some tough ideas I have been wrestling with. In a somewhat similar GUI struggle, Rod Page has been feverishly playing with Supertrees, trying to find a web-based, non-Java solution. So, I did a bit of digging into some showcased Adobe AIR applications - tutorial and demo sites are cropping up all over the place - and a Flex application that will soon be transformed into AIR caught my eye: Mindomo. Now, if this mindmapping application had RDF tying the bits together, delivered via distributed web services from GBIF, GenBank, CrossRef, etc., we'd have a real winner here. Mindomo is exactly the application I have been dreaming about for The Encyclopedia of Life (EOL)'s WorkBench; it just has to be pinned down into the biological, semantic web world. Since an AIR application can wrap all this up in a web/desktop hybrid application, I am convinced this is what EOL absolutely must produce.

Monday, August 13, 2007

Google Finds Spiders in Your Backyard

The Google API team has added a new DragZoomControl to the list of available functions. This feature has been bandied about for quite some time, and various people have hacked together approximations using other JavaScript functions. My interest isn't so much the zoom function, as cool as that is, but the ability to query resources bound by the drawn box.

Kludging DragZoomControl to perform a spatial query isn't particularly practical or useful, so I used a Yahoo YUI "Drag & Drop - Resizable Panel" to fix up what I was once using. What I used in the past, which didn't perform well for Safari users, was some scripting called the X Library. Now, with Yahoo's improvement, the function works as expected. Because it's very easy to add things that stay positioned within such a draggable box, the Yahoo YUI component is a much better solution. So, just as you can zoom in / zoom out with the Google DragZoomControl, so too can you put these functions within a draggable, resizable box. I'll also add that the resizing in Yahoo's component is much smoother than in Google's own DragZoomControl. Now the fun part...

Two little icons within the draggable, resizable box allow you to search for spider images or produce a spider species list, which are based on collections records submitted to The Nearctic Spider Database. Click HERE to try your hand at it and search for spiders in your back yard.

The advantage of such a simple function is that one need not have a spatial database like PostgreSQL, but can make use of any enterprise back-end. The query run is the typical minX, maxX, minY, maxY one that defines the four corner coordinates. With a ton of records in the back-end, however, the query can take a long time to complete, so an index on the latitude and longitude columns may be required, as explained in the Google API Group. If you want to see what you can do with a spatial database, however, have a look at what programmers for the Netherlands Biodiversity Information Facility have put together.
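A bounding-box search of this sort reduces to a simple range test on each record. A hypothetical sketch (in a SQL back-end this becomes a WHERE clause over the indexed latitude and longitude columns):

```javascript
// Return the records whose coordinates fall inside the drawn box.
// minX/maxX are longitudes and minY/maxY latitudes -- the four corners.
function recordsInBox(records, box) {
  return records.filter(function (r) {
    return r.lng >= box.minX && r.lng <= box.maxX &&
           r.lat >= box.minY && r.lat <= box.maxY;
  });
}
```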

Happy spider hunting...

Tuesday, August 7, 2007

Dare to Dream Big

This post will be quite off-topic, but I just had to share some recent stuff in the works that caught my eye.

First up is a spin-off from research at MIT, led by Sanjit Biswas, who temporarily left his Ph.D. program (are you sure, Sanjit?) to lead a company called Meraki. The cheap little router/repeaters permit the creation of "smart", distributed networks such that a single DSL connection can feed dozens of end-points. The firmware in each little gizmo permits a network admin to monetize these ad-hoc connections. Consequently, getting connected to the 'net could be as cheap as $1 a month once a user buys the attractive Meraki mini. The company also recently announced a Meraki Solar kit. Now that's forward thinking. There are dozens of testimonials on the Meraki web site, including one from the town of Salinas, Ecuador, where a network of schools is now connected even though there are no phone lines!

Distributed, ad hoc connections like this reminded me of an email I recently received from Rod Page, who alerted me to FUSE, which stands for "Filesystem in Userspace". This is a Linux-based SourceForge project that allows a user to create and mount virtual drives that contain or represent a vast array of file types. For example: 1) the Fuse::DBI file system mounts some data from relational databases as files, 2) BloggerFS "is a filesystem that allow Blogger users to manipulate posts on their blogs via a file interface.", and 3) Yacufs "is a virtual file system that is able to convert your files on-the-fly. It allows you to access various file types as a single file type. For instance you can access your music library containing .ogg, .flac and .mp3 files, but see them all as if being .mp3 files." This all sounds very geeky, but I draw your attention to MacFUSE (sadly, there is not yet a WindowsFUSE, though it appears this functionality has not gone unnoticed).

So what? Isn't this just some sort of peer-2-peer system? Absolutely not. This is more like a distributed content management system and, coupled with a highly intelligent Yacufs-like extension, it means that file types (e.g. MS Word, OpenOffice, etc.) can be converted on-the-fly to whatever format you want or need. To step this thinking up a bit, in case you have no idea why this is relevant to ecology or systematics, have a look at the cool things Cynthia Parr and her colleagues are doing to visualize distributed data sets: doi:10.1016/j.ecoinf.2007.03.005. FUSE means the work Cynthia and others are doing (e.g. SEEK) doesn't need a GUI. Rather, we just need a way to organize the gazoodles of files that would/could be present in an ecologically- or taxonomically-relevant filesystem. Maybe I should coin these EcoFS and TaxonFS :)~