Monday, November 3, 2008

Little E's

Because I work for the Encyclopedia of Life (EOL) and because I can tinker on the Nearctic Spider Database, I have the opportunity to try out various approaches to help mobilize data. One thing that concerns me about the current arrangement between EOL and its content partners is that it is nearly 1:1. In other words, content partners that come on board are encouraged to represent their data in one potentially massive XML document, much like a Google Sitemap. More information on what EOL would like future content partners to produce can be found HERE. A potential outside consumer of these data would have no idea where to find this XML document. Thus, the relationship between EOL and its content partners is closed, at least until EOL releases some web services.

So, in an effort to help expose the data structure EOL is looking for, I added a link to every species page in the Nearctic Spider Database. Click one of these "little e's" and you get a glimpse of what EOL hopes its content partners will produce. The "little e's" don't really improve the relationship between EOL and its array of content partners, nor do they ease the work content partners must do to make these documents, nor do they help us at EOL. So what's the point? They share what I produced for EOL: if you can parse the data behind the "little e's", you can also parse the big XML "sitemap" document I made for EOL.
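To make that last point concrete, here is a minimal sketch of a parser that works on both a single "little e" page and a large multi-taxon document. It assumes a simplified, hypothetical structure (`response`, `taxon`, `identifier`, `scientificName`, `family`, `dataObject`, `subject`, `description`); these element names are illustrative only and are not the actual EOL schema. The sample record is made up.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL, simplified stand-in for a "little e" document.
# Element names are illustrative only; they are NOT the real EOL schema.
SAMPLE = """<response>
  <taxon>
    <identifier>nsd-123</identifier>
    <scientificName>Araneus diadematus</scientificName>
    <family>Araneidae</family>
    <dataObject>
      <subject>Description</subject>
      <description>Orb-weaving spider.</description>
    </dataObject>
  </taxon>
</response>"""


def parse_taxa(xml_text):
    """Return one dict per <taxon> element, with its name and text objects.

    root.iter("taxon") finds every taxon, so the same code handles a
    one-species page and a big document holding thousands of taxa.
    """
    root = ET.fromstring(xml_text)
    taxa = []
    for taxon in root.iter("taxon"):
        objects = [
            {
                "subject": obj.findtext("subject", default=""),
                "description": obj.findtext("description", default=""),
            }
            for obj in taxon.findall("dataObject")
        ]
        taxa.append(
            {
                "id": taxon.findtext("identifier"),
                "name": taxon.findtext("scientificName"),
                "family": taxon.findtext("family"),
                "objects": objects,
            }
        )
    return taxa
```

Because the big document is, under this assumption, just many `taxon` elements under one root, nothing in the parser needs to change when you move from one page to the whole site.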

The problem with sitemaps is that no one but the harvester knows where to find them. A Google sitemap, for instance, can sit in any folder of the website it describes (though it is usually in the root folder and accessed as /sitemap.xml or /sitemap.gz). EOL and its content partners are in the same situation: the "sitemap" can be anywhere.
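The sketch below shows how little an outside consumer can do without being told the location: guess the conventional root paths, then pull the `<loc>` entries out of whatever it finds. It follows the sitemaps.org `urlset`/`url`/`loc` format used by Google sitemaps; the function names are hypothetical, and network fetching is left out so the example stays self-contained.

```python
import gzip
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace used by sitemaps.org (Google) sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Conventional root-level locations only; a site may put its sitemap anywhere.
CONVENTIONAL_PATHS = ("/sitemap.xml", "/sitemap.gz")


def candidate_sitemap_urls(site_url):
    """Build the root-level URLs a consumer would have to guess at."""
    return [urljoin(site_url, path) for path in CONVENTIONAL_PATHS]


def sitemap_locations(raw_bytes):
    """Return every <loc> URL from a sitemap, gunzipping it first if needed."""
    if raw_bytes[:2] == b"\x1f\x8b":  # gzip magic number
        raw_bytes = gzip.decompress(raw_bytes)
    root = ET.fromstring(raw_bytes)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS)]
```

If the file lives anywhere other than these guessed paths, the consumer simply never finds it, which is exactly the closed situation described above.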

To round out the "little e" approach, each page should link to the content partner's EOL sitemap document, which in turn lists links to all pages with "little e's". This would be somewhat like an OpenSearch description document, which tells clients how to use the search feed(s) a site offers. And of course, there should be a JSON option as a lighter-weight alternative to XML.
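A per-page link could work like RSS or OpenSearch autodiscovery: a `<link>` element in each page's head that a client scans for. The sketch below assumes a hypothetical relation name, `rel="eol-sitemap"`, which was never standardized, and collects one URL per MIME type so a client can choose JSON over XML when both are offered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# HYPOTHETICAL link relation for illustration; not a registered rel value.
SITEMAP_REL = "eol-sitemap"


class SitemapLinkFinder(HTMLParser):
    """Collect <link rel="eol-sitemap"> elements, keyed by MIME type."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if SITEMAP_REL in rels and attr.get("href"):
            # Assume XML when the page does not declare a type.
            mime = attr.get("type") or "application/xml"
            # Resolve relative hrefs against the species page URL.
            self.links[mime] = urljoin(self.page_url, attr["href"])


def find_sitemaps(page_url, html_text):
    """Return {mime_type: absolute_url} for every advertised sitemap."""
    finder = SitemapLinkFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.links
```

A client would then try `links.get("application/json")` first and fall back to the XML version.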

But to make any of this useful, we need a desktop reader akin to an RSS reader: something that can shunt the data into the correct spot within a rich, GUI-based classification (with some degree of certainty), eventually forcing us to develop far better online tree browsers. With all the pieces described above, you would come across a species page, click a button like an RSS feed button, download a sitemap listing all the species pages on that site, and then browse the content organized the way you want.
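The core of such a reader is slotting each downloaded page into a tree. The sketch below assumes, hypothetically, that each record already carries a full classification path (`url` and `classification` keys are made up for illustration). It skips the harder problem of matching names "with some degree of certainty" and just builds nested nodes.

```python
def add_to_tree(tree, path, page_url):
    """Walk (or create) one node per rank, then attach the page at the leaf."""
    node = tree
    for name in path:
        node = node.setdefault("children", {}).setdefault(name, {})
    node.setdefault("pages", []).append(page_url)


def build_tree(records):
    """records: iterable of dicts with hypothetical 'url' and 'classification' keys."""
    tree = {}
    for rec in records:
        add_to_tree(tree, rec["classification"], rec["url"])
    return tree


def outline(tree, depth=0):
    """Flatten the tree into indented lines, children sorted alphabetically."""
    lines = []
    for name, child in sorted(tree.get("children", {}).items()):
        lines.append("  " * depth + name)
        lines.extend(outline(child, depth + 1))
    return lines
```

For example, three spider pages under Araneae would outline as Araneidae (with Araneus and Argiope beneath it) followed by Salticidae. A real reader would need name reconciliation before `add_to_tree`, since different sites may use different classifications.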