Friday, December 7, 2012
If EOL Started All Over Today, What Would be the Best Approach?
Today I participated in a very engaging conversation with a group of systematists and ecologists who are intensely interested in cataloguing the diversity of life in their neck of the woods. They immediately recognized that such a compilation should contain authoritative content, should link to relevant resources so as not to duplicate efforts elsewhere, and most definitely should be online. In my (perhaps naive) interpretation, it sounded much like the Encyclopedia of Life (EOL), albeit at a smaller, more focused scale.
But has EOL taken a winning approach? Has it sustained the interest it once had? Is it duplicating effort? Is it financially sustainable? Are remarkable, value-added products being built off its infrastructure that would not otherwise be possible? These aren't rhetorical questions. I just don't know. Shouldn't I know by now? Part of the answer will certainly depend on which metric you wish to use, and these metrics will invariably draw upon the engagement of one audience or another.
Here's an interesting thought experiment:
If EOL had taken a radically different approach at the outset by becoming a taxonomically intelligent index (e.g. a Google-like product, but specifically tuned using a graph such as the one that may eventually underpin the Open Tree of Life) instead of serving species pages aggregated from elsewhere, where would it be today? What could have been built from such a "product"?
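One way to picture a "taxonomically intelligent index" is an ordinary inverted index whose queries are expanded through a classification, so a search for a higher taxon also finds pages that only mention names beneath it. The Python sketch below is purely illustrative: the tiny child-to-parent map stands in for a real graph such as Open Tree's, the URLs are placeholders, and `TaxonomicIndex` is a hypothetical name, not anything EOL or Open Tree actually provides.

```python
from collections import defaultdict

# Hypothetical toy classification (child -> parent), standing in for a real graph.
PARENT = {
    "Hemiptera": "Insecta",
    "Miridae": "Hemiptera",
    "Coleoptera": "Insecta",
    "Carabidae": "Coleoptera",
}


def build_children(parent_map):
    """Invert a child->parent map into parent->set of children."""
    children = defaultdict(set)
    for child, parent in parent_map.items():
        children[parent].add(child)
    return children


def descendants(taxon, children):
    """Return the taxon itself plus every name beneath it."""
    found = {taxon}
    stack = [taxon]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


class TaxonomicIndex:
    """Inverted index from names to URLs, queried through the classification."""

    def __init__(self, parent_map):
        self.children = build_children(parent_map)
        self.postings = defaultdict(set)

    def add(self, url, names):
        for name in names:
            self.postings[name].add(url)

    def search(self, taxon):
        # Expand the query to all descendant names, then union their postings.
        hits = set()
        for name in descendants(taxon, self.children):
            hits |= self.postings.get(name, set())
        return sorted(hits)
```

A page that only mentions Miridae would then turn up in a search for Hemiptera or Insecta, which a plain keyword index would miss.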
Posted by
David Shorthouse
Thursday, November 8, 2012
Conference Tweets in the Age of Information Overconsumption
Having been a remote Twitter participant in what from all accounts was a successful conference hosted by the Entomological Society of Canada and the Entomological Society of Alberta, I now have the luxury of stepping back with a nice glass of red wine and thinking more deeply about the experience and its implications for the health of science. Dezene Huber, who participated in person and provided valuable Tweet streams of his own, has also taken a breath.
The Saturday prior to the conference, I had a "wouldn't it be cool if" moment and put my fingers to work on a toy that could listen in on the Tweet streams generated by conference goers as they prepared for the event, as they were in transit, as they sat in the audience, as they chatted over coffee, and as they celebrated their winnings during the banquet.
My roughshod little experiment was to encourage participants to include scientific names in their streams. After all, names are a very important part of how biology is communicated. I grabbed their Tweets in real time, fed them into three web services, and stored the results in a relational database. Dmitry Mozzherin and I developed two of these web services at the Marine Biological Laboratory under the NSF-funded Global Names project led by David Patterson. These gave me the tools necessary to answer the questions, "Is this a name?" and "Where is this name in a classification?" The third was recently assembled by some brilliant developers at CrossRef who figured out a way to execute rapid searches against their massive database of citations in the primary literature, a database assembled off the backs of researchers and publishers.
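To make the shape of that pipeline concrete, here is a minimal Python sketch of the "is this a name?" and "where does it sit?" steps feeding a relational table. It is not the Global Names code: a naive binomial regex stands in for the name-finding service, a one-entry dictionary stands in for the classification service, and the CrossRef lookup is left out. `find_names`, `store_tweet` and the `names` table are hypothetical.

```python
import re
import sqlite3

# Naive stand-in for a name-finding service: capitalised genus + lower-case
# epithet. It will miss real names and match ordinary phrases.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical stand-in for a classification service: genus -> lineage.
CLASSIFICATION = {"Lygus": "Animalia|Arthropoda|Insecta|Hemiptera|Miridae"}


def find_names(text):
    """Return candidate binomials whose genus can be placed in the classification."""
    return [
        f"{genus} {epithet}"
        for genus, epithet in BINOMIAL.findall(text)
        if genus in CLASSIFICATION
    ]


def store_tweet(conn, tweet_id, text):
    """Store each recognised name, with its lineage, against the tweet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names (tweet_id TEXT, name TEXT, lineage TEXT)"
    )
    names = find_names(text)
    conn.executemany(
        "INSERT INTO names VALUES (?, ?, ?)",
        [(tweet_id, n, CLASSIFICATION[n.split()[0]]) for n in names],
    )
    conn.commit()
    return names
```

Filtering regex hits against the classification is what throws out phrases like "Great talk"; a real name-finding service is far more careful than this.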
So, as "Ento-Tweeps" tapped out a name, I immediately caught it, placed it in a hierarchy, and threw it to CrossRef. Within a split second of a Tweet appearing, I had links to the primary literature and some context. These were often amazingly accurate. Here's one that the prolific Morgan Jackson tweeted during Nikolai Tatarnic's paper entitled, "Sexual mimicry and paragenital divergence between sympatric species of traumatically inseminating plant bug":
Now that's useful!
However...
There were occasions where this wasn't so useful. These were examples of what some have called, "Information Overload". But that's a misnomer. We're beginning to understand what this really is. A better term for this (if one were to become dependent and fixated on streams like this) is "Information Overconsumption".
So, how do we responsibly integrate the power of social media in scientific conferences?
First, draft a light-hearted code of ethics, much like the etiquette we've become accustomed to for mobile phones at such events. Turn off the beeps and squawks! Turn off the unnecessary keypress chirps.
Second, as tempting as it may be, DO NOT COMMERCIALIZE THIS! The corporate sector has already found its way into the conference arena, the last pure outlet for the exchange of science. Social media could be a new channel for communication, but one that would be instantly switched off if it were put behind a paywall.
Last, treat the messages not as news, but as products. Though the messages are instant, much like a stream of news, they are written by you, the one who has spent years honing your skills and learning your science.
My only hope is that "toys" like mine and the web services upon which they depend improve with time. They MUST help sell your products in a way that does not lead to Information Overconsumption and they MUST add value to the messages you wish to convey. How? That's up to you.
Tuesday, January 3, 2012
Science is a Product in the Wrong Marketplace
Instead of mindlessly watching a movie tonight, I browsed through Google Tech Talks and stumbled upon a spectacularly argued, wonderfully cadenced, and orchestrated Sept 2011 presentation by Kristen Marhaver entitled, "Organizing the world's information by date and author is making Mother Earth Sick".
Her thesis is that science is a product, not a news stream. And, because science is communicated in a self-serving, paywall-laden marketplace, its products are paradoxically valueless to outsiders (those who stand to benefit from this knowledge). Kristen argues that the first steps toward cracking this marketplace could be to expose the inherently social dimension of science by using modern-day social gadgetry. Google+, Twitter and star ratings could reside around the periphery of online PDF reprint viewers. Unfortunately, Kristen, this is still the wrong marketplace.
The one place where the social dimension of science is abundantly obvious is the largely unchallenged scientific conference. There are ways for this energetic, youthful, exploratory dialogue to spill out onto the distant screens of those who could benefit. YouTube, Twitter and Google+ could all be used religiously at conferences because, for the most part, the papers delivered there are free from the publisher's grasp. Google Tech Talks and TED talks are spectacularly popular for very good reason: the medium is accessible. Plus, there is ample opportunity to make conferences more accessible and engaging to registrants themselves. How many times have you heard a speaker feel the need to introduce co-authors who could not be present, or shamelessly advertise the upcoming paper and poster presentations of their graduate students? The moment someone walks up to the podium, I want all that pushed onto my iPad along with links to their reprints. I'd rather they just get on with it. If their presentation were recorded and later put on YouTube, I'd want the same experience. Sure, links to their reprints would likely throw me up against a brick paywall, but I'd already know and appreciate the context.
To take this even further, why not really expose the scientific conference by advertising the downtime? On how many occasions have you gone to a conference, only to share a beer or two in the evening(s) WITH THE COLLEAGUES YOU ALREADY WORK WITH!? Instead, I want a post-conference un-drink. That is, I'd like to advertise my desire to have a drink by posting what I'd like to talk about and then blast the venue into the Twittersphere for members of the public to join me if they felt so inclined. If it's a bust, I'll swallow my pride and go join another one...and I'll bring copies of my reprints.
Monday, November 14, 2011
Realtime Web
I started work on a whimsical presentation about the Realtime Web that I will soon give to the Biodiversity Informatics Group at the Marine Biological Laboratory, and came up with the following kooky slide. Felt the urge to share.
Sunday, November 13, 2011
Amazing Web Site Optimizations: Priceless
Quite literally, priceless. As in costs nothing.
I was obsessed with web site optimization these past few weeks, trying to trim every bit of fat from page render times. As we all know, if a page takes longer than approximately 3-4 seconds to render, you can expect to lose your audience. Although expectations for speed vary with the end-user's geographic location, a website that is fast for a user in Beijing is just as important as one that is fast for a user in California. As might be expected, server hardware typically isn't the bottleneck. Another way of looking at this is to recognize that remarkable boosts in performance can be had on crap hardware. So, this post presents the tools I used to measure web site performance and describes the simple techniques I employed to trim the excess fat.
My drug of choice to measure the effect of every little (or major) tweak has been WebPagetest, a truly invaluable service because I can quickly see where in the world my web page suffered and why. Knowing that it took 'x' ms to download and render a JavaScript file or 'y' ms to do the same for a css file meant I could see with precision what a bit of js or css cleansing does to a user's perception of my web site. I also used Firebug and Yahoo's YSlow, both as Firefox plug-ins. Google Chrome also has a Page Speed extension that I used to produce a few optimized versions of graphics files.
Some tricks I employed to great effect, in order from most to least important:
- Make css sprites. The easiest tool I found was the CSS Sprite Generator. Upload a zipped folder of icons and it spits out a sprite image and a css file. Could it be any easier? Making a css sprite eliminates a ton of unnecessary HTTP requests and is by far the most important technique to slash load times.
- Minify JavaScript and css. For the longest time, I was using the facile JavaScript Compressor, but the cut/paste workflow became too much of a pain. So, I elected to use some server-side code to do the same: jsmin-php and CssMin. When my page is first rendered, the composite js and css files are made in memory then saved to disk. Upon re-rendering (by anyone), the minified versions are served. Here's the PHP class I wrote that does this for me. Whenever I deploy new code, the cached files are deleted then recreated, with a new MD5 hash as their file names.
- Properly configure your web server. This is especially important for a user's second and subsequent visits. You'd be crazy not to take advantage of the fact that a client's browser can cache! I use Apache and here's what I have:
<Directory "/var/www/SimpleMappr">
    Options -Indexes +FollowSymlinks +ExecCGI
    AllowOverride None
    # Apache 2.2 access control; Apache 2.4 uses "Require all granted" instead
    Order allow,deny
    Allow from all
    DirectoryIndex index.php
    FileETag MTime Size
    <IfModule mod_expires.c>
        <FilesMatch "\.(jpe?g|png|gif|js|css|ico|php|htm|html)$">
            ExpiresActive On
            ExpiresDefault "access plus 1 week"
        </FilesMatch>
    </IfModule>
</Directory>
Notice that I use the mod_expires module. I also set FileETag to MTime Size, though this was only marginally effective.
- Include ALL JavaScript files just before the closing body tag. This boosts the potential for parallelism, and the page can begin rendering before all the JavaScript has finished downloading.
- Serve JavaScript libraries from a Content Delivery Network (CDN). I use jQuery and serve it from Google. Be aware that, on average, it is best to draw content from ONLY three or four external domains, and this count includes static content servers that might be a subdomain of your own web site. Beyond that, DNS look-up times outweigh the benefit of parallelism, especially for older versions of Internet Explorer. Modern browsers are capable of more simultaneous connections, but we cannot (yet) ignore IE. I once served jQueryUI via the Google CDN, but because this was yet another HTTP request, it was slower than serving it from my own server. So, I now pull jQuery from the Google CDN and include jQueryUI with my own JavaScript in a single minified file from my server.
- Put the whole site behind a Content Delivery Network. I use CloudFlare because it's free, it was configured in 5 minutes, and within a day there was a noticeable global improvement in page speed as measured via WebPagetest. Because I regularly push new code, I use the CloudFlare API to flush their caches whenever I deploy. However, this is largely unnecessary because they do not cache HTML and, as mentioned earlier, I use an MD5 hash as the file name for my js and css files.
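The MD5-named files are what make long cache lifetimes safe: when the content changes, the name changes, so browsers and CDNs never serve a stale copy under the old name. Here is a minimal Python sketch of that idea, not the PHP class mentioned above. It assumes a bundle is a plain concatenation of source files; the minification step (jsmin-php and CssMin in the setup described) is deliberately left out, and `build_bundle` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def build_bundle(sources, out_dir, suffix):
    """Concatenate source files and save them under an MD5-derived file name.

    A real pipeline would minify `combined` before hashing; that step is
    omitted here. Returns the bundle's file name for use in a <script> or
    <link> tag.
    """
    combined = "\n".join(Path(p).read_text(encoding="utf-8") for p in sources)
    digest = hashlib.md5(combined.encode("utf-8")).hexdigest()
    out_path = Path(out_dir) / f"{digest}.{suffix}"
    # Later requests reuse the file already on disk; changed content gets a new name.
    if not out_path.exists():
        out_path.write_text(combined, encoding="utf-8")
    return out_path.name
```

Because the name is derived from the content, deleting old bundles on deploy is housekeeping rather than a correctness requirement.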
Did I mention that none of the above cost me anything?
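For anyone curious why the css sprite tops that list: every icon shares one background image (one HTTP request), and each icon's class merely shifts that image into view. The small Python sketch below writes such rules for icons stacked vertically in a single image. The vertical layout, the `icon` class prefix and `sprite_css` itself are assumptions for illustration; the CSS Sprite Generator emits its own css, which may look different.

```python
def sprite_css(icons, sprite_url, prefix="icon"):
    """Generate css for icons stacked top-to-bottom in one sprite image.

    `icons` is a list of (name, width, height) tuples in stacking order.
    Each rule shifts the shared background up by the combined height of
    the icons above it.
    """
    rules = [f".{prefix} {{ background: url({sprite_url}) no-repeat; }}"]
    offset = 0
    for name, width, height in icons:
        rules.append(
            f".{prefix}-{name} {{ width: {width}px; height: {height}px; "
            f"background-position: 0 {-offset}px; }}"
        )
        offset += height
    return "\n".join(rules)
```

In the page, an element such as `<span class="icon icon-help"></span>` then displays just the third icon of the sprite.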
Sunday, June 26, 2011
SimpleMappr Embedded
I never had high hopes for SimpleMappr. There are plenty of desktop applications to produce publication-quality point maps. But it turns out users find these hard to use or too rich for their pocketbooks. As a result, my little toy and its API are getting a fair amount of use. I find this greatly encouraging, so I occasionally clean up the code and add a few logical, unobtrusive options.
A number of users appear to want outputs for copy-paste on web pages and not copy-paste into manuscripts, so I just wrote an extension to permit embedding.
Here's one such example using the URL
http://www.simplemappr.net/?map=643&width=500&height=250:
Happy mapping...
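If you generate these embed links from your own pages, something like the Python sketch below would assemble them. It assumes that `map`, `width` and `height`, the three parameters visible in the example URL above, are all that is needed; any other options are not covered, and `embed_url` is a hypothetical helper, not part of SimpleMappr's API.

```python
from urllib.parse import urlencode

# Base address taken from the example URL above.
SIMPLEMAPPR = "http://www.simplemappr.net/"


def embed_url(map_id, width, height):
    """Build an embed link for a saved map at the given pixel size."""
    query = urlencode({"map": map_id, "width": width, "height": height})
    return f"{SIMPLEMAPPR}?{query}"
```

Using `urlencode` rather than string concatenation keeps the query string correctly escaped if other parameters are ever added.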
Monday, November 15, 2010
Lightweight, Cross-platform, Real-time Browser-Browser Communications
During a monthly meeting to discuss cutting-edge technologies here at the Biodiversity Informatics Group at the Marine Biological Laboratory, I demonstrated a technique to update distributed browsers in the face of collaborative classification (i.e. tree) editing. In essence, if 2+ people are asynchronously (i.e. via AJAX calls) updating content on a web page, there is potential for everyone to get horribly out of sync with one another. Imagine, for example, a chat window that does not update on everyone's screen in real time...that wouldn't make for a particularly pleasant or useful experience for anyone. The same lousy experience was true in the LifeDesks tree editor when 2+ people were simultaneously updating the same classification. Person A might delete or move a node while persons B, C, D, etc. are none the wiser and might later perform an action on that node (or its children), even though the database no longer reflects what they see on their screens.
To work around the possibility that everyone editing can get horribly out of sync with one another, I implemented a polling mechanism to grab recent adjustments to data every 5 seconds. If you happen to be looking at a portion of the tree that someone else has just deleted or moved elsewhere in the tree, relevant nodes within the tree will now automagically refresh to reflect actions that someone else just did...nodes will flash red then disappear, nodes will flash green then appear, etc. There is also a scrolling activity monitor at the bottom of the screen. To be sure, this isn't a particularly robust mechanism because there is constant polling. Enter web sockets...
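The server side of such a polling scheme can be as simple as an append-only log of edits with increasing sequence numbers: each browser remembers the last number it saw and, every 5 seconds, asks only for newer entries. The Python sketch below illustrates that general pattern under those assumptions; it is not the LifeDesks implementation, and `ChangeLog`, `record` and `since` are hypothetical names.

```python
import itertools
import threading


class ChangeLog:
    """Append-only log of tree edits that polling clients read incrementally."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq = itertools.count(1)
        self._entries = []  # (sequence number, action, node id)

    def record(self, action, node_id):
        """Store one edit and return its sequence number."""
        with self._lock:
            seq = next(self._seq)
            self._entries.append((seq, action, node_id))
            return seq

    def since(self, last_seen):
        """Return every edit newer than the client's last-seen sequence number."""
        with self._lock:
            return [e for e in self._entries if e[0] > last_seen]
```

The weakness is the one noted above: every open browser hits the server on every tick, whether or not anything has changed.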
Ryan Schenk, who attended this informal demonstration, alerted me to Socket.IO. I knew of it, but never paid much attention. However, after poking around a little with the examples provided, I am convinced this is the way I should have designed real-time classification tree updates in the face of 2+ simultaneous user actions. The lightweight technique will prove useful for any client-client communications (e.g. real-time chat). Plus, it has the excellent benefit of cross-browser, cross-platform capabilities with very little server strain. A database need only be hit once when person A performs an action, and the data then propagates to all other users. Very cool.
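The difference from polling is that the server pushes each edit to everyone else the moment it happens. The sketch below shows only that fan-out idea, in Python with asyncio queues standing in for connected browsers; it does not use or mimic Socket.IO's actual API or transports, and `Broadcaster` and `demo` are hypothetical names.

```python
import asyncio


class Broadcaster:
    """Fan one update out to every connected client's queue."""

    def __init__(self):
        self._clients = set()

    def connect(self):
        # Each queue stands in for one connected browser.
        queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def disconnect(self, queue):
        self._clients.discard(queue)

    def publish(self, message, sender=None):
        # Push the edit to everyone except the client who made it.
        for queue in self._clients:
            if queue is not sender:
                queue.put_nowait(message)


async def demo():
    hub = Broadcaster()
    alice, bob, carol = hub.connect(), hub.connect(), hub.connect()
    # Alice moves a node: the edit is handled once, then pushed to the others.
    hub.publish({"action": "move", "node": 42}, sender=alice)
    return alice.empty(), await bob.get(), await carol.get()
```

Running `asyncio.run(demo())` shows Bob and Carol each receiving the move exactly once, with no timer involved.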
