Saturday, July 19, 2008

Google Geocodes

Since I have been on a kick this weekend getting back into the mapping thing, I decided to see what was new in the world of the Google Maps API and discovered plenty of great new things. For example, folks have developed reverse geocoders. It's a shame, however, that the full ISO country names aren't used; only the country codes are made available via Google's geocode API. I would much rather have had the full country name and the full "AdministrativeAreaName" (i.e. the State or Province in Google Maps API parlance), because I could then use them in the AJAX data grid for contributors of specimen records to the Nearctic Spider Database. Similarly, applications like Specify could have taken advantage of this to help users clean or check their data as they are entered.
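In the meantime, the workaround on my end would just be a small lookup from the two-letter codes the geocoder returns to full ISO country names. A rough sketch of what I mean (the table is obviously truncated, and the response fields are only my guess at the two bits I actually want back from a reverse geocode):

// Sketch: expand two-letter country codes from the geocoder into full names.
// Only a handful of entries shown; a real table would cover all of ISO 3166-1.
const isoCountryNames: { [code: string]: string } = {
  CA: "Canada",
  US: "United States of America",
  MX: "Mexico",
};

// Hypothetical shape of the two fields I care about in a reverse-geocode result.
interface GeocodeSummary {
  countryCode: string;            // e.g. "CA"
  administrativeAreaName: string; // e.g. "Alberta"
}

function describeLocation(g: GeocodeSummary): string {
  const country = isoCountryNames[g.countryCode] || g.countryCode;
  return g.administrativeAreaName + ", " + country;
}

// describeLocation({ countryCode: "CA", administrativeAreaName: "Alberta" }) => "Alberta, Canada"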

Nevertheless, I tweaked my old Google Map Geocoder to take advantage of these new capabilities. The point of this little gadget is to click a map and get back the location and its coordinates. In this era of GPS units and iPhones, this may be rather pointless, but it was fun to see what I could do in an hour or so.
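For the curious, the guts of the gadget are roughly the following. This is a sketch from memory using the v2 API names (GMap2, GEvent, GClientGeocoder), and the response handling assumes the Placemark structure I saw in my tests, so treat the field access as illustrative rather than gospel:

// Declared here only so the sketch stands alone as TypeScript; these globals
// are normally supplied by the Google Maps API v2 script tag.
declare const GMap2: any;
declare const GEvent: any;
declare const GClientGeocoder: any;
declare const GLatLng: any;

const map = new GMap2(document.getElementById("map"));
map.setCenter(new GLatLng(53.5, -113.5), 5);
const geocoder = new GClientGeocoder();

// Click the map, reverse-geocode the point, and show the place plus coordinates.
GEvent.addListener(map, "click", (overlay: any, latlng: any) => {
  if (!latlng) return;
  geocoder.getLocations(latlng, (response: any) => {
    const placemark = response && response.Placemark && response.Placemark[0];
    const address = placemark ? placemark.address : "unknown location";
    alert(address + " (" + latlng.lat().toFixed(5) + ", " + latlng.lng().toFixed(5) + ")");
  });
});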

4 comments:

rpg said...

Or one could use BioGeomancer (http://bg.berkeley.edu/latest), which also gives uncertainty measurements around those coordinates, access to many more underlying datasets for place names, and of course the ability to georeference a much wider variety of typical localities (e.g. those with offsets).

I only mention this because it is not just about readily accessible technologies like GPS units or new gadgets soon to appear (e.g. GPS cameras). We really need to think not just about single points on a map but about the "footprint" representing the probability of where an organism was likely seen, based both on the quality of the locality description and on the underlying place-name descriptors that you are pulling.
-r

David Shorthouse said...

rpg:

Agreed, there are plenty of considerations with point collections, not the least of which is just how true the lat/long coordinates are when pulled from Google Maps (or any other API that permits these sorts of "click to get coordinates"). I suppose the ultimate question is how these data are used or are intended to be used. If the goal is to merely make a point collection map for the purposes of something like a catalogue or a monograph, then these simple point data (without uncertainty) suffice. If however the goal is to use these data for ecological niche modelling, where uncertainty is well integrated into the algorithm(s), then yes, uncertainty should be recorded. What slows us down is the fact that this uncertainty tends to be collector/collection/specimen-specific. It is near impossible to batch process uncertainty. I wonder how many GBIF providers make use of the "CoordinateUncertaintyInMeters" element in the DarwinCore spatial extension. My guess is next to none. It's not for lack of understanding; it's for lack of human beings to key in uncertainty.
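To put it concretely, recording it per record would look something like this; aside from the Darwin Core element I mentioned, the field names and values are made up for illustration:

// Illustrative record only; CoordinateUncertaintyInMeters is the Darwin Core
// element mentioned above, the other field names are just placeholders.
interface SpecimenRecord {
  scientificName: string;
  latitude: number;
  longitude: number;
  CoordinateUncertaintyInMeters?: number; // optional, and in practice rarely filled in
}

const example: SpecimenRecord = {
  scientificName: "Araneus diadematus",
  latitude: 53.523,
  longitude: -113.526,
  CoordinateUncertaintyInMeters: 500, // someone still has to key this in per specimen
};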

rpg said...

David says: "It is near impossible to batch process uncertainty. I wonder how many GBIF providers make use of the "CoordinateUncertaintyInMeters" element in the DarwinCore spatial extension. My guess is next to none."

Interesting you should mention this. We have a GBIF seed grant this year to try to do just that. BioGeomancer was built with a batch-processing workbench for providers, and it also has an associated web service, so the idea behind the project is pretty simple. A data harvester collects records from institutional resources that have been approved for use by their administrators. Harvested records are evaluated to determine whether they contain sufficient information to warrant georeferencing. The records that meet these criteria are sent to the BioGeomancer web service, and the response for each record is checked again to determine whether the result is unambiguous in terms of location and has low uncertainty (under a 10 km radius error). These high-quality georeferenced records, along with information about the whole georeferencing process, are stored on a server, and a notification is sent to the originating institution that new records have been georeferenced and are available for collection and incorporation.
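In rough pseudo-code, the filtering step amounts to something like the following; the names and record shape are made up for illustration and are not the actual harvester or BioGeomancer service interfaces:

// Illustrative only: sketches the acceptance test described above.
interface GeoreferenceResult {
  candidateCount: number;                  // distinct interpretations returned
  uncertaintyRadiusMeters: number | null;  // radius of the error circle, if known
}

const MAX_UNCERTAINTY_METERS = 10000; // "under a 10 km radius error"

function isHighQuality(result: GeoreferenceResult): boolean {
  const unambiguous = result.candidateCount === 1;
  const lowUncertainty =
    result.uncertaintyRadiusMeters !== null &&
    result.uncertaintyRadiusMeters < MAX_UNCERTAINTY_METERS;
  return unambiguous && lowUncertainty;
}

// Records passing this test are stored and the originating institution is
// notified; the rest are set aside for manual review.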

I personally think such services will be important for convincing others of the value of becoming a data provider.

I am not quite sure I agree with this statement either, David: "If the goal is to merely make a point collection map for the purposes of something like a catalogue or a monograph, then these simple point data (without uncertainty) suffice."

Would you not report the standard deviation as well as the mean when measuring something across a population? It seems like the same problem to me. We know the data are dirtier than a single point, and we know that our products might be used by others in ways we cannot anticipate. We should attempt to represent uncertainty.

Having said that, I realize I am being hypocritical; I have published multiple papers with "point maps" rather than footprints. A lot of the reason we use points is that they are simple and quick. Hopefully the technology for showing more interesting representations is catching up with our needs.

David Shorthouse said...

"I personally think such services will be important for convincing other providers of the value of becoming a data provider."

You and me both! But would the service be post hoc, or would it be available at the time data are captured in local databases? The former requires the provider to go back to those records and clean them; the latter is a lot more seamless. I would love to integrate this into an online AJAXy data grid, so JSON with a callback function would be most excellent.
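Something along these lines is what I have in mind. The endpoint and parameter names below are entirely made up; only the callback mechanics (plain JSONP) are the point:

// Minimal JSONP sketch: inject a script tag whose response invokes a callback
// on the page. The URL and query parameters are placeholders, not a real API.
function georeferenceViaJsonp(locality: string, onResult: (result: any) => void): void {
  const callbackName = "georef_cb_" + Date.now();
  const script = document.createElement("script");
  (window as any)[callbackName] = (result: any) => {
    delete (window as any)[callbackName];
    script.remove();
    onResult(result); // hand the georeferenced result to the data grid
  };
  script.src = "https://example.org/georeference?q=" + encodeURIComponent(locality) +
    "&callback=" + callbackName;
  document.body.appendChild(script);
}

// georeferenceViaJsonp("5 mi N of Edmonton, Alberta", (r) => console.log(r));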

What does "footprint" map look like? Maybe a B&W point map where each point has a grey halo?