Comments on iSpiders: The Community is Dead
Feed updated: 2023-07-12T09:09:24.343-06:00

2010-02-05T10:22:00.384-07:00

Rod's comment "tools that provide tangible benefits to users" is spot on, and it's echoed by Vince's comment about the need to "deliver enough personal benefit to the individual contributors to justify their individual efforts within the community."

I think about these two topics quite a lot. The concept of "value" starts with the user of the product, not the developer of the product. Practicing "outside-in" development, as you suggest, should result in better outcomes for everyone involved. Thanks for the article.

Posted by bob (https://www.blogger.com/profile/03612307596763427172)

2009-08-05T12:33:51.659-06:00

I would not write off the community just yet. Community sites struggle because we have technically failed to deliver enough personal benefit to the individual contributors to justify their individual efforts within the community. The "community of one" Scratchpads and LifeDesks succeed because the single author receives all the credit. I even have some users of single-author Scratchpads who have removed the login from their front page because, the authors say, "others have the audacity to try and login and contribute"! But there are many Scratchpads that genuinely are community built. These usually have a more specific focus or goal that other contributors buy into (e.g. the society sites).
We are embarking on a sociological study of Scratchpad maintainers to understand more about the dynamics of these sites.

Posted by Anonymous

2009-07-03T21:31:07.815-06:00

Markus -

Indeed, PDFs almost never contain any metadata. It'll be a long time before publishers wake up to the many tools one can use to embed metadata into PDFs. In the interim, wouldn't the first mention of "doi:" on p.1 qualify as the DOI for the PDF? Seems the standard flag of honour for publishers is to splash the DOI near the top margin of the PDF on p.1.

As for lexical groups and beefy, "you do the thinking for me" services, I'm afraid I'm more for simplicity. I just want the names as written in the document with minimal massaging. Classifications, as I have come to appreciate them, are highly personal and of little value to outsiders.

Posted by David Shorthouse (https://www.blogger.com/profile/07902186433894266822)

2009-06-25T05:16:45.414-06:00

Spot on, David.
We are currently building a scientific name index for publications here at ECAT/GBIF. Using Lucene and the Apache Tika project (http://lucene.apache.org/tika/formats.html) we can index various formats while retrieving some embedded document metadata. Unfortunately, most PDFs I came across do not contain rich metadata like author, date, DOI, etc. as PDF metadata, so getting hold of the DOI is key. Finding the relevant DOI within the publication could be tricky (it may reference lots of others), but that's a challenge to try for sure.

Once we have that (prototypes are already running), services that expose name checklists in JSON, based on (sets of) user-selected publications, will be yours. We also try to make use of TaxonMatch to lexically group names if you want to remove some dirty names.

One of the tools you might be interested in is Taxon Tagger, developed by Mike Giddens during our last Nomina meeting. It uses a service like uBio's TaxonFinder or our Lucene indexer that marks up names in documents and then allows you to add or remove found names manually. The list of names can then be exported as a CSV file. There is an early version running here using TaxonFinder (make sure to use Firefox; Safari breaks):
http://names.gbif.org/ws/taxontagger/index.html

ECAT is driving Taxon Tagger's development forward to allow the tool to organize the found names in a tree hierarchy instead of a flat list. You can also mark leaf nodes in the trees as being synonyms.
And finally, you will be able to retrieve the resulting taxonomic tree as a simple flat Darwin Core file.

I am waiting for a new development machine and will update the above installation with new features, hopefully this week.

Posted by Anonymous (https://www.blogger.com/profile/02525336976753861766)

2009-06-20T05:36:04.749-06:00

Great post! It seems clear that we need tools that provide tangible benefits for users, without asking them to modify (much) what they already do.

I am continually baffled as to why major name databases are divorced from the scientific literature. Surely we want the names linked to their publication?

The service you describe would be pretty easy to build. I wonder whether there are parallels with Mendeley (http://www.mendeley.com), which features automatic extraction of metadata (including bibliographic references) from PDFs, and is aiming to be a Last.fm (http://last.fm) for research papers.

Posted by Roderic Page (https://www.blogger.com/profile/00269598293846172649)

2009-06-17T02:04:31.626-06:00

I was going through my reading list (http://www.google.com/reader/shared/user/05654081288920597306/label/biodiversity-informatics) and this post was immediately followed by this one (http://hublog.hubmed.org/archives/001865.html), which details an application that does automated entity recognition from scientific articles (albeit in XML format rather than PDFs). The coincidence was strange.

Posted by Anonymous
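
A minimal sketch of the heuristic David raises in this thread, taking the first "doi:" on page 1 as the PDF's own DOI, might look like the following in Python. It assumes the first page's text has already been extracted by some other tool (Markus mentions Apache Tika for that step); the function name `first_doi`, the regular expression, and the sample DOIs are all illustrative assumptions, not part of any tool discussed here.

```python
import re

# Hypothetical pattern: the literal "doi" label, an optional colon, then
# something shaped like a DOI (prefix "10.", 4-9 digits, "/", suffix).
DOI_PATTERN = re.compile(r'\bdoi:?\s*(10\.\d{4,9}/[^\s"<>]+)', re.IGNORECASE)


def first_doi(page_text):
    """Return the first labelled DOI in the given page text, or None."""
    match = DOI_PATTERN.search(page_text)
    if match is None:
        return None
    # Drop punctuation that commonly trails a DOI printed in running text.
    return match.group(1).rstrip(".,;)")


# Made-up first-page text; both DOIs are invented for illustration.
sample = "Journal of Examples 12: 1-20\ndoi:10.1234/abcd.5678.\nSee also doi:10.9999/other"
print(first_doi(sample))
```

Passing only the first page's text is what addresses Markus's concern that a paper may mention the DOIs of many other works: those usually sit in the reference list, not near the top margin of p.1. A publisher that prints the DOI as a URL, or without the "doi" label, would need a looser pattern.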