[referencer] Looking up a DOI (was: Re: [referencer] Re: Plugin for fetching data from Isi-WebOfScience)

Michael Banck mbanck at gmx.net
Tue Jul 8 09:14:42 EDT 2008


On Tue, Jul 08, 2008 at 04:47:07AM -0400, jcspray at icculus.org wrote:
> Quoting Michael Banck <mbanck at gmx.net>:
>> It would be awesome to have some sort of journal database which could
>> look up DOIs from the JournalName-Volume-Page triple the user could
>> input via a pop-up GUI similar to the "Add Reference with ID" query.
>> This database could then get expanded by users based on their scientific
>> field and most used journals.
>
> If you (as in, y'all, the collective you) make the database, I'll write 
> the code.  If you can arrange for it to be populated with about 50 top 
> physics and biology journals, I think it's worth shipping.  You're right 
> that the database needs to be user-expandable, but I think it also needs 
> a solid basis to start from.

OK.

> Note that the triple needs to resolve to something machine-parseable,  
> rather than human-readable HTML.  Many journals have a "download  
> citation" link or so associated with article pages which would be useful 
> for this.  XML preferred to bibtex since it tends to have fewer  
> idiosyncrasies.

While this would certainly be welcome, I think it would initially be
less work to just extract the DOI from the article's HTML page and then
do a metadata search on it using the available plugins (crossref/pubmed/
arxiv).

This reduces the problem to constructing the unique URL of the article,
and extracting the DOI from the HTML page, something users without any
python or XML-parsing knowledge could do for their journals.

Constructing a unique URL is not always possible (e.g. Science Direct
seems to use md5sum hashes as the URL for each article), but it seems to
work in a lot of cases.  For example, for the American Institute of
Physics (AIP) journals, the URL is as follows:

http://link.aip.org/link/?$JOURN/$VOLUME/$PAGE_OR_ARTICLE_ID/1

where $JOURN is a six-character alphanumeric ID of the journal (e.g.
JAPIAU for J. Appl. Phys. or JCPSA6 for J. Chem. Phys.).
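
As a rough sketch of what one entry in such a journal database could
look like in Python: the table and the function name below are made up
for illustration, and only the two journal codes and the URL layout come
from the description above.

```python
# Base of the AIP link resolver, as given in the URL pattern above.
AIP_LINK_BASE = "http://link.aip.org/link/"

# Hypothetical lookup table; only these two codes are known from the text.
JOURNAL_CODES = {
    "J. Appl. Phys.": "JAPIAU",
    "J. Chem. Phys.": "JCPSA6",
}


def aip_article_url(journal, volume, page):
    """Build the article URL from a JournalName-Volume-Page triple."""
    code = JOURNAL_CODES[journal]  # raises KeyError for unknown journals
    return f"{AIP_LINK_BASE}?{code}/{volume}/{page}/1"
```

Users could then extend the table with the codes of the journals they
use most.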

So to get the DOI for the article from page 4965 of volume 93 from J.
Chem. Phys., you can do

wget -O - 'http://link.aip.org/link/?JCPSA6/93/4965/1' 2> /dev/null | \
grep -i doi | head -1 | sed 's/.*DOI.//'

and then look up the metadata using that DOI.
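
The same step could be sketched in Python roughly as follows.  Note that
this is not a translation of the grep/sed pipeline: instead of cutting
the first line that mentions "DOI", it searches the page for the first
string that looks like a DOI ("10.", a registrant number, a slash and a
suffix).  The function names and the pattern are my own assumptions, and
the pattern will not cover every DOI ever issued.

```python
import re
from urllib.request import urlopen

# Assumed DOI shape: "10.", 4-9 digits, "/", then a suffix that runs
# until whitespace, a quote or an angle bracket.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"\'<>]+')


def extract_doi(html):
    """Return the first DOI-like string in the page text, or None."""
    match = DOI_PATTERN.search(html)
    if match is None:
        return None
    # Drop punctuation that commonly trails a DOI in running text.
    return match.group(0).rstrip(".,;)")


def fetch_doi(url, timeout=10):
    """Download an article page and extract a DOI from it."""
    with urlopen(url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_doi(html)
```

For example, extract_doi('<p>DOI: 10.1234/example.5678</p>') returns
'10.1234/example.5678' (a made-up identifier, not a real article).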


Michael

PS: Once you have a good URL for a particular article, you can try to
download the PDF directly with a single wget/python command, provided
you have access to it through your institution.  This works in fewer
cases than what I describe above, but I managed to do so for a couple of
journals a while ago (though I cannot find my notes about them right
now).  This could perhaps be merged into the "File:" field of the
Document Properties window by adding a "Download" option to it somehow,
for when no file has been assigned yet.


