[referencer] Looking up a DOI (was: Re: [referencer] Re: Plugin for fetching data from Isi-WebOfScience)
jcspray at icculus.org
Tue Jul 8 09:35:44 EDT 2008
Quoting Michael Banck <mbanck at gmx.net>:
>> Note that the triple needs to resolve to something machine-parseable,
>> rather than human-readable HTML. Many journals have a "download
>> citation" link or so associated with article pages which would be useful
>> for this. XML preferred to bibtex since it tends to have fewer
>> idiosyncrasies.
>
> While this would be certainly welcome, I think initially it would be
> less work to just extract the DOI from the article's HTML page, and do a
> metadata search on it using the available plugins (crossref/pubmed/
> arxiv).
Yes, but my point about it being machine-parseable stands: regexing
the DOI out of a webpage is not necessarily trivial, especially in
pages including lists of citations and their DOIs. But yes,
downloading the metadata from elsewhere once a DOI is found is
perfectly acceptable.
> This reduces the problem to constructing the unique URL of the article,
> and extracting the DOI from the HTML page, something users without any
> python or XML-parsing knowledge could do for their journals.
>
> Constructing a unique URL is not always possible (e.g. Science Direct
> seems to use md5sum hashes for each article as URL), but seems to work
> for a lot of cases. For example, for the American Institute of Physics
> (AIP) journals, the URL is as follows:
>
> http://link.aip.org/link/?$JOURN/$VOLUME/$PAGE_OR_ARTICLE_ID/1
>
> where $JOURN is a 6-char/digit ID of the journal (e.g. JAPIAU for J.
> Appl. Phys. or JCPSA6 for J. Chem. Phys.).
Alright, that's 2/50. Here's a sketch of how I see that information:
-> User selects a journal
-> Journal maps to a lookup function, in this case AIP
-> User selects remaining fields required by lookup function, in this
case volume and page.
-> Lookup function is invoked with journal key, volume and page,
translates this to a URI, downloads it, and applies its regex to it to
extract a DOI.
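
The steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the JOURNALS and LOOKUPS tables, the function names, and the injectable fetch argument are all hypothetical, and the regex is applied in multi-line mode so that "$" matches at the end of each line, which is a guess at the intended behaviour.

```python
import re
import urllib.request

# Hypothetical in-memory tables; names and layout are illustrative only.
JOURNALS = {
    "J. Appl. Phys.": ("JAPIAU", "AIP"),
    "J. Chem. Phys.": ("JCPSA6", "AIP"),
}
LOOKUPS = {
    "AIP": {
        "uri": "http://link.aip.org/link/?%0/%1/%2/1",
        "fields": ["Volume", "Page"],
        "regex": r"(DOI.*)$",
    },
}


def build_uri(template, values):
    """Replace %0, %1, ... in the template with the matching value."""
    return re.sub(r"%(\d+)", lambda m: str(values[int(m.group(1))]), template)


def extract_doi(pattern, text):
    """Return the first capture group of the first match, or None."""
    # Assumption: MULTILINE, so "$" matches at the end of every line.
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def download(uri):
    """Fetch a page over HTTP and decode it leniently."""
    with urllib.request.urlopen(uri) as response:
        return response.read().decode("utf-8", "replace")


def lookup_doi(journal, field_values, fetch=download):
    """Journal name plus user-supplied fields -> URI -> page -> DOI."""
    key, lookup_name = JOURNALS[journal]
    lookup = LOOKUPS[lookup_name]
    # %0 is always the journal key; the user-supplied fields follow it.
    uri = build_uri(lookup["uri"], [key] + list(field_values))
    return extract_doi(lookup["regex"], fetch(uri))
```

Passing fetch in as an argument means the URI construction and regex step can be tried against a canned page without touching the network.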
Here's the set of information I think is needed. Anything missing?
<journal>
<name>J. Appl. Phys.</name>
<alias>Journal of Applied Physics</alias>
<key>JAPIAU</key>
<lookup>AIP</lookup>
</journal>
<journal>
<name>J. Chem. Phys.</name>
<alias>Journal of Chemical Physics</alias>
<key>JCPSA6</key>
<lookup>AIP</lookup>
</journal>
<journal_lookup>
<name>AIP</name>
<!-- %0 is always the journal key -->
<uri>http://link.aip.org/link/?%0/%1/%2/1</uri>
<fields>
<field name="Volume" id="1"/>
<field name="Page" id="2"/>
</fields>
<regex>(DOI.*)$</regex>
</journal_lookup>
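
For what it's worth, a description like this could be loaded with Python's standard xml.etree.ElementTree along the following lines. This is a sketch, not a proposed implementation: the wrapping <lookups> root element is an assumption (the fragments need a single root to parse as one document), the field ids are assumed to line up with the %N placeholders in the URI (Volume is 1, Page is 2), and load_config and the dictionary layout are made-up names.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragments are wrapped in a single <lookups> root element.
CONFIG = """
<lookups>
  <journal>
    <name>J. Appl. Phys.</name>
    <alias>Journal of Applied Physics</alias>
    <key>JAPIAU</key>
    <lookup>AIP</lookup>
  </journal>
  <journal_lookup>
    <name>AIP</name>
    <uri>http://link.aip.org/link/?%0/%1/%2/1</uri>
    <fields>
      <field name="Volume" id="1"/>
      <field name="Page" id="2"/>
    </fields>
    <regex>(DOI.*)$</regex>
  </journal_lookup>
</lookups>
"""


def load_config(xml_text):
    """Parse the config into (journals, lookups) dictionaries."""
    root = ET.fromstring(xml_text.strip())

    journals = {}
    for journal in root.findall("journal"):
        entry = {
            "key": journal.findtext("key"),
            "lookup": journal.findtext("lookup"),
        }
        # Index by both the abbreviated name and every alias.
        journals[journal.findtext("name")] = entry
        for alias in journal.findall("alias"):
            journals[alias.text] = entry

    lookups = {}
    for lookup in root.findall("journal_lookup"):
        # Map placeholder number -> label shown to the user.
        fields = {
            int(field.get("id")): field.get("name")
            for field in lookup.findall("fields/field")
        }
        lookups[lookup.findtext("name")] = {
            "uri": lookup.findtext("uri"),
            "regex": lookup.findtext("regex"),
            "fields": fields,
        }
    return journals, lookups


journals, lookups = load_config(CONFIG)
```

Indexing journals by alias as well as by name lets the user pick either form and still reach the same key and lookup.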