getting metadata from pubmed and some comments

Aurélien Naldi aurelien.naldi at gmail.com
Tue Nov 6 09:59:01 EST 2007


Hi,

I'm a computer scientist by training, now working in bioinformatics,
and as such I deal with tons of biology papers. I have found
referencer to be a great tool, and the metadata fetching through
crossref is nice, but I never got more than the family name of the
first author in the author field. Pubmed has much more complete
metadata for the papers I am currently dealing with, so I would like
to know whether adding pubmed support to referencer is possible.

I have just looked at how to get metadata through pubmed; here is a
quick introduction:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=20&retstart=0&term=<!your_search_term!>

Requesting this URL returns the list of matches in this form:


<eSearchResult>
	<Count>1</Count>
	<RetMax>1</RetMax>
	<RetStart>0</RetStart>
	<IdList>
		<Id>17581588</Id>
	</IdList>
	<TranslationSet>
	</TranslationSet>
	<TranslationStack>
		<TermSet>
			<Term>10.1038/nature05970[All Fields]</Term>
			<Field>All Fields</Field>
			<Count>1</Count>
			<Explode>Y</Explode>
		</TermSet>
		<OP>GROUP</OP>
	</TranslationStack>
	<QueryTranslation>10.1038/nature05970[All Fields]</QueryTranslation>
</eSearchResult>


The important part is the IdList, which gives the PMIDs matching the
search. To get more data on a particular entry, use this URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=citation&id=<!PMID!>
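
The two steps above can be sketched in Python as follows. This is only
a minimal illustration, not referencer code: the function names are
made up for the example, the sample is a trimmed copy of the result
shown earlier, and nothing is actually downloaded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL copied from the efetch example above.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def extract_pmids(esearch_xml):
    """Return the PMIDs listed under <IdList> in an eSearchResult document."""
    root = ET.fromstring(esearch_xml)
    return [id_node.text.strip() for id_node in root.findall("./IdList/Id")]


def efetch_url(pmid):
    """Build the efetch URL for one PMID with the parameters quoted above."""
    query = urlencode({
        "db": "pubmed",
        "retmode": "xml",
        "rettype": "citation",
        "id": pmid,
    })
    return EFETCH_URL + "?" + query


# Trimmed copy of the eSearchResult shown earlier in this mail.
sample = """<eSearchResult>
  <Count>1</Count>
  <IdList>
    <Id>17581588</Id>
  </IdList>
</eSearchResult>"""

pmids = extract_pmids(sample)
print(pmids)
print(efetch_url(pmids[0]))
```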

The result is a huge XML file with a real list of authors, the
abstract, and much more.
Some documentation (which I have not really read yet) is available at
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=coursework.section.brief
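
As a rough sketch of how the author list could be pulled out of the
efetch result: the element names used here (Author, LastName, ForeName)
are what I understand the pubmed XML to contain, so treat them as an
assumption to check against a real response. The sample document is a
made-up placeholder, not real pubmed output, and extract_authors is a
hypothetical helper.

```python
import xml.etree.ElementTree as ET


def extract_authors(efetch_xml):
    """Return (family name, given name) pairs for every <Author> element.

    Assumes each author carries <LastName> and <ForeName> children;
    authors with neither (e.g. group names) are skipped.
    """
    root = ET.fromstring(efetch_xml)
    authors = []
    for author in root.iter("Author"):
        family = author.findtext("LastName", default="")
        given = author.findtext("ForeName", default="")
        if family or given:
            authors.append((family, given))
    return authors


# Placeholder document with invented names, shaped like the assumed layout.
sample = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

for family, given in extract_authors(sample):
    print(family + ", " + given)
```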

I have not seen an explicit search-by-DOI option, but searching for a
DOI does work (without the "doi:" prefix).
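
A small sketch of building the search URL from a DOI, dropping the
"doi:" prefix if there is one. The function name is made up for the
example, the parameters are the ones from the esearch URL above, and
the DOI is the one from the sample result.

```python
from urllib.parse import urlencode

# Base URL copied from the esearch example above.
ESEARCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def doi_search_url(doi, retmax=20):
    """Build an esearch URL that uses a bare DOI as the search term."""
    term = doi.strip()
    # The search works without the "doi:" prefix, so remove it if present.
    if term.lower().startswith("doi:"):
        term = term[4:].strip()
    query = urlencode({
        "db": "pubmed",
        "retmax": retmax,
        "retstart": 0,
        "term": term,
    })
    return ESEARCH_URL + "?" + query


print(doi_search_url("doi:10.1038/nature05970"))
```

Note that urlencode percent-encodes the "/" in the DOI; I would expect
the server to decode it, but I have not verified that against pubmed.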

AFAIK, PDFs unfortunately don't include a PMID, but this gives much
better results than crossref for biology papers...

While I am at it, I have some (naive) questions about your XML format:
* Is it referencer-specific?
* What are its advantages over bibtex XML (or other similar formats)?
It seems to deal better with "tags" (bibtexxml has keywords) and
pdffilenames (bibtexxml has only a relative path, when exported with
jabref), and it adds the "manage_target" thing, which I do not use
(yet). Is there anything else?
* I see only one "authors" field, which is way too "bibtex like" for
my taste. A clean separation of authors, with the family name and
given name split, would look nice to me.
Is it possible to extend the format to deal with this?
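
To illustrate the last point, here is a minimal sketch of turning a
bibtex-like authors string into separate family/given names. It assumes
names are joined with " and " and written either "Family, Given" or
"Given Family"; whether referencer stores the field that way is a guess
on my side. The names and the split_authors helper are placeholders,
and braces or particles such as "von" are not handled.

```python
import re


def split_authors(field):
    """Split a bibtex-like authors string into family/given name dicts."""
    authors = []
    for name in re.split(r"\s+and\s+", field.strip()):
        if not name:
            continue
        if "," in name:
            # "Family, Given" form.
            family, given = [part.strip() for part in name.split(",", 1)]
        else:
            # "Given Family" form: take the last word as the family name.
            parts = name.split()
            family, given = parts[-1], " ".join(parts[:-1])
        authors.append({"family": family, "given": given})
    return authors


for author in split_authors("Doe, Jane and Richard Roe"):
    print(author["family"], "|", author["given"])
```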

And a final comment: some of my PDF files did not contain a DOI entry,
and when adding a whole directory I got one error dialog for each of
them. It would be much more useful to remember the list of problematic
files and show it at the end of the process. Giving them a "this thing
needs work" tag could also be nice. What do you think?
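
What I have in mind looks roughly like the sketch below. It is not
referencer's actual import code: extract_doi stands for whatever
routine reads the DOI out of a PDF, and the file names, the DOI value
and the "needs-work" tag are placeholders.

```python
def import_files(paths, extract_doi):
    """Import every file, collecting the ones without a DOI instead of
    reporting each one as it is met."""
    imported, needs_work = [], []
    for path in paths:
        doi = extract_doi(path)
        if doi:
            imported.append((path, doi))
        else:
            needs_work.append(path)
    return imported, needs_work


def summary(needs_work):
    """Build the single message shown once the whole directory is done."""
    if not needs_work:
        return "All files imported with a DOI."
    lines = ["%d file(s) had no DOI and were tagged 'needs-work':" % len(needs_work)]
    lines.extend("  " + path for path in needs_work)
    return "\n".join(lines)


# Stand-in for the real DOI extraction: a lookup table with placeholder data.
known = {"a.pdf": "10.1000/placeholder"}
imported, needs_work = import_files(["a.pdf", "b.pdf", "c.pdf"], known.get)
print(summary(needs_work))
```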

Thanks for your work on this nice tool!

Best regards.

-- 
Aurélien Naldi

