[referencer] getting metadata from pubmed and some comments

John Spray jcspray at icculus.org
Tue Nov 6 15:17:29 EST 2007


On Tue, 2007-11-06 at 15:59 +0100, Aurélien Naldi wrote:
> I'm a computer scientist by training, now working in bioinformatics.
> As such I am dealing with tons of biology papers. I have found
> referencer to be a great tool, the metadata fetching through crossref
> is nice, but I never got more than the family name of the first
> author in the author field. Pubmed has much more complete metadata for
> the papers I am currently dealing with, so I would like to know whether
> adding support for pubmed to referencer is possible.

More metadata sources would be a good thing, and are a key future
feature requirement.  Crossref deliberately cripple their publicly
accessible OpenURL interface to provide only the author's last name, so
any future metadata code will move away from this.

> 
> I have just looked at how to get metadata through pubmed, here is a
> quick introduction:

That's very helpful, I will refer to it if/when I'm experimenting with
pubmed support.

> While I am at it, I have some (naive) questions about your XML format:
> * Is it referencer-specific?

Yes, I made it up off the top of my head.

> * What are its advantages over bibtex XML (or other similar stuff)?
> It seems to deal better with "tags" (bibtexxml has keywords) and
> pdffilenames (bibtexxml has only a relative path, when exported with
> jabref) and to add the "manage_target" thing, which I do not use (yet).
> Is there anything else?

I don't know bibtex xml.  Referencer's format isn't intended to be
bibtex-specific, so a pure xml representation of bibtex wouldn't be
suitable.

> * I see only one "authors" field, which is way too "bibtex like" for
> my taste. Having a clean separation of authors and being able to split
> family name and given name looks nice to me.
> Is it possible to extend the format to deal with this?

Yes, it would be.  The Library of Congress MODS format implements this
for example.  My main issue with this is the UI: does one then have
separate first name/last name/initials fields?  It could get pretty
cluttered.  I'm certainly open to suggestions in this area, since it's a
key point where bibtex-isms (curly braces {}) are necessarily exposed to
the user at present.
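
Leaving the UI question aside, the data side of such a split could be
sketched roughly as below. This is only an illustration, not Referencer
code: the names Author, split_top_level and parse_authors are made up
here, and the parsing is deliberately simplified (it splits on " and "
outside curly braces and understands "Family, Given" and "Given Family",
but ignores "von" particles and "Jr" parts).

```python
from typing import List, NamedTuple


class Author(NamedTuple):
    """Hypothetical structured author record."""
    family: str
    given: str


def split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators inside curly braces."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif depth == 0 and text.startswith(sep, i):
            parts.append(text[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return [p.strip() for p in parts]


def parse_authors(field: str) -> List[Author]:
    """Turn a bibtex-style author field into a list of Author records."""
    authors = []
    for name in split_top_level(field, " and "):
        if not name:
            continue
        pieces = split_top_level(name, ",")
        if len(pieces) >= 2:
            # "Family, Given" form.
            family, given = pieces[0], pieces[-1]
        else:
            # "Given Family" form: treat the last top-level word as the family name.
            words = [w for w in split_top_level(name, " ") if w]
            family, given = words[-1], " ".join(words[:-1])
        # Simplification: drop protective braces around the family name.
        authors.append(Author(family.strip("{}"), given))
    return authors


if __name__ == "__main__":
    for author in parse_authors("Doe, Jane and John Smith and {Barnes and Noble}"):
        print(author)
```

With something like this the braces only need to appear in the stored
bibtex string, and the UI could show one row per author instead of a
single free-text field.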

> And a final comment: some of my pdf files did not contain a doi entry,
> and when adding a whole directory, I got one error dialog for each of
> them. It would be much more useful to remember the list of problematic
> files and to show the list at the end of the process. Giving them a
> "this thing needs work" tag could be nice also, what do you think about
> this?

Fair point.  The "this thing needs work" tag could be an option on the
"here are the files that had problems" dialog.  (There isn't an error
dialog for simply not finding a DOI code, so I guess you're talking
about the error when a DOI cannot be resolved to metadata by crossref.)
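
A rough sketch of the "collect the failures, report once at the end"
idea, for illustration only. Nothing here is existing Referencer code:
import_batch, summarize_failures and the fetch_metadata callback are
hypothetical names, and documents are plain dicts to keep it short.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def import_batch(
    paths: Iterable[str],
    fetch_metadata: Callable[[str], dict],
    needs_work_tag: Optional[str] = None,
) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Import every file, collecting failures instead of stopping at each one."""
    documents = []
    failures = []
    for path in paths:
        doc = {"file": path, "tags": [], "metadata": {}}
        try:
            doc["metadata"] = fetch_metadata(path)
        except Exception as exc:
            # Remember the problem for the final report rather than
            # interrupting the batch with a dialog.
            failures.append((path, str(exc)))
            if needs_work_tag is not None:
                doc["tags"].append(needs_work_tag)
        documents.append(doc)
    return documents, failures


def summarize_failures(failures: List[Tuple[str, str]]) -> str:
    """Build the text for a single end-of-import report."""
    if not failures:
        return ""
    lines = ["%d file(s) had problems:" % len(failures)]
    lines += ["  %s: %s" % (path, reason) for path, reason in failures]
    return "\n".join(lines)
```

The file is still added when the lookup fails, so the optional tag gives
the user a way to find those entries later and fill in the metadata by
hand.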

Regards,
John




