[referencer] getting metadata from pubmed and some comments

Aurélien Naldi aurelien.naldi at gmail.com
Tue Nov 6 15:35:36 EST 2007


On Tuesday 06 November 2007 at 20:17 +0000, John Spray wrote:
> On Tue, 2007-11-06 at 15:59 +0100, Aurélien Naldi wrote:
> > I'm a computer scientist by training, now working in bioinformatics.
> > As such I am dealing with tons of biology papers. I have found
> > referencer to be a great tool, the metadata fetching through crossref
> > is nice, but I never got more than the family name of the first
> > author in the author field. Pubmed has much more complete metadata for
> > the papers I am currently dealing with, so I would like to know if
> > adding support for pubmed into referencer is possible.
> 
> More metadata sources would be a good thing, and are a key future
> feature requirement.  Crossref deliberately cripple their publicly
> accessible OpenURL interface to provide only the author's last name, so
> any future metadata code will move away from this.
> 
> > 
> > I have just looked at how to get metadata through pubmed, here is a
> > quick introduction:
> 
> That's very helpful, I will refer to it if/when I'm experimenting with
> pubmed support.

Glad to read this.
I think I can help with this...
Does referencer have (or plan to have) a plugin system or something
similar to add metadata fetchers easily ?

> 
> > While I am at it, I have some (naive) questions about your XML format:
> > * Is it referencer-specific ?
> 
> Yes, I made it up off the top of my head.
> 
> > * What are its advantages over bibtex XML (or other similar stuff) ?
> > It seems to deal better with "tags" (bibtexxml has keywords) and
> > pdffilenames (bibtexxml  has only a relative path, when exported with
> > jabref) and to add the "manage_target" thing, that I do not use (yet).
> > Is there anything else ?
> 
> I don't know bibtex xml.  Referencer's format isn't intended to be
> bibtex-specific, so a pure xml representation of bibtex wouldn't be
> suitable.
> 
> > * I see only one "authors" field, which is way too "bibtex like" for
> > my taste. Having a clean separation of authors and being able to split
> > family name and given name looks nice to me.
> > Is it possible to extend the format to deal with this ?
> 
> Yes, it would be.  The Library of Congress MODS format implements this
> for example.  My main issue with this is the UI: does one then have
> separate first name/last name/initials fields?  It could get pretty
> cluttered.  I'm certainly open to suggestions in this area, since it's a
> key point where bibtex-isms (curly braces {}) are necessarily exposed to
> the user at present.

I also do not think that "bibtex, but in XML" is the best way to go,
and I definitely do not want to have to deal with this the bibtex
way... The UI is not trivial, but maybe not that important if the
metadata fetchers are good enough ;)
I'm not sure about the "initials" field, can't it be deduced from the
"given name" one ?
One annoying thing about having separate fields is copy/pasting
the whole list of authors. Maybe keeping one large field would be
convenient for this use case ?
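
To make the idea concrete, here is a rough Python sketch of how one large
pasted field could still be split into family/given names, with the
initials deduced rather than stored. This is only an illustration, not
referencer code: the Author class, parse_authors() and the rule "last word
is the family name" are all my own assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Author:
    # Hypothetical record type; not part of referencer's XML format.
    family: str
    given: str

    @property
    def initials(self) -> str:
        # Deduce initials from the given name; each part of a
        # hyphenated or multi-word given name contributes one letter.
        parts = self.given.replace("-", " ").split()
        return "".join(p[0].upper() + "." for p in parts)


def parse_authors(field: str) -> List[Author]:
    """Split a bibtex-style 'Family, Given and Family, Given' string."""
    authors = []
    for chunk in field.split(" and "):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "," in chunk:
            family, given = (s.strip() for s in chunk.split(",", 1))
        else:
            # 'Given Family' form: assume the last word is the family
            # name, which is wrong for multi-word family names.
            *given_parts, family = chunk.split()
            given = " ".join(given_parts)
        authors.append(Author(family, given))
    return authors
```

With something like this, the UI could keep a single text field for
pasting while the stored data stays cleanly separated.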

> 
> > And a final comment: some of my pdf files did not contain a doi entry,
> > when adding a whole directory, I got one error dialog for each of
> > them. It would be much more useful to remember the list of problematic
> > files and to show the list at the end of the process. Giving them a
> > "this thing needs work" tag could be nice also, what do you think about
> > this ?
> 
> Fair point.  The "this thing needs work tag" could be an option on the
> "here are the files that had problems" dialog.  (There isn't an error
> dialog for simply not finding a DOI code, so I guess you're talking
> about the error when a DOI cannot be resolved to metadata by crossref)

Oh, yes, this was before I realized I had to add username:password to
the crossref URL. I also had this with some pdfs where the doi is
split across two lines (so referencer only found the first half of it).
But I do think that putting a special tag on files without doi/metadata
is good.
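
For the split doi case, a possible heuristic (a sketch only, I have not
looked at how referencer actually scans the pdf text) is to also try the
doi glued to the first word of the next line whenever the match ends at a
line break. The regular expression and the doi_candidates() name are my
own assumptions:

```python
import re
from typing import List

# Loose DOI pattern: "10.", a registrant code, "/", then non-space chars.
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")


def doi_candidates(text: str) -> List[str]:
    """Return DOI candidates, the joined (longer) guess first."""
    lines = text.splitlines()
    candidates = []
    for i, line in enumerate(lines):
        for m in DOI_RE.finditer(line):
            doi = m.group(0)
            # A match that runs up to the end of the line may have been
            # cut by a line break, so also offer a joined version.
            at_line_end = m.end() == len(line.rstrip())
            if at_line_end and i + 1 < len(lines):
                next_words = lines[i + 1].split()
                if next_words:
                    candidates.append(doi + next_words[0])
            candidates.append(doi)
    return candidates
```

Each candidate could then be tried against the resolver in order, keeping
the first one that returns metadata.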

Best regards

-- 
Aurélien Naldi <aurelien.naldi at gmail.com>



