[referencer] referencer 1.1-pre

jcspray at icculus.org
Mon Jan 14 06:09:16 EST 2008


Quoting Aurélien Naldi <aurelien.naldi at gmail.com>:
>> There's a newline in the middle of the DOI in the paper.  The DOI regex is
>> picking up another DOI later on which has the 'junk' on it.  Regexing out
>> DOIs is always going to be a bit hit and miss.
>
> Yes, this is (and will remain) a tricky thing, but the first DOI appears
> nicely in the output of pdftotext. This is not a regression anyway. How
> does referencer extract the text? Is it using some other external tool
> or doing the job by itself?

libpoppler

> About the DOI detection, one thing freaks me out (even if I have not
> seen it happen yet): a PDF could contain the DOI of some other document
> as a way to cite it. Has referencer already picked the wrong DOI in such
> a case for someone?

Of course.
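
To illustrate why this is hit and miss, here is a rough Python sketch.
The pattern and the find_dois helper are hypothetical and made up for
this example; they are not what referencer actually uses. The sample
text has a newline inside the paper's own DOI, so the regex skips it and
returns the DOI of a cited document instead.

```python
import re

# Hypothetical pattern, NOT referencer's actual regex:
# "10.", a 4-9 digit registrant code, "/", then a run of
# characters that are not whitespace, quotes or angle brackets.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return every DOI-like string in text, in order of appearance.

    Trailing sentence punctuation is stripped, since a DOI at the
    end of a sentence is usually followed by a full stop.
    """
    return [m.group(0).rstrip('.,;)') for m in DOI_RE.finditer(text)]


# Extracted text where the paper's own DOI is broken by a newline,
# followed by a reference to some other document's DOI.
sample = "doi:10.\n1000/abc123\nSee also doi:10.2000/xyz789."

# The broken DOI no longer matches, so the only hit is the cited one.
print(find_dois(sample))
```

Taking the first match is a reasonable default when the text is clean,
but as the sample shows it can silently return somebody else's DOI.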

John


