[referencer] referencer 1.1-pre
Aurélien Naldi
aurelien.naldi at gmail.com
Mon Jan 14 05:55:13 EST 2008
On Mon, 2008-01-14 at 05:34 -0500, jcspray at icculus.org wrote:
> On Mon, 14 Jan 2008, Aurélien Naldi wrote:
> > It also allowed me to realize that it was unable to extract the text
> > from my PostScript papers.
>
> It never did that.
As the papers don't contain a DOI, I was unsure about it before...
>
> > I also had a problem with the DOI detection for this paper:
> > http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2$
> > It found something but added some junk at the end, which I don't see in
> > the output of pdftotext.
>
> There's a newline in the middle of the DOI in the paper. The DOI regex is
> picking up another DOI later on which has the 'junk' on it. Regexing out
> DOIs is always going to be a bit hit and miss.
Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text? Is it using some other external tool
or doing the job by itself?
About the DOI detection, one thing worries me (even if I have not
seen it happen yet): a PDF could contain the DOI of some other document
as a way to cite it. Has referencer ever picked the wrong DOI in such a
case for anyone?
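
To make the two failure modes above concrete (a DOI broken by a newline, and a cited DOI being found instead of the paper's own), here is a minimal Python sketch. It is not referencer's actual code: the pattern `DOI_RE`, the function `find_dois` and the sample text are all hypothetical, and the regex is deliberately loose.

```python
import re

# Hypothetical, deliberately loose DOI pattern (an assumption, not
# referencer's regex): "10.", a 4-9 digit prefix, "/", then any run of
# non-whitespace characters.
DOI_RE = re.compile(r'10\.\d{4,9}/\S+')

def find_dois(text):
    """Return every DOI-like match in text, in order of appearance."""
    # Drop trailing punctuation that often follows a DOI in running text.
    return [m.group(0).rstrip('.,;)') for m in DOI_RE.finditer(text)]

# Made-up extracted text: the paper's own DOI is split by a newline,
# while a DOI cited further down sits on a single line.
page = (
    "doi:10.1234/journal.\n"
    "example.0001\n"
    "See also doi:10.5678/other.paper.42.\n"
)

print(find_dois(page))
# ['10.1234/journal', '10.5678/other.paper.42']
```

The first match is truncated at the line break and the second belongs to a different document, so neither "take the first match" nor "take the longest match" is reliable on its own.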
<dream>
Let's pray for a DOI in the PDF metadata, with a clean, consistent and
secure way to read it...
</dream>
> > One other glitch, now that I have fully tested the PDF production with
> > referencer-inserted citations in LyX: accented characters are not
> > protected in the BibTeX keys (they are in the other fields). Does BibTeX
> > support special characters in keys at all? I'm not a LyX/LaTeX guru
> > (and I would LOVE to avoid becoming one) so I might just be doing
> > something wrong here...
>
> The principle of least surprise is in action here. Most people I know
> would write Gruber06 rather than Gr\"uber06. However, there is no general
> way to map accented Latin characters into English characters, so
> referencer leaves them alone. Converting non-ASCII characters into their
> LaTeX equivalent wouldn't be appropriate for key names, since they're
> never typeset. I know there are things like ss for ß and ae for æ, oe
> for ø, but my knowledge is pretty special-case for that. I wonder if
> there's an ISO standard?
I am fine with avoiding special characters in keys; I just wanted to
test whether it actually worked, but I don't know much about the subject...
Maybe referencer could map "some" non-ASCII characters. Given the huge
amount of software doing accent-insensitive search and the like, and how
well it works for me, an incomplete-yet-useful mapping list must exist
somewhere (I really don't care right now, though).
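
As a rough illustration of what such an incomplete-yet-useful mapping could look like, here is a small Python sketch. It is an assumption for discussion, not something referencer does: the `SPECIAL` table only holds the three cases mentioned earlier in this thread (ss, ae, oe), the function name `ascii_key` is made up, and everything else relies on Unicode decomposition from the standard `unicodedata` module.

```python
import unicodedata

# Hypothetical special cases, taken from the examples quoted above.
# These letters have no Unicode decomposition, so they need a table.
SPECIAL = {'ß': 'ss', 'æ': 'ae', 'ø': 'oe'}

def ascii_key(key):
    """Best-effort ASCII version of a BibTeX key (incomplete by design)."""
    out = []
    for ch in key:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD splits e.g. "ü" into "u" plus a combining diaeresis;
        # the combining marks are then dropped.
        decomposed = unicodedata.normalize('NFKD', ch)
        out.append(''.join(c for c in decomposed
                           if not unicodedata.combining(c)))
    # Anything still outside ASCII is silently discarded.
    return ''.join(out).encode('ascii', 'ignore').decode('ascii')

print(ascii_key('Grüber06'))   # Gruber06
print(ascii_key('Strauß99'))   # Strauss99
print(ascii_key('Søren07'))    # Soeren07
```

Dropping unmapped characters can make two different keys collide, so a real implementation would still need to check the resulting keys for uniqueness.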
--
Aurelien Naldi