[referencer] referencer 1.1-pre

Aurélien Naldi aurelien.naldi at gmail.com
Mon Jan 14 05:55:13 EST 2008


On Mon, 2008-01-14 at 05:34 -0500, jcspray at icculus.org wrote:
> On Mon, 14 Jan 2008, Aurélien Naldi wrote:
> > It also allowed me to realize that it was unable to extract the text
> > from my PostScript papers
> 
> It never did that.

As the papers don't contain a DOI, I was unsure about it before...

> 
> > I also had a problem with the DOI detection for this paper:
> > http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2$
> > It found something but added some junk at the end, which I don't see in
> > the output of pdftotext.
> 
> There's a newline in the middle of the DOI in the paper.  The DOI regex is
> picking up another DOI later on which has the 'junk' on it.  Regexing out
> DOIs is always going to be a bit hit and miss.

Yes, this is (and will remain) a tricky thing, but the first DOI appears
nicely in the output of pdftotext. This is not a regression anyway. How
does referencer extract the text? Is it using some other external tool
or doing the job by itself?

About the DOI detection, one thing freaks me out (even if I have not
seen it happen yet): a PDF could contain the DOI of some other document
as a way to cite it. Has referencer already picked the wrong DOI in such
a case for someone?
<dream>
Let's pray for a DOI in the PDF metadata, with a clean, consistent and
secure way to read it...
</dream>
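
To make the line-break problem concrete, here is a rough sketch of
regex-based DOI matching. It is not referencer's actual code: the
pattern, the find_dois helper and the sample text (including both DOIs)
are assumptions made up for illustration. It only shows that a DOI split
by a newline gets cut short, while a later DOI on a single line matches
in full.

```python
import re

# Rough DOI pattern (an assumption, not referencer's real regex):
# "10.", a 4-9 digit registrant code, "/", then a run of non-space characters.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

def find_dois(text):
    """Return every DOI-like string in text, with trailing punctuation stripped."""
    return [m.rstrip(".,;)") for m in DOI_RE.findall(text)]

# Hypothetical pdftotext-style output; both DOIs are invented.
sample = (
    "doi:10.1234/journal.\n"      # the paper's own DOI, broken by a newline
    "example.0001\n"
    "See also doi:10.5678/other.paper.42.\n"  # a cited document's DOI
)

for doi in find_dois(sample):
    print(doi)
```

The first match stops at the newline, so only a fragment of the paper's
own DOI comes out, and the cited document's DOI is the only complete
one. That is also the case I am worried about above: nothing in the
text itself says which DOI belongs to the paper.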


> > One other glitch, now that I have fully tested the PDF production with
> > referencer-inserted citations in LyX: the accented characters are not
> > protected in the BibTeX keys (they are in the other fields). Does BibTeX
> > support special characters in keys at all? I'm not a LyX/LaTeX guru
> > (and I would LOVE to avoid becoming one) so I might be just doing
> > something wrong here...
> 
> The principle of least surprise is in action here.  Most people I know
> would write Gruber06 rather than Gr\"uber06.  However, there is no general
> way to map accented Latin characters into English characters, so
> referencer leaves them alone.  Converting non-ASCII characters into their
> LaTeX equivalent wouldn't be appropriate for key names, since they're
> never typeset.  I know there are things like ss for ß and ae for æ, oe
> for ø, but my knowledge is pretty special-case for that.  I wonder if
> there's an ISO standard?

I am fine with avoiding special characters in keys. I just wanted to
test whether it actually worked, but I don't know much about the subject...
Maybe referencer could map "some" non-ASCII characters. Given the huge
amount of software doing accent-insensitive search and the like, and how
well it works for me, an incomplete-yet-useful mapping list must exist
somewhere (I really don't care right now though).
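
As an illustration of what such a partial mapping could look like, here
is a minimal sketch. It assumes Unicode decomposition (NFKD) is good
enough for most accented Latin letters and adds a small hand-written
table for the special cases mentioned above. The ascii_key function and
the SPECIAL table are hypothetical names, not anything referencer
provides.

```python
import unicodedata

# Hand-picked special cases that have no Unicode decomposition.
# Deliberately incomplete; "oe" for "ø" follows the suggestion in this thread.
SPECIAL = {"ß": "ss", "æ": "ae", "Æ": "AE", "ø": "oe", "Ø": "OE"}

def ascii_key(key):
    """Return a best-effort ASCII version of a BibTeX key."""
    out = []
    for ch in key:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFKD splits "ü" into "u" plus a combining diaeresis;
        # keeping only ASCII code points drops the combining mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed if ord(c) < 128))
    return "".join(out)

print(ascii_key("Grüber06"))
print(ascii_key("Straße07"))
```

Characters that neither decompose nor appear in the table are silently
dropped here, which may or may not be what one wants for keys.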

-- 
Aurelien Naldi



