[referencer] about pubmed plugin
Aurélien Naldi
aurelien.naldi at gmail.com
Mon Mar 24 12:43:35 EDT 2008
On Mon, Mar 24, 2008 at 5:22 PM, John Spray <jcspray at icculus.org> wrote:
>
> On Mon, 2008-03-24 at 17:09 +0100, Aurélien Naldi wrote:
> > Exception: <type 'exceptions.UnicodeDecodeError'>
> > Explication: 'ascii' codec can't decode byte 0xc3 in position 1:
> > ordinal not in range(128)
> >
> >
> > The following fixes it for me, I hope it doesn't create any other kind
> > of problem...
> >
> >
> > print "DOI ", query, " has PubMed ID ", id
> >
> > - return get_citation_from_pmid (id)
> > + return get_citation_from_pmid (id.encode("utf-8"))
>
> That's pretty strange: the pmid is just a number, so utf-8 and ascii
> would have the same representation. Which implies that minidom was
> giving us something other than either of those. Your system isn't
> configured for something crazy like UTF-16 is it?
>
> The python stuff is potentially rife with encoding stuff that I haven't
> thought about. One of the pitfalls of being English: ASCII was enough
> for us! ;-)
Yes, it is pretty weird, and no, I'm not using UTF-16: I have been using
UTF-8 only for a long time...
The strangest thing is that it fails after urlencode (it did print a
correct URL for me, it only failed to download it).
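
For illustration only, here is a small sketch of the two observations
above. It uses Python 3 syntax, not the plugin's Python 2 code; the ID is
made up and the helper name ascii_decode_error is hypothetical. It shows
that a digits-only ID has the same bytes in ASCII and UTF-8, and that 0xc3
is what the ASCII codec trips over when it meets a UTF-8 encoded accented
character:

```python
# Illustrative sketch (Python 3), not the plugin's actual code.
pmid = "12345678"  # made-up, digits-only ID

# A digits-only string encodes to identical bytes in ASCII and UTF-8.
same_bytes = pmid.encode("ascii") == pmid.encode("utf-8")


def ascii_decode_error(raw):
    """Return the UnicodeDecodeError raised when decoding raw as ASCII,
    or None if the bytes are plain ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as err:
        return err
    return None


# U+00E9 (e with acute accent) is the two bytes 0xc3 0xa9 in UTF-8,
# so the ASCII codec fails on the leading 0xc3 byte.
err = ascii_decode_error("\u00e9".encode("utf-8"))
print(same_bytes, err)
```

So the error message suggests that some non-ASCII text, not the number
itself, was being decoded as ASCII somewhere along the way.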
I am looking at the returned XML file right now and I don't see
anything strange explaining this, but I saw something interesting:
<TermSet>
  <Term>10.1093/bioinformatics/btm547[All Fields]</Term>
  <Field>All Fields</Field>
  <Count>1</Count>
  <Explode>Y</Explode>
</TermSet>
So I tried appending [doi] to the searched DOI and it worked; the
result is the same except for this part, which now says:
<TermSet>
  <Term>10.1093/bioinformatics/btm547[doi]</Term>
  <Field>doi</Field>
  <Count>1</Count>
  <Explode>Y</Explode>
</TermSet>
I guess it solves your other problem!
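
A minimal sketch of the idea, in Python 3 rather than the plugin's
Python 2. The function names build_esearch_url and termset_field are
hypothetical, and the E-utilities esearch URL is an assumption, since the
URL the plugin really uses is not shown in this message. It appends the
[doi] field tag to the search term and reads the <Field> element back
from a TermSet like the one above:

```python
from urllib.parse import urlencode
from xml.dom import minidom

# Assumed endpoint (NCBI E-utilities esearch); the plugin may use another URL.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(doi):
    """Build a PubMed search URL that restricts the term to the doi field."""
    term = doi + "[doi]"
    # urlencode percent-escapes "/", "[" and "]" in the term.
    return ESEARCH + "?" + urlencode([("db", "pubmed"), ("term", term)])


def termset_field(xml_text):
    """Return the text of the first <Field> element in a TermSet snippet."""
    doc = minidom.parseString(xml_text)
    field = doc.getElementsByTagName("Field")[0]
    return field.firstChild.data


url = build_esearch_url("10.1093/bioinformatics/btm547")

sample = """<TermSet>
  <Term>10.1093/bioinformatics/btm547[doi]</Term>
  <Field>doi</Field>
  <Count>1</Count>
  <Explode>Y</Explode>
</TermSet>"""
print(url)
print(termset_field(sample))
```

Checking that <Field> comes back as "doi" rather than "All Fields" is a
cheap way to confirm the tag was taken into account.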
PS: I don't get any update when doing "hg update"; did you apply my
previous patch to the main tree already?
Best regards
--
Aurélien Naldi