Isi web of knowledge and referencer
Mario Castro
mariocastro73 at gmail.com
Tue Mar 4 08:45:59 EST 2008
Hi all
This is my first post but I hope it won't be the last
I have found referencer pretty interesting. Its main feature (in my opinion)
is its capability to read a pdf-file and get metadata automatically from
crossref.
However, as a scientist, I use everyday the most famous scientific database
(isi-web-of-knowledge)
I have made a little script (in a couple of days so it's very very quick and
dirty) that takes a referencer database file and use some fields in it to
obtain further information about the paper from isi-web.
It doesn't work perfectly and some extra tuning is needed. But I was
guessing if something similar could be incorporated in referencer. I mean,
pick a field of a referencer record and complete other fields from the
information obtained from isi-web. If so, i think it would be very valuable.
Here I attach the two files needed to run my script (you also need a
subscription to isi-web, but all the universities have one collective
subscription)
The first file is called refine_reflist.sh and it contains the following:
#################
#!/bin/bash
if [ $# -ne 2 ]; then
echo "Sintaxis: $0 infile.pdf outfile.bib (it appends if exist and
creates if not)"
exit
fi
rm -f .temp_query_* temp_answer_*;
#The main functionality is contained in the following awk script
#but I'm pretty sure that it would be simpler in pyhton (which I don't know
#very well)
awk -F ">" '
BEGIN {
# Prints the initial xml fields of the referencer file
doit="false";
print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
print "<library>"
print "<manage_target braces=\"false\"
utf8=\"false\"></manage_target>"
print "<taglist>"
print "</taglist>"
print "<doclist>"
}
/<doc>/ {
print "<doc>"
getline;
safe=$0;
gsub("<",">");
while ($2!="/doc") { # while you are in this record...
#print "1:",$1,"2:",$2,"3:",$3,"4:",$4;
#if($2=="bib_doi" && $3!="") { doit="true"; print doit,NF;}
if($2=="bib_title") {title=$3; gsub(" ","%20",title);
gsub(/[(")]/,"",title) } #extract the paper title
if($2=="bib_year") {year=$3; }
if($2=="bib_authors" && $3=="") {doit="true";}
else print safe;
getline;
safe=$0;
gsub("<",">");
}
if(doit=="true") { #creates a query_file for isi (I learn how this
was done with a sniffer)
temp_file=sprintf(".temp_query_%s_%s",title,year);
printf "GET
/esti/cgi?databaseID=WOS&SID=V2gB%40mljF1oPdjhlcF2&rspType=endnote&method=searchRetrieve&firstRec=1&numRecs=2&query=TI%3D(%s)%20and%20PY%3D(%s)
HTTP/1.1",title,year > temp_file;
print "User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5;
Linux) KHTML/3.5.8 (like Gecko)">> temp_file;
print "Accept: text/html, image/jpeg, image/png, text/*,
image/*, */*">> temp_file;
print "Accept-Encoding: x-gzip, x-deflate, gzip, deflate">>
temp_file;
print "Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5">>
temp_file;
print "Accept-Language: en">> temp_file;
print "Host: estipub.isiknowledge.com">> temp_file;
print "Connection: Keep-Alive">> temp_file;
print "">> temp_file;
print "QUIT" >> temp_file;
doit="false";
fflush(temp_file);
#the following uses netcat to perform a query in isi-web. Be careful with
the "^M" character it is written with Ctrl-v-m no simple ^M
comando=sprintf("cat %s |nc estipub.isiknowledge.com 80
|tail -n +10 - |head -n -11 |sed 's/^M//g' >>
.temp_answer_%s_%s",temp_file,title,year);
system(comando);
#The following command parses the isi-web query and add the learnt fields to
the referencer record
comando2=sprintf("awk -f extract_field.awk
.temp_answer_%s_%s",title,year);
system(comando2);
}
print "</doc>"
}
END{
print "</doclist>"
print "</library>"
}
' $1 >$2
rm -f .temp_query_* temp_answer_*;
###############
to run it simply type:
./refine_reflist.sh old.reflist new.reflist
the other file you need is called extract_field.awk:
################
BEGIN{
FS=">";
IGNORECASE=1;
authcount=0;
keycount=0;
}
/AuCollectiveName/{
gsub("<",">");
author[++authcount]=$3;
}
/<keyword>/{
gsub("<",">");
keywords[++keycount]=$3;
}
/article_no/ && /doi/ {
gsub("<",">");
doi=$3;
}
END{
printf("<bib_authors>");
for(i=1;i<authcount;i++) printf "%s and ",author[i]; printf
"%s</bib_authors>\n",author[authcount];
printf("<bib_extra key=\"Keywords\">");
for(i=1;i<keycount;i++) printf "%s,",keywords[i]; printf
"%s</bib_extra>\n",keywords[keycount];
printf("<bib_extra key=\"Doi\"> %s </bib_extra>\n",doi);
}
###############
I would like to contribute to the referencer project, and I hope that this
idea could be incorporated (and of course improved) for future versions.
Another source of inspiration could be a Mac program called "papers". I've
seen it in action and it's all I would like for referencer.
Best regards,
Mario
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://icculus.org/pipermail/referencer/attachments/20080304/918ffee6/attachment.htm>
More information about the referencer
mailing list