Hi all<br><br>This is my first post, but I hope it won't be the last.<br><br>I find Referencer pretty interesting. Its main feature (in my opinion) is its ability to read a PDF file and fetch metadata automatically from CrossRef.<br>
However, as a scientist, I use the best-known scientific database (ISI Web of Knowledge) every day.<br><br>I have written a little script (in a couple of days, so it's very quick and dirty) that takes a Referencer database file and uses some of its fields to obtain further information about each paper from ISI Web.<br>
<br>It doesn't work perfectly and some extra tuning is needed, but I was wondering whether something similar could be incorporated into Referencer: pick one field of a Referencer record and complete the other fields from the information obtained from ISI Web. If so, I think it would be very valuable.<br>
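To make the idea concrete, here is a minimal Python sketch (hypothetical, not actual Referencer code) of the first half of the job: finding the records whose author field is empty, so they are candidates for completion from a web lookup. The tag names follow the reflist snippets in my script below; `incomplete_docs` and the sample document are my own inventions.

```python
# Hypothetical sketch: list the <doc> records in a Referencer reflist
# whose <bib_authors> field is empty. The tag names (doc, bib_title,
# bib_year, bib_authors) come from the reflist format; the function and
# sample names are made up for illustration.
import xml.etree.ElementTree as ET

def incomplete_docs(reflist_xml):
    """Yield (title, year) for every <doc> that has no authors yet."""
    root = ET.fromstring(reflist_xml)
    for doc in root.iter("doc"):
        authors = doc.findtext("bib_authors", default="")
        if not authors.strip():
            yield doc.findtext("bib_title", ""), doc.findtext("bib_year", "")

sample = """<library><doclist>
<doc><bib_title>Some paper</bib_title><bib_year>2007</bib_year>
<bib_authors></bib_authors></doc>
<doc><bib_title>Complete one</bib_title><bib_year>2006</bib_year>
<bib_authors>A. Author</bib_authors></doc>
</doclist></library>"""

print(list(incomplete_docs(sample)))  # → [('Some paper', '2007')]
```

The second half (querying ISI Web and merging the answer back in) is what the shell script below does.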
<br>Here I attach the two files needed to run my script (you also need a subscription to ISI Web, but most universities have an institutional subscription).<br><br>The first file is called refine_reflist.sh and it contains the following:<br>
#################<br>#!/bin/bash<br>if [ $# -ne 2 ]; then<br> echo "Syntax: $0 infile.reflist outfile.reflist (outfile will be created or overwritten)"<br> exit 1<br>fi<br># clean up temporary files left over from a previous run<br>rm -f .temp_query_* .temp_answer_*;<br>
#The main functionality is contained in the following awk script,<br>#but I'm pretty sure it would be simpler in Python (which I don't know<br>#very well).<br>awk -F ">" '<br>BEGIN {<br># Print the initial XML fields of the referencer file<br>
doit="false";<br> print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"<br> print "<library>"<br> print "<manage_target braces=\"false\" utf8=\"false\"></manage_target>"<br>
print "<taglist>"<br> print "</taglist>"<br> print "<doclist>"<br><br>}<br>/<doc>/ {<br>print "<doc>"<br> getline;<br> safe=$0; <br>
gsub("<",">");<br> while ($2!="/doc") { # while you are in this record...<br> #print "1:",$1,"2:",$2,"3:",$3,"4:",$4;<br>
#if($2=="bib_doi" && $3!="") { doit="true"; print doit,NF;}<br> if($2=="bib_title") {title=$3; gsub(" ","%20",title); gsub(/[(")]/,"",title) } #extract the paper title<br>
if($2=="bib_year") {year=$3; }<br> if($2=="bib_authors" && $3=="") {doit="true";}<br> else print safe;<br> getline;<br>
 safe=$0; <br> gsub("<",">");<br> }<br> if(doit=="true") { # create a query file for ISI (I learned the request format with a packet sniffer)<br> temp_file=sprintf(".temp_query_%s_%s",title,year);<br>
printf "GET /esti/cgi?databaseID=WOS&SID=V2gB%40mljF1oPdjhlcF2&rspType=endnote&method=searchRetrieve&firstRec=1&numRecs=2&query=TI%3D(%s)%20and%20PY%3D(%s) HTTP/1.1",title,year > temp_file;<br>
print "User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko)">> temp_file;<br> print "Accept: text/html, image/jpeg, image/png, text/*, image/*, */*">> temp_file;<br>
print "Accept-Encoding: x-gzip, x-deflate, gzip, deflate">> temp_file;<br> print "Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5">> temp_file;<br> print "Accept-Language: en">> temp_file;<br>
print "Host: estipub.isiknowledge.com">> temp_file;<br> print "Connection: Keep-Alive">> temp_file;<br> print "">> temp_file;<br>
print "QUIT" >> temp_file;<br><br> doit="false";<br> fflush(temp_file);<br># The following uses netcat to perform the query against ISI Web. Be careful with the "^M" character: it must be typed as Ctrl-V Ctrl-M (a literal carriage return), not as the two plain characters ^M<br>
comando=sprintf("cat %s |nc estipub.isiknowledge.com 80 |tail -n +10 - |head -n -11 |sed 's/^M//g' >> .temp_answer_%s_%s",temp_file,title,year);<br>
system(comando);<br># The following command parses the ISI Web answer and adds the retrieved fields to the referencer record<br> comando2=sprintf("awk -f extract_field.awk .temp_answer_%s_%s",title,year);<br>
system(comando2);<br> }<br>print "</doc>"<br>}<br>END{<br> print "</doclist>"<br> print "</library>"<br><br>}<br>' "$1" > "$2"<br>rm -f .temp_query_* .temp_answer_*;<br>
###############<br><br>To run it, simply type:<br>./refine_reflist.sh old.reflist new.reflist<br><br>The other file you need is called extract_field.awk:<br>################<br>BEGIN{<br> FS=">";<br> IGNORECASE=1;<br>
authcount=0;<br> keycount=0;<br>}<br>/AuCollectiveName/{<br> gsub("<",">"); <br> author[++authcount]=$3; <br>}<br>/<keyword>/{<br> gsub("<",">"); <br>
keywords[++keycount]=$3; <br>}<br>/article_no/ && /doi/ {<br> gsub("<",">"); <br> doi=$3;<br>}<br>END{<br> printf("<bib_authors>"); <br> for(i=1;i<authcount;i++) printf "%s and ",author[i]; printf "%s</bib_authors>\n",author[authcount];<br>
printf("<bib_extra key=\"Keywords\">");<br> for(i=1;i<keycount;i++) printf "%s,",keywords[i]; printf "%s</bib_extra>\n",keywords[keycount];<br> printf("<bib_extra key=\"Doi\"> %s </bib_extra>\n",doi);<br>
}<br><br>###############<br><br><br>I would like to contribute to the Referencer project, and I hope this idea could be incorporated (and of course improved) in future versions.<br><br>Another source of inspiration could be a Mac program called "Papers". I've seen it in action, and it's everything I would like Referencer to be.<br>
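As an aside on my "simpler in Python" comment in the script: the request-building step, which the shell version does with hand-escaped %20s and a netcat pipeline, could be sketched in Python like this. The function name is my own; the SID is the session token copied from my script and will expire; and `urllib.parse.quote` encodes parentheses too, slightly more aggressively than my hand-made space replacement.

```python
# Hedged sketch of the query-building step of refine_reflist.sh in
# Python: build the same TI=(title) and PY=(year) request path that the
# shell script writes into its .temp_query_* file, letting urllib do
# the percent-encoding. Names here are illustrative, not Referencer API.
from urllib.parse import quote

HOST = "estipub.isiknowledge.com"  # host used by the shell script

def isi_query_path(title, year, sid="V2gB@mljF1oPdjhlcF2"):
    """Return the /esti/cgi request path that the shell script sends."""
    query = quote("TI=(%s) and PY=(%s)" % (title, year))
    return ("/esti/cgi?databaseID=WOS&SID=%s&rspType=endnote"
            "&method=searchRetrieve&firstRec=1&numRecs=2&query=%s"
            % (quote(sid), query))

print(isi_query_path("A title with spaces", "2007"))
```

From there, `http.client` (or urllib) would replace the netcat call and the hand-written HTTP headers entirely, and the `^M` stripping would no longer be needed.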
<br>Best regards, <br><br>Mario<br>