Yet another great Google service : Google Scholar. Just type in a name
(for example mine) and you will get a
list of all his/her publications that can be found on the net : a
printable version of the paper if possible, a reference if not. In the
first case it even lists all versions available on the net. But even in
the latter case there is an extremely useful link next to it, “Cited
by so-many”, which gives all references to the paper Google can
find. Of course, it also finds all papers posted on the arXiv, but the
great added value is that you no longer have to search departmental
or personal webpages yourself. If you want to find out more about this
service, read the about page.
It
always amazes me how much time I have to waste trying to get
tech stuff (such as this weblog) working the way I want. You will barely
notice it, but once again I spent too much time delving into PHP scripts,
sometimes with minor success, most of the time almost wrecking this
weblog…
An example : it took me a day to figure out why
this page said there was just 1 visitor online whereas the log files showed
otherwise. The PHP script I used checked this by looking at the
IP address via REMOTE_ADDR, which is perfectly OK on an ordinary
Mac OS 10.3 machine, but _not_ on an OS X Server! For some reason
it gives as REMOTE_ADDR just the IP address of the server itself (that
is, www.matrix.ua.ac.be in this case), so whoever came by this page got
tagged as 143.129.75.209 and the script thought there was just one
person around… The trivial way around it is to replace every
occurrence of REMOTE_ADDR by HTTP_PC_REMOTE_ADDR.
Easy enough, but it took me a while to figure it out.
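The substitution amounts to a small lookup rule. Below is a minimal sketch in Python rather than the PHP of the original script. It assumes the request environment is available as a dictionary, like PHP's $_SERVER. The name client_ip is hypothetical, and the visitor addresses in the example are made up (only the server's own address comes from this post).

```python
def client_ip(server_env):
    """Return the visitor's IP address from a CGI-style environment dict.

    On the OS X Server described above, REMOTE_ADDR holds the server's own
    address, while HTTP_PC_REMOTE_ADDR carries the real visitor address.
    Prefer the latter and fall back to REMOTE_ADDR on an ordinary machine.
    """
    return server_env.get("HTTP_PC_REMOTE_ADDR") or server_env.get("REMOTE_ADDR")


# Two visitors seen through the server: REMOTE_ADDR is identical for both.
visitors = [
    {"REMOTE_ADDR": "143.129.75.209", "HTTP_PC_REMOTE_ADDR": "10.0.0.5"},
    {"REMOTE_ADDR": "143.129.75.209", "HTTP_PC_REMOTE_ADDR": "10.0.0.7"},
]
online = {client_ip(env) for env in visitors}
print(len(online))  # 2 distinct visitors instead of 1
```

On a plain 10.3 machine the header is simply absent, so the same rule keeps working there.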
Another
example : over the weekend this weblog got a stalker! There were over
100 hits from 38.113.198.9, so whoever that is really liked this site
but didn't have time to read a thing… Again, the standard
solution is to ban the IP address, and most weblog packages have such a
tool on their admin page. But whatever I tried and Googled, WordPress doesn't seem to have it
on board. There were a few hacks and plugins around claiming to do
something about it, but none of them worked! So I tried more drastic
actions such as editing .htaccess files, which I thought would solve
everything (again, no problem under 10.3 but _not_ under
10.3 Server!). Once more, a couple of hours lost trying to figure out
how to get the firewall of a Mac server to do what I needed. The upshot is
that I now know all the dark secrets of the _ipfw_ command, so no
more stalking around this site…
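Spotting such a visitor in the first place comes down to counting hits per address in the server's access log. Here is a minimal Python sketch. It assumes Apache's common or combined log format, where the client address is the first field on each line, and uses the post's "over 100 hits" as the threshold. The function names and the sample log lines are invented for illustration. Any address it reports is the kind one would then block at the firewall.

```python
from collections import Counter


def hits_per_ip(log_lines):
    """Count requests per client address.

    Assumes Apache's common/combined log format, where the client
    address is the first whitespace-separated field of each line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def suspicious(log_lines, threshold=100):
    """Return the addresses with more than `threshold` hits, busiest first."""
    return [ip for ip, n in hits_per_ip(log_lines).most_common() if n > threshold]


# Invented log lines: one very persistent visitor and one ordinary one.
sample = (
    ['38.113.198.9 - - [12/Jan/2004:10:00:00 +0100] "GET / HTTP/1.0" 200 5120'] * 101
    + ['10.0.0.5 - - [12/Jan/2004:10:05:00 +0100] "GET / HTTP/1.0" 200 5120']
)
print(suspicious(sample))  # ['38.113.198.9']
```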
In the process of
grounding my stalker, I decided that I needed better site stats than my
homemade log file provided. Fortunately, this time I picked a package
that worked without too much hassle (once more I had to make the
REMOTE_ADDR substitution, but apart from that all went well). You will
not see much of the power of this stats package on the page (apart
from the global counter); I feel that such things are best forgotten
until something strange occurs (like stalkers, spammers and other
weirdos). A nice side effect, though, was that for the first time I had a
look at _referring pages_, that is, the URLs leading to this weblog.
Lots of Google searches (some strange ones), but today there were also a
number of referrals from a Chinese blog. I checked it out and it turned
out to be the brand new Math is Math! Life is Life! weblog…
Another time-consuming
thing was getting the BBC News RSS feeds working in the
sidebar, so that you still get _some_ feel for reality while
being trapped here. I am not yet satisfied with the layout under
Explorer, but then everyone should move on to Safari anyway (so I gave up
trying to work out the PHP script).
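What such a sidebar does is simple in outline: fetch the feed, pick out the first few items and turn each into a link. Below is a minimal Python sketch of the last two steps (the original was a PHP script). It assumes a standard RSS 2.0 feed with channel/item elements, each item having a title and a link. The feed text is invented, and fetching the real BBC feed is left out.

```python
import xml.etree.ElementTree as ET
from html import escape


def sidebar_html(rss_text, limit=5):
    """Turn an RSS 2.0 feed into a short HTML list of linked headlines.

    Assumes the usual RSS 2.0 layout: <rss><channel><item> elements,
    each with a <title> and a <link> child.
    """
    channel = ET.fromstring(rss_text).find("channel")
    lines = ["<ul>"]
    for item in channel.findall("item")[:limit]:
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Escape both fields so odd characters cannot break the page.
        lines.append(f'  <li><a href="{escape(link)}">{escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


# Invented feed, just to show the shape of the input.
feed = """<rss version="2.0"><channel><title>Example news</title>
<item><title>First headline</title><link>http://example.com/1</link></item>
<item><title>Second &amp; last</title><link>http://example.com/2</link></item>
</channel></rss>"""
html_out = sidebar_html(feed)
print(html_out)
```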
But most of my time was wasted on
something that so far has left no trace whatsoever here : a plugin that
allows specific posts to be read only by registered users (of a certain
'level'; WordPress can give users a level from 0 to 10,
each with specific degrees of freedom). But at the same time I wanted
the rest of the world to have at least some indication of what they were
missing (such as a title with a nice padlock next to it), and so far I
haven't got that working. The only trace of a closed post is its entry
in the sidebar listing of the ten most recent posts, which gives an error message
when an unauthorized user clicks on it. So, still a lot of
headache-inducing work left to do, but it is about time to get back to
mathematics…
Update (Feb. 2007) : the
padlock idea has been abandoned.
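Just to record what the padlock idea amounted to, here is a minimal Python sketch. It is not a WordPress plugin, and Post, min_level and render are hypothetical names. A post carries a minimum user level: readers at or above that level see the full text, and everyone else sees the title with a padlock instead of an error message.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    body: str
    min_level: int = 0  # hypothetical field: 0 = public, up to 10


def render(post, user_level):
    """Show the full post to readers of sufficient level; everyone else
    only sees the title followed by a padlock, instead of an error page."""
    if user_level >= post.min_level:
        return f"{post.title}\n{post.body}"
    return f"{post.title} \U0001F512"  # U+1F512 is the padlock symbol


secret = Post("Closed post", "Only for level 5 and up.", min_level=5)
print(render(secret, 7))  # full text
print(render(secret, 0))  # title with padlock
```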
As
far as I know (but I am fairly ignorant) the arXiv does not
provide RSS feeds for a particular section, say math.RA. Still, such a feed would be useful to anyone
who uses a news aggregator to follow weblogs and
news channels with RSS syndication. So I decided to write one as my
first Perl exercise and, to my own surprise, after a few hours' work I have
a prototype scraper for math.RA. It is not yet perfect : I still
have to convert the relative URLs to absolute URLs so that they can be
clicked, and at the moment I have only collected the titles, authors and
abstract links, whereas it would make more sense to include the full
abstract in the RSS feed, but give me a few more days…
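Converting scraped relative links into clickable absolute ones is a matter of resolving each link against the URL of the page it was found on. A minimal Python sketch follows (the scraper itself is in Perl). The listing URL and the paper identifiers are illustrative assumptions, not taken from the actual arXiv pages.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve links scraped from a listing page against that page's URL."""
    return [urljoin(base_url, link) for link in links]


# Illustrative listing URL and links; the real page layout may differ.
base = "http://arxiv.org/list/math.RA/new"
print(absolutize(base, ["/abs/math.RA/0401001", "../../abs/math.RA/0401002"]))
```

Both a root-relative link and a "../" style link end up as http://arxiv.org/abs/..., which is exactly what a news aggregator needs.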
The
basic idea is fairly simple (and based on an O'Reilly hack).
One uses the Template::Extract module to
extract the goodies from the arXiv's HTML pages by means of a template. Maybe I am still
not used to Perl documentation, but it was hard for me to work out how to
do this in detail from either the hack or the online
module documentation. Fortunately there is a good Perl Advent
Calendar page giving me the details that I needed. Once one has this
info, one can turn it into a proper RSS page using the XML::RSS module.
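The second half of the pipeline, from scraped fields to RSS, looks roughly like this. It is a Python stand-in for the XML::RSS step, not the Perl script itself, and the Template::Extract step is not reproduced. The entries are assumed to be dictionaries holding the title, authors and link the prototype collects, and the sample entry and listing URL are invented. The authors go into each item's description until the full abstract is available.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, entries):
    """Build an RSS 2.0 document from scraped entries.

    Each entry is a dict with 'title', 'authors' and 'link' keys
    (the fields the prototype scraper collects).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"New papers: {channel_title}"
    for e in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = e["title"]
        ET.SubElement(item, "link").text = e["link"]
        ET.SubElement(item, "description").text = e["authors"]
    return ET.tostring(rss, encoding="unicode")


# Invented entry, standing in for one scraped listing.
entries = [
    {"title": "An example paper", "authors": "A. Author, B. Author",
     "link": "http://arxiv.org/abs/math.RA/0401001"},
]
print(build_rss("arXiv math.RA", "http://arxiv.org/list/math.RA/new", entries))
```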
In fact, I spent far
more time trying to get XML::RSS installed under OS X than
writing the code. The usual method, that is via
iMacLieven:~ lieven$ sudo /usr/bin/perl -MCPAN -e shell
Terminal does not support AddHistory.
cpan shell -- CPAN exploration and modules installation (v1.76)
ReadLine support available (try 'install Bundle::CPAN')
cpan> install XML::RSS
failed, and even a
manual install, for which the drill is : download the package from CPAN, go to the
extracted directory and give the commands
sudo /usr/bin/perl Makefile.PL
sudo make
sudo make test
sudo make install
failed. Googling didn't give immediate results either, until
I found this ADC page, which set me on the right track.
It seems that the problem lies in installing XML::Parser, for which one first needs expat
to be installed. Now, the generic SourceForge page contains a
version for Linux, but fortunately it is also part of the Fink
project, so I did a
sudo fink install expat
which worked
without problems, but afterwards I still was not able to install
XML::Parser because Fink installs everything in the /sw
tree. But after
sudo perl Makefile.PL EXPATLIBPATH=/sw/lib EXPATINCPATH=/sw/include
I finally got the manual installation
going. I will try to tidy up the script over the weekend…
Over
the last couple of days I've been experimenting a bit with different
backup methods. To begin, I tried out ExecutiveSync and its
successor You Synchronize, but they are very, very
slow. Not only did the first synchronization of a 0.5 GB folder between
two computers over our AirPort network take over 2.5 hours, but on
subsequent syncs the database check also seems to last forever.
So I turned to the Fink project
again and found two interesting packages. The first is wget : GNU Wget is a free network utility to
retrieve files from the World Wide Web using HTTP and FTP, so one way
to back up a folder would be to put it in the Sites folder and
mirror it over the network using wget. I didn't check this out in
great detail (I did a small test to see it working, but I assume it will
be slow for large folders). The other one is rsync : it uses the “rsync algorithm”, which
provides a very fast method for bringing remote files into sync. It does this by
sending just the differences in the files across the link, without
requiring that both sets of files are present at one of the ends of the
link beforehand. This seems to be precisely what I wanted to do, and
after a Google for ‘rsync OS X’ I arrived at the RsyncX package, which is an implementation of rsync
with HFS support and configuration through a command line (Terminal) or
graphical user interface. I downloaded this package (the GUI ends up
in Applications/Utilities) and tried it out by
filling in the Source and Local folders and pressing the Synchronize
button. Not much progress was reported, but the Activity Monitor
showed that it was using up all of the CPU, so I waited patiently for over an
hour and then looked at the Network Activity in the Activity
Monitor : virtually no packets were going in or out, so I killed
RsyncX. I am sure I did something wrong, but rather than trying to
get it working, I tried the command-line rsync I had
downloaded from Fink. After a few false attempts I
typed
/sw/bin/rsync -a -e ssh iMatrixLieven.local:/Users/lieven/Documents /Users/lieven/docsLieven
and suddenly the packets were flying
happily over the network at 250 KB/sec, so it took me only half an hour
to get a first synchronization done, and subsequent changes are added in
no time! Afterwards I discovered that rsync is included in the
standard OS X Developer Tools, and that RsyncX seems to have renamed
it to rsync_orig and installed a new (quite large) rsync
in /usr/bin. Maybe my problems with RsyncX were caused
by having /sw/bin earlier in my $PATH than
/usr/bin, but verifying this will have to await another day. For
the moment, I'm happy to have a quick synchronizing tool available, and
Real Madrid is playing on the TV…
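Sending "just the differences" works because rsync can find blocks of the old file inside the new one cheaply. It uses a weak checksum that can be updated in constant time as a window slides one byte further. Below is a simplified Python sketch of that idea, following Tridgell's description of the rsync algorithm rather than rsync's actual code, with find_matches as a hypothetical name. Real rsync also confirms candidates with a strong hash, skips a whole block after a match, and sends only the unmatched bytes. Here a direct comparison stands in for the strong hash.

```python
M = 1 << 16  # the weak checksum works modulo 2**16


def weak_checksum(block):
    """rsync-style weak checksum (a, b) of a block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def find_matches(old, new, block_size):
    """Return (offset_in_new, block_index_in_old) pairs where a full block
    of `old` reappears in `new`, scanning `new` one byte at a time."""
    table = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        key = weak_checksum(old[start:start + block_size])
        table.setdefault(key, []).append(start // block_size)
    matches = []
    if len(new) < block_size:
        return matches
    a, b = weak_checksum(new[:block_size])
    pos = 0
    while True:
        for j in table.get((a, b), []):
            # Weak checksums can collide, so confirm the candidate.
            if new[pos:pos + block_size] == old[j * block_size:(j + 1) * block_size]:
                matches.append((pos, j))
        if pos + block_size >= len(new):
            break
        # Roll the window one byte: drop new[pos], add new[pos + block_size].
        out_byte, in_byte = new[pos], new[pos + block_size]
        a = (a - out_byte + in_byte) % M
        b = (b - block_size * out_byte + a) % M
        pos += 1
    return matches


# Two bytes inserted at the front: both old blocks are still found.
print(find_matches(b"abcdefgh", b"XXabcdefgh", 4))  # [(2, 0), (6, 1)]
```

Only the two inserted bytes would then have to travel over the network, which is why subsequent syncs go so fast.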
GAP, the Groups, Algorithms and Programming tool
(developed by two groups, one in St. Andrews, the other in Aachen), is
the package to use if you want to work with (finite or finitely
presented) groups, but it also has some routines for algebras, fields,
division algebras, Lie algebras and the like. It has been
available on Mac Classic for years, but since the last clean install of my
computer I had removed it, as I was waiting for a Mac OS X port to be
distributed soon. From time to time I checked the webpage at gap-system.org,
but it seems that no one cared about OS X. For my “The book of
points” project I need a system to make lots of examples, so perhaps
one could just as well install the UNIX version. Fortunately, I did a
last desperate Google on GAP OS X, which brought me to the
Aachen pages of the GAP group, where people seem to be more Macintosh-minded.
The relevant page is the further notes for OS X on the
GAP installation for UNIX page. Here is what I did to get GAP running
under OS X. First go to the download page (by the way, this page has
version 4.4, whereas St. Andrews is still distributing 4.3) and download
the files
gap4r4.tar.gz, packages-2004_01_27-11_37_UTC.tar.gz and xtom1r1.tar.gz
This will give you three tar files on your Desktop. Fire
up the Terminal and make a new directory /usr/local/lib if
it doesn't exist yet. Then go to your Desktop folder and do
sudo cp gap4r4.tar /usr/local/lib
sudo cp xtom1r1.tar /usr/local/lib
cd /usr/local/lib
sudo tar xvf gap4r4.tar
sudo tar xvf xtom1r1.tar
Then return to your Desktop folder, copy the
remaining tar file into the /usr/local/lib/gap4r4/pkg folder (which
is created by untarring the first two files) and untar it as above.
Then it is time to compile everything (assuming you have installed the
Developer Tools), and there is one magic OS X command that will
speed up GAP by 20%. Here is what to do
cd /usr/local/lib/gap4r4
sudo ./configure
sudo make COPTS="-fast -mcpu=7450"
and everything will compile nicely. If you
are lucky enough to have a G5 system, you should replace the last command
by sudo make COPTS="-O3". Finally, get everything in the right
place
cd /usr/local/lib/gap4r4/bin
sudo cp gap.sh /usr/local/bin/gap
and if /usr/local/bin is in
your $PATH, then typing gap at the command line will give
you the opening GAP banner.
In my post on the GoogleMatrix I tried to understand the concept
of the PageRank algorithm that Google uses to list pages according to
their 'importance'. So, if you want your webpage to come out first in
a certain search, you have to increase your PageRank value (which
is essentially a measure of the webpages linking to your page) artificially. A
method to achieve this is link spamming : if page A is
the webpage whose PageRank value you want to increase, take a page
B (either under your control or that of a friendly webmaster) and add a
dummy link page B -> page A. To find out the effect of this on the
PageRank, and how the second eigenvalue of the GoogleMatrix is able to
detect such constructs, let us set up a micro-web consisting of
just 3 pages with links 1->2 and 1->3. The corresponding GoogleMatrix
(with c=0.85 and v=(1/3,1/3,1/3)) is
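$$
A = \begin{pmatrix}
\frac{1}{3} & \frac{1}{20} & \frac{1}{20} \\
\frac{1}{3} & \frac{9}{10} & \frac{1}{20} \\
\frac{1}{3} & \frac{1}{20} & \frac{9}{10}
\end{pmatrix}
$$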
which has eigenvalues 1, 0.85 and 0.28.
The eigenvector with eigenvalue 1 (the PageRank) is (0.15,1,1),
so pages 2 and 3 are equally important to Google, and if we scale
PageRank such that it adds up to 100% over all pages, the relative
importance values are 7.0%, 46.5% and 46.5%. In this case the eigenvector
corresponding to the second eigenvalue 0.85 is (0,-1,1) and hence
detects the two leaf nodes. Now, assume the owner of page 2 sets up a
link spam by creating page 4 and linking 4->2; then the corresponding
GoogleMatrix (with v=(1/4,1/4,1/4,1/4)) is
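$$
A = \begin{pmatrix}
\frac{77}{240} & \frac{3}{80} & \frac{3}{80} & \frac{3}{80} \\
\frac{77}{240} & \frac{71}{80} & \frac{3}{80} & \frac{37}{80} \\
\frac{77}{240} & \frac{3}{80} & \frac{71}{80} & \frac{3}{80} \\
\frac{3}{80} & \frac{3}{80} & \frac{3}{80} & \frac{37}{80}
\end{pmatrix}
$$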
which has eigenvalues
1, 0.85, 0.425 and 0.283. The PageRank eigenvector with eigenvalue 1
in this case is (0.8,8.18,5.35,1), or in relative importance %
(5.2%, 53.4%, 34.9%, 6.5%), and we see that the spammer achieved his/her
goal. The eigenvector corresponding to the second eigenvalue is
(0,-1,1,0), which again gives the leaf nodes, and the eigenvector of the
third eigenvalue is (0,-1,0,1), which detects the spam construct.
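These numbers are easy to check by computer. The sketch below is mine, in Python, not part of the original computation. It copies the two GoogleMatrices of this post entry by entry; the fourth column of the second one sends page 4's weight to page 2 (and back to page 4 itself). The sketch approximates PageRank by power iteration: because each column sums to 1, every iterate sums to 1 as well, so the limit is already PageRank scaled to 100%. The sketch also verifies the eigenvectors exactly with fractions. Normalizing (0.8,8.18,5.35,1) by its sum of about 15.33 is what gives roughly 5.2%, 53.4%, 34.9% and 6.5%.

```python
from fractions import Fraction as F


def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def pagerank(A, iterations=500):
    """Power iteration from the uniform vector. A is column-stochastic, so
    every iterate sums to 1 and the limit is PageRank scaled to 100%."""
    Af = [[float(a) for a in row] for row in A]
    x = [1.0 / len(Af)] * len(Af)
    for _ in range(iterations):
        x = mat_vec(Af, x)
    return x


def is_eigenvector(A, v, lam):
    """Exact check (with fractions) that A v = lam v."""
    return mat_vec(A, v) == [lam * vi for vi in v]


# Micro-web with 3 pages (c = 0.85, v uniform), entries copied from the post.
A3 = [[F(1, 3), F(1, 20), F(1, 20)],
      [F(1, 3), F(9, 10), F(1, 20)],
      [F(1, 3), F(1, 20), F(9, 10)]]

# After the spam construct; fourth column: page 4 feeds page 2 (and itself).
A4 = [[F(77, 240), F(3, 80), F(3, 80), F(3, 80)],
      [F(77, 240), F(71, 80), F(3, 80), F(37, 80)],
      [F(77, 240), F(3, 80), F(71, 80), F(3, 80)],
      [F(3, 80), F(3, 80), F(3, 80), F(37, 80)]]

print([round(100 * p, 1) for p in pagerank(A3)])  # [7.0, 46.5, 46.5]
print([round(100 * p, 1) for p in pagerank(A4)])  # [5.2, 53.4, 34.9, 6.5]
print(is_eigenvector(A3, [0, -1, 1], F(17, 20)))     # leaf nodes, eigenvalue 0.85
print(is_eigenvector(A4, [0, -1, 1, 0], F(17, 20)))  # leaf nodes, eigenvalue 0.85
print(is_eigenvector(A4, [0, -1, 0, 1], F(17, 40)))  # spam pair, eigenvalue 0.425
```

All three eigenvector checks print True. The power iteration converges like 0.85 to the power k, the second eigenvalue, so 500 steps are far more than enough here.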
This morning there was an intriguing post on arXiv/math.RA
entitled A Note on
the Eigenvalues of the Google Matrix. At first I thought it was a
joke, but a quick Google revealed that the PageRank algorithm really
is at the heart of Google technology, so I simply had to find out more
about it. An extremely readable account of it can be found in The PageRank Citation Ranking: Bringing Order to the Web, which is really the
start of Google. Its coauthors include the two founders, Larry Page and
Sergey Brin. A quote from the introduction :
“To test the utility of PageRank for search, we built a web
search engine called Google (Section 5)”
Here is an intuitive idea of
_PageRank_ : a page has high rank if the sum of the ranks of its
_backlinks_ (that is, pages linking to the page in question) is
high, and it is computed by the _Random Surfer Model_ (see
sections 2.5 and 2.6 of the paper). More formally (at least from my
quick browsing of some papers; maybe the following account is slightly
erroneous and I'll have to spend some more time reading), let
N be the number of webpages (estimated between 3 and 4
billion) and consider the N x N matrix
A, the so-called GoogleMatrix, where
$$A = cP + (1-c)\,\big(v \times \mathrm{vec}(1)\big)$$
where P is a
column-stochastic matrix (meaning : all entries are zero or positive and
the entries in each column add up to 1) with
entries
$$P(i,j) = \begin{cases} 1/N(j) & \text{if } j \to i \\ 0 & \text{otherwise} \end{cases}$$
where i and j are webpages, j->i
denotes that page j has a link to page i, and N(j) is the total
number of pages linked to from page j (all this information is available
once we download page j). c is a constant 0 < c < 1 and
corresponds to the fraction of webpages containing an _outlink_ (that
is, a link to another page) among all webpages (it seems that Google uses
c=0.85 as an estimate). Finally, v is a column vector with zero or
positive entries adding up to 1, and vec(1) is the constant row vector
(1,…,1). The idea behind this term is that in the _Random Surfer
Model_ used to compute the PageRank, the Googlebot (normally following
links randomly in the pages it enters) jumps, in (1-c)x100% of the steps,
to an entirely different webpage, where the chance that it will end up at
page i is given by the i-th entry of v (this is to avoid being trapped
in a web loop). So, in Google's model the bot _teleports_ itself
randomly every 6th link or so. Now, the PageRank is a
column eigenvector of the GoogleMatrix A with eigenvalue 1, which can be
approximated by the Random Surfer Model, and the rate of convergence of
this process depends on the _second_ largest eigenvalue of A
(the largest being 1). In the paper posted this morning a simple
proof is given that this eigenvalue is c (because the eigenvalue 1 of the
matrix P has multiplicity greater than one). According to a previous paper on the
subject, The
Second Eigenvalue of the Google Matrix, this statement has
implications for the convergence rate of the standard PageRank algorithm
as the web scales, for the stability of PageRank under perturbations of the
link structure of the web, for the detection of Google spammers, and for
the design of algorithms to speed up PageRank. But I'll have to
read more to understand the Google spammers bit…
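To see the Random Surfer Model at work, here is a small Python simulation; it is a sketch of my own, not code from the papers. It builds A = cP + (1-c)(v x vec(1)) from a column-stochastic P, whose column j says where a surfer on page j goes next. It then lets a simulated surfer wander: with probability c it follows a link, otherwise it teleports according to v. The fraction of time spent on each page should approach the PageRank. As a toy input I use a P read off the 3-page micro-web example earlier on this page. That matrix has a uniform first column, and pages 2 and 3 are treated as linking only to themselves. This P3 is my reconstruction, so treat it as an assumption.

```python
import random
from fractions import Fraction as F


def google_matrix(P, c, v):
    """A = c*P + (1-c) * (v x vec(1)): every column of the second term is v."""
    n = len(P)
    return [[c * P[i][j] + (1 - c) * v[i] for j in range(n)] for i in range(n)]


def random_surfer(P, c, v, steps, seed=0):
    """Simulate the Random Surfer Model and return visit frequencies.

    From page j the surfer follows an outlink with probability c
    (landing on page i with probability P[i][j]); otherwise it teleports
    and lands on page i with probability v[i].
    """
    rng = random.Random(seed)
    pages = range(len(P))
    teleport = [float(w) for w in v]
    visits = [0] * len(P)
    page = rng.choices(pages, weights=teleport)[0]
    for _ in range(steps):
        if rng.random() < c:
            page = rng.choices(pages, weights=[float(P[i][page]) for i in pages])[0]
        else:
            page = rng.choices(pages, weights=teleport)[0]
        visits[page] += 1
    return [k / steps for k in visits]


# Hypothetical column-stochastic P reconstructed from the 3-page example.
P3 = [[F(1, 3), 0, 0],
      [F(1, 3), 1, 0],
      [F(1, 3), 0, 1]]
c, v = F(17, 20), [F(1, 3)] * 3

A = google_matrix(P3, c, v)
print(all(sum(A[i][j] for i in range(3)) == 1 for j in range(3)))  # True
print(A[1][1], A[0][1])  # 9/10 1/20
# Visit frequencies; the exact PageRank here is about (0.070, 0.465, 0.465).
print([round(f, 3) for f in random_surfer(P3, c, v, 200_000)])
```

The simulation converges much more slowly than computing the eigenvector directly. That is why, in practice, one iterates with the matrix instead, at a speed governed by the second eigenvalue c.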
The
other members of my family don't understand what I have been trying to do these
last couple of days with all those Ethernet cables, AirPort stations,
computer books and the like. 'Improving our network' doesn't make
much of an impression. To them, our network is fine as it is : from
every computer one has access to the internet and to the only
house printer, and that is what they want. To them, my
computer phase is just occupational therapy while recovering
from the flu. Probably they are right, but I am obstinate in
experimenting to prove them wrong. Not that there is much hope :
searching the web for fun uses of home networks does not give
that many interesting pages. A noteworthy exception is a series of four
articles by Alan Graham for the macdevcenter
on the homemade dot-mac with OS X project.
In
the first article, Homemade Dot-Mac with OS X, he explains how to
set up a home network (I will give a detailed account of our
home network shortly) and fire up your Apache webserver. One nice
feature I learned from this is to connect a computer by Ethernet to the
router and via an AirPort card to the network (you can force this by
specifying the order of active network ports in the
SystemPreferences/Network/Show Network port configuration pane :
first Built-in Ethernet and second AirPort). This way you
get a faster connection to the internet while still connecting to the
other computers on the network. In the second part he explains how to
get yourself a free domain name even if you have (as we do) a dynamic
IP address, via a service like DynDNS. Indeed it is quite easy to set this up, but
so far I have failed to reach my machine under its new domain name from outside the network,
probably because of bad port mapping in my old isb2lan router.
This afternoon I lost two hours trying to fix this (so far :
failed), as I didn't even know how to talk to my router : I lost the
manual, which is no longer online. A few Google searches later I
learned that I just had to type http://192.168.0.1 to get at the set-up pages
(there is even a hidden page), but you shouldn't try these links
unless you are connected to one of these routers. Maybe I will need
another look at this review.
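A dynamic-DNS name stays useful only if someone tells the service whenever the provider hands out a new address, and most clients do exactly that in a loop. Below is a minimal Python sketch of the decision step only. The endpoint and the hostname/myip parameters are my recollection of DynDNS's HTTP update protocol and should be checked against their documentation. Authentication and the actual request are omitted, and the hostname and addresses are made up. Note that behind a router this must be the router's public address, not the 192.168 one. None of this fixes a bad port mapping, of course.

```python
from urllib.parse import urlencode

# Assumed endpoint of DynDNS's HTTP update protocol; verify before use.
UPDATE_URL = "https://members.dyndns.org/nic/update"


def update_request(hostname, current_ip, last_ip):
    """Return the update URL to request if the public IP changed, else None.

    Only contact the service when the address actually changes; the
    account credentials (sent with the real request) are left out here.
    """
    if current_ip == last_ip:
        return None
    return UPDATE_URL + "?" + urlencode({"hostname": hostname, "myip": current_ip})


# Made-up hostname and documentation-range addresses.
print(update_request("example.dyndns.org", "203.0.113.7", "203.0.113.7"))  # None
print(update_request("example.dyndns.org", "203.0.113.8", "203.0.113.7"))
```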
In the second
article, Homemade Dot-Mac with OS X, Part 2, he discusses at
length setting up a firewall with BrickHouse (shareware costing $25) compared to the
built-in firewall pane in SystemPreferences/Sharing, convincing me
to stay with the built-in option. Further, he explains what tools one can
use to set up a homepage (stressing the iPhoto option). Finally, and this
is the most interesting part (though a bit obscure), he hints at the
possibility of setting up your own iDisk facility using either
FTP (insecure) or WebDAV.
The third article in the
series is Homemade Dot Mac: Home Web Radio, in which he
claims that one can turn the standard OS X Apache server into an iTunes
streaming server. For this purpose he uses the QuickTime Streaming Server, which you can get for
free from the Apple site but which I think works only when you have an
X-server. It seems that all the nice features require an X-server, so
maybe I should consider buying one…
The (so far)
final article, Six Great Tips for Homemade Dot Mac Servers, is
really interesting, and I will come back to most of these possibilities
when (if) I get them to work. The most promising options for me are
the central file server (which he synchronizes using the
shareware product ExecutiveSync, $15 for an academic license, though
I'm also experimenting a bit with the freeware LaCie program SilverKeeper, which seems to do roughly the
same things). The iTunes central hack is next on my ToDo list, as
are (at a later stage) the WebDAV and the Rendezvous ideas. So it seems
I'll prolong my occupational therapy a while…
A
longer-term project is to get the web server www.matrix.ua.ac.be integrated in our home network
as an external WebDAV server (similar to the .Mac service
offered by Apple). But as this server hosts all the information about the
master class on non-commutative geometry, connecting to it via HTTP to use
WebDAV is too great a security risk, as all username/password
combinations would be sent without encryption. Hence the natural question
whether this server can be set up to run SSL (Secure Sockets
Layer), so that one can connect via HTTPS and all exchanged information
will be encrypted. As the server runs Apache, it comes down to getting
mod_ssl running. A Google on mod_ssl OS X gives the
ADC document Using mod_ssl on Mac OS X, which seems to be just
what I want. This page is very well documented, giving detailed
instructions for using the openssl command. However, the
end result is rather weak : it only makes localhost run
HTTPS, that is, one can connect to one's own computer safely… which is
pretty ridiculous (other computers in the same network cannot even
connect safely).
So, back to the Google list, on which
one link caught my interest : Configuring mod_ssl on Mac OS X, which looks like
the previous link but has one essential difference : the page is written
by Marc Liyanage. If you ever tried to get PHP and/or MySQL
running under OS X, you will have noticed that his pages are by far the
most reliable on the subject, so maybe he also has something
interesting to say on mod_ssl. However, the bottom line of the
document is not very promising :
You
should now be able to access the content with https://127.0.0.1 from
the same machine.
which is again the
localhost. So perhaps it is just impossible to run mod_ssl
without having an X-server. Anyway, let us try out his procedure.
Begin by issuing the following commands in the Terminal
sudo -s
cd /etc/httpd
mkdir ssl
chmod 700 ssl
cd ssl
gzip -c --best /var/log/system.log > random.dat
openssl rand -rand file:random.dat 0
Next, we need a server certificate. If you
want to do it properly you need a certificate from a certification
authority such as Thawte, but this costs at least $200 a year, which I
am not willing to pay. The alternative is to use a self-signed
certificate, which will force the browser to display an error message,
but if the user dismisses it, all traffic exchanged with the server will
still be encrypted, which is just what I want. So, type the command
openssl req -keyout privkey-2001.pem -newkey rsa:1024 -nodes -x509 -days 365 -out cert-2001.pem
(all on one line).
You will be asked a couple of questions; the only important one is the
Common Name (eg, YOUR name). Here you should take care to enter
the host name of your web server exactly as it will be used later in the
URL. In my test case, if I want my server to be
used by other computers in the network, this name will be
imaclieven.local. (note the trailing .). Now issue the following
commands
chmod 600 privkey-2001.pem
chown root privkey-2001.pem
apxs -e -a -n ssl /usr/libexec/httpd/libssl.so
which will activate the SSL module (if at a later stage you want
to deactivate it, you have to change -a to -A in the last command).
Finally, we have to change the /etc/httpd/httpd.conf file, so
first save a backup version and then add the following lines at the end
of the file :
<IfModule mod_ssl.c>
Listen 80
Listen 443
SSLCertificateFile /etc/httpd/ssl/cert-2001.pem
SSLCertificateKeyFile /etc/httpd/ssl/privkey-2001.pem
SSLRandomSeed startup builtin
SSLRandomSeed connect builtin
<VirtualHost _default_:443>
SSLEngine on
</VirtualHost>
</IfModule>
Finally, we do
apachectl stop
apachectl start
and we are done! Going to another computer
in the network and typing https://imaclieven.local./ in Safari
will result in an error message.
Just click Continue and you will have an encrypted connection
to the server. Thanks, Marc Liyanage!
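Clicking Continue means accepting a certificate nobody has vouched for. A script talking to such a server has to make that choice explicitly. A minimal Python sketch, assuming the server set up as above; make_context and fetch are hypothetical names. The cleaner option is to trust a copy of the self-signed cert-2001.pem itself, so it is still checked. The fallback switches verification off, which keeps the traffic encrypted but no longer checks who is on the other end. I have not tested whether a current TLS library still accepts a certificate made this way.

```python
import http.client
import ssl


def make_context(trusted_cert=None):
    """SSL context for a server with a self-signed certificate.

    With trusted_cert (a copy of the server's PEM certificate), that
    certificate is trusted explicitly and still verified. Without it,
    verification is switched off: traffic is encrypted, but the server's
    identity is not checked, which is what clicking Continue amounts to.
    """
    if trusted_cert is not None:
        return ssl.create_default_context(cafile=trusted_cert)
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(host, path="/", trusted_cert=None):
    """Request `path` over HTTPS and return the HTTP status code."""
    conn = http.client.HTTPSConnection(host, 443, context=make_context(trusted_cert))
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


# Usage (needs the server to be reachable):
# fetch("imaclieven.local", trusted_cert="cert-2001.pem")
```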
(Added January
11th) Whereas the above allows one to make an HTTPS connection, it is not
enough for my intended purposes. In order to get a secure connection to
a WebDAV server, this server must have the mod_auth_digest module
running, which seems to be impossible with the standard Apache server of
10.3. You need an X-server to have this facility. So I think I have to
scale down my ambitions a bit.