on March 31, 2004 by lieven in general

robots.txt

I just finished the formal lecture part of the course Projects in non-commutative geometry (btw. I am completely exhausted after this afternoon's session, but hopeful that some students may actually do something with my crazy ideas). Springtime seems to have arrived and the Easter vacation starts next week, so it may be time to have some fun, like making a new webpage (yes, again…). At the moment the main matrix.ua.ac.be page is not really up to standard, and Raf and Hans will soon be using it for information about the Liegrits project (at the moment they just have a beautiful logo).

My aim is to make the main page the starting page of the geoMetry site (guess what M stands for?), on which I want to collect as much information as possible on non-commutative geometry. To get at that information I plan to set some spiders, bots or scrapers loose on the web (this is just an excuse to force myself to learn Perl). But it seems one has to follow strict ethical guidelines in doing so.

One of the first sites I want to spider is clearly the arXiv, but they have a scary Robots Beware page! I don't know whether their robots.txt file will allow me to get at any of their goodies. In a robots.txt file the webmaster lists the directories (more precisely, URL path prefixes) on his/her site that are off limits to robots, and as I don't want to do anything that might make the arXiv unavailable to me (or, even worse, to the whole department), I had better follow these guidelines. The first site on my list to study tomorrow will be The Web Robots Pages.
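To make the robots.txt mechanism a bit more concrete, here is a minimal sketch of the check a polite spider would do before fetching a page. It is written in Python rather than Perl, using the standard library's `urllib.robotparser`. The robots.txt contents, the host `example.org` and the agent name `geoMetryBot/0.1` are all made-up assumptions for illustration, not the arXiv's actual rules.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for a hypothetical host; this is NOT the arXiv's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /private/
"""

AGENT = "geoMetryBot/0.1"  # hypothetical user-agent name for the spider


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that is already in hand.

    Against a live site one would instead call
    rp.set_url("http://host/robots.txt") followed by rp.read().
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp: RobotFileParser, url: str, agent: str = AGENT) -> bool:
    """Return True if the parsed robots.txt lets `agent` fetch `url`."""
    return rp.can_fetch(agent, url)


if __name__ == "__main__":
    rp = make_parser(SAMPLE_ROBOTS_TXT)
    for url in ("http://example.org/papers/list.html",
                "http://example.org/private/notes.html",
                "http://example.org/cgi-bin/search"):
        print(url, "->", "ok" if allowed(rp, url) else "off limits")
    # Crawl-delay is a widely used but non-standard extension of robots.txt.
    print("crawl delay:", rp.crawl_delay(AGENT))
    # The "BadBot" record shuts that agent out of the whole site.
    print("BadBot allowed?", allowed(rp, "http://example.org/papers/", "BadBot"))
```

Feeding `parse()` a string keeps the sketch runnable offline; a real spider would read the site's own robots.txt with `set_url()` and `read()` and then respect whatever `can_fetch()` and `crawl_delay()` report before every request.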
