on January 18, 2004 by lieven in general

google spammers


In the GoogleMatrix I tried to understand the concept of the PageRank algorithm that Google uses to list pages according to their 'importance'. So, if you want your webpage to come out first in a certain search, you have to increase your PageRank value (which normally is a measure of the webpages linking to your page) artificially. One method to achieve this is link spamming: if page A is the webpage whose PageRank value you want to increase, take a page B (either under your control or that of a friendly webmaster) and add a dummy link page B -> page A. To find out the effect of this on the PageRank, and how the second eigenvalue of the GoogleMatrix is able to detect such constructs, let us set up a micro-web consisting of just 3 pages with links 1->2 and 1->3. The entries of the matrix below are consistent with every page also linking to itself, so page 1 spreads its weight equally over pages 1, 2 and 3, while pages 2 and 3 keep all of theirs. The corresponding GoogleMatrix (with c=0.85 and v=(1/3,1/3,1/3)) is

$$\begin{pmatrix} 1/3 & 1/20 & 1/20 \\ 1/3 & 9/10 & 1/20 \\ 1/3 & 1/20 & 9/10 \end{pmatrix}$$
which has eigenvalues 1, 0.85 and 0.28. The eigenvector with eigenvalue 1 (the PageRank) is equal to (0.15,1,1), so page 2 and page 3 are equally important to Google, and if we scale PageRank such that it adds up to 100% over all pages, the relative importance values are 7.0%, 46.5% and 46.5%. In this case the eigenvector corresponding to the second eigenvalue 0.85 is (0,-1,1) and hence detects the two leaf-nodes. Now, assume the owner of page 2 sets up a link spam by creating page 4 and linking 4->2, then the corresponding GoogleMatrix (with v=(1/4,1/4,1/4,1/4)) is
$$\begin{pmatrix} 77/240 & 3/80 & 3/80 & 3/80 \\ 77/240 & 71/80 & 3/80 & 37/80 \\ 77/240 & 3/80 & 71/80 & 3/80 \\ 3/80 & 3/80 & 3/80 & 37/80 \end{pmatrix}$$
which has eigenvalues 1, 0.85, 0.425 and 0.283. The PageRank eigenvector with eigenvalue 1 is in this case (0.8,8.18,5.35,1), or in relative importance (5.2%, 53.4%, 34.9%, 6.5%), and we see that the spammer achieved his/her goal: page 2 now outranks page 3. The eigenvector corresponding to the second eigenvalue is (0,-1,1,0), which again gives the leaf-nodes, and the eigenvector of the third eigenvalue is (0,-1,0,1), which detects the spam-construct.
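
For readers who want to check these numbers, here is a minimal NumPy sketch that rebuilds both matrices and computes their eigenvalues and normalised PageRank vectors. It rests on two assumptions read off from the matrix entries rather than stated in the post: every page is treated as also linking to itself, and the spam link is 4->2 (page 4 feeding page 2), which is what the fourth column of the 4x4 matrix encodes. The helper names `google_matrix`, `pagerank` and `sorted_eigenvalues` are made up for this illustration.

```python
import numpy as np

def google_matrix(links, n, c=0.85):
    """Column-stochastic matrix c * P^T + (1 - c) * v * 1^T with uniform v.

    Assumption (inferred from the matrices in the post): every page also
    links to itself and splits its weight equally over all of its links.
    """
    adj = np.eye(n)                              # assumed self-links
    for src, dst in links:                       # pages are numbered from 1
        adj[src - 1, dst - 1] = 1.0
    P = adj / adj.sum(axis=1, keepdims=True)     # row-stochastic link matrix
    v = np.full(n, 1.0 / n)                      # uniform teleport vector
    return c * P.T + (1 - c) * np.outer(v, np.ones(n))

def pagerank(A):
    """Eigenvector for the eigenvalue closest to 1, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmin(np.abs(vals - 1.0))
    x = np.real(vecs[:, k])
    return x / x.sum()                           # also fixes the overall sign

def sorted_eigenvalues(A):
    """Eigenvalues in decreasing order (they are all real for these webs)."""
    return np.sort(np.real(np.linalg.eigvals(A)))[::-1]

A3 = google_matrix([(1, 2), (1, 3)], n=3)            # original micro-web
A4 = google_matrix([(1, 2), (1, 3), (4, 2)], n=4)    # with the spam page 4

for name, A in (("3 pages", A3), ("4 pages", A4)):
    print(name)
    print("  eigenvalues:", np.round(sorted_eigenvalues(A), 3))
    print("  PageRank %: ", np.round(100 * pagerank(A), 1))
```

Under these assumptions the sketch gives relative importances of roughly 7.0%, 46.5% and 46.5% for the three-page web, and roughly 5.2%, 53.4%, 34.9% and 6.5% for the four-page web, which is the vector (0.8,8.18,5.35,1) divided by its sum 15.33.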
