
google spammers

In the GoogleMatrix post I tried to understand the concept
of the PageRank algorithm that Google uses to list pages according to
their 'importance'. So, if you want your webpage to come out first in
a certain search, you have to increase your PageRank value (which
normally is a measure of the webpages linking to your page) artificially. A
method to achieve this is link spamming: if page A is
the webpage whose PageRank value you want to increase, take a page
B (either under your control or that of a webmaster friend) and add a
dummy link page B -> page A. To find out the effect of this on the
PageRank, and how the second eigenvalue of the GoogleMatrix is able to
detect such constructs, let us set up a micro-web consisting of
just 3 pages with links 1->2 and 1->3. The corresponding GoogleMatrix
(with c=0.85 and v=(1/3,1/3,1/3)) is

$$\begin{pmatrix} 1/3 & 1/20 & 1/20 \\ 1/3 & 9/10 & 1/20 \\ 1/3 & 1/20 & 9/10 \end{pmatrix}$$

which has eigenvalues 1, 0.85 and 0.28.
The eigenvector with eigenvalue 1 (the PageRank) is equal to (0.15,1,1),
so page 2 and page 3 are equally important to Google, and if we scale
PageRank such that it adds up to 100% over all pages, the relative
importance values are 7.0%, 46.5% and 46.5%. In this case the eigenvector
corresponding to the second eigenvalue 0.85 is (0,-1,1) and hence
detects the two leaf-nodes. Now, assume the owner of page 2 sets up a
link spam by creating page 4 and linking 4->2. Then the corresponding
GoogleMatrix (with v=(1/4,1/4,1/4,1/4)) is

$$\begin{pmatrix} 77/240 & 3/80 & 3/80 & 3/80 \\ 77/240 & 71/80 & 3/80 & 37/80 \\ 77/240 & 3/80 & 71/80 & 3/80 \\ 3/80 & 3/80 & 3/80 & 37/80 \end{pmatrix}$$

which has eigenvalues
1, 0.85, 0.425 and 0.283. The PageRank eigenvector with eigenvalue 1 is
in this case (0.8,8.18,5.35,1), or in relative importance
(5.2%,53.4%,34.9%,6.5%), and we see that the spammer achieved his/her
goal. The eigenvector corresponding to the second eigenvalue is
(0,-1,1,0), which again gives the leaf-nodes, and the eigenvector of the
third eigenvalue is (0,-1,0,1), which has its nonzero entries on pages 2
and 4 and so detects the spam-construct.
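
For readers who want to check these numbers, here is a minimal Python sketch that rebuilds both matrices with exact fractions and tests the eigenvectors quoted above. It rests on one assumption that is inferred from the matrix entries rather than stated anywhere in the post: every page is treated as also linking to itself, and the spam page 4 links to page 2. The helper names (`google_matrix`, `matvec`, `is_eigenpair`, `pagerank_percent`) are made up for this illustration.

```python
from fractions import Fraction

C = Fraction(17, 20)  # damping factor c = 0.85, kept exact


def google_matrix(n, links, c=C):
    """Column-stochastic matrix c*P + (1-c)*v*1^T with uniform v = (1/n, ..., 1/n).

    Assumption (inferred from the matrix entries, not stated in the post):
    every page also links to itself.
    """
    out = {j: {j} for j in range(1, n + 1)}  # out[j]: pages that page j links to
    for src, dst in links:
        out[src].add(dst)
    teleport = (1 - c) * Fraction(1, n)
    return [
        [(c / len(out[j]) if i in out[j] else 0) + teleport
         for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


def matvec(G, x):
    """Matrix-vector product G x."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def is_eigenpair(G, lam, x):
    """Exact check that G x = lam * x."""
    return matvec(G, x) == [lam * xi for xi in x]


def pagerank_percent(G, steps=200):
    """PageRank by power iteration, scaled so the entries add up to 100%."""
    n = len(G)
    Gf = [[float(g) for g in row] for row in G]
    x = [1.0 / n] * n
    for _ in range(steps):
        x = matvec(Gf, x)
    total = sum(x)
    return [round(100 * xi / total, 1) for xi in x]


G3 = google_matrix(3, [(1, 2), (1, 3)])
G4 = google_matrix(4, [(1, 2), (1, 3), (4, 2)])  # spam page 4 links to page 2

print(pagerank_percent(G3))  # [7.0, 46.5, 46.5]
print(pagerank_percent(G4))  # [5.2, 53.4, 34.9, 6.5]
print(is_eigenpair(G4, C, [0, -1, 1, 0]))      # True: the two leaf-nodes
print(is_eigenpair(G4, C / 2, [0, -1, 0, 1]))  # True: the spam-construct
```

Working with `Fraction` keeps the eigenvector checks exact, so entries such as 77/240 and 37/80 can be compared directly. Only the power iteration falls back to floating point.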

Published in web

