Abstract

Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web's hyperlink structure. This structure has been exploited by several of today's leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.

MSC codes

  1. 65F15
  2. 65F10
  3. 68P20
  4. 15A18
  5. 15A51
  6. 65C40

MSC codes

  1. eigenvector
  2. Markov chain
  3. information retrieval
  4. HITS
  5. PageRank
  6. SALSA

Formats available

You can view the full content in the following formats:

References

1.
A. Arasu, J. Novak, A. Tomkins, and J. Tomlin, PageRank computation and the structure of the Web: Experiments and algorithms, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
2.
A.‐L. Barabasi, Linked: The New Science of Networks, Perseus, New York, 2003.
3.
Michael Berry, Zlatko Drmač, Elizabeth Jessup, Matrices, vector spaces, and information retrieval, SIAM Rev., 41 (1999), 335–362
4.
M. W. Berry, P. Wang, and Y. Yang, Mining longitudinal Web queries: Trends and patterns, J. Amer. Soc. Inform. Sci. Tech., 54 (2003), pp. 743–758.
5.
K. Bharat and M. R. Henzinger, Improved algorithms for topic distillation in hyperlinked environments, in Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1998, pp. 104–111.
6.
K. Bharat, F. Maghoul, and R. Stata, The term vector database: Fast access to indexing terms for webpages, Computer Networks, 33 (2000), pp. 247–255.
7.
K. Bharat and G. A. Mihaila, When experts agree: Using non‐affiliated experts to rank popular topics, ACM Trans. Inform. Systems, 20 (2002), pp. 47–58.
8.
M. Bianchini, M. Gori, and F. Scarselli, PageRank: A circuital analysis, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
9.
M. Bianchini, M. Gori, and F. Scarselli, Inside PageRank, ACM Trans. Internet Tech., 5 (2005), to appear.
10.
N. Blachman, E. Fredricksen, and F. Schneider, How to Do Everything with Google, McGraw‐Hill, New York, 2003.
11.
P. Boldi, B. Codenotti, M. Santini, and S. Vigna, Structural properties of the African web, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
12.
A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas, Finding authorities and hubs from link structures on the World Wide Web, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002, pp. 415–429.
13.
S. Brin and L. Page, The anatomy of a large‐scale hypertextual Web search engine, Computer Networks and ISDN Systems, 33 (1998), pp. 107–117.
14.
S. Brin, L. Page, R. Motwami, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report 1999‐0120, Computer Science Department, Stanford University, Stanford, CA, 1999.
15.
A. Broder, R. Kumar, and M. Maghoul, Graph structure in the Web, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000, pp. 309–320.
16.
S. Chien, C. Dwork, R. Kumar, and D. Sivakumar, Towards exploiting link evolution, in Proceedings of the Workshop on Algorithms and Models for the Web Graph, 2001.
17.
H. Choi and D. B. Szyld, Application of threshold partitioning of sparse matrices to Markov chains, in Proceedings of the IEEE International Computer Performance and Dependability Symposium IPDS’96, 1996, pp. 158–165.
18.
D. Cohn and H. Chang, Learning to probabilistically identify authoritative documents, in Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, 2000, pp. 167–174.
19.
D. Cohn, H. Chang, and A. McCallum, Creating customized authority lists, in Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, 2000.
20.
D. Cohn and T. Hofmann, The missing link: A probabilistic model of document content and hyperlink connectivity, in Adv. in Neural Inform. Process. Systems 13, 2001.
21.
Tuğrul Dayar, William Stewart, Comparison of partitioning techniques for two‐level iterative solvers on large, sparse Markov chains, SIAM J. Sci. Comput., 21 (2000), 1691–1705, Iterative methods for solving systems of algebraic equations (Copper Mountain, CO, 1998)
22.
C. Ding, X. He, P. Husbands, H. Zha, and H. Simon, Link Analysis: Hubs and Authorities on the World Wide Web, Technical Report 47847, Lawrence Berkeley National Laboratory, Berkeley, CA, 2001.
23.
C. Ding, X. He, H. Zha, and H. Simon, PageRank, HITS and a unified framework for link analysis, in Proceedings of the 25th ACM SIGIR Conference, Tampere, Finland, 2002, pp. 353–354.
24.
S. T. Dumais, Improving the retrieval of information from external sources, Behavior Res. Methods Instruments and Computers, 23 (1991), pp. 229–236.
25.
M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power‐law relationships of the internet topology, in Proceedings of SIGCOMM ’99, 1999, pp. 251–262.
26.
A. Farahat, T. Lofaro, J. C. Miller, G. Rae, F. Schaefer, and L. A. Ward, Modifications of Kleinberg’s HITS algorithm using matrix exponentiation and web log records, in Proceedings of the ACM SIGIR Conference, 2001, pp. 444–445.
27.
A. Farahat, T. Lofaro, J. C. Miller, G. Rae, and L. A. Ward, Existence and uniqueness of ranking vectors for linear link analysis, SIAM J. Sci. Comput., submitted.
28.
T. H. Haveliwala, Efficient Computation of PageRank, Technical Report 1999‐31, Computer Science Department, Stanford University, Stanford, CA, 1999.
29.
T. H. Haveliwala, Topic‐sensitive PageRank, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
30.
T. H. Haveliwala and S. D. Kamvar, The Second Eigenvalue of the Google Matrix, Technical Report 2003‐20, Stanford University, Stanford, CA, 2003.
31.
M. R. Henzinger, H. Marais, M. Moricz, and C. Silverstein, Analysis of a Very Large AltaVista Query Log, Technical Report 1998‐014, Digital Systems Research Center, 1998.
32.
IBM, The Clever Project, IBM Almaden Research Center; available online at http://www.almaden.ibm.com/cs/k53/clever.html (accessed October 11, 2004).
33.
Elizabeth Jessup, Ilse Ipsen, Improving the accuracy of inverse iteration, SIAM J. Sci. Statist. Comput., 13 (1992), 550–572
34.
S. D. Kamvar, T. H. Haveliwala, and G. H. Golub, Adaptive Methods for the Computation of PageRank, Technical Report 2003‐26, Stanford University, Stanford, CA, 2003.
35.
S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, Exploiting the Block Structure of the Web for Computing PageRank, Technical Report 2003‐17, Stanford University, Stanford, CA, 2003.
36.
S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, Extrapolation methods for accelerating PageRank computations, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
37.
Jon Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), 604–632
38.
A. Langville and C. Meyer, A reordering for the PageRank problem, SIAM J. Sci. Comput., submitted.
39.
A. N. Langville and C. D. Meyer, Updating the Stationary Vector of an Irreducible Markov Chain, Technical Report crsc02‐tr33, Mathematics Department, North Carolina State University, Raleigh, NC, 2002.
40.
A. N. Langville and C. D. Meyer, Deeper inside PageRank, Internet Math. J., to appear.
41.
C. P.‐C. Lee, G. H. Golub, and S. A. Zenios, A Fast Two‐Stage Algorithm for Computing PageRank and Its Extensions, Technical Report SCCM‐2003‐15, Scientific Computation and Computational Mathematics, Stanford University, Stanford, CA, 2003.
42.
R. Lempel and S. Moran, The stochastic approach for link‐structure analysis (SALSA) and the TKC effect, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000.
43.
M. Marchiori, The quest for correct information of the web: Hyper search engines, in Proceedings of the Sixth International World Wide Web Conference, ACM, New York, 1997.
44.
A. O. Mendelzon and D. Rafiei, Topic: Measuring Webpage Reputation; available online at http:// www.cs.toronto.edu/db/topic/about.html (accessed September 19, 2002).
45.
A. O. Mendelzon and D. Rafiei, An autonomous page ranking method for metasearch engines, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
46.
Carl Meyer, Matrix analysis and applied linear algebra, Society for Industrial and Applied Mathematics (SIAM), 2000xii+718, With 1 CD‐ROM (Windows, Macintosh and UNIX) and a solutions manual (iv+171 pp.)
47.
V. Migallon, J. Penades, and D. B. Szyld, Experimental study of parallel iterative solutions of Markov chains with block partitions, in Numerical Solutions of Markov Chains (NSMC’99), B. Plateau, W. J. Stewart, and M. Silva, eds., Prensas Universitarias de Zaragoza, 1999, pp. 96–110.
48.
C. Moler, The world’s largest matrix computation, Matlab News and Notes, October 2002, pp. 12–13.
49.
A. Y. Ng, A. X. Zheng, and M. I. Jordan, Link analysis, eigenvectors and stability, in Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 2001.
50.
A. Y. Ng, A. X. Zheng, and M. I. Jordan, Stable algorithms for link analysis, in Proceedings of the 24th Annual International ACM SIGIR Conference, 2001.
51.
Gopal Pandurangan, Prabhakar Raghavan, Eli Upfal, Using PageRank to characterize Web structure, Lecture Notes in Comput. Sci., Vol. 2387, Springer, Berlin, 2002, 330–339
52.
J. U. Quevedo, A. G. Covarrubias, and S. H. Huang, Improving retrieval by querying and examining prestige, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
53.
D. Rafiei and A. O. Mendelzon, What is this page known for? Computing webpage reputations, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000, pp. 823–835.
54.
M. Richardson and P. Domingos, The intelligent surfer: Probabilistic combination of link and content information in PageRank, in Adv. in Neural Inform. Process. Systems 14, 2002, pp. 1441–1448.
55.
C. Ridings, PageRank Explained: Everything You’ve Always Wanted to Know about PageRank; available online at http://www.rankwrite.com/ (accessed May 22, 2002).
56.
M. Sobek, Google Dance: The Index Update of the Google Search Engine, eFactory: Internet‐Agentur KG, http://dance.efactory.de/ (accessed October 11, 2004).
57.
William Stewart, Introduction to the numerical solution of Markov chains, Princeton University Press, 1994xx+539
58.
W. J. Stewart and W. Wu, Numerical experiments with iteration and aggregation for Markov chains, ORSA J. Comput., 4 (1992), pp. 336–350.
59.
J. A. Tomlin, A new paradigm for ranking pages on the World Wide Web, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
60.
M. Totty and M. Mangalindan, As Google becomes Web’s gatekeeper, sites fight to get in, Wall Street Journal, February 26, 2003.
61.
A. C. Tsoi, G. Morini, F. Scarselli, and M. Hagenbuchner, Adaptive ranking of webpages, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
62.
D. Zhang and Y. Dong, An efficient algorithm to rank web resources, Computer Networks, 33 (2000), pp. 449–455.
63.
X. Zhang, M. W. Berry, and P. Raghavan, Level search schemes for information filtering and retrieval, Inform. Process. Management, 37 (2001), pp. 313–334.

Information & Authors

Information

Published In

cover image SIAM Review
SIAM Review
Pages: 135 - 161
ISSN (online): 1095-7200

History

Published online: 4 August 2006

MSC codes

  1. 65F15
  2. 65F10
  3. 68P20
  4. 15A18
  5. 15A51
  6. 65C40

MSC codes

  1. eigenvector
  2. Markov chain
  3. information retrieval
  4. HITS
  5. PageRank
  6. SALSA

Authors

Affiliations

Metrics & Citations

Metrics

Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

View Options

View options

PDF

View PDF

Get Access

Media

Figures

Other

Tables

Share

Share

Copy the content Link

Share with email

Email a colleague

Share on social media