Page Rank Algorithm
summer project pal Active In SP Posts: 308 Joined: Jan 2011 
23012011, 09:36 PM
Page Rank Algorithm
B.Tech Seminar report by Warrier Syam Sankar Department of Computer Science And Engineering Government Engineering College, Thrissur December 2010 report: Page Rank Algorithm.pdf (Size: 130.14 KB / Downloads: 308) Contents 1 Introduction 1 1.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Organization Of the Report . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Background 3 2.1 Directed Web Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 Understanding the PageRank algorithm 4 4 Implementation 7 4.1 Dangling node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 4.2 Dangling node fix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.3 Damping factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5 Page Rank Calculation 10 5.1 Iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.2 Algebraic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6 Advantages and Disadvantages 12 6.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 7 Further Readings 14 8 Conclusion 15 References 16 Abstract The World Wide Web consists of more than 4 billion pages currently and is expand ing at a breathtaking pace. How does a search engine, which is fast becoming the centre of internet space, scan such a large number of different type of documents in time intervals as short as 0.07 seconds and still provide accurate results. PageRank algorithm is the secret to the success of several popular search engines like Google. PageRank algorithm is the backbone to understanding various advanced algorithms used by current search engines which use more sophisticated methods for providing better security, stability and nonmanuipulation of results. Chapter 1 Introduction There are approximately 1.9 billion Internet users everyday. Were would a user look upon when he needs to search for something on the internet? Search engines are the answer. Each user must be using search engine at least ones in a single Internet session or some maybe having search engine webpages as their home page. There are a number of search engines available on the Web like Google,Yahoo, MSN, etc. But the results provided by these Web engines are quite similar to each other. So its natural to ask what would be the searching algorithm used in a search engine which would be suitable for such a dynamic and expanding application such as the Web.? The answer is PageRank algorithm. 1.1 History The PageRank algorithm was developed at Stanford University by Larry Page. The project and implimentation initially began as a Web project and implimentation named BackRub. It utilized the link struc ture of the Web. Sergey Brin who visited the University for a recruitment drive got interested in Pages work and accompanied him on his project and implimentation. Soon they began re alizing that they were actually making the backbone of a strong search engine. They tried to convince the then existing search engine companies about the effectiveness of their algorithm. But finally unable to convince the other companies, they started a company named Google Inc. in September 1998 which used the PageRank algorithm as the basis for a search engine. Since then Google has grown to take about 40% of the total global searches. PageRank algorithm has also been developed to higher grades along with betterment on research works of Google. 1.2 Definition PageRank algorithm can be defined as a link based probabilistic algorithm which as signs a numerical value to the each element of a hyperlinket set of documents as in World Wide Web with a purpose of measuring the relative importance of each element in the set. Link based algorithms are algorithms are algorithms which depend on the rela tionship among objects in a set and not on the content of the elements within the set. Probabilistic algorithms are algorithms which return the most probable value of the result for the same set of inputs. PageRank algorithm has graph theory as its backend for theoretical formulation and matrix theory as its front end for its implementation. 1.3 Organization Of the Report 1. Chapter 2 introduces web surfer and directed web graph. 2. Chapter 3 delves into understanding the PageRank Algorithm. 3. Chapter 4 describes the implementation of Page Rank Algorithm. 4. Chapter 5 describes the Page Rank Calculation. 5. Chapter 6 briefs about the Advantages and disadvantages of Page Rank Algorithm. Chapter 2 Background PageRank algorithm models the behavior of a idealized random Web surfer. An idealized random Web surfer is a hypothetical user who chooses a webpage at random from the available set of webpages. Then he continues to select a random link on that webpage and surfs to that webpage. This process is repeated indefinitely. The choice of which webpage to visit next does not depend on the previous webpage. A random Web surfer does not get tired of visiting webpages. Thus the PageRank value of a webpage represents the probability that the reaches a given webpage after many clicks. 2.1 Directed Web Graph In order to model the behavior of a random Web surfer the PageRank algorithm models the entire World Wide Web to a graph. The modeling of the Web to graph is done as follows: • Each webpage is represented as a node in the graph • Each hyperlink form one webpage to other is represented by a directed edge from the node representing the prior webpage to the node representing the later webpage. The graph structure so obtained is called as the directed Web Graph. 


