Mercator: A Scalable, Extensible Web Crawler
This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not...
IRLbot: Scaling to 6 Billion Pages and Beyond
Abstract: This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the...
UbiCrawler: A Scalable Fully Distributed Web Crawler
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear...