Channel: Systems We Make » crawler | Systems We Make

Mercator: A Scalable, Extensible Web Crawler

This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well-documented in the literature. We enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be comparable to that of other crawlers for which performance numbers have been published.
