Extracted from Pike v7.3 release 14 at 2002-02-15.

Module Web.Crawler

Description

This module implements a generic web crawler.

Features:

- Fully asynchronous operation (several hundred simultaneous requests)
- Supports the /robots.txt exclusion standard
- Extensible:
    - URI queues
    - Allow/deny rules
- Configurable:
    - Number of concurrent fetchers
    - Bits per second (bandwidth throttling)
    - Number of concurrent fetchers per host
    - Delay between fetches from the same host
- Supports HTTP and HTTPS
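
To make three of the features listed above more concrete (allow/deny rules, the /robots.txt exclusion standard, and the delay between fetches from the same host), here is a minimal conceptual sketch. It is written in Python using only the standard library and is not the Web.Crawler API: the names `UrlRules`, `HostThrottle`, `may_fetch`, and the user agent `example-bot` are hypothetical, and the first-match-wins rule ordering is an assumption made for illustration.

```python
import fnmatch
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class UrlRules:
    """Ordered allow/deny glob rules; the first matching rule wins (assumed policy)."""

    def __init__(self, default_allow=False):
        self.rules = []                  # list of (glob_pattern, allow_flag)
        self.default_allow = default_allow

    def allow(self, pattern):
        self.rules.append((pattern, True))

    def deny(self, pattern):
        self.rules.append((pattern, False))

    def permits(self, url):
        for pattern, allowed in self.rules:
            if fnmatch.fnmatchcase(url, pattern):
                return allowed
        return self.default_allow


class HostThrottle:
    """Tracks the earliest time at which each host may be fetched again."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.next_ok = {}                # host -> earliest permitted fetch time

    def wait_time(self, url, now):
        host = urlsplit(url).netloc
        return max(0.0, self.next_ok.get(host, 0.0) - now)

    def record_fetch(self, url, now):
        self.next_ok[urlsplit(url).netloc] = now + self.delay


def may_fetch(url, rules, robots, agent="example-bot"):
    """A URL is fetched only if both the local rules and robots.txt permit it."""
    return rules.permits(url) and robots.can_fetch(agent, url)
```

In use, a crawler loop would call `may_fetch` before queueing a URL, then ask `HostThrottle.wait_time` how long to hold the request, and call `record_fetch` once the request has been issued. Passing the current time in explicitly keeps the throttle easy to test; a real crawler would typically supply a monotonic clock.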