WebBase Crawl Streaming Setup Page
More information, such as site counts, is available on our homepage: WebVac.org.
Click here for a list of crawls
To help us show our funders that this facility is useful, please tell us:
Your name (optional):
School or affiliation:
Constraints:
(You will choose which crawl to filter with these constraints on the next page.)
Download up to ___ pages (enter ALL for all of them)
Only Sitename: (e.g. www.google.com)
OR
Starting with Site: ...and Ending with Site:
Note: URL choices are not available for the 2001 and 2002 crawls.
removeJavaScript
removeHTML
Limit content to PageURL+...
Links
LinksnAnchorText
HTML tag (e.g. img):
Page Filter Terms (combine with And or Or):
Page Filter NOT Terms (applied after the And/Or match):
Notes:
JavaScript and HTML stripping happens after the keyword filters are applied.
To search for a phrase, enclose it in single quotes. For wildcard endings, append an * to the term, e.g. horse*.
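The filter notes can be read as a small set of matching rules. The following is a minimal Python sketch of those rules, not WebBase's actual implementation: the function names (`parse_terms`, `term_pattern`, `page_matches`, `process_page`) are hypothetical, and case-insensitive whole-word matching is an assumption. Only the And/Or choice, NOT terms applied after it, single-quoted phrases, trailing `*` wildcards, and filtering before JavaScript/HTML stripping come from this page.

```python
from __future__ import annotations

import re
from typing import Optional


def parse_terms(field: str) -> list[str]:
    """Split a filter field into terms; a single-quoted span stays one phrase."""
    return [phrase or word for phrase, word in re.findall(r"'([^']+)'|(\S+)", field)]


def term_pattern(term: str) -> re.Pattern:
    """Compile one term. A trailing * matches any word ending (e.g. horse*).

    Assumption: matching is case-insensitive and anchored on word edges.
    """
    wildcard = term.endswith("*")
    if wildcard:
        term = term[:-1]
    # Words inside a quoted phrase may be separated by any run of whitespace.
    body = r"\s+".join(re.escape(w) for w in term.split())
    suffix = r"\w*" if wildcard else r"(?!\w)"
    return re.compile(r"(?<!\w)" + body + suffix, re.IGNORECASE)


def page_matches(content: str, terms: str, mode: str = "and", not_terms: str = "") -> bool:
    """Apply the And/Or terms first, then reject the page on any NOT-term hit."""
    patterns = [term_pattern(t) for t in parse_terms(terms)]
    hits = [p.search(content) is not None for p in patterns]
    if patterns:
        keep = all(hits) if mode.lower() == "and" else any(hits)
    else:
        keep = True  # assumption: an empty term field lets every page through
    if not keep:
        return False
    return not any(term_pattern(t).search(content) for t in parse_terms(not_terms))


def process_page(raw: str, terms: str, mode: str = "and", not_terms: str = "",
                 remove_html: bool = False) -> Optional[str]:
    """Filter on the raw page, then strip markup (the order stated in the notes)."""
    if not page_matches(raw, terms, mode, not_terms):
        return None
    if remove_html:
        raw = re.sub(r"<[^>]*>", " ", raw)  # crude tag removal, illustration only
    return raw
```

Because the keyword filters run before stripping, a term can match text inside tags or scripts. In this sketch, for example, the term `b` matches the `<b>` tag in `"<p>The wild horses ran past the <b>horseshoe</b> bend.</p>"`, even though no word "b" remains once the HTML is removed.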
URL subtree(s) within the site chosen above (e.g. www.cisco.com/security when you have chosen Only Sitename www.cisco.com):
Please direct bugs, questions, feature requests, or comments to the SpiderMaster.