public class DemoPostCrawler extends BreadthCrawler
Inherited fields: autoParse, LOG, parseImg, regexRule, requester, visitor, dbManager, executeInterval, executor, fetcher, forcedSeeds, maxExecuteCount, nextFilter, resumable, RUNNING, seeds, status, STOPED, threads, topN

| Constructor and Description |
|---|
| DemoPostCrawler(String crawlPath, boolean autoParse) Suppose we want to crawl three links: 1) http://www.A.com/index.php, which requires POST and must carry the data id=a; 2) http://www.B.com/index.php? |

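The constructor description mentions a link that must be fetched with POST and carry the data id=a. As a rough illustration only, the sketch below builds such a request with the JDK's own `java.net.http` API (Java 11+) rather than WebCollector's request classes. `PostRequestSketch` and `buildRequest` are hypothetical names, and the form-encoded content type is an assumption; this page does not show how DemoPostCrawler itself attaches the method and data.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Sketch only: uses the JDK's java.net.http API, not WebCollector classes.
class PostRequestSketch {

    /**
     * Builds (but does not send) a request for the given URL.
     * A non-null formData yields a form-encoded POST; null yields a plain GET.
     */
    static HttpRequest buildRequest(String url, String formData) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url));
        if (formData != null) {
            return builder
                    // Assumption: the server expects URL-encoded form data.
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(BodyPublishers.ofString(formData))
                    .build();
        }
        return builder.GET().build();
    }

    public static void main(String[] args) {
        // The first link from the description: POST with the data id=a.
        HttpRequest post = buildRequest("http://www.A.com/index.php", "id=a");
        System.out.println(post.method() + " " + post.uri());
    }
}
```

Keeping the decision between GET and POST in one helper mirrors the idea of a single request hook such as getResponse(CrawlDatum crawlDatum), where each link can be given its own method and data before the request is sent.
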
| Modifier and Type | Method and Description |
|---|---|
| HttpResponse | getResponse(CrawlDatum crawlDatum) |
| static void | main(String[] args) |
| void | visit(Page page, CrawlDatums next) |

Inherited methods: addRegex, afterParse, execute, getRegexRule, getRequester, getVisitor, isAutoParse, isParseImg, parseLink, setAutoParse, setParseImg, setRegexRule, setRequester, setVisitor, addSeed (overloaded), getDBManager, getExecuteInterval, getExecutor, getMaxExecuteCount, getNextFilter, getThreads, getTopN, inject, injectForcedSeeds, isResumable, setDBManager, setExecuteInterval, setExecutor, setMaxExecuteCount, setNextFilter, setResumable, setThreads, setTopN, start, stop, toString

public DemoPostCrawler(String crawlPath, boolean autoParse)

public HttpResponse getResponse(CrawlDatum crawlDatum) throws Exception
Specified by: getResponse in interface Requester
Overrides: getResponse in class AutoParseCrawler
Throws: Exception

public void visit(Page page, CrawlDatums next)

Copyright © 2017. All Rights Reserved.