| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `Generator.next()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Injector.inject(CrawlDatum datum)` |
| `void` | `DBManager.inject(CrawlDatum datum)` |
| `abstract void` | `DBManager.inject(CrawlDatum datum, boolean force)` |
| `void` | `SegmentWriter.writeFetchSegment(CrawlDatum fetchDatum)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Crawler.addSeed(CrawlDatum datum)`<br>Equivalent to `addSeed(datum, false)`. |
| `void` | `Crawler.addSeed(CrawlDatum datum, boolean force)`<br>Adds a seed task. |
| `void` | `AutoParseCrawler.execute(CrawlDatum datum, CrawlDatums next)` |
| `HttpResponse` | `AutoParseCrawler.getResponse(CrawlDatum crawlDatum)` |
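
The relationship between the two `addSeed` overloads can be illustrated with a small self-contained sketch. `SeedStoreSketch`, `markFetched`, `status`, and the status strings are hypothetical names used only for illustration; they are not part of the library. The meaning of `force` (overwrite an existing task) is an assumption, since the table only states that the one-argument form is equivalent to passing `false`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a crawler's seed store; not the library's Crawler class.
public class SeedStoreSketch {
    // Maps a task key (here simply the URL) to a status string.
    private final Map<String, String> tasks = new LinkedHashMap<>();

    // Mirrors addSeed(datum): same as the two-argument form with force = false.
    public void addSeed(String url) {
        addSeed(url, false);
    }

    // ASSUMPTION: force = true overwrites an existing task, force = false keeps it.
    public void addSeed(String url, boolean force) {
        if (force || !tasks.containsKey(url)) {
            tasks.put(url, "unfetched");
        }
    }

    // Hypothetical helper that simulates a completed fetch.
    public void markFetched(String url) {
        tasks.put(url, "fetched");
    }

    public String status(String url) {
        return tasks.get(url);
    }

    public static void main(String[] args) {
        SeedStoreSketch store = new SeedStoreSketch();
        store.addSeed("http://a.com/");
        store.markFetched("http://a.com/");

        store.addSeed("http://a.com/");        // existing task is kept
        System.out.println(store.status("http://a.com/"));

        store.addSeed("http://a.com/", true);  // existing task is reset
        System.out.println(store.status("http://a.com/"));
    }
}
```

Under these assumptions, re-adding a seed without `force` leaves an already fetched task untouched, while `force = true` resets it.
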

| Modifier and Type | Method and Description |
|---|---|
| `HttpResponse` | `DemoPostCrawler.getResponse(CrawlDatum crawlDatum)` |

| Modifier and Type | Field and Description |
|---|---|
| `CrawlDatum` | `Fetcher.FetchItem.datum` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `NextFilter.filter(CrawlDatum nextItem, CrawlDatum referer)`<br>If the crawler visits http://a.com/ and detects the link http://a.com/b.html, then `nextItem` = http://a.com/b.html and `referer` = http://a.com/. Return `null` to filter out `nextItem`; otherwise return `nextItem`. |
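
The contract described for `filter(nextItem, referer)` (return `null` to discard a detected link, return `nextItem` to keep it) can be sketched without the library on the classpath. `DemoDatum`, `DemoNextFilter`, and `SameHostFilter` are hypothetical stand-ins that only mirror the signature listed above, and the same-host rule is just one possible filtering policy.

```java
import java.net.URI;

// Hypothetical stand-in for CrawlDatum: it only carries a URL.
class DemoDatum {
    private final String url;

    DemoDatum(String url) {
        this.url = url;
    }

    String url() {
        return url;
    }
}

// Stand-in interface mirroring the filter(nextItem, referer) signature.
interface DemoNextFilter {
    DemoDatum filter(DemoDatum nextItem, DemoDatum referer);
}

// Keeps a detected link only when it is on the same host as the referring page.
class SameHostFilter implements DemoNextFilter {
    @Override
    public DemoDatum filter(DemoDatum nextItem, DemoDatum referer) {
        String nextHost = URI.create(nextItem.url()).getHost();
        String refererHost = URI.create(referer.url()).getHost();
        boolean sameHost = nextHost != null && nextHost.equalsIgnoreCase(refererHost);
        return sameHost ? nextItem : null; // null discards the link
    }
}

public class NextFilterSketch {
    public static void main(String[] args) {
        DemoNextFilter filter = new SameHostFilter();
        DemoDatum referer = new DemoDatum("http://a.com/");

        DemoDatum kept = filter.filter(new DemoDatum("http://a.com/b.html"), referer);
        DemoDatum dropped = filter.filter(new DemoDatum("http://other.example/c.html"), referer);

        System.out.println(kept == null ? "filtered out" : kept.url());
        System.out.println(dropped == null ? "filtered out" : dropped.url());
    }
}
```

A real implementation would implement the library's `NextFilter` and operate on `CrawlDatum` objects instead of these stand-ins.
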

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Executor.execute(CrawlDatum datum, CrawlDatums next)` |
| `CrawlDatum` | `NextFilter.filter(CrawlDatum nextItem, CrawlDatum referer)`<br>If the crawler visits http://a.com/ and detects the link http://a.com/b.html, then `nextItem` = http://a.com/b.html and `referer` = http://a.com/. Return `null` to filter out `nextItem`; otherwise return `nextItem`. |

| Constructor and Description |
|---|
| `FetchItem(CrawlDatum datum)` |

| Modifier and Type | Field and Description |
|---|---|
| `protected ArrayList<CrawlDatum>` | `CrawlDatums.dataList` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `Page.crawlDatum()` |
| `CrawlDatum` | `CrawlDatums.get(int index)` |
| `CrawlDatum` | `Page.getCrawlDatum()`<br>Deprecated. |
| `CrawlDatum` | `CrawlDatum.key(String key)` |
| `CrawlDatum` | `CrawlDatum.meta(String key, String value)` |
| `CrawlDatum` | `CrawlDatum.putMetaData(String key, String value)`<br>Deprecated. |
| `CrawlDatum` | `CrawlDatums.remove(int index)` |
| `CrawlDatum` | `CrawlDatum.setKey(String key)`<br>Deprecated. Use `key(String key)` instead. |
| `CrawlDatum` | `CrawlDatum.setUrl(String url)`<br>Deprecated. Use `url(String url)` instead. |
| `CrawlDatum` | `CrawlDatum.type(String type)` |
| `CrawlDatum` | `CrawlDatum.url(String url)` |
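
Several of the `CrawlDatum` setters listed above return `CrawlDatum`, which suggests a chaining (fluent) style, with the deprecated `setKey`/`setUrl` replaced by `key`/`url`. The sketch below illustrates that pattern with a hypothetical stand-in class, `FluentDatum`. The no-argument getters and the delegation inside the deprecated method are assumptions made for illustration, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that imitates the chaining style; not the library class.
class FluentDatum {
    private String url;
    private String key;
    private final Map<String, String> meta = new HashMap<>();

    // Each setter returns this so calls can be chained.
    FluentDatum url(String url) {
        this.url = url;
        return this;
    }

    FluentDatum key(String key) {
        this.key = key;
        return this;
    }

    FluentDatum meta(String name, String value) {
        meta.put(name, value);
        return this;
    }

    // ASSUMPTION: the deprecated setter simply delegates to its replacement.
    @Deprecated
    FluentDatum setUrl(String url) {
        return url(url);
    }

    // Getter names below are assumptions for this sketch.
    String url() {
        return url;
    }

    String key() {
        return key;
    }

    String meta(String name) {
        return meta.get(name);
    }
}

public class FluentDatumSketch {
    public static void main(String[] args) {
        FluentDatum datum = new FluentDatum()
                .url("http://a.com/b.html")
                .key("b-page")
                .meta("referer", "http://a.com/");

        System.out.println(datum.url());
        System.out.println(datum.key());
        System.out.println(datum.meta("referer"));
    }
}
```

Returning `this` from each setter is what lets a task be configured in a single expression.
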

| Modifier and Type | Method and Description |
|---|---|
| `Iterator<CrawlDatum>` | `CrawlDatums.iterator()` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatums` | `CrawlDatums.add(CrawlDatum datum)` |
| `void` | `Page.crawlDatum(CrawlDatum crawlDatum)` |
| `int` | `CrawlDatums.indexOf(CrawlDatum datum)` |
| `boolean` | `CrawlDatums.remove(CrawlDatum datum)` |
| `void` | `Page.setCrawlDatum(CrawlDatum crawlDatum)`<br>Deprecated. |

| Constructor and Description |
|---|
| `Page(CrawlDatum datum, HttpResponse response)` |

| Constructor and Description |
|---|
| `CrawlDatums(Collection<CrawlDatum> datums)` |
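
Taken together, the `CrawlDatums` members listed above (a `dataList` field of type `ArrayList`, a chainable `add`, `get`, `indexOf`, `remove`, `iterator()`, and a constructor taking a `Collection`) describe a thin list wrapper. A minimal sketch of such a wrapper follows. `DatumList` is a hypothetical name, and plain `String` URLs stand in for `CrawlDatum` elements to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

// Hypothetical list wrapper; String elements stand in for CrawlDatum.
class DatumList implements Iterable<String> {
    protected final ArrayList<String> dataList = new ArrayList<>();

    DatumList() {
    }

    // Copies the given elements, like a constructor taking a Collection.
    DatumList(Collection<String> datums) {
        dataList.addAll(datums);
    }

    // Returns this so that several add calls can be chained.
    DatumList add(String datum) {
        dataList.add(datum);
        return this;
    }

    String get(int index) {
        return dataList.get(index);
    }

    int indexOf(String datum) {
        return dataList.indexOf(datum);
    }

    boolean remove(String datum) {
        return dataList.remove(datum);
    }

    @Override
    public Iterator<String> iterator() {
        return dataList.iterator();
    }
}

public class DatumListSketch {
    public static void main(String[] args) {
        DatumList next = new DatumList(Arrays.asList("http://a.com/"))
                .add("http://a.com/b.html")
                .add("http://a.com/c.html");

        // Implementing Iterable allows the enhanced for loop.
        for (String url : next) {
            System.out.println(url);
        }
    }
}
```

Exposing `iterator()` through `Iterable` is what allows such a collection to be used directly in a `for` loop.
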

| Modifier and Type | Field and Description |
|---|---|
| `protected CrawlDatum` | `HttpRequest.crawlDatum` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `HttpRequest.getCrawlDatum()` |

| Modifier and Type | Method and Description |
|---|---|
| `HttpResponse` | `Requester.getResponse(CrawlDatum crawlDatum)` |
| `void` | `HttpRequest.setCrawlDatum(CrawlDatum crawlDatum)` |

| Constructor and Description |
|---|
| `HttpRequest(CrawlDatum crawlDatum)` |
| `HttpRequest(CrawlDatum crawlDatum, Proxy proxy)` |

| Modifier and Type | Method and Description |
|---|---|
| `static CrawlDatum` | `BerkeleyDBUtils.createCrawlDatum(com.sleepycat.je.DatabaseEntry key, com.sleepycat.je.DatabaseEntry value)` |
| `CrawlDatum` | `BerkeleyGenerator.next()` |
| `CrawlDatum` | `BerkeleyDBReader.next()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `BerkeleyDBManager.inject(CrawlDatum datum, boolean force)` |
| `static void` | `BerkeleyDBUtils.writeDatum(com.sleepycat.je.Database database, CrawlDatum datum)` |
| `void` | `BerkeleyDBManager.writeFetchSegment(CrawlDatum fetchDatum)` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `HashSetNextFilter.filter(CrawlDatum nextItem, CrawlDatum referer)` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `HashSetNextFilter.filter(CrawlDatum nextItem, CrawlDatum referer)` |

| Modifier and Type | Field and Description |
|---|---|
| `protected HashMap<String,CrawlDatum>` | `RamDB.crawlDB` |
| `protected HashMap<String,CrawlDatum>` | `RamDB.fetchDB` |
| `protected HashMap<String,CrawlDatum>` | `RamDB.linkDB` |

| Modifier and Type | Method and Description |
|---|---|
| `CrawlDatum` | `RamGenerator.next()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `RamDBManager.inject(CrawlDatum datum, boolean force)` |
| `void` | `RamDBManager.writeFetchSegment(CrawlDatum fetchDatum)` |

| Modifier and Type | Method and Description |
|---|---|
| `static CrawlDatum` | `CrawlDatumFormater.jsonStrToDatum(String crawlDatumKey, String str)` |

| Modifier and Type | Method and Description |
|---|---|
| `static String` | `CrawlDatumFormater.datumToJsonStr(CrawlDatum datum)` |
| `static String` | `CrawlDatumFormater.datumToString(CrawlDatum datum)` |

Copyright © 2017. All Rights Reserved.