A B C D E F G H I J K L M N O P R S T U V 

A

addPageModel(PageModelPipeline, Class...) - Method in class us.codecraft.webmagic.model.OOSpider
 
addSubPageProcessor(SubPageProcessor) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
addSubPipeline(SubPipeline) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
AfterExtractor - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models that need to do something after fields are extracted.
afterProcess(Page) - Method in interface us.codecraft.webmagic.model.AfterExtractor
 
AppStore - Class in us.codecraft.webmagic.example
 
AppStore() - Constructor for class us.codecraft.webmagic.example.AppStore
 

B

BaiduBaike - Class in us.codecraft.webmagic.example
 
BaiduBaike() - Constructor for class us.codecraft.webmagic.example.BaiduBaike
 
BasicTypeFormatter<T> - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BasicTypeFormatter.BooleanFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ByteFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.CharactorFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.DoubleFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.FloatFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.IntegerFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.LongFormatter - Class in us.codecraft.webmagic.model.formatter
 
BasicTypeFormatter.ShortFormatter - Class in us.codecraft.webmagic.model.formatter
 
basicTypeFormatters - Static variable in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
BloomFilterDuplicateRemover - Class in us.codecraft.webmagic.scheduler
Duplicate remover backed by a Bloom filter, intended for a huge number of URLs.
BloomFilterDuplicateRemover(int) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BloomFilterDuplicateRemover(int, double) - Constructor for class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
BooleanFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
build() - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
ByteFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
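
The BloomFilterDuplicateRemover entry above only says it targets a huge number of URLs. As a rough illustration of the idea, here is a minimal stand-alone sketch of Bloom-filter-based URL de-duplication. It is not the webmagic implementation: the class name UrlBloomFilterSketch, the seeded hash, and the reading of the (int, double) constructor as "expected insertions, false-positive rate" are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch of Bloom-filter URL de-duplication; not the webmagic class. */
class UrlBloomFilterSketch {
    private final BitSet bits;
    private final int size;       // number of bits (m)
    private final int hashCount;  // number of hash probes per URL (k)

    /** Assumed parameters: expected number of URLs and target false-positive rate. */
    UrlBloomFilterSketch(int expectedInsertions, double falsePositiveRate) {
        // Standard sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2.
        double ln2 = Math.log(2);
        this.size = Math.max(64,
                (int) Math.ceil(-expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2)));
        this.hashCount = Math.max(1, (int) Math.round((double) size / expectedInsertions * ln2));
        this.bits = new BitSet(size);
    }

    /** Returns true if the URL was probably seen before; otherwise records it and returns false. */
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0x9747b28c);
        int h2 = hash(data, 0x85ebca6b);
        boolean seen = true;
        for (int i = 0; i < hashCount; i++) {
            // Double hashing: derive the i-th probe from two base hashes.
            int index = Math.floorMod(h1 + i * h2, size);
            if (!bits.get(index)) {
                seen = false;
                bits.set(index);
            }
        }
        return seen;
    }

    /** Seeded FNV-1a style hash; chosen only to keep the sketch dependency-free. */
    private static int hash(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        UrlBloomFilterSketch filter = new UrlBloomFilterSketch(1000, 0.01);
        System.out.println(filter.isDuplicate("http://example.com/a")); // false: first visit
        System.out.println(filter.isDuplicate("http://example.com/a")); // true: already recorded
    }
}
```

A Bloom filter never reports a recorded URL as new, but it can occasionally report a new URL as a duplicate, which is the trade-off for its small memory footprint.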
 

C

CharactorFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
ClassUtils - Class in us.codecraft.webmagic.utils
 
ClassUtils() - Constructor for class us.codecraft.webmagic.utils.ClassUtils
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
clazz() - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
clazz() - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
close() - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
CollectorPageModelPipeline<T> - Class in us.codecraft.webmagic.pipeline
 
CollectorPageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
combine(MultiPageModel) - Method in interface us.codecraft.webmagic.MultiPageModel
Combine MultiPageModels into a whole object.
ComboExtract - Annotation Type in us.codecraft.webmagic.model.annotation
Combine several 'ExtractBy' extractors with an and/or operator.
ComboExtract.Op - Enum in us.codecraft.webmagic.model.annotation
 
ComboExtract.Source - Enum in us.codecraft.webmagic.model.annotation
Types of source for extracting.
CompositePageProcessor - Class in us.codecraft.webmagic.handler
 
CompositePageProcessor(Site) - Constructor for class us.codecraft.webmagic.handler.CompositePageProcessor
 
CompositePipeline - Class in us.codecraft.webmagic.handler
 
CompositePipeline() - Constructor for class us.codecraft.webmagic.handler.CompositePipeline
 
ConfigurablePageProcessor - Class in us.codecraft.webmagic.configurable
 
ConfigurablePageProcessor(Site, List<ExtractRule>) - Constructor for class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
ConsolePageModelPipeline - Class in us.codecraft.webmagic.model
Print the page model to the console.
Usually used in tests.
ConsolePageModelPipeline() - Constructor for class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
create(Site, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 
create(Site, PageModelPipeline, Class...) - Static method in class us.codecraft.webmagic.model.OOSpider
 

D

DateFormatter - Class in us.codecraft.webmagic.model.formatter
 
DateFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.DateFormatter
 
DEFAULT_CLAZZ - Static variable in class us.codecraft.webmagic.utils.MultiKeyMapBase
 
DEFAULT_FORMATTER - Static variable in annotation type us.codecraft.webmagic.model.annotation.Formatter
 
DEFAULT_PATTERN - Static variable in class us.codecraft.webmagic.model.formatter.DateFormatter
 
detectBasicClass(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
DoubleFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
DoubleKeyMap<K1,K2,V> - Class in us.codecraft.webmagic.utils
 
DoubleKeyMap() - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
 
DoubleKeyMap(Map<K1, Map<K2, V>>, Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.DoubleKeyMap
Initialize the map with protoMapClass.
download(Request, Task) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
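
The DoubleKeyMap constructors above accept an initial map and a "protoMapClass". One plausible reading is that the class is used to instantiate both the outer and the inner maps. The sketch below illustrates that reading with a stand-alone two-key map; TwoKeyMapSketch and its method bodies are hypothetical and only mirror the put/get/remove signatures listed in this index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-level map keyed by (K1, K2); not the webmagic DoubleKeyMap. */
class TwoKeyMapSketch<K1, K2, V> {
    private final Class<? extends Map> protoMapClass;
    private final Map<K1, Map<K2, V>> outer;

    /** Assumption: protoMapClass has a public no-argument constructor. */
    TwoKeyMapSketch(Class<? extends Map> protoMapClass) {
        this.protoMapClass = protoMapClass;
        this.outer = newMap();
    }

    /** Creates a new map of the prototype class via reflection. */
    @SuppressWarnings("unchecked")
    private <A, B> Map<A, B> newMap() {
        try {
            return (Map<A, B>) protoMapClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + protoMapClass, e);
        }
    }

    /** Stores the value under both keys and returns the previous value, if any. */
    V put(K1 k1, K2 k2, V value) {
        return outer.computeIfAbsent(k1, k -> this.<K2, V>newMap()).put(k2, value);
    }

    /** Returns the value for both keys, or null when either level is missing. */
    V get(K1 k1, K2 k2) {
        Map<K2, V> inner = outer.get(k1);
        return inner == null ? null : inner.get(k2);
    }

    /** Removes the value and drops the inner map once it becomes empty. */
    V remove(K1 k1, K2 k2) {
        Map<K2, V> inner = outer.get(k1);
        if (inner == null) {
            return null;
        }
        V removed = inner.remove(k2);
        if (inner.isEmpty()) {
            outer.remove(k1);
        }
        return removed;
    }

    public static void main(String[] args) {
        TwoKeyMapSketch<String, String, Integer> map = new TwoKeyMapSketch<>(HashMap.class);
        map.put("github", "stars", 42);
        System.out.println(map.get("github", "stars"));
    }
}
```

Passing a different class, for example java.util.TreeMap, would give sorted iteration at both levels without changing the calling code.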
 

E

ExpressionType - Enum in us.codecraft.webmagic.configurable
 
ExtractBy - Annotation Type in us.codecraft.webmagic.model.annotation
Define the extractor for a field or class.
ExtractBy.Source - Enum in us.codecraft.webmagic.model.annotation
Types of source for extracting.
ExtractBy.Type - Enum in us.codecraft.webmagic.model.annotation
Types of extractor expressions.
ExtractByUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define an extractor to extract data from the URL of the current page.
ExtractorUtils - Class in us.codecraft.webmagic.utils
Tools for annotation converting.
ExtractorUtils() - Constructor for class us.codecraft.webmagic.utils.ExtractorUtils
 
ExtractRule - Class in us.codecraft.webmagic.configurable
 
ExtractRule() - Constructor for class us.codecraft.webmagic.configurable.ExtractRule
 

F

FileCacheQueueScheduler - Class in us.codecraft.webmagic.scheduler
Store URLs and the cursor in files so that a Spider can resume its state after a shutdown.
FileCacheQueueScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
FilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store result objects (page models) to files in plain format.
Use model.key() as the file name if the model implements HasKey.
Otherwise use a SHA1 hash as the file name.
FilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
New FilePageModelPipeline with default path "/data/webmagic/".
FilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
FloatFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
format(String) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
format(String) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
Formatter - Annotation Type in us.codecraft.webmagic.model.annotation
Define how the result string is converted to an object for a field.
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.BooleanFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ByteFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.CharactorFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.DoubleFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.FloatFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 
formatTrimmed(String) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
from(String) - Static method in class us.codecraft.webmagic.utils.RequestUtils
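
The FilePageModelPipeline description above states the file-naming rule: use the model's key when it implements HasKey, otherwise fall back to a SHA1 hash. A minimal sketch of that rule follows. The nested HasKey interface is a local stand-in that copies only the key() method listed in this index, and hashing the model's toString() is an assumption; the index does not say what the real pipeline hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of the "key or SHA-1" file-naming rule; not the webmagic pipeline. */
class FileNameSketch {
    /** Local stand-in for HasKey; only key() is taken from the index. */
    interface HasKey {
        String key();
    }

    /** Picks a file name: the model's key if it has one, else a SHA-1 hex digest. */
    static String fileNameFor(Object model) {
        if (model instanceof HasKey) {
            return ((HasKey) model).key();
        }
        // Assumption: hash toString(); the real pipeline may hash a different representation.
        return sha1Hex(model.toString());
    }

    /** Returns the lowercase hex SHA-1 digest of the UTF-8 bytes of the text. */
    static String sha1Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HasKey keyed = () -> "webmagic-repo";
        System.out.println(fileNameFor(keyed));  // prints the key itself
        System.out.println(fileNameFor("abc"));  // prints a 40-character hex digest
    }
}
```

Using the key keeps file names stable and readable across runs, while the hash fallback still gives each distinct model a deterministic name.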
 

G

get(Class<?>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
get(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
get(String, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(Request, Class<T>) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(String) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(Request) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
get(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
get(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
getAll(Page) - Method in class us.codecraft.webmagic.model.PageMapper
 
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getAuthor() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getCollected() - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
getCollectorPipeline() - Method in class us.codecraft.webmagic.model.OOSpider
 
getContent() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getDate() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getDescription() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getErrorCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getErrorPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorPages() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getErrorPages() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getErrorUrls() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getExpressionParams() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionType() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getExpressionValue() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getFieldName() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getFieldsIncludeSuperClass(Class) - Static method in class us.codecraft.webmagic.utils.ClassUtils
 
getFirstNoLoopbackIPAddresses() - Static method in class us.codecraft.webmagic.utils.IPUtils
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getFork() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getItemKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getLanguage() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getLeftPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getLeftPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getLeftRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getName() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getName() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getName() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getName() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getOtherPages() - Method in interface us.codecraft.webmagic.MultiPageModel
Other pages to be extracted.
It is used to judge whether an object contains more than one page, and whether the pages of the object are all extracted.
getPage(Request) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
getPage() - Method in interface us.codecraft.webmagic.MultiPageModel
The page is the identifier of one page among the pages of an object.
getPageKey() - Method in interface us.codecraft.webmagic.MultiPageModel
Page key is the identifier for the object.
getPagePerSecond() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getPagePerSecond() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getQueueKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getReadme() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getRetryNum() - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
getSelector() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
getSelector(ExtractBy) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSelectors(ExtractBy[]) - Static method in class us.codecraft.webmagic.utils.ExtractorUtils
 
getSetKey(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getSite() - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
getSite() - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
getSite() - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
getSpiderStatusMBean(Spider, SpiderMonitor.MonitorSpiderListener) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getStar() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getStartTime() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStartTime() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getStatus() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getStatus() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getSuccessCount() - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
getSuccessPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getSuccessPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTags() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getThread() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getThread() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTitle() - Method in class us.codecraft.webmagic.example.OschinaBlog
 
getTotalPageCount() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
getTotalPageCount() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
getTotalRequestsCount(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepo
 
getUrl() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
getUrl(Request) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
GithubRepo - Class in us.codecraft.webmagic.example
 
GithubRepo() - Constructor for class us.codecraft.webmagic.example.GithubRepo
 
GithubRepoApi - Class in us.codecraft.webmagic.example
 
GithubRepoApi() - Constructor for class us.codecraft.webmagic.example.GithubRepoApi
 
GithubRepoPageMapper - Class in us.codecraft.webmagic.example
 
GithubRepoPageMapper() - Constructor for class us.codecraft.webmagic.example.GithubRepoPageMapper
 

H

HasKey - Interface in us.codecraft.webmagic.model
Interface to be implemented by page models.
Can be used to identify a page model, or be used as name of file storing the object.
HelpUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the 'help' URL patterns for a class.

I

initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.BasicTypeFormatter
 
initParam(String[]) - Method in class us.codecraft.webmagic.model.formatter.DateFormatter
 
initParam(String[]) - Method in interface us.codecraft.webmagic.model.formatter.ObjectFormatter
 
instance() - Static method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
IntegerFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.IntegerFormatter
 
IPUtils - Class in us.codecraft.webmagic.utils
 
IPUtils() - Constructor for class us.codecraft.webmagic.utils.IPUtils
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
isDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
isMulti() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
isNotNull() - Method in class us.codecraft.webmagic.configurable.ExtractRule
 

J

JsonFilePageModelPipeline - Class in us.codecraft.webmagic.pipeline
Store result objects (page models) to files in JSON format.
Use model.key() as the file name if the model implements HasKey.
Otherwise use a SHA1 hash as the file name.
JsonFilePageModelPipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
New JsonFilePageModelPipeline with default path "/data/webmagic/".
JsonFilePageModelPipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
JsonFilePipeline - Class in us.codecraft.webmagic.pipeline
Store results to files in JSON format.
JsonFilePipeline() - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
New JsonFilePipeline with default path "/data/webmagic/".
JsonFilePipeline(String) - Constructor for class us.codecraft.webmagic.pipeline.JsonFilePipeline
 

K

key() - Method in class us.codecraft.webmagic.example.GithubRepo
 
key() - Method in class us.codecraft.webmagic.example.GithubRepoApi
 
key() - Method in interface us.codecraft.webmagic.model.HasKey
 

L

logger - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
LongFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.LongFormatter
 

M

main(String[]) - Static method in class us.codecraft.webmagic.example.AppStore
 
main(String[]) - Static method in class us.codecraft.webmagic.example.BaiduBaike
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepo
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoApi
 
main(String[]) - Static method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
main(String[]) - Static method in class us.codecraft.webmagic.example.MonitorExample
 
main(String[]) - Static method in class us.codecraft.webmagic.example.OschinaBlog
 
main(String...) - Static method in class us.codecraft.webmagic.example.PatternProcessorExample
 
match(Request) - Method in class us.codecraft.webmagic.handler.PatternRequestMatcher
 
match(Request) - Method in interface us.codecraft.webmagic.handler.RequestMatcher
Check whether to process the page.

Please DO NOT change page status in this method.
MonitorExample - Class in us.codecraft.webmagic.example
 
MonitorExample() - Constructor for class us.codecraft.webmagic.example.MonitorExample
 
MonitorSpiderListener() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
monitorSpiderListener - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
MultiKeyMapBase - Class in us.codecraft.webmagic.utils
Base class for multi-key maps, holding some basic objects.
MultiKeyMapBase() - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiKeyMapBase(Class<? extends Map>) - Constructor for class us.codecraft.webmagic.utils.MultiKeyMapBase
 
MultiPageModel - Interface in us.codecraft.webmagic
Extract an object that spans more than one page, such as a news item or an article.
MultiPagePipeline - Class in us.codecraft.webmagic.pipeline
A pipeline that combines the results from more than one page.
Used for news and articles that span more than one web page.
MultiPagePipeline() - Constructor for class us.codecraft.webmagic.pipeline.MultiPagePipeline
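
The MultiPageModel and MultiPagePipeline entries describe combining the pages of one object once all of them have been extracted. The sketch below shows one way such buffering could work. Everything in it is hypothetical: PagedModel is a local stand-in whose four methods mirror the names in this index (the return types are assumptions), and Collector is an invented helper, not the MultiPagePipeline code.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of multi-page combining; not the webmagic implementation. */
class MultiPageSketch {
    /** Local stand-in for MultiPageModel; return types are assumed. */
    interface PagedModel {
        String getPageKey();                 // identifies the whole object
        String getPage();                    // identifies this page of the object
        Collection<String> getOtherPages();  // the other pages this object needs
        PagedModel combine(PagedModel next); // merges the next page into this one
    }

    /** Example model: one page of an article. */
    static final class Article implements PagedModel {
        final String key;
        final String page;
        final List<String> others;
        final String text;

        Article(String key, String page, List<String> others, String text) {
            this.key = key;
            this.page = page;
            this.others = others;
            this.text = text;
        }

        public String getPageKey() { return key; }
        public String getPage() { return page; }
        public Collection<String> getOtherPages() { return others; }

        public PagedModel combine(PagedModel next) {
            return new Article(key, page, others, text + ((Article) next).text);
        }
    }

    /** Buffers pages per object and releases the combined object once it is complete. */
    static final class Collector {
        private final Map<String, TreeMap<String, PagedModel>> pending = new HashMap<>();

        /** Returns the combined object when all pages have arrived, otherwise null. */
        PagedModel offer(PagedModel part) {
            TreeMap<String, PagedModel> pages =
                    pending.computeIfAbsent(part.getPageKey(), k -> new TreeMap<>());
            pages.put(part.getPage(), part);
            for (PagedModel seen : pages.values()) {
                if (!pages.keySet().containsAll(seen.getOtherPages())) {
                    return null; // at least one referenced page is still missing
                }
            }
            pending.remove(part.getPageKey());
            PagedModel whole = null;
            // TreeMap iterates pages in string order, so "10" sorts before "2".
            for (PagedModel page : pages.values()) {
                whole = (whole == null) ? page : whole.combine(page);
            }
            return whole;
        }
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        collector.offer(new Article("news-1", "2", List.of("1"), "world"));
        Article whole = (Article) collector.offer(new Article("news-1", "1", List.of("2"), "Hello "));
        System.out.println(whole.text);
    }
}
```

Because pages can arrive in any order, the collector holds them until every page named by getOtherPages() is present and only then combines them in page order.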
 

N

newMap() - Method in class us.codecraft.webmagic.utils.MultiKeyMapBase
 

O

ObjectFormatter<T> - Interface in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatterBuilder() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
ObjectFormatters - Class in us.codecraft.webmagic.model.formatter
 
ObjectFormatters() - Constructor for class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
onError(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
onSuccess(Request) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor.MonitorSpiderListener
 
OOSpider<T> - Class in us.codecraft.webmagic.model
The spider for page model extractor.
In webmagic, a POJO containing the extracted result is called a "page model".
OOSpider(ModelPageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(PageProcessor) - Constructor for class us.codecraft.webmagic.model.OOSpider
 
OOSpider(Site, PageModelPipeline, Class...) - Constructor for class us.codecraft.webmagic.model.OOSpider
Create a spider.
OschinaBlog - Class in us.codecraft.webmagic.example
 
OschinaBlog() - Constructor for class us.codecraft.webmagic.example.OschinaBlog
 

P

PageMapper<T> - Class in us.codecraft.webmagic.model
 
PageMapper(Class<T>) - Constructor for class us.codecraft.webmagic.model.PageMapper
 
PageModelPipeline<T> - Interface in us.codecraft.webmagic.pipeline
Implement PageModelPipeline to persist your page model.
pattern - Variable in class us.codecraft.webmagic.handler.PatternRequestMatcher
match pattern.
PatternProcessor - Class in us.codecraft.webmagic.handler
 
PatternProcessor(String) - Constructor for class us.codecraft.webmagic.handler.PatternProcessor
 
PatternProcessorExample - Class in us.codecraft.webmagic.example
PatternProcessorExample() - Constructor for class us.codecraft.webmagic.example.PatternProcessorExample
 
PatternRequestMatcher - Class in us.codecraft.webmagic.handler
PatternRequestMatcher(String) - Constructor for class us.codecraft.webmagic.handler.PatternRequestMatcher
 
PhantomJSDownloader - Class in us.codecraft.webmagic.downloader
This downloader is used to download pages that need JavaScript rendering.
PhantomJSDownloader() - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
PhantomJSDownloader(String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
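Added constructor that supports a custom phantomjs command. Examples: phantomjs.exe (for Windows environments); phantomjs --ignore-ssl-errors=yes (ignores some errors when the crawled address is https); /usr/local/bin/phantomjs (absolute path of the command, which avoids an IOException caused by system environment variables).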
PhantomJSDownloader(String, String) - Constructor for class us.codecraft.webmagic.downloader.PhantomJSDownloader
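Added constructor that supports a custom crawl.js path, because when another project depends on this jar, the phantomjs command run through runtime.exec() cannot use the crawl.js inside the jar.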
poll(Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
poll(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
pool - Variable in class us.codecraft.webmagic.scheduler.RedisScheduler
 
process(Page) - Method in class us.codecraft.webmagic.configurable.ConfigurablePageProcessor
 
process(Page) - Method in class us.codecraft.webmagic.example.GithubRepoPageMapper
 
process(Page) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.model.ConsolePageModelPipeline
 
process(T, Task) - Method in class us.codecraft.webmagic.pipeline.CollectorPageModelPipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.FilePageModelPipeline
 
process(Object, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.JsonFilePipeline
 
process(ResultItems, Task) - Method in class us.codecraft.webmagic.pipeline.MultiPagePipeline
 
process(T, Task) - Method in interface us.codecraft.webmagic.pipeline.PageModelPipeline
 
processPage(Page) - Method in interface us.codecraft.webmagic.handler.SubPageProcessor
Process the page: extract URLs to fetch, extract the data, and store it.
processResult(ResultItems, Task) - Method in interface us.codecraft.webmagic.handler.SubPipeline
Process the page: extract URLs to fetch, extract the data, and store it.
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.FileCacheQueueScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
pushWhenNoDuplicate(Request, Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 
put(Class<? extends ObjectFormatter>) - Static method in class us.codecraft.webmagic.model.formatter.ObjectFormatters
 
put(K1, Map<K2, V>) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
put(K1, K2, V) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 

R

rebuildBloomFilter() - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
RedisPriorityScheduler - Class in us.codecraft.webmagic.scheduler
The Redis scheduler with priority support.
RedisPriorityScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisPriorityScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
RedisScheduler - Class in us.codecraft.webmagic.scheduler
Use Redis as the URL scheduler for distributed crawlers.
RedisScheduler(String) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
RedisScheduler(JedisPool) - Constructor for class us.codecraft.webmagic.scheduler.RedisScheduler
 
register(Spider...) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
Register spiders for monitoring.
registerMBean(SpiderStatusMXBean) - Method in class us.codecraft.webmagic.monitor.SpiderMonitor
 
remove(K1, K2) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
remove(K1) - Method in class us.codecraft.webmagic.utils.DoubleKeyMap
 
RequestMatcher - Interface in us.codecraft.webmagic.handler
 
RequestMatcher.MatchOther - Enum in us.codecraft.webmagic.handler
 
RequestUtils - Class in us.codecraft.webmagic.utils
 
RequestUtils() - Constructor for class us.codecraft.webmagic.utils.RequestUtils
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisPriorityScheduler
 
resetDuplicateCheck(Task) - Method in class us.codecraft.webmagic.scheduler.RedisScheduler
 

S

setExpressionParams(String[]) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionType(ExpressionType) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setExpressionValue(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setField(Field) - Method in class us.codecraft.webmagic.model.formatter.ObjectFormatterBuilder
 
setFieldName(String) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setIsExtractLinks(boolean) - Method in class us.codecraft.webmagic.model.OOSpider
 
setMulti(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setNotNull(boolean) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setProxyProvider(ProxyProvider) - Method in class us.codecraft.webmagic.SimpleHttpClient
 
setRetryNum(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
setSelector(Selector) - Method in class us.codecraft.webmagic.configurable.ExtractRule
 
setSite(Site) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSubPageProcessors(SubPageProcessor...) - Method in class us.codecraft.webmagic.handler.CompositePageProcessor
 
setSubPipeline(SubPipeline...) - Method in class us.codecraft.webmagic.handler.CompositePipeline
 
setThread(int) - Method in class us.codecraft.webmagic.downloader.PhantomJSDownloader
 
ShortFormatter() - Constructor for class us.codecraft.webmagic.model.formatter.BasicTypeFormatter.ShortFormatter
 
SimpleHttpClient - Class in us.codecraft.webmagic
 
SimpleHttpClient() - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
SimpleHttpClient(Site) - Constructor for class us.codecraft.webmagic.SimpleHttpClient
 
spider - Variable in class us.codecraft.webmagic.monitor.SpiderStatus
 
SpiderMonitor - Class in us.codecraft.webmagic.monitor
 
SpiderMonitor() - Constructor for class us.codecraft.webmagic.monitor.SpiderMonitor
 
SpiderMonitor.MonitorSpiderListener - Class in us.codecraft.webmagic.monitor
 
SpiderStatus - Class in us.codecraft.webmagic.monitor
 
SpiderStatus(Spider, SpiderMonitor.MonitorSpiderListener) - Constructor for class us.codecraft.webmagic.monitor.SpiderStatus
 
SpiderStatusMXBean - Interface in us.codecraft.webmagic.monitor
 
start() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
start() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
stop() - Method in class us.codecraft.webmagic.monitor.SpiderStatus
 
stop() - Method in interface us.codecraft.webmagic.monitor.SpiderStatusMXBean
 
SubPageProcessor - Interface in us.codecraft.webmagic.handler
 
SubPipeline - Interface in us.codecraft.webmagic.handler
 

T

TargetUrl - Annotation Type in us.codecraft.webmagic.model.annotation
Define the URL patterns for a class.
toString() - Method in class us.codecraft.webmagic.example.BaiduBaike
 
toString() - Method in class us.codecraft.webmagic.example.GithubRepo
 

U

us.codecraft.webmagic - package us.codecraft.webmagic
 
us.codecraft.webmagic.configurable - package us.codecraft.webmagic.configurable
 
us.codecraft.webmagic.downloader - package us.codecraft.webmagic.downloader
 
us.codecraft.webmagic.example - package us.codecraft.webmagic.example
 
us.codecraft.webmagic.handler - package us.codecraft.webmagic.handler
 
us.codecraft.webmagic.model - package us.codecraft.webmagic.model
Page model and annotations used to customize a crawler.
us.codecraft.webmagic.model.annotation - package us.codecraft.webmagic.model.annotation
Annotations for defining an extractor.
us.codecraft.webmagic.model.formatter - package us.codecraft.webmagic.model.formatter
 
us.codecraft.webmagic.monitor - package us.codecraft.webmagic.monitor
 
us.codecraft.webmagic.pipeline - package us.codecraft.webmagic.pipeline
 
us.codecraft.webmagic.scheduler - package us.codecraft.webmagic.scheduler
 
us.codecraft.webmagic.utils - package us.codecraft.webmagic.utils
 

V

valueOf(String) - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns the enum constant of this type with the specified name.
values() - Static method in enum us.codecraft.webmagic.configurable.ExpressionType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.handler.RequestMatcher.MatchOther
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Op
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ComboExtract.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Source
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum us.codecraft.webmagic.model.annotation.ExtractBy.Type
Returns an array containing the constants of this enum type, in the order they are declared.

Copyright © 2017. All rights reserved.