| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class `Spider` and models. |
| us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in a Page object. |
| us.codecraft.webmagic.processor | PageProcessor is the customizable part of a crawler for a specific site. |
| us.codecraft.webmagic.processor.example | |

| Modifier and Type | Field and Description |
|---|---|
| `protected Site` | `Spider.site` |

| Modifier and Type | Method and Description |
|---|---|
| `Site` | `Site.addCookie(String name, String value)`<br>Add a cookie with the domain returned by `getDomain()`. |
| `Site` | `Site.addCookie(String domain, String name, String value)`<br>Add a cookie with a specific domain. |
| `Site` | `Site.addHeader(String key, String value)`<br>Put an HTTP header for the downloader. |
| `Site` | `Task.getSite()`<br>Site of a task. |
| `Site` | `Spider.getSite()` |
| `static Site` | `Site.me()`<br>Create a new Site. |
| `Site` | `Site.setAcceptStatCode(Set<Integer> acceptStatCode)`<br>Set acceptStatCode. When the status code of an HTTP response is in acceptStatCode, the response is processed. {200} by default. It does not need to be set. |
| `Site` | `Site.setCharset(String charset)`<br>Set the charset of the page manually. When the charset is not set or is set to null, it can be auto-detected from the HTTP header. |
| `Site` | `Site.setCycleRetryTimes(int cycleRetryTimes)`<br>Set the number of cycle retries when a download fails, 0 by default. |
| `Site` | `Site.setDisableCookieManagement(boolean disableCookieManagement)`<br>The downloader is supposed to store response cookies; set this to true to disable cookie management. |
| `Site` | `Site.setDomain(String domain)`<br>Set the domain of the site. |
| `Site` | `Site.setRetrySleepTime(int retrySleepTime)`<br>Set the sleep time between retries when a download fails, 1000 by default. |
| `Site` | `Site.setRetryTimes(int retryTimes)`<br>Set the number of retries when a download fails, 0 by default. |
| `Site` | `Site.setSleepTime(int sleepTime)`<br>Set the interval between the processing of two pages. Time unit is milliseconds. |
| `Site` | `Site.setTimeOut(int timeOut)`<br>Set the timeout for the downloader, in ms. |
| `Site` | `Site.setUseGzip(boolean useGzip)`<br>Whether to use gzip. |
| `Site` | `Site.setUserAgent(String userAgent)`<br>Set the user agent. |

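
Almost every `Site` method listed above returns `Site`, which suggests the setters are meant to be chained after `Site.me()`. The sketch below illustrates that chaining pattern and the default `{200}` status-code check. It uses a hypothetical stand-in class, `SiteSketch`, so that it runs without the library; the helper `accepts(int)`, the getters, and the demo values are assumptions for illustration and are not part of the documented API.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is NOT WebMagic's Site class.
class SiteSketch {
    private String domain;
    private int retryTimes = 0; // 0 by default, as stated in the table
    private final Map<String, String> headers = new LinkedHashMap<>();
    private Set<Integer> acceptStatCode = new HashSet<>();

    private SiteSketch() {
        acceptStatCode.add(200); // {200} by default, as stated in the table
    }

    // Static factory, mirroring the shape of Site.me()
    static SiteSketch me() {
        return new SiteSketch();
    }

    // Each setter returns this so that calls can be chained
    SiteSketch setDomain(String domain) {
        this.domain = domain;
        return this;
    }

    SiteSketch setRetryTimes(int retryTimes) {
        this.retryTimes = retryTimes;
        return this;
    }

    SiteSketch addHeader(String key, String value) {
        headers.put(key, value);
        return this;
    }

    SiteSketch setAcceptStatCode(Set<Integer> acceptStatCode) {
        this.acceptStatCode = acceptStatCode;
        return this;
    }

    // Hypothetical helper: would a response with this status code be processed?
    boolean accepts(int statusCode) {
        return acceptStatCode.contains(statusCode);
    }

    String getDomain() {
        return domain;
    }

    int getRetryTimes() {
        return retryTimes;
    }

    Map<String, String> getHeaders() {
        return headers;
    }
}

class SiteSketchDemo {
    // Builds a configuration in one chained expression
    static SiteSketch build() {
        return SiteSketch.me()
                .setDomain("example.com")
                .setRetryTimes(3)
                .addHeader("Accept-Language", "en");
    }

    public static void main(String[] args) {
        SiteSketch site = build();
        System.out.println(site.getDomain() + " retries=" + site.getRetryTimes());
        System.out.println("200 accepted: " + site.accepts(200));
        System.out.println("404 accepted: " + site.accepts(404));
    }
}
```

With the real library, the same chain would start from `Site.me()` and use the setters listed in the table above.
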
| Modifier and Type | Method and Description |
|---|---|
| `HttpClientRequestContext` | `HttpUriRequestConverter.convert(Request request, Site site, Proxy proxy)` |
| `org.apache.http.impl.client.CloseableHttpClient` | `HttpClientGenerator.getClient(Site site)` |

| Modifier and Type | Method and Description |
|---|---|
| `Site` | `SimplePageProcessor.getSite()` |
| `Site` | `PageProcessor.getSite()`<br>Get the site settings. |

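
The `getSite()` methods above are how a processor exposes its site settings. A minimal sketch of that pattern follows, assuming the settings are built once as a field and returned on every call. `SiteConfig`, `ProcessorSketch`, and `ExampleProcessorSketch` are hypothetical stand-ins written for this illustration, not WebMagic types.

```java
// Hypothetical stand-in for a settings object; NOT WebMagic's Site class.
class SiteConfig {
    private final String domain;

    SiteConfig(String domain) {
        this.domain = domain;
    }

    String getDomain() {
        return domain;
    }
}

// Hypothetical stand-in mirroring the shape of PageProcessor.getSite()
interface ProcessorSketch {
    SiteConfig getSite();
}

class ExampleProcessorSketch implements ProcessorSketch {
    // Assumption: settings are created once and reused for every page
    private final SiteConfig site = new SiteConfig("example.com");

    @Override
    public SiteConfig getSite() {
        return site;
    }
}

class ProcessorSketchDemo {
    public static void main(String[] args) {
        ProcessorSketch processor = new ExampleProcessorSketch();
        System.out.println(processor.getSite().getDomain());
    }
}
```

Holding the settings in a field means every caller sees the same configuration object instead of a fresh copy on each call.
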
| Modifier and Type | Method and Description |
|---|---|
| `Site` | `ZhihuPageProcessor.getSite()` |
| `Site` | `GithubRepoPageProcessor.getSite()` |
| `Site` | `BaiduBaikePageProcessor.getSite()` |

Copyright © 2017. All rights reserved.