| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.downloader | The downloader is the part that downloads web pages and stores them in a Page object. |
| us.codecraft.webmagic.processor | PageProcessor is the customizable part of a crawler for a specific site. |
| us.codecraft.webmagic.processor.example | |
| us.codecraft.webmagic.proxy | |


| Modifier and Type | Method and Description |
|---|---|
| static Page | Page.fail() |
| Page | Page.setRawText(String rawText) |
| Page | Page.setSkip(boolean skip) |

| Modifier and Type | Method and Description |
|---|---|
| protected void | Spider.extractAndAddRequests(Page page, boolean spawnUrl) |

| Modifier and Type | Method and Description |
|---|---|
| Page | HttpClientDownloader.download(Request request, Task task) |
| Page | Downloader.download(Request request, Task task)<br>Downloads a web page and stores it in a Page object. |
| protected Page | HttpClientDownloader.handleResponse(Request request, String charset, org.apache.http.HttpResponse httpResponse, Task task) |
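
The download contract described above (fetch a page, store it in a page object) can be sketched in isolation. The following is a minimal, self-contained illustration, not WebMagic code: `FetchedPage`, `SketchDownloader`, and `FunctionDownloader` are hypothetical stand-ins, the URL string replaces the `Request` and `Task` parameters, and the idea that a `fail()` factory marks an unsuccessful download is an assumption, since the table lists only its signature.

```java
import java.util.function.Function;

// Hypothetical stand-in for the downloaded-page holder; NOT the WebMagic Page class.
class FetchedPage {
    private String rawText;
    private boolean downloadSuccess = true;

    // Static factory for a failed download (assumed meaning of a fail() factory).
    static FetchedPage fail() {
        FetchedPage page = new FetchedPage();
        page.downloadSuccess = false;
        return page;
    }

    // Fluent setter: returns the page so calls can be chained.
    FetchedPage setRawText(String rawText) { this.rawText = rawText; return this; }
    String getRawText() { return rawText; }
    boolean isDownloadSuccess() { return downloadSuccess; }
}

// Hypothetical downloader contract: turn a URL into a page object.
interface SketchDownloader {
    FetchedPage download(String url);
}

// Downloader that delegates the transfer to an injected fetch function,
// so the sketch runs without network access.
class FunctionDownloader implements SketchDownloader {
    private final Function<String, String> fetch;

    FunctionDownloader(Function<String, String> fetch) { this.fetch = fetch; }

    @Override
    public FetchedPage download(String url) {
        try {
            String body = fetch.apply(url);
            // Store the response body in the page object.
            return new FetchedPage().setRawText(body);
        } catch (RuntimeException e) {
            // A transfer error yields a failed page instead of an exception.
            return FetchedPage.fail();
        }
    }
}
```

Injecting the fetch function keeps the example testable offline; a real downloader would perform the HTTP transfer at that point.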

| Modifier and Type | Method and Description |
|---|---|
| void | SimplePageProcessor.process(Page page) |
| void | PageProcessor.process(Page page)<br>Processes the page: extracts URLs to fetch, then extracts and stores the data. |
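
The two steps named in the `process` description (extract URLs to fetch, then extract and store data) might look like the sketch below. It is self-contained and deliberately does not use the WebMagic classes: `SketchPage`, `SketchPageProcessor`, and `TitleAndLinksProcessor` are hypothetical stand-ins, and the `targetUrls` and `fields` collections are illustrative assumptions. Only the fluent `setRawText` and `setSkip` setters mirror signatures listed for `Page`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a page object; NOT the WebMagic Page class.
class SketchPage {
    private String rawText = "";
    private boolean skip;
    final List<String> targetUrls = new ArrayList<>();
    final Map<String, Object> fields = new LinkedHashMap<>();

    // Fluent setters that return the page, like the Page-returning setters.
    SketchPage setRawText(String rawText) { this.rawText = rawText; return this; }
    SketchPage setSkip(boolean skip) { this.skip = skip; return this; }
    String getRawText() { return rawText; }
    boolean isSkip() { return skip; }
}

// Hypothetical processor contract: a single method that receives a page.
interface SketchPageProcessor {
    void process(SketchPage page);
}

// Example processor: collects href targets and extracts the <title> text.
class TitleAndLinksProcessor implements SketchPageProcessor {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    @Override
    public void process(SketchPage page) {
        // Step 1: extract URLs to fetch next.
        Matcher links = HREF.matcher(page.getRawText());
        while (links.find()) {
            page.targetUrls.add(links.group(1));
        }
        // Step 2: extract the data and store it; mark pages without a title as skipped.
        Matcher title = TITLE.matcher(page.getRawText());
        if (title.find()) {
            page.fields.put("title", title.group(1));
        } else {
            page.setSkip(true);
        }
    }
}
```

Regular expressions keep the sketch dependency-free; they are not a robust way to parse HTML, and a real processor would normally rely on the crawler's own selectors.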

| Modifier and Type | Method and Description |
|---|---|
| void | ZhihuPageProcessor.process(Page page) |
| void | GithubRepoPageProcessor.process(Page page) |
| void | BaiduBaikePageProcessor.process(Page page) |

| Modifier and Type | Method and Description |
|---|---|
| void | SimpleProxyProvider.returnProxy(Proxy proxy, Page page, Task task) |
| void | ProxyProvider.returnProxy(Proxy proxy, Page page, Task task)<br>Returns the proxy to the provider when a download completes. |

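
The idea of returning a proxy to its provider after a download can be illustrated with a small borrow/return pool. This is a sketch under stated assumptions, not a description of how `SimpleProxyProvider` behaves: `SketchProxy` and `SketchProxyPool` are hypothetical, a boolean replaces the `Page` and `Task` arguments, and the policy of queuing proxies from failed downloads last is invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical proxy descriptor; NOT the WebMagic Proxy class.
final class SketchProxy {
    final String host;
    final int port;

    SketchProxy(String host, int port) {
        this.host = host;
        this.port = port;
    }
}

// Hypothetical borrow/return pool illustrating the "return after download" step.
class SketchProxyPool {
    private final Deque<SketchProxy> idle = new ArrayDeque<>();

    SketchProxyPool(List<SketchProxy> proxies) {
        idle.addAll(proxies);
    }

    // Borrow the next idle proxy, or null when every proxy is in use.
    synchronized SketchProxy borrowProxy() {
        return idle.pollFirst();
    }

    // Give the proxy back once a download finishes. A proxy whose download
    // failed goes to the back of the queue so the others are tried first.
    synchronized void returnProxy(SketchProxy proxy, boolean downloadSucceeded) {
        if (downloadSucceeded) {
            idle.addFirst(proxy);
        } else {
            idle.addLast(proxy);
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```

Passing the download outcome to `returnProxy` is what lets a provider adjust its ordering; in the listed signature that information would presumably come from the `Page` argument.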
Copyright © 2017. All rights reserved.