-
public final class Manual
-
-
Constructor Summary
Constructor    Description
Manual(PulsarSession session)
-
Method Summary
Modifier and Type    Method    Description
final String getUrl()
final PulsarSession getSession()
final WebPage load()
    Loads the page at url if it is not in the database or has expired.
final Collection<WebPage> loadOutPages()
    Loads the page at url if it is not in the database or has expired, then loads the out pages selected by -outLink that are missing or expired. Portal page expire time: 1 day. Out page expire time: 7 days. CSS query for out links in the portal page: a[href~=item]
final List<Map<String, String>> scrape()
    Loads the page at url if it is not in the database or has expired, then scrapes fields from it. All fields are restricted to the page section selected by restrictCss, and each field is selected by its own CSS selector. Expire time: 1 day. Restrict CSS selector: li[data-sku]. CSS selectors for fields: .p-name em, .p-price
final List<Map<String, String>> scrape2()
final List<Map<String, String>> scrapeOutPages()
final List<Map<String, String>> scrapeOutPages2()
final Unit runAll()
-
Method Detail
-
getSession
final PulsarSession getSession()
-
load
final WebPage load()
Loads the page at url if it is not in the database or has expired. The expiry time is 1 day.
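The load-or-refetch rule described above can be illustrated with a minimal sketch. It is not the library's implementation: `ExpiringPageCache`, its `fetcher` callback, and the in-memory map are hypothetical stand-ins for the database and the fetch step, and the clock is passed in explicitly so the behaviour is easy to check.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for the page database; not part of the Manual API. */
class ExpiringPageCache {
    /** A stored page body together with the time it was fetched. */
    private static final class Entry {
        final String content;
        final Instant fetchedAt;

        Entry(String content, Instant fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final Duration expiry;
    private final Function<String, String> fetcher;
    int fetchCount = 0; // number of real fetches, exposed for illustration

    ExpiringPageCache(Duration expiry, Function<String, String> fetcher) {
        this.expiry = expiry;
        this.fetcher = fetcher;
    }

    /** Returns the stored page, fetching it first if it is missing or older than the expiry. */
    String load(String url, Instant now) {
        Entry entry = store.get(url);
        boolean expired = entry != null
                && Duration.between(entry.fetchedAt, now).compareTo(expiry) > 0;
        if (entry == null || expired) {
            entry = new Entry(fetcher.apply(url), now);
            store.put(url, entry);
            fetchCount++;
        }
        return entry.content;
    }
}
```

With an expiry of `Duration.ofDays(1)`, a second call 23 hours after the first reuses the stored copy, while a call 25 hours after the first triggers a new fetch.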
-
loadOutPages
final Collection<WebPage> loadOutPages()
Loads the page at url if it is not in the database or has expired, then loads the out pages selected by -outLink that are missing from the database or have expired.
portal page expire time: 1 day
out page expire time: 7 days
CSS query for out links in the portal page: a[href~=item]
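As a rough illustration of the out-link query, the sketch below assumes the query reads `a[href~=item]` and follows jsoup-style selector syntax, in which `[attr~=regex]` keeps elements whose attribute value contains a match for the regular expression. That interpretation is an assumption, as are the class name `OutLinkFilter` and the sample URLs in the usage note; the real method works on the parsed portal page rather than on a list of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that mimics an a[href~=item] query over already-extracted href values. */
class OutLinkFilter {
    /** Keeps the hrefs that contain a match for the given regular expression, in their original order. */
    static List<String> select(List<String> hrefs, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String href : hrefs) {
            // find() matches anywhere in the value, unlike matches(), which needs a full match
            if (pattern.matcher(href).find()) {
                selected.add(href);
            }
        }
        return selected;
    }
}
```

For example, `OutLinkFilter.select(hrefs, "item")` keeps `https://example.com/item/1001` and drops `https://example.com/help`. Under the documented settings, each selected out page would then be subject to the 7-day expiry, while the portal page uses 1 day.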
-
scrape
final List<Map<String, String>> scrape()
Loads the page at url if it is not in the database or has expired, then scrapes fields from it. All fields are restricted to the page section selected by restrictCss, and each field is selected by its own CSS selector.
expire time: 1 day
restrict CSS selector: li[data-sku]
CSS selectors for fields: .p-name em, .p-price
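The return type `List<Map<String, String>>` suggests one map per matched section, with one entry per field. The sketch below models that shape only. It assumes the map keys are the field selectors themselves and that fields with no match are left out, neither of which the description above confirms. `Section` and `FieldScraper` are hypothetical names, and `textOf` stands in for real CSS selection inside a section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical view of one restricted page section, e.g. one element matched by the restrict selector. */
interface Section {
    /** Returns the text selected by the field selector inside this section, or null if nothing matches. */
    String textOf(String fieldSelector);
}

/** Hypothetical helper that builds one row per section; not the library's implementation. */
class FieldScraper {
    static List<Map<String, String>> scrape(List<Section> sections, List<String> fieldSelectors) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Section section : sections) {
            Map<String, String> row = new LinkedHashMap<>(); // keeps the selector order
            for (String selector : fieldSelectors) {
                String text = section.textOf(selector);
                if (text != null) {
                    row.put(selector, text); // assumption: the key is the selector itself
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Under these assumptions, a page with three matching sections and the two documented field selectors would yield a list of three maps, each holding up to two entries keyed by `.p-name em` and `.p-price`.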
-
scrapeOutPages
final List<Map<String, String>> scrapeOutPages()
-
scrapeOutPages2
final List<Map<String, String>> scrapeOutPages2()
-